Week 2 notes
Loading modules
styles of loading
loading your own
navigating paths
when you name a file with a module's name
Loading styles
There are a variety of styles for loading modules, and in general you are allowed to have a preference. However, many modules have community-driven import conventions. You'll see this a lot in the data science space.
Let's first talk about your default namespace. Basically, this is the thing that knows other stuff exists. If your Python session knows that a variable, module, or function exists, then its name is in a namespace. You can use things like range() and print() without importing anything because they live in the built-in namespace, which is always available. When you define a function in your current namespace, that function becomes part of your working namespace.
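A quick way to peek at these namespaces is a sketch like the one below. It uses only the standard library: dir(builtins) lists the built-in names, and globals() returns the names defined in your current module or notebook. The function name greet is just an illustrative placeholder.

```python
import builtins

print('print' in dir(builtins))   # True: built-ins like print() and range() live here

def greet():
    return "hi"

print('greet' in globals())       # True: defining a function adds its name to your namespace
print('greet' in dir(builtins))   # False: it belongs to you, not to the built-ins
```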
Importing modules makes the functions and data from those modules available to your current namespace. The general suggestion is to import things in a way that lets you see which module they came from. Once a name is in your namespace, nothing distinguishes an imported function from a custom function defined in that script, so if you lose this contextual information, your script becomes difficult to read.
Some modules have clashing function names, or function names that clash with variables and functions you have named yourself. When two things share a name in the namespace, whichever was bound most recently wins, which can cause strange behavior.
Thus, it is important to import modules in ways that make it clear where that content came from.
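Here's a real clash from the standard library, as a small sketch: the math module has its own pow(), so a star import of math silently replaces the built-in pow() in your namespace. The built-in returns an int for integer inputs, while math.pow() always returns a float.

```python
import builtins

print(pow(2, 3))              # built-in pow: 8 (an int)

from math import *            # pulls math.pow (and many other names) into the namespace

print(pow(2, 3))              # now this is math.pow: 8.0 (a float)
print(pow is builtins.pow)    # False: the built-in has been shadowed
```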
import module will make everything in that module available, but only via dot notation. So a function y() within module would need to be called like module.y().
from module import y will import just the y() function from module and make it available within your namespace as plain y(). (The whole module still runs when it is loaded; only the name y is added to your namespace.) You can list multiple functions you'd like to import: from module import y, z will import y() and z() directly into the namespace.
import module as xyz will act like import module, but the module's name will be registered as xyz. This means you would call the y() function like xyz.y().
from module import * will import every public name from that module (functions, variables, and so on) directly into the namespace. This one is to be avoided.
These forms also work for the module's other attributes, such as variables, not just its functions.
import module should be your default choice, unless that module's documentation shows you something else. pandas and friends have custom import conventions, such as import pandas as pd.
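To see three of these styles side by side without a custom module, here is a sketch using the standard-library math module, chosen only because it ships with Python; any module would behave the same way.

```python
import math                  # binds the module name; reach inside with dot notation
print(math.sqrt(16))         # 4.0

from math import sqrt, pi    # binds only these two names, directly
print(sqrt(16), pi)          # 4.0 3.141592653589793

import math as m             # same module object, registered under a shorter name
print(m.sqrt(16))            # 4.0
print(m is math)             # True: an alias, not a copy
```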
import coolstuff
joke, answer = coolstuff.bad_animal_joke("gecko")
print(joke)
print(answer)
joke, answer = coolstuff.bad_animal_joke("BUNNY")
print(joke)
print(answer)
print(coolstuff.joke)
print(coolstuff.answer)
Why did the gecko cross the road?
To get to the gecko club.
Why did the bunny cross the road?
To get to the bunny club.
Why did the snake cross the road?
Just to be a sneaky snakie
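The coolstuff module itself isn't shown in these notes. A hypothetical coolstuff.py consistent with the printed output above might look like the sketch below. Only the names joke, answer, and bad_animal_joke come from the examples; the function body is an assumption reverse-engineered from the output. Saved as coolstuff.py in the folder you run Python from, import coolstuff would pick it up.

```python
# coolstuff.py (hypothetical reconstruction, not the original file)

# module-level variables, reachable as coolstuff.joke and coolstuff.answer
joke = "Why did the snake cross the road?"
answer = "Just to be a sneaky snakie"

def bad_animal_joke(animal):
    """Return a (joke, answer) pair for the given animal, lowercased."""
    animal = animal.lower()
    return (f"Why did the {animal} cross the road?",
            f"To get to the {animal} club.")
```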
del coolstuff
import coolstuff as cs
joke, answer = cs.bad_animal_joke("gecko")
print(joke)
print(answer)
joke, answer = cs.bad_animal_joke("BUNNY")
print(joke)
print(answer)
print(cs.joke)
print(cs.answer)
Why did the gecko cross the road?
To get to the gecko club.
Why did the bunny cross the road?
To get to the bunny club.
Why did the snake cross the road?
Just to be a sneaky snakie
del cs
from coolstuff import bad_animal_joke
joke, answer = bad_animal_joke("gecko")
print(joke)
print(answer)
joke, answer = bad_animal_joke("BUNNY")
print(joke)
print(answer)
print(coolstuff.joke)
Why did the gecko cross the road?
To get to the gecko club.
Why did the bunny cross the road?
To get to the bunny club.
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-30-2e5187d0d798> in <module>
11 print(answer)
12
---> 13 print(coolstuff.joke)
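NameError: name 'coolstuff' is not defined
This fails because from coolstuff import bad_animal_joke only adds the name bad_animal_joke to the namespace; the name coolstuff itself is never defined.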
from coolstuff import *
print(joke, answer)
joke, answer = bad_animal_joke("gecko")
print(joke)
print(answer)
joke, answer = bad_animal_joke("BUNNY")
print(joke)
print(answer)
print(joke)
print(answer)
# we imported joke and answer from coolstuff, but reassigning those names above overwrote them
Why did the snake cross the road? Just to be a sneaky snakie
Why did the gecko cross the road?
To get to the gecko club.
Why did the bunny cross the road?
To get to the bunny club.
Why did the bunny cross the road?
To get to the bunny club.
working with random numbers
The random module has all sorts of wonderful things for making pseudorandom numbers! https://docs.python.org/3/library/random.html
import random
random.seed(42) # set a random seed so this is reproducible; this should always be your standard action
random.randrange(1,11) # uses the same argument pattern as `range()` and randomly selects 1 integer value from those options
2
random.randrange(1,7) # for a six sided die
1
random.randint(1,6) # works like the above but makes stop inclusive
6
colors = ['red', 'green', 'blue', 'fuzzy']
random.choice(colors) # randomly selects an element from the sequence that you pass it
'blue'
options = ['a', 'b', 'c', 'd', 'e', 'f']
random.shuffle(options)
options # shuffle() rearranges the list you give it IN PLACE and returns None, so it only works on mutable sequences
['d', 'a', 'c', 'e', 'f', 'b']
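Plus all kinds of cool math stuff, such as random.random() for a float from 0 up to (but not including) 1, random.uniform() for a float in a range you choose, and random.gauss() for draws from a normal distribution.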
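To see what the seed actually buys you, this sketch draws five dice rolls, resets the seed, and draws again. The two runs match; the specific numbers themselves may differ between Python versions, so only the equality is the point here.

```python
import random

random.seed(42)
first_run = [random.randint(1, 6) for _ in range(5)]

random.seed(42)                 # reset the generator to the same starting state
second_run = [random.randint(1, 6) for _ in range(5)]

print(first_run == second_run)  # True: same seed, same sequence
```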
Now here's a good example of this being used. Say you want to take a random sample of some data. We're going to load some text, get the words, shuffle them, then take the first 10%.
some_wikipedia_text = "In 1957 the Soviet Union began deploying the S-75 Dvina surface-to-air missile, controlled by Fan Song fire control radars. \
This development made penetration of Soviet air space by American bombers more dangerous. The US Air Force began a program of \
cataloging the approximate location and individual operating frequencies of these radars, using electronic reconnaissance aircraft \
flying off the borders of the Soviet Union. This program provided information on radars on the periphery of the Soviet Union, \
but information on the sites further inland was lacking. Some experiments were carried out using radio telescopes looking for \
serendipitous Soviet radar reflections off the Moon, but this proved an inadequate solution to the problem."
# https://en.wikipedia.org/wiki/SOLRAD_2
words = some_wikipedia_text.lower().split()
print(words)
['in', '1957', 'the', 'soviet', 'union', 'began', 'deploying', 'the', 's-75', 'dvina', 'surface-to-air', 'missile,', 'controlled', 'by', 'fan', 'song', 'fire', 'control', 'radars.', 'this', 'development', 'made', 'penetration', 'of', 'soviet', 'air', 'space', 'by', 'american', 'bombers', 'more', 'dangerous.', 'the', 'us', 'air', 'force', 'began', 'a', 'program', 'of', 'cataloging', 'the', 'approximate', 'location', 'and', 'individual', 'operating', 'frequencies', 'of', 'these', 'radars,', 'using', 'electronic', 'reconnaissance', 'aircraft', 'flying', 'off', 'the', 'borders', 'of', 'the', 'soviet', 'union.', 'this', 'program', 'provided', 'information', 'on', 'radars', 'on', 'the', 'periphery', 'of', 'the', 'soviet', 'union,', 'but', 'information', 'on', 'the', 'sites', 'further', 'inland', 'was', 'lacking.', 'some', 'experiments', 'were', 'carried', 'out', 'using', 'radio', 'telescopes', 'looking', 'for', 'serendipitous', 'soviet', 'radar', 'reflections', 'off', 'the', 'moon,', 'but', 'this', 'proved', 'an', 'inadequate', 'solution', 'to', 'the', 'problem.']
random.shuffle(words)
samplesize = int(len(words) * .1) # flooring the float result
print(samplesize)
print(words[:samplesize])
11
['dvina', 'some', 'of', 'dangerous.', 'lacking.', 'more', 'using', 'carried', 'radar', 'but', 'frequencies']
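If you don't want to reorder the original list, random.sample() picks k elements without replacement and returns them as a new list. The sketch below uses a short stand-in word list and a 25% sample size so the numbers stay small; both are illustrative choices, not part of the original example.

```python
import random

random.seed(42)
# a short stand-in for the full word list built above
words = "in 1957 the soviet union began deploying the s-75 dvina surface-to-air missile".split()
original = list(words)                      # a copy, to check that nothing gets reordered

samplesize = int(len(words) * .25)          # 12 words * .25 -> 3
sample = random.sample(words, samplesize)   # a new list of 3 elements picked without replacement

print(samplesize)
print(sample)
print(words == original)                    # True: sample() leaves the input alone
```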
while loops happen
While loops are used in Python, but rarely in data processing. There are times when you have truly unpredictable data structures, and yes, I have totally used them that way. But generally there are other ways, usually a for loop over the data.
However, if you are working with user input, interactivity, etc., while loops are a really good way to go. The usage is pretty standard.
status = True
while status:
    print("Chugging along...")
This will make an infinite loop, because nothing inside the loop ever sets status to False.
I generally like to break out of my while loops rather than depend on a flag variable like status, but both work.
You can use input() to get text from the user.
status = True
while status:
    check = input("Do you want to go on? (type no to end the program): ")
    if check.lower() == 'no':
        status = False
        print("Bye!")
    else:
        print("Let's hit it again!")
while True:
    check = input("Do you want to go on? (type no to end the program): ")
    if check.lower() == 'no':
        print("Bye!")
        break
    else:
        print("Let's hit it again!")
Do you want to go on? (type no to end the program): eu
Let's hit it again!
Do you want to go on? (type no to end the program):
Let's hit it again!
Do you want to go on? (type no to end the program): aoeu
Let's hit it again!
Do you want to go on? (type no to end the program): u
Let's hit it again!
Do you want to go on? (type no to end the program): ao
Let's hit it again!
Do you want to go on? (type no to end the program): uae
Let's hit it again!
Do you want to go on? (type no to end the program): aoe.
Let's hit it again!
Do you want to go on? (type no to end the program): a
Let's hit it again!
Do you want to go on? (type no to end the program): no
Bye!
But sometimes you can use these for super classic reasons.
Let's calculate how many random draws it takes to count up to 100.
import random
total = 0
count = 0
while total < 100:
    total += random.randint(1,10)
    count += 1
    print(total)
print("The final total is", total, "and it took", count, "trials to get there")
3
12
20
22
23
25
28
31
38
48
50
57
64
74
82
91
96
105
The final total is 105 and it took 18 trials to get there
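Because the loop above is unseeded, the count changes every run. One way to get a feel for the typical count is to wrap the loop in a function and repeat it many times. The function name draws_to_reach and its parameters are illustrative, not from the original notes.

```python
import random

def draws_to_reach(target=100, low=1, high=10):
    """Count how many random integers in [low, high] it takes for a running total to reach target."""
    total = 0
    count = 0
    while total < target:
        total += random.randint(low, high)
        count += 1
    return total, count

random.seed(42)
results = [draws_to_reach() for _ in range(1000)]          # 1000 independent runs
average = sum(count for _, count in results) / len(results)
print(round(average, 1))
```

With draws between 1 and 10, every run needs at least 10 and at most 100 draws, and the final total can overshoot 100 by at most 9.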
dictionaries
These are the key/value paired data structures.
Historically these have been unordered, but since Python 3.7 they officially preserve insertion order (in 3.6 this was only a CPython implementation detail). The pairs will consistently report out in the order in which they were put in, but this does not mean that they have an index position: you still can't ask for the first pair with d[0]. If order carries meaning in your data, make that explicit (for example, by sorting) rather than quietly depending on it.
You will always look up information in a dictionary via the key.
Dictionaries are written with {}, and inside them are key: value pairs separated by commas.
You must always provide a key, and you must always provide a value.
Keys must be immutable (more precisely, hashable) data types. 99% of the time you are going to use a string for this, but you may want to use an int. You can also mix key types within one dictionary, but that would be very weird to do.
Values can be literally any data type, but each key holds exactly one value. This usually means that you are going to use a collection type like a list or another dictionary as the value so that you can hold more.
Also, most of the time you are going to be iteratively building up a dictionary instead of just banging one out.
You can access a key's value by:
dict[key] # will return the value
This will return the object that you are accessing, not a copy, so list mutation bugs still apply. If the key doesn't exist, this raises a KeyError.
But this does mean that you can mess with these values like you would any other variable, even if you are using the key lookup syntax.
You can add things into a dict by:
dict[new_key] = value_for_it
dict[old_key] = value_for_it
Note that there will be no warning if you overwrite an existing key's values.
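Here's a small sketch of those rules in action, using a made-up inventory dictionary. It also shows the .get() method, which returns a default instead of raising a KeyError when a key is missing.

```python
inventory = {'apples': 3}

inventory['pears'] = 5       # new key: adds a pair
inventory['apples'] = 10     # existing key: silently replaces the old value

print(inventory)                  # {'apples': 10, 'pears': 5}
print(inventory.get('plums'))     # None: .get() returns a default instead of raising
print(inventory.get('plums', 0))  # 0: or a default you choose

try:
    inventory['plums']            # square-bracket lookup of a missing key
except KeyError as err:
    print("KeyError:", err)       # KeyError: 'plums'
```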
d = {}
print(d)
{}
test = {'a': [1, 2]}
result = test['a']
result.append(3)
print(test)
# the better way to do this is the following:
test['a'].append(4)
print(test)
{'a': [1, 2, 3]}
{'a': [1, 2, 3, 4]}
There are a few patterns for looping over stuff and adding things into a dictionary. Here's the classic word count. Let's revisit our list of words.
We can use the in keyword to check for something being a member of the dict's keys.
print(words)
['dvina', 'some', 'of', 'dangerous.', 'lacking.', 'more', 'using', 'carried', 'radar', 'but', 'frequencies', 'sites', 'telescopes', 'the', 'fire', 'union.', 'looking', 'on', 'using', 'the', 's-75', 'soviet', 'off', 'serendipitous', 'the', '1957', 'the', 'soviet', 'space', 'further', 'made', 'electronic', 'the', 'radars,', 'information', 'of', 'on', 'air', 'the', 'program', 'of', 'flying', 'reflections', 'penetration', 'this', 'provided', 'of', 'cataloging', 'the', 'control', 'soviet', 'deploying', 'off', 'the', 'radars.', 'these', 'fan', 'approximate', 'the', 'proved', 'the', 'soviet', 'began', 'operating', 'a', 'surface-to-air', 'were', 'song', 'radars', 'borders', 'began', 'us', 'and', 'individual', 'controlled', 'of', 'but', 'by', 'inland', 'moon,', 'this', 'problem.', 'location', 'an', 'development', 'in', 'force', 'inadequate', 'the', 'american', 'reconnaissance', 'to', 'out', 'was', 'radio', 'air', 'periphery', 'this', 'information', 'program', 'bombers', 'by', 'solution', 'soviet', 'union', 'aircraft', 'union,', 'missile,', 'on', 'for', 'experiments']
counts = {} # see how I'm always opening my cells with the base value? This lets it reset every time you rerun things
for word in words:
    if word not in counts:
        counts[word] = 1 # base case
    else:
        counts[word] += 1 # increment if it's in there
print(counts)
{'dvina': 1, 'some': 1, 'of': 5, 'dangerous.': 1, 'lacking.': 1, 'more': 1, 'using': 2, 'carried': 1, 'radar': 1, 'but': 2, 'frequencies': 1, 'sites': 1, 'telescopes': 1, 'the': 11, 'fire': 1, 'union.': 1, 'looking': 1, 'on': 3, 's-75': 1, 'soviet': 5, 'off': 2, 'serendipitous': 1, '1957': 1, 'space': 1, 'further': 1, 'made': 1, 'electronic': 1, 'radars,': 1, 'information': 2, 'air': 2, 'program': 2, 'flying': 1, 'reflections': 1, 'penetration': 1, 'this': 3, 'provided': 1, 'cataloging': 1, 'control': 1, 'deploying': 1, 'radars.': 1, 'these': 1, 'fan': 1, 'approximate': 1, 'proved': 1, 'began': 2, 'operating': 1, 'a': 1, 'surface-to-air': 1, 'were': 1, 'song': 1, 'radars': 1, 'borders': 1, 'us': 1, 'and': 1, 'individual': 1, 'controlled': 1, 'by': 2, 'inland': 1, 'moon,': 1, 'problem.': 1, 'location': 1, 'an': 1, 'development': 1, 'in': 1, 'force': 1, 'inadequate': 1, 'american': 1, 'reconnaissance': 1, 'to': 1, 'out': 1, 'was': 1, 'radio': 1, 'periphery': 1, 'bombers': 1, 'solution': 1, 'union': 1, 'aircraft': 1, 'union,': 1, 'missile,': 1, 'for': 1, 'experiments': 1}
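The same tally can be written a couple of other ways. This sketch uses dict.get() with a default of 0, then collections.Counter from the standard library, on a short made-up sentence rather than the full word list.

```python
from collections import Counter

words = "the cat and the hat and the bat".split()

counts_get = {}
for word in words:
    counts_get[word] = counts_get.get(word, 0) + 1   # .get() supplies 0 the first time a word appears

counts_counter = Counter(words)                       # Counter does the same tally in one call

print(counts_get)                    # {'the': 3, 'cat': 1, 'and': 2, 'hat': 1, 'bat': 1}
print(counts_counter.most_common(2)) # [('the', 3), ('and', 2)]
```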
We can also filter words into groups. See how the starting values are empty lists?
wordcats = {'<=3': [], '4-6':[], '7+': []}
for word in words:
    l = len(word)
    if l <= 3:
        wordcats['<=3'].append(word)
    elif l <= 6:
        wordcats['4-6'].append(word)
    else:
        wordcats['7+'].append(word)
print(wordcats)
{'<=3': ['of', 'but', 'the', 'on', 'the', 'off', 'the', 'the', 'the', 'of', 'on', 'air', 'the', 'of', 'of', 'the', 'off', 'the', 'fan', 'the', 'the', 'a', 'us', 'and', 'of', 'but', 'by', 'an', 'in', 'the', 'to', 'out', 'was', 'air', 'by', 'on', 'for'], '4-6': ['dvina', 'some', 'more', 'using', 'radar', 'sites', 'fire', 'union.', 'using', 's-75', 'soviet', '1957', 'soviet', 'space', 'made', 'flying', 'this', 'soviet', 'these', 'proved', 'soviet', 'began', 'were', 'song', 'radars', 'began', 'inland', 'moon,', 'this', 'force', 'radio', 'this', 'soviet', 'union', 'union,'], '7+': ['dangerous.', 'lacking.', 'carried', 'frequencies', 'telescopes', 'looking', 'serendipitous', 'further', 'electronic', 'radars,', 'information', 'program', 'reflections', 'penetration', 'provided', 'cataloging', 'control', 'deploying', 'radars.', 'approximate', 'operating', 'surface-to-air', 'borders', 'individual', 'controlled', 'problem.', 'location', 'development', 'inadequate', 'american', 'reconnaissance', 'periphery', 'information', 'program', 'bombers', 'solution', 'aircraft', 'missile,', 'experiments']}
But that's not very pretty so let's loop over things and look at it differently.
for key, value in wordcats.items():
    print(key, len(value), 'words')
<=3 37 words
4-6 35 words
7+ 39 words
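collections.defaultdict can save you from pre-building the empty lists. This sketch regroups a short stand-in word list; length_bucket is an illustrative helper, not from the original. One difference from the version above: a bucket only appears once a word lands in it, whereas the hand-built dictionary always has all three keys.

```python
from collections import defaultdict

def length_bucket(word):
    """Return the same bucket labels used in wordcats above."""
    if len(word) <= 3:
        return '<=3'
    elif len(word) <= 6:
        return '4-6'
    return '7+'

words = "the quick brown fox jumps over the lazy dog surface-to-air".split()
wordcats = defaultdict(list)     # a missing key starts out as an empty list
for word in words:
    wordcats[length_bucket(word)].append(word)

for key, value in wordcats.items():
    print(key, len(value), 'words')
```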
Comprehensions
list comprehensions
basic ones
nested ones
logic ones
These let you write a simple for loop in one line.
While you caaaaaan make these really complex, you shouldn't. Remember: clarity over length.
These are really best used for filtering or for applying the same processing step to every value in a sequence.
Here's the core syntax:
[return_this for this in sequence]
When you are reading a comprehension, start at the for keyword, read to the right, then come back to the expression on the left of the for keyword. This structure creates a new list populated with the values produced by that leftmost expression.
This structure is equivalent to writing a list accumulator pattern, but in one line.
This is best used when you have a simple filtering or transformation to perform on a sequence of data.
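To make the equivalence concrete, here are both versions computing the word lengths of the "quick brown fox" sentence.

```python
sen = "the quick brown fox jumps over the lazy dog".split()

# accumulator pattern: start empty, loop, append
lengths = []
for w in sen:
    lengths.append(len(w))

# the same thing as a comprehension
lengths_comp = [len(w) for w in sen]

print(lengths == lengths_comp)  # True
```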
sen = "the quick brown fox jumps over the lazy dog".split()
print([w.title() for w in sen])
['The', 'Quick', 'Brown', 'Fox', 'Jumps', 'Over', 'The', 'Lazy', 'Dog']
print([len(w) for w in sen])
[3, 5, 5, 3, 5, 4, 3, 4, 3]
You can also add an if clause that tests each element. It goes after the sequence in the syntax: [return_this for this in sequence if condition].
print([w for w in sen if len(w) == 3])
['the', 'fox', 'the', 'dog']
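The if can also move to the left side as a conditional expression (x if condition else y). The difference matters: a trailing if filters elements out, while an if/else on the left keeps every element and chooses what to return for each one. A minimal sketch:

```python
sen = "the quick brown fox jumps over the lazy dog".split()

# filter: the trailing `if` drops elements that fail the test
short = [w for w in sen if len(w) == 3]

# choose: the conditional expression on the left keeps every element
marked = [w.upper() if len(w) == 3 else w for w in sen]

print(short)   # ['the', 'fox', 'the', 'dog']
print(marked)  # ['THE', 'quick', 'brown', 'FOX', 'jumps', 'over', 'THE', 'lazy', 'DOG']
```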
You can also nest these, but that gets hard to read quickly, so we're going to avoid it.
dict comprehensions
Dictionaries also have a similar structure.
{key: value for key, value in dict.items()}
or
{thing: calculation for thing in sequence}
{key: len(value) for key, value in wordcats.items()} # counts how many words are in each group
{'<=3': 37, '4-6': 35, '7+': 39}
{w: len(w) for w in sen}
{'the': 3,
'quick': 5,
'brown': 5,
'fox': 3,
'jumps': 5,
'over': 4,
'lazy': 4,
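'dog': 3}
Notice that 'the' only appears once even though it is in sen twice: dictionary keys are unique, so the second 'the' just overwrote the first.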
function syntax
core syntax
args
kwargs
scope warnings
A quick skeleton:
def functionname(parameter1, parameter2): # define
    result = parameter1 + parameter2 # do stuff
    return result # return one object (bundle several values into a tuple if you need to)
hw_result = functionname("hello", "world") # call the function
print(hw_result)
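The outline above also lists args, kwargs, and scope. As a brief sketch (the names describe and add_up are hypothetical): *args collects extra positional arguments into a tuple, **kwargs collects extra keyword arguments into a dict, a parameter with a default value can be skipped, and a variable created inside a function is local to it. Because greeting comes after *args here, it can only be passed by keyword.

```python
def describe(name, *args, greeting="hello", **kwargs):
    """Return what each parameter received, to show how Python routes arguments."""
    return greeting + " " + name, args, kwargs

print(describe("world"))
# ('hello world', (), {})
print(describe("world", 1, 2, greeting="hi", color="blue"))
# ('hi world', (1, 2), {'color': 'blue'})


def add_up(a, b):
    total = a + b      # total is a local variable: it only exists while add_up runs
    return total

result = add_up(1, 2)
print(result)                  # 3
print('total' in globals())    # False: the local name is gone once the function returns
```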
Function rules:
the function with
lambdas
Lambdas are a way to make functions you don't want to name or save. They are meant for nameless one-line functions that exist briefly to do something and then never again.
While you can make them complex (within the limit of a single expression), a regular def function is the better structure for that.
These are used often when a function or method wants another function to apply to something, but defining one just for this feels too trivial or one-off. You can put a lambda where that function would go to satisfy the method without adding a new name to your namespace.
https://towardsdatascience.com/apply-and-lambda-usage-in-pandas-b13a1ea037f7 has a good example of how this is used with pandas.
This is not a structure you will write very often, but it's important to know what you are looking at when you see one.
You can make them self-contained and callable by saving them to a variable, though PEP 8 recommends a regular def when you want a named function.
b = lambda x, y: x + y
b(1,3)
4
But more often you put them directly inside other calls, like the key argument of sorted().
counts = {w: len(w) for w in sen}
print(counts)
{'the': 3, 'quick': 5, 'brown': 5, 'fox': 3, 'jumps': 5, 'over': 4, 'lazy': 4, 'dog': 3}
sorted(counts, key=lambda item: len(item), reverse=True)
['quick', 'brown', 'jumps', 'over', 'lazy', 'the', 'fox', 'dog']
countedcats = {key: len(value) for key, value in wordcats.items()}
countedcats
{'<=3': 37, '4-6': 35, '7+': 39}
sorted(countedcats, key=lambda item: countedcats[item])
['4-6', '<=3', '7+']
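Two more patterns you'll run into, sketched with the same group counts shown above: a lambda that sorts a dictionary's (key, value) pairs by value, and passing an existing function like len directly as the key, with no lambda needed.

```python
countedcats = {'<=3': 37, '4-6': 35, '7+': 39}

# the lambda picks the value out of each (key, value) pair
by_count = sorted(countedcats.items(), key=lambda pair: pair[1], reverse=True)
print(by_count)                # [('7+', 39), ('<=3', 37), ('4-6', 35)]

# when a function already does the job, pass it directly instead of wrapping it
words = ['quick', 'the', 'brown', 'fox']
print(sorted(words, key=len))  # ['the', 'fox', 'quick', 'brown'] (ties keep their original order)
```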
"""
Example:
In [3]: thing(10) # should give 20
Out[3]: 20
"""
'\nthing(10) # should give 20 \nOut[3]: 20\n'