Week 2

Week 2 notes

Loading modules

  • styles of loading

  • loading your own

  • navigating paths

  • when you name a file with a module's name

Loading styles

There are a variety of styles for loading modules, and in general you are allowed to have a preference. However, many modules have community driven styles of importing. You'll see this a lot in the data science space.

Let's first talk about your default namespace. Basically, this is the thing that knows other stuff exists. If your Python session knows that a variable, module, or function exists, then it's in that namespace. You can use things like range() and print() without importing anything because they exist in the default namespace. When you define a function within your current namespace, that function becomes part of your working namespace.

Importing modules makes the functions and data from those modules available to your current namespace. The general suggestion is to import things in a way so that you know which module they came from. Remember that there's no real distinction between functions within the default library and custom functions defined within that script, so if you lose this contextual metadata, it can make your script difficult to read.

Some modules will have clashing function names, or possibly function names and variable names that clash with some of your custom named items within your code. Strange behavior can be seen when there are clashes in the namespace.

Thus, it is important to import modules in ways that make it clear where that content came from.
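As a small illustration of a clash, here is a sketch using two standard-library modules, math and cmath, which both define sqrt(). The variable names first and second are only for illustration.

```python
from math import sqrt

first = sqrt(4)          # math's sqrt returns a float
print(first)             # 2.0

from cmath import sqrt   # silently replaces the sqrt we already had

second = sqrt(4)         # cmath's sqrt returns a complex number
print(second)            # (2+0j)

# importing the modules themselves keeps both versions available and labeled
import math
import cmath

print(math.sqrt(4), cmath.sqrt(4))
```

Nothing warned us when the second import replaced the first sqrt, which is why the dotted form is easier to trust.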

import module will import everything from that module, but make it only available via the dot notation. So function y() within module would need to be called like module.y().

from module import y will import just the y() function from module and make it available within your namespace as just y(). You can list multiple functions you'd like to import as such: from module import y, z will import y() and z() directly into the namespace.

import module as xyz will act like import module but the module's name will be registered as xyz. This means you would call the y() function like xyz.y().

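from module import * will import every public name from that module directly into the namespace. This one is to be avoided, because you can no longer tell where a name came from and it can silently overwrite names you already have.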

These will also work with attributes within the module.
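Here is a minimal sketch of the first three styles side by side, using the standard-library math module so it runs anywhere. The alias m is just an example name, not a community convention.

```python
import math                  # names stay behind the module name
import math as m             # same module, registered under a shorter label
from math import floor, pi   # only these two names land in your namespace

print(math.floor(2.7))   # 2
print(m.floor(2.7))      # 2
print(floor(2.7))        # 2
print(pi == math.pi)     # True, attributes import the same way functions do
```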

import module should be your default choice, unless that module's documentation shows you something else. pandas and friends have custom import conventions, such as import pandas as pd.

import coolstuff
joke, answer = coolstuff.bad_animal_joke("gecko")

print(joke)
print(answer)

joke, answer = coolstuff.bad_animal_joke("BUNNY")

print(joke)
print(answer)

print(coolstuff.joke)
print(coolstuff.answer)
Why did the gecko cross the road?
To get to the gecko club.
Why did the bunny cross the road?
To get to the bunny club.
Why did the snake cross the road?
Just to be a sneaky snakie
del coolstuff
import coolstuff as cs

joke, answer = cs.bad_animal_joke("gecko")

print(joke)
print(answer)

joke, answer = cs.bad_animal_joke("BUNNY")

print(joke)
print(answer)

print(cs.joke)
print(cs.answer)
Why did the gecko cross the road?
To get to the gecko club.
Why did the bunny cross the road?
To get to the bunny club.
Why did the snake cross the road?
Just to be a sneaky snakie
del cs
from coolstuff import bad_animal_joke

joke, answer = bad_animal_joke("gecko")

print(joke)
print(answer)

joke, answer = bad_animal_joke("BUNNY")

print(joke)
print(answer)

print(coolstuff.joke)
Why did the gecko cross the road?
To get to the gecko club.
Why did the bunny cross the road?
To get to the bunny club.



---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

<ipython-input-30-2e5187d0d798> in <module>
     11 print(answer)
     12 
---> 13 print(coolstuff.joke)


NameError: name 'coolstuff' is not defined
from coolstuff import *

print(joke, answer)

joke, answer = bad_animal_joke("gecko")

print(joke)
print(answer)

joke, answer = bad_animal_joke("BUNNY")

print(joke)
print(answer)

print(joke)
print(answer)

# see that we imported those variables, but our clashing names overwrote them.
Why did the snake cross the road? Just to be a sneaky snakie
Why did the gecko cross the road?
To get to the gecko club.
Why did the bunny cross the road?
To get to the bunny club.
Why did the bunny cross the road?
To get to the bunny club.

working with random numbers

The random module has all sorts of wonderful things for making pseudorandom numbers! https://docs.python.org/3/library/random.html

import random
random.seed(42) # set a random seed so this is reproducible, this should always be your standard action
random.randrange(1,11) # uses the same argument pattern as `range()` and randomly selects 1 integer value from those options
2
random.randrange(1,7) # for a six sided die
1
random.randint(1,6) # works like the above but makes stop inclusive
6
colors = ['red', 'green', 'blue', 'fuzzy']
random.choice(colors) # randomly selects an element from the sequence that you pass it
'blue'
options = ['a', 'b', 'c', 'd', 'e', 'f']

random.shuffle(options)

options # will randomly shuffle IN PLACE the list that you give it, only works on mutable sequences
['d', 'a', 'c', 'e', 'f', 'b']

Plus all kinds of cool math stuff.
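To see why setting the seed matters, here is a small sketch: resetting the generator to the same seed replays the same sequence of "random" values. The list names are only for illustration.

```python
import random

random.seed(42)
first_rolls = []
for _ in range(5):
    first_rolls.append(random.randint(1, 6))

random.seed(42)   # reset the generator to the same starting point
second_rolls = []
for _ in range(5):
    second_rolls.append(random.randint(1, 6))

print(first_rolls == second_rolls)   # True
```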

Now here's a good example of this being used. Say you want to take a random sample of some data. We're going to load some text, get the words, shuffle them, then take the first 10%.

some_wikipedia_text = "In 1957 the Soviet Union began deploying the S-75 Dvina surface-to-air missile, controlled by Fan Song fire control radars. \
                       This development made penetration of Soviet air space by American bombers more dangerous. The US Air Force began a program of \
                       cataloging the approximate location and individual operating frequencies of these radars, using electronic reconnaissance aircraft \
                       flying off the borders of the Soviet Union. This program provided information on radars on the periphery of the Soviet Union, \
                       but information on the sites further inland was lacking. Some experiments were carried out using radio telescopes looking for \
                       serendipitous Soviet radar reflections off the Moon, but this proved an inadequate solution to the problem."

# https://en.wikipedia.org/wiki/SOLRAD_2

words = some_wikipedia_text.lower().split()

print(words)
['in', '1957', 'the', 'soviet', 'union', 'began', 'deploying', 'the', 's-75', 'dvina', 'surface-to-air', 'missile,', 'controlled', 'by', 'fan', 'song', 'fire', 'control', 'radars.', 'this', 'development', 'made', 'penetration', 'of', 'soviet', 'air', 'space', 'by', 'american', 'bombers', 'more', 'dangerous.', 'the', 'us', 'air', 'force', 'began', 'a', 'program', 'of', 'cataloging', 'the', 'approximate', 'location', 'and', 'individual', 'operating', 'frequencies', 'of', 'these', 'radars,', 'using', 'electronic', 'reconnaissance', 'aircraft', 'flying', 'off', 'the', 'borders', 'of', 'the', 'soviet', 'union.', 'this', 'program', 'provided', 'information', 'on', 'radars', 'on', 'the', 'periphery', 'of', 'the', 'soviet', 'union,', 'but', 'information', 'on', 'the', 'sites', 'further', 'inland', 'was', 'lacking.', 'some', 'experiments', 'were', 'carried', 'out', 'using', 'radio', 'telescopes', 'looking', 'for', 'serendipitous', 'soviet', 'radar', 'reflections', 'off', 'the', 'moon,', 'but', 'this', 'proved', 'an', 'inadequate', 'solution', 'to', 'the', 'problem.']
random.shuffle(words)

samplesize = int(len(words) * .1) # flooring the float result
print(samplesize)

print(words[:samplesize])
11
['dvina', 'some', 'of', 'dangerous.', 'lacking.', 'more', 'using', 'carried', 'radar', 'but', 'frequencies']
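The shuffle-then-slice approach above reorders your original list. If you would rather leave it alone, random.sample() from the same module is one alternative. This sketch uses a short stand-in list (items is a made-up name) instead of the full word list.

```python
import random

random.seed(42)

items = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
original_order = list(items)         # keep a copy so we can compare afterwards

k = int(len(items) * .3)             # 30% of 10 items
picked = random.sample(items, k)     # k distinct positions, chosen without replacement

print(k)                             # 3
print(len(picked))                   # 3
print(items == original_order)       # True, the source list was not shuffled
```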

while loops happen

use that random number generating while loop thing that I do

While loops are used in Python, but rarely in data processing. There are times when you have truly unpredictable data structures, and yes I have totally used them this way. But generally there are other ways.

However, if you are working with user input, interactivity, etc., while loops are really a good way to go. The usage is pretty standard.

status = True

while status:
    print("Chugging along...")

This will make an infinite loop.

I generally like to break out of my while loops rather than depend on sentinel values, but either works.

You can use input() to get text from the user.

status = True

while status:
    check = input("Do you want to go on? (type no to end the program): ")
    if check.lower() == 'no':
        status = False
        print("Bye!")
    else:
        print("Let's hit it again!")

while True:
    check = input("Do you want to go on? (type no to end the program): ")
    if check.lower() == 'no':
        print("Bye!")
        break
    else:
        print("Let's hit it again!")
Do you want to go on? (type no to end the program): eu
Let's hit it again!
Do you want to go on? (type no to end the program): 
Let's hit it again!
Do you want to go on? (type no to end the program): aoeu
Let's hit it again!
Do you want to go on? (type no to end the program): u
Let's hit it again!
Do you want to go on? (type no to end the program): ao
Let's hit it again!
Do you want to go on? (type no to end the program): uae
Let's hit it again!
Do you want to go on? (type no to end the program): aoe.
Let's hit it again!
Do you want to go on? (type no to end the program): a
Let's hit it again!
Do you want to go on? (type no to end the program): no
Bye!

But sometimes you can use these for super classic reasons.

Let's calculate how many times it takes to randomly count up to 100.

import random

total = 0
count = 0

while total < 100:
    total += random.randint(1,10)
    count += 1
    print(total)

print("The final total is", total, "and it took", count, "trials to get there")
3
12
20
22
23
25
28
31
38
48
50
57
64
74
82
91
96
105
The final total is 105 and it took 18 trials to get there

dictionaries

These are the key/value paired data structures.

Historically these have been unordered, but as of Python 3.7 they are officially ordered. The pairs will consistently report out in the order in which they were put in, but this does not mean that they have an index position. In Python 3.6 this ordering was only an implementation detail of CPython, so code that may run on older versions should not depend on the order of the dictionary being stable.
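A short sketch of both points, assuming Python 3.7 or newer: the pairs come back in the order they went in, but there is still no index position to look up. The name order_demo is only for illustration.

```python
order_demo = {}
order_demo["first"] = 1
order_demo["second"] = 2
order_demo["third"] = 3

print(list(order_demo))   # ['first', 'second', 'third']

# there is no position 0, only keys
try:
    order_demo[0]
    lookup_failed = False
except KeyError as err:
    lookup_failed = True
    print("KeyError:", err)   # KeyError: 0
```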

You will always look up information from within a dictionary via the key.

Dictionaries use {} to note them.

Then they have key: value within them.

You must always provide a key, and you must always provide a value.

Keys must be immutable data types. 99% of the time you are going to use a string for this, but you may want to use an int. You can also mix key types within one dictionary, but this would be very weird to do.

Values can be literally any data type, but there may only be one value per key. This usually means that you are going to use a data collection type like a list or another dictionary so that you can hold more.
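Here is a quick sketch of what the key rule looks like in practice. Strings, ints, and tuples are all accepted as keys, while a list is rejected. The dictionary name is made up for the example.

```python
key_demo = {}
key_demo["name"] = "gecko"    # string key, the usual choice
key_demo[3] = "three"         # int key works too
key_demo[(1, 2)] = "a pair"   # tuples are immutable, so they are allowed

try:
    key_demo[[1, 2]] = "nope"  # lists are mutable, so this fails
    list_key_allowed = True
except TypeError as err:
    list_key_allowed = False
    print("TypeError:", err)

print(len(key_demo))   # 3
```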

Also, most of the time you are going to be iteratively building up a dictionary instead of just banging one out.

You can access a key's value by:

dict[key] # will return the value

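This will return the object that you are accessing. Note that this will not return a copy, so the usual list mutation bugs still apply: changing a list you pulled out of a dictionary also changes the list inside the dictionary.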

But this does mean that you can mess with these values like you would any other variable, even if you are using the key lookup syntax.

You can add things into a dict by:

dict[new_key] = value_for_it
dict[old_key] = value_for_it

Note that there will be no warning if you overwrite an existing key's values.
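A minimal sketch of the silent overwrite, plus one way to guard against it with the in keyword. The ages dictionary is just an invented example.

```python
ages = {"gecko": 3}

ages["bunny"] = 2     # new key, gets added
ages["gecko"] = 10    # existing key, the old value 3 is replaced without complaint

print(ages)           # {'gecko': 10, 'bunny': 2}

# check first if you only want to set a value when the key is missing
if "gecko" not in ages:
    ages["gecko"] = 99

print(ages["gecko"])  # 10
```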

d = {}

print(d)
{}
test = {'a': [1, 2]}

result = test['a']

result.append(3)

print(test)

# the better way to do this is the following:

test['a'].append(4)

print(test)
{'a': [1, 2, 3]}
{'a': [1, 2, 3, 4]}

There are a few patterns for looping over stuff and adding things into a dictionary. Here's the classic word count. Let's revisit our list of words.

We can use the in keyword to check for something being a member of the dict's keys.

print(words)
['dvina', 'some', 'of', 'dangerous.', 'lacking.', 'more', 'using', 'carried', 'radar', 'but', 'frequencies', 'sites', 'telescopes', 'the', 'fire', 'union.', 'looking', 'on', 'using', 'the', 's-75', 'soviet', 'off', 'serendipitous', 'the', '1957', 'the', 'soviet', 'space', 'further', 'made', 'electronic', 'the', 'radars,', 'information', 'of', 'on', 'air', 'the', 'program', 'of', 'flying', 'reflections', 'penetration', 'this', 'provided', 'of', 'cataloging', 'the', 'control', 'soviet', 'deploying', 'off', 'the', 'radars.', 'these', 'fan', 'approximate', 'the', 'proved', 'the', 'soviet', 'began', 'operating', 'a', 'surface-to-air', 'were', 'song', 'radars', 'borders', 'began', 'us', 'and', 'individual', 'controlled', 'of', 'but', 'by', 'inland', 'moon,', 'this', 'problem.', 'location', 'an', 'development', 'in', 'force', 'inadequate', 'the', 'american', 'reconnaissance', 'to', 'out', 'was', 'radio', 'air', 'periphery', 'this', 'information', 'program', 'bombers', 'by', 'solution', 'soviet', 'union', 'aircraft', 'union,', 'missile,', 'on', 'for', 'experiments']
counts = {} # see how I'm always opening my cells with the base value? this lets it reset every time you rerun things

for word in words:
    if word not in counts:
        counts[word] = 1 # base case
    else:
        counts[word] += 1 # increment if it's in there

print(counts)
{'dvina': 1, 'some': 1, 'of': 5, 'dangerous.': 1, 'lacking.': 1, 'more': 1, 'using': 2, 'carried': 1, 'radar': 1, 'but': 2, 'frequencies': 1, 'sites': 1, 'telescopes': 1, 'the': 11, 'fire': 1, 'union.': 1, 'looking': 1, 'on': 3, 's-75': 1, 'soviet': 5, 'off': 2, 'serendipitous': 1, '1957': 1, 'space': 1, 'further': 1, 'made': 1, 'electronic': 1, 'radars,': 1, 'information': 2, 'air': 2, 'program': 2, 'flying': 1, 'reflections': 1, 'penetration': 1, 'this': 3, 'provided': 1, 'cataloging': 1, 'control': 1, 'deploying': 1, 'radars.': 1, 'these': 1, 'fan': 1, 'approximate': 1, 'proved': 1, 'began': 2, 'operating': 1, 'a': 1, 'surface-to-air': 1, 'were': 1, 'song': 1, 'radars': 1, 'borders': 1, 'us': 1, 'and': 1, 'individual': 1, 'controlled': 1, 'by': 2, 'inland': 1, 'moon,': 1, 'problem.': 1, 'location': 1, 'an': 1, 'development': 1, 'in': 1, 'force': 1, 'inadequate': 1, 'american': 1, 'reconnaissance': 1, 'to': 1, 'out': 1, 'was': 1, 'radio': 1, 'periphery': 1, 'bombers': 1, 'solution': 1, 'union': 1, 'aircraft': 1, 'union,': 1, 'missile,': 1, 'for': 1, 'experiments': 1}
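The same counting pattern is often written more compactly. This sketch shows two alternatives on a short stand-in list: dict.get() with a default of 0, and collections.Counter from the standard library. Neither is required; the if/else version above does the same job.

```python
from collections import Counter

sample_words = ["the", "soviet", "the", "radar", "soviet", "the"]

# .get(word, 0) returns 0 when the word has not been seen yet
counts_get = {}
for word in sample_words:
    counts_get[word] = counts_get.get(word, 0) + 1

print(counts_get)   # {'the': 3, 'soviet': 2, 'radar': 1}

# Counter does the whole loop for you
counts_counter = Counter(sample_words)
print(dict(counts_counter) == counts_get)   # True
```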

We can also do a filter and make some collections. See how the values are empty lists?

wordcats = {'<=3': [], '4-6':[], '7+': []}

for word in words:
    l = len(word)
    if l <= 3:
        wordcats['<=3'].append(word)
    elif l <= 6:
        wordcats['4-6'].append(word)
    else:
        wordcats['7+'].append(word)

print(wordcats)
{'<=3': ['of', 'but', 'the', 'on', 'the', 'off', 'the', 'the', 'the', 'of', 'on', 'air', 'the', 'of', 'of', 'the', 'off', 'the', 'fan', 'the', 'the', 'a', 'us', 'and', 'of', 'but', 'by', 'an', 'in', 'the', 'to', 'out', 'was', 'air', 'by', 'on', 'for'], '4-6': ['dvina', 'some', 'more', 'using', 'radar', 'sites', 'fire', 'union.', 'using', 's-75', 'soviet', '1957', 'soviet', 'space', 'made', 'flying', 'this', 'soviet', 'these', 'proved', 'soviet', 'began', 'were', 'song', 'radars', 'began', 'inland', 'moon,', 'this', 'force', 'radio', 'this', 'soviet', 'union', 'union,'], '7+': ['dangerous.', 'lacking.', 'carried', 'frequencies', 'telescopes', 'looking', 'serendipitous', 'further', 'electronic', 'radars,', 'information', 'program', 'reflections', 'penetration', 'provided', 'cataloging', 'control', 'deploying', 'radars.', 'approximate', 'operating', 'surface-to-air', 'borders', 'individual', 'controlled', 'problem.', 'location', 'development', 'inadequate', 'american', 'reconnaissance', 'periphery', 'information', 'program', 'bombers', 'solution', 'aircraft', 'missile,', 'experiments']}

But that's not very pretty so let's loop over things and look at it differently.

for key, value in wordcats.items():
    print(key, len(value), 'words')
<=3 37 words
4-6 35 words
7+ 39 words

Comprehensions

  • list comprehensions

  • basic ones

  • nested ones

  • logic ones

These let simple for loops fit on one line.

While you caaaaaan make these really complex, you shouldn't. Remember clarity over length.

These are really best used for filtering, or when you want to apply one processing step to every value in a sequence.

Here's the core syntax:

[return_this for this in sequence]

When you are reading a comprehension, you should start reading at the for keyword, then to the right, then look at the left of the for keyword. This structure will create a new list populated with the elements returned by the leftmost item in the syntax.

This structure is the equivalent of writing a list accumulator pattern, but done so in one line.

This is best used when you have a simple filtering or transformation to perform on a sequence of data.
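To make the comparison with the accumulator pattern concrete, here is the same transformation written both ways. The list of numbers is invented for the example.

```python
numbers = [1, 2, 3, 4]

# list accumulator pattern
squares_loop = []
for n in numbers:
    squares_loop.append(n * n)

# the same thing as a comprehension
squares_comp = [n * n for n in numbers]

print(squares_comp)                   # [1, 4, 9, 16]
print(squares_loop == squares_comp)   # True
```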

sen = "the quick brown fox jumps over the lazy dog".split()
print([w.title() for w in sen])
['The', 'Quick', 'Brown', 'Fox', 'Jumps', 'Over', 'The', 'Lazy', 'Dog']
print([len(w) for w in sen])
[3, 5, 5, 3, 5, 4, 3, 4, 3]

You can also put in a boolean expression that tests each element of the iterable. This goes after the sequence in the syntax, and only the elements that pass the test are kept.

print([w for w in sen if len(w) == 3])
['the', 'fox', 'the', 'dog']

You can also nest these, but this can be a lot and weird so we're going to avoid it.
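Just so you can recognize one when you see it, here is a small sketch of a nested comprehension that flattens a list of lists, next to the equivalent nested for loops. The data is made up. The for clauses read left to right in the same order as the loops.

```python
rows = [[1, 2], [3, 4], [5, 6]]

# nested comprehension: outer loop first, inner loop second
flat = [n for row in rows for n in row]
print(flat)   # [1, 2, 3, 4, 5, 6]

# equivalent nested loops
flat_loop = []
for row in rows:
    for n in row:
        flat_loop.append(n)

print(flat == flat_loop)   # True
```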

dict comprehensions

Dictionaries also have a similar structure.

{key: value for key, value in dict.items()}

or

{thing: calculation for thing in sequence}

{key: len(value) for key, value in wordcats.items()} # counts how many there are
{'<=3': 37, '4-6': 35, '7+': 39}
{w: len(w) for w in sen}
{'the': 3,
 'quick': 5,
 'brown': 5,
 'fox': 3,
 'jumps': 5,
 'over': 4,
 'lazy': 4,
 'dog': 3}

function syntax

  • core syntax

  • args

  • kwargs

  • scope warnings

A quick skeleton:

def functionname(parameter1, parameter2): # define
    result = parameter1 + parameter2 # do stuff
    return result # return only one thing

hw_result = functionname("hello", "world") # call the function
print(hw_result)
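The outline above also lists args, kwargs, and scope. The sketch below is one minimal illustration of those ideas: positional arguments, keyword arguments, a default value, and a variable that only exists inside the function. All of the names are hypothetical.

```python
def make_greeting(name, punctuation="!"):          # punctuation has a default value
    greeting_text = "hello " + name + punctuation  # local to the function
    return greeting_text

print(make_greeting("world"))                        # hello world!
print(make_greeting("world", "?"))                   # hello world?
print(make_greeting(punctuation=".", name="gecko"))  # hello gecko.

# scope warning: the local variable is gone once the function returns
try:
    print(greeting_text)
    leaked = True
except NameError:
    leaked = False
    print("greeting_text only exists inside the function")
```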

Function rules:

  • the function with

lambdas

These are structures used to make functions you don't want to name or save. These are meant for nameless one line functions, that will exist briefly to do something and then never again.

While you can make them complex, this is not the best structure to do so.

These are used often when you have a function or method that will apply a function on something, but you don't want to define a function just for this. You can, but maybe it's too trivial or one off. You can put a lambda in place of that function call to satisfy the method, but not add something to your namespace.
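As a small sketch of that trade-off, here is the built-in sorted() given a named function and then a lambda that does the same job. The data and the helper name second_item are invented for the example.

```python
pairs = [("gecko", 3), ("bunny", 1), ("snake", 2)]

# option 1: define a named function just for this
def second_item(pair):
    return pair[1]

by_named = sorted(pairs, key=second_item)

# option 2: a lambda in the same spot, nothing new added to the namespace
by_lambda = sorted(pairs, key=lambda pair: pair[1])

print(by_lambda)               # [('bunny', 1), ('snake', 2), ('gecko', 3)]
print(by_named == by_lambda)   # True
```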

https://towardsdatascience.com/apply-and-lambda-usage-in-pandas-b13a1ea037f7 has a good example of how this is used

This is not a commonly used structure, but it's important to know what you are looking at when you see one.

So you can make them self contained and callable if you save them to a variable.

b = lambda x, y: x + y
b(1,3)
4

But you can put them in things.

counts = {w: len(w) for w in sen}

print(counts)
{'the': 3, 'quick': 5, 'brown': 5, 'fox': 3, 'jumps': 5, 'over': 4, 'lazy': 4, 'dog': 3}
sorted(counts, key = lambda item: len(item), reverse = True)
['quick', 'brown', 'jumps', 'over', 'lazy', 'the', 'fox', 'dog']
countedcats = {key: len(value) for key, value in wordcats.items()}
countedcats
{'<=3': 37, '4-6': 35, '7+': 39}
sorted(countedcats, key = lambda item: countedcats[item])
['4-6', '<=3', '7+']
"""
Example:
In [3]: thing(10) # should give 20                                                                      
Out[3]: 20
"""
'\nthing(10) # should give 20                                                                      \nOut[3]: 20\n'
