Week 2

Week 2 notes

Loading modules

  • styles of loading

  • loading your own

  • navigating paths

  • when you name a file with a module's name

Loading styles

There are a variety of styles for loading modules, and in general you are allowed to have a preference. However, many modules have community driven styles of importing. You'll see this a lot in the data science space.

Let's first talk about your default namespace. Basically, this is the thing that knows other stuff exists. If your Python session knows that a variable, module, or function exists, then it's in that namespace. You can use things like range() and print() directly because they exist in the default namespace. When you define a function within your current namespace, that function becomes part of your working namespace.

Importing modules makes the functions and data from those modules available to your current namespace. The general suggestion is to import things in a way so that you know which module they came from. Remember that there's no real distinction between functions within the default library and custom functions defined within that script, so if you lose this contextual metadata, it can make your script difficult to read.

Some modules will have clashing function names, or possibly function names and variable names that clash with some of your custom named items within your code. You can see strange behavior when there are clashes in the namespace, because the most recent definition of a name silently replaces the earlier one.

Thus, it is important to import modules in ways that make it clear where that content came from.
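Here is a small sketch of what a namespace clash can look like. The custom sqrt() function is made up for illustration; the point is only that the later import quietly replaces it.

```python
# A hypothetical custom function that happens to share a name with math.sqrt
def sqrt(value):
    return f"custom sqrt of {value}"

before = sqrt(16)  # calls our custom function

# Importing the name directly rebinds sqrt in the current namespace,
# with no warning that the custom version is now gone
from math import sqrt

after = sqrt(16)  # now calls math.sqrt

print(before)
print(after)
```

Using import math and calling math.sqrt() would have kept both functions available.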

import module will import everything from that module, but make it only available via the dot notation. So function y() within module would need to be called like module.y().

from module import y will import just the y() function from module and make it available within your namespace as just y(). You can list multiple functions you'd like to import as such: from module import y, z will import y() and z() directly into the namespace.

import module as xyz will act like import module but the module's name will be registered as xyz. This means you would call the y() function like xyz.y().

from module import * will import every public name from that module directly into the namespace. This one is to be avoided, because you can no longer tell where a name came from.

These will also work with attributes within the module.
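As a quick sketch of the styles above, here they are side by side using the standard library's math module (the variable names are just for illustration):

```python
# Style 1: import the module, use dot notation
import math
root_a = math.sqrt(16)

# Style 2: import specific names directly into the namespace
from math import sqrt, floor
root_b = sqrt(16)
rounded_down = floor(2.7)

# Style 3: import the module under an alias
import math as m
root_c = m.sqrt(16)

# Attributes (not just functions) work the same way
circle_constant = m.pi

print(root_a, root_b, root_c, rounded_down, circle_constant)
```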

import module should be your default choice, unless that module's documentation shows you something else. pandas and friends have customary import aliases (for example, import pandas as pd).

working with random numbers

The random module has all sorts of wonderful things for making pseudorandom numbers! https://docs.python.org/3/library/random.html

Plus all kinds of cool math stuff.

Now here's a good example of this being used. Say you want to take a random sample of some data. We're going to load some text, get the words, shuffle them, then take the first 10%.
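A minimal sketch of that idea follows. The text here is a made-up inline string standing in for whatever you would actually load from a file, and splitting on whitespace is just one simple way to get words.

```python
import random

# Stand-in text; in practice this might come from reading a file
text = "the quick brown fox jumps over the lazy dog and keeps running " * 5

# Get the words by splitting on whitespace
words = text.split()

# Shuffle the list in place so the order is random
random.shuffle(words)

# Take the first 10% of the shuffled words as the sample
sample_size = len(words) // 10
sample = words[:sample_size]

print(len(words), "words total,", len(sample), "in the sample")
print(sample)
```

random.sample(words, sample_size) would give a similar result in one step, but the shuffle-then-slice version matches the steps described above.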

while loops happen

use that random number generating while loop thing that I do

While loops are used in Python, but rarely in data processing. There are times when you have truly unpredictable data structures, and yes I have totally used them this way. But generally there are other ways.

However, if you are working with user input, interactivity, etc., while loops are really a good way to go. The usage is pretty standard.

A loop whose condition never becomes false, such as while True: with no break inside it, will make an infinite loop.

I generally like to break out of my while loops rather than depend on sentinel values, but either works.
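Here is a small sketch comparing the two exit styles. Both loops count to 5; the variable names and the limit are arbitrary choices for the example.

```python
# Style 1: loop forever, then break out when done
count = 0
while True:
    count += 1
    if count >= 5:
        break  # leaves the loop immediately

# Style 2: a sentinel variable controls the loop condition
keep_going = True
tally = 0
while keep_going:
    tally += 1
    if tally >= 5:
        keep_going = False  # loop ends at the next condition check

print(count, tally)
```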

You can use input() to get text from the user.

status = True

while status:
    check = input("Do you want to go on? (type no to end the program): ")
    if check.lower() == 'no':
        status = False
        print("Bye!")
    else:
        print("Let's hit it again!")

But sometimes you can use these for super classic reasons.

Let's calculate how many times it takes to randomly count up to 100.
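One way to read that, sketched below, is to keep adding random numbers to a running total and count how many additions it takes to reach 100. The 1 to 10 step size is an assumption picked for the example.

```python
import random

total = 0
steps = 0

# Keep adding random amounts until the total reaches at least 100
while total < 100:
    total += random.randint(1, 10)  # randint includes both endpoints
    steps += 1

print(f"It took {steps} random steps to reach {total}.")
```

We can't know ahead of time how many steps this needs, which is exactly the kind of situation a while loop is for.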

dictionaries

These are the key/value paired data structures.

Historically these have been unordered, but since Python 3.7 they are guaranteed to keep insertion order. The pairs will consistently report out in the order in which they were put in, but this does not mean that they have an index position. In CPython 3.6 this ordering was only an implementation detail, so you should be careful about writing code that depends on the order of the dictionary if it might run on older versions.

You will always look up information from within a dictionary via the key.

Dictionaries use {} to note them.

Then they have key: value within them.

You must always provide a key, and you must always provide a value.

Keys must be immutable data types. 99% of the time you are going to use a string for this, but you may want to use an int. You can also mix them, but this would be very weird to do.

Values can be literally any data type, but there may only be one value per key. This usually means that you are going to use a data collection type like a list or another dictionary so that you can hold more.
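A small sketch of what this looks like written out. The course dictionary and its contents are invented for the example.

```python
# Keys are usually strings; each key maps to exactly one value
course = {
    "title": "Week 2",
    "topics": ["modules", "dictionaries"],  # a list holds several items under one key
    "details": {"level": "intro"},          # a nested dictionary works too
    2: "an int key is allowed",             # mixing key types works, but is odd
}

print(course)
```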

Also, most of the time you are going to be iteratively building up a dictionary instead of just banging one out.

You can access a key's value by:
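A minimal sketch of the square-bracket lookup, using a made-up pets dictionary:

```python
pets = {"cat": 3, "dog": 5}

# Put the key in square brackets after the dictionary's name
cat_count = pets["cat"]

print(cat_count)
```

Looking up a key that does not exist raises a KeyError.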

This will return the object that you are accessing. Note this will not return a copy, so the usual list mutation bugs still apply: if the value is a list and you change it, the list inside the dictionary changes too.

But this does mean that you can mess with these values like you would any other variable, even if you are using the key lookup syntax.
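Here is a short sketch of both points, using an invented inventory dictionary: the lookup hands back the same list object, and you can call methods directly on the lookup.

```python
inventory = {"fruits": ["apple", "pear"]}

# This is the same list object that lives in the dictionary, not a copy
fruits = inventory["fruits"]
fruits.append("plum")

# You can also work with the value directly through the key lookup
inventory["fruits"].append("fig")

print(inventory["fruits"])
print(fruits is inventory["fruits"])
```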

You can add things into a dict by:
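A minimal sketch, with a made-up scores dictionary. Assigning to a new key adds it, and assigning to an existing key replaces its value.

```python
scores = {"ana": 90}

# Assigning to a key that does not exist yet adds a new pair
scores["ben"] = 85

# Assigning to a key that already exists silently replaces its value
scores["ana"] = 70

print(scores)
```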

Note that there will be no warning if you overwrite an existing key's values.

There are a few patterns for looping over stuff and adding things into a dictionary. Here's the classic word count. Let's revisit our list of words.
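A sketch of the classic word count pattern. The words list here is a small stand-in, since the original list is not shown in these notes.

```python
# Stand-in list of words for the example
words = ["the", "cat", "and", "the", "hat", "and", "the", "bat"]

word_counts = {}
for word in words:
    if word in word_counts:
        # Seen before: bump the existing count
        word_counts[word] += 1
    else:
        # First time seeing this word: start its count at 1
        word_counts[word] = 1

print(word_counts)
```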

We can use the in keyword to check for something being a member of the dict's keys.
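One detail worth seeing in a quick sketch: in checks the keys, not the values. The dictionary below is invented for the example.

```python
word_counts = {"the": 3, "cat": 1}

has_cat = "cat" in word_counts   # "cat" is a key
has_dog = "dog" in word_counts   # "dog" is not a key
has_three = 3 in word_counts     # 3 is a value, not a key

print(has_cat, has_dog, has_three)
```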

We can also do a filter and make some collections. See how the values are empty lists?
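As a rough sketch of that pattern, we can start a dictionary whose values are empty lists and then sort words into them. The word list, the group names, and the length cutoff are all assumptions made up for this example.

```python
words = ["a", "tiny", "list", "of", "example", "words"]

# Start with empty lists as the values, ready to collect into
groups = {"short": [], "long": []}
print(groups)

for word in words:
    if len(word) <= 3:
        groups["short"].append(word)
    else:
        groups["long"].append(word)

print(groups)
```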

But that's not very pretty so let's loop over things and look at it differently.
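A sketch of a friendlier display, looping over the key/value pairs with .items(). The groups dictionary is a small made-up example.

```python
groups = {"short": ["a", "of"], "long": ["tiny", "list"]}

lines = []
for label, members in groups.items():
    # Build one readable line per key
    line = f"{label}: {', '.join(members)}"
    lines.append(line)
    print(line)
```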

Comprehensions

  • list comprehensions

  • basic ones

  • nested ones

  • logic ones

These let you write a simple for loop in one line.

While you caaaaaan make these really complex, you shouldn't. Remember clarity over length.

These are really best used with filtering or when you are trying to do a processing step over all the values to something.

Here's the core syntax:

[return_this for this in sequence]

When you are reading a comprehension, you should start reading at the for keyword, then to the right, then look at the left of the for keyword. This structure will create a new list populated with the elements returned by the leftmost item in the syntax.

This structure is the equivalent of writing a list accumulator pattern, but done so in one line.
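To make the comparison concrete, here is a sketch of the same task written both ways. The numbers list is arbitrary.

```python
numbers = [1, 2, 3, 4]

# List accumulator pattern: start empty, append inside a for loop
squares_loop = []
for n in numbers:
    squares_loop.append(n ** 2)

# The same thing as a comprehension: [return_this for this in sequence]
squares_comp = [n ** 2 for n in numbers]

print(squares_loop)
print(squares_comp)
```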

This is best used when you have a simple filtering or transformation to perform on a sequence of data.

You can also put in a boolean expression that tests each item from the sequence. This goes in after the sequence element in the syntax, following an if keyword, and only items that pass the test are kept.
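A small sketch of a filtering comprehension, with the test placed after the sequence. The numbers are arbitrary.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Keep only the items where the test is True
evens = [n for n in numbers if n % 2 == 0]

# Filtering and transforming can happen together
evens_times_ten = [n * 10 for n in numbers if n % 2 == 0]

print(evens)
print(evens_times_ten)
```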

You can also nest these, but this can be a lot and weird so we're going to avoid it.

dict comprehensions

Dictionaries also have a similar structure.

{key: value for key, value in some_dict.items()}

or

{thing: calculation for thing in sequence}
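Here is a sketch of both forms. The prices dictionary and the animals list are made up for the example.

```python
prices = {"apple": 1.0, "pear": 2.0}

# Form 1: build a new dict from an existing dict's key/value pairs
doubled = {name: price * 2 for name, price in prices.items()}

# Form 2: build a dict from a sequence, calculating each value
animals = ["cat", "horse"]
name_lengths = {animal: len(animal) for animal in animals}

print(doubled)
print(name_lengths)
```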

function syntax

  • core syntax

  • args

  • kwargs

  • scope warnings

A quick skeleton:
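The sketch below is one possible skeleton, not necessarily the one originally shown here. It assumes "args" and "kwargs" in the outline refer to the *args and **kwargs catch-alls, and the function name and parameters are placeholders.

```python
def function_name(required, optional="default", *args, **kwargs):
    """Describe what the function does here."""
    # required: a positional argument the caller must supply
    # optional: has a default value, so the caller may skip it
    # args:     a tuple of any extra positional arguments
    # kwargs:   a dict of any extra keyword arguments
    result = (required, optional, args, kwargs)
    return result


minimal_call = function_name("a")
full_call = function_name("a", "b", 1, 2, flag=True)

print(minimal_call)
print(full_call)
```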

Function rules:

  • the function with

lambdas

These are structures used to make functions you don't want to name or save. These are meant for nameless one line functions, that will exist briefly to do something and then never again.

While you can make them complex, this is not the best structure to do so.

These are used often when you have a function or method that will apply a function on something, but you don't want to define a function just for this. You can, but maybe it's too trivial or one off. You can put a lambda in place of that function call to satisfy the method, but not add something to your namespace.
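A common case, sketched here with the built-in sorted() function: its key parameter expects a function, and a lambda can fill that slot without adding a name to your namespace. The words list is arbitrary.

```python
words = ["pear", "fig", "banana"]

# key expects a function; the lambda returns each word's length
by_length = sorted(words, key=lambda word: len(word))

print(by_length)
```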

https://towardsdatascience.com/apply-and-lambda-usage-in-pandas-b13a1ea037f7 has a good example of how this is used

This is not a commonly used structure, but it's important to know what you are looking at when you see one.

So you can make them self contained and callable if you save them to a variable.
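A minimal sketch of saving a lambda to a variable and calling it. The name add_one is just for illustration.

```python
# The lambda is an ordinary function object, so a variable can hold it
add_one = lambda x: x + 1

result = add_one(4)

print(result)
```

If you find yourself doing this, a regular def is usually the clearer choice, since the whole point of a lambda is to skip the name.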

But you can put them in things.
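One reading of "in things" is storing lambdas inside a data structure. The sketch below keeps two in a dictionary and calls each one; the names and operations are invented for the example.

```python
# Lambdas stored as dictionary values
operations = {
    "double": lambda x: x * 2,
    "square": lambda x: x ** 2,
}

# Look up each function by key and call it
results = {name: operation(5) for name, operation in operations.items()}

print(results)
```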
