Week 1
Welcome to Python! This class will attempt to provide two things:
A fast overview of the core Python language, presuming you are fluent in another programming language.
A sampling of some of the most important standard library modules related to working with data preparation for data science.
The Zen of Python: How is it different from other languages?
Python was designed with readability in mind, and its design has been purposeful. Let's look at this little easter egg (you don't need to execute this code right now).
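The easter egg meant here is presumably the standard library's this module (an assumption, since the original code cell is not shown). Importing it prints the Zen of Python to the console:

```python
# Importing the module prints "The Zen of Python" as a side effect.
import this
```

The text is printed only the first time the module is imported in a session.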
These ideas are important to us. I'll discuss this more in the lecture.
Variables
Different from other languages like Java
No declaration of a value's data type (though a variable must already exist with a value before it can be incremented)
Can be any data type
Freely change data types without declaration
Freely change values
Make them on the fly, no need to declare they exist, no need to declare the data type
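A minimal sketch of the points above. The variable names are made up for illustration:

```python
# No declaration needed: assignment creates the variable.
count = 10
first_type = type(count).__name__

# The same name can be rebound to a value of a different type.
count = "ten"
second_type = type(count).__name__

# A variable must already hold a value before it can be incremented.
total = 0
total += 5

print(first_type, second_type, total)  # prints: int str 5
```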
Printing
Printing in Python requires the print() function. You can put pretty much anything inside this function and it'll say something about it.
By default this will go to the console, or standard out.
Most objects have their own internal print method, which is automatically called when you pass the object into print().
By default, print() will include a newline character at the end of execution, but this can be changed with optional parameters.
print() even has a nice keyword argument to write the contents out to a file. We'll see that in later lessons.
An example of variable assignment and performing math operations within the print function:
An example of a boolean data type printing the value:
An example of performing string operations and string method calls within the print function:
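The original code cells for these three examples are not shown, so the following is a stand-in sketch with made-up variable names, one short group per example:

```python
# Variable assignment, then math performed inside print()
apples = 4
oranges = 3
total_fruit = apples + oranges
print(apples + oranges)      # prints 7
print(apples * 2 - oranges)  # prints 5

# A boolean value
is_ready = True
print(is_ready)              # prints True

# String operations and string method calls
name = "python"
shout = name.upper() + "!"
print(shout)                 # prints PYTHON!
print(name.title() * 2)      # prints PythonPython
```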
Expressions
You've seen some of these before in the previous examples. These are the small fragments of code that evaluate into a result.
They can be small (like x + 1) or very, very long.
Most lines of code are made up of several expressions that execute in a particular order (usually PEMDAS) or are chained together. Part of learning this language is learning how to
The key: if you don't know what content and data type an expression will return, you are almost certainly going to introduce a bug in your code with that expression. Here are some strategies:
There is reasonable consistency in how things are
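One possible strategy, offered here as an illustration rather than as the author's own list, is to evaluate an expression piece by piece and check both the value and the type of the result:

```python
x = 4

# Exponent first, then multiplication, then addition.
result = x + 1 * 2 ** 2

# Parentheses change the order of evaluation.
grouped = (x + 1) * 2 ** 2

# Division with / returns a float even when both operands are ints.
half = x / 2

print(result, type(result))    # prints: 8 <class 'int'>
print(grouped, type(grouped))  # prints: 20 <class 'int'>
print(half, type(half))        # prints: 2.0 <class 'float'>
```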
Strings
You can use '', "", or """""".
'' and "" operate the same.
"""""" allows you to have rendered newlines, and is used mostly for documentation and other meta stuff.
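A short sketch of the three quoting styles, with made-up variable names:

```python
single = 'hello'
double = "hello"

# Single and double quotes produce the same string.
same = single == double

# Triple quotes keep the newlines you type inside them.
multi = """first line
second line"""
line_count = len(multi.splitlines())

print(same)        # prints True
print(line_count)  # prints 2
print(multi)
```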
Numbers
There are only two you'll need to mess with, int and float: integers and floating point values.
Don't worry about size; all of that is taken care of for you.
There are other ways to mess with numerical representation if you have a specific need, but generally numbers are straightforward.
Values returned from int-to-int computations will mostly be ints (division with / is the notable exception and always returns a float), but when you include a float you'll almost certainly get a float back.
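A quick sketch of how the result type follows from the operands. The variable names are only for illustration:

```python
whole = 7 + 3       # int + int gives an int
floor_div = 7 // 2  # floor division of ints gives an int
true_div = 6 / 3    # division with / always gives a float
mixed = 7 + 3.0     # including a float gives a float
big = 2 ** 100      # ints grow as large as needed

print(type(whole).__name__, type(floor_div).__name__)  # prints: int int
print(true_div, mixed)                                 # prints: 2.0 10.0
print(type(big).__name__)                              # prints: int
```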
Recasting variables
Every data type name (or built-in class name) has a function version to recast values: int(), float(), str(), etc.
Not all items can be cast to these data types, and you will get an error if this is the case. Pretty much everything can become a string, but you will find some errors around numerical recasting.
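A small sketch of recasting, including one conversion that fails. The try/except used to catch the error is covered later in these notes:

```python
as_int = int("42")        # string to int
as_float = float("3.5")   # string to float
as_str = str(42)          # int to string
truncated = int(3.9)      # float to int drops the fractional part

# Not every value can be recast: int() rejects a string with a decimal point.
try:
    int("3.5")
    failed = False
except ValueError:
    failed = True

print(as_int, as_float, as_str, truncated, failed)  # prints: 42 3.5 42 3 True
```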
Testing for data types
There are a few ways to do this, each with pros and cons. None are disallowed, and all may be valuable for certain problem solving needs. But here is the generally best way:
The isinstance() function is designed to do this. It will return either True or False. It plays the nicest with class inheritance levels and is pretty flexible, so that's why it's recommended.
Pass a tuple of data types to check against several at once.
But you can also do an equality check with the results from the type() function.
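A sketch comparing the two approaches. The last two checks show the inheritance difference: bool is a subclass of int, so isinstance() says True while the type() equality check says False.

```python
value = 3.0

is_float = isinstance(value, float)
is_number = isinstance(value, (int, float))  # a tuple checks several types
exact_match = type(value) == float

# bool inherits from int, so the two approaches disagree here.
bool_is_int = isinstance(True, int)
bool_type_is_int = type(True) == int

print(is_float, is_number, exact_match)  # prints: True True True
print(bool_is_int, bool_type_is_int)     # prints: True False
```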
Testing for floating point digits
This isn't as directly supported as testing whether something is an integer. Floating point numbers are numbers with decimal values. Unfortunately, there is no built-in string test for them, because there are various international conventions for formatting decimal values. But there are ways of processing things.
Let's presume that these numbers will come in with periods. You can remove the period from the string, and then test whether what remains is numerical. This works because a number cannot have more than one decimal point.
It is true that you can just attempt to cast the value into a float within a try and except, but that doesn't fit cleanly into a logic structure. Instead, you can use the replace string method, calling it so that it replaces only the first period.
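The replace approach can be sketched as follows. The function name looks_like_float is hypothetical, and this version assumes unsigned values written with a period; a leading minus sign would fail the test, and plain integers would pass it.

```python
def looks_like_float(text):
    """Return True if text is digits with at most one period (hypothetical helper)."""
    # The third argument limits replace() to the first period only.
    return text.replace(".", "", 1).isdigit()


print(looks_like_float("3.14"))   # prints True
print(looks_like_float("3.1.4"))  # prints False (a second period remains)
print(looks_like_float("abc"))    # prints False
print(looks_like_float("-3.14"))  # prints False (sign not handled)
```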
Testing for types with special object functions
Many objects have special methods for testing stuff about that object's content. This can include general stuff, like if the letters are all capitalized, but also if a string value is all numerical. There are many, but here are some. These can work in subtle and surprising ways, so you'll always want to reference the documentation.
For example, str.isdigit() and str.isnumeric() look similar but differ. str.isdigit() tests whether every character is a digit, which covers 0-9 as well as digit characters from other scripts and forms such as superscript digits. str.isnumeric() is broader still: it accepts any character that Unicode treats as numeric, such as fractions like ½. So a string can come up true without containing any of 0-9.
Read here: https://lerner.co.il/2019/02/17/pythons-str-isdigit-vs-str-isnumeric/
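A small comparison of these methods on a few sample strings. The samples are chosen for illustration, and str.isdecimal() is included as the strictest of the three:

```python
samples = ["123", "²", "½", "٣", "-1", "1.5"]

results = {}
for text in samples:
    # Record (isdecimal, isdigit, isnumeric) for each sample string.
    results[text] = (text.isdecimal(), text.isdigit(), text.isnumeric())
    print(repr(text), results[text])
```

Note that "-1" and "1.5" fail all three tests, because the minus sign and the period are not numeric characters.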
Try/except
There are a ton of features to this structure, but the core is this:
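The original code cell is not shown, so here is a minimal sketch of the core structure. The helper to_int and its default parameter are made up for illustration:

```python
def to_int(text, default=0):
    """Try to convert text to an int, falling back to default (hypothetical helper)."""
    try:
        # Code that might raise an error goes in the try block.
        return int(text)
    except ValueError:
        # This block runs only if int() raised a ValueError.
        return default


print(to_int("12"))      # prints 12
print(to_int("twelve"))  # prints 0
```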
Lists
Lists are the core position-indexed array data structure.
When you see [] out on their own, you've got a list.
They can hold any mix of objects, but do try to keep the elements the same type.
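A brief sketch of creating and indexing lists, with made-up names:

```python
empty = []                # an empty list
numbers = [10, 20, 30]    # elements of one type
mixed = [1, "two", 3.0]   # allowed, but harder to work with

first = numbers[0]        # positions start at 0
last = numbers[-1]        # negative positions count from the end
length = len(numbers)

print(first, last, length)  # prints: 10 30 3
```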
List methods
These are how you work with a list; some mutate it and others just report on it.
list.append(stuff) will add stuff to the end of list.
list.count(stuff) will return a count of how many times stuff appears within list.
list.index(stuff) will return the first found index position of stuff within list.
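The three methods above in a short sketch. The list is named letters here rather than list, to avoid hiding the built-in name:

```python
letters = ["a", "b", "a"]

letters.append("c")            # mutates the list in place
a_count = letters.count("a")   # does not change the list
b_index = letters.index("b")   # position of the first "b"

print(letters)           # prints ['a', 'b', 'a', 'c']
print(a_count, b_index)  # prints: 2 1
```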
Slicing (for strings and lists)
This syntax is shared across strings and lists.
sequence[start:stop:step] Every element is optional, but you need to retain the : to keep the delimiters clear.
list[index] gives you a single element
list[start:stop] slices that range
list[start:stop:step] slices that range, hopping by step
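A sketch of the slicing forms on a string and a list. The start position is included and the stop position is excluded:

```python
word = "python"
nums = [0, 1, 2, 3, 4, 5]

single = word[0]          # one element
middle = word[1:4]        # positions 1, 2 and 3
first_three = nums[:3]    # start omitted: from the beginning
every_other = nums[::2]   # start and stop omitted, step of 2
tail = word[-2:]          # last two characters
backwards = nums[::-1]    # a negative step walks in reverse

print(single, middle, tail)     # prints: p yth on
print(first_three, every_other)  # prints: [0, 1, 2] [0, 2, 4]
print(backwards)                 # prints: [5, 4, 3, 2, 1, 0]
```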
for loops
Python will assign the loop variable you name on each pass through the for loop, and it will persist after the for loop finishes.
Everything within the indent level of the for loop here will execute when the for loop runs.
Python also knows how the sequences unpack and takes care of it for you.
Strings unpack by character, and lists unpack by element.
You can loop over numbers by using the range() function. It can be called as range(stop), range(start, stop), or range(start, stop, step), and these numbers follow the same conventions as slicing.
This is why you want to be careful about the mix of data types, because you'll have to add in logic for what you are doing to them.
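A sketch pulling these points together, with made-up names. The last loop shows the extra logic that a list of mixed types forces on you:

```python
# Strings unpack by character.
letters = []
for ch in "abc":
    letters.append(ch)
last_letter = ch  # the loop variable persists after the loop: 'c'

# Lists unpack by element.
doubled = []
for n in [1, 2, 3]:
    doubled.append(n * 2)

# range(start, stop, step) excludes the stop value, like slicing.
evens = []
for i in range(0, 10, 2):
    evens.append(i)

# Mixed types need type checks before you can treat elements alike.
cleaned = []
for item in [1, "2", 3.0]:
    if isinstance(item, str):
        item = int(item)
    cleaned.append(item)

print(letters, last_letter)  # prints: ['a', 'b', 'c'] c
print(doubled, evens)        # prints: [2, 4, 6] [0, 2, 4, 6, 8]
print(cleaned)               # prints: [1, 2, 3.0]
```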
Counter pattern
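The original example is not shown. As commonly taught, the counter pattern starts a variable at zero and adds one each time something of interest is seen. A minimal sketch with made-up data:

```python
words = ["apple", "kiwi", "banana", "fig"]

long_count = 0              # start the counter before the loop
for word in words:
    if len(word) > 4:
        long_count += 1     # add one for each match

print(long_count)  # prints 2
```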
Accumulator pattern
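The original example is not shown. As commonly taught, the accumulator pattern starts with an empty or zero value and builds it up with each element. A minimal sketch with made-up data:

```python
prices = [2.5, 4.0, 1.5]

total = 0                 # start the accumulator before the loop
for price in prices:
    total += price        # fold each element into the running total

# The same idea works for building up a string.
joined = ""
for piece in ["a", "b", "c"]:
    joined += piece

print(total, joined)  # prints: 8.0 abc
```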
Boolean expressions
These are expressions that evaluate to True or False, and you've seen a few examples of these.
The usual comparison and boolean operators produce these values, but there are other methods and functions that will also provide these tests.
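A few boolean expressions, sketched with made-up values:

```python
x = 5

greater = x > 3                 # comparison operator
not_equal = x != 5
in_range = 1 < x < 10           # comparisons can be chained
both = x > 3 and x < 4          # and, or, not combine tests
negated = not greater

contains = "py" in "python"     # membership test
all_upper = "abc".isupper()     # a string method that returns a boolean

print(greater, not_equal, in_range)  # prints: True False True
print(both, negated)                 # prints: False False
print(contains, all_upper)           # prints: True False
```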
Logic structures with if/elif/else
Python just uses whitespace to mark the blocks of code. Only if is required, and it can only appear once.
You can also use elif like else if, and you can have an unlimited number of these.
You can also add a last else block at the end to catch any cases that haven't triggered any other block. This is optional, but you can only have one and it must be the last block in the chain.
You can also nest these as needed.
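A sketch of a full chain with a nested check inside the else block. The function describe is made up for illustration:

```python
def describe(n):
    """Label a number using if/elif/else (hypothetical helper)."""
    if n < 0:
        label = "negative"
    elif n == 0:
        label = "zero"
    else:
        # A nested if/else inside the else block.
        if n % 2 == 0:
            label = "positive even"
        else:
            label = "positive odd"
    return label


print(describe(-3))  # prints negative
print(describe(0))   # prints zero
print(describe(8))   # prints positive even
print(describe(7))   # prints positive odd
```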