Week 1
Welcome to Python! This class will attempt to provide two things:
A fast overview of the core Python language, presuming you are fluent in another programming language.
A sampling of some of the most important standard library modules related to working with data preparation for data science.
The Zen of Python: How is it different from other languages?
Python was designed with readability in mind, and its design choices have been purposeful. Let's look at this little easter egg (you don't need to execute this code right now).
import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
These ideas are important to us. I'll discuss this more in the lecture.
Variables
Different from other languages like Java
No declaration of a value's data type (a variable only needs to already exist with a value if you are going to increment it)
Can be any data type
Freely change data types without declaration
Freely change values
Make them on the fly, no need to declare they exist, no need to declare the data type
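A minimal sketch of what the points above look like in practice. The names value and counter are made up for illustration.

```python
# A name is just a label; it can point at a value of any type.
value = 42             # starts out as an int
print(type(value))     # <class 'int'>

value = "forty-two"    # the same name now refers to a str
print(type(value))     # <class 'str'>

value = 42.0           # and now a float
print(type(value))     # <class 'float'>

# A variable must already exist with a value before it can be incremented.
counter = 0
counter = counter + 1
print(counter)         # 1
```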
Printing
Printing in Python requires the print() function. You can put pretty much anything inside this function and it'll say something about it.
By default this will go to the console or standard out.
Most objects have their own internal print method, which is automatically called when you pass the object into print().
By default, print() will include a newline character at the end of its output, but this can be changed with optional parameters.
print() even has a nice keyword argument to write the contents out to a file. We'll see that in later lessons.
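As a quick preview of those optional parameters, here is a small sketch using the sep, end, and file keyword arguments of print(). The io.StringIO buffer is only a stand-in for a real file so that nothing is written to disk; the names buffer and captured are made up for this example.

```python
import io

print("a", "b", "c")             # default separator is a space: a b c
print("a", "b", "c", sep="-")    # a-b-c
print("no newline", end="")      # suppresses the trailing newline
print()                          # prints just a newline

# file accepts any object with a write() method; StringIO stands in for a real file.
buffer = io.StringIO()
print("written elsewhere", file=buffer)
captured = buffer.getvalue()     # the text that print() wrote, newline included
```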
An example of variable assignment and performing math operations within the print function:
In [1]: x = 1
...: print(x)
...: print(x * 100)
...: print(x * x)
1
100
1
An example of a boolean data type printing the value:
In [2]: seenBefore = False
...: print(seenBefore)
False
An example of performing string operations and string method calls within the print function:
In [3]: phrase = "no semicolons needed!"
...: print(phrase)
...: print(phrase + phrase)
...: print(phrase.upper())
no semicolons needed!
no semicolons needed!no semicolons needed!
NO SEMICOLONS NEEDED!
Expressions
You've seen some of these before in the previous examples. These are the small fragments of code that evaluate into a result.
They can be small (like x + 1) or very, very long.
Most lines of code are made up of several expressions that execute in a particular order (usually PEDMAS) or are chained together. Part of learning this language is learning how to
In [7]: print(x + 10) # returns an int
11
In [8]: print("You said: " + phrase) # returns a single string
You said: no semicolons needed!
In [10]: print(x > 10) # returns a boolean value
False
The key: if you don't know what content and data type an expression will return, you are almost certainly going to make a bug in your code with that expression. Here are strategies:
There is reasonable consistency in how things are
Strings
You can use '' or "" or """""".
'' and "" operate the same.
"""""" allows you to have rendered newlines, and is used mostly for documentation and other meta stuff.
In [15]: print("hello" + 'hello')
hellohello
In [16]: print("""why yes
...: there are newlines
...: and this is valid""")
why yes
there are newlines
and this is valid
Numbers
There are only two you'll need to mess with: int and float (integers and floating point values).
Don't worry about size; all of that is taken care of for you.
There are other ways to mess with numerical representation if you have a specific need, but generally numbers are straightforward.
Computations between ints will mostly give you ints (true division with / is the notable exception and always returns a float), but when you include a float you'll almost certainly get a float back.
In [17]: print(3 * 6)
18
In [18]: print(3.1 * 6)
18.6
In [19]: print(12348481238121924619246192461293719247192461 * 100032423946234)
1235248510303928884959610285595346128398198959310914141874
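To make the int versus float behavior concrete, here is a small sketch. The variable names are made up, and it assumes Python 3, where / always performs true division.

```python
whole = 7 // 2       # floor division keeps ints as ints: 3
true_div = 6 / 3     # true division always returns a float: 2.0
mixed = 3 + 0.5      # int combined with a float gives a float: 3.5
remainder = 7 % 2    # the modulo of two ints is an int: 1

print(type(whole), type(true_div), type(mixed), type(remainder))
```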
Recasting variables
Every data type name (or built-in class name) has a function version to recast stuff: int(), float(), str(), etc.
Not all items can be cast to these data types, and you will get an error if this is the case. Pretty much everything can become a string, but you will find some errors around numerical recasting.
In [21]: print(str(13778237))
13778237
In [23]: print(int(39.99932312)) # truncates toward zero (drops the fractional part)
...:
39
In [24]: print(float(31239))
...:
31239.0
In [25]: print(int("hello world"))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-25-e092507d8694> in <module>
1
----> 2 print(int("hello world"))
ValueError: invalid literal for int() with base 10: 'hello world'
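A few more recasting cases, sketched with made-up variable names. The negative example is there to show that int() truncates toward zero, which is different from rounding down.

```python
from_text = int("42")       # numeric strings convert cleanly: 42
as_float = float("3.5")     # 3.5
negative = int(-3.9)        # truncates toward zero: -3 (rounding down would give -4)
padded = int("  17  ")      # surrounding whitespace is ignored: 17
label = str(3.0)            # '3.0'

print(from_text, as_float, negative, padded, label)
```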
Testing for data types
There are a few ways to do this, each with pros and cons. None are disallowed, and all may be valuable for certain problem-solving needs. But here is the generally best way:
The isinstance() function is designed to do this. It will return either True or False. It plays the nicest with class inheritance levels and is pretty flexible, so that's why it's recommended.
Pass a tuple of data types to check against several at once.
But you can also do an equality check with the results from the type() function.
In [27]: print(isinstance("hello", str))
True
In [29]: print(isinstance("hello", int))
False
In [30]: print(isinstance("hello", (int, float)))
False
In [31]: print(isinstance(3.14, (int, float)))
True
In [32]: print(isinstance(3, (int, float)))
True
In [33]: print(type(3) == int)
True
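Here is one small case where the two approaches disagree, which may help show what "plays nicest with class inheritance" means. In Python, bool is a subclass of int, so isinstance() accepts it while the exact type() comparison does not. The variable names are made up for this sketch.

```python
flag = True

via_isinstance = isinstance(flag, int)   # True: bool is a subclass of int
via_type = type(flag) == int             # False: the exact type is bool

print(via_isinstance, via_type)
```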
Testing for floating point digits
This isn't as directly supported as testing whether a string is an integer. Floating point numbers are numbers with decimal values. Unfortunately, there is no built-in string test for them because there are various international conventions for formatting decimal values. But there are ways of processing things.
Let's presume that these numbers will come in with periods. You can remove one period from the string, and then test whether what remains is numerical. This works because you cannot have more than one decimal point within a number.
It is true that you can just attempt to cast the number into a float within a try and except, but that doesn't fit as cleanly into a logic structure. Instead, you can use the replace string method, calling it such that it will only replace the value one time.
print("9.2455".replace(".", '', 1))
...:
92455
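Both approaches can be wrapped up as small helper functions. This is only a sketch: the function names looks_like_float and converts_to_float are made up, and the first one deliberately rejects things like signs and exponents that float() itself would accept.

```python
def looks_like_float(text):
    """Return True if text is only digits with at most one period."""
    # Remove at most one period, then check that only digits remain.
    return text.replace(".", "", 1).isdigit()


def converts_to_float(text):
    """Return True if float() accepts text."""
    try:
        float(text)
        return True
    except ValueError:
        return False


print(looks_like_float("9.2455"))    # True
print(looks_like_float("1.2.3"))     # False, a second period is left behind
print(looks_like_float("-1.5"))      # False, the minus sign is not a digit
print(converts_to_float("-1.5"))     # True
```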
Testing for types with special object functions
Many objects have special methods for testing stuff about that object's content. This can include general stuff, like if the letters are all capitalized, but also if a string value is all numerical. There are many, but here are some. These can work in subtle and surprising ways, so you'll always want to reference the documentation.
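For example, str.isdigit() tests whether every character is a digit (0-9 and the digit characters of other scripts), while str.isnumeric() is broader: it checks whether every character has a numeric value, which also covers characters such as fractions and Chinese numerals. This means that numeric characters from any Unicode script will come up True with str.isnumeric(), even where str.isdigit() comes up False.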
Read here: https://lerner.co.il/2019/02/17/pythons-str-isdigit-vs-str-isnumeric/
In [36]: z = '100834'
In [37]: print(z.isnumeric())
True
In [38]: y = '123.456'
In [39]: print(y.isnumeric())
False
In [40]: print(y.isdigit())
False
In [42]: print('003712'.isnumeric())
True
In [43]: print('௩௰'.isnumeric())
True
In [44]: print('二百三'.isdigit())
False
In [45]: print('二百三'.isnumeric())
True
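To see the differences side by side, here is a small sketch that runs a few sample strings through three of these methods. str.isdecimal() is not discussed above; it is included here as the strictest of the three. The sample strings and the results dictionary are made up for illustration.

```python
samples = ["123", "²", "½", "二百三", "12.5"]

results = {}
for text in samples:
    # Store (isdecimal, isdigit, isnumeric) for each sample string.
    results[text] = (text.isdecimal(), text.isdigit(), text.isnumeric())
    print(repr(text), results[text])
```

Only "123" passes all three checks. The superscript two counts as a digit but not a decimal, the fraction and the Chinese numerals are only numeric, and "12.5" fails everything because of the period.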
Try/except
There are a ton of features to this structure, but the core is this:
try:
    tries to run this code
except:
    this code runs instead if any errors are detected
In [46]: try:
...:     y = float(3.14)
...:     print(y)
...: except:
...:     print("cannot convert")
...:
3.14
In [48]: try:
...:     y = int('3.14')
...:     print(y)
...: except:
...:     print("cannot convert")
...:
cannot convert
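One of those extra features is naming the error you expect. A bare except catches everything, including mistakes you did not anticipate, so it is often safer to catch only the specific exception. The helper name to_int and its default parameter are made up for this sketch.

```python
def to_int(text, default=None):
    """Try to convert text to an int, returning default if it cannot be parsed."""
    try:
        return int(text)
    except ValueError:
        # Only conversion failures land here; other errors still surface.
        return default


print(to_int("42"))        # 42
print(to_int("3.14"))      # None
print(to_int("3.14", 0))   # 0
```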
Lists
Are the core position indexed array data structure.
When you see [] out on their own, you've got a list.
Can be any mix of objects, but do try and keep them the same.
lista = [1, 2, 3, 4]
listb = ['c', 'p', 'w']
listc = ['d', 1, [], int, 9, 23.99]
List methods
These are how you mutate or query a list.
list.append(stuff) will add stuff to the end of list.
list.count(stuff) will return a count of how many times stuff appears within list.
list.index(stuff) will return the first found index position of stuff within list.
In [50]: lista.append(1)
...: lista.append(1)
...: print(lista)
[1, 2, 3, 4, 1, 1]
In [52]: print(lista.count(1))
3
In [53]: print(lista.index(1))
0
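One thing to watch for: count() happily returns 0 for something that isn't in the list, but index() raises a ValueError. A small sketch of checking with the in operator first; the names letters and position and the -1 sentinel are choices made just for this example.

```python
letters = ["c", "p", "w"]
letters.append("c")            # letters is now ['c', 'p', 'w', 'c']

copies = letters.count("c")    # 2
first = letters.index("c")     # 0, the first match wins
missing = letters.count("z")   # 0, no error for absent items

# index() raises ValueError for absent items, so test membership first.
if "z" in letters:
    position = letters.index("z")
else:
    position = -1              # sentinel value chosen for this sketch

print(copies, first, missing, position)
```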
Slicing (for strings and lists)
This syntax is shared across strings and lists.
sequence[start:stop:step]: every element is optional, but you need to retain the : to keep the delimiters clear.
list[index] gives you a single element
list[start:stop] slices that range
list[start:stop:step] slices that range, hopping by step
In [54]: print(lista)
...: print(listb)
[1, 2, 3, 4, 1, 1]
['c', 'p', 'w']
In [55]: print(lista[0]) #returns just the one element
...: print(lista[:3]) # returns a list with that slice
1
[1, 2, 3]
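A few more slices, sketched on a list and a string to show that the syntax really is shared. Negative indexes and negative steps are not covered above; they count from the end and walk backwards. The names numbers and word are made up.

```python
numbers = [1, 2, 3, 4, 1, 1]
word = "python"

middle = numbers[1:4]        # [2, 3, 4], stop is not included
every_other = numbers[::2]   # [1, 3, 1]
last = numbers[-1]           # 1, negative indexes count from the end
backwards = numbers[::-1]    # [1, 1, 4, 3, 2, 1]

prefix = word[:3]            # 'pyt'
flipped = word[::-1]         # 'nohtyp'

print(middle, every_other, last, backwards, prefix, flipped)
```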
for loops
for iterable in sequence:
    # do stuff
Python will define the iterable variable you name every time the for loop runs, and it will persist after the for loop executes.
Everything within the indent level of the for loop here will execute when the for loop runs.
Python also knows how the sequences unpack and takes care of it for you.
Strings unpack by character, and lists unpack by element.
You can loop over numbers by using the range() function. Range can be called by: range(stop), range(start, stop), range(start, stop, step) and these numbers follow the same conventions as slicing.
In [56]: for num in range(5):
...:     print(num)
...:
0
1
2
3
4
In [57]: for num in range(3, 8):
...:     print(num)
...:
3
4
5
6
7
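The three-argument form of range() and the persistence of the loop variable can be sketched like this. Wrapping range() in list() is just a convenient way to see all the numbers at once; the variable names are made up.

```python
evens = list(range(0, 10, 2))       # [0, 2, 4, 6, 8], stop is not included
countdown = list(range(5, 0, -1))   # [5, 4, 3, 2, 1], a negative step counts down

for num in range(3):
    pass                            # nothing to do, we only care about num

# The loop variable still exists after the loop and holds the last value.
print(evens, countdown, num)
```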
In [58]: for character in "hello world":
...:     print(character)
...:
h
e
l
l
o
w
o
r
l
d
In [59]: for item in lista:
...:     print(item)
...:
1
2
3
4
1
1
In [60]: for item in listc:
...:     print(item)
...:
d
1
[]
<class 'int'>
9
23.99
This is why you want to be careful about the mix of data types, because you'll have to add in logic for what you are doing to them.
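Here is a sketch of the kind of extra logic a mixed list forces on you. It adds up only the numbers and sets everything else aside. The list mixed repeats the contents of listc, and the names total and skipped are made up.

```python
mixed = ['d', 1, [], int, 9, 23.99]

total = 0
skipped = []
for item in mixed:
    # Only ints and floats can be safely added to the running total.
    if isinstance(item, (int, float)):
        total = total + item
    else:
        skipped.append(item)

print("numeric total:", total)
print("skipped:", skipped)
```

Note that the int class itself ends up in skipped: it is a data type, not a number.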
Counter pattern
In [61]: count = 0
...:
...: for item in listc:
...:     count = count + 1
...:
...: print("There are", count, "items in listc")
There are 6 items in listc
Accumulator pattern
In [62]: total = 0
...:
...: for num in lista:
...:     total = total + num
...:
...: print("The sum of lista is", total)
The sum of lista is 12
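For these two simple cases, the built-in functions len() and sum() give the same answers as the counter and accumulator patterns. The patterns are still worth knowing because they extend to cases the built-ins don't cover. A quick sketch comparing them:

```python
lista = [1, 2, 3, 4, 1, 1]

count = 0
total = 0
for num in lista:
    count = count + 1       # counter pattern
    total = total + num     # accumulator pattern

# The built-ins agree with the hand-written loops.
print(count == len(lista))  # True
print(total == sum(lista))  # True
```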
Boolean expressions
These are expressions that evaluate to True or False, and you've seen a few examples of these.
The usual boolean operators work this way, but there are other methods and functions that will also provide these tests.
In [63]: print(1 < 3)
True
In [64]: print("cat" == "CAT")
False
In [65]: print("c".isdigit())
False
Logic structures with if/elif/else
Python just uses whitespace to mark the blocks of code. Only the if block is required, and it can only appear once per chain.
if boolean_expression:
    # do this code
You can also use elif like else if, and you can have an unlimited number of these.
if boolean_expression:
    # do this code
elif boolean_expression:
    # do this code
You can also add a last else block at the end to catch any checks that haven't yet triggered any other blocks. This is optional, but you can only have one and it must be the last block in the chain.
if boolean_expression:
    # do this code
elif boolean_expression:
    # do this code
else:
    # do this code if nothing else triggers
You can also nest these as needed.
if boolean_expression:
    if boolean_expression:
        # do this
    elif boolean_expression:
        # do this
    else:
        # do this
else:
    # do this code
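Here is the nested skeleton filled in with a concrete case. The function name describe and the labels it returns are made up for this sketch; the outer if checks the data type and the inner chain checks the value.

```python
def describe(value):
    """Return a short label for value using a nested if/elif/else chain."""
    if isinstance(value, (int, float)):
        if value < 0:
            return "negative"
        elif value == 0:
            return "zero"
        else:
            return "positive"
    else:
        return "not a number"


print(describe(-5))       # negative
print(describe(0))        # zero
print(describe(3.14))     # positive
print(describe("cat"))    # not a number
```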