Week 1

Welcome to Python! This class will attempt to provide two things:

  1. A fast overview of the core Python language, presuming you are fluent in another programming language.

  2. A sampling of some of the most important standard library modules related to working with data preparation for data science.

The Zen of Python: How is it different from other languages?

Python was designed with readability in mind, and the language design has been purposeful. Let's look at this little easter egg (you don't need to execute this code right now).

import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

These ideas are important to us. I'll discuss this more in the lecture.

Variables

  • Different from other languages like Java

  • No declarations of data type for values (a variable only needs to exist already, with a value, if you are going to increment it)

  • Can be any data type

  • Freely change data types without declaration

  • Freely change values

Make them on the fly: there is no need to declare that they exist, and no need to declare the data type.
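The points above can be seen in a few lines. This is a minimal sketch; the names value, counter, kind_before, and kind_after are made up for illustration.

```python
# A name can be bound to any object, then rebound to a different type later.
value = 42                            # an int, created on the fly
kind_before = type(value).__name__
value = "forty-two"                   # now a str; no declaration needed
kind_after = type(value).__name__
value = value + "!"                   # values can change freely too
print(kind_before, kind_after, value) # int str forty-two!

# A name does have to exist before it can be incremented.
counter = 0
counter = counter + 1
print(counter)                        # 1
```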

Printing

Printing in Python requires the print() function. You can put pretty much anything inside this function and it'll say something about it.

  • by default this will go to the console or standard out

  • most objects define their own string representation (the __str__ method), which is automatically used when you pass them into print().

  • As a default, print() will include a newline character at the end of execution, but this can be changed with optional parameters.

  • print() even has a nice keyword argument to write the contents out to a file. We'll see that in later lessons.
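As a quick sketch of those optional parameters: sep, end, and file are real keyword arguments of print(). The io.StringIO buffer below is only a stand-in for a real file so the example stays self-contained.

```python
import io

print("a", "b", "c")             # a b c  (arguments are joined with a space)
print("a", "b", "c", sep="-")    # a-b-c  (sep changes the separator)
print("no newline", end="")      # end replaces the trailing newline
print(" ...same line")

# file accepts any object with a write() method.
buffer = io.StringIO()
print("saved", 1, 2, file=buffer)
captured = buffer.getvalue()     # the text that would have gone to the file
```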

An example of variable assignment and performing math operations within the print function:

In [1]: x = 1 
   ...: print(x) 
   ...: print(x * 100) 
   ...: print(x * x)                                                            
1
100
1

An example of a boolean data type printing the value:

In [2]: seenBefore = False 
   ...: print(seenBefore)                                                       
False

An example of performing string operations and string method calls within the print function:

In [3]: phrase = "no semicolons needed!" 
   ...: print(phrase) 
   ...: print(phrase + phrase) 
   ...: print(phrase.upper())                                                   
no semicolons needed!
no semicolons needed!no semicolons needed!
NO SEMICOLONS NEEDED!

Expressions

You've seen some of these before in the previous examples. These are the small fragments of code that evaluate into a result.

They can be small (like x + 1) or very, very long.

Most lines of code are made up of several expressions that execute in a particular order (usually PEMDAS) or are chained together. Part of learning this language is learning how to

In [7]: print(x + 10) # returns an int                                          
11
In [8]: print("You said: " + phrase) # returns a single string
You said: no semicolons needed!
In [10]: print(x > 10) # returns a boolean value                                
False
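To make the evaluation order and the result types concrete, here is a small sketch. It reuses x and phrase from the earlier examples; the other names are illustrative only.

```python
x = 1
phrase = "no semicolons needed!"

# Multiplication binds tighter than addition; parentheses override that.
without_parens = 2 + 3 * 4      # 14
with_parens = (2 + 3) * 4       # 20

# type() reports the data type an expression evaluates to.
print(type(x + 10))                  # <class 'int'>
print(type("You said: " + phrase))   # <class 'str'>
print(type(x > 10))                  # <class 'bool'>

# Method calls can be chained; each call works on the previous result.
shouted = phrase.upper().replace("!", "?")
print(shouted)                       # NO SEMICOLONS NEEDED?
```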

The key: if you don't know what content and data type an expression will return, you are almost certainly going to introduce a bug with that expression. Here are strategies:

  • There is reasonable consistency in how things are

Strings

You can use '' or "" or """""".

'' and "" operate the same.

"""""" allows you to have rendered newlines, and is used mostly for documentation and other meta stuff.

In [15]: print("hello" + 'hello')                                               
hellohello
In [16]: print("""why yes 
    ...: there are newlines 
    ...: and this is valid""")                                                  
why yes
there are newlines
and this is valid
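A short sketch of how the three quoting styles relate; the variable names are illustrative only.

```python
single = 'hello'
double = "hello"
same = single == double     # True: the quote style is not part of the value

# Use one quote style to embed the other without escaping.
quote = "it's fine"
reply = 'she said "hi"'

# A triple-quoted string keeps its line breaks as \n characters.
block = """line one
line two"""
line_count = len(block.split("\n"))
print(same, line_count)     # True 2
```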

Numbers

There are only two you'll need to mess with, int and float. Integers and floating point values.

Don't worry about size; all of that is taken care of for you.

There are other ways to mess with numerical representation if you have a specific need, but generally numbers are straightforward.

Computations between ints will mostly give you ints (division with / is the exception and returns a float), and when you include a float you'll almost certainly get a float back.

In [17]: print(3 * 6)                                                           
18
In [18]: print(3.1 * 6)
18.6
In [19]: print(12348481238121924619246192461293719247192461 * 100032423946234)                                                                      
1235248510303928884959610285595346128398198959310914141874
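A few more cases, as a sketch, showing where ints stay ints and where floats appear:

```python
print(7 + 2)      # 9: int with int gives int
print(7 / 2)      # 3.5: true division always returns a float
print(8 / 2)      # 4.0: still a float even when the result is whole
print(7 // 2)     # 3: floor division of two ints stays an int
print(7 * 2.0)    # 14.0: mixing in a float gives a float
print(2 ** 100)   # ints simply grow as large as needed
```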

Recasting variables

Every data type name (or built in class name) has a function version to recast stuff.

  • int(), float()

  • str()

  • etc.

Not all items can be cast to these data types, and you will get an error if this is the case. Pretty much everything can become a string, but you will find some errors around numerical recasting.

In [21]: print(str(13778237))                                                   
13778237
In [23]: print(int(39.99932312)) # truncates toward zero, no rounding 
    ...:                                                                        
39
In [24]: print(float(31239)) 
    ...:                                                                        
31239.0
In [25]: print(int("hello world"))                                              
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-25-e092507d8694> in <module>
      1 
----> 2 print(int("hello world"))

ValueError: invalid literal for int() with base 10: 'hello world'
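Some additional recasting cases, sketched here to show which conversions succeed and how int() treats decimals:

```python
print(int("42") + 1)        # 43: a string of digits converts cleanly
print(float("3.14"))        # 3.14
print(int(float("3.14")))   # 3: go through float first for decimal strings
print(int(-2.7))            # -2: int() truncates toward zero, it does not floor
print(str(2.50))            # 2.5: str() gives the number's default text form
print(int(True))            # 1: booleans convert to 0 or 1
```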

Testing for data types

There are a few ways to do this, each with pros and cons. None are disallowed, and all may be valuable for certain problem solving needs. But here is the generally best way:

The isinstance() function is designed to do this. It will return either True or False. It plays the nicest with class inheritance levels and is pretty flexible, so that's why it's recommended.

Pass a tuple of data types to check against several at once.

But you can also do an equality check with the results from the type() function.

In [27]: print(isinstance("hello", str))
True
In [29]: print(isinstance("hello", int))                                                                     
False
In [30]: print(isinstance("hello", (int, float)))                                                                  
False

In [31]: print(isinstance(3.14, (int, float)))                                                                         
True
In [32]: print(isinstance(3, (int, float)))                                                                       
True
In [33]: print(type(3) == int)                                                                       
True
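The difference between the two approaches shows up with inheritance. In this sketch, Celsius is a hypothetical subclass invented only for the illustration; the bool case uses nothing but built-ins.

```python
class Celsius(float):
    """Hypothetical float subclass used only for this illustration."""

temp = Celsius(21.5)
print(isinstance(temp, float))   # True: a Celsius is also a float
print(type(temp) == float)       # False: the exact type is Celsius

print(isinstance(True, int))     # True: bool is a subclass of int
print(type(True) == int)         # False: the exact type is bool
```

So an equality check against type() only matches the exact class, while isinstance() also accepts subclasses.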

Testing for floating point digits

Testing whether a string holds a floating point number isn't as directly supported as testing whether it holds an integer. Floating point numbers are numbers with decimal places. Unfortunately, there is no single string method for this, because there are various international conventions for formatting decimal values. But there are ways of processing things.

Let's presume that these numbers will come in with periods. You can remove one period from the string, and then test whether what remains is all digits. This works because a valid number cannot contain more than one decimal point.

It is true that you can just attempt to cast the value into a float within a try and except, but that doesn't fit as cleanly into a logic structure. Instead, you can use the replace string method, calling it such that it will only replace the value one time.

print("9.2455".replace(".", '', 1)) 
    ...:                                                                        
92455
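Putting the replace trick together with a digit test might look like the sketch below. The helper name looks_like_float is made up for this example, and it assumes plain unsigned numbers written with a period, so signs and exponents are deliberately rejected.

```python
def looks_like_float(text):
    """Return True if text is only digits with at most one period."""
    # Remove a single period, then check that only digits remain.
    return text.replace(".", "", 1).isdigit()

print(looks_like_float("9.2455"))    # True
print(looks_like_float("92455"))     # True: whole numbers pass too
print(looks_like_float("9.24.55"))   # False: a second period survives
print(looks_like_float("-9.2"))      # False: signs are not handled here
print(looks_like_float("."))         # False: nothing is left to test
```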

Testing for types with special object functions

Many objects have special methods for testing stuff about that object's content. This can include general stuff, like if the letters are all capitalized, but also if a string value is all numerical. There are many, but here are some. These can work in subtle and surprising ways, so you'll always want to reference the documentation.

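For example, str.isdigit() and str.isnumeric() sound interchangeable but are not. str.isdigit() is true only when every character is a digit, which covers 0-9 as well as digit characters from other scripts and forms such as superscripts. str.isnumeric() is broader: it also accepts any character that carries a Unicode numeric value, such as fractions or Chinese numerals. Neither accepts a period, so a string like '123.456' fails both.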

Read here: https://lerner.co.il/2019/02/17/pythons-str-isdigit-vs-str-isnumeric/

In [36]: z = '100834'

In [37]: print(z.isnumeric())                                                   
True
In [38]: y = '123.456'                                                          

In [39]: print(y.isnumeric())                                                   
False

In [40]: print(y.isdigit())                                                     
False
In [42]: print('003712'.isnumeric())                                            
True

In [43]: print('௩௰௭'.isnumeric())                                               
True

In [44]: print('二百三'.isdigit())                                              
False

In [45]: print('二百三'.isnumeric())                                            
True
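A side-by-side comparison can make the differences easier to see. This sketch also includes str.isdecimal(), a stricter built-in relative of the two methods discussed above; the sample characters are chosen only for illustration.

```python
samples = ["7", "²", "½", "三"]
results = {}
for ch in samples:
    # Tuple order: (isdecimal, isdigit, isnumeric)
    results[ch] = (ch.isdecimal(), ch.isdigit(), ch.isnumeric())
    print(ch, results[ch])

# 7 (True, True, True)
# ² (False, True, True)
# ½ (False, False, True)
# 三 (False, False, True)
```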

Try/except

There are a ton of features to this structure, but the core is this:

try:
    # tries to run this code
except:
    # this code runs instead if any errors are detected

In [46]: try: 
    ...:     y = float(3.14) 
    ...:     print(y) 
    ...: except: 
    ...:     print("cannot convert") 
    ...:                                                                        
3.14
In [48]: try: 
    ...:     y = int('3.14') 
    ...:     print(y) 
    ...: except: 
    ...:     print("cannot convert") 
    ...:                                                                        
cannot convert
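A bare except catches every error, including ones you did not expect. One of the other features of the structure is naming the error type to catch. The sketch below assumes you only want to handle bad number text; to_float is a hypothetical helper, not part of Python.

```python
def to_float(text, default=None):
    """Hypothetical helper: convert text to a float, or return default."""
    try:
        return float(text)
    except ValueError:
        # Raised when the text is not a valid number, e.g. "abc".
        return default

print(to_float("3.14"))       # 3.14
print(to_float("abc"))        # None
print(to_float("abc", 0.0))   # 0.0
```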

Lists

Lists are the core position-indexed array data structure.

When you see [] out on their own, you've got a list.

A list can hold any mix of objects, but do try to keep them the same type.

lista = [1, 2, 3, 4]
listb = ['c', 'p', 'w']
listc = ['d', 1, [], int, 9, 23.99]

List methods

These are how you work with a list: append() mutates it, while count() and index() only report on its contents.

  • list.append(stuff) will add stuff to the end of list.

  • list.count(stuff) will return a count of how many times stuff appears within list.

  • list.index(stuff) will return the first found index position of stuff within list.

In [50]: lista.append(1) 
    ...: lista.append(1) 
    ...: print(lista)                                                           
[1, 2, 3, 4, 1, 1]
In [52]: print(lista.count(1))                                                  
3

In [53]: print(lista.index(1))                                                  
0
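One detail worth knowing is what happens when the item is missing. This sketch uses a throwaway list named letters:

```python
letters = ['c', 'p', 'w']
letters.append('c')           # changes the list in place
print(letters)                # ['c', 'p', 'w', 'c']
print(letters.count('z'))     # 0: count() is safe for missing items
print('z' in letters)         # False: a membership test

# index() raises ValueError for a missing item, so check first.
position = letters.index('z') if 'z' in letters else -1
print(position)               # -1
```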

Slicing (for strings and lists)

This syntax is shared across strings and lists.

The general form is sequence[start:stop:step]. Every element is optional, but you need to retain the : to keep the delimiters clear.

  • list[index] gives you a single element

  • list[start:stop] slices that range

  • list[start:stop:step] slices that range, hopping by step

In [54]: print(lista) 
    ...: print(listb)                                                           
[1, 2, 3, 4, 1, 1]
['c', 'p', 'w']

In [55]: print(lista[0]) #returns just the one element 
    ...: print(lista[:3]) # returns a list with that slice                      
1
[1, 2, 3]
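A few more slices, as a sketch, including a step and a string. lista matches the list used above; word is an illustrative name.

```python
lista = [1, 2, 3, 4, 1, 1]
word = "semicolons"

print(lista[1:4])    # [2, 3, 4]: start is included, stop is excluded
print(lista[3:])     # [4, 1, 1]: an omitted stop runs to the end
print(lista[::2])    # [1, 3, 1]: every second element
print(lista[-1])     # 1: negative indexes count from the end
print(word[:4])      # semi
print(word[::-1])    # snolocimes: a step of -1 reverses
```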

for loops

for iterable in sequence:
    # do stuff

Python assigns the loop variable you name (iterable in the template above) a new value on each pass through the loop, and the variable persists after the for loop finishes.

Everything within the indent level of the for loop here will execute when the for loop runs.

Python also knows how the sequences unpack and takes care of it for you.

Strings unpack by character, and lists unpack by element.

You can loop over numbers by using the range() function. Range can be called by: range(stop), range(start, stop), range(start, stop, step) and these numbers follow the same conventions as slicing.

In [56]: for num in range(5): 
    ...:     print(num) 
    ...:                                                                        
0
1
2
3
4
In [57]: for num in range(3, 8): 
    ...:     print(num) 
    ...:                                                                        
3
4
5
6
7
In [58]: for character in "hello world": 
    ...:     print(character) 
    ...:                                                                        
h
e
l
l
o
 
w
o
r
l
d
In [59]: for item in lista: 
    ...:     print(item) 
    ...:                                                                        
1
2
3
4
1
1
In [60]: for item in listc: 
    ...:     print(item) 
    ...:                                                                        
d
1
[]
<class 'int'>
9
23.99

This is why you want to be careful about the mix of data types: you'll have to add logic to handle each type you might encounter.
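As a sketch of that extra logic, the loop below adds up only the numbers in listc and skips everything else. The names numeric_total and skipped are illustrative.

```python
listc = ['d', 1, [], int, 9, 23.99]

numeric_total = 0
skipped = 0
for item in listc:
    if isinstance(item, (int, float)):
        numeric_total = numeric_total + item   # safe: item is a number
    else:
        skipped = skipped + 1                  # 'd', [] and the int class itself

print("numbers add up to about", round(numeric_total, 2))
print("skipped", skipped, "items")
print("loop variable still holds", item)       # it persists after the loop
```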

Counter pattern

In [61]: count = 0 
    ...:  
    ...: for item in listc: 
    ...:     count = count + 1 
    ...:  
    ...: print("There are", count, "items in listc")                            
There are 6 items in listc

Accumulator pattern

In [62]: total = 0 
    ...:  
    ...: for num in lista: 
    ...:     total = total + num 
    ...:  
    ...: print("The sum of lista is", total)                                    
The sum of lista is 12
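For plain counting and summing, the built-in len() and sum() functions do the same work. The manual patterns become useful once a condition is involved, as in this sketch (even_count and even_total are illustrative names):

```python
lista = [1, 2, 3, 4, 1, 1]

print(len(lista))   # 6: what the counter pattern computes
print(sum(lista))   # 12: what the accumulator pattern computes

# Count and total only the even numbers.
even_count = 0
even_total = 0
for num in lista:
    if num % 2 == 0:
        even_count = even_count + 1
        even_total = even_total + num

print(even_count, even_total)   # 2 6
```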

Boolean expressions

These are expressions that evaluate to True or False, and you've seen a few examples of these.

The usual comparison operators (<, ==, and so on) work this way, but there are other methods and functions that will also provide these tests.

In [63]: print(1 < 3)                                                           
True

In [64]: print("cat" == "CAT")                                                                       
False

In [65]: print("c".isdigit())                                                                       
False
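Boolean expressions can also be combined with and, or, and not. A short sketch, reusing x and phrase from earlier:

```python
x = 1
phrase = "no semicolons needed!"

print(x > 0 and x < 10)          # True
print(0 < x < 10)                # True: comparisons can be chained
print(x > 10 or phrase == "")    # False
print(not x > 10)                # True
print("semi" in phrase)          # True: membership test
print(phrase.startswith("no"))   # True: a method that returns a boolean
```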

logic structures with if/elif/else

Python just uses whitespace to mark the blocks of code. Only the if block is required, and it can only appear once in a chain.

if boolean_expression:
    # do this code

You can also use elif like else if, and you can have an unlimited number of these.

if boolean_expression:
    # do this code
elif boolean_expression:
    # do this code

You can also add a last else block at the end to catch any checks that haven't yet triggered any other blocks. This is optional, but you can only have one and it must be the last block in the chain.

if boolean_expression:
    # do this code
elif boolean_expression:
    # do this code
else:
    # do this code if nothing else triggers

You can also nest these as needed.

if boolean_expression:
    if boolean_expression:
        # do this
    elif boolean_expression:
        # do this
    else:
        # do this
else:
    # do this code
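Here is a concrete, runnable version of the templates above, as a sketch. The function describe is a hypothetical helper written only for this example.

```python
def describe(number):
    """Hypothetical helper that labels a number."""
    if number < 0:
        return "negative"
    elif number == 0:
        return "zero"
    else:
        # A nested check that only runs for positive numbers.
        if number % 2 == 0:
            return "positive even"
        else:
            return "positive odd"

for n in [-3, 0, 4, 7]:
    print(n, describe(n))

# -3 negative
# 0 zero
# 4 positive even
# 7 positive odd
```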
