Week 3
Files, operating system operations, and time
making file names with leading zeros, zfill
flat text files chapter 10
csv files PCB chapter 6.1
json PCB chapter 6.2
break
os and sys modules, why these are important
shutil
time
datetime
time conversion
Readings for this week:
https://docs.python.org/3/library/pathlib.html (skim through the explanations of things)
Python Crash Course chapter 10
Python Cookbook (https://vufind.carli.illinois.edu/vf-uiu/Record/uiu_8507281) sections 6.1 and 6.2
https://docs.python.org/3/library/os.html focus on reading through the intro descriptions for each major section. There are about 9 of them, and you can see them on the left side panel of the page. Don't worry about reading through all the functions.
Automate the Boring Stuff with Python (https://vufind.carli.illinois.edu/vf-uiu/Record/uiu_8500455) chapter 15
File paths and pathlib
You may or may not have seen that Macs and Windows machines use different formats for file paths: Windows separates folders with backslashes, while macOS and Linux use forward slashes. This means that if you are trying to programmatically generate a path that will work on multiple operating systems, you have to do some strange things.
Even if you are not worrying about that, just getting the file name out of an arbitrarily deep file path usually takes a pile of awkward string manipulation. And if you want the root name of a file so you can add a different file extension, the manipulation gets even worse.
import glob
for f in glob.glob('*.ipynb'):
    print(f.split('/')[-1].split('.')[0])
week1
file creation project
week2notes-Copy
week2notes
testingtime
week1classdemo
test
Week3Notes
This is where pathlib comes in. It understands the file system from an object-oriented perspective, so you operate on and extract information from a file path using object attributes and methods, rather than manipulating a string with no awareness of what its parts mean.
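As a minimal sketch of the cross-platform problem described above: the "pure" path classes in pathlib can parse a Windows-style or a POSIX-style path on any machine. The two example paths here are made up for illustration.

```python
import pathlib

# Hypothetical example paths, one in each operating system's style.
win = pathlib.PureWindowsPath(r'C:\Users\student\notes\week3.ipynb')
posix = pathlib.PurePosixPath('/home/student/notes/week3.ipynb')

# The same attributes work on both, with no string splitting.
for example in (win, posix):
    print(example.name, example.stem, example.suffix)
```

Both lines print the same three pieces, even though the separators in the original strings differ.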
First, import pathlib. This gives you access to the Path() class. You can pass it a full file name or a directory name. Passing it a directory gives you access to the glob() method for searching for file paths.
I'm going to give this the . which represents my current directory.
import pathlib
p = pathlib.Path('.')
print(p.absolute()) # get the absolute path
print(p.is_dir()) # check if this is a directory or not
/home/nbuser/library
True
The glob() method of a Path object lets you search for files within that directory by pattern. Pass the pattern in as a string. This returns a generator object, which is beneficial when you have a huge number of files. We can cast it to a list to see the contents, or we can loop through it to see them individually. Now that we have multiple paths here, we can add a few checks about what they are.
Warning: a generator can only be looped through once. After that it is exhausted and yields nothing.
print(p.glob('*'))
results = list(p.glob('*'))
for subp in results:
    print(subp, subp.is_dir())
<generator object Path.glob at 0x7fa71ba40af0>
week1classdemo.ipynb False
week2notes.ipynb False
README.md False
test.ipynb False
.ipynb_checkpoints True
testingtime.ipynb False
__pycache__ True
Week3Notes.ipynb False
examplefolder True
week2notes-Copy.ipynb False
file creation project.ipynb False
week1.ipynb False
Week 2 demo.ipynb False
coolstuff.py False
test.py False
folders = [path for path in results if path.is_dir()]
folders
[PosixPath('.ipynb_checkpoints'),
PosixPath('__pycache__'),
PosixPath('examplefolder')]
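The warning about generators can be seen directly in a small sketch. It builds a throwaway folder with tempfile (an assumption made so the example runs anywhere; the file names are invented) and loops over the same glob() result twice.

```python
import pathlib
import tempfile

# Build a throwaway folder with three empty files so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp)
    for name in ['a.txt', 'b.txt', 'c.txt']:
        (folder / name).touch()

    found = folder.glob('*.txt')            # lazy: nothing is listed yet
    first_pass = [f.name for f in found]
    second_pass = [f.name for f in found]   # already used up, so this is empty

print(sorted(first_pass))
print(second_pass)
```

If you need to go through the results more than once, cast to a list first, as the notes do with results = list(p.glob('*')).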
There are many useful methods for getting metadata about the object in question.
print(p.is_dir(), p.is_file(), p.exists())
# checking if it is a folder, if it's a file, and if that path exists.
True False True
When you have a directory, you can use the / operator to build up a new path.
example = pathlib.Path('examplefolder')
newpath = p / example
print(newpath.absolute())
print(newpath.exists())
print(list(newpath.glob('*')))
/home/nbuser/library/examplefolder
True
[PosixPath('examplefolder/smalltext.txt'), PosixPath('examplefolder/demo.txt'), PosixPath('examplefolder/uppersmalltext.txt'), PosixPath('examplefolder/data'), PosixPath('examplefolder/boomboom.txt'), PosixPath('examplefolder/countsmalltext.txt')]
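A small sketch of path building, assuming a nested file name that is invented for illustration: the right-hand side of / can be a plain string, and joinpath() does the same job in one call. PurePosixPath is used only so the printed result looks the same on every operating system.

```python
import pathlib

base = pathlib.PurePosixPath('examplefolder')

# A plain string on the right-hand side of / works too.
a = base / 'data' / 'demo.txt'
# joinpath() builds the same path from several pieces at once.
b = base.joinpath('data', 'demo.txt')

print(a)
print(a == b)
```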
You can also ask for more information about it. https://pbpython.com/pathlib-intro.html has several great diagrams.
print(p.absolute()) # the absolute path for this
/home/nbuser/library
for f in newpath.glob('*'):
    print(f.name, f.stem, f.parent, f.suffix, f.suffixes, sep = "--")
# check out the https://docs.python.org/3/library/pathlib.html#methods-and-properties here for more info on them
smalltext.txt--smalltext--examplefolder--.txt--['.txt']
demo.txt--demo--examplefolder--.txt--['.txt']
uppersmalltext.txt--uppersmalltext--examplefolder--.txt--['.txt']
data--data--examplefolder----[]
boomboom.txt--boomboom--examplefolder--.txt--['.txt']
countsmalltext.txt--countsmalltext--examplefolder--.txt--['.txt']
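Earlier the notes mentioned wanting the root name of a file so a different extension can be added. Here is a hedged sketch of how pathlib handles that with with_suffix() and with_name(); the replacement names are made up.

```python
import pathlib

original = pathlib.PurePosixPath('examplefolder/demo.txt')

as_csv = original.with_suffix('.csv')            # swap only the extension
renamed = original.with_name('demo-backup.txt')  # swap the whole file name

print(as_csv)
print(renamed)
```

Neither call touches the disk. Each one returns a new path object that you can then write to or rename to.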
Sometimes you may want to just find the files, or just the txt files.
print(list(newpath.glob('*.txt')))
[PosixPath('examplefolder/smalltext.txt'), PosixPath('examplefolder/demo.txt'), PosixPath('examplefolder/uppersmalltext.txt'), PosixPath('examplefolder/boomboom.txt'), PosixPath('examplefolder/countsmalltext.txt')]
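The pattern '*.txt' only helps when the files share an extension. To keep every file and drop the folders, one option is to filter with is_file(). This sketch builds a temporary folder (an assumption, so it can run anywhere) with one subfolder and two files.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp)
    (folder / 'data').mkdir()        # a subfolder
    (folder / 'demo.txt').touch()    # two files
    (folder / 'notes.md').touch()

    only_files = sorted(p.name for p in folder.glob('*') if p.is_file())
    only_txt = sorted(p.name for p in folder.glob('*.txt'))

print(only_files)
print(only_txt)
```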
Creating file names and zfill
File names, at their heart, are just strings. You should use the pathlib module to create them, but you may still need to use string manipulation to craft the contents. For example, you should use pathlib operations to connect file names and folder locations. However, you may need to programmatically make folder names or the contents of a file name.
You can use basic string manipulation to create this content, then feed that into a pathlib object.
for color in ['book', 'movie']:
    for i in range(15):
        print(color + '-' + str(i) + '.txt')
book-0.txt
book-1.txt
book-2.txt
book-3.txt
book-4.txt
book-5.txt
book-6.txt
book-7.txt
book-8.txt
book-9.txt
book-10.txt
book-11.txt
book-12.txt
book-13.txt
book-14.txt
movie-0.txt
movie-1.txt
movie-2.txt
movie-3.txt
movie-4.txt
movie-5.txt
movie-6.txt
movie-7.txt
movie-8.txt
movie-9.txt
movie-10.txt
movie-11.txt
movie-12.txt
movie-13.txt
movie-14.txt
I'm using core string operations and casting to concatenate a file name together, including the literal string .txt at the end.
So this is fine, except they will sort strangely (book-10.txt lands before book-2.txt). It would be nice to add leading zeros to the numbers so they all have the same overall string length.
You can use the zfill string method to create the padded numbers. The syntax here is tricky, so pay close attention.
str.zfill(int)
You call it on the string value and give it an integer for the total width you want. Zeros are added on the left until the string is that long.
for color in ['book', 'movie']:
    for i in range(15):
        print(color + '-' + str(i).zfill(3) + '.txt')
book-000.txt
book-001.txt
book-002.txt
book-003.txt
book-004.txt
book-005.txt
book-006.txt
book-007.txt
book-008.txt
book-009.txt
book-010.txt
book-011.txt
book-012.txt
book-013.txt
book-014.txt
movie-000.txt
movie-001.txt
movie-002.txt
movie-003.txt
movie-004.txt
movie-005.txt
movie-006.txt
movie-007.txt
movie-008.txt
movie-009.txt
movie-010.txt
movie-011.txt
movie-012.txt
movie-013.txt
movie-014.txt
'ORD'.zfill(5) # this does also work on character strings, but is a little weird.
'00ORD'
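To see why the padding matters, this sketch sorts the unpadded and padded names side by side. It also shows an f-string format spec as an alternative to zfill for integers. The alternative is my addition, not something the notes require.

```python
names_plain = ['book-' + str(i) + '.txt' for i in range(12)]
names_padded = ['book-' + str(i).zfill(3) + '.txt' for i in range(12)]

print(sorted(names_plain)[:4])    # book-10 and book-11 jump ahead of book-2
print(sorted(names_padded)[:4])   # numeric order is preserved

# An f-string format spec produces the same padding for integers.
same = f'book-{7:03d}.txt' == 'book-' + str(7).zfill(3) + '.txt'
print(same)
```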
Once you construct the file name, you can also cast it as a pathlib object and construct a path with it. It might seem a little tricky, but in the absolute print value here you can see that it is doing a lot of work.
target = pathlib.Path('results') # the results folder must already exist
for color in ['book', 'movie']:
    for i in range(15):
        fname = color + '-' + str(i).zfill(3) + '.txt'
        outfilepath = target / pathlib.Path(fname)
        outfilepath.write_text('hello world' + str(i)) # you can even write files!!!
deleteus = target.glob('*.txt')
for f in deleteus:
    f.unlink()
print(list(target.glob('*.txt')))
[]
# maybe you wanted these subgroups to be separate folders.
# the book and movie folders must already exist inside results for write_text to work
target = pathlib.Path('results')
for color in ['book', 'movie']:
    for i in range(15):
        fname = str(i).zfill(3) + '.txt'
        outfilepath = target / pathlib.Path(color) / pathlib.Path(fname)
        outfilepath.write_text('hello again ' + color + str(i))
A Path object can also take actions for you, such as creating a folder when it does not exist yet.
target = pathlib.Path('results')
if not target.exists():
    target.mkdir()
for color in ['book', 'movie']:
    c_dir = pathlib.Path(color)
    subdir = target / c_dir
    if not subdir.exists():
        subdir.mkdir()
    for i in range(15):
        fname = color + '-' + str(i).zfill(3) + '.txt'
        outfilepath = target / pathlib.Path(fname)
        print(outfilepath.absolute())
/home/nbuser/library/results/book-000.txt
/home/nbuser/library/results/book-001.txt
/home/nbuser/library/results/book-002.txt
/home/nbuser/library/results/book-003.txt
/home/nbuser/library/results/book-004.txt
/home/nbuser/library/results/book-005.txt
/home/nbuser/library/results/book-006.txt
/home/nbuser/library/results/book-007.txt
/home/nbuser/library/results/book-008.txt
/home/nbuser/library/results/book-009.txt
/home/nbuser/library/results/book-010.txt
/home/nbuser/library/results/book-011.txt
/home/nbuser/library/results/book-012.txt
/home/nbuser/library/results/book-013.txt
/home/nbuser/library/results/book-014.txt
/home/nbuser/library/results/movie-000.txt
/home/nbuser/library/results/movie-001.txt
/home/nbuser/library/results/movie-002.txt
/home/nbuser/library/results/movie-003.txt
/home/nbuser/library/results/movie-004.txt
/home/nbuser/library/results/movie-005.txt
/home/nbuser/library/results/movie-006.txt
/home/nbuser/library/results/movie-007.txt
/home/nbuser/library/results/movie-008.txt
/home/nbuser/library/results/movie-009.txt
/home/nbuser/library/results/movie-010.txt
/home/nbuser/library/results/movie-011.txt
/home/nbuser/library/results/movie-012.txt
/home/nbuser/library/results/movie-013.txt
/home/nbuser/library/results/movie-014.txt
for folder in target.iterdir():
    print(list(folder.glob('*')))
[PosixPath('results/movie/000.txt'), PosixPath('results/movie/011.txt'), PosixPath('results/movie/013.txt'), PosixPath('results/movie/004.txt'), PosixPath('results/movie/014.txt'), PosixPath('results/movie/009.txt'), PosixPath('results/movie/005.txt'), PosixPath('results/movie/006.txt'), PosixPath('results/movie/008.txt'), PosixPath('results/movie/001.txt'), PosixPath('results/movie/007.txt'), PosixPath('results/movie/002.txt'), PosixPath('results/movie/003.txt'), PosixPath('results/movie/012.txt'), PosixPath('results/movie/010.txt')]
[PosixPath('results/book/000.txt'), PosixPath('results/book/011.txt'), PosixPath('results/book/013.txt'), PosixPath('results/book/004.txt'), PosixPath('results/book/014.txt'), PosixPath('results/book/009.txt'), PosixPath('results/book/005.txt'), PosixPath('results/book/006.txt'), PosixPath('results/book/008.txt'), PosixPath('results/book/001.txt'), PosixPath('results/book/007.txt'), PosixPath('results/book/002.txt'), PosixPath('results/book/003.txt'), PosixPath('results/book/012.txt'), PosixPath('results/book/010.txt')]
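The exists() checks before each mkdir() can be folded into the call itself. This is a sketch under the assumption that you want the whole chain of folders created and do not mind if they are already there. It writes into a temporary directory and uses only three files per folder to keep the output short.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / 'results'
    for kind in ['book', 'movie']:
        subdir = target / kind
        # parents=True also creates 'results'; exist_ok=True makes reruns harmless.
        subdir.mkdir(parents=True, exist_ok=True)
        for i in range(3):
            outfilepath = subdir / (str(i).zfill(3) + '.txt')
            outfilepath.write_text('hello again ' + kind + str(i))

    created = sorted(p.relative_to(target).as_posix() for p in target.glob('*/*.txt'))

print(created)
```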
Reading in flat text files
There are many ways to do this.
The open() function covers a lot. Once you have a file open in read mode or write mode, you can access the methods specific to that mode. Again, there are many, and this will not be exhaustive.
target = pathlib.Path('examplefolder') / pathlib.Path('boomboom.txt')
with open(target, 'r') as file_in:
    text = file_in.read()
print(text)
A told B, and B told C, "I'll meet you at the top of the coconut tree."
"Wheee!" said D to E F G, "I'll beat you to the top of the coconut tree."
Chicka chicka boom boom! Will there be enough room? Here comes H up the coconut tree,
and I and J and tag-along K, all on their way up the coconut tree.
Chicka chicka boom boom! Will there be enough room? Look who's coming! L M N O P!
And Q R S! And T U V! Still more - W! And X Y Z!
The whole alphabet up the - Oh, no! Chicka chicka... BOOM! BOOM!
Skit skat skoodle doot. Flip flop flee. Everybody running to the coconut tree.
Mamas and papas and uncles and aunts hug their little dears, then dust their pants.
"Help us up," cried A B C.
Next from the pileup skinned-knee D and stubbed-toe E and patched-up F. Then comes G all out of breath.
H is tangled up with I. J and K are about to cry. L is knotted like a tie.
M is looped. N is stopped. O is twisted alley-oop. Skit skat skoodle doot. Flip flop flee.
Look who's coming! It's black-eyed P, Q R S, and loose-tooth T. Then U V W wiggle-jiggle free.
Last to come X Y Z. And the sun goes down on the coconut tree...
But - chicka chicka boom boom! Look, there's a full moon.
A is out of bed, and this is what he said, "Dare double dare, you can't catch me.
Chicka chicka BOOM! BOOM!Chicka chicka BOOM! BOOM!
I'll beat you to the top of the coconut tree."
Chicka chicka BOOM! BOOM!
target = pathlib.Path('examplefolder') / pathlib.Path('boomboom.txt')
with open(target, 'r') as file_in:
    lines = file_in.readlines()
lines
['A told B, and B told C, "I\'ll meet you at the top of the coconut tree."\n',
'"Wheee!" said D to E F G, "I\'ll beat you to the top of the coconut tree."\n',
'Chicka chicka boom boom! Will there be enough room? Here comes H up the coconut tree,\n',
'and I and J and tag-along K, all on their way up the coconut tree.\n',
"Chicka chicka boom boom! Will there be enough room? Look who's coming! L M N O P!\n",
'And Q R S! And T U V! Still more - W! And X Y Z!\n',
'The whole alphabet up the - Oh, no! Chicka chicka... BOOM! BOOM!\n',
'Skit skat skoodle doot. Flip flop flee. Everybody running to the coconut tree.\n',
'Mamas and papas and uncles and aunts hug their little dears, then dust their pants.\n',
'"Help us up," cried A B C.\n',
'Next from the pileup skinned-knee D and stubbed-toe E and patched-up F. Then comes G all out of breath.\n',
'H is tangled up with I. J and K are about to cry. L is knotted like a tie.\n',
'M is looped. N is stopped. O is twisted alley-oop. Skit skat skoodle doot. Flip flop flee.\n',
"Look who's coming! It's black-eyed P, Q R S, and loose-tooth T. Then U V W wiggle-jiggle free.\n",
'Last to come X Y Z. And the sun goes down on the coconut tree...\n',
"But - chicka chicka boom boom! Look, there's a full moon.\n",
'A is out of bed, and this is what he said, "Dare double dare, you can\'t catch me.\n',
'Chicka chicka BOOM! BOOM!Chicka chicka BOOM! BOOM!\n',
'I\'ll beat you to the top of the coconut tree."\n',
'Chicka chicka BOOM! BOOM!']
target = pathlib.Path('examplefolder') / pathlib.Path('boomboom.txt')
text = target.read_text()
print(text)
A told B, and B told C, "I'll meet you at the top of the coconut tree."
"Wheee!" said D to E F G, "I'll beat you to the top of the coconut tree."
Chicka chicka boom boom! Will there be enough room? Here comes H up the coconut tree,
and I and J and tag-along K, all on their way up the coconut tree.
Chicka chicka boom boom! Will there be enough room? Look who's coming! L M N O P!
And Q R S! And T U V! Still more - W! And X Y Z!
The whole alphabet up the - Oh, no! Chicka chicka... BOOM! BOOM!
Skit skat skoodle doot. Flip flop flee. Everybody running to the coconut tree.
Mamas and papas and uncles and aunts hug their little dears, then dust their pants.
"Help us up," cried A B C.
Next from the pileup skinned-knee D and stubbed-toe E and patched-up F. Then comes G all out of breath.
H is tangled up with I. J and K are about to cry. L is knotted like a tie.
M is looped. N is stopped. O is twisted alley-oop. Skit skat skoodle doot. Flip flop flee.
Look who's coming! It's black-eyed P, Q R S, and loose-tooth T. Then U V W wiggle-jiggle free.
Last to come X Y Z. And the sun goes down on the coconut tree...
But - chicka chicka boom boom! Look, there's a full moon.
A is out of bed, and this is what he said, "Dare double dare, you can't catch me.
Chicka chicka BOOM! BOOM!Chicka chicka BOOM! BOOM!
I'll beat you to the top of the coconut tree."
Chicka chicka BOOM! BOOM!
target = pathlib.Path('examplefolder') / pathlib.Path('boomboom.txt')
infile = open(target, 'r')
text = infile.read()
infile.close()
print(text)
A told B, and B told C, "I'll meet you at the top of the coconut tree."
"Wheee!" said D to E F G, "I'll beat you to the top of the coconut tree."
Chicka chicka boom boom! Will there be enough room? Here comes H up the coconut tree,
and I and J and tag-along K, all on their way up the coconut tree.
Chicka chicka boom boom! Will there be enough room? Look who's coming! L M N O P!
And Q R S! And T U V! Still more - W! And X Y Z!
The whole alphabet up the - Oh, no! Chicka chicka... BOOM! BOOM!
Skit skat skoodle doot. Flip flop flee. Everybody running to the coconut tree.
Mamas and papas and uncles and aunts hug their little dears, then dust their pants.
"Help us up," cried A B C.
Next from the pileup skinned-knee D and stubbed-toe E and patched-up F. Then comes G all out of breath.
H is tangled up with I. J and K are about to cry. L is knotted like a tie.
M is looped. N is stopped. O is twisted alley-oop. Skit skat skoodle doot. Flip flop flee.
Look who's coming! It's black-eyed P, Q R S, and loose-tooth T. Then U V W wiggle-jiggle free.
Last to come X Y Z. And the sun goes down on the coconut tree...
But - chicka chicka boom boom! Look, there's a full moon.
A is out of bed, and this is what he said, "Dare double dare, you can't catch me.
Chicka chicka BOOM! BOOM!Chicka chicka BOOM! BOOM!
I'll beat you to the top of the coconut tree."
Chicka chicka BOOM! BOOM!
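One more pattern worth knowing, sketched here with a small stand-in file rather than boomboom.txt: you can loop over the open file object directly, which reads one line at a time instead of loading everything at once. The encoding argument is my assumption about the file. Set it to whatever your data actually uses.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / 'sample.txt'   # hypothetical stand-in file
    target.write_text('first line\nsecond line\nthird line\n', encoding='utf-8')

    stripped = []
    with open(target, 'r', encoding='utf-8') as file_in:
        for line in file_in:                     # reads one line at a time
            stripped.append(line.rstrip('\n'))   # drop the trailing newline

print(stripped)
```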
Reading csv files
This has a very specific pattern to follow, but it uses the standard library csv module.
This will read a csv in as a 2D list.
You pass your file IO object to the csv.reader() function, which creates an iterative parser with a cursor. You can read individual lines with the next() function, and then use a list comprehension to read the rest of it.
This module has a few other mechanisms for reading in, but this is the most universal pattern for loading a 2D structure.
import csv
target = pathlib.Path('query_has_unique') / pathlib.Path('query_has_unique.csv')
with open(target, 'r') as file_in:
    csvin = csv.reader(file_in)
    headers = next(csvin)
    data = [l for l in csvin]
data
[['horse', '9773', 'False', '0.0'],
['horseLabel', '9582', 'False', '0.0'],
['mother', '1548', 'False', '0.0'],
['father', '1852', 'False', '0.0'],
['birthyear', '319', 'False', '0.9968652037617555'],
['genderLabel', '4', 'False', '0.0'],
['deathyear', '152', 'False', '0.993421052631579'],
['testfoo', '1', 'True', '0.0']]
print(headers)
['field', 'num_times_seen', 'all_unique_Q', 'percent_is_numeric']
uniquefield = headers.index('all_unique_Q')
for line in data:
    print(line[uniquefield])
False
False
False
False
False
False
False
True
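One of the other reading mechanisms in the csv module is csv.DictReader, which uses the header row as dictionary keys so you do not need headers.index(). This sketch holds a few rows from the example above in memory with io.StringIO instead of opening the real file.

```python
import csv
import io

# A few rows copied from the example data, held in memory for the sketch.
sample = io.StringIO(
    'field,num_times_seen,all_unique_Q,percent_is_numeric\n'
    'horse,9773,False,0.0\n'
    'testfoo,1,True,0.0\n'
)

rows = list(csv.DictReader(sample))   # the header row becomes the dict keys
for row in rows:
    # every value is still a string, so convert before doing math or logic
    print(row['field'], int(row['num_times_seen']), row['all_unique_Q'] == 'True')
```

Either way, csv hands back strings, so 'False' in the data is text, not the Python value False.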
Writing a CSV file
This uses a similar pattern. You need to already have a 2D list (a list of row lists) ready to go. You can build one up like this:
import pathlib
target = pathlib.Path('examplefolder') / pathlib.Path('boomboom.txt')
with open(target, 'r') as file_in:
    lines = file_in.readlines()
lines
['A told B, and B told C, "I\'ll meet you at the top of the coconut tree."\n',
'"Wheee!" said D to E F G, "I\'ll beat you to the top of the coconut tree."\n',
'Chicka chicka boom boom! Will there be enough room? Here comes H up the coconut tree,\n',
'and I and J and tag-along K, all on their way up the coconut tree.\n',
"Chicka chicka boom boom! Will there be enough room? Look who's coming! L M N O P!\n",
'And Q R S! And T U V! Still more - W! And X Y Z!\n',
'The whole alphabet up the - Oh, no! Chicka chicka... BOOM! BOOM!\n',
'Skit skat skoodle doot. Flip flop flee. Everybody running to the coconut tree.\n',
'Mamas and papas and uncles and aunts hug their little dears, then dust their pants.\n',
'"Help us up," cried A B C.\n',
'Next from the pileup skinned-knee D and stubbed-toe E and patched-up F. Then comes G all out of breath.\n',
'H is tangled up with I. J and K are about to cry. L is knotted like a tie.\n',
'M is looped. N is stopped. O is twisted alley-oop. Skit skat skoodle doot. Flip flop flee.\n',
"Look who's coming! It's black-eyed P, Q R S, and loose-tooth T. Then U V W wiggle-jiggle free.\n",
'Last to come X Y Z. And the sun goes down on the coconut tree...\n',
"But - chicka chicka boom boom! Look, there's a full moon.\n",
'A is out of bed, and this is what he said, "Dare double dare, you can\'t catch me.\n',
'Chicka chicka BOOM! BOOM!Chicka chicka BOOM! BOOM!\n',
'I\'ll beat you to the top of the coconut tree."\n',
'Chicka chicka BOOM! BOOM!']
linenum = 1
allrows = [] # empty base list
for line in lines:
    length = len(line)
    words = len(line.split())
    linenum += 1 # incremented before the row is built, so the numbering starts at 2
    onerow = [linenum, length, words]
    allrows.append(onerow)
headers = ['linenum', 'length', 'numwords']
print(headers)
allrows
['linenum', 'length', 'numwords']
[[2, 72, 17],
[3, 74, 17],
[4, 86, 16],
[5, 67, 15],
[6, 82, 17],
[7, 49, 16],
[8, 65, 12],
[9, 79, 13],
[10, 84, 15],
[11, 27, 7],
[12, 104, 19],
[13, 75, 19],
[14, 91, 17],
[15, 95, 18],
[16, 65, 15],
[17, 58, 11],
[18, 82, 18],
[19, 51, 7],
[20, 47, 10],
[21, 25, 4]]
import csv
with open(pathlib.Path('linecounts.csv'), 'w', newline='') as file_out: # newline='' is what the csv docs recommend for csv files
    csvout = csv.writer(file_out)
    csvout.writerow(headers) # note the singular name here
    csvout.writerows(allrows) # your 2D list goes here
print(pathlib.Path('linecounts.csv').read_text())
linenum,length,numwords
2,72,17
3,74,17
4,86,16
5,67,15
6,82,17
7,49,16
8,65,12
9,79,13
10,84,15
11,27,7
12,104,19
13,75,19
14,91,17
15,95,18
16,65,15
17,58,11
18,82,18
19,51,7
20,47,10
21,25,4
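A quick round trip shows one thing to watch for: numbers written with csv.writer come back as strings when read. This sketch uses an in-memory buffer (io.StringIO) in place of linecounts.csv, with the first two rows from the output above.

```python
import csv
import io

headers = ['linenum', 'length', 'numwords']
allrows = [[2, 72, 17], [3, 74, 17]]   # first two rows from the output above

buffer = io.StringIO(newline='')
csvout = csv.writer(buffer)
csvout.writerow(headers)
csvout.writerows(allrows)

buffer.seek(0)                          # rewind so we can read it back
csvin = csv.reader(buffer)
headers_back = next(csvin)
raw_rows = [row for row in csvin]                                # strings
data_back = [[int(value) for value in row] for row in raw_rows]  # converted

print(raw_rows[0])
print(data_back == allrows)
```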
Reading json files
The built-in json module will parse a json file into Python objects, most often a dictionary. There are two main functions for loading things.
json.load() for parsing a file IO object into a dictionary.
json.loads() for parsing a string (with a json document in it) into a dictionary.
They function just about the same way.
import json
target = pathlib.Path('query_has_unique') / pathlib.Path('query_has_unique.json')
data = json.loads(target.read_text())
data
import json
target = pathlib.Path('query_has_unique') / pathlib.Path('query_has_unique.json')
with open(target, 'r') as file_in:
    data = json.load(file_in)
data
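To make the load/loads difference concrete without needing the example file, here is a sketch with a made-up json string. io.StringIO stands in for an open file.

```python
import io
import json

# Made-up sample document, loosely modeled on the fields used earlier.
raw = '{"field": "horse", "num_times_seen": 9773, "all_unique_Q": false}'

from_string = json.loads(raw)             # parse a str
from_file = json.load(io.StringIO(raw))   # parse a file-like object

print(from_string == from_file)
print(type(from_string), from_string['all_unique_Q'])

# The top level does not have to be a dictionary: a json array becomes a list.
as_list = json.loads('[1, 2, 3]')
print(type(as_list))
```

Unlike csv, json keeps its types: the number arrives as an int and false arrives as the Python value False.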
Writing out a json file
Once you have a dictionary loaded, you might want to write it out as a json file. Similarly, there are two functions:
json.dump() for file IO objects; it writes the new json structure out to a file.
json.dumps() dumps the dict object out as a string, so you can save it as a variable if needed. A little rarer to use.
These take more arguments than the loading functions. Let's start with the examples from last week for dicts.
You can also specify the indent level, which I like to be about 4. This makes the file much more readable. Otherwise it puts all the values on a single line.
some_wikipedia_text = "In 1957 the Soviet Union began deploying the S-75 Dvina surface-to-air missile, controlled by Fan Song fire control radars. \
This development made penetration of Soviet air space by American bombers more dangerous. The US Air Force began a program of \
cataloging the approximate location and individual operating frequencies of these radars, using electronic reconnaissance aircraft \
flying off the borders of the Soviet Union. This program provided information on radars on the periphery of the Soviet Union, \
but information on the sites further inland was lacking. Some experiments were carried out using radio telescopes looking for \
serendipitous Soviet radar reflections off the Moon, but this proved an inadequate solution to the problem."
# https://en.wikipedia.org/wiki/SOLRAD_2
words = some_wikipedia_text.lower().split()
print(words)
['in', '1957', 'the', 'soviet', 'union', 'began', 'deploying', 'the', 's-75', 'dvina', 'surface-to-air', 'missile,', 'controlled', 'by', 'fan', 'song', 'fire', 'control', 'radars.', 'this', 'development', 'made', 'penetration', 'of', 'soviet', 'air', 'space', 'by', 'american', 'bombers', 'more', 'dangerous.', 'the', 'us', 'air', 'force', 'began', 'a', 'program', 'of', 'cataloging', 'the', 'approximate', 'location', 'and', 'individual', 'operating', 'frequencies', 'of', 'these', 'radars,', 'using', 'electronic', 'reconnaissance', 'aircraft', 'flying', 'off', 'the', 'borders', 'of', 'the', 'soviet', 'union.', 'this', 'program', 'provided', 'information', 'on', 'radars', 'on', 'the', 'periphery', 'of', 'the', 'soviet', 'union,', 'but', 'information', 'on', 'the', 'sites', 'further', 'inland', 'was', 'lacking.', 'some', 'experiments', 'were', 'carried', 'out', 'using', 'radio', 'telescopes', 'looking', 'for', 'serendipitous', 'soviet', 'radar', 'reflections', 'off', 'the', 'moon,', 'but', 'this', 'proved', 'an', 'inadequate', 'solution', 'to', 'the', 'problem.']
counts = {} # see how I'm always opening my cells with the base value? this lets it reset every time you rerun things
for word in words:
    if word not in counts:
        counts[word] = 1 # base case
    else:
        counts[word] += 1 # increment if it's in there
print(counts)
{'in': 1, '1957': 1, 'the': 11, 'soviet': 5, 'union': 1, 'began': 2, 'deploying': 1, 's-75': 1, 'dvina': 1, 'surface-to-air': 1, 'missile,': 1, 'controlled': 1, 'by': 2, 'fan': 1, 'song': 1, 'fire': 1, 'control': 1, 'radars.': 1, 'this': 3, 'development': 1, 'made': 1, 'penetration': 1, 'of': 5, 'air': 2, 'space': 1, 'american': 1, 'bombers': 1, 'more': 1, 'dangerous.': 1, 'us': 1, 'force': 1, 'a': 1, 'program': 2, 'cataloging': 1, 'approximate': 1, 'location': 1, 'and': 1, 'individual': 1, 'operating': 1, 'frequencies': 1, 'these': 1, 'radars,': 1, 'using': 2, 'electronic': 1, 'reconnaissance': 1, 'aircraft': 1, 'flying': 1, 'off': 2, 'borders': 1, 'union.': 1, 'provided': 1, 'information': 2, 'on': 3, 'radars': 1, 'periphery': 1, 'union,': 1, 'but': 2, 'sites': 1, 'further': 1, 'inland': 1, 'was': 1, 'lacking.': 1, 'some': 1, 'experiments': 1, 'were': 1, 'carried': 1, 'out': 1, 'radio': 1, 'telescopes': 1, 'looking': 1, 'for': 1, 'serendipitous': 1, 'radar': 1, 'reflections': 1, 'moon,': 1, 'proved': 1, 'an': 1, 'inadequate': 1, 'solution': 1, 'to': 1, 'problem.': 1}
wordcats = {'<=3': [], '4-6':[], '7+': []}
for word in words:
    l = len(word)
    if l <= 3:
        wordcats['<=3'].append(word)
    elif l <= 6:
        wordcats['4-6'].append(word)
    else:
        wordcats['7+'].append(word)
print(wordcats)
{'<=3': ['in', 'the', 'the', 'by', 'fan', 'of', 'air', 'by', 'the', 'us', 'air', 'a', 'of', 'the', 'and', 'of', 'off', 'the', 'of', 'the', 'on', 'on', 'the', 'of', 'the', 'but', 'on', 'the', 'was', 'out', 'for', 'off', 'the', 'but', 'an', 'to', 'the'], '4-6': ['1957', 'soviet', 'union', 'began', 's-75', 'dvina', 'song', 'fire', 'this', 'made', 'soviet', 'space', 'more', 'force', 'began', 'these', 'using', 'flying', 'soviet', 'union.', 'this', 'radars', 'soviet', 'union,', 'sites', 'inland', 'some', 'were', 'using', 'radio', 'soviet', 'radar', 'moon,', 'this', 'proved'], '7+': ['deploying', 'surface-to-air', 'missile,', 'controlled', 'control', 'radars.', 'development', 'penetration', 'american', 'bombers', 'dangerous.', 'program', 'cataloging', 'approximate', 'location', 'individual', 'operating', 'frequencies', 'radars,', 'electronic', 'reconnaissance', 'aircraft', 'borders', 'program', 'provided', 'information', 'periphery', 'information', 'further', 'lacking.', 'experiments', 'carried', 'telescopes', 'looking', 'serendipitous', 'reflections', 'inadequate', 'solution', 'problem.']}
import json
import pathlib
with open(pathlib.Path('wordcounts.json'), 'w') as jout:
    json.dump(wordcats, jout, indent = 4) # in order: the dict, the file out object, and specify the indent.
# jso
'{"<=3": ["in", "the", "the", "by", "fan", "of", "air", "by", "the", "us", "air", "a", "of", "the", "and", "of", "off", "the", "of", "the", "on", "on", "the", "of", "the", "but", "on", "the", "was", "out", "for", "off", "the", "but", "an", "to", "the"], "4-6": ["1957", "soviet", "union", "began", "s-75", "dvina", "song", "fire", "this", "made", "soviet", "space", "more", "force", "began", "these", "using", "flying", "soviet", "union.", "this", "radars", "soviet", "union,", "sites", "inland", "some", "were", "using", "radio", "soviet", "radar", "moon,", "this", "proved"], "7+": ["deploying", "surface-to-air", "missile,", "controlled", "control", "radars.", "development", "penetration", "american", "bombers", "dangerous.", "program", "cataloging", "approximate", "location", "individual", "operating", "frequencies", "radars,", "electronic", "reconnaissance", "aircraft", "borders", "program", "provided", "information", "periphery", "information", "further", "lacking.", "experiments", "carried", "telescopes", "looking", "serendipitous", "reflections", "inadequate", "solution", "problem."]}'
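The effect of indent is easiest to see with json.dumps(), since it returns the text instead of writing a file. This sketch uses a shortened stand-in for wordcats.

```python
import json

# Shortened stand-in for the wordcats dictionary built above.
wordcats = {'<=3': ['in', 'the'], '4-6': ['soviet'], '7+': ['deploying']}

compact = json.dumps(wordcats)            # everything on one line
pretty = json.dumps(wordcats, indent=4)   # one value per line, indented

print(compact)
print(pretty)
print(json.loads(pretty) == wordcats)     # both forms parse back to the same dict
```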
os and sys modules
These are the modules that are great for working with the file system. Many of the functions let you perform system operations no matter which operating system you are on, but there will be some platform differences for a few of them. I'm going to show you the pretty universal ones. Not only will this allow you to make directories and move files, but you can also get system information about a file. Knowing the ones in here can be really valuable. You can change your current working directory, create and delete folders, etc. There are strong overlaps between what pathlib and os can do, but os has been around longer.
For our purposes, the pathlib library takes care of many of these actions.
A portion of the os module is os.path, which has file pathname manipulations. https://docs.python.org/3/library/os.path.html Again, our use of pathlib replaces many of these, but there are some lower level system manipulations you would want to know about if you are doing things with file processing.
os.path.getmtime() is the function that we are going to highlight.
This function takes a path and gives you the time the file at that path was last modified. This can be valuable if you are sorting through a ton of files and looking for ones modified during a certain timeframe, or checking whether the file you are working with was recently changed. Because os.path is a submodule inside the os module, the call is a little longer than usual.
shutil
This has a high-level set of file operations. Mainly, this is where the various copy functions are. There are several because they do different things, including how they handle metadata. For example, you can copy a file with or without its original metadata, or you can copy the permissions and other metadata from one file to another. This module also has the move function for moving files.
There are also system information tools, such as getting the disk space and finding executables.
import shutil
import pathlib
scratch = pathlib.Path('.') / pathlib.Path('scratch')
# clean it out and make a fresh empty folder
if not scratch.is_dir():
    scratch.mkdir()
    print("made dir")
else:
    shutil.rmtree(str(scratch)) # need this because the folder is not empty
    scratch.mkdir()
    print("made dir")
zips = pathlib.Path('.').glob('*.zip')
for f in zips:
    shutil.unpack_archive(str(f), str(scratch)) # this will create a folder with the stem name, so we only need to give it scratch
# print(list(pathlib.Path('.').glob('query_has_unique*')))
made dir
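The copy and move functions mentioned above can be sketched like this. Everything happens inside a temporary directory, and the file and folder names are invented for the example.

```python
import pathlib
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    source = root / 'original.txt'
    source.write_text('some content')
    (root / 'archive').mkdir()

    # copy2 copies the contents and tries to keep metadata such as the modified time
    shutil.copy2(source, root / 'copy.txt')
    # moving a file onto an existing folder places the file inside that folder
    shutil.move(str(root / 'copy.txt'), str(root / 'archive'))

    names = sorted(p.relative_to(root).as_posix() for p in root.rglob('*.txt'))
    usage = shutil.disk_usage(root)   # named tuple: total, used, free (in bytes)

print(names)
print(usage.free <= usage.total)
```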
os module
import os
import pathlib
for f in pathlib.Path('examplefolder').glob('*'):
    print(f.name, os.path.getmtime(f))
demo.txt 1570474458.0
boomboom.txt 1570474456.0
uppersmalltext.txt 1570474458.0
data 1570474457.0
smalltext.txt 1570474458.0
countsmalltext.txt 1570474457.0
But what the heck are these numbers?? They are epoch times: the number of seconds between January 1, 1970 (UTC) and the moment the file was last updated. You'll need to convert this to normal human time.
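As a side note, pathlib can report the same modification time through Path.stat(), so the os import is optional for this particular task. The sketch below uses a temporary file with a made-up name.

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    f = pathlib.Path(tmp) / 'demo.txt'   # hypothetical file for the sketch
    f.write_text('hello')

    from_os = os.path.getmtime(f)        # seconds since the epoch, as a float
    from_pathlib = f.stat().st_mtime     # same number via pathlib
    size_bytes = f.stat().st_size        # stat() carries other metadata too

print(from_os == from_pathlib, size_bytes)
```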
time module
This is a great module! It has functions for getting the current time, measuring how much time has passed, and forcing Python to pause execution for a certain amount of time.
That last one is important for API work and web scraping. The sleep() function takes the number of seconds (an integer or a float) that you want the program to pause execution.
import time
print("wait for it")
time.sleep(3)
print("hope it was worth it")
wait for it
hope it was worth it
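To check how long something actually took, one common approach is to read a clock before and after. This sketch uses time.perf_counter(), which is my choice here rather than something the notes prescribe, together with a fractional sleep.

```python
import time

start = time.perf_counter()   # a clock meant for measuring elapsed time
time.sleep(0.25)              # fractions of a second are allowed
elapsed = time.perf_counter() - start

print(f'paused for about {elapsed:.2f} seconds')
```

The measured value will be slightly more than the requested pause, because sleep() only promises a minimum wait.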
Time conversions
The time module also has time conversion functions.
time.strftime() is the workhorse function for formatting time values as strings (its counterpart, time.strptime(), parses them). https://docs.python.org/3/library/time.html#time.strftime
time.localtime() # will return a struct_time object with all the time content accessible
time.struct_time(tm_year=2019, tm_mon=10, tm_mday=8, tm_hour=2, tm_min=45, tm_sec=8, tm_wday=1, tm_yday=281, tm_isdst=0)
time.strftime("%Y-%m-%d is the day and the hour is %H:%M in timezone %Z", time.gmtime())
'2019-10-08 is the day and the hour is 02:48 in timezone GMT'
So no, I'm not writing this at 2 in the morning. This is reporting in GMT.
You can read all about the different formatting options here: https://docs.python.org/3/library/time.html#time.strftime
Importantly, this can also convert epoch time to human time.
time.strftime("%Y-%m-%d", time.gmtime(1570474457.0))
'2019-10-07'
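Going the other direction, from a human-readable string back to epoch seconds, can be sketched with time.strptime() and calendar.timegm(). The timestamp string is chosen to match the epoch value above, and the sketch assumes the string is in UTC.

```python
import calendar
import time

parsed = time.strptime('2019-10-07 18:54:17', '%Y-%m-%d %H:%M:%S')  # str -> struct_time
epoch = calendar.timegm(parsed)   # struct_time, treated as UTC -> epoch seconds

print(epoch)
print(time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(epoch)))       # and back again
```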
import os
import pathlib
import time
for f in pathlib.Path('examplefolder').glob('*'):
    print(f.name, time.strftime("%Y-%m-%d", time.gmtime(os.path.getmtime(f))))
demo.txt 2019-10-07
boomboom.txt 2019-10-07
uppersmalltext.txt 2019-10-07
data 2019-10-07
smalltext.txt 2019-10-07
countsmalltext.txt 2019-10-07
import os
import pathlib
import time
for f in pathlib.Path('.').glob('*'):
    print(f.name, time.strftime("%Y-%m-%d", time.gmtime(os.path.getmtime(f))))
testingtime.ipynb 2019-09-23
results 2019-10-07
linecounts.csv 2019-10-07
Week 2 demo.ipynb 2019-10-02
__pycache__ 2019-09-25
wordcounts.json 2019-10-07
week2notes-Copy.ipynb 2019-09-26
__MACOSX 2019-10-07
test.ipynb 2019-09-20
.ipynb_checkpoints 2019-09-23
week1.ipynb 2019-09-24
query_has_unique 2019-10-07
query_has_unique.zip 2019-10-07
examplefolder 2019-10-07
week1classdemo.ipynb 2019-09-25
week2notes.ipynb 2019-09-30
file creation project.ipynb 2019-09-23
Week3Notes.ipynb 2019-10-08
README.md 2019-09-20
test.py 2019-10-02
coolstuff.py 2019-10-02
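The datetime module listed in this week's outline offers an object-based route to the same conversion. This is a minimal sketch using one of the epoch values from the examplefolder listing. Asking for UTC explicitly is my assumption, made so the result does not depend on the machine's timezone.

```python
from datetime import datetime, timezone

stamp = 1570474457.0   # epoch value from the examplefolder listing
moment = datetime.fromtimestamp(stamp, tz=timezone.utc)

print(moment.isoformat())
print(moment.strftime('%Y-%m-%d'), moment.year, moment.month)
print(moment.timestamp() == stamp)   # convert back to epoch seconds
```

A datetime object supports comparison and arithmetic directly, which helps when you are filtering files by date.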