Week 3
Files, operating system operations, and time
making file names with leading zeros, zfill
flat text files chapter 10
csv files PCB chapter 6.1
json PCB chapter 6.2
break
os and sys modules, why these are important
shutil
time
datetime
time conversion
Readings for this week:
https://docs.python.org/3/library/pathlib.html (skim through the explanations)
Python Crash Course chapter 10
Python Cook Book (https://vufind.carli.illinois.edu/vf-uiu/Record/uiu_8507281) sections 6.1 and 6.2
https://docs.python.org/3/library/os.html focus on reading through the intro descriptions for each major section. There are about 9 of them, and you can see them on the left side panel of the page. Don't worry about reading through all the functions.
Automate the Boring Stuff with Python (https://vufind.carli.illinois.edu/vf-uiu/Record/uiu_8500455) chapter 15
File paths and pathlib
You may or may not have seen that macOS and Windows use different formats for file paths. For example, Windows separates folders with backslashes, while macOS and Linux use forward slashes. This means that if you are trying to programmatically generate a path that will work on multiple operating systems, you end up writing awkward, platform-specific string handling.
Also, even if you are not worrying about that, getting just the file name out of an arbitrarily deep file path usually takes a lot of awkward string manipulation. And if you want the root name of a file so you can add a different file extension, the manipulation gets even worse.
This is where pathlib comes in. It models the file system in an object-oriented way, so you operate on and extract information from a file path using object attributes and methods, rather than manipulating a string that has no awareness of what its parts mean.
First, import pathlib. This gives you access to the Path class. You can pass Path() a full file name or a directory name. Passing it a directory gives you access to the glob() method for searching for file paths.
I'm going to pass it ".", which represents the current directory.
The glob() method of a Path object lets you search for files within that directory using a pattern, passed in as a string (for example, "*.txt"). It returns a generator, which helps when you have a huge number of files because it produces the paths one at a time instead of building them all up front. You can cast it to a list to see all the contents at once, or loop through it to see them individually. Once you have several paths, you can check what each one is (for example, a file or a folder).
Warning: a generator can only be looped through once. After that it is exhausted and gives you nothing, so call glob() again if you need a fresh pass.
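A minimal sketch of the glob() behavior described above. So that it runs anywhere, it searches a throwaway temporary folder filled with a few made-up files (the names are hypothetical) instead of "."; with your own files you would call Path(".").glob("*") the same way.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Hypothetical files and a subfolder so glob() has something to find
    for name in ["notes.txt", "data.csv", "draft.txt"]:
        (folder / name).write_text("example")
    (folder / "archive").mkdir()

    matches = folder.glob("*")              # lazy, one-pass iterator of Path objects
    first_pass = sorted(p.name for p in matches)
    second_pass = list(matches)             # already used up, so this is empty

    # A second loop needs a fresh glob() call; check what each item is
    kinds = {}
    for p in folder.glob("*"):
        kinds[p.name] = "folder" if p.is_dir() else "file"

print(first_pass)   # ['archive', 'data.csv', 'draft.txt', 'notes.txt']
print(second_pass)  # []
print(kinds["archive"], kinds["notes.txt"])  # folder file
```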
There are many useful attributes and methods for getting metadata about the path in question.
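A short sketch of a few of those attributes and methods, using a made-up path. name, stem, suffix, and parent only look at the path itself, so that file does not need to exist; stat() and is_file() ask the operating system, so the sketch creates a real file in a temporary folder for them.

```python
import tempfile
from pathlib import Path

# A hypothetical path; these attributes only inspect the path text
p = Path("projects/week3/survey_results.csv")
print(p.name)         # survey_results.csv
print(p.stem)         # survey_results
print(p.suffix)       # .csv
print(p.parent.name)  # week3

# stat() and is_file() need a real file, so create one in a temp folder
with tempfile.TemporaryDirectory() as tmp:
    real = Path(tmp) / "hello.txt"
    real.write_text("hello")
    size = real.stat().st_size   # size in bytes
    is_file = real.is_file()

print(size, is_file)  # 5 True
```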
When you have a directory, you can use the / operator (or the joinpath() method) to build up a new path.
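A sketch of building paths from pieces; the folder and file names here are hypothetical. The with_suffix() call also handles the earlier problem of swapping a file's extension without string slicing.

```python
from pathlib import Path

base = Path("course")                        # hypothetical folder name
report = base / "week3" / "report.txt"       # "/" joins pieces with the right separator
same = base.joinpath("week3", "report.txt")  # method form of the same join
as_csv = report.with_suffix(".csv")          # new extension, no string slicing

print(report)          # course/week3/report.txt (course\week3\report.txt on Windows)
print(report == same)  # True
print(report.parts)    # ('course', 'week3', 'report.txt')
print(as_csv.name)     # report.csv
```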
You can also ask for more information about it. https://pbpython.com/pathlib-intro.html has several great diagrams.
Sometimes you may want to just find the files, or just the txt files.
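One way to do both, sketched against a temporary folder with made-up contents: iterdir() lists everything in a folder, is_file() filters out subfolders, and a "*.txt" glob pattern keeps only names ending in .txt.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Hypothetical contents: two .txt files, one .csv file, one subfolder
    for name in ["a.txt", "b.txt", "c.csv"]:
        (folder / name).write_text("x")
    (folder / "subfolder").mkdir()

    # Only files: iterdir() lists everything, is_file() drops the subfolder
    files_only = sorted(p.name for p in folder.iterdir() if p.is_file())
    # Only .txt files, using a glob pattern
    txt_only = sorted(p.name for p in folder.glob("*.txt"))

print(files_only)  # ['a.txt', 'b.txt', 'c.csv']
print(txt_only)    # ['a.txt', 'b.txt']
```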
Creating file names and zfill
File names, at their heart, are just strings. You should use the pathlib module to create them, but you may still need to use string manipulation to craft the contents. For example, you should use pathlib operations to connect file names and folder locations. However, you may need to programmatically make folder names or the contents of a file name.
You can use basic string manipulation to create this content, then feed that into a pathlib object.
I'm using core string operations and casting with str() to concatenate a file name together, including the string ".txt" at the end.
This works, except the names will sort strangely: strings are compared character by character, so "file_10.txt" sorts before "file_2.txt". It would be nice to add leading zeros to the numbers so they all have the same overall string length.
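A sketch of the concatenation approach and the sorting problem, using a hypothetical "file_" prefix:

```python
# Cast each number to a string and concatenate the pieces
names = []
for i in [1, 2, 10, 11]:
    names.append("file_" + str(i) + ".txt")

print(names)
# ['file_1.txt', 'file_2.txt', 'file_10.txt', 'file_11.txt']

# Strings sort character by character, so "10" and "11" land before "2"
print(sorted(names))
# ['file_1.txt', 'file_10.txt', 'file_11.txt', 'file_2.txt']
```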
You can use the zfill string method to create the padded numbers. The syntax here is tricky, so pay close attention.
str.zfill(width): you call it on the string value and give it an integer for the total width you want. If the string is shorter than that width, it is padded on the left with zeros, so "7".zfill(3) gives "007".
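A few zfill() calls to make the width rule concrete, followed by the padded file names sorting the way you would expect. The last line shows an f-string format spec that gives the same padding, as an alternative.

```python
print("7".zfill(3))     # 007
print("42".zfill(3))    # 042
print("1234".zfill(3))  # 1234 (already at least 3 characters, so unchanged)

# Pad the number part of each file name to 3 digits
names = ["file_" + str(i).zfill(3) + ".txt" for i in [1, 2, 10, 11]]
print(sorted(names))
# ['file_001.txt', 'file_002.txt', 'file_010.txt', 'file_011.txt']

# Alternative: an f-string format spec gives the same padding
print(f"file_{7:03}.txt")  # file_007.txt
```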
Once you construct the file name, you can also pass it to Path() and combine it with a folder to make a full path. It might seem a little tricky, but printing the absolute() version of that path shows how much work pathlib is doing for you: it fills in the full location from your current working directory.
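A sketch of turning the padded name into a full path. The folder name "output" is hypothetical, and nothing is created on disk here; absolute() only combines the path with the current working directory.

```python
from pathlib import Path

file_name = "file_" + str(7).zfill(3) + ".txt"  # 'file_007.txt'
out_path = Path("output") / file_name             # "output" is a hypothetical folder

print(out_path)             # output/file_007.txt (output\file_007.txt on Windows)
# absolute() prepends the current working directory; it does not create anything
print(out_path.absolute())
```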
You can also have a Path take actions for you, such as making a folder with mkdir() or creating a file and writing to it with write_text().
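A sketch of letting a Path do the work, run inside a temporary folder so it leaves nothing behind. The folder and file names are hypothetical. mkdir(parents=True, exist_ok=True) creates any missing parent folders and does not complain if the folder already exists.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical nested output folder inside a throwaway temp directory
    out_dir = Path(tmp) / "results" / "week3"
    # parents=True makes missing parents; exist_ok=True avoids an error on re-runs
    out_dir.mkdir(parents=True, exist_ok=True)

    for i in range(1, 4):
        target = out_dir / ("file_" + str(i).zfill(3) + ".txt")
        target.write_text("run " + str(i) + "\n")   # creates the file and writes to it

    created = sorted(p.name for p in out_dir.iterdir())
    first_contents = (out_dir / "file_001.txt").read_text()

print(created)         # ['file_001.txt', 'file_002.txt', 'file_003.txt']
print(first_contents)  # run 1
```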
Reading in flat text files
There are so many ways to do this.
The open() function covers a lot. Once you have a file open in a read mode or a write mode, you can use the methods for that mode. Again, there are many, and this will not be exhaustive.
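A minimal sketch of the common write, append, and read modes with open(), using a made-up file in a temporary folder. Each with block closes the file automatically when it ends.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "poem.txt"   # hypothetical file name

    # "w" mode: create (or overwrite) the file and write text to it
    with open(path, "w", encoding="utf-8") as f:
        f.write("line one\n")
        f.write("line two\n")

    # "a" mode: add to the end without erasing what is already there
    with open(path, "a", encoding="utf-8") as f:
        f.write("line three\n")

    # "r" mode: read it back in a few different ways
    with open(path, "r", encoding="utf-8") as f:
        whole = f.read()          # one big string
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()     # list of strings, newlines kept
    with open(path, "r", encoding="utf-8") as f:
        stripped = [line.rstrip("\n") for line in f]   # loop over lines directly

print(stripped)  # ['line one', 'line two', 'line three']
```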
Reading csv files
This has a very specific pattern to follow, but does use the standard library csv module.
This will read a csv in as a 2D list.
You pass your file IO object to the csv.reader() function, which creates an iterator that moves through the file one row at a time. You can read individual rows with the next() function, and then use a list comprehension to read the rest of them.
This module has a few other mechanisms for reading in, but this is the most universal pattern for loading a 2D structure.
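A sketch of the csv.reader() pattern described above, run against a small made-up CSV written to a temporary folder. Opening the file with newline="" follows the csv module documentation's recommendation. Note that every value comes back as a string.

```python
import csv
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "grades.csv"   # hypothetical file
    path.write_text("name,score\nAna,91\nBo,85\n", encoding="utf-8")

    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)            # iterator over rows
        header = next(reader)             # just the first row
        rows = [row for row in reader]    # the rest, as a list of lists

print(header)  # ['name', 'score']
print(rows)    # [['Ana', '91'], ['Bo', '85']]  (numbers are still strings)
```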
Writing a CSV file
This uses a similar pattern, but with csv.writer(). You need to already have a 2D list (a list of lists, one inner list per row) ready to go. You can build one up row by row, appending each row as a list.
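A sketch of building the 2D list and writing it out with writerows(), then reading it back to check. The data and file name are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Build the 2D list row by row; the first row is the header
data = [["name", "score"]]
for name, score in [("Ana", 91), ("Bo", 85)]:
    data.append([name, score])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "grades_out.csv"   # hypothetical output file
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerows(data)            # one call writes every row

    # Read it back to check what landed in the file
    with open(path, newline="", encoding="utf-8") as f:
        round_trip = list(csv.reader(f))

print(round_trip)  # [['name', 'score'], ['Ana', '91'], ['Bo', '85']]
```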
Reading json files
The built-in json module will parse a JSON file into Python objects; a JSON object at the top level becomes a dictionary. There are two main functions for loading things:
json.load() for parsing a file IO object into a dictionary.
json.loads() for parsing a string (containing JSON) into a dictionary.
They function just about the same way.
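A sketch showing that json.loads() on a string and json.load() on an open file produce the same dictionary. The JSON content and file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path

text = '{"course": "Week 3", "topics": ["csv", "json"], "hours": 3}'
from_string = json.loads(text)          # string in, dict out

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "course.json"    # hypothetical file
    path.write_text(text, encoding="utf-8")
    with open(path, encoding="utf-8") as f:
        from_file = json.load(f)        # file object in, dict out

print(from_file["topics"])  # ['csv', 'json']
```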
Writing out a json file
Once you have a dictionary loaded, you might want to write it out as a json file. Similarly, there are two methods:
json.dump() for file IO objects: writes the JSON structure out to a file.
json.dumps() dumps the dict out as a string, so you can save it as a variable if needed. This is a little rarer to use.
These take more arguments: the object to write and, for dump(), the open file to write it to. Let's start with the examples from last week for dicts.
You can also specify the indent level, which I like to be about 4. This makes the file much more readable. Otherwise it puts all the values on a single line.
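A sketch of dumps() with and without indent=4, and dump() writing to a file. The dictionary is a made-up stand-in for last week's examples, and the file lives in a temporary folder.

```python
import json
import tempfile
from pathlib import Path

# A hypothetical dictionary standing in for last week's examples
pet = {"name": "Biscuit", "species": "dog", "age": 4}

compact = json.dumps(pet)             # everything on a single line
pretty = json.dumps(pet, indent=4)    # one key per line, indented 4 spaces

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "pet.json"
    with open(path, "w", encoding="utf-8") as f:
        json.dump(pet, f, indent=4)   # the object first, then the open file
    written = path.read_text(encoding="utf-8")

print(compact)  # {"name": "Biscuit", "species": "dog", "age": 4}
print(written)  # same text as pretty, spread over several lines
```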
os and sys modules
These modules are great for working with the file system. Many of their functions let you perform system operations no matter which operating system the code runs on, but there will be some platform differences for a few of them. I'm going to show you the fairly universal ones. Not only do these let you make directories and move files, but you can also get system information about a file. Knowing them can be really valuable: you can change your current working directory, create and delete folders, and so on. There is strong overlap between what pathlib and os can do, but os has been around longer.
For our purposes, the pathlib library takes care of many of these actions.
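A sketch of a few of the fairly universal os and sys calls: checking the platform, finding and changing the current working directory, and creating, listing, and deleting a folder. It works inside a temporary folder with a hypothetical folder name and changes the working directory back before cleaning up.

```python
import os
import sys
import tempfile

print(sys.platform)   # e.g. 'win32', 'darwin' (macOS), or 'linux'
print(os.getcwd())    # current working directory

with tempfile.TemporaryDirectory() as tmp:
    start = os.getcwd()
    new_dir = os.path.join(tmp, "week3_scratch")   # hypothetical folder name
    os.mkdir(new_dir)                    # create a folder
    os.chdir(new_dir)                    # change the current working directory
    inside = os.path.basename(os.getcwd())
    os.chdir(start)                      # change back before cleaning up
    listing = os.listdir(tmp)            # names inside the temp folder
    os.rmdir(new_dir)                    # delete the (empty) folder
    after = os.listdir(tmp)

print(inside, listing, after)  # week3_scratch ['week3_scratch'] []
```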
Part of the os module is os.path, which provides pathname manipulation functions. https://docs.python.org/3/library/os.path.html Again, our use of pathlib replaces many of these, but there are some lower-level system operations worth knowing about if you are doing file processing.
os.path.getmtime() is the function that we are going to highlight.
This function takes a path and gives you the time the file at that path was last modified. This can be valuable if you are sorting through a ton of files looking for ones changed during a certain timeframe, or checking whether the file you are working with was recently changed. os.path is a submodule inside the os module, so the call has an extra level: os.path.getmtime().
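A sketch of os.path.getmtime() on a freshly written temporary file (the file name is hypothetical), plus one way you might use the result: checking whether the file changed within the last hour.

```python
import os
import tempfile
import time
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "log.txt"        # hypothetical file
    path.write_text("first version")
    modified = os.path.getmtime(path)   # os -> path -> getmtime
    same_as_stat = modified == path.stat().st_mtime

    # Example use: was this file changed within the last hour?
    recently_changed = (time.time() - modified) < 60 * 60

print(modified)          # seconds since the epoch, as a float
print(recently_changed)  # True
```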
shutil
This module has a high-level set of file operations. Mainly, this is where the various copy functions are. There are several because they handle metadata differently: for example, you can copy a file with or without its original metadata, or copy just the permissions and other metadata from one file to another. This module also has the move() function for moving files.
There are also system information tools, such as getting the disk space (disk_usage()) and finding where an executable lives (which()).
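A sketch of the copy and move functions plus the two system information tools, run in a temporary folder with hypothetical file names. copy() copies contents and permission bits; copy2() also tries to preserve other metadata such as modification times. which() returns None if the program is not on your PATH, so the python3/python names checked here are just examples.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    original = folder / "original.txt"    # hypothetical file names throughout
    original.write_text("keep me")

    shutil.copy(original, folder / "copy_plain.txt")  # contents + permission bits
    shutil.copy2(original, folder / "copy_meta.txt")  # also tries to keep other metadata
    (folder / "archive").mkdir()
    # Move a file into an existing folder (str() keeps this working on older Pythons)
    shutil.move(str(folder / "copy_plain.txt"), str(folder / "archive"))

    top_level = sorted(p.name for p in folder.iterdir())
    archived = sorted(p.name for p in (folder / "archive").iterdir())

    usage = shutil.disk_usage(tmp)   # named tuple: total, used, free (in bytes)

python_path = shutil.which("python3") or shutil.which("python")  # None if not on PATH

print(top_level)  # ['archive', 'copy_meta.txt', 'original.txt']
print(archived)   # ['copy_plain.txt']
print(usage.free // 2**30, "GiB free")
print(python_path)
```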
os module
But what the heck are these numbers?? They are epoch time: the number of seconds since January 1, 1970 (UTC) at which the file was last modified. You'll need to convert this to normal human time.
time module
This is a great module! It has functions for getting the current time, measuring how much time has passed, and forcing Python to pause execution for a certain amount of time.
That last one is important for API work and web scraping, where you often need to wait between requests. The sleep() function takes the number of seconds (an int or a float) that you want the program to pause execution for.
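A sketch of getting the current time, timing a piece of code, and pausing. time.perf_counter() is not mentioned in the notes above; it is the time module's clock intended for measuring short durations, used here for illustration. The commented loop at the end is a hypothetical outline of where sleep() usually goes in scraping code.

```python
import time

now = time.time()              # current time as seconds since the epoch (a float)

start = time.perf_counter()    # clock meant for measuring durations
time.sleep(0.5)                # pause for half a second; floats are allowed
elapsed = time.perf_counter() - start

print(now)
print(round(elapsed, 1))       # about 0.5

# In scraping or API work, a sleep usually goes between requests, e.g.:
# for url in urls:            # "urls" is hypothetical
#     ...fetch url...
#     time.sleep(1)
```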
Time conversions
The time module also has time conversion functions.
strftime() is the workhorse function for formatting times as strings, and its counterpart strptime() parses strings back into times. https://docs.python.org/3/library/time.html#time.strftime
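A sketch of strftime() formatting the current time, once in UTC/GMT via gmtime() and once in your local time zone via localtime(), followed by strptime() parsing a made-up date string back into a struct_time.

```python
import time

utc_now = time.gmtime()       # current time in UTC/GMT
local_now = time.localtime()  # current time in your computer's time zone

print(time.strftime("%Y-%m-%d %H:%M:%S", utc_now))  # year-month-day hour:minute:second
print(time.strftime("%A, %B %d, %Y", local_now))    # weekday, month name day, year

# strptime() goes the other way: parse a string using the same codes
parsed = time.strptime("2023-09-15 14:30", "%Y-%m-%d %H:%M")
print(parsed.tm_year, parsed.tm_mon, parsed.tm_mday, parsed.tm_hour)  # 2023 9 15 14
```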
So no, I'm not writing this at 2 in the morning. This is reporting in GMT.
You can read all about the different formatting options here: https://docs.python.org/3/library/time.html#time.strftime
Importantly, this can also convert epoch time to human time.
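A sketch of converting an epoch value into readable time. The number 1700000000 is an arbitrary example standing in for something like an os.path.getmtime() result. The last lines use datetime.datetime.fromtimestamp() from the datetime module as an alternative to the time module functions; passing tz=datetime.timezone.utc keeps the result independent of your computer's time zone.

```python
import datetime
import time

epoch_seconds = 1700000000   # arbitrary example, like a getmtime() result

print(time.ctime(epoch_seconds))   # readable string in your local time zone
print(time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(epoch_seconds)))
# 2023-11-14 22:13:20

as_datetime = datetime.datetime.fromtimestamp(epoch_seconds, tz=datetime.timezone.utc)
print(as_datetime)   # 2023-11-14 22:13:20+00:00
```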