Python UnZipped - Going Full Monty with the Zip File
You guessed it, we're going to unzip them. (We'll use our file.txt and file.gif sample
files again, just to make things easier to follow.)
<br />>>> import zipfile
<br />>>> zip = zipfile.ZipFile('Python.zip', 'r')
<br />>>> file('file.txt', 'w').write(zip.read('file.txt'))
<br />>>> file('file.gif', 'wb').write(zip.read('file.gif'))
<br />>>> zip.close()
<br />
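Note: Images are binary, so I’ve used the 'wb' (write binary) flag for the second
file. This may not always be necessary: on Unix-like systems the flag makes no
difference, but on Windows, text mode translates line endings and would corrupt binary data.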
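For readers on a newer Python, where file() no longer exists, here is a minimal sketch of the same extraction using open() and with blocks. The extract_members helper and the throwaway sample archive are my own assumptions for illustration, not part of the original example.

```python
import os
import tempfile
import zipfile

def extract_members(zip_path, names, dest):
    """Write each named archive member into dest, always in binary mode."""
    with zipfile.ZipFile(zip_path, 'r') as archive:
        for name in names:
            # read() returns the member's raw bytes, so 'wb' is the safe mode
            with open(os.path.join(dest, name), 'wb') as out:
                out.write(archive.read(name))

# Build a throwaway archive so the sketch runs on its own (contents are made up).
workdir = tempfile.mkdtemp()
sample = os.path.join(workdir, 'Python.zip')
with zipfile.ZipFile(sample, 'w') as archive:
    archive.writestr('file.txt', 'hello\r\n')
    archive.writestr('file.gif', b'GIF89a\x00\x01')

extract_members(sample, ['file.txt', 'file.gif'], workdir)
```

Writing everything in binary mode sidesteps the question of when 'wb' is needed, because the bytes land on disk exactly as they were stored in the archive.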
OK, we just extracted two files from our Zip in only five lines! This
example is fine if you know the names of the files you want to extract, but what
if you don't?
<br />#!/usr/bin/env python</p>
<p> </p>
<p>import zipfile</p>
<p> </p>
<p>def inzip(filename, zip):</p>
<p>    return filename in zip.namelist()</p>
<p> </p>
<p>if __name__ == '__main__':</p>
<p> </p>
<p>    zip = zipfile.ZipFile('Python.zip', 'r')</p>
<p>    print inzip('file.txt', zip)</p>
<p>    zip.close()
<br />
Short and sweet, just like its name, this function simply returns True if the
string filename is in the list of files in the Zip and False otherwise (True in
the example above).
The namelist() method (along with its brothers and sisters) provides information
about a Zip file; namelist() itself returns a list of the names of all the files within a Zip.
For example:
<br />>>> zip.namelist()</p>
<p>['1.txt', '2.txt', '3.txt', 'file.gif', 'file.txt', 'folder/file.html', 'folder/']
<br />>>>
<br />
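Two of the siblings of namelist() are infolist() and getinfo(), which return ZipInfo objects carrying details such as each member's size. Here is a small sketch in modern Python; the in-memory archive and its contents are made up purely so the example runs by itself.

```python
import io
import zipfile

# Build a small in-memory archive; the member names and contents are made up.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('file.txt', 'hello world')
    archive.writestr('folder/file.html', '<p>hi</p>')

with zipfile.ZipFile(buffer, 'r') as archive:
    names = archive.namelist()                  # just the member names
    sizes = {info.filename: info.file_size      # one ZipInfo per member
             for info in archive.infolist()}
    single = archive.getinfo('file.txt')        # look up one member by name

for name in names:
    print(name, sizes[name], 'bytes')
```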
You’ve checked the contents of the file and you want to get extracting. Rather
than sitting there typing names into your Python shell one by one (which, let's
face it, is pretty boring), you can loop over namelist() and extract everything in one go. I’m going to show you how.
<br />#!/usr/bin/env python </p>
<p> </p>
<p>import zipfile </p>
<p> </p>
<p>def unzip(zip):</p>
<p>    for name in zip.namelist():</p>
<p>        file(name, 'wb').write(zip.read(name))</p>
<p> </p>
<p>if __name__ == '__main__':</p>
<p> </p>
<p>    zip = zipfile.ZipFile('Python.zip', 'r')</p>
<p>    unzip(zip)</p>
<p>    zip.close()
<br />
This is fine for flat Zip files (those without subfolders), but it’d just barf
all over the screen if we passed a name that included a nonexistent directory
to file(). There are two choices:
- Remove everything before the filename. Simple, yes, but you could end up with two files
named the same, and the second would silently overwrite the first.
- Create all the directories before unzipping our file, which is a lot safer,
though it requires a little more work…
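To make the risk in the first choice concrete, here is a rough sketch that reports which base names would collide if the folders were stripped. The flattened_collisions helper and the sample listing are hypothetical, invented for this illustration.

```python
import os
from collections import Counter

def flattened_collisions(names):
    """Return base names that more than one archive member would flatten to."""
    counts = Counter(os.path.basename(n) for n in names if not n.endswith('/'))
    return sorted(base for base, seen in counts.items() if seen > 1)

# Hypothetical listing: two different members share the base name 'file.txt'.
listing = ['file.txt', 'folder/file.txt', 'folder/file.html', 'folder/']
print(flattened_collisions(listing))
```

Any name this reports would be written twice to the same place, so only the last copy would survive.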
Of course we’re going for the second choice; not only is it the most interesting,
it's also the most Pythonic!
To borrow from another TV snake (Blackadder): "I have a cunning plan!"
<br />#!/usr/bin/env python</p>
<p> </p>
<p>import os, zipfile</p>
<p> </p>
<p>def unzip(path, zip):</p>
<p> </p>
<p>    # Local aliases for the os.path functions used inside the loop
<br />    isdir = os.path.isdir
<br />    join = os.path.join
<br />    norm = os.path.normpath
<br />    split = os.path.split</p>
<p> </p>
<p>    for each in zip.namelist():</p>
<p>        # Names ending in '/' are directory entries, not files</p>
<p>        if not each.endswith('/'):</p>
<p>            root, name = split(each)</p>
<p>            directory = norm(join(path, root))</p>
<p>            # Create any missing folders before writing the file</p>
<p>            if not isdir(directory):</p>
<p>                os.makedirs(directory)</p>
<p>            file(join(directory, name), 'wb').write(zip.read(each))</p>
<p> </p>
<p>if __name__ == '__main__':</p>
<p> </p>
<p>    zip = zipfile.ZipFile('Python.zip', 'r')</p>
<p>    unzip('', zip)</p>
<p>    zip.close()
<br />
Don’t panic! This is a little more advanced than the other functions we've created
so far, and there’s actually quite a lot going on inside it, so we'll go through it
step by step. You might have noticed the os module sitting at the core of this
example too.
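The first part of this function is pretty strange as functions go; basically
all it does is create local aliases for some of the functions from os.path.
Looking up a local name is faster than repeating the os.path attribute lookup on
every pass through the loop, so this improves performance slightly.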
Next we loop through each of the names in zip.namelist() and skip any name that
is a directory (one that ends with a forward slash):
<br />>>> for each in zip.namelist(): print each
<br />1.txt
<br />2.txt
<br />3.txt
<br />file.gif
<br />file.txt
<br />folder/file.html
<br />folder/
<br />
The path is split from the filename and assigned to root, name. Our next line
creates a variable named directory that holds the new path for the file, which
is simply path and root joined.
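The split, join and normalise steps are easier to follow with concrete values. This sketch isolates them in a hypothetical target_directory helper (my own name, not part of the function above) and runs them on names from our listing.

```python
import os

def target_directory(path, member):
    """Mirror the steps described above: split the name, then join and normalise."""
    root, name = os.path.split(member)
    directory = os.path.normpath(os.path.join(path, root))
    return directory, name

print(target_directory('', 'folder/file.html'))     # ('folder', 'file.html')
print(target_directory('', 'file.txt'))             # ('.', 'file.txt')
print(target_directory('out', 'folder/file.html'))  # separator depends on the OS
```

Notice that a top-level file has an empty root, which normpath() turns into '.', the current directory. That directory always exists, so nothing needs to be created for it.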
Note: This won't work with absolute paths like C:\Folder\Folder\File.ext. In that
case os.path.join() discards path, so the file would be extracted to that absolute
location (tested on Windows). For this example I'm assuming that absolute paths won’t be used.
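If you would rather not rely on that assumption, one option is to check each name before extracting it. The is_safe_member helper below is a hypothetical, deliberately conservative sketch: it also rejects '..' components and any name with a colon in second position, which may be stricter than you need. It uses posixpath so the behaviour is the same on every platform.

```python
import posixpath

def is_safe_member(member):
    """Hypothetical check: accept only relative names that stay inside the target."""
    name = member.replace('\\', '/')       # treat Windows separators like '/'
    if name.startswith('/') or name[1:2] == ':':
        return False                       # '/tmp/x' or a drive letter such as 'C:/x'
    # After normalising, a leftover '..' means the name climbs out of the target
    return '..' not in posixpath.normpath(name).split('/')

# join() drops everything before an absolute component, which is the root of the problem
print(posixpath.join('out', '/tmp/file.txt'))   # /tmp/file.txt

for candidate in ['folder/file.html', '/tmp/file.txt', 'C:\\Folder\\File.ext', '../evil.txt']:
    print(candidate, is_safe_member(candidate))
```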
All we do then is check whether the directory tree already exists, create it if it
does NOT, and then extract our file. Overall, it's a very small function (especially
compared to some other languages).
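For comparison, here is roughly how the same function might look on a current Python 3, where file() is gone and os.makedirs() accepts exist_ok. Treat it as a sketch under those assumptions; the sample archive and the 'out' target folder are made up so the code can run by itself.

```python
import os
import tempfile
import zipfile

def unzip(path, archive):
    """Extract every file member of an open ZipFile beneath path."""
    for member in archive.namelist():
        if member.endswith('/'):
            continue                                  # skip directory entries
        root, name = os.path.split(member)
        directory = os.path.normpath(os.path.join(path, root))
        os.makedirs(directory, exist_ok=True)         # create missing parents
        with open(os.path.join(directory, name), 'wb') as out:
            out.write(archive.read(member))

# Made-up sample archive so the sketch can run by itself.
workdir = tempfile.mkdtemp()
sample = os.path.join(workdir, 'Python.zip')
with zipfile.ZipFile(sample, 'w') as archive:
    archive.writestr('file.txt', 'top level')
    archive.writestr('folder/file.html', '<p>nested</p>')

target = os.path.join(workdir, 'out')
with zipfile.ZipFile(sample, 'r') as archive:
    unzip(target, archive)
```

The standard library also offers ZipFile.extractall(path), which extracts a whole archive in a single call, so writing the loop by hand is mostly useful for seeing what goes on underneath.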