Python UnZipped

Introduction

Python is a great choice for anyone wanting to play with the increasingly popular ZIP file format (GZIP is not covered here), and as usual Python makes it surprisingly fun and easy!

Don’t believe me?

In this article we’ll look at creating, extracting, and adding to Zip archives using Python’s standard zipfile module, and define a set of functions you can use with your own Zip files. We’ll end with an example that recursively scans a Zip file and its sub-archives.

This does require some prior knowledge of Python, so if you have never used Python before you should read Vikram Vaswani’s Python 101 before reading this.

Creating Our Zip File

Let’s jump right in and create our Zip file, then add a few sample files to it.

[code]
>>> import zipfile
>>> zip = zipfile.ZipFile('Python.zip', 'w')
>>> zip.write('file.txt')
>>> zip.write('file.gif')
>>> zip.close()
[/code]

So we should now have a small Zip containing two files (file.txt and file.gif) sitting in our current working directory. Easy enough and pretty neat overall. How about something a little more interesting? Adding all the .txt files in a directory to our archive, perhaps?

[code]
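#!/usr/bin/env python

import os, zipfile

def zipdir(path, extension, zip):
    # Add every name in path that ends with extension to zip.
    for each in os.listdir(path):
        if each.endswith(extension):
            try:
                zip.write(os.path.join(path, each))
            except IOError:
                # Not a readable file (a folder, for example), so skip it.
                pass

if __name__ == '__main__':
    # '.' is the current working directory.
    zip = zipfile.ZipFile('Python.zip', 'w')
    zipdir('.', '.txt', zip)
    zip.close()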
[/code]

Still pretty simple. This example defines a new function named zipdir(), which follows three steps:

  1. Loop through a list of all the names in our directory.
  2. If each ends with .txt, try to write it to zip.
  3. If an IOError is raised, skip this name and move on to the next one (this could happen if you have a folder ending with .txt).
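The three steps above can also be sketched for a current Python 3 interpreter. Treat this as an illustration under a few assumptions rather than the tested code from this article: the name zip_matching, the throwaway demo folder and its sample files are all made up for the example, and instead of catching IOError the sketch checks up front that each name is a regular file.

```python
import os
import tempfile
import zipfile


def zip_matching(path, extension, archive):
    """Add every regular file in path ending with extension to archive."""
    added = []
    for each in sorted(os.listdir(path)):
        full = os.path.join(path, each)
        # Keep matching names, but skip folders such as 'folder.txt/'.
        if each.endswith(extension) and os.path.isfile(full):
            # arcname stores just the file name, not the temporary path.
            archive.write(full, arcname=each)
            added.append(each)
    return added


# Build a throwaway folder so the sketch can run anywhere.
demo = tempfile.mkdtemp()
for sample in ('1.txt', '2.txt', 'file.gif'):
    with open(os.path.join(demo, sample), 'w') as handle:
        handle.write('sample data')
os.mkdir(os.path.join(demo, 'folder.txt'))  # a folder with a misleading name

target = os.path.join(demo, 'Python.zip')
with zipfile.ZipFile(target, 'w') as archive:
    added = zip_matching(demo, '.txt', archive)

print(added)
```

Checking os.path.isfile() first is only one way to handle step 3; catching the exception, as the article's function does, is just as reasonable.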

There is a problem with this one though: because ZipFile is a file-based object, opening our Zip with the write flag wipes any data already in it, just like with normal files. Luckily this also means we can use other flags besides write. To show this we’ll add a few more files to our Zip using append.

[code]
>>> import zipfile
>>> zip = zipfile.ZipFile('Python.zip', 'a')
>>> zip.write('file.txt')
>>> zip.write('file.gif')
>>> zip.write('folder/file.html')
>>> zip.close()
[/code]
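If you’d like to see the difference between the two flags in a single run, here is a small Python 3 sketch. It is only an illustration: the entry names are invented, the archive lives in a temporary folder, and writestr() is used purely so that no sample files are needed on disk.

```python
import os
import tempfile
import zipfile


def names(path):
    """Return the list of names stored in the archive at path."""
    with zipfile.ZipFile(path, 'r') as archive:
        return archive.namelist()


target = os.path.join(tempfile.mkdtemp(), 'Python.zip')

# 'w' starts a fresh archive every time, so the first entry is lost.
with zipfile.ZipFile(target, 'w') as archive:
    archive.writestr('first.txt', 'one')
with zipfile.ZipFile(target, 'w') as archive:
    archive.writestr('second.txt', 'two')
after_write = names(target)

# 'a' keeps what is already in the archive and adds to it.
with zipfile.ZipFile(target, 'a') as archive:
    archive.writestr('third.txt', 'three')
after_append = names(target)

print(after_write)
print(after_append)
```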

So we’ve seen how to create a Zip file, and we’ve added a set of files to it using the write and append flags. What’s next?

Going Full Monty with the Zip File

You guessed it, we’re going to unzip them. (We’ll use our file.txt and file.gif sample files again just to make things easier to follow.)

[code]
>>> import zipfile
>>> zip = zipfile.ZipFile('Python.zip', 'r')
>>> open('file.txt', 'wb').write(zip.read('file.txt'))
>>> open('file.gif', 'wb').write(zip.read('file.gif'))
>>> zip.close()
[/code]

Note: read() hands back the raw contents of each file, so I’ve used the ‘wb’ (write binary) flag for both. Images are binary and need it; for text files it stops the data being altered on the way out (line endings on Windows, for example).
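On Python 3, which is newer than the version this article was tested with, the binary flag matters even more, because read() returns a bytes object. The sketch below is an assumption-laden illustration: the archive is built in a temporary folder, the sample text is invented, and UTF-8 is assumed as the encoding of the text file.

```python
import os
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, 'Python.zip')

# Create a tiny archive to read from.
with zipfile.ZipFile(target, 'w') as archive:
    archive.writestr('file.txt', 'hello zip\n')

with zipfile.ZipFile(target, 'r') as archive:
    data = archive.read('file.txt')

# Write the bytes back out untouched...
with open(os.path.join(workdir, 'file.txt'), 'wb') as handle:
    handle.write(data)

# ...or decode them when you want text to work with.
text = data.decode('utf-8')
print(type(data).__name__, repr(text))
```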

OK, we just extracted two files from our Zip, and in only five lines! This example is fine if you know the names of the files you want to extract, but what if you don’t?

[code]
#!/usr/bin/env python

import zipfile

def inzip(filename, zip):
    # True if filename is one of the names stored in zip.
    return filename in zip.namelist()

if __name__ == '__main__':
    zip = zipfile.ZipFile('Python.zip', 'r')
    print(inzip('file.txt', zip))
    zip.close()
[/code]

Short and sweet just like its name, this function simply returns True if the string filename is in the list of files in the zip and False if it isn’t (True in the example above).

The namelist() method (along with its brothers and sisters) provides information about a Zip file; namelist() itself returns a list of the names of everything within a Zip. For example:

[code]
>>> zip.namelist()
['1.txt', '2.txt', '3.txt', 'file.gif', 'file.txt', 'folder/file.html', 'folder/']
>>>
[/code]
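Two of namelist()’s siblings are infolist() and getinfo(), which return ZipInfo objects carrying details such as the uncompressed size. Here is a rough Python 3 sketch of how they might be used; the archive contents are invented for the example and built in a temporary folder.

```python
import os
import tempfile
import zipfile

target = os.path.join(tempfile.mkdtemp(), 'Python.zip')
with zipfile.ZipFile(target, 'w') as archive:
    archive.writestr('file.txt', 'hello')
    archive.writestr('folder/file.html', '<p>hi</p>')

with zipfile.ZipFile(target, 'r') as archive:
    names = archive.namelist()

    # infolist() returns one ZipInfo object per entry.
    sizes = {}
    for info in archive.infolist():
        sizes[info.filename] = info.file_size

    # getinfo() looks up a single entry by name.
    html_size = archive.getinfo('folder/file.html').file_size

print(names)
print(sizes)
```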

You’ve checked the contents of the file and you want to get extracting. Rather than sitting there typing names into your Python shell one by one (which, let’s face it, is pretty boring), you can extract everything in one go, and I’m going to show you how.

[code]
#!/usr/bin/env python

import zipfile

def unzip(zip):
    # Write every member of zip into the current working directory.
    for name in zip.namelist():
        output = open(name, 'wb')
        output.write(zip.read(name))
        output.close()

if __name__ == '__main__':
    zip = zipfile.ZipFile('Python.zip', 'r')
    unzip(zip)
    zip.close()
[/code]

This is fine for flat Zip files (those without subfolders), but it’d just barf all over the screen if one of the names included a non-existent directory. There are two choices:

  1. Remove everything before the filename. Simple, yes, but you could end up with two files named the same, and we all know what happens next.
  2. Create all the directories before unzipping our file, which is a lot safer, though it requires a little more work…

Of course we’re going for the second choice; not only is it the most interesting but also the most Pythonic!

To borrow from another TV snake (Blackadder): “I have a cunning plan!”

[code]
#!/usr/bin/env python

import os, zipfile

def unzip(path, zip):
    # Local names for the os.path functions we call inside the loop.
    isdir = os.path.isdir
    join = os.path.join
    norm = os.path.normpath
    split = os.path.split

    for each in zip.namelist():
        # Names ending in '/' are directories, not files.
        if not each.endswith('/'):
            root, name = split(each)
            directory = norm(join(path, root))
            # Create the directory tree first if it isn't there yet.
            if not isdir(directory):
                os.makedirs(directory)
            output = open(join(directory, name), 'wb')
            output.write(zip.read(each))
            output.close()

if __name__ == '__main__':
    zip = zipfile.ZipFile('Python.zip', 'r')
    unzip('', zip)
    zip.close()
[/code]

Don’t panic! This is a little more advanced than the other functions we’ve created so far and there’s actually quite a lot going on inside it, so we’ll go through it step by step. You might have noticed the os module sitting at the core of this example too.

The first part of this function is pretty strange as functions go; basically all it does is create local names for some of the functions from os.path (local lookups are a little quicker inside a loop).

Next we loop through each of the names in zip.namelist() and carry on only if the name isn’t a directory (directory names end with a forward slash).

[code]
>>> for each in zip.namelist(): print(each)
1.txt
2.txt
3.txt
file.gif
file.txt
folder/file.html
folder/
[/code]

The path is split from the filename and the two parts are assigned to root and name. Our next line creates a variable named directory that holds the new path for the file, which is simply path and root joined.

Note: This won’t work with absolute paths like C:\Folder\Folder\File.ext; in this case the file would be extracted to that location rather than under path (tested on Windows). For this example I’m assuming that absolute paths won’t be used.

All we do then is create the directory tree if it does NOT already exist, and then extract our file. Overall, it’s a very small function (especially compared to some other languages).
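If you can’t assume that absolute paths (or names containing “..”) will stay away, one possible approach is to work out where each file would land and refuse anything that falls outside the target folder. The Python 3 sketch below is an assumption of mine rather than part of the tested examples: safe_unzip, the sample archive and the choice to raise ValueError are all hypothetical.

```python
import os
import tempfile
import zipfile


def safe_unzip(path, archive):
    """Extract archive into path, refusing names that would land outside it."""
    base = os.path.realpath(path)
    written = []
    for each in archive.namelist():
        if each.endswith('/'):
            continue  # directory entry; folders are created on demand below
        target = os.path.realpath(os.path.join(base, each))
        # An absolute name or a '..' component would move target out of base.
        if os.path.commonpath([base, target]) != base:
            raise ValueError('unsafe name in archive: %r' % each)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, 'wb') as handle:
            handle.write(archive.read(each))
        written.append(os.path.relpath(target, base))
    return written


# Build a small sample archive in a temporary folder.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'Python.zip')
with zipfile.ZipFile(source, 'w') as archive:
    archive.writestr('file.txt', 'hello')
    archive.writestr('folder/file.html', '<p>hi</p>')

dest = os.path.join(workdir, 'out')
with zipfile.ZipFile(source, 'r') as archive:
    written = sorted(safe_unzip(dest, archive))

print(written)
```

Python 2.6 and later also provide ZipFile.extractall(), which creates the directories for you, so on a newer interpreter a hand-written loop like this is mostly useful when you want control over what gets written.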

Listings in the Key of Zip

Finally we’re going to look at a function that uses recursion to move through a Zip file and its sub-archives, returning a complete list of all non-Zip files. But what’s the point in this? Let’s say, for instance, that you want to count the number of files in a Zip; this way all you have to do is call len() on the result of our function. Enough talk, let’s see this function in action.

[code]
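#!/usr/bin/env python

import os, zipfile

def rezipe(path, files=None):
    # Start a fresh list for each top-level call; a default of []
    # would be shared between calls and keep growing.
    if files is None:
        files = []

    zip = zipfile.ZipFile(path)

    for name in zip.namelist():
        if name.endswith('.zip'):
            # Extract the sub-archive to the current working directory,
            # scan it into the same list, then clean up.
            temp = os.path.basename(name)
            output = open(temp, 'wb')
            output.write(zip.read(name))
            output.close()
            rezipe(temp, files)
            os.remove(temp)
        elif not name.endswith('/'):
            files.append(name)

    zip.close()
    return files

print(len(rezipe('Python.zip')))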
[/code]

But wait, there’s… uhm… nevermind. That’s it. Sorry if I hyped that last one up a bit.

Unlike our other examples, rezipe() opens the Zip file itself instead of using one we’ve already opened. It then loops through zip.namelist(), and if name ends with .zip we extract it to the current working directory and call rezipe() on it, removing it when rezipe() is complete. The next part simply says that if name isn’t a Zip file or a folder, append it to the end of our list.
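As a variation on the same idea, a sub-archive can be read into memory instead of being written to the current working directory and removed afterwards. This Python 3 sketch is not the article’s tested function: list_nested is a hypothetical name, the nested sample archive is invented, and it relies on ZipFile accepting a file-like object such as io.BytesIO.

```python
import io
import os
import tempfile
import zipfile


def list_nested(source, files=None):
    """Return the names of all non-Zip files in source and its sub-archives.

    source may be a path or a file-like object.
    """
    if files is None:
        files = []
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if name.endswith('.zip'):
                # Hold the inner archive in memory rather than on disk.
                list_nested(io.BytesIO(archive.read(name)), files)
            elif not name.endswith('/'):
                files.append(name)
    return files


# Build an archive that contains another archive.
inner = io.BytesIO()
with zipfile.ZipFile(inner, 'w') as archive:
    archive.writestr('inner.txt', 'inside')

target = os.path.join(tempfile.mkdtemp(), 'Python.zip')
with zipfile.ZipFile(target, 'w') as archive:
    archive.writestr('file.txt', 'outside')
    archive.writestr('nested/inner.zip', inner.getvalue())

found = list_nested(target)
print(len(found))
```

Keeping everything in memory avoids leftover temporary files, at the cost of holding each sub-archive in RAM while it is scanned.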

If you’re anything like me, by now I’m sure you can see the potential this little guy (in particular) has and what this means for your Zip files!

If you’ve found this article interesting and you want to learn more about Python or some of the subjects covered here:

http://www.python.org/ – Python homepage
http://www.python.org/doc/2.3.2/tut/tut.html – Python tutorial
http://www.python.org/doc/2.3.2/modindex.html – Python module index
http://www.python.org/doc/2.3.2/lib/module-zipfile.html – The zipfile module

Note: All the examples shown and discussed in this article were tested on Windows XP with Python 2.3 and are meant only as examples.
