Python for PDF Generation

The Portable Document Format (PDF) lets you create documents that look exactly the same on every platform. Sometimes a PDF document needs to be generated dynamically, however, and that can be quite a challenge. Fortunately, there are libraries that can help. This article examines one of those for Python.

The aim of the Portable Document Format is noble: every page should look exactly the same on any platform, regardless of user settings. If you view a web page and then change your screen resolution, the layout can change significantly. Likewise, a web page or some other sort of document viewed on Windows may look very different on Linux. This is fine for a lot of things, but when pages need to be formatted precisely, as in books or user manuals, it becomes a problem. It is, however, a problem that is easily solved by publishing such documents as PDF files.

PDF documents are fairly easy to create today. Most word processors can export a PDF with a button or menu option, and many systems can simply print a page to a PDF file. However, there are situations where this isn’t possible, such as when a PDF document needs to be generated dynamically. This is where programming comes in. Libraries for generating PDF documents exist for many languages, and this article examines one of them: the ReportLab Toolkit for Python.

Obtaining the ReportLab Toolkit

The ReportLab Toolkit may be obtained from the ReportLab website:

http://www.reportlab.org/downloads.html

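Extract the archive and then run the install script:

$ python setup.py install

ReportLab is also published on the Python Package Index, so on current systems you can install it with pip instead:

$ pip install reportlab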

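If you plan on working with images, you’ll also need the Python Imaging Library (PIL). PIL only supports Python 2; on Python 3, use Pillow, its actively maintained fork, which can be installed with pip install pillow. The original PIL is available here: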

http://www.pythonware.com/products/pil

If you’re on Windows, just download and run the binary installer. Otherwise, download the source, extract it and run the installation script:

$ python setup.py install

You’ll need the proper permissions (for example, administrator or root access) to install either package system-wide.

Putting Virtual Ink to Virtual Paper

Now that the ReportLab Toolkit has been installed, we can begin using it immediately. Fire up Python’s interactive interpreter, and let’s get started. The first step is to import the Canvas class from the reportlab.pdfgen.canvas module:

>>> from reportlab.pdfgen.canvas import Canvas

Since the ReportLab Toolkit is set up to use the A4 paper size by default, North American developers will have to perform an extra step to gain access to the standard 8.5″ by 11″ letter size:

>>> from reportlab.lib.pagesizes import letter

The pagesizes module also contains various other paper sizes in case you ever need to use them.
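The sizes in reportlab.lib.pagesizes are plain (width, height) tuples measured in points, where one point is 1/72 inch (units are covered in more detail below). As a rough sketch, assuming only that definition of a point, the two sizes discussed here can be computed by hand. The landscape helper below is a hypothetical stand-in that mimics ReportLab’s own reportlab.lib.pagesizes.landscape (longer side first); it is not ReportLab’s code:

```python
# A minimal sketch: page sizes as (width, height) tuples in points,
# assuming 1 point = 1/72 inch.
POINTS_PER_INCH = 72.0
POINTS_PER_MM = POINTS_PER_INCH / 25.4  # 25.4 mm per inch

# US Letter: 8.5 x 11 inches.
LETTER = (8.5 * POINTS_PER_INCH, 11 * POINTS_PER_INCH)
# ISO A4: 210 x 297 millimeters.
A4 = (210 * POINTS_PER_MM, 297 * POINTS_PER_MM)


def landscape(pagesize):
    """Hypothetical stand-in: return the size with the longer side as width."""
    width, height = pagesize
    return (max(width, height), min(width, height))


print(LETTER)                           # (612.0, 792.0)
print(landscape(LETTER))                # (792.0, 612.0)
print(tuple(round(v, 2) for v in A4))   # (595.28, 841.89)
```

In a real script you would use ReportLab’s own function, for example Canvas("test.pdf", pagesize=landscape(letter)) after importing landscape from reportlab.lib.pagesizes.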

Next, we create the PDF document in the form of a Canvas object. It requires a filename argument, which may be an absolute path or a path relative to the current working directory:

>>> pdf = Canvas("test.pdf")

This creates an A4 page. However, as mentioned before, not everyone will want A4. To use the letter page size instead, pass the pagesize argument:

>>> pdf = Canvas("test.pdf", pagesize=letter)

Throughout this article, I’ll be using the letter page size simply because this is what I’m familiar with. In your own scripts, you’re free to use whatever you’re accustomed to.

We can now draw something in our PDF document. A string of text is probably the easiest place to start. Let’s draw one in red using the Courier font. Note that text is painted with the fill color, not the stroke color:

>>> pdf.setFont("Courier", 12)
>>> pdf.setFillColorRGB(1, 0, 0)  # text uses the fill color
>>> pdf.drawString(300, 300, "CLASSIFIED")

An important thing to note here is that when specifying coordinates, the origin is in the lower left hand corner of the page, rather than the top left. It’s also possible to specify measurements in other units. You can use centimeters, millimeters, inches and picas. The default unit of measurement is a point, equal to one seventy-second of an inch. The extra measurements are available from reportlab.lib.units:

>>> from reportlab.lib.units import cm, mm, inch, pica

To use these units, simply multiply them by however many you want. Let’s draw a string of text two inches from the left edge of the page and one inch above the bottom:

>>> pdf.drawString(2 * inch, inch, "For Your Eyes Only")
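Because the origin sits at the bottom-left corner, positions measured from the top of the page (the way you would measure on paper) have to be flipped. A minimal sketch of that conversion, assuming the unit values ReportLab uses (inch is 72 points, cm is inch / 2.54); the from_top_left helper is hypothetical, not part of ReportLab:

```python
# Hypothetical helper: convert a position measured from the top-left corner
# into ReportLab's bottom-left-origin coordinates, all in points.
inch = 72.0              # same value as reportlab.lib.units.inch
cm = inch / 2.54         # same value as reportlab.lib.units.cm
LETTER = (612.0, 792.0)  # 8.5 x 11 inches in points


def from_top_left(x, y_from_top, pagesize=LETTER):
    """Return (x, y) with y measured up from the bottom edge."""
    _, page_height = pagesize
    return (x, page_height - y_from_top)


# One inch from the left edge and one inch down from the top.
print(from_top_left(1 * inch, 1 * inch))  # (72.0, 720.0)
# Three centimeters in from the left edge, two down from the top.
print(from_top_left(3 * cm, 2 * cm))
```

The result can be unpacked and passed on, for example x, y = from_top_left(inch, inch) followed by pdf.drawString(x, y, "For Your Eyes Only").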

Now that we have some text on the page, let’s close the page:

>>> pdf.showPage()

The showPage method closes the current page, and any further drawing occurs on the next page. If nothing more is drawn before the document is saved, no empty extra page is added. We now have to save the PDF document:

>>> pdf.save()

ReportLab writes the document to test.pdf, which you can now view. The first thing you’ll notice is that it’s rather ugly and haphazardly laid out. The text would look a lot better if it were centered, which is easy with the drawCentredString method (notice the British English spelling) and a bit of math. The drawCentredString method draws the text with its center on the given x-coordinate, so we only have to calculate the horizontal center of the page, which is half the page width. A canvas shouldn’t be used any further once save() has been called, so we start over with a fresh one. That also means the font face, size and color start from their defaults again, so we set them explicitly:

>>> pdf = Canvas("test.pdf", pagesize=letter)
>>> pdf.setFont("Courier", 60)
>>> pdf.setFillColorRGB(1, 0, 0)
>>> pdf.drawCentredString(letter[0] / 2, inch * 6, "CLASSIFIED")
>>> pdf.setFont("Courier", 30)
>>> pdf.drawCentredString(letter[0] / 2, inch * 5, "For Your Eyes Only")
>>> pdf.showPage()
>>> pdf.save()

There, the result now looks slightly more pleasing.
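drawCentredString saves a little arithmetic: it is equivalent to calling drawString at x minus half the text’s width. The sketch below does that arithmetic by hand for Courier, assuming Courier’s fixed advance width of 600 units per 1000-unit em (every Courier glyph is the same width). For other fonts, ReportLab’s pdfmetrics.stringWidth(text, fontName, fontSize) measures the width for you; the helper names here are hypothetical:

```python
# A minimal sketch of the centering arithmetic, assuming Courier's
# fixed advance width of 600/1000 em for every character.
LETTER_WIDTH = 612.0   # letter[0] in points
COURIER_ADVANCE = 600  # glyph width in 1/1000ths of the font size


def courier_width(text, font_size):
    """Width in points of text set in Courier at font_size."""
    return len(text) * COURIER_ADVANCE * font_size / 1000


def centered_left_x(text, font_size, page_width=LETTER_WIDTH):
    """Left x for drawString so the text is centered on the page."""
    return page_width / 2 - courier_width(text, font_size) / 2


print(courier_width("CLASSIFIED", 60))    # 360.0
print(centered_left_x("CLASSIFIED", 60))  # 126.0
```

So, with 60-point Courier still selected, pdf.drawString(126, inch * 6, "CLASSIFIED") would place the text exactly where the drawCentredString call above does.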

Text Formatting Techniques

Drawing strings of text one at a time is fine for some purposes, such as above, where we only needed to place two distinct lines of text. Now imagine creating a bigger document: positioning individual strings of text would get tedious very quickly. Thankfully, there are other ways to position text.

Text objects allow larger amounts of text to be added to a page, starting from a given point. You can add any number of lines to one, and it spaces them out for you. To create a text object, call the canvas’s beginText method, which accepts the starting coordinates and returns a text object:

>>> pdf = Canvas("rhyme.pdf", pagesize=letter)  # the previous canvas was already saved
>>> rhyme = pdf.beginText(inch * 1, inch * 10)

Calling the textLine method allows a line of text to be added. Further calls will each be put on a new line:

>>> rhyme.textLine("Humpty Dumpty sat on a wall.")
>>> rhyme.textLine("Humpty Dumpty had a great fall.")
>>> rhyme.textLine("All the king's horses and all the king's men")
>>> rhyme.textLine("Couldn't put Humpty together again.")

After all of the necessary text is added, the text object must be drawn to the page:

>>> pdf.drawText(rhyme)

Saving and viewing the page will show that it has been formatted nicely:

>>> pdf.showPage()
>>> pdf.save()
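Each textLine call moves the next line down by the text object’s leading, the distance from one baseline to the next. Assuming ReportLab’s default leading of 1.2 times the font size and the default 12-point font, the rhyme’s baselines can be worked out as below; the baselines helper is illustrative only. If you need different spacing, text objects also offer a setLeading method:

```python
# Illustrative helper: where each textLine() baseline lands, assuming the
# leading defaults to 1.2 x the font size (14.4 points for 12-point text).
inch = 72.0


def baselines(y_start, line_count, font_size=12, leading=None):
    """Return the y coordinate of each successive line's baseline."""
    if leading is None:
        leading = 1.2 * font_size
    return [y_start - i * leading for i in range(line_count)]


# The four rhyme lines, starting ten inches up the page.
for y in baselines(inch * 10, 4):
    print(round(y, 1))  # prints 720.0, 705.6, 691.2, 676.8 on separate lines
```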

It’s also possible to draw all of the lines in the same method call by using the textLines method. The following code will produce a result identical to the previous page, but it requires less work than the previous approach:

>>> pdf = Canvas("rhyme.pdf", pagesize=letter)  # start over after save()
>>> rhyme = pdf.beginText(inch * 1, inch * 10)
>>> rhyme.textLines("""Humpty Dumpty sat on a wall.
Humpty Dumpty had a great fall.
All the king's horses and all the king's men
Couldn't put Humpty together again.""")
>>> pdf.drawText(rhyme)
>>> pdf.showPage()
>>> pdf.save()

However, this approach still has its problems. Consider long documents that need standard headers, standard footers and word wrapping. Using text objects for everything isn’t practical for documents like these. ReportLab includes a library called Platypus (Page Layout and Typography Using Scripts) that makes this much easier, and generating a very simple document with it isn’t difficult. We’ll use Platypus’s simplified tools to get started. The first step is to import the required modules:

>>> from reportlab.platypus import Paragraph, SimpleDocTemplate, Spacer
>>> from reportlab.lib.styles import getSampleStyleSheet

Now that we’ve imported what we need, we call getSampleStyleSheet to get what its name implies, a sample style sheet containing a set of predefined paragraph styles (such as "Normal") that we can use:

>>> style = getSampleStyleSheet()

Next, we create an instance of SimpleDocTemplate, which will be used to structure our document:

>>> pdf = SimpleDocTemplate("testplatypus.pdf", pagesize=letter)

Just as in creating a canvas, you’re required to pass a filename, which may be either absolute or relative to the current working directory.

We now have to create what’s called a story. A story is simply a list of elements (which Platypus calls flowables) that Platypus lays out across as many pages as needed:

>>> story = []

We’ll also need some text for a paragraph (which we will repeat multiple times for an example):

>>> text = "Paragraphs are quite easy to create with Platypus, and Platypus handles things like word wrapping for you. There's not a lot of coding work involved if you wish to create something simple."

We’ll loop through and create twenty-five paragraphs for our document, with a half-inch space after each one. All of this will need to be added to our Platypus story:

>>> for x in range(25):
      para = Paragraph(text, style["Normal"])
      story.append(para)
      story.append(Spacer(inch * .5, inch * .5))

Finally, we can generate the resulting document:

>>> pdf.build(story)

Another interesting feature of Platypus is the ability to embed formatting tags inside paragraphs. It works like this:

>>> pdf = SimpleDocTemplate("testplatypus.pdf", pagesize=letter)
>>> story = []
>>> for color in ["red", "green", "blue"]:
      para = Paragraph("<font color='%s'>This is <b>%s</b>.</font>" % (color, color), style["Normal"])
      story.append(para)
      story.append(Spacer(inch * .5, inch * .5))

>>> pdf.build(story)

The result is three paragraphs, each half an inch apart and in a different color.
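When the paragraph text comes from somewhere else (user input, a file, a database), it’s worth building the markup with a helper that escapes the text itself, because Paragraph treats characters such as < and & as markup. A small sketch using the standard library’s xml.sax.saxutils module; the color_paragraph_markup name is hypothetical:

```python
from xml.sax.saxutils import escape, quoteattr


def color_paragraph_markup(color, label):
    """Build Platypus-style markup: colored text with the label in bold.

    escape() turns &, < and > into entities so the label is shown literally;
    quoteattr() wraps the attribute value in quotes safely.
    """
    return "<font color=%s>This is <b>%s</b>.</font>" % (quoteattr(color), escape(label))


print(color_paragraph_markup("red", "red"))
# <font color="red">This is <b>red</b>.</font>
print(color_paragraph_markup("blue", "R&D"))
# <font color="blue">This is <b>R&amp;D</b>.</font>
```

Each returned string can be passed to Paragraph(markup, style["Normal"]) just like the strings built in the loop above.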

Platypus provides a huge benefit over the pdfgen module when it comes to quickly and efficiently generating dynamic documents. Consider this script, which takes the contents of a simple text file and generates a PDF document:

import sys
from xml.sax.saxutils import escape

from reportlab.lib.pagesizes import letter
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import inch
from reportlab.platypus import Paragraph, SimpleDocTemplate, Spacer

if len(sys.argv) < 3:
    print("Usage: <script> textfile pdffile")
    sys.exit(1)

pdf = SimpleDocTemplate(sys.argv[2], pagesize=letter)
story = []
style = getSampleStyleSheet()
with open(sys.argv[1]) as textfile:
    text = textfile.read()
# Each line of the text file becomes its own paragraph.
for para in text.split("\n"):
    # Paragraph interprets <, > and & as markup, so escape plain text first.
    story.append(Paragraph(escape(para), style["Normal"]))
    story.append(Spacer(0, inch * .1))
pdf.build(story)

The script is used like this:

$ python texttopdf.py textDocument.txt pdfToCreate.pdf
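The script treats every line of the input as its own paragraph, so a hard-wrapped text file turns into many short paragraphs. One possible improvement, sketched below under the assumption that paragraphs in the input are separated by blank lines, is to join the lines of each block before handing them to Paragraph. split_paragraphs is a hypothetical helper, not part of ReportLab:

```python
def split_paragraphs(text):
    """Group lines into paragraphs separated by one or more blank lines."""
    paragraphs = []
    current = []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            # A blank line ends the current paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


sample = "First paragraph,\nwrapped over two lines.\n\n\nSecond paragraph.\n"
print(split_paragraphs(sample))
# ['First paragraph, wrapped over two lines.', 'Second paragraph.']
```

In the script, the loop would then iterate over split_paragraphs(text) instead of splitting the text on every newline.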

Using Graphics

Drawing graphics in a PDF isn’t very difficult with the ReportLab Toolkit. Let’s go back to the pdfgen module and draw a few shapes on the page. Go ahead and create a new canvas to work with:

>>> pdf = Canvas(“graphics.pdf”, pagesize = letter)

We’ll now draw a line that spans across the top of the page, leaving an inch on its left, right and top:

>>> pdf.line(inch, inch * 10, inch * 7.5, inch * 10)
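The numbers in that call come from the page size: on a letter page (612 by 792 points), one inch in from the right edge is 612 - 72 = 540 points, or 7.5 inches, and one inch down from the top is 792 - 72 = 720 points, or 10 inches. A sketch that derives the coordinates instead of hard-coding them; top_rule is a hypothetical helper:

```python
inch = 72.0              # same value as reportlab.lib.units.inch
LETTER = (612.0, 792.0)  # same value as reportlab.lib.pagesizes.letter


def top_rule(pagesize=LETTER, margin=inch):
    """Return (x1, y1, x2, y2) for a horizontal line `margin` in from the
    left, right and top edges, in the argument order Canvas.line() takes."""
    width, height = pagesize
    y = height - margin
    return (margin, y, width - margin, y)


print(top_rule())  # (72.0, 720.0, 540.0, 720.0)
```

Calling pdf.line(*top_rule()) draws the same line as above, and passing a different page size keeps the margins correct.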

We have the freedom to choose whatever colors we want for graphics. Here, we set the stroke color (used for lines and outlines) to black and the fill color (used to paint the inside of shapes) to bright green:

>>> pdf.setStrokeColorRGB(0, 0, 0)
>>> pdf.setFillColorRGB(0, 1, 0)

Using these colors, we can draw shapes. Notice that we can specify whether the shape should be stroked, filled or both. If we omit these arguments, the shape is stroked but not filled:

>>> pdf.rect(inch, inch, inch * 2, inch * 2, stroke=True, fill=True)
>>> pdf.circle(inch * 5, inch * 5, inch)
>>> pdf.circle(inch * 5, inch * 5, inch * .5, stroke=False, fill=True)

Save the page and then take a look at the result:

>>> pdf.showPage()
>>> pdf.save()
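The two shape methods are positioned differently: rect takes the lower-left corner plus a width and height, while circle takes the center plus a radius. A small sketch, with hypothetical helper names, that converts both to (x1, y1, x2, y2) bounding boxes, which makes it easy to check that shapes won’t overlap before drawing them:

```python
inch = 72.0  # same value as reportlab.lib.units.inch


def rect_bbox(x, y, width, height):
    """Bounding box of Canvas.rect(x, y, width, height): corner plus size."""
    return (x, y, x + width, y + height)


def circle_bbox(x_cen, y_cen, r):
    """Bounding box of Canvas.circle(x_cen, y_cen, r): center plus radius."""
    return (x_cen - r, y_cen - r, x_cen + r, y_cen + r)


def overlaps(a, b):
    """True if two (x1, y1, x2, y2) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


square = rect_bbox(inch, inch, inch * 2, inch * 2)
circle = circle_bbox(inch * 5, inch * 5, inch)
print(square)                    # (72.0, 72.0, 216.0, 216.0)
print(circle)                    # (288.0, 288.0, 432.0, 432.0)
print(overlaps(square, circle))  # False
```

The circle_bbox result also happens to be the form Canvas.ellipse(x1, y1, x2, y2) expects, so pdf.ellipse(*circle_bbox(cx, cy, r)) draws the same circle.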

If the shape that you want isn’t available, you can also draw it within the ReportLab Toolkit yourself, using paths. With paths, it’s possible to draw lines from point to point to construct a shape. After that, you can stroke and fill the shape. Let’s set colors, first. Our shape will be stroked with blue and filled with red:

>>> pdf = Canvas("path.pdf", pagesize=letter)  # the previous canvas was already saved
>>> pdf.setStrokeColorRGB(0, 0, 1)
>>> pdf.setFillColorRGB(1, 0, 0)

Next, we have to create a path object to be used:

>>> path = pdf.beginPath()

Before we begin drawing, let’s move the path’s current point to a starting position:

>>> path.moveTo(inch * 4, inch * 4)

Now let’s get to the drawing part. We’ll create three lines that form a triangle by using the lineTo method:

>>> path.lineTo(inch * 3, inch * 4)
>>> path.lineTo(inch * 3.5, inch * 5)
>>> path.lineTo(inch * 4, inch * 4)

When we’re done with a path, we simply have to draw it. We can specify whether we want a stroke or a fill or both:

>>> pdf.drawPath(path, stroke=True, fill=True)

Save the page and take a look at the triangle:

>>> pdf.showPage()
>>> pdf.save()
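The same moveTo/lineTo pattern works for any polygon; only the vertices change. The sketch below computes the corners of a regular polygon around a center point with basic trigonometry, starting from the top so a triangle points upward, and traces them onto a path object, finishing with the path’s close() method to draw the last edge. regular_polygon, trace and RecordingPath are hypothetical names; RecordingPath only stands in for a ReportLab path so the sketch runs without ReportLab installed:

```python
import math

inch = 72.0  # same value as reportlab.lib.units.inch


def regular_polygon(x_cen, y_cen, radius, sides):
    """Return the vertices of a regular polygon, first vertex straight up."""
    points = []
    for i in range(sides):
        angle = math.pi / 2 + 2 * math.pi * i / sides
        points.append((x_cen + radius * math.cos(angle),
                       y_cen + radius * math.sin(angle)))
    return points


def trace(path, points):
    """Add a closed outline through points using moveTo, lineTo and close."""
    first, *rest = points
    path.moveTo(*first)
    for x, y in rest:
        path.lineTo(x, y)
    path.close()  # draws the final edge back to the first vertex


class RecordingPath:
    """Stand-in for a ReportLab path object that just records calls."""

    def __init__(self):
        self.calls = []

    def moveTo(self, x, y):
        self.calls.append(("moveTo", x, y))

    def lineTo(self, x, y):
        self.calls.append(("lineTo", x, y))

    def close(self):
        self.calls.append(("close",))


path = RecordingPath()
trace(path, regular_polygon(inch * 4, inch * 4, inch, 3))
print(path.calls)
```

With ReportLab, pass the object returned by pdf.beginPath() instead of RecordingPath, then draw it with pdf.drawPath as before.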

We’re not limited to drawing everything by hand. It’s also possible to draw existing images into a PDF document. For example, let’s draw the DevShed logo on a page. Images are drawn using the drawImage method of a canvas object, which also returns the image’s dimensions:

>>> pdf = Canvas("logo.pdf", pagesize=letter)  # the previous canvas was already saved
>>> pdf.drawImage("devshed.jpg", inch, inch * 10)
(34, 24)
>>> pdf.showPage()
>>> pdf.save()
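drawImage places the image at its natural size unless you pass width and height. To enlarge or shrink a logo into a given box without distorting it, scale both sides by the same factor. The sketch below computes that factor; fit_within is a hypothetical helper. drawImage can also handle this itself when given preserveAspectRatio=True along with width and height:

```python
inch = 72.0  # same value as reportlab.lib.units.inch


def fit_within(img_width, img_height, max_width, max_height):
    """Scale (img_width, img_height) to fit inside the box, keeping proportions."""
    scale = min(max_width / img_width, max_height / img_height)
    return (img_width * scale, img_height * scale)


# The 34 x 24 point logo, enlarged to fit a two-inch square.
width, height = fit_within(34, 24, 2 * inch, 2 * inch)
print(round(width, 2), round(height, 2))  # 144.0 101.65
```

The result could then be passed on as pdf.drawImage("devshed.jpg", inch, inch * 10, width=width, height=height).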

If you’re using Platypus, you’ll want to use the Image flowable object instead:

>>> from reportlab.platypus import Image
>>> pdf = SimpleDocTemplate("logoDoc.pdf")
>>> pdf.build([Image("devshed.gif")])

Conclusion

The Portable Document Format is very popular because of its ability to render pages that look exactly the same in many environments. There are many libraries out there that deal with generating PDF documents dynamically, and the ReportLab Toolkit is one of those libraries. In this article, we examined the low-level pdfgen module, which allows text and images to be positioned precisely on a page. While this approach is fine for many purposes, it becomes impractical when dealing with larger amounts of content. For those situations, Platypus is the tool of choice. It takes care of things such as word wrapping and page breaks for us, allowing us to spend more time on other things.

You should now be familiar with the basics of the ReportLab Toolkit. From here, try to create scripts that generate dynamic PDF documents from data sources, such as text files (which we took a look at already, though you can certainly attempt to improve our script) and databases.
