PDFs with PHP part 1

This tutorial is intended for the PHP programmer who needs to incorporate PDF generation in a script without using external libraries such as PDFlib (often unavailable due to licensing restrictions or lack of funds). It covers only the basics, which should give you a good start. PDF has a vast set of features and possibilities that cannot be covered in a short tutorial. If you need more than what is covered here, you might want to look at similar yet more complete solutions, such as the excellent work done by Olivier Plathey on the FPDF class (http://fpdf.org), on which this tutorial is based. Of course, you may wish to take your own route, and for that there is also the PDF reference (be warned: it’s 1,172 pages!). Basic familiarity with using PHP classes is assumed. Knowledge of PDF file structure is not required, as all references are explained.

Overview
PDF files are, after all, just plain text files with specific markup syntax that describes what should happen to objects within the document, such as text and images. It follows that, armed with some PDF logic, anyone can create a PDF file. This tutorial shows you the basic features of the PDF language so that you can put together your own PDF document.

Learning Objectives
At the end of this first part of the tutorial you should be able to put together a simple PDF class that can:

  • Create and output a PDF document
  • Set up page size and orientation
  • Insert simple text into the page
  • Handle simple font attributes
  • Activate compression.

{mospagebreak title=Prerequisites}
You need to have a fully functional PHP install (either PHP 4 or PHP 5 will work here) and a running web server to output the PDF file from your script.

Acrobat Reader, XPDF, or an equivalent is required to see the results of your work.

You do not need any external library, either separate or compiled into PHP, to generate your PDF files.

How It Works

The best approach is to set the code up as a class. This allows for greater flexibility later.

The primary (public) methods deal with the main operations on a PDF document: setting it up, adding pages, setting font, adding text, activating compression, and output of the document.

We shall review the various methods and features of the PDF language, and then eventually put it all together as one class.

Setting up Class Variables
We will need a few class variables to keep track of output, pages, objects, settings, etc.

The following is a list of the essential variables, with brief comments. You will later see each one of these variables in its context, which will give you a better idea of how they are used. For now just briefly get yourself acquainted with them.
[code]
var $_buffer = '';          // Buffer holding in-memory PDF.
var $_state = 0;            // Current document state.
var $_page = 0;             // Current page number.
var $_n = 2;                // Current object number.
var $_offsets = array();    // Array of object offsets.
var $_pages = array();      // Array containing the pages.
var $_w;                    // Page width in points.
var $_h;                    // Page height in points.
var $_fonts = array();      // An array of used fonts.
var $_font_family = '';     // Current font family.
var $_font_style = '';      // Current font style.
var $_current_font;         // Array with current font info.
var $_font_size = 12;       // Current font size in points.
var $_compress;             // Flag to compress or not.
var $_core_fonts = array('courier'      => 'Courier',
                         'courierB'     => 'Courier-Bold',
                         'courierI'     => 'Courier-Oblique',
                         'courierBI'    => 'Courier-BoldOblique',
                         'helvetica'    => 'Helvetica',
                         'helveticaB'   => 'Helvetica-Bold',
                         'helveticaI'   => 'Helvetica-Oblique',
                         'helveticaBI'  => 'Helvetica-BoldOblique',
                         'times'        => 'Times-Roman',
                         'timesB'       => 'Times-Bold',
                         'timesI'       => 'Times-Italic',
                         'timesBI'      => 'Times-BoldItalic',
                         'symbol'       => 'Symbol',
                         'zapfdingbats' => 'ZapfDingbats');
[/code]

{mospagebreak title=The Factory Method}
This method will give us the PDF object with which we can build our document. It sets the initial values for the document, such as page orientation and size, and returns the object.
[code]
function &factory($orientation = 'P', $format = 'A4')
{
    /* Create the PDF object. */
    $pdf = new PDF();

    /* Page format. */
    $format = strtolower($format);
    if ($format == 'a3') {           // A3 page size.
        $format = array(841.89, 1190.55);
    } elseif ($format == 'a4') {     // A4 page size.
        $format = array(595.28, 841.89);
    } elseif ($format == 'a5') {     // A5 page size.
        $format = array(420.94, 595.28);
    } elseif ($format == 'letter') { // Letter page size.
        $format = array(612, 792);
    } elseif ($format == 'legal') {  // Legal page size.
        $format = array(612, 1008);
    } else {
        die(sprintf('Unknown page format: %s', $format));
    }   
    $pdf->_w = $format[0];
    $pdf->_h = $format[1];

    /* Page orientation. */
    $orientation = strtolower($orientation);
    if ($orientation == 'l' || $orientation == 'landscape') {
        $w = $pdf->_w;
        $pdf->_w = $pdf->_h;
        $pdf->_h = $w;
    } elseif ($orientation != 'p' && $orientation != 'portrait') {
        die(sprintf('Incorrect orientation: %s', $orientation));
    }

    /* Turn on compression by default. */
    $pdf->setCompression(true);

    return $pdf;
}
[/code]
Also in this method we turn on compression by default. This makes the output PDF files a lot smaller.
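
The page dimensions used in factory() are expressed in points, where one point is 1/72 of an inch. As a rough sketch of where those numbers come from, the following converts the ISO A4 size (210 x 297 mm) to points. The helper name mm_to_points() is hypothetical and is not part of the class built in this tutorial.

```php
<?php
// Hypothetical helper (not part of the tutorial's class): convert
// millimetres to PDF points. One point is 1/72 inch; one inch is 25.4 mm.
function mm_to_points($mm)
{
    return round($mm / 25.4 * 72, 2);
}

// ISO A4 is 210 x 297 mm.
printf("A4: %.2f x %.2f\n", mm_to_points(210), mm_to_points(297));

// US Letter is 8.5 x 11 inches, so its size in points is a whole number.
printf("Letter: %.2f x %.2f\n", 8.5 * 72, 11 * 72);
```

The A4 result matches the 595.28 and 841.89 used in the factory method, and Letter gives 612 by 792.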

The actual setCompression() method is as follows:
[code]
function setCompression($compress)
{   
    /* If no gzcompress function is available then default to
     * false. */
    $this->_compress = (function_exists('gzcompress') ? $compress : false);
}
[/code]
However, whilst learning you may wish to explicitly turn off compression, so that you can open your created PDF document with a text editor and see easily what is happening.
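
To get a feel for what compression does to page content, here is a small standalone sketch. The sample stream is made up for illustration, and the example assumes the zlib extension may or may not be present, mirroring the function_exists() check in setCompression().

```php
<?php
// A repetitive, uncompressed page stream, similar in shape to what the
// class writes for text output. The content is invented for this demo.
$stream = str_repeat("BT 100.00 700.00 Td (Hello) Tj ET\n", 50);

if (function_exists('gzcompress')) {
    $packed   = gzcompress($stream);     // What goes into the PDF.
    $restored = gzuncompress($packed);   // What a reader recovers.
    printf("raw: %d bytes, compressed: %d bytes\n",
           strlen($stream), strlen($packed));
    echo ($restored === $stream) ? "round trip ok\n" : "round trip failed\n";
} else {
    echo "zlib is not available; output would stay uncompressed\n";
}
```

The compressed form is binary, which is why turning compression off is helpful when you want to read the generated file in a text editor.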

{mospagebreak title=Writing Content}


We will not be writing directly to the PDF file; the content is buffered as it is created. Only after the PDF document is closed, and after some rearranging, will it be sent as a PDF file to the browser for download. So, as a first step, we need a function to buffer the output. As it will be used internally within the PDF class, let’s make it a private function.
[code]
function _out($s)
{
    if ($this->_state == 2) {
        $this->_pages[$this->_page] .= $s . "\n";
    } else {
        $this->_buffer .= $s . "\n";
    }
}
[/code]
Here you can see straight away a number of class variables being used. Let’s take a moment to work through them. The $_state variable keeps track of four different states that the PDF document can be in:
  • 0 = initialized
  • 1 = opened but no page opened
  • 2 = page opened
  • 3 = document closed

    The state is important in this method for determining how to buffer the output. If there is an open page, output is sent to the $_pages array. For any other state it is sent to the main buffer held in the $_buffer variable.

    This distinction is necessary because page content is handled as a separate object within PDF and hence will need extra work on it when it is finally written to the main buffer.

    As you will later see, the $_state variable is used elsewhere to similarly add logic according to the document state.

    It is recommended to add a newline ("\n") after each output, as it is required in some cases (for example, certain PDF instructions have to begin on a new line). Also, remember that PDF is case sensitive, so always follow the exact spelling of PDF syntax.
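
The routing by document state described above can be tried in isolation. The following is a minimal sketch, assuming a stripped-down stand-in class (BufferDemo, with property names chosen for this example only) rather than the full PDF class.

```php
<?php
// Stripped-down stand-in for the tutorial's class, for illustration only.
class BufferDemo
{
    public $state  = 1;        // 1 = document open, no page open.
    public $page   = 0;        // Current page number.
    public $pages  = array();  // Per-page content buffers.
    public $buffer = '';       // Main document buffer.

    public function out($s)
    {
        if ($this->state == 2) {
            $this->pages[$this->page] .= $s . "\n";  // Page content.
        } else {
            $this->buffer .= $s . "\n";              // Document level.
        }
    }
}

$demo = new BufferDemo();
$demo->out('%PDF-1.3');             // No page open: main buffer.

$demo->page = 1;
$demo->pages[1] = '';
$demo->state = 2;                   // A page is now open.
$demo->out('BT /F1 12.00 Tf ET');   // Goes to the page buffer.

var_dump($demo->buffer);
var_dump($demo->pages[1]);
```

Each call ends its output with a newline, so the header and the page instruction end up in different buffers, each on a line of its own.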

    Starting the Document
    The following two lines of code are required for initializing the document. These two lines must be called before any output:
    [code]
    function open()
    {   
        $this->_state = 1;          // Set state to opened, no page yet.
        $this->_out('%PDF-1.3');    // Output the PDF header.
    }
    [/code]
    The second line writes the initial header that is required to identify the file and the PDF version being followed. The version number helps PDF readers handle the file properly.

    This tutorial will not be covering anything exotic, so you might as well stick with version 1.3. If you do start incorporating the more advanced PDF features found in 1.5 you will need to change the version number.

    {mospagebreak title=Adding a Page}
    We can now add a page to our document. The following code is quite straightforward.

    One point worth noting is the $_font_family check. For any text to be written to a page we need to set the font. However, we have to take into account the possibility that the font was set before any page was added, or that the font was set for a previous page in the current document. Either way we need to check the font class variable, and output the font information to the page. The function setFont() is used for this, which we shall cover later.
    [code]
    function addPage()
    {   
        $this->_page++;                   // Increment page count.
        $this->_pages[$this->_page] = ''; // Start the page buffer.
        $this->_state = 2;                // Set state to page
                                          // opened.
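        /* Check if font has been set before this page. */
        if ($this->_font_family) {
            $family = $this->_font_family;
            /* Clear the current family so that setFont() does not
             * return early and writes the font to the new page. */
            $this->_font_family = '';
            $this->setFont($family, $this->_font_style, $this->_font_size);
        }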
    }
    [/code]

    Output of Simple Text
    As mentioned earlier, before any text can be output, font information must be supplied. We therefore need a function to define which font will be used. PDF specifications offer a core set of fonts which can be used with no extra information supplied to the PDF reader. You can also embed your own custom fonts into a PDF file, but for this you need to create font definitions, which are beyond the scope of this tutorial.

    For now, limit your output to the following fonts:

    • Courier, Courier-Bold, Courier-Oblique, Courier-BoldOblique;
    • Helvetica, Helvetica-Bold, Helvetica-Oblique, Helvetica-BoldOblique;
    • Times-Roman, Times-Bold, Times-Italic, Times-BoldItalic;
    • Symbol;
    • ZapfDingbats.

    The following method sets the font family name, and also (optionally) a style such as bold, italic or both, and a font size.

    [code]
    function setFont($family, $style = '', $size = null)
    {
        $family = strtolower($family);
        if ($family == 'arial') {               // Use helvetica.
            $family = 'helvetica';
        } elseif ($family == 'symbol' ||        // No styles for
                  $family == 'zapfdingbats') {  // these two fonts.
            $style = '';
        }

        $style = strtoupper($style);
        if ($style == 'IB') {                   // Accept any order
            $style = 'BI';                      // of B and I.
        }

        if (is_null($size)) {                   // No size specified,
            $size = $this->_font_size;          // use current size.
        }

        if ($this->_font_family == $family &&   // If font is already
            $this->_font_style == $style &&     // current font
            $this->_font_size == $size) {       // simply return.
            return;
        }

        /* Set the font key. */
        $fontkey = $family . $style;

        if (!isset($this->_fonts[$fontkey])) {  // Test if cached.
            $i = count($this->_fonts) + 1;      // Increment font
            $this->_fonts[$fontkey] = array(    // object count and
                'i'    => $i,                   // store cache.
                'name' => $this->_core_fonts[$fontkey]);
        }

        /* Store current font information. */
        $this->_font_family  = $family;
        $this->_font_style   = $style;
        $this->_font_size    = $size;
        $this->_current_font = $this->_fonts[$fontkey];

        /* Output font information if at least one page has been
         * defined. */
        if ($this->_page > 0) {
            $this->_out(sprintf('BT /F%d %.2f Tf ET', $this->_current_font['i'], $this->_font_size));
        }
    }
    [/code]
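
To see how the family and style arguments are normalized into a key for the core fonts array, here is a standalone sketch of just that step. The function name core_font_key() is hypothetical, and the font map below is a shortened copy of $_core_fonts.

```php
<?php
// Hypothetical standalone version of the normalization done in setFont().
function core_font_key($family, $style = '')
{
    $family = strtolower($family);
    if ($family == 'arial') {
        $family = 'helvetica';              // Arial maps to Helvetica.
    } elseif ($family == 'symbol' || $family == 'zapfdingbats') {
        $style = '';                        // No styles for these fonts.
    }

    $style = strtoupper($style);
    if ($style == 'IB') {
        $style = 'BI';                      // Accept either order.
    }

    return $family . $style;
}

// Shortened copy of the core font map, enough for this demo.
$core_fonts = array('helvetica'   => 'Helvetica',
                    'helveticaB'  => 'Helvetica-Bold',
                    'helveticaI'  => 'Helvetica-Oblique',
                    'helveticaBI' => 'Helvetica-BoldOblique',
                    'symbol'      => 'Symbol');

$key = core_font_key('Arial', 'ib');
echo $key, ' => ', $core_fonts[$key], "\n";   // helveticaBI => Helvetica-BoldOblique
echo core_font_key('Symbol', 'B'), "\n";      // symbol
```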
    The following method enables easier changing between font sizes, without having to go through the whole setFont() function.
    [code]
    function setFontSize($size)
    {
        if ($this->_font_size == $size) {   // If already current
            return;                         // size simply return.
        }

        $this->_font_size = $size;          // Set the font.

        /* Output font information if at least one page has been
         * defined. */
        if ($this->_page > 0) {
            $this->_out(sprintf('BT /F%d %.2f Tf ET',
                                $this->_current_font['i'],
                                $this->_font_size));
        }
    }
    [/code]
    {mospagebreak title=And Now to Output the Text}
    You will need to pass to this method the x/y position of your text, as well as the actual text.
    [code]
    function text($x, $y, $text)
    {
        $text = $this->_escape($text);    // Escape any harmful
                                          // characters.

        $out = sprintf('BT %.2f %.2f Td (%s) Tj ET',
                       $x, $this->_h - $y, $text);
        $this->_out($out);
    }
    [/code]
    Note how for simplicity we allow the user to specify the y position measured from the top edge of the paper (whereas in fact PDF measures from the bottom). To achieve this we subtract the y value from the page height in the actual code ($this->_h - $y).
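
As a worked example of that subtraction, the following sketch builds the text instruction for an A4 portrait page by hand. The variable names are chosen for this example only.

```php
<?php
$page_height = 841.89;   // A4 portrait height in points.
$x = 100;                // Distance from the left edge.
$y = 100;                // Distance from the top edge, as the caller sees it.

// PDF measures y from the bottom, so flip it against the page height.
$operator = sprintf('BT %.2f %.2f Td (%s) Tj ET',
                    $x, $page_height - $y, 'First page');
echo $operator, "\n";    // BT 100.00 741.89 Td (First page) Tj ET
```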

    Also note how actual text needs to be escaped to ensure that it is safely inserted into the file. Since text in the PDF file is denoted using parentheses around it, any parentheses in the text itself should be escaped.

    The best solution is to create a separate function to handle any cases when text needs to be inserted safely. This will be used a couple of times in this tutorial, but it will also be useful if you add more functionality to this class later.
    [code]
    function _escape($s)
    {   
        $s = str_replace('\\', '\\\\', $s);   // Escape any '\'
        $s = str_replace('(', '\\(', $s);     // Escape any '('
        return str_replace(')', '\\)', $s);   // Escape any ')'
    }
    [/code]
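
A quick way to check the escaping is to run it on a string that contains all three special characters. This sketch uses a standalone copy of the logic, named pdf_escape() purely for illustration.

```php
<?php
// Standalone copy of the escaping logic, for illustration only.
function pdf_escape($s)
{
    $s = str_replace('\\', '\\\\', $s);   // Backslashes first.
    $s = str_replace('(', '\\(', $s);     // Then opening parentheses.
    return str_replace(')', '\\)', $s);   // Then closing parentheses.
}

// In a single-quoted PHP string, \t is a literal backslash followed by t.
echo pdf_escape('Total (net) in C:\temp'), "\n";
// Prints: Total \(net\) in C:\\temp
```

The order matters: backslashes are handled first, otherwise the backslashes added in front of the parentheses would themselves be doubled.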

    {mospagebreak title=Closing the Document}
    The closing function is a bit more involved: we need to clean up a bit, set some PDF tags, and create a few references. This is the code that does most of the work in setting up the buffered content to finally look like a PDF file.

    Begin by checking that there is at least one page, and setting the state to “page closed”.

    [code]
    function close()
    {
        if ($this->_page == 0) {    // If no page has been added yet,
            $this->addPage();       // add one to make this a valid
        }                           // PDF.

        $this->_state = 1;          // Set the state to page closed.
    [/code]
    Now output the couple of objects that we have been buffering separately: pages and other resources. We shall go through each method later.
       
    [code]
        /* Pages and resources. */
        $this->_putPages();
        $this->_putResources();
    [/code]

    Include some document information. PDF treats this information as a separate object, and here we introduce the _newobj() function. You could add other information to this section, such as author, subject, title, keywords, etc. For now we’ll just put in the producer.
       
    [code]
        /* Print some document info. */
        $this->_newobj();
        $this->_out('<<');
        $this->_out('/Producer (My First PDF Class)');
        $this->_out(sprintf('/CreationDate (D:%s)',
                            date('YmdHis')));
        $this->_out('>>');
        $this->_out('endobj');
    [/code]
    The next section is the PDF catalog, which defines how the document will initially look in the reader. You can take this as it is for now. There’s nothing exciting going on here, but it is needed.
       
    [code]
        /* Print catalog. */
        $this->_newobj();
        $this->_out('<<');
        $this->_out('/Type /Catalog');
        $this->_out('/Pages 1 0 R');
        $this->_out('/OpenAction [3 0 R /FitH null]');
        $this->_out('/PageLayout /OneColumn');
        $this->_out('>>');
        $this->_out('endobj');
    [/code]
    The cross reference section is very important. It brings into use the $_offsets array that has appeared before. PDF stores a byte offset reference to every object in the document. This allows the PDF reader to access objects at random, without having to load the entire document.
       
    [code]
        /* Print cross reference. */
        $start_xref = strlen($this->_buffer); // Get the xref offset.
        $this->_out('xref');                  // Announce the xref.
        $this->_out('0 ' . ($this->_n + 1));  // Number of objects.
        $this->_out('0000000000 65535 f ');
        /* Loop through all objects and output their offset. */
        for ($i = 1; $i <= $this->_n; $i++) {
            $this->_out(sprintf('%010d 00000 n ', $this->_offsets[$i]));
        }
    [/code]
    Each object is printed on a separate line. The offset for each object is printed as a 10-digit integer, followed by a generation number and an in-use/free indicator. You need not worry about either of the latter two, since they only matter when editing PDF files and deleting objects. Since we are generating from scratch, the generation number will always be 00000 and the in-use indicator will be set to ‘n’.
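
The fixed layout of a cross reference entry can be checked with a couple of lines. In this sketch the offset value is made up; the format string is the one used in the loop above, followed by the newline that _out() appends.

```php
<?php
$offset = 9;   // Hypothetical byte offset of an object.

// Same format as in close(), plus the newline added by _out().
$entry = sprintf('%010d 00000 n ', $offset) . "\n";

echo $entry;                 // 0000000009 00000 n
echo strlen($entry), "\n";   // 20
```

Every entry comes out at exactly 20 bytes, which is why the trailing space before the newline must not be trimmed.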

    {mospagebreak title=The Trailer}
    The final lines to be printed are the PDF trailer.
    [code]
        /* Print trailer. */
        $this->_out('trailer');
        $this->_out('<<');
        /* The total number of objects. */
        $this->_out('/Size ' . ($this->_n + 1));
        /* The root object. */
        $this->_out('/Root ' . $this->_n . ' 0 R');
        /* The document information object. */
        $this->_out('/Info ' . ($this->_n - 1) . ' 0 R');
        $this->_out('>>');
        $this->_out('startxref');
        $this->_out($start_xref);  // Where to find the xref.
        $this->_out('%%EOF');
        $this->_state = 3;         // Set the document state to
                                   // closed.
    }
    [/code]
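
The object numbers written in the trailer follow from the order in which close() creates objects. The following sketch works through that arithmetic for a given number of pages and fonts; the function object_layout() is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical helper: work out the object numbers that close() ends up
// with, given how many pages and fonts the document uses.
function object_layout($pages, $fonts)
{
    $n = 2;              // Objects 1 and 2 are reserved.
    $n += 2 * $pages;    // A page object plus a content stream per page.
    $n += $fonts;        // One object per font.
    $info = ++$n;        // Document information object.
    $root = ++$n;        // Catalog, the root object.

    // /Size also counts the free entry for object 0.
    return array('info' => $info, 'root' => $root, 'size' => $n + 1);
}

print_r(object_layout(2, 2));
```

For two pages and two fonts this gives an information object of 9, a root of 10, and a size of 11.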
    Now let’s look at the new functions we’ve met in this document closing method. The _newobj() function above is used simply to keep track of objects added to the document.
    [code]

    function _newobj()
    {
        /* Increment the object count. */
        $this->_n++;
        /* Save the byte offset of this object. */
        $this->_offsets[$this->_n] = strlen($this->_buffer);
        /* Output to buffer. */
        $this->_out($this->_n . ' 0 obj');
    }
    [/code]
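
The important detail in _newobj() is that the offset is recorded before the object header is written. Here is a minimal standalone sketch of that idea, using made-up object bodies and plain variables instead of the class.

```php
<?php
$buffer  = "%PDF-1.3\n";   // The header written by open().
$offsets = array();

// Hypothetical object bodies; only their byte lengths matter here.
$bodies = array(1 => '<</Type /Pages>>',
                2 => '<</ProcSet [/PDF /Text]>>');

foreach ($bodies as $n => $body) {
    $offsets[$n] = strlen($buffer);    // Record the offset first,
    $buffer .= $n . " 0 obj\n"         // then write the object.
             . $body . "\nendobj\n";
}

// Reading back from each offset lands on the start of that object.
foreach ($offsets as $n => $offset) {
    printf("object %d at byte %d: %s\n",
           $n, $offset, substr($buffer, $offset, 7));
}
```

A reader uses exactly this kind of lookup: it jumps to the recorded byte offset and expects to find the "N 0 obj" line there.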

    The _putPages() function handles the output of the page content. Here we go through the $_pages array that has been buffering the page content separately, and output it to the main buffer.

    {mospagebreak title=Compression}
    If compression is required, page content is passed through the gzcompress() function before being written to output. Here you can also see why the $_n object counter starts from 2. We reserve object number 1 for the root pages object (the parent of every page), and later you will see that we reserve object number 2 for resources. This simply makes it easier to reference them when required, for example in each page object.

    [code]
    function _putPages()
    {
        /* If compression is required set the compression tag. */
        $filter = ($this->_compress) ? '/Filter /FlateDecode ' : '';
        /* Print out pages, loop through each. */
        for ($n = 1; $n <= $this->_page; $n++) {
            $this->_newobj();                 // Start a new object.
            $this->_out('<</Type /Page');     // Object type.
            $this->_out('/Parent 1 0 R');
            $this->_out('/Resources 2 0 R');
            $this->_out('/Contents ' . ($this->_n + 1) . ' 0 R>>');
            $this->_out('endobj');

            /* If compression required gzcompress() the page content. */
            $p = ($this->_compress) ? gzcompress($this->_pages[$n]) : $this->_pages[$n];

            /* Output the page content. */
            $this->_newobj();                 // Start a new object.
            $this->_out('<<' . $filter . '/Length ' . strlen($p) . '>>');
            $this->_putStream($p);            // Output the page.
            $this->_out('endobj');
        }

        /* Set the offset of the first object. */
        $this->_offsets[1] = strlen($this->_buffer);
        $this->_out('1 0 obj');
        $this->_out('<</Type /Pages');
        $kids = '/Kids [';
        for ($i = 0; $i < $this->_page; $i++) {
            $kids .= (3 + 2 * $i) . ' 0 R ';
        }   
        $this->_out($kids . ']');
        $this->_out('/Count ' . $this->_page);
        /* Output the page size. */
        $this->_out(sprintf('/MediaBox [0 0 %.2f %.2f]',
                            $this->_w, $this->_h));
        $this->_out('>>');
        $this->_out('endobj');
    }
    [/code]
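
Because every page takes two objects (the page itself and its content stream) and numbering starts after the two reserved objects, the page objects fall on 3, 5, 7 and so on. The sketch below isolates the /Kids loop to show this; kids_entry() is a hypothetical name used only here.

```php
<?php
// Hypothetical helper: build the /Kids entry for a given page count,
// using the same arithmetic as _putPages().
function kids_entry($page_count)
{
    $kids = '/Kids [';
    for ($i = 0; $i < $page_count; $i++) {
        $kids .= (3 + 2 * $i) . ' 0 R ';
    }
    return $kids . ']';
}

echo kids_entry(3), "\n";   // /Kids [3 0 R 5 0 R 7 0 R ]
```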

    Let’s look at another method now: _putStream(). We could have included the code in the _putPages() function itself. However, since it is also required for other objects (such as images), we might as well separate it out now.
    [code]
    function _putStream($s)
    {
        $this->_out('stream');
        $this->_out($s);
        $this->_out('endstream');
    }
    [/code]

    {mospagebreak title=Resources}
    Whilst the content streams define the objects on a page, sometimes they need to reference objects outside the content stream. These are called resources. Resources are named objects such as font information or image data. The following method includes any resources defined so far into the main buffer. In this first part of the tutorial fonts are the only resources we will be dealing with.
    [code]
    function _putResources()
    {
        /* Output any fonts. */
        $this->_putFonts();

        /* Resources are always object number 2. */
        $this->_offsets[2] = strlen($this->_buffer);
        $this->_out('2 0 obj');
        $this->_out('<</ProcSet [/PDF /Text]');
        $this->_out('/Font <<');
        foreach ($this->_fonts as $font) {
            $this->_out('/F' . $font['i'] . ' ' . $font['n'] . ' 0 R');
        }
        $this->_out('>>');
        $this->_out('>>');
        $this->_out('endobj');
    }
    [/code]
    This last private function, called in the above _putResources() method, includes any font names into the PDF file. As we are only covering core fonts in this tutorial, nothing more than listing the font names is done here.
    [code]
    function _putFonts()
    {
        /* Print out font details. */
        foreach ($this->_fonts as $k => $font) {
            $this->_newobj();
            $this->_fonts[$k]['n'] = $this->_n;
            $name = $font['name'];
            $this->_out('<</Type /Font');
            $this->_out('/BaseFont /' . $name);
            $this->_out('/Subtype /Type1');
            if ($name != 'Symbol' && $name != 'ZapfDingbats') {
                $this->_out('/Encoding /WinAnsiEncoding');
            }
            $this->_out('>>');
            $this->_out('endobj');
        }
    }
    [/code]
    Document Output
    The following function, the actual output of the document, does nothing more than make sure the document is closed, send a few headers according to browser type, and echo the buffered data.
    [code]
    function output($filename)
    {
        if ($this->_state < 3) {    // If document not yet closed
            $this->close();         // close it now.
        }

        /* Make sure no content already sent. */
        if (headers_sent()) {
            die('Unable to send PDF file, some data has already been output to browser.');
        }

        /* Offer file for download and do some browser checks
         * for correct download. */
        $agent = isset($_SERVER['HTTP_USER_AGENT']) ? trim($_SERVER['HTTP_USER_AGENT']) : '';
        if ((preg_match('|MSIE ([0-9.]+)|', $agent, $version)) ||
            (preg_match('|Internet Explorer/([0-9.]+)|', $agent, $version))) {
            header('Content-Type: application/x-msdownload');
            header('Content-Length: ' . strlen($this->_buffer));
            /* $version[1] holds the captured version number. */
            if ($version[1] == '5.5') {
                header('Content-Disposition: filename="' . $filename . '"');
            } else {
                header('Content-Disposition: attachment; filename="' . $filename . '"');
            }
        } else {
            header('Content-Type: application/pdf');
            header('Content-Length: ' . strlen($this->_buffer));
            header('Content-Disposition: attachment; filename="' . $filename . '"');
        }
        echo $this->_buffer;
    }
    [/code]
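
When checking the browser version, keep in mind that preg_match() fills its third argument with an array, not a string. The following sketch, using a sample user agent string chosen for illustration, shows which element holds the version number.

```php
<?php
// Sample user agent string, chosen for illustration.
$agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)';

if (preg_match('|MSIE ([0-9.]+)|', $agent, $version)) {
    // $version[0] is the whole match, $version[1] the captured group.
    echo $version[0], "\n";   // MSIE 5.5
    echo $version[1], "\n";   // 5.5
}
```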

    The Script
    The complete class
    You can download the entire class for use with Part 1 of this tutorial.
    Example Use
    [code]
    <?php
    require 'PDF.php';                         // Require the lib.
    $pdf = &PDF::factory('p', 'a4');       // Set up the pdf object.
    $pdf->open();                             // Start the document.
    $pdf->setCompression(true);         // Activate compression.
    $pdf->addPage();                        // Start a page.
    $pdf->setFont('Courier', '', 8);        // Set font to Courier 8 pt.
    $pdf->text(100, 100, 'First page');  // Text at x=100 and y=100.
    $pdf->setFontSize(20);                 // Set font size to 20 pt.
    $pdf->text(100, 200, 'HELLO WORLD!'); // Text at x=100 and y=200.
    $pdf->addPage();                         // Add a new page.
    $pdf->setFont('Arial', 'BI', 12);        // Set font to arial bold italic 12 pt.
    $pdf->text(100, 100, 'Second page');  // Text at x=100 and y=100.
    $pdf->output('foo.pdf');              // Output the file named foo.pdf
    ?>
    [/code]


     
