HomeXML Page 2 - XML Matters: Practical XML Data Design and Manipulation for Voting Systems
EVM2003 and XML - XML
EVM2003 brings XML to the democratic process. In this installment, David discusses his practical experiences developing interrelated XML data formats for the EVM2003 Free Software project to develop voting machines that produce voter-verifiable paper ballots. Some design principles of format subsetting emerge. In addition, David looks at how an application-specific meaning for XML document equivalence can be programmed, and why canonicalization is insufficient. (This intermediate-level article was first published by IBM developerWorks, June 28, 2004, at http://www.ibm.com/developerWorks.)
Here's how the EVM2003 system works. EVM2003 uses or produces three closely related types of XML files:
electronic ballot image (EBI) file
reconstructed ballot image (REBI) file
At the initial stage, an election at a given place/precinct and date -- and perhaps for a specific party ballot -- is data driven by a ballot file. In production, this file would be uniquely named to indicate place, date, party (if applicable), and natural language (where multilingual ballots are provided). For example, a production ballot XML data file might have a name like election-20041102-US-MA-Franklin-2390-Dem-EN.xml or election-20041102-US-MA-Hampshire-3451-Rep-ES.xml. This exact naming convention is merely my suggestion, but it indicates the type of information that I would expect to find (all of the identifying information is also contained in the content of the data files, so the XML might be stored in, for example, an RDBMS rather than on a file system).
After a voter has voted, an electronic ballot image (EBI) is stored as an XML file on the voting station; the collection of stored EBIs is written to removable media, such as a CD-R, in randomized order to preserve voter anonymity. The information contained in an EBI should exactly match the information represented on an official printed ballot, though a ballot would contain various formatting such as font choices, placement, rules, and boxes to make verification visually easy for voters.
During the reconciliation process, poll workers scan the paper ballots to produce reconstructed ballot images (REBIs). These REBIs might not be byte-wise identical to their corresponding EBIs, even if no error occurs. For example, in the demo system (and perhaps for production), we decided to represent votes as barcodes to make scanning simpler; however, our barcode only records the fact of a write-in vote, not the write-in name itself. Jurisdictions differ greatly as to when and whether write-in names will be counted, so this is a fine way to establish boundary conditions for requiring visual examination of write-in names on ballots.
EBIs and REBIs have a special, and rather elegant, relation to ballot files. An EBI or REBI is almost a proper subset of a ballot file -- that is, you create an EBI simply by removing non-selected choices from a ballot file and renaming the root element from <ballot> to <cast_ballot>. One effect of this relation is that a <cast_ballot> XML file is a simpler version of a ballot XML file. The subset relation I state is violated in a couple minor ways within the current CVS snapshot of EVM2003, but it could easily be enforced with minimal modification. For example, in ballot-election.xml, <contest> elements have an allow_writein element that is superfluous to, and not included in, EBIs or REBIs. And symmetrically the root element <cast_ballot> contains an attribute source to distinguish EBIs from REBIs (for example, voting_machine versus ballot_scan).
If the proper subset relation is enforced in production systems, it provides the somewhat nice property that ballot files and EBIs conform to an identical DTD/schema, providing a basic consistency guarantee. Even if we decide that precise subsetting is unnecessary, the close similarity of EBIs and ballot XML files makes debugging and development easier on developers (and on our frail memories).
Visit developerWorks for thousands of developer articles, tutorials, and resources related to open standard technologies, IBM products, and more. See developerWorks.