Practical XML Data Design and Manipulation for Voting Systems
Comparing XML Ballots
EVM2003 brings XML to the democratic process: In this article, David Mertz discusses his practical experiences developing interrelated XML data formats for the EVM2003 Free Software project to develop voting machines that produce voter-verifiable paper ballots. Some design principles of format subsetting emerge. In addition, David looks at how an application-specific meaning for XML document equivalence can be programmed, and why canonicalization is insufficient. (This intermediate-level article was first published by IBM developerWorks, 28 Jun 2004, at http://www.ibm.com/developerWorks.)
I have mentioned that the programming for EVM2003 was done in Python; in addition, the XML access is performed using my Gnosis Utilities library, specifically gnosis.xml.objectify. Using this library makes operations on ballot or EBI files particularly painless. For example, information on contests and candidates is loaded into some Python data structures with the following code:
Listing 3. ballot-election.xml to Python conversion
    from gnosis.xml.objectify import make_instance
    ballot = make_instance(xml_data)
    contnames, cont = [], {}
    for contest in ballot.contest:
        name = contest.name
        contnames.append(name)
        if contest.coupled=="No":
            cont[name] = [select.PCDATA for select in contest.selection]
            if contest.allow_writein=="Yes":
                cont[name].append("")
        else:
            cont[name] = []
            for n in range(0, len(contest.selection), 2):
                cont[name].append([s.PCDATA for s in contest.selection[n:n+2]])
            if contest.allow_writein=="Yes":
                cont[name].append(["",""])
The function make_instance() generally reduces the XML-ness of a data format to a single line of code; after that, it's just Python.
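For readers who do not have Gnosis Utilities installed, the same loading logic can be sketched with the standard library's xml.etree.ElementTree. This is a substitute technique, not the EVM2003 code: it assumes that name, coupled, and allow_writein are XML attributes of each contest element, and the tiny sample ballot (SAMPLE_BALLOT), the candidate names, and the helper load_contests() are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature ballot. Element and attribute names follow
# Listing 3; the document itself is invented for illustration.
SAMPLE_BALLOT = """
<ballot>
  <contest name="Mayor" coupled="No" allow_writein="Yes">
    <selection>Alice Adams</selection>
    <selection>Bob Brown</selection>
  </contest>
  <contest name="President/Vice President" coupled="Yes" allow_writein="No">
    <selection>Carol Clark</selection>
    <selection>Dan Davis</selection>
    <selection>Eve Evans</selection>
    <selection>Frank Ford</selection>
  </contest>
</ballot>
"""

def load_contests(xml_text):
    """Return (contest names in document order, {name: choices})."""
    root = ET.fromstring(xml_text.strip())
    contnames, cont = [], {}
    for contest in root.findall("contest"):
        name = contest.get("name")
        contnames.append(name)
        choices = [s.text for s in contest.findall("selection")]
        if contest.get("coupled") == "No":
            # Uncoupled contest: a flat list of candidate names.
            cont[name] = choices
            if contest.get("allow_writein") == "Yes":
                cont[name].append("")
        else:
            # Coupled contest: consecutive selections form a ticket pair.
            cont[name] = [choices[n:n + 2] for n in range(0, len(choices), 2)]
            if contest.get("allow_writein") == "Yes":
                cont[name].append(["", ""])
    return contnames, cont

contnames, cont = load_contests(SAMPLE_BALLOT)
```

The resulting structures have the same shape as in Listing 3: an empty string (or an empty pair) marks the write-in slot, and coupled contests hold two-element lists.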
Several related issues come up when comparing EBIs with each other or with REBIs. As I mentioned, REBIs are generally not byte-wise identical to their corresponding EBIs, because write-in names are not recorded in full on barcodes. More generally, the OVC intends to set standards for data formats, not simply to produce them with specific code. Third-party code should be able to produce and process EBIs, for example to confirm that tabulation has been performed accurately.
The document equality question applies to many classes of XML documents: When are two documents identical according to application requirements? Conforming to the same DTD or schema is a minimum necessary condition, and XML canonicalization can remove many trivial syntactic variants. But as a rule, meaningful identity cannot be expressed by schemas alone. For example, deciding when the order of child elements is meaningful and when it is incidental is strictly an application-level issue.
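A small experiment illustrates the limits of canonicalization. The sketch below uses xml.etree.ElementTree.canonicalize() (C14N 2.0, available in Python 3.8 and later) rather than any EVM2003 code, and the three contest fragments are made up for the purpose. Canonical form normalizes attribute order, but it preserves the order of child elements, so it cannot know that the selections of a non-ordered contest may be shuffled freely.

```python
from xml.etree.ElementTree import canonicalize

# Three hypothetical fragments describing the same non-ordered contest.
a = ('<contest name="Council" ordered="No">'
     '<selection>Ann</selection><selection>Bob</selection></contest>')
# Same as `a`, but with the attributes in a different order.
b = ('<contest ordered="No" name="Council">'
     '<selection>Ann</selection><selection>Bob</selection></contest>')
# Same as `a`, but with the child elements in a different order.
c = ('<contest name="Council" ordered="No">'
     '<selection>Bob</selection><selection>Ann</selection></contest>')

# Canonicalization makes the attribute-order variant compare equal...
attr_order_equal = canonicalize(a) == canonicalize(b)

# ...but the child-order variant still differs, even though the
# application treats a non-ordered contest as the same vote.
child_order_equal = canonicalize(a) == canonicalize(c)
```

Here attr_order_equal is true and child_order_equal is false: the second difference can only be resolved by code that knows what ordered="No" means to the application.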
The Gnosis Utilities library provides (in my opinion) a rather elegant way to customize the meaning of equality. You may define a custom class with equality and inequality tests to hold all XML documents with the root element <cast_ballot>. The module evm2003.utils.equiv injects an application-specific equality test into EBI Python objects, and may also be used as a command-line tool to compare EBIs/REBIs. Here it is, including the detailed docstring:
Listing 4. evm2003.utils.equiv.py module
"""Compare ballot XML files for equivalence . This file may be imported as a module or used as a command-line ballot comparison tool. If imported, e.g.: . >>> import evm2003.utils.equiv >>> from gnosis.xml.objectify import make_instance >>> a = make_instance('scanned.xml') >>> b = make_instance('stored.xml') >>> a == b 1 . At the command-line: . % python equiv.py scanned.xml stored.xml . (lack of any output means success, in that ultra-terse UNIX-philosophy way). . We implement custom .__eq__() and .__ne__() methods specific to cast ballots. Injecting such methods is the recommended technique for enhancing gnosis.xml.objectify objects. . The files scanned.xml and stored.xml documents were used to test this. They differ in several non-significant respects: (1) the top-level attributes occur in a different order; (2) non-ordered multi-select contests have selections in a different order; (3) Write-in votes have different PCDATA content (for example, nothing for scanned.xml). """ import gnosis.xml.objectify import sys class cast_ballot(gnosis.xml.objectify._XO_): def __eq__(self, other): metadata = '''election_date country state county number precinct serial'''.split() for attr in metadata: if getattr(self, attr) != getattr(other, attr): return 0 by_name = lambda a, b: cmp(a.name, b.name) self.contest.sort(by_name) other.contest.sort(by_name) for my, your in zip(self.contest, other.contest): if my.name != your.name or \ my.ordered != your.ordered or \ my.coupled != your.coupled: return 0 if my.ordered == "No": # Compare non-writeins (but don't know if same num writeins) my_select = dict([(x.PCDATA,None) for x in my.selection if x.writein=="No"]) your_select = dict([(x.PCDATA,None) for x in your.selection if x.writein=="No"]) if my_select != your_select: return 0 continue for my_select, your_select in zip(my.selection, your.selection): if (my_select.writein, your_select.writein) == ("Yes","Yes"): pass elif my_select.PCDATA != your_select.PCDATA: return 0 return 1 def 
__ne__(self, other): return not self == other #-- Namespace injection gnosis.xml.objectify._XO_cast_ballot = cast_ballot #-- Command-line operation if __name__=='__main__': a, b = map(gnosis.xml.objectify.make_instance, sys.argv[1:3]) if a != b: print sys.argv[1], "and", sys.argv[2], "are NOT equivalent ballots!"
I see no need to explain the principles of EBI equivalence in more detail than the docstring gives. The sample code suffices as an illustration of similar considerations that arise in many XML processing applications.
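For third parties working without Gnosis Utilities, the same equivalence rules can be approximated with the standard library. The following is a minimal sketch, not the OVC's code: it assumes that the metadata fields are attributes of the root element, that name, ordered, and coupled are attributes of each contest, and that writein is an attribute of each selection. The function ballots_equivalent(), its helper, and the sample documents are hypothetical. Unlike Listing 4, it also checks that both ballots contain the same number of contests and, for ordered contests, the same number of selections.

```python
import xml.etree.ElementTree as ET

METADATA = ["election_date", "country", "state", "county",
            "number", "precinct", "serial"]

def _named_choices(selections):
    """Set of non-write-in candidate names (order ignored)."""
    return {s.text for s in selections if s.get("writein") == "No"}

def ballots_equivalent(xml_a, xml_b):
    a, b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    # Top-level metadata must match; attribute order is irrelevant.
    if any(a.get(k) != b.get(k) for k in METADATA):
        return False
    # The order of contests within a ballot is not significant.
    by_name = lambda c: c.get("name", "")
    contests_a = sorted(a.findall("contest"), key=by_name)
    contests_b = sorted(b.findall("contest"), key=by_name)
    if len(contests_a) != len(contests_b):  # extra check, not in Listing 4
        return False
    for mine, yours in zip(contests_a, contests_b):
        if any(mine.get(k) != yours.get(k)
               for k in ("name", "ordered", "coupled")):
            return False
        sel_a = mine.findall("selection")
        sel_b = yours.findall("selection")
        if mine.get("ordered") == "No":
            # Unordered contest: compare non-write-ins as sets.
            if _named_choices(sel_a) != _named_choices(sel_b):
                return False
            continue
        if len(sel_a) != len(sel_b):  # extra check, not in Listing 4
            return False
        for x, y in zip(sel_a, sel_b):
            # Two write-ins match regardless of their text content.
            if x.get("writein") == "Yes" and y.get("writein") == "Yes":
                continue
            if x.text != y.text:
                return False
    return True

# Hypothetical documents that differ only in non-significant ways.
SCANNED = """<cast_ballot precinct="2216" serial="213">
  <contest name="Council" ordered="No" coupled="No">
    <selection writein="No">Ann</selection>
    <selection writein="No">Bob</selection>
    <selection writein="Yes"></selection>
  </contest>
</cast_ballot>"""
STORED = """<cast_ballot serial="213" precinct="2216">
  <contest coupled="No" ordered="No" name="Council">
    <selection writein="No">Bob</selection>
    <selection writein="Yes">Zed Zimmer</selection>
    <selection writein="No">Ann</selection>
  </contest>
</cast_ballot>"""
ALTERED = STORED.replace("Bob", "Cy")  # a genuinely different vote

same = ballots_equivalent(SCANNED, STORED)
different = ballots_equivalent(SCANNED, ALTERED)
```

With these samples, same is true despite the reordered attributes, reordered selections, and differing write-in text, while different is false because a named candidate changed.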