So far, we have discussed the XML Signature element and its schema at a high level and reviewed the processing steps necessary to create and verify a Signature. In the process, we have glossed over a lot of the specifics; now let's discuss each element in more detail.The SignedInfo Element
As we mentioned, the SignedInfo element contains all the information about what is digitally signed. As shown in Listing 4.12, the SignedInfo element contains three children elements—CanonicalizationMethod, SignatureMethod, and zero or more Reference elements.
Listing 4.12 XML Shorthand Schema for the <SignedInfo> Element
We describe each of these three elements in turn, starting with CanonicalizationMethod.The CanonicalizationMethod Element and Canonicalization
Canonicalization is quite a mouthful and is often abbreviated as C14N (14 letters bracketed by the C and the N). The concept behind canonicalization is straightforward, but you wouldn't know this to look at the amount of discussion and work it has engendered. The W3C has published multiple specifications on canonicalization and the subtle issues surrounding it. It continues to be a well-discussed subject (a recent google search on canonicalization located 35,500 hits). Let's look at this topic at its simplest level first and then determine where the more subtle issues arise.
As we discussed in Chapter 3, if even a single bit changes in a document that is being signed, the digest (hash) will not be the same, and Signature/Reference validation will fail. With XML, in particular, certain differences may exist between an XML document or XML fragment that has nothing to do with the underlying meaning of the XML but is introduced simply because of the operating system or a difference in the way an XML parser resolves the XML. For example, a file created on a Windows operating system or one created on a Unix operating system typically ends a line of text with different ASCII character sequences. Another example is that the whitespace outside the tags of an XML element might be handled slightly differently among different XML parsers. From an XML language definition standpoint, these whitespace differences mean nothing and are ignored.
As we mentioned, however, any difference between the resulting XML at signature time and the resulting XML that is resolved at validation time will result in the Signature validation failing. Canonicalization is the strategy used to deal with this issue.
Canonicalization normalizes the XML so that, regardless of inconsequential physical differences in the XML, two logically equivalent XML documents will become physically, bit-to-bit equivalent. This is a critical requirement for digital signatures to work.
The following simple scenario illustrates the issue. Say that Bob is creating an XML Signature over the XML structure in Listing 4.13.
Listing 4.13 An XML Structure to be Signed
Now, say that Bob created this XML document on a Windows platform. Hidden within Bob's hypothetical XML document is whitespace in the form of spaces, tabs, and end-of-line characters. His document actually looks like Listing 4.14 (hidden characters are bold and delimited by + in this example).
Listing 4.14 The GroceryList XML Structure with Embedded Whitespace from Windows
+t+<GroceryList>+c-r++l-f+ +t++t+<GroceryStore>Safeway</GroceryStore>+sp++c-r++l-f+ +t++t+<Item Category="produce">Lettuce</Item>+sp++c-r++l-f+ +t++t+<Item Category="produce">Tomato</Item>+c-r++l-f+ +t++t+<Item Category="meat">Bacon</Item>+c-r++l-f+ +t+</GroceryList>+c-r++l-f+
Now say this document is emailed to Alice, who is running on a Unix platform and needs to validate this document. One difference between the Windows and Unix environments is the extra carriage return put in by Windows. As a convenience to Alice, when she opens her emails, the reader software converts the carriage returns and linefeeds to single linefeeds. So, now the XML document looks like Listing 4.15.
Listing 4.15 The GroceryList XML Structure with Embedded Whitespace from Unix
Are these two XML documents different? Not semantically. But, from a classic digital signature perspective, any change to the underlying bits would break the signature. If Bob had digitally signed this XML using a traditional digital signing approach, the digital signature validation would fail verification (although the email reader could get around this problem by validating the signature before doing any convenience manipulation of the document). It is clear, however, that logically these two XML documents are identical, and they should compare equally. This is where canonicalization comes into play. There needs to be a common way to format XML regardless of platform so that, no matter what underlying physical changes have occurred, two semantically identical XML documents will be considered physically identical. This is not limited to physical changes, but also to certain syntactic changes where XML gives you options that do not change the underlying logical equivalence. We come back to this topic soon because this is the place where some of the difficulties of canonicalization lie.
Let's return to our simple Bob and Alice scenario with the concept of canonicalization. This time, before signing the XML document, Bob (or more accurately, the tool that he is using to create the XML Signature) canonicalizes the XML. Here are some simple rules for canonicalization for this example (note that these are not the canonicalization rules for XML Signature, just some simple rules for this example):
Applying these rules to the preceding XML document creates the structure in Listing 4.16.
Listing 4.16 The GroceryList XML Structure After Canonicalization
These examples might not be pretty to look at, but canonicalization is not for human consumption; it is to feed directly into the Digest algorithm. This canonicalized output is not stored anywhere. It is not in the Signature element, and the XML information being referenced is not changed.
The purpose of the CanonicalizationMethod element in the SignedInfo block is to provide the name of the canonicalization algorithm that Bob employed when digitally signing the XML. Alice's XML Signature processing engine will read the CanonicalizationMethod and apply the same canonicalization algorithm on the XML that might have been subtly changed by her operating system. Consequently, due to the canonicalization, the same XML will result, and the signature will be verified correctly.
The CanonicalizationMethod algorithm attribute is applied to the SignedInfo element. It is also a type of Transform that can be used as a Transform element with References that we will talk more about later.
The initial XML specification contains only one required canonicalization algorithm (http://www.w3.org/TR/2001/REC-xml-c14n-20010315). This version ignores comments, so, for example, you could put XML comments into a SignedInfo section and they would be ignored in the signature. There is also a recommended canonicalization algorithm that includes comments (http://www.w3.org/TR/2001/REC-xml-c14n-20010315#WithComments).
blog comments powered by Disqus