Ever tried to read a DTD and failed miserably? Ever wondered what all those symbols and weird language constructs meant? Well, fear not - this crash course will get you up to speed with the basics of DTD design in a hurry.
A number of special symbols can be added to an element declaration to define how often its child elements (or groups of child elements) may occur, and which alternatives are allowed. Here's a quick list:
symbol    description
---------------------------------------------------------
+         one or more occurrence(s)
*         zero or more occurrence(s)
?         zero or one occurrence(s)
|         choice
If you're familiar with regular expressions, you'll feel right at home with these
symbols - they're almost identical to the symbols used to build regular expression patterns.
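To make the regular-expression comparison concrete, here is a minimal Python sketch that translates a simple content model into a regex over the names of an element's children. The helpers content_model_to_regex() and is_valid() are hypothetical, written only for illustration; they assume the children have already been reduced to a list of element names, and they ignore #PCDATA, ANY and EMPTY. This is not a DTD validator.

```python
import re


def content_model_to_regex(model: str) -> re.Pattern:
    """Hypothetical helper: translate a simple DTD content model such as
    "(city, high*, low*, forecast)+" into a regex that matches a string of
    child element names, each followed by a single space.

    Sketch only: handles element names, ',', '|', parentheses and the
    '+', '*', '?' indicators; #PCDATA, ANY and EMPTY are not supported.
    """
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[(),|+*?]", model):
        if token == "(":
            parts.append("(?:")          # DTD group -> non-capturing regex group
        elif token == ",":
            continue                     # sequence -> plain concatenation
        elif token in (")", "|", "+", "*", "?"):
            parts.append(token)          # same meaning in both notations
        else:
            # Wrap each name so a following indicator applies to the whole name.
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def is_valid(model: str, children: list[str]) -> bool:
    """Hypothetical helper: True if the child names satisfy the content model."""
    text = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(text) is not None


print(is_valid("(high+)", []))                # False: + needs at least one
print(is_valid("(high*)", []))                # True:  * allows none
print(is_valid("(high?)", ["high", "high"]))  # False: ? allows at most one
print(is_valid("(home | work)", ["work"]))    # True:  | picks one alternative
```

The comma needs no regex counterpart at all: a DTD sequence is simply concatenation, which is why the translation can drop it.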
Let's take this for a quick spin. Consider the following DTD for a revised version of the weather document:
<!ELEMENT weather (city, high*, low*, forecast)+>
<!ELEMENT city (#PCDATA)>
<!ELEMENT forecast (#PCDATA)>
<!ELEMENT high (#PCDATA)>
<!ELEMENT low (#PCDATA)>
How did I come up with this? It's simple - you just have to take it step by step.
The first thing to do is allow for more than one "city" block within the "weather" element, by grouping the child elements and adding a + to the group:
<!ELEMENT weather (city, high, low, forecast)+>
Next, the "high" and "low" elements must be made optional, using * (which also allows each of them to repeat; ? would permit at most one):
<!ELEMENT weather (city, high*, low*, forecast)+>
And Bob's your uncle!
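The effect of each step can be checked with hand-translated regular expressions. This is a sketch, assuming the child element names are joined into one string, each followed by a space; the names STEP_1, FINAL and check() are illustrative only.

```python
import re

# Hand-translated regex equivalents of the two "weather" content models.
STEP_1 = re.compile(r"(?:city high low forecast )+")           # (city, high, low, forecast)+
FINAL = re.compile(r"(?:city (?:high )*(?:low )*forecast )+")  # (city, high*, low*, forecast)+


def check(pattern: re.Pattern, children: list[str]) -> bool:
    """Illustrative helper: does this list of child names match the pattern?"""
    return pattern.fullmatch("".join(name + " " for name in children)) is not None


two_cities = ["city", "high", "low", "forecast", "city", "high", "low", "forecast"]
no_low = ["city", "high", "forecast"]

print(check(STEP_1, two_cities))  # True:  + on the group allows repeated city blocks
print(check(STEP_1, no_low))      # False: every element in the group is mandatory
print(check(FINAL, no_low))       # True:  low* allows zero "low" elements
print(check(FINAL, []))           # False: + still requires at least one block
```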
The | operator sets up a list of alternatives, and comes in handy when an element must contain exactly one of a finite set of alternatives. Consider the following DTD for an address book, and look specifically at the declarations for the "tel", "fax" and "email" elements, each of which must contain either a "home" or a "work" child element.
<!ELEMENT addressbook (record+)>
<!ELEMENT record (name, street?, city?, zip?, country?, tel?, fax?, email?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!ELEMENT tel (home | work)>
<!ELEMENT fax (home | work)>
<!ELEMENT email (home | work)>
<!ELEMENT home (#PCDATA)>
<!ELEMENT work (#PCDATA)>
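Here is a rough Python sketch of what these declarations allow. It uses the standard library's xml.etree.ElementTree to read a hypothetical address book, with regular expressions standing in for the "record" and (home | work) content models. The sample data and the names RECORD, HOME_OR_WORK, child_names() and record_is_valid() are made up for illustration; a real application would use a validating parser instead.

```python
import re
import xml.etree.ElementTree as ET

# Regex equivalents of two declarations, applied to a string of child names
# in which every name is followed by one space.
RECORD = re.compile(r"name (?:street )?(?:city )?(?:zip )?(?:country )?(?:tel )?(?:fax )?(?:email )?")
HOME_OR_WORK = re.compile(r"(?:home |work )")  # (home | work): exactly one of the two


def child_names(elem: ET.Element) -> str:
    return "".join(child.tag + " " for child in elem)


def record_is_valid(record: ET.Element) -> bool:
    if RECORD.fullmatch(child_names(record)) is None:
        return False
    # tel, fax and email must each contain exactly one <home> or <work>.
    return all(
        HOME_OR_WORK.fullmatch(child_names(contact)) is not None
        for tag in ("tel", "fax", "email")
        for contact in record.findall(tag)
    )


# Hypothetical sample data, for illustration only.
doc = ET.fromstring("""<addressbook>
  <record>
    <name>Jane Doe</name>
    <city>Springfield</city>
    <tel><work>555-0100</work></tel>
    <email><home>jane@example.com</home></email>
  </record>
  <record>
    <name>John Doe</name>
    <tel><home>555-0199</home><work>555-0123</work></tel>
  </record>
</addressbook>""")

for record in doc.findall("record"):
    print(record.findtext("name"), "valid" if record_is_valid(record) else "invalid")
```

The second record fails because its "tel" element holds both "home" and "work", while (home | work) allows only one of them.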
The | operator also comes in handy when declaring elements with so-called "mixed" content, which can contain character data, child elements, or both, freely intermixed. Here's an example:
<?xml version="1.0"?>
<surrealism>
The elongated <color>blue</color>
<animal>fox</animal> jumped over the
<color>green</color> <vegetable>pumpkin</vegetable>
and morphed into
<personality>Richard VIII</personality>
</surrealism>
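Pay close attention to the "surrealism" element, which can contain any mix of character data and the listed elements, in any order and any number of times:
<!ELEMENT surrealism (#PCDATA | color | animal | vegetable | personality)*>
In a mixed-content declaration, #PCDATA must come first in the list, and when other elements are listed the group must end with *.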
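As a rough illustration, a mixed-content model such as (#PCDATA | color | animal | vegetable | personality)* only restricts which child elements may appear; text is allowed anywhere, and order and count are unconstrained. The Python sketch below checks just that, using the standard library's xml.etree.ElementTree, which is a non-validating parser and does not check documents against a DTD. ALLOWED and mixed_content_ok() are illustrative names, not part of any library.

```python
import xml.etree.ElementTree as ET

# Element names permitted by the mixed-content model for "surrealism".
ALLOWED = {"color", "animal", "vegetable", "personality"}


def mixed_content_ok(elem: ET.Element, allowed: set[str]) -> bool:
    """Illustrative check: every child element's name must be in `allowed`.
    Text between the children is not inspected, since mixed content permits it."""
    return all(child.tag in allowed for child in elem)


doc = ET.fromstring("""<?xml version="1.0"?>
<surrealism>
The elongated <color>blue</color>
<animal>fox</animal> jumped over the
<color>green</color> <vegetable>pumpkin</vegetable>
and morphed into
<personality>Richard VIII</personality>
</surrealism>""")

print(mixed_content_ok(doc, ALLOWED))  # True
print([child.tag for child in doc])    # ['color', 'animal', 'color', 'vegetable', 'personality']
```

Note that "color" appears twice and the children come in no fixed order; both are fine under the * that closes a mixed-content group.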
All these symbols can also be combined to create weird and wonderful rules for the document to follow. An example awaits you at the end of the article, but first, attributes.