Imagine a technology that offered you all the power of a DTD without the associated unpleasantness of those strange symbols and arcane commands. Sounds too good to be true? Say hello to XML Schema.
Now that you've got the basics down, let's look at a couple of composite examples just to put everything in perspective. Consider the following two document instances, and then see if you can write appropriate schema definitions for each:
<?xml version="1.0"?>
<!-- id is a required attribute -->
<recipe id="3450">
  <name>Chicken Tikka</name>
  <author>Mr. Cluck</author>
  <date>1999-06-08</date>
  <ingredients>
    <!-- quantity is a required attribute, units is optional -->
    <item quantity="2">Boneless chicken breasts</item>
    <item quantity="2">Chopped onions</item>
    <item quantity="1" units="tsp">Ginger</item>
    <item quantity="1" units="tsp">Garlic</item>
    <item quantity="1" units="tsp">Red chili powder</item>
    <item quantity="1" units="tsp">Coriander seeds</item>
    <item quantity="2" units="tbsp">Lime juice</item>
    <item quantity="2" units="tbsp">Butter</item>
  </ingredients>
  <process>
    <step>Cut chicken into cubes, wash and apply lime juice and salt</step>
    <step>Add ginger, garlic, chili, coriander and lime juice in a separate bowl</step>
    <step>Mix well, and add chicken to marinate for 3-4 hours</step>
    <step>Place chicken pieces on skewers and barbeque</step>
    <step>Remove, apply butter, and barbeque again until meat is tender</step>
    <step>Garnish with lemon and chopped onions</step>
  </process>
</recipe>
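If you'd like something to compare your answer for this first document against, here is one possible schema. Treat it as a sketch rather than the definitive answer: it nests every definition inside the recipe element, and the datatypes picked for the id and quantity attributes (xsd:positiveInteger and xsd:decimal) are assumptions based only on the sample values above.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="recipe">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="author" type="xsd:string"/>
        <xsd:element name="date" type="xsd:date"/>

        <!-- one or more items, each carrying text plus attributes -->
        <xsd:element name="ingredients">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="item" maxOccurs="unbounded">
                <xsd:complexType>
                  <xsd:simpleContent>
                    <xsd:extension base="xsd:string">
                      <!-- assumption: quantities may be fractional -->
                      <xsd:attribute name="quantity" type="xsd:decimal" use="required"/>
                      <xsd:attribute name="units" type="xsd:string" use="optional"/>
                    </xsd:extension>
                  </xsd:simpleContent>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>

        <!-- one or more steps, in order -->
        <xsd:element name="process">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="step" type="xsd:string" maxOccurs="unbounded"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>

      <!-- assumption: recipe ids are positive whole numbers -->
      <xsd:attribute name="id" type="xsd:positiveInteger" use="required"/>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

Attributes are declared after the content model of the type they belong to, which is why the id attribute appears at the bottom of the recipe definition rather than the top.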
Here's the second one:
<?xml version="1.0"?>
<weather>
  <!-- id is a required attribute -->
  <city id="52320">
    <name>Boston</name>
    <temperature>
      <!-- units is a required attribute, restricted to values "celsius" and "fahrenheit" -->
      <high units="celsius">23</high>
      <low units="celsius">5</low>
    </temperature>
    <!-- forecast may be any one of "rain", "sun", "snow" or "fog" -->
    <forecast>snow</forecast>
  </city>
  <city id="9010">
    <name>New York</name>
    <temperature>
      <high units="celsius">11</high>
      <low units="celsius">-5</low>
    </temperature>
    <forecast>snow</forecast>
  </city>
  <city id="8239">
    <name>London</name>
    <temperature>
      <high units="celsius">27</high>
      <low units="celsius">12</low>
    </temperature>
    <forecast>sun</forecast>
  </city>
</weather>
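One way to describe the weather document is sketched below. It is a reconstruction under stated assumptions, not necessarily the exact listing this article was written around: temperatures are assumed to be whole numbers (xsd:integer) and city ids positive integers, since the sample only shows values of that kind.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- simple element definitions -->
  <xsd:element name="name" type="xsd:string"/>
  <xsd:element name="forecast">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="rain"/>
        <xsd:enumeration value="sun"/>
        <xsd:enumeration value="snow"/>
        <xsd:enumeration value="fog"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>

  <!-- attribute definitions -->
  <xsd:attribute name="id" type="xsd:positiveInteger"/>
  <xsd:attribute name="units">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="celsius"/>
        <xsd:enumeration value="fahrenheit"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:attribute>

  <!-- high and low: a number plus a required units attribute -->
  <xsd:element name="high">
    <xsd:complexType>
      <xsd:simpleContent>
        <xsd:extension base="xsd:integer">
          <xsd:attribute ref="units" use="required"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="low">
    <xsd:complexType>
      <xsd:simpleContent>
        <xsd:extension base="xsd:integer">
          <xsd:attribute ref="units" use="required"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>

  <!-- composite elements, built from references to the definitions above -->
  <xsd:element name="temperature">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="high"/>
        <xsd:element ref="low"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="city">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="name"/>
        <xsd:element ref="temperature"/>
        <xsd:element ref="forecast"/>
      </xsd:sequence>
      <xsd:attribute ref="id" use="required"/>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="weather">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="city" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```
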
This version first defines various elements and then references those definitions to construct a schema. In case this doesn't work for you, you can derive and use named datatypes instead of references. Here's an alternative version of the schema above:
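The named-datatype approach applied to the weather document might look like the sketch below. The type names (unitsType, forecastType, readingType, temperatureType and cityType) are hypothetical, invented here for illustration, and the use of xsd:integer for temperatures and xsd:positiveInteger for city ids is again an assumption drawn from the sample values.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- derived simple types: strings restricted to a fixed list of values -->
  <xsd:simpleType name="unitsType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="celsius"/>
      <xsd:enumeration value="fahrenheit"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:simpleType name="forecastType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="rain"/>
      <xsd:enumeration value="sun"/>
      <xsd:enumeration value="snow"/>
      <xsd:enumeration value="fog"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- a temperature reading: a number plus a required units attribute -->
  <xsd:complexType name="readingType">
    <xsd:simpleContent>
      <xsd:extension base="xsd:integer">
        <xsd:attribute name="units" type="unitsType" use="required"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>

  <!-- high and low share the same named type -->
  <xsd:complexType name="temperatureType">
    <xsd:sequence>
      <xsd:element name="high" type="readingType"/>
      <xsd:element name="low" type="readingType"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="cityType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="temperature" type="temperatureType"/>
      <xsd:element name="forecast" type="forecastType"/>
    </xsd:sequence>
    <xsd:attribute name="id" type="xsd:positiveInteger" use="required"/>
  </xsd:complexType>

  <!-- the only global element, so the only possible document root -->
  <xsd:element name="weather">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="city" type="cityType" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

Because high and low share a single named type here, a change to the temperature format only has to be made in one place, whereas a schema built from separate element definitions would need the change repeated for each.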
Of these two approaches, I've always found the derived types approach to be a bit more flexible, not to mention more clearly structured and logical, although it does take a bit of getting used to.