Doing More With XML Schemas (part 1)
Get into the more advanced aspects of XML Schema design with a discussion of
simple and complex datatypes, and learn to apply the basic OOP concepts of extensibility
and inheritance to your schemas.
Back in the old days, when XML was still a dark and nebulous cloud on the horizon,
the only way to verify the integrity of XML-encoded data was with a Document Type
Definition (DTD). DTDs were incomprehensible beasts consisting of strange symbols
and tangled acronyms, and it took a tremendous amount of patience (not to mention
a fair amount of alcohol) to successfully write one that worked as it was supposed
to.
Realizing that the arcane syntax used to construct DTDs was hindering rather
than helping its efforts to make XML the de facto standard for data markup on
the Web, the W3C came up with a kinder, gentler way of validating XML data. It
was called XML Schema, and it offered developers all the capabilities of current
DTDs while simultaneously adding a number of new capabilities designed to improve
maintainability and extensibility.
As the name suggests, a "schema" is a blueprint for a specific class of XML document.
It lays down rules for the types of elements and attributes allowed within an
XML document, the types of values that accompany such elements, and the order
and occurrence of these elements. It also addresses a number of issues which cannot
be handled by DTDs: datatyping (including the ability to derive new datatypes
from existing ones), inheritance, grouping, and database linkage.
Specific XML documents (referred to by the Working Group as "document instances")
can be linked to a schema and validated against the rules contained within it.
The XML Schema specification specifies the process by which document instances
and schemas are linked together, and a number of tools are now available to perform
this validation.
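As a rough sketch of what that linkage can look like, a document instance may carry a hint that tells a validator where to find its schema. The example below assumes a schema without a target namespace, stored in a file called "name.xsd"; that file name is hypothetical and chosen only for illustration. Validators are free to treat the attribute as a hint rather than a command, so your tool may also need to be told about the schema explicitly.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- "name.xsd" is a hypothetical schema file used only for illustration -->
<name xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="name.xsd">Luke Skywalker</name>
```

For schemas that do declare a target namespace, the companion attribute xsi:schemaLocation plays the same role, pairing each namespace URI with a schema location.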
Now, if you've been paying attention to previous columns, you probably already
know the basics of how schemas work. In this series of articles, I'll be building
on that basic knowledge to demonstrate some of the more advanced capabilities
available to you via XML schemas, in the hope that it will assist you in fully
exploiting the powers of this new tool. Keep reading!
Keeping It Simple
Let's begin with a quick refresher course in simple and complex element
types. Consider the following XML document:
<?xml version="1.0" encoding="UTF-8"?>
<character>
<name>Luke Skywalker</name>
<species>Human</species>
<language>Basic</language>
<home>Tatooine</home>
</character>
The XML Schema specification makes a basic distinction between "simple" and "complex"
elements. Simple elements hold only text; they cannot contain other elements or carry
attributes. Complex elements can carry attributes and serve as containers
for other elements (which themselves may be either simple or complex).
Within a schema, the types behind these two kinds of element are defined with the
<xsd:simpleType> and <xsd:complexType> elements respectively.
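To make the distinction concrete, here is a minimal sketch of how the sample document above might be described. It assumes that all four child elements are required, appear in the order shown, and hold plain strings; none of that is dictated by the sample itself, so treat it as one possible reading rather than the definitive schema.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- complex element: acts as a container for other elements -->
  <xsd:element name="character">
    <xsd:complexType>
      <xsd:sequence>
        <!-- simple elements: text content only, no children or attributes -->
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="species" type="xsd:string"/>
        <xsd:element name="language" type="xsd:string"/>
        <xsd:element name="home" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Here the complex type is declared anonymously inside the <character> element; it could equally be given a name and referenced from the element declaration.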
The easiest way to represent a simple element in a schema is to use the <xsd:element>
declaration with a built-in datatype. For example, the simple element
<name>Luke Skywalker</name>
would be represented in a schema by
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="name" type="xsd:string"/>
</xsd:schema>
When the datatype name is preceded by the "xsd:" prefix (bound above to the XML Schema
namespace), it indicates a predefined datatype and not a new, user-defined type. The
XML Schema specification lists more than forty built-in datatypes, including "string", "integer", "decimal",
"float", "boolean", "time", "date", "dateTime" and "anyURI". However, in case
these are too generic for you, it's also possible to derive your own custom datatype
from the built-in ones, and then declare simple elements using this custom datatype.
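As an illustration of that idea, the sketch below derives a custom type from the built-in "string" type by restricting its length, and then uses it to declare the <name> element. The type name "shortName" and the 1-to-50 character limits are assumptions made up for this example.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- hypothetical custom type: a non-empty string of at most 50 characters -->
  <xsd:simpleType name="shortName">
    <xsd:restriction base="xsd:string">
      <xsd:minLength value="1"/>
      <xsd:maxLength value="50"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- no "xsd:" prefix on the type, since it is user-defined rather than built-in -->
  <xsd:element name="name" type="shortName"/>
</xsd:schema>
```

With this schema, <name>Luke Skywalker</name> would validate, while an empty <name/> element would be rejected by the minLength facet.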