In this article, Martin Bond discusses XML and its associated APIs and standards, and how XML can be used to create flexible structured data that is inherently portable. This excerpt is from chapter (Day) 16 of Teach Yourself J2EE in 21 Days, second edition, by Martin Bond et al. (Sams, ISBN: 0672325586).
The outermost element in an XML document is called the root element. Each XML document must have one and only one root element, often called the top-level element. If there is more than one root element, the document is not well-formed and the parser will report an error.
The root element can be preceded by a prolog that contains XML declarations. Comments can be inserted almost anywhere in an XML document, although not inside a tag or before the XML declaration. The prolog is optional, but it is good practice to include a prolog with all XML documents giving the XML version being used (all full XML listings in this chapter will include a prolog). A minimal XML document must contain at least one element.
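As a rough illustration of the one-root rule, the following sketch uses the standard JAXP DocumentBuilder to test whether a string parses as well-formed XML. The class and method names (RootElementCheck, isWellFormed) are invented for this example and are not part of the chapter's listings.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class RootElementCheck {
    // Returns true if the parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<?xml version=\"1.0\"?><note>hello</note>")); // true
        System.out.println(isWellFormed("<note/><note/>")); // false: two root elements
        System.out.println(isWellFormed(""));               // false: no element at all
    }
}
```

The second and third calls fail for the reasons given above: a document with two top-level elements, or with no element at all, is rejected by the parser.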
Declarations
There are two types of XML declaration. XML documents may, and should, begin with an XML declaration, which specifies the version of XML being used. The following is an example of an XML declaration:
<?xml version="1.0"?>
The version attribute tells the parser that this document conforms to XML version 1.0 (W3C Recommendation, 10 February 1998). As with all declarations, the XML declaration belongs in the prolog; if present, it must also be the very first thing in the document, with no whitespace or comments before it.
The other type of declaration is called an XML document type declaration and is used to validate the XML. This will be discussed in more detail in the section titled "Creating Valid XML" later in this chapter.
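To see what a parser makes of the XML declaration, the sketch below reads the version back through the DOM Level 3 method Document.getXmlVersion(). The class and method names (XmlVersionDemo, xmlVersion) are hypothetical, and the behavior shown assumes the parser bundled with the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class XmlVersionDemo {
    // Parses the text and returns the XML version the DOM reports for it.
    // Throws IllegalArgumentException if the text is not well-formed.
    static String xmlVersion(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // keep parse errors off stderr
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getXmlVersion();
        } catch (SAXException e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(xmlVersion("<?xml version=\"1.0\"?><note/>")); // 1.0
        System.out.println(xmlVersion("<note/>")); // 1.0 (the default when no declaration is given)
    }
}
```

A declaration that is not at the very start of the document, for example one placed after a comment, makes xmlVersion throw, because the parser treats the misplaced declaration as an error.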
Elements
An element must have a start tag and an end tag enclosed in < and > characters. The end tag is the same as the start tag except that it is preceded by a / character. Tag names are case sensitive, and the names used in the start and end tags must match exactly. For example, <Start>...</start> does not make up an element, whereas <Start>...</Start> does, because both tags use the same letter case.
An element name can only contain letters, digits, underscores _, colons :, periods ., and hyphens -. An element name must begin with a letter or underscore.
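The naming rule above can be approximated with a regular expression. This is only a sketch of the rule as stated here, not the full Name production from the XML specification, which admits further Unicode characters, technically allows a leading colon, and reserves names beginning with "xml". The class and method names are invented for the example.

```java
import java.util.regex.Pattern;

final class ElementNameCheck {
    // First character: a letter or underscore.
    // Remaining characters: letters, digits, '_', ':', '.', or '-'.
    private static final Pattern NAME =
            Pattern.compile("[\\p{L}_][\\p{L}\\p{Nd}_:.\\-]*");

    // Returns true if the whole string satisfies the simplified naming rule.
    static boolean isValidName(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("date"));   // true
        System.out.println(isValidName("_box-2")); // true
        System.out.println(isValidName("2nd"));    // false: starts with a digit
        System.out.println(isValidName("my tag")); // false: contains a space
    }
}
```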
An element may also optionally have attributes and a body. All the elements in Listing 16.2 are well-formed XML elements. All attribute values must be quoted; both single and double quotes are permitted.
Listing 16.2 Valid XML Elements
<start>this is the beginning</start>
<date day="16th" Month="February">My Birthday</date>
<today yesterday="15th" Month="February"></today>
<box color="red"/>
<head></head>
<end/>
Table 16.1 describes each of these elements.
Table 16.1 XML Elements
| Element Type | XML Element Includes |
| --- | --- |
| <tag>text</tag> | A start tag, body, and end tag |
| <tag attribute="text"> text </tag> | An attribute and a body |
| <tag attribute="text"> </tag> | An attribute but no body |
| <tag attribute="text"/> | Short form of an element with an attribute but no body |
| <tag></tag> | A start tag and end tag but no body |
| <tag/> | Shorthand for the previous element |
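The parts of an element described above can be inspected through the DOM. The sketch below parses some of the elements from Listing 16.2 one at a time and reads back the name, attributes, and body; ElementParts and parseElement are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class ElementParts {
    // Parses a one-element document and returns its root element.
    static Element parseElement(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (SAXException e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element date = parseElement("<date day=\"16th\" Month=\"February\">My Birthday</date>");
        System.out.println(date.getTagName());           // date
        System.out.println(date.getAttribute("day"));    // 16th
        System.out.println(date.getAttribute("Month"));  // February
        System.out.println(date.getTextContent());       // My Birthday

        // Single-quoted attribute values are read the same way as double-quoted ones.
        System.out.println(parseElement("<box color='red'/>").getAttribute("color")); // red

        // <head></head> and <end/> are both empty: neither has child nodes.
        System.out.println(parseElement("<head></head>").hasChildNodes()); // false
        System.out.println(parseElement("<end/>").hasChildNodes());        // false
    }
}
```

Attribute names are case sensitive in the same way as tag names, so asking the date element for an attribute called month (lowercase) returns an empty string.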
Although the body of an element may contain nearly all the printable Unicode characters, certain characters are not allowed in certain places. To avoid confusion (to human readers as well as parsers) the characters in Table 16.2 should not be used literally in tag names or attribute values. If these characters are required in the body of an element or in an attribute value, the appropriate symbolic string in Table 16.2 can be used to represent them.
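The substitution described above can be sketched as a small helper that replaces each of the five special characters with its predefined entity reference (&amp;amp;, &amp;lt;, &amp;gt;, &amp;apos;, and &amp;quot; are written in the code as the literal strings "&amp;", "&lt;", "&gt;", "&apos;", and "&quot;"). XmlEscape and escape are hypothetical names; this is a minimal illustration rather than a replacement for the escaping a real XML library performs when it writes a document.

```java
final class XmlEscape {
    // Replaces the five special characters with their predefined entity references.
    // Working one character at a time means an inserted '&' is never escaped twice.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a < b & c"));   // a &lt; b &amp; c
        System.out.println(escape("say \"hi\""));  // say &quot;hi&quot;
    }
}
```

Note that the helper assumes its input is raw text: passing it a string that has already been escaped will escape the ampersands a second time.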
Table 16.2 Special XML Characters
| Character | Name | Symbolic Form |
| --- | --- | --- |
| & | Ampersand | &amp; |
| < | Open angle bracket | &lt; |
| > | Close angle bracket | &gt; |
| ' | Single quotes | &apos; |
| " | Double quotes | &quot; |
The elements in an XML document have a tree-like hierarchy, with elements containing other elements and data. Elements must nest properly; that is, an end tag must close the most recently opened element that is still open. This means that
<b><i>bold and italic</i></b>
is correct, while
<b><i>bold and italic</b></i>
is not.
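The nesting rule amounts to a stack discipline: each start tag is pushed, and each end tag must match the tag on top of the stack. The following sketch checks only that rule, using a deliberately simplified tag pattern that ignores comments, CDATA sections, and attribute values containing >. The names NestingCheck and isProperlyNested are invented for the example; real code should rely on an XML parser instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NestingCheck {
    // Group 1 is "/" for an end tag, group 2 is the tag name,
    // group 3 is "/" for an empty-element tag such as <end/>.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z_][\\w:.\\-]*)[^<>]*?(/?)>");

    static boolean isProperlyNested(String markup) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(markup);
        while (m.find()) {
            String name = m.group(2);
            if (!m.group(1).isEmpty()) {
                // End tag: must match the most recently opened element (case sensitive).
                if (open.isEmpty() || !open.pop().equals(name)) {
                    return false;
                }
            } else if (m.group(3).isEmpty()) {
                // Start tag: remember it until its end tag arrives.
                open.push(name);
            }
            // Empty-element tags open and close at once, so there is nothing to track.
        }
        return open.isEmpty(); // any element still open was never closed
    }

    public static void main(String[] args) {
        System.out.println(isProperlyNested("<b><i>bold and italic</i></b>")); // true
        System.out.println(isProperlyNested("<b><i>bold and italic</b></i>")); // false
    }
}
```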
This chapter is from Teach Yourself J2EE in 21 Days, second edition, by Martin Bond et al. (Sams, 2004, ISBN: 0-672-32558-6).