Integrating XML with J2EE - Structure of an XML Document
The outermost element in an XML document is called the root element. Each XML
document must have one and only one root element, often called the top-level
element. If there is more than one root element, the parser reports a
well-formedness error.
The root element can be preceded by a prolog that contains the XML declarations
described below. Comments (<!-- ... -->) can appear almost anywhere outside
other markup, but never before the XML declaration. The prolog is optional, but
it is good practice to include a prolog with all XML documents giving the XML
version being used (all full XML listings in this chapter include a prolog). A
minimal XML document must contain at least one element.
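The root-element and prolog rules can be checked with any non-validating parser. The following is a minimal sketch, assuming the JAXP DOM parser bundled with the JDK; the class name `RootElementCheck` and the sample documents are hypothetical, chosen only to illustrate the rules above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is a well-formed XML document.
public class RootElementCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser reported a well-formedness error
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prolog (declaration plus a comment) followed by exactly one root element
        System.out.println(isWellFormed(
            "<?xml version=\"1.0\"?>\n<!-- a comment -->\n<note>hi</note>")); // true
        // The prolog is optional
        System.out.println(isWellFormed("<note>hi</note>"));                  // true
        // Two root elements are rejected
        System.out.println(isWellFormed("<note>hi</note><note>bye</note>"));  // false
        // A prolog with no element at all is rejected
        System.out.println(isWellFormed("<?xml version=\"1.0\"?>"));          // false
    }
}
```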
Declarations
There are two types of XML declaration. XML documents may, and should, begin
with an XML declaration, which specifies the version of XML being used. The
following is an example of an XML declaration:
<?xml version="1.0"?>
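The version information tells the parser that this document conforms to XML
version 1.0 (W3C Recommendation, 10 February 1998). Like the document type
declaration, the XML declaration belongs in the prolog; in addition, if present,
it must be the very first thing in the document, so nothing, not even whitespace
or a comment, may precede it.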
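A sketch of how the declaration surfaces through the JDK's DOM API, assuming DOM Level 3 support (`Document.getXmlVersion()`, available since Java 5). The class and method names are hypothetical. Note that `getXmlVersion()` reports "1.0" when no declaration is present, so on its own it cannot tell whether a declaration was written. The last case shows that even a single leading space before the declaration makes the document malformed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlDeclarationDemo {

    // Parses the string; returns null if it is not well-formed
    static Document parseOrNull(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // fail quietly on fatal errors
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document withDecl =
            parseOrNull("<?xml version=\"1.0\"?><greeting>hello</greeting>");
        System.out.println("declared version: " + withDecl.getXmlVersion());

        // No declaration: DOM reports the default version, "1.0"
        Document noDecl = parseOrNull("<greeting>hello</greeting>");
        System.out.println("default version: " + noDecl.getXmlVersion());

        // Whitespace before the declaration is a well-formedness error
        Document leadingSpace = parseOrNull(" <?xml version=\"1.0\"?><greeting/>");
        System.out.println("leading space accepted? " + (leadingSpace != null));
    }
}
```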
The other type of declaration is called an XML document type declaration and
is used to validate the XML. This will be discussed in more detail in the
section titled "Creating Valid XML" later in this chapter.
Elements
An element normally consists of a start tag and an end tag, each enclosed in <
and > characters. The end tag repeats the element name, preceded by a /
character, and carries no attributes. (An element with no body can instead be
written as a single empty-element tag, such as <box color="red"/> in Listing
16.2.) Tag names are case sensitive, and the names used in the start and end
tags must match exactly. For example, the tags <Start>...</start> do not make
up an element, whereas <Start>...</Start> do (both tags use the same letter
case).
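The case-sensitivity rule is easy to observe with the JDK's DOM parser. This sketch (hypothetical class name, same quiet-parsing helper as a plain JAXP setup) shows that mismatched case is rejected and that `Start` and `start` are treated as two different element types.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class CaseSensitivityDemo {

    // Parses the string; returns null if it is not well-formed
    static Document parseOrNull(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors quietly
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Start and end tag names differ only in letter case: rejected
        System.out.println(parseOrNull("<Start>...</start>") != null); // false
        // Names match exactly: accepted
        System.out.println(parseOrNull("<Start>...</Start>") != null); // true

        // "Start" and "start" are two distinct element types
        Document doc = parseOrNull("<root><Start/><start/><start/></root>");
        System.out.println(doc.getElementsByTagName("Start").getLength()); // 1
        System.out.println(doc.getElementsByTagName("start").getLength()); // 2
    }
}
```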
An element name can contain only letters, digits, underscores (_), colons (:),
periods (.), and hyphens (-), and must begin with a letter or an underscore.
Colons are reserved for namespace prefixes, so avoid them in ordinary names.
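The naming rule can be expressed as a small check. This is a simplified sketch of the rule as stated above, not the full XML 1.0 `Name` production (which admits a wider set of Unicode characters and does not enforce the reserved "xml" prefix here); `Character.isLetter` and `Character.isLetterOrDigit` are used as stand-ins for "letter" and "digit", and the class name is hypothetical.

```java
public class ElementNameCheck {

    // Approximates the stated rule: the first character must be a letter or
    // underscore; later characters may be letters, digits, '_', ':', '.' or '-'.
    static boolean isValidElementName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        if (!Character.isLetter(first) && first != '_') {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean allowed = Character.isLetterOrDigit(c)
                || c == '_' || c == ':' || c == '.' || c == '-';
            if (!allowed) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] candidates = {"date", "_id", "my-tag.v1", "2day", "bad name", "-x"};
        for (String name : candidates) {
            System.out.println(name + " -> " + isValidElementName(name));
        }
    }
}
```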
An element may also have attributes and a body; both are optional. All the
elements in Listing 16.2 are well-formed XML elements. Attribute values must
always be quoted; either single or double quotes may be used, as long as the
opening and closing quotes match.
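To see attributes and bodies from a program's point of view, the sketch below parses the elements of Listing 16.2 with the JDK's DOM parser and prints each element's name, attributes, and body. Two assumptions are labeled in the code: the elements are wrapped in a hypothetical `<listing>` root (a document may have only one root element), and the `today` attributes are switched to single quotes to show that both quote styles parse the same way.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ListingElements {

    // Listing 16.2 wrapped in a hypothetical <listing> root element;
    // <today> uses single-quoted attribute values for illustration.
    static final String LISTING =
        "<listing>"
        + "<start>this is the beginning</start>"
        + "<date day=\"16th\" Month=\"February\">My Birthday</date>"
        + "<today yesterday='15th' Month='February'></today>"
        + "<box color=\"red\"/>"
        + "<head></head>"
        + "<end/>"
        + "</listing>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parse(LISTING).getDocumentElement();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip anything that is not an element
            }
            Element e = (Element) n;
            StringBuilder line = new StringBuilder(e.getTagName());
            NamedNodeMap attrs = e.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                line.append(' ').append(a.getNodeName()).append('=').append(a.getNodeValue());
            }
            line.append(" body=\"").append(e.getTextContent()).append('"');
            System.out.println(line);
        }
    }
}
```

Attribute names are case sensitive too: `getAttribute("month")` on the `date` element returns an empty string, because the attribute is named `Month`.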
Listing 16.2 Valid XML Elements
<start>this is the beginning</start>
<date day="16th" Month="February">My Birthday</date>
<today yesterday="15th" Month="February"></today>
<box color="red"/>
<head></head>
<end/>
Table 16.1 describes each of these elements.
Table 16.1 XML Elements
| Element Type | XML Element Includes |
|---|---|
| <tag>text</tag> | A start tag, body, and end tag |
| <tag attribute="text">text</tag> | An attribute and a body |
| <tag attribute="text"></tag> | An attribute but no body |
| <tag attribute="text"/> | Shorthand for the previous element: an attribute but no body |
| <tag></tag> | A start tag and end tag but no body |
| <tag/> | Shorthand for the previous element |
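The shorthand forms in Table 16.1 are purely notational: a parser builds the same tree for `<tag attribute="text"></tag>` and `<tag attribute="text"/>`. The sketch below (hypothetical class name, JDK DOM parser assumed) counts the child nodes of each root element to make this visible.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class EmptyElementForms {

    // Parses the string and returns its root element
    static Element rootOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String[] forms = {
            "<tag attribute=\"text\"></tag>", // start and end tag, no body
            "<tag attribute=\"text\"/>",      // empty-element shorthand
            "<tag>text</tag>",                // start tag, body, end tag
            "<tag> </tag>"                    // a single space is still a body
        };
        for (String xml : forms) {
            Element e = rootOf(xml);
            System.out.println(xml + " -> children=" + e.getChildNodes().getLength()
                + ", attribute=\"" + e.getAttribute("attribute") + "\"");
        }
    }
}
```

The first two forms both report zero children and the same attribute value. Note the last case: a body consisting only of whitespace is still a body, and the parser keeps it as a text node.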
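Although the body of an element may contain nearly all printable Unicode
characters, certain characters are not allowed in certain places. A literal <
or & can never appear in element content or in an attribute value, and a quote
character cannot appear inside an attribute value delimited by the same kind of
quote. To avoid confusion (to human readers as well as parsers), the characters
in Table 16.2 should not be used literally in tag names or attribute values. If
these characters are required in the body of an element, the appropriate
symbolic form (a predefined entity reference) from Table 16.2 can be used to
represent them.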
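Replacing special characters with their symbolic forms can be sketched as a small escaping function. In practice, DOM serializers such as the JAXP `Transformer` escape text automatically; this hand-written version (hypothetical class and method names) only makes the character-to-entity mapping of Table 16.2 concrete, then checks that the parser decodes the entities back to the original text.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XmlEscaper {

    // Replaces each special character with its predefined entity reference.
    // Scanning character by character means an '&' produced by an earlier
    // replacement is never escaped a second time.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wraps the escaped text in a hypothetical <msg> element, parses it,
    // and returns the body as the parser decodes it.
    static String roundTrip(String text) {
        String xml = "<msg>" + escape(text) + "</msg>";
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String raw = "Tom & Jerry's \"<show>\"";
        System.out.println(escape(raw));
        // Tom &amp; Jerry&apos;s &quot;&lt;show&gt;&quot;
        System.out.println(roundTrip(raw).equals(raw)); // true
    }
}
```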
Table 16.2 Special XML Characters
| Character | Name | Symbolic Form |
|---|---|---|
| & | Ampersand | &amp; |
| < | Open angle bracket | &lt; |
| > | Close angle bracket | &gt; |
| ' | Single quote (apostrophe) | &apos; |
| " | Double quote | &quot; |
The elements in an XML document form a tree-like hierarchy, with elements
containing other elements and data. Elements must nest properly: an end tag must
close the most recently opened element that is still open. This means that
<b><i>bold and italic</i></b>
is correct, while
<b><i>bold and italic</b></i>
is not.
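The nesting rule can be checked with a parser, and the resulting tree walked recursively. In this sketch (hypothetical class and method names, JDK DOM parser assumed), `elementDepth` counts how many levels of elements a node contains; for the correctly nested example, `b` contains `i`, giving a depth of 2.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class NestingDemo {

    // Parses the string; returns null if it is not well-formed
    static Document parseOrNull(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors quietly
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean isWellFormed(String xml) {
        return parseOrNull(xml) != null;
    }

    // Number of element levels in this subtree, counting the node itself
    // only if it is an element (so a Document node adds nothing)
    static int elementDepth(Node node) {
        int deepestChild = 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                deepestChild = Math.max(deepestChild, elementDepth(child));
            }
        }
        return (node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0) + deepestChild;
    }

    public static void main(String[] args) {
        String good = "<b><i>bold and italic</i></b>";
        String bad = "<b><i>bold and italic</b></i>";
        System.out.println(isWellFormed(good));               // true
        System.out.println(isWellFormed(bad));                // false
        System.out.println(elementDepth(parseOrNull(good)));  // 2
    }
}
```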
This chapter is from Teach Yourself J2EE in 21 Days, second edition, by Martin
Bond et al. (Sams, 2004, ISBN: 0-672-32558-6).