Now that you know the basics, this article explains how to use XML's more advanced constructs to author complex XML documents. Entities, namespaces, CDATA blocks, processing instructions - they're all in here, together with aliens, idiots, secret agents and buried treasure.
Entities come in a variety of flavours. They can broadly be divided into general entities and parameter entities. The examples you've seen above are general entities; since parameter entities are used only with DTDs, you don't need to worry about them for the moment.
Entities may be further classified into internal entities (whose replacement text is defined within the document itself), external entities (whose replacement text is stored in a separate file), and unparsed entities (external entities whose content is not processed by the parser).
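To make the three categories a little more concrete, here's a rough sketch of what one declaration of each kind might look like. The element name, entity names, file names and the GIF notation are all invented for this illustration; each form is explained in more detail further down.

```xml
<?xml version="1.0"?>
<!DOCTYPE memo
[
<!-- internal entity: the replacement text sits right in the declaration -->
<!ENTITY agency "Department of Alien Affairs">
<!-- external entity: the replacement text lives in another file (hypothetical name) -->
<!ENTITY briefing SYSTEM "briefing.xml">
<!-- unparsed entity: binary data, labelled with a notation so the parser leaves it alone -->
<!NOTATION GIF SYSTEM "image/gif">
<!ENTITY mugshot SYSTEM "mugshot.gif" NDATA GIF>
]>
<memo>
From: &agency;
&briefing;
</memo>
```

Note that only the first two entities are pulled into the document with an &name; reference; the unparsed one is merely declared here.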
Most of the examples you've seen so far use internal entities - that is, the entity's replacement text and the references to it are stored in the same physical document. XML also allows you to keep the replacement text in a separate file and simply point to that file from the entity declaration, which comes in handy when the entity contains a large block of text. Consider the following example:
<?xml version="1.0"?>
<!DOCTYPE article
[
<!ENTITY header "All source code copyright and proprietary Melonfire, 2001.
All content, brand names and trademarks copyright and proprietary Melonfire, 2001.
All rights reserved. Copyright infringement is a violation of law.
This source code is provided with NO WARRANTY WHATSOEVER.
It is meant for illustrative purposes only, and is NOT recommended for use in production environments.
Read more articles like this one at
<url>http://www.melonfire.com/community/columns/trog/</url>
and
<url>http://www.melonfire.com/</url>">
]>
<article>
<title>XML Basics (part 2)</title>
<abstract>A discussion of basic XML theory</abstract>
<body>
&header;
Article body goes here
</body>
</article>
Since the entity contains a fairly large block of text, it may be more convenient to extract it and store it in a separate file, "header.xml". In that case, the example above would reduce to:
<?xml version="1.0"?>
<!DOCTYPE article
[
<!ENTITY header SYSTEM "header.xml">
]>
<article>
<title>XML Basics (part 2)</title>
<abstract>A discussion of basic XML theory</abstract>
<body>
&header;
Article body goes here
</body>
</article>
In this case, the SYSTEM keyword is followed by a URI (here, a relative file name) which tells the parser the location of the file containing the replacement text for the entity.
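For illustration, "header.xml" might look something like the sketch below. This assumes the file holds nothing but the replacement text from the earlier example; the text declaration on the first line is optional and is an addition of this sketch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
All source code copyright and proprietary Melonfire, 2001.
All content, brand names and trademarks copyright and proprietary Melonfire, 2001.
All rights reserved. Copyright infringement is a violation of law.
This source code is provided with NO WARRANTY WHATSOEVER.
It is meant for illustrative purposes only, and is NOT recommended for use in production environments.
Read more articles like this one at
<url>http://www.melonfire.com/community/columns/trog/</url>
and
<url>http://www.melonfire.com/</url>
```

Unlike a complete XML document, an external entity file has no DOCTYPE and needs no single root element, though any tags in it must still be properly opened and closed. Keep in mind that a non-validating parser is not obliged to read external entities, so some parsers may leave &header; unexpanded.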
Unparsed entities usually contain references to images, sound files or other binary data, and hence should not be processed by a parser (jeez, you think maybe that's why they're called "unparsed entities"?). Such entity declarations usually contain a link to the file (as with external entities), followed by an additional notation name which specifies the type of file.
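In the following example, the NDATA keyword is used to tell the parser that the file being referenced is not to be processed in the usual manner; it is followed by the name of a notation (declared with <!NOTATION>) offering further information on the nature of the file. Since an unparsed entity cannot be pulled into the document with an &name; reference, it is named instead in an attribute declared with the type ENTITY. The "image" element and its "source" attribute are names chosen purely for this example, as is the "image/jpeg" identifier given to the notation.
<?xml version="1.0"?>
<!DOCTYPE message
[
<!NOTATION JPEG SYSTEM "image/jpeg">
<!ENTITY map SYSTEM "treasuremap.jpg" NDATA JPEG>
<!ELEMENT message (#PCDATA | image)*>
<!ELEMENT image EMPTY>
<!ATTLIST image source ENTITY #REQUIRED>
]>
<message>
To whoever finds this: here is a map showing the location of pirate treasure on an island in the South Pacific.
<image source="map"/>
Remember, X marks the spot.
</message>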
This article copyright Melonfire 2001. All rights reserved.