XML Basics (part 2) - Splitting Up
First up, CDATA. As explained in the previous article, the XML specification considers all text enclosed within tags to be character data, which the parser still scans for markup such as nested tags and entity references. There is one important exception to this - CDATA blocks.
CDATA blocks are document sections explicitly marked as not containing markup; the parser passes their contents through untouched, as literal character data. These blocks can contain pretty much anything - strings, numbers, symbols, ancient Egyptian hieroglyphics - and the parser will not try to interpret any of it as markup.
A CDATA block always begins with
<![CDATA[
and ends with
]]>
with the data enclosed within the two. Here's an example:
<?xml version="1.0"?>
<manual>
<function>split(str, pattern)</function>
<description>Split a string <param>str</param> into component parts on the basis of <param>pattern</param></description>
<example>
<![CDATA[
<?
split("apple, vanilla, orange", ",");
?>
]]>
</example>
</manual>
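To see what a parser actually hands back for a block like this, here's a minimal sketch using Python's standard xml.etree.ElementTree module. The choice of Python is an assumption made purely for illustration (the article itself is parser-agnostic), and the document is a trimmed-down copy of the example above.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the <manual> example; the CDATA block
# holds text that would otherwise look like markup to the parser.
doc = """<?xml version="1.0"?>
<manual>
<function>split(str, pattern)</function>
<example><![CDATA[<? split("apple, vanilla, orange", ","); ?>]]></example>
</manual>"""

root = ET.fromstring(doc)
example = root.find("example")

# The CDATA contents come back as plain text, not as child nodes.
example_text = example.text
child_count = len(example)

print(example_text)
print(child_count)
```

The "<?" and "?>" inside the block are never interpreted as the start and end of a processing instruction; they simply arrive as part of the element's text.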
CDATA blocks make it easy to add large blocks of text (including text containing special characters, symbols or program code) to an XML document, yet have the parser treat it as regular character data. And so, while a parser will choke on this (the bare & in 7^&291 looks like the start of an entity reference that is never completed),
<?xml version="1.0"?>
<secret_message>
<from>Our man in Paris</from>
<to>Director,
Special Operations</to>
<coded_body_text>
12637 0%%348 83483 89238 82383
10341 0*049 27216 02039 84585 18127 45759
3@492 83%84 22829 238#3 92345 72310
53467 12941 92461 40149 7^&291 21271
46101 42356 74(@1 4!128 47353 #511~ 473~7
12942 38#53 45628
</coded_body_text>
</secret_message>
it will be absolutely fine with this:
<?xml version="1.0"?>
<secret_message>
<from>Our man in Paris</from>
<to>Director,
Special Operations</to>
<coded_body_text>
<![CDATA[
12637 0%%348
83483 89238 82383 10341 0*049 27216 02039 84585 18127 45759
3@492 83%84 22829
238#3 92345 72310 53467 12941 92461 40149 7^&291 21271
46101 42356 74(@1 4!128
47353 #511~ 473~7 12942 38#53 45628
]]>
</coded_body_text>
</secret_message>
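If you'd like to check the difference between the two versions yourself, the following sketch feeds a shortened piece of the coded text to Python's xml.etree.ElementTree, once bare and once wrapped in CDATA. The helper name is_well_formed is made up for this illustration, and the body string is only a fragment of the message above.

```python
import xml.etree.ElementTree as ET

# A fragment of the coded message, including the troublesome "&".
body = "3@492 83%84 7^&291 74(@1 #511~"

raw = "<coded_body_text>" + body + "</coded_body_text>"
wrapped = "<coded_body_text><![CDATA[" + body + "]]></coded_body_text>"


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw_ok = is_well_formed(raw)          # bare "&" is rejected
wrapped_ok = is_well_formed(wrapped)  # same text inside CDATA is accepted
wrapped_text = ET.fromstring(wrapped).text

print(raw_ok, wrapped_ok)
```

The wrapped version not only parses, it also returns the coded text exactly as it was written.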
Obviously, you cannot include the ending sequence
]]>
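within a CDATA block, as the parser would take it to be the end of the block. Entity references are not expanded inside CDATA, so there is no way to escape the sequence there; if you need it in your data, split the data across two CDATA blocks so that the sequence straddles the boundary, like this:
]]]]><![CDATA[>
In ordinary character data, outside a CDATA block, the same sequence is written as
]]&gt;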
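One common way to carry the ending sequence through safely is to split it across two adjacent CDATA blocks, so that neither block contains it whole. Here's a rough sketch of that idea in Python; the helper name to_cdata and the sample payload are hypothetical, invented only for this illustration.

```python
import xml.etree.ElementTree as ET


def to_cdata(text):
    """Wrap text in CDATA, splitting every ']]>' across two blocks."""
    # "]]" closes out the first block's data, "]]>" ends that block,
    # and a fresh block starts with the leftover ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Sample data that happens to contain the forbidden sequence.
payload = "if (a[b[0]]> 1) { run(); }"

doc = "<example>" + to_cdata(payload) + "</example>"

# The parser joins the adjacent blocks back into a single string.
recovered = ET.fromstring(doc).text

print(doc)
print(recovered)
```

The parser sees two back-to-back CDATA blocks and glues their contents together, so the application gets the original text back unchanged.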
This article copyright Melonfire 2001. All rights reserved.
More By icarus, (c) Melonfire