My First XML Document

We have dealt with some basic XML concepts already, so now we are prepared to leap forward and take a closer look at this powerful language. The X in XML stands for eXtensible; in other words, the language can be extended to meet the requirements of a specific situation. The basis of XML is the XML document. In its most essential form, an XML document is a text file with an .xml extension that contains text, data, and XML tags. Technically speaking, XML is not a language but a metalanguage: a language for defining other languages.

In practical terms, this means that you are not limited to a predefined set of tags when creating an XML document; you can create any tag you need for a specific application. The XML standard provides a set of rules that specify some details, such as how to create tags and how an XML document can be structured, but within the XML framework you are free to define and use the tags that best suit the data. Whether your XML document contains a mailing list, a web page, or inventory information, you can represent and structure the data with essentially complete freedom. XML was specifically designed so that storing data in an XML document places no constraint on how that data is displayed.
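Here is a quick illustration of inventing tags to suit the data: a minimal Python sketch that builds a small mailing-list document. We chose Python's standard xml.etree.ElementTree module for the example; the rest of this tutorial uses Microsoft tools. The tag names mailinglist, subscriber, name, and email are hypothetical. Nothing in XML predefines them.

```python
import xml.etree.ElementTree as ET


def build_mailing_list(subscribers):
    """Build an XML document whose tag names we invent ourselves."""
    root = ET.Element("mailinglist")            # hypothetical root tag
    for name, email in subscribers:
        entry = ET.SubElement(root, "subscriber")
        ET.SubElement(entry, "name").text = name
        ET.SubElement(entry, "email").text = email
    # encoding="unicode" returns a str (without an XML declaration)
    return ET.tostring(root, encoding="unicode")


xml_text = build_mailing_list([("Adam Hall", "adam@example.com")])
print(xml_text)
```

A parser accepts these tags exactly as it would accept any other tags. XML itself attaches no meaning to them. The meaning comes from the application that reads the data.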

The XML standard was created by the World Wide Web Consortium (W3C), an open, public organization whose task is to develop technologies and standards for the Internet. Because the XML standard is public, no single commercial interest owns or dominates it. Companies such as IBM, Microsoft, and Sun have developed many of the current standards for data storage and manipulation. These so-called proprietary standards may work well. However, a single company controls each of them, so an outside developer has no assurance that a standard will not change without warning, become subject to licensing fees, or be dropped altogether. With a public standard such as XML, no single company can do these things. XML is a technology that is here to stay, and it is well worth the time it takes to learn.

Tools For Working With XML

Developers just getting started with XML often ask, “What tools do I use to work with XML?” This is a tough question to answer. Unlike some development technologies, XML does not require any specific software package. If you want to program in Visual Basic, for example, you need Microsoft Visual Basic; XML has no equivalent single product. Because XML is nonproprietary, anyone is free to create and distribute tools for working with it. As a result, there are many tools to choose from, and we will look at the main categories shortly.

Tools For Creating XML

Because XML files are plain text files, you can create them with any text editor, including the Notepad utility that comes with Microsoft Windows. You should not use a word processing program such as Microsoft Word unless you are sure it can save your file as plain text. Plain text editors can be cumbersome for XML editing tasks, so you may want to consider one of the many specialized XML editors that are available. These programs automate or simplify many aspects of creating XML, easing your job as a programmer. Some editors have a parser integrated with them.
XML editors are available as commercial software, as shareware, and as freeware. You can find various editors by searching the web for the term “XML editor”. Microsoft provides a free XML editor called XML Notepad.

Tools For Browsing XML

XML is widely used on the World Wide Web, so browser support for XML is increasingly important. To support XML, a browser has to do two things. It must parse the XML document to allow access to its contents, and it must display that content formatted according to the document's associated style sheet. The first commercial browser to support XML was Microsoft Internet Explorer 5.0. The Mozilla browser, an open-source project based on the source code of Netscape Communicator, also supports XML. You can find additional information at http://www.mozilla.org/. In addition, the W3C has developed its own browser, Amaya, which can browse and edit XML documents. You can download Amaya from the W3C web site, http://www.w3.org/amaya/.

Tools For Parsing XML

The one essential tool for reading and modifying XML is a “parser”. The terms “parser” and “processor” are sometimes used interchangeably, although technically this is not correct. Any program that takes an XML file as its input and produces some output based on the file's content is an XML processor. An XML browser is one example of a processor. Programs that create typesetting codes, synthesized speech, or HTML pages from an XML document are others. A parser is software that performs the first step in processing an XML document. Rarely, if ever, does an XML parser operate on its own; a parser is almost always used as part of an XML processor. The most basic task of a parser is checking that the XML document is well-formed, that is, making sure the document's content follows the rules of XML syntax. All parsers perform this task: if the syntax is incorrect, the parser detects it and reports an error.
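You can see a parser's well-formedness check in action with the following minimal Python sketch. It uses the standard xml.etree.ElementTree parser rather than MSXML and reports whether a string is well-formed XML. The helper name is_well_formed is our own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML; print the parser's complaint otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        print("Not well-formed:", err)
        return False
    return True


print(is_well_formed("<Basic>Hello</Basic>"))  # True
print(is_well_formed("<Basic>Hello</basic>"))  # False: tag names are case-sensitive
print(is_well_formed("<a><b></a></b>"))        # False: elements overlap
```

The check says nothing about whether the data makes sense for an application. It only confirms that the text follows XML syntax.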
Most parsers can also check a document for validity by checking it against a DTD or schema. Not all documents require validation, but many do. A well-formed document is not always enough, because the data must also be structured properly for the processing software to work correctly. The final task the parser may perform is to make the content of the document, both markup and data, available to the processing software. This is done through an API such as SAX or DOM; we will deal with both later on. XML parsers do not usually exist as stand-alone applications. Most parsers take the form of a software component, or class, that processing applications can use. The processing software interacts with the parser by accessing its properties and calling its methods to perform the needed tasks.

Microsoft XML Parser

As we have already stated, there are numerous XML parsers available, and you can choose the one you are most comfortable with. This tutorial, however, focuses on Microsoft technologies for XML, so the material covered is limited to the Microsoft XML parser (MSXML). MSXML is a very powerful parser that performs syntax checking, does validation, and exposes the document content through both SAX and DOM. To download and install the MSXML parser package, go to http://msdn.microsoft.com/downloads and search for “MSXML Parser”.
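The SAX and DOM styles of exposing content, mentioned above, can be compared side by side in a minimal Python sketch. Python's standard xml.sax and xml.dom.minidom modules stand in for MSXML here purely for illustration; MSXML's own SAX and DOM interfaces have different names and calling conventions. The sample document is a cut-down version of the address list used later in this article, and the helper names are our own.

```python
import xml.sax
import xml.dom.minidom

DOC = ("<addresslist>"
       "<employee><name>Adam Hall</name></employee>"
       "<employee><name>Mary Joseph</name></employee>"
       "</addresslist>")


class NameCollector(xml.sax.ContentHandler):
    """SAX style: the parser calls these methods as it reads the document."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False

    def startElement(self, name, attrs):
        if name == "name":
            self._in_name = True
            self.names.append("")

    def characters(self, content):
        if self._in_name:
            self.names[-1] += content  # text may arrive in several chunks

    def endElement(self, name):
        if name == "name":
            self._in_name = False


def names_via_sax(text):
    handler = NameCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names


def names_via_dom(text):
    """DOM style: the parser builds a tree in memory that we then navigate."""
    document = xml.dom.minidom.parseString(text)
    return [node.firstChild.data for node in document.getElementsByTagName("name")]


print(names_via_sax(DOC))  # ['Adam Hall', 'Mary Joseph']
print(names_via_dom(DOC))  # ['Adam Hall', 'Mary Joseph']
```

SAX never holds the whole document in memory, which suits large files. DOM keeps the full tree, which makes it easy to revisit or modify any part of the document.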

Microsoft XML Software Development Kit:
Another useful tool available from Microsoft is the Microsoft XML Software Development Kit (MSXML SDK). The SDK is installed at the same time you install the MSXML parser.

Creating An XML Document

This part explores how to build an XML document, beginning with a simple one. In Part I, we briefly dealt with XML tags, attributes, elements, namespaces, and the rule-based systems of both DTDs and Schemas. Writing your first XML document is easy, so let's dive right in and write one.

Point To Note:
This basic XML document contains one element, called Basic. This is really the essence of XML: the ability to define your own meaning and structure for the document. So open your favorite text editor and type the content of Example 1.1.

Example 1.1: A Basic XML Document
  
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="Basic.xsl"?>
<Basic>Hello, Welcome To The XML World.</Basic>
  
Now save the file as HelloWorld.xml. If you are using Notepad, be sure to select All Files from the Save as type drop-down list in the Save As dialog box. Otherwise, Notepad will save the file with a .txt extension, and you may not be able to open the document in an XML processor. Alternatively, you can wrap the file name in quotes.
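Before opening the file in a browser, you can confirm that a parser accepts it and see what the parser finds. The minimal Python sketch below uses the standard xml.dom.minidom module, which we chose for illustration; it is not the parser built into the browser. The document text is Example 1.1, embedded as a string so the sketch runs on its own.

```python
import xml.dom.minidom

# The content of HelloWorld.xml from Example 1.1 (the XML declaration must come first).
HELLO = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="Basic.xsl"?>
<Basic>Hello, Welcome To The XML World.</Basic>"""

document = xml.dom.minidom.parseString(HELLO)

# Processing instructions before the root element are kept as children of the document node.
stylesheet_pis = [node for node in document.childNodes
                  if node.nodeType == node.PROCESSING_INSTRUCTION_NODE]

root = document.documentElement
print(root.tagName)              # Basic
print(root.firstChild.data)      # Hello, Welcome To The XML World.
print(stylesheet_pis[0].target)  # xml-stylesheet
print(stylesheet_pis[0].data)    # type="text/xsl" href="Basic.xsl"
```

To check the file you actually saved, call xml.dom.minidom.parse("HelloWorld.xml") instead of parseString. If the file was accidentally saved with extra text or a mistyped tag, the call raises an error.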

Next, let us develop a simple style sheet so we can view the file in a browser. This step isn't strictly necessary, because XML parsers, including the one that comes with IE5 and later, will parse the document and reveal its structure. But I want to give you something here that looks familiar, and by now most people have seen a simple web page.

Example 1.2: A Basic XSL(T) Document
  
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Use xmlns:xsl="http://www.w3.org/TR/WD-xsl" for most versions of Internet Explorer 5 -->
<xsl:template match="/">
<html>
<head>
<title>A Basic Style Sheet</title>
</head>
<body>
<xsl:value-of select="/" />
</body>
</html>
</xsl:template>
</xsl:stylesheet>
  
Next, save the file as Basic.xsl in the same directory or folder as HelloWorld.xml. The browser must be able to access it in order to display HelloWorld.xml.
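To make clear what Example 1.2 asks the browser to do, here is a minimal Python sketch that imitates its output by hand. It is not an XSLT processor, and the function name render_like_basic_xsl is hypothetical. It shows that <xsl:value-of select="/"/> produces the text content of the whole document, wrapped here in roughly the HTML from the template.

```python
import xml.etree.ElementTree as ET
from html import escape


def render_like_basic_xsl(xml_text):
    """Imitate Example 1.2 by hand; this is NOT a real XSLT processor."""
    root = ET.fromstring(xml_text)
    # <xsl:value-of select="/"/> outputs the string value of the root node:
    # all text inside the document element, concatenated in document order.
    string_value = "".join(root.itertext())
    return ("<html><head><title>A Basic Style Sheet</title></head>"
            "<body>" + escape(string_value) + "</body></html>")


page = render_like_basic_xsl("<Basic>Hello, Welcome To The XML World.</Basic>")
print(page)
```

A real XSLT processor, such as the one the browser uses, does this work from the style sheet itself. The sketch only shows why the browser displays the single line of text from the Basic element.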

If you have Netscape, choose Open File from the File menu and navigate to the directory in which you saved HelloWorld.xml. Open the file, and the screen will look something like this:

The file HelloWorld.xml as rendered by Netscape: file:///c:/filepath/HelloWorld.xml

Hello, Welcome To The XML World.


Note: You don't have to build a style sheet for Netscape. Netscape will render the document using a built-in style sheet based on Cascading Style Sheets (CSS).

If you have IE5 or later, choose Open from the File menu and navigate to HelloWorld.xml. You may need to swap the part of the XSL code that reads xmlns:xsl="http://www.w3.org/1999/XSL/Transform" for the commented code that reads xmlns:xsl="http://www.w3.org/TR/WD-xsl", and then resave Basic.xsl. The page should now look like the screen below:

The file HelloWorld.xml as rendered by Internet Explorer: C:\filepath\HelloWorld.xml

Hello, Welcome To The XML World.
Tools For The Trade

Now let us look at an example XML document that illustrates what we have learned so far in Parts I and II, with an emphasis on syntax elements. Each line is numbered for easy explanation; the numbers are not part of the XML.

Example 2.1:
  
1. <?xml version="1.0" encoding="UTF-8"?>
2. <!DOCTYPE addresslist SYSTEM "addresslist.dtd" [
3.   <!ENTITY oldlist SYSTEM "oldaddresslist.xml">
4. ]>
5. <addresslist>
6.   <employee>
7.     <name>Adam Hall</name>
8.     <address>123 Green Street</address>
9.     <city>Pompano Beach</city>
10.    <state>FL</state>
11.    <zipcode>12345</zipcode>
12.  </employee>
13.  <employee>
14.    <name>Mary Joseph</name>
15.    <address>63 Belle Street</address>
16.    <city>Delray Beach</city>
17.    <state>FL</state>
18.    <zipcode>12210</zipcode>
19.  </employee>
20.  &oldlist;
21. </addresslist>
  
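Example 2.2: The file oldaddresslist.xml, which is referenced as an external entity on line 3 of Example 2.1. Because this file is included as an external parsed entity rather than used as a document on its own, it does not need a single root element, and its first line serves as a text declaration.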
  
<?xml version="1.0" encoding="UTF-8"?>
<employee>
  <name>Nelson Turner</name>
  <address>344 Main Street</address>
  <city>Boca Raton</city>
  <state>FL</state>
  <zipcode>33445</zipcode>
</employee>
<employee>
  <name>Jacky Chan</name>
  <address>2345 Lake Street</address>
  <city>Deerfield Beach</city>
  <state>FL</state>
  <zipcode>12556</zipcode>
</employee>
  
The functions of tags in Example 2.1 are as follows:

Line 1: This is the XML declaration. It looks like a processing instruction and informs the XML processor that the file adheres to XML version 1.0 and uses the character encoding known as UTF-8.

Line 2: This is the start of a multi-line DOCTYPE declaration. This type of tag is used for various purposes and often contains other tags. On this line, the tag specifies that the document is of type “addresslist” and that it follows the data model defined in the DTD located in the file addresslist.dtd. In this case, the document type and the DTD file share the same name, but they need not. Note that the DTD file addresslist.dtd is not presented here; its name is used only to demonstrate the DOCTYPE tag.

Line 3: This is an ENTITY tag, which is contained within the DOCTYPE tag. It declares a reference to an external file; specifically, it defines the name “oldlist” to refer to the file oldaddresslist.xml.

Line 4: This marks the end of the DOCTYPE tag.

Line 5: This tag marks the beginning of the content portion of the document, of type “addresslist”. Note that the name of this tag must be the same as the name specified in the DOCTYPE tag (line 2). This is the document’s root element.

Line 6: This tag marks the beginning of a unit of data called “employee”.

Lines 7 through 11: These lines contain tags and data that define the information belonging to this “employee” unit.

Line 12: This tag marks the end of the “employee” unit of data.

Lines 13 through 19: The tags and data on these lines define a second “employee” unit of data.

Line 20: This line references the “oldlist” entity that was declared in line 3. The leading ampersand (&) and the trailing semicolon (;) mark it as an entity reference. The effect of this line is much the same as if the content of the file oldaddresslist.xml were copied and pasted into the document at this location.

Line 21: This final tag marks the end of the “addresslist” document.
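To see the effect of line 20 in a running parser, here is a minimal Python sketch using the standard library's low-level expat binding (xml.parsers.expat). It is our own illustration, not the MSXML approach this tutorial uses. It assumes a shortened copy of Example 2.1 with the line numbers removed. A hypothetical in-memory dictionary, FILES, stands in for the file oldaddresslist.xml so the example is self-contained. When the parser reaches &oldlist;, it calls our handler, which parses the referenced content in place. The DTD file addresslist.dtd is not needed here, because by default this parser does not fetch the external DTD subset.

```python
import xml.parsers.expat

# Shortened Example 2.1 without the explanatory line numbers.
MAIN = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addresslist SYSTEM "addresslist.dtd" [
<!ENTITY oldlist SYSTEM "oldaddresslist.xml">
]>
<addresslist>
  <employee><name>Adam Hall</name></employee>
  &oldlist;
</addresslist>"""

# Hypothetical stand-in for the file system: system ID -> file content.
FILES = {
    "oldaddresslist.xml": """<?xml version="1.0" encoding="UTF-8"?>
<employee><name>Nelson Turner</name></employee>
<employee><name>Jacky Chan</name></employee>""",
}


def read_addresslist(document, files):
    """Return (DOCTYPE name, root element name, employee names in document order)."""
    result = {"doctype": None, "root": None}
    names = []
    buffers = []  # text collected for the <name> element currently being read

    def doctype(name, system_id, public_id, has_internal_subset):
        result["doctype"] = name

    def start(tag, attrs):
        if result["root"] is None:
            result["root"] = tag
        if tag == "name":
            buffers.append("")

    def chars(data):
        if buffers:
            buffers[-1] += data

    def end(tag):
        if tag == "name":
            names.append(buffers.pop().strip())

    def external_entity(context, base, system_id, public_id):
        content = files.get(system_id)
        if content is None:
            return 1  # unknown system ID: skip it rather than fail
        # A child parser reads the entity at the point of the reference;
        # it reuses the handlers registered on the parent parser.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(content, True)
        return 1  # nonzero tells expat the entity was handled

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = doctype
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.ExternalEntityRefHandler = external_entity
    parser.Parse(document, True)
    return result["doctype"], result["root"], names


print(read_addresslist(MAIN, FILES))
```

The names from oldaddresslist.xml come back right after Adam Hall, just as if the file had been pasted in at the reference. The DOCTYPE name and the root element name also match, as line 5 requires. Many parsers disable loading of external entities by default for security reasons, so check your parser's settings before relying on this feature.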

I hope this part of the tutorial has provided an overview of the “XML World”. XML is deceptively simple in principle, but it is this very simplicity that is the root of its tremendous power and flexibility for representing structured data. In the next part we will deal in detail with data modeling in DTDs, and later move on to data modeling in XDR Schemas.