An Introduction to XML


XML stands for "Extensible Markup Language". It is a new, powerful, platform-independent and content-oriented technology for Internet development. Learn all about it here.

  1. An Introduction to XML
  2. SGML, HTML and XML
  3. Valid and Well-formed XML
  4. Example XML Documents and analysis
  5. Software for XML
  6. Conclusion
By: Suresh Kumar
October 21, 1999




SGML - Standard Generalized Markup Language

SGML is an international standard for describing electronic documents. SGML is a metalanguage: a language used to define other markup languages. SGML describes text documents in terms of their logical structure. It is used primarily for creating, storing and distributing documents, and as a source for conversion into other formats.

SGML has been used by the US military and the American aviation industry for many years. However, it is too complicated for most Web publishers, and this is the reason for the growth of HTML, a much simpler markup language defined using SGML.

HTML - HyperText Markup Language

HTML is a simple application of SGML, simple enough to make Web publishing accessible to anyone. Publishers do not even need to know HTML, since many WYSIWYG editors are available.

What are the problems with HTML?

HTML is too restrictive. Its tags are predefined by the W3C, so authors cannot add tags of their own, and HTML is not powerful enough to describe more complex documents. HTML is also presentation-oriented rather than content-oriented: its tags say how content should look, not what it means. You may ask why the W3C cannot simply introduce more tags to describe content. Extending HTML in this way has led to another problem: browser companies have introduced their own proprietary tags to attract users to their products.

With current HTML, publishers have to make a lot of adjustments to their documents to keep them compatible with the popular browsers. Browsers do not reject bad HTML code, so the Internet is full of documents containing HTML mistakes. Content managers and Internet publishers raised these issues, and the problem escalated to such an extent that the W3C began to look for alternatives. What is the solution?

XML - eXtensible Markup Language

XML can be considered a simplified version of SGML. XML is case-sensitive: <p> is different from <P>, although HTML treats both the same.

XML is extensible - you can create your own elements to meet your publishing needs. You need not wait for the W3C HTML committee to release the next version of HTML with the tags you require.
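
Both points can be seen with Python's standard xml.etree.ElementTree module. This is a minimal sketch; the element names Recipe, Title, p and P are made up for illustration. The parser accepts invented element names without any prior registration, and it treats p and P as two different elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: Recipe and Title are invented element names,
# which XML accepts because it is extensible.
doc = """<Recipe>
  <Title>Bread</Title>
  <p>lowercase p</p>
  <P>uppercase P</P>
</Recipe>"""

root = ET.fromstring(doc)
print(root.tag)                 # Recipe
print(root.find("Title").text)  # Bread

# Element names are case-sensitive: "p" and "P" are different elements.
print([e.text for e in root.findall("p")])  # ['lowercase p']
print([e.text for e in root.findall("P")])  # ['uppercase P']
print(root.find("title"))       # None, because "title" does not match "Title"
```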

XML is structured - XML documents must follow strict structural rules; for example, every element must be closed and elements must be properly nested. A document that breaks these rules is not considered to be XML.
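
Unlike a browser reading HTML, an XML parser refuses a document that breaks these rules. A minimal sketch with Python's xml.etree.ElementTree; is_well_formed is a hypothetical helper name, and it only checks structure, not conformance to any DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if text parses as XML, False on a structure error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Properly closed and nested: accepted.
print(is_well_formed("<b><i>bold italic</i></b>"))  # True
# Improperly nested tags: rejected.
print(is_well_formed("<b><i>bold italic</b></i>"))  # False
# Unclosed element: rejected.
print(is_well_formed("<p>no end tag"))              # False
# <p> closed by </P> is a mismatched tag, since names are case-sensitive.
print(is_well_formed("<p>text</P>"))                # False
```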

XML is much more accessible than SGML. Because XML documents are well structured, programmers can easily write software to process and render them. XML has simple rules for distinguishing document content from markup.
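
For example, because markup always begins with < or &, a program that writes XML must turn those characters in ordinary text into entity references. A minimal sketch using the escape and unescape helpers from Python's standard xml.sax.saxutils module; the sample string is made up. escape always replaces &, < and >, and the extra mapping passed to it also replaces the two quote characters.

```python
from xml.sax.saxutils import escape, unescape

text = 'Fish & Chips <"Tom\'s" special>'

# & is replaced first inside escape(), so the new entities are not re-escaped.
escaped = escape(text, {"'": "&apos;", '"': "&quot;"})
print(escaped)
# Fish &amp; Chips &lt;&quot;Tom&apos;s&quot; special&gt;

# unescape() reverses the same substitutions (it handles &amp; last).
print(unescape(escaped, {"&apos;": "'", "&quot;": '"'}) == text)  # True
```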

XML markup begins with either a less-than sign (<) or an ampersand (&). XML also uses the greater-than sign (>), the single quote (') and the double quote (") in markup. To use these characters as literal text, write the corresponding predefined entity: &amp; for &, &lt; for <, &gt; for >, &apos; for ' and &quot; for ". The < and & characters must always be escaped in text; the others need escaping only in certain contexts, such as a quote inside an attribute value delimited by the same quote.

DTD - Document Type Definition

A DTD (Document Type Definition) can be considered the grammar of a markup language: a set of rules that specifies how XML markup may be used. It declares the elements, each element's attributes and their allowed values, and which elements may be contained in others. A DTD can also define entities.
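
As an illustration of entity definitions, here is a minimal sketch using Python's standard xml.etree.ElementTree module; the note element and the ext entity are hypothetical. The declaration sits in an internal DTD subset (inside the square brackets of the DOCTYPE) rather than in a separate DTD file, and the parser substitutes the entity's text wherever &ext; appears.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose internal DTD subset declares a general
# entity named "ext".
doc = """<!DOCTYPE note [
  <!ENTITY ext "Extensible Markup Language">
]>
<note>XML stands for &ext;.</note>"""

# The parser replaces the entity reference &ext; with its declared text.
root = ET.fromstring(doc)
print(root.text)  # XML stands for Extensible Markup Language.
```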

Consider this example DTD for an email message:

<!ELEMENT Mail (From, To, Cc?, Date?, Subject, Body)>
<!ELEMENT From (#PCDATA)>
<!ELEMENT To (#PCDATA)>
<!ELEMENT Cc (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT Subject (#PCDATA)>
<!ELEMENT Body (#PCDATA | P | Br)*>
<!ELEMENT P (#PCDATA | Br)*>
<!ATTLIST P align (left | right | justify) "left">
<!ELEMENT Br EMPTY>

An XML document conforming to the Mail DTD has exactly one From, one To, an optional Cc, an optional Date, one Subject and one Body, in that order.
  • A From element has only text.
  • A To element has only text.
  • A Cc element has only text.
  • A Date element has only text.
  • A Subject element has only text.
  • A Body element can contain text mixed with any number of P and Br elements.
  • A P element can contain text mixed with any number of Br elements.
  • The P element has an align attribute whose possible values are left, right or justify. Its default value is left.
  • The Br element is empty.
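
A content model such as (From, To, Cc?, Date?, Subject, Body) behaves like a regular expression over the sequence of child element names, which is one way a validating parser can check it. The sketch below hand-codes that check for the Mail element plus the align rule, using Python's re and xml.etree.ElementTree. The function check_mail and the sample messages are hypothetical, and the sketch covers only these two rules; it is not a full DTD validator.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Content model of Mail from the DTD: (From, To, Cc?, Date?, Subject, Body).
# Child names are joined with a trailing space each, so the model becomes a
# regular expression over that string.
MAIL_MODEL = re.compile(r"From To (Cc )?(Date )?Subject Body ")
# Allowed values of the align attribute declared for P.
ALIGN_VALUES = {"left", "right", "justify"}


def check_mail(xml_text: str) -> List[str]:
    """Hypothetical helper: return the rule violations found (empty if none)."""
    root = ET.fromstring(xml_text)
    errors = []
    if root.tag != "Mail":
        errors.append(f"root element is {root.tag}, expected Mail")
    # Only element children count; whitespace between them is ignored.
    children = "".join(child.tag + " " for child in root)
    if not MAIL_MODEL.fullmatch(children):
        errors.append(f"Mail children out of order or missing: {children.strip()}")
    # Check every P element in the document; an absent align means "left".
    for p in root.iter("P"):
        align = p.get("align", "left")
        if align not in ALIGN_VALUES:
            errors.append(f"invalid align value: {align}")
    return errors


good = """<Mail>
  <From>alice@example.com</From>
  <To>bob@example.com</To>
  <Subject>Hello</Subject>
  <Body>Hi Bob,<P align="justify">See you soon.<Br/></P></Body>
</Mail>"""

bad = """<Mail>
  <To>bob@example.com</To>
  <From>alice@example.com</From>
  <Subject>Hello</Subject>
  <Body><P align="center">Hi</P></Body>
</Mail>"""

print(check_mail(good))  # []
print(check_mail(bad))   # reports the wrong order and the invalid align value
```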

A validating XML parser (discussed in the software section) uses the DTD to check the document as it parses it. Publishing a DTD lets others create and process documents that follow the same rules. The XML document itself should tell XML processing programs where to find its DTD.

A <!DOCTYPE> declaration at the start of the XML file tells the program where the DTD is located. For example:

<!DOCTYPE Mail SYSTEM "http://infowest.com/DTDS/mail.dtd">
<Mail>
.. .. ..
</Mail>
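
An application can read this declaration to find out which DTD a document claims to follow. A minimal sketch with Python's standard xml.dom.minidom module; the Mail content is made up. minidom only records the declaration: it does not download the DTD or validate the document against it.

```python
from xml.dom import minidom

# Hypothetical sample message using the DOCTYPE declaration shown above.
doc_text = """<!DOCTYPE Mail SYSTEM "http://infowest.com/DTDS/mail.dtd">
<Mail><From>alice</From><To>bob</To><Subject>Hi</Subject><Body>Hello</Body></Mail>"""

doc = minidom.parseString(doc_text)
print(doc.doctype.name)             # Mail
print(doc.doctype.systemId)         # http://infowest.com/DTDS/mail.dtd
print(doc.documentElement.tagName)  # Mail
```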
