Integrating XML with J2EE

In this article, Martin Bond discusses XML and its associated APIs and standards, and how XML can be used to create flexible structured data that is inherently portable. This excerpt is from chapter (Day) 16 of Teach Yourself J2EE in 21 Days, second edition, by Martin Bond et al. (Sams, ISBN: 0672325586).

Today, we take a bit of a departure from J2EE and its emphasis on programming elements to look at what is fast becoming the lingua franca of the Internet: the Extensible Markup Language (XML).

Throughout the book so far, you have seen many ways in which XML is used within J2EE applications to describe the structure and layout of the application. Today and tomorrow, you will study XML and its associated APIs and standards to gain a fuller understanding of how XML can be used to exchange data between different components in your applications.

Today, you will learn about

  • How XML has evolved from the need for platform-independent data exchange

  • The relationship between XML and both Standard Generalized Markup Language (SGML) and Hypertext Markup Language (HTML)

  • How to create well-formed and valid XML documents

  • The Java API for XML Processing (JAXP)

  • How to process XML documents with the Simple API for XML (SAX) and the Document Object Model (DOM)

  • How XML is used in the J2EE platform

This book is about J2EE, of which XML is just a component. To learn more about XML, take a look at Sams Teach Yourself XML in 21 Days, which covers everything you need to know about XML and related standards.

The Drive to Platform-Independent Data Exchange

Applications essentially consist of two parts—functionality described by the code and the data that is manipulated by the code. The in-memory storage and management of data is a key part of any programming language and environment. Within a single application, the programmer is free to decide how the data is stored and represented. Problems only start when the application must exchange data with another application.

One solution is to use an intermediary storage medium, such as a database, and standard tools, such as SQL and JDBC, to gain access to the data in such databases.

But what if the data is to be exchanged directly between two applications, or the applications cannot access the same database? In this case, the data must be encoded in some particular format as it is produced, so that its structure and contents can be understood when it is consumed. This has often resulted in the creation of application-specific data formats, such as binary data files (.dat files) or text-based configuration files (.ini, .rc, .conf, and so on), in which applications store their information.

Similarly, when exchanging information between applications, purpose-specific formats have arisen to address particular needs. Again, these formats can be text-based, such as HTML for encoding how to display the encapsulated data, or binary, such as those used for sending remote procedure calls. In either case, there tends to be a lack of flexibility in the data representation, causing problems when versions change or when data needs to be exchanged between disparate applications, frequently from different vendors.

XML was developed to address these issues. Because XML is written in plain text, and shares similarities with HTML but uses self-describing elements, XML provides a data encoding format that is

  • Generic

  • Simple

  • Flexible

  • Extensible

  • Portable

  • Human readable

  • And perhaps most importantly, license-free

This chapter is from Teach Yourself J2EE in 21 Days, second edition, by Martin Bond et. al. (Sams, 2004, ISBN: 0-672-32558-6). Check it out at your favorite bookstore today. Buy this book now.

Benefits and Characteristics of XML

XML offers a method of putting structured data in a text file. Structured data is data that conforms to a particular format; examples are spreadsheets, address books, configuration parameters, and financial transactions. While being structured, XML is also readable by humans as well as software; this means that you do not need the originating software to access the data.

Origins of XML

XML was created by the World Wide Web Consortium (W3C), which now promotes and controls the standard. The W3C also promotes and develops a number of other interoperable technologies. The latest XML standard, along with lots of useful information and tools, can be obtained from the W3C Web site (http://www.w3.org).

XML is a set of rules for designing text formats that describe the structure of your data. XML is not a programming language, so it is therefore easy for non-programmers to learn and use. In devising XML, the originators had a set of design goals, which were as follows:

  • XML should be straightforward to use over the Internet.

  • XML should support a wide variety of applications.

  • XML should be compatible with the Standard Generalized Markup Language.

  • It must be easy to write programs that process XML documents.

  • The number of optional features in XML should be kept to the absolute minimum—ideally, zero.

  • XML documents should be human-legible and reasonably clear.

  • XML documents should be easy to create.

  • Terseness in XML was of minimal importance.

XML is based on the Standard Generalized Markup Language (SGML). SGML is a powerful but complex meta-language that is used to describe languages for electronic document exchange, document management, and document publishing. HTML (probably the best known markup language) is an example of an SGML application. SGML provides a rich and powerful syntax, but its complexity has restricted its widespread use and it is used primarily for technical documentation.

XML was conceived as a means of retaining the power and flexibility of SGML while losing most of its complexity. Although a subset of SGML, XML manages to preserve the best parts of SGML and all of its commonly used features while being more regularly structured and easy to use.

XML is still a relatively young technology but it is fast making a significant impact. Already there is an important XML application—XHTML, the successor to HTML, which is now supported by most of the popular Web browsers.


Structure and Syntax of XML

In this section, you will explore the syntax of XML and understand what is meant by a well-formed document.


Note - You will often encounter the terms “well-formed” and “valid” applied to XML documents. These are not the same. A well-formed document is structurally and syntactically correct: the XML conforms to the XML language definition (every start tag has a correctly nested corresponding end tag, all attribute values are quoted, only valid characters have been used, and so on). A valid document is also semantically correct: the XML conforms to some external definition stored in an XML Schema or Document Type Definition. A document can be well-formed without being valid.


The best way to become familiar with the syntax of XML is to write an XML document. To check your XML, you will need access to an XML-aware browser or another XML validator. The XML-aware browser or XML validator will allow you to ensure that the XML is well-formed. If the XML references an XML Schema or Document Type Definition (more on these later) the validator can also check that the XML is valid.

An XML browser includes an XML parser. To get the browser to check the syntax and structure of your XML document, simply use the browser to open the XML file. Well-formed XML will be displayed in a structured way (with indentation). If the XML is not well-formed, an appropriate error message will be given.


Tip - An easy way to validate XML is to use an XML-aware browser. The latest versions of most popular browsers are now XML-aware. You can download validating XML parsers from Sun Microsystems at http://www.sun.com/software/xml/developers/multischema/ and the Microsoft Developers Network at http://msdn.microsoft.com/downloads/samples/internet/xml/xml_validator/. There are numerous other XML validators and XML editors available from the Internet.


HTML and XML

At first glance, XML looks very similar to HTML. An XML document consists of elements that have a start and end tag, just like HTML. In fact, Listing 16.1 is both well-formed HTML and XML.

Listing 16.1 Example XML and HTML

<html>
 <head><title>Web Page</title></head>
 <body>
 <h1>Teach Yourself J2EE in 21 Days</h1>
 <p>Now you have seen the web page – buy the book</p>
 </body>
</html> 

An XML document is only well-formed if there are no syntax errors. If you are familiar with HTML, you will be aware that many browsers are lenient with poorly formed HTML documents. Missing end tags and even missing sections will often be ignored and therefore unnoticed until the page is displayed in a more rigorous browser, and fails to display correctly.

XML differs from HTML in that a missing end tag will always cause an error.
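You can see this strictness from Java as well as from a browser. The following is a minimal sketch, not a listing from the book: it assumes the JAXP parser bundled with the Java platform, and the class name WellFormedCheck and method isWellFormed are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class; the name is not from the book.
class WellFormedCheck {

    // Returns true if the text parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Install our own handler so parse errors are thrown, not printed.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser reported a syntax error
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p>Buy the book</p>")); // end tag present
        System.out.println(isWellFormed("<p>Buy the book"));     // end tag missing
    }
}
```

The first call reports true and the second false, because the parser treats the missing end tag as a fatal error rather than guessing what was intended.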

We will now look at XML syntax so you can understand what is going on.


Structure of an XML Document

The outermost element in an XML document is called the root element. Each XML document must have one and only one root element, often called the top level element. If there is more than one root element, an error will be generated.

The root element can be preceded by a prolog that contains XML declarations. Comments can be inserted at any point in an XML document. The prolog is optional, but it is good practice to include a prolog with all XML documents giving the XML version being used (all full XML listings in this chapter will include a prolog). A minimal XML document must contain at least one element.

Declarations

There are two types of XML declaration. XML documents may, and should, begin with an XML declaration, which specifies the version of XML being used. The following is an example of an XML declaration:

<?xml version="1.0"?>

The version attribute of the XML declaration tells the parser that this document conforms to XML version 1.0 (W3C recommendation 10-February-1998). As with all declarations, the XML declaration, if present, should always be placed in the prolog.

The other type of declaration is called an XML document type declaration and is used to validate the XML. This will be discussed in more detail in the section titled “Creating Valid XML” later in this chapter.

Elements

An element must have a start tag and an end tag enclosed in < and > characters. The end tag is the same as the start tag except that it is preceded with a / character. The tags are case sensitive, and the names used for the start and end tags must be exactly the same, for example the tags <Start>…</start> do not make up an element, whereas <Start>…</Start> do (both tags are letter case consistent).

An element name can only contain letters, digits, underscores _, colons :, periods ., and hyphens -. An element name must begin with a letter or underscore.
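The naming rule just described can be approximated with a regular expression. This is a simplified sketch of the rule as stated in this section only; the full XML specification defines name characters more precisely, and the class name ElementNames is a hypothetical one used for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper that encodes the simplified naming rule from the text.
class ElementNames {

    // First character: a letter or underscore.
    // Remaining characters: letters, digits, underscores, colons, periods, hyphens.
    private static final Pattern NAME =
        Pattern.compile("[\\p{L}_][\\p{L}\\p{Nd}_:.\\-]*");

    static boolean isValidName(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("jobSummary")); // letters only
        System.out.println(isValidName("_job.v2"));    // underscore start, period, digit
        System.out.println(isValidName("2ndJob"));     // starts with a digit
    }
}
```

A check like this is only a quick sanity test; an XML parser remains the authority on whether a name is acceptable.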

An element may also optionally have attributes and a body. All the elements in Listing 16.2 are well-formed XML elements. All attribute values must be quoted; both single and double quotes are permitted.

Listing 16.2 Valid XML Elements

<start>this is the beginning</start>
<date day="16th" Month="February">My Birthday</date>
<today yesterday="15th" Month="February"></today>
<box color="red"/>
<head></head>
<end/>

Table 16.1 describes each of these elements.

Table 16.1 XML Elements

Element Type                          XML Element Includes
<tag>text</tag>                       A start tag, body, and end tag
<tag attribute="text"> text </tag>    An attribute and a body
<tag attribute="text"> </tag>         An attribute but no body
<tag attribute="text"/>               Short form of attribute but no body
<tag></tag>                           A start tag and end tag but no body
<tag/>                                Shorthand for the previous tag


Although the body of an element may contain nearly all the printable Unicode characters, certain characters are not allowed in certain places. To avoid confusion (to human readers as well as parsers) the characters in Table 16.2 should not be used in tag or attribute values. If these characters are required in the body of an element, the appropriate symbolic string in Table 16.2 can be used to represent them.

Table 16.2 Special XML Characters

Character    Name                   Symbolic Form
&            Ampersand              &amp;
<            Open angle bracket     &lt;
>            Close angle bracket    &gt;
'            Single quotes          &apos;
"            Double quotes          &quot;
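When you generate XML from Java code, the substitutions in Table 16.2 are easy to apply programmatically. The following is a minimal sketch; the class name XmlEscape and the method escape are hypothetical names used for illustration and are not part of any J2EE API.

```java
// Hypothetical helper that applies the substitutions from Table 16.2.
class XmlEscape {

    // Replaces each special character with its symbolic form.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Fish &amp; Chips &lt;hot&gt;
        System.out.println(escape("Fish & Chips <hot>"));
    }
}
```

Because the input is processed one character at a time, an ampersand that is produced by an earlier substitution is never escaped a second time.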


The elements in an XML document have a tree-like hierarchy, with elements containing other elements and data. Elements must nest—that is, an end tag must close the textually preceding start tag. This means that

<b><i>bold and italic</i></b>

is correct, while

<b><i>bold and italic</b></i>

is not.


Well-formed XML Documents

An XML document is said to be well-formed if there is exactly one root element, the root and every sub-element have delimiting start and end tags that are properly nested within each other, and all attribute values are quoted.

The following is a simple XML document with an XML declaration followed by a number of elements. The structure represents a list of jobs that could be used in the Agency case study example. In Listing 16.3, the <jobSummary> tag is the root tag followed by a number of jobs.

Listing 16.3 Example jobSummary XML

<?xml version="1.0"?>
<jobSummary>
 <job>
  <customer>winston</customer>
  <reference>Cigar Trimmer</reference>
  <location>London</location>
  <description>Must like to talk and smoke</description>
  <skill>Cigar maker</skill>
  <skill>Critic</skill>
 </job>
 <job>
  <customer>george</customer>
  <reference>Tree pruner</reference>
  <location>Washington</location>
  <description>Must be honest</description>
  <skill>Tree surgeon</skill>
 </job>
</jobSummary>

Attributes

Attributes are name/value pairs that are associated with elements. There can be any number of attributes, and an element’s attributes all appear inside the start tag. The names of attributes are case sensitive and are limited to the following characters: letters, digits, underscores _, periods ., and hyphens -. An attribute name must begin with a letter or underscore.

The value of an attribute is a text string delimited by quotes; either single or double quotes may be used. Unlike HTML, all attribute values in an XML document must be enclosed in quotes. Listing 16.4 shows the jobSummary XML document re-written to use attributes to hold some of the data.

Listing 16.4 JobSummary.xml XML with Attributes

<?xml version="1.0"?>
<jobSummary>
 <job customer="winston" reference="Cigar Trimmer">
  <location>London</location>
  <description>Must like to talk and smoke</description>
  <skill>Cigar maker</skill>
  <skill>Critic</skill>
 </job>
 <job customer="george" reference="Tree pruner">
  <location>Washington</location>
  <description>Must be honest</description>
  <skill>Tree surgeon</skill>
 </job>
</jobSummary>
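To give a feel for how an application sees attributes, the following sketch reads the customer attribute of each job using the DOM API, which is covered later in this chapter. It is an illustration only: the class name JobAttributes is hypothetical, and the embedded XML is a cut-down version of the jobSummary document that uses single quotes (which XML permits) to avoid escaping inside Java strings.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not a listing from the book.
class JobAttributes {

    // Cut-down jobSummary document with attributes on each job element.
    static final String XML =
        "<?xml version='1.0'?>"
        + "<jobSummary>"
        + "<job customer='winston' reference='Cigar Trimmer'>"
        + "<location>London</location></job>"
        + "<job customer='george' reference='Tree pruner'>"
        + "<location>Washington</location></job>"
        + "</jobSummary>";

    // Returns the customer attribute of every job element, in document order.
    static List<String> customers(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList jobs = doc.getElementsByTagName("job");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < jobs.getLength(); i++) {
                Element job = (Element) jobs.item(i);
                result.add(job.getAttribute("customer"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(customers(XML)); // prints [winston, george]
    }
}
```

Had customer been a nested element instead, the code would need to locate the child element and read its text, which is one practical difference between the two styles.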

The choice of using nested elements or attributes is a contentious area. There are many schools of thought and it usually ends up being a matter of personal taste or corporate standards. Prior to the introduction of XML Schemas (see section “XML Schemas”) there were advantages to using attributes when the values were constrained in some way, such as values that are numbers or specific patterns. XML Schemas now allow element values to be constrained in the same way as attribute values.

Comments

XML comments are introduced by <!-- and ended with -->, for example

<!-- this is a comment -->

Comments can appear anywhere in a document except within the tags, for example,

<item quantity="1lb">Cream cheese <!-- this is a comment --></item>

is acceptable, whereas the following is not

<item <!-- this is a comment --> quantity="1lb">Cream cheese </item>

Note - As with commenting code, the comments you add to your XML should be factually correct, useful, and to the point. They should be used to make the XML document easier to read and comprehend.


Any character is allowed in a comment, including those that cannot be used in elements and tags, but to maintain compatibility with SGML, the combination of two hyphens together (--) cannot be used within the text of a comment.

Comments should be used to annotate the XML, but you should be aware that the parser might remove the comments, so they may not always be accessible to a receiving application.


Namespaces

When designers define an XML structure for some data, they are free to choose tag names that are appropriate for the data. Consequently, there is nothing to stop two individuals from using the same tag name for different purposes or in different ways. Consider the job agency that deals with two contract companies, each of which uses a different form of job description (such as those in Listings 16.3 and 16.4). How can an application differentiate between these different types of job descriptions?

The answer is to use namespaces. XML provides namespaces that can be used to impose a hierarchical structure on XML tag names in the same way that Java packages provide a naming hierarchy for Java classes. You can define a unique namespace with which you can qualify your tags to avoid them being confused with those from other XML authors.

An attribute called xmlns (XML Namespace) is added to an element tag in a document and is used to define the namespace. For example, the second line in Listing 16.5 indicates that the tags for the whole of this document are scoped within the agency namespace.

Listing 16.5 XML Document with Namespace

<?xml version="1.0"?>
<jobSummary xmlns="agency">
 <job customer="winston" reference="Cigar Trimmer">
  <location>London</location>
  <description>Must like to talk and smoke</description>
  <skill>Cigar maker</skill>
  <skill>Critic</skill>
 </job>
 <job customer="george" reference="Tree pruner">
  <location>Washington</location>
  <description>Must be honest</description>
  <skill>Tree surgeon</skill>
 </job>
</jobSummary>

The xmlns attribute can be added to any element in the document to enable scoping of elements, and multiple namespaces can be defined in the same document using a prefix. For example, Listing 16.6 has two namespaces—ad and be. All the tags have been prefixed with the appropriate namespace, and now two different forms of the job tag (one with attributes and one without) can coexist in the same file.

Listing 16.6 XML Document with Namespaces

<?xml version="1.0"?>
<jobSummary xmlns:ad="ADAgency" xmlns:be="BEAgency">
 <ad:job customer="winston" reference="Cigar Trimmer">
  <ad:location>London</ad:location>
  <ad:description>Must like to talk and smoke</ad:description>
  <ad:skill>Cigar maker</ad:skill>
  <ad:skill>Critic</ad:skill>
 </ad:job>
 <be:job>
  <be:customer>george</be:customer>
  <be:reference>Tree pruner</be:reference>
  <be:location>Washington</be:location>
  <be:description>Must be honest</be:description>
  <be:skill>Tree surgeon</be:skill>
 </be:job>
</jobSummary>
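An application can use the namespace, rather than the prefix, to tell the two kinds of job element apart. The following sketch assumes the JAXP DOM parser with namespace awareness switched on; the class name NamespaceDemo is hypothetical and the embedded document is a shortened form of the two-namespace example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class; not a listing from the book.
class NamespaceDemo {

    // Shortened two-namespace document (single quotes avoid Java escaping).
    static final String XML =
        "<jobSummary xmlns:ad='ADAgency' xmlns:be='BEAgency'>"
        + "<ad:job customer='winston' reference='Cigar Trimmer'/>"
        + "<be:job><be:customer>george</be:customer></be:job>"
        + "</jobSummary>";

    // Counts the job elements that belong to the given namespace.
    static int countJobs(String xml, String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(namespaceUri, "job").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countJobs(XML, "ADAgency")); // prints 1
        System.out.println(countJobs(XML, "BEAgency")); // prints 1
    }
}
```

The lookup uses the namespace name (ADAgency or BEAgency), so it keeps working even if another author chooses different prefixes for the same namespaces.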

Creating Valid XML

As you have seen, XML validators recognize well-formed XML, and this is very useful for picking up syntax errors in your document. Unfortunately, a well-formed, syntactically correct XML document may still have semantic errors in it. For example, a job in Listing 16.4 with no location or skills does not make sense; without these elements the XML document is still well-formed, but it is not valid.

What is required is a set of rules or constraints that define a valid structure for an XML document. There are two common methods for specifying XML rules—the Document Type Definition (DTD) and XML Schemas.

Document Type Definitions

A DTD provides a template that defines the occurrence and arrangement of elements and attributes in an XML document. Using a DTD, you can define

  • Element ordering and hierarchy

  • Which attributes are associated with an element

  • Default values and enumeration values for attributes

  • Any entity references used in the document (internal constants, external files, and parameters)


Note - Entity references are covered in Appendix A, “An Overview of XML.”


DTDs originated with SGML and have some disadvantages when compared with XML Schemas, which were developed explicitly for XML. One of these disadvantages is that a DTD is not written in XML, which means you have to learn another syntax to define a DTD. Another disadvantage is that DTDs are not as comprehensive as XML Schemas and therefore cannot constrain an XML document as tightly as an XML Schema.

DTD rules can be included in the XML document as document type declarations, or they can be stored in an external document. The syntax is the same in both cases.

If a DTD is being used, the XML document must include a DOCTYPE declaration, which is followed by the name of the root element for the XML document. If an external DTD is being used, the declaration also includes the word SYSTEM followed by a system identifier (the URI that identifies the location of the DTD file). For example

<!DOCTYPE jobSummary SYSTEM "jobSummary.dtd">

specifies that the root element for this XML document is jobSummary and the remainder of the DTD rules are in the file called jobSummary.dtd in the same directory.

An external identifier can also include a public identifier. The public identifier precedes the system identifier and is denoted by the word PUBLIC. An XML processor can use the public identifier to try to generate an alternative URI. If the document is unavailable by this method, the system identifier will be used.

<!DOCTYPE web-app
 PUBLIC '-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN'
 'http://java.sun.com/dtd/web-app_2_3.dtd'>

Note - DOCTYPE, SYSTEM and PUBLIC must appear in capitals to be recognized.



Element Type Declarations

The DTD defines every element in the XML document with element type declarations. Each element type declaration takes the following form:

<!ELEMENT name ( content ) >

For example, for the jobSummary XML document in Listing 16.4, the jobSummary root element is defined as

<!ELEMENT jobSummary ( job* )>

The * sign indicates that the jobSummary element may consist of zero or more job elements. There are other symbols used to designate rules for combining elements and these are listed in Table 16.3.

Table 16.3 Occurrence Characters Used in DTD Definitions

Character    Meaning
*            Zero or more (not required)
+            One or more (at least one required)
?            Element is optional (if present can only appear once)
|            Alternate elements
()           Group of elements


The following defines an XML job element that must include one location, an optional description, and at least one skill:

<!ELEMENT job (location, description?, skill+)>

Defining the Element Content

Elements can contain other elements, or content, or have elements and content. The jobSummary element, in Listing 16.4, contains other elements but no text body; whereas the location element has a text body but does not contain any elements.

To define an element that has a text body, use the reference #PCDATA (Parsed Character DATA). For example, the location element in Listing 16.4 is defined by

<!ELEMENT location (#PCDATA)>

An element can also have no content (the <br> tag in HTML is such an example). This tag would be defined with the EMPTY keyword as

<!ELEMENT br EMPTY>

You will also see elements defined with contents of ANY. The ANY keyword denotes that the element can contain all possible elements, as well as PCDATA. The use of ANY should be avoided. If your data is so unstructured that it cannot be defined explicitly, there probably is no point in creating a DTD in the first place.

Defining Attributes

In Listing 16.4, the job element has two attributes—customer and reference. Attributes are defined in an ATTLIST that has the following form:

<!ATTLIST element attribute type default-value>

The element is the name of the element and attribute is the name of the attribute. The type defines the kind of attribute that is expected. A type is either one of the defined constants described in Table 16.4, or it is an enumerated type where the permitted values are given in a bracketed list.

Table 16.4 DTD Attribute Types

Type        Attribute Is a…
CDATA       Character string.
NMTOKEN     Valid XML name.
NMTOKENS    Multiple XML names.
ID          Unique identifier.
IDREF       An element found elsewhere in the document. The value for IDREF must match the ID of another element.
ENTITY      External binary data file (such as a gif image).
ENTITIES    Multiple external binary files.
NOTATION    Helper program.


The ATTLIST default-value component defines a value that will be used if one is not supplied. For example

<!ATTLIST button visible (true | false) "true">

defines that the element button has an attribute called visible that can be either true or false. If the attribute is not supplied, because a default value is supplied, it will be set to be true.
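The effect of a default value can be observed from Java. The sketch below is an illustration under stated assumptions: it places the button declarations in an internal DTD inside the document itself, relies on the parser bundled with the Java platform to supply the declared default, and uses the hypothetical class name DefaultAttribute.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class; not a listing from the book.
class DefaultAttribute {

    // Internal DTD declaring the button element and its visible attribute.
    static final String DTD =
        "<!DOCTYPE button ["
        + "<!ELEMENT button EMPTY>"
        + "<!ATTLIST button visible (true | false) \"true\">"
        + "]>";

    // Parses the DTD plus the given root element and returns its visible attribute.
    static String visibleOf(String rootElement) {
        try {
            String xml = "<?xml version='1.0'?>" + DTD + rootElement;
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("visible");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(visibleOf("<button/>"));                 // default applied
        System.out.println(visibleOf("<button visible='false'/>")); // explicit value kept
    }
}
```

The first call returns true even though the document never mentions the attribute, because the parser fills in the default from the ATTLIST declaration.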

The default-value item can also be used to specify that the attribute is #REQUIRED, #FIXED, or #IMPLIED. The meaning of these values is given in Table 16.5.

Table 16.5 DTD Attribute Default Values

Default Value    Meaning
#REQUIRED        Attribute must be provided.
#FIXED           Effectively a constant declaration. The attribute must be set to the given value or the XML is not valid.
#IMPLIED         The attribute is optional and the processing application is allowed to use any appropriate value if required.


Example DTD

Listing 16.7 is the DTD for the jobSummary XML document. Create the DTD in a file called jobSummary.dtd in the same directory as your jobSummary XML document.

Listing 16.7 DTD for jobSummary XML

<!ELEMENT jobSummary (job*)>
<!ELEMENT job (location, description, skill+)>
<!ATTLIST job customer CDATA #REQUIRED>
<!ATTLIST job reference CDATA #REQUIRED>
<!ELEMENT location (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT skill (#PCDATA)>

Don’t forget to add the following line to the jobSummary XML at line 2 (following the XML declaration):

<!DOCTYPE jobSummary SYSTEM "jobSummary.dtd">

View the jobSummary.xml document in your XML browser or other XML validator.

If the browser cannot find the DTD, it will generate an error. Edit jobSummary.xml, remove the customer attribute, and check that your XML validator generates an appropriate error (such as “Required attribute ‘customer’ is missing”).
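The same experiment can be run from Java with a validating JAXP parser. This is a sketch only: to stay self-contained it embeds the jobSummary DTD rules as an internal DTD rather than reading jobSummary.dtd from disk, and the class name DtdValidation and its method are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical example class; not a listing from the book.
class DtdValidation {

    // The jobSummary DTD rules, held as an internal DTD for this sketch.
    static final String DTD =
        "<!DOCTYPE jobSummary ["
        + "<!ELEMENT jobSummary (job*)>"
        + "<!ELEMENT job (location, description, skill+)>"
        + "<!ATTLIST job customer CDATA #REQUIRED>"
        + "<!ATTLIST job reference CDATA #REQUIRED>"
        + "<!ELEMENT location (#PCDATA)>"
        + "<!ELEMENT description (#PCDATA)>"
        + "<!ELEMENT skill (#PCDATA)>"
        + "]>";

    // Validates the given root element against the DTD; returns the error messages.
    static List<String> validate(String rootElement) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            String xml = "<?xml version='1.0'?>" + DTD + rootElement;
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            errors.add(e.getMessage()); // not even well-formed
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String body = "<location>London</location><description>Talker</description>"
            + "<skill>Critic</skill>";
        System.out.println(validate("<jobSummary><job customer='winston' "
            + "reference='Cigar Trimmer'>" + body + "</job></jobSummary>"));
        // customer attribute removed: the validator should report an error
        System.out.println(validate("<jobSummary><job reference='Cigar Trimmer'>"
            + body + "</job></jobSummary>"));
    }
}
```

Validity problems are reported through the error callback rather than as exceptions, which is why the sketch collects them in a list; the exact wording of the messages depends on the parser.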


XML Schemas

As has been already stated, DTDs have some limitations:

  • A DTD cannot define type information other than characters.

  • DTDs were not designed to support namespaces and, although it is possible to add namespaces to a DTD, how to do so is beyond the scope of this book.

  • DTDs are not easily extended.

  • You can only have one DTD per document, so you cannot have different definitions of an element in a single document and have them validated with a DTD.

  • The syntax for DTDs is not XML. Tools and developers must understand the DTD syntax as well as XML.

To address these issues, the XML Schema structure definition mechanism was developed by the W3C to fulfill the role of DTDs while addressing the previously listed limitations. XML Schemas are XML documents.

The XML Schema standard is split into two parts:

  • Specifying the structure and constraints on an XML document

  • A way of defining data types, including a set of pre-defined types

Because it is a more powerful and flexible mechanism than DTDs, the syntax for defining an XML schema is slightly more involved. An example of an XML schema for the jobSummary XML shown in Listing 16.4 can be seen in Listing 16.8.


Tip - The World Wide Web Consortium Web site provides access to a number of XML schema tools, including XML schema browsers and validators. These tools can be found at http://www.w3.org/XML/Schema.


Listing 16.8 XML Schema for Job Agency JobSummary XML Document

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified">

 <xsd:element name="jobSummary">
  <xsd:complexType>
   <xsd:sequence>
    <xsd:element name="job" type="jobType"
                 minOccurs="0" maxOccurs="unbounded"/>
   </xsd:sequence>
  </xsd:complexType>
 </xsd:element>

 <xsd:complexType name="jobType">
  <xsd:sequence>
   <xsd:element name="location" type="xsd:string"/>
   <xsd:element name="description" type="xsd:string"/>
   <xsd:element name="skill" type="xsd:string"
                minOccurs="1" maxOccurs="unbounded"/>
  </xsd:sequence>
  <xsd:attribute name="customer" type="xsd:string" use="required"/>
  <xsd:attribute name="reference" type="xsd:string" use="required"/>
 </xsd:complexType>
</xsd:schema>

The first thing to notice is that this schema exists within a namespace as defined on the second line. The string xsd is used by convention for a schema namespace, but any prefix can be used.

Schema Type Definitions and Element and Attribute Declarations

Elements that have sub-elements and/or attributes are defined as complex types. In addition to complex types, there are a number of built-in simple types. Examples of a few simple types are

  • string: Any combination of characters

  • integer: Whole number

  • float: Floating point number

  • boolean: true/false or 1/0

  • date: yyyy-mm-dd

A complex type element (one with attributes or sub-elements) has to be defined in the schema and will typically contain a set of element declarations, element references, and attribute declarations. Listing 16.8 contains the definition for the job tag complex type, which contains three elements (location, description, and skill) and two attributes (customer and reference).

In a schema, like a DTD, elements can be made optional or required. The job element in Listing 16.8 is optional because the value of the minOccurs attribute is 0. In general, an element is required to appear when the value of minOccurs is 1 or more. Similarly, the maximum number of times an element can appear is determined by the value of maxOccurs. This value can be a positive integer or the term unbounded to indicate there is no maximum number of occurrences. The default value for both the minOccurs and the maxOccurs attributes is 1. If you do not specify the number of occurrences, the element must be present and must occur only once.

Element attributes can be declared with a use attribute to indicate whether the element attribute is required, optional, or even prohibited.
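The occurrence rules can be checked from Java. The following sketch assumes a Java release that includes the javax.xml.validation package, which was added to the platform after J2SE 1.4, so it may not match the environment used elsewhere in this book. The class name SchemaValidation is hypothetical, and the schema string is the jobSummary schema written with single quotes to avoid escaping.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical example class; not a listing from the book.
class SchemaValidation {

    // The jobSummary schema, condensed onto fewer lines.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
        + " elementFormDefault='qualified'>"
        + "<xsd:element name='jobSummary'><xsd:complexType><xsd:sequence>"
        + "<xsd:element name='job' type='jobType' minOccurs='0' maxOccurs='unbounded'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "<xsd:complexType name='jobType'><xsd:sequence>"
        + "<xsd:element name='location' type='xsd:string'/>"
        + "<xsd:element name='description' type='xsd:string'/>"
        + "<xsd:element name='skill' type='xsd:string' minOccurs='1' maxOccurs='unbounded'/>"
        + "</xsd:sequence>"
        + "<xsd:attribute name='customer' type='xsd:string' use='required'/>"
        + "<xsd:attribute name='reference' type='xsd:string' use='required'/>"
        + "</xsd:complexType></xsd:schema>";

    // Compiles the schema; a failure here means the schema itself is wrong.
    static Schema loadSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema could not be compiled", e);
        }
    }

    // Returns true if the document is valid against the schema.
    static boolean isValid(String xml) {
        try {
            loadSchema().newValidator()
                .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the validator rejected the document
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<jobSummary/>")); // job has minOccurs 0
        System.out.println(isValid("<jobSummary><job customer='w' reference='r'>"
            + "<location>London</location><description>d</description>"
            + "</job></jobSummary>"));                // skill has minOccurs 1
    }
}
```

The empty jobSummary is accepted because job may occur zero times, whereas the second document is rejected because a job must contain at least one skill.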

There are more aspects to schemas than it is possible to cover in this book. Visit the W3C Web site (http://www.w3.org) for more information on XML schemas and all other aspects of XML.

J2EE Support for XML

XML is portable data, and the Java platform is portable code. Add Java APIs for XML that make it easy to use XML and, together, you have the ideal combination:

  • Portability of data

  • Portability of code

  • Ease of use

The J2EE platform bundles all these advantages together.

Enterprises are rapidly discovering the benefits of using J2EE for developing Web Services that use XML for the dissemination and integration of data, particularly because XML eases the sharing of legacy data both internally among departments and with other enterprises.

J2EE includes the Java API for XML Processing (JAXP) that makes it easy to process XML data with applications written in Java. JAXP embraces the following standards:

  • Simple API for XML (SAX) for parsing XML as a stream.

  • Document Object Model (DOM) to build an in-memory tree representation of an XML document.

  • Extensible Stylesheet Language Transformations (XSLT) to control the presentation of the data and convert it to other XML documents or to other formats, such as HTML. XSLT is covered on Day 17, “Transforming XML Documents.”

JAXP also provides namespace support, allowing you to work with multiple XML documents that might otherwise cause naming conflicts.


Note - Because of the increasing use and importance of XML, JAXP is now incorporated into J2SE 1.4; previously it was available only in J2EE 1.3 or as a separate Java extension.


This chapter is from Teach Yourself J2EE in 21 Days, second edition, by Martin Bond et. al. (Sams, 2004, ISBN: 0-672-32558-6). Check it out at your favorite bookstore today. Buy this book now.

Parsing XML

So far, you have used Internet Explorer or other third-party tools to parse your XML documents. Now you will look at three APIs that provide a way to access and manipulate the information stored in an XML document so you can build your own XML applications.

The Simple API for XML (SAX) defines parsing methods, and the Document Object Model (DOM) defines a mechanism for accessing and manipulating well-formed XML. A third API is the Java API for XML Processing (JAXP), which you will use to build a simple SAX parser and a simple DOM parser. The two parsers you will develop effectively echo the input XML structure. Usually, you will want to parse XML to perform some useful function, but simply echoing the XML is a good way to learn the APIs.

JAXP has the benefit that it provides a common interface for creating and using SAX and DOM in Java.

SAX and DOM define different approaches to parsing and handling an XML document. SAX is an event-based API, whereas DOM is tree-based.

With event-based parsers, the parsing events (such as the start and end tags) are reported directly to the application through callback methods. The application implements these callback methods to handle the different components in the document, much like handling events in a graphical user interface (GUI).

Using the DOM API, you will transform the XML document into a tree structure in memory. The application then navigates the tree to parse the document.

Each method has its advantages and disadvantages. DOM simplifies the mapping of the structure of the XML and is a good choice when

  • The document is not too large. A large document can place a strain on system resources.

  • Most, or all, of the document needs to be parsed.

  • The document is to be altered or written out in a structure that is very different from the original.

Using SAX is a good choice when

  • You are searching through an XML document for a small number of tags

  • The document is large

  • Processing speed is important

  • The document does not need to be written out in a structure that is different from the original
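The first SAX case, searching for a small number of tags, can be sketched in a few lines. This is an illustration only: the class name TagCounter and the sample XML are invented for the sketch, and the handler ignores everything except the start tags it is asked to count.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TagCounter extends DefaultHandler {

    private final String wanted;   // tag name to look for
    private int count;             // number of matching start tags seen

    TagCounter(String wanted) {
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName,
                             String qualifiedName, Attributes attributes) {
        // Only the start-tag event is of interest; nothing is stored in memory.
        if (wanted.equals(qualifiedName)) {
            count++;
        }
    }

    /** Counts the elements called tagName in the supplied XML text. */
    static int count(String xml, String tagName) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TagCounter handler = new TagCounter(tagName);
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.count;
        } catch (Exception ex) {
            // Checked exceptions are wrapped to keep the sketch short to call.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<jobSummary>"
            + "<job><skill>Cigar maker</skill><skill>Critic</skill></job>"
            + "<job><skill>Cook</skill></job>"
            + "</jobSummary>";
        System.out.println(count(xml, "skill"));
    }
}
```

Because the document is processed as a stream of events, the memory used does not grow with the size of the document, which is why this style suits large inputs.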

SAX is a public domain API developed cooperatively by the members of the XML-DEV (XML DEVelopment) Internet discussion group (http://www.xml.org/).

The DOM is a set of interfaces defined by the W3C DOM Working Group. The latest DOM recommendation can be obtained from the W3C Web site (http://www.w3.org).

The JAXP Packages

The JAXP APIs are defined in the J2SDK 1.4 javax.xml.parsers package, which contains two factory classes—SAXParserFactory and DocumentBuilderFactory.

The packages that define the SAX and DOM APIs are

  • javax.xml.parsers A common interface for different vendors’ SAX and DOM parsers

  • org.w3c.dom Defines the DOM and all of the components of a DOM

  • org.xml.sax The SAX API

You will now build two applications—one that uses the SAX API and one that uses DOM.

Parsing XML Using SAX

To parse an XML document, you instantiate a javax.xml.parsers.SAXParserFactory object to obtain a SAX-based parser. This parser is then used to read the XML document a character at a time. (In the following code fragment the document is obtained from a command-line argument.)

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();

DefaultHandler handler = new XMLParse();
saxParser.parse( new File(argv[0]), handler );

Your SAX parser class must extend the public class org.xml.sax.helpers.DefaultHandler. This class defines stub methods that receive notification (callbacks) when XML entities are parsed. By default, these methods do nothing, but they can be overridden to do anything you like. For example, a method called startElement() is invoked when the start tag for an element is recognized. This method receives the element’s name and its attributes. The element’s name can be passed in any one of the first three parameters to startElement() (see Table 16.6), depending on whether namespaces are being used.

Table 16.6 Parameters to the startElement() Method

Parameter | Contents
uri | The namespace URI, or the empty string if the element has no namespace URI or if namespace processing is not being performed.
localName | The element name without the namespace prefix. This is a non-empty string when namespace processing is being performed.
qualifiedName | The element name with the namespace prefix.
attributes | The element’s attributes.
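To see how the first three parameters vary with namespace processing, the following sketch records what startElement() receives. The class name NamespaceDemo, the j prefix, and the urn:example:jobs namespace are all invented for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceDemo extends DefaultHandler {

    private final StringBuilder seen = new StringBuilder();

    @Override
    public void startElement(String uri, String localName,
                             String qualifiedName, Attributes attributes) {
        // Record the three name-related parameters exactly as received.
        seen.append("uri=").append(uri)
            .append(" localName=").append(localName)
            .append(" qualifiedName=").append(qualifiedName)
            .append('\n');
    }

    /** Parses the XML text and returns one line per start tag. */
    static String parse(String xml, boolean namespaceAware) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            NamespaceDemo handler = new NamespaceDemo();
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)), handler);
            return handler.seen.toString();
        } catch (Exception ex) {
            // Checked exceptions are wrapped to keep the sketch short to call.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document with a made-up namespace.
        String xml = "<j:job xmlns:j='urn:example:jobs'/>";
        System.out.print(parse(xml, true));    // namespace processing on
        System.out.print(parse(xml, false));   // namespace processing off
    }
}
```

With namespace processing on, uri and localName carry the namespace URI and the unprefixed name. With it off, the qualified name is the reliable place to find the element name, which is why the examples in this section use it.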


In the following code example, handling for the qualified name is provided.

public void startElement(String uri, String localName, 
 String qualifiedName, Attributes attributes)
 throws SAXException {
 System.out.println ("START ELEMENT " + qualifiedName);
 for (int i = 0; i < attributes.getLength(); i++) {
  System.out.println ("ATTRIBUTE " + 
   attributes.getQName(i) + " = " + attributes.getValue(i));
 }
}

This example prints out a statement indicating that a start tag has been parsed followed by a list of the attribute names and values.

A similar endElement() method is invoked when an end tag is encountered.

public void endElement(String uri, String localName, String qualifiedName) 
  throws SAXException {
 System.out.println ("END ELEMENT " + qualifiedName);
}

The full parser is shown in Listing 16.9, but not all of the XML components will be handled. The default action for a parser is for all components to be ignored; only the methods that are overridden in the DefaultHandler subclass will process XML components. For a complete list of the other DefaultHandler methods, see Table 16.7 or refer to the J2SDK, v 1.4 API Specification.

Listing 16.9 Simple SAX Parser

import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.*;

public class XMLParse extends DefaultHandler {

 public static void main(String argv[]) {
  // the XML filename must be supplied on the command line
  if (argv.length != 1) {
   System.err.println("Usage: XMLParse filename");
   System.exit(1);
  }
  DefaultHandler handler = new XMLParse();
  SAXParserFactory factory = SAXParserFactory.newInstance();
  try {
   SAXParser saxParser = factory.newSAXParser();
   saxParser.parse( new File(argv[0]), handler );
  }
  catch (ParserConfigurationException ex) {
   System.err.println ("Failed to create SAX parser: " + ex);
  }
  catch (SAXException ex) {
   System.err.println ("SAX parser exception: " + ex);
  }
  catch (IOException ex) {
   System.err.println ("IO exception: " + ex);
  }
  catch (IllegalArgumentException ex) {
   System.err.println ("Invalid file argument: " + ex);
  }
 }

 // callbacks invoked by the parser as it reads the document
 public void startDocument() throws SAXException {
  System.out.println ("START DOCUMENT");
 }

 public void endDocument() throws SAXException {
  System.out.println ("END DOCUMENT");
 }

 public void startElement(String uri, String localName, 
  String qualifiedName, Attributes attributes) throws SAXException {
     
  System.out.println ("START ELEMENT " + qualifiedName);
  for (int i = 0; i < attributes.getLength(); i++) {
   System.out.println ("ATTRIBUTE " + 
    attributes.getQName(i) + " = " + attributes.getValue(i));
  }
 }

 public void endElement(String uri, String localName, String qualifiedName)
     throws SAXException {
  System.out.println ("END ELEMENT " + qualifiedName);
 }

 public void characters(char[] ch, int start, int length) 
     throws SAXException {
  if (length > 0) {
   String buf = new String (ch, start, length);
   System.out.println ("CONTENT " + buf);
  }
 }
}

The program first checks that the name of the XML document has been provided on the command line. After instantiating the SAXParserFactory and constructing the handler, the XML file is parsed; that is all there is to it. This parser reports only the start and end of the document, the start and end of each element, and the characters that form the element bodies.

If a callback method is not overridden in your parser, the corresponding event is handled by the superclass DefaultHandler method, the default action being to do nothing. Table 16.7 gives a full list of the callback DefaultHandler methods that can be implemented.

Table 16.7 SAX DefaultHandler Methods

Method | Receives Notification of
characters(char[] ch, int start, int length) | Character data inside an element.
startDocument() | Beginning of the document.
endDocument() | End of the document.
startElement(String uri, String localName, String qName, Attributes attributes) | Start of an element.
endElement(String uri, String localName, String qName) | End of an element.
startPrefixMapping(String prefix, String uri) | Start of a namespace mapping.
endPrefixMapping(String prefix) | End of a namespace mapping.
error(SAXParseException e) | A recoverable parser error.
fatalError(SAXParseException e) | A fatal XML parsing error.
warning(SAXParseException e) | Parser warning.
ignorableWhitespace(char[] ch, int start, int length) | Whitespace in the element contents.
notationDecl(String name, String publicId, String systemId) | Notation declaration.
processingInstruction(String target, String data) | A processing instruction.
resolveEntity(String publicId, String systemId) | An external entity.
skippedEntity(String name) | A skipped entity. Processors may skip entities if they have not seen the declarations. (For example, the entity was declared in an external DTD.)


As this code does not use any J2EE components, you can simply compile and run it from the command line. From the Day16/examples directory run the command:

> java -classpath classes XMLParse XML/jobSummary.xml

Or use the supplied asant build files and enter:

> asant XMLParse

Provide the filename XML/jobSummary.xml when prompted.

The output in Figure 16.1 is produced when this SAX parser is used on the jobSummary XML in Listing 16.4.

Figure 16.1 — SAX parser output.

As you can see, the output is not very beautiful. You might like to improve it by adding indentation to the elements or even getting the output to look like the original XML.
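One way to add the indentation suggested above is to track the element depth in the handler. The following is only a sketch of that idea: the class name IndentingEcho is invented, the output is collected in a string rather than printed directly, and whitespace-only content is dropped, which the parser in Listing 16.9 does not do.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class IndentingEcho extends DefaultHandler {

    private final StringBuilder out = new StringBuilder();
    private int depth;   // current nesting level

    // Appends one output line, indented two spaces per nesting level.
    private void line(String text) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(text).append('\n');
    }

    @Override
    public void startElement(String uri, String localName,
                             String qualifiedName, Attributes attributes) {
        line("START ELEMENT " + qualifiedName);
        depth++;
    }

    @Override
    public void endElement(String uri, String localName, String qualifiedName) {
        depth--;
        line("END ELEMENT " + qualifiedName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one run of text in several calls;
        // this sketch simply reports each non-blank chunk it receives.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            line("CONTENT " + text);
        }
    }

    /** Parses the XML text and returns the indented event listing. */
    static String echo(String xml) {
        try {
            IndentingEcho handler = new IndentingEcho();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.out.toString();
        } catch (Exception ex) {
            // Checked exceptions are wrapped to keep the sketch short to call.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.print(echo("<job><location>London</location></job>"));
    }
}
```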

In addition to making this parser more robust, the following functionality could be added:

  • Scan element contents for the special characters (such as < and &) and replace them with the appropriate symbolic strings

  • Improve the handling of fatal parse errors (SAXParseException) with appropriate error messages giving error line numbers

  • Use the DefaultHandler error() and warning() methods to handle non-fatal parse errors

  • Configure the parser to be namespace aware with javax.xml.parsers.SAXParserFactory.setNamespaceAware(true), so that you can detect tags from multiple sources
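The error-handling and namespace suggestions in this list could be approached along the following lines. This is a sketch under stated assumptions, not the book's solution: the class name ReportingHandler and the message format are invented, and the problems are collected in a list so that the caller decides how to display them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ReportingHandler extends DefaultHandler {

    final List<String> problems = new ArrayList<String>();

    // SAXParseException carries the position at which the problem was detected.
    private void record(String level, SAXParseException e) {
        problems.add(level + " at line " + e.getLineNumber() + ": " + e.getMessage());
    }

    @Override
    public void warning(SAXParseException e) {
        record("WARNING", e);
    }

    @Override
    public void error(SAXParseException e) {
        record("ERROR", e);          // recoverable, so parsing continues
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        record("FATAL", e);
        throw e;                     // parsing cannot usefully continue
    }

    /** Parses the XML text and returns any problems that were reported. */
    static List<String> check(String xml) {
        ReportingHandler handler = new ReportingHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);   // detect tags from multiple sources
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException ex) {
            // Already recorded by fatalError(); nothing more to do here.
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
        return handler.problems;
    }

    public static void main(String[] args) {
        // The <b> element is never closed, so the document is not well-formed.
        for (String problem : check("<a>\n<b>\n</a>")) {
            System.out.println(problem);
        }
    }
}
```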

Having seen a simple SAX parser, you will now build a parser application that uses the DOM API.

Document Object Model (DOM) Parser

When you use the DOM API to parse an XML document, a tree structure representing the XML document is built in memory. You can then analyze the nodes of the tree to discover the XML contents.

Building a DOM Tree

The mechanism for instantiating a DOM parser is very similar to that for a SAX parser. A new instance of a DocumentBuilderFactory is obtained that is used to create a new DocumentBuilder.

The parse() method is called on this DocumentBuilder object to return an object that conforms to the public Document interface. This object represents the XML document tree. The following code fragment creates a DOM parser and reads the XML document from a file supplied as a command-line argument:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File(argv[0]));

With the DocumentBuilder.parse() method, you are not restricted to reading XML only from a file; you can also use a constructed InputStream or read from a source defined by a URL.
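As a small sketch of the InputStream variant, the following parses XML held in a string by wrapping its bytes in a ByteArrayInputStream. The class name ParseFromStream and the sample document are invented for the illustration; a URL could be parsed in the same way by passing its string form to the parse(String uri) overload.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ParseFromStream {

    /** Parses the XML text from an InputStream and returns the root tag name. */
    static String rootName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Any InputStream will do; here the bytes come from a string.
            InputStream in =
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
            Document document = builder.parse(in);
            return document.getDocumentElement().getNodeName();
        } catch (Exception ex) {
            // Checked exceptions are wrapped to keep the sketch short to call.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootName("<jobSummary><job/></jobSummary>"));
    }
}
```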

The Document interface returned by the parse() method extends org.w3c.dom.Node. To simplify processing of the DOM tree, every object in the tree implements Node or one of its subinterfaces.

There are a number of methods provided in the Document interface to access the nodes in the tree. These are listed in Table 16.8.

The normalize() method should always be used to put all text nodes into a form where there are no adjacent text nodes or empty text nodes. In this form, the DOM view better reflects the XML structure.
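The effect of normalize() is easiest to see on a tree built by hand. In this sketch (the class name NormalizeDemo and the element content are invented), three text nodes are appended to one element, one of them empty, and the children are counted before and after normalizing.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeDemo {

    /** Returns the number of child nodes before and after normalize(). */
    static int[] textNodeCounts() {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element location = document.createElement("location");
            document.appendChild(location);

            // Two adjacent text nodes and one empty text node.
            location.appendChild(document.createTextNode("Lon"));
            location.appendChild(document.createTextNode("don"));
            location.appendChild(document.createTextNode(""));
            int before = location.getChildNodes().getLength();

            // Merge adjacent text nodes and discard empty ones.
            location.normalize();
            int after = location.getChildNodes().getLength();

            return new int[] { before, after };
        } catch (Exception ex) {
            // Checked exceptions are wrapped to keep the sketch short to call.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        int[] counts = textNodeCounts();
        System.out.println("before=" + counts[0] + " after=" + counts[1]);
    }
}
```

After normalizing, the element has a single text child holding the combined text, which matches how the same content would appear if it had been parsed from a file.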

After parsing an XML document the DOM parser has built an in-memory representation of the document that will look something like Figure 16.2.

The root of the DOM tree is obtained with the getDocumentElement() method.

Element root = document.getDocumentElement();

Figure 16.2 — Diagram of the DOM tree.

This method returns an Element, which is simply a subinterface of Node that may have attributes associated with it. An element can be the parent of other elements.

There are a number of methods provided in the Document interface to access the nodes in the tree, some of which are listed in Table 16.8. These methods return either a Node or a NodeList (ordered collection of nodes).

Table 16.8 Document Interface Methods to Traverse a DOM Tree

Method Name | Description
getDocumentElement() | Allows direct access to the root element of the document
getElementsByTagName(String) | Returns a NodeList of all the elements with the given tag name in the order in which they are encountered in the tree
getChildNodes() | A NodeList that contains all children of this node
getParentNode() | The parent of this node
getFirstChild() | The first child of this node
getLastChild() | The last child of this node
getPreviousSibling() | The node immediately preceding this node


In a simple DOM application the getChildNodes() method can be used to recursively traverse the DOM tree. The NodeList.getLength() method can then be used to find out the number of nodes in the NodeList.

NodeList children = node.getChildNodes();
int len = (children != null) ? children.getLength() : 0;
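A recursive traversal based on getChildNodes() might look like the following sketch. The class name TreeWalker is invented, and only element nodes are listed; text and other node types are visited but not reported.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TreeWalker {

    // Visits the node and then each of its children, depth first.
    static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
        }
        NodeList children = node.getChildNodes();
        int len = (children != null) ? children.getLength() : 0;
        for (int i = 0; i < len; i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    /** Parses the XML text and returns an indented outline of its elements. */
    static String outline(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(document.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception ex) {
            // Checked exceptions are wrapped to keep the sketch short to call.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(
            "<jobSummary><job><location>London</location></job></jobSummary>"));
    }
}
```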

In addition to the tree traversal methods, the Node interface provides methods to investigate the contents of a node. Some of these methods are listed in Table 16.9.

Table 16.9 Document Interface Methods to Inspect DOM Nodes

Method Name | Description
getAttributes() | A NamedNodeMap containing the attributes of a node if it is an Element or null if it is not.
getNodeName() | A string representing the name of this node (the tag).
getNodeType() | A code representing the type of the underlying object. A node can be one of ELEMENT_NODE, ATTRIBUTE_NODE, TEXT_NODE, CDATA_SECTION_NODE, ENTITY_REFERENCE_NODE, ENTITY_NODE, PROCESSING_INSTRUCTION_NODE, COMMENT_NODE, DOCUMENT_NODE, DOCUMENT_TYPE_NODE, DOCUMENT_FRAGMENT_NODE, NOTATION_NODE.
getNodeValue() | A string representing the value of this node. If the node is a text node, the value will be the contents of the text node; for an attribute node, it will be the string assigned to the attribute. For most node types, there is no value and a call to this method will return null.
getNamespaceURI() | The namespace URI of this node.
hasAttributes() | Returns a boolean to indicate whether this node has any attributes.
hasChildNodes() | Returns a boolean to indicate whether this node has any children.
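The inspection methods in Table 16.9 can be combined as in the sketch below, which looks up an attribute by name and shows that getAttributes() returns null for a node that is not an element. The class name NodeInspector and the sample XML are invented for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeInspector {

    /** Returns the named attribute's value, or null if the node has none. */
    static String attribute(Node node, String name) {
        NamedNodeMap attributes = node.getAttributes();  // null unless an Element
        if (attributes == null) {
            return null;
        }
        Node attribute = attributes.getNamedItem(name);
        return (attribute == null) ? null : attribute.getNodeValue();
    }

    /** Parses the XML text and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception ex) {
            // Checked exceptions are wrapped to keep the sketch short to call.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Element job = parseRoot("<job customer=\"winston\">Cigar trimmer</job>");
        Node text = job.getFirstChild();

        System.out.println(job.getNodeName() + " customer=" + attribute(job, "customer"));
        System.out.println("element value: " + job.getNodeValue());
        System.out.println("text node? " + (text.getNodeType() == Node.TEXT_NODE));
        System.out.println("text value: " + text.getNodeValue());
    }
}
```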


Modifying a DOM Tree

We will now look at using the DOM API to modify the contents or structure of the XML. Unlike SAX, DOM provides a number of methods that allow nodes to be added, deleted, changed, or replaced in the DOM tree. Table 16.10 summarizes these methods.

Table 16.10 Document Interface Methods to Modify a DOM Tree

Method Name | Description
appendChild(Node newNode) | Adds the new node to the end of the NodeList of children of this node.
cloneNode(boolean deep) | Returns a duplicate of a node. The cloned node has no parent. If deep is true, the whole tree below this node is cloned; if false, only the node itself is cloned.
insertBefore(Node newNode, Node refNode) | Inserts the newNode before the existing refNode.
removeChild(Node oldNode) | Removes the oldNode from the list of children.
replaceChild(Node newNode, Node oldNode) | Replaces the oldNode with newNode in the child NodeList.
setNodeValue(String nodeValue) | Sets the value of this node, depending on its type.
setPrefix(String prefix) | Sets the namespace prefix of this node.
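As a minimal sketch of how several of these methods interact, the following builds a small tree by hand and then inserts, replaces, and removes children. The class name ModifyDemo and the element names are invented for the illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ModifyDemo {

    /** Applies a series of edits and returns the remaining child names. */
    static String childNames() {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element job = document.createElement("job");
            document.appendChild(job);

            Element location = document.createElement("location");
            Element description = document.createElement("description");
            job.appendChild(location);
            job.appendChild(description);          // location, description

            Element customer = document.createElement("customer");
            job.insertBefore(customer, location);  // customer, location, description

            Element skill = document.createElement("skill");
            job.replaceChild(skill, description);  // customer, location, skill

            job.removeChild(location);             // customer, skill

            // List the children that are left, in document order.
            StringBuilder names = new StringBuilder();
            NodeList children = job.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                if (i > 0) {
                    names.append(',');
                }
                names.append(children.item(i).getNodeName());
            }
            return names.toString();
        } catch (Exception ex) {
            // Checked exceptions are wrapped to keep the sketch short to call.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(childNames());
    }
}
```

Note the argument order of replaceChild(): the new node comes first and the node being replaced second.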


For example, the following code fragment simply creates a new customer element and appends it to the end of the XML document:

Node newNode = addXMLNode (document, "customer", "Columbus");
Element root = document.getDocumentElement();
root.appendChild(newNode);

private static Node addXMLNode (Document document, String name, String text) {
 Element e = document.createElement(name);
 Text t = document.createTextNode(text);
 e.appendChild (t);
 return e;
}

The following XML element is added to the in-memory tree built from the XML file that is read in by this example code:

<customer>Columbus</customer>

Outputting a DOM Tree

Having parsed or created an XML document in memory, a common requirement is to output the DOM tree. The javax.xml.transform package defines a transformer that can be used to output a DOM tree from memory. The following code shows how easy it is to take a DOM tree and output it to the screen:

TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.transform(new DOMSource(root), new StreamResult(System.out));

Note - In Day 17 you will see how to use XML Stylesheets with the transformer object to format the transformed output.
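The same transformer can write to any java.io.Writer, which is convenient when the XML is needed as a string rather than on the screen. The following sketch assumes that approach; the class name DomToString is invented, and the XML declaration is suppressed with the standard OMIT_XML_DECLARATION output property so that only the element itself is returned.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomToString {

    /** Serializes the node and everything below it to a string. */
    static String toXml(Node node) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Leave out the <?xml ...?> declaration for this illustration.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(writer));
            return writer.toString();
        } catch (Exception ex) {
            // Checked exceptions are wrapped to keep the sketch short to call.
            throw new IllegalStateException(ex);
        }
    }

    /** Builds a one-element tree by hand and serializes it. */
    static String demo() {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element customer = document.createElement("customer");
            customer.appendChild(document.createTextNode("Columbus"));
            document.appendChild(customer);
            return toXml(customer);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```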


A Simple DOM Example

The WebDDBuilder example shown in Listing 16.10 is a simple program that creates a new Web Application deployment descriptor and adds a single <servlet> and <servlet-mapping> element to the tree before writing the updated DD. The Web Application DD was described on Day 12, “Servlets.”

Listing 16.10 WebDDBuilder.java

import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.xml.sax.*;
import java.io.*;
import org.w3c.dom.*;
import java.util.*;

public class WebDDBuilder {

  public static void main(String argv[]) {
    int argCount = argv.length;
    if (argCount != 2) {
      System.err.println("Usage: WebDDBuilder servlet-class URL-mapping");
      System.exit(1);
    }
    String servletClass = argv[0];
    String URLPattern = argv[1];
    try {
      WebDDBuilder dd = new WebDDBuilder();
      dd.addServlet(servletClass, URLPattern);
      // output document
      dd.print(System.out);
    }
    catch (IllegalArgumentException ex) {
      System.err.println ("Invalid argument: " + ex);
      ex.printStackTrace(System.out);
    }
  }

  private static final String SERVLET_VERSION = "2.4";
  private static final String XML_NAMESPACE = 
    "http://java.sun.com/xml/ns/j2ee";
  private static final String XML_SCHEMA_INST = 
    "http://www.w3.org/2001/XMLSchema-instance";
  // namespace and schema location, separated by a single space
  private static final String XML_SCHEMA_LOC = 
    "http://java.sun.com/xml/ns/j2ee " +
    "http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd";

  private static final String SERVLET = "servlet";
  private static final String SERVLET_MAPPING = "servlet-mapping";
  private static final String SERVLET_NAME = "servlet-name";
  private static final String SERVLET_CLASS = "servlet-class";
  private static final String URL_PATTERN = "url-pattern";

  // DD elements in the order in which they are searched by findNode()
  private static final String[] DD_ELEMENTS = {"icon", "display-name",
    "description", "distributable", "context-param", "filter",
    "filter-mapping", "listener", "servlet", "servlet-mapping",
    "session-config", "mime-mapping", "welcome-file-list", "error-page",
    "taglib", "resource-env-ref", "resource-ref", "security-constraint",
    "login-config", "security-role", "env-entry", 
    "ejb-ref", "ejb-local-ref" };

  private Document document;
  private Element root;
  private HashMap DDElements; 

  public WebDDBuilder () {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    try {
      DocumentBuilder builder = factory.newDocumentBuilder();
      document = builder.newDocument();
      root = document.createElement("web-app");
      root.setAttribute("version", SERVLET_VERSION);
      root.setAttribute("xmlns", XML_NAMESPACE);
      root.setAttribute("xmlns:xsi", XML_SCHEMA_INST);
      root.setAttribute("xsi:schemaLocation", XML_SCHEMA_LOC);
      DDElements = createDDMap(DD_ELEMENTS);
    }
    catch (ParserConfigurationException ex) {
      System.err.println ("Failed to create DOM document: " + ex);
    }
  }

  private void addServlet (String servletClass, String URLPattern) {

    //create the servlet name from the servlet class name
    // if fully qualified class name take just last part
    int index = servletClass.lastIndexOf(".");
    String servletName;
    if (index != -1)
      servletName = servletClass.substring(index+1);
    else
      servletName = servletClass;

    // build the servlet element
    Element servlet_name = document.createElement(SERVLET_NAME);
    servlet_name.appendChild(document.createTextNode(servletName));

    Element servlet_class = document.createElement(SERVLET_CLASS);
    servlet_class.appendChild(document.createTextNode(servletClass));

    Element servlet = document.createElement (SERVLET);
    servlet.appendChild(servlet_name);
    servlet.appendChild(servlet_class);

    // find where in the DOM to insert the new servlet node
    Node refChild = findNode (root, DDElements, SERVLET);
    root.insertBefore(servlet, refChild);

    // build the servlet-mapping element
    Element url_pattern = document.createElement(URL_PATTERN);
    url_pattern.appendChild(document.createTextNode(URLPattern));

    Element servlet_mapping = document.createElement (SERVLET_MAPPING);
    // no need to create servlet name element as we already have one
    // make sure we clone deep so that we get the text node
    servlet_mapping.appendChild(servlet_name.cloneNode(true));
    servlet_mapping.appendChild(url_pattern);

    refChild = findNode (root, DDElements, SERVLET_MAPPING);
    root.insertBefore(servlet_mapping, refChild);
  }

  private void print (PrintStream stream) {
    try {
      TransformerFactory tf = TransformerFactory.newInstance();
      Transformer transformer = tf.newTransformer();
      transformer.setOutputProperty(OutputKeys.INDENT, "yes");
      transformer.transform(new DOMSource(root), 
                 new StreamResult(stream));
    }
    catch (TransformerConfigurationException ex) {
      System.err.println ("Failed to create transformer factory: " + ex);
    }
    catch (TransformerException ex) {
      System.err.println ("Failed to transform DOM tree: " + ex);
    }
  }

  private Node findNode (Node treeRoot, HashMap ddSchema, String tagName) {

    // find out index of tagName
    int refKey = getKey (ddSchema, tagName);

    NodeList tags = treeRoot.getChildNodes();
    int tagsLen = (tags != null) ? tags.getLength() : 0;

    // find first tag after tagName in tree
    for (int i = 0; i < tagsLen; i++) {
      Node tag = tags.item(i);
      if (getKey(ddSchema, tag.getNodeName()) > refKey)
        return tag;
    }
    return null;
  }

  private int getKey (HashMap ddSchema, String tagName) {
    for (int key = 0; key < ddSchema.size(); key++) {
      if (ddSchema.get(new Integer(key)).equals(tagName))
        return key;
    }
    return -1; 
  }

  private HashMap createDDMap(String[] ddSchema) {
    HashMap map = new HashMap();
    for (int i = 0; i < ddSchema.length; i++)
      map.put(new Integer(i), ddSchema[i]);
    return map;
  }
}

The WebDDBuilder example in Listing 16.10 starts by creating a new, empty DOM tree representing an empty Web Application DD. Next, the addServlet() method is called to add the servlet class and URL pattern passed as command-line parameters.

The addServlet() method builds two XML DD elements, <servlet> and <servlet-mapping>, using the values supplied as arguments. Each of these elements has a <servlet-name> sub-element, so instead of creating a new element from scratch, addServlet() uses the Node.cloneNode() method to create a copy. A deep copy is performed by passing true as the parameter to cloneNode(); this ensures that all the child nodes are also cloned.
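The difference between a shallow and a deep clone can be seen in isolation with the following sketch. The class name CloneDemo and the summary format are invented; the element mirrors the <servlet-name> element built in addServlet().

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class CloneDemo {

    /** Clones a <servlet-name> element both ways and summarizes the results. */
    static String describeClones() {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element servletName = document.createElement("servlet-name");
            servletName.appendChild(document.createTextNode("Hello"));

            Node shallow = servletName.cloneNode(false);  // element only
            Node deep = servletName.cloneNode(true);      // element plus text node

            return "shallow has children: " + shallow.hasChildNodes()
                + ", deep text: " + deep.getFirstChild().getNodeValue()
                + ", deep has parent: " + (deep.getParentNode() != null);
        } catch (Exception ex) {
            // Checked exceptions are wrapped to keep the sketch short to call.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeClones());
    }
}
```

A shallow clone would have produced an empty <servlet-name> element inside <servlet-mapping>, which is why the listing passes true.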

Finally, the print() method is called to output the DOM tree using a Transformer object.

As with the SAX example, this code does not use any J2EE components; you can simply compile and run it from the command line. From the Day16/examples directory run the command

> java -classpath classes WebDDBuilder demo.Hello /hello

This will create a DD entry for the demo.Hello servlet with URL pattern /hello. Alternatively, use the supplied asant build files and enter

> asant WebDDBuilder

Provide the servlet class and URL pattern when prompted.

The resultant DD looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<web-app version="2.4" 
 xmlns="http://java.sun.com/xml/ns/j2ee" 
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
 xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee 
  http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd">
 <servlet>
  <servlet-name>Hello</servlet-name>
  <servlet-class>demo.Hello</servlet-class>
 </servlet>
 <servlet-mapping>
  <servlet-name>Hello</servlet-name>
  <url-pattern>/hello</url-pattern>
 </servlet-mapping>
</web-app>

Java Architecture for XML Binding

DOM is a useful API allowing you to build and transform XML documents in memory. Unfortunately, DOM is somewhat slow and resource hungry. To address these problems, the Java Architecture for XML Binding (JAXB) has been developed through the Java Community Process (JCP) with an expert group consisting of representatives from many commercial organizations.

JAXB provides a mechanism that simplifies the creation and maintenance of XML-enabled Java applications. It does this by using an XML schema compiler (only DTDs and a subset of XML schemas and namespaces at the time of this writing) that translates XML DTDs into one or more Java classes, thereby removing the burden from the developer to write complex parsing code.

The generated classes handle all the details of XML parsing and formatting, including code to perform error and validity checking of incoming and outgoing XML documents, which ensures that only valid, error-free XML is accepted.

Because the code has been generated for a specific schema, the generated classes are more efficient than those in a generic SAX or DOM parser. Most important, a JAXB parser often requires a much smaller footprint in memory than a generic parser.

Classes created with JAXB do not include tree-manipulation capability, which is one factor that contributes to the small memory footprint of a JAXB object tree. If you want to build an object representation of XML data, but need to get around the memory limitations of DOM, you should use JAXB.

The following two bulleted lists summarize the advantages of JAXB and JAXP so you can decide which one is right for your application.

Use JAXB when you want to

  • Access data in memory, but do not need tree manipulation capabilities

  • Process only data that is valid

  • Convert data to different types

  • Generate classes based on a DTD or XML schema

  • Build object representations of XML data

Use JAXP when you want to

  • Have flexibility with regard to the way you access the data, either serially with SAX or randomly in memory with DOM

  • Use your same processing code with documents based on different DTDs

  • Parse documents that are not necessarily valid

  • Apply XSLT transformations

  • Insert or remove components from an in-memory XML tree

Summary

Today, you have had a very quick, and necessarily brief, introduction to XML and the APIs and technologies available in J2EE to parse and generate XML data. You have seen how XML can be used to create flexible structured data that is inherently portable. With DTDs and XML Schemas, you were shown how this data can also be validated. You have been introduced to several different ways of parsing an XML document with SAX, DOM, JAXP, or JAXB, and you should now recognize the advantages and disadvantages of each technique.

Tomorrow, you will extend your XML knowledge to include XML transformations.

Q&A

  Q: What are the major characteristics of XML?

  A: XML is a human-readable, structured data-encoding format that is generic, simple, flexible, extensible, and free to use.

  Q: What is the difference between well-formed and valid XML?

  A: Well-formed XML is syntactically and structurally correct. XML is only valid if it complies with the constraints of a DTD or XML schema.

  Q: What are the J2EE APIs and specifications that support the processing of XML?

  A: The J2EE APIs and specifications that support XML processing are JAXP (Java API for XML Processing), SAX (Simple API for XML), DOM (Document Object Model), and XSLT for transforming XML documents.

  Q: What are the main differences between SAX and DOM?

  A: SAX provides a serial event-driven parser. DOM is more flexible in that it builds an in-memory representation of the document that can be manipulated randomly (that is, nodes can be addressed or processed in any order). SAX is generally more efficient (faster), while DOM can be a heavy user of memory.

Exercise

To practice working with XML, try the following two exercises; the first is relatively simple, but the second requires a little more effort.

  1. Extend the WebDDBuilder application to optionally read in an existing web.xml file and add the new <servlet> and <servlet-mapping> elements from the servlet class and URL pattern information provided on the command line.

Hint: Most of the code is already in place. You will need to create another constructor to build the DOM tree from an existing DD whose filename is supplied as the last (optional) parameter on the command line. A simple web.xml file is provided in the Day16/solution/XML directory.


Tip - If your web.xml file has a non-local DTD specified in the DOCTYPE element (this will be the case if you are using a J2EE 1.3 or earlier web.xml file), you will require access to the Web for the parser to validate the XML.


  2. Enhance your solution to check for duplicate servlet names. The servlet name in a Web Application DD must be unique. Ensure that the program will not add the same servlet class twice by checking for duplicate servlet names before adding the new entry.

A solution is provided in the Day 16 solution directory.
