Programming Guide


How to get parse tree

The Simplest Example

import com.ibm.xml.parser.*;
import java.io.*;
....
        String filename;
        ....
        InputStream is = new FileInputStream(filename);
        TXDocument doc = new Parser(filename).readStream(is);

Parser#readStream() never returns null. When used this way, the parser prints parse errors to the standard error stream.

To access the parse tree, use TXDocument#getDocumentElement() (see How to operate parse tree). TXDocument#getDocumentElement() may return null when the XML document has serious errors.

A Parser instance cannot be reused. An application can call Parser#readStream() only once per instance.

You can write the parse tree back out to a stream in XML format.

    String charset = "ISO-8859-1";    // MIME charset name
    String jencode = MIME2Java.convert(charset);
    PrintWriter pw
        = new PrintWriter(new OutputStreamWriter(System.out, jencode));
    doc.setEncoding(charset);
    doc.print(pw, jencode);

Set Parse Options

You can configure the parser's behavior after creating the Parser instance and before calling readStream().

import com.ibm.xml.parser.*;
import java.io.*;
....
        String filename;
        ....
        Parser parse = new Parser(filename);
        parse.setWarningNoDoctypeDecl(false);
        parse.setWarningNoXMLDecl(false);
        InputStream is = new FileInputStream(filename);
        TXDocument doc = parse.readStream(is);

Redirect parse errors

You can control the output of errors produced by the parser. Create an instance of a class that implements ErrorListener, and pass the instance to the Parser constructor.

The Object key parameter of the error() method is an instance of String or Exception. When key is a String, it identifies the type of error (see the source com/ibm/xml/parser/r/Message.java).

For examples, see the sources com/ibm/xml/parser/trlxml.java and com/ibm/xml/parser/Stderr.java.


How to operate parse tree

A TXDocument can have one TXElement instance, zero or one DTD instance, and instances of TXPI and TXComment as children. All children of a TXDocument can be accessed with TXDocument#getChildren() / TXDocument#getChildrenArray(). The TXElement instance can also be accessed with TXDocument#getDocumentElement().

A TXElement can have instances of TXElement, TXText, TXPI and TXComment as children. All children of a TXElement can be accessed with TXElement#getChildren() / TXElement#getChildrenArray().

Some methods of TXDocument and TXElement return instances of the Child interface. These Child instances are also instances of TXElement, TXText, TXPI, TXComment, or DTD (only as a child of TXDocument). To determine which class an instance belongs to, use Node#getNodeType() or the instanceof operator, as in the following:

import com.ibm.xml.parser.*;
import org.w3c.dom.*;
import java.util.Enumeration;
    ....
    TXDocument doc = ....;
    TXElement root = (TXElement)doc.getDocumentElement();
    Enumeration en = root.elements();
    while (en.hasMoreElements()) {
        Node ch = (Node)en.nextElement();
        if (ch instanceof TXElement) {
            TXElement el = (TXElement)ch;
            ....
        } else if (ch instanceof TXText) {
            TXText te = (TXText)ch;
           ....
        }
    }

White Space

The processor keeps all white space and passes it to applications, in accordance with section 2.10, White Space Handling, of the XML 1.0 Proposed Recommendation. The processor sets the IsIgnorableWhitespace flag on TXText instances that consist only of white space.

<MEMBERS>
  <PERSON>Hiroshi</PERSON>
  <PERSON>Naohiko</PERSON>
  <PERSON>
    Kent
  </PERSON>
</MEMBERS>

The processor parses this element as follows.

TXElement (getName():"MEMBERS", getText():"\n  Hiroshi\n  Naohiko\n  \n    Kent\n  \n")
  TXText ("\n  ", ignorable)
  TXElement (getName():"PERSON", getText():"Hiroshi")
    TXText ("Hiroshi")
  TXText ("\n  ", ignorable)
  TXElement (getName():"PERSON", getText():"Naohiko")
    TXText ("Naohiko")
  TXText ("\n  ", ignorable)
  TXElement (getName():"PERSON", getText():"\n    Kent\n  ")
    TXText ("\n    Kent\n  ")
  TXText ("\n", ignorable)

When an application does not need leading or trailing white space, it is useful to call TXText#trim(String) / TXText#trim(String,boolean,boolean).
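
The white space behavior described above can be reproduced with any DOM parser that preserves white space. The following is a minimal sketch that uses the JDK's built-in javax.xml.parsers API instead of XML for Java, so that it runs without the com.ibm.xml.parser classes. The class name WhitespaceDemo and its helper methods are hypothetical, and String#trim() is only a stand-in for TXText#trim(). It parses the MEMBERS example and counts the white-space-only text nodes directly under the root.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class WhitespaceDemo {
    // The MEMBERS example from this section.
    static final String XML =
        "<MEMBERS>\n"
        + "  <PERSON>Hiroshi</PERSON>\n"
        + "  <PERSON>Naohiko</PERSON>\n"
        + "  <PERSON>\n"
        + "    Kent\n"
        + "  </PERSON>\n"
        + "</MEMBERS>";

    // Parses the text with the JDK parser (not XML for Java) and returns the root element.
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample document", e);
        }
    }

    // Counts direct text children that consist only of white space.
    static int countWhitespaceOnlyText(Element parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node ch = children.item(i);
            if (ch.getNodeType() == Node.TEXT_NODE
                && ch.getNodeValue().trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Element root = parseRoot(XML);
        System.out.println("white-space-only text nodes: " + countWhitespaceOnlyText(root));
        // Trimming removes the leading and trailing white space around "Kent".
        Node lastPerson = root.getElementsByTagName("PERSON").item(2);
        System.out.println("[" + lastPerson.getTextContent().trim() + "]");
    }
}
```

The count corresponds to the four TXText nodes marked "ignorable" in the tree above.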


How to get filtered parse tree

class AElementHandler implements ElementHandler {
    public TXElement handleElement(TXElement el) {
        ....
    }
}

    ....
    Parser parse = new Parser(...);
    parse.setElementHandler(new AElementHandler(), "CHANNEL");
    TXDocument doc = parse.readStream(is);

During Parser#readStream(), this ElementHandler#handleElement() method is called after each end tag (</CHANNEL>) is parsed and before the element is added to its parent. The parser adds to the parent the TXElement instance returned by handleElement(). If handleElement() returns null, the parser doesn't add the TXElement instance to the parent.

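There are two methods to set an ElementHandler:

    Parser#addElementHandler(ElementHandler)
        Registers a handler that is called for all TXElement instances.
    Parser#addElementHandler(ElementHandler, String)
        Registers a handler that is called only for TXElement instances with the given name.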

The Order of Calling ElementHandlers

When more than one ElementHandler is registered with the parser, the parser first calls the ElementHandlers for the specific TXElement name (first set, first called) and then calls the ElementHandlers registered for all TXElement instances.

Even if an ElementHandler changes the name of a TXElement, the parser calls the other ElementHandlers registered for the original name. When an ElementHandler returns null, the parser doesn't call any other ElementHandlers.

    Parser parse = new Parser(...);
    parse.addElementHandler(handler1);
    parse.addElementHandler(handler2, "CHANNEL");
    parse.addElementHandler(handler3, "CHANNEL");
    parse.addElementHandler(handler4);
    TXDocument doc = parse.readStream(is);

In this case, when the parser processes a </CHANNEL> tag, it calls handler2 first, then handler3, handler1 and handler4, in that order.
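
The ordering rule can be modeled in a few lines of plain Java. This is only a sketch of the rule as stated above, not the parser's actual implementation: HandlerOrderDemo is a hypothetical class, and handlers are represented by their labels rather than by ElementHandler instances.

```java
import java.util.ArrayList;
import java.util.List;

class HandlerOrderDemo {
    // Each registration is {label, elementName}; elementName is null for "all elements".
    private final List<String[]> registrations = new ArrayList<String[]>();

    // Models registering a handler for all elements.
    void addElementHandler(String label) {
        registrations.add(new String[] {label, null});
    }

    // Models registering a handler for one element name.
    void addElementHandler(String label, String elementName) {
        registrations.add(new String[] {label, elementName});
    }

    // Returns the labels in the order the rule says they are called for the given element name.
    List<String> callOrder(String elementName) {
        List<String> order = new ArrayList<String>();
        // Name-specific handlers come first, in registration order.
        for (String[] r : registrations) {
            if (elementName.equals(r[1])) {
                order.add(r[0]);
            }
        }
        // Handlers registered for all elements follow, also in registration order.
        for (String[] r : registrations) {
            if (r[1] == null) {
                order.add(r[0]);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        HandlerOrderDemo parser = new HandlerOrderDemo();
        parser.addElementHandler("handler1");
        parser.addElementHandler("handler2", "CHANNEL");
        parser.addElementHandler("handler3", "CHANNEL");
        parser.addElementHandler("handler4");
        // prints [handler2, handler3, handler1, handler4]
        System.out.println(parser.callOrder("CHANNEL"));
    }
}
```

For an element name with no specific handlers, the model returns only handler1 and handler4.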


How to make new XML document

  1. Make a DefaultElementFactory instance.
    ElementFactory factory = new DefaultElementFactory();
  2. Make a TXDocument instance with the createDocument() method.
    TXDocument doc = factory.createDocument();
  3. Construct the tree.
    doc.addElement(factory.createElement("ROOT"));
    :
    :
  4. Prepare a PrintWriter.
  5. Set the encoding on the TXDocument if the encoding of the PrintWriter isn't UTF-8.
  6. Output with the Format class.
    Format.print(doc, pwriter);

    ElementFactory factory = new DefaultElementFactory();
    TXDocument doc = factory.createDocument();
    TXElement el = factory.createElement("CHANNEL");
    ....
    doc.addElement(el);
    PrintWriter pw
        = new PrintWriter(new OutputStreamWriter(System.out,
                                                 MIME2Java.convert("Shift_JIS")));
    doc.setEncoding("Shift_JIS");
    Format.print(doc, pw);

XML representation, followed by how to make it:

<?xml version="1.0" encoding="ISO-8859-1"?>
    TXDocument doc = factory.createDocument();
    doc.setVersion("1.0");
    doc.setEncoding("ISO-8859-1");

<?footarget foodata?>
    TXPI pi = factory.createPI("footarget", " foodata");

<?footarget?>
    TXPI pi = factory.createPI("footarget", "");

<!-- comment -->
    TXComment comm = factory.createComment(" comment ");

<!DOCTYPE ROOT SYSTEM "root.dtd">
    DTD dtd = factory.createDTD("ROOT", new ExternalID("root.dtd"));

<!DOCTYPE ROOT [...]>
    DTD dtd = factory.createDTD("ROOT", null);
    dtd.addElement(...);

<!ELEMENT ROOT EMPTY>
    ElementDecl ed = factory.createElementDecl("ROOT",
        factory.createContentModel(ElementDecl.EMPTY));

<!ELEMENT ROOT (#PCDATA|FOO|BAR)*>
    CMNode model = new CM1op('*',
        new CM2op('|',
            new CM2op('|', new CMLeaf("#PCDATA"), new CMLeaf("FOO")),
            new CMLeaf("BAR")));
    ContentModel cm = factory.createContentModel(model);
    ElementDecl ed = factory.createElementDecl("ROOT", cm);

  or

    ContentModel cm = factory.createContentModel(ElementDecl.MODEL_GROUP);
    cm.setPseudoContentModel("(#PCDATA|FOO|BAR)*");
    ElementDecl ed = factory.createElementDecl("ROOT", cm);

  (A DTD that includes this instance cannot be used for validation. It can be used only for printing.)

<!ELEMENT ROOT (FOO?, (DL|DD)+, BAR*)>
    CMNode model = new CM2op(',',
        new CM2op(',',
            new CM1op('?', new CMLeaf("FOO")),
            new CM1op('+', new CM2op('|', new CMLeaf("DL"), new CMLeaf("DD")))),
        new CM1op('*', new CMLeaf("BAR")));
    ContentModel cm = factory.createContentModel(model);
    ElementDecl ed = factory.createElementDecl("ROOT", cm);

  or

    ContentModel cm = factory.createContentModel(ElementDecl.MODEL_GROUP);
    cm.setPseudoContentModel("(FOO?, (DL|DD)+, BAR*)");
    ElementDecl ed = factory.createElementDecl("ROOT", cm);

  (A DTD that includes this instance cannot be used for validation. It can be used only for printing.)

<!ATTLIST ROOT
att1 CDATA #IMPLIED
att2 (A|B|O|AB) "A">
    Attlist al = factory.createAttlist("ROOT");
    AttDef ad = factory.createAttDef("att1");
    ad.setDeclaredType(AttDef.CDATA);
    ad.setDefaultType(AttDef.IMPLIED);
    al.addElement(ad);
    ad = factory.createAttDef("att2");
    ad.setDeclaredType(AttDef.NAME_TOKEN_GROUP);
    ad.addElement("A");
    ad.addElement("B");
    ad.addElement("O");
    ad.addElement("AB");
    ad.setDefaultStringValue("A");
    al.addElement(ad);

<!NOTATION png SYSTEM "viewpng.exe">
    TXNotation no = factory.createNotation("png", new ExternalID("viewpng.exe"));

<!ENTITY version.num "1.1.6">
    Entity ent = factory.createEntity("version.num", "1.1.6", false);

<!ENTITY version.num SYSTEM "version.ent">
    Entity ent = factory.createEntity("version.num", new ExternalID("version.ent"), null);

<!ENTITY logoicon SYSTEM "logo.png" NDATA png>
    Entity ent = factory.createEntity("logoicon", new ExternalID("logo.png"), "png");

<ROOT att1="val1" att2="val2">any text</ROOT>
    TXElement el = factory.createElement("ROOT");
    el.setAttribute("att1", "val1");
    el.setAttribute("att2", "val2");
    el.addElement(factory.createText("any text"));

<![CDATA[any text]]>
    TXCDATASection cd = factory.createCDATASection("any text");

&foobar;
    GeneralReference gr = factory.createGeneralReference("foobar");

Any XML node can be created with `new PseudoNode("literal");'. For instance, use `dtd.addElement(new PseudoNode("<!ELEMENT ROOT (FOO, BAR)*>"));' to create `<!ELEMENT ROOT (FOO, BAR)*>'. However, a tree that includes PseudoNode instances can be used only for printing.


Replace classes

To use a subclass of TXElement instead of the TXElement class itself, implement the ElementFactory interface and call Parser#setElementFactory().

  1. Design a subclass of the TXElement class.
  2. Design a subclass of the DefaultElementFactory class.
  3. Call Parser#setElementFactory() with an instance of the class implementing ElementFactory.

class MyElement extends TXElement {
    ....
}
class MyElementFactory extends DefaultElementFactory {
    public TXElement createElement(String name) {
        MyElement el = new MyElement(name);
        el.setFactory(this);
        return el;
    }
    ....
}

    ....
    Parser parse = new Parser(...);
    parse.setElementFactory(new MyElementFactory());
    TXDocument doc = parse.readStream(is);
    // doc contains MyElement instances instead of TXElement instances

NOTE:

You must call setFactory(this) in the create*() methods of your factory class.


Query DTD information

Load DTD without loading document

    String systemlit = "http://.../foobar.dtd";
    InputStream is = (new URL(systemlit)).openStream();
    Parser parse = new Parser(...);
    DTD dtd = parse.readDTDStream(is);

Which attributes can be set on element "FOO"?

    Enumeration en = dtd.getAttributeDeclarations("FOO");
    while (en.hasMoreElements()) {
        AttDef attd = (AttDef)en.nextElement();
        // attd.getName() is attribute name
    }

What values can an attribute have?

First, get an AttDef instance by the above method or by DTD#getAttributeDeclaration(String,String).

Second, check the attribute type with AttDef#getDeclaredType(), which returns one of the following values.

AttDef.CDATA
Any text value.
AttDef.ENTITIES
A subset of the unparsed entity names. To specify more than one value, separate the names with " ". For example: "name1 name2 name3".
    Enumeration en = dtd.getEntities();
    while (en.hasMoreElements()) {
        EntityValue ev = (EntityValue)en.nextElement();
        if (ev.isNDATA()) {
            // Each ev.getName() is valid value.
        }
    }
AttDef.ENTITY
One of the unparsed entity names (see above).
AttDef.NAME_TOKEN_GROUP
One of AttDef#elements().
    Enumeration en = attd.elements();
    while (en.hasMoreElements()) {
        String s = (String)en.nextElement();
        // Each s is valid.
    }
AttDef.ID
Any Name for which DTD#checkID() returns null.
    String newid = ...
    if (null != dtd.checkID(newid)) {
        // Can't use newid
    } else
        dtd.registID(element, newid);
AttDef.IDREF
One of registered IDs.
    Enumeration en = dtd.IDs();
    while (en.hasMoreElements()) {
        String id = (String)en.nextElement();
        // The attribute can take any one of these ids as its value.
    }
AttDef.IDREFS
A subset of the registered IDs. To specify more than one value, separate the IDs with " ".
AttDef.NMTOKEN
One Nmtoken.
AttDef.NMTOKENS
A set of Nmtokens. To specify more than one value, separate the Nmtokens with " ".
AttDef.NOTATION
One of AttDef#elements().
    Enumeration en = attd.elements();
    while (en.hasMoreElements()) {
        String s = (String)en.nextElement();
        // Each s is valid.
    }

Which elements can be inserted into element "FOO" as children?

<!ELEMENT PERSON (NAME, HEIGHT, WEIGHT, EMAIL?)>

According to this declaration, a "PERSON" element must contain a "NAME" element first, a "HEIGHT" element second and a "WEIGHT" element third, and may then contain an "EMAIL" element.

Applications can query such rules with DTD#getInsertableElements() / DTD#getAppendableElements().

    TXElement el = new TXElement("PERSON");
    ....
    switch (dtd.getContentType("PERSON")) {
      case 0:
        // This element is not declared.
        break;
      case ElementDecl.EMPTY:
        // No element is insertable.
        break;
      case ElementDecl.ANY:
        // Any element is insertable.
        break;
      case ElementDecl.MODEL_GROUP:
        Hashtable tab = dtd.prepareTable("PERSON");
            // This hashtable is reusable for any elements.
        dtd.getAppendableElements(el, tab);
        if (((InsertableElement)tab.get(DTD.CM_ERROR)).status) {
            // This element has incorrect structure.
        } else {
            Enumeration en = tab.elements();
            while (en.hasMoreElements()) {
                InsertableElement ie = (InsertableElement)en.nextElement();
                if (!ie.name.equals(DTD.CM_ERROR)
                    && !ie.name.equals(DTD.CM_EOC)
                    && ie.status) {
                    if (ie.name.equals(DTD.CM_PCDATA)) {
                        // Can append TextElement instance to el.
                    } else {
                        // Can append Element instance named ie.name.
                    }
                }
            }
        }
        break;
    }
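
To make the rule for the PERSON declaration concrete, the following is a minimal sketch in plain Java that computes which child names may be appended next for a pure sequence model such as (NAME, HEIGHT, WEIGHT, EMAIL?). It does not use the DTD class. SequenceModelDemo and its appendable() method are hypothetical names, and the sketch handles only a flat sequence with optional items, not general content models.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SequenceModelDemo {
    // The content model (NAME, HEIGHT, WEIGHT, EMAIL?) as parallel arrays.
    static final String[] NAMES = {"NAME", "HEIGHT", "WEIGHT", "EMAIL"};
    static final boolean[] OPTIONAL = {false, false, false, true};

    // Returns the element names that may be appended after the given children,
    // or null if the existing children already violate the model.
    static List<String> appendable(List<String> children) {
        int pos = 0; // index of the next model item that has not been consumed
        for (String child : children) {
            // Skip optional items that were left out.
            while (pos < NAMES.length && !NAMES[pos].equals(child) && OPTIONAL[pos]) {
                pos++;
            }
            if (pos == NAMES.length || !NAMES[pos].equals(child)) {
                return null; // incorrect structure
            }
            pos++;
        }
        List<String> result = new ArrayList<String>();
        for (int i = pos; i < NAMES.length; i++) {
            result.add(NAMES[i]);
            if (!OPTIONAL[i]) {
                break; // a required item blocks everything after it
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(appendable(new ArrayList<String>()));                     // [NAME]
        System.out.println(appendable(Arrays.asList("NAME", "HEIGHT")));             // [WEIGHT]
        System.out.println(appendable(Arrays.asList("NAME", "HEIGHT", "WEIGHT")));   // [EMAIL]
        System.out.println(appendable(Arrays.asList("NAME", "WEIGHT")));             // null
    }
}
```

A null result plays the role of the DTD.CM_ERROR entry in the code above, and an empty list means the content is complete.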

Namespace

The Namespace specification is still in progress, so this implementation is experimental.


Element Digest

TXElement, TXText, TXComment and TXPI each have a getDigest() method. This method returns a digest (hash) value, by default a 128-bit MD5 digest.

TXElement#getDigest() returns a digest value computed from the element itself and all of its children. When a child is modified, getDigest() on every ancestor element returns a new digest value.
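
The way a child's change propagates to its parent's digest can be illustrated with the JDK's java.security.MessageDigest. This is a simplified sketch under stated assumptions: DigestDemo and SimpleNode are hypothetical classes, and the byte layout (a type marker, then the name or text, then the child digests) is invented for illustration. It is not the DOMHASH stream format and does not produce the same values as getDigest().

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DigestDemo {
    // Hypothetical minimal tree node: an element has a name and children, a text node has only text.
    static class SimpleNode {
        final String name;   // element name, or null for a text node
        String text;         // text value, or null for an element
        final List<SimpleNode> children = new ArrayList<SimpleNode>();

        SimpleNode(String name, String text) {
            this.name = name;
            this.text = text;
        }

        // Computes an MD5 digest over this node and, recursively, the digests of its children.
        byte[] getDigest() {
            MessageDigest md = newMd5();
            if (name == null) {
                md.update((byte) 'T');
                md.update(text.getBytes(StandardCharsets.UTF_8));
            } else {
                md.update((byte) 'E');
                md.update(name.getBytes(StandardCharsets.UTF_8));
                for (SimpleNode ch : children) {
                    md.update(ch.getDigest());
                }
            }
            return md.digest();
        }
    }

    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        SimpleNode root = new SimpleNode("ROOT", null);
        SimpleNode text = new SimpleNode(null, "any text");
        root.children.add(text);

        byte[] before = root.getDigest();
        text.text = "other text";          // modify the child
        byte[] after = root.getDigest();

        System.out.println("digest bits: " + before.length * 8);
        System.out.println("parent digest changed: " + !Arrays.equals(before, after));
    }
}
```

Because the parent's digest is computed over its children's digests, comparing two root digests is enough to detect a change anywhere in the subtree.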

Stream formats and sample Java codes for digesting

See the DOMHASH document.


Modify application code for upgrading XML for Java

Alpha-5 [xml4j-19980513] to alpha-6 [xml4j-19980612]

You need to rewrite most namespace-related code.

Alpha-4 [xml4j-19980416] to alpha-5 [xml4j-19980513]

Old                                   New
TXElement#searchChildren()            TXElement#getElementNamed()
TXElement#getNthElementByTagName()    TXElement#getNthElementNamed()

DOM-related changes

Old                          New
NodeIterator#getCurrent()    removed
NodeIterator#toNext()        NodeIterator#toNextNode()
NodeIterator#toPrevious()    NodeIterator#toPrevNode()
NodeIterator#toFirst()       NodeIterator#toFirstNode()
NodeIterator#toLast()        NodeIterator#toLastNode()
NodeIterator#toNth(int)      NodeIterator#moveTo(int)
NodeIterator#toNode(Node)    removed

Old:
    NodeIterator ni = el.getChildNodes();
    for (Node n = ni.toFirst();
         null != n;
         n = ni.toNext()) {
        ....
    }

New:
    NodeIterator ni = el.getChildNodes();
    Node n;
    while (null != (n = ni.toNextNode())) {
        ....
    }

Alpha-3 [xml4j-19980206] to alpha-4 [xml4j-19980416]

Generation

The print() methods that Document#print() calls in other classes never add extra white space.

For example,
    TXElement top = new TXElement("FOO");
    top.addElement(new TXElement("BAR"));
    top.print(new PrintWriter(new OutputStreamWriter(System.out)), null, 0);
This code doesn't print:
<FOO>
  <BAR/>
</FOO>
but:
<FOO><BAR/></FOO>

When you want formatted output, use the com.ibm.xml.parser.Format class.

Misc.

If you have a class that implements StreamProducer, add a closeInputStream(java.io.InputStream) method to it.

TXElement#TXElement(String,String,String) was removed.

Object Tree Operation

The W3C Document Object Model (DOM) working draft was updated on 18 Mar. 1998. XML for Java now follows this new draft, so some DOM APIs have changed from the previous version of XML for Java.

NodeEnumerator was removed. Use NodeIterator instead.

Old:
    NodeEnumerator ne = parent.getChildren().getEnumerator();
    Node ch;
    while (null != (ch = ne.getNext())) {
        :
        :
    }

New:
    NodeIterator ni = parent.getChildNodes();
    for (Node ch = ni.toFirst();  null != ch;  ch = ni.toNext()) {
        :
        :
    }

Other DOM-related changes

Old                                           New
Node.NodeType.*                               Node.*
Node#hasChildren()                            Node#hasChildNodes()
TXAttribute.T_ENUMERATION                     AttDef.NAME_TOKEN_GROUP
TXAttribute.T_*                               AttDef.*
TXAttribute.S_TYPESTR                         AttDef.S_TYPESTR
AttDef.D_*                                    AttDef.*
AttDef#setDefaultValue()                      AttDef#setDefaultStringValue()
AttDef#getDefaultValue()                      AttDef#getDefaultStringValue()
AttDef#setType()                              AttDef#setDeclaredType()
AttDef#getType()                              AttDef#getDeclaredType()
AttDef#setDefault()                           AttDef#setDefaultType()
AttDef#getDefault()                           AttDef#getDefaultType()
DTD.CM_EMPTY / DTD.CM_ANY / DTD.CM_REGULAR    ElementDecl.EMPTY / ElementDecl.ANY / ElementDecl.MODEL_GROUP

Last modified: Fri Jun 12 18:15:20 JST 1998