import com.ibm.xml.parser.*;
....
String filename;
....
InputStream is = new FileInputStream(filename);
TXDocument doc = new Parser(filename).readStream(is);
is.close();
|
Parser#readStream() never returns null.
Used this way, the parser prints parse errors to the standard error stream.
To access a parse tree, use TXDocument#getDocumentElement()
(see How to operate).
TXDocument#getDocumentElement() may return null
when the XML document has serious errors.
A Parser instance cannot be reused.
An application can call the Parser#readStream() method only once.
NOTE:
A TXDocument instance generated by Parser is
also an instance of org.w3c.dom.Document.
This is a DOM object tree.
You can serialize the parse tree back into a stream in XML format.
String charset = "ISO-8859-1";
String jencode = MIME2Java.convert(charset);
PrintWriter pw
= new PrintWriter(new OutputStreamWriter(System.out, jencode));
doc.setEncoding(charset);
doc.print(pw, jencode);
|
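For comparison, the same parse-then-print round trip can be sketched with the JAXP classes bundled in current JDKs. This is an analogue, not XML4J code: it assumes javax.xml.transform is used in place of TXDocument#print(), and the class name EncodingDemo and the sample document are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical demo class; uses only standard JDK APIs, not com.ibm.xml.parser.
class EncodingDemo {
    /** Parses the XML text and serializes the DOM tree again in the given charset. */
    static String roundTrip(String xml, String charset) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Transformer t = TransformerFactory.newInstance().newTransformer();
            // The encoding is set on the serializer; it also appears in the XML declaration.
            t.setOutputProperty(OutputKeys.ENCODING, charset);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString(charset);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<CHANNEL>news</CHANNEL>", "ISO-8859-1"));
    }
}
```

As in the XML4J sample, the encoding has to be stated once for the byte stream and once for the document, so that the XML declaration matches the bytes actually written.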
You can configure the parser's behavior after creating a Parser instance and before calling readStream(), using the following methods:
setErrorNoByteMark(boolean)
setKeepComment(boolean)
setPreserveSpace(boolean)
setWarningNoDoctypeDecl(boolean)
setWarningNoXMLDecl(boolean)
setWarningRedefinedEntity(boolean)
import com.ibm.xml.parser.*;
....
String filename;
....
Parser parse = new Parser(filename);
parse.setWarningNoDoctypeDecl(false);
parse.setWarningNoXMLDecl(false);
InputStream is = new FileInputStream(filename);
TXDocument doc = parse.readStream(is);
is.close();
|
You can control how the parser reports errors.
Create an instance of a class that implements ErrorListener,
and pass the instance to the Parser constructor.
The Object key parameter of the error() method
is an instance of either String or Exception.
When key is a String,
it identifies the type of error (see the source com/ibm/xml/parser/r/Message.java).
import com.ibm.xml.parser.*;
class ErrorIgnorer implements ErrorListener {
public void error(String fname, int lineno,
int charoff, Object key, String mes) {
// do nothing
}
}
....
String filename;
....
InputStream is = new FileInputStream(filename);
Parser parse = new Parser(filename, new ErrorIgnorer(), null);
TXDocument doc = parse.readStream(is);
is.close();
|
import com.ibm.xml.parser.*;
import java.awt.TextArea;
class ErrorEater extends TextArea implements ErrorListener {
String m_fname;
int m_noferrors = 0;
ErrorEater(String n) {
super();
m_fname = n;
}
public void error(String fname, int lineno,
int charoff, Object key, String mes) {
append((null == fname ? m_fname : fname)+":"+lineno+":"+mes+"\n");
m_noferrors ++;
}
public boolean hasError() {
return 0 < m_noferrors;
}
}
....
String filename;
....
InputStream is = new FileInputStream(filename);
ErrorEater ee = new ErrorEater(filename);
Parser parse = new Parser(filename, ee, null);
TXDocument doc = parse.readStream(is);
is.close();
|
See the sources com/ibm/xml/parser/trlxml.java and
com/ibm/xml/parser/Stderr.java.
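The same "collect errors instead of printing them" idea can be sketched with the standard org.xml.sax.ErrorHandler interface of the JDK. This is only an analogue of ErrorListener, under the assumption that the JAXP DocumentBuilder stands in for Parser; CollectingHandler is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical handler built on the JDK's SAX ErrorHandler, not on XML4J's ErrorListener.
class CollectingHandler implements ErrorHandler {
    final List<String> messages = new ArrayList<>();

    private void record(SAXParseException e) {
        messages.add(e.getLineNumber() + ":" + e.getMessage());
    }

    public void warning(SAXParseException e) { record(e); }
    public void error(SAXParseException e) { record(e); }
    public void fatalError(SAXParseException e) { record(e); }

    boolean hasError() {
        return !messages.isEmpty();
    }

    /** Parses the XML text and returns the handler holding everything that was reported. */
    static CollectingHandler parse(String xml) {
        CollectingHandler handler = new CollectingHandler();
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(handler);
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            // The JDK parser still throws after a fatal error; the details were already recorded.
        }
        return handler;
    }

    public static void main(String[] args) {
        CollectingHandler h = parse("<A><B></A>");
        System.out.println(h.hasError() + " " + h.messages);
    }
}
```

Unlike Parser#readStream(), which the text above says never returns null, the JDK parser aborts with an exception on a fatal error, so the sketch catches it and relies on the handler's list.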
A TXDocument can have one TXElement instance,
zero or one DTD instance, and instances of TXPI
and TXComment as children.
All children of TXDocument can also be accessed with
TXDocument#getChildren() / TXDocument#getChildrenArray().
The TXElement instance can also be accessed
with TXDocument#getDocumentElement().
TXElement can have some instances of TXElement,
TXText, TXPI and TXComment as children.
All children of TXElement can be accessed with TXElement#getChildren() /
TXElement#getChildrenArray().
Some methods of TXDocument and TXElement
return one or more instances of the Child interface.
These Child instances are also instances of
TXElement, TXText, TXPI, TXComment,
or DTD (only as a child of TXDocument).
To find out which class an instance belongs to, use Node#getNodeType() or the instanceof operator, as in the following:
import com.ibm.xml.parser.*;
import com.ibm.dom.*;
....
TXDocument doc = ....;
TXElement root = doc.getDocumentElement();
NodeEnumerator ne = root.getChildren().getEnumerator();
Node ch;
while (null != (ch = ne.getNext())) {
if (ch instanceof TXElement) {
TXElement el = (TXElement)ch;
....
} else if (ch instanceof TXText) {
TXText te = (TXText)ch;
....
}
}
|
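The sample above uses instanceof. The getNodeType() alternative can be sketched against the standard org.w3c.dom interfaces, here with the JDK's own parser rather than XML4J; ChildKinds and the one-letter codes are hypothetical choices made for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class using the JDK DOM implementation.
class ChildKinds {
    /** Returns one letter per child of the root: E(lement), T(ext), C(omment), P(I), ? (other). */
    static String describe(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            StringBuilder kinds = new StringBuilder();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                // Dispatch on the node type constant instead of on the Java class.
                switch (children.item(i).getNodeType()) {
                    case Node.ELEMENT_NODE: kinds.append('E'); break;
                    case Node.TEXT_NODE: kinds.append('T'); break;
                    case Node.COMMENT_NODE: kinds.append('C'); break;
                    case Node.PROCESSING_INSTRUCTION_NODE: kinds.append('P'); break;
                    default: kinds.append('?');
                }
            }
            return kinds.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("<R><A/>text<!--c--><?pi data?></R>"));
    }
}
```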
The processor keeps all white space and passes it to applications,
as required by 2.10 White Space Handling
in the XML 1.0 Proposed Recommendation.
The processor sets the IsIgnorableWhitespace flag on
TextElement instances that consist only of white space.
<MEMBERS>
<PERSON>Hiroshi</PERSON>
<PERSON>Naohiko</PERSON>
<PERSON>
Kent
</PERSON>
</MEMBERS>
|
The processor parses this Element as follows:
TXElement (getName():"MEMBERS", getText():"\n Hiroshi\n Naohiko\n \n Kent\n \n")
TXText ("\n ", ignorable)
TXElement (getName():"PERSON", getText():"Hiroshi")
TXText ("Hiroshi")
TXText ("\n ", ignorable)
TXElement (getName():"PERSON", getText():"Naohiko")
TXText ("Naohiko")
TXText ("\n ", ignorable)
TXElement (getName():"PERSON", getText():"\n Kent\n ")
TXText ("\n Kent\n ")
TXText ("\n", ignorable)
It is useful to call
TXText#trim(String) /
TXText#trim(String,boolean,boolean)
when an application does not need leading/trailing spaces.
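The effect described above can be reproduced with the JDK's DOM parser, which also keeps white-space-only text nodes when no DTD is present. The sketch below is an analogue rather than XML4J code: it assumes two-space indentation in the sample document, uses String#trim() in place of TXText#trim(), and WhitespaceDemo is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class using the JDK DOM implementation.
class WhitespaceDemo {
    // The MEMBERS document from the text, with assumed two-space indentation.
    static final String XML = "<MEMBERS>\n"
        + "  <PERSON>Hiroshi</PERSON>\n"
        + "  <PERSON>Naohiko</PERSON>\n"
        + "  <PERSON>\n    Kent\n  </PERSON>\n"
        + "</MEMBERS>";

    private static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Counts the direct text children of the root that consist only of white space. */
    static int countWhitespaceOnlyChildren(String xml) {
        NodeList children = root(xml).getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.TEXT_NODE && n.getNodeValue().trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    /** Returns the text of each PERSON element with leading and trailing white space removed. */
    static List<String> trimmedNames(String xml) {
        NodeList persons = root(xml).getElementsByTagName("PERSON");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < persons.getLength(); i++) {
            names.add(persons.item(i).getTextContent().trim());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(countWhitespaceOnlyChildren(XML) + " " + trimmedNames(XML));
    }
}
```

The four white-space-only children correspond to the four TXText nodes marked ignorable in the parse tree shown above.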
class AElementHandler implements ElementHandler {
public TXElement handleElement(TXElement el) {
....
}
}
....
Parser parse = new Parser(...);
parse.addElementHandler(new AElementHandler(), "CHANNEL");
TXDocument doc = parse.readStream(is);
|
ElementHandler#handleElement() is called during Parser#readStream(),
after the parser has parsed each end tag (</CHANNEL>)
and before the element is added to its parent.
The parser adds to the parent the TXElement instance
returned by handleElement().
If handleElement() returns null,
the parser does not add the TXElement instance to the parent.
There are two ways to register an ElementHandler:
For a specific TXElement: addElementHandler(handler, "CHANNEL");
The handler is called at each </CHANNEL> tag.
For all TXElements: addElementHandler(handler);
When more than one ElementHandler is registered in the parser,
the parser first calls the ElementHandlers registered for the specific TXElement
(in registration order: first set, first called)
and then calls the ElementHandlers registered for all TXElements.
Even if an ElementHandler changes the name of a TXElement,
the parser still selects the remaining ElementHandlers by the original name.
When an ElementHandler returns null,
the parser does not call any further ElementHandlers.
Parser parse = new Parser(...);
parse.addElementHandler(handler1);
parse.addElementHandler(handler2, "CHANNEL");
parse.addElementHandler(handler3, "CHANNEL");
parse.addElementHandler(handler4);
TXDocument doc = parse.readStream(is);
|
In this case, when the parser processes the </CHANNEL> tag,
it calls handler2 first, then handler3,
handler1, and handler4, in that order.
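The calling order described above can be modeled in a few lines. The following is a self-contained simulation of the stated rules, not the parser's actual implementation: Elem is a hypothetical stand-in for TXElement, handlers are modeled as UnaryOperator, and the dispatch() method is an assumption about how the rules fit together.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical model of the ElementHandler calling rules; no XML4J classes are used.
class HandlerOrderDemo {
    /** Stand-in for TXElement: only the name matters here. */
    static final class Elem {
        String name;
        Elem(String name) { this.name = name; }
    }

    private final Map<String, List<UnaryOperator<Elem>>> specific = new LinkedHashMap<>();
    private final List<UnaryOperator<Elem>> general = new ArrayList<>();

    void addElementHandler(UnaryOperator<Elem> handler) {
        general.add(handler);
    }

    void addElementHandler(UnaryOperator<Elem> handler, String name) {
        specific.computeIfAbsent(name, k -> new ArrayList<>()).add(handler);
    }

    /** Runs the handlers for one end tag; returns null if a handler dropped the element. */
    Elem dispatch(Elem el) {
        // Handlers are selected by the name as parsed, even if a handler renames the element.
        String originalName = el.name;
        List<UnaryOperator<Elem>> chain =
            new ArrayList<>(specific.getOrDefault(originalName, new ArrayList<>()));
        chain.addAll(general); // name-specific handlers first, then handlers for all elements
        for (UnaryOperator<Elem> handler : chain) {
            el = handler.apply(el);
            if (el == null) {
                return null; // remaining handlers are skipped
            }
        }
        return el;
    }

    /** Registers four handlers as in the sample above and returns the order in which they ran. */
    static List<String> demo() {
        List<String> calls = new ArrayList<>();
        HandlerOrderDemo parser = new HandlerOrderDemo();
        parser.addElementHandler(e -> { calls.add("handler1"); return e; });
        parser.addElementHandler(e -> { calls.add("handler2"); return e; }, "CHANNEL");
        parser.addElementHandler(e -> { calls.add("handler3"); return e; }, "CHANNEL");
        parser.addElementHandler(e -> { calls.add("handler4"); return e; });
        parser.dispatch(new Elem("CHANNEL"));
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```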
Create a TXDocument instance: TXDocument doc = new TXDocument();
Add elements: doc.addElement(...);
Create a PrintWriter, and call TXDocument#setEncoding()
if the encoding of the PrintWriter is not UTF-8.
Print the document: doc.print(...);
TXDocument doc = new TXDocument();
TXElement el = new TXElement("CHANNEL");
....
doc.addElement(el);
PrintWriter pw
= new PrintWriter(new OutputStreamWriter(System.out,
MIME2Java.convert("Shift_JIS")));
doc.setEncoding("Shift_JIS");
doc.print(pw);
|
To use a subclass of TXElement instead of the TXElement class itself,
implement the ElementFactory interface
and call Parser#setElementFactory().
Define a subclass of the TXElement class.
Define a class that implements ElementFactory, for example by extending the DefaultElementFactory class.
Call Parser#setElementFactory()
with an instance of the class implementing ElementFactory.
class MyElement extends TXElement {
....
}
class MyElementFactory extends DefaultElementFactory {
....
}
....
Parser parse = new Parser(...);
parse.setElementFactory(new MyElementFactory());
TXDocument doc = parse.readStream(is);
// doc contains MyElement instances instead of TXElement instances
|
ElementFactory#createElement() is called when the processor reaches a start-tag.
ElementFactory#ripenElement() is called when the processor reaches an end-tag.
String systemlit = "http://.../foobar.dtd";
InputStream is = (new URL(systemlit)).openStream();
Parser parse = new Parser(...);
DTD dtd = parse.readDTDStream(is);
|
Enumeration en = dtd.getAttributeDeclarations("FOO");
while (en.hasMoreElements()) {
AttDef attd = (AttDef)en.nextElement();
// attd.getName() is attribute name
}
|
First, get an AttDef instance by the above method
or by DTD#getAttributeDeclaration(String,String).
Next, check the attribute type by means of AttDef#getType(),
which returns one of the following values:
TXAttribute.T_CDATA
TXAttribute.T_ENTITIES
Enumeration en = dtd.getEntities();
while (en.hasMoreElements()) {
EntityValue ev = (EntityValue)en.nextElement();
if (ev.isNDATA()) {
// Each ev.getName() is a valid value.
}
}
|
TXAttribute.T_ENTITY
TXAttribute.T_ENUMERATION
The valid values are listed by AttDef#elements().
Enumeration en = attd.elements();
while (en.hasMoreElements()) {
String s = (String)en.nextElement();
// Each s is valid.
}
|
TXAttribute.T_ID
A value can be used as a new ID when DTD#checkID() returns null.
String newid = ...
if (null != dtd.checkID(newid)) {
// Can't use newid: it is already registered.
} else {
dtd.registID(element, newid);
}
|
TXAttribute.T_IDREF
Enumeration en = dtd.IDs();
while (en.hasMoreElements()) {
String id = (String)en.nextElement();
// The attribute can take any one of these ids as its value.
}
|
TXAttribute.T_IDREFS
TXAttribute.T_NMTOKEN
TXAttribute.T_NMTOKENS
TXAttribute.T_NOTATION
The valid values are listed by AttDef#elements().
Enumeration en = attd.elements();
while (en.hasMoreElements()) {
String s = (String)en.nextElement();
// Each s is valid.
}
|
<!ELEMENT PERSON (NAME, HEIGHT, WEIGHT, EMAIL?)>
With this declaration, a "PERSON" element must contain a "NAME" element first, a "HEIGHT" element second, and a "WEIGHT" element third. An "EMAIL" element may optionally follow.
Applications can query such rules with
DTD#getInsertableElements() / DTD#getAppendableElements().
TXElement el = new TXElement("PERSON");
....
switch (dtd.getContentType("PERSON")) {
case 0:
// This element is not declared.
break;
case DTD.CM_EMPTY:
// No element is insertable.
break;
case DTD.CM_ANY:
// Any element is insertable.
break;
case DTD.CM_REGULAR:
Hashtable tab = dtd.prepareTable("PERSON");
// This hashtable is reusable for any elements.
dtd.getAppendableElements(el, tab);
if (((InsertableElement)tab.get(DTD.CM_ERROR)).status) {
// This element has an incorrect structure.
} else {
Enumeration en = tab.elements();
while (en.hasMoreElements()) {
InsertableElement ie = (InsertableElement)en.nextElement();
if (!ie.name.equals(DTD.CM_ERROR)
&& !ie.name.equals(DTD.CM_EOC)
&& ie.status) {
if (ie.name.equals(DTD.CM_PCDATA)) {
// Can append a TextElement instance to el.
} else {
// Can append an Element instance named ie.name.
}
}
}
}
break;
}
|
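To make the "what can be appended next" question concrete, here is a self-contained sketch for the simple case of a sequence such as (NAME, HEIGHT, WEIGHT, EMAIL?). It is not how DTD#getAppendableElements() is implemented: SequenceModel is a hypothetical class, and it assumes a flat sequence of distinct names where the only operator is the optional marker "?".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified content model: a flat sequence with optional items only.
class SequenceModel {
    private final String[] names;
    private final boolean[] optional;

    /** Items are element names; a trailing '?' marks an optional item, e.g. "EMAIL?". */
    SequenceModel(String... items) {
        names = new String[items.length];
        optional = new boolean[items.length];
        for (int i = 0; i < items.length; i++) {
            optional[i] = items[i].endsWith("?");
            names[i] = optional[i] ? items[i].substring(0, items[i].length() - 1) : items[i];
        }
    }

    /**
     * Returns the names that may be appended after the given children,
     * or null if the children already violate the model.
     */
    List<String> appendable(List<String> children) {
        int pos = 0;
        for (String child : children) {
            while (pos < names.length && !names[pos].equals(child)) {
                if (!optional[pos]) {
                    return null; // a required item was skipped
                }
                pos++;
            }
            if (pos == names.length) {
                return null; // the child is not allowed at this point
            }
            pos++;
        }
        List<String> result = new ArrayList<>();
        for (int i = pos; i < names.length; i++) {
            result.add(names[i]);
            if (!optional[i]) {
                break; // nothing beyond a required item can come next
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SequenceModel person = new SequenceModel("NAME", "HEIGHT", "WEIGHT", "EMAIL?");
        System.out.println(person.appendable(Arrays.asList("NAME", "HEIGHT")));
    }
}
```

A null result plays the role of the DTD.CM_ERROR status in the sample above, and an empty list means the element is complete.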
The Namespace specification is still in progress, so this implementation is experimental.
Call Parser#setProcessNamespace(true) if you need the namespace feature.
| | "rdf:assertion" without namespace support | "rdf:assertion" with namespace support | "author" with namespace support |
|---|---|---|---|
| TXElement#getName() / TXAttribute#getName() | "rdf:assertion" | "assertion" | "author" |
| TXElement#getNameSpace() / TXAttribute#getNameSpace() | null | "rdf" | null |
| TXElement#getQName() / TXAttribute#getQName() | "rdf:assertion" | "rdf:assertion" | "author" |
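The values listed for the "with namespace support" case amount to splitting the qualified name at its colon. A minimal sketch of that split, independent of XML4J; QNameDemo and split() are hypothetical names, and the sketch assumes that the namespace value is simply the prefix, as the values above suggest.

```java
// Hypothetical helper; it mirrors the getNameSpace()/getName() values listed above.
class QNameDemo {
    /** Returns {prefix, localName}; the prefix is null when the name has no colon. */
    static String[] split(String qname) {
        int colon = qname.indexOf(':');
        if (colon < 0) {
            return new String[] { null, qname };
        }
        return new String[] { qname.substring(0, colon), qname.substring(colon + 1) };
    }

    public static void main(String[] args) {
        String[] parts = split("rdf:assertion");
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```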