This document is meant to be a brief overview of how to read and interpret an XML Document Type Definition (DTD). It makes no assumptions about how much you know about markup languages, XML or DTDs.
What's a markup language? A markup language is a way of describing textual data by indicating the meaning of a particular piece of text. This text can be anything from a single word or sentence to a paragraph, or even an entire document.
Why would you do this? It's usually (!) obvious to the human reader of a document what the text in it means. For example, you can see from the way I've written "The 10 Minute Guide to Reading An XML DTD" above, and "1. Markup languages", that these are meant to be headings. Because I left a space between this block of text and the previous one, you can see that this is a separate paragraph. We take the meaning for granted.
It's much harder to describe this meaning to a computer that must process a document. So we need a means of explicitly describing the meaning of a particular piece or block of text, so that the application knows how to process it.
If this were an HTML document, then I might do something like the following:
<TITLE>THE 10 MINUTE GUIDE TO READING AN XML DTD</TITLE> <P>What's a markup language?...</P>
Notice from the previous example the syntax of an element in a markup language such as HTML or XML: a start tag and a matching end tag, which repeats the name after a slash:
<name-of-the-tag-goes-here>
</name-of-the-tag-goes-here>
Elements can have additional information (usually meta information) associated with them. Each such piece of information is known as an attribute. It looks like this:
<P ALIGN="RIGHT">Some text</P>
Attributes then have the form:
<tag name-of-the-attribute="the-value-of-the-attribute">...</tag>
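To see how an application picks this syntax apart, here is a minimal sketch using the xml.etree.ElementTree module from Python's standard library (Python is only our choice for illustration; nothing about XML depends on it). It parses the P example above and reads back the element name, its attributes and its text:

```python
import xml.etree.ElementTree as ET

# Parse the single P element from the attribute example above.
para = ET.fromstring('<P ALIGN="RIGHT">Some text</P>')

print(para.tag)           # the element name: P
print(para.attrib)        # all attributes as a dictionary: {'ALIGN': 'RIGHT'}
print(para.get("ALIGN"))  # one attribute looked up by name: RIGHT
print(para.text)          # the text between the start and end tags: Some text
```

Note that XML, unlike HTML, is case-sensitive, so para.get("align") returns None here.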
It ought to be obvious from the above HTML examples that providing a document with markup like <TITLE> and <P> is only useful if the application knows that those elements exist and how to handle them.
This is achieved by agreeing a Document Type Definition (DTD), which describes the legal elements and attributes that can be used to mark up a document. This is essentially a contract between the application and the user of the markup language: if the user marks up a document in a certain way, then the application can be relied upon to respond accordingly.
An additional advantage of DTDs is that they are written in a rigorous syntax, which means that it becomes possible to 'validate' (i.e. check) a document against its DTD to see whether it conforms to the letter of the contract.
XML DTDs all have a common underlying syntax which allows any XML parser (an application that can read a document, and potentially validate it against its DTD) to process any XML document.
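The difference between a document that merely follows XML's syntax ('well-formed') and one that also obeys its DTD ('valid') shows up clearly with a non-validating parser. This sketch again uses Python's xml.etree.ElementTree, which reads a DTD in the document but does not validate against it; the declaration syntax itself is explained in the next section, and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The DTD (inside the DOCTYPE) says TITLE may contain only text, yet the
# document puts a P element inside it.
invalid_but_well_formed = """<!DOCTYPE TITLE [
  <!ELEMENT TITLE (#PCDATA)>
  <!ELEMENT P (#PCDATA)>
]>
<TITLE><P>Not allowed by the DTD</P></TITLE>"""

# ElementTree only checks well-formedness, so this parses without complaint.
root = ET.fromstring(invalid_but_well_formed)
print(root.tag, [child.tag for child in root])

# A mismatched end tag is not well-formed XML, so even this parser rejects it.
try:
    ET.fromstring("<TITLE>Broken</P>")
    rejected = False
except ET.ParseError as err:
    rejected = True
    print("not well-formed:", err)
```

A validating parser would reject both documents: the first for breaking its DTD, the second for not being XML at all.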
OK, how do we define an element in a DTD?
Here's how:
<!ELEMENT element-name ...>
For example:
<!ELEMENT TITLE ...>
What does the ... mean? This is where you declare what other elements or text an element can contain: its 'content model'. A content model is usually either plain text, other elements, or a mixture of the two.
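Here's an element which can contain only plain text (no other elements). In XML the #PCDATA keyword ("parsed character data") must be enclosed in brackets:
<!ELEMENT TITLE (#PCDATA)>
And here's an element which must contain exactly one P element and nothing else:
<!ELEMENT FOOTNOTE (P)>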
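A parser reads these declarations as structured data rather than as text. As a sketch, Python's standard xml.parsers.expat module (a non-validating parser) calls an ElementDeclHandler for every <!ELEMENT ...> in a document's internal DTD subset, passing the content model as nested (type, quantifier, name, children) tuples. The describe function below is our own hypothetical helper that turns those tuples back into DTD-style text, and the sample document is made up for illustration.

```python
import xml.parsers.expat
from xml.parsers.expat import model

# Map expat's quantifier constants back to DTD qualifiers.
QUALIFIER = {model.XML_CQUANT_NONE: "", model.XML_CQUANT_OPT: "?",
             model.XML_CQUANT_REP: "*", model.XML_CQUANT_PLUS: "+"}

def describe(content_model):
    """Hypothetical helper: turn an expat content-model tuple into DTD-like text."""
    ctype, quant, name, children = content_model
    suffix = QUALIFIER[quant]
    if ctype == model.XML_CTYPE_EMPTY:
        return "EMPTY"
    if ctype == model.XML_CTYPE_ANY:
        return "ANY"
    if ctype == model.XML_CTYPE_NAME:
        return name + suffix
    if ctype == model.XML_CTYPE_MIXED:
        # Mixed content: #PCDATA optionally followed by element names.
        return "(" + "|".join(["#PCDATA"] + [c[2] for c in children]) + ")" + suffix
    sep = "," if ctype == model.XML_CTYPE_SEQ else "|"  # sequence or choice
    return "(" + sep.join(describe(c) for c in children) + ")" + suffix

declared = {}

def on_element_decl(elname, content_model):
    declared[elname] = describe(content_model)

parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse("""<!DOCTYPE FOOTNOTE [
  <!ELEMENT TITLE (#PCDATA)>
  <!ELEMENT FOOTNOTE (P)>
  <!ELEMENT P (#PCDATA)>
]>
<FOOTNOTE><P>text</P></FOOTNOTE>""", True)

for elname, text in declared.items():
    print(elname, text)
```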
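How do we specify multiple or optional elements? There are three qualifiers which we can specify to denote multiple or optional elements:
?  zero or one occurrence: the element is optional
*  zero or more occurrences
+  one or more occurrences
The qualifier goes straight after an element name, or after a bracketed group to apply to the whole group.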
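For example, an ARTICLE that contains zero or more P elements:
<!ELEMENT ARTICLE (P)*>
A comma separates elements that must appear in that order, so this ARTICLE has an optional TITLE followed by zero or more P elements:
<!ELEMENT ARTICLE (TITLE?, P*)>
A vertical bar separates alternatives, so this ARTICLE contains either an optional TITLE or zero or more P elements, but not both:
<!ELEMENT ARTICLE (TITLE? | P*)>
(These are three alternative declarations; a real DTD may declare each element only once.)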
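Content models like these are essentially regular expressions over the sequence of child element names, which is one way to think about what a validating parser checks. The Python sketch below makes that concrete under simplifying assumptions: content_model_to_regex and matches are hypothetical helpers, they handle only element names, brackets, commas, |, ?, * and + (no #PCDATA, EMPTY or ANY), and they skip XML's extra rule that content models must be deterministic.

```python
import re

def content_model_to_regex(model: str) -> str:
    """Hypothetical helper: translate an element-only content model to a regex.

    Commas and whitespace are dropped (a sequence is just concatenation),
    each element name becomes 'NAME,' and brackets become non-capturing
    groups; |, ?, * and + keep their usual regex meaning.
    """
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[()|?*+]", model):
        if token == "(":
            parts.append("(?:")
        elif token in {")", "|", "?", "*", "+"}:
            parts.append(token)
        else:
            parts.append("(?:" + re.escape(token) + ",)")
    return "".join(parts)

def matches(model: str, children: list[str]) -> bool:
    """True if the list of child element names fits the content model."""
    sequence = "".join(name + "," for name in children)
    return re.fullmatch(content_model_to_regex(model), sequence) is not None

print(matches("(P)*", []))                           # True
print(matches("(TITLE?, P*)", ["TITLE", "P", "P"]))  # True
print(matches("(TITLE?, P*)", ["P", "TITLE"]))       # False: wrong order
print(matches("(TITLE? | P*)", ["TITLE", "P"]))      # False: only one branch allowed
```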
OK, how do we define an attribute in a DTD?
Here's how:
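<!ATTLIST element-name attribute-name CDATA #IMPLIED>
CDATA means the value can be any text. The final #IMPLIED, which XML requires in some form for every attribute, means the attribute is optional.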
You can declare multiple attributes for an element in one go:
<!ATTLIST ARTICLE AUTHOR CDATA #IMPLIED DATE CDATA #IMPLIED>
It's common to put each attribute declaration on a separate line for readability.
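An attribute can also be declared as taking one of a fixed list of values:
<!ATTLIST P ALIGN (LEFT|RIGHT|CENTRE) #IMPLIED>
Such an attribute can also be given a default value, which is used when the author leaves the attribute out:
<!ATTLIST P ALIGN (LEFT|RIGHT|CENTRE) "LEFT">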
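From an application's point of view, the list of values and the default work together: a missing ALIGN means LEFT, and any value outside the list is an error. This is a hand-written Python sketch of that behaviour, not code taken from a real validator; ALLOWED_ALIGN, DEFAULT_ALIGN and effective_align are hypothetical names, and in practice a validating parser would check the list and supply the default for you.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-coded copy of: <!ATTLIST P ALIGN (LEFT|RIGHT|CENTRE) "LEFT">
ALLOWED_ALIGN = ("LEFT", "RIGHT", "CENTRE")
DEFAULT_ALIGN = "LEFT"

def effective_align(para: ET.Element) -> str:
    """Return the ALIGN value a DTD-aware application would use for a P element."""
    value = para.get("ALIGN", DEFAULT_ALIGN)  # fall back to the declared default
    if value not in ALLOWED_ALIGN:
        raise ValueError(f"ALIGN must be one of {ALLOWED_ALIGN}, not {value!r}")
    return value

print(effective_align(ET.fromstring("<P>No alignment given</P>")))           # LEFT
print(effective_align(ET.fromstring('<P ALIGN="RIGHT">Right aligned</P>')))  # RIGHT
try:
    effective_align(ET.fromstring('<P ALIGN="MIDDLE">Oops</P>'))
except ValueError as err:
    print(err)
```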
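Attributes can also be enforced, i.e. the author can be forced to specify a particular piece of information:
<!ATTLIST ARTICLE AUTHOR CDATA #REQUIRED DATE CDATA #IMPLIED>
Here every ARTICLE must have an AUTHOR attribute (#REQUIRED), while DATE may be left out (#IMPLIED).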
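To see what a parser records for attribute declarations, here is a sketch using the AttlistDeclHandler callback of Python's standard xml.parsers.expat module. It is called once per declared attribute with the element name, attribute name, type, default value (None when there is none) and a required flag. Expat is non-validating, so it reports #REQUIRED without enforcing it; the sample document and its AUTHOR value are made up for illustration.

```python
import xml.parsers.expat

attributes = []

def on_attlist_decl(elname, attname, att_type, default, required):
    # default is None when no default value is declared (#IMPLIED or #REQUIRED);
    # required is true for #REQUIRED.
    attributes.append((elname, attname, att_type, default, bool(required)))

parser = xml.parsers.expat.ParserCreate()
parser.AttlistDeclHandler = on_attlist_decl
parser.Parse("""<!DOCTYPE ARTICLE [
  <!ELEMENT ARTICLE (P)*>
  <!ELEMENT P (#PCDATA)>
  <!ATTLIST ARTICLE AUTHOR CDATA #REQUIRED DATE CDATA #IMPLIED>
  <!ATTLIST P ALIGN (LEFT|RIGHT|CENTRE) "LEFT">
]>
<ARTICLE AUTHOR="A. N. Other"><P>text</P></ARTICLE>""", True)

for elname, attname, att_type, default, required in attributes:
    if required:
        rule = "#REQUIRED"
    elif default is None:
        rule = "#IMPLIED"
    else:
        rule = f'default "{default}"'
    print(f"{elname} {attname} {att_type} {rule}")
```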
Unfortunately, that isn't everything: there are lots of other subtle rules relating to DTDs, and other types of declaration that can be used. But as elements and attributes are the main contents of a DTD, this should be enough to get you started.
One thing to remember is that an XML (or HTML or SGML) document is really a tree structure. This means that one element is the 'document' element, the root of the tree, which contains all the others. In HTML the root element is the HTML element.
e.g. <HTML>...other tags...</HTML>
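The tree is easy to see once a parser has built it. This Python sketch uses xml.etree.ElementTree on a small, made-up page written as well-formed XML (real-world HTML often isn't well-formed, so an XML parser may refuse it); show is a hypothetical helper that prints each element indented by its depth in the tree.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<HTML><HEAD><TITLE>A title</TITLE></HEAD>"
    "<BODY><P>First</P><P>Second</P></BODY></HTML>"
)

def show(element, depth=0):
    """Print each element, indented by its depth below the root."""
    print("  " * depth + element.tag)
    for child in element:
        show(child, depth + 1)

print("root:", doc.tag)  # the document element: HTML
show(doc)
```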
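XML does not actually require an element to be declared before it is used in another element's content model, so declarations can appear in any order. In DTDs written bottom-up, declaring the smaller building blocks first, you will find the document element towards the bottom of the file (others declare it first, so check both ends).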
e.g. <!ELEMENT P (#PCDATA)> <!ELEMENT ARTICLE (P)*>
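Where the document element sits in the file is only a rule of thumb. A sketch that does not depend on declaration order: collect every declared element and every element named inside a content model; the declared elements that no other element uses are candidates for the document element. candidate_roots is a hypothetical helper built on the ElementDeclHandler callback of Python's xml.parsers.expat, and the heuristic can return several candidates (or none, if the root is used recursively).

```python
import xml.parsers.expat

def candidate_roots(dtd_text: str) -> set:
    """Hypothetical heuristic: declared elements never used in another
    element's content model are candidates for the document element."""
    declared, used = set(), set()

    def collect_names(content_model):
        _ctype, _quant, name, children = content_model
        if name is not None:  # only element-name nodes carry a name
            used.add(name)
        for child in children:
            collect_names(child)

    def on_element_decl(elname, content_model):
        declared.add(elname)
        collect_names(content_model)

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    # Wrap the declarations in a DOCTYPE internal subset with a dummy root element.
    parser.Parse(f"<!DOCTYPE x [{dtd_text}]><x/>", True)
    return declared - used

print(candidate_roots("<!ELEMENT P (#PCDATA)> <!ELEMENT ARTICLE (P)*>"))
print(candidate_roots("<!ELEMENT ARTICLE (P)*> <!ELEMENT P (#PCDATA)>"))
```

Both calls give the same answer, ARTICLE, whichever order the two declarations appear in.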
I hope this proved of some use. If you want to contact the author, you can mail me here: ldodds@ingenta.com. If there are any glaring omissions or errors, please let me know.