MULTEXT - Document MQL 1. MtSgmlQL/I/O formats.


MtSgmlQL: I/O formats





The nSGML format

Definition

The nSGML format is a normalized SGML format defined in the MULTEXT project in collaboration with the Language Technology Group (LTG) of the University of Edinburgh. The variant used here differs slightly from LTG's.

nSGML is defined as follows:


nSGML format

 1. Document is valid SGML.
 2. Reference concrete syntax is used.
 3. No capacity/length restrictions.
 4. No short refs or tag minimisation.
 5. All end-tags present except for EMPTY elements.
 6. No entity references except character references.
 7. All character references terminated with ";".
 8. No CDATA or RCDATA element content.
 9. No attribute value minimisation; literal delimiters (quotes)
    may not be omitted.
10. No SUBDOCs.
11. No marked sections.
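
A few of these rules can be spot-checked lexically, without an SGML parser. The following Python sketch is only an illustration and is not part of MtSgmlQL: the name find_violations and the regular expressions are assumptions made for this example. It covers rules 6, 7 and 11 only and does not validate the document against its DTD.

```python
import re

# Purely lexical checks for three of the nSGML rules (hypothetical helper,
# not part of MtSgmlQL). Each entry pairs a message with a pattern that
# signals a violation when it is found in a line of the document instance.
CHECKS = [
    ("rule 6: entity reference other than a character reference",
     re.compile(r"&[A-Za-z]")),
    ("rule 7: character reference not terminated with ';'",
     re.compile(r"&#\w+(?![\w;])")),
    ("rule 11: marked section",
     re.compile(r"<!\[")),
]


def find_violations(text):
    """Return a list of (line_number, message) pairs for suspicious lines."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for message, pattern in CHECKS:
            if pattern.search(line):
                found.append((lineno, message))
    return found


if __name__ == "__main__":
    sample = 'AT&amp;T\n&#65 is A\n<![CDATA[x]]>'
    for lineno, message in find_violations(sample):
        print(lineno, message)
```

An empty result does not prove that a document is nSGML; it only means that none of these three patterns was found.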


Example

<!DOCTYPE MEMO SYSTEM "memo.dtd">
<MEMO TYPE="CONFIDEN">
<TO>
Dr. Watson
</TO>
<FROM>
Sherlock Holmes
</FROM>
<BODY>
<P>
Please install PGP on your computer.
</P>
<P>
You'll see my public key below.
</P>
</BODY>
<SIGN TYPE="PGP">
</MEMO>




The sgmls format

Definition

This format is a subset of the output format of the well-known sgmls and nsgmls parsers developed by Jim Clark.

The subset is defined as follows; the description is extracted from the nsgmls manual.


sgmls format

The output is a series of lines. Lines can be arbitrarily
long. Each line consists of an initial command character
and one or more arguments. Arguments are separated by a
single space, but when a command takes a fixed number of
arguments the last argument can contain spaces. There is
no space between the command character and the first
argument. Arguments can contain the following escape
sequences.

\\     A \.

\n     A record end character.

\|     Internal SDATA entities are bracketed by these.

\nnn   The character whose code is nnn octal.

       A record start character will be represented by \012.
       Most applications will need to ignore \012 and translate
       \n into newline.

\#n;   The character whose number is n in decimal. n can
       have any number of digits. This is used for characters
       that are not representable by the encoding
       translation used for output (as specified by the
       NSGML_CODE environment variable). This will only
       occur with the multibyte version of nsgmls.
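
The escape sequences listed above can be decoded with a single regular expression. The Python sketch below is an illustration only: the function name decode_argument and its sdata_marker parameter are assumptions made for this example. It follows the advice given above (ignore \012, translate \n into newline) and assumes that octal escapes always have exactly three digits.

```python
import re

# One alternative per escape sequence: \\  \n  \|  \nnn (octal)  \#n; (decimal)
_ESCAPE = re.compile(r"\\(?:(\\)|(n)|(\|)|([0-7]{3})|#([0-9]+);)")


def decode_argument(arg, sdata_marker=""):
    # Decode the escape sequences of one argument (hypothetical helper).
    # sdata_marker is the text substituted for each SDATA bracket; the
    # default simply drops the brackets.
    def replace(match):
        backslash, record_end, sdata, octal, decimal = match.groups()
        if backslash:
            return "\\"
        if record_end:
            return "\n"                 # record end becomes a newline
        if sdata:
            return sdata_marker         # SDATA entity bracket
        if octal:
            code = int(octal, 8)
            # A record start character (octal 012) is ignored.
            return "" if code == 0o12 else chr(code)
        return chr(int(decimal))        # \#n; with n in decimal

    return _ESCAPE.sub(replace, arg)


if __name__ == "__main__":
    print(decode_argument(r"Dr. Watson\n\012Baker Street"))
```

Dropping the SDATA brackets is a simplification; an application that must distinguish SDATA entities from ordinary data can pass a visible marker instead.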

The possible command characters and arguments are as
follows:

(gi    The start of an element whose generic identifier is
       gi. Any attributes for this element will have been
       specified with A commands.

)gi    The end of an element whose generic identifier is
       gi.

-data  Data.

Aname val
       The next element to start has an attribute name
       with value val which takes one of the following
       forms:

       IMPLIED
              The value of the attribute is implied.

       CDATA data
              The attribute is character data. This is
              used for attributes whose declared value is
              CDATA.

       NOTATION nname
              The attribute is a notation name; nname will
              have been defined using a N command. This
              is used for attributes whose declared value
              is NOTATION.

       ENTITY name...
              The attribute is a list of general entity
              names. Each entity name will have been
              defined using an I, E or S command. This is
              used for attributes whose declared value is
              ENTITY or ENTITIES.

       TOKEN token...
              The attribute is a list of tokens. This is
              used for attributes whose declared value is
              anything else.

       ID token
              The attribute is an ID value. This will be
              output only if the -oid option is specified.
              Otherwise TOKEN will be used for ID values.
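
Because the command character is always the first character of a line, a reader for this subset can dispatch on it directly. The Python sketch below is an illustration, not MtSgmlQL code: the names parse_line and parse_attribute and the tuple layout they return are assumptions made for this example. Escape sequences in the arguments are left undecoded.

```python
def parse_attribute(rest):
    # Split the argument of an A command into (name, form, value).
    name, _, val = rest.partition(" ")
    form, _, payload = val.partition(" ")
    if form == "IMPLIED":
        value = None
    elif form in ("ENTITY", "TOKEN"):
        value = payload.split(" ")      # a list of names or tokens
    elif form in ("CDATA", "NOTATION", "ID"):
        value = payload                 # the last argument may contain spaces
    else:
        raise ValueError("unknown attribute value form: %r" % form)
    return name, form, value


def parse_line(line):
    # Classify one line of the subset described above (hypothetical helper).
    line = line.rstrip("\r\n")
    if not line:
        raise ValueError("empty line")
    command, rest = line[0], line[1:]
    if command == "(":
        return ("start", rest)
    if command == ")":
        return ("end", rest)
    if command == "-":
        return ("data", rest)
    if command == "A":
        return ("attribute",) + parse_attribute(rest)
    raise ValueError("command not in the subset: %r" % command)


if __name__ == "__main__":
    for line in ["ATYPE TOKEN CONFIDEN", "(MEMO", "-Dr. Watson", ")MEMO"]:
        print(parse_line(line))
```

Only the line terminator is stripped, since trailing spaces in a data argument are part of the data.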


Example

ATYPE TOKEN CONFIDEN
(MEMO
(TO
-Dr. Watson
)TO
(FROM
-Sherlock Holmes
)FROM
(BODY
(P
-Please install PGP on your computer.
)P
(P
-You'll see my public key below.
)P
)BODY
ATYPE TOKEN PGP
(SIGN
)SIGN
)MEMO





Copyright © Centre National de la Recherche Scientifique, 1997.