$Id: step1e.html 1.7 2000/02/29 12:20:11 murata Exp $
STEP 1 covers the basic features of RELAX, which allow easy migration from DTDs. The DTD-to-RELAX converter (dtd2relax) uses only these features.
To give an idea of what RELAX looks like, we rewrite a DTD as a RELAX module.
A DTD is shown below. The number attribute of title elements should be an integer, but a DTD cannot represent this constraint.
<!ELEMENT doc   (title, para*)>
<!ELEMENT para  (#PCDATA | em)*>
<!ELEMENT title (#PCDATA | em)*>
<!ELEMENT em    (#PCDATA)>
<!ATTLIST para
          role    NMTOKEN  #IMPLIED
>
<!ATTLIST title
          role    NMTOKEN  #IMPLIED
          number  CDATA    #REQUIRED
>
Next, we show the corresponding RELAX module, in which the number attribute is declared as an integer.
<module moduleVersion="1.2"
        relaxCoreVersion="1.0"
        targetNamespace=""
        xmlns="http://www.xml.gr.jp/2000/relaxCore">
  <interface>
    <export labels="doc"/>
  </interface>
  <elementRule pred="doc">
    <sequence>
      <ref label="title"/>
      <ref label="para" occurs="*"/>
    </sequence>
  </elementRule>
  <elementRule pred="para">
    <mixed>
      <ref label="em" occurs="*"/>
    </mixed>
  </elementRule>
  <elementRule pred="title">
    <mixed>
      <ref label="em" occurs="*"/>
    </mixed>
  </elementRule>
  <elementRule pred="em" type="string"/>
  <tag name="doc"/>
  <tag name="para">
    <attribute name="role" type="NMTOKEN"/>
  </tag>
  <tag name="title">
    <attribute name="role" type="NMTOKEN"/>
    <attribute name="number" required="true" type="integer"/>
  </tag>
  <tag name="em"/>
</module>
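To make the module concrete, here is a hypothetical instance document; all of its text and attribute values are made up for illustration. It should be valid against the module: the root is doc, title comes first and carries an integer number attribute, each role value is an NMTOKEN, and every em contains only character data.

```xml
<!-- Hypothetical instance document; all text and values are illustrative. -->
<doc>
  <title role="intro" number="1">An <em>easy</em> schema language</title>
  <para>RELAX is based on <em>hedge</em> models.</para>
  <para role="note">A para may also contain plain text only.</para>
</doc>
```

Changing number="1" to number="one" should make this document invalid against the RELAX module, while it would still conform to the DTD, which declares number as CDATA.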
The following sections explain the syntactic constructs that appear in this example.
module element
A RELAX grammar is a combination of modules. If a grammar involves only one namespace and is not very large, a single module can serve as the whole RELAX grammar. A module is represented by a module element.
<module moduleVersion="1.2" relaxCoreVersion="1.0" targetNamespace="" xmlns="http://www.xml.gr.jp/2000/relaxCore"> ... </module>
The moduleVersion attribute gives the version of this module. In this example, it is "1.2".
The relaxCoreVersion attribute gives the version of RELAX Core. At present, it is always "1.0".
The targetNamespace attribute specifies the namespace that this module describes. In this example, it is the empty string "", which means that the described elements belong to no namespace.
The namespace name for RELAX Core is "http://www.xml.gr.jp/2000/relaxCore".
interface element
A module element begins with an interface element. There is at most one interface element in a single module.
<module moduleVersion="1.2" relaxCoreVersion="1.0" targetNamespace="" xmlns="http://www.xml.gr.jp/2000/relaxCore"> <interface> ... </interface> ... </module>
export element
An interface element contains export elements.
<export labels="foo bar"/>
The labels attribute of an export element specifies the element types that may appear as the root element. More than one export element may appear in an interface element.
Each of the following examples allows either a foo element or a bar element as the root.
<interface> <export labels="foo"/> <export labels="bar"/> </interface>
<interface> <export labels="foo bar"/> </interface>
Element type declarations (<!ELEMENT ...>) of XML are represented by elementRule elements. The pred attribute of an elementRule specifies an element type name. More than one elementRule may follow the interface element.
<elementRule pred="element-type-name"> ...hedge model... </elementRule>
An elementRule element contains a hedge model. A hedge is a sequence of elements (together with their descendants) and character data. A hedge model is a constraint on permissible hedges.
A hedge model is either an element hedge model, a datatype reference, or a mixed hedge model.
Element hedge models are represented by the empty, none, ref, choice, and sequence elements together with the occurs attribute. An element hedge model specifies the permissible sequences of child elements, which may be separated by whitespace characters.
empty element
empty represents the empty sequence.
Consider the following elementRule:
<elementRule pred="foo"> <empty/> </elementRule>
This elementRule specifies that the content of a foo element is the empty sequence. A foo element can therefore be written either as a start tag immediately followed by an end tag, or as an empty-element tag.
<foo/>
<foo></foo>
Unlike EMPTY in XML, empty allows whitespace characters between the start and end tags.
<foo> </foo>
empty can also be used within sequence and choice, which are described below. The motivation behind this extension will become clear in STEP 2. If you need exactly the same behavior as EMPTY in XML, use the emptyString datatype (covered in STEP 3).
From now on, we assume that foo, foo1, and foo2 are declared by elementRule elements whose hedge models are empty.
ref element
ref references an element type. For example, <ref label="foo"/> references the element type foo.
Consider the following elementRule:
<elementRule pred="bar"> <ref label="foo"/> </elementRule>
This elementRule specifies that the content of a bar element is a single foo element.
For example, the following bar element is valid against this elementRule.
<bar><foo/></bar>
Whitespace may appear before and after the foo element.
<bar> <foo/> </bar>
ref can have the occurs attribute. Its permissible values are "*", "+", and "?", which mean "zero or more times", "one or more times", and "zero or one time", respectively.
An example with "?" as the value of the occurs attribute is shown below:
<elementRule pred="bar"> <ref label="foo" occurs="?"/> </elementRule>
This elementRule specifies that the content of a bar element is either a single foo element or empty.
<bar><foo/></bar>
<bar></bar>
Whitespace characters may appear before and after the foo element. Even when a bar element contains no foo element, it may contain whitespace characters.
<bar> <foo/> </bar>
<bar> </bar>
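The other occurs values work the same way. As a small sketch (the element type baz is hypothetical, and a matching tag element is omitted), the following rule requires one or more foo elements, so a baz with no foo is rejected even though whitespace is still allowed around each foo.

```xml
<!-- Hypothetical rule: baz contains one or more foo elements. -->
<elementRule pred="baz">
  <ref label="foo" occurs="+"/>
</elementRule>

<!-- Permitted:
       <baz><foo/></baz>
       <baz> <foo/> <foo/> <foo/> </baz>
     Not permitted (no foo element):
       <baz></baz>
       <baz> </baz>
-->
```

With occurs="*" instead of "+", the last two baz elements would also be permitted.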
choice element
choice represents a choice among the specified hedge models (like "|" in XML 1.0). The child elements of a choice element are element hedge models. choice can also have the occurs attribute.
An example of an elementRule containing choice is shown below:
<elementRule pred="bar">
  <choice occurs="+">
    <ref label="foo1"/>
    <ref label="foo2"/>
  </choice>
</elementRule>
This elementRule specifies that the content of a bar element is one or more elements, each of which is either a foo1 or a foo2 element.
<bar><foo2/></bar>
<bar> <foo2/> </bar>
<bar> <foo1/> <foo2/> <foo1/> </bar>
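Since empty may appear inside choice, an optional element can also be written without the occurs attribute. The following sketch should accept the same bar elements as the earlier rule that uses <ref label="foo" occurs="?"/>: either a single foo element or nothing, with optional whitespace.

```xml
<!-- bar contains either one foo element or nothing. -->
<elementRule pred="bar">
  <choice>
    <ref label="foo"/>
    <empty/>
  </choice>
</elementRule>
```

The occurs="?" form is shorter; this version only illustrates that empty combines with choice like any other element hedge model.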
sequence element
sequence represents a sequence of the specified hedge models (like "," in XML 1.0). The child elements of sequence are element hedge models. sequence can also have the occurs attribute.
An example of an elementRule containing sequence is shown below:
<elementRule pred="bar">
  <sequence occurs="?">
    <ref label="foo1"/>
    <ref label="foo2"/>
  </sequence>
</elementRule>
This elementRule specifies that the content of a bar element is either a foo1 element followed by a foo2 element, or empty.
<bar><foo1/><foo2/></bar>
<bar> <foo1/> <foo2/></bar>
<bar/>
<bar></bar>
<bar> </bar>
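Element hedge models nest, so occurs can also be attached to individual children of a sequence. As a sketch, the DTD content model (foo1, foo2*) could be written as follows.

```xml
<!-- Corresponds to the DTD declaration <!ELEMENT bar (foo1, foo2*)> -->
<elementRule pred="bar">
  <sequence>
    <ref label="foo1"/>
    <ref label="foo2" occurs="*"/>
  </sequence>
</elementRule>

<!-- Permitted:
       <bar><foo1/></bar>
       <bar> <foo1/> <foo2/> <foo2/> </bar>
     Not permitted:
       <bar><foo2/></bar>          (foo1 is missing)
       <bar><foo2/><foo1/></bar>   (wrong order)
-->
```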
none element
none is an element hedge model that matches nothing. none is unique to RELAX.
<elementRule pred="bar"> <none/> </elementRule>
This elementRule specifies that nothing, not even the empty sequence, is permitted as the content of bar elements. The motivation behind none will become clear in STEP 2.
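Because none matches nothing, no bar element can be valid against the rule above. For example, each of the following bar elements should be rejected, including the ones with empty content.

```xml
<!-- All of these are invalid when bar is declared with <none/>. -->
<bar/>
<bar></bar>
<bar> </bar>
<bar><foo/></bar>
```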
The type attribute of elementRule specifies a hedge model that references a datatype. The character data in the element is checked against the specified datatype. Permissible datatypes are the built-in datatypes of XML Schema Part 2 and the datatypes unique to RELAX. Datatypes are covered in detail in STEP 3.
An example of an elementRule with the type attribute is shown below:
<elementRule pred="bar" type="integer"/>
This elementRule specifies that the content of a bar element is a character string representing an integer.
<bar>10</bar>
Whitespace characters may not occur before or after the integer. For example, the following is not permitted.
<bar> 10 </bar>
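For reference, here is a sketch of a complete module built around this rule; the moduleVersion value "1.0" is an arbitrary choice. As explained at the end of this STEP, bar also needs a tag element even though it has no attributes.

```xml
<module moduleVersion="1.0"
        relaxCoreVersion="1.0"
        targetNamespace=""
        xmlns="http://www.xml.gr.jp/2000/relaxCore">
  <interface>
    <export labels="bar"/>
  </interface>
  <!-- The content of bar must be an integer. -->
  <elementRule pred="bar" type="integer"/>
  <!-- bar has no attributes, but a tag element is still required. -->
  <tag name="bar"/>
</module>
```

Against this module, <bar>10</bar> and <bar>-3</bar> should be valid, while <bar>ten</bar> and <bar>1.5</bar> should not.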
mixed significantly extends the mixed content models (#PCDATA|a|b|...|z)* of XML. A mixed element wraps an element hedge model. Recall that an element hedge model allows whitespace characters between elements. By wrapping it in mixed, arbitrary character data is allowed between elements.
As an example, consider the following elementRule:
<elementRule pred="bar"> <mixed> <ref label="foo"/> </mixed> </elementRule>
The element <foo/> matches the ref in the mixed element. Thus, the following example is permitted by this elementRule.
<bar>Murata<foo/>Makoto</bar>
CDATA sections and character references may also appear as character data. The following example uses a CDATA section.
<bar><![CDATA[Murata]]><foo/>Makoto</bar>
The XML mixed content model (#PCDATA | foo1 | foo2)* can be captured as follows:
<elementRule pred="bar">
  <mixed>
    <choice occurs="*">
      <ref label="foo1"/>
      <ref label="foo2"/>
    </choice>
  </mixed>
</elementRule>
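Because mixed can wrap any element hedge model, it can express constraints that XML mixed content, limited to the (#PCDATA|a|b|...|z)* form, cannot. As a sketch, the following rule allows arbitrary text anywhere in bar but still requires exactly one foo1 followed by exactly one foo2.

```xml
<elementRule pred="bar">
  <mixed>
    <sequence>
      <ref label="foo1"/>
      <ref label="foo2"/>
    </sequence>
  </mixed>
</elementRule>

<!-- Permitted:
       <bar>Murata<foo1/>and<foo2/>Makoto</bar>
       <bar><foo1/><foo2/></bar>
     Not permitted:
       <bar>Murata<foo2/><foo1/></bar>   (wrong order)
       <bar>Murata Makoto</bar>          (no child elements)
-->
```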
There are two ways to capture the content model (#PCDATA). One is to reference the datatype string with the type attribute. The other is to write an element hedge model that matches only the empty sequence and wrap it in mixed. Both are shown below:
<elementRule pred="bar" type="string"/>
<elementRule pred="bar"> <mixed> <empty/> </mixed> </elementRule>
Attribute-list declarations (<!ATTLIST ...>) of XML are captured by tag elements.
<tag name="element-type-name"> ...list of attribute declarations ... </tag>
tag can contain attribute elements as children.
<tag name="element-type-name"> <attribute ... /> <attribute ... /> </tag>
attribute declares an attribute.
An example of attribute is shown below:
<attribute name="age" required="true" type="integer"/>
The value of the name attribute is the name of the declared attribute. In this example, it is age.
If the value of the required attribute is true, the declared attribute is mandatory. If required is not specified, the attribute is optional. Since required="true" is specified in this example, the age attribute is mandatory.
The type attribute specifies a datatype name. If type is not specified, the datatype string (which allows any string) is assumed.
Consider an example of a tag element that contains only an attribute element.
<tag name="bar"> <attribute name="age" required="true" type="integer"/> </tag>
The following start tag is permitted by this tag element.
<bar age="39">
The following two start tags are not permitted. In the first, the age attribute is omitted. In the second, the value of age is not an integer.
<bar>
<bar age="bu huo"> <!-- "bu huo" is a Chinese expression for the age of forty. In Japanese, it is pronounced "FUWAKU". -->
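Several attribute declarations can be combined in one tag. In the following sketch, the nickname attribute is a hypothetical addition: it specifies neither required nor type, so it is optional and its value may be any string.

```xml
<tag name="bar">
  <attribute name="age" required="true" type="integer"/>
  <!-- Hypothetical attribute: optional, datatype string by default. -->
  <attribute name="nickname"/>
</tag>

<!-- Permitted:
       <bar age="39">
       <bar age="39" nickname="anything at all">
     Not permitted:
       <bar nickname="Makoto">   (age is missing)
-->
```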
In a DTD, you do not have to write an attribute-list declaration if an element type has no attributes. In RELAX, you must write an empty tag element even if there are no attributes. For example, if the element type bar has no attributes, you have to write a tag element as follows:
<tag name="bar"/>
Once you have finished reading this STEP, you can start using RELAX right away. If you do not need further features, you do not need to read the other STEPs. Enjoy and RELAX!