STEP 6: Default values, entities, and notations

$Id: step6e.html 1.2 2000/02/29 11:59:59 murata Exp $

text by MURATA Makoto

html by NAMBA Ryosuke


Among the features of DTD, we have not covered default values, entities, and notations. STEP 6 is concerned with them.

1. Reasons that RELAX does not handle them

RELAX does not handle default values, entities, and notations. They are intentionally omitted from RELAX so that we can continue to use existing XML processors.

Suppose that RELAX introduced constructs for these features. For example, assume that RELAX had a default attribute that provides the default value of an attribute. Existing XML processors do not examine RELAX modules when they parse XML documents, so they would never use this default. The same applies to entities and notations: even if RELAX had constructs for declaring them, existing XML processors would not use them.

If we wanted to introduce such features into RELAX, the only solution would be to create RELAX-specific XML parsers. Users who create and verify XML documents against RELAX grammars would certainly have to use such parsers. Furthermore, users who receive such XML documents would also have to switch to RELAX-specific XML parsers. In our opinion, this is not very realistic.

2. Using DTD and RELAX together

Does this mean that we cannot use default values, entities, and notations? No. We can use these features if we use DTD and RELAX together.

The following is an XML document containing a DTD.

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE person [
<!ATTLIST person
	bloodtype CDATA "A">
]>
<person/>

This document is verified against the following RELAX module:

<module
      moduleVersion="1.0"
      relaxCoreVersion="1.0"
      xmlns="http://www.xml.gr.jp/2000/relaxCore">

  <interface>
    <export labels="person"/>
  </interface>

  <elementRule pred="person">
    <empty/>
  </elementRule>

  <tag name="person">
    <attribute name="bloodtype">
      <enumeration value="O"/>
      <enumeration value="A"/>
      <enumeration value="B"/>
      <enumeration value="AB"/>
    </attribute>
  </tag>
</module>

In this example, the DTD specifies the default value "A" for the bloodtype attribute. XML processors do use this default, so we can verify this XML document against the RELAX module without any problems. Verification is done as if "A" were specified as the attribute value.
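
The behaviour described above can be checked with an ordinary non-validating XML processor. The following is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree (built on the expat parser) plays the role of the XML processor; the variable names are only illustrative. The parser supplies the default from the internal DTD subset, so any verifier working on its output would see bloodtype="A".

```python
import xml.etree.ElementTree as ET

# The document from this section: the internal DTD subset declares
# the default value "A" for the bloodtype attribute of person.
DOCUMENT = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE person [
<!ATTLIST person
	bloodtype CDATA "A">
]>
<person/>
"""

person = ET.fromstring(DOCUMENT)

# The attribute is absent from the start tag, but the parser reports it
# with the default value taken from the DTD.
print(person.attrib)
```

No RELAX-specific code is involved here: the default is applied by the parser before any verification takes place.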

Similarly, entities and notations can be declared in a DTD. First, we show an example of parsed entities.

<?xml version="1.0"?>
<!DOCTYPE doc [
<!ENTITY foo "This is a pen">
]>
<doc>
  <para>&foo;</para>
</doc>

This document is legitimate against the following RELAX module:

<module
      version="1.0"
      relaxVersion="1.0"
      xmlns="http://www.xml.gr.jp/relax">

  <interface>
    <export labels="doc"/>
  </interface>

  <elementRule pred="doc">
    <ref label="para" occurs="*"/>
  </elementRule>

  <elementRule pred="para" type="string"/>

  <tag name="doc"/>

  <tag name="para"/>

</module>
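
The reason the module only needs to describe para as a string is that the XML processor expands the entity reference before anything else looks at the content. A minimal sketch in Python, again assuming the standard library's xml.etree.ElementTree as the XML processor (the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

# The parsed-entity example: foo is declared in the internal DTD subset.
DOCUMENT = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
<!ENTITY foo "This is a pen">
]>
<doc>
  <para>&foo;</para>
</doc>
"""

doc = ET.fromstring(DOCUMENT)
para = doc.find("para")

# The reference &foo; has already been replaced by its replacement text,
# so para contains plain character data.
print(para.text)
```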

Next, we show an example of unparsed entities and notations.

<?xml version="1.0"?>
<!DOCTYPE doc [

<!NOTATION eps          PUBLIC
"-//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Adobe Systems 
Encapsulated Postscript//EN">

<!ENTITY logo_eps SYSTEM "logo.eps" NDATA eps>

<!ELEMENT doc EMPTY>

<!ATTLIST doc logo ENTITY #IMPLIED>
]>
<doc logo="logo_eps"/>

This document is legitimate against the following RELAX module.

<module
      version="1.0"
      relaxVersion="1.0"
      xmlns="http://www.xml.gr.jp/relax">

  <interface>
    <export labels="doc"/>
  </interface>

  <elementRule pred="doc" type="emptyString"/>

  <tag name="doc">
    <attribute name="logo" type="ENTITY"/>
  </tag>

</module>
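
Unparsed entities and notations are likewise reported by the XML processor itself, independently of RELAX. The following is a sketch in Python, assuming the standard library's xml.sax package with its expat-based reader; DeclarationCollector is a hypothetical helper written for this illustration, and the public identifier is written on one line for brevity.

```python
import io
import xml.sax
from xml.sax.handler import DTDHandler

# The unparsed-entity example from this section.
DOCUMENT = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
<!NOTATION eps PUBLIC "-//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Adobe Systems Encapsulated Postscript//EN">
<!ENTITY logo_eps SYSTEM "logo.eps" NDATA eps>
<!ELEMENT doc EMPTY>
<!ATTLIST doc logo ENTITY #IMPLIED>
]>
<doc logo="logo_eps"/>
"""


class DeclarationCollector(DTDHandler):
    """Hypothetical helper: records the declarations the parser reports."""

    def __init__(self):
        self.notations = {}
        self.unparsed_entities = {}

    def notationDecl(self, name, publicId, systemId):
        # Called once for each <!NOTATION ...> declaration.
        self.notations[name] = publicId

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        # Called once for each <!ENTITY ... NDATA ...> declaration.
        self.unparsed_entities[name] = (systemId, ndata)


collector = DeclarationCollector()
parser = xml.sax.make_parser()
parser.setDTDHandler(collector)
parser.parse(io.BytesIO(DOCUMENT))

print(collector.notations)
print(collector.unparsed_entities)
```

The parser only reports the declarations. It does not open logo.eps, and interpreting the notation is left to the application.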

3. Better leave them out

As we have seen in the previous section, we can use default values, entities, and notations by using DTD and RELAX together. Their use is not recommended, however.

Default values can be mimicked by application programs. We only have to hardcode "default values" in application programs and use them when attributes are absent. We can also write XSLT scripts so as to embed "default values" when attributes are absent.
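
As an illustration of the first approach, the following is a minimal sketch in Python of an application that hardcodes its own "default value". The constant DEFAULT_BLOODTYPE and the function bloodtype_of are hypothetical names introduced only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical application-level default; it lives in the program,
# not in a DTD or a RELAX module.
DEFAULT_BLOODTYPE = "A"


def bloodtype_of(person):
    """Return the bloodtype attribute, or the hardcoded default if it is absent."""
    return person.get("bloodtype", DEFAULT_BLOODTYPE)


without_attribute = ET.fromstring("<person/>")
with_attribute = ET.fromstring('<person bloodtype="O"/>')

print(bloodtype_of(without_attribute))
print(bloodtype_of(with_attribute))
```

Because neither document carries a DTD, the result does not depend on how the XML processor treats DTD declarations.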

Use links (especially, XLink) rather than external parsed entities or external unparsed entities. Links are much more appropriate for the WWW.

Internal parsed entities can be used without any problems, however. Some text data such as "<" can be best represented by internal parsed entities (e.g., &lt;).

Unfortunately, default values, entities, and notations in DTD are not always processed as casual users expect. This is because some XML processors do not fetch external DTD subsets or external parameter entities. However, all examples in this STEP use internal DTD subsets and thus are free from such unexpected results.
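
The pitfall can be reproduced with a processor that does not read external DTD subsets. The sketch below uses Python's standard xml.etree.ElementTree, which behaves this way; the file name person.dtd is hypothetical and is never opened, so whatever defaults it might declare are not applied.

```python
import xml.etree.ElementTree as ET

# Same person element as before, but the DTD is now an external subset.
# person.dtd is a hypothetical file; suppose it declares the default
# bloodtype "A". This parser never fetches it.
DOCUMENT = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE person SYSTEM "person.dtd">
<person/>
"""

person = ET.fromstring(DOCUMENT)

# No default is supplied, because the external subset was not read.
print(person.attrib)
```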

4. Summary

STEPs 1 through 6 provide more than enough features for migration from DTD to RELAX. As long as we use only these features, we can convert RELAX to DTD and vice versa without loss of information, except for datatypes and facets. In the future, conversion between RELAX and XML Schema should also be possible. Enjoy and RELAX!


mura034@attglobal.net
