XML Audit, conference paper, XML / SGML differences, XML / SGML comparison.
This paper was originally presented at XML'98 in Chicago. The copy on the CD-ROM proceedings was apparently garbled during the composition process.
Charlie Halpern-Hamu, PhD, MBA
Structured-Text Consultant
This paper describes, at a technical level, how to assess the XML-readiness of your SGML data as a first step in moving it towards XML.
This paper suggests an 'XML audit': a technical review of current markup practice with an eye towards simplification. The goal of an XML audit is to understand which portions of your current SGML application are not XML. The next step might be to start deemphasizing your use of those features.
Moving all the way to XML allows you to use XML tools that do not support full SGML. Even getting part way there means you can use a wider variety of SGML tools. In either case you will be simplifying work for both editorial and programming staff. Simpler is better.
This paper is derived from James Clark's 'Comparison of SGML and XML', a World Wide Web Consortium Note (
www.w3.org/TR/NOTE-sgml-xml-971215
by
jjc@jclark.com
).
This talk describes, at a technical level, how to assess the XML-readiness of your SGML data as a first step in moving it towards XML.
This talk introduces the concept of an 'XML audit': a review of current markup practice with an eye towards simplification. An XML audit lets you know where you stand. Your next step might be to de-emphasize those SGML features that are not XML.
Moving all the way to XML allows you to use XML tools that do not support full SGML. Even getting part way there means you can use a wider variety of SGML tools. In either case you will be simplifying work for both editorial and programming staff. This simplification may result in reduced training requirements, less confusion and fewer errors.
But even if you choose to make no immediate change to your markup practices, an XML audit will give you valuable information that will help inform future decisions. You may discover that, give or take an angle-bracket or two, you are already doing XML.
All discussion assumes the reference concrete syntax. So I will say 'left angle-bracket' or '
<
', but not 'start-tag open delimiter' or '
STAGO
'. Similarly, I say 'white space' instead of 'separator'.
Where SGML and XML vary slightly in their nomenclature, I tend towards the SGML, since that's our starting point. Or I fall back towards spelling things out using the reference concrete syntax as described above.
This paper is derived from James Clark's 'Comparison of SGML and XML', a World Wide Web Consortium Note (
www.w3.org/TR/NOTE-sgml-xml-971215
by
jjc@jclark.com
).
Clark's Note discusses XML options not available in SGML. This paper ignores these, only discussing those SGML options that are not available in XML.
In this paper, and to an even greater degree in the corresponding presentation, I've tried to give more prominence to the more commonly-used SGML features that are missing in XML.
I'd like to thank Larry Sulky for his copy edit. The only suggestion I didn't take was to change 'a journey of a thousand miles' to 'a journey of sixteen-hundred kilometres'.
The key idea in conducting an XML audit is resisting the temptation to do more than simply review where you stand.
You need a selection of technical people: someone who knows the DTD, someone who knows editorial tagging practices, someone who knows about the programs that operate on the data as it flows in, through, and out of the organization.
An XML audit is for figuring out where you are, not where you are going. Consequently, you don't need managerial or technical decision-makers at the meeting, although they will want to understand and act on the final assessment.
Make printouts of this paper, your SGML declaration(s), DTD(s), some sample data, and programs that act on this data. Distribute these items in advance to your attendees. Each attendee should review these items, especially those about which she is the designated expert. So the data architect should focus on the DTDs, the programmer the programs, etc. Ask attendees to note those aspects of your current SGML that are not XML, perhaps in the margins of this paper.
Designate one person as the note-taker. As with the individual preparation step, it may be convenient to use a copy of this paper as a note-taking template. Move systematically through the headings in this paper and determine if they apply to your application.
Postpone discussions about how to recast SGML usage as simpler XML usage. Focus on simply listing those aspects of
your SGML usage that go beyond XML. When you do find non-XML usages, include details of where. Do you have one use
of the '
&
' connector or a dozen? Which elements? Try not to worry about why you use this aspect of SGML or how you might
avoid it.
The result of an XML audit should be an assessment report. Transcribe your notes into a complete list of the non-XML things you do. The next step will be to decide if it makes sense to change all or some of your markup practices.
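Some of the counting can be automated before the meeting. The following is a minimal sketch, not a real DTD parser: it assumes the DTD text is already in a string, that no comment or literal inside an element declaration contains a right angle-bracket, and the function name is my own invention for illustration. It lists the element declarations whose content model uses the '&' connector.

```python
import re

# Matches a whole element declaration: <!ELEMENT name-or-name-group ...>
# Assumption: no '>' appears inside the declaration (for example in a comment).
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\([^)]*\)|\S+)\s+([^>]*)>", re.IGNORECASE)

def elements_using_and_connector(dtd_text):
    """Return the names (or name groups) declared with a '&' in the content model."""
    hits = []
    for match in ELEMENT_DECL.finditer(dtd_text):
        name, model = match.group(1), match.group(2)
        if "&" in model:
            hits.append(name)
    return hits

sample_dtd = """
<!ELEMENT contact - - (phone & fax & email)>
<!ELEMENT phone   - - (#PCDATA)>
<!ELEMENT address - - (street, city)>
"""
print(elements_using_and_connector(sample_dtd))  # ['contact']
```

The length of the returned list answers 'one use or a dozen?', and the names answer 'which elements?'.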
Those aspects of SGML that are not available in XML are listed in the sections that follow. The following organization has been used:
The Big Three
Out of Band
Miscellaneous
You can not declare several element types with the same declaration:
<!ELEMENT (isnt-xml | isnt-xml2) (#PCDATA | em)*>
This habit makes finding element type declarations in a DTD more difficult. A better practice might be to use a parameter entity for the common content model:
<!ENTITY % inline '(#PCDATA | em)'> <!ELEMENT okay-xml %inline;> <!ELEMENT okay-xml2 %inline;>
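If a DTD has many grouped declarations, splitting them by hand is tedious. Here is a rough sketch of a mechanical split, assuming the name group uses only the or-bar, that no declaration contains a stray right angle-bracket, and that the DTD is available as a string. The function name is hypothetical.

```python
import re

# Matches an element declaration whose name is a parenthesized name group.
GROUPED_ELEMENT = re.compile(r"<!ELEMENT\s+\(([^)]*)\)\s+([^>]*)>")

def split_grouped_elements(dtd_text):
    """Rewrite each grouped element declaration as one declaration per element type."""
    def expand(match):
        names = [name.strip() for name in match.group(1).split("|")]
        model = match.group(2).strip()
        return "\n".join(f"<!ELEMENT {name} {model}>" for name in names)
    return GROUPED_ELEMENT.sub(expand, dtd_text)

print(split_grouped_elements("<!ELEMENT (isnt-xml | isnt-xml2) (#PCDATA | em)*>"))
```

This repeats the content model in each declaration; introducing a parameter entity for the shared model, as suggested above, is left as a manual step.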
You can not specify minimization in XML element declarations:
<!ELEMENT isnt-xml - - (#PCDATA | em)*>
If you are not using
OMITTAG
, you can leave this out of your SGML:
<!ELEMENT okay-xml (#PCDATA | em)*>
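Removing the minimization flags is a mechanical change. The sketch below is one way to do it with a regular expression, assuming the flags are written as two separate tokens ('-' or 'O') right after the element name or name group. It is only safe if none of your documents actually omit tags; the function name is illustrative.

```python
import re

# Element name (or name group), then the two omitted-tag minimization tokens.
MINIMIZATION = re.compile(r"(<!ELEMENT\s+(?:\([^)]*\)|\S+))\s+[-Oo]\s+[-Oo](?=\s)")

def strip_minimization(dtd_text):
    """Drop the start-tag and end-tag minimization flags from element declarations."""
    return MINIMIZATION.sub(r"\1", dtd_text)

print(strip_minimization("<!ELEMENT isnt-xml - - (#PCDATA | em)*>"))
# <!ELEMENT isnt-xml (#PCDATA | em)*>
```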
You can not declare content to be
RCDATA
:
<!ELEMENT isnt-xml RCDATA>
You can not declare content to be
CDATA
:
<!ELEMENT isnt-xml CDATA>
You can not use the '
&
' connector:
<!ELEMENT isnt-xml (phone & fax & email)>
If the random order is important to you, you can recast short lists by listing all the possible orders. Avoid SGML-ambiguous content models by factoring out commonalities:
<!ELEMENT okay-xml ( (phone, ((fax, email) | (email, fax))) | (fax, ((phone, email) | (email, phone))) | (email, ((phone, fax) | (fax, phone))) )>
If you can enforce an order, do so:
<!ELEMENT okay-xml (phone, fax, email)>
If you can't enforce an order, but your list is too long to recast without the
&
connector, you may need to loosen your content model:
<!ELEMENT okay-xml (phone | fax | email)+>
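Writing out every order by hand is error-prone, mostly because of the parentheses. The following sketch generates the factored form for a short list of element names. The function name is my own, and the approach is only practical for short lists: the number of orders grows as the factorial of the list length.

```python
def all_orders(names):
    """Return a content model accepting each name exactly once, in any order.

    Each alternative starts with a different name, so common prefixes are
    factored out and the model stays unambiguous.
    """
    if len(names) == 1:
        return names[0]
    alternatives = []
    for i, first in enumerate(names):
        rest = names[:i] + names[i + 1:]
        alternatives.append(f"({first}, {all_orders(rest)})")
    return "(" + " | ".join(alternatives) + ")"

model = all_orders(["phone", "fax", "email"])
print(f"<!ELEMENT okay-xml {model}>")
```

For three names this yields six orders; for six names it would be 720, which is the point at which loosening the content model starts to look attractive.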
You can not have deprecated mixed content in XML:
<!ELEMENT isnt-xml (em | #PCDATA)>
Indeed, the rules are stricter even than just avoiding deprecated mixed content:
<!ELEMENT isnt-xml (em | #PCDATA)*>
In a mixed content model, the
#PCDATA
must be listed first, the only connector permitted is '
|
', the only occurrence indicator permitted is '
*
', and the '
*
' is required whenever there is a '
|
' (it is optional when
#PCDATA
stands alone):
<!ELEMENT okay-xml (#PCDATA | em)*> <!ELEMENT okay-xml2 (#PCDATA)> <!ELEMENT okay-xml3 (#PCDATA)*>
XML does not allow inclusions on content models:
<!ENTITY % text '(#PCDATA)'> <!ELEMENT isnt-xml (heading | para)* +(warning)> <!ELEMENT heading %text;> <!ELEMENT para %text;> <!ELEMENT warning %text;>
Element types declared using inclusions are often far looser than they need to be. Usually they can be recast using other mechanisms:
<!ENTITY % text '(#PCDATA | warning)*'> <!ELEMENT ok-xml (heading | para | warning)*> <!ELEMENT heading %text;> <!ELEMENT para %text;> <!ELEMENT warning %text;>
XML does not allow exclusions on content models:
<!ENTITY % text '(#PCDATA | em | etc | isnt-xml)*'> <!ELEMENT document (heading | para | isnt-xml)*> <!ELEMENT heading %text;> <!ELEMENT para %text;> <!ELEMENT em %text;> <!ELEMENT etc %text;> <!ELEMENT isnt-xml %text; -(isnt-xml)>
Sometimes exclusions can be recast using other mechanisms:
<!ENTITY % text '(#PCDATA | em | etc | okay-xml)*'> <!ELEMENT document (heading | para | okay-xml)*> <!ELEMENT heading %text;> <!ELEMENT para %text;> <!ELEMENT em %text;> <!ELEMENT etc %text;> <!ELEMENT okay-xml (#PCDATA | em | etc)*>
Other times, the easiest way to move to XML is to simply remove the exclusion, leaving the content model somewhat looser than it was.
XML uses a special syntax for empty elements:
<toc/> <toc depth='2'/>
XML also allows empty elements to have end tags:
<toc></toc> <toc depth='2'></toc>
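To an XML parser the two forms are the same element. As a quick illustration, here is a small check using Python's standard xml.etree.ElementTree module (my choice of tool, not something this paper depends on); the summary helper is hypothetical.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<toc depth='2'/>")
long_form = ET.fromstring("<toc depth='2'></toc>")

def summary(element):
    """Collect what the parser reports: name, attributes, text, child count."""
    return (element.tag, dict(element.attrib), element.text, len(element))

print(summary(short_form) == summary(long_form))  # True
```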
You should note which elements you declare as empty:
<!ELEMENT toc EMPTY> <!ATTLIST toc depth CDATA #IMPLIED>
Here's one way to make the transition. This element declaration is looser than intended, but is both SGML and XML:
<!ELEMENT toc (#PCDATA)> <!--Should be EMPTY.-->
In both SGML and XML the declaration above allows the markup below:
<toc></toc>
If you have a very small number of such elements, you might consider if they could be recast as container elements or perhaps as attributes on other elements. Those DTDs that do not feature empty elements avoid a major area of incompatibility between XML and SGML as it is usually used.
Here's one way to change your SGML declaration so that it allows XML-style markup for empty elements:
DELIM GENERAL SGMLREF NET '/>'
You can only declare attributes for one element type at a time:
<!ATTLIST (isnt-xml | isnt-xml2) attrib CDATA #IMPLIED>
XML will require that this be split into one
ATTLIST
declaration per element type:
<!ATTLIST okay-xml attrib CDATA #IMPLIED> <!ATTLIST okay-xml2 attrib CDATA #IMPLIED>
If removing the redundancy is important, this can be done using a parameter entity:
<!ENTITY % attribute 'attrib CDATA #IMPLIED'> <!ATTLIST okay-xml %attribute;> <!ATTLIST okay-xml2 %attribute;>
XML does not include some declared values for attributes that can be used in SGML. Substituting other declared values may have little or no negative effect on your SGML environment while moving you one step closer to XML.
The following declared values are not allowed:
<!ATTLIST isnt-xml attrib NAME #IMPLIED> <!ATTLIST isnt-xml2 attrib NAMES #IMPLIED> <!ATTLIST isnt-xml3 attrib NUMBER #IMPLIED> <!ATTLIST isnt-xml4 attrib NUMBERS #IMPLIED> <!ATTLIST isnt-xml5 attrib NUTOKEN #IMPLIED> <!ATTLIST isnt-xml6 attrib NUTOKENS #IMPLIED> <!ATTLIST isnt-xml7 attrib NOTATION (jpeg | tiff) #IMPLIED>
The following are allowed:
<!ATTLIST okay-xml attrib CDATA #IMPLIED> <!ATTLIST okay-xml attrib ENTITY #IMPLIED> <!ATTLIST okay-xml attrib ENTITIES #IMPLIED> <!ATTLIST okay-xml attrib ID #IMPLIED> <!ATTLIST okay-xml attrib IDREF #IMPLIED> <!ATTLIST okay-xml attrib IDREFS #IMPLIED> <!ATTLIST okay-xml attrib NMTOKEN #IMPLIED> <!ATTLIST okay-xml attrib NMTOKENS #IMPLIED> <!ATTLIST okay-xml attrib (this | that) #IMPLIED>
When you enumerate a list of options using a name token group, you must use the or-bar between them (SGML allows you to use the or-bar or comma interchangeably):
<!ATTLIST isnt-xml attrib (red, green, blue) #IMPLIED> <!ATTLIST okay-xml attrib (red | green | blue) #IMPLIED>
These two default value declarations are not allowed in XML:
<!ATTLIST isnt-xml attrib CDATA #CURRENT> <!ATTLIST isnt-xml2 attrib CDATA #CONREF>
These four default value declarations are allowed:
<!ATTLIST okay-xml attrib CDATA #FIXED "only value"> <!ATTLIST okay-xml2 attrib CDATA "default value"> <!ATTLIST okay-xml3 attrib CDATA #REQUIRED> <!ATTLIST okay-xml4 attrib CDATA #IMPLIED>
Default values must be enclosed in quote marks:
<!ATTLIST isnt-xml attrib (this | that) this> <!ATTLIST okay-xml attrib (this | that) "this"> <!ATTLIST okay-xml2 attrib (this | that) 'this'>
You must use an attribute value literal, not an attribute value, in an attribute value specification. In other words, you must use quote marks when specifying an attribute value:
<isnt-xml attrib=this>...</isnt-xml> <okay-xml attrib="this">...</okay-xml> <okay-xml2 attrib='this'>...</okay-xml2>
You must always spell out the attribute name; you can't imply it by using a name value:
<isnt-xml "red">...</isnt-xml> <okay-xml color="red">...</okay-xml>
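Any XML parser will reject both abbreviations, which gives you a cheap way to test sample data. The sketch below uses Python's standard xml.etree.ElementTree module as the parser (an assumption on my part; any conforming XML parser would do), and the helper name is illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if the markup parses as a well-formed XML document."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

print(is_well_formed('<okay-xml attrib="this">...</okay-xml>'))  # True
print(is_well_formed("<isnt-xml attrib=this>...</isnt-xml>"))    # False
print(is_well_formed('<isnt-xml "red">...</isnt-xml>'))          # False
```

Note that this only tests well-formedness. It says nothing about whether the document is valid against your DTD.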
You can't use data attributes:
<!NOTATION mpeg SYSTEM "mpgview.exe"> <!ATTLIST #NOTATION mpeg isnt-xml (v2 | v3) #REQUIRED> <!ENTITY movie-a SYSTEM "movie-a.mpg" NDATA mpeg [isnt-xml="v2"]> <!ENTITY movie-b SYSTEM "movie-b.mpg" NDATA mpeg [isnt-xml="v3"]>
In some cases, the way to make this XML might be to expand your list of notations:
<!NOTATION mpeg2 SYSTEM "mpgview2.exe"> <!NOTATION mpeg3 SYSTEM "mpgview3.exe"> <!ENTITY movie-a SYSTEM "movie-a.mpg" NDATA mpeg2> <!ENTITY movie-b SYSTEM "movie-b.mpg" NDATA mpeg3>
XML places various restrictions on entity declarations and entity references.
You can't use data text internal entities (
CDATA
,
SDATA
or
PI
):
<!ENTITY isnt-xml CDATA "text"> <!ENTITY isnt-xml2 SDATA "[adjust me]"> <!ENTITY isnt-xml3 PI "BRS ..YEAR">
You can't use bracketed text internal entities (
STARTTAG
,
ENDTAG
,
MS
and
MD
):
<!ENTITY isnt-xml4 STARTTAG "gi"> <!ENTITY isnt-xml5 ENDTAG "gi"> <!ENTITY isnt-xml6 MS "CDATA[text"> <!ENTITY isnt-xml7 MD "--comment--">
Only the simplest form is allowed for internal entities:
<!ENTITY okay-xml "text"> <!ENTITY okay-xml2 "[adjust me]"> <!ENTITY okay-xml3 "<?BRS ..YEAR?>"> <!ENTITY still-isnt-xml4 "<gi>"> <!ENTITY still-isnt-xml5 "</gi>"> <!ENTITY okay-xml4-5 "<gi></gi>"> <!ENTITY okay-xml6 "<![CDATA[text]]>"> <!ENTITY okay-xml7 "<!--comment-->">
Examples 4 and 5 are explained under 'Synchronicity', below.
You can't use
SUBDOC
,
CDATA
or
SDATA
external entities:
<!ENTITY isnt-xml SYSTEM "url" SUBDOC> <!ENTITY isnt-xml2 SYSTEM "url" CDATA mpeg> <!ENTITY isnt-xml3 SYSTEM "url" SDATA mpeg>
External entities can have no entity type specified, or have
NDATA
specified:
<!ENTITY okay-xml SYSTEM "url"> <!ENTITY okay-xml2 SYSTEM "url" NDATA mpeg>
The
FORMAL
feature allows you to use what are called 'formal public identifiers' to name entities such as portions of DTDs
and portions of documents. XML allows public identifiers, but requires that they be followed by a system
identifier to use in case the public identifier can not be resolved. External entities are identified in
ENTITY
declarations:
<!ENTITY isnt-xml PUBLIC "-//Example//Entity Example//EN"> <!ENTITY okay-xml PUBLIC "-//Example//Entity Example//EN" "../examples/example.ent">
External entities are also identified in
DOCTYPE
declarations:
<!DOCTYPE isnt-xml PUBLIC "-//Example//DTD Example//EN"> <!DOCTYPE okay-xml PUBLIC "-//Example//DTD Example//EN" "http://www.example.org/example.dtd">
The exception to this rule is the NOTATION declaration, which does not require a system identifier:
<!NOTATION okay-xml PUBLIC "ISO/IEC 10918:1993//NOTATION Digital Compression and Coding of Continuous-tone Still Images (JPEG)//EN"> <!NOTATION okay-xml2 PUBLIC "ISO/IEC 10918:1993//NOTATION Digital Compression and Coding of Continuous-tone Still Images (JPEG)//EN" "jpegview.exe">
When you use
SYSTEM
identifiers for external entities, these identifiers must be URLs:
<!DOCTYPE okay-xml SYSTEM "example.dtd"> <!NOTATION okay-xml SYSTEM "http://www.example.org/example.not">
In SGML, you can omit the system identifier after the
SYSTEM
keyword:
<!ENTITY isnt-xml SYSTEM>
In XML, you must always include it:
<!ENTITY okay-xml SYSTEM "example.ent">
You can not declare a default entity:
<!ENTITY #DEFAULT "[isnt-xml]">
You can't leave the final semicolon off entity references, as SGML allows you to do in certain contexts:
<isnt-xml>R&eacute;sum&eacute</isnt-xml> <okay-xml>R&eacute;sum&eacute;</okay-xml>
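References with a missing semicolon are easy to overlook by eye. Here is a rough sketch for finding them, assuming ASCII entity names and plain text input. It does not skip comments or CDATA marked sections, so treat its output as a list of places to look rather than a verdict. The names are my own.

```python
import re

# An ampersand and a name, not followed by a further name character or a semicolon.
UNTERMINATED_REF = re.compile(r"&[A-Za-z][A-Za-z0-9.\-]*(?![A-Za-z0-9.\-;])")

def unterminated_entity_refs(text):
    """Return entity references that are not closed by a semicolon."""
    return UNTERMINATED_REF.findall(text)

sample = "<p>R&eacute;sum&eacute</p><p>R&amp;D</p>"
print(unterminated_entity_refs(sample))  # ['&eacute']
```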
SGML's deprecated obfuscatory entity references are disallowed in XML. Elements and marked sections need to start and end in the same entity.
Generally, everything needs to be balanced inside of each entity. This is important because it allows you to choose not to expand entities in certain contexts while still maintaining a balanced structure.
XML does not allow references to external entities in attribute literals:
<!ENTITY external SYSTEM "file.txt"> <!ENTITY internal "text"> ... <isnt-xml attrib="&external;"> <okay-xml attrib="&internal;">
You can refer to external data entities in content; but non-validating parsers are not required to include that entity. They may merely choose to note that they saw the reference and go on.
In a separate DTD file (the 'external subset'), parameter entities are allowed to appear inside of markup declarations. But in the internal subset of the DTD in an XML document, they can only appear where a whole markup declaration would be allowed:
<!DOCTYPE document SYSTEM "document.dtd" [ <!ENTITY % isnt-xml "p"> <!ELEMENT %isnt-xml; (#PCDATA)> <!ENTITY % okay-xml SYSTEM "fragment.dtd"> %okay-xml; ]>
XML restricts the variation in syntax and location of comments that SGML allows.
A typical SGML comment looks like this:
<!--Okay XML.-->
The '
<!
' and '
>
' are called the comment declaration, and the '
--...--
' is the comment proper.
You can't slip comments into other declarations. So this is not allowed:
<!ELEMENT p (#PCDATA | em)* --Isn't XML.-->
You must have exactly one comment inside of each comment declaration. You are not allowed zero:
<isnt-xml><!></isnt-xml> <okay-xml><!----></okay-xml>
And you are not allowed more than one:
<!--Isn't XML.-- --Isn't XML.--> <!--Okay XML.- - - -Okay XML [FILE GARBAGED HERE, SORRY]
[FILE GARBAGED HERE, SORRY] n empty status keyword specification. XML does not allow this:
<![[isnt-xml]]>
You can't use
TEMP
in a status keyword specification:
<![ TEMP [isnt-xml]]>
You can't use
RCDATA
marked sections:
<![ RCDATA [isnt-xml]]>
You can't use
INCLUDE
or
IGNORE
marked sections in the document instance, but only in the DTD (and not in the internal subset of the DTD):
<!DOCTYPE document SYSTEM "document.dtd" [ <![ IGNORE [isnt-xml]]> ]> <document><![ IGNORE [isnt-xml]]></document>
You can't use more than one status keyword in a single marked section:
<![ INCLUDE CDATA [isnt-xml]]>
You can't use parameter entities to specify status keywords.
<![ %maybe; [isnt-xml]]>
You aren't allowed any white space around the word '
CDATA
' in a
CDATA
marked section start:
<![CDATA [isnt-xml]]> <![ CDATA[isnt-xml]]> <![ CDATA [isnt-xml]]> <![CDATA[okay-xml]]>
XML uses a special syntax for processing instructions. You can imitate this XML syntax by using a similar convention for your SGML processing instructions. Processing instructions are closed in SGML with a right angle-bracket. In XML, they are closed by a question-mark right angle-bracket sequence:
<?isnt-xml This is a processing instruction.> <?okay-xml This is a processing instruction.?>
In XML, the
PIC
(processing instruction close) delimiter is set to '
?>
' instead of the usual '
>
'. If you make this change to your SGML declaration, then the first processing instruction above will not parse and
the second will parse just as in XML. If you do not make this change, both will parse, but the second will contain
the question mark as part of the content of the processing instruction, rather than as the ending delimiter.
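If you decide to adopt the XML convention, existing processing instructions can be converted mechanically. The following is a minimal sketch, assuming every processing instruction currently ends at the first right angle-bracket (as in the reference concrete syntax) and that none already ends in a question mark for some other reason. The function name is hypothetical.

```python
import re

# A processing instruction in the reference concrete syntax: '<?' up to the first '>'.
SGML_PI = re.compile(r"<\?(.*?)>", re.DOTALL)

def close_pis_xml_style(text):
    """Append '?' before the closing '>' of each processing instruction that lacks it."""
    def fix(match):
        body = match.group(1)
        if body.endswith("?"):
            return match.group(0)  # already closed XML-style; leave unchanged
        return f"<?{body}?>"
    return SGML_PI.sub(fix, text)

print(close_pis_xml_style("<?isnt-xml This is a processing instruction.>"))
# <?isnt-xml This is a processing instruction.?>
```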
It is good practice to categorize your SGML processing instructions by always starting them with a name that says to which processor they are directed. In XML, this practice is a requirement. This name is called the PI 'target':
<??> <!--This isn't XML because it has no target.--> <?okay-xml?> <?okay-xml2 The target is 'okay-xml2'.?> <?okay-xml3The target is 'okay-xml3The'.?>
The target '
xml
' has a special meaning in XML. To avoid confusion, any other capitalization of those three letters is reserved
(and prohibited):
<?xml This isn't XML.?> <?XML This isn't XML.?> <?XmL This isn't XML.?> <?xmlx This is technically okay but tempting fate.?> <?sgml This is okay XML.?>
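The target rules are simple enough to check in a few lines. This sketch classifies the content of a processing instruction (the text between the opening and closing delimiters). It assumes ASCII names only, and the category labels are mine, not XML terminology.

```python
import re

# A simplified, ASCII-only approximation of an XML name.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")

def pi_target_status(pi_body):
    """Classify a processing instruction by its target."""
    match = NAME.match(pi_body)
    if match is None:
        return "no target"
    target = match.group(0)
    if target.lower() == "xml":
        return "reserved"
    if target.lower().startswith("xml"):
        return "allowed but risky"
    return "ok"

for body in ["", "XmL This isn't XML.", "xmlx Tempting fate.", "sgml This is okay."]:
    print(repr(body), "->", pi_target_status(body))
```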
XML insists on case-sensitivity in places where SGML is typically insensitive. This can be a big headache at first, but it can ultimately simplify processing of the data. This is one of several places where SGML can be made to match XML by changing the SGML declaration you use.
First, adopt a standard capitalization for your element and attribute names. As a programmer afraid of
carpal-tunnel syndrome, I suggest all lower case. Then, change '
NAMECASE GENERAL YES
' to '
NAMECASE GENERAL NO
' in your SGML declaration file.
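Existing documents will then need their tags normalized to the capitalization you chose. Here is a rough sketch that lower-cases element type names in start and end tags. It assumes ASCII names, leaves attribute names alone, and does not skip comments or CDATA marked sections, so review the result. The function name is illustrative.

```python
import re

# '<' or '</' followed by an element type name; '<!' and '<?' are not matched.
TAG_NAME = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.\-]*)")

def lowercase_tag_names(text):
    """Lower-case the element type name in every start tag and end tag."""
    return TAG_NAME.sub(lambda m: "<" + m.group(1) + m.group(2).lower(), text)

print(lowercase_tag_names("<TITLE>We need</Title>"))  # <title>We need</title>
```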
All the name characters allowed by the reference concrete syntax are allowed by XML. So are thousands of others. But it's possible to have an SGML declaration that declares as name characters some characters that XML doesn't allow as name characters.
As with entity references, you can't leave off the final semicolon in a character reference:
<isnt-xml>R&#233;sum&#233</isnt-xml> <okay-xml>R&#233;sum&#233;</okay-xml>
You can't use named character references:
<isnt-xml>You can't use &#RE;, &#RS;, &#SPACE;, or a custom-defined function &#NAME;.</isnt-xml>
You can't use a numeric character reference to include a non-SGML character in XML.
XML does not include a wide variety of markup minimization features available in SGML. This section lists the more common types of minimization. Less commonly used minimization techniques are listed under 'Obscure Features'.
The
OMITTAG
feature is fairly commonly used. It allows you to completely leave out certain start and end tags when you can
tell by the context that they are required. So, using this feature, you might leave out the start tag for a
chapter title (provided that there were some data characters at the beginning of the chapter, yet all chapters
were required by the DTD to start with a title) or the end tag for a chapter (provided that there was a start
tag for the next chapter and the DTD didn't allow chapters to nest). Notice how both examples require consulting
the DTD to determine which tags have been left out. XML does not allow tags to be omitted in this way.
The
SHORTTAG
feature allows various abbreviations to be made within a tag. This feature is officially declared to be
ON
in the SGML declaration for XML, because some of these abbreviations are in fact allowed. But many are not.
Quite distinct from the idea of an empty element, there is the possibility in SGML of having empty tags. An empty
start tag looks like this: '
<>
'; and an empty end tag looks like this: '
</>
'. Empty tags are allowed in SGML in certain contexts where it is clear what the missing element type name is.
<isnt-xml><>Apparently, isnt-xml must always start with a certain element</></isnt-xml>
In XML, element type names must always be spelled out:
<okay-xml><title>We need to spell out 'title'</title></okay-xml>
Did you know that the final right angle-bracket is not always required on tags in SGML? Stomach-turning, isn't it? Sorry I ever mentioned it.
<isnt-xml<isnt-xml2>text</isnt-xml2</isnt-xml> <okay-xml><okay-xml2>text</okay-xml2></okay-xml>
In SGML, you can sometimes leave off the quotes when specifying attribute values. This is not allowed in XML. See The Big Three: Attributes: Specifying Attribute Values, above.
In SGML, you can sometimes leave off the attribute name when specifying attribute values. This is not allowed in XML. See The Big Three: Attributes: Specifying Attribute Values, above.
You should not use '
&
' or '
<
' as data:
<isnt-xml>In SGML, you can use & and < as data in certain contexts; when followed by a space, for example.</isnt-xml>
Use '
&amp;
' for '
&
' and '
&lt;
' for '
<
':
<okay-xml>In SGML, you can use &amp; and &lt; as data in certain contexts; when followed by a space, for example.</okay-xml>
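Programs that generate markup should do this escaping for you. As an illustration, here is a short sketch using the escape function from Python's standard xml.sax.saxutils module, followed by a round trip through a parser to confirm that the original characters come back as data. The sample string is my own.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Use & and < as data."
escaped = escape(raw)
print(escaped)  # Use &amp; and &lt; as data.

# Parsing the escaped text recovers the original characters as data.
element = ET.fromstring(f"<okay-xml>{escaped}</okay-xml>")
print(element.text == raw)  # True
```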
The places where you can use the
&
and
<
characters without them being interpreted as markup are in comments, processing instructions,
CDATA
marked sections and in the literal entit[...]
[...]ace (the
S
) after the entity name can not be left out:
<!ENTITY isnt-xml"Scrunching allowed in SGML."> <!ENTITY okay-xml "No scrunching in XML.">
But note that the start of an XML CDATA section is defined to include no white space:
[19] CDStart ::= '<![CDATA['
Do not include the SGML declaration in the XML document entity. The SGML declaration for XML must be left implied.
There are a number of features of SGML of which you may be only dimly aware. You likely won't notice their absence
from XML. The
SHORTREF
,
RANK
and
DATATAG
features are properly classified under the minimization category.
XML does not allow SGML's
SHORTREF
feature, whereby certain short sequences (like double carriage-returns) are interpreted as abbreviated references
to markup (like paragraph tags).
SGML includes a feature called
DATATAG
in which data acts as both markup and content. I've never encountered a use of this feature. Perhaps a raise of
hands?
The
RANK
feature allows you to declare a set of elements that differ only in a numerical suffix (like the
H1
,
H2
,
H3
heading elements in HTML) and then to type only
<H>
, having it interpreted as another heading at whatever level most recently occurred in the document.
Even when this feature is not turned on in your SGML declaration, you can still split the element type name into two parts in an element declaration. You can't do this in XML.
There are three variations on the
LINK
feature (
SIMPLE
,
EXPLICIT
and
IMPLICIT
). I've heard arguments for the importance of this feature, from people with more authority than I have. But I
don't believe them. At any rate, XML does not include this feature. (Note: The SGML LINK feature is not for
hyperlinking content, but rather for associating processing with an SGML document.)
The SGML
CONCUR
feature allows you to apply more than one DTD to the same data, simultaneously. XML does not include this feature.
A journey of a thousand miles starts with the first step. But before you take that step, you ought to determine where you stand. This will help you start out in the right direction. Or realize you're happy right where you are.
SGML is defined by 'ISO 8879:1986(E). Information processing - Text and Office Systems - Standard Generalized Markup Language (SGML). First edition - 1986-10-15', available from the International Organization for Standardization in Geneva.
XML is defined by 'Extensible Markup Language (XML) 1.0', a World Wide Web Consortium (W3C) Recommendation dated
1998 February 10 (
www.w3.org/TR/REC-xml
).
This paper is derived from James Clark's 'Comparison of SGML and XML', a World Wide Web Consortium (W3C) Note dated
1997 December 15 (
www.w3.org/TR/NOTE-sgml-xml-971215
by
jjc@jclark.com
). A few details have apparently changed since Clark's Note was written:
PUBLIC
identifiers, for example, are now a part of XML.