See more tricks in the Gallery.
Charlie Halpern-Hamu, Ph.D.
Structured-Text Consultant Incremental Development, Inc.
charlie@IncrementalDevelopment.com
Charlie Halpern-Hamu received his doctorate in Computer Science from the University of Toronto. He is completing an MBA through Heriot-Watt University. He has published papers in the areas of denotational semantics, programming-language design tools and graphical control of robots by the disabled. He has been a structured-text consultant for eight years. His company, Incremental Development, Inc., helps organizations structure both text and surrounding business processes with emphasis on simplicity and skill-transfer.
This paper presents several Stupid XSL[xsl] Tricks. A Stupid XSL Trick is a use of XSL for something unusual or amusing for which it wasn't necessarily designed. A better name for this paper would be Stupid XSLT[xslt] Tricks, as all the examples in this paper use the transformation half of XSL, rather than the formatting-object half.
This paper is intended for an audience that, like the author, is learning XSLT and wishes to do so by poking around in various less-explored corners. This is not a scientific paper that expands the boundaries of human knowledge; it is more of a tutorial that might expand the boundaries of your knowledge.
Here are today's entertainments:
Given an XML[xml] schema, produce a sample instance that conforms to that schema.
A schema expresses, as an XML document, the possible relations between elements, attributes and data values in a class of XML documents.
Here is a simplified schema:
<?xml version='1.0'?>
<schema>
  <element name='doc'>
    <archtype>
      <element ref='head'/>
      <element ref='body'/>
    </archtype>
  </element>
  <element name='head'>
    <archtype>
      <element ref='title'/>
      <element ref='date'/>
    </archtype>
  </element>
  <element name='body'>
    <archtype>
      <element ref='para'/>
    </archtype>
  </element>
  <element name='title'>
    <archtype content='mixed'/>
  </element>
  <element name='date'>
    <archtype>
      <element name='year' type='four-digit-year'/>
      <element name='month'>
        <archtype content='mixed'/>
      </element>
      <element name='day' type='integer'/>
    </archtype>
  </element>
  <element name='para'>
    <archtype content='mixed'>
      <element ref='bold'/>
      <element ref='italic'/>
    </archtype>
  </element>
  <element name='bold'>
    <archtype content='mixed'/>
  </element>
  <element name='italic'>
    <archtype content='mixed'/>
  </element>
</schema>
I mean for this schema to correspond to the following DTD:
<!ELEMENT doc (head, body)>
<!ELEMENT head (title, date)>
<!ELEMENT body (para)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT date (year, month, day)>
<!ELEMENT year (#PCDATA)> <!--Should be a four-digit-year.-->
<!ELEMENT month (#PCDATA)>
<!ELEMENT day (#PCDATA)> <!--Should be an integer.-->
<!ELEMENT para (#PCDATA | bold | italic)*>
<!ELEMENT bold (#PCDATA)>
<!ELEMENT italic (#PCDATA)>
Note that I haven't included any occurrence information: perhaps date should be optional; certainly para should be allowed more than once. This is left as an exercise to the reader, or perhaps an exercise for the author if he can find some more time.
Note also that this schema language allows elements to be declared locally (as year, month and day are in this example) or globally (as all the other elements are). Locally declared elements can be referenced only in the same archtype element as their declaration. Locally declared elements are an interesting feature of the current XML Schema working draft, because they allow the same element name to have alternative content restrictions depending on context.
Local element declarations are less challenging from the point of view of this XSLT exercise because the nested declarations map easily to a nested sample instance. For me, the interesting aspect of this exercise is following a chain of references to produce a consolidated, nested structure. References to local element declarations do add an interesting wrinkle, in that both the current context and the global context need to be searched for the appropriate declaration.
My goal is to write a transformation that produces the following output:
<?xml version='1.0'?>
<doc>
  <head>
    <title>title</title>
    <date>
      <year>four-digit-year [unknown type]</year>
      <month>month</month>
      <day>123</day>
    </date>
  </head>
  <body>
    <para>para and <bold>bold</bold> and <italic>italic</italic> </para>
  </body>
</doc>
This example output, like all the others in this paper, is in fact actual output generated using the appropriate stylesheet and XT (James Clark's implementation of XSLT)[xt]. At the time this paper was written, XT did not fully implement all the attributes of xsl:output. So XML declarations and indentation have been added by hand.
The following XSLT stylesheet accomplishes the desired transformation:
<?xml version='1.0'?>
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='xml' indent='yes'/>

  <!-- 1. Start from the first top-level declaration. -->
  <xsl:template match='/schema'>
    <xsl:apply-templates select='element[1]'/>
  </xsl:template>

  <!-- 2. Declaration whose content is given by an archtype child. -->
  <xsl:template match='element[@name and archtype]'>
    <xsl:element name='{@name}'>
      <xsl:apply-templates select='archtype'/>
    </xsl:element>
  </xsl:template>

  <!-- 3. Declaration whose content is given by a type attribute. -->
  <xsl:template match='element[@name and @type]'>
    <xsl:element name='{@name}'>
      <xsl:choose>
        <xsl:when test='@type="integer"'>
          <xsl:text>123</xsl:text>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select='@type'/>
          <xsl:text> [unknown type]</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:element>
  </xsl:template>

  <!-- 4. Mixed content: sample text, then " and " before each subelement. -->
  <xsl:template match='archtype[@content="mixed"]'>
    <xsl:value-of select='../@name'/>
    <xsl:for-each select='*'>
      <xsl:text> and </xsl:text>
      <xsl:apply-templates select='.'/>
    </xsl:for-each>
  </xsl:template>

  <!-- 5. Everything else is treated as a sequence. -->
  <xsl:template match='archtype'>
    <xsl:apply-templates select='*'/>
  </xsl:template>

  <!-- 6. Reference: look for the declaration locally, then globally. -->
  <xsl:template match='element[@ref]'>
    <xsl:apply-templates
        select='../element[@name=current()/@ref] |
                /schema/element[@name=current()/@ref]'/>
  </xsl:template>
</xsl:stylesheet>
Here's how it works.
The first xsl:template matches the schema element just under the root of the tree. It assumes that the first element child of the schema is to be the root element of the resulting sample instance. Therefore, attention is directed, using xsl:apply-templates, to this first element child.
The second xsl:template matches an element declaration in which an element is identified by the name attribute and is defined using an archtype child. In this case, a sample element is generated using xsl:element. The contents of this generated element are determined by directing attention to the archtype that is the child of the element.
The third xsl:template handles the other way to declare an element. Rather than having a child archtype that restricts the possible subelements of the element being declared, an element can have a type attribute that determines the lexical structure of the textual content of the element being declared. In this simple example, we only handle one predefined datatype (integer) and throw our hands up at any other value.
Just as in the previous xsl:template, we first output a sample element using xsl:element before filling in its contents. The second and third templates could have been combined into a single template that matched based on the existence of the name attribute. The combined template could then unconditionally output a sample element before turning its attention to how to fill in the contents. At that point, it could distinguish between the two ways of defining content: the archtype child or the type attribute.
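The combined template described above might look like the following sketch. It was not run for this paper; it is meant to replace the second and third templates while the other templates stay as they are. What to emit for a declaration that has neither an archtype child nor a type attribute is an assumption (here it falls through to the "unknown type" branch).

```xslt
<?xml version='1.0'?>
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>

  <!-- Sketch only: the remaining templates of the schema-to-instance
       stylesheet are assumed to be present unchanged. -->

  <!-- One template for every declaration that has a name attribute. -->
  <xsl:template match='element[@name]'>
    <!-- Unconditionally output the sample element... -->
    <xsl:element name='{@name}'>
      <!-- ...then decide how to fill in its contents. -->
      <xsl:choose>
        <xsl:when test='archtype'>
          <xsl:apply-templates select='archtype'/>
        </xsl:when>
        <xsl:when test='@type="integer"'>
          <xsl:text>123</xsl:text>
        </xsl:when>
        <xsl:otherwise>
          <!-- Assumption: anything else is reported as an unknown type. -->
          <xsl:value-of select='@type'/>
          <xsl:text> [unknown type]</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

The trade-off is one fewer match pattern to read against a longer xsl:choose inside the template.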
The next two xsl:templates match the two kinds of archtypes in our simplified schema language: content='mixed' and the default order='seq'. These two xsl:templates are described in the following paragraphs.
A natural next step would be to add order='choice'. Choice could be handled very much like sequence, except that instead of recursing into each child, only one would be chosen. (The first child would be the easiest choice.) As it is, the stylesheet assumes that if it isn't mixed content, it is a sequence. This assumption luckily works when order='choice' and maxOccurrence='*', because one of each child in sequence is a valid instance of a repeating choice. So this schema-to-instance stylesheet can in fact be used with the instance-to-schema stylesheet below to effect a round trip. A round trip from schema to instance and back has the predictable effect of loosening up the schema. A round trip from instance to schema and back can have the effect of 'beefing up' the sample.
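As a rough sketch of that next step, a template for order='choice' could sit alongside the two archtype templates and follow only the first alternative. This was not part of the stylesheet as run. Note one consequence: with this template in place, a repeating choice would be sampled with only its first child, so the round trip described above would no longer carry every alternative across.

```xslt
<?xml version='1.0'?>
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>

  <!-- Sketch only: the remaining templates of the schema-to-instance
       stylesheet are assumed to be present unchanged. -->

  <!-- Choice content: recurse into one alternative only.
       The first child is simply the easiest one to pick. -->
  <xsl:template match='archtype[@order="choice"]'>
    <xsl:apply-templates select='*[1]'/>
  </xsl:template>

</xsl:stylesheet>
```

Because the pattern carries a predicate, it takes precedence over the plain archtype pattern in the same way the mixed-content template does.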
The fourth xsl:template is the first of the two xsl:templates to match the archtype element. This template matches mixed content. A mixed model is transformed into a sample instance fragment by generating some sample text (chosen here to be the same as the name of the element) and iterating over each of its children, if any.
Iteration using xsl:for-each is used instead of the typical recursion using xsl:apply-templates because we want to have the opportunity to highlight the mixed content by adding the word "and" before each subelement. If we used xsl:apply-templates, the templates that matched each child of the archtype would have to check if they were in mixed content to add the "and" themselves. The xsl:for-each allows this logic to be consolidated in one place.
The fifth xsl:template is the second of the two to match the archtype element. It will only be triggered if the more specific match pattern of the preceding xsl:template fails. Sequence content is extremely easy to handle. No text is allowed in an element-content sequence, so we only need to assemble the ordered results of evaluating the archtype's children.
XSLT is purposely open about the order in which attention is paid to each of the children, but guarantees that the output will be strung together in the same order as the corresponding input.
The last xsl:template matches the third kind of element, a reference to an element declared elsewhere in the schema. The distinguishing characteristic of this kind of element is that it has a ref attribute. We want to behave just as if that element declaration were defined right here where we are. So we need to redirect our attention, using xsl:apply-templates, to an element whose name attribute matches the current element's ref attribute ([@name=current()/@ref]). The element we seek may be in one of two places. It may be local, in which case it is a sibling, another child of our parent (../element). Or it may be global, in which case it will be a top-level element, a child of schema (/schema/element).
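The union used in the stylesheet follows both a local and a global declaration if the same name happens to be declared in both places. A possible variant, sketched here under the assumption that a local declaration should shadow a global one (the paper does not say which should win), tries the sibling first and falls back to the top level. The variable name local is only illustrative.

```xslt
<?xml version='1.0'?>
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>

  <!-- Sketch only: the remaining templates of the schema-to-instance
       stylesheet are assumed to be present unchanged. -->

  <xsl:template match='element[@ref]'>
    <!-- Sibling declarations with the referenced name, if any. -->
    <xsl:variable name='local'
        select='../element[@name=current()/@ref]'/>
    <xsl:choose>
      <!-- Assumption: a local declaration takes precedence. -->
      <xsl:when test='$local'>
        <xsl:apply-templates select='$local'/>
      </xsl:when>
      <!-- Otherwise use the top-level declaration. -->
      <xsl:otherwise>
        <xsl:apply-templates
            select='/schema/element[@name=current()/@ref]'/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

For the simplified schema in this section the result is the same either way, since no name is declared both locally and globally.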
Given a sample instance, produce an XML schema that allows that document and some similar documents without allowing everything.
This trick was done for SGML instances and DTDs by Fred [fred] and more recently for XML DTDs by SAXON DTDGenerator [dtdgenerator].
My goal is to start with a sample instance like this:
<?xml version='1.0'?>
<document>
  <head>
    <title>Title</title>
    <abstract>
      <para>A <bold>bold</bold> bit.</para>
    </abstract>
  </head>
  <body>
    <para>First paragraph.</para>
    <note>
      <para>An <italic>italic</italic> bit.</para>
    </note>
    <para>Last paragraph.</para>
  </body>
</document>
Note that one para contains bold and another contains italic. I'd like to collect these two clues into a single declaration that allows both subelements.
Also note that we, as humans, can guess that head must always appear before body and at the same time guess that any number of paras and notes can intermix in body. I'm not going to have the stylesheet make this guess; instead it will assume any number of subelements in any order (based on the parent-child relationships that it actually sees).
From the sample instance above, I would like to produce a schema like this:
<?xml version='1.0'?>
<schema xmlns="http://www.w3.org/1999/09/23-xmlschema/">
  <element name="document">
    <archtype order="choice" maxOccurrence="*">
      <element ref="head"/>
      <element ref="body"/>
    </archtype>
  </element>
  <element name="head">
    <archtype order="choice" maxOccurrence="*">
      <element ref="title"/>
      <element ref="abstract"/>
    </archtype>
  </element>
  <element name="title">
    <archtype content="mixed"/>
  </element>
  <element name="abstract">
    <archtype order="choice" maxOccurrence="*">
      <element ref="para"/>
    </archtype>
  </element>
  <element name="bold">
    <archtype content="mixed"/>
  </element>
  <element name="body">
    <archtype order="choice" maxOccurrence="*">
      <element ref="note"/>
      <element ref="para"/>
    </archtype>
  </element>
  <element name="note">
    <archtype order="choice" maxOccurrence="*">
      <element ref="para"/>
    </archtype>
  </element>
  <element name="italic">
    <archtype content="mixed"/>
  </element>
  <element name="para">
    <archtype content="mixed">
      <element ref="bold"/>
      <element ref="italic"/>
    </archtype>
  </element>
</schema>
I mean for this schema to correspond to the following DTD:
<!ELEMENT document (head | body)*>
<!ELEMENT head (title | abstract)*>
<!ELEMENT title (#PCDATA)>
<!ELEMENT abstract (para)*>
<!ELEMENT bold (#PCDATA)>
<!ELEMENT body (note | para)*>
<!ELEMENT note (para)*>
<!ELEMENT italic (#PCDATA)>
<!ELEMENT para (#PCDATA | bold | italic)*>
This transformation is accomplished by the following XSLT stylesheet:
<?xml version='1.0'?>
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:strip-space elements='*'/>
  <xsl:output method='xml' indent='yes'/>

  <!-- 1. Write the schema wrapper and review every element in the sample. -->
  <xsl:template match='/'>
    <xsl:element name='schema'>
      <xsl:apply-templates select='//*'/>
    </xsl:element>
  </xsl:template>

  <!-- 2. Declare each element type once, at its last occurrence. -->
  <xsl:template match='*'>
    <xsl:variable name='parent' select='name()'/>
    <xsl:if test='0 = count(following::*[name()=$parent])'>
      <xsl:element name='element'>
        <xsl:attribute name='name'>
          <xsl:value-of select='$parent'/>
        </xsl:attribute>
        <xsl:element name='archtype'>
          <xsl:choose>
            <!-- Text anywhere in this element type means mixed content. -->
            <xsl:when test='0 != count(//*[name()=$parent]/text())'>
              <xsl:attribute name='content'>mixed</xsl:attribute>
            </xsl:when>
            <xsl:otherwise>
              <xsl:attribute name='order'>choice</xsl:attribute>
              <xsl:attribute name='maxOccurrence'>*</xsl:attribute>
            </xsl:otherwise>
          </xsl:choose>
          <xsl:call-template name='find-kids'>
            <xsl:with-param name='parent' select='$parent'/>
          </xsl:call-template>
        </xsl:element>
      </xsl:element>
    </xsl:if>
  </xsl:template>

  <!-- 3. Reference each distinct child of $parent, at its last occurrence. -->
  <xsl:template name='find-kids'>
    <xsl:param name='parent'/>
    <xsl:for-each select='//*[name()=$parent]/*'>
      <xsl:variable name='child'>
        <xsl:value-of select='name()'/>
      </xsl:variable>
      <xsl:if test='0 = count(following::*[name()=$child]/parent::*[name()=$parent])'>
        <xsl:element name='element'>
          <xsl:attribute name='ref'>
            <xsl:value-of select='$child'/>
          </xsl:attribute>
        </xsl:element>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
Here's how it works.
The first xsl:template matches the root of the tree (/): one level above the root element (document in our example input). It writes out a schema element. The "usual" next step would be to xsl:apply-templates to the children of the root (the root element). The template that matched this element would do some processing and then move on to each child of this new element, etc. Rather than taking the usual approach, the select expression used here simply reviews all the elements in the sample instance in one go (//*).
The second xsl:template reviews an element, potentially writing out an element declaration for it. Because each element may appear many times in the sample, but only one element declaration should be created, only the last example of each element type triggers the creation of an element declaration. Each input element is considered as a parent: all its children (throughout the sample) are found and listed in the element declaration created.
First, for convenience, set the variable parent to the name of the current input element. (We call the current element "parent" because we are interested in all the children it has.) Then, test if this is the last example of the parent. The test is: "is the count of all following elements whose name is the same as parent equal to 0?". We only process the element if it is the last example of its type. (The reason we choose to process the last example instead of the more intuitive first example is that XT does not yet implement the preceding axis.) If there are more of the same element to come, nothing is done now.
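For a processor that implements xsl:key, one commonly used alternative for "once per element name" is to compare each element with the first node that a key returns for its name. The sketch below is not what was run for this paper: the key name by-name is made up for illustration, it picks the first example of each element type rather than the last, and it writes only empty declarations so that the selection logic stands out.

```xslt
<?xml version='1.0'?>
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='xml' indent='yes'/>

  <!-- Index every element in the sample by its name. -->
  <xsl:key name='by-name' match='*' use='name()'/>

  <xsl:template match='/'>
    <schema>
      <xsl:apply-templates select='//*'/>
    </schema>
  </xsl:template>

  <xsl:template match='*'>
    <!-- True only for the first element with this name, in document
         order, wherever in the tree the others appear. -->
    <xsl:if test='generate-id() =
                  generate-id(key("by-name", name())[1])'>
      <element name='{name()}'/>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Because the first example is chosen, declarations would come out in order of first appearance, which differs from the order in the schema shown earlier.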
If this is the last example of the parent in the sample, create an element declaration. Set the name attribute of this element declaration to be parent, the name of the current input element. Then check if the parent ever contains text directly. Do this by counting the number of times, in the whole sample, the pattern "parent is parent and child is text" is found. If parent ever contains text, define its contents to be #PCDATA mixed with all the children found inside the parent. Otherwise, define its contents to be a repeating choice of all these same children, but without the #PCDATA. So either set the content attribute of the element declaration or set both the order and maxOccurrence attributes. Either way, the contents of this element declaration should be references to all the children that are ever found inside the element being declared.
The third xsl:template does the actual work of tracing down all the children of the parent, wherever they appear in the sample input. It is not invoked by a match on the input, but rather by xsl:call-template calling it by its name. Because it is only called once, one might reasonably choose to place this logic inline in the previous xsl:template.
It iterates, using xsl:for-each, over each element whose parent is parent, anywhere in the document. The name of each such element is stored in the child variable. Again, to avoid repetition, only the last example of each parent/child relationship triggers an element reference. The ref attribute of the element reference is set to child: the name of the child element.
It's been a while since I've done this myself, so I'm not sure I'm using the right vocabulary to describe the class of functions I'm trying to differentiate. But an example will illustrate. Sample input and output:
To make this easy on myself, I'll start with a very structured and restrictive DTD that insists on input and output like this:
Here is that input DTD:
<!ELEMENT function-of-x (term+)>
<!ELEMENT term (coeff, x, power)>
<!ELEMENT coeff (#PCDATA)>
<!ELEMENT x EMPTY>
<!ELEMENT power (#PCDATA)>
This is the sample input shown above as expressed in this DTD:
<?xml version='1.0'?>
<function-of-x>
  <term><coeff>1</coeff><x/><power>3</power></term>
  <term><coeff>2</coeff><x/><power>2</power></term>
  <term><coeff>3</coeff><x/><power>1</power></term>
  <term><coeff>4</coeff><x/><power>0</power></term>
</function-of-x>
This is the sample output as expressed in the same DTD:
<?xml version='1.0'?>
<function-of-x>
  <term><coeff>3</coeff><x/><power>2</power></term>
  <term><coeff>4</coeff><x/><power>1</power></term>
  <term><coeff>3</coeff><x/><power>0</power></term>
  <term><coeff>0</coeff><x/><power>-1</power></term>
</function-of-x>
MathML[mathml] this ain't. For the curious, this is MathML:
<?xml version='1.0'?>
<reln><eq/>
  <apply><diff/>
    <bvar><ci>x</ci><degree><cn>1</cn></degree></bvar>
    <apply><fn><ci>f</ci></fn><ci>x</ci></apply>
  </apply>
  <apply><plus/>
    <apply><times/><cn>3</cn>
      <apply><power/><ci>x</ci><cn>2</cn></apply></apply>
    <apply><times/><cn>4</cn>
      <apply><power/><ci>x</ci><cn>1</cn></apply></apply>
    <apply><times/><cn>3</cn>
      <apply><power/><ci>x</ci><cn>0</cn></apply></apply>
    <apply><times/><cn>0</cn>
      <apply><power/><ci>x</ci><cn>-1</cn></apply></apply>
  </apply>
</reln>
The desired transformation is accomplished with this stylesheet:
<?xml version='1.0'?>
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:strip-space elements='*'/>
  <xsl:output method='xml' indent='yes'/>

  <!-- Echo the root element and visit each term. -->
  <xsl:template match='/function-of-x'>
    <xsl:element name='function-of-x'>
      <xsl:apply-templates select='term'/>
    </xsl:element>
  </xsl:template>

  <!-- Differentiate one term: new coeff = coeff * power, new power = power - 1. -->
  <xsl:template match='term'>
    <term>
      <coeff>
        <xsl:value-of select='coeff * power'/>
      </coeff>
      <x/>
      <power>
        <xsl:value-of select='power - 1'/>
      </power>
    </term>
  </xsl:template>
</xsl:stylesheet>
Here's how it works.
The first xsl:template matches the root element, echoes the same function-of-x element to the output, and then turns its attention to each child term.
The second xsl:template matches the term element. It writes out a new term, in which the coeff has the value of the old coeff times the old power, and the power has the value of the old power minus 1.
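In conventional notation, the rule applied to each term is the power rule, with c standing for coeff and p for power. As a worked check, reading the sample input and output documents above as polynomials gives:

```latex
\[
\frac{d}{dx}\bigl(c\,x^{p}\bigr) = (c \cdot p)\,x^{p-1}
\]
\[
f(x) = 1x^{3} + 2x^{2} + 3x^{1} + 4x^{0}
\quad\Longrightarrow\quad
f'(x) = 3x^{2} + 4x^{1} + 3x^{0} + 0x^{-1}
\]
```

Each output term follows from the rule: 1·3 = 3, 2·2 = 4, 3·1 = 3 and 4·0 = 0, with each power reduced by one.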
Though the output is correct, it would be prettier if the stylesheet noted the power 1 and the coefficient 0 and simplified appropriately.
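One part of that simplification, dropping terms whose new coefficient is 0, can be sketched with a single extra template. This was not run for this paper and rests on two assumptions: that a vanished term should simply be omitted, and that at least one term survives (the DTD above requires term+, so a constant input would produce an invalid, empty result). Simplifying the power 1 case would mean relaxing the DTD, so it is not attempted here.

```xslt
<?xml version='1.0'?>
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>

  <!-- Sketch only: the two templates of the differentiation
       stylesheet are assumed to be present unchanged. -->

  <!-- A term whose derivative has coefficient 0 produces no output.
       The predicate makes this pattern more specific than the plain
       term pattern, so it is chosen for such terms. -->
  <xsl:template match='term[coeff * power = 0]'/>

</xsl:stylesheet>
```

With the sample input, only the last term (coefficient 4, power 0) matches this template, so the output would contain the first three terms.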
For me, the exciting thing about this example is the ease with which the coefficient and power of a term can be used in the expressions calculating the new values. Unlike the previous stylesheets, this one reads very much like English.
Some more tricks to try:
I expect that the principal stumbling block with each of these tricks will be the inability to do more than one iteration / go more than a set number of levels deep.
Lessons learned:
XSLT effectively prevents multiple iterations in which each iteration uses the results of the previous iteration. Is there any sneaky way around this for any/all of the tricks listed above?
[...] current(). When I revised the paper in response to the October draft, current() alleviated the need for a variable.
XSLT's limits on looping are quite intentional and so my difficulties in looping do not reflect an accidental omission in the language.
[...] bold to nest in italic and vice-versa.)
[...] coeff * power above.
[...] key feature could have been used in the schema-to-instance exercise if all element declarations were global. It wouldn't have been particularly easier on the stylesheet-writer, but I assume that it should be more efficient.
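A sketch of what the key-based lookup might look like in the schema-to-instance exercise follows. It assumes, as stated, that every element declaration is global, and the key name declaration is made up for illustration; it was not run for this paper.

```xslt
<?xml version='1.0'?>
<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>

  <!-- Sketch only: the other templates of the schema-to-instance
       stylesheet are assumed to be present unchanged. -->

  <!-- Index the top-level declarations by their name attribute. -->
  <xsl:key name='declaration' match='/schema/element' use='@name'/>

  <!-- Resolve a reference by key lookup rather than by path search. -->
  <xsl:template match='element[@ref]'>
    <xsl:apply-templates select='key("declaration", @ref)'/>
  </xsl:template>

</xsl:stylesheet>
```

Since the key only indexes children of schema, locally declared elements such as year, month and day would still need the sibling search used in the original template.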
My original inspiration came after reading Rick Jelliffe's thoughts about using XSL as a schema-validation language [xslv] but before reading Francis Norton's follow-on suggestions about using XSLT to build such XSL schema-validators [xsltv]. Reading both spurred me on to actually experiment myself by showing that such tricks are possible. And amusing.