This guide describes three command-line tools for use by DTD and W3C XML Schema authors: xsdvalid, dtdvalid and dtdtoxsd. It is also a good reference for the support of DTD and W3C XML Schema in XXE.
This distribution contains the W3C XML Schema validation engine which is integrated in XMLmind XML Editor (XXE).
This engine has been made available to schema and DTD authors in the form of 3 command-line tools:
Features:
Non features:
Xsdvalid tools have been tested with:
Procedure:
$ java -version
java version "1.5.0_04"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_04-b05)
Java HotSpot(TM) Client VM (build 1.5.0_04-b05, mixed mode, sharing)
$ cd
$ tar zxvf xsdvalid-30.tar.gz
$ ls xsdvalid-30
bin/ doc/
$ xsdvalid-30/bin/xsdvalid -s my_w3c_xml_schema.xsd
Manual installation on Windows is similar to installation on Unix. Simply run xsdvalid-30\bin\xsdvalid.bat, dtdvalid.bat or dtdtoxsd.bat rather than the xsdvalid-30/bin/xsdvalid, dtdvalid or dtdtoxsd shell scripts.
These excellent packages have not been developed by XMLmind. Copyright information is contained in the corresponding .LICENSE file. Read the corresponding .README file for more details about these packages.
xsdvalid ?options? ?xml_doc ... xml_doc?
Checks an XML schema for validity. Checks an XML document for validity against an XML schema.
Options:
See note about the generated documentation.
It is possible to specify several -s and -ss options. Such multiple schemas (and their included/imported schemas, if any) are merged into one big global schema.
This command is XML catalog aware. It uses the XML catalogs specified in the environment variable XML_CATALOG_FILES. This variable must contain one or more XML catalog file names or URLs separated by semicolons (';').
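As a rough illustration of the expected format, the following Python sketch splits an XML_CATALOG_FILES value into its individual entries. The helper name catalog_entries is hypothetical and is not part of the distribution; it only mirrors the semicolon-separated convention described above.

```python
import os


def catalog_entries(value):
    """Split an XML_CATALOG_FILES-style value on ';' and drop empty items.

    Hypothetical helper: it only illustrates the separator convention and
    does not reproduce how the tools actually load their catalogs.
    """
    return [item.strip() for item in value.split(";") if item.strip()]


if __name__ == "__main__":
    # Print the catalogs that would be listed in the current environment.
    for entry in catalog_entries(os.environ.get("XML_CATALOG_FILES", "")):
        print(entry)
```

Both plain file names and URLs can appear in the same value, since only the semicolon is treated as a separator here.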
XML catalogs may be used to resolve URLs found in the following places:
Limitations:
Examples:
$ xsdvalid -s bugreport.xsd
$ xsdvalid sample.xml
$ xsdvalid -s bugreport.xsd bad.xml
file:/home/hussein/src/xxe/distrib/samples/xsdvalid/bad.xml:E:9:2: element contains invalid data: "M1.2p" does not match pattern "^((M|V)\d+\.\d+(p\d+)?)$" [cvc-pattern-valid] [cvc-type.3.1.3]
file:/home/hussein/src/xxe/distrib/samples/xsdvalid/bad.xml:E:11:2: element contains invalid data: syntax error in dateTime value "2001-13-16T12:00:00" [cvc-datatype-valid.1.2.1] [cvc-type.3.1.3]
file:/home/hussein/src/xxe/distrib/samples/xsdvalid/bad.xml:E:12:15: the sequence of child elements is incorrect [cvc-complex-type.2.4]
file:/home/hussein/src/xxe/distrib/samples/xsdvalid/bad.xml:E:12:15: element cannot contain element "html:font" [cvc-complex-type]
file:/home/hussein/src/xxe/distrib/samples/xsdvalid/bad.xml:E:24:0: the sequence of child elements is incorrect [cvc-complex-type.2.4]
file:/home/hussein/src/xxe/distrib/samples/xsdvalid/bad.xml:E:36:0: element has no attribute "number" [cvc-complex-type.3]
file:/home/hussein/src/xxe/distrib/samples/xsdvalid/bad.xml:E:37:2: element contains invalid data: "xsd" is not one of the allowed values [cvc-enumeration-valid] [cvc-type.3.1.3]
$ xsdvalid -s bugreport.xsd -w serial
$ ls serial/
directory.txt schema0.ser
$ xsdvalid -v -r serial \
    -ss http://www.xmlmind.com/xmleditor/schema/bugreport \
    bugreport.xsd sample.xml
Deserializing schema 'http://www.xmlmind.com/xmleditor/schemas' (1582ms)
Loading XML document 'sample.xml' (457ms)
Validating 'sample.xml' (182ms)
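When such output is post-processed by a script, each diagnostic can be broken into its parts. The Python sketch below assumes the layout seen in the sample above (location, a one-letter severity, line, column, message text, then bracketed constraint codes); this layout is inferred from the example only and is not a documented format. The function name parse_diagnostic is hypothetical.

```python
import re

# Layout inferred from the sample output: <location>:<severity>:<line>:<column>: <text>
DIAGNOSTIC = re.compile(
    r"^(?P<location>.+?):(?P<severity>[A-Z]):(?P<line>\d+):(?P<column>\d+): (?P<text>.*)$"
)
# One bracketed code at the very end of the text, e.g. " [cvc-complex-type.3]".
TRAILING_CODE = re.compile(r"\s*\[([^\[\]\s]+)\]$")


def parse_diagnostic(line):
    """Return a dict describing one diagnostic line, or None if it does not match."""
    match = DIAGNOSTIC.match(line.strip())
    if match is None:
        return None
    text = match.group("text")
    codes = []
    # Peel the bracketed codes off the end, keeping their original order.
    while True:
        code = TRAILING_CODE.search(text)
        if code is None:
            break
        codes.insert(0, code.group(1))
        text = text[:code.start()]
    return {
        "location": match.group("location"),
        "severity": match.group("severity"),
        "line": int(match.group("line")),
        "column": int(match.group("column")),
        "message": text,
        "codes": codes,
    }
```

Only trailing codes are stripped, so brackets that appear inside a quoted pattern in the message are left untouched.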
Note that sample.xml has an xsi:schemaLocation attribute, therefore there is no need to use option -ss even when the schema is to be deserialized.
$ xsdvalid -v -r serial sample.xml
Loading XML document 'sample.xml' (491ms)
Deserializing schema 'http://www.xmlmind.com/xmleditor/schemas' (1533ms)
Validating 'sample.xml' (189ms)
The generated HTML reference manual, organized like "DocBook: The Definitive Guide" by Norman Walsh et al., lists all elements and attributes specified in the W3C XML schema or DTD.
This manual is intended to help content authors create instances conforming to a given XML schema or DTD. This manual is not intended to help XML schema or DTD authors document their design.
Note that, for now, the documentation generator cannot extract documentation contained in a schema (i.e. in annotation/documentation elements) and merge extracted documentation with automatically generated documentation.
dtdvalid ?options? ?xml_doc ... xml_doc?
Checks a DTD for validity. Checks an XML document for validity against a DTD.
Options:
See note about the generated documentation.
When the -d or -dd command-line options are used, the constraint that the root element of an XML instance and the document element of the DTD must match is not checked.
This command is XML catalog aware. It uses the XML catalogs specified in the environment variable XML_CATALOG_FILES. This variable must contain one or more XML catalog file names or URLs separated by semicolons (';').
Notes:
Therefore, if an XML document to be validated references such a serialized DTD in its <!DOCTYPE>, dtdvalid will always complain that the root element of the XML instance does not match the document element named "dummy".
To get rid of this false alarm, always specify such a serialized DTD using the -dd command-line option.
Examples:
$ dtdvalid -d xhtml1-strict.dtd
$ dtdvalid sample.xhtml
$ dtdvalid -d xhtml1-strict.dtd bad.xhtml
file:/home/hussein/src/xxe/distrib/samples/dtdvalid/bad.xhtml:E:7:2: element contains characters other than white space [cvc-complex-type.2.3]
file:/home/hussein/src/xxe/distrib/samples/dtdvalid/bad.xhtml:E:8:4: element has no attribute "align" [cvc-complex-type.3]
file:/home/hussein/src/xxe/distrib/samples/dtdvalid/bad.xhtml:E:13:4: the sequence of child elements is incorrect [cvc-complex-type.2.4]
file:/home/hussein/src/xxe/distrib/samples/dtdvalid/bad.xhtml:E:13:4: element cannot contain element "hr" [cvc-complex-type]
$ dtdvalid -dd "-//W3C//DTD XHTML 1.0 Strict//EN" xhtml1-strict.dtd -w serial
$ ls serial/
directory.txt schema0.ent schema0.ser
$ dtdvalid -v -r serial \
    -dd "-//W3C//DTD XHTML 1.0 Strict//EN" xhtml1-strict.dtd sample.xhtml
Deserializing global DTD '-//W3C//DTD XHTML 1.0 Strict//EN' (1079ms)
Loading XML document 'sample.xhtml' (748ms)
Validating 'sample.xhtml' (180ms)
dtdtoxsd ?options? in_dtd_file out_xsd_file
Converts DTD in_dtd_file to XML-Schema out_xsd_file.
Options:
If the DTD declares text, external or unparsed entities, these declarations are copied to a file which has the same basename as out_xsd_file but with extension .ent. This file is created in the same directory as out_xsd_file.
In addition to out_xsd_file, a schema file named xml.xsd is created in the same directory as out_xsd_file. This secondary schema declares standard attributes xml:space, xml:lang and xml:base. The main schema always imports xml.xsd even if it doesn't reference any of the standard attributes.
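The naming rules above can be summarised with a small Python sketch that computes where the companion files end up for a given out_xsd_file. The function name companion_files is hypothetical, and the sketch assumes Unix-style paths; it does not run dtdtoxsd itself.

```python
from pathlib import PurePosixPath


def companion_files(out_xsd_file):
    """Return the paths of the files written next to out_xsd_file.

    "entities" is only created when the DTD declares text, external or
    unparsed entities; "xml_schema" is always created.
    """
    out = PurePosixPath(out_xsd_file)
    return {
        # Same basename as out_xsd_file, with the extension replaced by .ent.
        "entities": str(out.with_suffix(".ent")),
        # Secondary schema declaring xml:space, xml:lang and xml:base.
        "xml_schema": str(out.parent / "xml.xsd"),
    }


if __name__ == "__main__":
    print(companion_files("/tmp/xhtml.xsd"))
```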
This command is XML catalog aware. It uses the XML catalogs specified in the environment variable XML_CATALOG_FILES. This variable must contain one or more XML catalog file names or URLs separated by semicolons (';').
Limitations:
Examples:
$ dtdtoxsd -t http://www.w3.org/1999/xhtml xhtml1-strict.dtd /tmp/xhtml.xsd
$ ls /tmp
xhtml.ent xhtml.xsd xml.xsd
Formal reference: XML Schema Part 2: Datatypes.
Formal reference: XML Schema Part 1: Structures.
Constraints on XML instances which are not checked:
Constraints on XML schemas which are not checked:
In this case, the implementation simply overwrites the previously defined group.
In this case, the implementation simply overwrites the previously defined attributeGroup.
attributeGroups are not validated as such. If something is wrong, it is detected when the attributeGroup is actually used.
Example 1: circular references are checked when the attributeGroup is actually used.
Example 2: duplicate attribute and several ID attributes in the same attributeGroup are checked when the attributeGroup is actually used.
groups are not validated as such. If something is wrong, it is detected when the group is actually used.
The implementation allows adding facets not defined by the base type.
Other specificities:
Rationale: the schema for schemas is found invalid when the algorithm described in the spec is used.
However, the validation engine supports xs:import elements without a schemaLocation attribute, if an xs:import element for the same namespace but this time having a schemaLocation attribute has previously been processed.
Example:
<xs:import namespace="foo" schemaLocation="http://foo.com/schema1.xsd" />

<!-- Later, typically inside an included module. -->
<xs:import namespace="foo" />
Note that the other example below will not work because the validation engine cannot guess which of schema1.xsd or schema2.xsd contains the components to be imported.
<xs:import namespace="foo" schemaLocation="http://foo.com/schema1.xsd" />

<!-- Later, typically inside an included module. -->
<xs:import namespace="foo" schemaLocation="http://foo.com/schema2.xsd" />

<!-- Later, typically inside another included module. -->
<xs:import namespace="foo" />
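The import behaviour described above can be modelled, very roughly, by the Python sketch below. It is only a toy model written for this guide, not the engine's actual code: the function name resolve_imports is hypothetical, and rejecting an xs:import that has no earlier located counterpart is an assumption of the sketch.

```python
def resolve_imports(imports):
    """Resolve a sequence of (namespace, schema_location) pairs in order.

    schema_location is None for an xs:import without a schemaLocation
    attribute. Returns the location used for each import, or raises
    ValueError when a location-less import cannot be resolved.
    """
    known = {}  # namespace -> distinct locations seen so far
    resolved = []
    for namespace, location in imports:
        if location is not None:
            locations = known.setdefault(namespace, [])
            if location not in locations:
                locations.append(location)
            resolved.append(location)
            continue
        candidates = known.get(namespace, [])
        if len(candidates) != 1:
            # Either nothing was imported yet for this namespace, or several
            # documents were, and there is no way to guess which one is meant.
            raise ValueError(
                "cannot resolve import of namespace %r: %d candidate location(s)"
                % (namespace, len(candidates))
            )
        resolved.append(candidates[0])
    return resolved
```

With the first example the location-less import resolves to schema1.xsd; with the second one the sketch raises an error because both schema1.xsd and schema2.xsd are candidates.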
That is, it is possible to specify this:
<xs:key name="truck1" >
  <xs:selector xpath="." />
  <xs:field xpath="truck/@number | truck/@plate" />
</xs:key>
But not this:
<xs:key name="truck1" >
  <xs:selector xpath="." />
  <xs:field xpath="truck / @number | truck / @plate" />
</xs:key>
<xs:element name="foo">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="bar" />
      <xs:element name="bar" form="qualified" type="xs:decimal" /> <!--NOT SUPPORTED-->
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="bar" type="xs:decimal" />
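The difference between the two xpath values above can be expressed as a small check. The Python sketch below assumes that the limitation is exactly what the two examples suggest: white space is tolerated around the '|' separating path alternatives, but not inside an alternative. The function name is hypothetical and the check is far simpler than the real selector and field grammar.

```python
import re


def whitespace_only_around_bars(xpath):
    """Return True if white space occurs only around '|' in the given xpath."""
    alternatives = [alternative.strip() for alternative in xpath.split("|")]
    return all(
        alternative != "" and re.search(r"\s", alternative) is None
        for alternative in alternatives
    )


if __name__ == "__main__":
    print(whitespace_only_around_bars("truck/@number | truck/@plate"))
    print(whitespace_only_around_bars("truck / @number | truck / @plate"))
```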
An implementation limit error x-cos-element-consistent is reported in that case.
<xs:element name="foo">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="bar" type="xs:token" />
      <xs:element name="bar" type="xs:token" nillable="true" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
An implementation limit error x-cos-element-consistent is reported in that case.
Formal reference: Extensible Markup Language (XML) 1.0 (Second Edition).
Constraints on XML instances which are not checked:
Constraints on DTDs which are not checked:
Bug fixes:
Bug fixes:
Bug fixes:
Bug fixes:
Bug fixes:
<xs:element name="foo">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="bar" type="xs:token" />
      <xs:element name="bar" type="xs:token" nillable="true" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
An error is still reported for the above valid schema, but it is now reported as an implementation limit error: implementation limit: element declaration "bar" differs from previous element declarations "bar" [x-cos-element-consistent].
For example, ##other meant (to make it simple) "any namespace, including absent, different from targetNamespace".
In fact, ##other means "a namespace must be specified and this namespace must be different from targetNamespace".
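The two interpretations of ##other can be contrasted with a short Python sketch. Here None stands for an absent namespace, and both function names are hypothetical; the sketch restates the simplified wording used above rather than the full wildcard rules of the specification.

```python
def other_matched_before_fix(namespace, target_namespace):
    """Old reading: any namespace, including absent, different from targetNamespace."""
    return namespace != target_namespace


def other_matches(namespace, target_namespace):
    """Corrected reading: a namespace must be specified and must differ from targetNamespace."""
    return namespace is not None and namespace != target_namespace


if __name__ == "__main__":
    target = "http://www.w3.org/1999/xhtml"
    # An element with no namespace was accepted before the fix, and is rejected now.
    print(other_matched_before_fix(None, target), other_matches(None, target))
```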
Regressions:
xsdvalid -r serial f1.xhtml f2.xhtml f3.xhtml
is now much slower because the W3C XML Schema for XHTML is deserialized for each document to be validated.
To speed this up, run something like:
xsdvalid -r serial -ss http://www.xmlmind.com/xmleditor/schema/xhtml xhtml.xsd \
    f1.xhtml f2.xhtml f3.xhtml
Enhancements:
This version of the Schema for Schemas allows adding attributes with non-schema namespaces to annotate most schema components.
Slightly edited this normative schema to allow spaces before and after path alternatives in the xpath attributes of elements selector and field. Example:
<xs:key name="truck1" >
  <xs:selector xpath="truck" />
  <xs:field xpath="@number | @plate" />
</xs:key>
Example:
<xs:import namespace="foo" schemaLocation="http://foo.com/schema1.xsd" />

<!-- Later, typically inside an included module. -->
<xs:import namespace="foo" />
Note that the other example below will not work because the W3C XML Schema validation engine cannot guess which of schema1.xsd or schema2.xsd needs to be imported.
<xs:import namespace="foo" schemaLocation="http://foo.com/schema1.xsd" />

<!-- Later, typically inside an included module. -->
<xs:import namespace="foo" schemaLocation="http://foo.com/schema2.xsd" />

<!-- Later, typically inside another included module. -->
<xs:import namespace="foo" />
Bug fixes:
<xs:group name="Misc.extra">
  <xs:choice/>
</xs:group>
caused xsdvalid to throw a NullPointerException.
<xs:group name="general">
  <xs:sequence>
    <xs:element name="general" type="general">
      <xs:unique name="generalUnique">
        <xs:selector xpath="*"/>
        <xs:field xpath="@uniqueElementName"/>
      </xs:unique>
    </xs:element>
  </xs:sequence>
</xs:group>
caused xsdvalid to report false errors (the error message was: an identity-constraint with the same name "XXX" has already been defined).
Enhancements:
Bug fixes:
Bug fixes:
Bug fixes:
Example: default namespace is "http://www.w3.org/1999/xhtml".
Before the bug fix, "div/p" meant "{http://www.w3.org/1999/xhtml}div/{http://www.w3.org/1999/xhtml}p".
After the bug fix, "div/p" means "{}div/{}p".
Option -gendoc, added to xsdvalid and dtdvalid, makes it possible to automatically generate a hypertext (HTML) reference manual from an XML schema or a DTD. See note about the generated documentation.
Enhancements:
Bug fixes:
Changed version number to V2 to use the same version number as XXE.
Forgot to update the documentation for the release of Patch2.
Validating really large XML schemas on Windows was not possible due to a stack overflow error. Increasing the stack size by editing xsdvalid.bat and adding -Xss1m to the Java command line had no effect.
Xsdvalid 1.0 Patch2 requires Java 1.4. It will not run with Java 1.3. Do not upgrade if you cannot install Java 1.4 on your machine.
Fixed an obscure bug related to restrictions of the NMTOKENS, IDREFS, and ENTITIES simple types.
Initial release.