The HTML and XHTML DTDs are defined by the W3C for presenting information in web browsers. However, these DTDs are neither perfect nor complete. HTML is one of three standards currently expected of web page markup; the other two are WAI compliance for accessibility and Dublin Core metadata for indexing and classification. Although the XHTML DTD supports both of these standards, it does not enforce them. Similarly, while XHTML permits a web page to be hierarchically nested using the DIV element, it does not require such nesting.
The purpose of the WebDoc DTD is to enhance and, where necessary, modify the XHTML 1.0 Strict DTD to enforce compliance with the extra standards now being requested for web pages. Since the WebDoc DTD was first released, the first public working draft of XHTML 2.0 has been published, and it includes some of the same ideas, such as explicit hierarchical nesting.
Here are the differences between XHTML 1.0 and WebDoc 1.1.
- A hierarchical structure for body content has been enforced by modifying the content model for the BODY element, and adding new DIV3..6 elements.
- Dublin Core metadata fields have been promoted from generic META elements using attributes to specific elements such as DC.TITLE. Elements considered mandatory by the Irish Public Sector Metadata Standard are required elements, while others are optional.
- Elements not relevant to documents, such as form elements, have been removed.
- A set of specific WebDoc metadata elements has been defined to support the XML to HTML publishing process. These elements explicitly store the information required to publish documents to the web, for example the name of an HTML template that should be applied to the document.
- Named character entities (e.g. &nbsp;) were removed from the specification, as these entities are more trouble than they are worth. Non-ASCII characters are better handled using either a UTF encoding (such as UTF-8) or numeric character references (e.g. &#160;).
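Because named entities are not available, existing content that uses them has to be converted before it can be stored as WebDoc. The following is a minimal sketch of one way to do that in Python, using the standard `html.entities` table. The function name `named_to_numeric` is hypothetical, and the sketch assumes that the five entities predefined by XML itself (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`) should be left alone, since they need no DTD declaration.

```python
import re
from html.entities import name2codepoint

# Matches named entity references such as &nbsp; or &copy;
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# Entities predefined by XML itself; these need no DTD declaration.
_XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(text: str) -> str:
    """Replace HTML named entities with numeric character references.

    Hypothetical helper: unknown names and the XML predefined
    entities are returned unchanged.
    """
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in _XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#{name2codepoint[name]};"

    return _NAMED_ENTITY.sub(replace, text)
```

Unknown names are passed through rather than dropped, so a later validation step can still report them.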
The XHTML 1.0 Strict DTD already enforces the markup standards required to support the WAI Accessibility Guidelines, so no further changes were made for accessibility.
In addition to web publication, there is an increasing requirement for public sector agencies to exchange information directly in XML format. The WebDoc DTD enforces a hierarchical rather than linear document structure, which is crucial to exchanging useful documents.
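To illustrate the difference between a linear and a hierarchical structure, the sketch below turns a flat sequence of headings into nested sections and rejects any sequence that skips a level. It is only an illustration of the idea: the element names `div1` to `div6` and `title` are hypothetical stand-ins, not names taken from the WebDoc DTD.

```python
import xml.etree.ElementTree as ET


def nest_headings(headings):
    """Build a nested tree from a flat list of (level, title) pairs.

    Element names div1..div6 and title are hypothetical,
    chosen for illustration only.
    """
    body = ET.Element("body")
    # stack[i] is the open element at depth i; stack[0] is the body.
    stack = [body]
    for level, title in headings:
        if not 1 <= level <= 6:
            raise ValueError(f"level {level} is out of range")
        # Close any open sections at the same or a deeper level.
        del stack[level:]
        # A strict hierarchy forbids skipping a level (e.g. 1 directly to 3).
        if level != len(stack):
            raise ValueError(f"heading {title!r} skips a level")
        section = ET.SubElement(stack[-1], f"div{level}")
        ET.SubElement(section, "title").text = title
        stack.append(section)
    return body
```

With a nested result, a receiving system can extract a whole section as one subtree, which is not possible when headings and paragraphs are simply siblings in a linear sequence.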
Many of the ideas used in the WebDoc DTD are already used in a DTD developed for the Department of Enterprise, Trade and Employment website (http://www.entemp.ie/). This site is maintained as a set of Word documents, and uses an intermediary XML document encoding for information storage rather than mapping Word documents directly to HTML pages. This enables the site's maintainers to change the look and feel of the website easily, without having to re-export all information from Word. When a change is required, the standard HTML template for the site can be modified once, and all pages quickly regenerated from the XML version.
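The regeneration step described above can be sketched as follows. This is not the site's actual publishing process: the XML element names (`doc`, `title`, `para`), the template placeholders, and the `render` function are all assumptions made for the example, and Python's `string.Template` stands in for whatever templating mechanism is really used.

```python
import xml.etree.ElementTree as ET
from html import escape
from string import Template


def render(xml_source: str, template: Template) -> str:
    """Render one stored XML document through an HTML template.

    The doc/title/para vocabulary is hypothetical.
    """
    doc = ET.fromstring(xml_source)
    title = doc.findtext("title", default="")
    # Each para element becomes one HTML paragraph.
    body = "".join(
        f"<p>{escape(p.text or '')}</p>" for p in doc.iter("para")
    )
    return template.substitute(title=escape(title), body=body)


# The stored XML never changes; only the template does.
SOURCE = "<doc><title>News &amp; Events</title><para>Hello</para></doc>"
OLD_LOOK = Template(
    "<html><head><title>$title</title></head><body>$body</body></html>"
)
NEW_LOOK = Template("<html><body><h1>$title</h1>$body</body></html>")
```

Calling `render(SOURCE, OLD_LOOK)` and then `render(SOURCE, NEW_LOOK)` produces two different pages from the same stored document, which is the property that makes a site-wide redesign a matter of editing one template.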
- A mapping between Microsoft Word and the WebDoc DTD has been defined for the YAWC Pro (http://www.yawcpro.com/) Word to XML converter, so it is possible to create and edit WebDoc documents using Word.
- A mapping between the WebDoc and Word 2000 XML DTDs is being developed as an XSLT stylesheet. This will enable WebDoc documents to be converted into Word for editing, so that a document initially created in Word can be stored in XML and 'round-tripped' back into Word for further editing. As a result, it will not be necessary to store both the Word and XML versions of a document.
- A mapping between XHTML and WebDoc, to enable migration of HTML pages to WebDoc and Word, is also being developed as an XSLT stylesheet.
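One part of an XHTML to WebDoc migration is promoting Dublin Core META elements to specific elements such as DC.TITLE. The mappings above are described as XSLT stylesheets; the sketch below uses Python instead, purely to show the shape of the transformation. It assumes the input has no XML namespace and that Dublin Core fields appear as `meta` elements whose `name` starts with `DC.`; the `metadata` container element and the `promote_dc_meta` function are hypothetical.

```python
import xml.etree.ElementTree as ET


def promote_dc_meta(head: ET.Element) -> ET.Element:
    """Turn <meta name="DC.x" content="..."/> into <DC.X>...</DC.X>.

    The 'metadata' container name is hypothetical; non-Dublin-Core
    meta elements are ignored.
    """
    metadata = ET.Element("metadata")
    for meta in head.iter("meta"):
        name = meta.get("name", "")
        if name.upper().startswith("DC."):
            # The meta name becomes the element name, the content its text.
            element = ET.SubElement(metadata, name.upper())
            element.text = meta.get("content", "")
    return metadata
```

Once the fields are elements rather than attribute values, a DTD can declare which of them are required, which is not possible for generic META elements.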