What the Hell is XML?
by Troy Janisch
Published in: Browsers, HTML and XHTML, XML
XML (Extensible Markup Language) is the Eurodollar of web development. Both XML and the Euro bring order to chaos; both offer undeniable, wide–ranging benefits; both are poised, in 2002, to change the way we do things. Frankly, both scare the crap out of people.
For web developers, 2002 is a time to conquer fears and take their first hands–on approach to XML. It’s time to examine XML and realize the practical benefits that it can provide to web projects today.
The bankers can fend for themselves.
XML, HTML & Databases
If you need a good analogy to describe XML to other people, don't mention HTML. Although XML looks a lot like HTML, creating a good XML file is more like designing a database than creating a web page.
Databases and XML documents are both used as a means to organize data. As a result, they share a lot of similarities.
A database table design for a table containing news stories would look something like this:
- Table Name: News
- Table Columns:
  - Headline
  - Category
  - Author
  - Date
  - Abstract
  - Body
  - Status
A basic XML document containing the same information might look like this:
<?xml version="1.0"?>
<News>
  <Category></Category>
  <Headline></Headline>
  <Author></Author>
  <Abstract></Abstract>
  <Body>Pending</Body>
</News>
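To make the database comparison concrete, here is a minimal Python sketch. It assumes a News document shaped like the one above, with made-up sample values filled in, and uses the standard `xml.etree.ElementTree` module to read it into a dictionary, much as a query would hand back one row of the News table. The function name `news_to_row` is illustrative only.

```python
import xml.etree.ElementTree as ET

# A News document shaped like the example above; the values are invented samples.
NEWS_XML = """<?xml version="1.0"?>
<News>
  <Category>Technology</Category>
  <Headline>XML arrives</Headline>
  <Author>Jane Doe</Author>
  <Abstract>A short introduction.</Abstract>
  <Body>Full story text.</Body>
</News>"""

def news_to_row(xml_text):
    """Return the child elements of <News> as a dict, like one database row."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "") for child in root}

row = news_to_row(NEWS_XML)
print(row["Headline"])
```

Each element name plays the role of a column name, and the element's text plays the role of the column value.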
In addition to these similarities, both databases and XML represent a huge step forward in the ability to publish and manage web content.
XML everywhere
At any scale above that of the small, personal site, database–driven websites are indisputably better at managing, updating, and maintaining content than HTML–only sites. What everyone will discover in 2002 is that XML–driven database sites will prove to be indisputably better than database–driven sites. XML is going to be everywhere.
And as a web developer, you are going to love it.
XML is poised to eliminate more headaches than a bottle of Ibuprofen, improve productivity more than cans of Red Bull, and increase profitability more than we’ll want our clients to know about.
How? Two words: Content management.
Content management & migration
Before projects are initiated by a client, a website usually reaches a stage of obsolescence, immediacy, or embarrassment. Web projects are big projects with short time lines. It’s not surprising, then, that one of the biggest factors influencing the profitability and success of web projects is the ability to effectively manage content.
Separation of style, programming, and content
The ability to store a site’s content, programming, and design separately and mix them together transparently, on demand, is the art of our craft. Each moment eliminating rework and duplication is a dollar in our pocket. It’s time spent adding new features to a site rather than rewriting, reworking, and “searching and replacing.”
We’ve solved much of the problem with databases, templates, style sheets and server–side includes. Much that remains, XML can address. It’s the best tool for managing content – the content itself, not the way text appears on screen. XML is used to structure, store and send information in a platform–neutral, object–oriented, plain text format.
Guerilla tactics
The power of XML is unleashed when it’s placed in the hands of content providers. However, since copywriters and clients aren’t accustomed to writing in platform–neutral, object–oriented, plain text formats, it means helping them do it unknowingly. Guerilla content management tactics, such as MS Word–to–XML migration, can be wildly successful.
The basic model for XML migration is to start in a word processor, such as MS Word, whose documents can be converted to XML, either directly or via RTF, using third–party tools. After conversion to XML, the documents can be used by an XML–aware server, or converted to HTML using another third–party tool.
Successful migration requires providing content creators with a Microsoft Word template and a set of basic instructions prior to Web development. The template must include custom style tags based on the organization of the pending website.
When using the template, content developers need to avoid using MS Word formatting options that are not defined within the custom style tags. If custom tags are insufficient, new tags must be added that reflect the type of content being addressed.
While the process seems cumbersome, with enough practice, it takes significantly less time to update site content than using processes without XML – particularly once you harness the power of XML validation.
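The style–to–tag idea behind the template can be sketched in a few lines of Python. This is only an illustration: it assumes some earlier step has already pulled each paragraph's custom style name and text out of the Word document, and the style names, the `STYLE_TO_TAG` mapping, and the function `paragraphs_to_xml` are all hypothetical rather than part of any real conversion tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from custom Word style names to XML element names.
STYLE_TO_TAG = {
    "News Headline": "Headline",
    "News Author": "Author",
    "News Abstract": "Abstract",
    "News Body": "Body",
}

def paragraphs_to_xml(paragraphs):
    """Build a <News> element from (style_name, text) pairs.

    A paragraph whose style is not in the template is rejected, which is
    why writers need to stick to the custom styles.
    """
    news = ET.Element("News")
    for style, text in paragraphs:
        if style not in STYLE_TO_TAG:
            raise ValueError(f"Style not defined in template: {style!r}")
        ET.SubElement(news, STYLE_TO_TAG[style]).text = text
    return news

doc = paragraphs_to_xml([
    ("News Headline", "XML arrives"),
    ("News Author", "Jane Doe"),
    ("News Body", "Full story text."),
])
print(ET.tostring(doc, encoding="unicode"))
```

Raising an error on an unknown style mirrors the rule above: if the custom tags are insufficient, the fix is to extend the mapping, not to let ad hoc formatting through.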
Validation
Websites either evolve or suffer the slow, painful death of neglect. New content needs to be added. Old content needs to be removed. Missing content needs to be found. Clients are frustrated by their inability to maintain and manage their web content. Web developers are frustrated by the aftermath. XML can help.
XML–based documents make it easy to find outdated and missing content at a glance. This is achieved by using XML Document Type Definitions (DTDs) to identify the timeliness of information and determine what information “nuggets” must be present within the content.
Like databases, XML documents allow you to validate information, before you use it, to make sure the content is timely, appropriate, and complete. Since we’re used to talking about validation as it relates to databases, let’s take a more detailed look at the database table we created to hold news stories. In reality, a database table must include definitions for each column:
News Table:
| Column | Type | Required? | Notes |
| Headline | varchar | Yes | Max of 50 characters |
| Author | varchar | No | |
| Category | varchar | Yes | Selected from drop-down list |
| Date | date/time | Yes | Date added to table |
| Abstract | varchar | Yes | 250 character intro. |
| Body | text | Yes | Allows text formatting in field |
| Status | varchar | Yes | pending: No distribution; public: Public distribution; private: Internal distribution |
By validating fields, the data table ensures that each news story contains all of the required information. So, with the proper integration and a web–based interface, the data table could be an efficient tool for publishing news on the web.
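The column rules in the table can be written down as a small check. The sketch below is a rough Python illustration, not a real database API: it assumes a story arrives as a dictionary, and the `RULES` structure and `validate_story` function are invented for the example. It also assumes the “250 character intro” note means a maximum length.

```python
# Rules taken from the News table: required flag and, where given, a maximum
# length or a fixed set of allowed values.
RULES = {
    "Headline": {"required": True, "max_len": 50},
    "Author": {"required": False},
    "Category": {"required": True},
    "Date": {"required": True},
    "Abstract": {"required": True, "max_len": 250},  # assumed to be a maximum
    "Body": {"required": True},
    "Status": {"required": True, "allowed": {"pending", "public", "private"}},
}

def validate_story(story):
    """Return a list of problems; an empty list means the story passes."""
    problems = []
    for field, rule in RULES.items():
        value = story.get(field, "")
        if not value:
            if rule["required"]:
                problems.append(f"{field} is required")
            continue
        if "max_len" in rule and len(value) > rule["max_len"]:
            problems.append(f"{field} exceeds {rule['max_len']} characters")
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{field} must be one of {sorted(rule['allowed'])}")
    return problems

story = {
    "Headline": "XML arrives",
    "Category": "Technology",
    "Date": "2002-07-01",
    "Abstract": "A short introduction.",
    "Body": "Full story text.",
    "Status": "draft",
}
print(validate_story(story))
```

Here the only complaint is about `Status`, because "draft" is not one of the three values the table allows; the missing `Author` passes because that column is optional.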
The XML document with simple DTD validation used for the same information might look like this:
<?xml version="1.0"?>
<News begins="7/1/02" ends="7/5/02">
The XML document makes significant contributions to web publishing when compared to the database alone. XML allows data to be validated based on the embedded DTDs, XML tags and attributes. This means that appropriate content can be extracted directly from the XML document based on selection criteria without requiring an interim database, without requiring a database query, and without being separated from the source document.
Using DTDs, XML documents suddenly become self–aware.
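Pulling timely content straight out of an XML document, with no database query in between, might look something like the Python sketch below. It assumes several `<News>` items carrying `begins` and `ends` attributes in the month/day/year style shown above; the `<Stories>` wrapper element, the sample headlines, and the function names are assumptions made for the example. Python's standard XML parsers do not validate against a DTD, so this shows only the selection step.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical feed: several <News> items inside an assumed <Stories> root.
FEED = """<?xml version="1.0"?>
<Stories>
  <News begins="7/1/02" ends="7/5/02"><Headline>Summer sale</Headline></News>
  <News begins="6/1/02" ends="6/15/02"><Headline>Spring cleanup</Headline></News>
</Stories>"""

def parse_date(text):
    """Parse the month/day/two-digit-year format used in the attributes."""
    return datetime.strptime(text, "%m/%d/%y").date()

def current_headlines(xml_text, today):
    """Return headlines whose begins/ends window contains `today`."""
    root = ET.fromstring(xml_text)
    return [
        news.findtext("Headline")
        for news in root.findall("News")
        if parse_date(news.get("begins")) <= today <= parse_date(news.get("ends"))
    ]

print(current_headlines(FEED, parse_date("7/3/02")))
```

The expired story stays in the source document; it simply is not selected.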
Substance & Style
XML finds advocates on both sides of the ongoing “content” versus “style” debate.
XSL (the eXtensible Stylesheet Language), the style sheet language of XML, packs a wallop. It’s much more robust than Cascading Style Sheets (CSS). Instead of using rules (as CSS does) to format content, XSL uses (.xsl) templates to describe how to transform XML into other types of documents. When you implement an XML–based site, XML doesn't replace HTML. If it sounds a bit confusing, here’s why. When you deal with XSL files, all is not as it appears:
- The .XSL file embeds HTML with XML tags and logic that define how information should be displayed at run time.
- At run–time, the .XML file is displayed in the web browser on the fly.
- Although HTML formatting included in the .XSL file is applied, it won’t appear in the source for the .XML document being displayed in the browser.
- The appearance in HTML is based on the combination of XML tags and logic within the .XSL file.
- Because the .XSL file can transform XML in the browser, the document that appears in the browser may only be a subset of the content in the actual XML file.
The ability to transform the XML conditionally in a web browser means that content can be centralized. Parts of the document are displayed or ignored on an as–needed basis.
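To show the principle of conditional transformation, here is a small Python sketch. It is not XSL: Python's standard library has no XSLT processor, so the example imitates what a template does in plain code. The `<Stories>` wrapper, the sample stories, and the `render_public` function are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document holding both public and private stories.
SITE_XML = """<?xml version="1.0"?>
<Stories>
  <News><Headline>Launch day</Headline><Abstract>We shipped.</Abstract><Status>public</Status></News>
  <News><Headline>Internal memo</Headline><Abstract>Staff only.</Abstract><Status>private</Status></News>
</Stories>"""

def render_public(xml_text):
    """Turn public stories into an HTML fragment; other stories are skipped."""
    root = ET.fromstring(xml_text)
    parts = []
    for news in root.findall("News"):
        if news.findtext("Status") != "public":
            continue  # present in the XML source, absent from the output
        parts.append(
            f"<h2>{escape(news.findtext('Headline', ''))}</h2>"
            f"<p>{escape(news.findtext('Abstract', ''))}</p>"
        )
    return "\n".join(parts)

print(render_public(SITE_XML))
```

One XML file feeds the page, yet only a subset of it reaches the reader, which is the centralization described above.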
Now is the Time
Web developers have been telling others that they are waiting to dabble in XML until it becomes widely available. The truth is, it’s been widely available for months:
- Internet Explorer 5 contains an XML engine that fully supports XML 1.0, as defined by the World Wide Web Consortium (W3C). This is a huge improvement over the engine in IE4.
- Netscape 6.0/Mozilla includes full XML support.
- Flash 5 ActionScript supports XML–based data transfer to and from a server.
- Director has offered an XML Parser Xtra since Director 7.0 that allows Shockwave movies to read, parse, and make use of the contents of XML documents.
  (Ed. Note: Director’s somewhat buggy XML parser has put off many developers. Reader Hussein Boon recommends Andy White’s user–extensible Lingo scripts instead. A tutorial is available. Boon also recommends a DOM–Lingo binding that binds Director’s Lingo scripting language to the W3C DOM Level 2.)
- IIS servers offer XML integration via the Microsoft XML Parser. Version 4 of the parser supports XML 1.0.
- SQL Server 2000 provides integrated XML support. It’s the first release to do so.
- Microsoft’s XML technology preview runs under any SQL Server release. Although the output is slightly different in a few cases, it’s a solid XML environment for the pre–SQL Server 2000 crowd.
- Version 2 of Apache Cocoon, a powerful framework for XML web publishing, has been released.
- Expat, an XML 1.0 parser, can be used in cooperation with the XML parser functions for PHP. This toolkit lets you parse, but not validate, XML documents.
- XML–RPC is a platform–neutral protocol for executing programs remotely, “designed to be as simple as possible, while allowing complex data structures to be transmitted, processed and returned.”
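As a taste of the last item, the sketch below uses Python's standard `xmlrpc.client` module to show what an XML–RPC call looks like on the wire, without contacting any server. The method name `news.add` and the story fields are hypothetical.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote method "news.add" with one
# structured argument (a dict containing a list).
story = {"Headline": "XML arrives", "Status": "pending", "Tags": ["xml", "web"]}
request_xml = xmlrpc.client.dumps((story,), methodname="news.add")
print(request_xml)

# Decoding the same payload recovers the parameters and the method name.
params, method = xmlrpc.client.loads(request_xml)
```

The printed request is ordinary XML: the dictionary becomes a `<struct>` and the list becomes an `<array>`, which is how structured data travels between platforms that share nothing but an XML parser.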
This means we’ve all run out of excuses for putting off XML. Today, the benefits of developing web projects in XML aren’t merely imaginable. They are achievable.
Troy Janisch is president and founder of the 
