The history of XML
20 September 2008
XML did not fall from heaven (or if you prefer, arise out of hell) fully completed. Instead there was a long process of standardisation.
In 1969, Bob Dylan started his comeback at the Isle of Wight festival, Elvis began his in Las Vegas, Elton John released his first record, and David Bowie's Space Oddity coincided with the Apollo 11 mission to the Moon.
Meanwhile, also in 1969, Goldfarb, Mosher and Lorie were working at IBM on an application for legal offices. They decided to make a standardised high-level markup language that was independent of whatever control codes your printer used. They named this markup language after their initials: GML.
A decade later, ANSI (the American National Standards Institute) began developing a standard for information exchange based on GML. This became SGML, the 'Standard Generalized Markup Language', which in turn became an ISO (International Organization for Standardization) standard in 1986.
In 1991, CERN computer scientist Tim Berners-Lee released his Internet-based hypertext system, the 'World Wide Web'. It used a particularly dirty SGML variant called HTML, the 'HyperText Markup Language'. HTML was dirty SGML because, especially as browser vendors extended it, it went against the separation of content from presentation, with <b>, <center>, <font>, <blink>, <marquee> and other in-line monstrosities.
Despite being a complete hack and the bane of SGML purists, HTML propelled SGML out of the academic, literary and textual processing circles into the wider world. Angle brackets had taken over the world.
SGML had many features and very few restrictions. In practice, one program might implement a certain subset of SGML while another implemented a different subset, which defeated the whole point of SGML: to be a common information exchange format.
So, in a perhaps futile attempt to bring order out of chaos, an international working group, formed under yet more international quangos, worked from 1996 to 1998 to define a subset of SGML called XML, the 'Extensible Markup Language', which aimed to be simpler, stricter, easier to implement and more interoperable. A note by James Clark, the leader of the original technical group, explains the differences between SGML and XML. Over the last decade XML has been constantly revised and improved.
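To give a feel for what 'stricter' means in practice, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper name is_well_formed is made up for this illustration. An XML parser is expected to reject markup that is not well-formed, such as an element that is never closed, rather than guess what the author meant.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser refuses the document instead of trying to repair it.
        return False

# Every element is opened and closed in the right order.
print(is_well_formed("<p>Hello <b>world</b></p>"))

# The <b> element is never closed, so the closing </p> is a mismatch.
print(is_well_formed("<p>Hello <b>world</p>"))
```

The first call should report the markup as acceptable and the second should not. This only checks well-formedness; validating against a DTD or schema is a separate step that this sketch does not attempt.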
Of course, programs still implement XML in different ways, and one may find a load of marked-up files that are somewhere between SGML and XML, as well as program-specific or group-specific non-standard behaviour.
The most enthusiastic XML advocates will recommend using XML for everything, including brushing your teeth. However, to be brutally honest, one uses XML when one is forced to.
XML does work better in some situations than others. For example, when you want to pass non-relational data between arbitrary systems, XML works quite well.
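As a rough illustration of that use case, the sketch below writes a small nested record out as XML and reads it back, again with Python's standard xml.etree.ElementTree module. The record, the element names (order, customer, items, item) and the function names to_xml and from_xml are all invented for this example; a real exchange format would be agreed between the systems involved.

```python
import xml.etree.ElementTree as ET

# A hypothetical nested record that does not fit neatly into one flat table.
order = {"id": "42", "customer": "Ada", "items": [("pencil", 3), ("notebook", 1)]}

def to_xml(record):
    """Serialise the record to an XML string (element names are assumptions)."""
    root = ET.Element("order", id=record["id"])
    ET.SubElement(root, "customer").text = record["customer"]
    items = ET.SubElement(root, "items")
    for name, qty in record["items"]:
        # Attribute values must be strings, so the quantity is converted.
        ET.SubElement(items, "item", qty=str(qty)).text = name
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    """Rebuild the record on the receiving side from the XML string."""
    root = ET.fromstring(text)
    return {
        "id": root.get("id"),
        "customer": root.findtext("customer"),
        "items": [(item.text, int(item.get("qty"))) for item in root.iter("item")],
    }

xml_text = to_xml(order)
print(xml_text)
print(from_xml(xml_text) == order)
```

The receiving system needs nothing more than an XML parser and knowledge of the element names, which is roughly why XML suits this kind of hand-off.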
In a future post, we will look at what to do if you find yourself having to sort out a pile of random XML files.



1 Andy Canfield says...
Take a piece of paper. Write down my name, "Andy". Now underline it.
Oops! That's wrong. You can't underline anything any more. You have to go down to the accounting department and request a tag for underlining. Any underlining? Underlining names? Underlining names of people who live in Asia? You must decide, first. Then, a week from now, accounting will issue you a tag and you can use that tag to underline my name, until someone else changes it from underline to italic and you have to get a new tag.
I don't see what's so horrible about presentation. If Vincent Van Gogh had to use cascading style sheets we would have only three paintings by him, and he would have cut off both his ears in frustration.
We are using strict XML to transfer data from one application to another. It is good for that. But to create web pages (documents?). Why? Since when am I supposed to leave "presentation" up to some anonymous program?
Posted at 12:06 a.m. on September 25, 2008