ODF VS OOXML: Library support shootout

28 November 2007

Web and Application development

One of the core reasons for wanting an XML-based document format is that third-party applications and web services will be able to create, import and process the documents. For that, one needs language bindings. There are three levels of abstraction available:

  • Firstly, you can interact with these documents (at least with ODF) at a pure XML level using just XSLT, XQuery and so on; the basement approach. Working at this low level is, of course, somewhat painful and time-consuming.
  • Secondly, you can have libraries that map ODF or OOXML documents to native datatypes (and back again) in the language that you use; to follow the analogy, this is the main floor.
  • Lastly, you can have APIs that allow you to control and automate desktop applications such as OpenOffice, Word and Excel; we can think of this as the loft or rooftop. This is pretty much useless for applications that run on a server, of course.

It is the rooms on the main level, the libraries for programming languages, that are the most useful and widely applicable for application and Web development.
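To give a feel for the basement approach, here is a rough sketch in Python that uses only the standard library. It assumes the usual ODF packaging (a zip archive with the document body in content.xml). The names odf_paragraphs and sample_package are made up for this illustration and do not come from any of the libraries discussed here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs from the OpenDocument specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odf_paragraphs(source):
    """Return the text of each <text:p> element in an ODF package.

    `source` is a filename or a binary file-like object.
    (Hypothetical helper name, not part of any library.)
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("content.xml"))
    # itertext() also picks up text nested in spans, links and so on.
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


def sample_package():
    """Build a stripped-down, in-memory stand-in for an .odt file.

    It holds only content.xml, so it is a fixture, not a valid document.
    """
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" '
        f'xmlns:text="{TEXT_NS}"><office:body><office:text>'
        "<text:p>Hello <text:span>ODF</text:span></text:p>"
        "<text:p>Second paragraph</text:p>"
        "</office:text></office:body></office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("content.xml", content)
    buffer.seek(0)
    return buffer


for paragraph in odf_paragraphs(sample_package()):
    print(paragraph)
```

Pass a real filename instead of sample_package() to try it on an actual document. It works, but everything beyond plain paragraphs (styles, tables, metadata) is left for you to handle by hand, which is exactly why a proper library is the more comfortable floor to live on.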

How well are ODF and OOXML supported in different languages?

I started off by choosing six common languages used by application and Web developers, then I went and looked for what libraries are available for each document format:

Language       ODF       OOXML
Python         Yes [1]   No
Perl           Yes [2]   No
Ruby           Yes [3]   No
PHP            Yes [4]   Started [5]
Java           Yes [6]   Started [7]
Microsoft C#   Yes [8]   Yes [9]

So a pretty clean sweep for ODF at the moment, though as you can see, there are a couple of projects just starting up to provide Java and PHP support for OOXML. How far they get will be interesting to see: is it really possible to create a complete and valid OOXML document without Microsoft software? There is a big difference between getting some text out of an OOXML document and making a library that fulfils the massive ECMA specification.

The Mono project are (or at least were) trying to make C# bindings work on non-Microsoft platforms. Hopefully, if OOXML takes off to any degree then some bright spark will write Python bindings too, even if just to help us all rescue data out of OOXML documents.
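For what it is worth, the "getting some text out" end of that scale really is small. The following is only a rough Python sketch, assuming a word-processing file laid out the usual way (a zip archive with the body in word/document.xml, using the WordprocessingML namespace below). The names docx_paragraphs and sample_docx are invented for this example, and it is nowhere near a real binding: styles, tables, footnotes and everything else in the specification are ignored.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the plain text of each <w:p> paragraph in a .docx file.

    `source` is a filename or a binary file-like object.
    (Hypothetical helper name, not part of any library.)
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; <w:t> holds the pieces.
        pieces = [t.text or "" for t in p.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return paragraphs


def sample_docx():
    """Build a stripped-down, in-memory stand-in for a .docx file.

    It holds only word/document.xml, so it is a fixture, not a valid document.
    """
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>OOXML</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


for paragraph in docx_paragraphs(sample_docx()):
    print(paragraph)
```

Rescuing text like this is the easy part; writing a document that other software accepts as valid is where the size of the specification starts to bite.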

I'm sure that if I took less well-known languages, the same trend would be found: the ODF library is produced first and provides the most complete bindings. ODF has a number of advantages that make this the case:

  • Firstly, ODF has a couple of years' head start over OOXML. A real implementation of OOXML did not appear until 2007, while ODF has already become the default office format for the Free Software/Open Source world (if you don't include plain text).
  • Secondly, because ODF is much purer XML than OOXML, making an ODF library for a language is not a lot of work if the language already has well developed XML support.
  • Thirdly, ODF is not owned or controlled by a single company, and other IT organisations are not in any rush to support OOXML or help Microsoft maintain its office monopoly. There are only so many times that you can pee in the pool before no one wants to swim with you anymore.
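To illustrate the second point, here is a sketch of how little code a very basic ODF text writer might need in Python, using nothing but the standard library. It assumes the packaging described in the OpenDocument specification: a zip archive whose first entry is an uncompressed mimetype file, plus META-INF/manifest.xml and content.xml. The function name write_odt is made up for this example, and I have not checked the output against every office application, so treat it as a starting point rather than a finished library.

```python
import io
import zipfile
from xml.sax.saxutils import escape

MIMETYPE = "application/vnd.oasis.opendocument.text"

MANIFEST = f"""<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest
    xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
    manifest:version="1.2">
  <manifest:file-entry manifest:full-path="/" manifest:media-type="{MIMETYPE}"/>
  <manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
</manifest:manifest>
"""

CONTENT_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.2">
  <office:body>
    <office:text>
{paragraphs}
    </office:text>
  </office:body>
</office:document-content>
"""


def write_odt(target, paragraphs):
    """Write a bare-bones ODF text document (hypothetical helper).

    `target` is a filename or a writable binary file-like object;
    `paragraphs` is an iterable of plain strings.
    """
    # Escape &, < and > so arbitrary text stays well-formed XML.
    body = "\n".join(
        f"      <text:p>{escape(text)}</text:p>" for text in paragraphs
    )
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as package:
        # The mimetype entry goes first and is stored without compression.
        package.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        package.writestr("META-INF/manifest.xml", MANIFEST)
        package.writestr("content.xml", CONTENT_TEMPLATE.format(paragraphs=body))


# Usage: write two paragraphs to an in-memory buffer and list the entries.
buffer = io.BytesIO()
write_odt(buffer, ["Hello ODF", "Fish & chips"])
with zipfile.ZipFile(buffer) as package:
    print(package.namelist())
```

Passing a filename such as "hello.odt" instead of the buffer writes the package to disk. Styles, metadata and anything beyond plain paragraphs are deliberately left out.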

Please let me know using the comments if I have missed a project out. I know that OpenDocument in some cases has language support from multiple vendors, which is nice; the OpenDocument Fellowship has a more complete list of libraries, among other useful lists.


1 Tim says...

this is a test of the emergency broadcast system...

Posted at 3:39 p.m. on November 28, 2007


2 Zeth says...

Hi Tim,

"please state the nature of the medical emergency"

Posted at 3:41 p.m. on November 28, 2007


3 bug says...

"there are a couple of projects just starting up to provide Java and PHP support for OOXML"

Shouldn't it be vice versa? PHP and Java are the ones that get the OOXML support; OOXML by itself supports neither PHP nor Java.

Posted at 6 p.m. on November 28, 2007


4 Joe says...

> There are only so many times that you can pee in the pool before no one wants to swim with you anymore.

Nice analogy ;-)

Apparently however, even though the MS pool is bright yellow, and has what look like candy bars floating in it, there are gold doubloons scattered around the bottom.

People are camping in line for a chance to dive in.

BTW, you might also include the actual command line in your list, because it's quite possible to do some useful operations on an ODF file using command line tools.

E.g., list all the titles in a presentation:

$ unzip -p ~/doc/ODF_What_Who.odp content.xml | sed -e 's!<draw:frame !\n&!g' -e 's!</draw:frame>!&\n!g' | grep 'presentation:class="title"' | grep -o '<text:p.*</text:p>'
<text:p text:style-name="P1">ODF: Open Document Format</text:p>
<text:p text:style-name="P1">ODF: What Is It?</text:p>
<text:p text:style-name="P1">ODF: So what?</text:p>
<text:p text:style-name="P1">ODF: What Is It?</text:p>
<text:p text:style-name="P1">ODF: What Is It?</text:p>
<text:p text:style-name="P1">ODF: What Is It?</text:p>
...

Granted, not as nice as a real library, but useful at need, and readily available.

Posted at 7:18 p.m. on November 28, 2007



About

Hello, my name is Zeth, I'll be your host here.

Command Line Warriors is about taking control of your own technology. It looks at our experiences of computing, especially using GNU/Linux, the Python programming language and the command-line, and at issues such as techno-ethics, best practices and whatever is cool now. If you take control of your technology then you are a Warrior too!

This site is your site too, which means that you can contribute and get involved. You can leave comments using the facility provided. For me, the comments and discussions are by far the best part of the site. So please do have your say!
