Building the TEI documentation

12 October 2008

When people want to make a digital representation of a hand-written manuscript or printed book, they can make up their own format. However, there is a standard for representing texts in digital form: the Text Encoding Initiative Proposal 5, also known as TEI P5. There are previous versions of the TEI, but the current version, P5, is an XML-based format.

The documentation can be read online or downloaded as a PDF. However, a more fun approach is to use the Subversion repository that the TEI also provides: you can check out the documentation in the TEI XML format together with all the build scripts, and then build the documentation into HTML, LaTeX or PDF. So it acts as a demo of itself: documents in TEI are rendered into other formats.

So this post is not really about the TEI itself, but about how I built the documentation locally.

Dependencies

In the TEI Documentation, it says:

Prerequisites for use of TEI Modules

These packages and their associated Makefiles and scripts are all developed and tested on a Debian Linux system. While they should work (possibly with customisation) on other Linux systems, they are not designed to work on Microsoft Windows.

If you want to use the scripts provided with these packages, we recommend you to install the following additional software packages: jing, perl, saxon, trang, xmllint, xsltproc

Gentoo Linux has all the recommended software nicely packaged; install it with:

sudo emerge perl saxon trang libxml2 libxslt jing texlive-xetex

Ubuntu Linux has some of them; install those with:

sudo apt-get install perl libsaxon-java libxml2-utils xsltproc texlive-xetex lmodern

On Ubuntu, I had the most luck using the unofficial TEI Debian packages by Sebastian Rahtz: the Jing .deb, the Saxon .deb and the Trang .deb. I haven't tried the rest, but it looks like you can get everything from there.
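
If you want to see which of the recommended tools are still missing before you start, a small shell function can check for you. This is only a sketch: missing_tools is a name I made up, and I am assuming that each package installs a command with the same name as in the list above, which may not be true on your distribution (Saxon in particular may be installed as a jar file rather than a saxon command).

```shell
# Print the name of every tool that cannot be found on the PATH.
# missing_tools is a hypothetical helper, not part of the TEI scripts.
# The command names below are assumptions taken from the TEI
# prerequisites list; your packages may use different names.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

missing_tools jing perl saxon trang xmllint xsltproc
```

If it prints nothing, everything was found. Anything it does print is worth installing (or locating under another name) before running the build.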

TEI via SVN

I am going to check out the latest version of the TEI via subversion:

svn co https://tei.svn.sourceforge.net/svnroot/tei/trunk ./TEI

This may take some time, so go and have a cup of tea. As always, if your network connection breaks midway (i.e. you are using a slightly fishy wireless network), you can enter the TEI directory (cd TEI), type svn up, and it will carry on.
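
If the connection is really fishy, you can let the shell do the retrying for you. The following is a minimal sketch and not something from the TEI or Subversion: retry and RETRY_DELAY are names I made up for illustration.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# retry and RETRY_DELAY are hypothetical names, not part of Subversion.
retry() {
    attempts=$1
    shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        echo "attempt $n of $attempts failed: $*" >&2
        n=$((n + 1))
        # Wait a little before the next try (5 seconds unless overridden).
        sleep "${RETRY_DELAY:-5}"
    done
    return 1
}

# Usage, after the first interrupted checkout:
#   cd TEI && retry 5 svn up
```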

Building TEI

The TEI directory contains lots of stuff, including the guidelines in XML, stylesheets and build scripts. Let's enter the TEI directory:

cd TEI

Now we can try to build the TEI P5 using the supplied build scripts.

Let's start with the Stylesheets.

mkdir build

cd Stylesheets/

make PREFIX=../build install

Secondly, let's build the Roma package:

cd ../Roma/

make PREFIX=../build install

Thirdly, let's build the guidelines themselves:

cd ../P5/

make PREFIX=../build XSL=../build/share/xml/tei/stylesheet install

This will take some time, so go and have a walk in the park.

You may get errors that you have to look at. If so, fix the problem, then lather, rinse and repeat until the build runs without throwing errors.
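
When a long build fails, the interesting error has usually scrolled off the screen. One way to cope, sketched here under my own assumptions, is to send the output to a log file and only show the end of it on failure. run_logged and the log file name are hypothetical; the make command in the usage comment is the one from above.

```shell
# Run a command, keep its full output in a log file, and show the
# last lines of the log if the command fails.
# run_logged is a hypothetical helper, not part of the TEI scripts.
run_logged() {
    log=$1
    shift
    if "$@" > "$log" 2>&1; then
        echo "ok: $*"
    else
        status=$?
        echo "failed ($status): $*" >&2
        tail -n 20 "$log" >&2
        return "$status"
    fi
}

# Usage, from the P5 directory (p5.log is just an example name):
#   run_logged ../build/p5.log make PREFIX=../build XSL=../build/share/xml/tei/stylesheet install
```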

Part of the process is a set of Perl substitutions of the stylesheet paths, and one of them did not seem to work for me. It was line 21 (starting <xsl:import href="/ ) of Utilities/guidelines-latex.xsl that was not substituted. I just edited the line to match my stylesheet location and then it worked.
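
I made that edit by hand, but the same kind of fix can be scripted. The sketch below is an assumption on my part rather than what the TEI build does: rewrite_import_prefix is a made-up name, and both paths in the usage comment are hypothetical examples, so check what the line actually contains on your machine first.

```shell
# Replace an absolute path prefix in the href of xsl:import lines.
# rewrite_import_prefix is a hypothetical helper. The old prefix is
# used as a sed pattern without escaping, and '|' is the sed
# delimiter, so this only suits plain paths that contain no '|'.
rewrite_import_prefix() {
    file=$1
    old=$2
    new=$3
    sed "/<xsl:import /s|href=\"$old|href=\"$new|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Usage, from the P5 directory (both paths are made-up examples):
#   rewrite_import_prefix Utilities/guidelines-latex.xsl \
#       /usr/share/xml/tei/stylesheet \
#       "$HOME/TEI/build/share/xml/tei/stylesheet"
```

Only lines containing an xsl:import are touched, and the rest of each path after the prefix is left alone.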

What have we done

The guidelines are stored in the Source/Guidelines directory:

ls P5/Source/Guidelines/en

They are documents marked up in TEI's XML format.

The Roma package generates validators for the TEI; the Perl scripts build the schemas, and then build the TEI Guidelines into HTML, LaTeX and PDF.

The generated HTML files are now in the Guidelines-web folder:

ls P5/Guidelines-web

As you can see, the documentation has been built for several languages. Inside the en/html folder are all the HTML files. Open index.html in your web browser and get reading!

1 Bug says...

So how do I actually write in this? [And I thought LaTeX was flexible (as it can be converted to anything and the styles are out of the content)].

Posted at 7:12 a.m. on October 13, 2008


2 Zeth says...

Hi Bug,

Well, I am not sure that you would particularly use their build scripts for your own texts. There are easier ways to generate HTML from XML.

To 'write' in TEI, you just open a text editor and type according to the guidelines, but transcribing manuscripts is often an iterative process, as the document is passed to different people who layer on more detail. Therefore TEI documents are often generated from data typed in simpler formats, or there are various abstractions for non-technical people to use.

Using LaTeX is great for typesetting, especially for modern scientific works. However, for representing written sources in an extremely detailed way, as university researchers or libraries need to, the TEI has more useful tags.

To take one example, a manuscript might be damaged or altered at a later stage than its original composition.

In TEI, to represent a hole in the manuscript there is a <gap /> tag; if a line has been crossed out you have <del />; if a second scribe/writer has added text over the top, you use <add>; if there has been water damage, you can use <damage agent="water">; and so on.

Posted at 9:45 a.m. on October 13, 2008


About

Hello, my name is Zeth, I'll be your host here.

Command Line Warriors is about taking control of your own technology. It looks at our experiences of computing, especially using GNU/Linux, the Python programming language and the command line, and at issues such as techno-ethics, best practices and whatever is cool now. If you take control of your technology then you are a Warrior too!

This site is your site too which means that you can contribute and get involved. You can leave comments using the facility provided. For me, the comments and discussions are by far the best part of the site. So please do have your say!
