Building the TEI documentation
12 October 2008
When people want to make a digital representation of a hand-written manuscript or printed book, they can make up their own format. However, there is a standard for representing texts in digital form: the Text Encoding Initiative Proposal 5, also known as TEI P5. There were previous versions of the TEI; the current version, P5, is an XML-based format.
The documentation can be read online or downloaded as a PDF. A more fun approach, however, is to use their Subversion repository: you can check out the documentation in TEI XML format along with all their build scripts, and then build the documentation into HTML, LaTeX or PDF. So the documentation acts as a demo of itself: documents written in TEI are rendered into other formats.
Therefore this post is not really about the TEI in itself, but about how I built the documentation locally.
Dependencies
The TEI documentation says:
Prerequisites for use of TEI Modules
These packages and their associated Makefiles and scripts are all developed and tested on a Debian Linux system. While they should work (possibly with customisation) on other Linux systems, they are not designed to work on Microsoft Windows.
If you want to use the scripts provided with these packages, we recommend you to install the following additional software packages: jing, perl, saxon, trang, xmllint, xsltproc
Gentoo Linux has all the recommended software nicely packaged; install it with:
sudo emerge perl saxon trang libxml2 libxslt jing texlive-xetex
Ubuntu Linux packages some of them; install those with:
sudo apt-get install perl libsaxon-java libxml2-utils xsltproc texlive-xetex lmodern
On Ubuntu, I had the most luck using the unofficial TEI Debian packages by Sebastian Rahtz: the Jing .deb, the Saxon .deb and the Trang .deb. I haven't tried the rest, but it seems like you can get everything from there.
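Before starting a long build, it can help to check that the tools are actually on your PATH. Here is a minimal Python sketch that does that. It assumes the command names match the TEI's recommended list, plus xelatex (which comes from texlive-xetex). Packaged versions may install some launchers, Saxon in particular, under a different name, so treat the list as an assumption to adjust for your system.

```python
import shutil
import sys

# Command names assumed from the TEI prerequisites list; xelatex is provided
# by texlive-xetex. Adjust these if your distribution names a launcher
# differently (Saxon is the usual suspect).
REQUIRED_TOOLS = ["jing", "perl", "saxon", "trang", "xmllint", "xsltproc", "xelatex"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that shutil.which cannot find."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Print what is missing (an empty list means everything was found).
    print("Missing tools:", missing_tools(), file=sys.stderr)
```

If `missing_tools()` returns an empty list, everything the build scripts expect should be available.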
TEI via SVN
I am going to check out the latest version of the TEI via Subversion:
svn co https://tei.svn.sourceforge.net/svnroot/tei/trunk ./TEI
This may take some time; go and have a cup of tea. As always, if your network connection breaks midway (e.g. you are using a slightly fishy wireless network), you can enter the TEI directory (cd TEI) and type svn up, and it will carry on.
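The "check out, or update if it was interrupted" routine can be wrapped up in a small helper. This is a hedged Python sketch, not part of the TEI tooling. It assumes that a directory containing a .svn folder is an existing working copy. It uses the repository URL from this post, which may well have moved since 2008.

```python
import subprocess
from pathlib import Path

# Repository URL as given in this post; it may have changed since.
TEI_SVN_URL = "https://tei.svn.sourceforge.net/svnroot/tei/trunk"


def svn_command(target, url=TEI_SVN_URL):
    """Pick `svn co` for a fresh checkout, or `svn up` for an existing one."""
    target = Path(target)
    if (target / ".svn").is_dir():
        # Already a working copy: updating carries on where a broken
        # checkout stopped, as described above.
        return ["svn", "up", str(target)]
    return ["svn", "co", url, str(target)]


def fetch(target="TEI"):
    """Run the chosen command; raises CalledProcessError if svn fails."""
    subprocess.run(svn_command(target), check=True)
```

Calling `fetch()` twice in a row would first check out and then simply update.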
Building TEI
The TEI directory contains lots of stuff, including the guidelines in XML, stylesheets and build scripts. Let's enter the TEI directory:
cd TEI
Now we can try to build TEI P5 using the supplied build scripts.
Let's start with the stylesheets, installing them into a local build directory:
mkdir build
cd Stylesheets/
make PREFIX=../build install
Second, let's build the Roma package:
cd ../Roma/
make PREFIX=../build install
Third, let's build the guidelines themselves:
cd ../P5/
make PREFIX=../build XSL=../build/share/xml/tei/stylesheet install
This will take some time; go and have a walk in the park.
You may get errors that you have to investigate. If so, fix the problem, then lather, rinse and repeat until the build completes without errors.
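Because the build may need several attempts, it can be convenient to script the three steps. The following is a minimal Python sketch, assuming the checkout lives in a directory called TEI and the Makefiles accept the same PREFIX and XSL variables used above. It is not something the TEI ships.

```python
import subprocess
from pathlib import Path

# The three steps from this post, run inside each subdirectory of the TEI
# checkout. PREFIX and XSL are relative to that subdirectory, as typed above.
BUILD_STEPS = [
    ("Stylesheets", ["make", "PREFIX=../build", "install"]),
    ("Roma", ["make", "PREFIX=../build", "install"]),
    ("P5", ["make", "PREFIX=../build",
            "XSL=../build/share/xml/tei/stylesheet", "install"]),
]


def build_all(tei_root="TEI"):
    """Run every step in order, stopping at the first failing one."""
    root = Path(tei_root)
    # exist_ok=True lets you rerun the script after fixing an error.
    (root / "build").mkdir(exist_ok=True)
    for subdir, command in BUILD_STEPS:
        print(f"==> {subdir}: {' '.join(command)}")
        # check=True raises CalledProcessError, halting the run on errors.
        subprocess.run(command, cwd=root / subdir, check=True)
```

Unlike a bare `mkdir build`, the script does not complain if the build directory already exists from an earlier attempt.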
Part of the process is a set of Perl substitutions of the stylesheet paths, and one of them did not seem to work for me: line 21 (starting <xsl:import href="/ ) of Utilities/guidelines-latex.xsl was not rewritten. I just edited that line by hand to match my stylesheet location, and then it worked.
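If you would rather not edit the file by hand each time, the same fix can be scripted. The build itself does this with Perl; the following is a Python stand-in, not the TEI's own substitution. It assumes href is the first attribute of each xsl:import element. The old and new prefixes are yours to supply, since the exact paths in the file depend on your setup.

```python
import re
from pathlib import Path


def repoint_imports(xsl_text, old_prefix, new_prefix):
    """Return (new_text, count): every <xsl:import href="..."> whose value
    starts with old_prefix gets new_prefix instead."""
    # Assumes href is the first attribute; the prefix is matched literally.
    pattern = re.compile(r'(<xsl:import\s+href=")' + re.escape(old_prefix))
    # A function replacement avoids backslash surprises in new_prefix.
    return pattern.subn(lambda m: m.group(1) + new_prefix, xsl_text)


def fix_file(path, old_prefix, new_prefix):
    """Rewrite a stylesheet in place; return how many hrefs changed."""
    path = Path(path)
    text, count = repoint_imports(
        path.read_text(encoding="utf-8"), old_prefix, new_prefix)
    if count:
        path.write_text(text, encoding="utf-8")
    return count
```

For example, `fix_file("Utilities/guidelines-latex.xsl", OLD, NEW)` would apply the change, where OLD is whatever absolute prefix appears in your copy and NEW is your stylesheet location. A return value of 0 means nothing matched.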
What have we done
From the top-level TEI directory, you can see that the guidelines are stored in the P5/Source/Guidelines directory:
ls P5/Source/Guidelines/en
They are documents marked up in TEI's XML format.
The Roma package generates validators for the TEI; the Perl scripts build the schemas and then build the TEI Guidelines into HTML, LaTeX and PDF.
The generated HTML files are now in the P5/Guidelines-web folder:
ls P5/Guidelines-web
As you can see, the documentation has been built for several languages. The en/html folder contains all the English HTML files; open index.html in your web browser and get reading!
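To see which languages were built and open one directly, a short Python sketch is enough. It assumes each language follows the same Guidelines-web/<lang>/html/index.html layout as the English one described above.

```python
import webbrowser
from pathlib import Path


def built_languages(guidelines_web):
    """Language codes (e.g. "en") that have a generated html/index.html."""
    root = Path(guidelines_web)
    return sorted(
        child.name
        for child in root.iterdir()
        if (child / "html" / "index.html").is_file()
    )


def open_guidelines(guidelines_web, lang="en"):
    """Open one language's index page in the default web browser."""
    index = Path(guidelines_web, lang, "html", "index.html").resolve()
    webbrowser.open(index.as_uri())
```

For instance, `built_languages("P5/Guidelines-web")` run from the TEI directory lists the languages, and `open_guidelines("P5/Guidelines-web")` opens the English version.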




1 Bug says...
So how do I actually write in this? [And I thought LaTeX was flexible (as it can be converted to anything and the styles are kept separate from the content)].
Posted at 7:12 a.m. on October 13, 2008
2 Zeth says...
Hi Bug,
Well, I am not sure that you would use their build scripts for your own texts; there are easier ways to generate HTML from XML.
To 'write' in TEI, you just open a text editor and type according to the guidelines. However, transcribing manuscripts is often an iterative process, as the document is passed to different people who layer on more detail. Therefore TEI documents are often generated from data typed in simpler formats, or there are various abstractions for non-technical people to use.
LaTeX is great for typesetting, especially modern scientific works. However, for representing written sources in an extremely detailed way (for example, by university researchers or libraries), the TEI has more useful tags.
To take one example, a manuscript might be damaged, or altered at a later stage than its original composition.
In TEI, to represent a hole in the manuscript there is a <gap /> tag; if a line has been crossed out you have <del>; if a second scribe/writer has added in text over the top, you use <add> (<supplied> is for text the editor restores); if there has been water damage, you can use <damage agent="water">, and so on.
Posted at 9:45 a.m. on October 13, 2008
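To make the tags in the reply above concrete, here is an invented TEI fragment: a single verse line, <l>, in the TEI P5 namespace. It comes with a Python sketch that lists its editorial markup using the standard library's ElementTree. In TEI P5 an insertion by a later scribe is normally tagged <add>, while <supplied> marks text restored by the editor, so the fragment uses both. The attribute values, including the hypothetical hand "#scribe2", are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
EDITORIAL_TAGS = {"gap", "del", "add", "supplied", "damage"}

# Made-up transcription of one altered, damaged line (illustration only).
FRAGMENT = f"""<l xmlns="{TEI_NS}">Here the scribe wrote
<del rend="strikethrough">grene</del><add hand="#scribe2" place="above">green</add>
fields, then <gap reason="lost" extent="2 words"/> before
<damage agent="water"><supplied reason="illegible">the river</supplied></damage></l>"""


def editorial_markup(xml_text):
    """List (tag, attributes) for each editorial element, in document order."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter():
        local_name = element.tag.split("}", 1)[-1]  # strip the {namespace}
        if local_name in EDITORIAL_TAGS:
            found.append((local_name, dict(element.attrib)))
    return found
```

Running `editorial_markup(FRAGMENT)` reports the deletion, the later addition, the gap, and the water damage wrapping the editor's supplied text, in that order.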