Building the TEI documentation

12 October 2008

When people want to make a digital representation of a handwritten manuscript or printed book, they can make up their own format. However, there is a standard for representing texts in digital form: the Text Encoding Initiative Proposal 5, also known as TEI P5. There were earlier versions of the TEI, but the current version, P5, is an XML-based format.

The documentation can be read online or downloaded as a PDF. A more fun approach is to use the Subversion repository that the TEI also provides: you can check out the documentation in TEI XML format along with all the build scripts, and then build the documentation into HTML, LaTeX or PDF. So the documentation acts as a demo of itself: documents written in TEI are rendered into other formats.

This post is therefore not really about the TEI itself, but about how I built the documentation locally.

Dependencies

In the TEI Documentation, it says:

Prerequisites for use of TEI Modules

These packages and their associated Makefiles and scripts are all developed and tested on a Debian Linux system. While they should work (possibly with customisation) on other Linux systems, they are not designed to work on Microsoft Windows.

If you want to use the scripts provided with these packages, we recommend you to install the following additional software packages: jing, perl, saxon, trang, xmllint, xsltproc

Gentoo Linux has all the recommended software nicely packaged. Install it with:

sudo emerge perl saxon trang libxml2 libxslt jing texlive-xetex

Ubuntu Linux has some of them. Install those with:

sudo apt-get install perl libsaxon-java libxml2-utils xsltproc texlive-xetex lmodern

On Ubuntu, I had the most luck with the unofficial TEI Debian packages by Sebastian Rahtz: the Jing .deb, the Saxon .deb and the Trang .deb. I haven't tried the rest, but it seems you can get everything from there.
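
If you want to see which of the recommended tools are already on your system before installing anything, a loop over command -v will do. This is only a sketch: the command names are taken from the prerequisites list above, and they are assumptions, since a distribution may install Saxon (or any of the others) under a different command name.

```shell
#!/bin/sh
# Report which of the recommended tools are on the PATH.
# The names passed in at the bottom are assumptions taken from the
# prerequisites list; adjust them to match what your distribution installs.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    # The exit status is the number of missing tools (0 means all found).
    return "$missing"
}

check_tools jing perl saxon trang xmllint xsltproc ||
    echo "install the missing tools before building"
```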

TEI via SVN

I am going to check out the latest version of the TEI via subversion:

svn co https://tei.svn.sourceforge.net/svnroot/tei/trunk ./TEI

This may take some time, so go and have a cup of tea. As always, if your network connection breaks midway (e.g. you are using a slightly fishy wireless network), you can enter the TEI directory (cd TEI) and type svn up, and it will carry on.
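
If the connection is really unreliable, you could wrap svn up in a small retry loop rather than retyping it. The retry function and the RETRY_DELAY variable below are my own names for this sketch, not part of Subversion; run it from inside the TEI directory.

```shell
#!/bin/sh
# retry N CMD...: run CMD until it succeeds, at most N times.
# RETRY_DELAY (seconds to wait between attempts) is a made-up knob
# for this sketch; it defaults to 10.
retry() {
    attempts=$1
    shift
    n=1
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "${RETRY_DELAY:-10}"
    done
}

# Only try to update when we are inside a checked-out working copy.
if [ -d .svn ]; then
    retry 5 svn up
fi
```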

Building TEI

The TEI directory contains lots of stuff, including the guidelines in XML, stylesheets and build scripts. Let's enter the TEI directory:

cd TEI

Now we can try to build the TEI P5 using the supplied build scripts.

Let's start with the Stylesheets.

mkdir build

cd Stylesheets/

make PREFIX=../build install

Secondly, let's build the Roma package:

cd ../Roma/

make PREFIX=../build install

Thirdly, let's build the guidelines themselves:

cd ../P5/

make PREFIX=../build XSL=../build/share/xml/tei/stylesheet install

This will take some time, so go and have a walk in the park.

You may get errors that you have to look at, so you might have to fix the problem, then rinse, lather and repeat until the build runs without throwing errors.
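
Since you may end up running the three builds several times, it can be handy to wrap them in one function that stops at the first failure. This is a sketch of the same commands as above, meant to be run from the TEI directory; build_all and the MAKE override are names invented here, not part of the TEI build scripts.

```shell
#!/bin/sh
# Run the three builds in order, stopping at the first one that fails.
# MAKE can be overridden (for example MAKE=echo for a dry run); that
# override is an addition for this sketch, not something the TEI
# Makefiles define.
build_all() {
    mkdir -p build &&
    (cd Stylesheets && ${MAKE:-make} PREFIX=../build install) &&
    (cd Roma && ${MAKE:-make} PREFIX=../build install) &&
    (cd P5 && ${MAKE:-make} PREFIX=../build \
        XSL=../build/share/xml/tei/stylesheet install)
}

# Only start if the three expected directories are present.
if [ -d Stylesheets ] && [ -d Roma ] && [ -d P5 ]; then
    build_all
else
    echo "run this from the top of the TEI checkout" >&2
fi
```

Running it as MAKE=echo build_all prints the make arguments instead of building, which is a cheap way to check the paths before committing to the long build.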

Part of the process is a set of Perl substitutions of the stylesheet paths, and one of them did not seem to work for me. The culprit was line 21 (starting <xsl:import href="/ ) of Utilities/guidelines-latex.xsl; I just edited the line to match my stylesheet location and it worked.
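
If you would rather script that hand edit, something like the following could do it. It is a sketch under assumptions: fix_import_prefix is a made-up helper, and both path prefixes in the usage comment are placeholders, so print the line first and substitute whatever your copy actually contains.

```shell
#!/bin/sh
# fix_import_prefix FILE LINE OLD NEW
# On line LINE of FILE, rewrite an href that starts with OLD so that it
# starts with NEW instead. OLD is used as a sed pattern, so this assumes
# neither path contains '|' or regex characters that matter.
fix_import_prefix() {
    file=$1 line=$2 old=$3 new=$4
    sed "${line}s|href=\"${old}|href=\"${new}|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Usage, from the P5 directory (both prefixes are placeholders):
#   sed -n '21p' Utilities/guidelines-latex.xsl    # look at the line first
#   fix_import_prefix Utilities/guidelines-latex.xsl 21 \
#       /some/old/prefix /path/to/your/stylesheet
```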

What have we done

The guidelines are stored in the Source/Guidelines directory:

ls P5/Source/Guidelines/en

They are documents marked up in TEI's XML format.

The Roma package generates validators for the TEI. The Perl scripts build the schemas, and then build the TEI Guidelines into HTML, LaTeX and PDF.

The generated HTML files are now in the Guidelines-web folder:

ls TEI/P5/Guidelines-web

As you can see, the documentation has been built for several languages. Inside the en/html folder are all the HTML files. Open index.html in your web browser and get reading!

1 Bug says...

So how do I actually write in this? [And I thought LaTeX was flexible (as it can be converted to anything and the styles are out of the content)].

Posted at 7:12 a.m. on October 13, 2008


2 Zeth says...

Hi Bug,

Well, I am not sure that you would particularly use their build scripts for your own texts. There are easier ways to generate HTML from XML.

To 'write' in TEI, you just open a text editor and type according to the guidelines, but transcribing manuscripts is often an iterative process, as the document is passed to different people who layer on more detail. Therefore TEI documents are often generated from data typed in simpler formats, or there are various abstractions for non-technical people to use.

Using LaTeX is great for typesetting, especially for modern scientific works, however for representing written sources in an extremely detailed way, for example by university researchers or libraries, the TEI has more useful tags.

To take one example, a manuscript might be damaged or altered at a later stage than its original composition.

In TEI, to represent a hole in the manuscript there is a <gap /> tag; if a line has been crossed out you have <del>; if a second scribe/writer has added in text over the top, you use <add>; if there has been water damage, you can use <damage agent="water">; and so on.

Posted at 9:45 a.m. on October 13, 2008

