The projects described involved the digitisation of everything from monumental inscriptions in the classical world (one of the many new words I learnt was "epigraphy"), through a 10th-century palimpsest containing the earliest known manuscript of Archimedes and the complete correspondence of the German composer Carl Maria von Weber, all the way to 1930s comic books. In every case the challenge is to capture the detail – for example, the fact that several words in an inscription might now be illegible but were recorded in the 18th century by the first antiquarian visitors to a site. Capturing different features of the text often leads to a need for parallel markup, with corresponding XSLT challenges – but as I say, we didn't get much technical detail.
I'm probably just going through an ignorant phase, but reading the above made me realise I don't see the merits of XML any more. For many applications, that is. After having contributed for a year and a half to a linguistic research project that makes heavy use of XML and XSLT, I wonder: why bother forcing data into the shape of trees when the data is clearly more complex than that? Because you can then do more operations with standard tools? But does this rather fuzzy advantage outweigh the awkwardness of some graph-to-tree conversions, and the contortions they force on the tools you write to process the data? If I were to design, say, a platform for storing, browsing and querying linguistic annotations right now, I would definitely put a relational database at its core, not an XML one. Any similarities between this arbitrary example and real projects – *cough* – are purely coincidental.
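To make the tree-shape complaint concrete: overlapping annotation layers (a classic problem in both epigraphy and linguistics) cannot nest inside a single XML tree, but they fit naturally as standoff spans in a relational table. Here is a minimal sketch of that idea, with an entirely hypothetical schema and made-up annotation labels, using SQLite purely for illustration:

```python
# Sketch: linguistic annotations as standoff spans in a relational store.
# Schema and labels are invented for illustration, not from any real project.
import sqlite3

text = "the old man the boats"

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE annotation (
    layer TEXT, label TEXT, start INTEGER, end INTEGER)""")

# Two layers whose spans overlap without nesting -- the case that
# forces workarounds (milestones, fragmentation) in a single XML tree.
rows = [
    ("syntax",  "NP", 0, 11),   # "the old man"
    ("syntax",  "VP", 8, 21),   # "man the boats"
    ("prosody", "IP", 4, 16),   # a prosodic span crossing both
]
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?)", rows)

# Find every pair of annotations that overlap but do not nest:
# b starts inside a, yet a ends before b does.
overlaps = conn.execute("""
    SELECT a.label, b.label
    FROM annotation a JOIN annotation b
    ON a.start < b.start AND b.start < a.end AND a.end < b.end
""").fetchall()
print(overlaps)
```

In XML you would have to pick one hierarchy as primary and break the others into fragments; here no layer is privileged, and the overlap query is a three-line join.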