I’ve been struggling recently with how I want to encode the transcriptions of the very many digital facsimiles of my documents. I decided, even before I wanted to construct a digital project with my current work, that I wanted to move away from rtf, doc, and pdf files as much as possible for my research. There are a few reasons for this, not the least of which is that txt files are much smaller, more resilient, and, in a certain manner, more flexible. This has affected my workflow and presented a series of decisions relative to project-specific files.
I’ve spent some time playing with TextMate for transcribing using MultiMarkdown, which I like very much and which is integrated into Scrivener (my favorite app for writing). I’ve also used the simple and free TextEdit to make plain txt files, which ultimately isn’t a solution because I do want/need some markup of the files. What I really want to do is mark these files up with both descriptive and, potentially, analytical bits that will ultimately be query-able. And, so, I keep finding myself drawn back to TEI, an XML schema designed specifically for markup of humanist texts.
What I like about MultiMarkdown is the ease of transcription, especially using the bundle in TextMate, and the ability to transform to a variety of file types: xhtml, pdf, LaTeX, etc. But, even given its ease, it’s not designed to do the type of manuscript description and qualitative markup I’m looking for. The TEI is the opposite of mmd: it is completely overwhelming in its potential complexities, and as a result doesn’t leave me with a feeling of ease during transcription. Am I validated? Will I ever learn the elements, and their attributes? What do I really need in that TEI Header, and what can I omit? At any rate, I bought <oxygen/> a few months ago because there is such a deep academic discount and I kept feeling this tug towards TEI. That, and I’ve been reading all the online tutorials and information I can find on TEI (lots of tutorials here, and I like these here and here).
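On the question of what the TEI Header really needs: as far as I understand the guidelines, the one required part is a fileDesc holding a titleStmt, a publicationStmt, and a sourceDesc. Here is a rough sketch of that skeleton, wrapped in a few lines of Python that only check that the file is well-formed XML and list the header parts. The title, the source note, and the transcribed line are invented placeholders, and the helper name header_parts is my own. Note that this is not schema validation, which still needs something like <oxygen/>.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical transcription: every title and note below is a placeholder.
MINIMAL_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Letter, transcription draft</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished working transcription.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Transcribed from a digital facsimile of the manuscript.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>First line of the transcription.</p>
    </body>
  </text>
</TEI>"""


def header_parts(tei_xml):
    """Return the names of the elements directly inside fileDesc.

    ET.fromstring raises ParseError if the XML is not well-formed;
    it does NOT validate the document against the TEI schema.
    """
    root = ET.fromstring(tei_xml)
    file_desc = root.find(f"{{{TEI_NS}}}teiHeader/{{{TEI_NS}}}fileDesc")
    # Tags look like "{namespace}name"; keep only the local name.
    return [child.tag.split("}")[1] for child in file_desc]


print(header_parts(MINIMAL_TEI))
```

Everything else in the header (encodingDesc, profileDesc, revisionDesc) can be added later as the project grows.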
The potential for TEI documents, as an XML data set, goes far beyond my personal technical skill. But I’m planning for the future. I’m developing a digital archive that I want to last for the long term. So, what to do in the short term? I’m determined to get the work I’m doing up in an attractive and useful manner in the meantime, while I’m developing my personal technical skills and a community of digital historians and humanists.
So, where does that leave me for now? If you’ve ever glanced at this blog, you’ll know that I’m most familiar and comfortable with WordPress, which I’ve been using as a CMS for my teaching and other professional activities for a little while now. WordPress does so many things easily and well that work for edu deployment. But I’m under no illusion that WordPress provides a framework for serious text analysis of a manuscript corpus like the one I’m developing. Which brings me back to transformations. As with other XML formats, it’s not THAT difficult to transform TEI texts into other formats: xhtml, html, pdf, docx, odt, etc. And, using an XML editor like <oxygen/>, one could transform the documents with the built-in XSLT scenarios, then save and upload the results to WordPress pages. Or, one could use this plugin, which allows you to embed a shortcode in a page and have the XML transformed directly in WordPress. Nice. It took me a little playing around to get it to work. What I ended up doing was copying a whole XSL package from the TEI Consortium into wp-content to locally host the whole set of stylesheets for transforming into valid xhtml. What I’m thinking is that I can also hack the code from that plugin a bit to make a form that lets visitors access the transcriptions in the format of their choice: as a pdf, a docx, or the raw XML.
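To give a feel for what a transformation actually does, here is a very rough sketch in Python. It is a stand-in, not the real thing: it does not use XSLT, the TEI Consortium stylesheets, or the WordPress plugin. It just walks a TEI file with the standard library and writes out a small HTML fragment. The sample letter and the function name tei_to_html are made up for illustration.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical sample document; the content is a placeholder.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample letter</title></titleStmt>
      <publicationStmt><p>Unpublished.</p></publicationStmt>
      <sourceDesc><p>Digital facsimile.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Dear <persName>friend</persName>, a line of text.</p>
    </body>
  </text>
</TEI>"""


def tei_to_html(tei_xml):
    """Turn a TEI transcription into a tiny HTML fragment (title plus paragraphs)."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(
        f"{TEI}teiHeader/{TEI}fileDesc/{TEI}titleStmt/{TEI}title",
        default="Untitled",
    )
    article = ET.Element("article")
    ET.SubElement(article, "h1").text = title
    for p in root.iterfind(f"{TEI}text/{TEI}body/{TEI}p"):
        # itertext() flattens inline TEI markup (here <persName>) to plain text,
        # so the descriptive markup is lost in this simplified output.
        ET.SubElement(article, "p").text = "".join(p.itertext()).strip()
    return ET.tostring(article, encoding="unicode")


print(tei_to_html(SAMPLE))
```

A real stylesheet would map the inline elements to HTML spans or classes rather than throwing them away, which is exactly why hosting the full stylesheet set is worth the trouble.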
At any rate, here’s a sneak preview of what’s to come on the wordpress front end:
I’m hoping to have the site up before the summer is over, adding files I’ve been working on for the past year or two. But I like the aesthetic of it as it is on the local dev right now. I’m also going to do a longer post specifically on how these decisions have affected my academic workflow.