I’m guessing that if you’re an historian who started using scripting to muck around with your docs or the web in the last three years, then like me you probably worked through The Programming Historian at some point. It’s a great tutorial for evangelizing scripting to everyday working historians who are curious enough to give programming a try. It worked for me because it is immediately practicable, and sidesteps the typical programming introduction centered on data types and structures, expressions, statements, conditionals, etc. It’s a gateway drug. I think the more approachable we make practical programming for historians, the more will give it a try and discover its power.
As an aside to that obviously evangelistic statement, I do not see the ability to code as an imperative for either humanities scholars or even digital humanities scholars. Moreover, when speaking of programming and coding in the humanities, I also argue that scripting may be enough. There is some skepticism about the relative rigor of code in the humanities, as opposed to its place in a CS curriculum or for professional software designers, but really I think that skepticism is misplaced. For the everyday historian, learning some scripting skills with Ruby or Python or any other computer tongue, for that matter, is potentially important in part because it can disrupt one’s apprehension of the textual nature of our sources, or more precisely of text-as-data, and offer wholly new and complementary ways of reading the archival record. More on that another time.
For evangelizing the gateway drug of simple code, I would also argue that it is helpful to try to bridge the interfaces historians are used to working with. Let’s face it, the command line and the interpreter are intimidating spaces when someone first ventures into them. Powerful, yes, but also otherworldly to a computer user who isn’t programming-literate.
Historians who use computers to manage their research, which at this point has to be approaching 98% at least (I won’t go higher than that, I know there are still legal pad-istas out there), are used to file-based workflows. It makes sense, of course, because for the archival historian a computer file represents an easy metaphor for the physical source world in which we work. In fact, I think that the folder/file/document metaphor may be more cogent for historians than for other everyday computer users. At the archive, we are likely to take folders from boxes, and in those folders we find texts that we call “my documents” and take notes on or transcribe into files on the computer that are likewise stored in folders. In fact, as opposed to cultural or literary scholars, it’s rare to hear archival historians speak of “texts” as opposed to “documents”. So, it’s a convenient replication that orders knowledge on the computer in a manner familiar to its organization in the archive. If I had to guess, I’d say historians are likely to be pretty tied to the folder/file/document metaphor for computer interaction, and likely resistant to calls for its demise, or for criticisms of the robustness of the metaphor’s graphical-user-interface.
Document- or file-based workflows are mediated by a Finder-type window or a file/folder dialogue box. Programming such GUI elements can be a real pain in the ass. But, in the interest of the re-usability of code, they can also be pretty handy. In learning techniques to manipulate text with Python, it’s nice to be able to pick files or folders of files to mess with, without having to put long paths and filenames through raw_input or in the code itself. But again, GUIs can be a real pain in the ass to program. Not if you use easygui. As the name of the module suggests, easygui provides several very-easy-to-use dialogue boxes that look native to the operating system and that provide a comforting means of choosing directories and files or entering bits of text into a program. They also help abstract one’s script from specific example texts or one’s current obsession. For the rest of this post, I’ll provide a quick and easy tutorial on using easygui and some usage scenarios.
easygui is available through Python’s package managers, pip and easy_install. The current stable release is v.0.95, and you can also download the source files on the project’s page.
Install with the package manager of your choice in the normal way from a terminal prompt:
$ sudo easy_install easygui
$ sudo pip install easygui
To install from source, download and unzip the package from the site linked above. Change into the directory of the unzipped folder in the terminal and enter:
$ sudo python setup.py install
In both of those cases, I’m assuming you’re on a *nix computer, be it Linux or OS X. sudo is necessary if you’re using the system’s default Python, and installs easygui to your Python’s site-packages folder. Things are a little different on Windows or if you’re using Homebrew or MacPorts Python on your Mac. For beginners, it’s probably easiest to just use the default installation of Python on your Mac and not have to worry about editing your PATH or PYTHONPATH.
To test if easygui is installed, open a Python interpreter in your terminal and try to import it. If the prompt comes back with no error, the module is installed:
>>> import easygui
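The same check can be scripted rather than typed into the interpreter. What follows is only a minimal sketch, assuming Python 3; the helper names is_installed and report are my own inventions for illustration and are not part of easygui.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    # find_spec returns None when no module of that name can be found
    return importlib.util.find_spec(module_name) is not None


def report(module_name="easygui"):
    """Print a one-line status message for the named module."""
    if is_installed(module_name):
        print(module_name + " is installed")
    else:
        print(module_name + " is missing; install it with pip or easy_install")
```

Calling report() prints one line either way, which is a little friendlier than a traceback for someone new to the interpreter.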
File and Directory Dialogues
Once installed, inserting file and directory dialogue boxes into your code is very simple.
The directory box simply allows you to choose a directory path, which can be used to open or write files. From the directory box, the only thing you can do is choose a directory. That said, in the case of OS X, for example, you can also use system built-in elements like the New Folder button, which is useful if you want to write files to, well, a new directory. In basic usage we have:
easygui.diropenbox(msg=None, title=None, default=None)
This line will display an open directory dialogue box. The optional arguments allow you to customize the dialogue box a bit. msg and title both put text at the top of the box, and do so in order with a dash in between. Seems redundant to me, so I usually just use one.
default sets the directory that the dialogue box will open in. So, for example, on my laptop to open in my Documents folder as this would be
To get this Directory Open Box, the code would be easygui.diropenbox(msg='Choose a directory.', default='/Users/chadblack/Documents/sex_crime_empire/'). Notice that files are grayed-out. You can’t choose a file with this dialogue box, only a directory. This returns to the script a directory name, listed fully, such that if I had chosen the default above, it would return
How is this useful? What sort of usage scenario? If, for example, one wanted to produce word counts or keywords-in-context or, for that matter, NCD clusters from a set of documents, one could use this dialog to choose the directory of files to iterate over. It’s trivial to produce a list of files from the returned directory name with the os module. In the interpreter it would look like this:
>>> import easygui
>>> import os
>>> directory = easygui.diropenbox()
>>> fileList = os.listdir(directory)
That would give you a list of file names in that directory, which can then be iterated over. (Note: to open those individual files, you need either to concatenate them to the directory path or to change the working directory to where those files are located. For the latter, it’s as simple as os.chdir(directory).)
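To make the word-count scenario concrete, here is a rough sketch of the first approach, joining each file name to the chosen directory with os.path.join. It assumes Python 3 and UTF-8 plain-text files, and it counts words by simply splitting on whitespace. The function names list_text_files, word_counts, and main are hypothetical, mine rather than anything easygui provides.

```python
import os


def list_text_files(directory):
    """Return full paths of the .txt files directly inside directory."""
    paths = []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)  # join name to directory
        if os.path.isfile(full_path) and name.lower().endswith(".txt"):
            paths.append(full_path)
    return paths


def word_counts(directory):
    """Map each .txt file name to its whitespace-split word count."""
    counts = {}
    for path in list_text_files(directory):
        # assumption: the files are UTF-8 encoded plain text
        with open(path, "r", encoding="utf-8") as handle:
            counts[os.path.basename(path)] = len(handle.read().split())
    return counts


def main():
    import easygui  # imported here so the helpers above work without a GUI

    directory = easygui.diropenbox(msg="Choose a directory.")
    if directory is None:  # diropenbox returns None when the user cancels
        return
    for name, count in word_counts(directory).items():
        print(name, count)
```

Run main() to get the dialogue box. Keeping the counting separate from the dialogue means the same functions can be pointed at a hard-coded path when you don’t want a window popping up.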
Basic usage for the file open dialog looks like this:
easygui.fileopenbox(msg=None, title=None, default='*', filetypes=None)
In this case, msg and title work the same way, but default points to a file name. More importantly, though, filetypes determines what types of files you can open, and it needs to be defined as a list. So, to be able to open .txt files, you need filetypes=['*.txt']. Any file that is of a type not listed will be grayed-out, and you won’t be able to open it.
For this File Open box, the code would be easygui.fileopenbox(msg='Choose a file.', filetypes=['*.txt']). This returns a file name to the script, which in the case above would be /Users/chadblack/Documents/00-Writings/Posts/trunk/102011/thinkingWorld.txt. You still need to open that file, which can be done in one assignment step like this:
text = open(easygui.fileopenbox(filetypes=['*.txt']), 'r').read()
With that line, you choose, open, and read a file, assigning it to text as a string all in one fell swoop.
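One caveat with the one-liner: if you click Cancel, the file dialogue returns None, and open() then fails with an error. A slightly longer sketch that guards against that might look like the following. It assumes Python 3 and UTF-8 text, and the names read_chosen_file and choose_txt_file are hypothetical helpers of my own.

```python
def read_chosen_file(choose):
    """Read the file whose path choose() returns; None if nothing chosen."""
    path = choose()
    if path is None:  # the file dialogue returns None when the user cancels
        return None
    # the with-statement closes the file once it has been read
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


def choose_txt_file():
    """Show easygui's file open box, limited to .txt files."""
    import easygui  # imported lazily so read_chosen_file needs no GUI

    return easygui.fileopenbox(msg="Choose a file.", filetypes=["*.txt"])
```

In a script this becomes text = read_chosen_file(choose_txt_file), followed by a check that text is not None before doing anything with it.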
The save file dialog is just as easy. Basic usage is:
easygui.filesavebox(msg=None, title=None, default='*', filetypes=None)
The arguments for this function work the same as with opening files, and it also returns a filename.
For the example here, we have easygui.filesavebox(msg='Save file.', default='MyFile.txt', filetypes=['*.txt']), and this would return to the script the full path of MyFile.txt, which can be used in a write statement. So, let’s say we did something with text above and want to save it to MyFile.txt:
open(easygui.filesavebox(msg='Save file.', default='MyFile.txt', filetypes=['*.txt']), 'w').write(text)
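The one-line save has the same wrinkle as the one-line open: a cancelled dialogue hands back None, and the file is never explicitly closed. Here is a hedged sketch of a more defensive version, again assuming Python 3 and UTF-8 output. save_text and choose_save_path are names I made up for the example.

```python
def save_text(text, choose):
    """Write text to the path choose() returns; return that path or None."""
    path = choose()
    if path is None:  # nothing is written if the user cancels
        return None
    # the with-statement flushes and closes the file after writing
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


def choose_save_path():
    """Show easygui's save box with MyFile.txt as the suggested name."""
    import easygui  # imported lazily so save_text needs no GUI

    return easygui.filesavebox(msg="Save file.", default="MyFile.txt",
                               filetypes=["*.txt"])
```

Usage would be save_text(text, choose_save_path). The returned path is handy if the script wants to report where the file ended up.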
There are many other types of dialog and text-display boxes that easygui provides. In the past I’ve used, for example, a text entry box to set the base name for batch renaming photos.
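I haven’t reproduced my renaming script here, but a rough reconstruction of the idea, using easygui’s enterbox for the text entry, could look like this. Treat it as a sketch under assumptions: Python 3, photos ending in .jpg, a made-up naming scheme of base name plus a three-digit counter, and no protection against a new name colliding with an existing file. The function names are hypothetical.

```python
import os


def plan_renames(filenames, base_name, start=1):
    """Pair each name with a new one: base_name, a counter, old extension."""
    plan = []
    for number, old_name in enumerate(sorted(filenames), start):
        extension = os.path.splitext(old_name)[1]  # keep the original suffix
        new_name = "{}_{:03d}{}".format(base_name, number, extension)
        plan.append((old_name, new_name))
    return plan


def rename_photos(directory, base_name):
    """Rename the .jpg files in directory according to plan_renames."""
    photos = [n for n in os.listdir(directory) if n.lower().endswith(".jpg")]
    for old_name, new_name in plan_renames(photos, base_name):
        os.rename(os.path.join(directory, old_name),
                  os.path.join(directory, new_name))


def main():
    import easygui  # imported here so the helpers above work without a GUI

    directory = easygui.diropenbox(msg="Choose the photo directory.")
    base_name = easygui.enterbox(msg="Base name for the photos:")
    if directory and base_name:  # either box returns None on cancel
        rename_photos(directory, base_name)
```

Because plan_renames only builds a list of old and new names, you can print the plan and look it over before letting rename_photos touch anything. Try it on a copy of the folder first.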
easygui is, well, very easy and adds both a familiar interface and reproducibility to file and directory interactions for historians so used to working that way. I’m looking forward to v.2.0 of The Programming Historian, which I hear will be open source, so that easygui can be added to its excellent example scripts.