There’s a problem
I’ve been playing around for a while now with various algorithms designed to cluster unstructured text. The reason I want to do this is that I’m working on a project to test for plural legal cultures in the Americas during the early modern Spanish empire. During the course of writing my first book, I was struck by the extent to which my findings on women’s participation in the legal system of late-colonial and early-republican Quito confirmed earlier findings for 17th-century Quito, and frequently clashed with books on the same theme in other areas of the Empire. There is an important caveat here. Much of the work on 17th-century Quito was done by my dissertation director. While I stand by my (and her) evidence for Quito, I’ve always wondered if this is simply a case of the influence of graduate apprenticeship, or if there is something more historically fundamental at work.
This is a difficult nut to crack. The Spanish legal system, as a scaffolding, was uniform on this side of the Atlantic. Cristobal Colon, Hernan Cortes, Francisco Pizarro, and the rest of Spain’s 16th-century marauders made their conquests under explicit authority of the Crown of Castile. And, unlike kingdoms on the Iberian peninsula that accrued over time to the Crowns of Castile and Aragon, each with their own recognized fuero laws, the kingdoms in the Americas were unified under two specific overarching codes, the Leyes de Castilla and the Leyes de Indias. The former transferred the code of Castile to the Republica de españoles, while the latter dealt with legal conditions specific to the Americas, and particularly its indigenous inhabitants. These were not the only codes with currency. Not by a long shot. But, at a schematic level, the authority of the Crown of Castile over the kingdoms of the Americas was enacted through these codes.
Furthermore, magistrates in the Americas were royal appointees, occasionally even at the city council level. While city councilors were chosen locally (and the Alcalde Mayor was a city councilman), in cities or regions with a resident corregidor, he would sit on the council. He was also a magistrate for his jurisdiction. What is more, magistrates above the alcalde level were invariably royal appointees (oidores of the Audiencia or the virrey himself). Litigants could also shop jurisdictions as needed, and one need not first seek a local magistrate to settle a dispute, but one could appeal directly to any of the levels above.
But wait, there’s more.
The journeymen of the Spanish legal system were notaries, and their handbooks and often their education were rooted in the peninsula. There is, in fact, a fair amount of uniformity across the Empire in the form of legal documents, prepared as they all were by notaries and their apprentices drawing on boilerplate examples from notarial manuals.
Where does this leave me? Well, with many reasons to see a unified legal system for the Empire. But there’s a catch. Central to the operation of the legal system, the administrative system, the economy, and everything else in the Spanish Empire was a notion of decentralization, of the devolution of power locally under the guise of customary practice. It had such currency, in fact, that its operation frequently went unnamed. In addition, magistrates were prevented from explaining their decisions with reference to specific codes or regulations (though the Crown’s attorney, or fiscal, frequently did express such opinions). So, the justification for legal decisions is almost never given. Even though the form and the function of the legal system are the same throughout the Americas, the opportunity for significant (and I do mean significant) local inflection was a built-in feature. There very well may be no singular legal culture. We don’t know if there was, though, because scholars (myself included) have tended to generalize from their particular archives.
Whither text analysis
This seems like a job for a variety of text-mining techniques. I’ve had in my mind for a while now that a variety of clustering techniques may well help identify local legal culture in operation beneath the boilerplate of standard legal documents. In essence, I need to measure distance between documents, or rather groups of documents, and look for clusters by geography and time. It may well be that legal cultures in the Viceregal capitals were very similar, but not so much in other cities, towns, and villages. It could also be that local legal cultures are greatly impacted by local indigenous cultures or by the migration patterns of initial settlement in an area. Families frequently brought over extended kinship groups, drawn from specific regions in Spain. Did they bring with them the legal expectations of the area they left?
Don’t know. Yet. I’m hoping that text analysis of a corpus of civil and criminal trial transcripts from across the Americas, and across the centuries, will help me find out.
So, what techniques am I (planning) on playing around with? I’m only going to list them here. I’ll be writing follow-up posts on early experiments with each of the methods in the coming weeks.
Normalized Compression Distance (NCD), which uses compression algorithms to establish the similarity of two documents.
Term Frequency-Inverse Document Frequency (TF-IDF), which weights each term by how distinctive it is to a given document, and which is commonly used in vector-space models.
KWIC and n-gram plots, which will be useful to chart both the occurrences and context of terms and phrases indicative of locality in legal outcomes.
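To make the first item on the list concrete, here is a minimal sketch of Normalized Compression Distance. It is not the script I will be reporting on in later posts; the function names (ncd, compressed_size) are placeholders, and the choice of zlib as the compressor is an assumption made only to keep the example inside the Python standard library.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two texts.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size. Values near 0 suggest similar texts;
    values near 1 suggest texts that share little.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # compress the concatenation of both texts
    return (cxy - min(cx, cy)) / max(cx, cy)


# Invented sample phrases, for illustration only (not drawn from the ANE corpus).
doc_a = "querella criminal contra el alcalde ordinario de la ciudad de Quito " * 40
doc_b = "autos seguidos por hurto de ganado en el pueblo de Latacunga " * 40

print(ncd(doc_a, doc_a))  # a text compared with itself: small distance
print(ncd(doc_a, doc_b))  # two different texts: larger distance
```

One caveat with this choice: zlib only looks back over a window of about 32 KB, so for documents as long as the ones described below a compressor with a larger window (bz2 or lzma, both in the standard library) would probably be a better fit.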
I’ve also been playing with ways to include fuzzy orthography, so typical of early modern record making, into the analysis. Right now, I’m leaning towards incorporating some sort of Levenshtein Distance threshold to account for the heterogeneous spelling.
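A rough sketch of what a Levenshtein threshold might look like in practice follows. The helper names (levenshtein, same_word) and the default threshold of 1 are assumptions for illustration; I have not settled on a threshold, and the spelling pairs are simply familiar early modern variants, not samples from the corpus.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def same_word(a: str, b: str, max_distance: int = 1) -> bool:
    """Treat two spellings as the same word if they fall within the threshold.
    The threshold value is a hypothetical starting point, not a tested setting."""
    return levenshtein(a.lower(), b.lower()) <= max_distance


print(levenshtein("muger", "mujer"))  # 1
print(same_word("dixo", "dijo"))      # True
print(same_word("hazer", "hurto"))    # False
```

A fixed threshold is blunt: a distance of 1 means something very different for a three-letter word than for a twelve-letter one, so scaling the threshold by word length is one refinement worth testing.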
At any rate, in order to test the scripts I’m writing to use these various techniques, I’m initially analyzing a relatively small corpus derived from the Criminales Series Finder’s Guide for the Archivo Nacional del Ecuador (ANE). The corpus consists of 21 documents, each of which comprises a decade of criminal prosecutions beginning in 1601 and ending in 1830. A rough word count (whitespace delimited) of the whole corpus amounts to 198,000 words, with the longest document coming in somewhere around 39,000.
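For a corpus shaped like this one (a handful of documents, each keyed by decade), the TF-IDF weighting from the list above can be sketched in a few lines. This is a bare-bones version under stated assumptions: the function name tf_idf is a placeholder, tokens are whitespace-delimited as in the rough word count, term frequency is normalized by document length, and the inverse document frequency is the unsmoothed log(N / df). Libraries such as scikit-learn use smoothed variants, so their numbers will differ. The two sample "documents" are invented.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return {document name: {term: weight}} for a dict of {name: text}.

    weight = (term count / tokens in document) * log(N / documents containing term)
    """
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    weights = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


# Invented sample data, keyed by decade for illustration only.
sample = {
    "1601-1610": "querella por injurias contra el alcalde",
    "1611-1620": "querella por hurto de ganado",
}
weights = tf_idf(sample)
print(weights["1601-1610"]["querella"])  # 0.0, since the term appears in every document
print(weights["1611-1620"]["hurto"] > 0)  # True, since the term is specific to one document
```

Terms that appear in every decade drop to zero, which is the point: boilerplate shared across the whole corpus is discounted, and what is left is vocabulary particular to a time or place.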
Stay tuned for experiments with NCD.