an algorithmic approach to legal culture in the early modern Spanish empire

Posted on September 23, 2011 by ctb — 6 Comments

There’s a problem
But wait, there’s more.
Whither text analysis
Cluster techniques

There’s a problem

I’ve been playing around for a while now with various algorithms designed to cluster unstructured text. The reason I want to do this is that I’m working on a project to test for plural legal cultures in the Americas during the early modern Spanish empire. During the course of writing my first book, I was struck by the extent to which my findings on women’s participation in the legal system of late-colonial and early-republican Quito confirmed earlier findings for 17th-century Quito, and frequently clashed with books on the same theme in other areas of the Empire. There is an important caveat here. Much of the work on 17th-century Quito was done by my dissertation director. While I stand by my (and her) evidence for Quito, I’ve always wondered whether this is simply a case of the influence of graduate apprenticeship, or whether there is something more historically fundamental at work.

This is a difficult nut to crack. The Spanish legal system, as a scaffolding, was uniform on this side of the Atlantic. Cristobal Colon, Hernan Cortes, Francisco Pizarro, and the rest of Spain’s 16th-century marauders made their conquests under explicit authority of the Crown of Castile. And, unlike kingdoms on the Iberian peninsula that accrued over time to the Crowns of Castile and Aragon, each with their own recognized fuero laws, the kingdoms in the Americas were unified under two specific overarching codes, the Leyes de Castilla and the Leyes de Indias. The former transferred the code of Castile to the Republica de españoles, while the latter dealt with legal conditions specific to the Americas, and particularly its indigenous inhabitants. These were not the only codes with currency. Not by a long shot. But, at a schematic level, the authority of the Crown of Castile over the kingdoms of the Americas was enacted through these codes.

Furthermore, magistrates in the Americas were royal appointees, occasionally even at the city council level. While city councilors were chosen locally (and the Alcalde Mayor was a city councilman), in cities or regions with a resident corregidor, the corregidor would sit on the council. He was also a magistrate for his jurisdiction. What is more, magistrates above the alcalde level were invariably royal appointees (oidores of the Audiencia or the virrey himself). Litigants could also shop jurisdictions as needed: one need not first seek out a local magistrate to settle a dispute, but could appeal directly to any of the levels above.

But wait, there’s more.

The journeymen of the Spanish legal system were notaries, and their handbooks and often their education were rooted in the peninsula. There is, in fact, a fair amount of uniformity across the Empire in the form of legal documents, prepared as they all were by notaries and their apprentices drawing on boilerplate examples from notarial manuals.

Where does this leave me? Well, with many reasons to see a unified legal system for the Empire. But there’s a catch. Central to the operation of the legal system, the administrative system, the economy, and everything else in the Spanish Empire was a notion of decentralization, of the devolution of power locally under the guise of customary practice. It had such currency, in fact, that its operation frequently went unnamed. In addition, magistrates were prevented from explaining their decisions with reference to specific codes or regulations (though the Crown’s attorney, or fiscal, frequently did express such opinions). So, the justification for legal decisions is almost never given. Even though the form and the function of the legal system are the same throughout the Americas, the opportunity for significant (and I do mean significant) local inflection was a built-in feature. There very well may be no singular legal culture. We don’t know if there was, though, because scholars (myself included) have tended to generalize from their particular archives.

Whither text analysis

This seems like a job for a variety of text-mining techniques. I’ve had it in mind for a while now that clustering techniques may well help identify local legal culture in operation beneath the boilerplate of standard legal documents. In essence, I need to measure distance between documents, or rather groups of documents, and look for clusters by geography and time. It may well be that legal cultures in the Viceregal capitals were very similar, but not so much in other cities, towns, and villages. It could also be that local legal cultures were strongly shaped by local indigenous cultures or by the migration patterns of initial settlement in an area. Families frequently brought over extended kinship groups, drawn from specific regions in Spain. Did they bring with them the legal expectations of the area they left?
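
To make "distance between documents" a little more concrete, here is a minimal sketch of one possible measure: cosine distance between TF-IDF-weighted word vectors. It is only an illustration, not the method used in this project. The function names (`tf_idf_vectors`, `cosine_distance`), the whitespace tokenization, and the toy strings are all assumptions made up for the example; none of the text comes from an archive.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one sparse {term: weight} vector per document (raw tf * log(N/df))."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return vectors

def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors (1.0 if either is all zeros)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

# Invented toy strings for illustration only, NOT drawn from any archive.
toy_docs = [
    "querella criminal por injurias contra el alcalde",
    "querella criminal por injurias contra el corregidor",
    "testamento y codicilo de una viuda",
]
vectors = tf_idf_vectors(toy_docs)
near = cosine_distance(vectors[0], vectors[1])  # two similar complaints
far = cosine_distance(vectors[0], vectors[2])   # complaint vs. will
print(near, far)
```

A matrix of such pairwise distances is the sort of input a clustering step could then work from. Grouping the documents by place and period beforehand would be one way to look for geographic or temporal clusters.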

Don’t know. Yet. I’m hoping that text analysis of a corpus of civil and criminal trial transcripts from across the Americas, and across the centuries, will help me find out.

Cluster techniques

So, what techniques am I planning on playing around with? I’m only going to list them here. I’ll be writing follow-up posts on early experiments with each of the methods in the coming weeks.

Normalized Compression Distance (NCD), which uses compression algorithms to establish the similarity of two documents.
Latent Dirichlet Allocation (LDA)-based topic modeling, for which I’ll be using the killer gensim vector-space modeling module for Python.
Term Frequency-Inverse Document Frequency (TF-IDF), also used in vector-space models.
KWIC and n-gram plots, which will be useful to chart both the occurrences and context of terms and phrases indicative of locality in legal outcomes.
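
As a rough illustration of the first item, NCD is usually defined as (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size in bytes. The sketch below assumes `zlib` as the compressor, which is just one choice among several. The helper names and the sample sentences are invented for the example.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))

def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two strings, using zlib."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)

# Invented sample sentences for illustration only, not archival text.
text_a = ("en la ciudad de quito ante el alcalde ordinario parecio "
          "una mujer y presento querella criminal contra su marido "
          "por malos tratos y pidio justicia")
text_b = ("sepan cuantos esta carta vieren como yo vecino de lima "
          "otorgo poder cumplido a mi hermano para cobrar deudas "
          "y vender bienes en mi nombre")

print(ncd(text_a, text_a))  # a text against itself: should be close to 0
print(ncd(text_a, text_b))  # unrelated texts: should be noticeably larger
```

One caveat with this particular compressor: zlib only looks back over a 32 KB window, so for long documents a compressor such as `bz2` or `lzma` (both in the standard library) may be a better fit.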

I’ve also been playing with ways to incorporate fuzzy orthography, so typical of early modern record making, into the analysis. Right now, I’m leaning towards using some sort of Levenshtein Distance threshold to account for the heterogeneous spelling.
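
A Levenshtein threshold of this kind could be sketched as follows. The edit-distance function is the standard dynamic-programming version. The `same_word` helper, its default threshold of 1, and the sample spellings are assumptions chosen for illustration, not settings taken from the project.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]

def same_word(a: str, b: str, max_distance: int = 1) -> bool:
    """Treat two spellings as one word if they are within max_distance edits.
    The threshold is a hypothetical tuning knob, not a recommended value."""
    return levenshtein(a.lower(), b.lower()) <= max_distance

# Illustrative spelling variants, not taken from the corpus.
print(same_word("muger", "mujer"))    # one substitution apart
print(same_word("hazer", "hacer"))    # one substitution apart
print(same_word("dicho", "derecho"))  # different words, three edits apart
```

A fixed threshold is blunt: it will merge genuinely different short words that happen to be one edit apart, so scaling the threshold by word length is one refinement worth testing.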

At any rate, in order to test the scripts I’m writing to use these various techniques, I’m initially analyzing a relatively small corpus derived from the Criminales Series Finder’s Guide for the Archivo Nacional del Ecuador (ANE). The corpus consists of 21 documents, each of which comprises a decade of criminal prosecutions beginning in 1601 and ending in 1830. A rough word count (whitespace delimited) of the whole corpus amounts to 198,000 words, with the longest document coming in somewhere around 39,000.
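
For what it is worth, a "rough word count (whitespace delimited)" amounts to very little code. The sketch below assumes the corpus is held as a dictionary keyed by decade label; that layout and the placeholder strings are hypothetical stand-ins, not the actual ANE data.

```python
def rough_word_count(text: str) -> int:
    """Count whitespace-delimited tokens; punctuation stays attached to words."""
    return len(text.split())

# Hypothetical stand-in for the corpus: decade label -> document text.
toy_corpus = {
    "1601-1610": "criminales contra un vecino por heridas",
    "1611-1620": "autos seguidos por robo de ganado en la villa",
}

counts = {decade: rough_word_count(text) for decade, text in toy_corpus.items()}
total_words = sum(counts.values())
longest_decade = max(counts, key=counts.get)
print(total_words, longest_decade)
```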

Stay tuned for experiments with NCD.

About ctb

Associate Professor of Early Latin America Department of History University of Tennessee-Knoxville

Tagged with: ANE, criminales, ncd, python
Posted in Digital History, Latin American History, programming

6 comments on “an algorithmic approach to legal culture in the early modern Spanish empire”

Adam (@adamwynne) says:

September 24, 2011 at 7:28 pm

This is a very cool approach that I hadn’t considered. Have you seen this applied to other historical/cultural problems?
ctb says:

September 24, 2011 at 9:34 pm

I haven’t, actually, applied it to any other historical problems yet. In fact, my curiosity on this particular problem is partially responsible for pursuing machine learning avenues.
fifth time’s a charm « parezco y digo says:

September 27, 2011 at 12:13 pm

[…] requests from now on are going to be decidedly digital, and in support of the project I described here. I’m also cooking up a tool to build code-named […]
tedunderwood says:

October 9, 2011 at 5:42 pm

Thanks for this post and your very helpful recent post on NCD.

I’ve been using clustering extensively on 18th and 19th-century printed books in English. I’m getting what I find very useful results with a vector-space model that is basically Latent Semantic Analysis, although I’ve tweaked the algorithm a bit and don’t use any “dimensional compression.”

That said, I have not yet rigorously compared my method to other methods (especially Bayesian methods like LDA, which are the hot thing right now). Moreover, I’m primarily interested in clustering terms rather than documents. I work in Python, so I need to check out this gensim module; thanks for the lead.
the parezcoydigo year in review « parezco y digo says:

December 31, 2011 at 4:32 pm

[…] Later in September and again in October, I took a couple more stabs at how one would conceive of an algorithmic approach to studying legal culture, and then more specifically on how normalized compression distance works […]
discontents - Topic modelling in the archives says:

May 22, 2012 at 10:00 am

[…] has also been investigating. But I’ve also been following with interest Chad Black’s use of algorithmic techniques, including topic modelling, to look for local variations amidst the legal system of the early […]