I have more to say about this later, but I find this image pretty interesting. When you compare, box by box, the finding guides for 12 different series from the Archivo Nacional del Ecuador, you get an image like this:
The graph shows a normalized compression distance (NCD) comparison of 950 boxes, each described by the contents of its manuscripts. (It took 22 hours of CPU time to compute.) Given the nature of the documents, it is a somewhat artificial comparison. But you'll notice that the clusters almost always stay within a single series. That's what we'd expect from institutional documents. Still, I need to chart the years covered by the various series clusters and see whether they come from roughly the same decades.
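For anyone unfamiliar with NCD: for two texts x and y and a compressor C, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). Texts that share a lot of material compress well together and score near 0, while unrelated texts score near 1. Below is a minimal sketch, assuming zlib as the compressor and each box represented as one plain-text string. The post doesn't say which compressor or text preparation was actually used, and the function names and box names here are hypothetical.

```python
import zlib
from itertools import combinations


def compressed_size(data: bytes) -> int:
    """Length in bytes of `data` after zlib compression (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(boxes: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise NCD for every pair of boxes, stored in both orders."""
    encoded = {name: text.encode("utf-8") for name, text in boxes.items()}
    dists: dict[tuple[str, str], float] = {}
    for a, b in combinations(sorted(encoded), 2):
        d = ncd(encoded[a], encoded[b])
        dists[(a, b)] = d
        dists[(b, a)] = d  # reuse the value instead of recomputing
    return dists


if __name__ == "__main__":
    # Hypothetical box descriptions, for illustration only.
    boxes = {
        "series_A_box_1": "venta de tierras en Quito " * 20,
        "series_A_box_2": "venta de tierras en Cuenca " * 20,
        "series_B_box_1": "informe militar de Guayaquil 1820 " * 20,
    }
    for (a, b), d in sorted(distance_matrix(boxes).items()):
        if a < b:
            print(f"{a} vs {b}: {d:.3f}")
```

The pairwise step grows quadratically: 950 boxes give 950 × 949 / 2 = 450,775 pairs, each needing three compressions in this sketch, which is one plausible reason a run like this takes hours. The resulting distances can then be handed to a clustering or graph-layout tool to produce a picture like the one above.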