Going through the course readings this week, I was struck by déjà vu as I read Michael Simeone’s piece on the difficulties of interdisciplinary collaboration in a project he was involved with, Digging into Image Data to Answer Authorship-Related Questions (DID-ARQ). The part that leapt out at me was where he discussed how it “was crucial for our…team to establish a common means to collect, share, annotate, and examine large amounts of image data.” This resonated with me because it recalled a prior internship at a small archives, where I had to dig through the metadata that several years of past interns had accumulated on digital pictures, find all the spelling, spacing, and capitalization variations that existed, and standardize them so that the search function would work effectively. The process was time-consuming and paranoia-inducing, as I had to constantly check my own work to make sure I hadn’t accidentally introduced a whole new variation. There is, however, an interesting parallel to be drawn between my experience and Simeone’s on one side, and the experience of historians learning to use data mining and visualization techniques on the other.
At the beginning of the digital age, historians resembled the interns who came before me at the archives, each with their own ideas about how to handle the tools at their disposal. Some embraced the new technology readily, while others, then and now, have remained with “a print mentality when it comes to information.” There were no set rules for how to handle the influx of new data and methodologies beyond those established for the pre-digital profession. Those standards could not remain unchanged, however, as it became clear that “the advent of the digital archive [had] posed its own unique challenges.” The vast amount of data, and the various methods that arose to deal with it, made it evident that greater collaboration was needed, both inside and outside the field.
These collaborative efforts, which are ongoing, are analogous to the work I did as an intern at the small archives. Although there is not necessarily an effort to standardize approaches to historical inquiry, there is a call to “embrace new priorities for research publications that explicate…the hermeneutics of [digital] data.” The idea is to critique the flood of different methodologies in order to create a sort of “best practices” list; continued transparency in this regard could also make conclusions and “visualizations easier to understand because the logic of how and why [they were] generated is visible.”
The results of this continuing process are noticeable: research projects such as the Old Bailey Online, Mapping the Republic of Letters, and Railroads and the Making of America, along with many more, all showcase attempts by historians to use data-mining and visualization techniques in innovative ways. Researchers can use data mining to analyze collections on a scale that was never possible before. For example, when studying quilts, it is difficult to see more than three or four at a time when they are laid out physically. On a computer, however, at least that many can be viewed on-screen simultaneously, while algorithm-driven workflows can sort images of millions more into categories. Similar results have been seen with visualization. Charts, graphs, and maps have moved beyond the idea that they must be used “merely as illustrations…it may be more useful [at this point] to think of visualization as part of the research process.”
By comparing methodologies and increasing transparency, historians have begun to adapt to the world of digital technologies and to create innovative new products from it. This is the phase that Simeone had reached, in contrast to my own experience. He was able to look back on the multiplicity of projects that preceded his own, analyze what worked and what didn’t, and create a workable methodological framework out of the result. Similarly, historians are now able to build on the digital projects that have come before. Over the course of the semester, I have grappled several times with the question of how big data can change the historical field: first in relation to the Republic of Letters project at Stanford, then through the visualization models offered by Franco Moretti, and finally by examining the data-mining search capabilities of the Old Bailey Online. Analyzing these projects has revealed recurring themes: the importance of play in developing digital tools, the necessity of interdisciplinarity, the capacity of visual media to reveal hitherto unseen trends in massive datasets, and the power of digital databases to answer questions that scholars previously would have spent a lifetime pursuing. It is through a combination of these techniques that historians have begun to harness the vast potential of data mining and visualization, moving the field beyond an analysis of hermeneutics and into a new paradigm that embraces collaboration, interdisciplinarity, and a commitment to learning from the past.
Michael Simeone, Jennifer Guiliano, Rob Kooper, and Peter Bajcsy, “Digging into data using new collaborative infrastructures supporting humanities-based computer science research,” First Monday 16, no. 5 (2 May 2011).
Simeone (2 May 2011).
Theibault (Spring 2012).