Digital Databases and the Illusion of Comprehension

This post is part of a joint series between the Panorama and the Junto entitled “Digital Research, Digital Age: Blogging New Approaches to Early American Studies.” The series stems from a conference entitled “Revolutionary Texts in a Digital Age: Thomas Paine’s Publishing Networks, Past and Present,” organized by Nora Slonimsky at Iona College in October 2018. It will feature one post every day this week, hosted by both the Panorama and the Junto, and Dr. Slonimsky’s introductory post is found here. The first post at the Panorama is by Lindsay Chervinsky, “High Politics and Physical Space: Rethinking How We Commemorate Place.”

The rise of the digital humanities over the past decade has brought attention and support to a wide range of projects. In its most triumphalist form, the narrative about digital humanities suggests that digital projects have made early American materials far more accessible than they ever have been. Where once researchers could only access materials by visiting an archive or perhaps using microfilm or microfiche when available, now we can work from our homes in our pajamas to read manuscript and printed sources from the seventeenth, eighteenth, and nineteenth centuries. And we can gather massive data sets about the past for quantitative or qualitative analysis.

That is certainly true to an extent, but more recent conversations have highlighted ways in which access remains limited and how digital tools, through a veneer of comprehensiveness, can actually obscure the incompleteness of the digital historical record. Some of the tools we use are free (for example, the Trans-Atlantic Slave Trade Database and Chronicling America). But many of the most important sit behind paywalls of some kind, such as the Readex databases of early American imprints, Early English Books Online (EEBO), and Eighteenth-Century Collections Online (ECCO). For these resources, access is restricted to those affiliated with an institution that can afford to subscribe. While we have those conversations, research continues. We therefore need to consider how to talk about work based on these resources, not only to understand their benefits and pitfalls, but also to ensure that we portray our research findings accurately.

At last October’s ITPS conference, the focus was on Thomas Paine, and scholarship on his career serves as a useful example of the phenomenon I describe above. One question that has long bedeviled scholarship on Paine is the extent of the circulation of Common Sense. Its impact was obviously broad: the debate about independence shifted into high gear in the early months of 1776, and many of those promoting independence linked their arguments back to Paine’s pamphlet. But it is unclear just how far the pamphlet itself reached.

Ever the self-promoter, Paine wildly estimated that printers produced 120,000 copies of the pamphlet in the first three months of 1776. As Trish Loughran has shown, that number is patently ridiculous.[1] It would have been all but physically impossible for American printers to run that many pages through their presses in that time. Instead, Loughran and others suggest that a figure somewhere below 50,000 copies is more reasonable. Even so, that is a significant number, far more than anything else published during the imperial crisis. And, again building on Loughran, many of the twenty-five editions published in 1776 appeared in Philadelphia. Printers there were far more eager to publish the pamphlet than printers anywhere else.

But people read it outside of Philadelphia and the other towns where Common Sense was printed, so how did they learn about the pamphlet? For that, I turned to the America’s Historical Newspapers database published by Readex. In 1776, twenty-three newspapers made 137 references to the phrase “Common Sense.” The two biggest groupings of references were advertisements for the pamphlet (56, many of which repeated for several weeks) and responses to Paine (55). Fifteen of the appearances were for extracts of the pamphlet, and thirteen were “news” about the pamphlet, the possible identity of its author, and so on. Here’s the difficulty: I’m not sure how to make meaning out of these results because I can’t state with certainty how comprehensive my search was.

To be clear: I am a huge fan of Readex. Without America’s Historical Newspapers, my book would not have been possible. But the database has limits that require me to narrow the scope of my argument. First, to make a circular-but-true statement, the database contains only the newspapers it contains. AHN is built on the collections of the American Antiquarian Society (AAS), which holds an unsurpassed trove of early American newspapers. But AAS doesn’t own everything, and Readex has not always been able to acquire copies of every newspaper from other sources. Second, an individual user has access only to the series to which his or her institution subscribes. In other words, my search results will differ depending on whether I log in from my own university, drive to Cambridge and walk into the Harvard libraries, or head to Worcester and work in the AAS reading room. Finally, there’s the longstanding problem of OCR, or optical character recognition. It is very good for these newspapers, but not perfect. Some page images are smudged, others are faded, and OCR has as much trouble with the long s (ſ) as my students do. All of which is to say that I didn’t really search for “common sense” but rather for “common ?en?e” and hoped for the best.
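To illustrate why a wildcard query helps with the long s, and why it still cannot guarantee completeness, here is a minimal Python sketch. It assumes that “?” stands for exactly one character, which may not match Readex’s actual search syntax (check the database’s own search help). The helper name wildcard_to_regex and the sample strings are hypothetical illustrations, not results drawn from America’s Historical Newspapers.

```python
import re


def wildcard_to_regex(query: str) -> re.Pattern:
    """Translate a simple wildcard query into a case-insensitive regex.

    Assumption (hypothetical, not Readex's documented behavior):
    "?" matches exactly one character; everything else is literal.
    """
    # Escape every character, then turn each escaped "?" into "any one character".
    pattern = re.escape(query).replace(r"\?", ".")
    # \b keeps the match aligned to whole words.
    return re.compile(rf"\b{pattern}\b", re.IGNORECASE)


# Hypothetical OCR readings of the phrase; none come from a real database.
ocr_samples = [
    "Common Sense",    # clean transcription
    "Common Senfe",    # long s misread as "f"
    "common fenfe",    # both long s characters misread as "f"
    "Common ſenſe",    # long s preserved as the Unicode character
    "Comrnon Sense",   # "m" misread as "rn": the wildcard cannot help here
    "a common fence",  # unrelated phrase that happens to fit the pattern
]

query = wildcard_to_regex("common ?en?e")
for text in ocr_samples:
    status = "match" if query.search(text) else "no match"
    print(f"{text!r:20} -> {status}")
```

The last two samples show the trade-off: the wildcard recovers long-s misreadings such as “Senfe,” but it misses other OCR errors (like “rn” for “m”) and also admits unrelated phrases such as “common fence,” so every hit still has to be checked by eye.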

Because of these issues, it is for practical purposes impossible to say that one has found everything. It is not even possible to say that the sample is representative, because for the Revolutionary era the gaps in AHN are idiosyncratic rather than predictable. For now, the conference and this post are the venues through which I’ve shared my research on this question. But as I write further about these types of issues, I would like to be able to say something comprehensive about the circulation of Common Sense, among other texts. I can (and will) add the phrase “at least” or something like it to indicate the incompleteness of the record, and I can add methodological statements to my research. As I do that, however, I want us to talk more about how to deal with the imperfections of these tools, in part to make them more perfect, but more importantly so that we understand what we’re looking at.


[1] Trish Loughran, The Republic in Print: Print Culture in the Age of U.S. Nation Building, 1770-1870 (New York: Columbia University Press, 2007), ch. 2.


4 responses

  1. Thanks for this thoughtful piece. I’m reminded that one of the most important newspapers of the Revolutionary period, William Bradford’s Pennsylvania Journal, has no digital product and the microfilm is not widely available.
