Friday, February 04, 2022

Digitizing Dr. Johnson

For a long time I’ve been fond of the online version of Samuel Johnson’s Dictionary. I even recommended it in a Journal of the American Revolution roundup.

And then the original site, which offered page images and highlights, turned into a completely searchable database. Even better! (Though I do miss the page images.)

Public Books has just published an essay by Carmen Faye Mathes, who was a “Co-Principal Investigator” in the total digitization effort, about the peculiar challenges of that work:

How you go about bringing a historical dictionary online is that, first, you stop really reading the entries. You have other work to do, like adding XML tags; comparing modern transcriptions against page images of Johnson’s original; and digitally cropping each entry so as to include an original image next to each transcribed entry—before concluding that, with at least 41,684 unique words, there must be a way to automate this.

The result is that even when I would exclaim over a new-to-me word (“Did you know that ‘pulverulence’ means an ‘abundance of dust’?”), it was never in the context of actually using the word—only proofing it, rendering it, clipping it, highlighting it, making it findable. Johnson defines lexicographer—a writer of dictionaries—as “a harmless drudge.” But I have come to relate to Johnson as more of a long-distance runner: an athlete whose particular facility lies in the way he marathons forth even as it pains him.

And then there’s the question of Johnson’s prodigious memory: how fallible was it?

When it comes to tracing the provenance of Johnson’s literary quotations—there are approximately 115,000 of them—the thing that felt most like a discovery was catching the lexicographer in a mistake. We wanted our online dictionary to make it possible to search for all the times Johnson cites particular authors or sources. So, we needed to ask, when Johnson says these lines belong to Shakespeare, do they?

Sometimes quotations would be followed by the title of the play, including act and scene, and sometimes merely by “Sh.” Johnson likes to condense longer quotations to save space. How close is his adumbrated version to the original? These and other, similar questions motivated our investigations. We needed to make the quotations discoverable by first discovering them ourselves.

Why don’t you start checking the accuracy of all the times Johnson quotes John Milton? seemed like as good a place as any to begin. When a spreadsheet with 12,000 supposed Milton quotations arrived in my inbox, my heart sank, but my ego accepted the challenge. I hit the jackpot almost immediately. In the entry for Autumnal, Johnson quotes Milton: “Thou shalt not long / Rule in the clouds; like an autumnal star, / Or light’ning, thou shalt fall.” He gives the source: “Milt. Par. Lost, b. iv. l. 620.” A few seconds later, the internet revealed the error: this is a quotation from Paradise Regained (1671), not the more famous Paradise Lost (1667).

And finally there’s that citation of Pamela that’s still floating out there, untethered to a text.

1 comment:

  1. The page images are still there! At the top of the home page, look for the pull-down menu that says "Dictionary", and select "Browse".

    Now they have not only the first edition from 1755, but they've added a fourth edition from 1773.