Enter the Internet Archive
Your one-stop shop for Grateful Dead shows, accountability reporting, and books, books, books
Before we dive in, a quick announcement: I’m turning on a paid subscriber option for BookSmarts. All posts will remain free. The only difference is that you can now choose to contribute a few dollars a month. In return, you’ll get my deep gratitude and the fuzzy feeling of supporting a useful resource for writers. If a paid subscription isn’t for you, that’s just fine too. I’m glad you’re here.
OK, on to today’s post: the next installment in the where-to-find-stuff series. In previous newsletters, we’ve covered how to access journal articles without an academic affiliation and how to find publicly available books and reports on HathiTrust. Next, let’s look at the sprawling digital library that is the Internet Archive.
I first discovered the Internet Archive a decade ago when I was writing my PhD dissertation. IA houses an enormous collection of live Grateful Dead recordings and it is no stretch to say that I would not have completed my thesis without listening to this one—over and over again—until the dang thing was done. (Seriously, this show is an amazing soundtrack for writing; it eases you in with some familiar crowd-pleasers and then, before you know it, you’re deep in the jam cranking out words.)
When I became a journalist, I found myself back at IA making use of the Wayback Machine, an archive of web pages that is extremely useful for figuring out if, say, a government agency has removed something sensitive from its site. (IA also contains other accountability resources for reporters and researchers, like archives of TV news broadcasts and millions of U.S. court documents.)
Nowadays, I visit IA mostly for its vast book collection. It’s like HathiTrust on steroids, with 41 million texts, including seemingly everything in the public domain and many books that are still under copyright. Most are downloadable (at least temporarily) and virtually all are fully searchable.
Here’s the scoop:
What is the Internet Archive?
The Internet Archive is a digital library that includes webpages, audio and video recordings, images, software, and books. Its goal is to provide “Universal Access to All Knowledge” (their capitalization).
IA’s book program began in 2005, and the organization claims to add 4,500 texts per day. It works with libraries, museums, and other organizations, and users can upload materials too. Its scope is truly mind-boggling. It’s got all of the classics as well as the kinds of obscure texts that researchers often need to hunt down.
Do I need a membership?
Membership is free, but you do not need it to read and download books that are in the public domain (generally, those published before 1928). You do need an account to borrow and download copyrighted books, and to favorite items for later reference.
Did you say download??
Yes! IA allows you to download books that are no longer protected by copyright in a variety of formats, including PDF, EPUB, Kindle, and DAISY—an audio version of text for people with print disabilities.
For books published after 1927, you can borrow the book for an hour at a time to read in the website’s built-in BookReader. For certain titles, 14-day loans are also available; these enable you to download an encrypted ebook version of the text (you will need something like Adobe Digital Editions to read it).
And it’s searchable?
Yep. You don’t even have to be logged in or borrow a book to search the text. Just enter a term in the search bar and IA will allow you to view the full pages on which it appears.
Hot tip #1: One of the few downsides of IA’s web reader is that you often cannot highlight text and copy it for your notes. However, I’ve found a workaround for short passages: if you search for a word or phrase (using quotes!) from the passage of interest, the entire paragraph in which it appears will be displayed in the search results—which you CAN highlight and copy.
Hot tip #2: Books that have been digitized by Google are the exception. You actually can highlight and copy sections of these texts.
Cool, but why are there multiple copies of the same book?
Short answer: I’m not totally sure. I think it’s because IA has a decentralized uploading model that results in duplicates. The issue is not trivial. Just as an example, there are 84 different versions of Darwin’s Descent of Man in IA’s library! It can be hard to know which one to choose. The copies can differ in subtle ways, like who digitized it (see above for why you might want the Google version), the edition, or the translator (for things like classical texts). If all else fails, I use the Amazon approach and just choose the one with the most views or favorites.
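If you’re comfortable with a little scripting, the most-views approach can be sketched in Python. Treat this as a rough sketch under a few assumptions, not an official recipe: it uses IA’s public advancedsearch.php endpoint, it assumes the `downloads` field matches the view count shown on the site, and the function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# IA's public search endpoint; the query fields below are assumptions
# about what is useful for comparing duplicate copies of one book.
SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def build_search_url(title, rows=10):
    """Build a search URL for text items whose title matches `title`."""
    params = [
        ("q", f'title:"{title}" AND mediatype:texts'),
        ("fl[]", "identifier"),   # item ID used in archive.org/details/<id>
        ("fl[]", "title"),
        ("fl[]", "downloads"),    # assumed to track the "views" counter
        ("sort[]", "downloads desc"),
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def most_viewed(docs):
    """Return the record with the highest download count, or None if empty."""
    if not docs:
        return None
    return max(docs, key=lambda d: d.get("downloads", 0))


def fetch_copies(title, rows=10):
    """Query IA (requires network access) and return the matching records."""
    with urlopen(build_search_url(title, rows)) as resp:
        return json.load(resp)["response"]["docs"]
```

Calling `most_viewed(fetch_copies("Descent of Man"))` needs a network connection and returns whatever IA currently lists, so results will change over time. The view count says nothing about edition or scan quality, so it is a tiebreaker at best.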
Is it legal?
Generally. Like HathiTrust, IA has had some skirmishes with publishers and authors, and earlier this year, a judge ruled against its pandemic-era National Emergency Library. But the rest of its operations have carried on unchanged.
As usual, I’m sure I’ve only scratched the surface of what IA can do. Please share your tips in the comments or the chat.