The Internet Archive and the Digital Humanities

Tuesday, November 27, 2007 | 5:30 PM

The Internet Archive was founded eleven years ago by Brewster Kahle to build the world’s first “Internet Library.” Since 1996, the Archive has been collecting bi-monthly snapshots of the World Wide Web (the entire Web), resulting in a cumulative collection of approximately 100 billion Web pages. This historical record can be browsed and viewed using the Wayback Machine, an access interface developed by the Internet Archive. The Archive has since expanded its activities to include book scanning, audio collections, video and still image collections, and Open Education Resources (videotaped lectures for entire college-level courses and their supporting materials). These collections comprise over two petabytes of data, stored in the Archive’s Digital Repository in San Francisco. The Internet Archive is active in the open-source software community and has developed several widely used tools for web harvesting, search, and management of clustered storage environments. The Archive is dedicated to open-source principles; accordingly, all software used and developed by the Internet Archive is open source and open access. As an active technology partner in the academic, library and research communities, the Archive has become both a storage partner and a content source for educators and researchers. Its role as a unifier of open-access content sources is growing through collaborative projects in book scanning and access and in large collections of imagery and other content formats, as well as through its roles as administrator of the Open Content Alliance and as a co-founder of the International Internet Preservation Consortium. The Archive is engaged in several projects that can directly serve educators and researchers in the Digital Humanities community.
In this talk, LINDA FRUEH will describe the collections, projects and capabilities of the Internet Archive. She hopes to generate lively discussion of how the Archive can better support research in the Digital Humanities.