Lessons from the MALACH Project

Applying New Technologies to Improve Intellectual Access to Large Oral History Collections

Tuesday October 24, 2006 | 4:30 PM

In this talk I will describe the goals of the MALACH project (Multilingual Access to Large Spoken Archives) and some of our research results. I’ll begin by describing the unique characteristics of the oral history collection that we are using, in which Holocaust survivors, witnesses, and rescuers were interviewed in several languages. Each interview has been digitized and extensively catalogued by subject matter experts, producing a remarkably rich collection for the application of machine learning techniques. Automatic speech recognition techniques originally developed for conversational telephone speech were adapted to process the interviews, with word error rates that are adequate to support interactive search, automated clustering, detection of topic shifts, and topic classification. I will then describe the studies that we conducted to learn what needs our systems should be designed to meet, and I’ll summarize key results from our system development activities. I’ll conclude with some remarks about possible future directions for research applying new technologies to improve intellectual access to oral history and other spoken word collections.