A project is underway at the Ransom Center to digitize thousands of video and audio recordings that date back to the 1930s and earlier, some of which may be the last surviving copy of a recording. Image credit: Harry Ransom Center.

People travel from all over the world to access the collections provided at the Harry Ransom Center at The University of Texas at Austin.

The Ransom Center is an internationally renowned humanities research center. Its extensive collections provide unique insight into the creative process of some of the finest writers and artists, deepening the understanding and appreciation of literature, photography, film, art, and the performing arts. The collections include nearly one million books, more than 42 million manuscripts, five million photographs, and 100,000 works of art.

Highlights include the earliest known surviving photograph made with the aid of the camera obscura; E. E. Cummings's wooden paint box; Jack Kerouac's notebook documenting his writing of On the Road; original works by Frida Kahlo, including her iconic self-portrait with thorn necklace and hummingbird; some of Albert Einstein's unpublished notes and calculations for his work on general relativity; and one of only 20 complete copies of the Gutenberg Bible in the world.

The Ransom Center exists largely to support students and researchers through access to their collections. However, with the onset of COVID-19 and so little travel being possible in recent months, the center's digital collections have become even more important.

"While we typically have thousands of research visitors and students coming through our reading room and classrooms each year to have a direct experience with our collections, COVID-19 has put a stop to that," said Jim Kuhn, Associate Director for the Library Division. "More and more of our support, especially for the upcoming academic year, will be to provide digital surrogates to instructors who are engaged in online teaching. Meanwhile, we're also strategizing about how best to meet researcher needs without visiting in-person."

A film that documented an important event in LGBTQ+ history – the Miss All-American Camp Beauty Pageant drag queen competition held in 1967 – was thought to be lost. Original footage, rough cuts, and audio tracks from the film housed in the Lewis Allen archive were found at the Ransom Center. A collaboration between the Center, UCLA's Film and Television Archive, and Kino Lorber, Inc., resulted in restoration of the film, 'The Queen,' and a 4K version was re-released in 2019. Image credit: Lewis Allen Productions, Inc., [Film still of contestants on stage with MC Sabrina from "The Queen," (TQ-17)], 1968. Lewis M. Allen Papers, 39.4, Harry Ransom Center.

Fortunately, over the past year, the Ransom Center and TACC were already working to securely store the center's digital collections.

"What TACC provides is reliable, redundant, large-scale data storage," said Maria Esteva, a data curator with TACC whose research focus is digital archiving and preservation.

In total, the Ransom Center currently has 135 terabytes of storage on TACC's Corral data management system, a repository for large-scale digital collections. TACC also provides virtual machines that allow researchers to work securely with protected data. The virtual machines are also used for adding metadata to the collections so they can be more easily found.

The Ransom Center collections are extremely diverse. Currently, they're working on a project to digitize thousands of audio recordings — of music, theatrical performances, dictated letters, novels, and wire recordings that date back to the 1930s and earlier.

"They're all unique," Kuhn said, "none of it is commercially accessible sound, and the same thing is true of much of our video and film collections. In many cases, we have the only or the last surviving copy of a film that doesn't exist anywhere else."

One example is The Queen — a documentary re-released in 2019 based on a copy found at the Ransom Center — chronicling a 1967 drag queen competition called the Miss All-America Camp Beauty Contest that took place at New York City's Town Hall.

For many years, the Ransom Center has also been accumulating 'born-digital' collections: materials created natively in digital format rather than digitized from paper records. Examples include email and text-based documents. Scholars visiting the Ransom Center can't turn on Gabriel García Márquez's personal computer, but the center gives researchers access to the manuscripts and correspondence that were on it.

Hundreds of pages of manuscripts, video, audio recordings, and photographs from the archive of Nobel Prize winner Gabriel García Márquez, author of 'One Hundred Years of Solitude,' have been digitized and made available on the Ransom Center's website. Image credit: Robert Lebeck (German, 1929-2014), [Gabriel García Márquez], ca. 1990s, chromogenic print, Gabriel García Márquez Papers, Harry Ransom Center.

Access to primary source documents is part of the Ransom Center's mission, but the most important aspect of their work is preservation. "We're stewards of these collections, whether we've created the digital artifact or it was given to us," Kuhn said. "It's our responsibility to save this material in perpetuity."

With a physical archive, one can control for temperature and humidity. If it's safe from fire and water damage, the chances are high that it will still be accessible 50 years from now. Librarians and other humanities researchers are less confident about digital archives because they can't physically inspect them.

"What TACC offers that is extraordinarily important to us is uncompressed replication that's geographically distributed. We are confident should a file go corrupt it can be retrieved from the second location. This is an important foundational first step for us that there's long term access to this material," Kuhn said.
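The recovery path Kuhn describes is often implemented in digital preservation as a "fixity check": a checksum recorded at ingest is compared with the file as it exists today, and a damaged copy is replaced from a replica. The sketch below illustrates that general idea only. It is not a description of how TACC's Corral system works, and the function names and file paths are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_or_restore(primary: Path, replica: Path, expected: str) -> str:
    """Compare the primary copy with its recorded checksum.

    Hypothetical helper: returns "ok" if the primary matches, "restored" if
    it was replaced from a matching replica, and "unrecoverable" otherwise.
    """
    if primary.exists() and sha256_of(primary) == expected:
        return "ok"
    if replica.exists() and sha256_of(replica) == expected:
        shutil.copyfile(replica, primary)  # overwrite the damaged copy
        return "restored"
    return "unrecoverable"


if __name__ == "__main__":
    # Stand-in files in a temporary directory, not real collection items.
    with tempfile.TemporaryDirectory() as tmp:
        primary = Path(tmp) / "site_a_recording.wav"
        replica = Path(tmp) / "site_b_recording.wav"
        primary.write_bytes(b"original audio bytes")
        replica.write_bytes(b"original audio bytes")
        recorded = sha256_of(primary)            # checksum stored at ingest
        primary.write_bytes(b"damaged bytes")    # simulate corruption
        print(verify_or_restore(primary, replica, recorded))
```

Because the check relies on a stored checksum rather than on opening the file, it can run unattended across an entire collection, which is one way archives compensate for not being able to inspect digital holdings physically.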

Said Esteva: "The fact that TACC's infrastructure is available to preserve all of these incredible digital copies and original manuscripts enables the HRC to give access to them accordingly."

Going forward, Kuhn and his colleagues have applied for a grant from the National Endowment for the Humanities to make their archives and special collections accessible to artificial intelligence (AI)-enabled inquiry. For instance, what does it mean to query thousands of hours of sound recordings as a data set? They'd like to begin developing ways to provide humanities data at scale, that is, at the size required to answer new research questions.

"What we need are tools and trained graduate students and research proposals that will begin to query these data sets in ways that leverage AI principles," Kuhn said.

Projects such as the Text Encoding Initiative and textual archives such as the HathiTrust Digital Library are examples of how the humanities have made strides in this area. Such projects enable structured transcripts that users can query for names, dates, genres of literature, and the usage of words across centuries. This has become second nature in digital humanities scholarship over the last two decades.
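As a rough illustration of what querying a structured transcript can look like, the sketch below pulls personal names and dates out of a small TEI-style document using Python's standard library. The `persName` and `date` elements and the `when` attribute come from the TEI vocabulary; the sample letter, its names, and the `names_and_dates` helper are invented for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample transcript, not an item from any real collection.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><persName>Jane Doe</persName> wrote to <persName>John Roe</persName>
       on <date when="1935-04-02">2 April 1935</date>.</p>
    <p>A reply from <persName>John Roe</persName> followed on
       <date when="1935-05-10">10 May 1935</date>.</p>
  </body></text>
</TEI>"""


def names_and_dates(tei_xml: str) -> tuple[list[str], list[str]]:
    """Return the distinct personal names and the machine-readable dates."""
    root = ET.fromstring(tei_xml)
    names = sorted(
        {el.text.strip() for el in root.iterfind(".//tei:persName", TEI_NS) if el.text}
    )
    dates = sorted(
        el.get("when") for el in root.iterfind(".//tei:date", TEI_NS) if el.get("when")
    )
    return names, dates


if __name__ == "__main__":
    people, dates = names_and_dates(SAMPLE)
    print(people)
    print(dates)
```

The query works only because an editor has already marked up the names and dates in the text. Audio, video, and born-digital files generally arrive without that kind of markup.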

"We don't have tools yet that can do the same sort of thing for audio, video, or born-digital," Kuhn concluded. "We'd like to steward these digital objects in perpetuity, but we also want to provide a sandbox for training future generations of scholars who will need to be equipped for digital humanities research. That's what TACC is enabling us to begin to do."