The site held a number of things: Lord Byron poems written about Mazepa’s life and a catalogue of centuries-old articles detailing his various conquests. Pirmann opened her website scraping tool, backing up the site and preserving its content.
Now, the original website is lost, its server space likely gone to cyberattacks, power outages or Russian shelling. But thanks to her, it remains intact on server space rented by an international group of librarians and archivists.
“We’re trying to save as much as possible,” Pirmann said. “Otherwise, we lose that connection to the past.”
Buildings, bridges, and monuments aren’t the only cultural landmarks vulnerable to war. With the violence well into its second month, the country’s digital history, including its poems, archives, and pictures, is at risk of being erased as cyberattacks and bombs erode the nation’s servers.
Over the past month, a motley group of more than 1,300 librarians, historians, teachers and young children have banded together to save Ukraine’s Internet archives, using technology to back up everything from census data to children’s poems and Ukrainian basket weaving techniques.
The effort, dubbed Saving Ukrainian Cultural Heritage Online, or SUCHO, has resulted in the websites of over 2,500 of the country’s museums, libraries, and archives being preserved on servers the volunteers have rented, eliminating the risk they’ll be lost forever. Now, the all-volunteer effort has become a lifeline for cultural officials in Ukraine, who are working with the group to digitize their collections in the event their facilities get destroyed in the war.
The endeavor, experts said, underscores how volunteers, armed with low-cost technology, training and organization, can protect a country’s history from disasters such as war, hurricanes, earthquakes and fire.
“I have not seen anything like it,” said Winston Tabb, dean of libraries, archives and museums at Johns Hopkins University. “We didn’t really have the tools before that made it even possible to undertake this kind of initiative.”
The seeds of this international effort started online. On Feb. 26, Anna Kijas, a music librarian at Tufts University, put a call out on Twitter asking if any volunteers would join her for a “virtual data rescue session” to preserve Ukrainian musical collections which could be lost in the war.
That drew notice from librarians and archivists across the world, including Quinn Dombrowski, an academic technology specialist at Stanford University, and Sebastian Majstorovic, a digital historian based in Vienna. They banded together, and amid sleepless nights across multiple time zones, they recruited, trained, and organized scores of volunteers wanting to help archive Ukraine’s historical websites.
Large parts of the Internet are periodically archived by the Internet Archive’s Wayback Machine, and the Internet Archive partners with SUCHO, but the group’s organizers also needed something more advanced, Dombrowski said. In many cases, the Wayback Machine can dig into the first or second layer of a website, she added, but many documents on Ukraine’s cultural websites, like pictures and uploaded files, could be seven or eight layers deep, inaccessible to traditional Web crawlers.
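The point about layers can be illustrated with a small sketch. It is a toy, not how the Wayback Machine or any real crawler is built: the site is a made-up chain of pages held in memory, and the names `crawl`, `site` and `max_depth` are invented for the illustration. It shows only that a crawl capped at two layers never sees a page sitting eight layers down.

```python
from collections import deque


def crawl(links, start, max_depth):
    """Breadth-first walk over an in-memory link graph.

    Returns a dict mapping each reached page to its depth (number of
    clicks from the start page). Pages deeper than max_depth are never
    visited.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        depth = seen[page]
        if depth == max_depth:
            # Depth limit reached: do not follow this page's links.
            continue
        for target in links.get(page, []):
            if target not in seen:
                seen[target] = depth + 1
                queue.append(target)
    return seen


# Hypothetical site: home -> level1 -> level2 -> ... -> level8
site = {"home": ["level1"]}
for i in range(1, 8):
    site[f"level{i}"] = [f"level{i + 1}"]

shallow = crawl(site, "home", max_depth=2)  # stops two layers in
deep = crawl(site, "home", max_depth=8)     # follows the whole chain

print(sorted(shallow))     # ['home', 'level1', 'level2']
print("level8" in deep)    # True
```

In this toy, raising the depth limit is all it takes to reach the buried page. On a real website a deeper crawl also means far more pages to fetch and store.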
To reach that material, they turned to Webrecorder, a suite of open source digital archiving tools that has been around since the mid-2010s and is used by institutions including the United Kingdom’s National Archives and the National Library of Australia. They also started a global Slack channel to communicate with volunteers.
To archive, volunteers mostly use the Webrecorder suite, organizers said. There is Archiveweb.page, a browser extension and stand-alone desktop app that archives a website as people browse pages. Another is Browsertrix Crawler, which requires some basic coding skills, and is helpful for “advanced crawls,” such as capturing expansive websites that might have multiple features like calendars, 3D tours, or circuitous links for navigating in-site. And more recently, there is Browsertrix Cloud, an easier-to-use, automated version of the powerful Browsertrix Crawler, which is popular with volunteers.
“It essentially tries to mimic a human browsing the Web,” Ilya Kreymer, the founder of Webrecorder, said. “And as it does that, it’s archiving all of the network traffic, and then all that is stored into a file … that can be loaded from anywhere.”
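The idea Kreymer describes, recording traffic during browsing and storing it in a file that can be opened elsewhere, can be sketched in a few lines. This is an assumption-laden toy, not Webrecorder’s code or its file format: the `Recorder` class, the `fake_fetch` stand-in for the network, the example URLs and the JSON file are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class Recorder:
    """Toy capture layer: remembers every response that passes through it."""

    def __init__(self, fetch):
        self.fetch = fetch      # function that returns a page body for a URL
        self.records = {}       # URL -> captured body

    def get(self, url):
        body = self.fetch(url)
        self.records[url] = body  # archive the traffic as a side effect
        return body

    def save(self, path):
        Path(path).write_text(json.dumps(self.records), encoding="utf-8")


def load_archive(path):
    """Reload a saved capture; no network access is needed."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def fake_fetch(url):
    # Stand-in for a real network request (hypothetical content).
    return f"<html>content of {url}</html>"


recorder = Recorder(fake_fetch)
for url in ["https://example.org/", "https://example.org/poems"]:
    recorder.get(url)  # "browsing" the site

archive_path = Path(tempfile.mkdtemp()) / "capture.json"
recorder.save(archive_path)

replayed = load_archive(archive_path)
print(replayed["https://example.org/poems"])
```

Once the file is written, the pages can be read back even if the original site is unreachable, which is the property the volunteers depend on.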
Over the past month, SUCHO has developed a systematic, and creative, way to go about its work. There’s a master spreadsheet where volunteers list all the Ukrainian museums, libraries, and archives whose websites still need to be backed up, along with those that have been completed. To develop this list, SUCHO’s organizers receive tips from librarians and archivists across the world who may know of a rare museum in Ukraine that needs to have its work backed up.
Other volunteers have become sleuths, using Google Maps to take a digital walk down Ukrainian streets, looking for any signs that might say “museum” or “library” and trying to find out if it has a website that needs archiving.
In other cases, when shelling hits somewhere, a group of volunteers dedicated to “situation monitoring” alerts any volunteers who might be awake to look for institution websites in that region that need backing up, for fear they could go offline any minute.
“These are the moments,” said Dombrowski, whose 8-year-old child occasionally helps archive sites, “that future historians will either celebrate or curse the people of our time for either doing or not doing something in a way that can enable them to tell those stories through a larger arc of history.”
In little more than a month, volunteers have backed up an exhaustive array of data. According to their website and organizers, volunteers have preserved documents totaling 25 terabytes that include the history of Jewish towns in Ukraine, photographs of excavation sites in Crimea, and digitized exhibitions of Kharkiv’s Literary Museum.
For Majstorovic, the importance of the work he’s helping organize was made apparent a few weeks ago. In early March, he happened upon the Ukrainian State Archive of Kharkiv’s website. As Russia’s invasion of Ukraine was gearing up, he worried about how long the site would remain active, fearing its servers would be susceptible to cyberattacks or shelling.
He loaded the archive’s website into Webrecorder’s Browsertrix tool, and let it do its work. By early morning, it had collected over 100 gigabytes of information, including the district’s census records, criminal cases, and lists of people who have been persecuted in the region.
Within hours, the website was gone. But still, its records remained. Looking back, Majstorovic says, that’s exactly why he is doing this work.
“If we can save these things, we prove that Ukraine has a history,” he said. “[If] they are gone forever … that just rips a black hole into the history of a place that will last forever.”