Bitrot

The promise of digital data is that bits don’t rot. But they do.

Data from the US Census of 1960 were lost because the tapes were obsolete and partly unreadable. They had to be restored from 300,000 rolls of microfilm stored in a refrigerated cave in Kansas.

In the mid-1970s NASA spent a billion dollars sending two Viking landers to Mars to search for life there. The biology data were lost: they were buried in thousands of pages of microfilm archives, mixed in with engineering data and of too poor quality to be scanned; or, alternatively, stored as long sequences of numbers on CD-ROM without any indication of what the numbers meant. They were rescued only when a retired researcher was found to have kept some paper printouts, which teams of students could then read and type in. (For reference, there is a nice Life on Mars paper here.)

In 1986 the BBC’s Domesday Project produced a 20th-century Domesday Book, with text and photographs from all over the country. The data were stored on two big silver laser discs and read by a special BBC-supplied computer which understood their format. There are no laser disc readers today outside museums, and the format was one that only the special computers could read. There were some of those in museums, but would any of them still work? The project was rescued from oblivion by the skin of its teeth and the data are now available on the web. The lessons of that narrow escape have been carefully forgotten: in place of video stills in a format that was unreadable unless you had the right hardware and software, the photographs are compressed in JPEG format, which is unreadable unless you have the right software. Moreover, unlike the laser discs, the new Domesday data are not actually in anybody’s hands. We all have to rely on the BBC existing for ever and being for ever willing to keep them available to us. As available as CompuServe, BIX, or AOL? In a format as readily readable as 8″ floppy disks, Amstrad 3″ disks in their special cases, or the Sinclair Microdrive?

Meanwhile the special hardware required to read the Rosetta Stone is widely available, on either side of your nose; and the software is readily learned. One of the oldest surviving photographs in the world dates from spring 1838 and is readable with the same equipment (and no software). But what photographs does this generation have which will be readable in 175 years’ time?

There is an enjoyable article on digital preservation here. At least, there was when I wrote this blog entry. The link may rot at any moment.
