The Disappearing Internet
The Internet isn’t as invincible as you might think, discovers Jessica Furseth…
Does information last forever on the Internet? It certainly seemed that way to the Spanish man who took his case to the courts earlier this year, arguing that reports on the repossession of his house in 1998 were far too easy to find in Google’s search results. It also seemed that way to the European Court of Justice, which agreed with him, and to the 40,000 people who subsequently applied to Google to have their ‘inadequate, irrelevant or no longer relevant’ personal data removed from public view.
What may surprise you, as Google starts on the arduous task of complying with the “right to be forgotten” ruling, is that individual web pages don’t actually last very long. Researchers at the University of Colorado Health Sciences Center studied the longevity of websites cited in the footnotes of scientific papers and found that more than one in 10 of them were inactive after two years. Other studies have suggested that half of the web pages cited in scholarly articles cease to exist within four years.
Of course, the disappearance of a single page doesn’t necessarily mean its contents are gone for good. They may simply have moved to a new address or a different site. Even so, these findings call into question the idea that the Internet can serve as a reliable archive.
That’s where the Internet Archive in San Francisco comes in. Its founder, Brewster Kahle, has made it his mission to archive the Internet. Kahle started the work in 1996 and compares the Internet Archive to the Library of Alexandria, one of the largest libraries of the ancient world.
The Internet Archive’s best-known tool is the Wayback Machine, which currently holds more than 400 billion copies of web pages. That number keeps growing as Kahle and his team preserve a new version of sites every couple of months. By Kahle’s own estimate, the archive contains about 15 petabytes of information, or roughly 15 million gigabytes of data.
To store all this information, the Internet Archive’s staff designed their own storage unit, the PetaBox, built specifically to store and process a petabyte (a million gigabytes) of data. It draws about 6 kW of power per rack.
“Our mission is universal access to all information all of the time,” Rick Prelinger, president of the Internet Archive board, told The Guardian. “Digital information is part of our cultural heritage but it’s tremendously volatile. It’s fragile.”
One function of the Internet Archive is to preserve documents so that it can later be shown whether they have been changed or removed, as has happened with pages on sensitive corporate matters and with government websites. Another is to preserve our cultural heritage. One piece of Internet history that is fading from memory is the web-hosting service GeoCities, which Yahoo shut down in 2009, but not before it was copied by the Internet Archive. The contents of GeoCities may not seem all that worthy of preservation today, but it’s not hard to imagine that 100 years from now, historians will be thrilled that someone went to the trouble of keeping the early web, the first version of an invention that is slowly but surely changing the world.