UC10154 -- Exploring digital heritage -- Week 11

Draft, 17 October 2016
This page should be in a useful state, but still needs work before it's finished.

REMINDER: No on-campus workshop this week – work through the activities below and share the results on Slack or in your blog post. Have fun!

ASIO adventures

This week I’m off at the Australian Society of Archivists’ Annual Conference, where I’ll be presenting my ASIO file experiments, amongst other things. Keep an eye on the #asalinks hashtag if you’re interested – there should be a number of interesting digital projects discussed.

My latest hack is a userscript that puts redaction information back into the National Archives RecordSearch database so that it looks something like this:

RecordSearch redactions

We’ll explore the possibilities of userscripts a bit more next week.

DIY webarchiving

If you happen to be doing my ‘Working with Collections’ unit you will have been briefly introduced to some of the issues around digital archives last week.

If not, you might like to have a poke around amidst the riches of the Internet Archive. It’s a digital library that brings together freely available digitised books, photos, movies, software and more. Some highlights:

Most of the digital files can be downloaded in a variety of formats, so it’s a great source for potential projects.

But perhaps the best-known part of the Internet Archive is the Wayback Machine, a gateway to 20 years of web history.

Here, for example, is the history of my blog since 1998.

Venture back in time to see what else you can find…
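If you’d like to explore a site’s history more systematically, the Wayback Machine also runs a CDX server that lists the captures it holds for a url. The Python sketch below builds a query asking for roughly one capture per year, then turns the JSON response into playback links. It assumes the public endpoint at `web.archive.org/cdx/search/cdx` and its `output`, `from`, `to` and `collapse` parameters behave as documented; the function names are just my own.

```python
import json
from urllib.parse import urlencode

# Public CDX endpoint (assumed): lists the Wayback Machine's captures of a url.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(site, from_year=None, to_year=None, one_per_year=True):
    """Build a CDX query url listing captures of `site` (hypothetical helper)."""
    params = {"url": site, "output": "json"}
    if from_year is not None:
        params["from"] = str(from_year)
    if to_year is not None:
        params["to"] = str(to_year)
    if one_per_year:
        # Collapse on the first 4 digits of the timestamp (the year), so
        # adjacent captures from the same year are reduced to a single row.
        params["collapse"] = "timestamp:4"
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_json(body):
    """Turn a CDX JSON response into capture dicts with playback links."""
    if not body.strip():
        return []
    rows = json.loads(body)
    if not rows:
        return []
    # The first row holds the field names; the rest are captures.
    header, *records = rows
    captures = []
    for record in records:
        capture = dict(zip(header, record))
        # Wayback playback urls take the form /web/<timestamp>/<original url>.
        capture["playback"] = (
            f"https://web.archive.org/web/{capture['timestamp']}/{capture['original']}"
        )
        captures.append(capture)
    return captures
```

To run it for real, open the query url with `urllib.request.urlopen`, decode the response body and hand it to `parse_cdx_json`. Each `playback` link takes you straight to that year’s version of the site.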

The Wayback Machine is not the only web archive.

  • The National Library of Australia maintains a selective web archive called Pandora, where many significant Australian sites are preserved.

  • The National Library also harvests Australian government websites. Using the Australian Government Web Archive you can hunt down all those controversial media releases that mysteriously disappear from politicians’ web sites.

  • The UK web archive harvests UK sites (of course) but they also provide a number of interesting visualisation tools.

I wanted to introduce these sites today for three reasons. First of all, they are themselves sites for digital heritage research. Historians are only just starting to investigate the possibilities of mining web archives, but if you’re writing a history of the 1990s or beyond, how can you ignore them? One historian who’s really leading the way in the use of web archives is Ian Milligan – check out his blog.

Secondly, as the creators of digital projects you should be thinking about questions of preservation and sustainability.

Thirdly, building web archives is itself a creative, and sometimes political, process. For example, have a look at the work being undertaken by the Documenting the Now project.

With that in mind, the job for this week is to build digital archives and think about the possibilities.

Archive a page with the Wayback Machine

One of the most useful things about the Wayback Machine is that little box on the home page that says ‘Save page now’. Just feed it a url and it will instantly attempt to archive the page. Once it’s done you have a permanent link to the contents of that page – so even if the page disappears from the original site, you’ll always be able to go back to the archive and view it.

This is particularly useful if you want to cite webpages and are worried that they’re going to disappear. Perma.cc is another archive more directly aimed at preserving citations.
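You can also drive Save Page Now from a script, which is handy if you have a list of urls you want to cite. This is a minimal sketch, assuming the Internet Archive’s public `web.archive.org/save/` endpoint and the Wayback Availability API at `archive.org/wayback/available`; both may impose rate limits or change over time, and the helper names are hypothetical. Requesting the first url asks for a capture; the second tells you whether (and where) a capture exists.

```python
import json
from urllib.parse import urlencode


def save_page_now_url(page_url):
    """Url that asks the Wayback Machine to capture `page_url` (assumed endpoint)."""
    return "https://web.archive.org/save/" + page_url


def availability_url(page_url, timestamp=None):
    """Url for the Availability API, which reports the capture closest to `timestamp`.

    `timestamp` is a YYYYMMDD[hhmmss] string; omit it to get the most recent capture.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(body):
    """Return the permanent link to the closest capture, or None if there isn't one."""
    data = json.loads(body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

Fetching these urls (with `urllib.request.urlopen`, say) is left out deliberately: be polite and leave a pause between requests if you’re archiving a long list.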

So grab some urls and start archiving! What I want you to look out for are examples where the archiving process hasn’t quite worked. Examples of this might be content that’s dynamically loaded using JavaScript, or embedded resources such as videos. Does the Wayback Machine create a perfect copy?

Here’s an example:

As it loads, the exhibition uses the Trove API to pull in its content. But these API requests aren’t captured by the Internet Archive, so it just shows the fallback version.
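Spotting problems is mostly a matter of comparing the live and archived pages side by side, but a rough heuristic can help you decide where to look. This sketch uses Python’s built-in `html.parser` to flag tags that often mean content is loaded separately from the page itself (scripts, iframes, video and audio players, embeds). It doesn’t run any JavaScript, so it can’t tell you what actually failed, only where failures are likely; the list of ‘risky’ tags is my own assumption.

```python
from html.parser import HTMLParser

# Tags that often pull in content separately from the page (my own assumed list).
RISKY_TAGS = {"script", "iframe", "video", "audio", "embed", "object"}


class RiskyElementFinder(HTMLParser):
    """Collect (tag, source) pairs for tags that may not archive cleanly."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names, so a simple set lookup works.
        if tag in RISKY_TAGS:
            attributes = dict(attrs)
            # Inline scripts, and videos that use child <source> tags, have no
            # src; record them anyway, since an inline script may call an API.
            source = attributes.get("src") or attributes.get("data") or "(no src attribute)"
            self.found.append((tag, source))


def find_risky_elements(html):
    """Return the risky tags found in an HTML string, in document order."""
    parser = RiskyElementFinder()
    parser.feed(html)
    parser.close()
    return parser.found
```

Run it over the saved HTML of a page, then check each flagged element in the Wayback copy to see whether it survived.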

Archive your web browsing with Webrecorder.io

The problems of archiving dynamic content have led to the development of new tools such as Webrecorder.io. This video gives a good introduction to the way it works:

Create a free account and record some web pages! Try some of the sites that didn’t work properly using the Wayback Machine and see if Webrecorder does the job. Try complicated sites with videos, or digital artworks. What works and what doesn’t?

Here’s Webrecorder’s version of the Chinese in NSW exhibition. Try navigating around the site.

Narrate a collection with Storify

OK, Storify isn’t really an archive: it doesn’t actually preserve digital content, so if Twitter ceased to exist, most of what a Storify story pulls together would be lost. Nonetheless, Storify is often used to create a quick and dirty Twitter ‘archive’ relating to a particular event. It’s very easy to use, and it allows you to thread together tweets and other online content to create an interesting story.

Because I tend to share my research as I’m doing it on Twitter, I often use Storify to pull together the threads of a project. For example:

What story would you like to tell? Create a free account with Storify and build something. The interface is very straightforward, so you shouldn’t need any special help. Share your story on Slack or in your blog reflection.

Create a Twitter archive with TAGS

If you want to get serious about archiving Twitter then TAGS is the next step. It makes use of Google Drive, so you’ll need both a Google and a Twitter account, but it’s pretty easy to set up.

  • Go to the Get TAGS page and click on the TAGS v6.1 button.
  • Click on the Make a copy button.
  • Once the TAGS setup form loads, you need to connect it to your Twitter account. Select TAGS > Setup Twitter Access from the menu and follow the instructions.
  • Enter a search term or hashtag in box number ‘2’ and select TAGS > Run now! from the menu.
  • If you want your archive to be automatically updated you can choose TAGS > Update every hour instead.
  • To view the archived tweets click on the ‘Archive’ tab at the bottom of the sheet.
  • To create a nice summary and dashboard, select TAGS > Add summary sheet and TAGS > Add dashboard sheet. They’ll be accessible from tabs at the bottom of the sheet.
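The summary sheet does much of this for you, but if you want to take your data further you can download the Archive tab from Google Sheets as a CSV file and work with it directly. This sketch counts tweets, the most active accounts and the most common hashtags. It assumes the Archive sheet has `from_user` and `text` columns, as the TAGS sheets I’ve used do; check the header row of your own copy before relying on it. The function name is hypothetical.

```python
import csv
import io
import re
from collections import Counter

# Matches hashtags such as #asalinks; \w covers letters, digits and underscores.
HASHTAG = re.compile(r"#(\w+)")


def summarise_tags_archive(csv_text, top=5):
    """Summarise a TAGS Archive sheet exported as CSV.

    Assumes columns named `from_user` and `text` (check your own header row).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    users = Counter()
    hashtags = Counter()
    total = 0
    for row in reader:
        total += 1
        users[row["from_user"]] += 1
        # Lower-case so #ASAlinks and #asalinks are counted together.
        hashtags.update(tag.lower() for tag in HASHTAG.findall(row["text"]))
    return {
        "tweets": total,
        "top_users": users.most_common(top),
        "top_hashtags": hashtags.most_common(top),
    }
```

To use it on a downloaded file, read the file’s contents into a string (with `open(..., encoding="utf-8")`) and pass that in.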

You can also create some nice visualisations and public views of your archive.

  • Select File > Publish to the web from the menu and make your sheet public.
  • Then just follow the TAGS Explorer and TAGS Archive links on the setup form. Share them!

Here’s a visualisation and searchable archive of #asalinks.

For heavy-duty Twitter archiving there’s also the Twarc tool and library – it requires Python and some command-line confidence, but it too is pretty easy to get going.
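Twarc saves what it harvests as line-oriented JSON, one tweet per line. Here’s a sketch of how you might start digging into that output: it tallies hashtags and rebuilds a link to each tweet. It assumes the tweets follow Twitter’s v1.1 JSON layout (`entities.hashtags`, `user.screen_name`, `id_str`), which is what Twarc captured at the time of writing; the function names are hypothetical.

```python
import json
from collections import Counter


def hashtag_counts(lines):
    """Count hashtags across line-oriented tweet JSON (one tweet per line)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        # Assumed v1.1 layout: entities.hashtags is a list of {"text": ...} dicts.
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1
    return counts


def tweet_url(tweet):
    """Rebuild the public link to a tweet from its author and id (assumed fields)."""
    return f"https://twitter.com/{tweet['user']['screen_name']}/status/{tweet['id_str']}"
```

For example, `with open("tweets.jsonl") as f: print(hashtag_counts(f).most_common(10))`, where `tweets.jsonl` stands in for whatever file you saved Twarc’s output to.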