Draft, 17 October 2016
This page should be in a useful state, but still needs work before it's finished.
REMINDER: No on-campus workshop this week – work through the activities below and share the results on Slack or in your blog post. Have fun!
This week I’m off at the Australian Society of Archivists’ Annual Conference where I’ll be presenting my ASIO file experiments amongst other things. Keep an eye on the #asalinks hashtag if you’re interested – there should be a number of interesting digital projects discussed.
My latest hack is a userscript that puts redaction information back into the National Archives RecordSearch database so that it looks something like this:
We’ll explore the possibilities of userscripts a bit more next week.
If you happen to be doing my ‘Working with Collections’ unit you will have been briefly introduced to some of the issues around digital archives last week.
If not, you might like to have a poke around amidst the riches of the Internet Archive. It’s a digital library that brings together freely available digitised books, photos, movies, software and more. Some highlights:
Most of the digital files can be downloaded in a variety of formats, so it’s a great source for potential projects.
But perhaps the most well-known part of the Internet Archive is the Wayback Machine, a gateway to 20 years of web history.
Venture back in time to see what else you can find…
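If you'd rather jump straight to a particular moment, Wayback Machine links follow a predictable pattern: a 14-digit timestamp followed by the original address. Here's a minimal Python sketch that builds such a link. The function name `wayback_url` is my own invention for illustration, and the Wayback Machine will redirect you to the capture nearest the date you ask for, which may not be that exact date.

```python
from datetime import datetime

# Base address for Wayback Machine captures.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url, when):
    """Build a link to the capture of page_url closest to the datetime `when`.

    `wayback_url` is a hypothetical helper, not part of any library.
    """
    # The Wayback Machine expects timestamps as YYYYMMDDhhmmss.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


if __name__ == "__main__":
    # What did the National Library's home page look like in late 1998?
    print(wayback_url("http://www.nla.gov.au/", datetime(1998, 12, 1)))
```

Paste the printed link into your browser and see where you land.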
The Wayback Machine is not the only web archive.
The National Library of Australia maintains a selective web archive called Pandora – many significant Australian sites are preserved there.
The National Library also harvests Australian government websites. Using the Australian Government Web Archive you can hunt down all those controversial media releases that mysteriously disappear from politicians’ web sites.
I wanted to introduce these sites today for three reasons. First of all, they are themselves sites for digital heritage research. Historians are only just starting to investigate the possibilities of mining web archives, but if you’re writing a history of the 1990s or beyond, how can you ignore them? One historian who’s really leading the way in the use of web archives is Ian Milligan – check out his blog.
Secondly, as the creators of digital projects you should be thinking about questions of preservation and sustainability.
And thirdly, because the building of web archives is itself a creative, and sometimes political, process. For example, have a look at the work being undertaken by the Documenting the Now project.
With that in mind, the job for this week is to build digital archives and think about the possibilities.
One of the most useful things about the Wayback Machine is that little box on the home page that says ‘Save page now’. Just feed it a URL and it will instantly attempt to archive the page. Once it’s done you have a permanent link to the contents of that page – so even if the page disappears from the original site, you’ll always be able to go back to the archive and view it.
This is particularly useful if you want to cite webpages and are worried that they’re going to disappear. Perma.cc is another archive more directly aimed at preserving citations.
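If you have a whole list of pages to cite, you can do the same thing without the box. The Python sketch below is a rough illustration only: it builds a 'Save page now' link for a page, builds a query for the Internet Archive's availability service, and pulls the closest snapshot out of the JSON that service sends back. The function names are hypothetical, and the sample response is made up to show the general shape, so check a real response before relying on it.

```python
import json
from urllib.parse import urlencode


def save_page_now_url(page_url):
    """Link that asks the Wayback Machine to capture page_url (hypothetical helper)."""
    return "https://web.archive.org/save/" + page_url


def availability_query(page_url, timestamp=None):
    """Build a query asking whether page_url has been archived."""
    params = {"url": page_url}
    if timestamp:
        # Optional YYYYMMDD (or longer) timestamp to search near.
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the URL of the closest snapshot, or None if there isn't one."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


if __name__ == "__main__":
    print(save_page_now_url("http://example.com/"))
    print(availability_query("http://example.com/", "20161017"))

    # Illustrative response only -- not copied from the live service.
    sample = json.dumps({
        "archived_snapshots": {
            "closest": {
                "available": True,
                "url": "http://web.archive.org/web/20161017000000/http://example.com/",
                "timestamp": "20161017000000",
                "status": "200",
            }
        }
    })
    print(closest_snapshot(sample))
```

Nothing here touches the network: open the printed links in a browser, or fetch them with whatever tool you prefer.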
Here’s an example:
As it loads, the exhibition uses the Trove API to pull in its content. But these API requests aren’t captured by the Internet Archive, so it just shows the fallback version.
The problems of archiving dynamic content have led to the development of new tools such as Webrecorder.io. This video gives a good introduction to the way it works:
Create a free account and record some web pages! Try some of the sites that didn’t work properly using the Wayback Machine and see if Webrecorder does the job. Try complicated sites with videos, or digital artworks. What works and what doesn’t?
Here’s Webrecorder’s version of the Chinese in NSW exhibition. Try navigating around the site.
Ok, Storify isn’t really an archive and it doesn’t actually preserve digital content. Nonetheless, Storify is often used to create a quick and dirty Twitter ‘archive’ relating to a particular event. It’s not really an archive because if Twitter ceased to exist, most of its content would be lost. But it’s very easy to use, and it allows you to thread Twitter and other online content to create an interesting story.
Because I tend to share the research that I’m doing using Twitter, I often use Storify to pull together the threads of a project. For example:
What story would you like to tell? Create a free account with Storify and build something. The interface is very straightforward, so you shouldn’t need any special help. Share your story on Slack or in your blog reflection.
If you want to get serious about archiving Twitter then TAGS is the next step. It makes use of Google Drive, so you’ll need both a Google and a Twitter account, but it’s pretty easy to set up.
You can also create some nice visualisations and public views of your archive.
For heavy-duty Twitter archiving there’s also the Twarc tool and library – it requires Python and some command line confidence, but it too is pretty easy to get going.
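Once you've harvested some tweets, you'll want to do something with them. Twarc saves its results as line-oriented JSON, one tweet per line, so a few lines of Python are enough to start exploring. The sketch below counts hashtags. It assumes each tweet keeps its hashtags under `entities` and `hashtags`, as in Twitter's standard tweet format, and the three sample tweets are invented and heavily trimmed for illustration.

```python
import json
from collections import Counter


def count_hashtags(lines):
    """Count hashtags across tweets stored as one JSON object per line."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        # Assumed layout: tweet["entities"]["hashtags"] is a list of {"text": ...}
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1
    return counts


if __name__ == "__main__":
    # Invented sample data standing in for a real harvest file.
    sample = [
        '{"id_str": "1", "entities": {"hashtags": [{"text": "asalinks"}]}}',
        '{"id_str": "2", "entities": {"hashtags": [{"text": "ASALinks"}, {"text": "archives"}]}}',
        '{"id_str": "3", "entities": {"hashtags": []}}',
    ]
    for tag, n in count_hashtags(sample).most_common():
        print(f"#{tag}: {n}")
```

To run it over your own harvest, pass an open file instead of the sample list, for example `count_hashtags(open("tweets.jsonl"))`, where `tweets.jsonl` is whatever name you gave your output file.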