UC10154 -- Exploring Digital Heritage -- Week 3

Draft, 22 August 2016
This page should be in a useful state, but still needs work before it's finished.

Collections and contexts

Last week I introduced you to some of my bot friends who are taking cultural collections into the spaces where people are and liberating them. That’s one example of how we can explore and discover cultural heritage collections in different contexts.

(Oh, and I think I forgot to mention that @trovenewsbot also has a Tumblr account.)

This week we’re going to start by casting a critical eye over search technologies. Search is the main way we access cultural heritage collections, but is it doing what we think it’s doing? And what alternatives might there be?

Filter bubbles

Let’s start with an exercise:

  • Open up Google and search for ‘unprofessional hairstyles for work’.
  • Look at the image results.
  • Notice anything interesting?

Let’s try that again:

  • This time search for ‘professional hairstyles for work’.
  • Look at the image results.
  • Notice anything different?

This little experiment was widely tweeted recently. You should have seen that the ‘unprofessional’ images were mostly of black women, while the ‘professional’ images were predominantly of white women. Does this mean Google is racist? This question was examined in an article by the Guardian, which noted that Google’s search algorithm was just ‘mirroring conversations about “unprofessional hair” biases, not making a ruling’. It was reflecting back our preconceptions. Is this a problem or a feature? Is this how search should work? The article concludes:

These questions get at the very identity of ‘search’ as a digital concept: is its purpose to reflect and reinforce what its users feel, do and believe? Or is it to show us a fuller picture of the world and all things contained in it as they really are?

What do you think?

Search results are now heavily personalised – things like your location, your web browsing history, and your social media activity can affect the results that you see. The effect of this is to create a ‘filter bubble’ which works to limit your view of the world. For an explanation of filter bubbles see this TED Talk by Eli Pariser.
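To make the idea concrete, here is a toy sketch of how personalisation can narrow what you see. It is not how Google or any real search engine works: the result list, the topics, and the `personalised_top` function are all made up for illustration. The only assumption is that results matching topics you have clicked before get pushed to the top.

```python
from collections import Counter

# Hypothetical search results as (title, topic) pairs. Not real data.
RESULTS = [
    ("Gold rush diaries", "history"),
    ("Cricket scorecards, 1930s", "sport"),
    ("Federation debates", "politics"),
    ("Football club photographs", "sport"),
    ("Convict shipping records", "history"),
    ("Election posters", "politics"),
]


def personalised_top(results, click_history, k=3):
    """Return the top k titles, ranking previously clicked topics first."""
    clicks = Counter(click_history)
    # sorted() is stable, so results with equal scores keep their original order.
    ranked = sorted(results, key=lambda r: clicks[r[1]], reverse=True)
    return [title for title, _ in ranked[:k]]


# Two imaginary users run the same query but have different click histories.
sports_fan = personalised_top(RESULTS, ["sport", "sport", "history"])
politics_fan = personalised_top(RESULTS, ["politics", "politics"])

print(sports_fan)
print(politics_fan)
```

Both users searched the same collection, yet neither sees the other's top results. The more they click, the stronger the effect becomes.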

Seams and edges

You’re probably thinking – Ah, but that’s just Google! Maybe not. While the search technologies used in cultural heritage settings don’t personalise results, they do reflect certain assumptions about the world.

Read these three papers:

Bias and selection can operate at many levels in cultural heritage collections – from decisions about what to keep and what not to keep; to the vocabularies and standards we use to describe collections; and to the configuration of search engines that index that metadata.

Perhaps your project could explore some of these biases. You might, like Matthew Reidsma, critically analyse a particular search interface for a cultural heritage collection – what does it show, and what does it hide? You might do something like my quick analysis of books by language in Trove – what do our collections say about Australian culture and society?
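If you want to try a by-language analysis yourself, one possible starting point is simply counting records by a metadata field. The sketch below uses a handful of made-up records, not real Trove data, and the field names (`title`, `language`) are assumptions; real catalogue records will be structured differently.

```python
from collections import Counter

# Made-up catalogue records for illustration only; not real Trove data.
records = [
    {"title": "Book A", "language": "English"},
    {"title": "Book B", "language": "Chinese"},
    {"title": "Book C", "language": "English"},
    {"title": "Book D"},  # no language recorded at all
    {"title": "Book E", "language": "English"},
]


def share_by_language(records):
    """Return each language's share of the records as a percentage."""
    # Records with no language field are counted as "Unknown" rather than dropped.
    counts = Counter(r.get("language", "Unknown") for r in records)
    total = sum(counts.values())
    return {lang: round(100 * n / total, 1) for lang, n in counts.most_common()}


shares = share_by_language(records)
for lang, pct in shares.items():
    print(f"{lang}: {pct}%")
```

Counting the records with no language, instead of silently skipping them, is a deliberate choice: gaps in the metadata are part of what a collection shows and hides.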

Generous interfaces

Another problem with search is described by Mitchell Whitelaw in his paper Generous Interfaces for Digital Cultural Collections. Search demands that we make a query, but what if we don’t know anything about the collection? Mitchell explores a number of examples that show how interfaces can be generous in providing ways to explore collections that don’t require a search term.

The examples Mitchell uses are:

Another example is:

Can you find any other examples of ‘generous’ interfaces to cultural heritage collections? Have a look at the online collections of museums and galleries around the world – which ones do you think are ‘generous’ and why?


Another way of exploring cultural heritage collections is through serendipitous discovery – finding things by accident, or at random, and seeing where that takes you. We’ve already met some examples of this: bots tweeting out random collection items; Headline Roulette presenting a random newspaper article. In Technologies of Serendipity, Paul Fyfe describes the potential value of serendipity for researchers and cites a number of examples (a few will be familiar!).
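At its simplest, this kind of random discovery is just picking an item without asking the user for a query. Here is a minimal sketch, assuming you already have a list of items to choose from. The titles are invented, and the bots and Headline Roulette may well work quite differently.

```python
import random

# Invented item titles standing in for a real collection.
collection = [
    "Photograph of a country show",
    "Newspaper article on a flood",
    "Hand-drawn map of a goldfield",
    "Letter from a soldier",
]


def random_item(items, rng=random):
    """Pick one item at random; no search term required."""
    return rng.choice(items)


# A seeded generator makes the pick repeatable while you experiment.
rng = random.Random(42)
pick = random_item(collection, rng)
print(pick)
```

Leave out the seeded generator (just call `random_item(collection)`) to get a different surprise each time.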

For some serendipity in action, try out Serendip-o-matic. Just copy and paste some text, click the button and let the hippo do the rest.

What do you see? Where do the results come from?


Another way in which our perspective on collections can be skewed is through restrictions on access.

Do you remember QueryPic, my tool to visualise queries in Trove’s digitised newspapers? Have a look again at one of the graphs. Why do you think it ends in 1954? Did newspapers stop publishing? Did nothing interesting happen after 1954?

The answer, as you may suspect, is copyright. I explore this more in Asking better questions: History, Trove and the risks that count:

Why 1954? We’re currently about halfway through the great AUSFTA culture drought. On 1 January 2005, the Australia–US Free Trade Agreement (AUSFTA) extended the standard period of copyright protection from fifty to seventy years and changed the way photographs are treated. We might have to wait until 2025 before Trove’s newspapers can start edging forward, year by year, beyond 1954.

But it’s even more complicated than that, as there’s no certainty that newspaper articles published before 1955 are out of copyright. To be sure, the National Library would have to investigate any named authors to confirm they all died before 1955. That’s simply impossible in a mass digitisation project. Instead the library weighed the copyright risks against the cultural benefit and decided to proceed. If the library had been more cautious, if the risks or uncertainties had seemed too great, Trove would have no digitised newspapers. And we would all be poorer.

These types of judgements are made all the time by cultural organisations wanting to open online access to their collections. Libraries, archives and museums are full of so-called ‘orphan’ works, whose creators cannot be identified or located. There’s no risk-free way of making this content available online.

And to make matters even worse, unpublished manuscripts – like the diaries and letters held by archives and libraries – are in perpetual copyright. Yep – they’re in copyright FOREVER! So legally you can only put them online if you get the permission of the heirs of whoever wrote them in the first place. And that’s not easy.

Last year librarians around Australia showed the absurdity of this by cooking cakes from copyright-bound recipes.

If you’re interested in the complexities and absurdities of Australian copyright law, and particularly how it inhibits activity online, have a look at the Australian Digital Alliance website.

Perhaps your project could explore the impact of copyright on the display of Australian collections online.


Each year the Digital Public Library of America and DigitalNZ hold a competition to promote access to copyright-free or openly licensed images in their collections. Last year Trove and Europeana joined in as well.

The aim of the competition is to find interesting images and then use them to create animated gifs. Here are last year’s entries.

The competition site includes a list of open images to explore. You can also use this handy list of Reusable images in Trove.

The competition will be on again this year, so why not get in some practice? DigitalNZ provides a series of tutorials on making animated gifs – have a go and see what you can create. Share the results on Slack. I might even find a prize!