Landscapes in the archives

Sometime last year I harvested data and images from a number of series in the National Archives that Kate had identified as relating to the administration of the White Australia Policy.

You can see the code and some summary data on GitHub.

I also created a simple interface that was intended to make it easy for Kate to visually scan the contents of the digitised files.

Birth certificate

Kate wanted to try and find birth certificates, which have quite a distinctive shape – they’re long and narrow. RecordSearch’s digitised file viewer has a ‘view multiple pages’ option that displays a grid of thumbnails, but they’ve all been cropped to squares, so the shape is lost. I simply displayed all the pages for a file in a grid of thumbnails that maintained their original proportions.
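To show what maintaining the original proportions means in practice, here is a rough Python sketch. It is not the code behind the interface: the function name `thumbnail_size` and the 200-pixel limit are assumptions made up for illustration. It scales both dimensions by the same factor, so a long, narrow page stays long and narrow.

```python
def thumbnail_size(width, height, max_side=200):
    """Scale (width, height) so the longer side equals max_side.

    Both dimensions are multiplied by the same factor, so the
    thumbnail keeps the proportions of the original image.
    max_side is a hypothetical limit chosen for this sketch.
    """
    scale = max_side / max(width, height)
    # Guard against a dimension rounding down to zero for extreme shapes.
    return max(1, round(width * scale)), max(1, round(height * scale))


# A long, narrow landscape page stays long and narrow.
wide = thumbnail_size(2000, 500)
# A portrait page stays portrait.
tall = thumbnail_size(500, 2000)
```

Cropping to a square would turn both of these into 200 by 200 thumbnails, which is exactly how the shape gets lost.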

That made it easier to spot the birth certificates as you browsed through files, but I also realised that I could simply query the database to return images whose height was less than their width – images with a landscape orientation.

It took me a while to get around to it, but I’ve just created a special view which displays all landscape images ordered by their height/width ratio, smallest first, so the longest and narrowest come first.

The query took a bit of fiddling with MongoDB’s aggregation framework, but in the end it was pretty easy.

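pipeline = [
    # Keep the fields needed for display and calculate each image's height/width ratio
    {'$project': {
        'identifier': 1,
        'page': 1,
        'series': 1,
        'control_symbol': 1,
        'image_path': 1,
        'ratio': {'$divide': ['$height', '$width']}}},
    # Keep only images that have a series and are wider than they are tall
    {'$match': {
        'series': {'$exists': True},
        'ratio': {'$lt': 1}}},
    # Smallest ratio first, so the longest and narrowest images lead
    {'$sort': {'ratio': 1}}
]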

The key thing is obviously calculating the ratio of height to width. This is then used in the ‘$match’ statement to find all images with a height to width ratio of less than one.
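If you don't have MongoDB handy, the same logic can be sketched in plain Python. This is only an approximation of the query, not the code the view uses. The field names follow the query, but the function name `landscape_images` and the sample records are invented for illustration.

```python
def landscape_images(images):
    """Return landscape images, longest and narrowest first.

    Roughly mirrors the three steps of the aggregation:
    calculate the ratio, filter on it, then sort by it.
    """
    results = []
    for image in images:
        # Like the '$exists' test: skip records without a 'series' field.
        if 'series' not in image:
            continue
        # Height divided by width; below 1 means wider than tall.
        ratio = image['height'] / image['width']
        if ratio < 1:
            results.append({**image, 'ratio': ratio})
    # Ascending order puts the smallest ratios first.
    return sorted(results, key=lambda image: image['ratio'])


# Hypothetical sample records, invented for this sketch.
sample = [
    {'identifier': 'a', 'page': 1, 'series': 'X1', 'width': 1000, 'height': 400},
    {'identifier': 'b', 'page': 1, 'series': 'X1', 'width': 600, 'height': 900},
    {'identifier': 'c', 'page': 2, 'series': 'X1', 'width': 1200, 'height': 300},
    {'identifier': 'd', 'page': 1, 'width': 1000, 'height': 100},
]
ordered = [image['identifier'] for image in landscape_images(sample)]
```

Here 'b' drops out because it is portrait, 'd' because it has no series, and 'c' comes ahead of 'a' because its ratio is smaller.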

Landscape view

The new view is effective in finding birth certificates – there are about 30 of them visible on the first page of results. But it’s also just an interesting perspective on the records as a whole. Kate and I have made a couple of other discoveries, simply because we’re seeing the records differently. Makes you wonder what other perspectives might be interesting or useful.

While I was fiddling with the digitised files browser I also added some extra functionality, cleaned up the design, and fixed some bugs. I’ll document all that shortly.