Tribune negatives metadata and licensing


Herewith some preliminary notes based on my poking around in the metadata. There may be errors or misunderstandings…

  • I’ve harvested 1,792 item records from the Tribune negatives collection at the SLNSW. There are 58,515 images attached to the item records.
  • All the item records are available in this CSV file.
  • Catalogue metadata is made available by the SLNSW under a CC0 Public Domain Dedication.
  • Images are made available under a CC-BY licence by the SEARCH Foundation and the SLNSW.
  • Make sure you acknowledge the source of the data and images appropriately!

Arrangement and description

The Tribune collection at the State Library of NSW includes negatives and photographs from 1964 to 1991. The catalogue description estimates there are 62,000 negatives and 2,533 photos. According to my most recent harvest of the negatives, 58,515 images have been digitised and made available online.

The collection is divided into 45 series:

  • Series 1 to 4 contain negatives
  • Series 5 to 44 contain photographs
  • Series 45 contains manuscript materials

The negative series cover the following date ranges. Series 1 to 3 are further divided into parts:

  • Series 1 (1963-1972): parts 1 to 15
  • Series 2 (1974-1977): parts 1 to 6
  • Series 3 (1978-1990): parts 1 to 17
  • Series 4 (c. 1970-1988): not divided into parts

Here’s a CSV providing series-level metadata.

Each of the negative series (or series parts) has child ‘items’. I think these ‘items’ correspond to a strip or sleeve of negatives. Most items have around 30 to 40 images attached, though the number ranges from 1 to 91.
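If you want to check the spread of images per item yourself, here is a minimal sketch that summarises the number_images column of the harvested CSV (described under Harvest details below). It assumes rows come from csv.DictReader, so values are strings, and it skips any row whose level is SERIES. The function name is hypothetical.

```python
from statistics import mean


def image_count_summary(rows):
    """Summarise the number_images column across item records.

    Rows without a 'level' value are treated as items.
    """
    counts = [
        int(row["number_images"])
        for row in rows
        if row.get("level", "ITEM") == "ITEM"
    ]
    return {
        "items": len(counts),
        "images": sum(counts),
        "min": min(counts),
        "max": max(counts),
        "mean": round(mean(counts), 2),
    }
```

The totals from the harvest give an average of about 32.7 images per item (58,515 ÷ 1,792), which fits the 30 to 40 range.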

So the descriptive hierarchy is:

Series (or series part)
    -- Items
        -- Images
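One way to rebuild this hierarchy from the flat CSV is to group item records by series_number and part_number. This is a minimal sketch, assuming part_number is blank for series 4 items (which aren't divided into parts); those items end up under an empty-string part key. The function name is hypothetical.

```python
from collections import defaultdict


def build_hierarchy(rows):
    """Group item records as series -> part -> list of item numbers."""
    tree = defaultdict(lambda: defaultdict(list))
    for row in rows:
        # Skip series-level records; only items hang off the tree
        if row.get("level", "ITEM") != "ITEM":
            continue
        series = row["series_number"]
        part = row.get("part_number") or ""  # assumed blank for series 4
        tree[series][part].append(row["item_number"])
    # Convert the nested defaultdicts to plain dicts for easier printing
    return {series: dict(parts) for series, parts in tree.items()}
```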

Most of the descriptive metadata is at the item level. This includes things like:

  • title
  • date (string and structured)
  • subjects (controlled list of subject headings)
  • names (controlled list of people and organisations)
  • places
  • topics (subject tags)

The application of subjects and names doesn’t seem to be comprehensive. There’s also no way of knowing which of the subjects, names, places, and topics applies to which of the images grouped under the item.
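To put a number on how patchy the subject and name headings are, you could measure the share of item records that have at least one value in each field. A minimal sketch, using the column names from the harvested CSV (names are in the people column); the function name is hypothetical.

```python
def field_coverage(rows, fields=("subjects", "people", "places", "topics")):
    """Return the share of records with at least one value in each field."""
    total = len(rows)
    coverage = {}
    for field in fields:
        filled = sum(1 for row in rows if (row.get(field) or "").strip())
        coverage[field] = filled / total if total else 0.0
    return coverage
```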

There’s a free-text description field that provides more fine-grained information. It has either been added by volunteer cataloguers or transcribed from the negative sleeves or indexes. Sometimes these descriptions are numbered, and I need to check whether these numbers can be reliably matched with individual images. The descriptions contain many more names, dates, places, and events than the formal subject categories, so one thing to explore is whether structured data (such as names) can be accurately extracted from them.
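As a first step towards matching numbered descriptions with images, here is a minimal sketch that splits a description into numbered entries. It assumes that line breaks are stored as pipes (as in the harvested CSV) and that a numbered line starts with digits, optionally followed by "." or ")". Both the pattern and the function name are guesses rather than documented conventions, and a line beginning with a year (e.g. "1975 May Day") would be misread as entry 1975.

```python
import re

# Assumed convention: a line starting with a number, optionally followed by "." or ")"
NUMBERED_LINE = re.compile(r"^\s*(\d+)[.)]?\s+(.*\S)\s*$")


def numbered_entries(description):
    """Split a pipe-delimited description into (number, text) pairs.

    Lines that don't start with a number are joined to the previous
    numbered entry; any text before the first number is ignored.
    """
    entries = []
    for line in description.split("|"):
        match = NUMBERED_LINE.match(line)
        if match:
            entries.append((int(match.group(1)), match.group(2)))
        elif entries and line.strip():
            number, text = entries[-1]
            entries[-1] = (number, f"{text} {line.strip()}")
    return entries
```

Comparing the highest entry number with the item's number_images value would be a quick first test of whether the numbering lines up with the images.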

Harvest details

I’ve harvested the metadata for all items in series 1 to 4. This comprises 1,792 item records, and 58,515 images. I’ve written this data to a CSV file with the following fields:

  • series_number: 01 to 04
  • part_number: Consecutive numbers; series 01 to 03 are broken into parts of about 50 items each
  • item_number: Consecutive numbers within a series, e.g. series 1 includes item numbers from 001 to 739
  • level: SERIES or ITEM
  • object_number: This seems to be an old system identifier; you can use it to find an item in Trove (see below)
  • priref: Another system identifier; you can use this to construct a link to an item in the catalogue (see below)
  • intellectual_entity: Another system identifier; you can use this to construct a link to the image viewer (see below)
  • date_start: ISO-formatted datetime
  • date_end: ISO-formatted datetime
  • quantity: A string describing the number (and sometimes format) of negatives
  • url: A link to the item in the catalogue
  • parent_url: A link to the parent record (i.e. the series or series part)
  • number_images: An integer generated by my harvesting script that represents the number of image identifiers attached to the item record
  • images: A pipe-separated list of image identifiers; see below for how you can use these to create links
  • description: Detailed notes from volunteers or transcribed from negative sleeves; line breaks are replaced with pipe characters
  • subjects: A pipe-separated list of subject headings
  • topics: A pipe-separated list of topics (subject tags)
  • people: A pipe-separated list of people and organisations
  • places: A pipe-separated list of places
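Here is a minimal sketch of loading the CSV into more convenient Python types. It assumes the file has a header row using the field names listed above and that empty fields are empty strings. Because the exact datetime format isn't specified, it converts a trailing "Z" to "+00:00" so that datetime.fromisoformat also works on Python versions before 3.11. The function names and the file path in load_items are hypothetical.

```python
import csv
from datetime import datetime

PIPE_FIELDS = ("images", "subjects", "topics", "people", "places")


def split_pipes(value):
    """Turn a pipe-separated string into a list, dropping empty values."""
    return [part.strip() for part in (value or "").split("|") if part.strip()]


def parse_date(value):
    """Parse an ISO datetime string, returning None for empty values."""
    if not value:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_row(row):
    """Convert one CSV row into a dict of lists, ints and datetimes."""
    parsed = dict(row)
    for field in PIPE_FIELDS:
        parsed[field] = split_pipes(row.get(field))
    parsed["number_images"] = int(row.get("number_images") or 0)
    parsed["date_start"] = parse_date(row.get("date_start"))
    parsed["date_end"] = parse_date(row.get("date_end"))
    # Restore the line breaks that were replaced with pipes
    parsed["description"] = (row.get("description") or "").replace("|", "\n")
    return parsed


def load_items(path):
    """Read the harvested CSV and parse every row."""
    with open(path, newline="", encoding="utf-8") as f:
        return [parse_row(row) for row in csv.DictReader(f)]
```

A quick sanity check after loading is that number_images equals the length of the parsed images list for every row.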

As noted above, the identifiers can be used to create a variety of handy links.

  • object_number: Trove uses this to construct its harvest identifiers, so if you search for it in the Trove Pictures Zone you should find the item.
  • priref: Use this to construct a link to the catalogue entry for the item. (This is how the url value is obtained.)
  • intellectual_entity: Use this to construct a link to the image viewer.
  • images: Each pipe-separated identifier can be used to create a link to that image in the image viewer.
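Because each link is just an identifier dropped into a URL pattern, building links in bulk takes only a few lines. This is a minimal sketch: the function names are hypothetical, and the URL patterns themselves aren't reproduced here, so you need to supply a pattern containing the placeholder {identifier}. The example.org address in the test is a placeholder, not a real SLNSW URL.

```python
def link_from_pattern(identifier, pattern):
    """Insert an identifier into a URL pattern containing '{identifier}'.

    Uses str.replace rather than str.format so that any other braces
    in the pattern are left untouched.
    """
    return pattern.replace("{identifier}", identifier)


def image_links(images_field, pattern):
    """Build one link per pipe-separated image identifier."""
    return [
        link_from_pattern(identifier, pattern)
        for identifier in images_field.split("|")
        if identifier
    ]
```

The same link_from_pattern helper works for priref and intellectual_entity values, given the matching catalogue or viewer pattern.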

Licensing and reuse (enjoy the Open Data Awesomeness!)

The SLNSW claims no copyright in its catalogue metadata, but for clarity makes it available under a CC0 public domain dedication. Wherever possible, of course, you should attribute the SLNSW as the source of the data. If you’re wondering about the ethical (rather than legal) obligations around attribution, you should read Dan Cohen on CC0 (+BY).

Copyright in the images from the Tribune negatives collection is held by the SEARCH Foundation. The images are made available for research and reuse under a CC-BY licence. You should always acknowledge both the SEARCH Foundation and the SLNSW.

If you make things with the data or images and share the results on social media, make sure you tag @statelibrarynsw and use the #MadewithSLNSW hashtag. And don’t forget #recordsofresistance for anything associated with this project!