Major research projects

My current research projects are documented in detail in my open Research Notebook.

Digital tools, sites and data – 2018

If you use any of my digital projects, you might like to support their continued development on Patreon.

You’ll also find some project updates on the 101 Digital Heritage Hacks Facebook page.

Digital tools, sites and data – 2017

For a possibly more complete list, see ‘2017 – The making and the talking’.

  • Explore Trove’s Digitised Journals
    There’s lots of exciting new digitised content being added to Trove’s journals zone, but it’s not always easy to find and search. This app lists journals that have been digitised by the NLA and have searchable records for individual articles. This means you can search inside the journal, just like you do in the newspapers zone. The code for the harvester and web app is on GitHub.

  • Real Face of White Australia – Data
    Regular updates of data generated through the Real Face of White Australia project, which is transcribing records from the National Archives of Australia. Includes CSV files with transcribed fields, as well as photos and handprints.

  • The Real Face of White Australia
    Help transcribe records that document the lives of ordinary people living under the restrictions of the White Australia Policy. This site was built using the Scribe Framework with the help of students from my Exploring Digital Heritage class.

  • Tribune Negatives
    The Tribune was a newspaper published by the Communist Party of Australia. The State Library of NSW holds more than 60,000 negatives and photos from the Tribune which document a wide range of political events and social issues from 1964 to 1991. This is a work-in-progress site documenting my exploration of the negatives as a DXLab Digital Drop-in.

  • Historic Hansard Search
    I finally added a decent full-text search facility to Historic Hansard. Yay! Search for speeches or bills. Filter by date, house, and speaker. There’s even an option to save your complete result set as a CSV file for further analysis and exploration.

  • DIY Redaction Art
    A repository of images in JPG and SVG format drawn from a collection of #redactionart discovered in ASIO surveillance files held by the National Archives of Australia. Use them to create your own #redactionart projects!

  • The Redaction Zoo
    This collection of creatures was discovered amidst thousands of ASIO surveillance files held by the National Archives of Australia. While the practice of redaction is intended to withhold information from public view, an unknown archivist has used redactions to add an artistic flourish to the files. They are reminders that the processes that limit our access to information are human in their operation and design. There is nothing magical about the ‘secrets’ preserved in government archives. Video first exhibited as part of ‘Beauties and Beasts’, Belconnen Arts Centre, 6–28 May 2017.

  • The language of Hansard
    Code and work-in-progress website exploring the language of Hansard.

  • Real words :: Imagined tweets
    Have you ever wondered how the cut and thrust of parliaments past might translate to the world of social media? Wonder no longer, for here you can explore interjections in the Australian parliament from 1901 to 1980, reimagined as tweets. You might even find some emoji…

  • Closed Access dataset
    2017 update! Complete dataset of records held by the National Archives of Australia that had the access status of ‘closed’ (withheld from public access) on 9 January 2017.

Digital tools, sites and data – 2016

  • DFAT Documents
    Demonstration code to harvest the Department of Foreign Affairs and Trade’s collection of historical documents and extract some metadata. The harvested documents are available in Markdown format and can be explored through a simple website.

  • People of Australia
    @people_aus is a Twitter bot sharing random names drawn from late 19th and early 20th century naturalisation records held by the National Archives of Australia. Many names. Many cultures. These are the people of Australia.

  • RecordSearch Series Harvests
    Code to harvest the metadata and digitised images of all items in a series from the National Archives of Australia. Data from an assortment of harvested series are available as CSV files.

  • SRNSW indexes
    Code for harvesting indexes from the State Records of NSW website. Data from 59 harvested indexes are available as CSV files.

  • Facial detection demo
    Code and website to demonstrate the principles of facial detection using OpenCV.

  • Show Redactions userscript
    Code for inserting details of redacted files into RecordSearch results.

  • ASIO Experiments
    Code used for the extraction of redactions and other experiments with digitised ASIO files.

  • Redactions dataset
    Redactions extracted from ASIO surveillance records in National Archives of Australia Series A6119, <https://dx.doi.org/10.6084/m9.figshare.4101765.v1>

  • Non-redactions dataset
    False positives (non-redactions) extracted from ASIO surveillance records in National Archives of Australia Series A6119, <https://dx.doi.org/10.6084/m9.figshare.4104651.v1>

  • Redacted
    Web interface for exploring redactions extracted from digitised ASIO files. Includes a collection of redaction art.

  • Open with Exception browser
    Code and website providing an experimental browser for digitised ASIO files from the National Archives of Australia.

  • Invisible Australians browser
    Updated code and website providing an experimental browser for digitised records from the National Archives of Australia relating to the administration of the White Australia Policy. Now includes a landscape view for exploring records by their orientation.

  • Closed Access harvester
    Updated code for harvesting and analysing records from the National Archives of Australia with the access status of ‘closed’.

  • Closed Access dataset
    Complete dataset of records held by the National Archives of Australia that had the access status of ‘closed’ (withheld from public access) on 1 January 2016.

  • Closed Access website
    Public web interface for the exploration, analysis, and visualisation of ‘closed’ records in the National Archives of Australia.

  • RecordSearch Functions
    Code and documentation for analysing the performance of functions by Commonwealth government agencies over time, using data from the National Archives of Australia.

  • Commonwealth Hansard XML repository
    A repository of the (almost) complete proceedings of the Commonwealth House of Representatives and Senate from 1901 to 1980. This comprises several gigabytes of XML-formatted files harvested from the ParlInfo database.

  • Historic Hansard
    A public website that presents the proceedings of the Commonwealth House of Representatives and Senate from 1901 to 1980 in a form that is optimised for browsing and reading. It includes additional features such as indexes to people and legislation, and the integration of tools for text analysis and annotation. Documentation is also provided.

  • Trove Harvester
    Code and documentation to support the creation of large datasets for research and analysis from Trove’s digitised newspapers.

  • Gadfly front pages
    Code and documentation to demonstrate how to harvest page images from Trove’s digitised newspapers.

  • Trove Proxy
    Code and active proxy service that generates links to download PDFs from Trove’s digitised newspapers, and provides an HTTPS wrapper around the Trove API.

  • DIY Headline Roulette
    Code and documentation that makes it easy for anyone to create their own simple game using Trove’s digitised newspapers.

  • Radio National program data
    Updated dataset of programs broadcast on Radio National from 2000 to 2016, harvested from Trove.

  • PMs Transcripts repository
    Repository of more than 20,000 XML transcripts of speeches by Australian Prime Ministers harvested from the PMs Transcripts site.

  • UMA Ellis Photos
    Repository of data and images from a collection of political photos by John Ellis held by the University of Melbourne Archives. Harvested using the Trove API.