LODBook: The Next Generation

This is a much-overdue update on my LODBooks project, which aims to support the development of online historical narratives that embed rich structured data about people, places, events and resources.

My last LODBook demo looked pretty nice, and I showed it to lots of people who seemed to like it. But I wasn't happy, and as a result I've pretty much started again from scratch.

The problems

I used AngularJS to build the last demo. I really liked the way I could have basically everything in a single HTML page. Angular loaded all the embedded JSON-LD and used it to create a complete site structure with assorted fancy widgets.

But that was also the problem – the site structure and LOD identifiers were dependent on JavaScript. Added to that, I was having problems making the whole thing nicely responsive. And in the end, I felt that I just didn't really understand what Angular was doing, and that made me very uncomfortable.

So a new approach was needed…

Back to basics

When I started this project many years ago I set myself a few rules.

  • Simple tools: anyone with a text editor should be able to do it.
  • No platforms: no sneaky server-side stuff; it all had to happen in the browser, on the fly.
  • No markup madness: I wanted a close relationship between the text and the data, but I also wanted the markup process to be practical, something like creating a footnote.

I decided to focus again on the idea of simplicity – just HTML pages that could be LOD-ified by anyone with a text editor. Just HTML pages that weren't dependent on fancy JavaScript libraries for a decent user experience.

I’ve been playing around with Jekyll a lot over the last year or so – using it to create this Research Notebook, my Digital Heritage Handbook, and even Historic Hansard. I’m not sure whether Jekyll counts as a platform or not, but it produces plain old static HTML files that can be uploaded to any web server.

I also came across Ed. – ‘a Jekyll Theme for Minimal Editions’ – created by Alex Gil and team. As the GitHub site says, Ed is ‘designed for textual editors based on minimal computing principles, and focused on legibility, durability, ease and flexibility’. And on top of all that it looks beautiful. It seemed like a perfect fit for what I was trying to do, so I started building on top of it.

Where I’m up to

I haven’t had a lot of time to work on the project lately, and I’ve also had to pick up some Ruby to work with Jekyll. But a few basic bits and pieces are in place.

From an author’s point of view there will be two things to create and edit – texts and data. The texts are in Markdown format, and the data (at least in the current version) is in a YAML file. Yep – all you need is a text editor.

It’s expected that every item in the data.yml file will have a name and a collection. Collections are things like ‘people’, ‘organisations’, ‘places’, ‘resources’, and ‘events’. You can add whatever else you want under the item’s data attribute.

- collection: people
  name: James Minahan
  data:
    birthDate: '1876-10-04'
    familyName: Minahan
    givenName:
      - James
      - Francis
      - Kitchen
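Because the whole thing rests on that convention, a quick check can save some head-scratching. Here's a minimal sketch, not part of LODBook itself, that loads items like the one above with Ruby's standard YAML library and reports any that lack a name or a collection. Loading the file this way also catches indentation mistakes such as tabs, which YAML doesn't allow. The check_items helper is hypothetical.

```ruby
require 'yaml'

# Every item needs these; anything else belongs under the optional data key.
REQUIRED_KEYS = %w[name collection].freeze

# Returns a list of human-readable problems; an empty list means all is well.
def check_items(items)
  problems = []
  items.each_with_index do |item, i|
    REQUIRED_KEYS.each do |key|
      value = item[key]
      problems << "item #{i}: missing #{key}" if value.nil? || value.to_s.strip.empty?
    end
    if item.key?('data') && !item['data'].is_a?(Hash)
      problems << "item #{i}: data should be a mapping"
    end
  end
  problems
end

# Sample input: the first item follows the convention, the second lacks a collection.
yaml = <<~YAML
  - collection: people
    name: James Minahan
    data:
      birthDate: '1876-10-04'
      familyName: Minahan
      givenName:
        - James
        - Francis
        - Kitchen
  - name: Quong Hing Yeong
YAML

items = YAML.safe_load(yaml)
puts check_items(items)
```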

You can also set some defaults for each collection in Jekyll’s _config.yml file.

data_types:
  people:
    template: 'person'
    dir: 'people'
    type: 'http://schema.org/Person'
  organisations:
    template: 'organisation'
    dir: 'organisations'
    type: 'http://schema.org/Organization'
  places:
    template: 'place'
    dir: 'places'
    type: 'http://schema.org/Place'
  events:
    template: 'event'
    dir: 'events'
    type: 'http://schema.org/Event'
  resources:
    template: 'resource'
    dir: 'resources'
    type: 'http://schema.org/CreativeWork'

As you can see, I’ve used Schema.org classes and properties so far, but I’m intending to make it possible to include other vocabularies.
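To make the role of those defaults concrete, here's a small sketch of how an item's collection might be resolved against data_types to find its template, output folder and Schema.org class. It's an illustration under assumptions, not the LODBook code: collection_settings is a hypothetical helper, and the config is a trimmed copy of the one above.

```ruby
require 'yaml'

# A trimmed copy of the data_types section from _config.yml.
CONFIG = YAML.safe_load(<<~YAML)
  data_types:
    people:
      template: 'person'
      dir: 'people'
      type: 'http://schema.org/Person'
    organisations:
      template: 'organisation'
      dir: 'organisations'
      type: 'http://schema.org/Organization'
YAML

# Hypothetical helper: look up the per-collection defaults for one data item.
# Raises KeyError if the item names a collection with no data_types entry.
def collection_settings(item, data_types = CONFIG['data_types'])
  data_types.fetch(item['collection']) do |key|
    raise KeyError, "no data_types entry for collection #{key.inspect}"
  end
end

item = { 'collection' => 'people', 'name' => 'James Minahan' }
settings = collection_settings(item)
puts "#{item['name']}: template=#{settings['template']}, dir=#{settings['dir']}, @type=#{settings['type']}"
```

Failing loudly on an unknown collection seems better than silently producing a page with no template or type.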

I'm using a modified version of the Jekyll Data Pages Generator to automatically create individual pages for every entity in the data.yml file. So once you build your site, there'll be a folder for each collection.
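The page URLs end up looking like /lodbook-ed/people/james-minahan/ (you can see one in the link markup further down). Here's a rough sketch of how a name and a collection folder could be turned into such a path. The slug rule is my own approximation, not necessarily what the Data Pages Generator does, and slugify and entity_url are hypothetical names.

```ruby
# Rough slug: lowercase, each run of characters other than a-z/0-9 becomes one
# hyphen, and leading/trailing hyphens are trimmed. Non-ASCII letters are
# replaced too, so this only approximates what a real generator might do.
def slugify(name)
  name.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
end

# Hypothetical: dirs maps collection => output folder (the `dir` setting in data_types).
def entity_url(item, dirs, baseurl = '')
  "#{baseurl}/#{dirs.fetch(item['collection'])}/#{slugify(item['name'])}/"
end

DIRS = { 'people' => 'people', 'organisations' => 'organisations' }.freeze

puts entity_url({ 'collection' => 'people', 'name' => 'James Minahan' }, DIRS, '/lodbook-ed')
# → /lodbook-ed/people/james-minahan/
```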

To make the links between the texts and the data I've created a few filters. Given my rudimentary knowledge of Ruby, these are probably pretty dodgy, but they seem to work. The basic idea is that you mark up the text to create a link with the data item, so for example:

The letter {{"James Minahan" | lod_link: "" }} was waiting for had arrived at last.
Sent from the firm of {{"Quong Hing Yeong" | lod_link: ""}} in Hong Kong, the letter
contained the news that having received a remittance of HK$200 from Australia they could
now book him a passage to Melbourne, the city of his birth.

The lod_link filter looks for a data object that matches the marked-up entity and inserts an HTML link. So the text above becomes:

<p id="para-0">The letter <a data-name="James Minahan" data-collection="people" property="name" href="/lodbook-ed/people/james-minahan/">
James Minahan</a> was waiting for had arrived at last. Sent from the firm of
<a data-name="Quong Hing Yeong" data-collection="organisations" property="name" href="/lodbook-ed/organisations/quong-hing-yeong/">
Quong Hing Yeong</a> in Hong Kong, the letter contained the news that having received a
remittance of HK$200 from Australia they could now book him a passage to Melbourne, the city of his birth.</p>

You'll notice that it also adds HTML data attributes for the name and collection. These will be handy later on. You only need to mark up each name once – another filter will work through the whole page, look for any other mentions of the same entity, and add links. To link name variations to a single entity, you just pass the entity's name to the lod_link filter as a parameter. So to link all references to 'James' back to the person 'James Minahan', you just include:

{{"James" | lod_link: "James Minahan" }}
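Here's a minimal sketch of how a lod_link-style filter might do that lookup, covering both the plain case and the name-variation parameter. It's not the LODBook code. It assumes the items come from _data/data.yml (so Jekyll exposes them as site.data['data']), that each collection's output folder has the same name as the collection, and it uses its own rough slug rule. LodLinkSketch and LinkDemo are hypothetical; LinkDemo just stands in for Jekyll so the filter can run on its own.

```ruby
require 'erb'

# Hypothetical lod_link-style Liquid filter. In Jekyll it could be registered
# with Liquid::Template.register_filter(LodLinkSketch).
module LodLinkSketch
  # Assumption: the items live in _data/data.yml, i.e. site.data['data'].
  def lod_items
    @context.registers[:site].data['data']
  end

  def lod_baseurl
    @context.registers[:site].config['baseurl'].to_s
  end

  # input: the text as written; entity_name: optional name of the entity it stands for.
  def lod_link(input, entity_name = '')
    target = entity_name.to_s.empty? ? input : entity_name
    item = lod_items.find { |i| i['name'] == target }
    return input if item.nil? # no matching data item: leave the text unlinked

    # Assumption: output folder == collection name; rough slug of the entity's name.
    slug = item['name'].downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
    attrs = {
      'data-name' => item['name'],
      'data-collection' => item['collection'],
      'property' => 'name',
      'href' => "#{lod_baseurl}/#{item['collection']}/#{slug}/"
    }.map { |key, value| "#{key}=\"#{ERB::Util.html_escape(value)}\"" }.join(' ')
    "<a #{attrs}>#{ERB::Util.html_escape(input)}</a>"
  end
end

# Stand-in for Jekyll: supplies the data and baseurl directly.
class LinkDemo
  include LodLinkSketch

  def lod_items
    [{ 'collection' => 'people', 'name' => 'James Minahan' },
     { 'collection' => 'organisations', 'name' => 'Quong Hing Yeong' }]
  end

  def lod_baseurl
    '/lodbook-ed'
  end
end

demo = LinkDemo.new
puts demo.lod_link('James Minahan')
puts demo.lod_link('James', 'James Minahan')
```

Liquid passes the piped text as the first argument and the quoted parameter as the second, so an empty parameter means "look up the text itself".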

There are several other filters that do useful things. I'm pretty sure some of these would be better as either tags or hooks, but I don't understand Jekyll well enough yet to figure this out. And anyway, it works.

lod_ids and lod_labels

{{ content | lod_ids | lod_labels }}

These two filters operate on the page content. lod_ids just adds numbered ids (like para-0) to every paragraph and blockquote tag; I'll be using these later on. lod_labels looks for links already created by the lod_link filter, finds any matching strings (i.e. names) elsewhere in the text, and adds links to them as well.
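A minimal sketch of how the two could work, under a few assumptions of mine: paragraphs and blockquotes share one counter, tags that already have an id are left alone, and lod_labels works on the HTML with regular expressions rather than a proper parser. It splits the HTML into tags and text so that names already inside a link aren't linked twice, and tries longer labels first so "James Minahan" wins over "James". These are illustrations, not the LODBook filters.

```ruby
# Give each <p> and <blockquote> a numbered id (para-0, para-1, ...)
# unless it already has one. Assumption: one shared counter for both tags.
def lod_ids(html)
  counter = -1
  html.gsub(/<(p|blockquote)(\s[^>]*)?>/) do |whole|
    tag = Regexp.last_match(1)
    attrs = Regexp.last_match(2).to_s
    next whole if attrs.match?(/(?:\A|\s)id\s*=/) # keep existing ids
    counter += 1
    "<#{tag} id=\"para-#{counter}\"#{attrs}>"
  end
end

# Link further mentions of names that lod_link has already linked once.
def lod_labels(html)
  # label => opening <a> tag for every link carrying a data-name attribute
  links = {}
  html.scan(/(<a\s[^>]*data-name="[^"]*"[^>]*>)([^<]+)<\/a>/) do |open_tag, label|
    links[label.strip] ||= open_tag
  end
  return html if links.empty?

  pattern = Regexp.union(links.keys.sort_by { |l| -l.length }) # longest first
  inside_link = false
  html.split(/(<[^>]+>)/).map do |part|
    if part.start_with?('<') # a tag: track whether we are inside a link
      inside_link = true if part.match?(/\A<a[\s>]/)
      inside_link = false if part.match?(%r{\A</a\s*>})
      part
    elsif inside_link
      part
    else
      part.gsub(/\b#{pattern}\b/) { |m| "#{links[m]}#{m}</a>" }
    end
  end.join
end

html = '<p>The letter <a data-name="James Minahan" data-collection="people" ' \
       'property="name" href="/people/james-minahan/">James Minahan</a> had arrived.</p>' \
       '<p>James Minahan read it.</p>'
puts lod_labels(lod_ids(html))
```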

lod_mentions

{{ page | lod_mentions }}

Generates JSON-LD that lists the entities mentioned in a text. This builds a semantic relationship between the text and the things it talks about, and provides an opportunity for crawlers to find the entity pages. The JSON-LD is neatly packaged inside a script tag.
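For a sense of what that output might look like, here's a sketch that collects the data-name, data-collection and href attributes from the links in a page and wraps them in a Schema.org CreativeWork using the mentions property. It assumes page['content'] already holds the HTML produced by lod_link and page['title'] the text's title; TYPES mirrors the data_types config, and the whole thing is an illustration rather than the LODBook filter.

```ruby
require 'json'

# Hypothetical mapping from collection to Schema.org type, mirroring data_types.
TYPES = {
  'people' => 'Person', 'organisations' => 'Organization', 'places' => 'Place',
  'events' => 'Event', 'resources' => 'CreativeWork'
}.freeze

# Assumption: page['content'] is the HTML produced by lod_link.
def lod_mentions(page)
  mentions = page['content'].scan(/<a\s[^>]*data-name="[^"]*"[^>]*>/).map { |tag|
    attrs = tag.scan(/([\w-]+)="([^"]*)"/).to_h
    { '@type' => TYPES.fetch(attrs['data-collection'], 'Thing'),
      '@id' => attrs['href'],
      'name' => attrs['data-name'] }
  }.uniq # each entity once, however often it is linked
  doc = {
    '@context' => 'http://schema.org',
    '@type' => 'CreativeWork',
    'name' => page['title'],
    'mentions' => mentions
  }
  # Escape "</" so the JSON can't close the surrounding script element early.
  json = JSON.pretty_generate(doc).gsub('</', '<\/')
  %(<script type="application/ld+json">\n#{json}\n</script>)
end

content = '<p><a data-name="James Minahan" data-collection="people" property="name" ' \
          'href="/people/james-minahan/">James Minahan</a> and ' \
          '<a data-name="James Minahan" data-collection="people" property="name" ' \
          'href="/people/james-minahan/">James</a></p>'
puts lod_mentions({ 'title' => 'A letter', 'content' => content })
```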

lod_references

{{ page | lod_references }}

Loops through each paragraph and creates a JavaScript object that lists the entities mentioned in each paragraph (using the ids created by lod_ids). This can then be embedded in the page to enable some fancy javascripty interfacey stuff.
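Here's a rough sketch of the idea, working on rendered HTML rather than the page object: it pairs each paragraph or blockquote id with the data-name values of the links inside it and writes the result out as a JavaScript variable. The variable name lodReferences is hypothetical, paragraphs with no mentions are skipped, and a simple regular expression won't cope with paragraphs nested inside blockquotes.

```ruby
require 'json'

# Map each paragraph/blockquote id to the entity names linked inside it.
# var_name is a hypothetical JavaScript variable name.
def lod_references(html, var_name = 'lodReferences')
  refs = {}
  html.scan(%r{<(p|blockquote)\s[^>]*id="([^"]+)"[^>]*>(.*?)</\1>}m) do |_tag, id, inner|
    names = inner.scan(/data-name="([^"]*)"/).flatten.uniq
    refs[id] = names unless names.empty?
  end
  # Escape "</" so the output is safe inside a script element.
  "var #{var_name} = #{JSON.generate(refs).gsub('</', '<\/')};"
end

html = '<p id="para-0">The letter <a data-name="James Minahan" data-collection="people" ' \
       'property="name" href="/people/james-minahan/">James Minahan</a> came from ' \
       '<a data-name="Quong Hing Yeong" data-collection="organisations" property="name" ' \
       'href="/organisations/quong-hing-yeong/">Quong Hing Yeong</a>.</p>' \
       '<p id="para-1">No names here.</p>'
puts lod_references(html)
```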

jsonldify

{{ page | jsonldify }}

Generates JSON-LD describing an individual entity for embedding on the page about that thing.
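A sketch of the kind of JSON-LD this might produce for James Minahan's page: the Schema.org class comes from the collection, the name from the item, and everything under the item's data key is copied across. It assumes the entity page exposes the item's name, collection and data hash; SCHEMA_TYPES mirrors the data_types config, and the url argument used for @id is a hypothetical addition.

```ruby
require 'json'

# Mirrors the `type` values in the data_types section of _config.yml.
SCHEMA_TYPES = {
  'people' => 'http://schema.org/Person',
  'organisations' => 'http://schema.org/Organization',
  'places' => 'http://schema.org/Place',
  'events' => 'http://schema.org/Event',
  'resources' => 'http://schema.org/CreativeWork'
}.freeze

# Assumption: page carries the data item's name, collection and data hash.
def jsonldify(page, url = nil)
  doc = {
    '@context' => 'http://schema.org',
    '@type' => SCHEMA_TYPES.fetch(page['collection'], 'http://schema.org/Thing'),
    'name' => page['name']
  }
  doc['@id'] = url if url
  # Copy the item's properties (birthDate, familyName, ...) straight across.
  doc.merge!(page['data'] || {})
  json = JSON.pretty_generate(doc).gsub('</', '<\/')
  %(<script type="application/ld+json">\n#{json}\n</script>)
end

james = {
  'collection' => 'people',
  'name' => 'James Minahan',
  'data' => { 'birthDate' => '1876-10-04', 'familyName' => 'Minahan',
              'givenName' => %w[James Francis Kitchen] }
}
puts jsonldify(james, '/lodbook-ed/people/james-minahan/')
```

Because the YAML uses Schema.org property names, copying the data hash unchanged is enough for a linter to pick up birthDate, familyName and the list of given names.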

lod_list

{{ page.data.spouse | lod_list: "Spouse" }}

Generates an HTML list of related entities.
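A minimal sketch of such a list, assuming the property holds either a single name or a list of names. Names that match a data item are linked using the same rough path rule as in the earlier sketches; anything else is left as plain text. The items and baseurl arguments are hypothetical extras so the function can run outside Jekyll, and the heading-plus-list markup is my own choice.

```ruby
require 'erb'

# names: a single name or an array of names (e.g. page.data.spouse)
# label: heading text; items and baseurl are hypothetical extras for standalone use.
def lod_list(names, label, items, baseurl = '')
  names = Array(names).compact
  return '' if names.empty?

  lis = names.map do |name|
    item = items.find { |i| i['name'] == name }
    text = ERB::Util.html_escape(name)
    if item
      slug = name.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
      href = "#{baseurl}/#{item['collection']}/#{slug}/"
      "<li><a href=\"#{ERB::Util.html_escape(href)}\">#{text}</a></li>"
    else
      "<li>#{text}</li>" # no matching data item: plain text
    end
  end
  "<h3>#{ERB::Util.html_escape(label)}</h3>\n<ul>\n#{lis.join("\n")}\n</ul>"
end

items = [{ 'collection' => 'organisations', 'name' => 'Quong Hing Yeong' }]
puts lod_list(['Quong Hing Yeong'], 'Related organisations', items)
```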

A bare-bones demo

No fancy javascripty stuff yet, but you can get an idea of how the relationship between text and data works on the GitHub pages site. You can also browse the code.

The templates for each of the collection types aren’t finished yet and there’s lots of other stuff missing. But there are basic browse pages for the collections in the side nav.

If you feed a page to the Structured Data Linter, you'll see that the JSON-LD seems to be getting extracted. Here's James Minahan's page, for example. That makes me happy.

When I’m satisfied with the basics, I’ll start playing with the interface a bit and reintroduce the walls, maps and timelines. A long way to go, but I feel like things are on a much more useful and sustainable footing.