Linked Data and the Semantic Web: GLAM’s betting on the wrong horse?

4 comments

Since the advent of the internet many GLAM’s (Galleries, Libraries, Archives and Museums) seem to be struggling with the same issue: what’s the best way to publish data online in a format that can be easily used, reused and linked to?

Two technologies that are often named to solve this problem are Linked (Open) Data and the Semantic Web. The first is a set of principles, the second a set of technologies implementing those principles. Both are published and developed mostly by the World Wide Web Consortium (W3C) and promise to solve the ‘data problem’ on the web.

Because the W3C is a non-profit organization it is of little surprise that many GLAM’s see these technologies as the logical solution to their problems.

Although the intentions of the W3C are good i don’t think the solutions as envisioned by the W3C are the best match for GLAM’s. Frankly, the proposed technologies are mostly of little to no use for most GLAM’s at all. Let me explain why.

A solution looking for a problem

Here’s the main problem: the semantic webtechnologies are solutions looking for a problem. Instead of a solution for an actual problem (how can GLAM’s make their data easily available on the web?) the semantic web repurposes mostly obsolete technologies under the new moniker of ‘linked open data’ as the silver bullet to every technological problem.

Why are these technologies useless? Here’s the simple reason: how many succesful ‘semantic’ projects can you name that have broad adoption, not only with GLAM’s, but also at other web projects?

Well, i actually know of a few. But they’re not the ones advocated by the W3C.

Here’s one: Open Graph, also known as ‘the Facebook tags’, is a way to annotate a webpage with some simple metadata, like a thumbnail image and a type (such as ‘video.movie’). It’s widely used, by almost 50% of all shared webpages.

Do webmasters add these tags because they want to add ‘semantic meaning’ and ‘linked data’ to their websites? Of course not, they simply want to have a thumbnail and a proper description whenever somebody shares their pages on Facebook.

So, why does Open Graph works where the semantic technologies fail? It’s very simple: it solves a problem. Given that it’s very easy to implement and the #2 website in the world actively supports it is probably a reason of its success as well.

The hell of XML

Here’s another problem with the semantic technologies. All of the semantic technologies are based on XML, and developers hate XML.

So, what do developers want? Developers simply want what we all want: something simple and easy that works most of the time. That’s why virtually all new API’s have settled on JSON as the best data format instead of XML.

Yes, there are semantic formats in JSON, such as JSON-LD. Unfortunately, the origin of the format, which is XML, is clearly visible in the ‘translated syntax’, which is still unwieldly and unnecessary verbose.

The semantic technologies are tricky to implement, so highly skilled developers are necessary. Unfortunately most GLAM’s don’t employ those in abundance. GLAM’s that actually build their sites ‘in-house’ are a minority, they usually outsource to external web developers. Do you think the nerds there have ever heard of OWL or SPARQL? Nope. But they do know how to parse JSON and do HTTP calls for sure.

Standards and the W3C

But, the semantic web standards are written by the W3C, the place of Sir Tim Berners-Lee, the inventor of the world wide web, surely it must be good?

Unfortunately, most of the standards the W3C develops have little practical use on the ‘real web’. You know HTML5, the standard that gave us useful things like native video and app-like features on the web? None of that came from the W3C. If the W3C would have it their way we would be writing strictly valid, non-backwards compatible XHTML (HTML written as XML). And it probably meant we’ll still be living in the dark ages of Flash and Silverlight to get real stuff done.

On the web, new technologies are usually taken up within months of release. Everything that hasn’t be widely used for a year or two is considered ‘legacy’ (think Flash or Silverlight again).

Look at the semantic technologies again in that context: the first version of the RDF spec is from 1999, OWL is from 2002. How many interesting posts on OWL or RDF do you think are posted on heavily visited developer sites like Hacker News?

Misconceptions on webservices

Another thing that most people propogating semantic web technologies tend to forget is that making a webservice (such as an API) work properly is hard work. It doesn’t come for free with your technology.

This means that your service:

Linked data is all nice and dandy, but if your SPARQL endpoint is only up 50% of the time and it takes a minute to do a query, how do you suppose a developer builds a stable app on top of it?

What you should do

Obviously all of my ranting serves little purpose if i don’t give you an alternative. Most of the principles of the linked data movement are actually good, but the implementation is where the wrong decisions are made.

The most frustrating aspect of the whole ‘everyone should use semantic web technologies’ is that organizations are spending money on useless technologies like triple stores where they could be spending it on stuff that’s actually useful.

Actually, the very first thing you should do is take a hard look at your website, not at the data behind it. Developers are one of your customers, but your very first priorirty should be your regular customers.

So think of the visitors to your museum, or the people in your library. Can they view your website properly on their smartphones? Does the site load fast enough (that means, under 3 seconds)? Is everything easily findable? Are the texts up-to-date? What about the design?

If your website still looks like it’s 1999, maybe it’s time to update that instead of thinking about the ideal world of SPARQL endpoints and structured RDF.

Actually, rebuild your website as the first customer of your API. They can be developed in a tandem, and your webdevelopers will give you invaluable feedback on the use of your API.

Putting it into practice

So, let’s get back to your API. Let’s take the vision of linked data according to Wikipedia:

(…) a method of publishing structured data so that it can be interlinked and become more useful (…) to share information in a way that can be read automatically by computers.

Here are a few things you can do to get to that vision:

Permalinks

Have permanent URL’s to your items. If you have a website with paintings don’t have an URL like

http://www.musee-orsay.fr/en/collections/index-of-works/resultat-collection.html?no_cache=1&zoom=1&tx_damzoom_pi1%5Bzoom%5D=0&tx_damzoom_pi1%5BxmlId%5D=000747&tx_damzoom_pi1%5Bback%5D=en%2Fcollections%2Findex-of-works%2Fresultat-collection.html%3Fno_cache%3D1%26zsz%3D9

Don’t laugh, that’s an actual link. Here’s a better solution:

http://www.musee-orsay.fr/work/000747

Machine-readable data

Offer some kind of machine readable data. If you don’t have the money to develop a full-fledged API that’s fine, CSV dumps are better than nothing.

If you are building an API, at the most basic level, something like

http://www.musee-orsay.fr/work/000747.json

Is okay. If you can deliver a ‘real’ API, that’s cool too. Make it simple. Don’t force people to use an API key. You want to offer a search option? What about something like:

http://api.musee-orsay.fr/search/?q=rembrandt

JSON

JSON should be the only output format. Everybody is doing it. So can you. You really don’t need a XML schema. Nobody will use it. Not quite sure how to translate your existing XML schema to JSON? Take the JSON output from the Europeana API as an example.

Documentation

Instead of trying to shoehorn your data in some metadata format put all your effort in documenting your metadata fields. Your ‘format’ field includes width and height of a painting, but it could also include ‘jpeg’? Fine, tell us about that weird stuff.

Code

If the only way you can deliver documentation on your website is in the form of a PDF file you’re doing it wrong. For developers the main place where they find code and documentation is Github. If you’re writing an example library for your API this is also the place to host it. You can also use Github pages to host your documentation. For a nice example view the Rijksmuseum API docs and their Github profile.

To conclude

Thanks for reading this article. If you think it’s useful, please share it using your favourite medium. Remarks or questions? Add a comment. And for more rambling about stuff like this, follow me on Twitter.

Add a comment

4 comments