This is how you easily get licensing and credit details for free images on Wikimedia Commons

Wikimedia Commons is a treasure trove of freely licensed images, videos, and other media, just waiting to be reused. The catch? You need to credit them properly and follow the licensing rules. But how exactly do you do that? And what if you’re building a website and want to automatically pull in the right attribution info? In this article, I’ll walk you through both, step by step.
Licensing and Wikimedia Commons
Let’s dive a bit deeper in how licensing actually works on Wikimedia Commons. Every file on Commons, whether it is an image, illustration, video or audio file must comply to the Commons licensing terms. In practice this means that the file is either:
- Freely licensed under one of the allowed licenses on Wikimedia Commons. In practice this is one of the allowed Creative Commons licenses. This usually tends to be either Attribution (“BY”) or Attribution-Sharealike (“BY-SA”).
- Put in the public domain by its author. This can also be done with the Creative Commons public domain mark (“CC-0”).
- In the public domain because its copyright has expired. International copyright is complicated, but for most works this means that the author of the work (e.g. the photographer) has been dead for more than 70 years. On Wikimedia Commons, reproductions of two-dimensional works in the public domain (e.g. paintings, photographs or drawings) are (usually) considered to be in the public domain as well.
Complying to a license
In general, whenever re-using a media file from Commons with a license it is expected you caption the reused media file with these points of information:
- Who made the work (the author).
- What the license is (e.g. Creative Commons BY-SA).
- A link to the license legal text.
- A link back to the origin of the file. This can either be a Wikimedia Commons page or a page as indicated by the author.
For an example, let’s take a look at the portrait of Hungarian director Judit Elek that’s at the top of this article. Proper attribution of this portrait would be something like this:
Photo: Vera de Kok / CC-BY-SA 4.0
In this case the name of the author links back to the file page on Wikimedia Commons, the name of the license links to the ‘legal deed’ of the Creative Commons license.
Optionally we could also include a ‘via Wikimedia Commons’ attribution, which is not required but does make it clearer where we found the image:
Photo: Vera de Kok / CC-BY-SA 4.0 via Wikimedia Commons
For public domain images it is not legally required to show attribution. However, it is a good habit to do so. For example, this picture of Marten Toonder from the Anefo fotocollection (put in the public domain by the Nationaal Archief) can be attributed like this:
Photo: Hans Peters for Anefo / CC0 via Wikimedia Commons
Fetching licensing data automatically
Okay, so now you know how to attribute Commons media. But how do you proceed if you want to automatically get the licensing information, for example to include them in your own website?
For a long time, the way metadata was added on Commons was by use of free text fields in a Mediawiki template. Note that this is not structured data. It also means that it can be tricky to extract the licensing metadata for reuse elsewhere. Fortunately, most of the hard work has been done for you already.
There are multiple ways of getting this metadata in a structured format. I’ll give two options, one of which is probably the most straightforward method.
Using the Mediawiki Action API with the ‘imageinfo’ property
The Mediawiki Action API is the default API to interact with any Mediawiki website (like Wikimedia Commons, Wikipedia or Wikidata). The imageinfo
property gives you back information about media files (not just images). The extmetadata
property gives back information concerning licensing and attribution metadata, parsed from the free text fields. This was originally developed for the Media Viewer. The imageinfo
property can also give back information like filesize, time of creation, media type and much more.
I’ve made an example on how to fetch the metadata using Python and the popular requests
HTTP library, which is available here.
Using SDC via concept URI’s
The metadata problem as described earlier in this document has been partly solved by the introduction of Structured Data on Commons (SDC). This is basically an implementation of Wikibase, the same software powering Wikidata, on Wikimedia Commons. This provides the same knowledge graph functionality to all Wikimedia Commons files.
There is a standard on how to model the copyright information using SDC, and this has been done for many files. Data can be exported in many LOD-formats using the concept URI / Entity Data endpoint, including RDF, JSON-LD and Turtle. There is also a SPARQL endpoint for SDC, but this is still experimental and has some implementation issues.
There are two reasons why i believe it’s probably better not to use this approach for now:
- There is still a large amount of metadata that is not available with SDC. Transferring the unstructured metadata to the SDC properties is a manual process, which is going slowly. This means that many media files will have incomplete attribution information.
- The export format is cumbersome to process. The JSON-LD for a single image can have hundreds of nodes in a deeply nested format, making it difficult to parse the correct information.
Given these circumstances, for now I would advise to use the Action API to fetch the license information for Wikimedia Commons files.