This post describes a knowledge graph extracted from Wikidata, which is intended as a reference dataset for research on this type of graph, and in particular for studies on graph embeddings.
The Joconde database is administered by the French Ministry of Culture. It contains a description of some 600,000 creations from France's cultural heritage.
We are also working on a representation of these creations based on CIDOC-CRM, but that is not the subject of this post.
Some of these creations are also represented in Wikidata. We used the WDumper tool (https://wdumps.toolforge.org/dump/3269) to create a Wikidata extract containing the triples relating to the creations in the Joconde database.
The main properties used for these creations are:
http://www.wikidata.org/prop/direct/P195 collection
http://www.wikidata.org/prop/direct/P170 creator
http://www.wikidata.org/prop/direct/P136 genre
http://www.wikidata.org/prop/direct/P18 image
http://www.wikidata.org/prop/direct/P276 location
http://www.wikidata.org/prop/direct/P571 inception
http://www.wikidata.org/prop/direct/P31 instance of
http://www.wikidata.org/prop/direct/P180 depicts
http://www.wikidata.org/prop/direct/P186 made from material
http://www.wikidata.org/prop/direct/P921 main subject
http://www.wikidata.org/prop/direct/P131 located in the administrative territorial entity
http://www.wikidata.org/prop/direct/P373 Commons category
http://www.wikidata.org/prop/direct/P17 country
http://www.wikidata.org/prop/direct/P625 coordinate location
http://www.wikidata.org/prop/direct/P361 part of
http://www.wikidata.org/prop/direct/P2048 height
http://www.wikidata.org/prop/direct/P2049 width
http://www.wikidata.org/prop/direct/P1476 title
http://www.wikidata.org/prop/direct/P217 inventory number
http://www.wikidata.org/prop/direct/P6216 copyright status
http://www.wikidata.org/prop/direct/P1257 depicts Iconclass notation
These properties are not always filled in, and some can take several values. Some have entities as values (for example, P17, country); others have literal values (for example, P1476, title).
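To make the distinction between entity values and literal values concrete, here is a minimal Python sketch that tallies, for each direct property, how many values are IRIs and how many are literals. It assumes the dump is plain N-Triples with IRI subjects, and it uses a simplified regular expression rather than a full RDF parser. The sample lines and the example.org identifiers are made up for illustration and are not taken from the dump.

```python
import re
from collections import Counter

# Simplified N-Triples pattern: <subject> <predicate> object .
# Assumption: subjects are IRIs (no blank nodes).
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')
DIRECT = "http://www.wikidata.org/prop/direct/"


def property_profile(lines):
    """Return two Counters keyed by property ID: IRI-valued and literal-valued counts."""
    iri_counts, literal_counts = Counter(), Counter()
    for line in lines:
        match = TRIPLE_RE.match(line)
        if match is None:
            continue  # blank line, comment, or a shape this sketch does not handle
        _, predicate, obj = match.groups()
        if not predicate.startswith(DIRECT):
            continue  # keep only "direct" (truthy) properties
        pid = predicate[len(DIRECT):]
        if obj.startswith("<"):
            iri_counts[pid] += 1      # the value is an entity or another IRI
        else:
            literal_counts[pid] += 1  # the value is a string, date, quantity, ...
    return iri_counts, literal_counts


# Hypothetical sample lines (Q142 is France; the example.org IRIs are placeholders).
SAMPLE = [
    '<http://example.org/work/A> <http://www.wikidata.org/prop/direct/P17> <http://www.wikidata.org/entity/Q142> .',
    '<http://example.org/work/A> <http://www.wikidata.org/prop/direct/P1476> "Example title"@en .',
    '<http://example.org/work/A> <http://www.wikidata.org/prop/direct/P180> <http://example.org/entity/X> .',
    '<http://example.org/work/A> <http://www.wikidata.org/prop/direct/P180> <http://example.org/entity/Y> .',
]

iri_counts, literal_counts = property_profile(SAMPLE)
print(dict(iri_counts), dict(literal_counts))
```

The same function can be fed the lines of the real dump once it has been decompressed; a property that appears twice for one subject, like P180 in the sample, simply counts twice.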
We are particularly interested in filling in missing values for properties such as P921 (main subject) and P180 (depicts), whose values are entities, and also P18 (image), whose value is a string that must be the URL of an image of the creation.
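As a starting point for this kind of completion task, the following Python sketch lists the creations that lack one or more of P18, P180 and P921. It is only an illustration under stated assumptions: creations are recognized by the presence of P347 (Joconde work ID), lines are matched with a simplified regular expression, and the function name missing_targets, the sample identifiers and the example.org IRIs are all hypothetical.

```python
import re
from collections import defaultdict

# Simplified N-Triples pattern; assumes IRI subjects.
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')
DIRECT = "http://www.wikidata.org/prop/direct/"
TARGETS = ("P18", "P180", "P921")


def missing_targets(lines, targets=TARGETS, id_prop="P347"):
    """Map each subject carrying id_prop to the target properties it lacks."""
    props_by_subject = defaultdict(set)
    for line in lines:
        match = TRIPLE_RE.match(line)
        if match is None:
            continue
        subject, predicate, _ = match.groups()
        if predicate.startswith(DIRECT):
            props_by_subject[subject].add(predicate[len(DIRECT):])
    result = {}
    for subject, props in props_by_subject.items():
        if id_prop not in props:
            continue  # not a creation according to our assumption
        missing = [p for p in targets if p not in props]
        if missing:
            result[subject] = missing
    return result


P = DIRECT  # shorthand for building the hypothetical sample below
SAMPLE = [
    f'<http://example.org/work/A> <{P}P347> "hypothetical-id-1" .',
    f'<http://example.org/work/A> <{P}P18> <http://example.org/image/a.jpg> .',
    f'<http://example.org/work/B> <{P}P347> "hypothetical-id-2" .',
    f'<http://example.org/work/B> <{P}P18> <http://example.org/image/b.jpg> .',
    f'<http://example.org/work/B> <{P}P180> <http://example.org/entity/X> .',
    f'<http://example.org/work/B> <{P}P921> <http://example.org/entity/Y> .',
    f'<http://example.org/entity/C> <{P}P17> <http://www.wikidata.org/entity/Q142> .',
]

print(missing_targets(SAMPLE))
```

In the sample, work A has an image but no depicts or main subject value, work B is complete, and entity C is ignored because it has no Joconde work ID.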
The dump is freely available on Zenodo:
https://zenodo.org/record/7941537#.ZGOUiXZBw-U
On Zenodo, the info.json file gives general information on the dataset; preview.nt is an extract of around 130,000 triples out of the almost 500,000 contained in the dataset; wdump-3269.nt.gz is the dataset itself; and wdumper-spec.json contains the parameters used by WDumper to create the dump.
This last file shows that the dataset is defined by the entities that have a P347 (Joconde work ID) property. It also shows that strings with a language attribute are only extracted for English and French.
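One way to check the language restriction on your own copy of the dump is to tally the language tags of its literals. The Python sketch below does this with a simplified regular expression that looks for a tag at the end of an N-Triples line. The function name language_counts and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Matches a language tag at the end of a line, e.g.  "text"@fr .
LANG_RE = re.compile(r'"@([A-Za-z]+(?:-[A-Za-z0-9]+)*)\s*\.\s*$')


def language_counts(lines):
    """Count language-tagged literals per (lower-cased) language tag."""
    counts = Counter()
    for line in lines:
        match = LANG_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


# Made-up sample lines: two tagged literals, one typed literal, one IRI value.
SAMPLE = [
    '<http://example.org/work/A> <http://www.w3.org/2000/01/rdf-schema#label> "Example"@en .',
    '<http://example.org/work/A> <http://www.w3.org/2000/01/rdf-schema#label> "Exemple"@fr .',
    '<http://example.org/work/A> <http://example.org/p> "42"^^<http://www.w3.org/2001/XMLSchema#integer> .',
    '<http://example.org/work/A> <http://example.org/q> <http://example.org/entity/X> .',
]

print(language_counts(SAMPLE))
```

If the dump matches the WDumper specification described above, only the en and fr tags should show up when the function is run over the full file.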
Note that links can exist between Wikipedia articles and the entities described in this dataset. When such a link exists, it is carried by the <http://schema.org/about> property, for example:
<https://fr.wikipedia.org/wiki/Mademoiselle_Rivi%C3%A8re> <http://schema.org/about> <http://www.wikidata.org/entity/Q24011> .
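Triples of this shape can be collected with a few lines of Python. The sketch below extracts the language edition, the decoded article title and the entity IRI from each schema:about line whose subject is a Wikipedia page. It assumes the URL layout of the example above (a language subdomain of wikipedia.org and a /wiki/ path); the function name wikipedia_links is ours.

```python
import re
from urllib.parse import unquote, urlsplit

ABOUT_RE = re.compile(
    r'^<([^>]+)>\s+<http://schema\.org/about>\s+<([^>]+)>\s*\.\s*$'
)


def wikipedia_links(lines):
    """Yield (language, article_title, entity_iri) for Wikipedia schema:about triples."""
    for line in lines:
        match = ABOUT_RE.match(line)
        if match is None:
            continue
        article, entity = match.groups()
        parts = urlsplit(article)
        if not parts.netloc.endswith(".wikipedia.org"):
            continue  # other sitelinks are skipped in this sketch
        language = parts.netloc.split(".")[0]
        # Keep what follows "/wiki/" and decode percent-escapes (UTF-8).
        title = unquote(parts.path.split("/wiki/", 1)[-1])
        yield language, title, entity


EXAMPLE = (
    "<https://fr.wikipedia.org/wiki/Mademoiselle_Rivi%C3%A8re> "
    "<http://schema.org/about> <http://www.wikidata.org/entity/Q24011> ."
)

for link in wikipedia_links([EXAMPLE]):
    print(link)
```

Grouping the results by entity gives, for each creation, the list of Wikipedia articles that describe it, which can serve as a source of text for enriching the graph.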
Make good use of it!