
Exploring French cultural heritage data

Extract from Wikidata, artworks from the Joconde database

This post describes a knowledge graph extracted from Wikidata, which is intended as a reference dataset for research on this type of graph, and in particular for studies on graph embeddings.

The Joconde database is administered by the French Ministry of Culture. It contains a description of some 600,000 creations from France's cultural heritage.

We are working on a representation of these creations based on CIDOC-CRM. This is not the subject of this post.

Some of these creations are also represented in Wikidata. We used the WDumper tool to create a Wikidata extract containing the triples relating to the creations in the Joconde database.

The main properties used for these creations are: collection, creator, genre, image, location, inception, instance of, depicts, made from material, main subject, located in the administrative territorial entity, Commons category, country, coordinate location, part of, height, width, title, inventory number, copyright status, depicts Iconclass notation.

These properties are not always filled in, and some can take several values. Some have entities as values (for example P17, country); others have literal values (for example P1476, title).
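To make the distinction between entity values and literal values concrete, here is a minimal sketch that parses one N-Triples line and classifies its object. It assumes the dump uses the usual Wikidata IRIs (`http://www.wikidata.org/entity/` for entities); the helper names `parse_line` and `object_kind` are ours, and the regular expression is a simplification that only handles IRI subjects, not the full N-Triples grammar.

```python
import re

# Simplified N-Triples pattern: <subject> <predicate> object .
# Assumption: subjects and predicates are IRIs (no blank nodes).
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')

# Assumed namespace for Wikidata entities.
WD_ENTITY = "http://www.wikidata.org/entity/"


def parse_line(line):
    """Return (subject, predicate, object_term), or None if the line is not a triple."""
    match = TRIPLE_RE.match(line.strip())
    return match.groups() if match else None


def object_kind(term):
    """Classify an object term as 'entity', 'iri' (any other IRI) or 'literal'."""
    if term.startswith("<") and term.endswith(">"):
        return "entity" if term[1:-1].startswith(WD_ENTITY) else "iri"
    return "literal"
```

For a P17 (country) triple the object is classified as an entity, while for a P1476 (title) triple it is classified as a literal.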

We are particularly interested in filling in missing values for properties such as P921 (main subject) and P180 (depicts), whose values are entities, and also P18 (image), whose value is a string that must be the URL of an image of the creation.
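Before trying to fill in these properties, it helps to measure how often they are already present. The sketch below counts, for each target property, how many distinct entities carry at least one value. It assumes that direct ("truthy") statements use the `http://www.wikidata.org/prop/direct/` namespace and that entity subjects start with `http://www.wikidata.org/entity/Q`; the function name `coverage` is ours.

```python
import re
from collections import defaultdict

# Simplified N-Triples pattern: <subject> <predicate> object .
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')

# Assumed Wikidata namespaces.
DIRECT = "http://www.wikidata.org/prop/direct/"
ENTITY = "http://www.wikidata.org/entity/Q"


def coverage(lines, targets=("P921", "P180", "P18")):
    """Map each target property to (entities having it, total entities seen as subject)."""
    subjects = set()
    having = defaultdict(set)
    for line in lines:
        match = TRIPLE_RE.match(line.strip())
        if not match:
            continue
        subject, predicate, _ = match.groups()
        if not subject.startswith(ENTITY):
            continue  # skip statement nodes and other non-entity subjects
        subjects.add(subject)
        if predicate.startswith(DIRECT):
            pid = predicate[len(DIRECT):]
            if pid in targets:
                having[pid].add(subject)
    total = len(subjects)
    return {pid: (len(having[pid]), total) for pid in targets}
```

The function accepts any iterable of lines, so it can be fed an open file handle as well as a small list of test triples. An entity with several values for the same property is counted once.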

The dump is freely available on Zenodo.

On Zenodo, the info.json file gives general information on the dataset; preview.nt is an extract of around 130,000 triples out of the almost 500,000 contained in the dataset; wdump-3269.nt.gz is the dataset itself; wdumper-spec.json contains the parameters used by WDumper to create the dump.
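As a first look at the dataset, the following sketch streams the compressed N-Triples file and counts how many triples use each predicate. It relies only on the Python standard library and assumes wdump-3269.nt.gz has been downloaded to the working directory; the function names are ours, and the regular expression is a simplification rather than a full N-Triples parser.

```python
import gzip
import re
from collections import Counter

# Simplified N-Triples pattern: <subject> <predicate> object .
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def count_predicates(lines):
    """Count triples per predicate IRI over an iterable of N-Triples lines."""
    counts = Counter()
    for line in lines:
        match = TRIPLE_RE.match(line.strip())
        if match:
            counts[match.group(2)] += 1
    return counts


def count_predicates_in_dump(path="wdump-3269.nt.gz"):
    """Stream a gzip-compressed N-Triples file without loading it all into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_predicates(handle)
```

Calling `count_predicates_in_dump().most_common(20)` then lists the twenty most frequent predicates, which gives a quick check against the list of main properties.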

In this last file, we can see that the dataset is defined by the entities that have a P347 (Joconde work ID) property. It also shows that strings with a language attribute are extracted only for English and French.
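The language restriction can be checked on the dump itself. The sketch below tallies the language tags of the literals and reports any tag other than `en` or `fr`. It is only a verification aid under our own assumptions, not a description of how WDumper applies its filter: the function names are ours, and the comparison is an exact match, so a regional tag such as `fr-ca` would be reported as unexpected.

```python
import re
from collections import Counter

# A language-tagged literal ends with "@tag just before the final dot of the triple.
LANG_AT_END = re.compile(r'"@([A-Za-z]+(?:-[A-Za-z0-9]+)*)\s*\.\s*$')


def language_tags(lines):
    """Count language tags over an iterable of N-Triples lines."""
    tags = Counter()
    for line in lines:
        match = LANG_AT_END.search(line)
        if match:
            tags[match.group(1).lower()] += 1
    return tags


def unexpected_languages(lines, expected=("en", "fr")):
    """Return the tags (with counts) that are not in the expected set."""
    return {tag: n for tag, n in language_tags(lines).items() if tag not in expected}
```

An empty result from `unexpected_languages` means every language-tagged string in the input is in English or French.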

Note that links may exist between Wikipedia articles and the entities described in this dataset. When such a link exists, it is carried by the <> property, for example:

<> <> <> .

Make good use of it!

Author: Moissinac

Associate professor at Télécom Paris (Image, Data, Signal Department, Multimedia Group), Jean-Claude Moissinac has conducted research on advanced techniques for the production, transport, representation and use of multimedia documents. This work first evolved towards the semantic representation of multimedia-related data (media processing workflows, descriptions of media adaptations, formal descriptions of user interactions). Today, his work focuses on building knowledge graphs. Main current research areas: semantic representations of knowledge, construction of knowledge graphs, and machine learning techniques on these graphs.
