Data&Musée

Exploring data on French cultural heritage

Wikidata extract: artworks from the Joconde database

This post describes a knowledge graph extracted from Wikidata. It is intended as a reference dataset for research on knowledge graphs, in particular for studies on graph embeddings.

The Joconde database is administered by the French Ministry of Culture. It describes some 600,000 creations from France's cultural heritage.

We are working on a representation of these creations based on CIDOC-CRM. This is not the subject of this post.

Some of these creations are also represented in Wikidata. We used the WDumper tool (https://wdumps.toolforge.org/dump/3269) to create a Wikidata extract containing the triples relating to the creations in the Joconde database.

The main properties used for these creations are:

http://www.wikidata.org/prop/direct/P195 collection

http://www.wikidata.org/prop/direct/P170 creator

http://www.wikidata.org/prop/direct/P136 genre

http://www.wikidata.org/prop/direct/P18 image

http://www.wikidata.org/prop/direct/P276 location

http://www.wikidata.org/prop/direct/P571 inception

http://www.wikidata.org/prop/direct/P31 instance of

http://www.wikidata.org/prop/direct/P180 depicts

http://www.wikidata.org/prop/direct/P186 made from material

http://www.wikidata.org/prop/direct/P921 main subject

http://www.wikidata.org/prop/direct/P131 located in the administrative territorial entity

http://www.wikidata.org/prop/direct/P373 Commons category

http://www.wikidata.org/prop/direct/P17 country

http://www.wikidata.org/prop/direct/P625 coordinate location

http://www.wikidata.org/prop/direct/P361 part of

http://www.wikidata.org/prop/direct/P2048 height

http://www.wikidata.org/prop/direct/P2049 width

http://www.wikidata.org/prop/direct/P1476 title

http://www.wikidata.org/prop/direct/P217 inventory number

http://www.wikidata.org/prop/direct/P6216 copyright status

http://www.wikidata.org/prop/direct/P1257 depicts Iconclass notation
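A quick way to see how often each of these properties actually occurs is to count the wdt: predicates in the dump. The sketch below is a minimal, hand-rolled N-Triples reader that uses only the Python standard library; the regular expression and the function name count_direct_properties are illustrative assumptions, and a full RDF library such as rdflib would handle edge cases more robustly.

```python
import re
from collections import Counter

# Minimal N-Triples line pattern (an assumption, not a complete parser):
#   <subject IRI> <predicate IRI> object .
# The object group keeps IRIs in angle brackets and literals with quotes and tags.
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')
WDT = "http://www.wikidata.org/prop/direct/"


def count_direct_properties(lines):
    """Count triples per direct (wdt:) property, keyed by its P-identifier."""
    counts = Counter()
    for line in lines:
        m = TRIPLE_RE.match(line)
        if m and m.group(2).startswith(WDT):
            counts[m.group(2)[len(WDT):]] += 1
    return counts
```

Applied to the lines of the dump, counts.most_common() would list the properties from most to least frequent.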

These properties are not always filled in, and some can take several values. Some take entities as values (for example, P17, country); others take literal values (for example, P1476, title).
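Both observations can be checked on the data by profiling each wdt: property: which kinds of values it takes, and whether any subject has more than one value for it. This is a sketch built on a simplified regex for N-Triples lines (an assumption, not a complete parser); the function names are hypothetical. An object is classed as "entity" when it is a Wikidata entity IRI, "iri" when it is any other IRI, and "literal" otherwise. The separate "iri" class is there because some non-entity values, such as image links, may be serialized as IRIs rather than plain strings.

```python
import re
from collections import defaultdict

# Simplified N-Triples line pattern (assumption): <s> <p> object .
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')
WDT = "http://www.wikidata.org/prop/direct/"
ENTITY = "<http://www.wikidata.org/entity/"


def value_kind(obj):
    """Classify an object term as a Wikidata entity, another IRI, or a literal."""
    if obj.startswith(ENTITY):
        return "entity"
    return "iri" if obj.startswith("<") else "literal"


def profile_properties(lines):
    """Return (kinds, multi): the value kinds seen for each wdt: property,
    and the set of properties having more than one value for some subject."""
    kinds = defaultdict(set)
    per_subject = defaultdict(lambda: defaultdict(int))  # prop -> subject -> count
    for line in lines:
        m = TRIPLE_RE.match(line)
        if not m or not m.group(2).startswith(WDT):
            continue
        subject, prop, obj = m.group(1), m.group(2)[len(WDT):], m.group(3)
        kinds[prop].add(value_kind(obj))
        per_subject[prop][subject] += 1
    multi = {p for p, counts in per_subject.items() if max(counts.values()) > 1}
    return dict(kinds), multi
```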

We are particularly interested in filling in the values of properties such as P921 (main subject) and P180 (depicts), whose values are entities, as well as P18 (image), whose value is a string that must be the URL of an image of the creation.
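A natural first step for this completion task is to list, for each work, which of the target properties are still missing; these (work, property) gaps are the queries a link-prediction or embedding model would be asked to fill. This sketch assumes that works are identified by the wdt:P347 (Joconde work ID) property and uses a simplified N-Triples line pattern; the function name missing_targets and the default targets are illustrative choices.

```python
import re
from collections import defaultdict

# Simplified N-Triples line pattern (assumption): <s> <p> object .
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')
WDT = "http://www.wikidata.org/prop/direct/"
JOCONDE_ID = "P347"  # Joconde work ID


def missing_targets(lines, targets=("P18", "P180", "P921")):
    """Map each work (subject with P347) to the sorted target properties it lacks.
    Works that already have every target property are omitted."""
    works = set()
    present = defaultdict(set)
    wanted = set(targets)
    for line in lines:
        m = TRIPLE_RE.match(line)
        if not m or not m.group(2).startswith(WDT):
            continue
        subject, prop = m.group(1), m.group(2)[len(WDT):]
        if prop == JOCONDE_ID:
            works.add(subject)
        elif prop in wanted:
            present[subject].add(prop)
    gaps = {}
    for work in works:
        missing = wanted - present[work]
        if missing:
            gaps[work] = sorted(missing)
    return gaps
```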

The dump is freely available on Zenodo:

https://zenodo.org/record/7941537#.ZGOUiXZBw-U

On Zenodo, info.json gives general information on the dataset; preview.nt is an extract of around 130,000 triples out of the almost 500,000 in the dataset; wdump-3269.nt.gz is the dataset itself; and wdumper-spec.json contains the parameters WDumper used to create the dump.
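Since wdump-3269.nt.gz is a gzip-compressed N-Triples file, it can be streamed line by line without first decompressing it to disk. A minimal sketch using Python's standard gzip module; iter_dump and count_triples are hypothetical helper names.

```python
import gzip


def iter_dump(path):
    """Yield the non-empty, non-comment lines of a gzip-compressed N-Triples file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                yield line


def count_triples(path):
    """Count triples, relying on N-Triples having one triple per line."""
    return sum(1 for _ in iter_dump(path))
```

Once the file has been downloaded from Zenodo, count_triples("wdump-3269.nt.gz") should give a figure consistent with the total quoted above, and iter_dump can feed further processing without loading the whole dump into memory.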


In the latter file, we can see that the dataset is defined by the entities that have a P347 (Joconde work ID) property. It also shows that language-tagged strings are extracted only for English and French.
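The language restriction can be checked on the data itself by tallying the language tags of the literals. This sketch relies on simplified regular expressions for N-Triples lines and for language tags; both patterns and the function name language_tags are assumptions.

```python
import re
from collections import Counter

# Simplified N-Triples line pattern (assumption): <s> <p> object .
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')
# Language tag closing a literal object, e.g. "text"@fr (simplified pattern).
LANG_RE = re.compile(r'"@([A-Za-z]+(?:-[A-Za-z0-9]+)*)$')


def language_tags(lines):
    """Count the language tags of language-tagged literal objects."""
    tags = Counter()
    for line in lines:
        m = TRIPLE_RE.match(line)
        if m:
            lang = LANG_RE.search(m.group(3))
            if lang:
                tags[lang.group(1)] += 1
    return tags
```

If the dump follows its specification, set(language_tags(lines)) should be a subset of {"en", "fr"}.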

Note that some entities described in this dataset are linked to Wikipedia articles. When such a link exists, it is carried by the <http://schema.org/about> property, for example:

<https://fr.wikipedia.org/wiki/Mademoiselle_Rivi%C3%A8re> <http://schema.org/about> <http://www.wikidata.org/entity/Q24011> .
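These links can be gathered into a mapping from each Wikidata entity to its Wikipedia articles, with a readable title recovered from the percent-encoded article URL. A sketch under a simplified N-Triples line pattern (an assumption); wikipedia_links and article_title are hypothetical names.

```python
import re
from collections import defaultdict
from urllib.parse import unquote

# Simplified N-Triples line pattern (assumption): <s> <p> object .
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')
ABOUT = "http://schema.org/about"


def wikipedia_links(lines):
    """Map each entity IRI to the list of Wikipedia article URLs about it."""
    links = defaultdict(list)
    for line in lines:
        m = TRIPLE_RE.match(line)
        if m and m.group(2) == ABOUT and m.group(3).startswith("<"):
            entity = m.group(3)[1:-1]  # strip the angle brackets
            links[entity].append(m.group(1))
    return dict(links)


def article_title(url):
    """Decode the percent-encoded title at the end of a /wiki/ URL."""
    return unquote(url.rsplit("/wiki/", 1)[-1]).replace("_", " ")
```

For the example triple above, article_title("https://fr.wikipedia.org/wiki/Mademoiselle_Rivi%C3%A8re") returns "Mademoiselle Rivière".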

Make good use of it!

Author: Moissinac

Associate professor (maître de conférences) at Télécom Paris, Image, Data, Signal Department, Multimedia Group. Jean-Claude Moissinac has carried out research on advanced techniques for producing, transporting, representing and using multimedia documents. This work first evolved towards the semantic representation of multimedia-related data (media processing workflows, description of media adaptations, formal description of user interactions). Today, his work focuses on building knowledge graphs. Main current research areas: semantic knowledge representation, knowledge graph construction, and machine learning techniques on these graphs.
