Data&Musée

Exploring French cultural heritage data

Creation of a Wikidata dump of the Joconde database

On 28 May 2023, I created a dump of a Wikidata extract made up of the triples that concern works referenced in the Joconde database.

We introduced the concept of the 'context graph' (CG) in 'Knowledge Base Completion With Analogical Inference on Context Graphs'. A context graph is an extract from a large knowledge graph that constitutes a body of knowledge specific to a given 'topic'. The 'topic' can be defined in various ways, including by selecting a set of entities that share one or more properties.
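As a toy illustration of this definition, the sketch below builds a context graph from a tiny in-memory list of triples. It assumes the simplest reading of the definition: the topic is the set of entities that have a given property, and the context graph keeps every triple whose subject is one of those entities. The function name `extract_context_graph` and the `ex:` identifiers are hypothetical, and this is not the procedure described in the paper.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def extract_context_graph(triples: Iterable[Triple], selecting_property: str) -> List[Triple]:
    """Keep the triples whose subject has at least one value for selecting_property."""
    triples = list(triples)
    # The topic: every entity that carries the selecting property.
    topic_entities: Set[str] = {s for s, p, _ in triples if p == selecting_property}
    # The context graph: all triples describing those entities.
    return [t for t in triples if t[0] in topic_entities]


# Hypothetical data: only ex:workA has a value for P347.
toy_graph: List[Triple] = [
    ("ex:workA", "P347", "ID-A"),
    ("ex:workA", "P170", "ex:painter"),
    ("ex:workB", "P170", "ex:painter"),
    ("ex:painter", "P31", "ex:human"),
]

context_graph = extract_context_graph(toy_graph, "P347")
```

Here `context_graph` contains the two triples about `ex:workA` and nothing else, because `ex:workB` and `ex:painter` have no P347 value.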

The Joconde database is maintained by the French Ministry of Culture and Communication (MCC). It contains descriptions of some 600,000 works of French cultural heritage. These descriptions are written mainly by the institutions that hold the works, which makes the database a good reference for metadata about them.

Some of the works in the Joconde database have an entry in Wikidata. They can be identified because the Wikidata entity is linked to its ID in the Joconde database through the P347 property. For example:

<https://www.wikidata.org/entity/Q328523> wdt:P347 "000PE001569"

At the time of the dump, 18,099 Joconde works were present in Wikidata.
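A count of this kind can be obtained from the Wikidata Query Service. The sketch below is one way to do it, not necessarily how the figure above was produced, and the result today will differ from the value at the time of the dump. The function names and the User-Agent string are hypothetical. The query relies on the `wdt:` prefix, which the public endpoint predefines.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_count_query(property_id: str = "P347") -> str:
    """Build a SPARQL query counting the items that have a value for property_id."""
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE { "
        f"?item wdt:{property_id} ?value . "
        "}"
    )


def parse_count(result: dict) -> int:
    """Read the single ?count binding from a SPARQL JSON result."""
    return int(result["results"]["bindings"][0]["count"]["value"])


def fetch_count(property_id: str = "P347") -> int:
    """Send the query to the public endpoint (network access required)."""
    params = urllib.parse.urlencode(
        {"query": build_count_query(property_id), "format": "json"}
    )
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        headers={
            # Hypothetical identifier; replace it with your own contact details.
            "User-Agent": "joconde-count-example/0.1 (illustrative script)",
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_count(json.load(response))
```

Calling `fetch_count()` returns the current number of Wikidata items with a Joconde ID. The query building and the parsing are kept separate from the network call so that they can be checked offline.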

To produce the dump, I used the WDumper tool. It is open source and available on GitHub:

https://github.com/bennofs/wdumper/

A dump can also be created online:

https://wdumps.toolforge.org/

The dump is then stored and made freely accessible on Zenodo. This is the solution I used.

The parameters used are listed in the wdumper-spec.json file available on Zenodo. In particular:

  • literals are retained only if they are in English or French,
  • statement qualifiers are not retained,
  • aliases and external links are retained,
  • entities are retained only if they have a value for the P347 property.
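To make these rules concrete, here is a minimal sketch that applies comparable filters to a single entity in the Wikidata JSON format. It is not WDumper's implementation and does not use the wdumper-spec.json format. The function name `filter_entity` and the sample label values are hypothetical. As a simplification, the language filter covers only labels, descriptions and aliases, and site links are left untouched on the assumption that "external links" refers to them.

```python
import copy
from typing import Any, Dict, Optional, Sequence


def filter_entity(
    entity: Dict[str, Any],
    languages: Sequence[str] = ("en", "fr"),
    required_property: str = "P347",
) -> Optional[Dict[str, Any]]:
    """Return a filtered copy of the entity, or None if it must be dropped."""
    # Rule: keep the entity only if it has a value for the required property.
    statements = entity.get("claims", {}).get(required_property, [])
    has_value = any(
        st.get("mainsnak", {}).get("snaktype") == "value" for st in statements
    )
    if not has_value:
        return None

    kept = copy.deepcopy(entity)  # never modify the caller's data

    # Rule: keep language-tagged terms only in the chosen languages.
    for field in ("labels", "descriptions", "aliases"):
        terms = kept.get(field, {})
        kept[field] = {lang: v for lang, v in terms.items() if lang in languages}

    # Rule: drop the qualifiers of every statement.
    for property_statements in kept.get("claims", {}).values():
        for st in property_statements:
            st.pop("qualifiers", None)
            st.pop("qualifiers-order", None)

    return kept


# Sample entity: the ID pair comes from the example above, the labels are placeholders.
sample = {
    "id": "Q328523",
    "labels": {
        "en": {"language": "en", "value": "placeholder label"},
        "de": {"language": "de", "value": "Platzhalter"},
    },
    "aliases": {"fr": [{"language": "fr", "value": "alias fictif"}]},
    "claims": {
        "P347": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P347",
                    "datavalue": {"value": "000PE001569", "type": "string"},
                },
                "qualifiers": {"P813": []},
                "rank": "normal",
            }
        ]
    },
}

filtered = filter_entity(sample)
```

In `filtered`, the German label and the qualifiers are gone, while the French alias and the P347 statement are preserved. An entity without a P347 value yields `None`.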

This dataset will be used for a series of experiments with artwork data.

Author: Moissinac

Associate professor (maître de conférences) at Télécom Paris, in the Image, Data, Signal department (Multimedia group), Jean-Claude Moissinac has carried out research on advanced techniques for the production, transport, representation and use of multimedia documents. This work first evolved toward the semantic representation of multimedia-related data (media processing workflows, descriptions of media adaptations, formal descriptions of user interactions). Today, his work focuses on building knowledge graphs. Current main research areas: semantic representations of knowledge, construction of knowledge graphs, and machine learning techniques on these graphs.