Data&Musée

Exploring French cultural heritage data

Apply simple rules to improve Wikidata

We're going to use the Wikidata Query Service (WDQS) here to check whether a few simple rules could usefully complement Wikidata.

Take the following rule as an example: "anyone who created a painting recorded in Wikidata has 'painter' as one of their occupations". We can postulate this even for creators for whom painting was a marginal occupation: it was still important enough to produce a painting that is referenced, notably in Wikidata.

On WDQS, as of 17/10/2023, the SPARQL query:

select (count(?s) as ?c) where { ?s wdt:P31 wd:Q3305213 }

gives us 906323 paintings.

(P31 is the 'instance of' property; Q3305213 is the 'painting' value)
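
For readers who would rather run such counts from a script than from the WDQS web interface, here is a minimal Python sketch. It assumes the public endpoint https://query.wikidata.org/sparql and the standard SPARQL JSON results layout; the function names and the User-Agent string are illustrative choices, not something taken from this post.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public WDQS SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# The first query of the post, unchanged.
COUNT_PAINTINGS = "select (count(?s) as ?c) where { ?s wdt:P31 wd:Q3305213 }"


def build_url(query: str) -> str:
    """Return a GET URL asking the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"


def parse_count(payload: dict, var: str = "c") -> int:
    """Extract the single integer bound to ?c in a SPARQL JSON result."""
    return int(payload["results"]["bindings"][0][var]["value"])


def run_count(query: str) -> int:
    """Send the query over HTTP and return the count (needs network access)."""
    request = urllib.request.Request(
        build_url(query),
        headers={
            "Accept": "application/sparql-results+json",
            # Illustrative value; use something that identifies your own script.
            "User-Agent": "wikidata-rules-example/0.1 (illustrative)",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_count(json.load(response))
```

Calling run_count(COUNT_PAINTINGS) should return the current number of paintings, which will differ from the 17/10/2023 figure as Wikidata evolves.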

The query

select (count(?s) as ?c) where { ?s wdt:P31 wd:Q3305213; wdt:P170 ?creator . }

gives us 773080 paintings with a known Wikidata creator.

Finally, the query:

select (count(?s) as ?c) where {
  ?s wdt:P31 wd:Q3305213; wdt:P170 ?creator .
  ?creator wdt:P106 wd:Q1028181
}

gives us 648947 paintings whose creator is known to Wikidata and has 'painter' among their occupations.

(P106 is the property 'occupation'; Q1028181 is the value 'painter')

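Under the initial postulate, the creators of 773080 - 648947 = 124133 paintings could have their description completed with the statement that one of their occupations is painter. (Strictly speaking, count(?s) counts painting/creator pairs, so a painting with several creators is counted once per creator.) The same value could also have been obtained with the query: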

select (count(?s) as ?c) where {
  ?s wdt:P31 wd:Q3305213; wdt:P170 ?creator .
  filter not exists { ?creator wdt:P106 wd:Q1028181 }
}
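
The subtraction can be cross-checked in a few lines of Python. The sketch below only reuses the two counts reported in this post; the variable names are illustrative, and the share is computed over painting/creator results, as returned by count(?s).

```python
# Counts reported in the post (WDQS, 17/10/2023).
with_creator = 773080          # results with a creator (P170)
with_painter_creator = 648947  # results whose creator has occupation 'painter'

# Results whose creator lacks the 'painter' occupation.
missing_painter = with_creator - with_painter_creator

# Share of the results with a creator that the rule would complete.
share_missing = missing_painter / with_creator

print(f"creator lacks 'painter': {missing_painter} ({share_missing:.1%})")
```

Roughly one result in six among those with a known creator would be affected by the rule.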

How many distinct creators does this concern? The query

select (count(distinct ?creator) as ?c) where {
  ?s wdt:P31 wd:Q3305213; wdt:P170 ?creator .
  filter not exists { ?creator wdt:P106 wd:Q1028181 }
}

tells us: there are 113103 of them.

And if we want their list:

select distinct ?creator where {
  ?s wdt:P31 wd:Q3305213; wdt:P170 ?creator .
  filter not exists { ?creator wdt:P106 wd:Q1028181 }
}

(note: this query takes too long for the server to complete; you would have to page through the results with the LIMIT and OFFSET keywords to obtain the list progressively)
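
As a rough illustration of that paging, the Python sketch below generates one query per page. The page size of 10000 is an arbitrary assumption, and so is the added ORDER BY: SPARQL does not guarantee a stable order between calls without it, but sorting has a cost of its own and may need to be dropped if the server still struggles. The function name is hypothetical.

```python
# The list query from the post, unchanged.
BASE_QUERY = """select distinct ?creator where {
  ?s wdt:P31 wd:Q3305213; wdt:P170 ?creator .
  filter not exists { ?creator wdt:P106 wd:Q1028181 }
}"""


def paged_queries(base_query: str, total: int, page_size: int):
    """Yield one query per page, together covering `total` results."""
    for offset in range(0, total, page_size):
        # ORDER BY is an assumption added here to keep pages consistent.
        yield (
            f"{base_query}\n"
            "order by ?creator\n"
            f"limit {page_size} offset {offset}"
        )


# 113103 creators (count from the post), fetched 10000 at a time.
queries = list(paged_queries(BASE_QUERY, total=113103, page_size=10000))
print(len(queries), "queries to run")
```

Each generated query can then be submitted in turn, ideally with a pause between calls to spare the server.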

All that's left to do is to fill in the occupation of those 113103 creators. But that's another story: the improvement of Wikidata by bots. And we can imagine other rules of the same kind, for example with the pairs (sculpture/sculptor), (print/engraver)...
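
One possible way to generalise the rule to other pairs is to build the query from a (type of work, occupation) pair. The Python sketch below is only an illustration: the Rule type and the function name are hypothetical, and the only identifiers used are those given in this post for painting and painter. The identifiers for the other pairs would have to be looked up in Wikidata.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """A 'work of this class implies this occupation' rule (hypothetical type)."""
    work_class: str   # QID of the kind of work, e.g. painting
    occupation: str   # QID of the expected occupation, e.g. painter


# The only pair whose identifiers appear in the post.
PAINTING_PAINTER = Rule(work_class="Q3305213", occupation="Q1028181")


def missing_occupation_query(rule: Rule) -> str:
    """Build the query listing creators who lack the expected occupation."""
    return (
        "select distinct ?creator where {\n"
        f"  ?s wdt:P31 wd:{rule.work_class}; wdt:P170 ?creator .\n"
        f"  filter not exists {{ ?creator wdt:P106 wd:{rule.occupation} }}\n"
        "}"
    )


print(missing_occupation_query(PAINTING_PAINTER))
```

With the painting/painter pair, this prints the list query shown earlier; another pair only requires another Rule value.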

Author: Moissinac

Associate professor (maître de conférences) at Télécom Paris, Image, Data, Signal Department, Multimedia Group, Jean-Claude Moissinac has conducted research on advanced techniques for the production, transport, representation and use of multimedia documents. This work first evolved towards the semantic representation of multimedia-related data (media processing workflows, description of media adaptations, formal description of user interactions). Today, the work focuses on building knowledge graphs. Main current research areas: semantic knowledge representation, knowledge graph construction, and machine learning techniques on these graphs.
