Since the beginning of DBpedia, there has been a strong consensus in the community that one of DBpedia's goals is to feed semantic knowledge back into Wikipedia to improve its structure and data quality. How to achieve this goal was a topic of many discussions over the years. No progress was made, not for lack of motivation but for lack of an approach that was both effective and efficient.
When DBpedia started over 13 years ago, it made two major impacts:
- It was the first showcase of the potential of open knowledge graphs through the semantification of Wikipedia’s knowledge, which proved useful for the development of thousands of Semantic Web applications and technologies (such as DBpedia Mobile from 2008, long before any knowledge-rich map viewers existed).
- DBpedia played a major role as a nucleus, gluing the decentralized web of data together into what has grown into the largest (decentrally stored, constantly updated) knowledge graph on earth: the linked data web.
Giving knowledge back to Wikipedia
We received a Wikimedia Grant for our project GlobalFactSyncRE and revisited the issue. After almost two years of working on the topic, we would like to announce our final report. We submitted a summary of this report to the Qurator conference:
Towards a Systematic Approach to Sync Factual Data across Wikipedia, Wikidata and External Data Sources. Sebastian Hellmann, Johannes Frey, Marvin Hofer, Milan Dojchinovski, Krzysztof Wecel and Włodzimierz Lewoniewski.
Please find our self-archived e-print here.
Highlights of the paper
- In sum, we laid a good foundation but also left many things unfinished. The strength of the paper is that it brings together many aspects that require attention and drafts a roadmap for bringing external data from Linked Data into Wikipedia via DBpedia.
- Wikipedia’s infoboxes are still growing a lot: overall 150% in the largest 140 Wikipedias and 200% for English over the last 3 years.
- We extracted and analysed 725 million infobox facts from the largest 140 Wikipedias and 8.8 million references from the largest 11 Wikipedias.
- We compared existing data in Wikidata with infoboxes from 40 Wikipedias for ~200 infobox parameters and found only a 20% overlap. It seems Wikidata needs to grow considerably, and in the right direction, before it is fit to replace the rich and growing infoboxes in Wikipedia.
Read the submitted paper here.