DBpedia Blog

Invitation to contribute to DBpedia by improving the infobox mappings + New Scala-based Extraction Framework

Summary

Hi all,

in order to extract high quality data from Wikipedia, the DBpedia extraction framework relies on infobox to ontology mappings which define how Wikipedia infobox templates are mapped to classes of the DBpedia ontology.

Up to now, these mappings were defined only by the DBpedia team and as Wikipedia is huge and contains lots of different infobox templates, we were only able to define mappings for a small subset of all Wikipedia infoboxes and also only managed to map a subset of the properties of these infoboxes.

In order to enable the DBpedia user community to contribute to improving the coverage and the quality of the mappings, we have set up a public wiki at http://mappings.dbpedia.org/index.php/Main_Page which contains:

1.  all mappings that are currently used by the DBpedia extraction framework
2. the definition of the DBpedia ontology and
3. documentation for the DBpedia mapping language as well as step-by-step guides on how to extend and refine mappings and the ontology.

So if you are using DBpedia data and you you were always annoyed that DBpedia did not properly cover the infobox template that is most important to you, you are highly invited to extend the mappings and the ontology in the wiki. Your edits will be used for the next DBpedia release expected to be published in the first week of April.

The process of contributing to the ontology and the mappings is as follows:

1.  You familiarize yourself with the DBpedia mapping language by reading the documentation in the wiki.
2.  In order to prevent random SPAM, the wiki is read-only and new editors need to be confirmed by a member of the DBpedia team (currently Anja Jentzsch does the clearing). Therefore, please create an account in the wiki for yourself. After this, Anja will give you editing rights and you can edit the mappings as well as the ontology.
3. For contributing to the next DBpedia relase, you can edit until Sunday, March 21. After this, we will check the mappings and the ontology definition in the Wiki for consistency and then use both for the next DBpedia release.

So, we are starting kind of a social experiment on if the DBpedia user community is willing to contribute to the improvement of DBpedia and on how the DBpedia ontology develops through community contributions J

Please excuse, that it is currently still rather cumbersome to edit the mappings and the ontology. We are currently working on a visual editor for the mappings as well as a validation service, which will check edits to the mappings and test the new mappings against example pages from Wikipedia. We hope that we will be able to deploy these tools in the next two months, but still wanted to release the wiki as early as possible in order to already allow community contributions to the DBpedia 3.5 release.

If you have questions about the wiki and the mapping language, please ask them on the DBpedia mailing list where Anja and Robert will answer them.

What else is happening around DBpedia?

In order to speed up the data extraction process and to lay a solid foundation for the DBpedia Live extraction, we have ported the DBpedia extraction framework from PHP to Scala/Java. The new framework extracts exactly the same types of data from Wikipedia as the old framework, but processes a single page now in 13 milliseconds instead of the 200 milliseconds. In addition, the new framework can extract data from tables within articles and can handle multiple infobox templates per article. The new framework is available under GPL license in the DBpedia SVN and is documented at http://wiki.dbpedia.org/Documentation.

The whole DBpedia team is very thankful to two companies which enabled us to do all this by sponsoring the DBpedia project:

1. Vulcan Inc. as part of its Project Halo (www.projecthalo.com). Vulcan Inc. creates and advances a variety of world-class endeavors and high impact initiatives that change and improve the way we live, learn, do business (http://www.vulcan.com/).
2.  Neofonie GmbH, a Berlin-based company offering leading technologies in the area of Web search, social media and mobile applications (http://www.neofonie.de/index.jsp).

Thank you a lot for your support!

I personally would also like to thank:

1.  Anja Jentzsch, Robert Isele, and Christopher Sahnwaldt for all their great work on implementing the new extraction framework and for setting up the mapping wiki.
2.  Andreas Lange and Sidney Bofah for correcting and extending the mappings in the Wiki.

Cheers,

Chris Bizer