DBpedia Blog

DBpedia 3.9 released, including wider infobox coverage, additional type statements, and new YAGO and Wikidata links

Summary

Hi all,

We are happy to announce the release of DBpedia 3.9.

The most important improvements of the new release compared to DBpedia 3.8 are:

1. The new release is based on updated Wikipedia dumps from March/April 2013 (the 3.8 release was based on dumps from June 2012), which increases the number of concepts in the English edition from 3.7 to 4.0 million things.

2. The DBpedia ontology has been enlarged and the number of infobox-to-ontology mappings has risen, leading to richer and cleaner concept descriptions.

3. We extended the DBpedia type system to also cover Wikipedia articles that do not contain an infobox.

4. We provide links pointing from DBpedia concepts to Wikidata concepts and updated the links pointing at YAGO concepts and classes, making it easier to integrate knowledge from these sources.

The English version of the DBpedia knowledge base currently describes 4.0 million things, out of which 3.22 million are classified in a consistent ontology, including 832,000 persons, 639,000 places (including 427,000 populated places), 372,000 creative works (including 116,000 music albums, 78,000 films and 18,500 video games), 209,000 organizations (including 49,000 companies and 45,000 educational institutions), 226,000 species and 5,600 diseases.

We provide localized versions of DBpedia in 119 languages. All these versions together describe 24.9 million things, out of which 16.8 million overlap (are interlinked) with the concepts from the English DBpedia. The full DBpedia data set features labels and abstracts for 12.6 million unique things in 119 different languages; 24.6 million links to images and 27.6 million links to external web pages; 45.0 million external links into other RDF datasets, 67.0 million links to Wikipedia categories, and 41.2 million YAGO categories.

Altogether the DBpedia 3.9 release consists of 2.46 billion pieces of information (RDF triples) out of which 470 million were extracted from the English edition of Wikipedia, 1.98 billion were extracted from other language editions, and about 45 million are links to external data sets.

Detailed statistics about the DBpedia data sets in 24 popular languages are provided at Dataset Statistics.

The main changes between DBpedia 3.8 and 3.9 are described below. For additional, more detailed information please refer to the Change Log.

1. Enlarged Ontology

The DBpedia community added new classes and properties to the DBpedia ontology via the mappings wiki. The DBpedia 3.9 ontology encompasses:

  • 529 classes (DBpedia 3.8: 359)
  • 927 object properties (DBpedia 3.8: 800)
  • 1290 datatype properties (DBpedia 3.8: 859)
  • 116 specialized datatype properties (DBpedia 3.8: 116)
  • 46 owl:equivalentClass and 31 owl:equivalentProperty mappings to http://schema.org

2. Additional Infobox to Ontology Mappings

The editors of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 3.9 extraction, we used 3177 mappings (DBpedia 3.8: 2347 mappings), distributed over the languages covered in the release.

3. Extended Type System to cover Articles without Infobox

Until the DBpedia 3.8 release, a concept was only assigned a type (like person or place) if the corresponding Wikipedia article contained an infobox indicating this type. The new 3.9 release also contains type statements for articles without an infobox. These statements were inferred from the link structure within the DBpedia knowledge base using the algorithm described in Paulheim/Bizer 2013. Applying the algorithm allowed us to provide type information for 440,000 concepts that were formerly not typed. A similar algorithm was also used to identify and remove potentially wrong links from the knowledge base.
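To give a rough feeling for how types can be inferred from link structure, here is a heavily simplified sketch. It is not the algorithm from Paulheim/Bizer 2013, only a toy voting heuristic under our own assumptions: every property votes for the types of the objects it usually points at, and an untyped resource receives the type with the highest average vote. The function name `infer_types`, the `threshold` parameter and all resource and property names are made up for illustration.

```python
from collections import Counter, defaultdict


def infer_types(triples, known_types, threshold=0.5):
    """Toy heuristic: guess the type of untyped objects from their incoming properties."""
    # For each property, count the types of the already typed objects it points at.
    type_counts = defaultdict(Counter)
    for _, prop, obj in triples:
        if obj in known_types:
            type_counts[prop][known_types[obj]] += 1

    # Collect the incoming properties of every untyped object.
    incoming = defaultdict(list)
    for _, prop, obj in triples:
        if obj not in known_types:
            incoming[obj].append(prop)

    inferred = {}
    for resource, props in incoming.items():
        scores = Counter()
        used = 0  # number of incoming properties that carry any type evidence
        for prop in props:
            counts = type_counts.get(prop)
            if not counts:
                continue
            total = sum(counts.values())
            for type_name, n in counts.items():
                scores[type_name] += n / total
            used += 1
        if used == 0:
            continue
        best_type, best_score = max(scores.items(), key=lambda kv: kv[1])
        # Only accept the guess if the average vote is strong enough.
        if best_score / used >= threshold:
            inferred[resource] = best_type
    return inferred


# Hypothetical toy data, not taken from DBpedia.
triples = [
    ("Film_A", "director", "Person_X"),
    ("Film_B", "director", "Person_Y"),
    ("Film_C", "director", "Person_Z"),
    ("Film_A", "country", "Place_P"),
    ("Film_B", "country", "Place_Q"),
]
known_types = {"Person_X": "Person", "Person_Y": "Person", "Place_P": "Place"}

result = infer_types(triples, known_types)
```

In this toy data, `Person_Z` and `Place_Q` have no type of their own, but the properties pointing at them only ever point at persons and places respectively, so the sketch assigns those types.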

4. New and updated RDF Links into External Data Sources

We added RDF links to Wikidata and updated the following RDF link sets pointing at other Linked Data sources: YAGO, Freebase, Geonames, GADM and EUNIS. For an overview of all data sets that are interlinked from DBpedia please refer to DBpedia Interlinking.
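If you want to work with such link sets programmatically, the following minimal sketch shows one way to group links by the data source they point at. It assumes the links are serialized as N-Triples and use the owl:sameAs predicate; the sample lines and target URIs are illustrative and not copied from the release files, and the helper name `links_by_target` is our own.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Matches simple N-Triples lines whose subject, predicate and object are all IRIs.
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.$")
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def links_by_target(lines):
    """Group owl:sameAs links by the host name of the linked data source."""
    grouped = defaultdict(list)
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match is None:
            continue  # skip comments, blank lines and triples with literals
        subject, predicate, obj = match.groups()
        if predicate == OWL_SAME_AS:
            grouped[urlparse(obj).netloc].append((subject, obj))
    return dict(grouped)


# Illustrative sample lines (assumed shape, not taken from the download files).
sample = [
    "# example link set",
    "<http://dbpedia.org/resource/Berlin> <http://www.w3.org/2002/07/owl#sameAs> <http://www.wikidata.org/entity/Q64> .",
    "<http://dbpedia.org/resource/Berlin> <http://www.w3.org/2002/07/owl#sameAs> <http://yago-knowledge.org/resource/Berlin> .",
]

grouped = links_by_target(sample)
```

For real files, a proper RDF parser is the safer choice; the regular expression above only covers the simple IRI-to-IRI case.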

5. New Find Related Concepts Service

We offer a new service for finding resources that are related to a given DBpedia seed resource. More information about the service is found at DBpedia FindRelated.

Accessing the DBpedia 3.9 Release

You can download the new DBpedia datasets from http://wiki.dbpedia.org/Downloads39.

As usual, the dataset is also available as Linked Data and via the DBpedia SPARQL endpoint at http://dbpedia.org/sparql.
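As a small illustration of how the endpoint can be queried, the sketch below builds a request URL that asks for the types of a resource. It relies only on the standard `query` parameter of the SPARQL protocol; the helper name `build_query_url`, the example resource and the query itself are our own assumptions, and the code does not send any request.

```python
from urllib.parse import urlencode, urlparse, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"


def build_query_url(resource, limit=10):
    """Build a GET URL that asks the endpoint for the rdf:type values of a resource."""
    query = (
        "SELECT ?type WHERE { "
        f"<{resource}> a ?type "
        "} "
        f"LIMIT {int(limit)}"
    )
    # urlencode takes care of escaping the query string for use in a URL.
    return ENDPOINT + "?" + urlencode({"query": query})


# Example resource chosen for illustration only.
url = build_query_url("http://dbpedia.org/resource/Berlin", limit=5)
```

The resulting URL can be opened with any HTTP client, for example `urllib.request.urlopen(url)`, ideally with an `Accept` header naming the result format you want.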

Credits

Lots of thanks to

  • Jona Christopher Sahnwaldt (Freelancer funded by the University of Mannheim, Germany) for improving the DBpedia extraction framework, for extracting the DBpedia 3.9 data sets for all 119 languages, and for generating the updated RDF links to external data sets.
  • All editors who contributed to the DBpedia ontology mappings via the Mappings Wiki.
  • Heiko Paulheim (University of Mannheim, Germany) for inventing and implementing the algorithm to generate additional type statements for formerly untyped resources.
  • The whole Internationalization Committee for pushing the DBpedia internationalization forward.
  • Dimitris Kontokostas (University of Leipzig) for improving the DBpedia extraction framework and loading the new release onto the DBpedia download server in Leipzig.
  • Volha Bryl (University of Mannheim, Germany) for generating the statistics about the new release.
  • Petar Ristoski (University of Mannheim, Germany) for generating the updated links pointing at the GADM database of Global Administrative Areas.
  • Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink Software) for loading the new data set into the Virtuoso instance that serves the Linked Data view and SPARQL endpoint.
  • OpenLink Software (http://www.openlinksw.com/) altogether for providing the server infrastructure for DBpedia.
  • Julien Cojan, Andrea Di Menna, Ahmed Ktob, Julien Plu, Jim Regan and others who contributed improvements to the DBpedia extraction framework via the source code repository on GitHub.

The work on the DBpedia 3.9 release was financially supported by the European Commission through the project LOD2 – Creating Knowledge out of Linked Data (http://lod2.eu/).

More information about DBpedia is found at http://dbpedia.org/About as well as in the new overview article about the project.

Have fun with the new DBpedia release!

Cheers,

Chris Bizer and Christopher Sahnwaldt