We are happy to announce the release of DBpedia 3.9.
The most important improvements of the new release compared to DBpedia 3.8 are:
1. The new release is based on updated Wikipedia dumps dating from March/April 2013 (the 3.8 release was based on dumps from June 2012), leading to an overall increase in the number of concepts in the English edition from 3.7 to 4.0 million things.
2. The DBpedia ontology has been enlarged and the number of infobox-to-ontology mappings has risen, leading to richer and cleaner concept descriptions.
3. We extended the DBpedia type system to also cover Wikipedia articles that do not contain an infobox.
The English version of the DBpedia knowledge base currently describes 4.0 million things, out of which 3.22 million are classified in a consistent ontology, including 832,000 persons, 639,000 places (including 427,000 populated places), 372,000 creative works (including 116,000 music albums, 78,000 films and 18,500 video games), 209,000 organizations (including 49,000 companies and 45,000 educational institutions), 226,000 species and 5,600 diseases.
We provide localized versions of DBpedia in 119 languages. All these versions together describe 24.9 million things, out of which 16.8 million overlap (are interlinked) with the concepts from the English DBpedia. The full DBpedia data set features labels and abstracts for 12.6 million unique things in 119 different languages, 24.6 million links to images, 27.6 million links to external web pages, 45.0 million external links into other RDF datasets, 67.0 million links to Wikipedia categories, and 41.2 million YAGO categories.
Altogether, the DBpedia 3.9 release consists of 2.46 billion pieces of information (RDF triples), out of which 470 million were extracted from the English edition of Wikipedia, 1.98 billion were extracted from other language editions, and about 45 million are links to external data sets.
Detailed statistics about the DBpedia data sets in 24 popular languages are provided on the Dataset Statistics page.
The main changes between DBpedia 3.8 and 3.9 are described below. For more detailed information, please refer to the Change Log.
1. Enlarged Ontology
The DBpedia community added new classes and properties to the DBpedia ontology via the mappings wiki. The DBpedia 3.9 ontology encompasses:
- 529 classes (DBpedia 3.8: 359)
- 927 object properties (DBpedia 3.8: 800)
- 1290 datatype properties (DBpedia 3.8: 859)
- 116 specialized datatype properties (DBpedia 3.8: 116)
- 46 owl:equivalentClass and 31 owl:equivalentProperty mappings to http://schema.org
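The size of the ontology change can be read off the figures above. As a small worked example (the dictionary and function names are ours, not part of any DBpedia tooling), the following sketch computes the absolute and relative growth from DBpedia 3.8 to 3.9 for each listed category:

```python
# Ontology size per release, copied from the list above: (DBpedia 3.8, DBpedia 3.9).
ONTOLOGY_COUNTS = {
    "classes": (359, 529),
    "object properties": (800, 927),
    "datatype properties": (859, 1290),
    "specialized datatype properties": (116, 116),
}


def growth(counts):
    """Return {name: (absolute increase, relative increase in percent)}."""
    return {
        name: (new - old, round(100 * (new - old) / old, 1))
        for name, (old, new) in counts.items()
    }


for name, (delta, pct) in growth(ONTOLOGY_COUNTS).items():
    print(f"{name}: +{delta} ({pct}%)")
```

Run as-is, this reports, for example, 170 additional classes (+47.4%) and 431 additional datatype properties (+50.2%), while the number of specialized datatype properties is unchanged.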
2. Additional Infobox to Ontology Mappings
The editors of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 3.9 extraction, we used 3177 mappings (DBpedia 3.8: 2347 mappings), which are distributed over the languages covered in the release as follows:
- English: 431 mappings
- Polish: 382 mappings
- Dutch: 335 mappings
- German: 219 mappings
- Greek: 215 mappings
- Portuguese: 211 mappings
- Slovenian: 170 mappings
- French: 165 mappings
- Korean: 148 mappings
- Spanish: 137 mappings
- Hungarian: 111 mappings
- Turkish: 91 mappings
- Japanese: 72 mappings
- Czech: 66 mappings
- Italian: 62 mappings
- Bulgarian: 61 mappings
- Indonesian: 59 mappings
- Catalan: 52 mappings
- Arabic: 51 mappings
- Russian: 48 mappings
- Croatian: 36 mappings
- Basque: 32 mappings
- Irish: 17 mappings
- Bengali: 6 mappings
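As a quick consistency check (the variable names here are illustrative only), the per-language counts listed above add up exactly to the stated total of 3177 mappings, i.e. 830 more than the 2347 mappings used for DBpedia 3.8:

```python
# Per-language mapping counts used for the DBpedia 3.9 extraction (from the list above).
MAPPINGS_39 = {
    "English": 431, "Polish": 382, "Dutch": 335, "German": 219,
    "Greek": 215, "Portuguese": 211, "Slovenian": 170, "French": 165,
    "Korean": 148, "Spanish": 137, "Hungarian": 111, "Turkish": 91,
    "Japanese": 72, "Czech": 66, "Italian": 62, "Bulgarian": 61,
    "Indonesian": 59, "Catalan": 52, "Arabic": 51, "Russian": 48,
    "Croatian": 36, "Basque": 32, "Irish": 17, "Bengali": 6,
}
MAPPINGS_38_TOTAL = 2347  # total number of mappings used for DBpedia 3.8

total = sum(MAPPINGS_39.values())
increase = total - MAPPINGS_38_TOTAL
print(total, increase)  # 3177 830
```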
3. Extended Type System to Cover Articles without Infoboxes
Until the DBpedia 3.8 release, a concept was only assigned a type (like person or place) if the corresponding Wikipedia article contained an infobox indicating this type. The new 3.9 release also contains type statements for articles without an infobox; these were inferred from the link structure within the DBpedia knowledge base using the algorithm described in Paulheim/Bizer 2013. Applying the algorithm allowed us to provide type information for 440,000 concepts that were formerly untyped. A similar algorithm was also used to identify and remove potentially wrong links from the knowledge base.
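The core idea of inferring types from links can be illustrated with a simplified weighted-voting sketch: every incoming property "votes" for the types its objects usually have, and a property counts more the further its object-type distribution deviates from the overall type distribution. This is a hypothetical illustration under our own assumptions (function name `sdtype_sketch`, toy resources, a threshold of 0.5, incoming links only). It is not the implementation used to produce the 3.9 release, and details of the published algorithm may differ.

```python
from collections import Counter, defaultdict


def sdtype_sketch(triples, types, threshold=0.5):
    """Hypothetical sketch: score types for untyped resources from incoming links.

    triples   -- iterable of (subject, property, object) links between resources
    types     -- dict mapping each resource to its set of known types (empty = untyped)
    threshold -- minimum normalised score for a type to be reported (assumed value)
    Returns {resource: {type: score}} for untyped resources with at least one kept type.
    """
    triples = list(triples)
    typed = {r: ts for r, ts in types.items() if ts}
    if not typed:
        return {}

    # A priori distribution P(t): share of typed resources that carry type t.
    prior = {t: c / len(typed)
             for t, c in Counter(t for ts in typed.values() for t in ts).items()}

    # Conditional distribution P(t | p): types of the typed objects of property p.
    type_counts = defaultdict(Counter)
    uses = Counter()
    for _, p, o in triples:
        if o in typed:
            uses[p] += 1
            type_counts[p].update(typed[o])
    cond = {p: {t: c / uses[p] for t, c in counts.items()}
            for p, counts in type_counts.items()}

    # Property weight: squared deviation of P(t | p) from the prior P(t).
    # Properties whose objects look like "any resource" get a weight near 0.
    weight = {p: sum((prior[t] - dist.get(t, 0.0)) ** 2 for t in prior)
              for p, dist in cond.items()}

    # Incoming properties (with multiplicity) of every untyped resource.
    incoming = defaultdict(list)
    for _, p, o in triples:
        if o not in typed and p in cond:
            incoming[o].append(p)

    result = {}
    for resource, props in incoming.items():
        total = sum(weight[p] for p in props)
        if total == 0:
            continue  # only uninformative links: no meaningful vote
        scores = Counter()
        for p in props:
            for t, prob in cond[p].items():
                scores[t] += weight[p] * prob
        kept = {t: s / total for t, s in scores.items() if s / total >= threshold}
        if kept:
            result[resource] = kept
    return result


# Toy data (hypothetical resources): "Ogdenville" has no type of its own.
types = {
    "Alice": {"Person"}, "Bob": {"Person"}, "Carol": {"Person"},
    "Springfield": {"Place"}, "Shelbyville": {"Place"}, "Ogdenville": set(),
}
triples = [
    ("Alice", "birthPlace", "Springfield"),
    ("Bob", "birthPlace", "Shelbyville"),
    ("Carol", "birthPlace", "Ogdenville"),
    ("Alice", "spouse", "Bob"),
    ("Bob", "spouse", "Alice"),
]
print(sdtype_sketch(triples, types))  # {'Ogdenville': {'Place': 1.0}}
```

Because every typed object of `birthPlace` in the toy data is a place, the single incoming `birthPlace` link is enough to type `Ogdenville` as a place. In real data, votes from many properties of differing reliability are combined, which is what the weighting is meant to handle.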
4. New and updated RDF Links into External Data Sources
We added RDF links to Wikidata and updated the following RDF link sets pointing at other Linked Data sources: YAGO, Freebase, Geonames, GADM and EUNIS. For an overview of all data sets that are interlinked from DBpedia, please refer to DBpedia Interlinking.
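Link sets of this kind are commonly published as `owl:sameAs` statements in N-Triples. Assuming a downloaded link file in that format (the file layout is our assumption, and the function name and target IRIs below are hypothetical placeholders), a short sketch can count how many links point at each external host:

```python
import re
from collections import Counter
from urllib.parse import urlparse

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# One N-Triples statement whose subject, predicate and object are all IRIs.
IRI_TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def count_same_as_targets(lines):
    """Count owl:sameAs links per target host; all other lines are skipped."""
    counts = Counter()
    for line in lines:
        match = IRI_TRIPLE.match(line.strip())
        if match and match.group(2) == OWL_SAME_AS:
            counts[urlparse(match.group(3)).netloc] += 1
    return counts


# Placeholder statements: the resource and target IRIs are made up for illustration.
sample = [
    "<http://dbpedia.org/resource/Example_City> <http://www.w3.org/2002/07/owl#sameAs> <http://sws.geonames.org/0000001/> .",
    "<http://dbpedia.org/resource/Example_Town> <http://www.w3.org/2002/07/owl#sameAs> <http://sws.geonames.org/0000002/> .",
    "<http://dbpedia.org/resource/Example_City> <http://www.w3.org/2002/07/owl#sameAs> <http://www.wikidata.org/entity/Q0000001> .",
    "<http://dbpedia.org/resource/Example_City> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <http://example.org/page> .",
    "# a comment line",
]
print(count_same_as_targets(sample))  # Counter({'sws.geonames.org': 2, 'www.wikidata.org': 1})
```

For a real file, the same function can be fed an open file object, e.g. from `open(path, encoding="utf-8")` or, for compressed downloads, `bz2.open(path, "rt", encoding="utf-8")`. Statements with literal objects are deliberately ignored by the regular expression.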
5. New Find Related Concepts Service
We offer a new service for finding resources that are related to a given DBpedia seed resource. More information about the service can be found on the DBpedia FindRelated page.
Accessing the DBpedia 3.9 Release
You can download the new DBpedia datasets from http://wiki.dbpedia.org/Downloads39.
As usual, the dataset is also available as Linked Data and via the DBpedia SPARQL endpoint at http://dbpedia.org/sparql.
Lots of thanks to:
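A minimal sketch of querying the endpoint, assuming the standard SPARQL 1.1 Protocol (a GET request with a `query` parameter, JSON results requested via the `Accept` header). The helper names `build_request` and `run` are hypothetical, and the example query counts instances of `dbo:Person`. Note that the public endpoint has since been updated to newer DBpedia releases, so live counts will not match the 3.9 figures above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT (COUNT(?person) AS ?n) WHERE { ?person a dbo:Person }"""


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run(query, timeout=60):
    """Send the query and return the list of result bindings."""
    with urlopen(build_request(query), timeout=timeout) as response:
        data = json.load(response)
    return data["results"]["bindings"]
```

Usage (requires network access): `rows = run(QUERY)` followed by `print(rows[0]["n"]["value"])`. Keeping request construction separate from sending makes it easy to inspect or reuse the request, for example with a different endpoint URL.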
- Jona Christopher Sahnwaldt (freelancer funded by the University of Mannheim, Germany) for improving the DBpedia extraction framework, for extracting the DBpedia 3.9 data sets for all 119 languages, and for generating the updated RDF links to external data sets.
- All editors who contributed to the DBpedia ontology mappings via the Mappings Wiki.
- Heiko Paulheim (University of Mannheim, Germany) for inventing and implementing the algorithm to generate additional type statements for formerly untyped resources.
- The whole Internationalization Committee for pushing the DBpedia internationalization forward.
- Dimitris Kontokostas (University of Leipzig) for improving the DBpedia extraction framework and loading the new release onto the DBpedia download server in Leipzig.
- Volha Bryl (University of Mannheim, Germany) for generating the statistics about the new release.
- Petar Ristoski (University of Mannheim, Germany) for generating the updated links pointing at the GADM database of Global Administrative Areas.
- Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink Software) for loading the new data set into the Virtuoso instance that serves the Linked Data view and SPARQL endpoint.
- OpenLink Software (http://www.openlinksw.com/) as a whole for providing the server infrastructure for DBpedia.
- Julien Cojan, Andrea Di Menna, Ahmed Ktob, Julien Plu, Jim Regan and others who contributed improvements to the DBpedia extraction framework via the source code repository on GitHub.
The work on the DBpedia 3.9 release was financially supported by the European Commission through the project LOD2 – Creating Knowledge out of Linked Data (http://lod2.eu/).
Have fun with the new DBpedia release!
Chris Bizer and Christopher Sahnwaldt