Popular Individual Datasets

Many of the datasets produced by the DBpedia extraction can be used standalone, instead of a full DBpedia Latest Core Release. The following list gives an overview of the popular datasets that can be used individually. See also the paper for details.

Generic extraction

These datasets are created with a generic smart parser. This data has the highest coverage: it is available for ~140 languages, and in general the generic extraction pulls the most data out of the different parts of Wikipedia pages.

Labels (human-readable name of the entity, derived from the wiki page title)
Facts (all infobox facts in 140 languages, using localized properties in the /property/ namespace)
Geo-coordinates
Wikipedia categories (information about category structures and the articles belonging to them)
Wikipedia Links (outgoing links to other DBpedia Entities)
Wikilinks (internal links of Wikipedia)
Wikilink anchor text (surface forms of articles when cross-referenced)
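
As an illustration of standalone use, the following minimal sketch reads a labels dataset into a dictionary. It assumes the dataset is serialized as N-Triples with one `rdfs:label` triple per line and a language-tagged literal as the object; the helper name `read_labels` and the sample line are hypothetical, and string escapes inside literals are not decoded.

```python
import re

# Matches one N-Triples statement whose object is a language-tagged literal,
# e.g. <subject> <predicate> "text"@en .
# Assumption: one triple per line, as in typical N-Triples dumps.
LITERAL_TRIPLE = re.compile(
    r'^<(?P<s>[^>]+)>\s+<(?P<p>[^>]+)>\s+'
    r'"(?P<o>(?:[^"\\]|\\.)*)"@(?P<lang>[A-Za-z-]+)\s*\.\s*$'
)

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def read_labels(lines):
    """Return {entity IRI: label} for every rdfs:label triple in `lines`.

    Lines that are not language-tagged label triples (comments, other
    predicates, blank lines) are skipped. Escapes are left undecoded.
    """
    labels = {}
    for line in lines:
        match = LITERAL_TRIPLE.match(line.strip())
        if match and match.group("p") == RDFS_LABEL:
            labels[match.group("s")] = match.group("o")
    return labels


# Hypothetical sample data standing in for a downloaded labels file.
sample = [
    '<http://dbpedia.org/resource/Berlin> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en .',
    "# comment lines and other triples are skipped",
]
labels = read_labels(sample)
print(labels)
```

For a downloaded file, the same function accepts any iterable of text lines, for example a file object opened with `bz2.open(path, "rt", encoding="utf-8")` if the file is bzip2-compressed. For anything beyond a quick look, a full RDF parser is the safer choice.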

Text Extraction

These datasets are concerned with the textual content of the page. Only the most popular ones are listed here; see the language resource (NIF) page for details.

Abstracts (short description of the entity based on the article's first sentences)
Long abstracts (text before the table of contents in the article)

Mappings-based Extraction

Mappings are created in the Mappings Wiki. They are rules that optimize the generic smart parser mentioned above. This data has lower coverage than the generic extraction (only ~40 languages), but it is of much higher quality.

Instance-types
Literals
Objects
Geo-coordinates
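
When triples from both extractions are mixed, the predicate namespace is one way to tell them apart. The sketch below assumes that generic facts use predicates under `/property/` (as stated in the Facts entry above) and that mappings-based facts use predicates under `/ontology/` on a `dbpedia.org` host. The function name `extraction_kind` is hypothetical, and the rule is a heuristic rather than an official classification.

```python
from urllib.parse import urlparse


def extraction_kind(predicate_iri):
    """Guess which extraction produced a triple from its predicate IRI.

    Assumption: /ontology/ marks mappings-based data and /property/ marks
    generic infobox data, on dbpedia.org or a language subdomain of it.
    """
    parsed = urlparse(predicate_iri)
    host = parsed.netloc
    if not (host == "dbpedia.org" or host.endswith(".dbpedia.org")):
        return "other"
    if parsed.path.startswith("/ontology/"):
        return "mappings-based"
    if parsed.path.startswith("/property/"):
        return "generic"
    return "other"


print(extraction_kind("http://dbpedia.org/ontology/birthPlace"))
print(extraction_kind("http://de.dbpedia.org/property/geburtsort"))
```

Predicates from other vocabularies, such as `rdfs:label`, fall into the "other" bucket and need separate handling.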
