Popular Individual Datasets

Instead of using a DBpedia Latest Core Release, many of the datasets produced by the DBpedia extraction can also be used standalone. The following list gives an overview of the popular datasets that can be used individually. See also the paper for details.

Generic extraction

These datasets are created with a generic smart parser. They have the highest coverage: they are available for ~140 languages and, in general, extract the most data from the different parts of Wikipedia pages.

Text Extraction

These datasets cover the textual content of the page. Only the most popular datasets are listed here; please see the language resource (NIF) page for details.

Mappings-based Extraction

Mappings are created in the Mappings Wiki. They are rules that optimize the generic smart parser mentioned above. This data has lower coverage than the generic extraction (only ~40 languages), but it is of much higher quality.

  • Instance-types
  • Literals
  • Objects
  • Geo-coordinates
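
As an illustration of standalone use, the following is a minimal sketch of how an instance-types dataset might be consumed without the rest of a release. It assumes the file is serialized as N-Triples with one triple per line; the sample triples, the function name count_types, and the regular expression are illustrative and are not part of any DBpedia tooling.

```python
import re
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Matches an N-Triples line whose subject, predicate and object are all IRIs.
# Lines with literal or blank-node terms do not match and are skipped.
IRI_TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def count_types(lines):
    """Count how many resources are asserted for each rdf:type object."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = IRI_TRIPLE.match(line)
        if match and match.group(2) == RDF_TYPE:
            counts[match.group(3)] += 1
    return counts


# Illustrative sample data (assumed shape of an instance-types file).
SAMPLE = [
    "# sample instance-types triples",
    "<http://dbpedia.org/resource/Berlin> <%s> <http://dbpedia.org/ontology/City> ." % RDF_TYPE,
    "<http://dbpedia.org/resource/Leipzig> <%s> <http://dbpedia.org/ontology/City> ." % RDF_TYPE,
    "<http://dbpedia.org/resource/Germany> <%s> <http://dbpedia.org/ontology/Country> ." % RDF_TYPE,
    '<http://dbpedia.org/resource/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en .',
]

type_counts = count_types(SAMPLE)
for type_iri, n in type_counts.most_common():
    print(type_iri, n)
```

If the downloaded file is bzip2-compressed, the same function can be fed from a file handle opened with bz2.open(path, "rt", encoding="utf-8"), where path is whatever local file name the dataset was saved under. For anything beyond simple counting, a full RDF parser is the safer choice, since this regular expression handles only IRI-to-IRI triples.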