Instead of using a DBpedia Latest Core Release, many of the datasets produced by the DBpedia extraction can also be used standalone. The following list gives an overview of the popular datasets that can be used individually. See also the paper for details.
These datasets are created with a generic smart parser. This data has the highest coverage: it is available for ~140 languages, and in general the generic extraction pulls the most data out of the different parts of the Wikipedia pages.
- Labels (human readable name of the entity derived from Wiki title)
- Facts (all infobox facts in 140 languages using localized properties in
- Wikipedia categories (information about category structures and the articles belonging to them)
- Wikipedia Links (outgoing links to other DBpedia Entities)
- Wikilinks (internal links of Wikipedia)
- Wikilink Anchor text (surface form of the Articles when cross-referenced)
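As an illustration of using one of these datasets on its own, the following is a minimal sketch that reads label statements. It assumes the Labels dataset is serialized as N-Triples with one `rdfs:label` statement per line; the helper name `iter_labels` and the sample lines are made up for this example, and the regular expression only handles language-tagged literals without unescaping them.

```python
import re

# Matches one N-Triples statement whose object is a language-tagged literal, e.g.
# <http://dbpedia.org/resource/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en .
LITERAL_TRIPLE = re.compile(
    r'^<(?P<s>[^>]*)>\s+<(?P<p>[^>]*)>\s+'
    r'"(?P<o>(?:[^"\\]|\\.)*)"@(?P<lang>[A-Za-z0-9-]+)\s*\.\s*$'
)

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iter_labels(lines):
    """Yield (entity_iri, label, language) for every rdfs:label statement.

    Hypothetical helper: lines that are not label statements are skipped,
    and escape sequences inside the literal are left as they are.
    """
    for line in lines:
        match = LITERAL_TRIPLE.match(line.strip())
        if match and match.group("p") == RDFS_LABEL:
            yield match.group("s"), match.group("o"), match.group("lang")


# Made-up sample lines standing in for the contents of a labels file.
sample = [
    '<http://dbpedia.org/resource/Berlin> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en .',
    "# comment lines and other triples are skipped",
    '<http://dbpedia.org/resource/Berlin> '
    '<http://dbpedia.org/ontology/wikiPageWikiLink> '
    '<http://dbpedia.org/resource/Germany> .',
]
labels = list(iter_labels(sample))
```

For a real download, the same function could be fed the lines of a compressed file, for example via `bz2.open(path, "rt", encoding="utf-8")`, where `path` points to whichever labels file was fetched. For anything beyond a quick look, a full RDF parser is likely the safer choice.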
These datasets are concerned with the textual content of the page. We list the most popular datasets here. Please see the language resource (NIF) page for details.
- Abstracts (short description of the entity based on the article's first sentences)
- Long abstracts (text before the table of contents in the article)
Mappings are created in the Mappings Wiki. They are rules that optimize the generic smart parser mentioned above. This data has lower coverage than the generic data, i.e. only ~40 languages, but it is of much higher quality.
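The trade-off between coverage and quality can be pictured with a small conceptual sketch. It is not the Mappings Wiki syntax and not the extraction framework's code: the infobox keys, the rule table, and the function `apply_mappings` are hypothetical, and only `dbo:birthPlace` is an actual ontology property. The idea shown is that a rule renames a localized infobox key to a shared ontology property, and facts without a rule are dropped.

```python
# Hypothetical raw infobox facts as a generic parser might emit them:
# (language, localized infobox key, value). Keys differ per language.
raw_facts = [
    ("en", "birth_place", "Ulm"),
    ("de", "geburtsort", "Ulm"),
    ("en", "shoe_size", "42"),  # no mapping rule exists for this key
]

# Hypothetical mapping rules: (language, infobox key) -> shared ontology property.
mapping_rules = {
    ("en", "birth_place"): "dbo:birthPlace",
    ("de", "geburtsort"): "dbo:birthPlace",
}


def apply_mappings(facts, rules):
    """Keep only facts covered by a rule and rename their property."""
    mapped = []
    for lang, key, value in facts:
        prop = rules.get((lang, key))
        if prop is not None:
            mapped.append((prop, value))
    return mapped


mapped_facts = apply_mappings(raw_facts, mapping_rules)
```

In this sketch the two birthplace facts end up under one property regardless of language, while the unmapped key disappears. That mirrors why mapping-based data is more uniform but covers less than the generic data.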