Language Resources (NIF)

DBpedia primarily focuses on representing the factual knowledge contained in Wikipedia infoboxes. A vast amount of information, however, is contained in the unstructured Wikipedia article texts. To broaden and deepen DBpedia's structured data, the article texts are targeted as an additional data source.
By representing wiki pages in the NLP Interchange Format (NIF), we provide all information directly extractable from the HTML source code, divided into three datasets:

  • nif-context: the full text of a page as context (including begin and end indices)
  • nif-page-structure: the structure of the page in sections and paragraphs (titles, subsections etc.)
  • nif-text-links: all in-text links to other DBpedia resources as well as external references

These datasets serve as the groundwork for further NLP fact extraction tasks that enrich the knowledge gathered in DBpedia.

Note: The first iteration of this extraction process covered only the abstracts of every wiki page, as a trial run. The current iteration is based on the DBpedia 2016-10 release and provides the whole wiki page text in the NIF format.

IRIs: Unlike the IRI scheme used for other DBpedia datasets, the IRIs in these datasets carry a query string that states the DBpedia version under which the instances were extracted (dbpv) and the NIF element they denote (nif), as the examples below show.
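A minimal sketch of building and splitting such versioned IRIs with Python's standard library. The expansion of the dbr: prefix to http://dbpedia.org/resource/ is assumed, and nif_iri and parse_nif_iri are hypothetical helper names, not functions of any NIF library.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: the usual expansion of the dbr: prefix used in the examples.
DBR = "http://dbpedia.org/resource/"


def nif_iri(resource: str, version: str, part: str) -> str:
    """Build a versioned NIF IRI, e.g. dbr:Anthropology?dbpv=2016-04&nif=context.

    Hypothetical helper; `part` is e.g. "context", "section_0_634" or "word_29_37".
    """
    return f"{DBR}{resource}?dbpv={version}&nif={part}"


def parse_nif_iri(iri: str) -> tuple[str, str, str]:
    """Split a versioned NIF IRI into (resource name, DBpedia version, NIF part)."""
    parts = urlsplit(iri)
    query = parse_qs(parts.query)
    resource = parts.path.rsplit("/", 1)[-1]
    return resource, query["dbpv"][0], query["nif"][0]


iri = nif_iri("Anthropology", "2016-04", "context")
print(iri)                 # http://dbpedia.org/resource/Anthropology?dbpv=2016-04&nif=context
print(parse_nif_iri(iri))  # ('Anthropology', '2016-04', 'context')
```

Keeping the version in the query string means the same page can be described by several DBpedia releases without their NIF instances colliding.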

If you find inconsistencies in these files, please contact the DBpedia mailing lists or the DBpedia association directly.

Downloads

A sample list of the most recent files is given in the table below. The full list of available languages can be found on the DBpedia Databus platform under nif-context, nif-page-structure, and nif-text-links.

Language | nif-context | nif-page-structure | nif-text-links
de       | .ttl        | .ttl               | .ttl
en       | .ttl        | .ttl               | .ttl
es       | .ttl        | .ttl               | .ttl
fr       | .ttl        | .ttl               | .ttl
it       | .ttl        | .ttl               | .ttl
ja       | .ttl        | .ttl               | .ttl
ko       | .ttl        | .ttl               | .ttl
pl       | .ttl        | .ttl               | .ttl
pt       | .ttl        | .ttl               | .ttl

The Ontology

The following figure shows the main classes and properties of the NIF vocabulary:

NIF ontology

Libraries

Integrate the NIF library into your project by:

  • adding the NIF Maven library,
  • compiling it yourself from the NIF-lib GitHub project, or
  • compiling the pyNIF-lib GitHub project.

Documentation

A deeper understanding of NIF can be gained from the documentation, which provides pointers to all important resources for the NLP Interchange Format.

Example:

input text: “Anthropology is the study of humanity. Its main subdivisions are social anthropology and cultural anthropology, which describes the workings of societies around the world, linguistic anthropology, which investigates the influence of language in social life, and biological or physical anthropology, which concerns long-term development of the human organism. Archaeology, which studies past human cultures through investigation of physical evidence, is thought of as a branch of anthropology in the United States, although in Europe, it is viewed as a discipline in its own right, or grouped under related disciplines such as history.”

The result is a set of Turtle (.ttl) files containing the context, page structure and text links.

nif-context.ttl

Represents the full text of a wiki page as the context for all subsequent information about this page.

dbr:Anthropology?dbpv=2016-04&nif=context a nif:Context .

dbr:Anthropology?dbpv=2016-04&nif=context nif:isString "Anthropology is the study of humanity. Its main subdivisions are social anthropology and cultural anthropology, which describes the workings of societies around the world, linguistic anthropology, which investigates the influence of language in social life, and biological or physical anthropology, which concerns long-term development of the human organism. Archaeology, which studies past human cultures through investigation of physical evidence, is thought of as a branch of anthropology in the United States, although in Europe, it is viewed as a discipline in its own right, or grouped under related disciplines such as history." .

dbr:Anthropology?dbpv=2016-04&nif=context nif:beginIndex "0"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=context nif:endIndex "634"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=context nif:sourceUrl <http://en.wikipedia.org/wiki/Anthropology> .
dbr:Anthropology?dbpv=2016-04&nif=context nif:predLang <http://lexvo.org/id/iso639-3/eng> .
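The context triples above can be reproduced with a short sketch that writes them as N-Triples with full IRIs. It assumes the NIF 2.0 core namespace (http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#) and the dbr: expansion http://dbpedia.org/resource/; both are assumptions about these dumps, and context_triples is a hypothetical helper. It also assumes offsets count Unicode code points, as Python's len() does.

```python
# Assumed namespaces: NIF 2.0 core and the usual expansion of the dbr: prefix.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
DBR = "http://dbpedia.org/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_NNI = "http://www.w3.org/2001/XMLSchema#nonNegativeInteger"


def literal(value: str) -> str:
    """Quote a plain string literal for N-Triples, escaping the characters it forbids."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def context_triples(resource: str, version: str, text: str,
                    source_url: str, lang_iri: str) -> list[str]:
    """Return N-Triples lines describing one nif:Context (hypothetical helper)."""
    ctx = f"<{DBR}{resource}?dbpv={version}&nif=context>"
    return [
        f"{ctx} <{RDF_TYPE}> <{NIF}Context> .",
        f"{ctx} <{NIF}isString> {literal(text)} .",
        f'{ctx} <{NIF}beginIndex> "0"^^<{XSD_NNI}> .',
        # endIndex = length of the whole text (Python counts Unicode code points).
        f'{ctx} <{NIF}endIndex> "{len(text)}"^^<{XSD_NNI}> .',
        f"{ctx} <{NIF}sourceUrl> <{source_url}> .",
        f"{ctx} <{NIF}predLang> <{lang_iri}> .",
    ]


# Usage with only the first sentence of the example text (38 characters).
for line in context_triples("Anthropology", "2016-04",
                            "Anthropology is the study of humanity.",
                            "http://en.wikipedia.org/wiki/Anthropology",
                            "http://lexvo.org/id/iso639-3/eng"):
    print(line)
```

Run on the full example text, the same function yields an endIndex of 634, matching the value shown above.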

nif-page-structure.ttl

Represents the structure of the wiki page as nif:Structure instances, such as sections, paragraphs and titles.

dbr:Anthropology?dbpv=2016-04&nif=context nif:hasSection dbr:Anthropology?dbpv=2016-04&nif=section_0_634 .

dbr:Anthropology?dbpv=2016-04&nif=section_0_634    a    nif:Section    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:beginIndex    "0"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:endIndex    "634"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:referenceContext    dbr:Anthropology?dbpv=2016-04&nif=context    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:hasParagraph    dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:hasParagraph    dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:firstParagraph    dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:lastParagraph    dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    .

dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    a    nif:Paragraph    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    nif:beginIndex    "0"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    nif:endIndex    "330"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    nif:referenceContext    dbr:Anthropology?dbpv=2016-04&nif=context    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    nif:superString    dbr:Anthropology?dbpv=2016-04&nif=section_0_634    .

dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    a    nif:Paragraph    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    nif:beginIndex    "331"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    nif:endIndex    "634"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    nif:referenceContext    dbr:Anthropology?dbpv=2016-04&nif=context    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    nif:superString    dbr:Anthropology?dbpv=2016-04&nif=section_0_634    .
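The section/paragraph links above follow directly from the offsets. A minimal sketch, assuming end offsets are exclusive (so a paragraph belongs to a section when its [begin, end) range lies inside the section's range); Span and section_links are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """A substring of the context, identified by [begin, end) offsets."""
    kind: str  # e.g. "section" or "paragraph"
    begin: int
    end: int

    @property
    def fragment(self) -> str:
        # Mirrors the offset-based names in the examples, e.g. section_0_634.
        return f"{self.kind}_{self.begin}_{self.end}"

    def contains(self, other: "Span") -> bool:
        return self.begin <= other.begin and other.end <= self.end


def section_links(section: Span, paragraphs: list[Span]) -> dict[str, object]:
    """Derive hasParagraph, firstParagraph and lastParagraph for one section."""
    inside = sorted((p for p in paragraphs if section.contains(p)),
                    key=lambda p: p.begin)
    if not inside:
        return {"hasParagraph": []}
    return {
        "hasParagraph": [p.fragment for p in inside],
        "firstParagraph": inside[0].fragment,
        "lastParagraph": inside[-1].fragment,
    }


section = Span("section", 0, 634)
paragraphs = [Span("paragraph", 331, 634), Span("paragraph", 0, 330)]
print(section_links(section, paragraphs))
```

Sorting by begin offset makes firstParagraph and lastParagraph independent of the order in which paragraphs were extracted.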

nif-text-links.ttl

Represents all in-text links of a wiki page as nif:Word or nif:Phrase instances.

dbr:Anthropology?dbpv=2016-04&nif=word_29_37    a    nif:Word .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37    nif:referenceContext    dbr:Anthropology?dbpv=2016-04&nif=context .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37    nif:beginIndex    "29"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37    nif:endIndex    "37"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37    nif:superString    dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330 .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37    <http://www.w3.org/2005/11/its/rdf#taIdentRef>    dbr:Human .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37    nif:anchorOf    "humanity" .

dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    a    nif:Phrase    .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    nif:referenceContext    dbr:Anthropology?dbpv=2016-04&nif=context .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    nif:beginIndex    "65"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    nif:endIndex    "84"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    nif:superString    dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330 .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    <http://www.w3.org/2005/11/its/rdf#taIdentRef>    dbr:Social_anthropology .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    nif:anchorOf    "social anthropology" .
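A sketch of how a link's offsets map to the triples above. Two rules in it are assumptions consistent with these two examples rather than documented behaviour: an anchor without whitespace becomes a nif:Word and one with whitespace a nif:Phrase, and the superString is the paragraph whose [begin, end) range encloses the link. text_link is a hypothetical helper, and TEXT is only a prefix of the example context.

```python
# Prefix of the example nif:isString value (enough to cover both links).
TEXT = ("Anthropology is the study of humanity. Its main subdivisions are "
        "social anthropology and cultural anthropology")

# Paragraph offsets from nif-page-structure.ttl.
PARAGRAPHS = [(0, 330), (331, 634)]


def text_link(text: str, begin: int, end: int, target: str,
              paragraphs: list[tuple[int, int]]) -> dict[str, object]:
    """Describe one in-text link (hypothetical helper)."""
    anchor = text[begin:end]
    # Assumption: single-token anchors are nif:Word, multi-token ones nif:Phrase.
    kind = "Phrase" if len(anchor.split()) > 1 else "Word"
    # Assumption: the enclosing paragraph is the superString.
    para = next((f"paragraph_{b}_{e}" for b, e in paragraphs
                 if b <= begin and end <= e), None)
    return {
        "fragment": f"{kind.lower()}_{begin}_{end}",
        "type": f"nif:{kind}",
        "anchorOf": anchor,
        "superString": para,
        "taIdentRef": f"dbr:{target}",
    }


for b, e, target in [(29, 37, "Human"), (65, 84, "Social_anthropology")]:
    print(text_link(TEXT, b, e, target, PARAGRAPHS))
```

Taking the offsets as input, rather than searching for the anchor string, avoids mislocating anchors that occur more than once in a page.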

Related Publications