DBpedia Live

DBpedia Live is a service that complements the monthly DBpedia dump releases and the dump-based SPARQL endpoint: it lets users retrieve and query “live” (with respect to Wikipedia) data about DBpedia resources.

DBpedia Live monitors edits on Wikipedia and re-extracts the information of an article after it has been changed. It then transfers the updates to a dedicated online SPARQL database, the DBpedia Live SPARQL endpoint, for querying. This continuous extraction makes it possible to query real-time information and bridges the information gap between monthly releases, or update intervals, of the primary (static) SPARQL endpoint.
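As an illustration of querying the live endpoint, the following minimal sketch builds a SPARQL 1.1 Protocol GET request and asks for JSON results through the `Accept` header. The endpoint URL is the one listed in the Features section of this page; the helper names `build_request` and `run_query`, the timeout, and the example resource are assumptions made for illustration, not part of DBpedia Live itself.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as listed in the Features section of this page.
ENDPOINT = "http://live.dbpedia.org/sparql"


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query: str, endpoint: str = ENDPOINT) -> dict:
    """Send the query and decode the JSON result (requires network access)."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=30) as resp:
        return json.load(resp)


# Hypothetical example query: the English label of one resource.
EXAMPLE_QUERY = """
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Berlin>
      <http://www.w3.org/2000/01/rdf-schema#label> ?label .
  FILTER (lang(?label) = "en")
}
"""
```

Calling `run_query(EXAMPLE_QUERY)` returns the standard SPARQL JSON structure, so the labels can be read from `result["results"]["bindings"]`. Whether the endpoint is reachable at any given time is outside the scope of this sketch.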

Updates, Feedback and DBpedia Live 2.0

Major updates are posted on the DBpedia Blog. The DBpedia Forum has a category called DBpedia Live for minor updates; the forum is also the place to ask questions. We also maintain an FAQ below.

Currently, we are working on DBpedia Live 2.0: a lightweight, scalable, faster and multilingual microservice architecture for continuous extraction, retrieval and loading of live DBpedia data.

Important Pointers

DBpedia Live System Architecture

The main components of the DBpedia Live system are described at a high level below. If you are interested in more details, have a look at the development bible.

  • EventStreamsFeeder: The feeder is used in order to fetch information about recent changes of Wikipedia pages. It consumes the Wikimedia EventStreams recentchange stream.
  • MappingWiki: DBpedia mappings can be found at http://mappings.dbpedia.org, which is itself a wiki. We also use OAI-PMH to get a stream of updates in DBpedia mappings. Basically, a change of mapping affects several Wikipedia pages, which should all be reprocessed.
  • DBpedia Live Extraction Manager: This component is the actual DBpedia-Live extraction framework. When there is a page that should be processed, the framework applies the extractors to it. After processing a page, the newly extracted RDF statements are inserted into the backend data store (the Quad Store functionality of the Virtuoso Universal Server), where they replace the old RDF statements. The newly extracted RDF is also written to a compressed N-Triples file. Mirrors of DBpedia-Live, as well as other applications that should always be in synchronization with our DBpedia-Live endpoint, can download those changeset files and feed them into their own RDF data stores. The extraction manager is discussed in more detail below.
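The feeder's role can be illustrated with a minimal sketch that parses server-sent-event lines of the kind delivered by the Wikimedia EventStreams recentchange stream and keeps only article edits for one wiki. This is not the DBpedia Live implementation: the function name `pages_to_process`, the choice of `enwiki`, the filter on edit types, and the sample events are all assumptions made for illustration.

```python
import json
from typing import Iterable, Iterator


def pages_to_process(sse_lines: Iterable[str], wiki: str = "enwiki") -> Iterator[str]:
    """Yield titles of changed articles from server-sent-event lines.

    Only `data:` lines carry the JSON payload; other SSE fields
    (`event:`, `id:`) and blank separator lines are skipped.
    """
    for line in sse_lines:
        if not line.startswith("data:"):
            continue
        try:
            change = json.loads(line[len("data:"):])
        except json.JSONDecodeError:
            continue  # ignore malformed payloads
        if not isinstance(change, dict) or "title" not in change:
            continue
        if (
            change.get("wiki") == wiki
            and change.get("namespace") == 0  # main (article) namespace
            and change.get("type") in ("edit", "new")
        ):
            yield change["title"]


# Made-up sample events, shaped like recentchange payloads.
SAMPLE = [
    "event: message",
    'data: {"wiki": "enwiki", "namespace": 0, "type": "edit", "title": "Berlin"}',
    "",
    'data: {"wiki": "enwiki", "namespace": 1, "type": "edit", "title": "Talk:Berlin"}',
    'data: {"wiki": "dewiki", "namespace": 0, "type": "edit", "title": "Leipzig"}',
    'data: {"wiki": "enwiki", "namespace": 0, "type": "new", "title": "Some new page"}',
]

print(list(pages_to_process(SAMPLE)))  # ['Berlin', 'Some new page']
```

In a real feeder the lines would come from a long-lived HTTP connection to the stream rather than from a list, and the yielded titles would be handed to the extraction manager.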

Features

The live-extraction framework is deployed on a server hosted by OpenLink Software. It has a SPARQL endpoint, also operated by OpenLink Software, at http://live.dbpedia.org/sparql, and its status can be viewed at http://live.dbpedia.org/live/

It has the following features:

  1. Abstract extraction: The abstract of a Wikipedia article consists of the first few paragraphs of that article. The new framework is able to extract the abstract of an article cleanly.
  2. Mapping-affected pages: Upon a change in mapping, the pages affected by that mapping should be reprocessed and their RDF descriptions should be updated to reflect that change.
  3. Updating unmodified pages: Sometimes a change in the system occurs, e.g. a change in the implementation of an extractor. This change can affect many pages even if they are not modified. In DBpedia-Live, we use a low-priority queue for such changes, such that the updates will eventually appear in DBpedia-Live, but recent Wikipedia updates are processed first.
  4. Publication of changesets: Upon modification, old RDF statements are replaced with updated statements. The added and/or deleted statements are also written to N-Triples files and then compressed. Any client application or DBpedia-Live mirror can download the files and integrate (and, hence, update) a local copy of DBpedia. This enables that application to stay in synchronization with our version of DBpedia-Live.
  5. Development of synchronization tool: The synchronization tool enables a DBpedia-Live mirror to stay in synchronization with our live endpoint. It downloads the changeset files sequentially, decompresses them, and integrates them with another DBpedia-Live mirror.
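How a mirror might integrate a changeset can be sketched as follows. This is a simplified illustration, not the actual synchronization tool: the local store is modelled as a Python set of N-Triples lines, a changeset is assumed to arrive as a pair of gzip-compressed payloads (one with removed and one with added statements), and the function names and example values are hypothetical.

```python
import gzip
from typing import Set


def read_ntriples(compressed: bytes) -> Set[str]:
    """Decompress a gzip-compressed N-Triples payload into a set of statements."""
    text = gzip.decompress(compressed).decode("utf-8")
    return {
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")  # skip blanks, comments
    }


def apply_changeset(store: Set[str], removed_gz: bytes, added_gz: bytes) -> None:
    """Apply one changeset in place: delete old statements, then add new ones."""
    store.difference_update(read_ntriples(removed_gz))
    store.update(read_ntriples(added_gz))


# Made-up example: one statement is replaced by an updated one.
S = "<http://dbpedia.org/resource/Berlin>"
P = "<http://dbpedia.org/ontology/populationTotal>"
old = f'{S} {P} "3500000" .'
new = f'{S} {P} "3600000" .'

store = {old}
apply_changeset(
    store,
    removed_gz=gzip.compress((old + "\n").encode("utf-8")),
    added_gz=gzip.compress((new + "\n").encode("utf-8")),
)
print(store == {new})  # True
```

Applying removals before additions matters when a statement appears in both payloads of the same changeset: it ends up present in the store. Changesets must also be applied in the order they were published, as the description of the synchronization tool states.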

In addition to the infobox extraction process, the framework currently has 19 extractors, which process the following types of Wikipedia content:

  • Labels
  • Abstracts
  • Interlanguage links
  • Images
  • Redirects
  • Disambiguation
  • External links
  • Page links
  • Homepages
  • Geo-coordinates
  • Person data
  • PND
  • SKOS categories
  • Page ID
  • Revision ID
  • Category label
  • Article categories
  • Mappings
  • Infobox

FAQ