DBpedia is a crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects. This structured information resembles an open knowledge graph (OKG) that is available to everyone on the Web. A knowledge graph is a special kind of database that stores knowledge in a machine-readable form and provides a means for information to be collected, organised, shared, searched and utilised. Google uses a similar approach to create the knowledge cards shown in its search results. We hope that this work will make it easier to use the huge amount of information in Wikimedia projects in new and interesting ways.
DBpedia data is served as Linked Data, which is revolutionizing the way applications interact with the Web. One can navigate this Web of facts with standard Web browsers or automated crawlers, or pose complex queries with SQL-like query languages (e.g. SPARQL). Have you thought of asking the Web about all cities with low crime rates, warm weather and open jobs? That’s the kind of query we are talking about.
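To make the idea of a queryable "Web of facts" concrete, here is a minimal sketch of facts stored as subject-predicate-object triples and matched by pattern, which is roughly what a SPARQL engine does at a much larger scale. The `ex:` identifiers and the facts themselves are invented for illustration and are not DBpedia data.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical facts, invented for illustration; not actual DBpedia data.
FACTS: List[Triple] = [
    ("ex:Springfield", "rdf:type", "ex:City"),
    ("ex:Springfield", "ex:climate", "warm"),
    ("ex:Shelbyville", "rdf:type", "ex:City"),
    ("ex:Shelbyville", "ex:climate", "cold"),
]


def match(facts: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in facts:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


def warm_cities(facts: List[Triple]) -> List[str]:
    """Join two patterns: things typed as a city AND having a warm climate."""
    cities = {subj for subj, _, _ in match(facts, p="rdf:type", o="ex:City")}
    warm = {subj for subj, _, _ in match(facts, p="ex:climate", o="warm")}
    return sorted(cities & warm)
```

Calling `warm_cities(FACTS)` intersects the two pattern matches, which is the same kind of join a SPARQL query expresses declaratively.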
Learn about DBpedia
If you like what our project does but are still new to DBpedia, there are a few articles that can help you get started:
- Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, Christian Bizer. DBpedia – A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia.
- About DBpedia Spotlight: Pablo N. Mendes, Max Jakob, André García-Silva, Christian Bizer. DBpedia Spotlight: Shedding Light on the Web of Documents. In the Proceedings of the 7th International Conference on Semantic Systems (I-Semantics 2011). Graz, Austria, September 2011.
- About DBpedia Internationalization: Dimitris Kontokostas, Charalampos Bratsas, Sören Auer, Sebastian Hellmann, Ioannis Antoniou, George Metakides. Internationalization of Linked Data: The case of the Greek DBpedia edition. Web Semantics: Science, Services and Agents on the World Wide Web, Volume 15, September 2012, Pages 51–61, ISSN 1570-8268, doi:10.1016/j.websem.2012.01.001.
- About DBpedia Live: Mohamed Morsey, Jens Lehmann, Sören Auer, Claus Stadler, Sebastian Hellmann. DBpedia and the live extraction of structured data from Wikipedia. Program: electronic library and information systems, Vol. 46, Iss. 2, pp. 157–181, 2012.
The DBpedia Knowledge Base
Knowledge bases are playing an increasingly important role in enhancing the intelligence of Web and enterprise search and in supporting information integration. Today, most knowledge bases cover only specific domains, are created by relatively small groups of knowledge engineers, and are very cost intensive to keep up-to-date as domains change. At the same time, Wikipedia has grown into one of the central knowledge sources of mankind, maintained by thousands of contributors.
The DBpedia project leverages this gigantic source of knowledge by extracting structured information from Wikipedia and by making this information accessible on the Web under the terms of the Creative Commons Attribution-ShareAlike 3.0 License and the GNU Free Documentation License.
The DBpedia knowledge base has several advantages over existing knowledge bases: it covers many domains; it represents real community agreement; it automatically evolves as Wikipedia changes; and it is truly multilingual. The DBpedia knowledge base allows you to pose surprisingly sophisticated queries against Wikipedia, for instance “Give me all cities in New Jersey with more than 10,000 inhabitants” or “Give me all Italian musicians from the 18th century”. Altogether, the use cases of the DBpedia knowledge base are widespread and range from enterprise knowledge management, through Web search, to revolutionizing Wikipedia search.
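As a rough illustration, the New Jersey question could be phrased in SPARQL and wrapped in a request URL along the following lines. The helper names are hypothetical, and the choice of `dbo:City`, `dbo:isPartOf` and `dbo:populationTotal` is an assumption about how the data is modelled, so the property names may need adjusting against the live data. The `format` parameter is a Virtuoso convenience rather than part of the SPARQL protocol.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"


def cities_query(region_uri: str, min_population: int) -> str:
    """Build a SPARQL query for cities in a region above a population bound.

    The class and property names are assumptions about DBpedia's modelling.
    """
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city ?population WHERE {{
  ?city a dbo:City ;
        dbo:isPartOf <{region_uri}> ;
        dbo:populationTotal ?population .
  FILTER (?population > {int(min_population)})
}}
ORDER BY DESC(?population)
""".strip()


def request_url(query: str) -> str:
    """Encode the query as an HTTP GET URL; no request is sent here."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)


query = cities_query("http://dbpedia.org/resource/New_Jersey", 10000)
url = request_url(query)
```

The resulting `url` can be opened in a browser or fetched with any HTTP client. The sketch deliberately stops short of sending the request so that it stays runnable offline.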
The DBpedia Data Provision Architecture
The DBpedia RDF Data Set is hosted and published using OpenLink Virtuoso. The Virtuoso infrastructure provides access to DBpedia’s RDF data via a SPARQL endpoint, and it also answers any Web client’s standard HTTP GET requests for HTML or RDF representations of DBpedia resources.
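A client chooses between the HTML and RDF representations of a resource through the HTTP `Accept` header. The sketch below only constructs the two requests with Python's standard library; the helper name is hypothetical, and `text/turtle` is assumed to be one of the RDF serialisations the server offers.

```python
from urllib.request import Request

# Example resource URI in the DBpedia namespace.
RESOURCE = "http://dbpedia.org/resource/Berlin"


def representation_request(uri: str, media_type: str) -> Request:
    """Prepare a GET request asking for one representation of a resource."""
    return Request(uri, headers={"Accept": media_type})


# Same URI, two representations: a page for people, RDF for programs.
html_req = representation_request(RESOURCE, "text/html")
rdf_req = representation_request(RESOURCE, "text/turtle")
```

Passing either object to `urllib.request.urlopen` would send the request. The server is then expected to redirect to a document that matches the requested media type, although the exact redirect targets are a deployment detail.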
Illustration of Current DBpedia Data Provision Architecture
Though the DBpedia RDF Data has always been housed in Virtuoso, which has supported all desired means of access since the DBpedia project began, early DBpedia releases used Pubby Linked Data Deployment services in front of the Virtuoso SPARQL endpoint.
As the project gained traction, the HTTP demands on Pubby’s out-of-process Linked Data Publishing services increased, and the natural option was to take advantage of Virtuoso’s SPASQL (SPARQL inside SQL) and other Linked Data Deployment features, by moving these services in-process with Virtuoso.
Illustration of Deprecated Architecture
Nucleus for the Web of Data
Within the W3C Linking Open Data (LOD) community effort, an increasing number of data providers have started to publish and interlink data on the Web according to Tim Berners-Lee’s Linked Data principles. The resulting Web of Data currently consists of several billion RDF triples and covers domains such as geographic information, people, companies, online communities, films, music, books and scientific publications. In addition to publishing and interlinking datasets, there is also ongoing work on Linked Data browsers, Linked Data crawlers, Web of Data search engines and other applications that consume Linked Data from the Web.
The DBpedia knowledge base is served as Linked Data on the Web. As DBpedia defines Linked Data URIs for millions of concepts, various data providers have started to set RDF links from their data sets to DBpedia, making DBpedia one of the central interlinking-hubs of the emerging Web of Data.
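An RDF link of this kind is a single triple whose subject lives in the publisher's own namespace and whose object is a DBpedia URI. The sketch below serialises such a link as one N-Triples line. The use of `owl:sameAs` is an assumption (it is a common choice for identity links, though other predicates are possible), and the `example.org` URI is a hypothetical local identifier.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_link(local_uri: str, dbpedia_uri: str) -> str:
    """Serialise an identity link from a local resource to DBpedia as N-Triples."""
    return f"<{local_uri}> <{OWL_SAME_AS}> <{dbpedia_uri}> ."


# Hypothetical local URI linked to its DBpedia counterpart.
link = same_as_link(
    "http://example.org/places/berlin",
    "http://dbpedia.org/resource/Berlin",
)
```

Publishing lines like `link` alongside a data set lets Linked Data clients follow the link from the local record to everything DBpedia states about the same entity.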
A Brief History
The project was started in 2007 by Sören Auer and Jens Lehmann from the University of Leipzig and Christian Bizer from FU Berlin (now University of Mannheim) along with support from OpenLink.
Throughout its history, it has mostly been maintained by the following organisations:
The first face-to-face meeting in Leipzig 2009
Some members of the DBpedia Team at a meeting in Leipzig (December 2009). From left to right: Sören Auer, Sebastian Hellmann, Chris Bizer, Christopher Sahnwaldt, Jens Lehmann, Robert Isele, Claus Stadler