Lookup linking is a text enrichment method involving the DBpedia Lookup service.
The DBpedia Lookup service is an entity retrieval service for DBpedia entities that resolves keywords to resource identifiers. Thus it can be used to enrich plain text documents or tables with DBpedia URIs. Linking your labels to resource identifiers opens the door to the linked data world for the task at hand.
Let’s have a look at the following example:
We have an imaginary CSV table with employee data, but we forgot to collect the area codes of the addresses. Without the area codes we are unable to call our employees! Note that this table might just as well be 1000 or more entries long. Prepare for a Google marathon this weekend!
The DBpedia Lookup service accepts keywords and returns resource identifiers in the DBpedia knowledge graph. This graph contains a vast amount of city data – including the area codes. Linking our city labels to DBpedia resources and retrieving the codes from the knowledge graph can be done with a few lines of code in the following steps:
- Retrieve all the entries from the City column
- Iterate over the city labels and send a GET request to the DBpedia Lookup service for each one
(using https://lookup.dbpedia.org/api/search?type=City&query= + city label)
- Fetch the resource identifier of the first result
- Send a SPARQL query against http://dbpedia.org/sparql for each URI to retrieve the respective area code
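The steps above could be sketched in Python roughly as follows. This is a minimal sketch under several assumptions that the text does not confirm: that the Lookup service returns JSON when `format=JSON` is passed, that the response has the shape `{"docs": [{"resource": ["<uri>"], ...}]}`, and that the area code is stored under the `dbo:areaCode` property. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

LOOKUP_URL = "https://lookup.dbpedia.org/api/search"
SPARQL_URL = "http://dbpedia.org/sparql"


def lookup_url(city_label):
    """Build the Lookup request URL for one city label."""
    # type and query come from the text; format=JSON is an assumption.
    params = {"type": "City", "query": city_label, "format": "JSON"}
    return LOOKUP_URL + "?" + urllib.parse.urlencode(params)


def first_resource(lookup_response):
    """Return the resource URI of the first result, or None if there is none."""
    # Assumed response shape: {"docs": [{"resource": ["http://..."]}, ...]}
    docs = lookup_response.get("docs", [])
    if not docs:
        return None
    resource = docs[0].get("resource")
    if isinstance(resource, list):
        return resource[0] if resource else None
    return resource


def area_code_query(uri):
    """Build a SPARQL query asking for the area code of one resource."""
    # dbo:areaCode is assumed to be the property that holds the code.
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/> "
        "SELECT ?code WHERE { <%s> dbo:areaCode ?code } LIMIT 1" % uri
    )


def fetch_json(url):
    """Send a GET request and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def area_code_for(city_label):
    """Return (uri, area_code) for a city label; either may be None."""
    uri = first_resource(fetch_json(lookup_url(city_label)))
    if uri is None:
        return None, None
    params = {
        "query": area_code_query(uri),
        "format": "application/sparql-results+json",
    }
    result = fetch_json(SPARQL_URL + "?" + urllib.parse.urlencode(params))
    bindings = result["results"]["bindings"]
    code = bindings[0]["code"]["value"] if bindings else None
    return uri, code
```

Calling `area_code_for(label)` for every entry in the City column yields the two values that fill the City URI and Area Code columns. Entries for which either value comes back as `None` are the ones to check by hand.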
After retrieving the data and writing the results back into our CSV the result could look like this:
| Name | Phone | City | City URI | Area Code |
Problem almost solved! Data from the DBpedia Knowledge Graph can still be incomplete or inconsistent in some cases. However, even if our script only resolves 95% of our entries correctly, it will still save us hours of manual searching.
Time to enjoy the weekend!
Minimum Relevance Filtering
Each result of a DBpedia Lookup search is given a score based on label matches and other factors. Sometimes a label cannot be matched to a resource properly and only a few results are returned. Such results may have been matched only because of a similar label and not because they describe the entity you were actually looking for. In this case it might be better to reject those results, as they would lead to a bad link. The DBpedia Lookup service lets you specify a minimum score (or minimum relevance). All results with a score lower than the specified minimum score are discarded and not suggested as a result.
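To illustrate the rule, here is a small sketch that applies the same cut-off on the client side to results that have already been fetched. The name of the request parameter that sets the minimum score on the service is not given here, so the sketch does not guess it. It assumes each result carries its score under a `score` key, either as a plain value or as a one-element list of strings. Both the key and the function name are assumptions.

```python
def filter_by_min_score(docs, min_score):
    """Keep only results whose score is at least min_score."""
    kept = []
    for doc in docs:
        score = doc.get("score")
        # Assumed shape: the score may be wrapped in a list, e.g. ["12.5"].
        if isinstance(score, list):
            score = score[0] if score else None
        # Results without a score cannot be judged and are discarded.
        if score is not None and float(score) >= min_score:
            kept.append(doc)
    return kept
```

If the filtered list comes back empty, it is usually safer to leave the entry unlinked than to fall back to a low-scoring match.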