DBpedia Blog

Databus Mods – Linked Data-driven Enrichment of Metadata

Summary

DBpedia Databus Feature – Over the last few months, we gave our DBpedia members multiple chances to present their work, tools, and applications. In this way, our members have given exclusive insights on the DBpedia blog. This week we will start the DBpedia Databus Feature, which allows you to get more information about current and future developments around DBpedia and the DBpedia Databus. Have fun while reading!

As a reminder, the DBpedia Databus is a digital factory platform that aims to support FAIRness by providing a registry of files (on the Web) described with DataID metadata. In a broader perspective, the Databus is part of DBpedia’s Vision, which aims to establish a FAIR Linked Data backbone by building an ecosystem with stable identifiers as its central component. Currently, this ecosystem consists of the Databus file registry, DBpedia Archivo, and the DBpedia Global ID management.

As part of this vision, this article presents Databus Mods, a flexible metadata enrichment mechanism for files published on the Databus using Linked Data technologies. 

Databus Mods are activities that analyze and assess files published with the Databus DataID and provide additional metadata in the form of fine-grained information such as data summaries, statistics, or descriptive metadata enrichments.

These activities create provenance metadata based on PROV-O to link any generated metadata to the persistent Databus file identifiers, independent of the file’s publisher. The generated metadata is made available through a SPARQL endpoint and an HTTP file server, which improves (meta)data discovery and access.
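To make the provenance link more concrete, here is a minimal sketch of how a Mod result could be tied to a Databus file with PROV-O terms. The function name, the example IRIs, and the particular choice of prov:used, prov:generated, and prov:wasDerivedFrom are assumptions for illustration; the actual Mods metadata model may be shaped differently.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def provenance_triples(activity_iri, file_iri, result_iri):
    """Return N-Triples lines linking a Mod result to a Databus file.

    Hypothetical helper: the selection of PROV-O properties is an
    assumption, not the documented Mods metadata model.
    """
    triples = [
        (activity_iri, RDF_TYPE, PROV + "Activity"),
        (activity_iri, PROV + "used", file_iri),          # the analyzed file
        (activity_iri, PROV + "generated", result_iri),   # the Mod result
        (result_iri, PROV + "wasDerivedFrom", file_iri),  # direct result-to-file link
    ]
    return [f"<{s}> <{p}> <{o}> ." for s, p, o in triples]


if __name__ == "__main__":
    # Placeholder IRIs, not real Databus identifiers.
    for line in provenance_triples(
        "https://example.org/mods/activity/1",
        "https://example.org/databus/file.ttl",
        "https://example.org/mods/result/1.ttl",
    ):
        print(line)
```

Because the link points at the file identifier rather than at the publisher, anyone can attach such metadata to any Databus file.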

Additionally, my thesis proposes the Databus Mods Architecture, which uses a master-worker approach to automate metadata enrichment for Databus files. The Mod Master service monitors the Databus SPARQL endpoint for updates, distributes scheduled activities to Mod Workers, collects the generated metadata, and stores it uniformly. Mod Workers implement the metadata model and provide an HTTP interface through which the Mod Master invokes a Mod Activity for a specific Databus file. The Mod Master can handle multiple Mod Workers of the same type concurrently, which allows the system’s throughput to scale.
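The master-worker idea can be sketched in a few lines. This is only an in-process illustration under simplifying assumptions: a thread pool stands in for the HTTP calls to Mod Workers, the list of file identifiers stands in for polling the Databus SPARQL endpoint, and `run_master` and `demo_worker` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def demo_worker(file_id):
    # Hypothetical stand-in for an HTTP request to a Mod Worker that
    # runs one Mod Activity for the given Databus file.
    return {"file": file_id, "length": len(file_id)}


def run_master(file_ids, worker, max_workers=4):
    """Distribute one activity per file to a pool of workers and
    collect the results uniformly, keyed by file identifier."""
    file_ids = list(file_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each file
        # identifier with its own result.
        results = pool.map(worker, file_ids)
        return dict(zip(file_ids, results))


if __name__ == "__main__":
    print(run_master(["file-a", "file-b", "file-c"], demo_worker, max_workers=2))
```

Raising `max_workers` corresponds to registering more Mod Workers of the same type: the master stays the same while throughput grows.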

The Databus Mods Architecture implementation is provided in a publicly accessible GitHub repository, allowing other users to deploy their own Mods by reusing existing components. Further, the repository contains a Maven library that can be used to create your own Mod Workers in JVM-based languages or to validate an implementation of the so-called Mod API, which the Mod Master needs in order to control a Mod Worker.

Currently, the DBpedia Databus provides five initial Databus Mod Workers of its own. The following paragraphs showcase two essential Mods: the first is applicable to all Databus files, the second is specific to RDF files.

MIME-Type Mod. This essential Mod provides metadata for other applications or Mods about the specific MIME type of Databus files. The MIME-Type Mod analyzes every file on the Databus, sniffs its content using Apache Tika, and generates metadata that assigns the detected IANA Media Types to Databus file identifiers using the Mods metadata model.
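To give a feeling for what content sniffing means, here is a deliberately simplified sketch that checks the first bytes of a file against a few well-known magic numbers. It is not how Apache Tika works internally, which uses a far larger signature database and additional heuristics; the table and the function name `sniff_media_type` are assumptions made for this example.

```python
# A tiny, hand-picked table of file signatures (magic numbers).
MAGIC_NUMBERS = [
    (b"\x1f\x8b", "application/gzip"),
    (b"PK\x03\x04", "application/zip"),
    (b"%PDF-", "application/pdf"),
]


def sniff_media_type(head: bytes) -> str:
    """Guess a media type from the leading bytes of a file.

    Falls back to application/octet-stream when no signature matches.
    """
    for magic, media_type in MAGIC_NUMBERS:
        if head.startswith(magic):
            return media_type
    return "application/octet-stream"


if __name__ == "__main__":
    print(sniff_media_type(b"\x1f\x8b\x08\x00rest-of-file"))
    print(sniff_media_type(b"plain text"))
```

The detected type is determined by the data itself, so it stays reliable even when a file extension is missing or misleading.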

VoID Mod. The Vocabulary of Interlinked Datasets (VoID) is a popular metadata vocabulary for describing the content of Linked Datasets. The VoID Mod generates statistics based on the VoID vocabulary for RDF files. A major use case is searching for relevant RDF datasets: by writing federated queries over the VoID Mod metadata, it is possible to filter for Databus files that contain specific properties or classes.
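The kind of statistics meant here can be illustrated with a small sketch that counts, for a list of triples, how often each property occurs and how many rdf:type statements point to each class. This roughly mirrors the idea behind VoID property and class partitions; the function name `void_statistics` and the tuple-based input are assumptions, and the real VoID Mod works on parsed RDF files.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def void_statistics(triples):
    """Count triples per property and rdf:type statements per class.

    `triples` is an iterable of (subject, predicate, object) tuples.
    """
    property_counts = Counter()
    class_counts = Counter()
    for _subject, predicate, obj in triples:
        property_counts[predicate] += 1
        if predicate == RDF_TYPE:
            class_counts[obj] += 1
    return property_counts, class_counts


if __name__ == "__main__":
    dbo = "http://dbpedia.org/ontology/"
    sample = [
        ("ex:alice", RDF_TYPE, dbo + "Person"),
        ("ex:alice", dbo + "birthDate", "1900-01-01"),
        ("ex:bob", RDF_TYPE, dbo + "Person"),
    ]
    props, classes = void_statistics(sample)
    print(dict(props))
    print(dict(classes))
```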

Example: Federated query over VoID Mod Results and the DataID to retrieve Databus files containing RDF statements having dbo:birthDate as property or dbo:Person as type. The results are filtered by dct:version and dataid:account.
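As a rough illustration of such a query, the following sketch assembles a federated SPARQL query string. Several details are assumptions and not taken from the original listing: the function name, the placeholder endpoint and account IRIs, the use of prov:wasDerivedFrom to connect a Mod result to its file, and the DataID graph shape. The version filter is written as dct:hasVersion here; adjust it if your DataID uses a different property.

```python
def build_federated_query(mods_endpoint, prop, cls, version, account):
    """Build a federated SPARQL query string (hypothetical graph shape).

    The SERVICE clause asks the Mods endpoint for files whose VoID
    result mentions `prop` or `cls`; the outer pattern filters the
    DataID metadata by version and account.
    """
    return f"""PREFIX void: <http://rdfs.org/ns/void#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?file WHERE {{
  SERVICE <{mods_endpoint}> {{
    ?result prov:wasDerivedFrom ?file .
    {{ ?result void:propertyPartition/void:property {prop} }}
    UNION
    {{ ?result void:classPartition/void:class {cls} }}
  }}
  ?dataset dataid:account <{account}> ;
           dct:hasVersion "{version}" ;
           dcat:distribution/dataid:file ?file .
}}"""


if __name__ == "__main__":
    # Placeholder endpoint, account, and version, not real services.
    print(build_federated_query(
        "https://example.org/mods/sparql",
        "dbo:birthDate",
        "dbo:Person",
        "2021.03.01",
        "https://example.org/account",
    ))
```

The resulting string would be sent to the Databus SPARQL endpoint, which then delegates the VoID part of the pattern to the Mods endpoint.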

Databus Mods were created as part of my master’s thesis, which I submitted in spring 2021.

Stay safe and check Twitter or LinkedIn. Furthermore, you can subscribe to our Newsletter for the latest news and information around DBpedia.

Marvin Hofer

on behalf of the DBpedia Association
