Exactly one year ago, we presented DBpedia Archivo (https://archivo.dbpedia.org, paper, video) at SEMANTiCS 2020. Our initial vision was to create a fully automated, persistent Ontology Archive that serves as a backbone for the Semantic Web and offers ontology users a convenient and stable interface. Below, we list some points that we consider great successes and highlights of running Archivo for over a year.
September 9th, 2021 at 1pm CEST: In particular, we would like to invite you to the DBpedia Ontology session at the DBpedia Day at SEMANTiCS 2021 to discuss the future roadmap for Archivo as a Unified Semantic Ontology Space (USOS) and what the role of the DBpedia Ontology will be in the Semantic Web.
The session will host impulse talks with ample room for discussion. For the first time in the history of the Semantic Web, Archivo offers the possibility to create a Unified Semantic Ontology Space (USOS), a holistic view over all available ontologies. Instead of soft and fuzzy principles such as FAIR, we will discuss hard, implementable criteria to evaluate ontologies in preparation for a well-defined, measurable standard, which will ultimately yield better and more reliable ontologies for industrial applications. Another topic is the central collaboration on links and mappings between ontologies to create a denser and better-connected web of ontologies. Join the discussion and register here.
Successes and Highlights
An Exhaustive Ontology Archive
We implemented 5 discovery mechanisms that run each week. These mechanisms have proven effective in developing Archivo into one of the most exhaustive ontology archives. As of today, Archivo provides an alternative, persistent download location for 1407 ontologies. Growth has not yet reached a plateau: the archive is steadily growing at a pace of 12.6 ontologies per week (6-month average).
While 1246 ontologies were automatically discovered, we also received 159 user submissions (i.e. users adding the Ontology URL at https://archivo.dbpedia.org/add). Archivo is also serving 90 ontology downloads on an average day (plus 640 daily downloads from major bots) and will soon provide popularity ratings. The archive can be downloaded as a whole. Note that we also keep some ontologies that are no longer available under their original URL, such as GEORSS (info, download), to allow stable operation of the Semantic Web.
Archivo uses all kinds of cunning tricks to find, access and persist ontologies. Our crawlers and parsers have matured over the last year and, although we might have overlooked something, we are quite certain that the following statement holds: “If DBpedia Archivo cannot process an ontology, the ontology is not retrievable or parseable, which will negatively impact all further applications”. On the other hand, if Archivo manages to access and parse the ontology, it will be persisted for future generations (following a fair use / no abuse policy regarding size restrictions).
Ontology Quality vs. Coverage
Besides accessibility, Archivo evaluates the availability and conformity of license statements as well as logical consistency as a minimal baseline to assign the 4 Archivo stars. On August 16th, 2021, we can report that the web of ontologies reached above 2 stars on average with 303 ★★★★, 246 ★★★☆, 18 ★★☆☆ and 836 ★☆☆☆ ontologies. Two weeks later the average fell to 1.999 stars as 4 more ontologies were discovered. We see it as a challenge for Archivo to advance both of its orthogonal goals at the same time: exhaustive coverage and high-quality ontologies. We believe, however, that the system is able to accommodate both over time.
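To make the star average concrete, here is a minimal Python sketch that computes a weighted average from the distribution reported above. The function and variable names are our own illustration, not part of Archivo, and the second call only models the hypothetical case in which newly discovered ontologies start out with one star.

```python
# Star distribution as reported in the text (stars -> number of ontologies).
star_counts = {4: 303, 3: 246, 2: 18, 1: 836}


def average_stars(counts):
    """Return the weighted average star rating for a {stars: count} mapping."""
    total = sum(counts.values())
    weighted = sum(stars * n for stars, n in counts.items())
    return weighted / total


def with_new_ontologies(counts, stars, n):
    """Return a copy of the distribution with n more ontologies at a given rating."""
    updated = dict(counts)
    updated[stars] = updated.get(stars, 0) + n
    return updated


baseline = average_stars(star_counts)
# Hypothetical scenario: 4 newly discovered ontologies, all rated one star.
diluted = average_stars(with_new_ontologies(star_counts, stars=1, n=4))

print(f"baseline average: {baseline:.4f}")
print(f"after 4 new one-star ontologies: {diluted:.4f}")
```

The sketch illustrates the tension described above: every newly discovered ontology that starts at the low end of the scale pulls the average down, even though coverage improves.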
Ontologies are checked every 8 hours for changes. So far Archivo has archived 3713 versions for the 1407 ontologies. Ontology practitioners are now able to code applications against specific archived ontology versions and need not fear that major ontological changes are published under the same URL, breaking SPARQL queries and applications.
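One simple way to picture periodic change detection is to compare a content hash with the hash of the last archived version. The following Python sketch assumes exactly that; it is not Archivo's actual implementation, which is not described here, and the names `archive_if_changed` and `versions` as well as the example URL are hypothetical.

```python
import hashlib


def content_digest(content: bytes) -> str:
    """Return a SHA-256 hex digest of the raw ontology bytes."""
    return hashlib.sha256(content).hexdigest()


def archive_if_changed(versions, ontology_url, content, timestamp):
    """Record a new version only if the content differs from the latest one.

    `versions` maps an ontology URL to a list of (timestamp, digest) pairs.
    Returns True if a new version was archived, False otherwise.
    """
    history = versions.setdefault(ontology_url, [])
    digest = content_digest(content)
    if history and history[-1][1] == digest:
        return False  # unchanged since the last check
    history.append((timestamp, digest))
    return True


# Simulated checks, 8 hours apart, for a hypothetical ontology URL.
versions = {}
url = "http://example.org/ontology"
first = archive_if_changed(versions, url, b"version 1", "2021-09-01T00:00")
second = archive_if_changed(versions, url, b"version 1", "2021-09-01T08:00")
third = archive_if_changed(versions, url, b"version 2", "2021-09-01T16:00")
```

An application can then pin itself to one recorded version instead of the live URL, which is the stability benefit described above. A byte-level hash also flags purely syntactic changes, so a real system might compare a normalized serialization instead.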
on behalf of the DBpedia Association