DBpedia Blog

GSoC 2021 with DBpedia – We did it again ;)


As in previous years, DBpedia is part of Google Summer of Code again. We received many applications this year, and 10 great projects from students all over the world were selected. We are excited to see them continue working on their projects! Below we present more details about what has been planned:

Project Overview GSoC 2021

Towards a neural extraction framework by Ziwei XU

In the large majority of cases in DBpedia, it is not clear what kind of relationship exists between the entities. Instead of extracting the triples <subject, predicate, object> from semi-structured data only, we want to leverage information found in the entirety of a Wikipedia article, including page text.

The goal of this project is to develop a framework for predicate resolution of wiki links among entities. Specifically, we focus on direct cause-effect relations between events. Our task is to extract cause-effect entity pairs (e.g., Peaceful_Revolution, German_reunification) from Wikipedia text. We combine seed data (e.g., known cause-effect entity pairs) with a trained classifier (e.g., a discriminative model such as an LSTM) to discover more cause-effect entity pairs in Wikipedia text, an approach known as distantly supervised relation extraction. The procedure includes pattern matching, knowledge exploration, entity recognition, entity mapping, etc. Ultimately, we aim to acquire more reliable causal relations between DBpedia entities. Mentors: Tommaso Soru, Thiago Castro Ferreira, Zheyuan BAI
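The seed-based labeling step of distant supervision can be illustrated with a minimal Python sketch. It is not the project's implementation: the function name `label_sentences`, the simple substring matching, and the example sentences are assumptions made for illustration, and only the seed pair itself comes from the description above. A sentence is marked as a positive training example when it mentions both entities of a known cause-effect pair.

```python
# Seed pairs of (cause, effect). This single pair is the example given in the text.
SEED_PAIRS = {("Peaceful_Revolution", "German_reunification")}


def label_sentences(sentences, seed_pairs):
    """Return (sentence, cause, effect, label) tuples for sentences that
    mention both entities of a seed pair. Label 1 marks a positive example."""
    examples = []
    for sentence in sentences:
        # Replace spaces with underscores so plain text can be compared
        # against underscore-style entity names (a simplifying assumption;
        # a real system would use entity recognition and mapping instead).
        normalised = sentence.replace(" ", "_")
        for cause, effect in seed_pairs:
            if cause in normalised and effect in normalised:
                examples.append((sentence, cause, effect, 1))
    return examples


# Hypothetical input sentences, for illustration only.
sentences = [
    "The Peaceful Revolution led to German reunification in 1990.",
    "Berlin is the capital of Germany.",
]
training_examples = label_sentences(sentences, SEED_PAIRS)
```

The labeled sentences would then serve as training data for the classifier, which is where the noise inherent in distant supervision has to be handled.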

Neural QA Model for DBpedia by Siddhant Jain

To make DBpedia and its vast linked data available to a larger user base in their natural languages, a Neural QA model has been developed to answer questions posed by users in English. This project aims to make our end-to-end system better at learning the compositionality of questions by improving our current dataset and our learning model. Mentors: Tommaso Soru, Anand Panchbhai

Lifecycle Management of DBpedia Neural QA Models by Sahan Dilshan

DBNQA is a large dataset that can be used to create question answering models. Using a question answering approach like NSpM, we can create and experiment with various kinds of QA models based on DBNQA. But as we create more and more models, it becomes increasingly complicated to manage and use them. To address this, an AI lifecycle management mechanism is needed. This project tackles the problem by designing and implementing a lifecycle management framework for DBpedia Neural Question Answering models. Mentors: Edgard Marx, Lahiru Hinguruduwa

DBpedia Spotlight Dashboard: an integrated statistical information tool from the Wikipedia dumps and the DBpedia Extraction Framework artifacts by José Manual Díaz Urraco

DBpedia Spotlight was released in 2011 by DBpedia. It is a tool for annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through DBpedia.

To make this possible, a model must be created for each language using Wikistats (uriCounts, pairCounts, sfAndTotalCounts and tokenCounts), which are obtained from the Wikipedia dump, together with the following DBpedia Extraction Framework artifacts: instance types, redirects and disambiguations.

The main idea of the project is to build a dashboard that shows statistical information about the data collected by the DBpedia Extraction Framework and Wikistats. This information will give an overview of the existing types of classes, how they are statistically represented (which type of entity is the most common), and the trends that exist. In addition, we intend to add a comparison between versions of the same language, which will help to show changes from one version to another (number of entities, types of entities, trends of each version, etc.). All this information can be used to improve the identification of topics in documents. Mentors: Said Polanco-Martagón, Maribel, Beyza Yaman, Julio Hernandez, Jan Forberg

Update DBpedia Sparql for newly updated wiki resources and specifically related to pandemic, healthcare, and health AI fields by Guang Zhang

IMPACT: A user-friendly, real-time QA platform for a given DBpedia resource, specifically about the pandemic.

GOALS: To improve DBpedia Sparql for real-time monitoring of the pandemic situation near users and in the countries they are interested in. To compile pandemic data and wikidata for answering users' questions by querying databases, and to support research and development in the healthcare field related to model predictions and vaccine development.

Just as the Google search engine can answer questions, DBpedia Sparql can query databases and answer questions using Wikipedia resources. COVID-19 is an ongoing global pandemic, and humans will need to co-exist with the virus. We should compile the public wikidata related to the coronavirus and learn from it. The aim of this project is to update the DBpedia Sparql tool to answer questions related to wikidata, specifically about coronaviruses. The improved DBpedia Sparql tool would thus give the public a better understanding of the coronavirus pandemic and serve as a platform for research and development in the public healthcare field. Mentors: Marvin Hofer, Sebastian Hellmann, Alexander Winter

WARM-UP Git repo: https://github.com/guang-zh/coronavirus_info_app.git

Social Knowledge Graph: Employing SNA measures to Knowledge Graph by Zhipeng Zhao

When novice users query DBpedia, the information they really want is often buried under numerous query results. In this project, we want to leverage the DBpedia Knowledge Graph to develop a graph-query tool that helps end users obtain information relevant to their request/input/query. We can give users a subgraph in which the concept/entity they query for is at the center, surrounded by its most important related concepts (such as the top-5 or top-10 in terms of Social Network Analysis measures). Mentor: Luca Virgili
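The idea of a centered subgraph with the top-k most important neighbors can be sketched in a few lines of Python. This is only an illustration under stated assumptions: degree centrality stands in for whichever Social Network Analysis measures the project ends up using, and the function name `top_k_subgraph` and the toy edge list are hypothetical rather than taken from DBpedia.

```python
from collections import defaultdict


def top_k_subgraph(edges, center, k):
    """Keep the center node and its k direct neighbours with the highest
    degree, and return the edges among the kept nodes."""
    # Build an undirected adjacency map from the edge list.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Rank the center's neighbours by degree (descending), then by name
    # so that ties are resolved deterministically.
    ranked = sorted(neighbours[center], key=lambda n: (-len(neighbours[n]), n))
    keep = set(ranked[:k]) | {center}
    return [(a, b) for a, b in edges if a in keep and b in keep]


# Toy graph, for illustration only.
edges = [
    ("Berlin", "Germany"),
    ("Berlin", "Spree"),
    ("Berlin", "Brandenburg_Gate"),
    ("Germany", "Europe"),
    ("Germany", "Bonn"),
    ("Spree", "Havel"),
]
subgraph = top_k_subgraph(edges, "Berlin", 2)
```

Other measures such as betweenness or PageRank could replace the degree-based ranking key without changing the rest of the sketch.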

Modular DBpedia Chatbot by Jayesh Desai

The project aims to extend the functionality of the current DBpedia chatbot by integrating the ecosystem of the Qanary framework, including its plug-and-play components, and plans to move to Google DialogFlow to use state-of-the-art technology. Mentors: Andreas Both, Aleksandr Perevalov, Ram G Athreya, Ricardo Usbeck

DBpedia Live Neural Question Answering Chatbot by Ashutosh Kumar

This project aims to build a chatbot that can query DBpedia based on the DBNQA dataset, using natural language as well as query language, so that DBpedia content becomes more accessible and the community can evaluate and give feedback on the DBpedia NSpM model. Mentors: Edgard Marx, Diego Moussallem, Thiago Castro Ferreira, Nausheen Fatma

User Centric Knowledge Engineering and Data Visualization by Karan Kharecha

The ontologies dashboard was developed last year to show data statistics interactively, helping community members get a quick overview of different SPARQL endpoints and Databus collections. This year, the focus is more on user engagement, so that users can perform data analysis without leaving DBpedia's ecosystem. The project is about adding more user-customized activities. This includes enabling user login and creating multiple dashboard instances by specifying Databus collections. Users can write queries and visualize the results in the same place. The system will allow users to publish their own dashboards of their linked data by plotting the graphs they like. The expected benefits include higher user retention and more flexibility in deriving insights.

The system design of this project uses a state-of-the-art approach for developing user-specific dashboards by querying data from specified sources in a modular manner. This could become a new sub-system in DBpedia's existing ecosystem. Mentors: Jan Forberg, Luca Virgili

Web app to generate RDF from DBpedia abstracts by Fernando Casabán Blasco

With recent advances in the processing and analysis of natural language texts, converting text into RDF triples is becoming a real possibility, making it possible to build knowledge graphs from raw text.

To contribute to open source software development for the DBpedia community, I propose to develop an online tool that allows users to convert texts, such as the abstract of a given DBpedia resource, into a set of RDF triples. This will be achieved by combining syntactic analyzers and named entity recognizers. Mentor: Mariano Rico

Stay safe and check Twitter or LinkedIn for more info about the DBpedia projects for GSoC 2021. You can also subscribe to our newsletter for the latest news and information around DBpedia.

Julia & Emma

on behalf of the DBpedia Association
