- Pinky: Gee, Brain, what are we gonna do this year?
- Brain: The same thing we do every year, Pinky. Taking over GSoC.
And this is exactly what we did. We were accepted as one of 206 open source organizations to participate in Google Summer of Code (GSoC) again. More than 25 students responded to our call for project ideas. In the end, we chose six amazing students and their project proposals to work with during summer 2019.
In the following post, we will give you some insights into the project ideas and how they turned out. Additionally, we will shed some light on our amazing team of mentors, who devoted a lot of time and expertise to mentoring our students.
Meet the students and their projects
A Neural QA Model for DBpedia by Anand Panchbhai
With a booming amount of information continuously being added to the internet, organising the facts and serving this information to users becomes a very difficult task. Currently, DBpedia hosts billions of data points and corresponding relations in the RDF format. Accessing data on DBpedia via a SPARQL query is difficult for amateur users who do not know how to write a query. This project tried to make this humongous body of linked data available to a larger user base in their natural languages (currently restricted to English). The primary objective of the project was to translate natural language questions into valid SPARQL queries. Click here if you want to check his final code.
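To make the translation task concrete, here is a minimal sketch of mapping a question to a SPARQL query. It is not the project's neural model: the hard-coded regex templates, the `TEMPLATES` table and the helper names are assumptions made for illustration, standing in for what a trained model would learn from data. Only the DBpedia prefixes and the `dbo:author` / `dbo:birthPlace` properties are real.

```python
import re

# Hypothetical question templates: regex pattern -> SPARQL skeleton.
# A neural model would learn such mappings; here they are hard-coded.
TEMPLATES = [
    (re.compile(r"^who is the author of (?P<entity>.+?)\??$", re.IGNORECASE),
     "SELECT ?x WHERE {{ dbr:{entity} dbo:author ?x }}"),
    (re.compile(r"^where was (?P<entity>.+?) born\??$", re.IGNORECASE),
     "SELECT ?x WHERE {{ dbr:{entity} dbo:birthPlace ?x }}"),
]

PREFIXES = (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "PREFIX dbr: <http://dbpedia.org/resource/>\n"
)


def to_resource_name(label: str) -> str:
    """Turn a surface label into a DBpedia-style local name (spaces -> underscores)."""
    return "_".join(label.strip().split())


def question_to_sparql(question: str):
    """Return a SPARQL query string for a supported question, else None."""
    for pattern, skeleton in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            entity = to_resource_name(match.group("entity"))
            return PREFIXES + skeleton.format(entity=entity)
    return None


if __name__ == "__main__":
    print(question_to_sparql("Where was Albert Einstein born?"))
```

The sketch only handles labels that are valid as prefixed local names; resource names containing parentheses or commas would need full IRIs or escaping. Questions outside the two templates simply return `None`, which is exactly the coverage gap a learned model is meant to close.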
Multilingual Neural RDF Verbalizer for DBpedia by Dwaraknath Gnaneshwar
Recently, the generation of natural language from RDF data has gained substantial attention and has also been proven to support the creation of Natural Language Generation benchmarks. However, most models are aimed at generating coherent sentences in English, while other languages have enjoyed comparatively less attention from researchers. RDF data is usually in the form of triples, <subject, predicate, object>. The subject denotes the resource, while the predicate denotes traits or aspects of the resource and expresses the relationship between subject and object. In this project, we aimed to create a multilingual Neural Verbalizer, i.e., to generate high-quality natural-language text from sets of RDF triples in multiple languages using one stand-alone, end-to-end trainable model. You can follow up on the progress and outcome of the project here.
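As an illustration of the <subject, predicate, object> structure, the sketch below represents a small set of triples and flattens it into a single input string of the kind a sequence-to-sequence verbalizer could consume. The `<s>`, `<p>`, `<o>` markers and the `<2de>`-style target-language tag are assumptions chosen for this example, not the format used in the project.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF statement: <subject, predicate, object>."""
    subject: str
    predicate: str
    obj: str


def linearize(triples: List[Triple], target_lang: str) -> str:
    """Flatten a set of triples into one token sequence.

    The leading tag tells a multilingual model which language to generate.
    The marker tokens are illustrative assumptions.
    """
    parts = [f"<2{target_lang}>"]
    for t in triples:
        parts.append(f"<s> {t.subject} <p> {t.predicate} <o> {t.obj}")
    return " ".join(parts)


if __name__ == "__main__":
    triples = [
        Triple("Alan_Turing", "birthPlace", "London"),
        Triple("Alan_Turing", "field", "Computer_science"),
    ]
    print(linearize(triples, "de"))
```

Putting the target language in the input is one simple way to let a single model serve several languages; the same triple set with a different tag would ask for a different output language.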
Predicate Detection using Word Embeddings for Question Answering over Linked Data by Yajing Bian
Knowledge-based question-answering (KBQA) systems have demonstrated an ability to generate answers to natural language questions from information stored in a large-scale knowledge base. Generally, such a system completes the analysis challenge in three steps: identifying named entities, detecting predicates and generating SPARQL queries. Of these three steps, predicate detection identifies the KB relation(s) a question refers to. To build a predicate detection structure, we first identified all possible named entities, then collected all predicates corresponding to those entities. The next step is to calculate the similarity between the question and the candidate predicates using a multi-granularity neural network model (MGNN). To find the globally optimal entity-predicate assignment, we use a joint model based on the results of the entity linking and predicate detection processes, rather than taking the local predictions (i.e. the most probable entity or predicate) as the final result. More details on the project are available here.
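The difference between local and joint prediction can be sketched in a few lines. In the example below, a bag-of-words cosine similarity stands in for the MGNN, and the joint score is simply the product of the entity linking score and the question-predicate similarity. Both choices, as well as the toy entities, scores and predicate labels, are assumptions for illustration rather than the project's actual model.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words vector: a toy stand-in for learned embeddings."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter yields 0 for missing keys
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def best_assignment(question, entity_scores, predicates_by_entity):
    """Pick the (entity, predicate) pair with the highest joint score.

    joint score = entity linking score * question-predicate similarity
    """
    q = bow(question)
    best = None
    for entity, link_score in entity_scores.items():
        for label in predicates_by_entity.get(entity, []):
            score = link_score * cosine(q, bow(label))
            if best is None or score > best[2]:
                best = (entity, label, score)
    return best


if __name__ == "__main__":
    question = "what is the birth place of paris hilton"
    # Hypothetical entity linking scores: "Paris" wins locally.
    entity_scores = {"Paris": 0.6, "Paris_Hilton": 0.4}
    predicates = {
        "Paris": ["country", "mayor"],
        "Paris_Hilton": ["birth place", "occupation"],
    }
    print(best_assignment(question, entity_scores, predicates))
```

Taken on its own, the entity linker would choose `Paris`, but none of its predicates match the question. Scoring pairs jointly lets the strong predicate match pull the assignment towards `Paris_Hilton` and `birth place`.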
A tool to generate RDF triples from DBpedia abstract by Jayakrishna Sahit
The main aim of this project was to research and develop a tool to generate highly trustworthy RDF triples from DBpedia abstracts. To develop such a tool, we implemented algorithms that take the output generated by the syntactic analyzer along with DBpedia Spotlight’s named entity identifiers. Further information and the project’s results can be found here.
A transformer of Attention Mechanism for Long-context QA by Stuart Chan
In this GSoC project, I chose to employ a transformer language model with an attention mechanism to automatically discover query templates for the neural question-answering knowledge-based model. The ultimate goal was to train the attention-based NSpM model on DBpedia and evaluate it against the QALD benchmark. Check here for more details on the project.
Workflow for linking External datasets by Jaydeep Chakraborty
The requirement of the project was to create a workflow for entity linking between DBpedia and external data sets. We aimed at an approach for ontology alignment through the use of an unsupervised mixed neural network. We explored reading and parsing the ontology and extracted all necessary information about concepts and instances. Additionally, we generated semantic vectors for each entity from different meta information such as entity hierarchy, object properties, data properties and restrictions, and designed a user-interface-based system that shows all necessary information about the workflow. Further info, download details and project results are available here.
Meet our Mentors
First of all, a big shout out and thank you to all mentors and co-mentors who helped our students to succeed in their endeavours.
- Aman Mehta, former GSoC student and current junior mentor, recently interned as a software engineer at Facebook, London.
- Beyza Yaman, a senior mentor and organizational admin, Post-Doctoral Researcher based in ADAPT, Dublin City University, former Springer Nature-DBpedia intern and former research associate at the InfAI/University of Leipzig. She is responsible for the Turkish DBpedia, and her fields of interest are information retrieval, data extraction and integration over Linked Data.
- Tommaso Soru, senior mentor and organizational admin. He is a Machine Learning & AI enthusiast, Data Scientist at Data Lens Ltd in London and a PhD candidate at the University of Leipzig.
“DBpedia is my window to the world of semantic data, not only for its intuitive interface but also because its knowledge is organised in a simple and uncomplicated way” - Tommaso Soru, GSoC 2019
- Amandeep Srivastava, Junior Mentor and analyst at Goldman Sachs. He’s a huge fan of Christopher Nolan and likes to read fiction books in his free time.
- Diego Moussalem, Senior mentor, Senior Researcher at Paderborn University, an active and vital member of the Portuguese DBpedia Chapter.
- Luca Virgili, currently a Computer Science PhD student at the Polytechnic University of Marche. He was a GSoC student for a year and a GSoC mentor for 2 years at DBpedia.
- Bharat Suri, former GSoC student, junior mentor, Master’s degree in Computer Science at The Ohio State University.
“I have thoroughly enjoyed both my years of GSoC with DBpedia and I plan to stay and help out in whichever way I can” - Bharat Suri, GSoC 2019
- Mariano Rico, senior mentor, Senior Doctor Researcher at Ontology Engineering Group, Universidad Politécnica de Madrid.
- Nausheen Fatma, senior mentor, Data Scientist, Natural Language Processing, Machine Learning at Info Edge (naukri.com).
- Ram G Athreya, long-term GSoC mentor, Research Engineer at Viv Labs, Bay Area, San Francisco.
- Ricardo Usbeck, team leader ‘Conversational AI and Knowledge Graphs’ at Fraunhofer IAIS.
- Rricha Jalota, former GSoC student, current senior mentor, developer in the Data Science Group at University of Paderborn, Germany.
“The reason why I love collaborating with DBpedia (apart from the fact that, it’s a powerhouse of knowledge-driven applications) is not only it gave me my first big break to the amazing field of NLP but also to the world of open-source!” - Rricha Jalota, GSoC 2019
Mentor Summit Recap
This GSoC marked the 15th consecutive year of the program and was the 8th season in a row for DBpedia. As in every year, two of our mentors, Rricha Jalota and Aashay Singhal, joined the annual GSoC mentor summit. Selected mentors get the chance to meet each other and engage in a vital exchange of knowledge and expertise around various GSoC-related and unrelated topics. Apart from more entertaining activities such as games, a scavenger hunt and a guided trip through Munich, mentors also discussed pressing questions such as “why is it important to fail your students” or “how can we have our GSoC students stay and contribute for long”.
After GSoC is before the next GSoC
If you are interested in mentoring a DBpedia GSoC project or want to contribute a project of your own, we are happy to have you on board. Here are a few things to get you started.
- Have a look at previous DBpedia projects on GitHub
- Get in touch with former mentors and potential future mentors, for example via our DBpedia Forum. We have a dedicated group for exchange about the upcoming season in 2020.
Likewise, if you are an ambitious student who is interested in open source development and working with DBpedia, you are more than welcome to either contribute your own project idea or apply for one of the project ideas we will offer starting in early 2020.
See you soon,