DBpedia Blog

My DBpedia Internship Journey

written by Haniya Konain

After successfully completing GSoC, I asked my mentor whether DBpedia had any internship openings, since getting at least one internship during my college years has always been one of my main goals. He suggested that I contact Julia, as she would have more information regarding internships. I messaged her, and that eventually led to my first internship offer on August 28.

Soon after, I got the chance to speak with the Head of DBpedia. He showed me how my code had improved the quality of the coordinates in the fusion, which made me really happy to see. He also introduced me to concepts like knowledge graphs, the semantic web, Linked Open Data (LOD), open knowledge graphs, and the overall structure and mission of DBpedia. At first, I was very confused because these topics were completely new to me. My mentor then shared several research papers, including:

  • Wikidata through the Eyes of DBpedia
  • DBpedia – A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia
  • DBpedia – A Crystallization Point for the Web of Data

These papers helped me understand the discussions much better.

During the onboarding phase, I had to choose one of three focus areas for my internship:

  1. Improving the Wikipedia/Wikidata extraction and increasing the data quality of the DBpedia core.
  2. Connecting with other Open Knowledge Graph Projects to form a “co-evolution partnership & network,” maintain links and mappings, and build a well-maintained Knowledge Graph Catalog on the DBpedia Databus.
  3. Collaborating with RDF tool providers to increase maturity and interoperability of tools connected to the Knowledge Graph Catalog.

I had a brainstorming session with Dimitris, where we discussed how the first option would be a good fit for me since I already had experience working with the DBpedia Extraction Framework (DIEF) and its codebase. He also recommended books on software engineering after I asked him how I could become a better developer like him.

My official kick-off meeting was on September 17. By that time, I had finished reading all the research papers and gained enough background knowledge to understand the discussions. In the meeting, I met two new mentors, Johannes and Fabian, who would guide me throughout the internship. Additionally, my GSoC code got merged around the same time, which felt like a perfect start.

We use Canvas for weekly sync meetings, where I receive my tasks for the coming week. Throughout this internship, I’ve learned a lot, especially things I didn’t realize were important earlier. For example:

  • Committing all necessary changes, not just selected files
  • Avoiding committing trailing whitespace, which kept happening in IntelliJ until Johannes explained the issue
  • Not bundling too many changes into one PR
  • Writing proper, professional commit messages
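To make the trailing whitespace point concrete, here is a minimal sketch of how such whitespace could be detected and removed before committing. It is only an illustration: the function names are hypothetical, it is not part of DIEF or any DBpedia tooling, and it assumes Unix line endings.

```python
def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs at the end of every line, keeping line breaks."""
    lines = text.split("\n")
    return "\n".join(line.rstrip(" \t") for line in lines)


def lines_with_trailing_whitespace(text: str) -> list[int]:
    """Return the 1-based numbers of lines that end in a space or tab."""
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if line != line.rstrip(" \t")
    ]


if __name__ == "__main__":
    sample = "val a = 1  \nval b = 2\nval c = 3\t"
    # Report the offending lines, then show the cleaned text.
    print(lines_with_trailing_whitespace(sample))
    print(repr(strip_trailing_whitespace(sample)))
```

In practice, most editors (IntelliJ included) have a setting that does this on save, which is probably the easier fix.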

One thing I particularly struggled with was understanding what a “simple action” meant in the CI context. When I searched for it online, all I found were YAML files, which gave me only an abstract idea of how actions work. I wasn’t sure how to test my script through CI. Later, Johannes walked me through how I could include my script in the existing CI YAML file so that the workflow would run and validate it properly. This helped me understand GitHub Actions much more clearly.

What I Did with DIEF:

  • Ran custom extraction scripts to test different extractors
  • Generated RDF data
  • Enabled Wikidata extraction in the server GUI
  • Added a comma-separated extractor list so the extraction runs with only a few selected extractors
  • Checked how data appears in different formats using View Page Source
  • Changed the UI dropdown to checkboxes, and so much more
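The idea behind the comma-separated extractor list can be sketched roughly like this. This is not the actual DIEF code (DIEF is written in Scala); the function names and the extractor names `ExtractorA`, `ExtractorB`, and `ExtractorC` are placeholders I made up for illustration.

```python
def parse_extractor_list(raw: str) -> list[str]:
    """Split a comma-separated string into clean, de-duplicated names."""
    names: list[str] = []
    for part in raw.split(","):
        name = part.strip()
        # Skip empty entries (e.g. from ",,") and repeated names.
        if name and name not in names:
            names.append(name)
    return names


def select_extractors(requested: list[str], available: list[str]) -> list[str]:
    """Keep only known extractors and fail loudly on unknown names."""
    unknown = [name for name in requested if name not in available]
    if unknown:
        raise ValueError(f"Unknown extractors: {', '.join(unknown)}")
    return requested


if __name__ == "__main__":
    # Hypothetical set of extractors known to the server.
    available = ["ExtractorA", "ExtractorB", "ExtractorC"]
    requested = parse_extractor_list(" ExtractorA, ExtractorC ,,ExtractorA")
    print(select_extractors(requested, available))
```

Failing on unknown names, instead of silently ignoring them, makes a typo in the list visible right away.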

Unexpected heavy rains in Hyderabad caused frequent WiFi and electricity issues, which made some days challenging, but I kept working through it. We actively worked on improving the quality of Wikidata extractions and contributing towards cleaner, higher-quality data within the DBpedia ecosystem.

During the internship, I also learned many small-but-important things that I never knew before.

For example, we enabled Wikidata in the language configurations, and I learned what QIDs are, how namespaces work, and how even a tiny mistake in one extractor can affect the whole extraction process.

I understood how to add scripts in CI, why CI/CD is important, what a .yml file does, and why some tests pass while others fail. I learned about different testing types, why timeout errors happen, and how to increase timeouts in scripts.

I worked with revision IDs, learned what each Wikidata extractor does, and also handled proper date and calendar logic, including BCE padding, Julian and Gregorian calendars, calendar models, XSD formats, and even the canonical date conversion algorithm.
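As a small illustration of the BCE padding part: an `xsd:date` year needs at least four digits, with a leading minus sign for negative years. The sketch below only shows that padding step. The function names are hypothetical, and it does not attempt Julian/Gregorian conversion or the question of how year zero is treated, so it is not the algorithm used in DIEF.

```python
def format_xsd_year(year: int) -> str:
    """Zero-pad the year to at least four digits, keeping the sign in front."""
    sign = "-" if year < 0 else ""
    return f"{sign}{abs(year):04d}"


def format_xsd_date(year: int, month: int, day: int) -> str:
    """Build an xsd:date-style string such as -0044-03-15."""
    return f"{format_xsd_year(year)}-{month:02d}-{day:02d}"


if __name__ == "__main__":
    print(format_xsd_date(-44, 3, 15))
    print(format_xsd_date(476, 9, 4))
    print(format_xsd_date(2024, 12, 31))
```

Padding the absolute value and adding the sign afterwards avoids results like `-044`, where the minus sign gets counted as one of the four characters.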

I also fixed issues with precision handling and HTTP header handling, got a better understanding of SHA, and resolved code quality warnings.

I learned what quads are, what XSLT is, and how outputs differ in different formats.
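To show the difference in the simplest way: a triple has a subject, predicate, and object, while a quad adds a fourth term naming the graph the statement belongs to. The sketch below writes both in N-Triples and N-Quads style. It is a simplified illustration with hypothetical function names and made-up `example.org` IRIs, it only handles IRI terms (no literals), and it does no escaping.

```python
def to_ntriples_line(subject: str, predicate: str, obj: str) -> str:
    """Serialize one triple of IRIs as an N-Triples style line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def to_nquads_line(subject: str, predicate: str, obj: str, graph: str) -> str:
    """Serialize one quad: the same triple plus the graph IRI as fourth term."""
    return f"<{subject}> <{predicate}> <{obj}> <{graph}> ."


if __name__ == "__main__":
    # All IRIs below are invented placeholders.
    s = "http://example.org/resource/Hyderabad"
    p = "http://example.org/ontology/country"
    o = "http://example.org/resource/India"
    g = "http://example.org/graph/demo"
    print(to_ntriples_line(s, p, o))
    print(to_nquads_line(s, p, o, g))
```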

I also learned how to sanitize HTML, detect warnings, show them to users, and even how to create proper GitHub issues.

Along the way, I also did my first ever proper code review for another DBpedia contributor and learned how important clean and readable code is.

Another big lesson I learned was how a small change in one language folder can break extraction for the whole system, so being careful really matters.

I also explored restoring old code from Git history, making clean commit messages, and the biggest lifelong lesson: remove whitespace and unwanted changes before committing. It saves time for reviewers and keeps the code clean.

I also helped newbies on Slack, guided them with issues, and it honestly felt good to be able to help someone even though I was still learning myself.

Honestly, I learned so many things in these four months. I don’t regret anything, even though it was a little hard with all the rain, my mids and externals happening at the same time, and me wrapping up the internship. It taught me time management in the most unexpected way. And in the middle of all this, my mom randomly brought home a stray cat that I had to take care of, which actually felt like a sweet ending because my favourite pet passed away before GSoC (as I wrote in my earlier blog).

I really want to thank Dimitris sir and Julia ma’am for giving me this chance. And a big thanks to Sebastian sir, Johannes sir, and Fabian sir for guiding me so patiently. Especially Johannes sir, who explained everything so clearly on Canvas and always made sure I had no doubts. It honestly made me emotional because not everyone puts in that kind of effort for a newbie.

At first the pace was overwhelming, and even German-English felt very abstract compared to Indian English, but slowly I got used to it. When I look back, I realise GSoC felt more like solving code issues, while this internship felt like real-world, business-level work. And now when I compare my old GSoC code… I feel so embarrassed about all the whitespace and extra changes I used to push 😭. But it just shows how much I improved.

So with this, my semester ends, my internship ends, and the year ends too.

I genuinely thank DBpedia for this experience. It became a very important part of my journey. ✨