The research group is involved in the educational activities of the Faculty of Engineering and Architecture of Ghent University within the following programmes:

and the Faculty of Sciences of Ghent University within the following programmes:

Within these programmes, the group is responsible for the courses on databases and information management, namely:

Besides this, the group also supervises apprenticeships, master theses and PhD theses. Because our educational activities closely match our research activities, the two reinforce each other.


Master thesis

An overview of possible research domains and topics for master students: 

 Avoiding False Statistics

Using possibility theory to improve big data model accuracy.

keywords: possibility theory, 3D modeling, big data

The hallmark of a good model is that it is an accurate reflection of all available information. If information is scarce or of low quality, the last thing a good model should do is lie about it. In practice, statistics derived from probability theory are often misused. Possibility theory is an excellent alternative which has promising properties when it comes to dealing with imperfect information.

At DDCM, you will be free to use live data from a real dataset to help improve a 3D model of the subsurface of the North Sea. Some of the challenges you will be facing:

  • dealing with the large amount of data
  • dealing with the poor quality of the data
  • analyzing the correctness of the model
  • investigating possibilistic predictors

Contact: Robin De Mol

 Quantifying Data Quality

Towards a better understanding of measuring data quality.

keywords: data quality, implementation, big data

Data quality has many different dimensions: timeliness, completeness, consistency, accuracy, interpretability, ... Because of this, it cannot be measured directly. Current state-of-the-art techniques suggest expressing data quality as a number in the unit interval. However, as illustrated on the front, this representation is not meaningful.

Regarding data quality, DDCM offers you the chance to:

  • work with a big data set full of data quality issues
  • implement a framework to perform experiments
  • investigate absolute scales for semantically rich quantification
  • separate objective, measurable properties from subjective preferences
  • develop professional writing skills towards publicising your results
  • work on "healing algorithms" to improve data quality
  • resolve inconsistencies between large amounts of data structures

Contact: Robin De Mol

Offshore Windmill Farms

Finding the optimal location for offshore windmill farms in the face of incomplete information.

keywords: decision support, big data, imperfect data

Finding the optimal location for an offshore windmill farm is not trivial. Current models of the North Sea are mostly interpolated from a very small number of actual data points, some of which are over 50 years old. Because performing measurements is costly in both time and money, a rigorous mathematical solution is needed.

A master thesis on this subject will let you work on a real, live, large and imperfect data set. You will be able to shape your research around any of the many interesting challenges, ranging from the quality of the data to the importance of the criteria used to suggest suitable locations for constructing offshore windmill farms.

In addition, you will:

  • get in touch with our business partners
  • develop professional writing skills towards publications
  • work on a practical problem towards a real solution

Contact: Robin De Mol

 Big Bad Data Mining

New methods for discovering knowledge from big data sets that are of poor quality.

keywords: data quality, big data, data analytics

Traditional data mining techniques are lagging behind the ever-increasing volume of data generated each day, especially when those data are of poor quality. The fields of application are endless, ranging from machinery that generates terabytes of data from thousands of sensors each second to worldwide social media analyses for marketing studies. There is an immediate need for fast, accurate and fault-tolerant algorithms to discover knowledge from large data sets.

A master thesis on this subject allows you to choose between many different techniques to focus on.

Contact: Robin De Mol

 Influencer marketing

Evaluation of influencer marketing campaigns.

keywords: social media, influencer marketing, marketing campaigns, big data

Influencer marketing is a rather novel technique used by marketers. These marketing campaigns revolve around “influencers”, people with a huge social network who can distribute a campaign or even directly promote a product among their followers. But how does one evaluate the success of such a campaign? A master thesis could solve one piece of this complex puzzle.

Contact: Antoon Bronselaer

 Quality of news

Automatic detection of fake news.

keywords: automatic detection, text analytics, data quality, fake news reports, media

Fake news messages currently circulate quite often on (social) media, and they frequently create a buzz in society. Some major companies have taken action to prevent those messages from spreading. However, these measures are often based on users manually reporting the fake messages. A technique that can automatically estimate the quality of news articles is required to put a stop to those hoaxes. A first attempt at such a technique could be developed within the context of a master thesis.

Contact: Antoon Bronselaer

 Big versioned data

Tackling one of the most rewarding challenges in big data: Time!

keywords: big data, time, versions 

Versioning is omnipresent in big data. Not only are massive amounts of data versioned, versioning itself is often the source of the ‘big’ in ‘big data’. Enhancing versioning in big data may thus enhance big data management itself!

At DDCM, you’ll try to tackle any facet of this problem, such as:

  • efficient versioning of big data
  • efficient querying of big versioned data
  • correctness of data versions

Contact: Christophe Billiet

 NoSQL Performance enhancement

NoSQL may already be fast, but maybe it could be faster?

keywords: query speed, indices, NoSQL 

It is well known that the speed of query processing in relational database systems can be greatly improved using indices. Can indices also improve querying speed in NoSQL solutions? Which types of indices will work best?

Come find out the answers to these and other questions at DDCM!
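
The relational premise mentioned above can be observed in a small experiment. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in relational system; the table name, column names and data are hypothetical and only serve as illustration. It compares the query plan of the same query before and after an index is created. The exact wording of the plan differs between SQLite versions, so no fixed output is shown.

```python
import sqlite3

# In-memory database with a hypothetical table of sensor measurements.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO measurement (sensor, value) VALUES (?, ?)",
    [(f"sensor-{i % 100}", float(i)) for i in range(10_000)],
)


def query_plan(connection, sql):
    """Return SQLite's query plan for `sql` as one readable string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


sql = "SELECT value FROM measurement WHERE sensor = 'sensor-42'"

# Without an index, the whole table has to be scanned.
plan_without_index = query_plan(conn, sql)

# With an index on the filtered column, a direct search becomes possible.
conn.execute("CREATE INDEX idx_measurement_sensor ON measurement (sensor)")
plan_with_index = query_plan(conn, sql)

print(plan_without_index)
print(plan_with_index)
```

Whether a comparable gain can be obtained in a NoSQL store, and with which index structures, is exactly the open question of this topic.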

Contact: Christophe Billiet


Unlocking the defining dependencies in (big) data.

keywords: big data, dependencies 

Data depends on other data. Data dependencies of all sorts may hugely impact the efficiency of data manipulation tasks and data querying. This problem obviously explodes when dealing with large volumes of data. Hence, finding out all about your (big) data’s dependencies is important.

At DDCM, you’ll try to tackle any facet of this problem, such as:

  • automating normalisation
  • finding the closure of a set of functional dependencies
  • constructing example sets of data for given dependencies
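
As an illustration of the closure problem listed above, the following minimal sketch computes the closure of a set of attributes under a set of functional dependencies. This is the textbook fixed-point algorithm and the usual building block for reasoning about the closure of a set of dependencies; it is not a description of how DDCM approaches the problem. The function name and the example dependencies are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A functional dependency X -> Y, stored as (X, Y).
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def attribute_closure(attributes: Iterable[str], dependencies: List[FD]) -> Set[str]:
    """Return all attributes functionally determined by `attributes`."""
    closure = set(attributes)
    changed = True
    # Keep applying dependencies until no new attribute can be added.
    while changed:
        changed = False
        for lhs, rhs in dependencies:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


# Hypothetical dependencies: A -> B, B -> C, CD -> E
fds = [
    (frozenset({"A"}), frozenset({"B"})),
    (frozenset({"B"}), frozenset({"C"})),
    (frozenset({"C", "D"}), frozenset({"E"})),
]

print(sorted(attribute_closure({"A"}, fds)))       # ['A', 'B', 'C']
print(sorted(attribute_closure({"A", "D"}, fds)))  # ['A', 'B', 'C', 'D', 'E']
```

This naive loop rescans all dependencies on every pass, which is acceptable for small schemas but becomes a bottleneck when the number of dependencies grows.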

Contact: Christophe Billiet


You have the sports data. How do you make the decisions?

keywords: decision support, data analytics, sports, football 

More and more sports teams collect data about their team’s actions and performances. The next big step is to automatically analyse these data in order to support the coach’s tactical decisions.

At DDCM, you’ll get the chance to work with actual data from football teams like Club Brugge, from volleyball teams, and others. The biggest challenges are:

  • how to efficiently analyse these data
  • how to effectively analyse these data
  • how to present such data in a useful way

Contact: Christophe Billiet

 e-Learning Platform

Handle students' solutions and provide feedback to teacher and student.

keywords: implementation, query language, SQL, XPath, web

DDCM has an online e-learning platform in development that allows students to exercise various query languages. This encompasses the query languages SQL, which queries databases, and XPath, which queries XML documents. The platform is used by a number of partners and is on the brink of expanding its user base considerably. A lot of challenges remain to be solved within the project, with freedom as to how to tackle the problems at hand:

  • Use insight in the queries themselves to provide better feedback. This requires parsing the queries and using deep knowledge of the query language.
  • Analyse students’ answers to provide expansive feedback to the teachers. Submitted data grows quickly and provides a considerable challenge w.r.t. performance.
  • Introduce a gamification element into the system, comparing students’ performances and challenging them to study differently.
  • Perform a smart analysis of the student answer and the model answer in order to provide to-the-point feedback to the student. A rudimentary approach is used for SQL, which can be expanded; an approach for XPath remains to be researched.
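
One simple baseline for comparing a student answer with a model answer is to execute both queries on the same data and compare their results. The sketch below assumes Python's built-in sqlite3 module and a hypothetical table; the function names are illustrative, and this is not necessarily the approach used by the platform.

```python
import sqlite3
from collections import Counter


def result_multiset(connection, query):
    """Execute `query` and return its rows as a multiset (order is ignored)."""
    return Counter(connection.execute(query).fetchall())


def same_result(connection, student_query, model_query):
    """Compare the result of a student query with that of the model query."""
    try:
        student_rows = result_multiset(connection, student_query)
    except sqlite3.Error as error:
        return False, f"query failed: {error}"
    model_rows = result_multiset(connection, model_query)
    if student_rows == model_rows:
        return True, "result matches the model answer"
    # Counter subtraction keeps only positive counts.
    missing = sum((model_rows - student_rows).values())
    extra = sum((student_rows - model_rows).values())
    return False, f"{missing} missing row(s), {extra} unexpected row(s)"


# Hypothetical exercise data.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE student (name TEXT, year INTEGER)")
connection.executemany(
    "INSERT INTO student VALUES (?, ?)", [("An", 1), ("Bram", 2), ("Cas", 2)]
)

model = "SELECT name FROM student WHERE year = 2"
student = "SELECT name FROM student WHERE year >= 1"

print(same_result(connection, student, model))
# (False, '0 missing row(s), 1 unexpected row(s)')
```

Such a comparison only detects that an answer is wrong on the given data, not why. Explaining the mistake requires insight into the query itself. A real system would also have to run student queries in a sandbox, since they may modify the data.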

Contact: Joachim Nielandt

 Online Data Extraction

Perform data extraction from web pages and build experience with web crawlers.

keywords: web, HTML, crawler, spider, big data, XPath 

Study data extraction on a large scale. This is usually done using HTML sources, which bring several complexities: they are mostly malformed and require careful handling. Other sources are possible as well, including databases (usually not accessible), PDF files (very interesting, yet very difficult to process), .docx (prepare to handle various versions of the proprietary format) … Besides extracting simple pieces of information, such as e-mail addresses, interpreting blocks of text also provides a considerable challenge. This is typically necessary when the documents do not provide enough structure, prompting us to provide that structure.
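
To give an idea of the simplest case mentioned above, the following minimal sketch extracts e-mail addresses from a malformed HTML fragment. It assumes Python's standard html.parser module, which tolerates unclosed tags, and a deliberately simple regular expression that does not cover every valid address. The class name, function name and sample page are hypothetical.

```python
import re
from html.parser import HTMLParser

# Simplified pattern; real-world e-mail addresses can be more complex.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class TextCollector(HTMLParser):
    """Collect visible text and mailto: targets, ignoring the tag structure."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value and value.startswith("mailto:"):
                self.chunks.append(value[len("mailto:"):])

    def handle_data(self, data):
        self.chunks.append(data)


def extract_emails(html):
    """Return the sorted, de-duplicated e-mail addresses found in `html`."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return sorted(set(EMAIL_PATTERN.findall(" ".join(collector.chunks))))


# Malformed on purpose: <b> and the first <p> are never closed.
malformed = (
    "<html><body><p>Contact: <b>alice@example.org"
    "<p>or <a href='mailto:bob@example.org'>mail Bob</a></body>"
)

print(extract_emails(malformed))
# ['alice@example.org', 'bob@example.org']
```

Interpreting free text, or handling formats such as PDF and .docx, is considerably harder and is not covered by this sketch.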

Study the following problems at DDCM within the scope of online data extraction:

  • Store and deal with changes to harvested data
  • Extract data in a smart way
  • Deal with a variety of storage formats (.docx, .pdf, HTML ...)
  • Handle interpretation of the data
  • Integrate data from different sources in a smart way
  • Expose your data to be used by other parties

Contact: Joachim Nielandt