Student Projects (MSc/BSc Theses), Academic Year 2022-2023

Below is a list of project topics for Masters and Bachelors theses offered by the Software Engineering & Information Systems Research Group for students who intend to defend in June 2023. The projects are divided into SE Masters thesis projects (30 ECTS), Conversion Masters thesis projects (15 ECTS), and Bachelors thesis projects (9 ECTS).

If you're interested in any of the projects listed below, please contact the corresponding supervisor.

NB: If you want to look for thesis topics offered by other groups within the Chair of Software Engineering and Information Systems, please consult their respective group pages. You can find the links to the individual research groups here: https://cs.ut.ee/en/content/research
(This web page also includes links to research groups in other Chairs of the Institute of Computer Science.)



SE Master Thesis Projects (30 ECTS)


Topic/Title ??

Supervisor: Marlon Dumas (xxx [dot] xxx [ät] ut [dot] ee)

Description


Case Study in Software Testing or Software Analytics (focus on software quality)

Supervisor: Dietmar Pfahl (dietmar dot pfahl ät ut dot ee)

This is a "placeholder" Masters project topic, which needs to be negotiated individually. If you work in a IT company and you are actively engaged in a software testing or software analytics, or if you can convince your hierarchy to put in time and resources into such a project in the near-term, we can make a case study out of it. We will sit down and formulate concrete hypotheses or questions that you investigate as part of this project, and we will compare your approach and results against state-of-the-art practices. I am particularly interested in supervising theses topics related to mutation testing, testing of embedded software, testing safety-critical systems, security testing of mobile apps, anlysis of project repositories to make software development processes more efficient and effective, but I welcome other topic areas.

The method applied is a case study. Case studies follow a systematic approach, as outlined in "Guidelines for conducting and reporting case study research in software engineering" by Per Runeson and Martin Höst. Important elements of the thesis are a literature study, measurement, and interviews with experts in the target company.

A mandatory prerequisite for me to act as supervisor is that there exists a co-supervisor within the case company who is willing to help with the exact scoping of the thesis project and who confirms that the topic is in the interest of the case company to the extent that the student can work on the thesis project (at least partly) during work time.


Analysis of iOS Jailbreaks (reserved)

Supervisor: Kristiina Rahkema (kristiina dot rahkema ät ut dot ee)

The iOS operating system is more restrictive than Android. Applications on iOS run in sandboxes that, on the one hand, protect users against malicious apps but, on the other hand, also greatly restrict the capabilities of these applications. For example, it is not possible to deploy system-wide services on an iPhone. To overcome these restrictions, developers, security researchers, and hackers have developed jailbreaks for the iPhone that make it possible to root the device. Jailbreaks are developed by taking advantage of multiple vulnerabilities present in iOS and are often usable until Apple fixes these vulnerabilities. Some research papers, for example [1], document the exploits used in jailbreaking; however, there is no comprehensive overview of how the different jailbreaks have been achieved. The aim of this thesis is, firstly, to compile a list of different jailbreaks, then to investigate how these jailbreaks were achieved, and to describe the different approaches.

[1] Liu, Feng, et al. "Research on the technology of iOS jailbreak." 2016 Sixth International Conference on Instrumentation & Measurement, Computer, Communication and Control (IMCCC). IEEE, 2016.

Dependency Analysis of Closed Source iOS Apps

Supervisor: Kristiina Rahkema (kristiina dot rahkema ät ut dot ee)

When studying dependency networks, it is quite straightforward to detect the third-party libraries used by open source projects. Detecting the third-party libraries used by closed-source applications is more challenging. Zhan et al. studied the current status of third-party library detection for Android applications [1]. However, we do not know which tools exist (if any) for iOS. The aim of this thesis is to investigate whether and which tools exist for third-party library detection in closed-source iOS applications. Depending on the possibilities found, one aim could be to develop a tool that detects third-party library use in closed-source iOS applications. Depending on limitations, the tool might only work under specific circumstances.
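For a first impression of what such a tool has to work with: dynamically linked frameworks can be listed from a decrypted app binary using standard macOS tooling. A minimal sketch (assuming `otool` is available and a decrypted Mach-O binary; the app path is a hypothetical placeholder):

```python
import subprocess

def linked_libraries(binary_path: str) -> list[str]:
    """List dynamically linked libraries/frameworks of a Mach-O binary.

    Note: this only reveals dynamic linking; statically linked or
    source-embedded code needs binary-similarity techniques instead.
    """
    # 'otool -L' prints one load command per linked shared library
    out = subprocess.run(["otool", "-L", binary_path],
                         capture_output=True, text=True, check=True).stdout
    # The first line echoes the binary path; the rest are indented library paths
    return [line.strip().split(" (")[0]
            for line in out.splitlines()[1:] if line.strip()]

if __name__ == "__main__":
    for lib in linked_libraries("MyApp.app/MyApp"):  # hypothetical path
        print(lib)
```

Libraries embedded at build time, as is common with CocoaPods, would not show up in this listing, which is precisely where the research challenge of this topic lies.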

[1] Zhan, Xian, et al. "Automated third-party library detection for android applications: Are we there yet?." 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2020.

How do developers update dependencies in iOS libraries?

Supervisor: Kristiina Rahkema (kristiina dot rahkema ät ut dot ee)

Kula et al. [1] analysed how Java developers update their dependencies by analysing 4,600 projects on GitHub. They found that most developers do not update their dependencies and that new versions of libraries are mostly adopted through new uses of the library. We created a dataset that contains the dependency network of open source libraries available through CocoaPods, Carthage and Swift PM [2]. The aim of this thesis is to conduct the library update analysis on this new dataset.

[1] Kula, Raula Gaikovina, et al. "Do developers update their library dependencies?." Empirical Software Engineering 23.1 (2018): 384-417.

[2] Kristiina Rahkema and Dietmar Pfahl. 2022. Dependency Networks of Open Source Libraries Available Through CocoaPods, Carthage and Swift PM. https://doi.org/10.5281/zenodo.6376009

How well could existing vulnerability detection tools have prevented publicly reported vulnerabilities?

Supervisor: Kristiina Rahkema (kristiina dot rahkema ät ut dot ee)

The NVD (National Vulnerability Database) contains publicly reported vulnerabilities for many projects. Sometimes a vulnerability can remain in the codebase for a long time before it is detected. The aim of this thesis is to determine, for a selected list of open source libraries, whether the publicly reported vulnerabilities for these libraries could have been prevented by using a vulnerability detection tool. This could be done in the following steps: 1) find vulnerability detection tools; 2) apply these tools to the vulnerable code to determine whether an existing vulnerability detection tool could have found the vulnerability; 3) apply the same tools to the version of the library that did not yet contain the vulnerability and to the version in which the vulnerability was fixed, to determine whether the tool correctly detects that the vulnerability was fixed; 4) report on the results and determine how many of the publicly reported vulnerabilities could have been prevented by using a specific tool.
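Steps 2) and 3) lend themselves to automation. A hedged sketch of such an evaluation harness (the tool name, repository, revisions, and output parsing are all hypothetical placeholders that depend on the tools actually found in step 1):

```python
import subprocess

# Hypothetical input: a CVE mapped to the revision containing it and the fix
CASES = [
    {"cve": "CVE-XXXX-YYYY", "repo": "lib-under-study",
     "vulnerable_rev": "abc123", "fixed_rev": "def456"},
]

def run_detector(tool_cmd: list[str], repo: str, rev: str) -> bool:
    """Check out a revision and return True if the tool reports a finding."""
    subprocess.run(["git", "-C", repo, "checkout", rev], check=True)
    result = subprocess.run(tool_cmd + [repo], capture_output=True, text=True)
    return "finding" in result.stdout.lower()  # tool-specific parsing needed

for case in CASES:
    hit_vuln = run_detector(["some-sast-tool", "scan"], case["repo"], case["vulnerable_rev"])
    hit_fixed = run_detector(["some-sast-tool", "scan"], case["repo"], case["fixed_rev"])
    # A useful tool flags the vulnerable revision but not the fixed one
    print(case["cve"], "detected:", hit_vuln, "still flagged after fix:", hit_fixed)
```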


Topic/Title ??

Supervisor: Alexander Nolte (xxx [dot] xxx [ät] ut [dot] ee)

Description


Emerging Tech & Financial Industries

Supervisor: Fredrik Milani (fredrik [dot] milani [ät] ut [dot] ee)

New technologies provide value when used to improve processes or products. However, how new technologies can innovate, enhance, or significantly improve existing processes and products is not always clear. This thesis topic explores one emerging technology to better understand how it can deliver value for the financial sector. The work required for this thesis predominantly includes (1) research on the technology (what it is, how it works, its capabilities, use cases, etc.) and (2) conducting 8-12 interviews with people within the financial sector to learn about potential use cases. Finally, the results are analyzed and mapped onto a framework. Examples of emerging technologies include quantum computing, the metaverse, NFTs, edge computing, IoT platforms, etc.

Analysis of Workarounds in Business Processes for Identifying Improvement Opportunities

Supervisor: Fredrik Milani (fredrik [dot] milani [ät] ut [dot] ee)

If we can take a shortcut, we often do. The same applies to workers who execute processes. Although a process is supposed to be performed in a certain way, people find other ways that are easier, faster, or more convenient. As information systems log data, we can detect these shortcuts, called workarounds, from event logs. If a workaround performs better, the process should be changed accordingly. A few works have developed methods to identify such workarounds. However, identifying workarounds is not enough if we want to improve the process. This thesis is about using (and perhaps improving) existing methods for identifying workarounds, analyzing the workarounds, and determining which ones perform better, so that they can be used to improve the business process. For this thesis, we take a design science approach in which we elicit requirements, develop an artefact (an algorithm), and, finally, evaluate it.

Automated Variant Analysis for Business Process Improvement

Supervisor: Fredrik Milani (fredrik [dot] milani [ät] ut [dot] ee)

When a business process is executed, variations occur. For instance, the process for an economy class customer differs from that for a business class customer. Some of these variations are intentional, whereas others are not. Regardless, some variants have better performance, i.e., some variants, for instance, have shorter waiting times or are executed faster. Using process mining techniques, we can detect variants and identify how they differ. However, using variant analysis to improve business processes, i.e., to suggest process redesigns, has not been studied sufficiently. This topic is about developing an algorithm that can identify and analyze variants and suggest process changes that improve the business process. For this thesis, you will be given the conceptual framework. Your task is to implement and evaluate your tool.

Case-Based Variant Analysis for Prescriptive Process Monitoring

Supervisor: Fredrik Milani (fredrik [dot] milani [ät] ut [dot] ee)

When process workers are dealing with an ongoing case, they would like to receive recommendations on how to process the case. This is called prescriptive process monitoring. Using different techniques, we can tap into previously executed cases, identify what made them conclude successfully with respect to a specific metric, and recommend (prescribe) an action (intervention) to the process worker dealing with an ongoing case. This thesis topic is about developing a case-based variant analysis solution for prescribing interventions for ongoing cases. The topic requires reviewing different methods for prescriptive process monitoring; using, adapting, improving, combining, or developing new algorithms; developing a solution; and evaluating it.

Benchmark Study of Prescriptive Algorithms

Supervisor: Fredrik Milani (fredrik [dot] milani [ät] ut [dot] ee)

In recent years, algorithms that can recommend or prescribe an action have gained traction. It seems that adding a prescriptive component to products is the next big thing. Prescriptive components are also being considered for business processes. Prescriptive process monitoring aims at using past process executions to recommend interventions on live cases so as to increase the probability of a favorable outcome of the case. However, it is not clear which methods are best suited for process contexts. This thesis is about finding several suitable prescriptive algorithms, examining them, and conducting a benchmark study of their applicability, suitability, strengths, and weaknesses for prescriptive process monitoring.

Dashboards for Process Analysis in Apromore

Supervisor: Fredrik Milani (fredrik [dot] milani [ät] ut [dot] ee)

Process analysts often use process mining tools, like Apromore, to identify how they can improve a business process. This is a challenging task, as there is a great amount and variety of data and many filters one can use. To this end, we have developed 21 templates for identifying improvement opportunities using the Apromore process mining tool. These templates provide step-by-step instructions on what to do in Apromore to identify improvement opportunities. In addition, Apromore has a dashboard that can be configured. This thesis topic is about configuring dashboards in Apromore for the analysis templates to help analysts. The topic does not require coding; rather, the task is to configure dashboards and evaluate them.

Visualising Prescriptive Process Monitoring Output

Supervisor: Fredrik Milani (fredrik [dot] milani [ät] ut [dot] ee)

One of the main challenges in communicating prescriptions to end users is how to visualize the results. While there are many algorithms that can detect and find the best product or next step to take, few works have examined how such results should be communicated to the end user. This is particularly true of business process management and process mining. This thesis is about how to visualize the output of prescriptive process monitoring to support process workers in deciding what actions to take next. This is done by visualizing the available options (actions) and the impact of each action (or combination of actions). For this thesis, you will be given the requirements; your task is to implement and evaluate the visualizations.


Topic/Title ??

Supervisor: Ezequiel Scott (xxx [dot] xxx [ät] ut [dot] ee)

Description


Topic/Title ??

Supervisor: Hina Anwar (xxx [dot] xxx [ät] ut [dot] ee)

Description


A recommender system for improved data findability in open government data portals

Supervisor: Anastasija Nikiforova (Anastasija [dot] Nikiforova [ät] ut [dot] ee)

Research suggests that it is difficult for users of open government data (OGD) portals to find the datasets they are interested in, and even more difficult to find datasets with which a selected dataset could be used, either by complementing / enriching it or by substituting it with a similar / alternative one. A recommender system might be a solution. However, a recommender system for OGD portals is slightly different from what we might expect for other portals, where user data and related preferences can be used: by definition, and by the general idea of open data, OGD portals do not require the user to be authenticated (although the portal owner may have at least a log of visited pages with users' IP addresses, which is sometimes even published as a separate open dataset and can be useful here). This makes it a bit trickier to propose a very efficient recommender system, including suggestions like "other users also found interesting ...".

Thus, the thesis would review existing recommender system techniques (content-based, collaborative filtering, etc.), selecting those that can be applied to OGD portals, examine OGD portals, and identify features that can be used as input to generate recommendations: both external features, such as the title, description (note that you will be asked to carry out at least simplified text analytics), and tags, and preferably also internal features, such as parameter names (if sufficiently expressive). The respective recommender system is then expected to be developed and preferably tested on real users to measure their satisfaction with the results provided. This would contribute to the FAIRness of open data and thereby to its social, economic, and technological benefits for individual users, SMEs, and governments.
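To make the content-based direction concrete, here is a minimal sketch of recommending datasets by metadata similarity, using TF-IDF and cosine similarity from scikit-learn (the toy metadata is illustrative only; a real solution would harvest it from a portal's API):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy dataset metadata; real input would come from an OGD portal
datasets = [
    {"title": "Road traffic accidents 2021", "description": "Accidents by road and severity"},
    {"title": "Public transport stops", "description": "Bus and tram stop locations"},
    {"title": "Traffic volumes on national roads", "description": "Hourly vehicle counts"},
]

corpus = [d["title"] + " " + d["description"] for d in datasets]
tfidf = TfidfVectorizer(stop_words="english").fit_transform(corpus)
sim = cosine_similarity(tfidf)  # pairwise similarity of all datasets

query = 0  # the dataset the user is currently viewing
ranked = sorted(((sim[query][j], j) for j in range(len(datasets)) if j != query),
                reverse=True)
for score, j in ranked:
    print(f"{score:.2f}  {datasets[j]['title']}")
```

A thesis-grade solution would add tags, parameter names, and, where available, visit logs as further signals.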

Automated classification of open datasets, assigning categories and tags to improve data findability on open government data portals

Supervisor: Anastasija Nikiforova (Anastasija [dot] Nikiforova [ät] ut [dot] ee)

While many open government data (OGD) portals provide a large number of open datasets that are free to use and transform into value, not all of these data are actually used. In some cases, this is because the data are difficult to find due to the low level of detail provided about them, including, but not limited to, the absence or inaccuracy of the category(-ies) and tags assigned to a particular dataset, which it is part of the data publisher's task to provide. On some OGD portals, a third of the datasets are not categorized, although the portal provides a rich list of data categories that are in line with best practices and would allow these datasets to be classified. This leads to cases where a dataset cannot be found if the user searches for data using the catalog or tags (only the search bar will return the dataset, and only if the search query matches its title or description).

This thesis is intended to propose an automated data classification mechanism which, based on a dataset and the data provided about it (title, description (note that you will be asked to carry out at least simplified text analytics), and dataset parameters (if sufficiently expressive)), will suggest categories and tags to be assigned to it. First, the author will be asked to examine the state of the art on the topic, to explore OGD portals and what datasets look like, and to identify the scenarios in which an OGD user searches for a particular dataset. Then, a list of indicators will be defined that constitute the input for data classification (mostly in line with the above, but enriched if possible), and an appropriate solution will be developed. Finally, the output should be tested with users, thereby evaluating the consistency of the results, preferably comparing users' satisfaction with the current level. This would contribute to the FAIRness of open data, mainly to F - findability, but indirectly it affects the other features that OGD should meet in order to provide social, economic, and technological benefits for individual users, SMEs, and governments.
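As an illustration of the intended mechanism, a minimal supervised sketch that learns to map dataset titles/descriptions to portal categories (toy data; a real solution would train on the already-categorized datasets of a portal and would also suggest tags):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: (title + description) -> portal category
texts = [
    "Road traffic accidents by severity", "Hospital waiting times by region",
    "School enrolment statistics", "Air quality measurements in cities",
]
categories = ["Transport", "Health", "Education", "Environment"]

# TF-IDF features fed into a simple linear classifier
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, categories)

# Suggest a category for a new, uncategorized dataset
print(model.predict(["Number of registered electric vehicles"]))  # likely ['Transport']
```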

Chatbot for open government data portals: towards making open data user-friendly regardless of users' level of (open) data literacy

Supervisor: Anastasija Nikiforova (Anastasija [dot] Nikiforova [ät] ut [dot] ee)

Open government data (OGD) are expected to be used by everyone, regardless of domain, education, specialism, gender, level of income, etc. However, current research suggests that merely making these data available on the respective portals is insufficient: many users find them difficult to use, with respect to their findability, accessibility, understandability, etc. This is also in line with recent reports on digital literacy, whose level in many countries is very low. Users therefore need additional support to improve their experience with OGD and OGD portals. Thus, the thesis would review the existing literature, examine the portal, and, preferably, conduct an experiment aimed at identifying the challenges associated with the use of OGD (portals). This would constitute the basis for the chatbot. The respective chatbot is then expected to be developed and tested with users. Considering the limited time available for developing a chatbot and the significance of the underlying database, it is expected that users will be able to report issues not covered / answered by the chatbot, thereby collecting an additional set of queries for further consideration.

Data Quality or (Big) Data Analytics

Supervisor: Anastasija Nikiforova (Anastasija [dot] Nikiforova [ät] ut [dot] ee)

If you are interested in ((Big) or (Linked)) data quality or (Big) data analytics, it is possible to discuss your own topic or to select a topic collaboratively, based on your knowledge, experience, and interests. Topics of both a conceptual and a purely practical nature (case studies with a pre-defined, well-grounded methodology) can be proposed and will be considered. Topics covering or related to sensor-generated data are also welcome. Machine learning (ML) based data quality topics are especially appreciated.

Towards automating data quality specification by extracting data quality requirements from data features

Supervisor: Anastasija Nikiforova (Anastasija [dot] Nikiforova [ät] ut [dot] ee)

In order to preserve the value of data in their subsequent use, the prerequisite of data quality must be met. The quality of data, especially third-party data (data produced / collected by a source different from the data user), must therefore be verified, which is a time- and effort-consuming task. Moreover, carrying out even relatively simple data quality checks requires skills and knowledge that the data user may not have. This could be improved by allowing the user to at least partly check the quality of the data by automatically determining appropriate data quality requirements (rules) depending on the nature of the data (data values, although parameter names could also be used, if consistent with best practices). Thus, the thesis would review the literature on data quality and the most popular data quality issues typically found in data. This list, supplemented by self-defined quality requirements depending on the nature of the data, will serve as input to a tool (preferably web-based, but not necessarily), which would allow a user with no or limited data quality knowledge to verify the quality of a dataset with no (or limited) involvement in defining the data quality requirements for that dataset. It would be beneficial if the author were able to apply ML knowledge (to continuously enrich the database of requirements and improve their assignment to the data).
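A minimal sketch of the core idea, inferring simple quality rules from observed data values with pandas (the rule types shown are illustrative; a real tool would support a much richer rule catalogue and would learn from feedback):

```python
import pandas as pd

def infer_quality_rules(df: pd.DataFrame) -> dict:
    """Derive simple per-column quality rules from the observed values."""
    rules = {}
    for col in df.columns:
        rule = {"observed_missing_ratio": round(df[col].isna().mean(), 2)}
        values = df[col].dropna()
        if pd.api.types.is_numeric_dtype(values):
            rule["value_range"] = (values.min(), values.max())  # plausible range
        elif values.nunique() <= 10:  # small domain: looks like a code list
            rule["allowed_values"] = sorted(values.unique())
        rules[col] = rule
    return rules

# Toy example: the rules inferred here could then be checked against new data
df = pd.DataFrame({"age": [34, 28, 51, None], "gender": ["F", "M", "F", "F"]})
print(infer_quality_rules(df))
```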


Topic/Title ??

Supervisor: David Chapela de la Campa (xxx [dot] xxx [ät] ut [dot] ee)

Description


Topic/Title ??

Supervisor: Orlenys López-Pintado (xxx [dot] xxx [ät] ut [dot] ee)

Description


A Sharding-based Formalized Lifecycle for Scalable Mobile Smart-Contracts

Supervisor: Vimal Kumar Dwivedi (vimal [dot] kumar [dot] dwivedi [ät] ut [dot] ee)

Since the inception of blockchain technology, scalability has been a huge concern in terms of the number of transactions verified per second (TPS). For example, the Bitcoin and Ethereum blockchains have throughputs of about 7 and 14 TPS, respectively. The state of the art offers approaches such as sharding, sharding with ledger pruning, committee-based approaches, and on/off-blockchain processing that significantly increase the scalability of blockchains. However, these algorithms are not feasible for computing a blockchain on mobile devices, for the following reasons: the algorithms are less secure against malicious nodes, because they have only been tested with a fraction of 1/4 or 1/3 malicious nodes, and they use proof-of-work (PoW) as the consensus mechanism. In this thesis, we develop a Colored Petri Nets (CPN) model of a blockchain protocol that uses the sharding approach to verify transactions on mobile devices. CPN is a formal method used to design and analyze such protocols in order to detect flaws and reduce identified security risks. Sharding divides the blockchain network into multiple smaller sub-networks, called shards, that can easily be managed by a mobile device. The proposed protocol manages the connection of a mobile node to the shards using the proof-of-stake (PoS) consensus mechanism and provides strong fault resiliency against malicious nodes. The empirical evaluation should demonstrate linear scalability, i.e., throughput that grows linearly with the number of nodes in the network. Furthermore, state space analysis of the model should show that the result is a complete and correct formal specification that can be used for further implementation of the protocol.

References:

  • V. Deval and A. Norta, "Mobile Smart-Contract Lifecycle Governance with Incentivized Proof-of-Stake for Oligopoly-Formation Prevention," 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2019, pp. 165-168, doi: 10.1109/CCGRID.2019.00029.
  • Deval, V., Norta, A., Dai, P., Mahi, N., Earls, J. (2021). Decentralized Governance for Smart Contract Platform Enabling Mobile Lite Wallets Using a Proof-of-Stake Consensus Algorithm. In: Patnaik, S., Wang, TS., Shen, T., Panigrahi, S.K. (eds) Blockchain Technology and Innovations in Business Processes. Smart Innovation, Systems and Technologies, vol 219. Springer, Singapore. https://doi.org/10.1007/978-981-33-6470-7_5

Topic/Title ??

Supervisor: Baseer Ahmad Baheer (xxx [dot] xxx [ät] ut [dot] ee)

Description


Discovering metamorphic relations from software documentation

Supervisor: Alejandra Duque-Torres (alejandra [dot] duque [dot] torres [ät] ut [dot] ee)

Metamorphic testing (MT) examines the relations between inputs and outputs of test runs. These relations are known as metamorphic relations (MRs). Currently, MRs are handpicked and require in-depth knowledge of the System Under Test (SUT) as well as its problem domain. As a result, the identification and selection of high-quality MRs is a challenge.
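For illustration, a classic MR for a sine implementation is sin(x) = sin(π − x): the relation can be checked without knowing the expected output of any individual run. A minimal sketch:

```python
import math
import random

# MR for sine: sin(x) == sin(pi - x); no expected output value is needed
for _ in range(1000):
    x = random.uniform(-10, 10)   # source test input
    follow_up = math.pi - x       # follow-up input derived via the MR
    assert math.isclose(math.sin(x), math.sin(follow_up), abs_tol=1e-9)
```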

This thesis aims to explore methods for discovering MRs from issue reports, user forums, or other software documentation, in order to develop an initial tool that infers MRs automatically. You can get some inspiration from the following papers:

[1] Arianna Blasi, Alessandra Gorla, Michael D. Ernst, Mauro Pezzè, Antonio Carzaniga, MeMo: Automatically identifying metamorphic relations in Javadoc comments for test automation, Journal of Systems and Software, Volume 181, 2021. DOI: https://doi.org/10.1016/j.jss.2021.111041

[2] X. Lin, M. Simon, Z. Peng and N. Niu, "Discovering Metamorphic Relations for Scientific Software From User Forums," in Computing in Science & Engineering, vol. 23, no. 2, pp. 65-72, 1 March-April 2021, doi: 10.1109/MCSE.2020.3046973.

[3] Alberto Goffi, Alessandra Gorla, Michael D. Ernst, and Mauro Pezzè. 2016. Automatic generation of oracles for exceptional behaviors. In Proceedings of the 25th International Symposium on Software Testing and Analysis (ISSTA 2016). Association for Computing Machinery, New York, NY, USA, 213–224. DOI: https://doi.org/10.1145/2931037.2931061

Building metamorphic testing tool

Supervisor: Alejandra Duque-Torres (alejandra [dot] duque [dot] torres [ät] ut [dot] ee)

Metamorphic Testing (MT) has proven to be quite successful in alleviating the oracle problem in several application domains. In MT, system properties are represented as Metamorphic Relations (MRs), which are then utilized to automatically transform an initial set of test inputs (source inputs) into follow-up test inputs. If the system outputs for the initial and follow-up test inputs contradict the MR, the system is deemed faulty.

The aim of this thesis is to investigate whether and which tools exist for the automatic generation of tests using the MT approach. Depending on the possibilities found, one aim could be to develop a tool that produces the test cases (source and follow-up) given a set of MRs. Depending on limitations, the tool might only work in specific circumstances or domains.
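To indicate the shape such a tool could take, here is a minimal sketch of an MR-driven test harness (the MR encoding is one possible design, not a prescribed one):

```python
import random
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class MR:
    """A metamorphic relation: how to derive a follow-up input, and
    what must hold between the source and follow-up outputs."""
    transform: Callable[[Any], Any]
    relation: Callable[[Any, Any], bool]

def run_mt(sut: Callable, mrs: list[MR], sources: list) -> list:
    """Return (mr_index, source_input) pairs that violated a relation."""
    violations = []
    for i, mr in enumerate(mrs):
        for x in sources:
            if not mr.relation(sut(x), sut(mr.transform(x))):
                violations.append((i, x))
    return violations

# Example: a sorting SUT; permuting the input must not change the output
mrs = [MR(transform=lambda xs: random.sample(xs, len(xs)),
          relation=lambda out1, out2: out1 == out2)]
print(run_mt(sorted, mrs, [[3, 1, 2], [5, 5, 0]]))  # [] -> no violations
```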

Test generation with Pynguin

Supervisor: Alejandra Duque-Torres (alejandra [dot] duque [dot] torres [ät] ut [dot] ee)

Automated unit test generation is a well-known methodology that aims to reduce the developers' effort of writing tests manually. The most well-known open-source tools that perform automated unit test generation for the Java programming language are Randoop and EvoSuite. Automated tool support for test generation is currently lacking for dynamically typed programming languages like Python.

A crucial problem impeding the development of test generation techniques is the fact that programs written in a dynamically typed language usually do not provide any information about variable types, as these languages often allow changing the type of a variable’s value throughout the program, dynamically modify objects at runtime, or provide type coercions that might not match the intent of the programmer.

Recently, an automated unit test generation tool for Python named Pynguin was proposed. Pynguin is an open-source framework written in and for Python. It uses search-based test generation to generate tests that maximize code coverage. Pynguin incorporates type information into the test-generation process, and it is also able to generate covering test cases for programs that do not explicitly provide type information.

The aim of this thesis is to explore Pynguin's capabilities on different systems by following the mutation testing approach. One can start by replicating the study "An Empirical Study of Automated Unit Test Generation for Python" and then extend the evaluation to different systems.
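For orientation, Pynguin is driven from the command line; a typical invocation along the lines of its documentation (module name and paths are placeholders, and note that recent versions require an explicit opt-in via an environment variable because Pynguin executes the code under test):

```
# Generate tests for module 'queue_example' located in ./project
export PYNGUIN_DANGER_AWARE=1   # acknowledge that Pynguin executes the SUT
pynguin --project-path ./project --module-name queue_example --output-path ./tests
```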

Important links:

https://pynguin.readthedocs.io/en/latest/index.html

S. Lukasczyk and G. Fraser. Pynguin: Automated Unit Test Generation for Python. In Proceedings of the 44th International Conference on Software Engineering Companion. ACM, 2022. DOI: 10.1145/3510454.3516829. arXiv:2202.05218

S. Lukasczyk, F. Kroiß, and G. Fraser. An Empirical Study of Automated Unit Test Generation for Python. Submitted to the EMSE Special Edition on “Advances in Search-Based Software Engineering”. arXiv:2111.05003

S. Lukasczyk, F. Kroiß, and G. Fraser. Automated Unit Test Generation for Python. In Proceedings of the 12th Symposium on Search-based Software Engineering. Lecture Notes in Computer Science, vol. 12420, pp. 9–24. Springer, 2020. DOI: 10.1007/978-3-030-59762-7_2. arXiv:2007.14049


Topic/Title ??

Supervisor: Ilia Bider (xxx [dot] xxx [ät] ut [dot] ee)

Description


Development of the Rules Mining (RuM) toolset

Supervisors: Fabrizio Maggi and Anti Alman (firstname [dot] lastname [ät] ut [dot] ee)

Rule mining is focused on the analysis and optimization of business processes using rules that the process is expected to fulfil. In this project, you will work on extending the Rules Mining toolset (RuM), which is developed at the University of Tartu in collaboration with other universities. We invite you to have a look at the RuM website. If you are interested in this topic, we can offer you the opportunity to develop several new features of RuM for your Masters thesis, for example a module for detecting and visualizing violations of business rules in a user-friendly manner. Knowledge of Java is required.

Extending the Nirdizati Predictive Process Monitoring Engine

Supervisor: Fabrizio Maggi (firstname [dot] lastname [ät] ut [dot] ee)

Predictive process monitoring is concerned with leveraging historical process execution data to predict how running (uncompleted) cases will unfold up to their completion. Historical data is given as input to a machine learning method to train a predictive model that is queried at runtime to predict a process outcome. A predictive model can also be used to provide, together with predictions, recommendations to the user on what to do to minimize the probability of a negative process outcome. In this thesis project, we will work on the development of Nirdizati (http://nirdizati.org/nirdizati-research/), a predictive process monitoring web application for validating and comparing the performance of different predictive models on the same dataset. If you are interested in this topic, the thesis project can be developed in different directions: it can focus on engineering tasks related to the development of existing predictive process monitoring approaches in Nirdizati, or on research tasks related to the development of novel predictive process monitoring approaches in the same application. Knowledge of Python and of data science is required.


On the Assessment of Machine Learning Algorithms for Fairness

Supervisor: Mohamad Gharib (mohamad dot gharib at ut dot ee)

Co-supervisor: Modar Sulaiman (modar dot sulaiman at ut dot ee)

Artificial intelligence (AI) / machine learning (ML) can be described as the art and science of letting computers learn to perform complex tasks without being explicitly programmed to do so [1]. This has led to a dramatic increase in AI/ML adoption in almost all the main domains of our lives. One main advantage of using AI/ML systems is making, or assisting in making, [critical] decisions. Unlike humans, who might have various biases that can influence their decisions, AI/ML systems were expected to make precise and objective decisions [2]. However, AI/ML systems have been proven to suffer from bias and discriminatory behavior just like humans [3]. Examples of such biased behavior span many AI/ML applications [4][5] and may have serious consequences when they occur in sensitive domains, where AI/ML decisions may affect essential human rights (e.g., the right to equality). That is why assuring AI/ML fairness has emerged as an important research area within the ML community [6].

This has led to a growing interest among AI/ML researchers in fairness metrics, and a vast number of metrics have been developed to quantify AI/ML fairness. However, many recent works have identified limitations, inadequacies, and insufficiencies in almost all existing fairness metrics [7], given that there is no universal means to measure fairness, i.e., there are no clear criteria for assessing which measure is "best".

The aim of this thesis is to: (1) critically review the available AI/ML fairness literature; (2) identify the strengths and weaknesses of the best current approaches to measuring fairness in AI/ML; (3) specify the requirements for developing new metric(s) that address the inadequacies/insufficiencies of existing fairness metrics; and (4) implement and test adequate fairness metric(s) that satisfy the aforementioned requirements.
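For concreteness, one of the simplest fairness metrics is the statistical (demographic) parity difference. A minimal sketch, assuming binary predictions and a binary protected attribute (it deliberately captures only one narrow notion of fairness, which is exactly the kind of limitation the thesis would examine):

```python
import numpy as np

def demographic_parity_difference(y_pred: np.ndarray, group: np.ndarray) -> float:
    """P(y_hat = 1 | group = 0) - P(y_hat = 1 | group = 1).

    Zero means both groups receive positive predictions at the same rate.
    """
    return y_pred[group == 0].mean() - y_pred[group == 1].mean()

y_pred = np.array([1, 0, 1, 1, 0, 1])  # model decisions
group  = np.array([0, 0, 0, 1, 1, 1])  # protected attribute per instance
print(demographic_parity_difference(y_pred, group))  # 2/3 - 2/3 = 0.0
```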

Note: for a comprehensive survey of fairness in machine learning, you can refer to [8].

References:

[1] M. Gharib, P. Lollini, M. Botta, E. Amparore, S. Donatelli, A. Bondavalli, On the Safety of Automotive Systems Incorporating Machine Learning Based Components: A Position Paper, in: Proc. - 48th Annu. IEEE/IFIP Int. Conf. Dependable Syst. Networks Work. DSN-W 2018, 2018: pp. 271–274. https://doi.org/10.1109/DSN-W.2018.00074.

[2] G. Sheshasaayee, Ananthi and Thailambal, Comparison of classification algorithms in text mining, Int. J. Pure Appl. Math. 116 (2017) 425–433.

[3] P. Molnar, L. Gill, Bots at the Gate: a human rights analysis of automated decision-making in Canada’s immigration and refugee system, 2018.

[4] L. Sweeney, Discrimination in online Ad delivery, Commun. ACM. 56 (2013) 44–54. https://doi.org/10.1145/2447976.2447990.

[5] S.L. Blodgett, B. O’Connor, Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English, in: Fairness, Accountability, Transpar. Mach. Learn., 2017.

[6] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, H. Wallach, A reductions approach to fair classification, in: 35th Int. Conf. Mach. Learn. ICML 2018, 2018: pp. 102–119.

[7] Yao, Sirui, and Bert Huang. "New fairness metrics for recommendation that embrace differences." arXiv preprint arXiv:1706.09838 (2017).

[8] Caton, Simon, and Christian Haas. "Fairness in machine learning: A survey." arXiv preprint arXiv:2010.04053 (2020).

From User Stories to Privacy-Aware User Stories

Supervisor: Mohamad Gharib (mohamad dot gharib at ut dot ee)

Requirements elicitation is defined as the process of uncovering, acquiring, and elaborating requirements for computer-based systems [1]. There is general agreement in the Requirements Engineering (RE) community that requirements elicitation is one of the most critical activities in the RE process (e.g., [2]), since getting the right requirements is considered a vital success factor for software development projects [3]. Although several requirements elicitation approaches and techniques have been proposed in the literature, including but not limited to interviews, questionnaires, task analysis, workshops, and prototyping, user stories [4] have become almost the standard method for eliciting requirements in industry [5]. A user story is a short description of high-level stakeholder requirements represented using a simple template such as "As a <role>, I want <goal>, so that <benefit>". User stories have been used successfully for eliciting functional requirements, yet they are criticized for not appropriately capturing non-functional requirements (NFRs) such as privacy, safety, and reliability, even though the satisfaction of NFRs is essential for successful software projects.

Privacy has emerged as a key concern, since companies need to protect the privacy of personal information to comply with the various privacy laws and regulations (e.g., the GDPR in the EU) that many governments have enacted for privacy protection. Accordingly, dealing with privacy concerns is a must these days [6]. As for other NFRs, there is neither a standard nor an agreed-upon user story approach for eliciting privacy requirements. To this end, the main objective of this thesis is to develop, verify, and validate a privacy-aware user story approach.

Note: the privacy ontology provided in [7] can be used to facilitate understanding and dealing with privacy requirements in the proposed approach.

References:

[1] Didar Zowghi and Chad Coulin, "Requirements elicitation: A survey of techniques, approaches, and tools," in Engineering and managing software requirements.: Springer, 2005, pp. 19-46.

[2] Ian Sommerville, Software Engineering 8. Pearson Education Limited, 2007.

[3] Capers Jones, "Applied Software Measurement: Assuring Productivity and Quality," McGraw-Hill, New York, vol. 17, no. 1, p. 2.

[4] Cohn, M.: User stories applied: for agile software development. Addison Wesley (2004)

[5] Lucassen, Garm, et al. "The use and effectiveness of user stories in practice." International working conference on requirements engineering: Foundation for software quality. Springer, Cham, 2016.

[6] Gharib, Mohamad, John Mylopoulos, and Paolo Giorgini. "COPri-a core ontology for privacy requirements engineering." International Conference on Research Challenges in Information Science. Springer, Cham, 2020.

[7] Gharib, Mohamad, Paolo Giorgini, and John Mylopoulos. "COPri v. 2—A core ontology for privacy requirements." Data & Knowledge Engineering 133 (2021): 101888.

A safety-aware architecture for Safety-Critical Systems incorporating Machine Learning components

Supervisor: Mohamad Gharib (mohamad dot gharib at ut dot ee)

Machine learning (ML) components are increasingly adopted in many automated systems. Their ability to learn and work with novel input / incomplete knowledge and their generalization capabilities make them highly desirable solutions for complex problems [1]. This has motivated many system manufacturers to adopt ML components in their products in many industrial domains (e.g., medical, automotive), performing complex tasks such as pattern recognition, image recognition, and even control [2]. However, some of these systems can be classified as safety-critical systems (SCS), where their failure may cause death or injury to humans [3]. Accordingly, the performance of such ML components must be assessed and guaranteed to be compliant with the safety requirements of the incorporating SCS. Although the area of system safety is well established and various methods exist to identify potential component faults/failures, along with countermeasures to eliminate or at least limit their consequences, most of these methods do not apply to ML components, since they do not properly address the special characteristics of ML components, such as non-determinism, non-transparency, and instability, to mention a few [4].

The objective of this thesis is to propose a general-purpose, fail-controlled [5] software architecture for incorporating ML components into SCS. The architecture will adopt state-of-the-art system and safety engineering principles and adapt them to address the special characteristics of ML components. The architecture should be able to identify when an ML component may fail to behave as expected and tackle hazardous situations resulting from such failures by implementing countermeasure mechanisms appropriate for the type of failure. The architecture will be validated by applying it to a real/realistic case study/scenario concerning an SCS.
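To illustrate the fail-controlled idea at a very small scale, a runtime monitor can wrap the ML component and fall back to a safe action whenever the output cannot be trusted. A hedged sketch (the confidence threshold, plausibility check, and actions are placeholders, not the thesis's architecture):

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    label: str
    confidence: float  # assumed to be reported by the ML component

SAFE_FALLBACK = "slow_down"  # placeholder pre-defined safe action

def fail_controlled_decision(pred: Prediction, plausible: bool,
                             min_confidence: float = 0.9) -> str:
    """Pass the ML output through only if it is confident and plausible;
    otherwise degrade to a safe behavior (the fail-controlled principle)."""
    if pred.confidence < min_confidence:  # detect a potential ML failure
        return SAFE_FALLBACK
    if not plausible:                     # e.g., output contradicts other sensors
        return SAFE_FALLBACK
    return pred.label

print(fail_controlled_decision(Prediction("keep_speed", 0.55), plausible=True))
# -> 'slow_down' (low confidence triggers the safe fallback)
```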

Note: Section 3 in [6] provides a short description of fail-controlled software architecture.

References:

[1] Z. Kurd, T. Kelly, and J. Austin, “Developing artificial neural networks for safety-critical systems,” Neural Computing and Applications, vol. 16, no. 1, pp. 11–19, oct 2007.

[2] J. Schumann, P. Gupta, and Y. Liu, “Applications of Neural Networks in High Assurance Systems,” in Neural Networks, 2010, vol. 268, pp. 1–19.

[3] M. Bozzano and A. Villafiorita, Design and safety assessment of critical systems. Auerbach Publications, 2011.

[4] Gharib, Mohamad, et al. "On the safety of automotive systems incorporating machine learning-based components: a position paper." 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). IEEE, 2018.

[5] Avizienis, A., Laprie, J. C., Randell, B., Landwehr, C., 2004. Basic Concepts And Taxonomy Of Dependable and Secure Computing. IEEE Transactions On Dependable And Secure Computing 1 (1), Pp. 11–33.

[6] Gharib, Mohamad, Tommaso Zoppi, and Andrea Bondavalli. "On the Properness of Incorporating Binary Classification Machine Learning Algorithms into Safety-Critical Systems." IEEE Transactions on Emerging Topics in Computing (2022).

Towards an information type lexicon and taxonomy to improve informational self-determination

Supervisor: Mohamad Gharib (mohamad dot gharib at ut dot ee)

The monetary value of information, and especially Personal Information (PI), is large and growing, and many organizations have already started profiting from this trend. Accordingly, breaches and misuse of PI have increased [1]. For example, privacy merchants shadow Internet users to create very detailed profiles of their online behavior and activities, and then sell these profiles to whoever pays the demanded price [2]. In response to this and other potential misuses of PI, many governments around the world have enacted laws and regulations for privacy/PI protection (e.g., the GDPR in the EU). However, these laws and regulations rely heavily on the concept of informational self-determination, which is usually implemented through the notice and consent/choice model. A notice (e.g., a privacy policy) is supposed to inform Data Subjects (DSs) about how their PI will be processed and shared, and a consent/choice is supposed to capture the DSs' acceptance of the offered notice. Although notifying DSs about data practices is supposed to enable them to make informed privacy decisions, current mechanisms for presenting the notice and obtaining the consent are deeply flawed, as many researchers have pointed out. More specifically, most notices are long, complex, hard to comprehend, and change frequently; they do not usually specify potential future uses of PI precisely; and, most importantly, they either do not specify what type of information/PI is subject to the notice or use very abstract terms. To improve the understandability of notices (privacy policies) on the DSs' side, and to allow future automated analysis of such notices, a well-defined taxonomy of information/PI types should be provided.

This thesis aims to: (1) construct a lexicon of information/PI by analyzing an appropriate number (e.g., 15) of privacy policies; (2) derive a well-defined taxonomy of information/PI from the information/PI lexicon; and (3) verify and validate the information/PI taxonomy by applying it to case studies from different domains and assessing its completeness for classifying information/PI.
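As a possible starting point for step (1), candidate information-type phrases can be extracted from policy text with off-the-shelf NLP tooling. A minimal sketch using spaCy (assumes the small English model is installed; manual curation and deduplication across policies would follow, as in [3]):

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # python -m spacy download en_core_web_sm

policy = ("We collect your email address, device identifiers and "
          "approximate location to personalise advertising.")

# Noun chunks are rough candidates for information types; pronoun-headed
# chunks ("we") are filtered out
candidates = {chunk.text.lower() for chunk in nlp(policy).noun_chunks
              if chunk.root.pos_ == "NOUN"}
print(candidates)  # e.g. {'your email address', 'device identifiers', ...}
```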

Note 1: To get an idea of how an information type lexicon can be constructed, you can refer to [3]. Note 2: The information/PI partial taxonomies provided in [4] and [5], can be used as a reference for the taxonomy to be developed.

References:

[1] Gharib, Mohamad, Paolo Giorgini, and John Mylopoulos. "Towards an ontology for privacy requirements via a systematic literature review." International conference on conceptual modeling. Springer, Cham, 2017.

[2] Etzioni, Amitai. "The privacy merchants: What is to be done." U. Pa. J. Const. L. 14 (2011): 929.

[3] Bhatia, Jaspreet, and Travis D. Breaux. "Towards an information type lexicon for privacy policies." 2015 IEEE eighth international workshop on requirements engineering and law (RELAW). IEEE, 2015.

[4] Gharib, Mohamad, Paolo Giorgini, and John Mylopoulos. "COPri v. 2—A core ontology for privacy requirements." Data & Knowledge Engineering 133 (2021): 101888.

[5] Gharib, Mohamad, and John Mylopoulos. "On the Philosophical Foundations of Privacy: Five Theses." IFIP Working Conference on The Practice of Enterprise Modeling. Springer, Cham, 2021.

An integrated approach for analyzing safety and security requirements for Cyber-Physical Systems (CPSs)

Supervisor: Mohamad Gharib (mohamad dot gharib at ut dot ee)

The increased digitization of traditional Physical Systems (PSs) gave birth to the so-called Cyber-Physical Systems (CPSs), which integrate sensing, computational, and control capabilities into traditional PSs, combined with network connectivity. Consequently, traditional security solutions, although well established and consolidated, might not be effective in protecting CPSs against deliberately planned, malicious, complex attacks, which are the typical modern cyber-security attacks. This is quite clear from the increasing number of cyber-security attacks that can now target some of the safety-critical functionalities of CPSs. For instance, modern automotive vehicles have been proven vulnerable to hacking attacks aimed at getting control over the safety-critical functions of the vehicle [1]. One example is the hijacking of the steering and braking units of a Ford Escape [2]. Similarly, hackers were able to remotely hijack a Tesla Model S from a distance of around 12 miles [3]. Chrysler announced a recall for 1.4 million vehicles after a pair of hackers demonstrated that they could remotely hijack a Jeep's digital systems over the Internet [4]. These are just a few examples of how attackers can exploit weaknesses in the design of safety-critical CPSs and use these weaknesses to conduct their attacks. In short, a CPS cannot be safe unless it is secure.

This thesis aims at proposing an approach that can identify the potential cyber-security attack(s) that a specific safety-critical functionality of an automotive system might be subject to, analyze how each identified attack might be performed (e.g., attack method/means, attacker's capabilities) and what the potential consequences are should such an attack succeed, and then identify countermeasures to prevent or at least mitigate/minimize the consequences of the attack. Note: the application domain can be the automotive domain or any other safety-critical CPS domain, such as the Industrial Internet of Things (IIoT), Smart Cities, etc.

References:

[1] M. Dibaei, X. Zheng, K. Jiang, R. Abbas, S. Liu, Y. Zhang, Y. Xiang, and S. Yu, “Attacks and defences on intelligent connected vehicles: a survey,” Digital Communications and Networks, 2020.

[2] A. Greenberg, “Hackers Reveal Nasty New Car Attacks-With Me Behind The Wheel (Video),” p. 1, 2013. https://cutt.ly/4jIQVlX

[3] O. Solon, “Team of hackers take remote control of Tesla Model S from 12 miles away — Technology — The Guardian,” 2016. https://cutt.ly/hjIQZ7P

[4] A. Greenberg, “The Jeep Hackers Are Back to Prove Car Hacking Can Get Much Worse,” 2016. https://www.wired.com/2016/08/jeep-hackers-return-high-speed-steering-acceleration-hacks/



Other Master Thesis Projects

Additional topics proposed by other groups in the Institute of Computer Science are also available; see the links to the individual research groups at https://cs.ut.ee/en/content/research.



Conversion Master Thesis Projects (15 ECTS)


Case Study in Software Testing or Software Analytics (focus on software quality)

Supervisor: Dietmar Pfahl (dietmar dot pfahl ät ut dot ee)

This is a "placeholder" Masters project topic, which needs to be negotiated individually. If you work in a IT company and you are actively engaged in a software testing or software analytics, or if you can convince your hierarchy to put in time and resources into such a project in the near-term, we can make a case study out of it. We will sit down and formulate concrete hypotheses or questions that you investigate as part of this project, and we will compare your approach and results against state-of-the-art practices. I am particularly interested in supervising theses topics related to mutation testing, testing of embedded software, testing safety-critical systems, security testing of mobile apps, anlysis of project repositories to make software development processes more efficient and effective, but I welcome other topic areas.

The method applied is a case study. Case studies follow a systematic approach, as outlined in "Guidelines for conducting and reporting case study research in software engineering" by Per Runeson and Martin Höst. Important elements of the thesis are a literature study, measurement, and interviews with experts in the target company.

A mandatory prerequisite for me to act as supervisor is that there exists a co-supervisor within the case company who is willing to help with the exact scoping of the thesis project and who confirms that the topic is in the interest of the case company to the extent that the student can work on the thesis project (at least partly) during work time.


Topic/Title ??

Supervisor: NN (xxx [dot] xxx [ät] ut [dot] ee)

Description



Bachelor Thesis Projects (9 ECTS)


Lab Package Development & Evaluation for the Course 'Software Testing' (LTAT.05.006)

Supervisor: Dietmar Pfahl (dietmar dot pfahl at ut dot ee)

The course Software Testing (LTAT.05.006) has a series of practice sessions in which 2nd and 3rd year BSc students learn a specific test technique. We would like to improve existing labs and add new labs.

This topic is intended for students who have already taken this software testing course and who feel that they can contribute to improving it while, by the same token, completing their Bachelors project. The scope of the project can be negotiated with the supervisor to fit the size of a Bachelors project.

The tasks to do for this project are as follows:

  • Selection of a test-related topic for which a lab package should be developed (see list below)
  • Development of the learning scenario (i.e., what shall students learn, what will they do in the lab, what results shall they produce, etc.)
  • Development of the materials for the students to use
  • Development of example solutions (for the lab supervisors)
  • Development of a grading scheme
  • Evaluation of the lab package

Topics for which lab packages could be developed (list can be extended based on student suggestions / one bullet point corresponds to one BSc thesis):

Good examples of past BSc theses that developed new lab packages:


Overview of metamorphic testing tools

Supervisor: Alejandra Duque-Torres (alejandra [dot] duque [dot] torres [ät] ut [dot] ee)

Metamorphic Testing (MT) is a software testing approach proposed by Chen et al. [1] to alleviate the test oracle problem. A test oracle is a mechanism for detecting whether or not the outputs of a program are correct [2], [3]. The oracle problem arises when the System Under Test (SUT) lacks an oracle or when developing one to verify the computed outputs is practically impossible [3]. MT differs from traditional testing approaches in that it examines the relations between input-output pairs of consecutive SUT executions rather than the outputs of individual SUT executions [1]. These relations are known as metamorphic relations (MRs). Currently, MRs are handpicked and require in-depth knowledge of the SUT as well as its problem domain. As a result, the identification and selection of high-quality MRs is a challenge.

The aim of this thesis is to give an updated overview of MT, highlighting the main advances in the technique, its applications, its integration with other techniques, and experimental results. On top of that, we are interested in knowing which tools are available for testing using the MT approach and how the MRs are identified.

The main contribution of this work is to bring together previously scattered studies to lay the groundwork for future research, as well as to introduce newcomers to this testing technique.

[1] T. Y. Chen, S. C. Cheung, and S. M. Yiu, “Metamorphic testing: A new approach for generating next test cases,” Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong, Tech. Rep. HKUST-CS98-01, 1998.

[2] A. Duque-Torres, A. Shalygina, D. Pfahl, and R. Ramler, “Using rule mining for automatic test oracle generation,” in 8th International Workshop on Quantitative Approaches to Software Quality (QuASoQ), ser. QuASoQ’20, 2020.

[3] E. T. Barr, M. Harman, P. McMinn, M. Shahbaz, and S. Yoo, “The oracle problem in software testing: A survey,” IEEE Transactions on Software Engineering, vol. 41, no. 5, pp. 507–525, 2015.

[4] H. Liu, F.-C. Kuo, D. Towey, and T. Y. Chen, “How effectively does metamorphic testing alleviate the oracle problem?” IEEE Transactions on Software Engineering, vol. 40, no. 1, pp. 4–22, 2014.

Lab Package: Random testing using EvoSuite

Supervisors: Alejandra Duque-Torres (alejandra [dot] duque [dot] torres [ät] ut [dot] ee) and Dietmar Pfahl (dietmar [dot] pfahl [ät] ut [dot] ee)

The course Software Testing (LTAT.05.006) has a series of practice sessions in which 2nd and 3rd year BSc students learn a specific test technique. We would like to improve the existing lab on random testing. In particular, we would like to use EvoSuite instead of Randoop. EvoSuite is a tool that automatically generates test cases with assertions for classes written in Java.
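For orientation, EvoSuite is typically run from the command line against compiled classes; an invocation along the lines of the EvoSuite tutorial (class name and classpath are placeholders for the lab's own SUT):

```
# Generate JUnit tests for one class; -projectCP points at the compiled classes
java -jar evosuite.jar -class tutorial.Stack -projectCP target/classes
```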

This topic is intended for students who have already taken this software testing course and who feel that they can contribute to improving it while, by the same token, completing their Bachelors project.

The tasks to do for this project are as follows:

  • Familiarization with the test-related topic of the lab package (random testing with EvoSuite)
  • Development of the learning scenario (i.e., what shall students learn, what will they do in the lab, what results shall they produce, etc.)
  • Development of the materials for the students to use
  • Development of example solutions (for the lab supervisors)
  • Development of a grading scheme
  • Evaluation of the lab package