Student Projects, Academic Year 2020-2021

Below is a list of project topics for Master's and Bachelor's theses offered by the Software Engineering & Information Systems Research Group for students who intend to defend in June 2021. The projects are divided into:

Software Engineering Masters topics (30 ECTS)
IT Conversion Masters topics (15 ECTS)
Bachelor Thesis Projects (9 ECTS)

If you're interested in any of these projects, please contact the corresponding supervisor.

Master Thesis Projects

The secret life of hackathon code - booked

Alexander Nolte (alexander [dot] nolte [ät] ut [dot] ee)

Developing and showcasing a functioning (software) prototype is at the core of most hackathon events. However, our understanding of whether and how teams utilize existing code, and of what happens to the code developed during a hackathon after the event has ended, is still limited. In particular, the connection between utilizing existing code, developing code during a hackathon and the continued use of that code after an event is still unknown. The aim of this thesis is to close this gap.

For the thesis you will utilize the novel resource “World of Code (WoC)”, which allows you to trace code not only within a single repository but also between repositories and even platforms. Based on an existing dataset which includes more than 90000 hackathon projects, you will use WoC to analyze whether teams used pre-existing code, which code they developed during a hackathon and what happened to this code after the event ended.

Moving hackathons online – Challenges and opportunities - booked

Alexander Nolte (alexander [dot] nolte [ät] ut [dot] ee)

Hackathons started out as time-bounded competitive coding events during which young developers formed small ad-hoc teams and engaged in short-term intense collaboration. Such events were typically held in a face-to-face setting. The recent COVID-19 crisis however has led to a surge in events that are strictly organized online. While sharing many similarities, moving such events from a collocated to an online format poses new challenges but it also creates novel opportunities.

The aim of this thesis is to study differences between offline and online events and to provide suggestions to organizers and participants on how to overcome the challenges of online hackathons and exploit their possibilities.

Using an existing planning kit for offline hackathons as a basis (https://hackathon-planning-kit.org/) you will first identify potential differences between offline and online events. You will then use a combination of data analytics and interviews with hackathon organizers and participants to validate your findings, study connections between the discovered differences and develop suggestions for how to overcome issues and exploit future potentials.

Expectations and outcomes of entrepreneurial hackathons - booked

Alexander Nolte (alexander [dot] nolte [ät] ut [dot] ee)

Hackathons started out as time-bounded competitive events during which young developers formed small ad-hoc teams and engaged in short-term intense collaboration on software projects for pizza and the potential prospect of a future job. Since those humble beginnings hackathons have become a global phenomenon with thousands of individuals participating in hundreds of events every weekend. In Estonia in particular, hackathons have become an integral part of the vibrant startup scene, promising fame and fortune to young entrepreneurs.

Such promises can, however, lead to unrealistic expectations and potentially negative experiences for participants and organizers if they are not fulfilled. The aim of this master thesis is to study the expectations and outcomes of hackathon participants and organizers and to develop suggestions for managing expectations before, during and after an event to prevent potential negative experiences.

Based on existing work you will use a combination of interview, observation, survey and archival data analysis methods to identify expectations of participants and organizers, discover their connection and propose means to manage expectations based on the results obtained.

<topic>

Supervisor: Kuldar Taveter (kuldar [dot] taveter [ät] ut [dot] ee)

Conflicts Management in GORE: Socio-Technical Systems Perspective

Supervisor: Ishaya Gambo (ishaya [dot] gambo [ät] ut [dot] ee)

Conflict management is a genuine problem in many design systems, so further development in this area could be extremely beneficial and could have an impact across a large number of domains. The problem of conflict management in requirements is further aggravated by the iterative nature of agile software engineering (SE) methodologies [1], where requirements are changed and elaborated repeatedly along with the iterations of an agile SE process [2]. In goal-oriented requirements engineering (GORE), requirements are treated as goals [3]. Handling conflicts in goals is one of the active research areas in GORE [4 - 6]. GORE expresses the stakeholders' statements concerning the desired system as goals to be achieved by the system. In socio-technical systems (STS), the goals are achieved through the cooperation of man-made agents within the software-to-be and human agents. As stakeholders frequently pursue mismatching goals, identification and resolution of conflicts in requirements is an inevitable part of GORE. For this master's thesis, we want to investigate the identification and resolution of conflicts in requirements within the agile agent-oriented modelling (AAOM) methodology [7] for engineering STS. The main objective is to develop a strategy and a supporting tool for conflict identification and resolution in GORE for STS in the context of AAOM. AAOM is derived from agent-oriented modelling (AOM) [8]. The thesis proposes to use the notions of STS and "agent" for understanding and representing conflicts in requirements, so that conflicts can be more easily identified and resolved. An STS in this context consists of diverse, active components, both human and man-made, that collaborate in designing and sustaining the STS. Notably, STSs are designed to meet business goals [9]. Agents pursue two kinds of goals: functional and non-functional goals.
In [8], a functional goal is defined as a particular state of affairs intended by one or more active entities (agents) in the STS, and a non-functional or quality goal as a quality requirement for the achievement of the functional goal. This master's thesis is expected to (i) advance a conflict identification and resolution strategy for GORE for STS within the AAOM methodology, and (ii) propose a conflict identification and resolution strategy that works with hierarchical goal models. The proposed strategy should take advantage of (a) attaching the corresponding roles to the goals of the hierarchical goal model, which naturally brings out the needs and intentions of the corresponding stakeholders, and (b) relating the goal models to the most popular artefacts of agile SE, user stories, which will naturally enable conflict management in the context of agile SE.

References

[1] Version One, I. (2015). 9th annual state of agile survey. Last accessed on January 26, 2018 at http://info.versionone.com/state-of-agile-development-survey-ninth.html
[2] van Dijk, R. W. (2011, June). Determining the suitability of agile methods for a software project. In Proceedings of the 15th Twente Student Conference on IT, pp. 1-8.
[3] Eridaputra, H., Hendradjaya, B., and Sunindyo, W. D. (2014, November). Modelling the requirements for big data application using goal oriented approach. In Proceedings of the IEEE International Conference on Data and Software Engineering, November 26th - 27th, 2014, Aula Timur ITB, Bandung (Indonesia), pp. 1-6.
[4] Horkoff, J., Aydemir, F. B., Cardoso, E., Li, T., Maté, A., Paja, E., and Giorgini, P. (2017). Goal-oriented requirements engineering: an extended systematic mapping study. In Requirements Engineering Journal, Springer London, pp. 1-28
[5] Vijayasarathy, L. R., and Turk, D. (2008). Agile software development: A survey of early adopters. Journal of Information Technology Management, 19(2), pp. 1-8
[6] Horkoff, J., Aydemir, F. B., Cardoso, E., Li, T., Maté, A., Paja, E., and Giorgini, P. (2016, September). Goal-oriented requirements engineering: a systematic literature map. In Proceedings of the 24th IEEE International Conference on Requirements Engineering (RE), 12-16 Sept. 2016, Beijing, China, pp. 106-115. IEEE.
[7] Tenso, T.; Taveter, K. (2013). Requirements engineering with agent-oriented models. ENASE 2013 - Proceedings of the 8th International Conference on Evaluation of Novel Approaches to Software Engineering: 8th International Conference on Evaluation of Novel Approaches to Software Engineering, ENASE 2013; Angers; France; 4 July 2013 through 6 July 2013. SciTePress, pp. 254-259.
[8] Sterling, L., and Taveter, K. (2009). The art of agent-oriented modelling. MIT Press.
[9] Sommerville, I. (2010). Software Engineering. London, England: Pearson Education.

Resolving Conflicting Stakeholders Emotional Concern in the Requirements Engineering Process.

Supervisor: Ishaya Gambo (ishaya [dot] gambo [ät] ut [dot] ee)

Software engineering (SE) as a discipline is expected to change, adapt and accommodate what happens around us in the world. As software engineers, we therefore need to recognize the need to engineer the way we live, feel, think and behave, and not just the software. Software should provide the services that make living satisfactory. In this context, socio-technical systems (STS) are perfect examples of systems that require human satisfaction and technological acceptance. At the same time, requirements engineering (RE) establishes the foundation for a successful system [1, 2].

Both STS and RE are complex and collaborative, involving a large degree of human involvement shaped by people's social status and perspective [3, 4]. Finding a conflict-free system within the premise of STS and a problem domain in a development process and/or project is difficult because of their complexity and insatiable human needs, desires and expectations. To tackle the complexity issues and arrive at a mutual consensus on stakeholders’ expectations, a thoughtful technique is needed that harmonizes the psychological, social and behavioural perspectives during requirements elicitation and analysis. The goal will be to provide a proper understanding of human behaviour, motivation and strategies and to harmonize stakeholders' emotional differences.

From the perspective of affective computing in RE, this master thesis will identify and resolve conflicts in stakeholders' emotional requirements (ER) in such a complex and collaborative setting. The idea is to develop an approach that harmonizes the feelings of all involved stakeholders to ensure mutual satisfaction, especially when developing software that is emotionally acceptable [5]. ER in this context are the stakeholders' feelings expressed with respect to the quality goals of a system. ER support the thorough analysis of vague and uncertain expressions of stakeholders captured from functional and quality goals or from scenarios and user stories. From a psychological perspective, emotions are constructed in the brain, in concordance with the goals to be achieved. However, different stakeholders can be subject to different emotions that are individually constructed, so tricky disagreements (referred to here as conflicts) might arise. Notably, stakeholders express their emotions and feelings regardless of whether those feelings conflict with those of others. Thus, many difficulties exist for requirements engineers in trying to recommend a standard technique for documenting emotions and resolving the conflicts that arise. How can we deal with these kinds of conflicts in emotional goals?

References:

[1] Pohl, K. (2010). Requirements engineering: fundamentals, principles, and techniques. Springer Publishing Company, Incorporated.
[2] Gambo, I., Ikono, R., Achimugu, P., & Soriyan, A. (2018). An Integrated Framework for Prioritizing Software Specifications in Requirements Engineering, International Journal of Software Engineering and Its Applications, 12(1), 33-46. DOI:10.14257/ijseia.2018.12.1.03.
[3] Colomo-Palacios, R., Hernández-López, A., García-Crespo, Á., & Soto-Acosta, P. (2010). A study of emotions in requirements engineering. In the World Summit on Knowledge Society (pp. 1-7). Springer, Berlin, Heidelberg.
[4] Capretz, L. F. (2014). Bringing the human factor to software engineering. IEEE Software, 31(2), 104-104, doi:10.1109/MS.2014.30.
[5] Taveter, K., Sterling, L., Pedell, S., Burrows, R., & Taveter, E. M. (2019). A method for eliciting and representing emotional requirements: Two case studies in e-healthcare. In 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW) (pp. 100-105). IEEE.

Engineering Learning Techniques for Predictive Privacy Modelling and Analysis of Socio-Technical Systems

Supervisor: Ishaya Gambo (ishaya [dot] gambo [ät] ut [dot] ee)

The Master thesis topic is tailored towards addressing and satisfying privacy requirements during the requirements engineering process (REP). The overarching goal is to exploit, support, or mitigate the interplay between privacy concerns and human behavior. The focus will be on assets and threats that arise before a system is built. In this regard, the group dynamics of people, especially in a given problem domain (for example, the healthcare system), will be investigated. Ascertaining privacy solutions for socio-technical systems (STS) is a difficult and error-prone task because their heterogeneity and complexity have limited traditional requirements engineering (RE) methodologies in terms of eliciting or capturing the privacy expectations of stakeholders. To tackle the complexity issues in requirements elicitation, this Master thesis will develop an appropriate learning technique required to balance the various privacy needs from both technical and human perspectives by understanding the group dynamics of people. The data set will comprise features on how people react and behave in different situations. The goal is to understand privacy concerns and analyze group dynamics using group theory, social norms and social identity theory. The thesis/project will explore the satisfaction level of building systems that cater for privacy problems. A supervised learning technique with associated learning algorithms could be used to analyze data for classification and to provide the correct mapping from input to output. Additionally, suitable techniques, especially argumentation techniques (reasoning about the different alternatives and behaviours), can be used to show that these systems are correct. It is possible to use the healthcare system or any suitable problem domain as a case study.
In summary, the Master thesis will focus on engineering a technique that learns new privacy policies for dynamic adaptive systems, thereby providing continuous satisfaction of privacy requirements for socio-technical systems (STS).

Recommended References:

1. Thomas, Keerthi; Bandara, Arosha K.; Price, Blaine A. and Nuseibeh, Bashar (2014). Distilling Privacy Requirements for Mobile Applications. In: 36th International Conference on Software Engineering (ICSE 2014), 31 May - 7 Jun 2014, Hyderabad, India.

2. Calikli G, Law M, Bandara AK, Russo A, Dickens L, Price BA, Stuart A, Levine M, Nuseibeh B.(2016). Privacy Dynamics: Learning Privacy Norms for Social Software. Proceedings of the 11th International Symposium on Software Engineering for Adaptive and Self-Managing Systems.

3. Omoronyia, Inah; Cavallaro, Luca; Salehie, Mazeiar; Pasquale, Liliana and Nuseibeh, Bashar (2013). Engineering adaptive privacy: on the role of privacy awareness requirements. In: 35th International Conference on Software Engineering (ICSE 2013), 18-26 May 2013, San Francisco, CA, USA (forthcoming), 632–641.

Investigating the Extent of Architecture-based Testing: industrial practices and needs

Supervisor: Ishaya Gambo (ishaya [dot] gambo [ät] ut [dot] ee)

Software architecture and software testing have already been considered together, but there is still much to be gained from closer cooperation between them. The architecture of a system can have a very direct impact on its testability, and testing can provide valuable feedback to the architectural design process, as well as having a significant impact on the quality of the delivered system. In an attempt to advance the cooperation between these fields, workshops, conferences, and meetings have been organized over the years. Examples include the WICSA workshop on “Architecture-Based Testing and System Validation”, the charette session at AST, the IEEE/ACM ICSE international workshop on Automation of Software Test, and the ROSATEA workshops on “The Role of Software Architecture for Testing and Analysis”. Still, a systematic study linking those two domains is missing. This Master thesis topic aims to investigate the current role of software architecture in the testing of complex software systems and the role of software testing in the architecting process. The main research question is: what are the industrial practices and needs related to architecture-based testing? Sub-questions derived from the main research question include: (i) do software testers use the system software architecture to plan the testing campaign? (ii) do software testers provide feedback to architects on how to improve the system architecture? (iii) do software architects design the architecture to make it more testable? The goal of the Master thesis is to understand (i) how industry practices software architecture (do companies have a software architecture team, how do they define software architecture, how do they specify it, …), (ii) how industry practices software testing (do companies have a software testing team, how do they define a software testing plan, automation and tools, …), (iii) whether there is any link between those two roles, (iv) what industry thinks its needs are, and so on.
I have outlined the guidelines on how to conduct the study for anyone interested in this topic.

Scenarios and semantic support/description (ontologies) in requirements engineering

Supervisor: Ishaya Gambo (ishaya [dot] gambo [ät] ut [dot] ee)

Using ontologies in requirements engineering (RE) activities is beneficial to industry and academia as a whole. The benefits include addressing and overcoming the problems of ambiguity, inconsistency and incompleteness of requirements. This Master thesis topic focuses on scenarios with semantic descriptions (ontologies). The idea is to describe the text of the scenarios in an unambiguous way, that is, with an ontology that provides a precise definition of the terms and the relationships between them. In particular, the Master thesis will come up with a strategy for writing scenarios describing a given problem domain. The goal is to develop a technique to facilitate RE activities, such as elicitation, analysis, specification, validation and management of requirements, in a large collaborative design. The focus will be on a given problem domain and case study.

Requirements Engineering for Self-Driving Cars

Tahira Iqbal, Junior Research Fellow, tahira [dot] iqbal [ät] ut [dot] ee

Description: The requirements engineering (RE) phase plays a vital role in software engineering. The project successes and failures reported in past studies support the importance of RE. Self-driving cars and autonomous systems are the subject of an ongoing research boom. Current research foresees that highly automated driving (HAD) will be available not only for prototypical cars but also for production vehicles. With the advancement of automated driving, the requirements for this domain change considerably. Therefore, we would like to focus on the role and practices of RE in self-driving cars. In the thesis, our research questions (RQs) will be:

RQ1: What are the current practices in the industry for developing software applications related to self-driving cars?
RQ2: How do the current practices differ from the state-of-the-art practices in RE?

The student will initially read the existing literature and later conduct empirical research, such as an interview study and a survey, to answer the RQs above.

Cross-Lingual Linking of Texts Using News Media

Raul Sirel (TEXTA) and Rajesh Sharma (rajesh [dot] sharma [ät] ut [dot] ee)

Description: The same topics are often covered in multiple publications and languages: e.g. the protests in Belarus and Hong Kong or the COVID-19 pandemic are popular topics all over the world. The aim of the project is to link together news articles in different languages that cover the same events and topics. The linked results are then used to build supervised classification models for identifying topics in multiple languages. Possible solutions include transfer learning, machine translation, cross-lingual embeddings, etc. The project will be conducted using news media articles from multiple European news agencies, including Sputnik and Euronews.

Media Perception of events and personalities across borders.

Raul Sirel (TEXTA) and Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee)

Description: The media is often credited with creating the perception of various events, personalities and brands, to name a few. In this thesis, we will analyse a large corpus from multiple European news agencies, including Sputnik and Euronews, related to various entities (for example, the Belarusian protests, Apple as a brand, public figures such as politicians, etc.). The thesis will investigate how different media houses perceive entities (topics, personalities, brands, etc.). A cross-cultural analysis using NLP (sentiment analysis, topic modeling, etc.) will be employed. Dataset will be provided.

Understanding users' preference for languages on online social media

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Tymofii Brik (Kyiv School of Economics)

Description: Common knowledge suggests that users prefer a particular language to communicate online. It could be a native language or an international "lingua franca". Sometimes, switching between a native and an international language is easy, e.g. when the languages belong to the same family (Romance, Germanic, Slavic). However, sometimes users switch to very distant languages. The goal of this thesis is to investigate three patterns:

When people use other alphabets for transliteration of their native words
When people use other alphabets to communicate universal symbols (citations, memes, references)
When people use other alphabets to genuinely speak a foreign language

Dataset will be provided. The anonymised dataset is from a Facebook page dedicated to the Euromaidan revolution. As the posts and comments are in Ukrainian and Russian, knowledge of these languages is preferred.

Understanding social media users and their relation with hate speech

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Tymofii Brik (Kyiv School of Economics)

Description: Recently it has become relatively easy to detect hate speech. There are libraries of obscene words and negative connotations that are used to train models. However, little is known about how people change their online behavior after they have been exposed to hate speech. We suggest exploring the following questions:

Do people react to hate speech or ignore it?
Do people distinguish shades and flavors of hate speech?
Do people adopt and apply hate speech themselves, and is this behavior sticky?

Masculinity as an indicator of Corruption or Voting behavior?

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Tymofii Brik (Kyiv School of Economics)

Description: Recent papers identified that obesity is a significant predictor of corruption in post-Soviet countries (Blavatskyy, 2020). Other researchers showed that the facial masculinity of politicians, measured by the facial width-to-height ratio (fWHR), correlates with voters' support (Talavera et al., 2019). Yet, little is known about whether politicians themselves respond to visual cues when collaborating with other politicians. We propose a project to collect fWHR data of Ukrainian politicians from open sources and correlate it with the available data on legislative behavior. The data on legislative behavior are vast and include the "roll-call voting matrix", where each row (observation) is a member of parliament (MP) and each column is a proposed piece of legislation. The data include several years of voting patterns from 1998 until the present.

The main question of this project is whether politicians collaborate with each other based on visual features. This can be addressed by merging the two datasets (fWHR and the voting matrix) and using social network analysis techniques to model collaboration as joint voting.

Dataset will be provided.
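
As a rough illustration of modelling collaboration as joint voting, the sketch below turns a small roll-call matrix into weighted co-voting edges between MPs. The MP names, the vote encoding (1 = for, 0 = against, None = absent) and the agreement threshold are all assumptions made for this example; the actual dataset may be structured differently.

```python
from itertools import combinations

# Hypothetical roll-call matrix: one row per MP, one column per bill.
# Encoding is an assumption: 1 = voted for, 0 = voted against, None = absent.
votes = {
    "MP_A": [1, 1, 0, 1],
    "MP_B": [1, 1, 0, 0],
    "MP_C": [0, 0, 1, None],
}


def agreement(row_a, row_b):
    """Share of bills on which both MPs voted and cast the same vote."""
    shared = [(a, b) for a, b in zip(row_a, row_b)
              if a is not None and b is not None]
    if not shared:
        return 0.0  # no bill in common, so no evidence of joint voting
    return sum(a == b for a, b in shared) / len(shared)


def co_voting_edges(matrix, threshold=0.5):
    """Return weighted edges (mp1, mp2, agreement) at or above the threshold."""
    edges = []
    for mp1, mp2 in combinations(sorted(matrix), 2):
        weight = agreement(matrix[mp1], matrix[mp2])
        if weight >= threshold:
            edges.append((mp1, mp2, weight))
    return edges


edges = co_voting_edges(votes)
print(edges)
```

The resulting edge list could then be loaded into a network analysis library and joined with per-MP attributes such as fWHR; that step is left out here because it depends on the provided data.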

Predicting Transaction type to be performed by a mobile user.

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Huber Flores (huber dot flores [ät] ut dot ee)

Description: This project consists of modeling the different data transfer rates of mobile users based on mobility patterns within a trajectory. Given the following dataset collected in the wild (features described below), the goal is to estimate the different types of transactions (app usage sessions) and the amount of data that can be transferred in a particular transaction type, such that it is possible to predict the transaction type to be performed by a mobile user. This prediction is important to extend mobility-based contracts that will ensure that there is enough time to perform a valid transaction while the user is on the move. Dataset: Dataset features (CellularTraffic_OneWeek is the traffic data collected by an ISP from Shanghai between Aug 1st and Aug 7th 2014). For security reasons, the device IDs and base station IDs are all anonymized.

Topic: Exploring Group Mobility.

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Huber Flores (huber dot flores [ät] ut dot ee)

Description: While the mobility of individual devices has been widely explored, group mobility is less well understood. In this thesis, we will analyze a large-scale dataset to understand group mobility by analysing the trajectories of users. Our goal is to identify groups of users that move together between different points of interest in a city, for instance between a train station and a residential area. Dataset will be provided.

Behaviour analysis of city users: bikers, pedestrians and public transport

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Flavio Bertini (University of Bologna, Italy)

Description: Bella Mossa is a program of the City of Bologna which promotes a healthy lifestyle and sustainable mobility. In 2017, the program collected information on the transport habits of people. The data set contains mobility data (latitude, longitude, timestamp) for different means of transportation for the period from April 1, 2017, to September 30, 2017. In particular, the activity types include Bus, Car Share, Cycle, Train and Walk. During the 6 months of the experiment, there were over 15,000 unique users of the program, who covered 3.7 million km. Using this data, we would like to perform several descriptive analyses to study each activity type and compare them with each other. A partial list of goals: 1) analysis of the mobility behaviour of different users: when, where, distance, duration; 2) analysis of the mobility patterns of different users, also taking into account the different areas of the city; 3) comparison among activity types (e.g., "within that area the bike is faster than the car") and extraction of the road network for each of them.
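
To give a feel for the descriptive analysis, here is a minimal sketch of how distance, duration and average speed could be derived from (latitude, longitude, timestamp) records using the haversine formula. The record layout, the ISO timestamp format and the sample coordinates are assumptions for illustration only and are not taken from the Bella Mossa dataset.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trip_summary(points):
    """Summarise one trip.

    points: list of (lat, lon, iso_timestamp) tuples sorted by time
    (an assumed layout). Returns (distance_km, duration_h, avg_speed_kmh).
    """
    distance = sum(haversine_km(p[0], p[1], q[0], q[1])
                   for p, q in zip(points, points[1:]))
    start = datetime.fromisoformat(points[0][2])
    end = datetime.fromisoformat(points[-1][2])
    hours = (end - start).total_seconds() / 3600
    speed = distance / hours if hours > 0 else 0.0
    return distance, hours, speed


# Made-up sample trip with three GPS fixes in the Bologna area.
sample_trip = [
    (44.4949, 11.3426, "2017-04-01T08:00:00"),
    (44.5000, 11.3500, "2017-04-01T08:10:00"),
    (44.5050, 11.3430, "2017-04-01T08:30:00"),
]
distance_km, duration_h, speed_kmh = trip_summary(sample_trip)
print(f"{distance_km:.2f} km in {duration_h:.2f} h, {speed_kmh:.1f} km/h")
```

Summing straight-line segments underestimates the distance travelled along real roads, so a finer-grained analysis would likely need map matching.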

Explainable Model for identifying Fake news/Rumor detection on Social Networks

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Shakshi Sharma (shakshi dot sharma [ät] ut [dot] ee)

Description: With advances in neural network techniques, these models have found their place in various fields. Neural networks are often described as “black-box” models. However, within the field of AI, researchers have started investigating explanations for the decisions made by such “black-box” models, an area called Explainable AI (XAI). In other words, researchers would like to interpret a model's decisions, ideally in a human-understandable form.

Social networks (such as Twitter, Tumblr, Facebook, etc.) have become default platforms for exchanging information at a rapid pace. However, this can lead to the spread of misinformation, which can greatly impact society. It is therefore crucial to detect misinformation, which can be categorized into two classes: fake news and rumors. Various AI models have been proposed to detect misinformation in social networks. However, it is also important to know why a particular model has flagged a particular post (or user) as fake news (or rumor). In this thesis, we plan to use a generative adversarial network approach to interpret the decisions made by the model in a human-understandable form. Dataset will be provided.

Identifying insincere questions on Quora using AI

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Shakshi Sharma (shakshi dot sharma [ät] ut [dot] ee)

Description: Quora is a crowd-sourced question answering website that allows users to build a reputation and broaden their knowledge. On Quora, users sometimes post insincere questions. An example of an insincere question is one that is asked with the intention of making a statement rather than seeking genuine answers. It is crucial to detect these insincere questions and remove them from the website so that users have a fruitful experience. This also removes excess traffic (or noise) from the website. In this thesis, we will explore NLP and Deep Learning based techniques for scoring how insincere a question is. In addition, we are interested in analyzing the top categories of questions that are mostly (or never) answered by users. Dataset will be provided.

Logical querying of the Knowledge Graph

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Mohit Mayank (TCS Research)

Description: Graph Neural Networks (GNNs) are an active research field with applications cutting across domains like recommendation systems, knowledge completion, question answering, etc. Much of the research in this field has been on creating continuous embeddings of nodes, edges, subgraphs or even complete graphs, in a supervised or unsupervised fashion, for the required downstream application. In the case of question answering over knowledge graphs, such embeddings have historically been used to facilitate better traversal of the graph to find answers. The goal of this research is to investigate and advance the recent paradigm shift of embedding the questions (and not nodes/edges) in the continuous space to find the answer. This moves away from graph traversal techniques, which are computationally expensive and have low accuracy on incomplete knowledge graphs.

Visualisation of Code Evolution

Supervisor: Kristiina Rahkema (kristiina [dot] rahkema [ät] ut [dot] ee)

Take the analysis results of an app across its different versions and visualise how the app changes over time. The visualisation should display how classes are added, how they grow and shrink, and how methods are added. Data for the tool is provided as a Neo4j database. The user should be able to move backward and forward in time. The idea behind the tool is to help understand how a project evolves.

Investigating The Correlation Between Structural Information and Resource Utilization of Functionally Equivalent Android Apps.

Supervisor: Hina Anwar (hina [dot] anwar [ät] ut [dot] ee)

On app stores, many apps offer similar functionality and could be treated as alternatives to one another. For example, there are many to-do list apps, calculator apps, alarm clock apps and search engine apps that offer similar functionality. By functional equivalence, we mean that two apps provide similar but not identical functions/utilities to users. In functionally equivalent apps, the execution paths could differ slightly due to variation in user events in the GUIs, indicating possible variation in resource utilization. The goal of this thesis is to extract functionally equivalent apps from app stores and assess their GUI and method similarity. Further, for the functionally equivalent app pairs, the thesis will investigate the correlation between overall product structural information and resource utilization. Structural information is captured via CK, MOOD and Martin metrics, and resource utilization (battery, memory and CPU consumption) is captured via ADB logs. Such analysis could be useful in developing services that recommend functionally equivalent alternatives to app users based on their resource utilization.
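As a rough illustration of the correlation step, the sketch below computes a Spearman rank correlation between one structural metric and one resource measurement. The choice of Spearman, the metric name (WMC from the CK suite) and all numbers are assumptions made for illustration only; the thesis may well use other statistics and real measurements.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    dev_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (dev_x * dev_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up values for five hypothetical apps: weighted methods per class (WMC)
# and average battery drain in percent per hour taken from ADB logs.
wmc = [12, 30, 45, 45, 80]
battery_drain = [1.1, 1.9, 2.5, 2.4, 4.0]
rho = spearman(wmc, battery_drain)
```

A rank-based coefficient is used here because software metrics are often skewed, but that is a design choice rather than a requirement of the topic.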

References:

S. Almrayat, R. Yousef, and A. Sharieh, “Evaluating the Impact of GUI Similarity between Android Applications to Measure their Functional Similarity,” Int. J. Comput. Appl., vol. 178, no. 21, pp. 31–38, Jun. 2019.
V. F. Taylor and I. Martinovic, “SecuRank: Starving permission-Hungry apps using contextual permission analysis,” in SPSM 2016 - Proceedings of the 6th Workshop on Security and Privacy in Smartphones and Mobile Devices, co-located with CCS 2016, 2016, pp. 43–52.
C. Kin Keong, K. Tieng Wei, A. A. Abd. Ghani, and K. Y. Sharif, “Toward using software metrics as indicator to measure power consumption of mobile application: A case study,” in 2015 9th Malaysian Software Engineering Conference (MySEC), 2015, pp. 172–177.

Developing an Android Studio Plugin to Detect Energy Patterns in Android Apps

Supervisor: Hina Anwar (hina [dot] anwar [ät] ut [dot] ee)

It is important to inform developers about how they can modify their code to make apps energy-efficient. In previous research, many energy patterns have been identified. Many custom tools are also available to detect and refactor some of these energy patterns. However, many of these tools lack IDE integration, which makes them rather tricky for developers to use. There are tools such as “Leafactor” that are available as plugins for the Eclipse IDE. However, no such tool exists for the Android Studio IDE (which is the official IDE for Android app development). The goal of this thesis is to develop an Android Studio plugin that can detect energy patterns (listed in the web catalogue “Energy patterns for Mobile apps”) and also recommend corrections for those energy patterns (as listed in the web catalogue).

References:

Luis Cruz, Rui Abreu and Jean-Noel Rouvignac (2017). Leafactor: Improving Energy Efficiency of Android Apps via Automatic Refactoring. In IEEE/ACM International Conference on Mobile Software Engineering and Systems, MobileSoft 2017.
Luis Cruz and Rui Abreu (2019). Catalog of Energy Patterns for Mobile Applications. Journal of Empirical Software Engineering.

Exploration of the World of Code (WoC) Infrastructure for Mining the Universe of Open Source VCS Data - Focus: Quality of Mobile Application Code

Supervisor: Dietmar Pfahl (dietmar [dot] pfahl [ät] ut [dot] ee)

WoC is an "Infrastructure for Mining the Universe of Open Source VCS Data" and may be used to investigate code properties in open source projects. The main goal of this thesis is to become familiar with the WoC infrastructure and to describe the available data from mobile app development projects (with regard to programming language, development platform, code evolution, and so on). Once the overview is available, specific code analysis tasks for defined target code (e.g., Swift code) will be defined. The exact goals will be defined together with the student once it is clear how much time is available after the familiarization phase. Information about the WoC project can be found here:

From Fault Injection to Fault Removal: Understanding Bug History through Visualization

Supervisor: Dietmar Pfahl (dietmar [dot] pfahl [ät] ut [dot] ee)

Fault-fixing activities are usually detectable in project repositories by searching for "bug-fixing" commits. In order to better understand why and when a "bug" (fault) was introduced into the code, the corresponding fault-inducing commit needs to be found. This problem has been addressed with the Sliwerski-Zimmermann-Zeller (SZZ for short) algorithm, which is based on the annotation/blame feature of version-control systems. Researchers in Finland have recently published the OpenSZZ tool, a free, open-source, web-accessible implementation of the SZZ algorithm. The goal of this thesis project is to develop a visualisation component on top of OpenSZZ. With the help of the visualisation component, a range of questions shall be answered and made easily visible by browsing through the visualisations. Questions could be, for example, how long it takes for certain types of faults to be fixed (depending on type of fault, code context, and other properties). A student who takes this topic should be a good programmer and must be interested in quality analysis and visualisation. Potential code repositories could be found via the World of Code (WoC) infrastructure or other public repositories.
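To make the blame-based idea concrete, here is a toy sketch of the core SZZ step on in-memory data. It is not the OpenSZZ implementation or its API: the commit ids, file name, dates and both function names are hypothetical, and a real tool would obtain the blame information from the version-control system.

```python
from datetime import datetime


def fault_inducing_commits(fix_changed_lines, blame, commit_dates, bug_reported):
    """Commits that last touched the lines changed by a fix and predate the bug report."""
    candidates = {blame[line] for line in fix_changed_lines if line in blame}
    return sorted(c for c in candidates if commit_dates[c] < bug_reported)


def days_to_fix(inducing, fix_commit, commit_dates):
    """Days between each fault-inducing commit and the bug-fixing commit."""
    return {c: (commit_dates[fix_commit] - commit_dates[c]).days for c in inducing}


# Hypothetical history of one file.
commit_dates = {
    "a1": datetime(2020, 1, 10),
    "b2": datetime(2020, 2, 1),
    "c3": datetime(2020, 3, 5),   # made after the bug was reported
    "f9": datetime(2020, 3, 20),  # the bug-fixing commit
}
# Blame result before the fix: (file, line) -> commit that last changed the line.
blame = {
    ("Parser.java", 42): "a1",
    ("Parser.java", 43): "b2",
    ("Parser.java", 50): "c3",
}
fix_changed_lines = [("Parser.java", 42), ("Parser.java", 43), ("Parser.java", 50)]
bug_reported = datetime(2020, 3, 1)

inducing = fault_inducing_commits(fix_changed_lines, blame, commit_dates, bug_reported)
fix_latency = days_to_fix(inducing, "f9", commit_dates)
```

Values such as `fix_latency` are the kind of data a visualisation of time-to-fix could be built on.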

Information on OpenSZZ can be found here:

Paper on OpenSZZ

Information about the WoC project can be found here:

Case Study in Software Testing or Software Analytics (focus on software quality)

Supervisor: Dietmar Pfahl (dietmar dot pfahl ät ut dot ee)

This is a "placeholder" Masters project topic, which needs to be negotiated individually. If you work in an IT company and are actively engaged in software testing or software analytics, or if you can convince your management to put time and resources into such a project in the near term, we can make a case study out of it. We will sit down and formulate concrete hypotheses or questions that you investigate as part of this project, and we will compare your approach and results against state-of-the-art practices. I am particularly interested in supervising thesis topics related to mutation testing, testing of embedded software, testing of safety-critical systems, security testing of mobile apps, and analysis of project repositories to make software development processes more efficient and effective, but I welcome other topic areas.

Team size and communication channels in open source (booked)

Ezequiel Scott (ezequiel [dot] scott [at] ut [dot] ee)

Open-source projects often involve hundreds of contributors, and the set of contributors varies from one iteration to another. Fluctuations in team size affect productivity and also communication: the more people involved, the more communication is needed to coordinate tasks. In this thesis, you will investigate several questions in this regard, such as how teams manage their communication in open-source projects, what practices and tools are used to address communication, how team-member longevity affects communication, and how teams manage communication overhead.

A dashboard to visualize the product quality level (booked)

Ezequiel Scott (ezequiel [dot] scott [at] ut [dot] ee)

Product owners need to understand the product quality level in a concise and intuitive way in order to decide whether to accept or reject the requirements completed during an iteration. Even though developers can use information from different sources to understand the software quality, there is no agreement on what information a product owner needs. The goal of this thesis is to develop a dashboard that will allow product owners to visualize the completed requirements along with quality metrics for the different iterations of a software product. The dashboard first requires defining a set of metrics that are useful for product owners and determining how to visualize them appropriately. The metrics can be simple to calculate, such as test coverage, or user-defined. Once the metrics are defined, connectors and interfaces need to be developed to obtain the measurement values from platforms such as SonarQube, Jenkins, Travis CI, CircleCI and GitHub, among others. Finally, these values must be visualized in a web dashboard.

Compliance checking of Agile Practices (booked)

Ezequiel Scott (ezequiel [dot] scott [at] ut [dot] ee)

Software development teams add agility to their processes by implementing agile practices and frameworks, and executing them correctly creates artefacts and traces in the development environment. This wealth of information can be used to compare the actual practices against the practices defined by the frameworks and then to suggest potential improvements to the process. The goal of this thesis is to apply compliance checking techniques to data extracted from software development environments. You will first model current agile practices and define the related potential data sources. Then, you will apply existing compliance checking techniques and evaluate the outcomes through case studies.

References: https://dl.gi.de/bitstream/handle/20.500.12116/23621/gi-proc-141-007.pdf?sequence=1 https://www.sciencedirect.com/science/article/pii/S1877050915026150

A Multi-objective Issue Recommender System

Ezequiel Scott (ezequiel [dot] scott [at] ut [dot] ee)

In agile software development, issue allocation is often based on self-assignment. That is, developers choose the issues (e.g. user stories, bug reports) that they will develop during the sprint based on their own preferences and experience, which can be difficult for inexperienced developers. In this context, a recommender system can help developers choose their issues. However, typical issue recommender systems aim to produce a list of issues that optimizes a single measure of interestingness, such as accuracy. Suggesting issues that are simultaneously accurate, novel and diverse is much more challenging, since an attempt to improve one additional measure may worsen the others [1]. The goal of this project is to build and evaluate a multi-objective recommender system to aid developers during the assignment of issues.
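One common way to handle several objectives at once is to keep only the Pareto-efficient candidates, that is, those that no other candidate beats on every objective. The sketch below shows this filtering on made-up scores; the issue ids, the three score values per issue and the function names are hypothetical, and it is only one possible building block, not the approach the project prescribes.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the issues whose scores are not dominated by any other issue."""
    return [
        issue
        for issue, scores in candidates.items()
        if not any(dominates(other, scores) for other in candidates.values())
    ]


# Hypothetical (accuracy, novelty, diversity) scores for one developer; higher is better.
candidates = {
    "ISSUE-1": (0.9, 0.2, 0.3),
    "ISSUE-2": (0.6, 0.7, 0.5),
    "ISSUE-3": (0.5, 0.6, 0.4),  # worse than ISSUE-2 on every objective
    "ISSUE-4": (0.3, 0.9, 0.8),
}
front = pareto_front(candidates)
```

The remaining issues represent different trade-offs; how to rank or present them to the developer is left open here.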

References:

Ribeiro, M. T., Ziviani, N., Moura, E. S. D., Hata, I., Lacerda, A., & Veloso, A. (2015). Multiobjective pareto-efficient approaches for recommender systems. ACM Transactions on Intelligent Systems and Technology (TIST), 5(4), 53. https://dl.acm.org/citation.cfm?id=2629350
Borg M., Runeson P. (2014) Changes, Evolution, and Bugs. In: Robillard M., Maalej W., Walker R., Zimmermann T. (eds) Recommendation Systems in Software Engineering. Springer, Berlin, Heidelberg https://link.springer.com/chapter/10.1007/978-3-642-45135-5_18

Automatic User Story Splitting (booked)

Ezequiel Scott (ezequiel [dot] scott [at] ut [dot] ee)

User stories should be “small enough” before they are ready for implementation in an upcoming iteration [1]. In practice, many user stories do not satisfy this property, which can introduce misunderstandings during development. To deal with “large” user stories, practitioners apply patterns for splitting user stories [2] that allow one user story to be broken up into smaller ones. The goal of this project is to build an automatic approach for user story splitting that supports developers during this task. To address this, you will apply NLP and ML techniques to identify user stories that potentially require splitting and to suggest suitable patterns for doing so. You will work with data taken from several open-source projects.

References: [1] The INVEST criteria. https://xp123.com/articles/invest-in-good-stories-and-smart-tasks/ [2] User Story Splitting. https://www.agilealliance.org/glossary/split/

Implementing DevOps practices: a case study (booked)

Ezequiel Scott (ezequiel [dot] scott [at] ut [dot] ee)

DevOps is an approach that combines development and operations to shorten the development life cycle and provide continuous delivery with high software quality. DevOps is complementary to Agile Software Development, and it requires applying a set of practices. The goal of this thesis is to study what the current DevOps practices are, how suitable they are for different contexts, and how to apply these practices in a case study.

Maintainability and Management of Data Science projects

Supervisor: Ezequiel Scott (ezequiel [dot] scott [at] ut [dot] ee)

Nowadays, many software solutions incorporate Machine Learning models. Creating useful models often requires going through an experimentation pipeline that includes data pre-processing, cleaning, modelling, and evaluation steps. Although several libraries support the creation of such pipelines and experimentation with them, few studies have explored the maintainability of these applications. Moreover, integrating the model into production alongside existing software presents additional challenges. The goal of this thesis is to study the maintainability and the management of these projects from a code perspective (e.g. analysing software design patterns), a methodological perspective (e.g. the use of agile practices), or both.

Several Topics in Software Engineering and Information Systems

Supervisor: Ulrich Norbisrath (ulno [ät] ulno [dot] net)

The topics can be found here: https://ulno.net/advising/#suggestions

Development of the Rules Mining (RuM) toolset

Supervisors: Fabrizio Maggi and Anti Alman (firstname [dot] lastname [ät] ut [dot] ee)

Rule mining focuses on the analysis and optimization of business processes using rules that the process is expected to fulfil. In this project, you will work on extending the Rules Mining toolset (RuM), which is developed at the University of Tartu in collaboration with other universities. We invite you to have a look at the website. If you are interested in this topic, you can develop several new features of RuM for your Masters thesis, for example a module for detecting and visualizing violations of business rules in a user-friendly manner. Knowledge of Java is required.

Extending the Nirdizati Predictive Process Monitoring Engine

Supervisors: Fabrizio Maggi and Anti Alman (firstname [dot] lastname [ät] ut [dot] ee)

Predictive process monitoring is concerned with leveraging historical process execution data to predict how running (uncompleted) cases will unfold up to their completion. Historical data is given as input to a machine learning method to train a predictive model, which is queried at runtime to predict a process outcome. Together with predictions, a predictive model can also provide recommendations to the user on what to do to minimize the probability of a negative process outcome. In this thesis project, we will work on the development of Nirdizati (http://nirdizati.org/nirdizati-research/), a predictive process monitoring web application for validating and comparing the performance of different predictive models on the same dataset. If you are interested in this topic, a thesis project can be developed in different directions: it can focus on engineering tasks related to the development of existing predictive process monitoring approaches in Nirdizati, or on research tasks related to the development of novel predictive process monitoring approaches in the same application. Knowledge of Python and of data science is required.

Discovering Action Recommendation Rules to Enhance Business Process Performance

Supervisor: Marlon Dumas (marlon [dot] dumas [ät] ut [dot] ee)

In this project, you will develop a tool that takes as input data about past executions of a business process and produces as output a ranked list of recommendation rules to improve the performance of the process. For example, your tool will recommend actions that can help to reduce undesirable outcomes (e.g. rules that reduce the number of customer complaints or that reduce the percentage of Service Level Agreement violations). The tool will generate rules of the form Condition-Action (C-A), with the following interpretation: when the condition C holds (e.g. when a customer visits our Web site more than twice within a 24-hour period), then action A should be triggered (e.g. we should send an email to the customer to offer them a discount on their next purchase). You will tackle the challenge of discovering rules that have an optimal effect on a performance measure, for example rules that maximize total revenue, that minimize customer complaints or that maximize the customer satisfaction score. To achieve this goal, you will use existing techniques for extracting causal association rules. You will analyze and experiment with different techniques, possibly adapt them, and run many experiments with real-world data and use cases. You will then wrap up the best techniques you find into a tool. If you want to have a look at the types of techniques you will be using, you can check the following two papers:

Process Mining for Minimizing Waste in Business Processes

Supervisor: Fredrik Milani (milani [ät] ut [dot] ee)

Organizations have worked, and continue to work, on improving their business processes. In the past decades, analysts have produced an impressive set of frameworks and methods for systematically analyzing and improving business processes. One of these is lean thinking, which aims to remove waste from business processes. Waste identification and elimination is still predominantly manual work conducted by process analysts. In the last decade, process mining techniques have been developed that allow for data-driven analysis of business processes. However, there is still a gap between waste management, which requires manual analysis, and process mining techniques. The objective of this thesis is to reduce this gap. More specifically, the objective is to support business process improvement by employing process mining. This thesis will require the student to build an understanding of lean management, examine more closely how improvement opportunities (wastes) are identified and how they can be eliminated, survey existing process mining techniques that can support such activities, develop a new process mining algorithm (using, modifying, and extending existing process mining methods), and, finally, evaluate the output. This thesis topic is similar to the following publication: https://link.springer.com/article/10.1007/s12599-020-00649-w

Framework for Managing Positive Deviances in Business Processes with Process Mining

Supervisor: Fredrik Milani (milani [ät] ut [dot] ee)

When seeking to improve their business processes, companies look at the relevant top performers. By examining the processes of top performers, one learns what they do that makes them better. This is sometimes referred to as positive deviance. Information systems log data that can be used for data-driven benchmark analysis. This thesis topic is about eliciting and defining positive deviances in business processes that can be detected from event logs. The work requires the student to build up an understanding of benchmarking, survey existing work on the topic, extract, analyze, and synthesize data into a framework, identify process mining techniques that can support such activities, possibly develop a new process mining algorithm (using, modifying, and extending existing process mining methods), and, finally, evaluate the output.

Business Process Improvement using Process Mining – A Case Study

Supervisor: Fredrik Milani (milani [ät] ut [dot] ee)

Process mining can be used to conduct data-driven analysis of business processes. This thesis is about doing exactly that for a company. The topic will require preparing the execution log for certain processes, surveying process mining use cases that are relevant and useful for this particular case, analyzing the processes using the Apromore process mining tool, discussing the results with domain experts, identifying and proposing process improvements, and evaluating the results. This topic is dependent on access to data.

Robotic Process Automation – A Case Study

Supervisor: Fredrik Milani (milani [ät] ut [dot] ee)

Robotic Process Automation (RPA) has recently been gaining popularity. With RPA, manual work can be automated without requiring heavy investment in IT development. However, it is not straightforward to decide which process to select, what RPA tool to use, and how to best add value. This topic is about applying a framework for identifying candidate processes/process fragments, surveying suitable RPA tools, implementing RPA for a few processes, evaluating the results, and drawing conclusions. This topic is dependent on access to data.

Identifying Business Process Improvement Opportunities from Event Logs

Supervisors: Katsiaryna Lashkevich (katsiaryna [dot] lashkevich [at] ut [dot] ee) and Fredrik Milani (milani [ät] ut [dot] ee)

Organizations have worked, and continue to work, on improving their business processes. In the past decades, analysts have produced an impressive set of frameworks and methods for systematically analysing and improving business processes. Business process improvement and redesign is still predominantly manual work conducted by process analysts. In the last decade, process mining techniques have been developed that allow for data-driven analysis of business processes. However, most of these techniques provide a descriptive analysis of the business process. Thus, process analysts have to examine the results and identify where the process can be improved. The objective of this thesis is to identify business process improvement opportunities, as opposed to merely presenting analytics. This thesis will require the student to build an understanding of a framework, examine more closely how improvement opportunities are identified and how processes can be redesigned, survey existing process mining techniques that can support such activities, develop a new process mining algorithm (using, modifying, and extending existing process mining methods), and, finally, evaluate the output.

The above topic will focus on one or a few aspects of a business process (activity, control-flow, data objects, resources, process fragments, variants, etc.).

Blockchain Capabilities for Business Process Redesign

Supervisor: Fredrik Milani

Blockchain technology has emerged as a disruptive technology and, while it is not receiving the same level of attention as before, companies are seriously examining its use to improve their business processes. However, it is still not clear which processes it is best applied to, and when it can enable what kind of redesign. This thesis aims to explore this topic by conducting a systematic literature review (SLR) to identify what capabilities of blockchain technology are relevant/applicable for business processes, what processes can be redesigned, and how processes can be redesigned if powered by blockchain technology. The thesis requires developing an SLR protocol, conducting the SLR, analyzing the papers, and eliciting a framework that addresses the above questions.

Business Process Analysis with Process Mining – Case Study

Supervisor: Fredrik Milani

This topic is for students who have access to event logs from industry and who wish to conduct a business process project to discover, analyze, and improve business processes at a company. For this topic, you will have to extract a sample log so that we can be sure process mining can be applied. This thesis topic requires the student to apply parts of the BPM life cycle to a case. Thus, the thesis requires using process mining to discover business process models, conducting analysis (both manual and data-driven), and proposing a redesign of the business process. The results are then evaluated by implementing and assessing the changes, by simulation, or through interviews.

Thesis Topics from Information Security Team:

1. Modelling Languages for Blockchain Applications

Supervisors: Mubashar Iqbal and Raimundas Matulevičius

Contact: rma ät ut dot ee

When designing and developing blockchain applications (dApps), developers need to deal with the principles of distributed ledgers, chains of blocks, smart contracts, crypto-hashes and other domain-specific concepts. However, the standard modelling languages (e.g., BPMN, UML, ArchiMate) do not contain constructs to represent dApp components. Although a few attempts to enrich the modelling languages exist, they mainly result in model annotations and do not include systematic extensions of the modelling language. The main goal of this topic is to develop the semantic and syntactic modelling constructs to support the modelling of blockchain applications. The main steps of the research include:

Review the literature for the blockchain application modelling
Define the dApp modelling domain
Develop the semantics, concrete and abstract syntax for the dApp modelling (this could be done either as extensions of the existing standard languages or as a proposal of the new modelling language)
Illustrate feasibility of the proposal in the dApp modelling example.

2. Determining process compliance to GDPR using process logs

Supervisors: Mari Seeba and Raimundas Matulevičius

Contact: rma ät ut dot ee

Process models express how we want processes to perform. We plan our processes to be compliant with GDPR; however, reality could be different. How can the compliance of the real process with GDPR models be evaluated? Does process mining save time in finding non-conformities? What should the requirements be for the event logs of a system that processes personally identifiable information? For this research, a student can use their own company's event log data, refine the logs, refine the processes, and assess the compliance of the processes against a GDPR business process compliance model using process mining methods.
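The kind of check involved can be sketched on a tiny event log. The rule used below ("personal data may only be processed after consent has been obtained in the same case") is a deliberately simplified, assumed example and not a statement of what GDPR requires; the activity names, the log layout and the function name are likewise hypothetical.

```python
from collections import defaultdict


def violating_cases(event_log, required_before="obtain_consent",
                    guarded="process_personal_data"):
    """Return ids of cases where the guarded activity occurs before the required one."""
    traces = defaultdict(list)
    # Group events per case, ordered by timestamp.
    for case_id, activity, timestamp in sorted(event_log, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)

    violations = []
    for case_id, activities in traces.items():
        required_seen = False
        for activity in activities:
            if activity == required_before:
                required_seen = True
            elif activity == guarded and not required_seen:
                violations.append(case_id)
                break
    return violations


# Hypothetical event log: (case id, activity, timestamp).
event_log = [
    ("c1", "register", 1), ("c1", "obtain_consent", 2), ("c1", "process_personal_data", 3),
    ("c2", "register", 1), ("c2", "process_personal_data", 2), ("c2", "obtain_consent", 3),
    ("c3", "register", 1),
]
non_compliant = violating_cases(event_log)
```

The check only works if the log records case identifiers, activity names and timestamps, which hints at the kind of logging requirements the topic asks about.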

3. Information System Security Risk Management in the Autonomous Driving Vehicles

Supervisors: Abasi-Amefon O. Affia and Raimundas Matulevičius (rma [ät] ut [dot] ee)

Contact: rma ät ut dot ee

An autonomous driving vehicle is a complex cyber-physical system. It uses a network, sensors, and an electronic control unit (ECU) to control the functions of the vehicle and to connect the vehicle to other system entities (e.g., other connected vehicles, roadside equipment, and traffic management centres). In this way it exchanges information about the car's location, environment, direction and driving conditions, as well as information necessary for controlling the vehicle’s devices. However, such a system could suffer from various security risks. For example, an attacker could establish a connection between the attacker’s device and the target vehicle. Security risks could be mitigated by limiting the VMM port functionality, by monitoring the incoming information and by blocking abnormal requests/services. The goals of this topic are:

Explain the system and business assets in the autonomous driving vehicles;
Assess the security risks in the autonomous driving vehicles;
Analyse the trade-offs in order to define the best suited countermeasures to mitigate these risks.

To reach the above goals, you would use the information systems security risk management approach combined with model-driven and data analysis methods. The approach includes: (1) a systematic explanation of the architecture of the connected autonomous vehicle, resulting in models of the system and business assets; (2) definition of security needs (e.g., regarding the vehicle’s tire pressure data, fuel level data, braking service, gearing service, information in emergency situations, infotainment services, firmware, etc.); (3) systematic analysis and estimation of the security risks using data analysis methods; (4) reasoning about and taking the security risk treatment decision; (5) elicitation of the security requirements; (6) recommendations for implementing security controls regarding secure network services, communication, data privacy, secure software/firmware, physical security, access control, data input, fault tolerance, and others.

Case Study in Using Fractal Enterprise Model in organizational practice

Supervisor: Ilia Bider (ilia12b at gmail dot com)

This is a "placeholder" Masters project topic, which needs to be negotiated individually. The topic is best suited to students who have a full-time or part-time job and who would like a topic connected to their workplace. Fractal Enterprise Model (FEM) is a relatively new advanced modeling technique that competes with other techniques used in the Enterprise Architecture/Modeling world. It shows the connections between different components in an organization and can be used for business analysis and design on various levels, including the strategic one, such as Business Model Innovation (BMI). The topic of your project can range from finding out what FEM can be used for in your organization to developing a new Business Model for your organization. You need to convince your hierarchy (managers) to put time and resources into such a project in the near term. Ideally, your project should be connected to some problem/challenge that is already understood by the managers, since besides your own time you might need to ask other people in your organization to get involved, e.g. for conducting interviews. A successfully completed project may later result in a published paper.

References

https://www.fractalmodel.org/ - a site under construction that has some resources on FEM
Bider I., Chalak A. (2019) Evaluating Usefulness of a Fractal Enterprise Model Experience Report (an example of a published paper resulting from an MSc thesis project)
Bider, I., Lodhi, A. Moving from Manufacturing to Software Business: A Business Model Transformation Pattern (an example related to Business Model Innovation)

---

Supervisor: Yar Muhammad (Yar [dot] Muhammad [ät] ut [dot] ee)

Expanded Work on Development of EEG-Based BCI Application Using Machine Learning to Classify Motor Movement and Imagery

Level: Bachelor/Master

Supervisor: Yar Muhammad (Yar.Muhammad@ut.ee), Co-supervisor: Naveed Muhammad

A brain-computer interface (BCI) is a system that implements human-computer communication by interpreting brain signals. The signals can be recorded through different neuroimaging techniques that can read brain activity, such as electroencephalography (EEG).

The goal of BCI technology is to enable the user to communicate with or control an external device using their mind. BCIs are widely used in medicine to help patients with limited motor abilities to communicate with their environment. However, there are many challenges faced when building a BCI capable of classifying the subject’s intention, such as the highly individualized nature of brain waves, which makes the development of a universal classifier difficult.

The work so far aimed to develop a better electroencephalography (EEG) based machine learning classifier capable of cross-subject motor movement and imagery classification, and to build a BCI system to validate the performance of the developed classifier. The classifier was based on convolutional neural networks (CNN) with a multi-branch feature fusion approach. The classifier was developed using the TensorFlow machine learning framework, the BCI system was developed in the Python programming language using the PyQt framework, and the Emotiv EPOC EEG device was used for signal collection.

The resulting classifier was tested on a publicly available dataset of 103 subjects. The classifier achieved an accuracy of 84.1% when predicting executed left- or right-hand movement and an accuracy of 83.8% when predicting imagined left- or right-hand movement.

The aim of the thesis is to extend that work and improve the accuracy of the existing algorithm using different approaches and techniques, such as (but not limited to):

• Explore alternative models for the task at hand and compare them more thoroughly with the state of the art

• Test and evaluate n-fold cross-validation

• Investigate transfer learning thoroughly

• Use more intuitive/visual ways to generate results.

• Investigate the usage of a better signal acquisition device

• Real-time performance of an example task using the developed system

For a more detailed discussion on the above aspects, please refer to [1].
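As an illustration of the n-fold validation item above, the following is a minimal sketch of a subject-wise split, assuming that a cross-subject classifier should be evaluated on subjects it has not seen during training. The function name `subject_k_fold` and the toy subject labels are hypothetical and are not taken from the referenced thesis or its code.

```python
from collections import defaultdict


def subject_k_fold(subject_ids, n_folds):
    """Yield (train_indices, test_indices) so that no subject is in both."""
    # Group trial indices by the subject they were recorded from.
    trials_by_subject = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        trials_by_subject[subject].append(index)

    subjects = sorted(trials_by_subject)
    if not 2 <= n_folds <= len(subjects):
        raise ValueError("n_folds must be between 2 and the number of subjects")

    # Deal subjects round-robin into folds; each fold is held out exactly once.
    folds = [subjects[i::n_folds] for i in range(n_folds)]
    for held_out in folds:
        held_out_set = set(held_out)
        test = [i for s in held_out for i in trials_by_subject[s]]
        train = [i for s in subjects if s not in held_out_set
                 for i in trials_by_subject[s]]
        yield train, test


# Hypothetical toy data: six trials recorded from three subjects.
subject_ids = ["s1", "s1", "s2", "s2", "s3", "s3"]
splits = list(subject_k_fold(subject_ids, n_folds=3))
for train, test in splits:
    print("train:", train, "test:", test)
```

Averaging the accuracy over the held-out folds would then give an estimate of how well a model generalizes to new subjects, which is one possible reading of the evaluation goal above.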

Some relevant literature:

[1] Karel Roots, Yar Muhammad, Naveed Muhammad, “Development of EEG-Based BCI Application Using Machine Learning to Classify Motor Movement and Imagery”, Bachelor Thesis, 2020, University of Tartu (https://comserv.cs.ut.ee/ati_thesis/datasheet.php?id=69742&year=2020)

[2] Karel Roots, Yar Muhammad, Naveed Muhammad, “Fusion Convolutional Neural Network for Cross-Subject EEG Motor Imagery Classification”, Computers 2020, 9 (3), 72; Machine Learning for EEG Signal Processing, September 5, 2020 (https://doi.org/10.3390/computers9030072).

[3] Software Download and Installation Instructions Link: https://github.com/rootskar/MotorImageryBCI

Systematic Literature Review on Smart Homes Using Brain Computer Interface (BCI)

Level: Bachelor/Master

Supervisor: Yar Muhammad (Yar.Muhammad@ut.ee), Co-supervisor: Naveed Muhammad

Smart homes have been an active area of research; however, despite considerable investment, they are not yet a reality for end users. Moreover, accessibility challenges remain for elderly and disabled people.

The number of elderly and disabled people has been increasing worldwide, and looking after them is challenging. The latest communication technologies can help in this regard. Smart home and medical systems are a prominent concept in research and development, especially those that use brain-computer interface (BCI) technology to control, for example, everyday appliances. A BCI acquires brain signals and transmits them to a digital device, which analyzes and interprets them as commands or actions. In this thesis, you will survey BCI applications targeted at smart-home environments.

Some relevant literature:

[1] Kosmyna N, Tarpin-Bernard F, Bonnefond N, Rivet B. Feasibility of BCI Control in a Realistic Smart Home Environment. Front Hum Neurosci. 2016;10:416 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4999433/)

[2] W. T. Lee, H. Nisar, A. S. Malik and K. H. Yeap, "A brain computer interface for smart home control," 2013 IEEE International Symposium on Consumer Electronics (ISCE), Hsinchu, 2013, pp. 35-36 (https://ieeexplore.ieee.org/document/6570240)

[3] Syed Rehan Abbas Jafri, Tehreem Hamid, Rabia Mahmood, Muhammad Asjad Alam, Talha Rafi, Muhammad Zeeshan Ul Haque & Muhammad Wasim Munir, “Wireless Brain Computer Interface for Smart Home and Medical System”, Wireless Personal Communications, volume 106, pages 2163–2177 (2019) (https://doi.org/10.1007/s11277-018-5932-x)

[4] Masood, Muhammad Hammad et al. “BRAIN COMPUTER INTERFACE BASED SMART HOME CONTROL USING EEG SIGNAL.” (2016)

Systematic Literature Review on Trends for the future and uses of Brain Computer Interface (BCI) Applications

Level: Bachelor/Master

Supervisor: Yar Muhammad (Yar.Muhammad@ut.ee), Co-supervisor: Naveed Muhammad

Many institutes around the world have identified the brain-computer interface (BCI) as a promising and important technology. A BCI lets a subject control robots or computers using brain signals, without physical movement. Using BCI technology, patients with paralysis can type characters to express their thoughts, drink a cup of water by controlling a robot arm, and move around by controlling an electric wheelchair. BCIs are also useful more generally, because they can serve as a user interface for various electrical devices. This study will focus on future trends and possible uses of BCI applications. The following points will be explored in the thesis:

• Future trends of BCI Applications

• Challenges and Opportunities for the BCI Applications

• BCI commercialization (current status, further possibilities)

• Collaboration between companies and academia to develop and manufacture BCI devices

• Different potential medical and non-medical uses of BCI

• Combination of different technologies (e.g. EEG, virtual reality, fMRI)

• Summarize the principles of BCIs and discuss the pros and cons of the technologies. Recent BCI studies and the future direction of BCI research will also be discussed.

Some relevant literature:

[1] Aricò P, Borghini G, Di Flumeri G, Sciaraffa N, Babiloni F. Passive BCI beyond the lab: current trends and future directions. Physiol Meas. 2018;39(8):08TR02. Published 2018 Aug 29. https://iopscience.iop.org/article/10.1088/1361-6579/aad57e

[2] H. G. Yeom, "Trends and Future of Brain-Computer Interfaces," 2018 Joint 10th International Conference on Soft Computing and Intelligent Systems (SCIS) and 19th International Symposium on Advanced Intelligent Systems (ISIS), Toyama, Japan, 2018, pp. 785-788, doi: 10.1109/SCIS-ISIS.2018.00130. https://ieeexplore.ieee.org/document/8716189

[3] Brain Computer Interface Market by Type (Invasive BCI, Non-invasive BCI and Partially Invasive BCI), Application (Communication & Control, Healthcare, Smart Home Control, Entertainment & Gaming, and Others): Global Opportunity Analysis and Industry Forecast, 2020–2027 (https://www.alliedmarketresearch.com/brain-computer-interfaces-market)

[4] Brain Computer Interface (BCI) Market: Global Industry Analysis and Opportunity Assessment 2017-2027 (https://www.futuremarketinsights.com/reports/brain-computer-interface-bci-market)

[5] Brain Computer Interface (BCI) Market 2020 Business Revenue, Future Growth, Trends Plans, Top Key Players, Business Opportunities, Industry Share, Global Size Analysis by Forecast 2025 by Market Reports World (https://www.marketwatch.com/press-release/brain-computer-interface-bci-market-2020-business-revenue-future-growth-trends-plans-top-key-players-business-opportunities-industry-share-global-size-analysis-by-forecast-2025-by-market-reports-world-2020-07-23)

A Systematic Literature Review on Classification Algorithms for EEG-based Brain Computer Interfaces (BCI)

Level: Bachelor/Master

Supervisor: Yar Muhammad (Yar.Muhammad@ut.ee), Co-supervisor: Naveed Muhammad

Brain-Computer Interface (BCI): a device that enables its users to interact with computers by means of brain activity only, this activity being generally measured by electroencephalography (EEG).

Electroencephalography (EEG): the physiological method of choice for recording the electrical activity generated by the brain via electrodes placed on the scalp surface.

BCI plays a vital role in science and technology research. Classification is a data mining technique used to predict group membership for data instances. Analyses of BCI data are challenging because feature extraction and classification of these data are more difficult as compared with those applied to raw data.

Most current electroencephalography (EEG)-based brain-computer interfaces (BCIs) are based on machine learning algorithms. There is a large diversity of classifier types that are used in this field. Many new algorithms have been developed and tested to classify EEG signals in BCIs. The time is therefore ripe for an updated review of EEG classification algorithms for BCIs.

In this study, you will survey the BCI and machine learning literature from 2015 to the present to identify the new classification approaches that have been investigated for designing BCIs. You will synthesize these studies to present the algorithms, report how they were used for BCIs and what the outcomes were, and identify their pros and cons.

The result of the study will be a comprehensive overview of the modern classification algorithms used in EEG-based BCIs, presenting the principles of these methods and guidelines on when and how to use them. It will also identify a number of challenges to further advance EEG classification in BCIs.

Some relevant literature:

[1] Lotte F, Bougrain L, Cichocki A, Clerc M, Congedo M, Rakotomamonjy A, Yger F., A review of classification algorithms for EEG-based brain-computer interfaces: a 10 year update, J Neural Eng, June 2018

[2] Geeta Sharma, Neha Sharma, Tanya Singh, Rashmi Agrawal, A Detailed Study of EEG based Brain Computer Interface, Proceedings of the First International Conference on Information Technology and Knowledge Management pp. 137–143, 2017

[3] Carmen Vidaurre, Claudia Sannelli, Wojciech Samek, Sven Dähne, Klaus-Robert Müller, Machine Learning Methods of the Berlin Brain-Computer Interface, IFAC-PapersOnLine, Volume 48, Issue 20, 2015, Pages 447-452

[4] Ewan S. Nurse, Philippa J. Karoly, David B. Grayden and Dean R. Freestone, A Generalizable Brain-Computer Interface (BCI) Using Machine Learning for Feature Discovery, PLoS One. 2015

[5] Natasha Padfield, Jaime Zabalza, Huimin Zhao, Valentin Masero, and Jinchang Ren, EEG-Based Brain-Computer Interfaces Using Motor-Imagery: Techniques and Challenges, MDPI, Sensors (Basel). 2019 Mar; 19(6): 1423.

[6] Benjamin Blankertz, Guido Dornhege, Steven Lemm, Matthias Krauledat, Gabriel Curio, Klaus-Robert Müller, The Berlin Brain-Computer Interface: Machine Learning Based Detection of User Specific Brain States

Systematic Review of the Literature on How Machine Learning is used to classify EEG signals/Brainwaves forms (Delta, Theta, Alpha, Beta, Gamma)

Level: Bachelor/Master

Supervisor: Yar Muhammad (Yar.Muhammad@ut.ee), Co-supervisor: Naveed Muhammad

Electroencephalography (EEG) analysis has been an important tool in neuroscience applications such as brain-computer interfaces (BCI), and even in commercial applications. Many of the analytical tools used in EEG studies rely on machine learning (ML) to uncover relevant information for neural classification and neuroimaging.

Recently, the availability of large EEG datasets and advances in ML have led to the deployment of deep learning architectures, especially for analyzing EEG signals and understanding the information they may contain about brain function. Robust automatic categorisation of these signals is an important step towards making EEG more practical in many applications.

Towards this goal, a systematic review of the literature on all machine learning and non-machine learning algorithms and applications that are used for EEG classification is to be performed to address the following critical questions:

1. Which EEG classification tasks have been explored with machine learning and non-machine learning algorithms?

2. What input formulations have been used for training the machine learning and non-machine learning algorithms?

3. Are there specific machine learning or non-machine learning algorithms suitable for specific types of tasks?

4. Compare all suitable results on the classification of EEG signals

5. Finally, a framework will be proposed based on the systematic review of the literature which serves as a path for the classifications of EEG signals/brain waveforms.

Motivation: In the near future, we envision these techniques to enable early diagnosis systems for the detection of neurodegenerative diseases. We can also use them to show signature patterns in physiological data. This can range from spine injuries to heart disease or cancer. This could even change how we treat early diagnosis.

Some relevant literature:

[1] Yannick Roy, Hubert Banville, Isabela Albuquerque, Alexandre Gramfort, “DEEP LEARNING-BASED ELECTROENCEPHALOGRAPHY ANALYSIS: A SYSTEMATIC REVIEW”, Jan 2019 (https://arxiv.org/pdf/1901.05498.pdf)

[2] Craik A, He Y, Contreras-Vidal JL, “Deep learning for electroencephalogram (EEG) classification tasks: a review”, J Neural Eng. 2019 Jun;16(3) https://www.ncbi.nlm.nih.gov/pubmed/30808014

[3] Laura Dubreuil, “How can we apply AI, Machine Learning or Deep Learning to EEG?”, March 2018 (https://www.neuroelectrics.com/blog/from-ai-to-deep-learning-applied-to-eeg/)

Additional topics proposed by other groups in the Institute of Computer Science are available here.

Topics for IT Conversion Masters Theses (15 ECTS)

Software Product Management – A Systematic Grey Literature Review

Supervisor: Fredrik Milani (milani [ät] ut [dot] ee)

Software Product Management is a profession that is largely driven forward by practitioners. Practitioners have therefore accumulated best practices and insights that are not readily available to the general public. This topic is to conduct a systematic grey literature review on one of the aspects of software product management listed below. An example of such a review is Paper. The topics for which such a review can be conducted include: Business Models, User Flows, UX Heuristics, MVP, Release Management, Risk Management, User Testing, Product Metrics.

Case Study in Using Fractal Enterprise Model in organizational practice

Supervisor: Ilia Bider (ilia12b at gmail dot com)

References

https://www.fractalmodel.org/ - a site under construction that has some resources on FEM
Bider I., Chalak A. (2019) Evaluating Usefulness of a Fractal Enterprise Model Experience Report – an example of a published paper resulting from an MS thesis project
Bider, I., Lodhi, A. Moving from Manufacturing to Software Business: A Business Model Transformation Pattern – an example related to Business Model Innovation

Case Study in Business Process Improvement or Business Data Analytics

Supervisor: Marlon Dumas (marlon dot dumas ät ut dot ee)

This is a "placeholder" Masters project topic, which needs to be negotiated individually. If you work in an IT company and you are actively engaged in a business process improvement or business data analytics project, or if you can convince your hierarchy to put time and resources into such a project in the near term, we can make a case study out of it. We will sit down and formulate concrete hypotheses or questions that you will test or address as part of this project, and we will compare your approach and results against state-of-the-art practices. I am particularly interested in supervising thesis topics related to customer analytics, product recommendation, business process analytics (process mining), and privacy-aware business analytics, but I welcome other topic areas.

Case Study in Software Testing or Software Analytics (focus on software quality)

Supervisor: Dietmar Pfahl (dietmar dot pfahl ät ut dot ee)

The method applied is a case study. Case studies follow a systematic approach as outlined in: Guidelines for conducting and reporting case study research in software engineering by Per Runeson and Martin Höst. Important elements of the thesis are literature study, measurement and interviews with experts in the target company.

Bachelor Thesis Projects

Association Rule Mining for Automated Fault Localization

Supervisors: Alejandra Duque Torres and Dietmar Pfahl (firstname [dot] lastname [ät] ut [dot] ee)

A test oracle is a mechanism that determines the correct output of a system under test (SUT) for a given input. Although substantial research has been conducted on providing test oracles automatically, the oracle problem remains largely unsolved outside model-driven testing. We developed a method to derive test oracles based on information contained in object state data produced during the execution of the SUT. Our proposed method employs Association Rule Mining (ARM). In our proof of concept, we used the Stack class from the Java Collections Framework as the SUT. To test our method, we generated seven faulty versions of the SUT. The results show that our approach is capable of detecting failures and locating faults, and that the ARM approach mainly benefits fault localization. The goal of this thesis is to extend the initial findings by automatically injecting faults into the original code and assessing whether our methodology can detect and locate those faults. The fault injection can be done using mutation. We also want to extend our method to other classes of the Java Collections Framework.
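To give a feel for the general idea, here is a minimal sketch of association rule mining over object state observations, written in Python for brevity. It is not the supervisors' implementation: the item names (such as `op=push` and `size+1`), the restriction to single-item rules, and the thresholds are all assumptions made for illustration.

```python
from itertools import combinations


def mine_rules(transactions, min_support, min_confidence):
    """Return single-item rules as (antecedent, consequent, support, confidence)."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = pair_support / support({antecedent})
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, pair_support, confidence))
    return rules


def violated_rules(rules, observation):
    """Rules whose antecedent holds in the observation but whose consequent does not."""
    return [(a, c) for a, c, _, _ in rules
            if a in observation and c not in observation]


# Hypothetical state observations recorded from a correct stack implementation.
correct_runs = [
    {"op=push", "size+1"},
    {"op=push", "size+1"},
    {"op=pop", "size-1"},
    {"op=pop", "size-1"},
]
rules = mine_rules(correct_runs, min_support=0.5, min_confidence=1.0)

# Hypothetical observation from a faulty version: push shrinks the stack.
faulty_observation = {"op=push", "size-1"}
print(violated_rules(rules, faulty_observation))
```

In this toy setting the mined rules act as an oracle: an observation that breaks a high-confidence rule signals a failure, and the items named in the broken rule hint at where the fault may lie.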

Title (missing)

Supervisors: Fabrizio Maggi and Anti Alman (firstname [dot] lastname [ät] ut [dot] ee)

Rule mining is focused on the analysis and optimization of business processes using rules that the process is expected to fulfil. In this project, you will work on extending the Rules Mining toolset (RuM), which is developed at the University of Tartu in collaboration with other universities. We invite you to have a look at the website. If you are interested in this topic, you can develop new features of RuM during your Bachelor's thesis, for example a module for detecting and visualizing violations of business rules in a user-friendly manner. Knowledge of Java is required.
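As a rough illustration of what detecting a rule violation can mean, the sketch below checks one assumed kind of rule: every occurrence of a trigger activity must eventually be followed by a target activity in the same case. It is written in Python for brevity and does not use RuM's code or API; the function name, the activity names, and the event log are hypothetical.

```python
def response_violations(trace, trigger, target):
    """Return positions of `trigger` events that are never followed by `target`.

    Assumes `trigger` and `target` are different activity names.
    """
    pending = []  # trigger positions still waiting for a later target
    for position, activity in enumerate(trace):
        if activity == trigger:
            pending.append(position)
        elif activity == target:
            pending.clear()  # every earlier trigger is now satisfied
    return pending


# Hypothetical event log: case id -> ordered list of activity names.
event_log = {
    "case-1": ["receive order", "check stock", "send invoice"],
    "case-2": ["receive order", "check stock"],
    "case-3": ["check stock"],
}

violations = {}
for case_id, trace in event_log.items():
    positions = response_violations(trace, "receive order", "send invoice")
    if positions:
        violations[case_id] = positions

print(violations)  # {'case-2': [0]}
```

Returning the positions of the unsatisfied events, rather than a plain yes/no answer, is what would let a visualization point the user to the exact place in a case where the rule is broken.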