Student Projects, Academic Year 2019-2020

(this page is outdated)

Below is a list of project topics for Masters and Bachelors theses offered by the Software Engineering & Information Systems Research Group for students who intend to defend in June 2020. The projects are divided into:

Software Engineering Masters topics (30 ECTS)
IT Conversion Masters topics (15 ECTS)
Bachelor Thesis Projects (9 ECTS)

If you're interested in any of these projects, please contact the corresponding supervisor.

Master Thesis Projects

Expectations and outcomes of entrepreneurial hackathons

Alex Nolte (alexander dot nolte [ät] ut [dot] ee)

Hackathons started out as time-bounded competitive events during which young developers formed small ad-hoc teams and engaged in short-term intense collaboration on software projects for pizza and the potential prospect of a future job. Since those humble beginnings, hackathons have become a global phenomenon with thousands of individuals participating in hundreds of events every weekend. In Estonia in particular, hackathons have become an integral part of the vibrant startup scene, promising fame and fortune to young entrepreneurs.

Such promises can however lead to unrealistic expectations and potential negative experiences for participants and organizers if they are not fulfilled. The aim of this master thesis is to study the expectations and outcomes of hackathon participants and organizers and to develop suggestions for managing expectations before, during and after an event to prevent potential negative experiences.

Based on existing work you will use a combination of interview, observation, survey and archival data analysis methods to identify expectations of participants and organizers, discover their connection and propose means to manage expectations based on the results obtained.

Design and evaluation of a user interface to increase trust in autonomous vehicles - booked

Alex Nolte (alexander dot nolte [ät] ut [dot] ee), Karl Kruusamäe

The main focus of human-vehicle interface design for decades has been on supporting the driver to steer a vehicle comfortably and safely to its destination. In autonomous vehicles, however, the control of the vehicle will no longer remain in the hands of the driver. This requires drivers to place trust in the systems that steer the vehicle. Gamification can be a promising approach to build this trust by allowing drivers to disengage from the driving experience step by step.

The aim of this master thesis is to design and evaluate a user interface that will place drivers in everyday situations that they would have to solve themselves, but that automation will now solve for them. Studying different designs using established trust measures and usability evaluation techniques you will identify advantages and disadvantages of different designs and develop suggestions for how trust in autonomous vehicles can be increased.

Design and implementation of a probabilistic cognitive architecture for predictive processing in the brain

Kuldar Taveter (kuldar dot taveter [ät] ut [dot] ee)

The purpose of this Master’s project is to design and implement a deep probabilistic cognitive architecture for predictive processing in the brain. The implemented architecture would be a computational simulation of how emotions are constructed in our brain, according to the theory of constructed emotion (Barrett, 2017). Such a computational simulation, if successfully designed and implemented, could be very useful in educational, as well as training and entertainment domains. It would help us to gain better insight into how humans behave within sociotechnical systems, which these days is an acute research topic for many large corporations. A deep probabilistic cognitive architecture for predictive processing in the brain can be implemented in a functional programming language, such as Haskell, or alternatively in a mainstream imperative programming language, such as Python. The outcome would be similar to how the appraisal theories of emotion have been simulated by software agents, as reported by Si, Marsella, & Pynadath (2010), but it would be based on a different paradigm – the constructional view of how the brain works.

Barrett, L. F. (2017). How emotions are made: The secret life of the brain. Houghton Mifflin Harcourt.
Si, M., Marsella, S. C., & Pynadath, D. V. (2010). Modeling appraisal in theory of mind reasoning. Autonomous Agents and Multi-Agent Systems, 20(1), 14.

Eliciting and representing emotional requirements with colours

Kuldar Taveter (kuldar dot taveter [ät] ut [dot] ee)

Motivational modelling (Sterling & Taveter, 2009; Miller, Pedell, Lopez-Lorca, Mendoza, Sterling, & Keirnan, 2015) is a method that allows ethnographers and requirements engineers to elicit and represent emotional requirements for sociotechnical systems related to the goals to be achieved. The central artefact of motivational modelling is the goal model. In a motivational goal model, hierarchically arranged parallelograms stand for functional goals representing what the system should do, whereby each sub-goal represents an aspect of achieving its parent goal. Corresponding quality goals, representing how the system should be, are associated with functional goals and are represented by clouds. Corresponding emotional goals, represented by hearts, indicate how stakeholders should feel when interacting with the system or, in other words, describe what emotions should be constructed in the minds of the system’s users, based on the theory of constructed emotion (Barrett, 2017). The purpose of this Master’s work would be to find out whether there is any objective basis for attaching colours to different emotions, based on studies conducted so far and the available literature. If such a basis is found, the thesis should propose a scheme for assigning colours to emotional goals and also address how this could be utilized, apart from modelling, in virtual reality applications.
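The goal model structure described above could be represented in code roughly as follows. This is only a minimal sketch: the class name Goal, the example goal names and the colour table EMOTION_COLOURS are hypothetical and are not taken from the cited works. In particular, the colour assignment is a placeholder for whatever evidence-based scheme the thesis might arrive at.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Goal:
    """A functional goal (parallelogram) with attached quality and emotional goals."""
    name: str
    quality_goals: List[str] = field(default_factory=list)    # drawn as clouds
    emotional_goals: List[str] = field(default_factory=list)  # drawn as hearts
    sub_goals: List["Goal"] = field(default_factory=list)

    def all_emotional_goals(self) -> List[str]:
        """Collect emotional goals of this goal and all sub-goals, depth first."""
        collected = list(self.emotional_goals)
        for sub in self.sub_goals:
            collected.extend(sub.all_emotional_goals())
        return collected


# Hypothetical example model; the goal names are illustrative only.
model = Goal(
    name="Handle emergency",
    quality_goals=["Timely"],
    emotional_goals=["Safe"],
    sub_goals=[
        Goal(name="Request help", emotional_goals=["In control"]),
        Goal(name="Receive help", quality_goals=["Reliable"],
             emotional_goals=["Cared for"]),
    ],
)

# Placeholder colour scheme (an assumption, not a research result).
EMOTION_COLOURS: Dict[str, str] = {
    "Safe": "green",
    "In control": "blue",
    "Cared for": "orange",
}


def colour_of(emotion: str) -> str:
    """Look up the colour for an emotional goal, or report it as unassigned."""
    return EMOTION_COLOURS.get(emotion, "unassigned")


for emotion in model.all_emotional_goals():
    print(emotion, "->", colour_of(emotion))
```

Keeping the colour scheme in a separate table means the same goal model can be rendered with different candidate schemes while they are being compared.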

Barrett, L. F. (2017). How emotions are made: The secret life of the brain. Houghton Mifflin Harcourt.
Sterling, L., & Taveter, K. (2009). The Art of Agent-Oriented Modeling. Cambridge, MA, and London, England: MIT Press.
Miller, T., Pedell, S., Lopez-Lorca, A. A., Mendoza, A., Sterling, L., & Keirnan, A. (2015). Emotion-led modelling for people-oriented requirements engineering: The case study of emergency systems. Journal of Systems and Software, 105, 54-71.

Analysis, formalization, and application of Ross’ Business Rules Diagrams

Kuldar Taveter (kuldar dot taveter [ät] ut [dot] ee)

The purpose of this Master’s project is to analyse the feasibility of the notation for business rules proposed by Ronald G. Ross – Ross’ Business Rules Diagrams (see, for example, http://www.businessrulesgroup.org/first_paper/br01c0.htm), relying, among other materials, on the paper by Taveter & Wagner (2001). The Master’s project should also find out if and how Ross’ Business Rules Diagrams can be formalized by, e.g., predicate logic, Object Role Modeling, Object Constraint Language, etc. The thesis should relate Ross’ Business Rules Diagrams to the goal and domain models by Sterling & Taveter (2009) and Miller, Lu, Sterling, Beydoun, & Taveter (2014). Finally, the thesis should compile an honest analysis of the applicability of Ross’ Business Rules Diagrams, based on real-life case studies, preferably conducted by the Master’s student herself/himself. In addition, the Master’s student should find out whether Ross’ Business Rules Diagrams are geared towards relational databases or are equally usable in the context of NoSQL databases.

Taveter, K., & Wagner, G. (2001). Agent-Oriented Enterprise Modeling Based on Business Rules. In: H.S. Kunii, S. Jajodia, and A. Solvberg (Eds.): Conceptual Modeling (ER 2001). Springer, 527-540.
Sterling, L., & Taveter, K. (2009). The Art of Agent-Oriented Modeling. Cambridge, MA, and London, England: MIT Press.
Miller, T., Lu, B., Sterling, L., Beydoun, G., & Taveter, K. (2014). Requirements Elicitation and Specification Using the Agent Paradigm: The Case Study of an Aircraft Turnaround Simulator. IEEE Transactions on Software Engineering, 40, 1007-1024.

Goal modelling for Xatkit

Kuldar Taveter (kuldar dot taveter [ät] ut [dot] ee) and Jordi Cabot

Xatkit (https://xatkit.com) is an open source platform that allows anyone to easily create and deploy single chatbots using a domain-specific high-level chatbot definition language. Xatkit takes care of translating these chatbot specifications into the actual running bot. A specification involves a set of intents, where each intent represents a possible intention the customer has when interacting with the bot, and for each intent the corresponding reaction to be executed – either a text reply as a part of the conversation, a call to an external service, possibly involving a person, or both. Intents are recognized via a Natural Language Understanding (NLU) component. The purpose of the Master’s thesis is to explore the possible role of goal modelling, as put forward by Sterling & Taveter (2009) and Miller, Lu, Sterling, Beydoun, & Taveter (2014), in deciding the intentions the customer might have when interacting with the chatbot. The resulting intentions are then specified as the intents for the chatbot. This method is based on systematic hierarchical modelling of the goals of the customer, including functional goals, answering the question “What should be accomplished?”, as well as quality goals, characterizing what qualities should be considered when achieving the functional goals, and emotional goals, characterizing what the customer should or should not feel when interacting with the chatbot to achieve the functional goals.

Sterling, L., & Taveter, K. (2009). The Art of Agent-Oriented Modeling. Cambridge, MA, and London, England: MIT Press.
Miller, T., Lu, B., Sterling, L., Beydoun, G., & Taveter, K. (2014). Requirements Elicitation and Specification Using the Agent Paradigm: The Case Study of an Aircraft Turnaround Simulator. IEEE Transactions on Software Engineering, 40, 1007-1024.

An Empirical Study on evolving communities on a dynamic temporal transportation network.

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Angelo Furno

Traffic prediction has been studied in the past using various techniques, specifically traffic flow models, statistical methods and machine learning techniques [1]. In this work, we are interested in modelling road/transportation networks as complex networks, taking inspiration from social network approaches. We would like to model the traffic flowing through the different roads as information flow. Dynamic communities are expected to provide indications of congested states of the transport system as well as an innovative and powerful spatio-temporal representation. In this work, we plan to study and identify dynamic but similar communities, using a transportation network dataset as a use case. A dataset will be provided.
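One common way to follow communities over time is to detect them separately in each network snapshot and then match communities across consecutive snapshots by set overlap. The sketch below illustrates only the matching step, using Jaccard similarity. The snapshots, the road segment identifiers and the 0.5 threshold are hypothetical, and the project may well use a different tracking method.

```python
from typing import Dict, List, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_communities(
    before: List[Set[str]], after: List[Set[str]], threshold: float = 0.5
) -> Dict[int, Optional[int]]:
    """For each community in `before`, return the index of the most similar
    community in `after`, or None if no similarity reaches `threshold`."""
    matches: Dict[int, Optional[int]] = {}
    for i, old in enumerate(before):
        best_j: Optional[int] = None
        best_score = 0.0
        for j, new in enumerate(after):
            score = jaccard(old, new)
            if score > best_score:
                best_j, best_score = j, score
        matches[i] = best_j if best_score >= threshold else None
    return matches


# Hypothetical snapshots: road segments grouped into communities at two time steps.
snapshot_t0 = [{"r1", "r2", "r3"}, {"r4", "r5"}]
snapshot_t1 = [{"r1", "r2", "r3", "r6"}, {"r7", "r8"}]

print(match_communities(snapshot_t0, snapshot_t1))
```

In this toy example the first community survives and grows by one segment, while the second has no counterpart in the later snapshot and is reported as None.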

References:

1) Nagehan İlhan, Şule Gündüz Öğüdücü, Feature identification for predicting community evolution in dynamic social networks, Engineering Applications of Artificial Intelligence, Volume 55, 2016, Pages 202-218

2) Xu K.S., Kliger M., Hero A.O. (2011) Tracking Communities in Dynamic Social Networks. In: Salerno J., Yang S.J., Nau D., Chai SK. (eds) Social Computing, Behavioral-Cultural Modeling and Prediction. SBP 2011. Lecture Notes in Computer Science, vol 6589. Springer, Berlin, Heidelberg

3) http://www.jatit.org/volumes/Vol95No22/17Vol95No22.pdf

4) https://arxiv.org/pdf/1711.02053.pdf

5) M. Takaffoli, R. Rabbany and O. R. Zaïane, "Community evolution prediction in dynamic social networks," 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014), Beijing, 2014, pp. 9-16.

Traffic prediction using complex network features.

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Angelo Furno

Traffic prediction has been studied in the past using various techniques, specifically traffic flow models, statistical methods and machine learning techniques [1]. In this work, we are interested in modelling road/transportation networks as complex networks, taking inspiration from social network approaches. We would like to model the traffic flowing through the different roads as information flow. In such settings, we would like to study the problem in terms of (weighted) link prediction [2]. In particular, we would be interested in studying the problem by combining social network approaches with machine learning techniques. The problem has not been studied in the past by coupling these two approaches. A dataset will be provided.
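As a rough illustration of link prediction, the sketch below ranks currently unconnected node pairs by the Adamic-Adar score, one of the neighbourhood-based predictors surveyed in [2]. The toy graph and node names are hypothetical. Edge weights are ignored here, so this is an unweighted baseline rather than the weighted formulation the thesis would target.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: List[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency-set graph from an edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def adamic_adar(graph: Graph, u: str, v: str) -> float:
    """Sum of 1 / log(degree(z)) over the common neighbours z of u and v."""
    return sum(
        1.0 / math.log(len(graph[z]))
        for z in graph[u] & graph[v]
        if len(graph[z]) > 1  # guard against log(1) == 0
    )


def rank_candidate_links(graph: Graph) -> List[Tuple[str, str, float]]:
    """Score every unconnected node pair and sort by descending score."""
    scores = [
        (u, v, adamic_adar(graph, u, v))
        for u, v in combinations(sorted(graph), 2)
        if v not in graph[u]
    ]
    return sorted(scores, key=lambda item: item[2], reverse=True)


# Hypothetical toy network: nodes are intersections, edges are road links.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
ranked = rank_candidate_links(build_graph(edges))

for u, v, score in ranked:
    print(f"{u}-{v}: {score:.3f}")
```

Scores of this kind could serve as features for a machine learning model, which is one possible reading of coupling the two approaches.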

References:

1) Shang Q, Lin C, Yang Z, Bing Q, Zhou X. A Hybrid Short-Term Traffic Flow Prediction Model Based on Singular Spectrum Analysis and Kernel Extreme Learning Machine. PLoS One. 2016;11(8):e0161259. Published 2016 Aug 23. doi:10.1371/journal.pone.0161259

2) David Liben-Nowell and Jon Kleinberg. 2003. The link prediction problem for social networks. In Proceedings of the twelfth international conference on Information and knowledge management (CIKM '03). ACM, New York, NY, USA, 556-559. DOI: https://doi.org/10.1145/956863.956972 Link Prediction: https://kdd2012.sigkdd.org/sites/images/summerschool/Jure-Leskovec-part1.pdf

Analysing Server Logs for Predicting Job Failures.

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Alina Sirbu

Server logs are files created to monitor the activities performed on servers. In recent years, a lot of research has analysed server logs to determine the status of the jobs or tasks that arrive on servers. In this thesis, you will analyse logs from a Google cluster, which is a set of machines responsible for running real Google jobs, for example, search queries. The research falls within the domain of large-scale predictive analytics. The main contribution of the thesis will be a model to predict job failures on servers. A real dataset of Google traces will be provided along with related literature to ramp up the learning process.

References:

A. Rosà, L. Y. Chen and W. Binder, "Predicting and Mitigating Jobs Failures in Big Data Clusters," 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Shenzhen, 2015, pp. 221-230. doi: 10.1109/CCGrid.2015.139

H. Fadishei, H. Saadatfar and H. Deldari, "Job failure prediction in grid environment based on workload characteristics," 2009 14th International CSI Computer Conference, Tehran, 2009, pp. 329-334. doi: 10.1109/CSICC.2009.5349381

Dynamic Routing/ Learning in networks for finding optimal path

Anurag Singh and Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee)

Robustness of algorithms for the optimization of networks is an under-explored area. Despite the importance and success of optimization theory in diverse graph problems, the area still has many open issues when dealing with problems such as network resource allocation. A resource can be bandwidth or any sort of information that should be allocated efficiently in the network. Most existing work relies on defining the data constraints and optimizing an objective function. However, due to the exponential increase in data, traditional optimization algorithms do not fit real-life applications that can be formulated as graph problems. Also, most existing algorithms for solving optimization problems in networks are centralized and therefore cannot be used for network resource allocation problems where distributed solutions are required. This problem can be addressed using deep reinforcement learning (DRL). In this project, a distributed robust optimization framework will be designed for efficient resource allocation in networks. Fast recovery schemes will also be incorporated into the proposed framework to guarantee uninterrupted services in the case of link failure. The focus of the project will be on communication networks. A centralized as well as a distributed solution will be proposed for the resource allocation optimization problem using deep reinforcement learning. The distributed framework will be able to handle data from up to millions or billions of users. The proposed framework will be evaluated on real-life applications to demonstrate its efficiency over existing solutions.

References:

P. Hu, K. C. C. Chan, and T. He, “Deep Graph Clustering in Social Network.,” in the Proceedings of the 26th International Conference on World Wide Web Companion., 2017, pp. 1425–1426.

Optimization: Identify Critical Links,” IEEE Transactions on Network Science and Engineering, vol. 4697, no. c, pp. 1–13, 2018.

Fake news Detector: A user-assisted fake news system with explainable AI.

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee)

Fake news has attracted a lot of attention in the last few years, especially due to concerns that it helps spread violence and hatred in society. A plethora of approaches have been proposed to counter this problem, especially using machine learning and deep learning technologies. In this thesis, we plan to propose a fake news detector that combines user assistance with explainable artificial intelligence (AI): it helps detect fake news by explaining to users the reasons for tagging an article as possible fake news, and then obtains confirmation from the users through a crowdsourcing mechanism.

Identifying Fake News using Linked Data and Network Science Approaches

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Deepak Padmanabhan

Fake news is often generated with the malicious intent of spreading misinformation and rumours. The content in fake news is generally created to mislead readers in order to gain financially or politically, as well as to grab attention. Apart from social media such as Twitter, Facebook and WhatsApp, there are dedicated news agencies that propagate fake news.

The goal of this thesis is to use the content of news stories to classify them as fake or not by using “Linked Data” in combination with “Network Science” approaches. The linked data approach will be used for identifying fake news indicators, such as enhanced topical scatter, in the news content to be analyzed. The network science approach will be used for identifying the similarity among the topics of the content to boost the accuracy of fake news detection. This involves analysis of a corpus of news stories that will be collected for the purpose of this project. Guidance on network science and Linked Data will be provided to get started on the project.

Zhou, Xinyi, and Reza Zafarani. "Fake news: A survey of research, detection methods, and opportunities." arXiv preprint arXiv:1812.00315 (2018).

Jiawei Zhang, Bowen Dong, Philip S. Yu. FAKEDETECTOR: Effective Fake News Detection with Deep Diffusive Neural Network. https://arxiv.org

Thota, Aswini; Tilak, Priyanka; Ahluwalia, Simrat; and Lohia, Nibrat (2018) "Fake News Detection: A Deep Learning Approach," SMU Data Science Review: Vol. 1 : No. 3 , Article 10. Available at: https://scholar.smu.edu/datasciencereview/vol1/iss3/10

Predicting Financial transactions:

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee)

Financial transactions can often be used for inferring the economic patterns of a society. However, this is different for alternative but less popular Bitcoin transactions. In this thesis, you will analyse a Bitcoin dataset (we have the dataset) using network science and machine learning techniques. The aim of this thesis is to predict financial transactions (edge and weight prediction in terms of graph theory) using data science by exploiting network science for feature creation.

References:

Antulov-Fantulin, Dijana Tolic, Matija Piskorec, Zhang Ce, Irena Vodenska, Inferring short-term volatility indicators from Bitcoin blockchain, Complex Networks and Their Applications VII. COMPLEX NETWORKS 2018. Studies in Computational Intelligence, vol 813. Springer, https://doi.org/10.1007/978-3-030-05414-4_41

Agarwal RR, Lin C-C, Chen K-T, Singh VK (2018) Predicting financial trouble using call data—On social capital, phone logs, and financial trouble. PLoS ONE 13(2): e0191863. https://doi.org/10.1371/journal.pone.0191863

Hackathons as catalysts for future job opportunities

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee), Alex Nolte, Irene-Angelica Chounta

Hackathons are often perceived as events during which participants can expand their personal networks and develop or showcase their skills for future job opportunities. It is thus common for participants to participate in several hackathons that cover different themes and that take place in different locations.

The goal of this master thesis is to develop an understanding of the connections between hackathon participants and the potential impact of those connections on future job opportunities. As a starting point you will work with an existing dataset which covers roughly 120,000 hackathon participants (Devpost). Most of those participant profiles are connected to personal GitHub repositories, private websites or LinkedIn profiles. The student will analyze these data from a social network (or network science) perspective to understand the relations among the hackathon participants. Basic concepts and libraries to be used for network science and social network data analysis will be provided to speed up the process.
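One possible starting point for the network analysis is a co-participation network, in which two participants are linked if they attended the same hackathon. The sketch below builds such a network and computes degree centrality. The event and participant names are hypothetical and do not reflect the structure of the actual Devpost dataset, and co-participation is only one of several ways the relations could be defined.

```python
from itertools import combinations
from typing import Dict, List, Set


def co_participation_network(events: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Link two participants if they attended at least one common hackathon."""
    network: Dict[str, Set[str]] = {}
    for participants in events.values():
        unique = sorted(set(participants))
        for person in unique:
            network.setdefault(person, set())  # keep isolated participants as nodes
        for a, b in combinations(unique, 2):
            network[a].add(b)
            network[b].add(a)
    return network


def degree_centrality(network: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(network)
    if n <= 1:
        return {node: 0.0 for node in network}
    return {node: len(neighbours) / (n - 1) for node, neighbours in network.items()}


# Hypothetical toy data: hackathon -> list of participant identifiers.
events = {
    "hack_a": ["ann", "bob", "cat"],
    "hack_b": ["cat", "dan"],
    "hack_c": ["eve"],
}

centrality = degree_centrality(co_participation_network(events))
for person, value in sorted(centrality.items()):
    print(person, value)
```

Here the participant who attended two events ends up with the highest centrality, which is the kind of signal that could later be related to job outcomes.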

Measuring Corporate Reputation through Online Social Media

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee)

When businesses are caught engaging in illegal or immoral activities, their reputation might suffer. Corporate reputation is a reflection of how a business is regarded by its customers and the public in general. If corporate misbehaviour negatively affects a business’ reputation, customers might switch to rival businesses. For this reason, reputation plays a central role in free markets, as it has the potential to deter businesses from misbehaving.

The extent to which corporate wrongdoings trigger a reputational loss is still debated and is the subject of a large body of academic work. Most of these works use survey methods to measure reputation. This research relies on a more direct method to measure reputational changes, by conducting a sentiment analysis of 1) how the public reacted on Twitter, 2) the voice of experts on PistonHeads, and 3) traditional media. In this thesis, corporate reputation will be studied using the Volkswagen (VW) scandal. The dataset and related literature will be provided to speed up the work.

A number of interesting questions could be analysed: 1) What is the difference in how these sources respond to the scandal? 2) For which type of data is sentiment classification easiest? 3) How do the sentiment signals compare with stock prices, and where is the correlation highest? Historical stock prices are free to download from various online sources.
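For the third question, a simple first check is the Pearson correlation between a daily sentiment series and the corresponding stock prices. The sketch below computes it with the standard library only. Both series are made-up illustrative numbers, not real Twitter or Volkswagen data, and the function name pearson is our own.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values: mean daily sentiment score and closing price.
daily_sentiment = [0.2, 0.1, -0.4, -0.6, -0.3]
closing_price = [160.0, 158.0, 130.0, 120.0, 125.0]

print(f"correlation: {pearson(daily_sentiment, closing_price):.3f}")
```

Repeating the calculation per source (Twitter, expert forum, traditional media) would show where the correlation is highest. Lagged versions of the series may also be worth comparing, since markets and public sentiment need not react on the same day.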

References:

Bachmann, Rüdiger and Ehrlich, Gabriel and Ruzic, Dimitrije, Firms and Collective Reputation: The Volkswagen Emission Scandal as a Case Study (January 11, 2018). CESifo Working Paper Series No. 6805. Available at SSRN: https://ssrn.com/abstract=3124125

To be or not to be Mean: Exploring mean (or altruistic) behavior of influential people on Twitter.

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee)

Would you be interested in exploring famous people on Twitter? Are they mean or altruistic in nature? In this thesis, you will analyse a large number of influential people from different sectors (1) influential authors, 2) influential chefs, 3) influential researchers (academicians), 4) influential people from media, cinema, etc.) on Twitter using their Twitter activity. We expect the student to use NLP techniques such as sentiment analysis and topic modeling to understand the topics about which influential people tweet. Are they tweeting about social causes, or are they promoting themselves?

References:

D Quercia, R Lambiotte, D Stillwell, M Kosinski. The personality of popular facebook users. Proceedings of the ACM 2012 conference on computer, 2012.

Bakshy, Eytan & M. Hofman, Jake & Mason, Winter & Watts, Duncan. (2011). Everyone's an Influencer: Quantifying Influence on Twitter. Proceedings of the 4th ACM International Conference on Web Search and Data Mining, WSDM 2011. 65-74. 10.1145/1935826.1935845.

Ma, Xingjun & Li, Chunping & Bailey, James & Wijewickrema, Sudanthi. (2017). Finding Influentials in Twitter: A Temporal Influence Ranking Model.

I hate you like I love you: A case study of likes and dislikes on Youtube.

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee)

Would you be interested in performing an empirical study of likes and dislikes on YouTube? This thesis will study various English-language YouTube videos from celebrities and less famous people. Videos from different categories (food, music, dance, technology, sports, motivational, etc.) will be analysed. Along with counting likes and dislikes and recording the length of the videos and their country of origin (for example, USA or UK), the student will also perform NLP (text analysis) on the comments to identify the important aspects of the videos.

References:

Mathias Bärtl (2018). YouTube channels, uploads and views: A statistical analysis of the past 10 years. Sage publications. Volume: 24 issue: 1, page(s): 16-32

Happy and Peaceful: Analysing Twitter to identify happy and peaceful regions.

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee)

In this thesis, we will analyse large-scale Twitter data to identify happy and peaceful regions. Some of the interesting questions to answer would be: 1) Which is more prevalent: peace or hatred? 2) Which cities are more peaceful and which are more violent? It would be interesting to cross-check the results of the Twitter analysis with an alternative source such as online news media channels.

Alternatively, you can also compare tweets of influential or famous people against those of non-famous people and then check for happiness and the terms they use.

References:

D Quercia, R Schifanella, LM Aiello. The shortest path to happiness: Recommending beautiful, quiet, and happy routes in the city The 25th ACM Conference on Hypertext and Social Media

D Quercia, J Ellis, L Capra, J Crowcroft. Tracking gross community happiness from tweets. Proceedings of the ACM 2012 conference on computer …, 2012

D Quercia, J Ellis, L Capra, J Crowcroft. In the mood for being influential on twitter. 2011 IEEE Third International Conference on Privacy …, 2011

Welcome to my couch: Why some people attract more hosts and guests than others on couchsurfing?

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee)

Couchsurfing is a social platform which helps travellers find a free couch (or place to stay). However, not everyone is lucky enough to find hosts on this platform. It has been observed that some people have to put more effort into looking for hosts than others. Similarly, some hosts are more popular than others. In this thesis, using a random sampling method, we will collect various popular and non-popular guests and hosts. Then, using NLP and text analytics approaches, we will try to identify the possible characteristics of popular and non-popular guests and hosts.

References:

Debra Lauterbach, Hung Truong, Tanuj Shah, Lada A. Adamic. Surfing a Web of Trust: Reputation and Reciprocity on CouchSurfing.com. Proceedings IEEE CSE'09, 12th IEEE International Conference on Computational Science and Engineering, August 29-31, 2009, Vancouver, BC, Canada

Tan, Jun-E. (2010). The Leap of Faith from Online to Offline: An Exploratory Study of Couchsurfing.org. 6101. 367-380. 10.1007/978-3-642-13869-0_27.

Chen, D.-J. (2018). Couchsurfing: Performing the travel style through hospitality exchange. Tourist Studies, 18(1), 105–122. https://doi.org/10.1177/1468797617710597

The power of stars: An empirical analysis of successful and flop movies.

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee)

Every year hundreds of movies are released worldwide. Some of them turn out to be box office hits (successful) and some do not (flops). Various studies have analysed movies using social media platforms such as IMDb, Twitter, blogs, etc. Most of these works focus on predictive analytics, that is, predicting the box office results of movies. In contrast to previous works, in this thesis we will analyse a large number of movies (successful and unsuccessful) from Hollywood and Bollywood (different geographic and cultural backgrounds) to identify the characteristics that differentiate a successful movie from a flop.

References:

Mohanbir S Sawhney and Jehoshua Eliashberg. A parsimonious model for forecasting gross box-office revenues of motion pictures. Marketing Science, 15(2):113–131, 1996.

Suman Basuroy, Subimal Chatterjee, and S Abraham Ravid. How critical are critical reviews? the box office effects of film critics, star power, and budgets. Journal of marketing, 67(4):103– 117, 2003.

An exploratory study to characterize popular Vs. Non popular books

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Shirin Dora

Books are man's best friends; however, some fail to attract readers' attention. In this thesis, we will analyse the characteristics of popular vs. non-popular books. In particular, we will analyse a large corpus of books and their reviews to understand the typical characteristics (author, number of pages, chapters, genre) of popular and non-popular books.

Discriminatory Speech on Digital Media Platforms

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee) and Christian Simon Ritter

This thesis will explore how discriminatory (for example, anti-immigrant) and non-discriminatory (for example, pro-immigrant) groups spread their ideas (emergence and circulation) on digital media platforms (online social media). In particular, the student will identify a specific number of groups in each category. Three social media platforms will be selected, namely Instagram, Twitter and Facebook. Drawing on critical social science perspectives on group classifications and boundary maintenance within ethnic and religious online communities, the student will identify discriminatory narratives on digital media platforms. The project will involve mixed-method (qualitative and quantitative) research. By analyzing qualitative and quantitative data in parallel, the project will provide new insights into the circulation of hate (bridging) speech, exclusionary (or inclusive) narratives, and anti-immigration (pro-immigration) discourses on digital media platforms. A possible outcome of this work is a set of recommended strategies for more inclusive platform politics.

References:

1) Christopher A. Bail, Lisa P. Argyle, Taylor W. Brown, John P. Bumpus, Haohan Chen, M. B. Fallin Hunzaker, Jaemin Lee, Marcus Mann, Friedolin Merhout, Alexander Volfovsky. (2018) Exposure to opposing views on social media can increase political polarization. Proceedings of the National Academy of Sciences Sep 2018, 115 (37) 9216-9221; DOI: 10.1073/pnas.1804840115

2) Matuszewski, P., & Szabó, G. (2019). Are Echo Chambers Based on Partisanship? Twitter and Political Polarity in Poland and Hungary. Social Media + Society. https://doi.org/10.1177/2056305119837671

3) Bracey, G. & Moore, W. (2017) “Race Tests”: Racial Boundary Maintenance in White Evangelical Churches 87(2), 282-302. https://doi.org/10.1111/soin.12174

4) Burgess, J. & Matamoros-Fernandez, A. (2016) Mapping sociocultural controversies across digital media platforms: One week of #gamergate on Twitter, YouTube, and Tumblr. Communication Research and Practice 2(1), 79-96. https://doi.org/10.1080/22041451.2016.1155338

Measurement of companies' innovativeness using websites and social media data: application to Estonian data

Rajesh Sharma (rajesh dot sharma [ät] ut [dot] ee), Jaan Masso and Priit Vahter

A recent study [1] observed that companies closely follow other companies when innovating. In this thesis, we would like to perform a similar study in the Estonian context. The work will include web scraping of data from the websites of various Estonian companies and from online social media sources. A network will be created among companies that are involved in similar innovation or that follow other companies for a similar innovation. The work involves predicting innovative firms by creating features with techniques such as network science, web mining and text analytics, and then applying classical machine learning or deep learning approaches for prediction. The project would be supervised in cooperation with Jaan Masso and Priit Vahter from the Economics department.

References

[1] Jan Kinne and Janna Axenbeck, Web Mining of Firm Websites: A Framework for Web Scraping and a Pilot Study for Germany, ZEW Discussion Paper No. 18-033; Kinne, Jan and David Lenz (2019), Predicting Innovative Firms Using Web Mining and Deep Learning, ZEW Discussion Paper No. 19-001, Mannheim.

Topics related to software quality and software analytics (Dietmar Pfahl and Kristiina Rahkema)

Topic 1: Building a tool for detecting code smells in Android application code -- TOPIC HAS BEEN TAKEN!!!

Supervisors: Kristiina Rahkema (kristiina dot rahkema ät gmail dot com) and Dietmar Pfahl (dietmar dot pfahl ät ut dot ee)

Code smells and their detection have been studied a lot for desktop applications. For mobile applications the focus has been on the Android platform. Most of these research papers concentrate on a small number of code smells [1][2][3]. In [4] researchers looked at a larger set of code smells, but they relied on a commercial tool that is not available anymore. They also compared Android code smells and Java desktop code smells and concluded that the results should be transferable to other mobile and desktop platforms.

We are currently working on building a tool that finds code smells in Swift (iOS and macOS) applications. The objective is to implement as many code smells as possible (including 22 OOP code smells from Fowler and code smells studied in [4]). The approach is similar to PAPRIKA [5] (a tool used to find certain code smells in Android code), with some extensions. We would like to get an overview of code smells occurring in iOS and macOS applications and see if the results are comparable to Android and Java desktop applications as stated in [4].

The objective of the master thesis would be to implement detection for the same code smells for Java (and potentially Kotlin) applications. This would make it possible to have a direct comparison of code smell occurrences in iOS and Android. It would also be interesting to see how our interpretation of these code smells compares to the commercial tool inFusion used in article [4].

For the student this work will give a good overview of mobile application and general OOP code smells. This work should also give a good overview of how to analyse Java source code and build static analysis tools. It might be possible to extend the PAPRIKA tool or to write a tool from scratch similar to our Swift analysis tool.

Literature:

[1] Kessentini, M., & Ouni, A. (2017, May). Detecting android smells using multi-objective genetic programming. In Proceedings of the 4th International Conference on Mobile Software Engineering and Systems (pp. 122-132). IEEE Press.
[2] Hecht, G. (2015, May). An approach to detect Android antipatterns. In Proceedings of the 37th International Conference on Software Engineering-Volume 2 (pp. 766-768). IEEE Press.
[3] Habchi, S., Hecht, G., Rouvoy, R., & Moha, N. (2017, May). Code Smells in iOS Apps: How do they compare to Android? In 2017 IEEE/ACM 4th International Conference on Mobile Software Engineering and Systems (MOBILESoft) (pp. 110-121). IEEE.
[4] Mannan, U. A., Ahmed, I., Almurshed, R. A. M., Dig, D., & Jensen, C. (2016, May). Understanding code smells in Android applications. In 2016 IEEE/ACM International Conference on Mobile Software Engineering and Systems (MOBILESoft) (pp. 225-236). IEEE.
[5] https://github.com/GeoffreyHecht/paprika

Topic 2: Case Study on Exploratory Testing

Supervisor: Dietmar Pfahl (dietmar dot pfahl ät ut dot ee)

Exploratory software testing (ET) is a powerful and fun approach to testing. The plainest definition of ET is that it comprises test design and test execution at the same time. This is the opposite of scripted testing (having test plans and predefined test procedures, whether manual or automated). Exploratory tests, unlike scripted tests, are not defined in advance and carried out precisely according to plan.

Testing experts like Cem Kaner and James Bach claim that, in some situations, ET can be orders of magnitude more productive than scripted testing, and a few empirical studies exist supporting this claim to some degree. Nevertheless, ET is often confused with (unsystematic) ad-hoc testing and thus not always well regarded in either academia or industrial practice.

The objective of this project will be to conduct a case study in a software company investigating the following research questions:

To what extent is ET currently applied in the company?
What are the advantages/disadvantages of ET as compared to other testing approaches (i.e., scripted testing)?
How can the current practice of ET be improved?
If ET is currently not used at all, what guidance can be provided to introduce ET in the company?

The method applied is a case study. Case studies follow a systematic approach as outlined in: Guidelines for conducting and reporting case study research in software engineering by Per Runeson and Martin Höst. Important elements of the thesis are literature study, measurement and interviews with experts in the target company.

This project requires that the student has (or is able to establish) access to a suitable software company to conduct the study. In addition, the student must have a company co-supervisor. The exact thesis topic and goal(s) must be agreed between the student, the company co-supervisor, and me BEFORE the thesis project can start.

Topic 3: Case Study on Test Automation

Supervisor: Dietmar Pfahl (firstname dot lastname ät ut dot ee)

Similar to the case study project on Exploratory Testing (see above), a student can work in a company to analyse the current state of the practice of test automation. The objective of this project will be to investigate the following research questions:

To what extent is test automation currently applied in the company (i.e., what test-related activities are currently automated and how is this done)?
What are the perceived strengths/weaknesses of the currently applied test automation techniques and tools?
How can the current practice of test automation be improved (i.e., how can the currently automated test process steps be made more productive, and what steps currently done manually are promising to be automated)?

Topic 4: Case Study on Balancing Manual and Automated Testing

Supervisor: Dietmar Pfahl (firstname dot lastname ät ut dot ee)

Similar to the case study projects on Exploratory and Automated Testing (see Topics 2 and 3 above), a student can work in a company and investigate the balancing of manual and automated testing. The objective of this project will be to investigate the following research questions:

To what extent is test automation currently applied in the company (i.e., what test-related activities are currently automated and how is this done)?
What are the perceived strengths/weaknesses of the currently applied test automation techniques and tools?
To what extent is manual testing currently applied in the company?
What are the advantages/disadvantages of manual testing?
How can automated and manual testing be combined to optimize the overall effectiveness of testing?

Topic 5: Using Data Mining & Machine Learning to Support Decision-Makers in SW Development/Testing/Management

Supervisor: Dietmar Pfahl (firstname dot lastname ät ut dot ee)

Project repositories contain much data about software development activities ongoing in a company. In addition, there exists much data from open source projects. This opens up opportunities for analysis and learning from the past, which can be converted into models that help make better decisions in the future, where 'better' can relate to either 'more efficient' (i.e., cheaper) or 'more effective' (i.e., with higher quality).

For example, we have recently started a research activity that investigates whether textual descriptions contained in issue reports can help predict the time (or effort) that a new incoming issue will require to be resolved.

There are, however, many more opportunities, e.g., analysing bug reports to help triagers assign issues to developers. And of course, there are other documents that could be analysed: requirements, design docs, code, test plans, test cases, emails, blogs, social networks, etc. But not only the application can vary, also the analysis approach can vary. Different learning approaches may have different efficiency and effectiveness characteristics depending on the type, quantity and quality of data available.

Thus, this topic can be tailored according to the background and preferences of an interested student.

If you are a first year student and planning to do your thesis in 2019/20, it is also possible to combine the thesis project with an ERASMUS Traineeship. Currently, two of my students are doing such a traineeship with industry-oriented research centres in Germany and Austria. In such a setting, the topic must also be negotiated with the receiving research centre. Only top-performing students can get admission to an ERASMUS traineeship opening.

Tasks to be done (after definition of the exact topic/research goal):

Selection of suitable data sources
Application of machine learning / data mining technique(s) to create a decision-support model
Evaluation of the decision-support model
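
As an illustration of the three tasks above for the issue-resolution example, the following minimal sketch predicts the resolution time of a new issue from the most textually similar past issues and evaluates the predictor with leave-one-out. The data, the function names and the choice of a nearest-neighbour predictor with Jaccard similarity are all assumptions made for illustration; a thesis would work with real repository data and proper learning libraries.

```python
import re
from statistics import mean

def tokenize(text):
    """Lower-case a text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def predict_hours(history, new_text, k=2):
    """Average the resolution times of the k most similar past issues."""
    new_tokens = tokenize(new_text)
    ranked = sorted(
        history,
        key=lambda item: jaccard(tokenize(item[0]), new_tokens),
        reverse=True,
    )
    return mean([hours for _, hours in ranked[:k]])

def mean_absolute_error(history, k=2):
    """Leave-one-out evaluation: predict each issue from all the others."""
    errors = []
    for i, (text, hours) in enumerate(history):
        rest = history[:i] + history[i + 1:]
        errors.append(abs(predict_hours(rest, text, k) - hours))
    return mean(errors)

# Hypothetical, hand-made history of (issue description, hours to resolve).
HISTORY = [
    ("Login page crashes on empty password", 4.0),
    ("Crash when password field is empty on login", 6.0),
    ("Update copyright year in footer", 0.5),
    ("Typo in footer text", 0.5),
]

if __name__ == "__main__":
    print(predict_hours(HISTORY, "App crashes on login with empty password"))
    print(mean_absolute_error(HISTORY))
```

The nearest-neighbour predictor is only a placeholder; the same select, model, evaluate structure applies to any other learning approach.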

Prerequisite: Students interested in this topic should have completed one of the courses on data mining / machine learning offered in the Master of Software Engineering program with a grade of 'B' or better.

A Multi-objective Issue Recommender System

Supervisor: Ezequiel Scott (ezequiel dot scott ät ut dot ee)

In agile software development, issue allocation is often based on self-assignment. That is, developers choose the issues (e.g. user stories, bug reports) that they will develop during the sprint based on their own preferences and experience, which can be difficult for inexperienced developers. In this context, a recommender system can help developers choose their issues. However, typical issue recommender systems have the goal of producing a list of issues that optimizes one measure of interestingness, such as accuracy. Suggesting issues that are simultaneously accurate, novel and diverse is much more challenging, since the attempt to improve an additional measure may result in worsening other measures [1]. The goal of this project is to build and evaluate a multi-objective recommender system to aid developers during the assignment of issues.
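
To illustrate why several measures cannot simply be maximized at once, the sketch below keeps only the Pareto-efficient candidates, that is, the issues that no other issue beats on every measure. The issue identifiers and scores are hypothetical, and this is a generic Pareto filter rather than the specific approach of [1].

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b on every measure
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the names of the candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]

# Hypothetical issues scored as (accuracy, novelty, diversity).
ISSUES = {
    "ISSUE-1": (0.9, 0.2, 0.3),
    "ISSUE-2": (0.6, 0.7, 0.5),
    "ISSUE-3": (0.5, 0.6, 0.4),  # worse than ISSUE-2 on every measure
    "ISSUE-4": (0.3, 0.9, 0.8),
}

if __name__ == "__main__":
    print(pareto_front(ISSUES))
```

Here ISSUE-3 is dropped because ISSUE-2 is better on all three measures, while the remaining issues represent different trade-offs that a recommender would still have to rank.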

References

[1] Ribeiro, M. T., Ziviani, N., Moura, E. S. D., Hata, I., Lacerda, A., & Veloso, A. (2015). Multiobjective pareto-efficient approaches for recommender systems. ACM Transactions on Intelligent Systems and Technology (TIST), 5(4), 53. https://dl.acm.org/citation.cfm?id=2629350
[2] Borg M., Runeson P. (2014) Changes, Evolution, and Bugs. In: Robillard M., Maalej W., Walker R., Zimmermann T. (eds) Recommendation Systems in Software Engineering. Springer, Berlin, Heidelberg. https://link.springer.com/chapter/10.1007/978-3-642-45135-5_18

Analyzing the Quality of User Stories in Agile Software Projects

Supervisor: Ezequiel Scott (ezequiel dot scott at ut dot ee)

Requirements are usually expressed as User Stories in agile software development. Although User Stories are expected to follow a fixed structure (“As <a role> I want to <a feature> in order to <a benefit>”), they are still written using natural language and informal descriptions. Recent research has defined a framework to assess the quality of user stories [1] along with a tool to automatically detect errors in the description of the user stories. However, this approach has not been used with real datasets that come from real projects. The aim of this project is to explore how the quality of user stories evolves over time in open source projects and to check whether low-quality user stories lead to a larger number of bugs than high-quality ones.
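
As a rough illustration of what an automatic structural check could look like, the sketch below tests a story against the template quoted above using a regular expression. It is a simplified stand-in written for this page, not the quality framework of [1] or the AQUSA tool, and the pattern and messages are assumptions.

```python
import re

# Pattern for the template "As <a role> I want to <a feature> in order to <a benefit>".
# The benefit part is optional so that its absence can be reported separately.
STORY_PATTERN = re.compile(
    r"^As (?P<role>.+?),? I want to (?P<feature>.+?)"
    r"(?: in order to (?P<benefit>.+?))?\.?$",
    re.IGNORECASE,
)

def check_user_story(text):
    """Return a list of structural problems found in one user story."""
    match = STORY_PATTERN.match(text.strip())
    if match is None:
        return ["does not follow the 'As ... I want to ...' template"]
    problems = []
    if match.group("benefit") is None:
        problems.append("no benefit ('in order to ...') given")
    return problems

if __name__ == "__main__":
    stories = [
        "As a developer, I want to filter issues by label in order to find my tasks faster.",
        "As a user I want to reset my password",
        "Fix the login bug",
    ]
    for story in stories:
        print(story, "->", check_user_story(story))
```

Running such a check over the history of an issue tracker would give one simple quality signal per story that can then be related to the number of bugs reported later.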

References

[1] Lucassen, G., Dalpiaz, F., van der Werf, J.M.E. and Brinkkemper, S., 2016. Improving agile requirements: the quality user story framework and tool. Requirements Engineering, 21(3), pp.383-403. https://link.springer.com/article/10.1007/s00766-016-0250-x
[2] AQUSA Tool https://github.com/gglucass/AQUSA

A Semi-automatic Approach for User Story Splitting

Supervisor: Ezequiel Scott (ezequiel dot scott at ut dot ee)

User stories should be “small enough” before they are ready for implementation in an upcoming iteration [1]. In practice, many user stories do not satisfy this property, which can introduce misunderstandings during development. To deal with “large” user stories, practitioners apply patterns for splitting user stories [2] that allow for breaking up one user story into smaller ones. The goal of this project is to build a semi-automatic approach for user story splitting that supports developers during this task. To address this, you will apply NLP and ML techniques to identify user stories that potentially require splitting and suggest suitable patterns for that. You will work with data taken from several open-source projects.

References

[1] The INVEST criteria. https://xp123.com/articles/invest-in-good-stories-and-smart-tasks/
[2] User Story Splitting. https://www.agilealliance.org/glossary/split/

Improving the release planning of mobile apps using app-reviews

Supervisor: Ezequiel Scott (ezequiel dot scott at ut dot ee)

Software companies developing mobile apps make their users aware of new updates by implementing weekly or even daily releases. The engineering practice of continuous delivery aims to simplify release planning. However, many decisions still have to be made in release planning, such as what functionality should be included in the release, when the release should happen, and how much quality the release should have. The goal of this project is to use information extracted from mobile-app stores (e.g. app reviews written by end users) to develop an approach that will improve the release planning of mobile applications.

References

Nayebi, M., Adams, B., & Ruhe, G. (2016, March). Release Practices for Mobile Apps -- What do Users and Developers Think? In 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER) (Vol. 1, pp. 552-562). IEEE. https://ieeexplore.ieee.org/document/7476674

Interpretable Predictive Monitoring of Business Processes

Supervisor: Fabrizio Maggi (f.m.maggi@ut.ee)

Recent advances of supervised machine learning in various tasks stem from the use of powerful and complex models (neural networks, deep learning, random forests). However, adoption in practice remains challenging because of limited interpretability of these methods and low actionability (what should the user do to alter the ongoing process instance to improve the expected/predicted outcome). Lack of understandability and actionability poses a serious challenge in domains such as financial and medical services, where the understanding of the decision behind the prediction is crucial. As such, this thesis project goes beyond the state-of-the-art in predictive process monitoring by developing methods and techniques to translate complex predictive models into understandable knowledge for key stakeholders in the process. We will develop a system for predictive process monitoring that provides understandable explanations about the predictions to the users.

[1] https://dl.acm.org/citation.cfm?doid=3271482.3236009

Leveraging A-priori Knowledge in Predictive Business Process Monitoring

Supervisor: Fabrizio Maggi (f.m.maggi@ut.ee)

Predictive business process monitoring aims at leveraging past process execution data to predict how ongoing (uncompleted) process executions will unfold up to their completion. Nevertheless, cases exist in which, together with past execution data, some additional knowledge (a-priori knowledge) about how a process execution will develop in the future is available. This knowledge about the future can be leveraged for improving the quality of the predictions of events that are currently unknown. In this thesis, we will investigate how to use Deep Learning techniques to leverage knowledge about the structure of the process execution traces as well as a-priori knowledge about how they will unfold in the future for predicting the sequence of future activities of ongoing process executions. An idea of this approach can be found in https://link.springer.com/content/pdf/10.1007%2F978-3-319-65000-5_15.pdf

Tell it with your own words: Defining Business Process Models with Natural Language Processing and Speech Recognition

Supervisor: Fabrizio Maggi (f.m.maggi@ut.ee)

This is a new frontier for Business Process Management and a really interesting research topic. In this thesis, we will start from techniques for Speech Recognition and Natural Language Processing to create model editors that take the user's voice as input. The output of the speech recognition techniques will be used as input for Natural Language Processing that, in turn, will produce process models as explained in https://hanvanderaa.com/wp-content/uploads/2019/03/CAISE2019-Extracting-declarative-process-models-from-natural-language.pdf

Online Process Discovery from Event Streams

Supervisor: Fabrizio Maggi (f.m.maggi@ut.ee)

Stream processing is defined as “technologies designed to process large real-time streams of event data” and one of the example applications is process monitoring. The challenge of dealing with streaming event data is also discussed in the Process Mining Manifesto. A process discovery algorithm is a function that maps an event log onto a process model such that the model is representative of the behavior seen in the event log. In [1], an approach to automatically discover process models from streams of data has been presented. However, this approach did not consider data-aware conditions. In this thesis, we will extend the algorithm in [1] in order to generate data-aware business process models. To discover process models at runtime, we will use machine learning algorithms for incremental learning from data, such as Hoeffding trees.
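
To make the idea of discovering a model from a stream more concrete, the sketch below updates directly-follows counts one event at a time, without ever storing a complete event log. It is a deliberately simple illustration with hypothetical class and activity names; it is not the algorithm of [1], it ignores data-aware conditions and Hoeffding trees, and unlike real stream algorithms it does not bound its memory use.

```python
from collections import defaultdict

class StreamingDirectlyFollows:
    """Incrementally counts how often one activity directly follows another
    within the same case, processing one event at a time."""

    def __init__(self):
        self.last_activity = {}          # case id -> most recent activity seen
        self.counts = defaultdict(int)   # (previous, current) -> frequency

    def observe(self, case_id, activity):
        """Update the counts with a single event from the stream."""
        previous = self.last_activity.get(case_id)
        if previous is not None:
            self.counts[(previous, activity)] += 1
        self.last_activity[case_id] = activity

    def relations(self, min_count=1):
        """Return the directly-follows relations seen at least min_count times."""
        return {pair: n for pair, n in self.counts.items() if n >= min_count}

if __name__ == "__main__":
    # Hypothetical interleaved stream of (case id, activity) events.
    stream = [
        ("c1", "register"), ("c2", "register"),
        ("c1", "check"), ("c2", "check"),
        ("c1", "pay"), ("c2", "reject"),
    ]
    model = StreamingDirectlyFollows()
    for case_id, activity in stream:
        model.observe(case_id, activity)
    print(model.relations(min_count=2))
```

The current model can be queried at any point in the stream, which is the property that distinguishes online discovery from discovery on a completed log.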

[1] https://dl.acm.org/citation.cfm?doid=3202710.3203154

Methods and Tool for Prescriptive Monitoring of Business Processes

Supervisors: Marlon Dumas (marlon dot dumas ät ut dot ee) and Fabrizio Maggi

Monitoring and controlling the execution of business processes is an essential activity in modern organizations. A poorly executed business process may lead to direct financial losses (e.g. missed sales) as well as customer dissatisfaction. Managers and business analysts rely on business process monitoring tools to ensure that their business processes run smoothly. Traditionally, these monitoring tools are able to detect issues "after-the-fact", in other words, once they have occurred. For example, a process monitoring tool may show us a dashboard with the executions of the process that are running late, but it does not tell us which process executions are likely to run late in the future, nor how late they will be. The availability of detailed historical data about business process executions in modern enterprise systems, together with advances in machine learning, has made it possible to train accurate predictive models for business process monitoring. Our research group has previously developed a pioneering open-source tool for predictive process monitoring called Nirdizati. Nirdizati allows users to train predictive models and to use these models for making predictions about the remaining time and the expected outcome of each ongoing execution of a business process. Recently, our research group has started to develop techniques that use these predictions to generate alarms to prevent undesired outcomes during the execution of a process, such as customer complaints or missed deadlines. We call this "prescriptive process monitoring" because the goal is not only to "predict", but also to "prescribe" what should be done in order to prevent undesired outcomes [1]. Our research group is proposing two Masters thesis topics related to prescriptive process monitoring.
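
As a simplified illustration of how a prediction can be turned into an alarm, the sketch below compares the expected cost of intervening with the expected cost of doing nothing. The cost parameters, their names and the decision rule are assumptions chosen for illustration and should not be read as the method described in [1].

```python
def should_raise_alarm(probability, intervention_cost, outcome_cost, effectiveness):
    """Raise an alarm when intervening has a lower expected cost than waiting.

    probability       -- predicted probability of the undesired outcome
    intervention_cost -- cost of acting on the alarm (hypothetical parameter)
    outcome_cost      -- cost incurred if the undesired outcome happens
    effectiveness     -- fraction of the outcome cost the intervention avoids (0..1)
    """
    expected_without = probability * outcome_cost
    expected_with = intervention_cost + probability * (1 - effectiveness) * outcome_cost
    return expected_with < expected_without

def alarm_threshold(intervention_cost, outcome_cost, effectiveness):
    """Probability above which should_raise_alarm returns True."""
    if not 0 < effectiveness <= 1:
        raise ValueError("effectiveness must be in (0, 1]")
    return intervention_cost / (effectiveness * outcome_cost)

if __name__ == "__main__":
    # Hypothetical costs: intervening costs 10, a missed deadline costs 100,
    # and an intervention is assumed to avoid half of that cost.
    for p in (0.1, 0.8):
        print(p, should_raise_alarm(p, 10, 100, 0.5))
    print(alarm_threshold(10, 100, 0.5))
```

Business rules and cost functions specified by users would play the role of the three cost parameters in such a rule.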
The goal of the first thesis topic is to develop a Web frontend that will allow users to specify business rules and cost functions, which will be used by the prescriptive process monitoring engine to raise alarms and to make recommendations during the execution of a process. This thesis topic requires you to have at least intermediate skills in Web application development. We have the ambition to develop a tool that can be used in real-life settings. We have contacts with potential users of the prescriptive process monitoring technology who would be willing to give input and feedback to guide the development of this tool.

The second thesis topic is more research-oriented. The goal is to design machine learning methods to estimate the effect of triggering an intervention during the execution of a process. We will use “causal machine learning” methods for this purpose (we will give you some pointers and concrete ideas). Although this topic is more research-oriented, it will also involve a lot of programming: you will try out several machine learning libraries, you will run tests using real-life datasets, and you will get feedback from potential users of this technology.

This work is part of an ambitious EU-funded research project called PIX (https://sep.cs.ut.ee/Main/PIX). We offer remuneration to students who are willing to undertake these Masters thesis topics in full-time mode (i.e. 30-40 hours per week, starting some time between October and January).

Reference

[1] Stephan A. Fahrenkrog-Petersen, Niek Tax, Irene Teinemaa, Marlon Dumas, Massimiliano de Leoni, Fabrizio Maria Maggi, Matthias Weidlich: Fire Now, Fire Later: Alarm-Based Systems for Prescriptive Process Monitoring. Preprint # 1905.09568, Arxiv.org, 2019. https://arxiv.org/pdf/1905.09568.pdf

Deviance Analysis Using Redescription Mining

Supervisors: Marlon Dumas (marlon dot dumas ät ut dot ee) and Fabrizio Maggi

Business process deviance refers to the phenomenon whereby a subset of the executions of a business process deviate, in a negative or positive way, with respect to the expected or desirable outcomes of the process. Deviant executions of a business process include those that violate compliance rules, or executions that undershoot or exceed performance targets. For example, in an order-to-cash process, the process instances (cases) that end up in a cancellation of the purchase order can be said to be "deviant". Those that end up in a correct delivery of the order and its payment are considered to be "normal".

Deviance analysis is concerned with uncovering the reasons for deviant executions by analyzing business process execution logs. Existing techniques for deviance analysis suffer from the fact that the output they produce is not easily interpretable. For example, when applied to real-life datasets, some of these techniques produce hundreds of rules, each one capturing one possible cause for the deviant cases. Such large sets of rules are difficult to understand.

In this Masters project, you will develop and evaluate an alternative technique for deviance analysis based on an emerging data mining technique called Redescription Mining. The main outcome of the project will be a tool that takes as input two event logs (the log of the normal cases and the log of the deviant cases) and that produces as output readable statements explaining how the deviant cases differ from the normal cases, using redescription mining techniques. You will try out at least two different redescription mining tools, for example CLUS-RM and SIREN and you will compare their performance using real-life business process execution logs. The project requires basic Python programming skills, some basic knowledge of business process management, and basic knowledge of data mining.

Dynamic analysis of Scratch code to infer computing skills

Supervisors: Marcello Sarini (firstname.lastname [ät] unimib.it) and Marlon Dumas

The aim of the project is to implement a tool that performs dynamic analysis of Scratch code and computes metrics useful for inferring the computing skills of the coder (these could also be understood as coding style, which recalls my previous work on working style).

The first part of the project consists of analyzing the literature on code analysis, especially with regard to comparing static and dynamic analysis, in order to identify approaches used to infer computing skills, with a particular focus on the Scratch programming language. The first part concludes by proposing metrics based on dynamic code analysis to be applied to Scratch programs. The second part of the project requires the student to implement a dynamic analyzer for Scratch programs that returns these metrics.

Knowledge of the JavaScript programming language and of the Node.js server is required.

Design and Implementation of Computational Models for Learning Analytics at University of Tartu

Supervisors: Irene-Angelica Chounta and Marlon Dumas (marlon dot dumas ät ut dot ee)

Learning Analytics (LA) is a human-centered design discipline that employs computational methods to explore data traces originating from learning activities in order to promote learning by providing meaningful feedback. The aim of Learning Analytics is threefold: a) to help students improve their learning outcomes by scaffolding self-reflection, self-regulation and motivation, b) to support teachers in orchestrating learning activities and providing appropriate scaffolding for students and c) to assist researchers in uncovering underlying mechanisms of the learning process and determining the impact technology, context and other factors have on the ways people learn.

In this thesis, we aim to design computational models for the assessment of students’ academic performance (for example, successfully completing a course), identification of risks (such as, missing deadlines) and prevention of failures (such as, drop-outs) using Learning Analytics and Machine Learning for students in Higher Education. We will use existing data about students’ demographics, academic background and current practice in order to implement such computational models and to integrate them in a Learning Analytics infrastructure that aims to support stakeholders from the University of Tartu.

Process Mining for Managing Waste in Business Processes

Supervisor: Fredrik Milani (milani [ät] ut [dot] ee)

Organizations have worked, and continue to work, on improving their business processes. In the past decades, analysts have produced an impressive set of frameworks and methods for systematically analyzing and improving business processes. One of these is lean thinking, which aims to remove waste in business processes. Waste identification and elimination is still predominantly manual work conducted by process analysts. In the last decade, process mining techniques have been developed that allow for data-driven analysis of business processes. However, there is still a gap between waste management in processes, which requires manual analysis, and process mining techniques. The objective of this thesis is to reduce this gap. More specifically, the objective is to support business process improvements by employing process mining. This thesis will require the student to build an understanding of lean management, examine more closely how improvement opportunities (wastes) are identified and how they can be eliminated, survey existing process mining techniques that can support such activities, develop a new process mining algorithm (using, modifying, and extending existing process mining methods), and, finally, evaluate the output. This thesis topic is similar to the following publication: https://link.springer.com/article/10.1007/s12599-020-00649-w
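
As a small illustration of how one kind of waste could be quantified from an event log, the sketch below sums the idle time between the end of one activity and the start of the next within each case, treating it as waiting. The log format, activity names and the restriction to waiting time are assumptions made for illustration; they are not taken from the publication above.

```python
from collections import defaultdict
from datetime import datetime

def waiting_times(events):
    """Sum the idle hours between consecutive activities of the same case.

    events -- iterable of (case_id, activity, start, end) with ISO timestamps.
    Returns a dict mapping (previous activity, next activity) to total hours.
    """
    by_case = defaultdict(list)
    for case_id, activity, start, end in events:
        by_case[case_id].append(
            (datetime.fromisoformat(start), datetime.fromisoformat(end), activity)
        )
    waits = defaultdict(float)
    for steps in by_case.values():
        steps.sort()  # order the activities of a case by start time
        for (_, prev_end, prev_act), (next_start, _, next_act) in zip(steps, steps[1:]):
            idle_hours = (next_start - prev_end).total_seconds() / 3600
            waits[(prev_act, next_act)] += max(idle_hours, 0.0)
    return dict(waits)

# Hypothetical event log with start and end timestamps per activity.
LOG = [
    ("c1", "Receive order", "2024-01-01T09:00", "2024-01-01T09:30"),
    ("c1", "Approve order", "2024-01-01T11:30", "2024-01-01T12:00"),
    ("c1", "Ship order", "2024-01-01T12:00", "2024-01-01T13:00"),
    ("c2", "Receive order", "2024-01-01T10:00", "2024-01-01T10:15"),
    ("c2", "Approve order", "2024-01-01T14:15", "2024-01-01T14:30"),
]

if __name__ == "__main__":
    for handover, hours in waiting_times(LOG).items():
        print(handover, hours)
```

In this hypothetical log, all waiting accumulates between receiving and approving an order, which is the kind of handover an analyst would inspect first.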

Framework for Managing Positive Deviances in Business Processes with Process Mining

Supervisor: Fredrik Milani (milani [ät] ut [dot] ee)

When seeking to improve their business processes, companies look at the relevant top performers. By examining the processes of top performers, one learns what they do that makes them better. This is sometimes referred to as positive deviance. Systems log data that can be used for data-driven benchmark analysis. This thesis topic is about eliciting and defining positive deviances in business processes that can be detected from event logs. The work requires the student to build up an understanding of benchmarking, survey existing work on the topic, extract, analyze, and synthesize data into a framework, identify process mining techniques that can support such activities, possibly develop a new process mining algorithm (using, modifying, and extending existing process mining methods), and, finally, evaluate the output.

Business Process Improvement using Process Mining – A Case Study

Supervisor: Fredrik Milani (milani [ät] ut [dot] ee)

Process mining can be used to conduct data-driven analysis of business processes. This thesis is about doing exactly that for a company. The topic will require preparing the execution log for certain processes, surveying process mining use cases relevant and useful for this particular case, analysing the processes using Apromore (a process mining tool), discussing the results with domain experts, identifying and proposing process improvements, and evaluating the results. This topic is dependent on access to data.

Robotic Process Automation – A Case Study

Supervisor: Fredrik Milani (milani [ät] ut [dot] ee)

Robotic Process Automation (RPA) has recently been gaining popularity. With RPA, manual work can be automated without requiring heavy investment in IT development. However, it is not straightforward which process to select, what RPA tool to use, and how to best add value. This topic is about applying a framework for identifying candidate processes/process fragments, surveying suitable RPA tools, implementing RPA for a few processes, evaluating the results, and drawing conclusions. This topic is dependent on access to data.

Identifying Business Process Improvement Opportunities from Event Logs

Supervisor: Katsiaryna Lashkevich katsiaryna [dot] lashkevich [at] ut [punkt] ee and Fredrik Milani (milani [ät] ut [dot] ee)

Organizations have and continue to work with improving their business processes. In the past decades, analysts have produced an impressive set of frameworks and methods for systematically analyse and improve business processes. Business process improvement and redesign is still a predominantly manual work conducted by process analysts. In the last decade, process mining techniques have been developed that allow for data driven analysis of business processes. However, most of the techniques provides descriptive analysis of the business process. Thus, process analysts have to examine the results and identify where the process can be improved. This thesis serves the objective of identifying, as opposed to merely presenting analytics, business process improvements opportunities in the business process. This thesis will require the student to build an understanding of a framework, examine more closely how improvement opportunities are identified, how they can be redesigned, survey existing process mining techniques that can support such activities, develop a new process mining algorithm (using, modifying, and extending existing process mining methods), and, finally, evaluate the output.

The above topic will focus on one or a few aspects of a business process (activity, control-flow, data objects, resources, process fragments, variants, etc.).
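To make the difference between descriptive analytics and an identified improvement opportunity more concrete, the sketch below computes the mean elapsed time between consecutive activities in an event log and singles out the slowest transition as a candidate for redesign. The event log, the activity names and the function name are hypothetical and only serve as an illustration; the thesis itself would build on existing process mining methods.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (case id, activity, completion timestamp).
EVENT_LOG = [
    ("c1", "Receive order", "2020-01-06 09:00"),
    ("c1", "Check credit", "2020-01-06 09:30"),
    ("c1", "Ship goods", "2020-01-08 09:30"),
    ("c2", "Receive order", "2020-01-06 10:00"),
    ("c2", "Check credit", "2020-01-06 11:00"),
    ("c2", "Ship goods", "2020-01-07 11:00"),
]


def mean_transition_hours(log):
    """Average elapsed hours between consecutive activities of the same case."""
    traces = defaultdict(list)
    for case_id, activity, stamp in log:
        traces[case_id].append((datetime.strptime(stamp, "%Y-%m-%d %H:%M"), activity))

    durations = defaultdict(list)
    for events in traces.values():
        events.sort()  # order the events of each case by timestamp
        for (t1, a1), (t2, a2) in zip(events, events[1:]):
            durations[(a1, a2)].append((t2 - t1).total_seconds() / 3600)

    return {pair: sum(hours) / len(hours) for pair, hours in durations.items()}


means = mean_transition_hours(EVENT_LOG)
slowest = max(means, key=means.get)  # transition with the longest mean elapsed time
print(slowest, means[slowest])
```

A real analysis would also have to decide whether a slow transition is actually a problem, which is where the improvement framework mentioned above comes in.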

Blockchain Capabilities for Business Process Redesign

Supervisor: Fredrik Milani

Blockchain technology has emerged as a disruptive technology and, while it is not receiving the same level of attention as before, companies are seriously examining its use to improve their business processes. However, it is still not clear to which processes it is best applied and when it can enable what kind of redesign. This thesis aims at exploring this topic by conducting a systematic literature review (SLR) to identify what capabilities of blockchain technology are relevant or applicable for business processes, what processes can be redesigned, and how processes can be redesigned if powered by blockchain technology. The thesis requires developing an SLR protocol, conducting the SLR, analyzing the papers, and eliciting a framework that addresses the above-stated questions.

Business Process Analysis with Process Mining – Case Study

Supervisor: Fredrik Milani

This topic is for students who have access to event logs from industry and who wish to conduct a business process project to discover, analyze, and improve business processes at a company. For this topic, you will have to extract a sample log so we can be sure process mining can be applied. This thesis topic requires the student to apply parts of the BPM life cycle to a case. Thus, the thesis requires using process mining to discover business process models, conducting analysis (both manual and data-driven), and proposing a redesign of the business process. The results are then evaluated by implementing and assessing the changes, by simulation, or by interviews.
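As an indication of what a sample log must support, the sketch below derives a directly-follows graph, one of the simplest inputs to process discovery, from a list of (case id, activity) events. The log contents and the function name are hypothetical, and the events are assumed to be already ordered by time within each case.

```python
from collections import Counter, defaultdict

# Hypothetical sample log, assumed to be ordered by time within each case.
SAMPLE_LOG = [
    ("c1", "Register"), ("c1", "Review"), ("c1", "Approve"),
    ("c2", "Register"), ("c2", "Review"), ("c2", "Review"), ("c2", "Reject"),
    ("c3", "Register"), ("c3", "Review"), ("c3", "Approve"),
]


def directly_follows(log):
    """Count how often one activity is directly followed by another within a case."""
    traces = defaultdict(list)
    for case_id, activity in log:
        traces[case_id].append(activity)

    graph = Counter()
    for trace in traces.values():
        graph.update(zip(trace, trace[1:]))  # consecutive activity pairs
    return graph


dfg = directly_follows(SAMPLE_LOG)
for (source, target), count in dfg.most_common():
    print(f"{source} -> {target}: {count}")
```

If a log cannot be grouped by case and ordered in this way, process mining is unlikely to be applicable without further data preparation.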

Topics on Blockchain Applications, Privacy-aware and Secure Software Design

Modelling Languages for Blockchain Applications

Supervisors: Mubashar Iqbal and Raimundas Matulevičius

Contact: rma ät ut dot ee

While designing and developing blockchain applications (dApps), developers need to deal with the principles of distributed ledgers, chains of blocks, smart contracts, crypto-hashes and other domain-specific concepts. However, the standard modelling languages (e.g., BPMN, UML, ArchiMate) do not contain constructs to represent dApp components. Although there exist a few attempts to enrich the modelling languages, these mainly result in model annotations and do not include systematic extensions of the modelling language. The main goal of this topic is to develop the semantic and syntactic modelling constructs to support the modelling of blockchain applications. The main steps of the research include:

Review the literature on blockchain application modelling
Define the dApp modelling domain
Develop the semantics and the concrete and abstract syntax for dApp modelling (this could be done either as an extension of the existing standard languages or as a proposal for a new modelling language)
Illustrate the feasibility of the proposal in a dApp modelling example

1. Classification of EEG signals using Machine Learning

Supervisor(s): Yar Muhammad (Yar dot Muhammad ät ut dot ee) and Faiz Ali Shah

The idea is to use publicly available datasets of EEG signals (brain waveforms). Machine Learning (ML) algorithms will be trained on these datasets and then used to classify a subject’s brainwaves in real time.

The main idea is to classify brain waveforms in order to interpret states, symptoms and signs of the brain. Data from the brain is passed to the ML algorithm(s), which identify patterns in the data that distinguish these states, symptoms and signs. The reason for relying on a machine-learning-based approach is the inherent noise and variance in the data, which a human could not reliably filter out manually.

Our goal is to develop a machine-learning-based application that classifies states, symptoms and signs from brainwaves in real time.
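As a rough illustration of the classification idea described above, the sketch below computes the spectral power of a signal in the conventional EEG bands with a naive discrete Fourier transform and reports the dominant band. The band boundaries (which vary between studies), the sampling rate, the synthetic sine-wave signals and the function names are all assumptions made for this example; a thesis project would use real datasets and a trained ML classifier in place of the simple arg-max rule.

```python
import cmath
import math

# Conventional EEG bands in Hz; exact boundaries differ between studies (assumption).
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 45.0),
}


def band_powers(signal, fs):
    """Sum the squared DFT magnitudes that fall into each EEG band."""
    n = len(signal)
    powers = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2):
        freq = k * fs / n  # frequency in Hz of DFT bin k
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        for name, (low, high) in BANDS.items():
            if low <= freq < high:
                powers[name] += abs(coeff) ** 2
    return powers


def dominant_band(signal, fs):
    """Stand-in for a classifier: pick the band with the highest power."""
    powers = band_powers(signal, fs)
    return max(powers, key=powers.get)


fs = 128  # hypothetical sampling rate in Hz
alpha_like = [math.sin(2 * math.pi * 10 * t / fs) for t in range(2 * fs)]  # 10 Hz
theta_like = [math.sin(2 * math.pi * 6 * t / fs) for t in range(2 * fs)]   # 6 Hz
print(dominant_band(alpha_like, fs), dominant_band(theta_like, fs))
```

In practice the band powers would serve as input features for an ML model rather than being compared directly, and a fast Fourier transform would replace the naive loop for real-time use.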

Motivation: In the near future, we envision these techniques to enable early diagnosis systems for the detection of neurodegenerative diseases. We can also use them to show signature patterns in physiological data. This can range from spine injuries to heart disease or cancer. This could even change how we treat early diagnosis.

Some relevant literature:

[1] Using Machine Learning to Categorise EEG Signals From The Brain to Words https://towardsdatascience.com/using-machine-learning-to-categorise-eeg-signals-from-the-brain-to-words-728aba93b2b3

[2] Notion in Motion: Wireless Sensors Monitor Brain Waves on the Fly https://www.scientificamerican.com/article/wireless-brain-wave-monitor/

2. Systematic Review of the Literature on How Machine Learning is used to classify EEG signals/Brainwave forms (Delta, Theta, Alpha, Beta, Gamma)

Supervisor(s): Yar Muhammad (Yar dot Muhammad ät ut dot ee) and Faiz Ali Shah

Electroencephalography (EEG) analysis has been an important tool in neuroscience applications such as Brain-Computer Interfaces (BCI), and even in commercial applications. Many of the analytical tools used in EEG studies have used machine learning (ML) to uncover relevant information for neural classification and neuroimaging.

Recently, the availability of large EEG datasets and advances in ML have both led to the deployment of deep learning architectures, especially for analysing EEG signals and understanding the information they may contain about brain functionality. The robust automatic categorisation of these signals is an important step towards making the use of EEG more practical in many applications.

Towards this goal, a systematic review of the literature on all machine learning algorithms and applications that use EEG classifications needs to be performed to address the following critical questions:

1. Which EEG classification tasks have been explored using machine learning?

2. What input formulations have been used for training the machine learning algorithms?

3. Are there specific machine learning algorithms suitable for specific types of tasks?

4. Compare all suitable results on the classification of EEG signals.

5. Finally, a framework will be proposed based on the systematic review of the literature which serves as a path for the classifications of EEG signals/brain waveforms.

Some relevant literature:

[1] Yannick Roy, Hubert Banville, Isabela Albuquerque, Alexandre Gramfort, “Deep Learning-Based Electroencephalography Analysis: A Systematic Review”, Jan 2019. (https://arxiv.org/pdf/1901.05498.pdf)

[2] Craik A, He Y, Contreras-Vidal JL, “Deep learning for electroencephalogram (EEG) classification tasks: a review”, J Neural Eng. 2019 Jun;16(3) https://www.ncbi.nlm.nih.gov/pubmed/30808014

[3] Laura Dubreuil, “How can we apply AI, Machine Learning or Deep Learning to EEG?”, March 2018 (https://www.neuroelectrics.com/blog/from-ai-to-deep-learning-applied-to-eeg/)

3. A Systematic Review of the Literature on Classification Algorithms for EEG-based Brain Computer Interfaces

Supervisor(s): Yar Muhammad (Yar dot Muhammad ät ut dot ee) and Faiz Ali Shah

A Brain-Computer Interface (BCI) is a device that enables its users to interact with computers by means of brain activity only, this activity generally being measured by electroencephalography (EEG). EEG is the physiological method of choice for recording the electrical activity generated by the brain via electrodes placed on the scalp surface.

BCI plays a vital role in scientific research and technology development. Classification is a data mining technique used to predict group membership for data instances. Analyses of BCI data are challenging because feature extraction and classification of these data are more difficult than those applied to raw data.

Most current electroencephalography (EEG)-based brain-computer interfaces (BCIs) are based on machine learning algorithms. There is a large diversity of classifier types that are used in this field. Many new algorithms have been developed and tested to classify EEG signals in BCIs. The time is therefore ripe for an updated review of EEG classification algorithms for BCIs.

In this study the goal is to survey the BCI and machine learning literature from 2015 to the present to identify the new classification approaches that have been investigated to design BCIs. These studies will be synthesized in order to present such algorithms, report how they were used for BCIs and what the outcomes were, and identify their pros and cons.

The result of the study will provide a comprehensive overview of the modern classification algorithms used in EEG-based BCIs, present the principles of these methods together with guidelines on when and how to use them, and identify a number of challenges to further advance EEG classification in BCI.

[1] Lotte F, Bougrain L, Cichocki A, Clerc M, Congedo M, Rakotomamonjy A, Yger F., A review of classification algorithms for EEG-based brain-computer interfaces: a 10 year update, J Neural Eng, June 2018

[2] Geeta Sharma, Neha Sharma, Tanya Singh, Rashmi Agrawal, A Detailed Study of EEG based Brain Computer Interface, Proceedings of the First International Conference on Information Technology and Knowledge Management pp. 137–143, 2017

[3] Carmen Vidaurre, Claudia Sannelli, Wojciech Samek, Sven Dähne, Klaus-Robert Müller, Machine Learning Methods of the Berlin Brain-Computer Interface, IFAC-PapersOnLine, Volume 48, Issue 20, 2015, Pages 447-452

[4] Ewan S. Nurse, Philippa J. Karoly, David B. Grayden and Dean R. Freestone, A Generalizable Brain-Computer Interface (BCI) Using Machine Learning for Feature Discovery, PLoS One. 2015

[5] Natasha Padfield, Jaime Zabalza, Huimin Zhao, Valentin Masero, and Jinchang Ren, EEG-Based Brain-Computer Interfaces Using Motor-Imagery: Techniques and Challenges, MDPI, Sensors (Basel). 2019 Mar; 19(6): 1423.

[6] Benjamin Blankertz, Guido Dornhege, Steven Lemm, Matthias Krauledat, Gabriel Curio, Klaus-Robert Müller, The Berlin Brain-Computer Interface: Machine Learning Based Detection of User Specific Brain States

4. Systematic Review of the Literature on EEG-based BCI Applications

Supervisor(s): Yar Muhammad (Yar dot Muhammad ät ut dot ee) and Faiz Ali Shah

The goal of this study is to conduct a systematic review of the literature of the last 10 years on BCI applications based on EEG signals.

[1] Sarah N. Abdulkader, Ayman Atia, Mostafa-Sami M. Mostafa, Brain computer interfacing: Applications and challenges, Egyptian Informatics Journal, Volume 16, Issue 2, July 2015, Pages 213-230

[2] Ed Grabianowski, How Brain-computer Interfaces Work (https://computer.howstuffworks.com/brain-computer-interface2.htm)

[3] Luis Fernando Nicolas-Alonso and Jaime Gomez-Gil, Brain Computer Interfaces, a Review, Sensors (Basel). 2012; 12(2): 1211–1279. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3304110/

5. Lightweight Machine Learning Systematic Literature Review

Supervisor(s): Yar Muhammad (Yar dot Muhammad ät ut dot ee) and Yannick Le Moullec

Context:

Machine learning is very popular in many applications and comprises a large variety of methods and models. Machine learning is also increasingly implemented in resource-limited devices, such as embedded systems and IoT nodes. Such devices have limited computational power, memory, and energy budgets; at the same time, they might have to fulfill (possibly stringent) latency requirements. For such devices, lightweight (e.g. small footprint, low latency) machine learning approaches are highly desirable since they can execute “on the edge” rather than in the cloud, thereby enabling local data analytics in e.g. IoT, mobile, and automotive applications.
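One common way of reducing the memory footprint of a model is to store its weights as low-precision integers. The sketch below shows a simple symmetric 8-bit quantization with a single shared scale factor. It is only one example of a lightweight technique, chosen here for illustration; the weight values and function names are hypothetical.

```python
def quantize(weights, bits=8):
    """Map float weights to signed integers using one shared scale factor."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    largest = max(abs(w) for w in weights)
    scale = largest / qmax if largest > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


weights = [0.82, -0.41, 0.05, -1.27, 0.63]  # hypothetical model weights
q, scale = quantize(weights)
restored = dequantize(q, scale)
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, worst_error)
```

Each weight then fits in one byte instead of four or eight, at the cost of a rounding error of at most half the scale factor per weight. Whether that loss is acceptable depends on the model and the application.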

Objectives and tasks:

The overall objective of this thesis is to conduct and document a systematic literature review of lightweight machine learning. The tasks to be carried out include, but are not necessarily limited to:

Survey, identify, and select relevant literature (both academic and commercial) and tools (including open-source and/or free);
Analyze critically the selected literature and tools;
Compare and contrast the selected literature and tools;
Provide recommendations for selecting and implementing lightweight machine learning in real-life applications.

Prerequisites:

An understanding of the fundamentals of machine learning
A strong interest in lightweight machine learning
Self-motivation and the ability to work independently

6. Air-flow sensing for applications in autonomous driving

Supervisor(s): Yar Muhammad (Yar dot Muhammad ät ut dot ee)

Summary: Establishing the state of the art of fluid-flow sensing in robotics, and investigating its applications in autonomous driving using computational-fluid-dynamics simulations.

Keywords: Perception, flow sensing, autonomous robotics, computational fluid dynamics

Skills required and developed: fluid-flow sensing, computational-fluid-dynamics simulation, data processing

Description: Sensing of fluid flow has received increased attention from the underwater robotics community in the last decade. Fluid-flow sensing has also been of interest in aerial robotics. It has, however, not been investigated to the same extent in the field of autonomous ground robotics.

The project aims to: 1. Establish a state of the art for fluid flow sensing in robotics (aerial, underwater as well as ground robotics). 2. Investigate potential uses of air-flow sensing in autonomous driving. 3. Validate potential applications of air-flow sensing in autonomous driving using computational-fluid-dynamics (CFD) simulations.

Some relevant literature: [1] Juan F. Fuentes-Pérez, Jeffrey A. Tuhtan, Gert Toming, Maarja Kruusmaa, Naveed Muhammad, Ruth Carbonell-Baeza, Mark Musall, "Map-based localization in structured underwater environment using simulated hydrodynamic maps and an artificial lateral line", IEEE International Conference on Robotics and Biomimetics (ROBIO), Macau, 2017.

[2] Naveed Muhammad, Juan F. Fuentes-Pérez, Jeffrey Tuhtan, Gert Toming, Mark Musall, Maarja Kruusmaa, "Map-based localization and loop-closure detection from a moving underwater platform using flow features", Autonomous Robots, 43(6), 1419-1434.

[3] V. H. Bennetts, T. P. Kucner, E. Schaffernicht, P. P. Neumann, H. Fan and A. J. Lilienthal, "Probabilistic Air Flow Modelling Using Turbulent and Laminar Characteristics for Ground and Aerial Robots," in IEEE Robotics and Automation Letters, vol. 2, no. 2, pp. 1117-1123, April 2017. doi: 10.1109/LRA.2017.2661803

7. Advanced driver assistance systems using EEG

Supervisor(s): Yar Muhammad (Yar dot Muhammad ät ut dot ee)

Summary: Establishing the state of the art on the use of brain signals for advanced driver assistance systems (ADAS), and devising such a system using EEG.

Keywords: ADAS, EEG, signal classification, machine learning

Context: Advanced driver assistance systems (ADAS) have been investigated by researchers and industry alike, for a variety of applications ranging from assisted braking to drowsiness detection.

Details: The project aims to:

Establish a state of the art for brain-signal based ADAS.
Devise an ADAS system using EEG brain signals for applications such as (but not limited to) drowsiness detection, alertness etc.

Some relevant literature:

[1] Adnan Shaout, Dominic Colella, S. Awad, “Advanced Driver Assistance Systems – Past, Present and Future” (https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6153935&tag=1)

[2] Arun Sahayadhas, Kenneth Sundaraj and Murugappan Murugappan, “Detecting Driver Drowsiness Based on Sensors: A Review”, Sensors, MDPI, 2012 (https://www.mdpi.com/1424-8220/12/12/16937/htm)

[3] Jain, A.; Koppula, H.S.; Raghavan, B.; Soh, S.; Saxena, A., "Car that Knows Before You Do: Anticipating Maneuvers via Learning Temporal Driving Models". In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015.

[4] Yar M. Mughal, “A Parametric Framework for Modelling of Bioelectrical Signals”, Book, ISBN 978-981-287-969-1, Springer, 2015

[5] Adam Ziebinski, Rafal Cupek, Damian Grzechca, Lukas Chruszczyk, “Review of advanced driver assistance systems (ADAS)”, AIP Conference Proceedings, 2017.

Additional topics proposed by other groups in the Institute of Computer Science are available here.

Topics for IT Conversion Masters Theses (15 ECTS)

Software Product Management – A Systematic Grey Literature Review

Supervisor: Fredrik Milani (milani [ät] ut [dot] ee)

Software Product Management is a profession that is largely driven forward by practitioners. Therefore, practitioners have accumulated best practices and insights that are not readily available to the general public. This topic is to conduct a systematic literature review on one of the aspects of software product management listed below. An example of such a review is Paper. The topics for which such a review can be conducted are, for instance: Business Models, User Flows, UX Heuristics, MVP, Release Management, Risk Management, User Testing, Product Metrics.

Case Study in Business Process Improvement or Business Data Analytics

Supervisor: Marlon Dumas (marlon dot dumas ät ut dot ee)

This is a "placeholder" Masters project topic, which needs to be negotiated individually. If you work in an IT company and you are actively engaged in a business process improvement or business data analytics project, or if you can convince your hierarchy to put time and resources into such a project in the near term, we can make a case study out of it. We will sit down and formulate concrete hypotheses or questions that you will test or address as part of this project, and we will compare your approach and results against state-of-the-art practices. I am particularly interested in supervising thesis topics related to customer analytics, product recommendation, business process analytics (process mining), and privacy-aware business analytics, but I welcome other topic areas.

Case Study in Software Testing or Software Analytics

Supervisor: Dietmar Pfahl (dietmar dot pfahl ät ut dot ee)

This is a "placeholder" Masters project topic, which needs to be negotiated individually. If you work in an IT company and you are actively engaged in a software testing or software analytics project, or if you can convince your hierarchy to put time and resources into such a project in the near term, we can make a case study out of it. We will sit down and formulate concrete hypotheses or questions that you will investigate as part of this project, and we will compare your approach and results against state-of-the-art practices. I am particularly interested in supervising thesis topics related to mutation testing, testing of embedded software, testing of safety-critical systems, security testing of mobile apps, and analysis of project repositories to make software development processes more efficient and effective, but I welcome other topic areas.

Bachelor Thesis Projects

Privacy Issues in Predictive Process Monitoring

Supervisor: Fabrizio Maggi (f.m.maggi@ut.ee)

Predictive Process Monitoring applies machine learning algorithms, e.g., neural networks, to business processes. It aims at providing predictions of future states of a process case, such as the outcome of a process. For training and using the machine learning models the processing of personal data is required. The usage of personal data without consent is prohibited by privacy laws, such as the GDPR. However, anonymization techniques put the required data outside of the scope of such regulations. Therefore, adjusting anonymization techniques for predictive process monitoring is extremely important. Nonetheless, it should be ensured that anonymizing the data does not eliminate the utility of the prediction. Dimensionality reduction is a proven approach to ensure privacy and preserve utility for machine learning. This project aims at investigating the influence of dimensionality reduction techniques on predictive process monitoring. Reference paper: https://arxiv.org/ftp/arxiv/papers/1804/1804.11238.pdf
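To give an idea of what a dimensionality reduction step looks like, the sketch below projects a small table of numeric case features onto its first principal component, found by power iteration on the covariance matrix. The feature values and function names are hypothetical, and principal component analysis is used here only as one well-known example of such a technique. How much privacy and predictive utility a projection of this kind preserves is exactly what the project would have to investigate.

```python
import math


def center(rows):
    """Subtract the column means so that every feature has zero mean."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    d = len(cov)
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec


def project(rows, component):
    """Reduce every row to a single coordinate along the component."""
    return [sum(v * c for v, c in zip(row, component)) for row in rows]


# Hypothetical case features, e.g. duration in days, number of events, a constant flag.
cases = [[1.0, 2.0, 1.0], [2.0, 4.0, 1.0], [3.0, 6.0, 1.0], [4.0, 8.0, 1.0]]
centered = center(cases)
component = top_component(covariance(centered))
reduced = project(centered, component)
print(component, reduced)
```

The reduced values, rather than the original features, would then be used to train the predictive model.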

Tell it with your own words: Defining Business Process Models with Natural Language Processing and Speech Recognition

Supervisor: Fabrizio Maggi (f.m.maggi@ut.ee)

This is a new frontier for Business Process Management and a really interesting research topic. In this thesis, we will start from techniques for Speech Recognition and Natural Language Processing to create model editors that interact with the user by taking his/her voice as input. The output of the speech recognition techniques will be used as input for Natural Language Processing that, in turn, will produce process models as explained in https://hanvanderaa.com/wp-content/uploads/2019/03/CAISE2019-Extracting-declarative-process-models-from-natural-language.pdf

Topic: Overview of code smell detection tools for Android and/or iOS -- TOPIC HAS BEEN TAKEN!!!

Supervisor: Kristiina Rahkema (kristiina dot rahkema ät gmail dot com)

There is a multitude of tools available for programmers that detect code smells and help refactor code. Most tools detect a different subset of code smells and sometimes use different rules for code smell detection. The aim of this project is to give an up-to-date overview of code smell detection tools that exist for the Android and/or iOS environment. The student should identify which code smells are detected by each tool and try the tools out on a list of Android/iOS applications. The overview should compare these results. Do these tools detect the same code smells? Which tools are easier to use or more popular? Where are the gaps in code smell detection?
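The remark that tools use different rules for the same smell can be illustrated with a toy detector for the "long parameter list" smell. The sketch below analyses Python source with the standard `ast` module rather than Android or iOS code, and the threshold values, sample source and function name are assumptions made for this example.

```python
import ast

# Hypothetical source code to analyse.
SOURCE = '''
def create_user(name, email, phone, street, city, country):
    return name

def greet(name):
    return "hello " + name
'''


def long_parameter_lists(source, max_params=4):
    """Return (function name, parameter count) for functions above the threshold."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            count = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
            if count > max_params:
                findings.append((node.name, count))
    return findings


print(long_parameter_lists(SOURCE))                # stricter rule
print(long_parameter_lists(SOURCE, max_params=6))  # more lenient rule
```

Two tools that differ only in this threshold would already report different smells for the same code, which is why the comparison has to look at the detection rules as well as the list of supported smells.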

Lab Package Development & Evaluation for the Course 'Software Testing' (LTAT.05.006)

Supervisor: Dietmar Pfahl (dietmar dot pfahl at ut dot ee)

The course Software Testing (MTAT.03.159) has a series of practice sessions in which 2nd and 3rd year BSc students learn a specific test technique. We would like to improve existing labs and add new labs.

This topic is intended for students who have already taken this software testing course and who feel that they can contribute to improving it and by the same token complete their Bachelors project. The scope of the project can be negotiated with the supervisor to fit the size of a Bachelors project.

The tasks to do for this project are as follows:

Selection of a test-related topic for which a lab package should be developed (see list below)
Development of the learning scenario (i.e., what shall students learn, what will they do in the lab, what results shall they produce, etc.)
Development of the materials for the students to use
Development of example solutions (for the lab supervisors)
Development of a grading scheme
Evaluation of the lab package

Topics for which lab packages should be developed (in order of urgency / list can be extended based on student suggestions):

Automated Unit & Systems Testing
Visual GUI Testing
Issue Reporting
Continuous Integration & Testing
Mobile App Testing (focus on security)
Other topics that you find interesting and would like to discuss with me regarding their suitability

Variation in Estonian folksongs

Mari Sarv (mari@haldjas.folklore.ee) and Rajesh Sharma (rajesh dot sharma ät ut dot ee)

Estonian Runo songs form a digital corpus of approximately 100,000 song texts unevenly spread over 101 parishes of Estonia. The task of the MA research is to study the patterns of variation in this corpus on the basis of language data (word forms) and available metadata (singer, collector, place, time, classification of songs). The language of folk songs is highly variable, including archaisms and dialectal variation, so the NLP tools for standard language are not easily applicable. The main research question is whether different patterns of variation can be detected at the linguistic level and at the content level, in order to contribute to the general discussion on the essence of folkloric communication as well as to better knowledge of the regional variation of Estonian language and culture.