Student Projects 2015/2016

Below is a list of project topics for Masters and Bachelors theses offered by the software engineering research group in 2015-2016. The projects are divided into Masters projects and Bachelors projects (listed below).

If you're interested in any of these projects, please contact the corresponding supervisor.


Masters projects

Case Study on Exploratory Testing (NOTE: This project is already taken)

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Exploratory software testing (ET) is a powerful and fun approach to testing. The plainest definition of ET is that it comprises test design and test execution at the same time. This is the opposite of scripted testing (having test plans and predefined test procedures, whether manual or automated). Exploratory tests, unlike scripted tests, are not defined in advance and are not carried out precisely according to a plan.

Testing experts like Cem Kaner and James Bach claim that - in some situations - ET can be orders of magnitude more productive than scripted testing, and a few empirical studies exist supporting this claim to some degree. Nevertheless, ET is often confused with (unsystematic) ad-hoc testing and is thus not always well regarded in either academia or industrial practice.

The objective of this project will be to conduct a case study in a software company investigating the following research questions:

  • To what extent is ET currently applied in the company?
  • What are the advantages/disadvantages of ET as compared to other testing approaches (e.g., scripted testing)?
  • How can the current practice of ET be improved?
  • If ET is currently not used at all, what guidance can be provided to introduce ET in the company?

The method applied is a case study. Case studies follow a systematic approach as outlined in: Guidelines for conducting and reporting case study research in software engineering by Per Runeson and Martin Höst

This project requires that the student has (or is able to establish) access to a suitable software company to conduct the study.

Case Study on Mobile Testing (NOTE: This project is already taken)

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Similar to the case study project on Exploratory Testing (see above), a student can work in a company to analyse the current state-of-the-practice of mobile testing. Focus shall be placed on the question of how to optimise the manual testing of mobile applications. The objective of this project will be to investigate the following research questions:

  • To what extent is mobile testing currently applied in the company?
  • What are the perceived strengths/weaknesses of the currently applied mobile testing techniques and tools?
  • How can the current practice of mobile testing be improved?

The method applied is a case study. Case studies follow a systematic approach as outlined in: Guidelines for conducting and reporting case study research in software engineering by Per Runeson and Martin Höst

This project requires that the student has (or is able to establish) access to a suitable software company to conduct the study.

Case Study on Test-Driven Development (TDD) (NOTE: This project is already taken)

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Similar to the case study project on Exploratory Testing (see above), a student can work in a company to analyse the current state-of-the-practice of TDD. The objective of this project will be to investigate the following research questions:

  • To what extent is TDD currently applied in the company?
  • What are the perceived strengths/weaknesses of the currently applied TDD techniques and tools?
  • How can the current practice of TDD be improved?

The method applied is a case study. Case studies follow a systematic approach as outlined in: Guidelines for conducting and reporting case study research in software engineering by Per Runeson and Martin Höst

This project requires that the student has (or is able to establish) access to a suitable software company to conduct the study.

Case Study on Test Automation

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Similar to the case study project on Exploratory Testing (see above), a student can work in a company to analyse the current state-of-the-practice of test automation. The objective of this project will be to investigate the following research questions:

  • To what extent is test automation currently applied in the company?
  • What are the perceived strengths/weaknesses of the currently applied test automation techniques and tools?
  • How can the current practice of test automation be improved?

The method applied is a case study. Case studies follow a systematic approach as outlined in: Guidelines for conducting and reporting case study research in software engineering by Per Runeson and Martin Höst

This project requires that the student has (or is able to establish) access to a suitable software company to conduct the study.

Case Study on A/B Testing (NOTE: This project is already taken)

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Similar to the case study project on Exploratory Testing (see above), a student can work in his/her company to analyse the current state-of-the-practice of A/B testing. The objective of this project will be to investigate the following research questions:

  • To what extent is A/B testing currently applied in the company?
  • What are the perceived strengths/weaknesses of the currently applied A/B testing techniques and tools?
  • How can the current practice of A/B testing be improved?

The method applied is a case study. Case studies follow a systematic approach as outlined in: Guidelines for conducting and reporting case study research in software engineering by Per Runeson and Martin Höst

This project requires that the student has (or is able to establish) access to a suitable software company to conduct the study.

Using Data Mining & Machine Learning to Support Decision-Makers in SW Development

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Project repositories contain much data about the software development activities ongoing in a company. In addition, much data from open source projects is available. This opens up opportunities for analysing and learning from the past, which can be converted into models that help make better decisions in the future - where 'better' can mean either more efficient (i.e., cheaper) or more effective (i.e., with higher quality).

For example, we have recently started a research activity that investigates whether textual descriptions contained in issue reports can help predict the time (or effort) that a new incoming issue will require to be resolved.

There are, however, many more opportunities, e.g., analysing bug reports to help triagers assign issues to developers. And of course, there are other documents that could be analysed: requirements, design docs, code, test plans, test cases, emails, blogs, social networks, etc. Not only the application can vary, but also the analysis approach: different learning approaches may have different efficiency and effectiveness characteristics depending on the type, quantity and quality of data available.
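To give a feel for the issue-report example above, the sketch below shows one (of many possible) ways to predict resolution time from issue descriptions. It is purely illustrative and not part of the project definition; the file name and column names are hypothetical.

    # Minimal sketch: predict issue resolution time from the issue description text.
    # 'issues.csv' and its columns ('description', 'hours_to_resolve') are hypothetical.
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    issues = pd.read_csv("issues.csv")
    X = TfidfVectorizer(max_features=5000, stop_words="english").fit_transform(
        issues["description"].fillna(""))
    y = issues["hours_to_resolve"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    model = Ridge().fit(X_train, y_train)
    print("MAE (hours):", mean_absolute_error(y_test, model.predict(X_test)))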

Thus, this topic can be tailored according to the background and preferences of an interested student.

Tasks to be done (after definition of the exact topic/research goal):

  • Selection of suitable data sources
  • Application of machine learning / data mining technique(s) to create a decision-support model
  • Evaluation of the decision-support model

Prerequisite: Students interested in this topic should have successfully completed one of the courses on data mining / machine learning offered in the Master of Software Engineering program.

Tool for Assessing Release Readiness (NOTE: This project is already taken)

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Release-readiness (RR) is a time-dependent attribute of a software product. It reflects the status of the current implementation and quality of the software and can be determined (estimated) by aggregating the degree of satisfaction of so-called RR attributes (e.g., defect detection rate, number of open issues, code churn, ...).
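As a rough illustration only (not a prescribed design), the aggregation could, for instance, be a weighted average of the satisfaction degrees of the individual RR attributes; the attribute names, satisfaction values and weights below are made up.

    # Hypothetical illustration: overall RR as a weighted average of the
    # satisfaction degrees (0.0-1.0) of individual RR attributes.
    rr_attributes = {
        "defect_detection_rate": (0.80, 0.40),   # (satisfaction degree, weight)
        "open_issues":           (0.60, 0.35),
        "code_churn":            (0.90, 0.25),
    }

    total_weight = sum(weight for _, weight in rr_attributes.values())
    overall_rr = sum(sat * weight for sat, weight in rr_attributes.values()) / total_weight
    print(f"Overall release readiness: {overall_rr:.2f}")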

The aim of this project is - after familiarisation with the relevant literature and concepts - to develop a tool that addresses the following (tentative) requirements:

  • R1: The tool should provide a set of pre-defined RR attributes and their corresponding metrics.
  • R2: The tool should integrate with existing project management tools so that the data required for calculating RR metrics can be collected automatically.
  • R3: The tool should allow product managers to evaluate the degree of satisfaction of the RR attributes based on objective measures.
  • R4: The tool should provide an interactive visual dashboard for monitoring the degree of overall RR at any point in time in the release cycle.
  • R5: The tool should have a drill-down capability so that product managers can gain insights into the RR attributes that limit RR.
  • R6: The tool should allow product managers to make projections of RR at release time.
  • R7: The tool should provide visual indicators so that product managers can understand the impact of individual RR attributes on overall RR.

Depending on the interests of the student and the compatibility, this work might be conducted in collaboration with an ongoing PhD project at the University of Calgary.

Related literature:

  1. S. McConnell. Gauging software readiness with defect tracking. Software, IEEE, 14(3):135-136, May 1997.
  2. T.-S. Quah. Estimating software readiness using predictive models. Inf. Sci., 179(4):430–445, Feb. 2009.
  3. S. M. Shahnewaz. RELREA - An analytical approach supporting continuous release readiness evaluation. Master’s thesis, University of Calgary, 2014.
  4. M. Staron, W. Meding, and K. Palm. Release readiness indicator for mature agile and lean software development projects. In C. Wohlin, editor, Agile Processes in Software Engineering and Extreme Programming, volume 111 of Lecture Notes in Business Information Processing, pages 93–107. Springer Berlin Heidelberg, 2012.
  5. M. Ware, F. Wilkie, and M. Shapcott. The use of intra-release product measures in predicting release readiness. In Software Testing, Verification, and Validation, 2008 1st International Conference on, pages 230–237, April 2008.
  6. R. Wild and P. Brune. Determining software product release readiness by the change-error correlation function: On the importance of the change-error time lag. In System Science (HICSS), 2012 45th Hawaii International Conference on, pages 5360–5367, Jan 2012.

Text Analysis / Topic Mining / Sentiment Analysis for Open Innovation

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Background: Software development techniques are developing swiftly and depend on the particular context (e.g., types of products, application domains, development processes used, and other factors). Likewise, new features have to be delivered at the right time and in ever faster development cycles.

Goal of the thesis: In order to do their job better, developers need to use the best tools for their specific needs. In our research group, we would like to help software tool developers understand what features are most important for their customers (i.e., professional software developers). We would like to find out what software developers like or dislike about the tools they are using, and what features they are missing. We hope to find answers by analysing popular Q & A sites, like Stack Overflow, with regard to the topics software developers talk about and the opinions they have regarding their development tools.

Method: Tackling this MSc topic involves the following tasks (list not complete):

  • Selection of the type of software developers (developers, testers, managers, etc.), their application domain (e.g., mobile apps, systems development, embedded software developers, etc.) and tools these developers are using (e.g., IDEs, test tools, requirements analysis tools, build tools, etc.)
  • Selection of the source(s) to analyse (e.g., Stack Overflow)
  • Selection of the analysis technique to apply (e.g., topic mining, sentiment analysis, etc.)

A recent example of a published study that could be used as a starting point for this thesis can be found here:
What are mobile developers asking about? A large scale study using stack overflow
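To make the kind of analysis concrete, here is a minimal topic-mining sketch. It is only an illustration of the technique, not the method to be used in the thesis; the input file (one Stack Overflow question body per line) is a hypothetical placeholder.

    # Minimal topic-mining sketch over a hypothetical dump of question bodies.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    with open("so_questions.txt", encoding="utf-8") as f:
        documents = [line.strip() for line in f if line.strip()]

    vectorizer = CountVectorizer(max_features=10000, stop_words="english")
    doc_term = vectorizer.fit_transform(documents)

    lda = LatentDirichletAllocation(n_components=20, random_state=0)
    lda.fit(doc_term)

    terms = vectorizer.get_feature_names_out()
    for topic_idx, weights in enumerate(lda.components_):
        top_terms = [terms[i] for i in weights.argsort()[-8:][::-1]]
        print(f"Topic {topic_idx}: {', '.join(top_terms)}")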

Note: This thesis topic can be split into several theses and thus be shared by several students.

Prerequisite: Students interested in this topic should have successfully completed one of the courses on data mining / machine learning offered in the Master of Software Engineering program.

Data Analysis of Critical Success Factors for Software Projects (NOTE: This project is already taken)

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Successful completion of software projects is critical for all project stakeholders. Unfortunately, however, a large number of software projects are not completed within the specified budget and time constraints, only partially meet their user specifications (challenged projects), or end in total failure (failed projects).

To analyze success and failure in software projects systematically, various lists of Critical Success Factors (CSF) have emerged in the literature, e.g., [1-4].

Together with colleagues in Turkey, we have conducted a survey among project stakeholders. The survey data shall be analysed and the results presented and discussed.

Literature:

[1] A. Ahimbisibwe, R. Y. Cavana, and U. Daellenbach, "A contingency fit model of critical success factors for software development projects," Journal of Enterprise Information Management, vol. 28, pp. 7-33, 2015.
[2] H. Akkermans and K. van Helden, "Vicious and virtuous cycles in ERP implementation: a case study of interrelations between critical success factors," Eur J Inf Syst, vol. 11, pp. 35-46, 2002.
[3] T. Chow and D.-B. Cao, "A survey study of critical success factors in agile software projects," Journal of Systems and Software, vol. 81, pp. 961-971, 2008.
[4] J. S. Reel, "Critical success factors in software projects," IEEE Software, vol. 16, pp. 18-23, 1999.

Lab Package Development & Evaluation for the Course 'Software Testing' (MTAT.03.159) (NOTE: This project is already taken)

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

The course Software Testing (MTAT.03.159) currently has 6 labs (practice sessions) in which 2nd and 3rd year BSc students learn a specific test technique. We would like to improve the existing labs and add new labs.

The tasks to do for this thesis are as follows:

  • Selection of a test topic for which a lab package should be developed (see list below)
  • Development of the learning scenario (i.e., what shall students learn, what will they do in the lab, what results shall they produce, etc.)
  • Development of the materials for the students to use
  • Development of example solutions (for the lab supervisors)
  • Development of a grading scheme
  • Evaluation of the lab package

Topics for which lab packages should be developed (in order of urgency / list can be extended based on student suggestions):

  • Web-based Testing with Selenium
  • Mutation Testing
  • Unit Testing / TDD
  • Static Code Analysis
  • Search-Based Testing

Business Process Architecture

Supervisor: Fredrik P. Milani (milani at ut dot ee)

Business Process Architecture can be defined as "the structure of the enterprise in terms of its governance structure, business processes, and business information." As such, it captures the main processes of an organization and models them in a way that illustrates their relationships and types. Different proposals exist for capturing the business process architecture as a graphical model, but there is no standard method. In this thesis project, you will:

  • Conduct a systematic literature survey of different approaches to model a business process architecture; compare and contrast the identified approaches
  • Prepare a business process architecture for an organization (a real organization in Estonia; I will communicate it to you if you ask me via e-mail)
  • Model the architecture of the organization in accordance with the different approaches and, through an experiment with domain experts of the organization, examine which approach they prefer and why
  • Possibly build a simple application in which the main processes of the architecture can be entered and the business process architecture is presented according to the chosen approach

Model-driven engineering of hypermedia REST applications (NOTE: This project is already taken)

Supervisor: Luciano García-Bañuelos (luciano dot garcia ät ut dot ee)

Representational state transfer (REST) is an architectural style that has become popular in the development of Web-based information systems. For the purpose of design, we view a hypermedia REST application as consisting of two aspects: 1) a structural aspect that deals with the data structure of the resources exposed by the application and a set of (CRUD) operations over these resources, and 2) a dynamic part that deals with determining which operations can be applied to a resource given its current state. We foresee that the former aspect can be captured by means of annotated class diagrams while the latter can be captured by means of state chart diagrams.

In this project, you will design and implement a set of tools that takes as input a set of class diagrams and of statechart diagrams, and generates the skeleton of a hypermedia REST application. This project requires some background knowledge in software modeling and development of web-based applications.
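Purely as an illustration of the generation idea (not the required design or notation), the dynamic aspect could be represented as a simple state-transition table from which handler stubs are generated; the resource, states and routes below are invented for the example.

    # Hypothetical statechart for an 'order' resource: which operation is allowed
    # in which state, and which state it leads to. A generator can turn this
    # into controller stubs for the REST application skeleton.
    order_statechart = {
        "created":   {"PUT /orders/{id}": "submitted", "DELETE /orders/{id}": "cancelled"},
        "submitted": {"POST /orders/{id}/payments": "paid"},
    }

    def generate_skeleton(resource, statechart):
        """Emit one handler stub per (state, operation) pair."""
        stubs = []
        for state, transitions in statechart.items():
            for operation, target in transitions.items():
                name = f"{resource}_{state}_to_{target}"
                stubs.append(
                    f"# {operation}  (allowed when the {resource} is '{state}', moves it to '{target}')\n"
                    f"def handle_{name}(resource_id):\n"
                    f"    raise NotImplementedError\n")
        return "\n".join(stubs)

    print(generate_skeleton("order", order_statechart))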

Automated testing of Hypermedia REST application (NOTE: This project is already taken now)

Supervisor: Luciano García-Bañuelos (luciano dot garcia ät ut dot ee)

Representational state transfer (REST) is an architectural style that has become popular in the development of Web-based information systems. As for any other piece of software, testing plays a major role in the development of a hypermedia REST application. In this context, we see a hypermedia REST application as consisting of two aspects: 1) a static aspect, that is the set of resources exposed by the application and the operations over them, and 2) a dynamic aspect that describes the sequence of operations that follow the normal execution of the application.

Given a set of class diagrams and state charts, we aim at generating a set of test cases that exercise the application. As a way to specify a criterion of quality, we also aim at evaluating the coverage achieved by the generated test cases.
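As a rough sketch of the generation idea (assuming the statechart is available as a plain transition table; the example model is invented), test sequences can be enumerated as bounded paths through the statechart, and coverage can be measured as the fraction of transitions exercised.

    # Hypothetical statechart of an 'order' resource: (source state, operation, target state).
    transitions = [
        ("created", "submit", "submitted"),
        ("created", "cancel", "cancelled"),
        ("submitted", "pay", "paid"),
        ("submitted", "cancel", "cancelled"),
    ]

    def generate_paths(state, depth, path=()):
        """Enumerate sequences of transitions starting from 'state', up to 'depth' steps."""
        if depth == 0:
            return [path]
        paths = [path]
        for transition in transitions:
            source, _operation, target = transition
            if source == state:
                paths += generate_paths(target, depth - 1, path + (transition,))
        return paths

    test_cases = [p for p in generate_paths("created", depth=3) if p]
    covered = {t for case in test_cases for t in case}
    print(len(test_cases), "test sequences; transitions covered:", len(covered), "of", len(transitions))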

Mining Business Process Models with Advanced Synchronization Patterns

Supervisor: Luciano García-Bañuelos (luciano dot garcia ät ut dot ee)

Automated process discovery aims at extracting process models from information captured in the execution logs of information systems. Most state-of-the-art methods are designed to discover models in which the execution of an event depends on the completion of a fixed number of other events. This type of dependency is referred to as a basic synchronization pattern. In some real-world scenarios, however, this constraint is not well suited, e.g., a purchase decision could be taken even before all requested quotes are received (synchronization of "n-out-of-m" events) or whenever a deadline is reached (time-related constraints).

In this project, you will extend existing techniques and/or design new ones that enable the discovery of process models with the advanced synchronization patterns described above.

Lightweight process mining tool based on BPMN

Supervisor: Luciano García-Bañuelos (luciano dot garcia ät ut dot ee)

In this project, you will implement a lightweight and easy-to-use business process mining tool that natively supports the BPMN process modeling standard. Like other process mining tools (e.g. Disco or ProM), the tool will allow users to discover process models from event logs and will provide zoom-in and zoom-out functionality by filtering out infrequent paths and events. This project is quite challenging due to the difficulty of efficiently re-computing BPMN models from event logs at different levels of detail. The project will require excellent programming skills and the ability to understand and perhaps also design relatively complex algorithms. The project will be conducted at STACC and may involve remuneration. The project is intended for a full-time student who is able to work physically at STACC.
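To illustrate the core computation behind such tools (a deliberately simplified sketch, not the tool design), a directly-follows graph can be extracted from an event log and thinned by dropping infrequent arcs; the log below is a made-up list of traces.

    from collections import Counter

    # Hypothetical event log: each trace is the sequence of activities of one case.
    log = [
        ["register", "check", "approve", "notify"],
        ["register", "check", "approve", "notify"],
        ["register", "check", "reject", "notify"],
        ["register", "approve", "notify"],          # rare variant
    ]

    # Count directly-follows relations across all traces.
    dfg = Counter((a, b) for trace in log for a, b in zip(trace, trace[1:]))

    # "Zoom out" by filtering arcs below a relative frequency threshold.
    threshold = 0.5 * max(dfg.values())
    filtered = {arc: count for arc, count in dfg.items() if count >= threshold}

    for (a, b), count in sorted(filtered.items(), key=lambda item: -item[1]):
        print(f"{a} -> {b}: {count}")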

Editor and Execution Engine for DMN Decision Models

Marlon Dumas (marlon dot dumas ät ut dot ee)

The idea of managing complex decisions in business processes based on explicit models has gained significant traction in recent years. Among other things, this trend has led to the emergence of a new standard for modeling decisions, namely DMN (Decision Model and Notation), endorsed by the Object Management Group (OMG). Commercial tools supporting this standard are starting to appear, most notably the Signavio Decision Manager. However, there is currently no open-source or freely available solution in this space. In this project, you will develop an integrated tool for creating and executing DMN models, preferably in the form of a Web application, but other options will be considered once the targeted users are defined more precisely. The project will be conducted at STACC and may involve remuneration. The project is intended for a full-time student who is able to work physically at STACC.
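At its core, executing a DMN model means evaluating decision tables against input data. The fragment below is a toy illustration of that idea only (real DMN defines hit policies, FEEL expressions, and much more); the table and values are invented.

    # Hypothetical decision table: loan risk category from age and income.
    # Each rule: (condition on the inputs, output). The first matching rule wins,
    # roughly corresponding to DMN's 'first' hit policy.
    rules = [
        (lambda age, income: age < 21,                     "high"),
        (lambda age, income: income < 1000,                "high"),
        (lambda age, income: age >= 21 and income >= 3000, "low"),
        (lambda age, income: True,                         "medium"),
    ]

    def decide(age, income):
        for condition, output in rules:
            if condition(age, income):
                return output

    print(decide(age=35, income=4200))   # -> low
    print(decide(age=19, income=4200))   # -> high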

Machine Learning Techniques for Optimization of Garbage Collection in Java Web applications (NOTE: This project is already taken)

Marlon Dumas (marlon dot dumas ät ut dot ee)

In this project, you will apply machine learning techniques to performance data extracted from instrumented Java Web applications, in order to identify suitable JVM configuration parameters to optimize memory management (specifically to minimize garbage collection overhead). The project will be conducted at STACC in cooperation with Plumbr. The project requires good knowledge of machine learning techniques (e.g. having taken the Data Mining and Machine Learning courses) and preferably also knowledge of JVM internals and Java performance optimization. The project requires a full-time student who is able to work physically at STACC.

Tool for Mobile Application Usage Analytics (NOTE: This project is already taken now)

Marlon Dumas (marlon dot dumas ät ut dot ee)

In this project, you will develop a toolset to analyze usage logs of instrumented mobile applications deployed on a continuous integration server (Greenhouse). The tool-chain will incorporate log mining methods with data visualization techniques in order to allow testers and analysts to identify performance, reliability or usability issues early on during the development of mobile applications. The project will be conducted at STACC in cooperation with Greenhouse. The project requires knowledge of Android development as well as some data mining skills. The project is intended for a full-time student who is able to work physically at STACC.

Visualizing the effects of Business Process Change (NOTE: This project is already taken now)

Marlon Dumas (marlon dot dumas ät ut dot ee)

Fredrik Milani (milani at ut dot ee) and Marcello Sarini

The aim of this project is to identify and implement the most pertinent visualization functionalities to make the effects of a business process change both visible and easily comprehensible to the end users affected by the change.

The work is based on a conceptual framework aimed at describing the effects of the change on a business process by considering some high-level factors easily interpreted by the people affected by the change and related to their job characteristics, such as autonomy, skill variety, dealing with others, task identity, task significance and feedback from the work itself.

The conceptual framework is currently implemented in a prototype focusing mainly on back-end functionality; the prototype is developed with the MEAN.js development framework and works on the XML files exported by Signavio BPMN.

It is expected that the output of the Masters thesis will become a publicly available tool offered on a software-as-a-service basis, in a similar style to the BIMP simulator, and possibly become part of the Signavio BPM Academic Initiative.

The work will not focus only on visualization aspects, but also on the completion of back-end aspects not yet covered, such as the definition of measures that require additional information from the user, the identification (and visualization) of patterns (mainly communication and coordination behaviours), and the management of more complex business models (including multiple lanes within pools).

In this proposal the work is rather technical, being devoted mainly to the implementation of the system rather than to the definition of the conceptual framework.

The background research for this proposal is more focused on identifying novel visualization aspects to emphasize both process visualization (including patterns) and measures visualization. A good starting point is the paper by Suntinger and colleagues, "The event tunnel: interactive visualization of complex event streams for business process pattern analysis".

Runtime Conformance Checking of Control and Data Flow

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

Runtime Conformance Checking is considered an important building block in the Business Process Management lifecycle for reasons such as timely detection of non-compliance as well as provision of reactive and proactive countermeasures. In particular, compliance monitoring is related to operational decision support, which aims at extending the application of process mining techniques to on-line, running process instances, so as to detect deviations, recommend what to do next and predict what will happen in the future instance execution.

Runtime Conformance Checking with Fuzzy Logic

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

In different research fields, a recurring research issue has been to establish whether the external, observed behavior of an entity conforms to some rules/specifications/expectations. Most of the available systems, however, provide only simple yes/no answers to the conformance question. Some works introduce the idea of gradual conformance, expressed in fuzzy terms: the conformance degree of a process execution is represented through a fuzzy score.

Deviance Mining of Business Processes

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

Deviance mining leverages information hidden in business process execution logs in order to provide guidance to stakeholders so that they can steer the process towards consistent and compliant outcomes and higher process performance. Deviance mining deals with the analysis of process execution logs off-line in order to identify typical deviant executions and to characterize deviance that leads to better or to worse performance. This technique enables evidence-based management of business processes, where process workers and analysts continuously receive guidance to achieve more consistent and compliant process outcomes and a higher performance.

Predictive Monitoring of Business Processes

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

Modern information systems that support complex business processes generally maintain significant amounts of process execution data, particularly records of events corresponding to the execution of activities (event logs). Predictive monitoring approaches analyze such event logs in order to predictively monitor business process executions. When an activity is being executed, they can identify input data values that are more (or less) likely to lead to a good outcome of the process.

Discovering Business Rules from Event Logs

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

Process mining techniques can be used to effectively discover process models from logs that capture a sample of business process executions. Cross-correlating a discovered model with information in the log can be used to improve the underlying process. However, the existing process discovery techniques produce models that tend to be large and complex, especially in flexible environments where process executions involve multiple alternatives. This "overload" of information is caused by the fact that traditional discovery techniques construct procedural models explicitly showing all possible behaviors. Using a declarative model, the discovered process behavior is described as a (compact) set of business rules. Three sub-topics can be investigated in this scenario:

  • Discovering Business Rules from Event Logs (control flow)
  • Discovering Business Rules from Event Logs (data flow)
  • Discovering Business Rules from Event Logs (activity lifecycles)

Generating Synthetic Event Logs for Benchmarking Process Mining Algorithms

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

One way to test a process discovery technique is to generate an event log by simulating a process model, and then verify that the process discovered out of such log matches the original one. For this reason, a tool for generating event logs starting from declarative process models becomes vital for the evaluation of declarative process discovery techniques. The aim of this thesis is to implement an approach for the automated generation of event logs, starting from process models that are based on Declare, one of the most used declarative modeling languages in the process mining literature.
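As a simplified illustration of the underlying idea (not the approach to be implemented in the thesis), random traces can be generated and kept only if they satisfy a set of declarative constraints. The Declare-style "response" constraint below is checked naively over made-up activities.

    import random

    activities = ["a", "b", "c", "d"]

    def response(trace, x, y):
        """Declare 'response(x, y)': every occurrence of x is eventually followed by y."""
        return all(y in trace[i + 1:] for i, act in enumerate(trace) if act == x)

    constraints = [lambda t: response(t, "a", "b"),
                   lambda t: response(t, "c", "d")]

    def generate_log(num_traces, max_len=8):
        """Generate traces at random and keep those satisfying all constraints."""
        log = []
        while len(log) < num_traces:
            trace = [random.choice(activities) for _ in range(random.randint(1, max_len))]
            if all(check(trace) for check in constraints):
                log.append(trace)
        return log

    for trace in generate_log(5):
        print(trace)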

Understanding the Quality of E-Services: Accessibility, Efficiency, Security and Usability

Raimundas Matulevicius (firstname dot lastname ät ut dot ee)

E-services play an important role in everyone's life. Moreover, different e-services are usually combined to produce a business artifact. This means that understanding what an e-service is and how it can be efficiently used, accessed and utilized remains an important topic. The first goal of this thesis is to build a conceptual model that helps to understand the key e-service components regarding usability, efficiency, accessibility, and security. The second goal is to understand what regulations, rules and constraints (e.g., legal, social, organizational, etc.) influence the use of e-services. To achieve these goals, the following steps need to be performed:

  1. Perform a systematic literature review on the conceptual definition of e-services (with the main emphasis on e-service accessibility, usability, efficiency and security)
  2. Develop a conceptual model to understand the qualitative characteristics of e-services
  3. Evaluate how the proposed conceptual model performs with different Estonian e-services.

Library for Security Risk-oriented Patterns

Raimundas Matulevicius (firstname dot lastname ät ut dot ee)

Security risk-oriented patterns (SRP) are used to derive security requirements from system and software models. For example, a method for security requirements elicitation from business processes (SREBP) describes five contextual areas and five SRPs, which can be applied to derive security requirements from business processes. However, the current set of SRPs is rather limited and thus requires considerable extension.

The goal of this thesis is to extend the SRP library so that it could capture different security concerns from business processes. To achieve this goal the following steps must be performed:

  1. Review the literature on security risk patterns; the sources listed below can serve as a starting point;
  2. Get acquainted with the SREBP approach;
  3. Develop new SRPs in order to extend the SRP library;
  4. Validate the newly proposed SRPs empirically.

Initial literature sources

[1] Uzunov A. V., E. B. Fernandez, An extensible pattern-based library and taxonomy of security threats for distributed systems, Computer Standards & Interfaces, 36 (4), 2014, June 734-747
[2] Schumacher M., Fernandez-Buglioni E., Security Patterns: Integrating Security and Systems Engineering, John Wiley & Sons, 2006

Pattern-based Security Requirements Derivation from Use Case Models

Raimundas Matulevicius (firstname dot lastname ät ut dot ee)

Security requirements engineering plays an important role during software system development. However, in many cases security requirements are overlooked and considered only at the end of software development. A possible way to improve this situation is the development of systematic instruments that facilitate security requirements elicitation. For instance, security patterns describe a particular recurring security problem that arises in a specific security context and present a well-proven generic scheme for a security solution.

Use case diagrams are a popular modelling technique to model, organize and represent functional system and software requirements and to define the major actors who interact with the considered system. Recently, a security extension - misuse case diagrams - has been proposed to address negative scenarios. The major goal of this Master thesis is to develop a set of security patterns using use and misuse cases and to illustrate how these patterns could be used to derive security requirements from the use cases. The thesis includes the following steps:

  1. Conduct a literature review on (i) security engineering and security patterns, (ii) use cases and misuse cases, and (iii) security risk-oriented misuse cases;
  2. Develop a set of security patterns using use/misuse case diagrams;
  3. Develop guidelines to derive security requirements using the developed security patterns;
  4. Validate the developed security patterns and their guidelines empirically.

Mining for temporal and spatial patterns of rescue events

Supervisor: Siim Karus (siim04 ät ut.ee)

In this project, you will apply data mining methods in order to find patterns and regularities in a dataset consisting of rescue events, provided by the Rescue Services, and other relevant data sources. Specific aims include finding seasonal regional trends in events/accidents and/or deaths, and finding regions where the rescue response time is significantly influenced by date or time. This thesis will be conducted in cooperation with the Rescue Services. The data is in Estonian, so knowledge of Estonian is an advantage. The thesis can be written in Estonian.

Linking rescue event data with public (dynamic) data

Supervisor: Siim Karus (siim04 ät ut.ee)

The operational planning of rescue services would benefit from exploiting operational public data (e.g. public events, roadworks, population density, land use, etc.). The task in this thesis is to create a solution that combines online public data with rescue event data. The ultimate aim is to find correlations between rescue events (and their attributes) and the online data in order to estimate changes in rescue event risk. It is expected that the final solution can bring to the Rescue Services' attention public data that can affect response times or the risk of an accident (thus allowing better planning of response teams).

The thesis will be conducted in cooperation with the Rescue Services. If the cooperation with the Rescue Services is fruitful, it might be possible to continue this research/analysis beyond Master's Thesis (i.e. by introducing predictive analytics and making use of private datasets made available to the Rescue Services).

The datasets to be used are in Estonian, so knowledge of Estonian is an advantage. The thesis can be written in Estonian.

Profiling of deaths and risk groups of rescue events

Supervisor: Siim Karus (siim04 ät ut.ee)

In order to better identify the people with a higher risk of accident or death, it is necessary to develop profiles of people at higher risk. In this thesis you will get access to data collected by the Rescue Services regarding cases of deaths or injuries. The scope of these data is extremely limited due to limits set by data privacy regulations. Thus, you will enrich the data by tapping into public data sources so as to build risk profiles. The first part of the thesis project will be to create a data crawler that finds supplemental data about the people injured or killed in accidents. The second part of the thesis project will be to build a model that distinguishes what causes these people to be at higher accident risk than other people. This information will be used to improve the focus of the preventive efforts of the Rescue Services.

This thesis will be conducted in cooperation with the Rescue Services. The datasets to be used are in Estonian, so knowledge of Estonian is an advantage. The thesis can be written in Estonian.

Relationship between computer generated code and fault proneness

Supervisor: Siim Karus (siim04 ät ut.ee)

We have developed a method for quantified estimation of the extent of computer-generated code used in software modules. The hypothesis is that computer-generated code leads to fewer errors. This thesis topic is about testing this hypothesis on software development data. In short, the student will collect or reuse source code revisioning data and calculate the estimated amount of computer-generated code for the modules at different points in time. Then the issue repository data will be used to check which modules have more errors found in them (at different points in time). Finally, the student will try to model the relationship between the extent of computer-generated code and error proneness.
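A minimal sketch of the final modelling step, assuming the per-module estimates and defect counts have already been collected into a table (the file and column names are hypothetical):

    import pandas as pd
    from scipy.stats import spearmanr

    # Hypothetical table: one row per module and snapshot, with the estimated share
    # of computer-generated code and the number of defects reported for that module.
    data = pd.read_csv("modules.csv")   # columns: module, snapshot, generated_ratio, defects

    rho, p_value = spearmanr(data["generated_ratio"], data["defects"])
    print(f"Spearman correlation between generated-code share and defects: {rho:.2f} (p = {p_value:.3f})")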

GPU-accelerated data analytics

Supervisor: Siim Karus (siim04 ät ut.ee)

In this project, a set of GPU-accelerated data mining or analytics algorithms will be implemented as an extension to an analytical database solution. For this task, you will need to learn parallel processing optimisations specific to GPU programming (balancing between bandwidth and processing power), implement the analytics algorithms, and design a user interface to accompany them. As the aim is to provide an extension to analytical databases (preferably MSSQL, Oracle or PostgreSQL), you will also need to learn the extension interfaces of these databases and their native development and BI tools. Finally, you will assess the performance gains of your algorithms compared to comparable algorithms in existing analytical database tools.

GPU-accelerated Developer Feedback System

Supervisor: Siim Karus (siim04 ät ut.ee)

In this project you will implement source code analytics algorithms on the GPU and devise a reliable and fast method for integrating the analysis feedback into integrated development environments (IDEs). For this task, you will need to learn parallel processing optimisations specific to GPU programming (balancing between bandwidth and processing power), implement the analytics algorithms, and design a user interface to accompany them. As the aim is to provide an extension to IDEs (preferably Visual Studio or Eclipse), you will also need to learn the extension interfaces of these IDEs and their native development tools. Finally, you will assess the performance gains of your algorithms compared to implementations of these algorithms running on the CPU.

Replication of Empirical Software Engineering Case Study Experiments

Supervisor: Siim Karus (siim04 ät ut.ee)

The empirical software engineering community publishes many case studies validating different approaches to and analytical algorithms for software engineering. Unfortunately, these studies are rarely validated by independent replication. To make matters worse, the studies use different validation metrics, which makes them incomparable. Thus, your mission, should you choose to accept it, is to analyse different published case studies on one topic (e.g. bug detection, code churn estimation) to evaluate their replicability and to replicate the studies in order to make them comparable. In short you will:

  1. envisage a workflow/pipeline for replicating published studies (including testing for replicability);
  2. use the workflow to replicate several studies;
  3. validate these studies and compare their results on a common scale.

Hot Deployment of Linked Data for Online Data Analytics

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The aim of this project is to design and implement a "hot" linked data deployment extension to an open source analytics server, such as RapidAnalytics. Software tools such as Weka or RapidMiner allow building analytical applications which exploit knowledge hidden in data. However, one of the bottlenecks of such toolkits, in settings where vast quantities of data with heterogeneous data models are available, is the amount of human effort required first for unification of the data models at the data pre-processing stage and then for extraction of relevant features for data mining. Furthermore, these steps are repeatedly executed each time a new dataset is added or an existing one is changed. In the case of open linked data, however, the uniform representation of data enables implicit handling of data model heterogeneity. Moreover, there exist open source toolkits, such as FeGeLOD [1], which automatically create data mining features from linked data. Unfortunately, the current approaches assume that a linked dataset is already pre-processed and available as a static file for which the features are created each time the file is loaded.

In this thesis project, first an extension will be developed for discovering and loading new datasets into an analytics server. Then existing data mining feature extraction methods will be enhanced and incorporated into the framework. Finally, the developed solution will be validated on a real-life problem.

[1] Heiko Paulheim, Johannes Fürnkranz. Unsupervised Generation of Data Mining Features from Linked Open Data. Technical Report TUD–KE–2011–2 Version 1.0, Knowledge Engineering Group, Technische Universität Darmstadt, November 4th, 2011. Available at http://www.ke.tu-darmstadt.de/bibtex/attachments/single/297 .

Semantic Interoperability Layer for Stateful Multi-Device Tizen Applications

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

This project aims at providing an interoperability layer for Tizen that allows potentially independently developed applications to interact with each other, either within the same Tizen instance or across multiple instances on distributed devices. The solution will save time in developing B2B applications for Tizen and enhance the adoption of Tizen for B2B applications on the Web, smart phones and embedded systems.

Current Tizen applications are developed in HTML5 with support for CSS and JavaScript and packaged according to the W3C Widgets 1.0 family of specifications. Communication between such applications is implemented in terms of launching application services - similarly to intents, which are used to invoke activities referring to particular applications or components in Android. Both approaches are limited by 1) the low granularity of communication primitives/data structures, 2) stateless application invocation, and 3) local application execution. These shortcomings limit Tizen applications as follows: a limited number of business object types that can be exchanged between applications, support only for simple B2B tasks, and no support for teamwork.

To tackle these limitations, our approach will extend the existing Tizen framework by introducing a set of Tizen application services for interoperability, state handling and inter-device communication. Thereby our extension does not require any modification to the existing Tizen framework itself. Instead, it will provide add-ons for making Tizen applications interoperable at a finer level of granularity and for incorporating inter-device capability and state handling into applications. For Tizen applications to benefit from the extension it is required that 1) they are extended with finer metadata on the business objects they support, 2) their application services are bound to the metadata-enriched business objects, and 3) instead of launching object-specific application services, the interoperability application service is launched. The latter will take care of selecting the specific application services to which the business objects are forwarded. Inter-device application communication and state handling will be transparent to the existing Tizen platform.

The developed framework layer will be demonstrated on a proof-of-concept implementation of a cross-functional business performance management (BPM) application running on multiple Tizen-enabled devices in collaborative settings.

Open Cloud Infrastructure for Cost-Effective Harvesting, Processing and Linking of Unstructured Open Government Data

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Bootstrapping Open Government Data projects, even when not considering the complementary five-star initiatives for linking the data, represents a tremendous task if implemented manually, in an uncoordinated way, in ad hoc settings. Although the essential datasets may be publicly available for downloading, locating, documenting and preparing them for publication in a central (CKAN) repository represents a burden which is difficult to absorb by officials of public administration. Furthermore, linking the prepared and published datasets represents even further challenges, especially in the case of semistructured and full-text documents, where understanding the content is complicated by the lack of a clear structure. Namely, detecting the entities that should be linked and the metamodels that should be used for linking is a search-intensive task even for machines, not only for humans.

Luckily, there are some open source tools for simplifying the process. A set of tools, including the ones for semi-automatic link discovery, is represented in the LOD2 Technology Stack (http://stack.lod2.eu/blog/). In addition there are general-purpose text processing frameworks such as Apache OpenNLP and for the Estonian language there is a named entity recognition solution (svn://ats.cs.ut.ee/u/semantika/ner/branches/ner1.1) available. Finally, there is NetarchiveSuite (https://sbforge.org/display/NAS/Releases+and+downloads) for Internet archival, which can be used for creating Web snapshots.

This project aims at developing a cloud platform for harvesting Open Government Data and transforming it into Linked Open Government Data. The platform consists of a Web archival subsystem, an open data repository (CKAN), a document content analysis pipeline with named entity recognition and resolution, and finally a linked data repository for serving the processed data. The Web archival subsystem will continuously monitor changes in the Web by creating monthly snapshots of the Estonian public administration Web, comparing the snapshots and detecting new datasets (or changes) together with their metadata. The datasets together with their metadata are automatically published in a CKAN repository. The CKAN repository is continuously monitored for new datasets and updates, and each change will trigger execution of the document content analysis pipeline (i.e. analysis of CSV file content). The pipeline will detect named entities in the source documents, resolve the names with respect to other linked datasets (e.g. addresses or organizations) and finally publish the updates in a linked data repository with an open SPARQL endpoint. The latter will provide means for the consumption of Linked Open Government Data.
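As a small illustration of the monitoring step (a sketch only; the CKAN site URL is a placeholder and the change-handling logic is invented), the repository can be polled through the standard CKAN action API and new packages handed over to the analysis pipeline.

    import requests

    CKAN_SITE = "https://opendata.example.ee"      # hypothetical CKAN instance
    known_packages = set()                          # in practice this would be persisted

    def poll_ckan():
        """Detect packages that appeared since the last poll."""
        resp = requests.get(f"{CKAN_SITE}/api/3/action/package_list", timeout=30)
        current = set(resp.json()["result"])
        new_packages = current - known_packages
        known_packages.update(current)
        return new_packages

    for package_id in poll_ckan():
        detail = requests.get(f"{CKAN_SITE}/api/3/action/package_show",
                              params={"id": package_id}, timeout=30).json()["result"]
        print("New dataset to analyse:", detail["name"])  # would trigger the NER pipeline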

A Crawler for RESTful, SOAP Services and Web Forms

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The Deep Web, consisting of online databases hidden behind SOAP-based or RESTful Web services or Web forms, is estimated to contain about 500 times more data than the (visible) Web. Despite many advances in search technology, the full potential of the Deep Web has been left largely underexploited. This is partially due to the lack of effective solutions for surfacing and visualizing the data. The Deep Web research initiative at the University of Tartu's Institute of Computer Science has developed an experimental platform to surface and visualize Deep Web data sources hidden behind SOAP Web service endpoints. However, currently this experimental platform only supports a limited set of SOAP endpoints, updated on an ad hoc basis.

The aim of this project is to build a crawler and an indexing engine capable of recognizing endpoints behind Web forms, RESTful services and SOAP-based services, together with their explicit descriptions (e.g. WSDL interface descriptions, when available). Furthermore, the crawler should identify examples of queries that can be forwarded to those endpoints, especially for endpoints with no explicit interface descriptions such as Web forms.
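As a minimal sketch of the endpoint-recognition step (the URLs and the heuristics are purely illustrative, not the intended design), a crawler could classify a candidate URL by fetching it and looking for tell-tale markers of a WSDL description, an HTML form, or a JSON-returning endpoint.

    import requests

    def classify_endpoint(url):
        """Very rough heuristic classification of a candidate Deep Web endpoint."""
        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException:
            return "unreachable"
        body = response.text.lower()
        if "<wsdl:definitions" in body or ("<definitions" in body and "soap" in body):
            return "soap/wsdl"
        if "<form" in body:
            return "web form"
        if "application/json" in response.headers.get("Content-Type", ""):
            return "possible rest endpoint"
        return "unknown"

    for candidate in ["http://example.org/service?wsdl", "http://example.org/search"]:
        print(candidate, "->", classify_endpoint(candidate))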

This project is available both for Master and for Bachelor students. The goal of the Masters project would be to build a crawler supporting endpoints with and without explicit interfaces. The goal of the Bachelor thesis will be to crawl WSDL interfaces only.

Transforming the Web into a Knowledge Base: Linking the Estonian Web

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The aim of the project is to study automated linking opportunities for Web content in the Estonian language. Recent advances in Web crawling and indexing have resulted in effective means for finding relevant content on the Web. However, getting answers to queries that require aggregation of results is still in its infancy, since a better understanding of the content is required. At the same time there has been a fundamental shift in content linking - instead of linking Web pages, more and more Web content is tagged and annotated to facilitate linking of smaller fragments of Web pages by means of RDFa and microformat markups. Unfortunately this technology has not been widely adopted yet and further efforts are required to advance the Web in this direction.

This project aims at providing a platform for automating this task by exploiting existing natural language technologies, such as named entity recognition for the Estonian language, in order to link the content of the entire Estonian Web. For doing this, two Master students will work closely together, first setting up the conventional crawling and indexing infrastructure for the Estonian Web and then extending the indexing mechanism with a microtagging mechanism which will enable linking the crawled Web sites. The microtagging mechanism will take advantage of existing language technologies to extract names (such as names of persons, organizations and locations) from the crawled Web pages. In order to validate the approach, a portion of the Estonian Web will be processed and exposed in RDF form through a SPARQL query interface such as the one provided by Virtuoso Open-Source Edition.

Automated Estimation of Company Reputation

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Reputation is recognized as a fundamental instrument of social order - a commodity which is accumulated over time, hard to gain and easy to lose. In the case of organizations, reputation is also linked to their identity, performance and the way others respond to their behaviour. There is an intuition that the reputation of a company affects the perception of its value by investors, helps to attract new customers and to retain existing ones. Therefore organizations focusing on long-term operation care about their reputation.

Several frameworks, such as WMAC (http://money.cnn.com/magazines/fortune/most-admired/, http://www.haygroup.com/Fortune/research-and-findings/fortune-rankings.aspx), used by Fortune magazine, have been exploited to rank companies by their reputation. However, there are some serious issues associated with reputation evaluation in general. First, the existing evaluation frameworks are usually applicable to the evaluation of large companies only. Second, the costs of applying these frameworks are quite high in terms of the accumulated time of the engaged professionals. For instance, in the case of WMAC more than 10,000 senior executives, board directors, and expert analysts were engaged to fill in questionnaires to evaluate nine performance aspects of Fortune 1000 companies in 2009. Third, the evaluation is largely based on subjective opinions rather than objective criteria, which makes continuous evaluation cumbersome and increases the length of evaluation cycles.

This thesis project aims at finding a solution to these issues. More specifically, the project is expected to answer the following research question: to what degree is the reputation of a company determined by objective criteria such as its age, financial indicators, and the sentiment of news articles and comments on the Web? The more specific research questions are the following:

  1. What accuracy in reputation evaluation can be achieved by using solely objective criteria?
  2. Which objective criteria, and which combinations thereof, best discriminate the reputation of organizations?
  3. To what extent does the reputation of an organization affect the reputation of another organization through people shared in their management?
  4. How do temporal aspects (an organization's age, related past events, etc.) bias reputation?

In order to answer these questions, network analysis and machine learning methods will be exploited and a number of experiments will be performed with a given dataset. The dataset to be used is an aggregation of data from the Estonian Business Registry, the Registry of Buildings, the Land Register, the Estonian Tax and Customs Board, the Register of Economic Activities, news articles from major Estonian newspapers and blogs, and some proprietary data sources.

Collaborative Decision-Making with Hot Deployment of Linked Data and Open API Endpoints

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The recent economic downturn has increased the pressure on organizations to focus on better decision making. In this light, Business Intelligence (BI) initiatives are used to reduce costs, run more targeted campaigns through better customer segmentation, or detect fraud, just to mention a few applications of BI. However, the major shortcoming of currently available BI tools is that they do not support the process of decision-making directly. In fact, the tools provide input to decision-making, while capturing the entire decision-making process is left outside their scope. Therefore interest in decision-making processes, models and techniques by industry and academia has been growing in recent years.

Another major shortcoming of BI is that it mostly assumes static datasets as input, and structural changes in datasets lead to costly and redundant manual labour related to the reconstruction and validation of new reference models. By structural changes we mean changes in the overall common data model, the addition/replacement of an individual dataset and integration with an external API for data retrieval, just to mention a few.

This project will tackle the mentioned shortcomings by:

  • developing a proof-of-concept solution for hot deployment of terabyte datasets and tens of thousands of open API endpoints;
  • developing an adaptivity layer for aligning and incrementally validating BI reference models to changes in data sources;
  • developing an effective scheme of collaborative decision-making, which will facilitate better decision making with respect to past decisions as measured in terms of relevant key performance indicators (KPIs).

Visualization of traffic flow and/or people density changes with animated texture/particles.

Toivo Vajakas (firstname.lastname ät ut.ee)

Download the project description.

Cell constructor

Leopold Parts (firstname.lastname ät gmail.com)

Download the project description.

Data Analysis Toolkit for Solid State Physics

Sergey Omelkov (firstname.lastname ät ut.ee)

Modern experimental setups for solid state physics have approached the limits in data acquisition speeds, so that the amount of data obtained is growing faster than scientists are able to analyze it using "conventional" methods. In the case of well-established experimental methods, the problem is usually solved somehow by suppliers of equipment, who develop highly specialized, expensive software to do batch data analysis for a particular problem. However, this is impossible for state-of-the-art unique experimental stations, which are the main workhorses for high-end research.

The objective of this task will be to start an open-source project and develop a universal yet powerful tool for data analysis in solid state physics. A working proof-of-concept for such a tool has been developed and tested at the Institute of Physics and can be used as a starting point. The tool will be based on a math scripting engine to handle the calculations (currently a combination of SAGE and numpy) and a document-oriented database for storing the raw experimental data and calculation results (currently MongoDB).

The tools to be developed are (in the order of importance):

  • A data type suitable for storing the data and analysis results, serializable to the database.
  • A set of data-processing methods commonly used in spectroscopy, leveraging the underlying math scripting engine.
  • A tool to add experimental data to the database directly from the experimental setup software (in the form of LabVIEW VIs).
  • A graphical tool to browse the database and quickly import data into the scripting engine.
  • An interface to import calculation results (mainly images) from the database into text processors (LaTeX, LyX, MS Word), and possibly also into conventional data analysis programs such as Origin.
  • A multiuser environment for data exchange and protection (by means of the database).

The main requirement for the data analysis process is that any calculation result stored in the database should either carry links to the initial data and the calculation procedure, or simply be a script that produces the result.
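
A minimal sketch of how this provenance requirement could look in the document store, assuming pymongo, numpy and a local MongoDB instance; the collection and field names are illustrative, not a prescribed schema.

    import datetime
    import numpy as np
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["spectroscopy"]

    # store a raw measurement exactly as it came from the setup
    raw_id = db.records.insert_one({
        "kind": "raw",
        "created": datetime.datetime.utcnow(),
        "wavelength_nm": np.linspace(200.0, 800.0, 601).tolist(),
        "intensity": np.random.rand(601).tolist(),
    }).inserted_id

    def normalize(intensity):
        y = np.asarray(intensity, dtype=float)
        return (y / y.max()).tolist()

    # store a derived result that carries links to its inputs and its procedure
    raw = db.records.find_one(raw_id)
    db.records.insert_one({
        "kind": "derived",
        "created": datetime.datetime.utcnow(),
        "parents": [raw_id],                      # link to the initial data
        "procedure": "normalize(intensity)",      # or the full script text
        "intensity": normalize(raw["intensity"]),
    })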

This project requires that the student is willing to understand how physicists view the data acquisition and analysis process. Experience in Python is also strongly needed.


Bachelors projects

Note: The number of projects for Bachelors students is limited. However, you can find several other potential project ideas by checking this year's list of proposed software projects. Some of the projects in the "Available" category could be used as Bachelors thesis topics. Also, we're open to student-proposed Bachelors thesis projects. If you have an idea for your Bachelors project and your idea falls in the area of software engineering (broadly defined), please contact the group leader: Marlon . Dumas ät ut.ee

Rescue event categorisation

Supervisor: Siim Karus (siim04 ät ut.ee)

In this thesis, you will analyze data provided by the Rescue Services in order to find commonalities in rescue events so as to categorise them. One of the aims will be to isolate and characterize less common rescue event categories, which are of special interest to the Rescue Services.

The thesis will be conducted in cooperation with the Rescue Services. The thesis can be written in Estonian.

Workflow Automation With Business Data Streams

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Services such as Flowxo and IFTTT facilitate the automation of simple workflows by supporting the creation of trigger/action pairs or if-then recipes, whose execution orchestrates applications in response to external stimuli (e.g. an application data stream). A popular example of a recipe is: "IF I post a picture on Instagram THEN save the photo to Dropbox"; a more complex example is: "For every new deal in a CRM, e-mail the deal to a person via the GMail service, then wait for about one day and send a reminder SMS via the Twilio service."

Such systems mostly rely on proprietary application data, while there are cases where external stimuli would provide extra benefits. One example is the integration of CRM and credit management tools with external stimuli in the form of streaming company debt and risk score data for the Order-to-Cash business process. A Stream API for business data is currently under development at Register OÜ; it will provide a stream of events such as changes in company debt, changes in board membership, and data about newly registered companies. Such data change events can easily be applied in the context of CRM and credit management (CM).

The aim of the project is to provide an analogue of IFTTT in which users can define recipes for reacting to business data changes via actions in applications such as GMail, Odoo CRM, etc.
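
A minimal sketch of what such a recipe could look like on the consumer side, assuming debt-change events arrive as plain dictionaries; the event fields and the notification action are hypothetical placeholders, not the actual Register Stream API.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Recipe:
        trigger: Callable[[dict], bool]   # IF: predicate over an incoming event
        action: Callable[[dict], None]    # THEN: reaction in an external application

    def notify_account_manager(event: dict) -> None:
        # placeholder: here one would call e.g. the GMail or Odoo CRM API
        print(f"ALERT: {event['company']} debt rose to {event['debt_eur']} EUR")

    recipes = [
        Recipe(trigger=lambda e: e.get("type") == "debt_change" and e.get("debt_eur", 0) > 10000,
               action=notify_account_manager),
    ]

    # in practice these events would arrive from the Register Stream API / CEP engine
    incoming = [{"type": "debt_change", "company": "Example OÜ", "debt_eur": 25000}]
    for event in incoming:
        for recipe in recipes:
            if recipe.trigger(event):
                recipe.action(event)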

The project will be done in collaboration with Register OÜ. The application will be developed using the Complex Event Processing (CEP) feature of the Register Stream API.

Lead generator for accelerating B2B sales

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Companies care a lot about improving their sales, and numerous online solutions have been proposed to meet this demand. While the prevalent solutions for B2C sales use social media campaigns and Web visitor data, for B2B sales there are solutions that allow generating a list of leads based on a set of attribute values over company data, such as activity field, size, financial metrics, etc.

Some solutions for the Estonian market include https://www.baltictarget.eu, http://sihtgrupid.ee, http://turundusnimekirjad.ee/ and http://www.kliendibaas.ee/. However, these solutions have the following deficiencies:

  1. the market segment must be known before generating the leads;
  2. the set of attributes is mostly limited to geographic, activity field and financial data;
  3. the data is returned as a file.

This project aims at innovating B2B sales by providing a solution that differs from the existing ones in the following ways:

  1. instead of a list of feature/value pairs, users can define their market segment by giving a set of prospective clients as input to lead generation (a minimal sketch of this idea follows the list below);
  2. in addition to activity fields, company size and financial metrics, data about owned real estate, credit history, credit risk, media coverage and related persons can also be used;
  3. instead of outputting leads to a CSV file, lead data will be imported directly into an existing CRM system, or a new cloud instance of a CRM will be deployed and populated with the leads.
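
A minimal sketch of the segment-by-example idea from point 1 above: the example clients define a centroid in attribute space and the remaining companies are ranked by distance to it. The company names, attributes and values are purely illustrative.

    import numpy as np

    # columns: employees, revenue (kEUR), credit risk score -- illustrative attributes
    companies = {
        "SeedClientA": [12, 800, 0.10],
        "SeedClientB": [18, 950, 0.20],
        "CandidateX":  [15, 900, 0.15],
        "CandidateY":  [400, 90000, 0.70],
    }
    seeds = ["SeedClientA", "SeedClientB"]

    names = list(companies)
    X = np.array([companies[n] for n in names], dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)              # normalise each attribute
    centroid = X[[names.index(s) for s in seeds]].mean(axis=0)

    # rank the remaining companies by distance to the segment centroid
    leads = sorted((n for n in names if n not in seeds),
                   key=lambda n: np.linalg.norm(X[names.index(n)] - centroid))
    print(leads)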

The project will be done in collaboration with Register OÜ.

Lightning-Fast Multi-Level SOAP-JSON Caching Proxy

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

In a previous Masters thesis, a solution was developed for proxying SOAP requests/responses to JavaScript widgets that exchange messages with JSON payloads. Although this approach was shown to be useful for surfacing Deep Web data, it suffers from performance bottlenecks that arise when a SOAP endpoint is used frequently.

This Bachelors thesis aims at developing a cache component that will make the dynamic creation of SOAP-JSON proxies more efficient with respect to runtime latency. The resulting cache component will be evaluated from a performance point of view.
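
A minimal sketch of the caching idea, keyed on a hash of the normalized SOAP request and bounded by a time-to-live; the SOAP call itself is a placeholder, and a real component would also handle invalidation and multiple cache levels.

    import hashlib
    import time

    CACHE = {}               # request hash -> (timestamp, cached response)
    TTL_SECONDS = 60.0

    def call_soap_endpoint(envelope):
        # placeholder for the actual SOAP round-trip performed by the proxy
        return "<soap:Envelope>...response...</soap:Envelope>"

    def cached_call(envelope):
        key = hashlib.sha256(envelope.strip().encode("utf-8")).hexdigest()
        hit = CACHE.get(key)
        if hit and time.time() - hit[0] < TTL_SECONDS:
            return hit[1]                         # fresh cache hit, no SOAP round-trip
        response = call_soap_endpoint(envelope)
        CACHE[key] = (time.time(), response)
        return response

    print(cached_call("<soap:Envelope>...request...</soap:Envelope>"))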

A Crawler for RESTful, SOAP Services and Web Forms

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The Deep Web, consisting of online databases hidden behind SOAP-based or RESTful Web services or Web forms, is estimated to contain about 500 times more data than the (visible) Web. Despite many advances in search technology, the full potential of the Deep Web has been left largely underexploited. This is partially due to the lack of effective solutions for surfacing and visualizing the data. The Deep Web research initiative at the University of Tartu's Institute of Computer Science has developed an experimental platform to surface and visualize Deep Web data sources hidden behind SOAP Web service endpoints. However, this experimental platform currently only supports a limited set of SOAP endpoints, updated on an ad hoc basis.

The aim of this project is to build a crawler and an indexing engine capable of recognizing endpoints behind Web forms, RESTful services and SOAP-based services, together with their explicit descriptions (e.g. WSDL interface descriptions, when available). Furthermore, the crawler should identify examples of queries that can be forwarded to those endpoints, especially for endpoints with no explicit interface descriptions such as Web forms.

This project is available both for Masters and for Bachelors students. The goal of the Masters project would be to build a crawler supporting endpoints with and without explicit interfaces; the goal of the Bachelors thesis would be to crawl WSDL interfaces only.
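
For the Bachelors scope, a minimal sketch of how WSDL endpoints could be recognized while crawling; the seed URL is illustrative, and politeness rules, robots.txt handling and depth limits are deliberately omitted.

    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self.links.extend(v for k, v in attrs if k == "href" and v)

    def looks_like_wsdl(url):
        try:
            head = urlopen(url, timeout=5).read(2048).decode("utf-8", "ignore")
        except OSError:
            return False
        return "wsdl:definitions" in head or "<definitions" in head

    def crawl(seed):
        html = urlopen(seed, timeout=5).read().decode("utf-8", "ignore")
        parser = LinkCollector()
        parser.feed(html)
        candidates = {urljoin(seed, link) for link in parser.links}
        return sorted(url for url in candidates if "wsdl" in url.lower() and looks_like_wsdl(url))

    # print(crawl("http://example.com/services"))   # illustrative seed URL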

Reverse engineering the RESTBucks API

Supervisor: Luciano García-Bañuelos (luciano dot garcia ät ut dot ee)

In recent years, multiple tools have emerged that allow one to produce interactive documentation for RESTful applications, often referred to as API blueprints. An API blueprint is basically a Web application that, on the one hand, describes the set of resources and the operations on them and, on the other hand, provides a way to test the functionality of the application.

The goal of this project is to take an open-source RESTful application, to reverse engineer its API and to specify it using two tools, namely Apiary’s Blueprint and Swagger. The project will allow you to critically compare the two tools.

Web Front-End for BIMP

Supervisor: Luciano García-Bañuelos (luciano dot garcia ät ut dot ee)

Business process simulation is a valuable tool for understanding the trade-offs when (re-)designing business processes. For instance, it provides ways of assessing the impact of changes on business processes, thus giving valuable insights to business analysts when it comes to deciding how to proceed with the implementation of changes.

A few years ago, our group developed a simulation tool called BIMP. BIMP is available as "Software as a Service" and is currently used by several universities in their Business Process Management courses. Currently, BIMP uses a form-based interface to enter the simulation information. The goal of this project is to implement a JavaScript front-end application (probably using JointJS) that renders the BPMN model and allows the user to pick BPMN elements to enter the simulation information. For this project, we expect the student to have knowledge of JavaScript and proficiency with Java-based web application development.

Web Front-End for BPMN-Miner Tool

Supervisor: Luciano García-Bañuelos (luciano dot garcia ät ut dot ee)

Our research group has developed a Java-based tool (namely BPMN-Miner) that takes as input a business process execution log (extracted from an enterprise system) and generates a business process model captured in the standard BPMN notation (in XML format).

The goal of this project is to implement a JavaScript-based web front-end that will expose the functionality of the BPMN-Miner tool online. The front-end application will allow users to upload logs, and will graphically render the resulting BPMN models in the browser. For this project, we expect the student to have knowledge of JavaScript and be familiar with Java-based web application development.

Runtime Conformance Checking of Control and Data Flow

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

Runtime conformance checking is considered an important building block of the Business Process Management lifecycle, for reasons such as the timely detection of non-compliance and the provision of reactive and proactive countermeasures. In particular, compliance monitoring is related to operational decision support, which aims at extending the application of process mining techniques to on-line, running process instances, so as to detect deviations, recommend what to do next, and predict what will happen in the remainder of the instance execution.

Runtime Conformance Checking with Fuzzy Logic

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

In various research fields, a recurring issue has been to establish whether the externally observed behavior of an entity conforms to given rules, specifications or expectations. Most of the available systems, however, provide only simple yes/no answers to the conformance question. Some works introduce the idea of gradual conformance, expressed in fuzzy terms: the conformance degree of a process execution is represented by a fuzzy score.
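
A minimal sketch of the gradual-conformance idea: the degree of conformance is the fraction of satisfied constraints rather than a yes/no verdict. The constraints below are simplified illustrations, not the fuzzy-logic machinery of any particular paper.

    def conformance_degree(trace, constraints):
        satisfied = sum(1 for check in constraints if check(trace))
        return satisfied / len(constraints)       # a degree in [0, 1] rather than yes/no

    constraints = [
        lambda t: "register order" in t,                              # existence
        lambda t: "check credit" not in t or "ship goods" in t,       # response (simplified)
        lambda t: "ship goods" not in t or "register order" in t,     # precedence (simplified)
    ]

    trace = ["register order", "check credit"]
    print(f"conformance degree: {conformance_degree(trace, constraints):.2f}")   # 0.67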

Deviance Mining of Business Processes

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

Deviance mining leverages information hidden in business process execution logs to provide guidance to stakeholders so that they can steer the process towards consistent and compliant outcomes and higher process performance. It analyses process execution logs off-line in order to identify typical deviant executions and to characterize the deviance that leads to better or worse performance. This enables evidence-based management of business processes, where process workers and analysts continuously receive guidance to achieve more consistent and compliant outcomes and higher performance.

Predictive Monitoring of Business Processes

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

Modern information systems that support complex business processes generally maintain significant amounts of process execution data, in particular records of events corresponding to the execution of activities (event logs). Predictive monitoring approaches analyze such event logs in order to monitor business process executions predictively. While a process instance is being executed, they can identify input data values that are more (or less) likely to lead to a good outcome of the process.
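
A minimal sketch of the underlying idea, assuming case prefixes have already been encoded as fixed-length feature vectors and labelled with their outcome; scikit-learn is assumed, and the features and labels are invented for illustration.

    from sklearn.ensemble import RandomForestClassifier

    # one row per historical case prefix: [claim amount, customer age, #activities so far]
    X_train = [[500, 30, 2], [20000, 55, 4], [800, 41, 3], [15000, 62, 5]]
    y_train = [1, 0, 1, 0]                       # 1 = good outcome (e.g. settled on time)

    model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

    running_case = [[12000, 48, 3]]              # prefix of a case that is still executing
    print("P(good outcome) =", model.predict_proba(running_case)[0][1])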

Generating Synthetic Event Logs for Benchmarking Process Mining Algorithms

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

One way to test a process discovery technique is to generate an event log by simulating a process model, and then verify that the process discovered from such a log matches the original one. For this reason, a tool for generating event logs starting from declarative process models becomes vital for the evaluation of declarative process discovery techniques. The aim of this thesis is to implement an approach for the automated generation of event logs, starting from process models that are based on Declare, one of the most widely used declarative modeling languages in the process mining literature.
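
A minimal sketch of the generation idea for a single Declare constraint, response(a, b): every occurrence of a must eventually be followed by b. Filtering random traces, as done here, is only illustrative; the thesis would implement a proper constraint-driven generator.

    import random

    random.seed(1)
    ACTIVITIES = ["a", "b", "c", "d"]

    def satisfies_response(trace, a="a", b="b"):
        # response(a, b): every occurrence of a is eventually followed by b
        return all(b in trace[i + 1:] for i, act in enumerate(trace) if act == a)

    def generate_log(n_traces=5, max_len=6):
        log = []
        while len(log) < n_traces:
            trace = [random.choice(ACTIVITIES) for _ in range(random.randint(2, max_len))]
            if satisfies_response(trace):
                log.append(trace)
        return log

    for trace in generate_log():
        print(trace)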

Discovering Business Rules from Event Logs

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

Process mining techniques can be used to effectively discover process models from logs that capture a sample of business process executions. Cross-correlating a discovered model with information in the log can be used to improve the underlying process. However, existing process discovery techniques produce models that tend to be large and complex, especially in flexible environments where process executions involve multiple alternatives. This "overload" of information is caused by the fact that traditional discovery techniques construct procedural models that explicitly show all possible behaviors. Using a declarative model, the discovered process behavior is instead described as a (compact) set of business rules. Three sub-topics can be investigated in this scenario (a minimal sketch of the rule-checking idea follows the list below):

  • Discovering Business Rules from Event Logs (control flow)
  • Discovering Business Rules from Event Logs (data flow)
  • Discovering Business Rules from Event Logs (activity lifecycles)
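
For the control-flow sub-topic, a minimal sketch of rule discovery by candidate checking: each candidate response(a, b) rule is kept if its support over the log exceeds a threshold. The log, activity names and threshold are illustrative.

    from itertools import permutations

    log = [
        ["register", "check", "ship"],
        ["register", "ship"],
        ["register", "check", "reject"],
    ]

    def holds_response(trace, a, b):
        # response(a, b): every occurrence of a is eventually followed by b
        return all(b in trace[i + 1:] for i, act in enumerate(trace) if act == a)

    activities = sorted({act for trace in log for act in trace})
    for a, b in permutations(activities, 2):
        support = sum(holds_response(t, a, b) for t in log) / len(log)
        if support >= 0.66:                       # keep rules above a support threshold
            print(f"response({a}, {b})  support={support:.2f}")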

Plugin for Discovering Business Rules from Event Logs

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

The discovery of business rules from event logs is emerging as a new challenge in the business process management field. The actual process behavior, as recorded in execution traces, is described as a set of business rules (expressed, e.g., in linear temporal logic) and used for process analysis. The candidate is required to implement a (Java) plug-in for a well-known process mining framework called ProM. The code for discovering business rules is already available in C, so the main task is to create a wrapper for this code and integrate it with ProM.

In this project, we will not assume that you have any prior knowledge in the field of business process management or process mining. We will give you all the input knowledge in this field that you will require to complete the project. All you need are Bachelors-level (Java) programming skills, basic knowledge of XML, and a strong desire to learn new things.

Critical Comparison of the Business Motivation Model (BMM) and i*

Raimundas Matulevicius (raimundas.matulevicius ät ut.ee)

The Business Motivation Model (BMM) is a standardized modeling language for capturing important concepts about why a business is undertaking certain actions, such as developing an information system. On the other hand, i* is a modeling language for specifying actors and their goals during the early phases of information system development. There are clear overlaps between these two languages, but also some relevant differences.

The questions to be answered in this thesis are: How does BMM compare against i*? Do they address essentially the same perspective? Do they complement each other? These questions will be approached by defining a correspondence between the concepts in these languages and by applying both languages to a concrete case study.

Comparison of BPMN Security Extensions

Raimundas Matulevicius (raimundas.matulevicius ät ut.ee)

Recently, many BPMN extensions have been proposed for security analysis. These extensions concern different aspects, ranging from security problem definition to the introduction of security requirements and the identification of controls. The goal of the thesis is to develop a systematic and coherent overview of these extensions and to define a set of guidelines for selecting particular BPMN security extensions for targeted problems. This should also provide an overview of emerging trends.

Starting points:

  1. R. Braun, W. Esswein, Classification of Domain-Specific BPMN Extensions, The Practice of Enterprise Modeling Lecture Notes in Business Information Processing Volume 197, 2014, pp 42-57
  2. Menzel, M., Thomas, I., Meinel, C.: Security Requirements Specification in Service- oriented Business Process Management. In: ARES 2009, pp. 41–49 (2009)
  3. Altuhhova, O., Matulevičius, R., Ahmed, N.: An extension of business process model and notation for security risk management. International Journal of Information System Modeling and Design (IJISMD) 4(4), 93–113 (2013)
  4. Cherdantseva Y., Hilton J., Rana O., Towards SecureBPMN - Aligning BPMN with the Information Assurance and Security Domain, Business Process Model and Notation, Lecture Notes in Business Information Processing Volume 125, 2012, pp 107-115
  5. Marcinkowski, B., Kuciapski, M.: A business process modeling notation extension for risk handling. In: Cortesi, A., Chaki, N., Saeed, K., Wierzchoń, S. (eds.) CISIM 2012. LNCS, vol. 7564, pp. 374–381. Springer, Heidelberg (2012)
  6. Saleem, M., Jaafar, J., Hassan, M.: A domain-specific language for modelling security objectives in a business process models of soa applications. AISS 4(1), 353–362 (2012)
  7. Rodriguez, A., Fernandez-Medina, E., Piattini, M.: A bpmn extension for the modeling of security requirements in business processes. IEICE Transactions on Information and Systems 90(4), 745–752 (2007)

Security Requirements Prioritisation

Raimundas Matulevicius (raimundas.matulevicius ät ut.ee)

Requirements prioritisation plays an important role in software system development. But is the prioritisation of security requirements different from the traditional prioritisation of requirements? In this thesis, it is important to answer which prioritisation method, and under what circumstances, could be applied for security requirements prioritisation. The potential contribution of this thesis is a proposal of a security requirements prioritisation method and its empirical validation.

Starting points:

  1. Kaur G., Bawa S., A Survey of Requirement Prioritization Methods, International Journal of Engineering Research & Technology (IJERT), Vol. 2 Issue 5, May – 2013
  2. Achimugu P., Selamat A., Ibrahim R., Mahrin M. N., A systematic literature review of software requirements prioritization research, Information and Software Technology, Volume 56 Issue 6, June, 2014, 568-585
  3. Massey A. K., Otto P. N., Antón A. I., Prioritizing Legal Requirements, 2009 Second International Workshop on Requirements Engineering and Law (RELAW'09)
  4. Park K.-Y., Yoo S.-G., Kim J., Security Requirements Prioritization Based on Threat Modeling and Valuation Graph, Convergence and Hybrid Information Technology Communications in Computer and Information Science Volume 206, 2011, pp 142-152

Lab Package Development & Evaluation for the Course 'Software Testing' (MTAT.03.159)

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

The course Software Testing (MTAT.03.159) currently has 6 labs (practice sessions) in which 2nd and 3rd year BSc students learn specific test techniques. We would like to improve the existing labs and add new ones.

This topic is intended for students who have already taken this software testing course and who feel that they can contribute to improving it and, by the same token, complete their Bachelors project. The scope of the project can be negotiated with the supervisor to fit the size of a Bachelors project.

The tasks to do for this project are as follows:

  • Selection of a test topic for which a lab package should be developed (see list below)
  • Development of the learning scenario (i.e., what shall students learn, what will they do in the lab, what results shall they produce, etc.)
  • Development of the materials for the students to use
  • Development of example solutions (for the lab supervisors)
  • Development of a grading scheme
  • Evaluation of the lab package

Topics for which lab packages should be developed (in order of urgency / list can be extended based on student suggestions):

  • Web-based Testing with Selenium
  • Mutation Testing
  • Unit Testing / TDD
  • Static Code Analysis
  • Search-Based Testing

Visualization of traffic flow and/or people density changes with animated texture/particles.

Toivo Vajakas (firstname.lastname ät ut.ee)

Download the project description.

Cell constructor

Leopold Parts (firstname.lastname ät gmail.com)

Download the project description.