Student Projects 2016/2017

Below is a list of project topics for Masters and Bachelors theses offered by the software engineering research group for students who intend to defend in June 2017. The projects are divided into Masters projects and Bachelors projects.

If you're interested in any of these projects, please contact the corresponding supervisor.


Masters projects

The Digital Value Chain (THIS TOPIC HAS BEEN BOOKED)

Supervisor: Fredrik P. Milani (milani at ut dot ee)

The value chain is a concept introduced by Porter in the mid-1980s. It has been widely used and referred to since then, and it is included in every management and business process book. The original value chain depicted manufacturing firms, which dominated the business world at that time. The value chain has since been extended and applied to other types of companies, such as service companies. Recently, however, we have seen the fast growth of online companies. This thesis is about evaluating what the value chains of online companies look like and how they differ from traditional value chains. For this thesis, a survey of existing value chains (and perhaps reference models) of manufacturing and service companies is conducted. Then “online company” is defined, such companies are identified in Estonia and approached, and, by conducting interviews, the "digital value chain" is elicited, analysed, synthesized, and compared/contrasted with the traditional value chain.

Business Process Architecture (THIS TOPIC HAS BEEN BOOKED)

Supervisor: Fredrik P. Milani (milani at ut dot ee)

Business Process Architecture can be defined as "the structure of the enterprise in terms of its governance structure, business processes, and business information.” As such, it captures the main processes of an organization and models them in a way that illustrates their relationships and types. Different proposals exist for how to capture the business process architecture as a graphical model, but there is no standard method. In this thesis project, you will:

  • Conduct a systematic literature survey of different approaches to modelling a business process architecture, and compare and contrast the identified approaches
  • Prepare a business process architecture at an organization (a real organization in Estonia; I will communicate it to you if you ask me via e-mail)
  • Model the architecture of the organization in accordance with the different approaches and, through experiments with domain experts of the organization, examine which approach they prefer and why
  • Possibly build a simple application in which the main processes of the architecture can be entered and the business process architecture is presented according to the chosen approach

How Are Business Processes Improved? (THIS TOPIC HAS BEEN BOOKED)

Supervisor: Fredrik P. Milani (milani at ut dot ee) and Toomas Saarsen

Although there are re-design patterns defined for how business processes can be improved, their usage and implementation in re-design efforts is a highly “creative” process. Each person or group sees the improvement opportunities differently and chooses to focus on different sets of patterns. This raises a number of questions, such as: in what different “ways” are business processes improved, and does the structure of the “as-is” model relate to certain re-design patterns? These questions are relevant as answering them can help in understanding how re-design is approached and perhaps give a framework to this “creative” process. Furthermore, it is important to know whether certain structures of “as-is” models limit or enable better re-design when improving business processes. This thesis is about analysing a set of "as-is" and "to-be" process models in order to identify and categorize different ways that processes are improved (by suggesting changes) and to investigate the correlation between model structure and re-design patterns.

Visualizing the Effects of Business Process Changes (THIS TOPIC HAS BEEN BOOKED)

Supervisor: Fredrik P. Milani (milani at ut dot ee) and Marcello Sarini

The aim of this project is to identify and to implement the most pertinent visualization functionalities to make both visible and easily comprehensible the effects of the business process change to the end users affected by the change.

The work is based on a conceptual framework aimed at describing the effects of a change on a business process by considering some high-level factors easily interpreted by the people affected by the change and related to their job characteristics, such as autonomy, skill variety, dealing with others, task identity, task significance, and feedback from the work itself.

The conceptual framework is currently implemented in a prototype focusing mainly on back-end functionality; the prototype is developed with the MEAN.js development framework and takes as input the XML file exported from Signavio BPMN.

It is expected that the output of the Masters thesis will become a publicly available tool offered on a software-as-a-service basis, in a similar style to the BIMP simulator, and possibly become part of the Signavio BPM Academic initiative.

The work will focus not only on visualization aspects, but also on the completion of back-end aspects not yet covered, such as the definition of measures that require additional information from the user, the identification (and visualization) of patterns (mainly communication and coordination behaviours), and the management of more complex business models (including multiple lanes within pools).
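To make the input format concrete, here is a minimal Python sketch (the file name is a placeholder) of reading lane and task names from a BPMN 2.0 XML file such as the one exported by Signavio; a real implementation would also traverse pools, gateways, and sequence flows:

    import xml.etree.ElementTree as ET

    # Standard BPMN 2.0 model namespace, also used by Signavio exports.
    NS = {"bpmn": "http://www.omg.org/spec/BPMN/20100524/MODEL"}

    tree = ET.parse("process.bpmn")  # placeholder file name
    root = tree.getroot()

    # Lanes within pools carry the role/actor structure of the process.
    for lane in root.findall(".//bpmn:lane", NS):
        print("lane:", lane.get("name"))

    # Tasks are the activities whose changes the tool visualizes.
    for task in root.findall(".//bpmn:task", NS):
        print("task:", task.get("name"))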

In this proposal the work is more technical, being devoted mainly to the implementation of the system rather than to the definition of the conceptual framework.

The background research for this proposal is focused on identifying novel visualization aspects to emphasize both process visualization (including patterns) and metric visualization. A good starting point is the paper by Suntinger et al., "The event tunnel: interactive visualization of complex event streams for business process pattern analysis".

Comparative Evaluation of Procedural and Declarative Process Mining Tools

Supervisor: Fredrik P. Milani (milani at ut dot ee) and Fabrizio Maggi

ProM is an open-source framework that brings together many process mining tools designed for various purposes. These tools can help businesses improve their processes. There are two main approaches to process mining: one targets procedural processes (activities take place in a predefined order), the other declarative processes (activities are not bound to a predefined order but are subject to constraints). This thesis topic is to compare how these two approaches perform on the log of a procedural (standardized) process vs. the log of a declarative (highly variable) process for various purposes such as process discovery, analysis, and monitoring. The different tools are applied to both logs and the results are compared. Part of the work is to evaluate the different tools on parameters such as functionality, effectiveness, and usability.
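To illustrate the difference between the two paradigms, here is a minimal Python sketch (the activity names are invented): a procedural model prescribes one fixed order of activities, whereas a declarative Declare-style "response" constraint only requires that one activity is eventually followed by another:

    # Procedural view: the trace must follow one fixed order of activities.
    def conforms_procedural(trace, expected_sequence):
        return list(trace) == list(expected_sequence)

    # Declarative view (Declare "response" constraint): every occurrence
    # of activity a must eventually be followed by an occurrence of b.
    def satisfies_response(trace, a, b):
        pending = False
        for event in trace:
            if event == a:
                pending = True
            elif event == b:
                pending = False
        return not pending

    trace = ["register", "check", "pay", "ship"]
    print(conforms_procedural(trace, ["register", "check", "pay", "ship"]))  # True
    print(satisfies_response(trace, "pay", "ship"))                          # True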

Comparative Evaluation of ProM Tools for Process Discovery (THIS TOPIC HAS BEEN BOOKED)

Supervisor: Fredrik P. Milani (milani at ut dot ee) and Fabrizio Maggi

ProM is an open-source framework that brings together many process mining tools designed for various purposes. These tools can help businesses improve their processes. The question then arises: how well do process mining tools work, and how easy are they to apply? Do they all result in the same models or not? If they don't, what are the differences, and which method is more accurate? This thesis topic is a comparative analysis of different ProM tools for process discovery. The work includes evaluating different tools on the same set of logs from different industries for the purpose of evaluating and comparing parameters such as functionality, effectiveness, and usability. The thesis also includes the task of visually expressing the discovered models. Domain experts will also be engaged to offer their input for the evaluation on parameters such as correctness.

Comparative Evaluation of ProM Tools for Process Performance Analysis (THIS TOPIC HAS BEEN BOOKED)

Supervisor: Fredrik P. Milani (milani at ut dot ee) and Fabrizio Maggi

ProM is an open-source framework that brings together many process mining tools designed for various purposes. These tools can help businesses improve their processes. The question then arises: how well do process mining tools work, and how easy are they to apply? What parameters do they measure, and how do they display the results? Do they all display the same values from the process analysis? If they don't, what are the differences, and which method is more accurate? This thesis topic is a comparative analysis of different ProM tools for process performance analysis. The work includes evaluating different tools on the same set of logs from different industries for the purpose of evaluating and comparing parameters such as functionality, effectiveness, and usability. The thesis also includes the task of visually expressing the generated performance dashboards and reports. Domain experts will also be engaged to offer their input for the evaluation on parameters such as correctness.

Runtime Conformance Checking with Fuzzy Logic

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

In different research fields, a recurring research issue has been to establish whether the external, observed behavior of an entity conforms to some rules/specifications/expectations. Most of the available systems, however, provide only simple yes/no answers to the conformance question. Some works introduce the idea of gradual conformance, expressed in fuzzy terms: the conformance degree of a process execution is represented through a fuzzy score.
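As a minimal sketch of the idea (the constraints and weights below are illustrative assumptions, not a fixed formalism), the fuzzy conformance degree of a running trace can be computed as the weighted fraction of constraints it satisfies:

    # Gradual conformance: a score in [0, 1] instead of a yes/no verdict.
    def fuzzy_conformance(trace, constraints):
        # constraints: list of (check_function, weight) pairs
        total = sum(weight for _, weight in constraints)
        satisfied = sum(weight for check, weight in constraints if check(trace))
        return satisfied / total if total else 1.0

    constraints = [
        (lambda t: "register" in t, 2.0),  # mandatory activity
        (lambda t: t.index("check") < t.index("pay")
                   if "check" in t and "pay" in t else True, 1.0),
    ]
    print(fuzzy_conformance(["register", "check", "pay"], constraints))  # 1.0
    print(fuzzy_conformance(["pay"], constraints))                       # 0.33...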

Hybrid Process Modeling Tool

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

While the standard BPMN notation is widely used to capture routine business processes with a clear structure, it is often not suitable for capturing processes with high levels of variability, such as those found, for example, in the healthcare or legal domains. Less-structured processes with a high level of variability can be described in a more compact way using a declarative language. By contrast, procedural process modeling languages seem more suitable for describing structured and stable processes. However, in various cases a process may incorporate parts that are better captured in a declarative fashion, while other parts are more suitably described procedurally. In these scenarios, hybrid models are the best choice for describing business processes. Starting from a well-defined formal semantics of hybrid models, a tool for modelling hybrid processes will be implemented and evaluated on a real-life case study. This project requires strong software development skills, as it will involve the development of a non-trivial business process editor.

Lightweight BPMN Process Mining Tool

Supervisor: Luciano García-Bañuelos (luciano dot garcia ät ut dot ee)

In this project, you will implement a lightweight and easy-to-use business process mining tool that natively supports the BPMN process modeling standard. Like other process mining tools (e.g. Disco or ProM), the tool will allow users to discover process models from event logs and will provide zoom-in and zoom-out functionality by filtering out infrequent paths and events. A main challenge in this project will be to efficiently re-compute BPMN models from event logs at different levels of detail in order to support online filtering and zoom-in/zoom-out features in the tool. The project will require excellent programming skills and the ability to understand, and perhaps also design, relatively complex algorithms.
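At the core of such a tool is the step from an event log to a graph whose arcs can be filtered by frequency. A minimal Python sketch, assuming a log represented as one list of activity names per case:

    from collections import Counter

    # Count how often one activity directly follows another across all cases.
    def directly_follows(log):
        arcs = Counter()
        for trace in log:
            for a, b in zip(trace, trace[1:]):
                arcs[(a, b)] += 1
        return arcs

    # Zooming out = raising the threshold, keeping only frequent arcs.
    def filter_arcs(arcs, threshold):
        return {arc: n for arc, n in arcs.items() if n >= threshold}

    log = [["a", "b", "c"], ["a", "b", "c"], ["a", "c"]]
    print(filter_arcs(directly_follows(log), threshold=2))
    # {('a', 'b'): 2, ('b', 'c'): 2}

The actual challenge named above is to do this, and the subsequent construction of a BPMN model, incrementally rather than from scratch at every filter change.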

Mining Business Process Models with Advanced Synchronization Patterns

Supervisor: Luciano García-Bañuelos (luciano dot garcia ät ut dot ee)

Automated process discovery aims at extracting process models from information captured in the execution logs of information systems. Most state-of-the-art methods are designed to discover models in which the execution of an event depends on the completion of a fixed number of other events. This type of dependency is referred to as a basic synchronization pattern. In some real-world scenarios, however, this constraint is not well suited; e.g., a purchase decision could be taken even before all requested quotes are received (synchronization of “n-out-of-m” events) or whenever a deadline is reached (time-related constraints).

In this project, you will extend existing and/or design new techniques that enable the discovery of process models with advanced synchronization patterns mentioned above.
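For illustration, a minimal Python sketch of the "n-out-of-m" pattern from the quotes example above (the event names are invented):

    # An activity with an "n-out-of-m" join is enabled as soon as any n of
    # its m incoming branches have completed, rather than all of them.
    def n_out_of_m_enabled(completed_events, incoming_branches, n):
        return len(set(completed_events) & set(incoming_branches)) >= n

    quotes = ["quote_A", "quote_B", "quote_C"]        # m = 3 requested quotes
    received = ["quote_A", "quote_C"]
    print(n_out_of_m_enabled(received, quotes, n=2))  # True: decision can be taken

Discovering that such a join, rather than a plain AND-join, explains the behavior recorded in a log is the hard part this project addresses.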

Causal Deviance Mining of Business Processes

Supervisor: Marlon Dumas (marlon dot dumas ät ut dot ee)

Business process deviance refers to the phenomenon whereby a subset of the executions of a business process deviate, in a negative or positive way, with respect to the expected or desirable outcomes of the process. Deviant executions of a business process include those that violate compliance rules, or executions that undershoot or exceed performance targets. Deviance mining is concerned with uncovering the reasons for deviant executions by analyzing business process event logs. Current deviance mining techniques are focused on identifying patterns or rules that are correlated with deviant outcomes. However, the obtained patterns might not actually help to explain the causes of the deviance. In this thesis, you will enhance existing deviance mining techniques with causal discovery techniques in order to more precisely identify the potential causes of deviant process executions.

Dynamic Time Warping for Predictive Monitoring of Business Processes

Supervisor: Marlon Dumas (marlon dot dumas ät ut dot ee)

Predictive business process monitoring refers to a family of online process monitoring methods that seek to predict as early as possible the outcome of each case given its current (incomplete) execution trace and given a set of traces of previously completed cases. In this context, an outcome may be the fulfillment of a compliance rule, a performance objective (e.g., maximum allowed cycle time) or business goal, or any other characteristic of a case that can be determined upon its completion. For example, in a sales process, a possible outcome is the placement of a purchase order by a potential customer, whereas in a debt recovery process, a possible outcome is the receipt of a debt repayment.

Existing approaches for predictive business process monitoring are designed for processes with a relatively high level of regularity, where most cases go through the same stages and these stages are more or less of the same length. In the case of very irregular processes where the number of stages and their length is variable, the accuracy of these techniques generally suffers. In this project, you will design an approach to predictive process monitoring that addresses this limitation by using a time series analysis technique known as dynamic time warping. The thesis will adopt an experimental approach. You will implement a prototype and compare it with implementations of other predictive process monitoring techniques using a collection of real-life event logs.
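For reference, a minimal Python sketch of the dynamic time warping distance, assuming traces have been encoded as numeric sequences (e.g., per-stage durations); unlike position-by-position comparison, DTW can align sequences of different lengths:

    def dtw_distance(s, t):
        n, m = len(s), len(t)
        inf = float("inf")
        d = [[inf] * (m + 1) for _ in range(n + 1)]
        d[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(s[i - 1] - t[j - 1])
                d[i][j] = cost + min(d[i - 1][j],      # skip an element of s
                                     d[i][j - 1],      # skip an element of t
                                     d[i - 1][j - 1])  # match both
        return d[n][m]

    # Similar shapes of different lengths still get a small distance.
    print(dtw_distance([1, 2, 3, 4], [1, 2, 2, 3, 4]))  # 0.0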

Collaborative Business Process Execution Engine Based on Distributed Ledger Technology

Supervisor: Marlon Dumas (marlon dot dumas ät ut dot ee) and Luciano García-Bañuelos

Lack of mutual trust is one of the roadblocks to implementing a range of collaborative business processes in the field of e-business. Distributed ledger technology (e.g. based on blockchains) provides basic primitives to overcome this roadblock by allowing a set of parties to reach "consensus" on a sequence of collaborative transactions without requiring mutual trust. In this project, you will implement a lightweight collaborative business process management engine that allows multiple independent parties (e.g. companies) to execute a common business process (e.g. an invoice financing process) using distributed ledger technology as a backbone in order to ensure integrity and traceability. The tool will take as input a business process model in BPMN, and will allow step-by-step execution of this process with all transactions being recorded in a public blockchain. As a testing platform we will use the Ethereum platform. To this end, we will implement a technique to map the BPMN model into Solidity code to be deployed on Ethereum.
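The engine itself will record transactions on Ethereum via generated Solidity code; purely as an illustration of the underlying idea, the Python sketch below simulates a sequential process whose every step is appended to a hash-chained log, so that later tampering with any entry is detectable:

    import hashlib
    import json

    class LedgerBackedProcess:
        def __init__(self, tasks):
            self.tasks = tasks   # simplified: a purely sequential process
            self.state = 0       # index of the next task to execute
            self.ledger = []     # each entry links to its predecessor by hash

        def execute_next(self, party):
            entry = {
                "task": self.tasks[self.state],
                "party": party,
                "prev": self.ledger[-1]["hash"] if self.ledger else "0" * 64,
            }
            payload = json.dumps(entry, sort_keys=True).encode()
            entry["hash"] = hashlib.sha256(payload).hexdigest()
            self.ledger.append(entry)
            self.state += 1

    proc = LedgerBackedProcess(["submit invoice", "approve invoice", "pay"])
    proc.execute_next("supplier")
    proc.execute_next("buyer")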

Case Study on Exploratory Testing

Supervisor: Dietmar Pfahl (dietmar dot pfahl at ut dot ee)

Exploratory software testing (ET) is a powerful and fun approach to testing. The plainest definition of ET is that it comprises test design and test execution at the same time. This is the opposite of scripted testing (having test plans and predefined test procedures, whether manual or automated). Exploratory tests, unlike scripted tests, are not defined in advance and carried out precisely according to plan.

Testing experts like Cem Kaner and James Bach claim that - in some situations - ET can be orders of magnitude more productive than scripted testing, and a few empirical studies exist supporting this claim to some degree. Nevertheless, ET is often confused with (unsystematic) ad-hoc testing and thus not always well regarded in either academia or industrial practice.

The objective of this project will be to conduct a case study in a software company investigating the following research questions:

  • To what extent is ET currently applied in the company?
  • What are the advantages/disadvantages of ET as compared to other testing approaches (i.e., scripted testing)?
  • How can the current practice of ET be improved?
  • If ET is currently not used at all, what guidance can be provided to introduce ET in the company?

The method applied is a case study. Case studies follow a systematic approach as outlined in: Guidelines for conducting and reporting case study research in software engineering by Per Runeson and Martin Höst

This project requires that the student has (or is able to establish) access to a suitable software company to conduct the study.

Case Study on Test-Driven Development (TDD) (this topic has been reserved)

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Similar to the case study project on Exploratory Testing (see above), a student can work in a company to analyse the current state of the practice of TDD. The objective of this project will be to investigate the following research questions:

  • To what extent is TDD currently applied in the company?
  • What are the perceived strengths/weaknesses of the currently applied TDD techniques and tools?
  • How can the current practice of TDD be improved?

The method applied is a case study. Case studies follow a systematic approach as outlined in: Guidelines for conducting and reporting case study research in software engineering by Per Runeson and Martin Höst

This project requires that the student has (or is able to establish) access to a suitable software company to conduct the study.

Case Study on Test Automation (this topic has been reserved)

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Similar to the case study project on Exploratory Testing (see above), a student can work in a company to analyse the current state of the practice of test automation. The objective of this project will be to investigate the following research questions:

  • To what extent is test automation currently applied in the company?
  • What are the perceived strengths/weaknesses of the currently applied test automation techniques and tools?
  • How can the current practice of test automation be improved?

The method applied is a case study. Case studies follow a systematic approach as outlined in: Guidelines for conducting and reporting case study research in software engineering by Per Runeson and Martin Höst

This project requires that the student has (or is able to establish) access to a suitable software company to conduct the study.

Case Study on A/B Testing

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Similar to the case study project on Exploratory Testing (see above), a student can work in his/her company to analyse the current state of the practice of A/B testing. The objective of this project will be to investigate the following research questions:

  • To what extent is A/B testing currently applied in the company?
  • What are the perceived strengths/weaknesses of the currently applied A/B testing techniques and tools?
  • How can the current practice of A/B testing be improved?

The method applied is a case study. Case studies follow a systematic approach as outlined in: Guidelines for conducting and reporting case study research in software engineering by Per Runeson and Martin Höst

This project requires that the student has (or is able to establish) access to a suitable software company to conduct the study.

How Does Software Process Improvement Literature Address Software Project Management?

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

This topic is defined in the context of an international collaboration in which a comprehensive mapping study on software process improvement (SPI) literature was conducted. In the scope of this mapping study, a set of scientific publications was identified that deals with software project management in the context of SPI. The goal of the MSc thesis is to systematically analyse the identified literature. The analysis should follow the scheme used in reference [1] below. This thesis requires that the student is willing to dive into the world of quantitative and qualitative text analysis.

Prerequisite: Students interested in this topic should have successfully completed one of the courses on data mining / machine learning offered in the Master of Software Engineering program.

Reference: [1] How Does Software Process Improvement Address Global Software Engineering? by Marco Kuhrmann et al. (link to PDF)

Using Data Mining & Machine Learning to Support Decision-Makers in SW Development

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Project repositories contain much data about the software development activities ongoing in a company. In addition, there exists much data from open source projects. This opens up opportunities for analysing and learning from the past, which can be converted into models that help make better decisions in the future - where 'better' can relate to either 'more efficient' (i.e., cheaper) or 'more effective' (i.e., with higher quality).

For example, we have recently started a research activity that investigates whether textual descriptions contained in issue reports can help predict the time (or effort) that a new incoming issue will require to be resolved.
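As a minimal sketch of this direction (using scikit-learn; the issue texts and labels are invented), one could turn issue descriptions into TF-IDF features and fit a regression model for resolution time:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import Ridge

    # Historical issues with known resolution times (toy data).
    descriptions = ["crash when saving file", "typo in settings dialog"]
    hours_to_resolve = [40.0, 2.0]

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(descriptions)
    model = Ridge().fit(X, hours_to_resolve)

    # Estimate the resolution time of a new incoming issue.
    new_issue = vectorizer.transform(["app crashes on startup"])
    print(model.predict(new_issue))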

There are, however, many more opportunities, e.g., analysing bug reports to help triagers assign issues to developers. And of course, there are other documents that could be analysed: requirements, design docs, code, test plans, test cases, emails, blogs, social networks, etc. Not only the application can vary; the analysis approach can vary as well. Different learning approaches may have different efficiency and effectiveness characteristics depending on the type, quantity, and quality of the data available.

Thus, this topic can be tailored according to the background and preferences of an interested student.

Tasks to be done (after definition of the exact topic/research goal):

  • Selection of suitable data sources
  • Application of machine learning / data mining technique(s) to create a decision-support model
  • Evaluation of the decision-support model

Prerequisite: Students interested in this topic should have successfully completed one of the courses on data mining / machine learning offered in the Master of Software Engineering program.

A Tool to Visualize Release Readiness Information

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Note: This thesis topic is related to an ongoing PhD project at the University of Calgary, Canada, and thus requires the willingness and ability of the student to communicate with the PhD student in Calgary regarding implementation details.

Background: Release readiness (RR) is a time-dependent measure of a software product that reflects the implementation status and quality of the upcoming release. To make this measure more operational, determining release readiness is formulated as a binary classification problem. The aim of this project is - after familiarisation with the relevant literature and concepts - to develop a tool that addresses the following (tentative) requirements. This work will be conducted in collaboration with an ongoing PhD project at the University of Calgary. The ongoing PhD research is focused on developing a recommendation model, which will i) identify whether a software release is going to be ready or not ready before the pre-determined release date and ii) recommend actions to achieve an on-time release by improving release readiness.
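As a minimal sketch of the classification formulation (using scikit-learn; the attribute values are invented, and the attributes follow R2 below), past releases described by RR attributes and labeled ready/not ready train a classifier that predicts the class of the upcoming release:

    from sklearn.ensemble import RandomForestClassifier

    # Per past release: [defect detection rate, open issues, code churn]
    X = [[0.1, 5, 200], [0.4, 40, 900], [0.2, 12, 300], [0.5, 55, 1200]]
    y = [1, 0, 1, 0]  # 1 = ready at release date, 0 = not ready

    clf = RandomForestClassifier(random_state=0).fit(X, y)
    print(clf.predict([[0.3, 20, 500]]))  # predicted RR class for next release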

Tentative list of tool requirements:

  • R1: Integrate with existing project management services (e.g. JIRA, GitHub) so that the data required for RR analysis can be collected automatically. The product manager should get web-crawling support to extract web-based information (e.g. release dates from a wiki page).
  • R2: Provide two graphical interfaces for retrospectively analyzing a pre-determined set of i) release readiness (RR) attributes (e.g., defect detection rate, number of open issues, code churn) and ii) post-release success (PRS) measures (e.g. user rating, number of reported bugs).
  • R3: Allow product managers to select RR attributes and PRS measures from a recommended list using an interactive interface. The recommended list of RR attributes and PRS measures should be prepared and prioritized by applying a pre-determined analysis of former releases.
  • R4: Present PRS measures and release classes (i.e. ready / not ready) for all past releases. Retrospective classification of releases is performed using the selected PRS measures. It should allow product managers to manually change release classes.
  • R5: Present the RR class predicted by a pre-determined predictive method. The result should be accompanied by an interactive dashboard displaying current values of the RR attributes.
  • R6: Allow product managers to drill down into results using an interactive visual dashboard. This should present current and required minimum values for RR attributes and indicate the attributes that are limiting the RR.
  • R7: Allow product managers to perform what-if analysis by manually changing RR attribute values using an interactive interface. The tool should also present a recommended set of candidate RR attributes for these changes.
  • R8: Present a list of optimized solutions for RR improvement obtained by systematically changing the values of RR attributes. An interactive interface should allow product managers to select the range of change allowed per attribute while searching for an optimized solution.
  • R9: Provide a visual dashboard that allows product managers to monitor changes in RR attribute values over time with respect to a planned improvement. The improvement plan is determined by the optimized solution (R8) or the what-if analysis (R7).

Related literature:

  • S. McConnell. Gauging software readiness with defect tracking. Software, IEEE, 14(3):135-136, May 1997.
  • T.-S. Quah. Estimating software readiness using predictive models. Inf. Sci., 179(4):430–445, Feb. 2009.
  • S. M. Shahnewaz. RELREA - An analytical approach supporting continuous release readiness evaluation. Master’s thesis, University of Calgary, 2014.
  • M. Staron, W. Meding, and K. Palm. Release readiness indicator for mature agile and lean software development projects. In C.Wohlin, editor, Agile Processes in Software Engineering and Extreme Programming, volume 111 of Lecture Notes in Business Information Processing, pages 93–107. Springer Berlin Heidelberg, 2012.
  • M. Ware, F. Wilkie, and M. Shapcott. The use of intra-release product measures in predicting release readiness. In Software Testing, Verification, and Validation, 2008 1st International Conference on, pages 230–237, April 2008.
  • R.Wild and P. Brune. Determining software product release readiness by the change-error correlation function: On the importance of the change-error time lag. In System Science (HICSS), 2012 45th Hawaii International Conference on, pages 5360–5367, Jan 2012.
  • S. Alam, D. Pfahl, and G. Ruhe, “Learning from Process Monitoring and Control for Optimized Release Readiness,” Managing Software Process Evolution, Springer, 2016.
  • S. Alam, M. Karim, D. Pfahl, and G. Ruhe “Comparative Analysis of Predictive Techniques for Release Readiness Classification,” RAISE, ICSE, 2016.
  • S. Alam, D. Pfahl, and G. Ruhe, “Release Readiness Classification – An Explorative Case Study,” ESEM, 2016.

Contrastive Opinion Mining and Summarization

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee) and Faiz Ali Shah

Background: Software development techniques are developing swiftly and depend on the particular context (e.g., types of products, application domains, development processes used, and other factors). Likewise, new features have to be delivered at the right time and in ever faster development cycles.

Goal of the thesis: In order to do their job better, developers need to use the best tools for their specific needs. In our research group, we would like to help software tool developers understand which features are most important for their customers (i.e., professional software developers). With the ever-increasing use of Web 2.0, people express opinions on various topics through blogs, discussion forums, Twitter, and dedicated opinion websites. Since these platforms contain a huge amount of opinionated text about a topic, people mostly find it difficult to efficiently digest all the opinions. In recent years, techniques have been proposed to automatically extract and summarize multiple perspectives/viewpoints (also called contrastive opinions) of people on individual topics. We would like to find out what it is that software developers like or dislike about the tools they are using, and what features they are missing. We hope to find answers by analysing popular Q&A sites, like Stack Overflow, with regard to the topics software developers talk about and the opinions they have regarding their development tools.

To do: Your task is to provide a systematic and coherent overview of the techniques that have been used for the extraction and summarization of contrastive opinions and to implement a prototypical tool.

A recent example of a published study that could be used as a starting point for this thesis can be found here:
What are mobile developers asking about? A large scale study using stack overflow

Additional literature (starting points): NB: Only links to the publishing venues are provided. You are expected to retrieve the complete information for correct referencing on your own.

Prerequisite: Students interested in this topic should have successfully completed one of the courses on data mining / machine learning offered in the Master of Software Engineering program.

Crowdsourced Software Testing (this topic has been reserved)

Supervisor: Dietmar Pfahl (dietmar dot pfahl at ut dot ee)

Crowdsourcing has become a popular approach in software development. However, can the crowd also be used to enhance software testing? Today, several dedicated crowdsourcing services exist for the testing of mobile applications. They specifically address the problem of the exploding number of devices on which a mobile application may run, and which the developer or tester may not own, but which may be possessed by the crowd at large. Examples of these services include Mob4Hire (www.mob4hire.com), MobTest (www.mobtest.com), and uTest (www.utest.com).

Your task is to provide a systematic and coherent overview of the tools and techniques that have been employed for supporting crowdsourced software testing, and the experience that has been gained with using and managing such approaches. In addition, criteria for comparing the various platforms should be proposed, and a comparative analysis (applying the proposed criteria) should be conducted.

Literature (starting points): NB: Only links to the publishing venues are provided. You are expected to retrieve the complete information for correct referencing on your own.

Evaluating Quality in Use of Apps through Analyzing User Reviews (this topic has been reserved)

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee) and Faiz Ali Shah

The quality in use model (ISO/IEC 25010:2011) proposes five quality characteristics - a) effectiveness, b) efficiency, c) satisfaction, d) freedom from risk, and e) context coverage - to evaluate software quality from the users' perspective. Several approaches have been proposed in the literature to evaluate these quality characteristics of software; however, they are either ineffective or impractical. App reviews available in app stores could be used for the evaluation of these quality characteristics. This research aims at evaluating the aforementioned quality attributes of a given app by analysing app reviews using text mining and sentiment analysis techniques.
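A minimal Python sketch of the intended direction, assuming a naive keyword mapping from review text to quality-in-use characteristics with a crude sentiment polarity (a real implementation would use trained text classifiers and proper sentiment analysis):

    CHARACTERISTIC_KEYWORDS = {
        "efficiency": ["slow", "fast", "battery", "lag"],
        "satisfaction": ["love", "hate", "great", "awful"],
        "freedom from risk": ["privacy", "secure", "leak"],
    }
    NEGATIVE = {"slow", "lag", "hate", "awful", "leak"}

    # Map a review to the characteristics it mentions, with a polarity.
    def score_review(review):
        words = review.lower().split()
        hits = {}
        for characteristic, keywords in CHARACTERISTIC_KEYWORDS.items():
            matched = [w for w in words if w in keywords]
            if matched:
                hits[characteristic] = -1 if any(w in NEGATIVE for w in matched) else 1
        return hits

    print(score_review("great app but slow and drains battery"))
    # {'efficiency': -1, 'satisfaction': 1}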

References:

  • ISO/IEC 25010:2011: Systems and software engineering - Systems and software quality requirements and evaluation (SQuaRE) - System and software quality models (2011)
  • Qian, Zhenzheng, Chengcheng Wan, and Yuting Chen. "Evaluating quality-in-use of FLOSS through analyzing user reviews." 2016 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD). IEEE, 2016.
  • I. Atoum and C. H. Bong, “A framework to predict software quality in use from software reviews,” in Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013), pp. 429–436, Springer, 2014.
  • W. T. W. Syn, B. C. How, and I. Atoum, “Using latent semantic analysis to identify quality in use (qu) indicators from user reviews,” arXiv preprint arXiv:1503.07294, 2015.

Model-based Secure Software System Development

Supervisor: Raimundas Matulevicius (firstname dot lastname ät ut dot ee)

The use of security models can support discussion about security needs and their importance. Models contribute to security requirements validation, and can potentially also guide secure software coding activities. However, there exist a number of modelling approaches that contribute different perspectives or viewpoints on the developed secure system. The major goal of this Master thesis is to establish a systematic method to align different security perspectives expressed using various modelling notations. The major research steps are:

  1. Perform a literature survey (i) on the existing security modelling languages and (ii) on the existing transformations between different security models
  2. Develop a systematic approach to guide developers in aligning different security perspectives
  3. Validate the proposed method either through the proof of concept or empirically (e.g., experimental comparison with similar approaches).

Pattern-based Security Requirements Derivation from Use Case Models

Supervisor: Raimundas Matulevicius (firstname dot lastname ät ut dot ee)

Security requirements engineering plays an important role during software system development. However, in many cases security requirements are overlooked and considered only at the end of software development. A possible way to improve this situation is the development of systematic instruments that facilitate security requirements elicitation. For instance, security patterns describe a particular recurring security problem that arises in a specific security context and present a well-proven generic scheme for a security solution.

Use case diagrams are a popular modelling technique to describe, organize, and represent functional system and software requirements and to define the major actors who interact with the considered system. Recently, a security extension - misuse case diagrams - has been proposed to address negative scenarios.

The major goal of this Master thesis is to develop a set of security patterns using use and misuse cases and to illustrate how these patterns could be used to derive security requirements from use cases. The thesis includes the following steps:

  1. Conduct a literature review on (i) security engineering and security patterns, (ii) use cases and misuse cases, and (iii) security risk-oriented misuse cases;
  2. Develop a set of security patterns using use/misuse case diagrams;
  3. Develop guidelines to derive security requirements using the developed security patterns;
  4. Validate the developed security patterns and their guidelines empirically.

Aligning Attack Trees to Information Systems Security Risk Management

Supervisor: Raimundas Matulevicius (firstname dot lastname ät ut dot ee)

Attack trees are a modelling technique used to understand security attacks, to estimate a threat agent's capabilities, and to determine the feasibility of security counter-measures. However, this language addresses only limited aspects of security risk management. The main goal of this Master thesis is to understand the exact scope of attack trees with respect to an end-to-end security risk management process. The major steps of the research include:

  1. Understand the concepts of the information systems security risk management
  2. Survey existing alignments of the modelling languages to the security risk management domain
  3. Develop the contribution - define the semantic and syntactic extensions to attack trees that align them with the security risk management domain
  4. Validate the contribution. Depending on the selection, validation could be done (i) as a proof of concept, i.e., by developing a prototype tool, or (ii) empirically, i.e., by comparing the extended attack trees with other modelling approaches.

Comparison of Access Control Models

Supervisor: Raimundas Matulevicius (firstname dot lastname ät ut dot ee)

There exists a range of models for specifying under what conditions data stored by a system can be accessed. For example, one could name role-based access control, attribute-based access control, the usage control model, risk-adaptive access control, and token-based access control, among others. The main goal of this Master thesis is to compare different access control models and to understand their strong and weak features (a minimal illustration contrasting two of these models is sketched after the list below). The main steps of the research include:

  1. Systematic survey of the access control models including analysis of their underlying principles, meta-models, examples and related work (examples of their applications and reported experience)
  2. Based on the performed survey – systematic analytical comparison of various access control models.
  3. Systematic empirical comparison of the selected access control models. Depending on the selection, this step could be performed through development of the prototype tools and comparing the access control models using these tools, or empirically by conducting experiments, case studies and surveys on the specific characteristics of the access control models.
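For concreteness, a minimal Python sketch contrasting two of the models named above (all roles, attributes, and rules are invented): role-based access control (RBAC) grants permissions via role membership, while attribute-based access control (ABAC) evaluates attributes of the subject, the resource, and the context:

    # RBAC: permissions are attached to roles; users acquire roles.
    ROLE_PERMISSIONS = {"doctor": {"read_record", "write_record"},
                        "nurse": {"read_record"}}

    def rbac_allows(user_roles, permission):
        return any(permission in ROLE_PERMISSIONS.get(r, set()) for r in user_roles)

    # ABAC: a policy over subject, resource, action and context attributes,
    # e.g. doctors may write records of their own patients during work hours.
    def abac_allows(subject, resource, action, context):
        return (subject["role"] == "doctor"
                and action == "write_record"
                and resource["patient"] in subject["patients"]
                and 8 <= context["hour"] < 18)

    print(rbac_allows(["nurse"], "write_record"))   # False
    print(abac_allows({"role": "doctor", "patients": {"p1"}},
                      {"patient": "p1"}, "write_record", {"hour": 10}))  # True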

A Comprehensive Model for Blockchain-Based Distributed Ledger Technology

Supervisor: Raimundas Matulevicius (firstname dot lastname ät ut dot ee)

Blockchain technology is mainly described with texts and algorithms. Very few works are available on a comprehensive model for blockchain technology in particular, and distributed ledger technology more generally. Modelling the blockchain technology with EA models (or others, e.g. activity diagrams, sequence diagrams, etc.) would help users/consumers to better understand how it works. Moreover, a reference model for blockchain would help application developers to build the models of their applications on top of it. We can first focus only on a model for Bitcoin as a final application, or generalise to any blockchain-based application. The major goal of this research is to develop a comprehensive, practical modelling framework for blockchain technology. After a state-of-the-art survey and analysis of the existing approaches, the student will define models. It is still open which kinds of models, but a static model to see the concepts at stake (ArchiMate, class diagram) and a dynamic model (sequence diagram, activity diagram, or other) to see the interactions between users and the technology would be a must; several models could be needed to achieve better granularity, cover different use cases, etc. The validation of the proposal could include: (i) questioning a focus group of people developing applications using blockchain technology (if such a group becomes available), or (ii) asking a team developing blockchain technologies to use/evaluate the proposed model, or similar approaches.

This topic is proposed in collaboration with the Luxembourg research center LIST.

Mining for temporal and spatial patterns of rescue events

Supervisor: Siim Karus (siim04 ät ut.ee)

In this project, you will apply data mining methods in order to find patterns and regularities in a dataset consisting of rescue events, provided by the Rescue Services, together with other relevant data sources. Specific aims include finding seasonal regional trends in events/accidents and/or deaths, and finding regions where the rescue response time is significantly influenced by date or time. This thesis will be conducted in cooperation with the Rescue Services. The data is in Estonian, so knowledge of Estonian is an advantage. The thesis can be written in Estonian.
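A minimal pandas sketch of the kind of aggregation meant here (the file and column names are hypothetical):

    import pandas as pd

    events = pd.read_csv("rescue_events.csv", parse_dates=["timestamp"])

    # Events per region and calendar month: a first look at seasonality.
    events["month"] = events["timestamp"].dt.month
    seasonal = events.groupby(["region", "month"]).size().unstack(fill_value=0)

    # Mean response time per region and hour of day.
    events["hour"] = events["timestamp"].dt.hour
    response = events.groupby(["region", "hour"])["response_minutes"].mean()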

Linking rescue event data with public (dynamic) data

Supervisor: Siim Karus (siim04 ät ut.ee)

The operational planning of rescue services would benefit from exploiting operational public data (e.g. public events, roadworks, population density, land use, etc.). The task in this thesis is to create a solution that combines online public data with rescue event data. The ultimate aim is to find correlations between rescue events (and their attributes) and the online data in order to estimate changes in rescue event risk. It is expected that the final solution can direct the Rescue Services' attention to public data that may affect response times or the risk of an accident (thus allowing better planning of response teams).

The thesis will be conducted in cooperation with the Rescue Services. If the cooperation with the Rescue Services is fruitful, it might be possible to continue this research/analysis beyond the Master's thesis (e.g. by introducing predictive analytics and making use of private datasets made available to the Rescue Services).

The datasets to be used are in Estonian, so knowledge of Estonian is an advantage. The thesis can be written in Estonian.

Profiling of deaths and risk groups of rescue events

Supervisor: Siim Karus (siim04 ät ut.ee)

In order to better identify people at higher risk of accident or death, it is necessary to develop profiles of people at higher risk. In this thesis you will get access to data collected by the Rescue Services regarding cases of death or injury. The scope of these data is extremely limited due to limits set by data privacy regulations. Thus, you will enrich the data by tapping into public data sources so as to build risk profiles. The first part of the thesis project will be to create a data crawler that finds supplemental data about the people injured or killed in accidents. The second part of the thesis project will be to build a discriminative model in order to identify what causes these people to be at higher accident risk than other people. This information will be used to improve the focus of the preventive efforts of the Rescue Services.

This thesis will be conducted in cooperation with the Rescue Services. The datasets to be used are in Estonian, so knowledge of Estonian is an advantage. The thesis can be written in Estonian.

Relationship between computer generated code and fault proneness

Supervisor: Siim Karus (siim04 ät ut.ee)

We have developed a method for quantitatively estimating the extent of computer-generated code used in software modules. The hypothesis is that computer-generated code leads to fewer errors. This thesis topic is about testing this hypothesis on software development data. In short, the student will collect or reuse source code revisioning data and calculate the computer-generated code estimate for the modules at different points in time. The student will then use issue repository data to check which modules have more errors found in them (at different points in time). Finally, the student will try to model a relationship between the extent of computer-generated code and error proneness.
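The final modelling step could start as simply as a rank correlation between the generated-code estimate and the defect count per module; a minimal sketch using scipy (the values are invented):

    from scipy.stats import spearmanr

    generated_code_ratio = [0.05, 0.60, 0.30, 0.80, 0.10]  # per module
    defects_found        = [12,   3,    7,    1,    10]

    # A significantly negative rho would support the hypothesis that
    # generated code is associated with fewer errors.
    rho, p_value = spearmanr(generated_code_ratio, defects_found)
    print(rho, p_value)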

GPU-accelerated data analytics

Supervisor: Siim Karus (siim04 ät ut.ee)

In this project, a set of GPU-accelerated data mining or analytics algorithms will be implemented as an extension to an analytical database solution. For this task, you will need to learn the parallel processing optimisations specific to GPU programming (balancing between bandwidth and processing power), implement the analytics algorithms, and design a user interface to accompany them. As the aim is to provide an extension to analytical databases (preferably MSSQL, Oracle, or PostgreSQL), you will also need to learn the extension interfaces of these databases and their native development and BI tools. Finally, you will assess the performance gains of your algorithms compared to comparable algorithms in existing analytical database tools.

GPU-accelerated Developer Feedback System

Supervisor: Siim Karus (siim04 ät ut.ee)

In this project you will implement source code analytics algorithms on the GPU and devise a reliable and fast method for integrating the analysis feedback into integrated development environments (IDEs). For this task, you will need to learn the parallel processing optimisations specific to GPU programming (balancing between bandwidth and processing power), implement the analytics algorithms, and design a user interface to accompany them. As the aim is to provide an extension to IDEs (preferably Visual Studio or Eclipse), you will also need to learn the extension interfaces of these IDEs and their native development tools. Finally, you will assess the performance gains of your algorithms compared to implementations of the same algorithms running on a CPU.

Replication of Empirical Software Engineering Case Study Experiments

Supervisor: Siim Karus (siim04 ät ut.ee)

The empirical software engineering community publishes many case studies validating different approaches and analytical algorithms in software engineering. Unfortunately, these studies are rarely validated by independent replication. To make matters worse, the studies use different validation metrics, which makes them incomparable. Thus, your mission, should you choose to accept it, is to analyse different published case studies on one topic (e.g. bug detection, code churn estimation) to evaluate their replicability, and to replicate the studies in order to make them comparable. In short, you will:

  1. envisage a workflow/pipeline for replicating published studies (including testing for replicability);
  2. use the workflow to replicate several studies;
  3. validate these studies and compare their results on a common scale.

Hot Deployment of Linked Data for Online Data Analytics

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The aim of this project is to design and implement a "hot" linked data deployment extension to an open source analytics server, such as RapidAnalytics. Software tools such as Weka or RapidMiner allow building analytical applications that exploit knowledge hidden in data. However, one of the bottlenecks of such toolkits, in settings where vast quantities of data with heterogeneous data models are available, is the amount of human effort required first for the unification of the data models at the data pre-processing stage and then for the extraction of relevant features for data mining. Furthermore, these steps are re-executed each time a new dataset is added or an existing one is changed. In the case of open linked data, however, the uniform representation of data leverages implicit handling of data model heterogeneity. Moreover, there exist open source toolkits, such as FeGeLOD [1], which automatically create data mining features from linked data. Unfortunately, the current approaches assume that a linked dataset is already pre-processed and available as a static file for which the features are created each time the file is loaded.

In this thesis project, first an extension will be developed for discovering and loading new datasets into an analytics server. Then existing data mining feature extraction methods will be enhanced and incorporated into the framework. Finally, the developed solution will be validated on a real-life problem.

[1] Heiko Paulheim, Johannes Fürnkranz. Unsupervised Generation of Data Mining Features from Linked Open Data. Technical Report TUD–KE–2011–2 Version 1.0, Knowledge Engineering Group, Technische Universität Darmstadt, November 4th, 2011. Available at http://www.ke.tu-darmstadt.de/bibtex/attachments/single/297 .

Open Cloud Infrastructure for Cost-Effective Harvesting, Processing and Linking of Unstructured Open Government Data

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Bootstrapping Open Government Data projects, even when not considering the complementary five-star initiatives for linking the data, represents a tremendous task if implemented manually in an uncoordinated way in ad hoc settings. Although the essential datasets may be publicly available for downloading, locating, documenting, and preparing them for publication in a central (CKAN) repository represents a burden that is difficult to absorb for officials of public administration. Furthermore, linking the prepared and published datasets presents even further challenges, especially in the case of semistructured and full-text documents, where understanding the content is complicated by the lack of clear structure. Namely, detecting the entities that should be linked and the metamodels that should be used for linking is a search-intensive task even for machines, not only for humans.

Luckily, there are some open source tools for simplifying the process. A set of tools, including ones for semi-automatic link discovery, is represented in the LOD2 Technology Stack (http://stack.lod2.eu/blog/). In addition, there are general-purpose text processing frameworks such as Apache OpenNLP, and for the Estonian language there is a named entity recognition solution (svn://ats.cs.ut.ee/u/semantika/ner/branches/ner1.1) available. Finally, there is NetarchiveSuite (https://sbforge.org/display/NAS/Releases+and+downloads) for Internet archival, which can be used for creating Web snapshots.

This project aims at developing a cloud platform for harvesting Open Government Data and transforming it into Linked Open Government Data. The platform consists of a Web archival subsystem, an open data repository (CKAN), a document content analysis pipeline with named entity recognition and resolution, and finally a linked data repository for serving the processed data. The Web archival subsystem will continuously monitor changes on the Web by creating monthly snapshots of the Estonian public administration Web, comparing the snapshots, and detecting new datasets (or changes) together with their metadata. The datasets together with their metadata are automatically published in a CKAN repository. The CKAN repository is continuously monitored for new datasets and updates, and each change will trigger execution of the document content analysis pipeline (i.e. analysis of CSV file content). The pipeline will detect named entities in the source documents, resolve the names with respect to other linked datasets (e.g. addresses or organizations), and finally publish the updates in a linked data repository with an open SPARQL endpoint. The latter will provide the means for consumption of Linked Open Government Data.

A Crawler for RESTful, SOAP Services and Web Forms

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The Deep Web, consisting of online databases hidden behind SOAP-based or RESTful Web services or Web forms, is estimated to contain about 500 times more data than the (visible) Web. Despite many advances in search technology, the full potential of the Deep Web has been left largely underexploited. This is partially due to the lack of effective solutions for surfacing and visualizing the data. The Deep Web research initiative at the University of Tartu's Institute of Computer Science has developed an experimental platform to surface and visualize Deep Web data sources hidden behind SOAP Web service endpoints. However, this experimental platform currently supports only a limited set of SOAP endpoints, updated on an ad hoc basis.

The aim of this project is to build a crawler and an indexing engine capable of recognizing endpoints behind Web forms, RESTful services and SOAP-based services, together with their explicit descriptions (e.g. WSDL interface descriptions, when available). Furthermore, the crawler should identify examples of queries that can be forwarded to those endpoints, especially for endpoints with no explicit interface descriptions such as Web forms.

This project is available both for Master and for Bachelor students. The goal of the Masters project would be to build a crawler supporting endpoints with and without explicit interfaces. The goal of the Bachelor thesis will be to crawl WSDL interfaces only.

Transforming the Web into a Knowledge Base: Linking the Estonian Web

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The aim of the project is to study automated linking opportunities for Web content in the Estonian language. Recent advances in Web crawling and indexing have resulted in effective means for finding relevant content on the Web. However, getting answers to queries that require aggregation of results is still in its infancy, since a better understanding of the content is required. At the same time, there has been a fundamental shift in content linking - instead of linking Web pages, more and more Web content is tagged and annotated to facilitate linking of smaller fragments of Web pages by means of RDFa and microformat markup. Unfortunately, this technology has not been widely adopted yet, and further efforts are required to advance the Web in this direction.

This project aims at providing a platform for automating this task by exploiting existing natural language technologies, such as named entity recognition for the Estonian language, in order to link the content of the entire Estonian Web. For this, two Master students will work closely together, first setting up the conventional crawling and indexing infrastructure for the Estonian Web and then extending the indexing mechanism with a microtagging mechanism, which will enable linking the crawled Web sites. The microtagging mechanism will take advantage of existing language technologies to extract names (such as names of persons, organizations, and locations) from the crawled Web pages. In order to validate the approach, a portion of the Estonian Web is processed and exposed in RDF form through a SPARQL query interface such as the one provided by Virtuoso Open-Source Edition.

Automated Estimation of Company Reputation

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Reputation is recognized as a fundamental instrument of social order - a commodity that is accumulated over time, hard to gain and easy to lose. In the case of organizations, reputation is also linked to their identity, performance, and the way others respond to their behaviour. There is an intuition that the reputation of a company affects the perception of its value by investors and helps to attract new customers and retain existing ones. Therefore, organizations focused on long-term operation care about their reputation.

Several frameworks, such as WMAC (http://money.cnn.com/magazines/fortune/most-admired/, http://www.haygroup.com/Fortune/research-and-findings/fortune-rankings.aspx), used by Fortune magazine, have been employed to rank companies by their reputation. However, there are some serious issues associated with reputation evaluation in general. First, the existing evaluation frameworks are usually applicable to the evaluation of large companies only. Second, the costs of applying these frameworks are quite high in terms of the accumulated time of the professionals engaged; e.g., in the case of WMAC, more than 10,000 senior executives, board directors, and expert analysts were engaged to fill in questionnaires evaluating nine performance aspects of Fortune 1000 companies in 2009. Third, the evaluation is largely based on subjective opinions rather than objective criteria, which makes continuous evaluation cumbersome and increases the length of evaluation cycles.

This thesis project aims at finding a solution to these issues. More specifically, the project is expected to answer the following research question: to what degree is the reputation of a company determined by objective criteria such as its age, its financial indicators, and the sentiment of news articles and comments on the Web? The more specific research questions are the following:

  1. What accuracy in reputation evaluation can be achieved by using solely objective criteria?
  2. Which objective criteria, and which combinations of them, best discriminate the reputation of organizations?
  3. To what extent does the reputation of one organization affect the reputation of another through people common to their management?
  4. How do temporal aspects (an organization's age, related past events, etc.) bias reputation?

In order to answer these questions, network analysis and machine learning methods will be exploited and a number of experiments will be performed on a given dataset. The dataset to be used is an aggregation of data from the Estonian Business Registry, the Registry of Buildings, the Land Register, the Estonian Tax and Customs Board, the Register of Economic Activities, news articles from major Estonian newspapers and blogs, and some proprietary data sources.
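A minimal sketch of the kind of experiment research questions 1 and 2 call for, in Python with scikit-learn; the file name and feature columns are invented, standing in for the aggregated dataset described above:

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    # Hypothetical table: one row per company, columns are objective criteria.
    df = pd.read_csv("companies.csv")  # invented file name
    features = ["age_years", "revenue", "profit_margin", "news_sentiment", "board_size"]
    X, y = df[features], df["reputation_score"]

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    # RQ1: how well do objective criteria alone predict reputation?
    print("Mean R^2:", cross_val_score(model, X, y, cv=5).mean())

    # RQ2: which criteria discriminate best? Inspect feature importances.
    model.fit(X, y)
    for name, imp in sorted(zip(features, model.feature_importances_),
                            key=lambda p: -p[1]):
        print(name, round(imp, 3))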

Data Analysis of Workflow Processes at METEC

Supervisors: Jaan Übi and Dirk Theis (dotheis ät ut.ee)

Download the project description.

Visualization of Traffic Flow and/or People Density Changes with Animated Texture/Particles

Supervisor: Toivo Vajakas (firstname.lastname ät ut.ee)

Download the project description.

Machine learning in the cloud

Supervisor: Chris Thompson (chris [ät] speaklanguages dot com)


Cell constructor

Supervisor: Leopold Parts (firstname.lastname ät gmail.com)

Download the project description.


Bachelors projects

Note: The number of projects for Bachelors students is limited, but you can find several other potential project ideas by checking this year's list of proposed software projects. Some of the projects in the "Available" category could be used as Bachelors thesis topics. Also, we're open to student-proposed Bachelors thesis projects. If you have an idea for your Bachelors project and your idea falls in the area of software engineering (broadly defined), please contact the group leader: Marlon . Dumas ät ut.ee

Rescue event categorisation

Supervisor: Siim Karus (siim04 ät ut.ee)

In this thesis, you will analyze data provided by the Rescue Services in order to find commonalities in rescue events so as to categorise them. One of the aims will be to isolate and characterize less common rescue event categories, which are of special interest to the Rescue Services.

The thesis will be conducted in cooperation with the Rescue Services. The thesis can be written in Estonian.
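One possible starting point, sketched in Python with scikit-learn; the file name and feature columns are invented, since the actual attributes depend on the data the Rescue Services provide:

    import pandas as pd
    from sklearn.cluster import DBSCAN
    from sklearn.preprocessing import StandardScaler

    # Hypothetical event table with numeric attributes of each rescue event.
    events = pd.read_csv("rescue_events.csv")  # invented file name
    X = StandardScaler().fit_transform(
        events[["response_min", "crew_size", "duration_min"]])

    # Density-based clustering: small clusters and noise points (label -1)
    # hint at the rarer event categories of special interest.
    events["category"] = DBSCAN(eps=0.7, min_samples=10).fit_predict(X)
    print(events["category"].value_counts().sort_values().head())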

Workflow Automation With Business Data Streams

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

There exist services, such as Flowxo and IFTTT, which automate simple workflows by supporting the creation of trigger/action pairs or if-then recipes, whose execution orchestrates applications with respect to external stimuli (e.g. an application data stream). A popular recipe is the following: "IF I post a picture on Instagram THEN save the photo to Dropbox". A more complex example is: "for every new deal in a CRM, send the deal by e-mail to a person via the GMail service, then wait for about 1 day and send a reminder SMS via the Twilio service".

Such systems mostly rely on proprietary application data, while there are cases where external stimuli would provide extra benefits. An example of such a case is the integration of CRM and credit management tools with external stimuli in the form of streaming company debt and risk score data for the Order-to-Cash business process. A Stream API for business data is currently under development at Register OÜ; it will provide a stream of events such as company debt changes, changes in board membership, and data about newly registered companies. Such data change events can easily be applied in the context of CRM and credit management (CM).

The aim of the project will be to provide an analogue of IFTTT where users can define recipes for reacting to business data changes via actions in applications such as GMail, Odoo CRM, etc.

The project will be done in collaboration with Register OÜ. The application will be developed using the Complex Event Processing (CEP) feature of the Register Stream API.
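A minimal sketch of the trigger/action idea in Python; the event shape is an invented stand-in for the Register Stream API, and send_email is a placeholder for a real GMail or Twilio integration:

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Recipe:
        trigger: Callable[[dict], bool]   # predicate over an incoming event
        action: Callable[[dict], None]    # side effect to run when it matches

    def send_email(event):
        # Placeholder for a real GMail integration.
        print(f"Email: {event['company']} debt changed to {event['debt']}")

    recipes = [
        # "IF a company's debt rises above 1000 EUR THEN notify the account manager"
        Recipe(trigger=lambda e: e["type"] == "debt_change" and e["debt"] > 1000,
               action=send_email),
    ]

    def on_event(event):
        """Called for every event arriving on the (hypothetical) business data stream."""
        for recipe in recipes:
            if recipe.trigger(event):
                recipe.action(event)

    on_event({"type": "debt_change", "company": "Acme OÜ", "debt": 2500})

A production version would express triggers as CEP queries over the stream rather than plain predicates.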

Lead generator for accelerating B2B sales

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Companies care a lot about improving their sales, and numerous online solutions have been proposed to meet this demand. While for B2C sales the prevalent solutions use social media campaigns and Web visitor data, for B2B sales there are solutions which allow generating a list of leads based on a set of attribute values over company data, such as activity field, size, financial metrics, etc.

Some solutions for the Estonian market include https://www.baltictarget.eu, http://sihtgrupid.ee, http://turundusnimekirjad.ee/ and http://www.kliendibaas.ee/. However, these solutions have the following deficiencies:

  1. The market segment must be known before generating the leads
  2. The set of attributes is mostly limited to geographic, activity field and financial data
  3. The data is returned as a file

This project aims at innovating B2B sales by providing a solution which differs from the existing ones in the following ways:

  1. instead of a list of feature/value pairs, users can define their market segment by giving a set of prospective clients as input to lead generation;
  2. in addition to activity fields, company size and financial metrics, data about owned real estate, credit history, credit risk, media coverage and related persons can also be used;
  3. instead of outputting leads to a CSV file, lead data will be directly imported into an existing CRM system, or a new cloud instance of a CRM will be deployed and populated with the leads.

The project will be done in collaboration with Register OÜ.
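To illustrate the first point, a rough sketch of similarity-based lead generation in Python; the file name and feature columns are invented, standing in for data that would come from Register's sources:

    import pandas as pd
    from sklearn.neighbors import NearestNeighbors
    from sklearn.preprocessing import StandardScaler

    # Hypothetical company table indexed by registry code.
    companies = pd.read_csv("companies.csv").set_index("reg_code")
    X = StandardScaler().fit_transform(
        companies[["employees", "revenue", "credit_risk"]])

    nn = NearestNeighbors(n_neighbors=10).fit(X)

    def leads_from_seed(seed_reg_codes):
        """Return the companies most similar to the user's prospective clients."""
        seed_rows = companies.index.get_indexer(seed_reg_codes)
        _, idx = nn.kneighbors(X[seed_rows])
        found = set(companies.index[idx.ravel()]) - set(seed_reg_codes)
        return sorted(found)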

Automated brand magazine

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

It is essential for companies to acquire new customers and retain existing ones. To address this need, content marketing techniques have been developed and advocated. However, due to a lack of proper skills, these techniques are often underutilized. Multiple tools have been created to simplify content marketing. For instance, Flipboard (https://about.flipboard.com/advertisers) simplifies the creation of brand magazines for executing content marketing among users. LinkedIn has been used by top management to deliver company news to their employees. Instant Articles by Facebook (https://instantarticles.fb.com) provides means for publishers to make their articles appear more attractive and to increase engagement rates on Facebook. However, all of the mentioned solutions expect relevant content to be provided and managed manually.

In this project, a solution will be developed which simplifies the creation and maintenance of brand magazines for companies (especially SMEs) and persons (e.g. bloggers). The key innovation of the project is that it will *automatically* search for mentions of companies, persons and brands on the Web via the Register Graph API (https://developers.ir.ee/graph-api) and create attractive brand pages out of them, which will be made visible to the target audience via search engines. Mentions originating from online news media, the blogosphere, forums, corporate blogs and other Web sources will be presented on brand pages as specific cards (see https://www.google.com/search/about/learn-more/now/ for the concept of cards), e.g. "Customers' feedback", "Our partners", "New product launched", "About company", etc.

Some initial requirements:

  1. Responsive design with Google Material Design
  2. Use of the Register Stream API and Graph API as data sources
  3. A Web solution that is friendly to both search engines and users
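A rough sketch in Python of the card-building step; the endpoint, response shape and keyword rules are all invented for illustration (the real calls would follow the Register Graph API documentation):

    from collections import defaultdict
    import requests

    def fetch_mentions(brand):
        """Invented endpoint shape; the real one is defined by the Graph API docs."""
        resp = requests.get("https://api.example.org/graph/mentions",  # hypothetical URL
                            params={"q": brand})
        return resp.json()["mentions"]  # e.g. [{"source": ..., "text": ...}, ...]

    # Naive rules mapping a mention to a card on the brand page.
    CARD_RULES = [
        ("Customers' feedback", lambda m: m["source"] == "forum"),
        ("New product launched", lambda m: "launch" in m["text"].lower()),
    ]

    def build_cards(brand):
        cards = defaultdict(list)
        for mention in fetch_mentions(brand):
            for card, matches in CARD_RULES:
                if matches(mention):
                    cards[card].append(mention)
                    break
        return cards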

Comparison of BPMN Security Extensions

Supervisor: Raimundas Matulevicius (raimundas.matulevicius ät ut.ee)

Recently, many BPMN extensions have been proposed for security analysis. These extensions concern different aspects, ranging from security problem definition, through the introduction of security requirements, to the identification of controls. The goal of the thesis is to develop a systematic and coherent overview of these extensions and to define a set of guidelines for selecting particular BPMN security extensions for targeted problems. This should also provide an overview of emerging trends.

Starting points:

  1. Braun, R., Esswein, W.: Classification of Domain-Specific BPMN Extensions. In: The Practice of Enterprise Modeling. LNBIP, vol. 197, pp. 42–57 (2014)
  2. Menzel, M., Thomas, I., Meinel, C.: Security Requirements Specification in Service-oriented Business Process Management. In: ARES 2009, pp. 41–49 (2009)
  3. Altuhhova, O., Matulevičius, R., Ahmed, N.: An Extension of Business Process Model and Notation for Security Risk Management. International Journal of Information System Modeling and Design (IJISMD) 4(4), 93–113 (2013)
  4. Cherdantseva, Y., Hilton, J., Rana, O.: Towards SecureBPMN – Aligning BPMN with the Information Assurance and Security Domain. In: Business Process Model and Notation. LNBIP, vol. 125, pp. 107–115 (2012)
  5. Marcinkowski, B., Kuciapski, M.: A Business Process Modeling Notation Extension for Risk Handling. In: Cortesi, A., Chaki, N., Saeed, K., Wierzchoń, S. (eds.) CISIM 2012. LNCS, vol. 7564, pp. 374–381. Springer, Heidelberg (2012)
  6. Saleem, M., Jaafar, J., Hassan, M.: A Domain-Specific Language for Modelling Security Objectives in a Business Process Models of SOA Applications. AISS 4(1), 353–362 (2012)
  7. Rodriguez, A., Fernandez-Medina, E., Piattini, M.: A BPMN Extension for the Modeling of Security Requirements in Business Processes. IEICE Transactions on Information and Systems 90(4), 745–752 (2007)

Comparison of CORAS and the ArchiMate Risk and Security Extension

Supervisor: Raimundas Matulevicius (raimundas.matulevicius ät ut.ee)

In this project, you will compare CORAS and the ArchiMate risk and security extension as visual notations for modelling security risks. This includes modelling a case study using CORAS and then using ArchiMate, and comparing the two. The comparison will be based on cognitive effectiveness or other relevant criteria. The research consists of the following steps:

  1. Introduce CORAS (book + a few papers + tool)
  2. Introduce the ArchiMate risk and security extension
  3. Define comparison criteria
  4. Use at least 3 criteria to assess the two notations and compare the assessment results

Lab Package Development & Evaluation for the Course 'Software Testing' (MTAT.03.159)

Supervisor: Dietmar Pfahl (dietmar dot pfahl at ut dot ee)

The course Software Testing (MTAT.03.159) currently has 6 labs (practice sessions) in which 2nd and 3rd year BSc students learn a specific test technique. We would like to improve the existing labs and add new ones.

This topic is intended for students who have already taken this software testing course and who feel that they can contribute to improving it and by the same token complete their Bachelors project. The scope of the project can be negotiated with the supervisor to fit the size of a Bachelors project.

The tasks to do for this project are as follows:

  • Selection of a test topic for which a lab package should be developed (see list below)
  • Development of the learning scenario (i.e., what shall students learn, what will they do in the lab, what results shall they produce, etc.)
  • Development of the materials for the students to use
  • Development of example solutions (for the lab supervisors)
  • Development of a grading scheme
  • Evaluation of the lab package

Topics for which lab packages should be developed (in order of urgency / list can be extended based on student suggestions):

  • Mutation Testing
  • Combinatorial Testing
  • Automated Unit & Systems Testing
  • Issue Reporting
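To give a flavour of the first topic, a minimal Python illustration of the idea behind mutation testing; the function and mutant are invented for the example:

    def price_with_discount(price, qty):
        # Original: orders of 10 or more items get a 10% discount.
        return price * qty * (0.9 if qty >= 10 else 1.0)

    def price_with_discount_mutant(price, qty):
        # Mutant: ">=" replaced by ">". A good test suite should "kill" it.
        return price * qty * (0.9 if qty > 10 else 1.0)

    def test_discount_boundary():
        # A boundary-value test that kills the mutant: the original returns
        # 18.0 here, while the mutant would return 20.0.
        assert price_with_discount(2.0, 10) == 18.0

    test_discount_boundary()

A lab built around this topic would have students generate mutants with a tool and measure how many of them their test suites kill.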

Literature Survey on "Techniques for automatically generating test cases from the code"

Supervisor: Dietmar Pfahl (dietmar dot pfahl at ut dot ee)

A class of testing methods, called search-based testing, generates test cases from the code, guided by test criteria such as branch coverage. The approach is promising, but there are of course limitations to it. Project task: find literature on methods/techniques/tools that automatically derive test cases from code. Summarize the literature and extract information about the techniques applied, their effectiveness and efficiency, their strengths and weaknesses, their applicability, etc.
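For a flavour of the technique, a toy Python sketch of search-based test-data generation: a hill climb minimizes a branch-distance fitness until an input covering the target branch is found (the function under test is invented):

    import random

    def under_test(x):
        if x * x - 8 * x + 15 == 0:   # target branch; true only for x in {3, 5}
            return "target branch"
        return "other branch"

    def fitness(x):
        """Branch distance: 0 exactly when the target branch is taken."""
        return abs(x * x - 8 * x + 15)

    def hill_climb(start, steps=10_000):
        x = start
        for _ in range(steps):
            if fitness(x) == 0:
                break
            neighbour = x + random.choice([-1, 1])
            if fitness(neighbour) <= fitness(x):
                x = neighbour
        return x

    x = hill_climb(random.randint(-100, 100))
    print(x, under_test(x))  # with enough steps, this reaches x == 3 or x == 5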

Starting points for finding literature:

  • S. Ali, L. Briand, H. Hemmati, and R. Panesar-Walawege, A systematic review of the application and empirical investigation of search-based test-case generation, IEEE Transactions on Software Engineering (TSE), vol. 36, no. 6, pp. 742-762, 2010.
  • M. Harman and P. McMinn. A theoretical and empirical study of search-based testing: Local, global, and hybrid search. IEEE Transactions on Software Engineering, 36(2):226-247, 2010.
  • Kostyantyn Vorobyov and Padmanabhan Krishnan. 2012. Combining Static Analysis and Constraint Solving for Automatic Test Case Generation. In Proceedings of the 2012 IEEE Fifth International Conference on Software Testing, Verification and Validation (ICST '12). IEEE Computer Society, Washington, DC, USA, 915-920.

Methods that could be used:

  • Snowballing
  • Systematically searching literature repositories (ACM DL, IEEE DL, Scopus, SpringerLink, etc.)

Literature Survey on "Requirements Elicitation Techniques – Strengths and Weaknesses"

Supervisor: Dietmar Pfahl (dietmar dot pfahl at ut dot ee)

Elicitation is the process by means of which a software analyst gathers information about the problem domain. The analyst uses a series of analyst-user interaction mechanisms, called elicitation techniques, to acquire information. A very wide range of elicitation techniques has been proposed: interviews (structured, semi-structured, open), protocol analysis, laddering, work groups (or focus groups), prototyping, etc. A number of studies suggest that elicitation techniques are not interchangeable and that there are far-reaching differences with regard to the type of knowledge each technique can uncover. Other aspects, like the quantity of information gathered or elicitation efficiency, may also distinguish one elicitation technique from another.

Project task: Pick (at least) two software requirements elicitation techniques, find literature about them, and compare them with regard to

  • the type of requirements-related information each technique is best/worst at finding,
  • the effectiveness of each technique with regard to requirements elicitation,
  • the efficiency of each technique with regard to requirements elicitation

Starting point for finding literature: Dieste, O., & Juristo, N. (2011). Systematic review and aggregation of empirical studies on elicitation techniques. IEEE TSE 37(2), 283-304.

Literature Survey on "Open Innovation – How to use it for software requirements elicitation?"

Supervisor: Dietmar Pfahl (dietmar dot pfahl at ut dot ee)

Open innovation (OI) is a new paradigm that aims at opening up organizational boundaries in order to use and recombine internal and external knowledge to develop and commercialize innovative products. The idea of OI can become an interesting new approach to requirements elicitation for software products. In particular, social media, blogs, and other freely accessible resources could be systematically analyzed for relevant ideas that would help improve the value of future products.

Project task: Find literature on reported attempts to exploit social media, blogs, and other open sources for detecting new functionality or complementing existing functionality of existing and new software products. Summarize and discuss the literature you find. In your analysis, you may focus on the type of information sources exploited, the ways in which they were analyzed, the kind of information extracted (new requirements, discussion/evaluation of existing functionality, etc.), the type of products for which new requirements were sought, etc.

Starting point for literature search:

  • Anton Barua, Stephen W. Thomas, Ahmed E. Hassan (2014) What are developers talking about? An analysis of topics and trends in Stack Overflow. Empirical Software Engineering, June 2014, Volume 19, Issue 3, pp 619-654.

Customer Journey Mapping

Supervisor: Marlon Dumas (marlon dot dumas at ut dot ee)

A Customer Journey Map is a graphical representation of how a given customer interacts with an organization in order to consume its products or services, possibly across multiple channels. Several approaches for customer journey mapping exist nowadays, each relying on different concepts and notations. In this thesis, you will review the most popular approaches that are currently in use for customer journey mapping, and you will distill from them a common set of concepts and notations. You will then show how these concepts and notations can be applied in an organization of your choice (preferably an organization with which you have experience interacting as a customer).

Blockchain and Business Processes

Supervisor: Fredrik P. Milani (milani at ut dot ee)

Interest in blockchain is growing very strongly. As this new technology gains traction, many uses are being proposed, ranging from voting to financial market applications. Currently, the hype around the technology overshadows the value it can deliver by enabling changes in business processes. While blockchain can deliver value by replacing existing IT solutions, the real value comes from innovating the business processes. This topic is about exploring, for one of the industries/cases below, what the current business processes are and how blockchain could enable innovation of those processes, and finally comparing/contrasting the two in order to draw conclusions.

Each of the cases listed below makes a separate Bachelors thesis in which the current processes are examined and conceptual processes with blockchain are designed and analysed.

  • Voting – voting solutions based on blockchain technology
  • Insurance – insurance firms offering a range of products and how that would be transformed if supported by blockchain technology
  • Health Care – transferring and owning your own medical health records and prescription management
  • Registry – management of assets (digital and physical) including registration, tracking, change of ownership, licensing and so on
  • Financial Markets – covering one or several cases such as post trading settlement of securities and bilateral agreements
  • IoT – connecting multiple devices with blockchain
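For intuition on the Registry case, a toy Python sketch of a hash-chained ownership ledger; real blockchain platforms add consensus, signatures and distribution, none of which is modelled here:

    import hashlib
    import json

    def block_hash(block):
        return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

    # Genesis block: initial registration of an asset (names are invented).
    ledger = [{"prev": None, "asset": "apartment-42", "owner": "Alice"}]

    def transfer(asset, new_owner):
        """Append an ownership-change block chained to the previous one."""
        ledger.append({"prev": block_hash(ledger[-1]),
                       "asset": asset, "owner": new_owner})

    def verify():
        """Tampering with any past block breaks every later 'prev' link."""
        return all(ledger[i]["prev"] == block_hash(ledger[i - 1])
                   for i in range(1, len(ledger)))

    transfer("apartment-42", "Bob")
    print(ledger[-1]["owner"], verify())  # Bob True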

Estonian e-government backlog (multiple thesis projects)

Supervisor: Andres Kütt (first contact person: Marlon Dumas - marlon dot dumas ät ut.ee)

Details