Student Projects 2014/2015

Below is a list of project topics for Masters and Bachelors theses offered by the software engineering research group in 2014-2015. The projects are divided into Masters projects and Bachelors projects (see below).

If you're interested in any of these projects, please contact the corresponding supervisor.


Masters projects

Case Study on Exploratory Testing

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Exploratory software testing (ET) is a powerful and fun approach to testing. The plainest definition of ET is that it comprises test design and test execution at the same time. This is the opposite of scripted testing (having test plans and predefined test procedures, whether manual or automated). Exploratory tests, unlike scripted tests, are neither defined in advance nor carried out precisely according to a plan.

Testing experts like Cem Kaner and James Bach claim that - in some situations - ET can be orders of magnitude more productive than scripted testing, and a few empirical studies exist that support this claim to some degree. Nevertheless, ET is often confused with (unsystematic) ad-hoc testing and is thus not always well regarded in academia and industrial practice.

The objective of this project will be to conduct a case study in a software company investigating the following research questions:

  • To what extent is ET currently applied in the company?
  • What are the advantages/disadvantages of ET as compared to other testing approaches (i.e., scripted testing)?
  • How can the current practice of ET be improved?
  • If ET is currently not used at all, what guidance can be provided to introduce ET in the company?

The method applied is a case study. Case studies follow a systematic approach as outlined in: "Guidelines for conducting and reporting case study research in software engineering" by Per Runeson and Martin Höst.

This project requires that the student has (or is able to establish) access to a suitable software company to conduct the study.

Case Study on Mobile Testing

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Similar to the case study project on Exploratory Testing (see above), a student can work in a company to analyse the current state-of-the-practice of mobile testing. Focus shall be placed on the question of how to optimise the manual testing of mobile applications. The objective of this project will be to investigate the following research questions:

  • To what extent is mobile testing currently applied in the company?
  • What are the perceived strengths/weaknesses of the currently applied mobile testing techniques and tools?
  • How can the current practice of mobile testing be improved?

This project requires that the student has (or is able to establish) access to a suitable software company to conduct the study.

Case Study on Test-Driven Development (TDD)

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Similar to the case study project on Exploratory Testing (see above), a student can work in a company to analyse the current state-of-the-practice of TDD. The objective of this project will be to investigate the following research questions:

  • To what extent is TDD currently applied in the company?
  • What are the perceived strengths/weaknesses of the currently applied TDD techniques and tools?
  • How can the current practice of TDD be improved?

This project requires that the student has (or is able to establish) access to a suitable software company to conduct the study.

Case Study on Test Automation

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Similar to the case study project on Exploratory Testing (see above), a student can work in a company to analyse the current state-of-the-practice of test automation. The objective of this project will be to investigate the following research questions:

  • To what extent is test automation currently applied in the company?
  • What are the perceived strengths/weaknesses of the currently applied test automation techniques and tools?
  • How can the current practice of test automation be improved?

This project requires that the student has (or is able to establish) access to a suitable software company to conduct the study.

Exploring the Software Release Planning Problem with Constraint Solving

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Decision-making is central to Software Product Management (SPM) and includes deciding on requirements priorities and the content of coming releases. Several algorithms for prioritization and release planning have been proposed, where humans with or without machine support enact a series of steps to produce a decision outcome. Instead of applying some specific algorithm to find an acceptable solution to a decision problem, in this thesis we propose to model SPM decision-making as a Constraint Satisfaction Problem (CSP), where relative and absolute priorities, inter-dependencies, and other constraints are expressed as relations among variables representing entities such as feature priorities, stakeholder preferences, and resource constraints. The solution space is then explored with the help of a constraint solver without humans needing to care about specific algorithms.

The goal of this thesis project is to discuss advantages and limitations of CSP modeling in SPM and to give principal examples as a proof-of-concept of CSP modeling in requirements prioritization and release planning. If time permits, an evaluation of the CSP-based models via comparison with established tools such as ReleasePlanner will be part of the project.

The project will consist of the following steps:

  • Formulation of the release planning problem as a CSP
  • Familiarisation with JaCoP – Java Constraint Solver or an equivalent tool
  • Development of a constraint solver for the release planning problem with JaCoP
  • Application of the constraint solver to a set of open source feature models available from the SPLOT Feature Model Repository, maintained at the University of Waterloo, Canada
  • Performance evaluation of the constraint solver
  • Optional: Comparison with the performance of existing release planning tools, e.g., ReleasePlanner
  • Summary of the findings, discussion, outline of recommended follow-up research

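As a first illustration of the formulation step above, the following sketch states a minimal release planning CSP in mathematical notation. The variables, constraints and the optional objective are illustrative only and would be refined during the project (e.g., following the formulation proposed by Regnell and Kuchcinski, see reference 2 below):

```latex
% Minimal release-planning CSP sketch (illustrative notation).
% x_i \in \{0,1\} : feature i is included in the next release (1) or not (0)
% e_i, v_i        : estimated effort and stakeholder value of feature i
% E               : effort capacity of the release
\begin{align*}
  & x_i \in \{0, 1\}, && i = 1, \dots, n \\
  & \sum_{i=1}^{n} e_i \, x_i \le E && \text{(resource constraint)} \\
  & x_j \le x_k && \text{(feature } j \text{ requires feature } k\text{)} \\
  & x_p + x_q \le 1 && \text{(features } p \text{ and } q \text{ exclude each other)} \\
  & \text{optionally: maximize } \sum_{i=1}^{n} v_i \, x_i && \text{(total value of the release)}
\end{align*}
```

A constraint solver such as JaCoP then explores the space of assignments to the x_i without the analyst having to choose a specific prioritization algorithm.
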
Related literature:

  1. Ruhe, G.; Saliu, M.O., "The art and science of software release planning," IEEE Software, vol. 22, no. 6, pp. 47-53, Nov.-Dec. 2005
  2. Regnell, B.; Kuchcinski, K., "Exploring Software Product Management decision problems with constraint solving - opportunities for prioritization and release planning," Fifth International Workshop on Software Product Management (IWSPM), pp. 47-56, 2011

Using Data Mining & Machine Learning to Support Decision-Makers in SW Development

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Project repositories contain much data about the software development activities ongoing in a company. In addition, there exists much data from open source projects. This opens up opportunities for analysing and learning from the past, which can be converted into models that help make better decisions in the future - where 'better' can relate to either 'more efficient' (i.e., cheaper) or 'more effective' (i.e., with higher quality).

For example, we have recently started a research activity that investigates whether textual descriptions contained in issue reports can help predict the time (or effort) that a new incoming issue will require to be resolved.
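
As a rough sketch of how such a study could start, the snippet below trains a simple text-based regression model with scikit-learn. The file name and column names are invented for illustration; a real study would extract them from the issue tracker under investigation.

```python
# Sketch: predicting issue resolution time from the textual description.
# Assumes a CSV export "resolved_issues.csv" with columns "description" and
# "resolution_hours" (both names are illustrative).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

issues = pd.read_csv("resolved_issues.csv")
X_train, X_test, y_train, y_test = train_test_split(
    issues["description"], issues["resolution_hours"], test_size=0.2, random_state=0)

# Bag-of-words features from the issue text, fed into a linear regressor.
model = make_pipeline(TfidfVectorizer(min_df=2, stop_words="english"), Ridge())
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print("MAE (hours):", mean_absolute_error(y_test, predictions))
```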

There are, however, many more opportunities, e.g., analysing bug reports to help triagers assign reports to developers. And of course, there are other documents that could be analysed: requirements, design docs, code, test plans, test cases, emails, blogs, social networks, etc. Not only the application can vary, but also the analysis approach. Different learning approaches may have different efficiency and effectiveness characteristics depending on the type, quantity and quality of data available.

Thus, this topic can be tailored according to the background and preferences of an interested student.

Tasks to be done (after definition of the exact topic/research goal):

  • Selection of suitable data sources
  • Application of machine learning / data mining technique(s) to create a decision-support model
  • Evaluation of the decision-support model

Prerequisite: Students interested in this topic should have successfully completed one of the courses on data mining / machine learning offered in the Master of Software Engineering program.

Tool for Assessing Release Readiness

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Release-readiness (RR) is a time dependent attribute of a software product. It reflects the status of the current implementation and quality of the software and can be determined (estimated) by aggregating the degree of satisfaction of so-called RR attributes (e.g., defect detection rate, number of open issues, code churn, ...).

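A minimal sketch of this aggregation idea is given below. The attribute names, weights and satisfaction levels are invented for illustration; a real tool would derive them from project data and manager input.

```python
# Sketch: aggregating the satisfaction of individual RR attributes into an
# overall release-readiness score. All values below are illustrative only.
rr_attributes = {
    # attribute: (weight, degree of satisfaction in [0, 1])
    "defect_detection_rate": (0.40, 0.7),
    "open_issues":           (0.35, 0.5),
    "code_churn":            (0.25, 0.9),
}

overall_rr = sum(weight * satisfaction
                 for weight, satisfaction in rr_attributes.values())
print(f"Overall release readiness: {overall_rr:.2f}")

# A drill-down view (cf. requirement R5 below) could rank attributes by their
# unsatisfied weighted share, i.e. by how much they limit the overall score.
for name, (weight, satisfaction) in sorted(
        rr_attributes.items(),
        key=lambda item: item[1][0] * (1 - item[1][1]),
        reverse=True):
    print(f"{name}: unsatisfied share {weight * (1 - satisfaction):.2f}")
```
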
The aim of this project is - after familiarisation with the relevant literature and concepts - to develop a tool that addresses the following (tentative) requirements:

  • R1: The tool should provide a set of pre-defined RR attributes and their corresponding metrics.
  • R2: The tool should integrate with existing project management tools so that the data required for calculating RR metrics can be collected automatically.
  • R3: The tool should allow product managers to evaluate the degree of satisfaction of the RR attributes based on the objective measures.
  • R4: The tool should provide an interactive visual dashboard for monitoring the degree of overall RR at any point of time in the release cycle.
  • R5: The tool should have a drill-down capability so that product managers can gain insights into which RR attributes are limiting the overall RR.
  • R6: The tool should allow product managers to make projections of RR at release time.
  • R7: The tool should provide visual indicators so that product managers can understand the impact of individual RR attributes on the overall RR.

Depending on the interests of the student and the compatibility, this work might be conducted in collaboration with an ongoing PhD project at the University of Calgary.

Related literature:

  1. S. McConnell, "Gauging software readiness with defect tracking," IEEE Software, vol. 14, pp. 135-136, 1997.
  2. J. T. S. Quah and S. W. Liew, "Gauging Software Readiness Using Metrics," in IEEE Conference on Soft Computing in Industrial Applications, 2008, pp. 426-431.

Is the BPM Tool Suite Bizagi Suitable to Model and Simulate SW Development Processes?

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Software development processes have been modeled and simulated for a long time using various techniques, e.g., discrete-event modeling, continuous modeling (System Dynamics), and many others. Several distinct purposes exist for modeling and simulating SW development processes: analysis, exploration, learning, and teaching (to list a few). For example, a process simulator that correctly represents the actual software development process can be used to predict the behaviour of the next project to be conducted according to the given process model. Other applications would be to explore the potential of planned/suggested process changes (improvements) with regard to improving the performance of projects following the new process model. Questions that can be answered are: How much would the end product quality improve if we add a new QA technique? What is the trade-off between time, effort and quality if we introduce 100% statement coverage during unit testing? How much would the project performance improve if we could allocate better skilled engineers to the project? And so on ...
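
To make the kind of what-if question above concrete, here is a toy discrete-event sketch using the SimPy library. The process structure, arrival rate, durations, rework probability and team size are all invented; changing, e.g., the rework probability mimics the effect of adding a QA technique.

```python
# Sketch: a toy discrete-event simulation of a development-and-rework process.
import random
import simpy

REWORK_PROBABILITY = 0.3   # what-if: how much would better QA reduce this?
finished = []

def work_item(env, name, developers):
    with developers.request() as request:
        yield request                                       # wait for a free developer
        yield env.timeout(random.expovariate(1 / 3.0))      # implementation, ~3 days
        if random.random() < REWORK_PROBABILITY:
            yield env.timeout(random.expovariate(1 / 1.0))  # rework after QA, ~1 day
    finished.append((name, env.now))

def arrivals(env, developers, n_items=20):
    for i in range(n_items):
        env.process(work_item(env, f"item-{i}", developers))
        yield env.timeout(random.expovariate(1 / 2.0))      # new work item, ~every 2 days

random.seed(1)
env = simpy.Environment()
developers = simpy.Resource(env, capacity=2)
env.process(arrivals(env, developers))
env.run()
print(f"All {len(finished)} items done after {env.now:.1f} days")
```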

In order to do SW process modeling and simulation, a modeling technique/language/tool must be able to model different types of entities, i.e., artifacts, activities, and resources - and their relationships - in a rather comprehensive way (i.e., providing means to model control flows as well as data and information flows, decomposition and aggregation of entities, definition of attributes, etc.). Unfortunately, there neither exists a standardised modelling language particularly tailored to the needs of SW process engineers, nor do non-commercial modelling and simulation tools targeting SW development processes exist.

Since the Business Process (BP) community seems to have achieved a rather mature state today, it is intriguing to use BP notations and tools for modelling software development processes. One of the most popular tool suites is Bizagi Studio, which is freely available for academic purposes. Thus, why not use this tool for modelling SW development processes? The aim of this MSc thesis would be to evaluate the usability and usefulness of Bizagi for the purpose of modelling and simulating SW development processes. As a minimum, the following tasks would have to be completed in a successful thesis project:

  • Specification of exact research goal
  • Identification of evaluation method (incl. specification of evaluation criteria)
  • Identification of existing SW process simulation models that can be used as examples which must be re-implemented in Bizagi (i.e., to assess the ability of doing so, to detect potential limitations of Bizagi, and to assess the difficulty of reproducing existing models implemented in a different modelling language/technique). Regarding this point the supervisor can provide existing models.
  • Re-Implementation of existing process models (e.g., agile processes, lean processes, waterfall, incremental enhancement, etc.)
  • Write-up of the analysis including a (qualified) list of what can be done, what cannot be done, and what can only be done with difficulty in Bizagi.
  • Finally, a result could be a list of improvement suggestions for Bizagi and the underlying notation.

Empirical Comparative Evaluation of Business Process Management Systems (BPMS)

Supervisor: Naved Ahmed

A BPMS is a generic software system driven by explicit process designs to enact and manage business processes. The system should be process-aware and generic in the sense that it is possible to modify the processes it supports. The process designs are often graphical and the focus is on structured processes that need to handle many cases.

In this thesis, you will take several BPMS and compare them in terms of a number of functional and non-functional requirements (incl. ease of use). The comparison will be made not only on the basis of documentation, but, more importantly, on the basis of a complete implementation of a business process using each selected BPMS. The implementations will be compared with respect to a given set of criteria derived from the initial BPMS requirements.

The evaluated BPMS may be taken from among those listed below, but we are open to your suggestions as well:

  1. Oracle BPM Suite
  2. IBM BPM
  3. Bonita BPM
  4. jBPM or Activiti

Model-driven engineering of hypermedia REST applications

Supervisor: Luciano García-Bañuelos (luciano dot garcia ät ut dot ee)

Representational state transfer (REST) is an architectural style that has become popular in the development of Web-based information systems. For the purpose of design, we view a hypermedia REST application as consisting of two aspects: 1) a structural aspect that deals with the data structure of the resources exposed by the application and a set of (CRUD) operations over these resources, and 2) a dynamic part that deals with determining which operations can be applied to a resource given its current state. We foresee that the former aspect can be captured by means of annotated class diagrams while the latter can be captured by means of state chart diagrams.

In this project, you will design and implement a set of tools that takes as input a set of class diagrams and of statechart diagrams, and generates the skeleton of a hypermedia REST application. This project requires some background knowledge in software modeling and development of web-based applications.
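
As a very small illustration of the dynamic aspect, the generated skeleton could enforce a statechart by checking which operations are allowed in the current state of a resource. The class, state and operation names below are invented for illustration.

```python
# Sketch: a resource whose allowed operations are driven by a statechart.
STATECHART = {
    # state: {operation: next_state}
    "created":   {"pay": "paid", "cancel": "cancelled"},
    "paid":      {"ship": "shipped", "cancel": "cancelled"},
    "shipped":   {},
    "cancelled": {},
}

class Order:
    def __init__(self):
        self.state = "created"

    def allowed_operations(self):
        """Hypermedia links to advertise for the current state."""
        return sorted(STATECHART[self.state])

    def apply(self, operation):
        transitions = STATECHART[self.state]
        if operation not in transitions:
            raise ValueError(f"'{operation}' not allowed in state '{self.state}'")
        self.state = transitions[operation]

order = Order()
print(order.allowed_operations())  # ['cancel', 'pay']
order.apply("pay")
print(order.state)                 # paid
```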

Automated testing of Hypermedia REST application

Supervisor: Luciano García-Bañuelos (luciano dot garcia ät ut dot ee)

Representational state transfer (REST) is an architectural style that has become popular in the development of Web-based information systems. As for any other piece of software, testing plays a major role in the development of a hypermedia REST application. In this context, we see a hypermedia REST application as consisting of two aspects: 1) a static aspect, that is the set of resources exposed by the application and the operations over them, and 2) a dynamic aspect that describes the sequence of operations that follow the normal execution of the application.

Given a set of class diagrams and state charts, we aim at generating a set of test cases that exercise the application. As a way to specify a criterion of quality, we also aim at evaluating the coverage achieved by the generated test cases.
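
A minimal sketch of this idea is shown below: enumerate operation sequences from a statechart and measure how many transitions the generated test cases exercise. The statechart is a toy example, not a real application model.

```python
# Sketch: deriving test sequences from a statechart and measuring transition coverage.
STATECHART = {
    "created": {"pay": "paid", "cancel": "cancelled"},
    "paid":    {"ship": "shipped", "cancel": "cancelled"},
}
INITIAL = "created"

def test_sequences(max_length):
    """Enumerate operation sequences of bounded length starting from INITIAL."""
    paths = [([], INITIAL)]
    for _ in range(max_length):
        next_paths = []
        for ops, state in paths:
            for op, target in STATECHART.get(state, {}).items():
                next_paths.append((ops + [op], target))
        yield from (ops for ops, _ in next_paths)
        paths = next_paths or paths

all_transitions = {(s, op) for s, ops in STATECHART.items() for op in ops}
generated = list(test_sequences(max_length=2))

covered = set()
for ops in generated:
    state = INITIAL
    for op in ops:
        covered.add((state, op))
        state = STATECHART[state][op]

print("Test cases:", generated)
print(f"Transition coverage: {len(covered)}/{len(all_transitions)}")
```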

Mining Business Process Models with Advanced Synchronization Patterns

Supervisor: Luciano García-Bañuelos (luciano dot garcia ät ut dot ee)

Automated process discovery aims at extracting process models from information captured in execution logs of information systems. Most state-of-the-art methods are designed to discover models in which the execution of an event depends on the completion of a fixed number of other events. This type of dependency is referred to as a basic synchronization pattern. In some real-world scenarios, however, this constraint is not well suited, e.g., a purchase decision could be taken even before all requested quotes are received (synchronization of "n-out-of-m" events) or whenever a deadline is reached (time-related constraints).

In this project, you will extend existing and/or design new techniques that enable the discovery of process models with advanced synchronization patterns evoked above.

Discovering Business Rules from Event Logs

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

Process mining techniques can be used to effectively discover process models from logs that capture a sample of business process executions. Cross-correlating a discovered model with information in the log can be used to improve the underlying process. However, existing process discovery techniques have two important drawbacks. The produced models tend to be large and complex, especially in flexible environments where process executions involve multiple alternatives. This "overload" of information is caused by the fact that traditional discovery techniques construct procedural models explicitly showing all possible behaviors. Moreover, existing techniques offer limited possibilities to guide the mining process towards specific properties of interest. These problems can be solved by discovering declarative models. Using a declarative model, the discovered process behavior is described as a (compact) set of business rules. Moreover, the discovery of such models can easily be guided in terms of rule templates. This work requires developing an approach to automatically discover business rules from event logs and implementing it in the process mining tool ProM.
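
To make the notion of a rule template concrete, the sketch below checks one classical declarative template, response(A, B) ("whenever A occurs, B eventually occurs afterwards"), against a set of traces. The log and activity names are invented; a discovery algorithm would instantiate such templates for all activity pairs and keep the ones that hold in (most of) the log.

```python
# Sketch: checking the "response(A, B)" rule template against an event log.
log = [
    ["register", "check", "approve", "notify"],
    ["register", "check", "reject", "notify"],
    ["register", "approve"],
]

def holds_response(trace, a, b):
    """Every occurrence of a must eventually be followed by an occurrence of b."""
    for i, event in enumerate(trace):
        if event == a and b not in trace[i + 1:]:
            return False
    return True

activities = {event for trace in log for event in trace}
for a in sorted(activities):
    for b in sorted(activities):
        if a != b and all(holds_response(trace, a, b) for trace in log):
            print(f"response({a}, {b}) holds in all traces")
```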

Business Process Deviance Mining

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

A long-standing challenge in the field of business process management is how to deal with processes that exhibit high levels of variability, such as customer lead management, product design or healthcare processes. One thing that is understood about these processes is that they require process designs and support environments that leave considerable freedom so that process workers can readily deviate from pre-established paths. At the same time, consistent management of these processes requires workers and process owners to understand the implications of their actions and decisions on the performance of the process. Deviance mining leverages information hidden in business process execution logs in order to provide guidance to stakeholders so that they can steer the process towards consistent and compliant outcomes and higher process performance. Deviance mining deals with the analysis of process execution logs off-line in order to identify typical deviant executions and to characterize deviance that leads to better or to worse performance. This technique enables evidence-based management of business processes, where process workers and analysts continuously receive guidance to achieve more consistent and compliant process outcomes and a higher performance. This work requires developing an approach to automatically discover, from an event log, discriminative rules explaining the characteristics of the traces that lead to good or bad outcomes, and implementing it in the process mining tool ProM.
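
As a rough sketch of the idea, one can encode each trace as a feature vector, label it with its outcome, and learn discriminative rules with a decision tree. Traces, activity names and outcome labels below are invented for illustration.

```python
# Sketch: mining discriminative rules from labelled traces with a decision tree.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier, export_text

traces = [
    (["register", "check", "approve", "notify"], "good"),
    (["register", "check", "check", "reject"], "bad"),
    (["register", "approve", "notify"], "good"),
    (["register", "check", "check", "check", "reject"], "bad"),
]

# Encode each trace by the number of times each activity occurs.
features = [{activity: trace.count(activity) for activity in set(trace)}
            for trace, _ in traces]
labels = [outcome for _, outcome in traces]

vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform(features)

tree = DecisionTreeClassifier(max_depth=2).fit(X, labels)
print(export_text(tree, feature_names=list(vectorizer.get_feature_names_out())))
```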

Discovering Hybrid Process Models from Event Logs

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

The declarative-procedural dichotomy is highly relevant when choosing the most suitable process modeling language to represent a discovered process. Less-structured processes with a high level of variability can be described in a more compact way using a declarative language. By contrast, procedural process modeling languages seem more suitable to describe structured and stable processes. However, in various cases, a process may incorporate parts that are better captured in a declarative fashion, while other parts are more suitable to be described procedurally. This work requires developing a technique for discovering a so-called hybrid process model from an event log. A hybrid process model is hierarchical, where each of its sub-processes may be specified in a declarative or procedural fashion. The approach is to be implemented as a plug-in of the ProM platform.

Real-Time Compliance Monitoring of Business Processes

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

For today's organizations, proving that their business processes comply with certain regulations has become a major issue. Even though the necessary compliance checks are predominantly viewed as a burden, business processes that violate regulations can not only cause damage to an organization's reputation and thus harm the business success but can also lead to severe penalties and even legal actions. Lifecycle support for process compliance comprises design-time compliance checks, (online) compliance monitoring during runtime and post-mortem compliance analysis. Compliance monitoring is considered an important building block in this lifecycle for reasons such as timely detection of non-compliance as well as provision of reactive and proactive countermeasures. In particular, compliance monitoring is related to operational decision support, which aims at extending the application of process mining techniques to on-line, running process instances, so as to detect deviations, recommend what to do next and predict what will happen in the future instance execution. This work requires developing an operational support plug-in in ProM implementing an algorithm that verifies at runtime the compliance of a process execution.

Tools for software project data collection and integration

Supervisor: Siim Karus (siim04 ät ut.ee)

Data generated in software projects is usually distributed across different systems (e.g. CVS, SVN, Git, Trac, Bugzilla, Hudson, Wiki, Twitter). These systems have different purposes and use different data models, formats and semantics. In order to analyze software projects, one needs to collect and integrate data from multiple systems. This is a time-consuming task. In this project, you will design a unified data model for representing data about software development projects extracted for the purpose of analysis. You will also develop a set of adapters for extracting data from some of the above systems and storing it into a database structured according to the unified model.
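
A minimal sketch of what the unified model and the adapter interface could look like is given below. The entity and field names are a starting-point suggestion, not a fixed design.

```python
# Sketch: a unified data model for software project data and an adapter
# interface for source-specific extractors. Entity and field names are
# illustrative only.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, List, Protocol

@dataclass
class Person:
    identifier: str            # e.g. e-mail or account name in the source system
    source: str                # "git", "bugzilla", "trac", ...

@dataclass
class ChangeSet:
    revision: str
    author: Person
    timestamp: datetime
    message: str
    changed_files: List[str] = field(default_factory=list)

@dataclass
class Issue:
    key: str
    reporter: Person
    created: datetime
    status: str
    description: str

class Adapter(Protocol):
    """Each source system (Git, SVN, Bugzilla, ...) gets its own adapter."""
    def change_sets(self) -> Iterable[ChangeSet]: ...
    def issues(self) -> Iterable[Issue]: ...
```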

GPU-accelerated data analytics

Supervisor: Siim Karus (siim04 ät ut.ee)

In this project, a set of GPU-accelerated data mining or analytics algorithms will be implemented as an extension to an analytical database solution. For this task, you will need to learn parallel processing optimisations specific to GPU programming (balancing between bandwidth and processing power), implement the analytics algorithms, and design a user interface to accompany them. As the aim is to provide an extension to analytical databases (preferably MSSQL, Oracle or PostgreSQL), you will also need to learn the extension interfaces of these databases and their native development and BI tools. Finally, you will assess the performance gains of your algorithms compared to comparable algorithms in existing analytical database tools.

GPU-accelerated Developer Feedback System

Supervisor: Siim Karus (siim04 ät ut.ee)

In this project you will implement source code analytics algorithms on the GPU and devise a reliable and fast method for integrating the analysis feedback into integrated development environments (IDEs). For this task, you will need to learn parallel processing optimisations specific to GPU programming (balancing between bandwidth and processing power), implement the analytics algorithms, and design a user interface to accompany them. As the aim is to provide an extension to IDEs (preferably Visual Studio or Eclipse), you will also need to learn the extension interfaces of these IDEs and their native development tools. Finally, you will assess the performance gains of your algorithms compared to implementations of these algorithms running on the CPU.

Replication of Empirical Software Engineering Case Study Experiments

Supervisor: Siim Karus (siim04 ät ut.ee)

The empirical software engineering community publishes many case studies validating different approaches to and analytical algorithms for software engineering. Unfortunately, these studies are rarely validated by independent replication. To make matters worse, the studies use different validation metrics, which makes them incomparable. Thus, your mission, should you choose to accept it, is to analyse different published case studies on one topic (e.g. bug detection, code churn estimation), to evaluate their replicability, and to replicate the studies in order to make them comparable. In short you will:

  1. envisage a workflow/pipeline for replicating published studies (including testing for replicability);
  2. use the workflow to replicate several studies;
  3. validate these studies and compare their results on a common scale.

Hot Deployment of Linked Data for Online Data Analytics

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The aim of this project is to design and implement a "hot" linked data deployment extension to an open source analytics server, such as RapidAnalytics. Software tools such as Weka or RapidMiner allow building analytical applications, which exploit knowledge hidden in data. However, one of the bottlenecks of such toolkits, in settings where vast quantities of data with heterogeneous data models are available, is the amount of human effort required first for unification of the data models at the stage of data pre-processing and then for extraction of relevant features for data mining. Furthermore, these steps are repeatedly executed each time a new dataset is added or an existing one is changed. However, in the case of open linked data, uniform representation of data leverages implicit handling of data model heterogeneity. Moreover, there exist open source toolkits, such as FeGeLOD [1], which automatically create data mining features from linked data. Unfortunately, the current approaches assume that a linked dataset is already pre-processed and available as a static file for which the features are created each time the file is loaded.
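
A minimal sketch in the spirit of FeGeLOD's type-based feature generation is given below, using the rdflib library. The example triples are invented; a real extension would read them from the deployed linked dataset.

```python
# Sketch: generating simple data-mining features from linked data
# (one boolean feature per observed rdf:type). The triples are invented.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.Tartu, RDF.type, EX.City))
g.add((EX.Tartu, RDF.type, EX.UniversityTown))
g.add((EX.Narva, RDF.type, EX.City))

def type_features(graph, entities):
    """One boolean feature per rdf:type observed among the given entities."""
    all_types = sorted({t for e in entities for t in graph.objects(e, RDF.type)})
    rows = []
    for e in entities:
        entity_types = set(graph.objects(e, RDF.type))
        rows.append({str(t): (t in entity_types) for t in all_types})
    return rows

for row in type_features(g, [EX.Tartu, EX.Narva]):
    print(row)
```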

In this thesis project, an extension will first be developed for discovering and loading a new dataset into an analytics server. Then existing data mining feature extraction methods will be enhanced and incorporated into the framework. Finally, the developed solution will be validated on a real-life problem.

[1] Heiko Paulheim, Johannes Fürnkranz. Unsupervised Generation of Data Mining Features from Linked Open Data. Technical Report TUD–KE–2011–2 Version 1.0, Knowledge Engineering Group, Technische Universität Darmstadt, November 4th, 2011. Available at http://www.ke.tu-darmstadt.de/bibtex/attachments/single/297 .

Semantic Interoperability Layer for Stateful Multi-Device Tizen Applications

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

This project aims at providing an interoperability layer for Tizen that allows potentially independently developed applications to interact with each other, either on the same Tizen instance or across multiple instances on distributed devices. The solution will save time in developing B2B applications for Tizen and enhance the adoption of Tizen for B2B applications on the Web, smart phones and embedded systems.

Current Tizen applications are developed in HTML5 with support for CSS and JavaScript and packaged according to the W3C Widgets 1.0 family of specifications. Communication between such applications is implemented in terms of launching application services - similarly to intents, which are used to invoke activities referring to particular applications or components in Android. Both approaches are limited by 1) low granularity of communication primitives/data structures, 2) stateless application invocation, and 3) local application execution. These shortcomings affect Tizen applications respectively as follows: a limited set of business object types that can be exchanged between applications, support only for simple B2B tasks, and no support for teamwork.

To tackle these limitations, our approach will extend the existing Tizen framework by introducing a set of Tizen application services for interoperability, state handling and inter-device communication. Thereby our extension does not require any modification to the existing Tizen framework itself. Instead, it will provide add-ons for making Tizen applications interoperable at a finer level of granularity and for incorporating inter-device capability and state handling into applications. For Tizen applications to benefit from the extension it is required that 1) they are extended with finer-grained metadata on the business objects they support, 2) their application services are bound to the metadata-enriched business objects, and 3) instead of launching object-specific application services, the interoperability application service is launched. The latter will take care of selecting the specific application services to which the business objects are forwarded. Inter-device application communication and state handling will be transparent to the existing Tizen platform.

The developed framework layer will be demonstrated on a proof-of-concept implementation of a cross-functional business performance management (BPM) application running on multiple Tizen-enabled devices in collaborative settings.

Open Cloud Infrastructure for Cost-Effective Harvesting, Processing and Linking of Unstructured Open Government Data

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Bootstrapping Open Government Data projects, even when not considering the complementary five-star initiatives for linking the data, represents a tremendous task if implemented manually in an uncoordinated way in ad hoc settings. Although the essential datasets may be publicly available for downloading, locating, documenting and preparing them for publication in a central (CKAN) repository represents a burden which is difficult to absorb by officials of public administration. Furthermore, linking the prepared and published datasets represents even further challenges, especially in the case of semistructured and full-text documents, where understanding the content is complicated by the lack of a clear structure. Namely, detection of the entities that should be linked and of the metamodels that should be used for linking is a search-intensive task even for machines, not only for humans.

Luckily, there are some open source tools for simplifying the process. A set of tools, including the ones for semi-automatic link discovery, is represented in the LOD2 Technology Stack (http://stack.lod2.eu/blog/). In addition there are general-purpose text processing frameworks such as Apache OpenNLP and for the Estonian language there is a named entity recognition solution (svn://ats.cs.ut.ee/u/semantika/ner/branches/ner1.1) available. Finally, there is NetarchiveSuite (https://sbforge.org/display/NAS/Releases+and+downloads) for Internet archival, which can be used for creating Web snapshots.

This project aims at developing a cloud platform for harvesting Open Government Data and transforming it into Linked Open Government Data. The platform consists of a Web archival subsystem, an open data repository (CKAN), a document content analysis pipeline with named entity recognition and resolution, and finally a linked data repository for serving the processed data. The Web archival subsystem will continuously monitor changes in the Web by creating monthly snapshots of the Estonian public administration Web, comparing the snapshots and detecting new datasets (or changes) together with their metadata. The datasets together with their metadata are automatically published in a CKAN repository. The CKAN repository is continuously monitored for new datasets and updates, and each change will trigger execution of the document content analysis pipeline (i.e. analysis of CSV file content). The pipeline will detect named entities in the source documents, resolve the names w.r.t. other linked datasets (e.g. addresses or organizations) and finally publish the updates in a linked data repository with an open SPARQL endpoint. The latter will provide means for consumption of Linked Open Government Data.

A Crawler for RESTful, SOAP Services and Web Forms

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The Deep Web, consisting of online databases hidden behind SOAP-based or REST-ful Web services or Web forms, is estimated to contain about 500 times more data than the (visible) Web. Despite many advances in search technology, the full potential of the Deep Web has been left largely underexploited. This is partially due to the lack of effective solutions for surfacing and visualizing the data. The Deep Web research initiative at University of Tartu's Institute of Computer Science has developed an experimental platform to surface and visualize Deep Web data sources hidden behind SOAP Web service endpoints. However, currently this experimental platform only supports a limited set of SOAP endpoints, updated on an ad hoc basis.

The aim of this project is to build a crawler and an indexing engine capable of recognizing endpoints behind Web forms, RESTful services and SOAP-based services, together with their explicit descriptions (e.g. WSDL interface descriptions, when available). Furthermore, the crawler should identify examples of queries that can be forwarded to those endpoints, especially for endpoints with no explicit interface descriptions such as Web forms.
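
A minimal sketch of the recognition step is shown below, using the requests and BeautifulSoup libraries. The start URL is a placeholder; a real crawler would also follow links, respect robots.txt, and feed its findings into the indexing engine.

```python
# Sketch: recognising candidate Deep Web entry points on a single page.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def classify_page(url):
    response = requests.get(url, timeout=10)
    text = response.text.lower()
    findings = []

    # A WSDL document usually signals a SOAP service endpoint.
    if "wsdl" in url.lower() or "<wsdl:definitions" in text or "<definitions" in text:
        findings.append(("soap", url))

    soup = BeautifulSoup(response.text, "html.parser")

    # HTML forms are candidate entry points to form-based databases.
    for form in soup.find_all("form"):
        findings.append(("web-form", urljoin(url, form.get("action", ""))))

    # Links to WSDL descriptions found on the page.
    for link in soup.find_all("a", href=True):
        if "?wsdl" in link["href"].lower() or link["href"].lower().endswith(".wsdl"):
            findings.append(("wsdl-link", urljoin(url, link["href"])))

    return findings

print(classify_page("http://example.org/services"))  # placeholder URL
```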

This project is available both for Master and for Bachelor students. The goal of the Masters project would be to build a crawler supporting endpoints with and without explicit interfaces. The goal of the Bachelor thesis will be to crawl WSDL interfaces only.

Transforming the Web into a Knowledge Base: Linking the Estonian Web

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The aim of the project is to study automated linking opportunities for Web content in the Estonian language. Recent advances in Web crawling and indexing have resulted in effective means for finding relevant content on the Web. However, getting answers to queries which require aggregation of results is still in its infancy, since a better understanding of the content is required. At the same time, there has been a fundamental shift in content linking - instead of linking Web pages, more and more Web content is tagged and annotated to facilitate linking of smaller fragments of Web pages by means of RDFa and microformat markups. Unfortunately, this technology has not been widely adopted yet and further efforts are required to advance the Web in this direction.

This project aims at providing a platform for automating this task by exploiting existing natural language technologies, such as named entity recognition for the Estonian language, in order to link the content of the entire Estonian Web. For doing this, two Master students will work closely together, first setting up the conventional crawling and indexing infrastructure for the Estonian Web and then extending the indexing mechanism with a microtagging mechanism which will enable linking the crawled Web sites. The microtagging mechanism will take advantage of existing language technologies to extract names (such as names of persons, organizations and locations) from the crawled Web pages. In order to validate the approach, a portion of the Estonian Web will be processed and exposed in RDF form through a SPARQL query interface such as the one provided by the Virtuoso OpenSource Edition.
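
A very small sketch of the microtagging step is shown below. The page URL, the extracted names and the vocabulary are invented; a real pipeline would obtain the names from the Estonian named entity recognition tool and load the triples into the RDF store behind the SPARQL endpoint.

```python
# Sketch: turning named entities extracted from a crawled page into
# linked-data triples. All URLs, names and properties are illustrative.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

SCHEMA = Namespace("http://schema.org/")
page = URIRef("http://example.ee/some-news-article")

# Hypothetical output of the NER step: (surface form, entity type).
entities = [("Tartu Ülikool", "Organization"), ("Tartu", "Place")]

g = Graph()
g.bind("schema", SCHEMA)
for name, entity_type in entities:
    entity = URIRef("http://example.ee/entity/" + name.replace(" ", "_"))
    g.add((page, SCHEMA.mentions, entity))
    g.add((entity, RDF.type, SCHEMA[entity_type]))
    g.add((entity, RDFS.label, Literal(name, lang="et")))

print(g.serialize(format="turtle"))
```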

Automated Estimation of Company Reputation

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Reputation is recognized as a fundamental instrument of social order - a commodity which is accumulated over time, hard to gain and easy to lose. In the case of organizations, reputation is also linked to their identity, performance and the way others respond to their behaviour. There is an intuition that the reputation of a company affects the perception of its value by investors, helps to attract new customers and to retain the existing ones. Therefore organizations focusing on long-term operation care about their reputation.

Several frameworks, such as WMAC (http://money.cnn.com/magazines/fortune/most-admired/, http://www.haygroup.com/Fortune/research-and-findings/fortune-rankings.aspx), used by the Fortune magazine, have been exploited to rank companies by their reputation. However, there are some serious issues associated with reputation evaluation in general. First, the existing evaluation frameworks are usually applicable to the evaluation of large companies only. Second, the costs of applying these frameworks are quite high in terms of the accumulated time of the engaged professionals. For instance, in the case of WMAC, more than 10,000 senior executives, board directors, and expert analysts were engaged to fill in questionnaires to evaluate nine performance aspects of Fortune 1000 companies in 2009. Third, the evaluation is largely based on subjective opinions rather than objective criteria, which makes continuous evaluation cumbersome and increases the length of evaluation cycles.

This thesis project aims at finding a solution to these issues. More specifically, the project is expected to answer the following research question: to what degree is the reputation of a company determined by objective criteria such as its age, financial indicators, the sentiment of news articles and comments on the Web, etc.? The more specific research questions are the following:

  1. What accuracy in reputation evaluation can be achieved by using solely objective criteria?
  2. Which objective criteria, and which combinations of them, discriminate the reputation of organizations best?
  3. To what extent does the reputation of an organization affect the reputation of another organization through people common to their management?
  4. How do temporal aspects (organization's age, related past events, etc.) bias reputation?

In order to answer these questions, network analysis and machine learning methods will be exploited and a number of experiments will be performed with a given dataset. The dataset to be used is an aggregation of data from the Estonian Business Registry, the Registry of Buildings, the Land Register, the Estonian Tax and Customs Board, the Register of Economic Activities, news articles from major Estonian newspapers and blogs, and some proprietary data sources.

Collaborative Decision-Making with Hot Deployment of Linked Data and Open API Endpoints

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The recent economic downturn has increased the pressure on organizations to focus on better decision making. In this light, Business Intelligence (BI) initiatives are used to reduce costs, run more targeted campaigns through better customer segmentation, or detect fraud, just to mention a few applications of BI. However, the major shortcoming of currently available BI tools is that they do not support the process of decision-making directly. In fact, the tools provide input to decision-making while capturing the entire decision-making process is left outside their scope. Therefore, interest in decision-making processes, models and techniques from industry and academia has been growing in the past years.

Another major shortcoming of BI is that it mostly assumes static datasets as its input; structural changes in datasets will lead to costly and redundant manual labour related to the reconstruction and validation of new reference models. By structural changes we mean changes in the overall common data model, addition/replacement of an individual dataset, and integration with an external API for data retrieval, just to mention a few.

This project will tackle the mentioned shortcomings by:

  • developing a proof-of-concept solution for hot deployment of terabyte datasets and tens of thousands of open API endpoints;
  • developing an adaptivity layer for aligning and incrementally validating BI reference models to changes in data sources;
  • developing an effective scheme of collaborative decision-making, which will facilitate better decision making with respect to past decisions as measured in terms of relevant key performance indicators (KPIs).

Estimating the Impact of Business Process Changes

Supervisors: Marlon Dumas (marlon dot dumas ät ut dot ee) and Marcello Sarini

Changes in usual work practices, especially if driven by the introduction of a BPM technology, are often not well accepted by the workers involved, frequently resulting in loss of time, money and productivity for the organization and, at worst, in a loss of confidence and satisfaction of the involved workers with respect to the organization. The aim of this project is to find novel ways of reducing the effects of such changes by making workers aware, in the preliminary phases of a change, of its consequences for their work life, especially by focusing on "core" organizational aspects such as the degree of power within the organization and how these are affected by the introduction of the BPM technology. This will help managers to identify the emergence of negative phenomena such as resistance to change more quickly and to limit their effects.

This Masters project aims at designing a platform for measuring the extent of a process's changes with regard to work distribution aspects after an IT-driven re-organization, and for making people aware of the consequences of these changes in terms of "organizational power" (or other meaningful measures) that could affect the attitude of the people involved. In particular, the platform should help identify how changes (or perceived changes) in organizational power might influence the emergence of resistance to change with respect to the technology to be used, decrease the level of users' satisfaction, reduce organizational commitment, or increase the level of negative workarounds (workplace deviance, production deviance, sabotage).

To this end, the following steps will be taken in this project:

  • to define a notation for work distribution aspects, find ways to incorporate it into different approaches and/or notations, and implement a GUI that makes it possible for users to use it;
  • to design and implement a comparison module (either manual, semi-automatic or automatic) to compare the process before and after the change, with a focus on work distribution aspects;
  • to design and implement a metric computation module to measure the extent of the change of a business process, considering both generic similarity measures and measures related to work distribution aspects (see the sketch after this list);
  • to design and implement a visualization module to convey the measures visually and make people easily aware of the effects of the changes of the business process in terms of something clearly perceived by the users affected by the change (e.g., their power within the organization).

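As a starting point for the generic similarity measures mentioned above, the sketch below compares two process variants by the Jaccard similarity of their activity and control-flow edge sets. The two example processes are invented.

```python
# Sketch: a generic similarity measure between two process model variants,
# based on the Jaccard similarity of their node and edge sets.
def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

before_nodes = {"receive order", "check stock", "approve", "ship"}
before_edges = {("receive order", "check stock"),
                ("check stock", "approve"),
                ("approve", "ship")}

after_nodes = {"receive order", "check stock", "ship"}          # "approve" removed
after_edges = {("receive order", "check stock"),
               ("check stock", "ship")}

node_similarity = jaccard(before_nodes, after_nodes)
edge_similarity = jaccard(before_edges, after_edges)
print(f"node similarity: {node_similarity:.2f}, edge similarity: {edge_similarity:.2f}")
print(f"extent of change: {1 - (node_similarity + edge_similarity) / 2:.2f}")
```
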
It is expected that the output of your Masters thesis will become a publicly available tool.

Comparative Evaluation of Process Mining Tools

Supervisor: Marlon Dumas (marlon dot dumas ät ut dot ee)

Process mining is a rapidly emerging body of methods and tools for analyzing business processes based on event logs. A number of process mining tools have emerged over the past decade. In this Masters thesis, you will conduct a comparative evaluation of some of these tools with the aim to understand their relative trade-offs and applicability for different types of use cases.

Process Mining for Lean and Six Sigma

Supervisor: Marlon Dumas (marlon dot dumas ät ut dot ee)

In a blog post, Anne Rozinat from Fluxicon claims that process mining techniques can be used in the as-is analysis phase to identify waste and improvement opportunities in the context of a Lean Six Sigma method. A similar claim has also been made by George Varvaressos and other consultants.

In this Master's thesis, you will attempt to give substance to (or to invalidate) this claim by conducting a detailed analysis of how process mining techniques fit within the landscape of Lean and Six Sigma concepts and methods. The output will be well-defined methodological guidelines for using process mining during the analysis phase of Lean Six Sigma initiatives. The proposals will be applied to a case study.

Dependability Requirements: Engineering Safe and Secure Information Systems

Supervisor: Raimundas Matulevicius (firstname dot lastname ät ut dot ee)

Dependability can be defined as the trustworthiness of an information system (IS) that allows reliance to be justifiably placed on the services it provides. Dependability was initially focussed on properties such as availability, reliability and maintainability, but it has since been enhanced to include properties such as safety, security, privacy and others. This thesis will focus in particular on two of the latter types of dependability: security - or resilience to intended threats - and safety - or resilience to unintended hazards. The aim of the project is to design a systematic method for the elicitation and validation of dependability requirements, with a focus on security and safety requirements.

More specifically, the project will focus on understanding the state of the art of the techniques and approaches that suggest aggregated means to elicit and validate dependability requirements. We will also consider the complementary strengths and weaknesses of the various dependability requirements engineering techniques to better understand how they should be combined. The thesis should propose a set of targeted rules for interplaying the dependability requirements. These rules should be applied in case studies and/or experiments to drive the development of the methods and tools and to ensure that the end results are backed empirically. Potentially, contributing to safer and more secure IS would facilitate creating trust in the social sphere, every facet of which today has become ICT-based.

The project will consist of four major steps:

  1. Performing the survey of techniques and approaches for dependability requirements;
  2. Analysing fine-grained quality of the techniques and approaches for dependability requirements;
  3. Developing the interplay between techniques and approaches for the dependability requirements;
  4. Validating the methods in the case studies and/or experiments.

Security patterns for model-driven information system security

Supervisor: Raimundas Matulevicius (firstname dot lastname ät ut dot ee)

Security requirements engineering plays an important role during software system development. However, in many cases security requirements are overlooked and considered only at the end of software development. A possible way to improve this situation is the development of systematic instruments that facilitate security requirements elicitation. A security pattern describes a particular recurring security problem that arises in a specific security context and presents a well-proven generic scheme for its solution. Following this definition, the goal of this thesis is to develop a set of security risk-oriented patterns which would be applicable within different information system models expressed using modelling languages such as Secure Tropos, misuse cases and mal-activity diagrams. The Masters project includes the following steps:

  1. Performing a literature overview on the security patterns and security modelling languages
  2. Selecting one modelling language for further analysis and determining a set of models for information systems expressed using this modelling language.
  3. Developing a set of security patterns in the selected modelling language.
  4. Validating the developed security patterns empirically.

Linking (Secured) Business Processes to (Security) Models of Information Systems

Supervisor: Raimundas Matulevicius (firstname dot lastname ät ut dot ee)

Securing information systems includes understanding how business processes could be performed in a secure way. In previous work, we have defined a method to elicit security requirements from business processes. The goal of this thesis is to develop an approach for representing these security requirements using different security modelling techniques and then systematically addressing/implementing them in the business processes. The Masters project includes the following steps:

  1. Understanding how security requirements are elicited from business processes.
  2. Defining the way to translate these security requirements and represent them in different modelling languages.
  3. Introducing systematically the security requirements to the business processes.
  4. Validating the proposal in the empirical settings.

Data Analysis Toolkit for Solid State Physics

Supervisor: Sergey Omelkov (firstname.lastname ät ut.ee)

Modern experimental setups for solid state physics have approached the limits of data acquisition speed, so that the amount of data obtained is growing faster than the scientists are able to analyze using "conventional" methods. In the case of well-established experimental methods, the problem is usually somehow solved by suppliers of equipment, who develop highly specialized, expensive software to do batch data analysis for a particular problem. However, this is impossible for state-of-the-art unique experimental stations, which are the main workhorses for high-end research.

The objective of this task will be to start an open-source project and develop a universal yet powerful tool for data analysis in solid state physics. A working proof-of-concept for such a tool has been developed and tested at the Institute of Physics; this concept can be used as a starting point. The tool will be based on a math scripting engine to handle the calculations (currently the symbiosis of SAGE and numpy), and a document-oriented database for storing the raw experimental data and calculation results (currently MongoDB).

The tools to be developed are (in the order of importance):

  • A data type suitable for storing the data and analysis results, which is serializable to the database.
  • A set of methods for data processing commonly used in spectroscopy, using the power of the underlying math scripting engine.
  • A tool to add the experimental data to the DB directly from the experimental setup software (in the form of LabVIEW VIs).
  • A graphical tool to browse the DB and quickly import the data into the scripting engine.
  • An interface to import the calculation results (mainly images) from the DB into text processors (LaTeX, LyX, MS Word), and maybe also into conventional data analysis programs like Origin.
  • The system should be a multi-user environment for data exchange and protection (by means of the database).

The main requirement for the data analysis process is that the result of any calculation stored in the DB should either bear links to the initial data and the calculation procedure, or simply be a script that produces the result.

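A minimal sketch of this provenance requirement is shown below, using pymongo. The database, collection and field names are illustrative only.

```python
# Sketch: storing a calculation result in MongoDB so that it carries links to
# the raw data and the procedure that produced it.
from datetime import datetime, timezone
from pymongo import MongoClient

db = MongoClient()["spectroscopy"]

raw_id = db.raw_data.insert_one({
    "setup": "station-1",
    "measured_at": datetime.now(timezone.utc),
    "wavelengths_nm": [400.0, 401.0, 402.0],
    "counts": [120, 135, 128],
}).inserted_id

db.results.insert_one({
    "derived_from": [raw_id],                  # link back to the initial data
    "procedure": "background_subtraction_v1",  # or an embedded script
    "parameters": {"background": 100},
    "counts_corrected": [20, 35, 28],
})
```
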
This project requires that the student is willing to understand the way physicists see the data acquisition and analysis process. Experience in Python is also highly desirable.

Visualization of traffic flow and/or people density changes with animated texture/particles.

Supervisor: Toivo Vajakas (firstname.lastname ät ut.ee)

Download the project description.


Bachelors projects

Note: The number of projects for Bachelors students is limited. But you can find several other potential project ideas by checking this year's list of proposed software projects. Some of the projects in the "Available" category could be used as Bachelors thesis topics. Also, we're open to student-proposed Bachelors thesis projects. If you have an idea for your Bachelors project and your idea falls in the area of software engineering (broadly defined), please contact the group leader: Marlon . Dumas ät ut.ee

Reverse engineering the RESTBucks API

Supervisor: Luciano García-Bañuelos (luciano dot garcia ät ut dot ee)

In recent years, multiple tools have emerged that allow one to produce interactive documentation for RESTful applications, often referred to as API blueprints. An API blueprint is basically a Web application that, on the one hand, describes the set of resources and the operations on them and, on the other hand, provides a way to test the functionality of the application.

The goal of this project is to take an open-source RESTful application, to reverse engineer its API and to specify it using two tools, namely Apiary’s Blueprint and Swagger. The project will allow you to critically compare the two tools.

Web Front-End for BIMP

Supervisor: Luciano García-Bañuelos (luciano dot garcia ät ut dot ee)

Business process simulation is a valuable tool for understanding the trade-offs when (re-)designing business processes. For instance, it provides ways of assessing the impact of changes on business processes, thus providing valuable insights to business analysts when it comes to deciding how to proceed with the implementation of changes.

A few years ago, our group developed a simulation tool called BIMP. BIMP is available as "Software as a Service" and is currently used by several universities in their Business Process Management courses. Currently, BIMP uses a form-based interface to enter the simulation information. The goal of this project is to implement a Javascript front-end application that renders the BPMN model and allows the user to pick BPMN elements to enter the simulation information. For this project, we expect the student to have knowledge of Javascript and proficiency with Java-based web application development.

Web Front-End for BPMN-Miner Tool

Supervisor: Luciano García-Bañuelos (luciano dot garcia ät ut dot ee)

Our research group has developed a Java-based tool (namely BPMN-Miner) that takes as input a business process execution log (extracted from an enterprise system) and generates a business process model captured in the standard BPMN notation (in XML format).

The goal of this project is to implement a Javascript-based web front-end that will expose the functionality of the BPMN-Miner tool online. The front-end application will allow users to upload logs, and will graphically render the resulting BPMN models on the browser. For this project, we expect the student to have knowledge of Javascript and be familiar with Java-based web application development.

Lightning-Fast Multi-Level SOAP-JSON Caching Proxy (Bachelors topic)

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

In a previous Master thesis a solution was developed for proxying SOAP requests/responses to JavaScript widgets exchanging messages with JSON payload. Although this approach was shown to be useful for surfacing Deep Web data, it suffers from some performance bottlenecks, which arise when a SOAP endpoint is frequently used.

This Bachelors thesis aims at developing a cache component, which will make dynamic creation of SOAP-JSON proxies more effective with respect to runtime latency. The resulting cache component will be evaluated from the performance point of view.
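
A minimal sketch of the core caching idea is shown below (in Python for brevity). The forward_to_soap function is a placeholder for the existing SOAP-JSON translation; the real component would live inside the existing proxy and would need a suitable eviction and invalidation policy.

```python
# Sketch: a time-limited cache keyed on the JSON request, so that repeated
# widget requests do not trigger repeated SOAP calls.
import hashlib
import json
import time

CACHE = {}          # key -> (expiry timestamp, cached JSON response)
TTL_SECONDS = 60    # illustrative time-to-live

def forward_to_soap(endpoint, payload):
    """Placeholder for the existing SOAP call + JSON translation."""
    raise NotImplementedError

def cached_call(endpoint, payload):
    key = hashlib.sha256(
        (endpoint + json.dumps(payload, sort_keys=True)).encode()).hexdigest()
    entry = CACHE.get(key)
    if entry and entry[0] > time.time():
        return entry[1]                      # cache hit: no SOAP round trip
    response = forward_to_soap(endpoint, payload)
    CACHE[key] = (time.time() + TTL_SECONDS, response)
    return response
```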

A Crawler for RESTful, SOAP Services and Web Forms

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The Deep Web, consisting of online databases hidden behind SOAP-based or REST-ful Web services or Web forms, is estimated to contain about 500 times more data than the (visible) Web. Despite many advances in search technology, the full potential of the Deep Web has been left largely underexploited. This is partially due to the lack of effective solutions for surfacing and visualizing the data. The Deep Web research initiative at University of Tartu's Institute of Computer Science has developed an experimental platform to surface and visualize Deep Web data sources hidden behind SOAP Web service endpoints. However, currently this experimental platform only supports a limited set of SOAP endpoints, updated on an ad hoc basis.

The aim of this project is to build a crawler and an indexing engine capable of recognizing endpoints behind Web forms, RESTful services and SOAP-based services, together with their explicit descriptions (e.g. WSDL interface descriptions, when available). Furthermore, the crawler should identify examples of queries that can be forwarded to those endpoints, especially for endpoints with no explicit interface descriptions such as Web forms.

This project is available both for Master and for Bachelor students. The goal of the Masters project would be to build a crawler supporting endpoints with and without explicit interfaces. The goal of the Bachelor thesis will be to crawl WSDL interfaces only.

Plugin for Discovering Business Rules from Event Logs

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

The discovery of business rules from event logs is emerging as a new challenge in the business process management field. The actual process behavior as recorded in execution traces is described as a set of business rules (expressed, e.g., using linear temporal logic) and used for process analysis. The candidate is required to implement a (Java) plug-in for a well-known process mining framework called ProM. The code for discovering business rules is already available in C, so the main task is to create a wrapper for this code and integrate it with ProM.

In this project, we will not assume that you have any prior knowledge in the field of business process management or process mining. We will give you all the input knowledge in this field that you will require to complete the project. All you need are Bachelors-level (Java) programming skills, basic knowledge of XML, and a strong desire to learn new things.

Creating a Smartphone Application to Use Human Knowledge to Improve State of the Art Artificial Neural Networks

Supervisor: Kristjan Korjus (firstname.lastname ät ut.ee)

Recent developments in artificial neural networks have led to huge improvements in image classification performance. Nevertheless, humans are still largely superior to classification algorithms in carrying out such tasks. Ideally, this knowledge would be exploited to improve state-of-the-art artificial neural network classifiers. The goal of this project is to create a database that can be used for that purpose. The task of the student is to create a smartphone application (Android, iOS) that can be used by people around the world to contribute to the goal of the project. The application should offer multiple tasks in which users can rate natural images according to different image dimensions, for example their similarity. The structure of the application should be similar to a game in order to improve the motivation of subjects and get many people to participate. This would yield a large and reliable dataset of image ratings that can be used by the experimenters, but also by image classification experts worldwide.

Critical Comparison of the Business Motivation Model (BMM) and i*

Supervisor: Raimundas Matulevicius (raimundas.matulevicius ät ut.ee)

The Business Motivation Model (BMM) is a standardized modeling language to capture important concepts about why a business is undertaking certain actions, such as developing an information system. On the other hand, i* is a modeling language to specify actors and their goals, during the early phases of information system development. There are clear overlaps between these two languages, but also some relevant differences.

The questions to be answered in this thesis are: How does BMM compare against i*? Do they address essentially the same perspective? Do they complement each other? These questions will be approached by defining a correspondence between the concepts in these languages and by applying them to a concrete case study.

Data Analysis Toolkit for Solid State Physics

Supervisor: Sergey Omelkov (firstname.lastname ät ut.ee)

Modern experimental setups for solid state physics have approached the limits of data acquisition speed, so that the amount of data obtained is growing faster than the scientists are able to analyze using "conventional" methods. In the case of well-established experimental methods, the problem is usually somehow solved by suppliers of equipment, who develop highly specialized, expensive software to do batch data analysis for a particular problem. However, this is impossible for state-of-the-art unique experimental stations, which are the main workhorses for high-end research.

The objective of this task will be to start an open-source project and develop a universal yet powerful tool for data analysis in solid state physics. A working proof-of-concept for such a tool has been developed and tested at the Institute of Physics; this concept can be used as a starting point. The tool will be based on a math scripting engine to handle the calculations (currently the symbiosis of SAGE and numpy), and a document-oriented database for storing the raw experimental data and calculation results (currently MongoDB).

The tools to be developed are (in the order of importance):

  • A data type suitable for storing the data and analysis results, which is serializable to the database.
  • A set of methods for data processing commonly used in spectroscopy, using the power of the underlying math scripting engine.
  • A tool to add the experimental data to the DB directly from the experimental setup software (in the form of LabVIEW VIs).
  • A graphical tool to browse the DB and quickly import the data into the scripting engine.
  • An interface to import the calculation results (mainly images) from the DB into text processors (LaTeX, LyX, MS Word), and maybe also into conventional data analysis programs like Origin.
  • The system should be a multi-user environment for data exchange and protection (by means of the database).

The main requirement for the data analysis process is that the result of any calculation stored in the DB should either bear links to the initial data and the calculation procedure, or simply be a script that produces the result.

This project requires that the student is willing to understand the way physicists see the data acquisition and analysis process. Experience in Python is also highly desirable.

Visualization of traffic flow and/or people density changes with animated texture/particles.

Supervisor: Toivo Vajakas (firstname.lastname ät ut.ee)

Download the project description.