Student Projects 2013/2014

Below is a list of project topics for Masters and Bachelors theses offered by the software engineering research group in 2013/2014. The projects are divided into Masters projects and Bachelors projects (see the two sections below).

If you're interested in any of these projects, please contact the corresponding supervisor.

Note that the number of projects for Bachelors students is limited. For Bachelors students we're open to student-proposed projects. So if you have an idea for your Bachelors project and your idea falls in the area of software engineering (broadly defined), please contact the group leader: Marlon Dumas (firstname dot lastname ät ut dot ee).


Masters projects

Case Study on Exploratory Testing

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Exploratory software testing (ET) is a powerful and fun approach to testing. The plainest definition of ET is that it comprises test design and test execution at the same time. This is the opposite of scripted testing (having test plans and predefined test procedures, whether manual or automated). Exploratory tests, unlike scripted tests, are not defined in advance and carried out precisely according to plan.

Testing experts like Cem Kaner and James Bach claim that - in some situations - ET can be orders of magnitude more productive than scripted testing, and a few empirical studies exist that support this claim to some degree. Nevertheless, ET is often confused with (unsystematic) ad-hoc testing and is thus not always well regarded in academia and industrial practice.

The objective of this project will be to conduct a case study in a software company investigating the following research questions:

  • To what extent is ET currently applied in the company?
  • What are the advantages/disadvantages of ET as compared to other testing approaches (i.e., scripted testing)?
  • How can the current practice of ET be improved?
  • If ET is currently not used at all, what guidance can be provided to introduce ET in the company?

This project requires that the student has (or is able to establish) access to a suitable software company to conduct the study.

Note that one such thesis project is currently ongoing in one Estonian company and thus, your target company must be a different one.

Exploring the Software Release Planning Problem with Constraint Solving

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Decision-making is central to Software Product Management (SPM) and includes deciding on requirements priorities and the content of coming releases. Several algorithms for prioritization and release planning have been proposed, where humans with or without machine support enact a series of steps to produce a decision outcome. Instead of applying some specific algorithm to find an acceptable solution to a decision problem, in this thesis we propose to model SPM decision-making as a Constraint Satisfaction Problem (CSP), where relative and absolute priorities, inter-dependencies, and other constraints are expressed as relations among variables representing entities such as feature priorities, stakeholder preferences, and resource constraints. The solution space is then explored with the help of a constraint solver without humans needing to care about specific algorithms.

The goal of this thesis project is to discuss advantages and limitations of CSP modeling in SPM and to give principal examples as a proof-of-concept of CSP modeling in requirements prioritization and release planning. If time permits, an evaluation of the CSP-based models via comparison with established tools such as ReleasePlanner will be part of the project.
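To make the CSP formulation concrete, here is a minimal sketch (in Python, with invented feature names and numbers) that encodes effort capacities per release and one precedence constraint, and explores the solution space by brute-force enumeration. In the thesis, the same model would instead be handed to a constraint solver such as JaCoP.

    from itertools import product

    features = {             # feature -> (effort, business value); invented numbers
        "login":   (3, 8),
        "reports": (5, 6),
        "export":  (2, 4),
        "search":  (4, 7),
    }
    capacity = {1: 7, 2: 6}                # effort available in release 1 and release 2
    precedes = [("login", "reports")]      # "login" must not be scheduled after "reports"

    def feasible(plan):                    # plan: feature -> release (0 = postponed)
        for r in capacity:
            if sum(features[f][0] for f in plan if plan[f] == r) > capacity[r]:
                return False
        for a, b in precedes:
            if (plan[a] or 99) > (plan[b] or 99):   # postponed counts as "latest"
                return False
        return True

    def value(plan):                       # earlier releases are worth more
        return sum(features[f][1] * {1: 2, 2: 1, 0: 0}[r] for f, r in plan.items())

    names = list(features)
    best_plan, best_value = None, -1
    for assignment in product([0, 1, 2], repeat=len(names)):
        plan = dict(zip(names, assignment))
        if feasible(plan) and value(plan) > best_value:
            best_plan, best_value = plan, value(plan)
    print(best_plan, best_value)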

The project will consist of the following steps:

  • Formulation of the release planning problem as a CSP
  • Familiarisation with JaCoP – Java Constraint Solver or an equivalent tool
  • Development of a constraint solver for the release planning problem with JaCoP
  • Application of the constraint solver to a set of open source feature models available from the SPLOT Feature Model Repository, maintained at the University of Waterloo, Canada
  • Performance evaluation of the constraint solver
  • Optional: Comparison with the performance of existing release planning tools, e.g., ReleasePlanner
  • Summary of the findings, discussion, outline of recommended follow-up research

Gamification for Software Engineering Education

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Gamification as a business practice has exploded over the past years. Organizations are applying it in areas such as marketing, human resources, productivity enhancement, sustainability, training, health and wellness, innovation, and customer engagement.

The objective of this project will be to apply gamification in the context of higher education in software engineering. To this end, the following steps will be taken in this project:

  • Selection of a suitable university course (whole course or course elements, including lab sessions). Suggested examples are courses on software testing or software engineering management. The final decision on the course selection and scope will be agreed between the student and supervisor before the start of the project.
  • Study into the nature and techniques of gamification (i.e., gamification design framework).
  • Application of gamification to the selected course.
  • Evaluation of the gamified course.

This project requires that the student either has taken the course that will be subject to gamification or that he/she accepts one of the course/lab proposals made by the supervisor.

Web-Based Single or Multi-Player Project Simulation Game

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Software development is a dynamic and complex process as there are many interacting factors throughout the life-cycle that impact the cost and schedule of the development project and the quality of the developed software product. In addition, the software industry constantly faces increasing demands for quality, productivity, and time-to-market, thus making the management of software development projects one of the most difficult and challenging tasks in any software organization.

The potential of simulation models for the training of managers has long been recognized: flight-simulator-type environments (or microworlds) confront managers with realistic situations that they may encounter in practice, and allow them to develop experience without the risks incurred in the real world.

The objective of this project is to develop a simulation-based software project management game that can be played in single-player mode or by two to four players, comprising the following elements:

  • Development of a process simulation model (based on existing work), integrated into a web-based application suitable for single or multi-player gaming sessions
  • Didactic gaming scenarios
  • Proof of concept, i.e., at least one successful game played by students with lessons learnt recorded

A Simulator for Analysing the Robustness of Software Release Plans

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

ReleasePlanner(TM), developed at the University of Calgary, Canada, is a tool that proposes optimised alternatives for allocating requirements to software releases, i.e., release plans. However, the quality (optimality) of a release plan depends highly on assumptions made about the cost and benefit of requirements as well as their dependencies. It is generally unclear to what extent small errors in the underlying assumptions impact the configurations of the proposed optimised release plans. If the impact is small (i.e., feature allocations don't change dramatically), a proposed release plan might be considered robust against such kinds of errors.

The objective of this thesis project is to develop a systematic approach for the robustness analysis of automatically generated software release plans, using existing tools for software release planning, such as the ReleasePlanner tool developed at the University of Calgary, Canada.
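As an illustration of what such a robustness analysis could look like, the following sketch perturbs the effort estimates that a plan is based on, re-plans with a simple greedy stand-in for a real planner such as ReleasePlanner, and measures how much of the feature allocation changes. All feature names and numbers are invented.

    import random

    features = {"f1": (3, 8), "f2": (5, 6), "f3": (2, 4), "f4": (4, 7)}   # effort, value
    capacity = {1: 7, 2: 6}

    def plan(feats):
        """Greedy stand-in planner: fill releases with the highest value-per-effort features first."""
        left = dict(capacity)
        assignment = {}
        for f, (eff, val) in sorted(feats.items(), key=lambda kv: -kv[1][1] / kv[1][0]):
            for r in sorted(left):
                if eff <= left[r]:
                    assignment[f] = r
                    left[r] -= eff
                    break
            else:
                assignment[f] = 0            # postponed
        return assignment

    baseline = plan(features)
    changes = []
    for _ in range(1000):
        noisy = {f: (eff * random.uniform(0.8, 1.2), val) for f, (eff, val) in features.items()}
        perturbed = plan(noisy)
        changes.append(sum(perturbed[f] != baseline[f] for f in features) / len(features))

    print("baseline plan:", baseline)
    print("average share of reallocated features under +/-20% effort error:",
          sum(changes) / len(changes))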

This project could be conducted in collaboration with the University of Calgary and might offer the opportunity to visit Prof. Ruhe's Software Engineering Decision Support Laboratory (SEDSL) in Calgary, Canada.

Development and Application of a Process Simulator Family for Analysing Software Development Processes

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Process simulation is a common practice in many engineering disciplines. The advantage of process simulation is that the capability of existing (and future) process designs can be evaluated without executing the actual process. However, in software development, process simulation is not a common practice for analysing (and improving) the processes according to which software is developed. There are many reasons for this situation. Two important ones are the difficulty of calibrating process models with real-world data (due to the lack of such data) and the lack of stability of the processes used. Nevertheless, process simulation could be a useful research tool to analyse the capability of process paradigms (waterfall, iterative, incremental, various types of agile and lean processes, etc.) under varying assumptions about the application context (e.g., type of product, available resources, quality goals, project size).

The aims of this thesis project are the following:

  • Development of a family of process simulators representing process paradigms commonly used in industrial software development projects.
  • Systematic evaluation of process paradigms in various contexts using the developed process simulators.
  • Discussion of the advantages/disadvantages of process paradigms, based on the analysis of the evaluation results.

The project will consist of the following steps:

  • Selection of a process modeling tool (choices are to be determined; one possibility is to use a business process modeling tool which has simulation capability)
  • Selection of process paradigms to be analysed
  • Definition of contexts and application scenarios (to evaluate process paradigms)
  • Development of process simulators
  • Application of process simulators to evaluate process paradigms in various contexts

Note: This thesis topic can be worked on by several students. The task can be split with regards to the choice of the modeling tools and/or the choice of process paradigms. For students interested in a BSc thesis, this topic can be tailored to fit into the reduced time frame.
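To illustrate the kind of analysis the simulator family should support, the following toy Monte Carlo sketch pushes the same workload through a waterfall-like and an iterative process model and compares the mean project duration. All parameters are invented; a real simulator would be built with a dedicated modelling tool as described in the steps above.

    import random

    WORK = 100.0          # total development effort (person-days)
    DEFECT_RATE = 0.05    # defects injected per person-day of development
    REWORK_EARLY = 0.5    # person-days to fix a defect found soon after injection
    REWORK_LATE = 2.0     # person-days to fix a defect found only at the end

    def waterfall():
        defects = sum(1 for _ in range(int(WORK)) if random.random() < DEFECT_RATE)
        return WORK + defects * REWORK_LATE              # all defects found in final test

    def iterative(iterations=5):
        total, chunk = 0.0, WORK / iterations
        for _ in range(iterations):
            defects = sum(1 for _ in range(int(chunk)) if random.random() < DEFECT_RATE)
            total += chunk + defects * REWORK_EARLY      # defects found within the iteration
        return total

    def simulate(process, runs=10000):
        return sum(process() for _ in range(runs)) / runs

    print("waterfall, mean duration:", simulate(waterfall))
    print("iterative, mean duration:", simulate(iterative))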

Discovering BPMN Process Models

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

Process mining is a family of techniques to discover business process models and other knowledge about business processes from event logs. Currently available techniques for automated discovery of process models, such as those provided by the ProM framework, are able to discover Petri nets or other types of net-based models, while most contemporary tools for business process modeling and analysis rely on the standard BPMN notation.

In this project you will design and implement a technique to automatically discover process models represented in the BPMN notation. You will start from existing automated process discovery techniques and design transformations to convert their output into flat BPMN models. You will then design a number of enhancements to this technique in order to produce BPMN models that are as readable as possible.
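As a small illustration of such a transformation (not the algorithm to be developed), the sketch below applies one simple rule to a toy Petri net: labelled transitions become BPMN tasks, and places with more than one outgoing transition become exclusive (XOR) gateways.

    petri_net = {
        "places": ["p0", "p1", "p2"],
        "transitions": ["Register", "Approve", "Reject"],
        "arcs": [("p0", "Register"), ("Register", "p1"),
                 ("p1", "Approve"), ("p1", "Reject"),
                 ("Approve", "p2"), ("Reject", "p2")],
    }

    def to_flat_bpmn(net):
        nodes = [("task", t) for t in net["transitions"]]
        edges = []
        for p in net["places"]:
            succ = [dst for (src, dst) in net["arcs"] if src == p]
            pred = [src for (src, dst) in net["arcs"] if dst == p]
            if len(succ) > 1:                        # decision point -> XOR gateway
                nodes.append(("xor_gateway", p))
                edges += [(t, p) for t in pred] + [(p, t) for t in succ]
            else:                                    # plain place -> direct sequence flows
                edges += [(t, s) for t in pred for s in succ]
        return nodes, edges

    nodes, edges = to_flat_bpmn(petri_net)
    print(nodes)
    print(edges)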

It is expected that the output of your Masters thesis will become a tool made publicly available on a software-as-a-service basis, in a similar style as the BIMP simulator.

Mining Business Process Models with Exception Handlers

Supervisor: Luciano García-Bañuelos (luciano dot garcia ät ut dot ee)

Process mining is a family of techniques to discover business process models and other knowledge about business processes from event logs. Existing process mining techniques are able to automatically discover large and complex process models from event logs. However, oftentimes the models produced by these techniques are difficult to understand. One reason why such models are difficult to understand is that they only capture "local" control-flow relations such as sequential flow, conditional splits and parallel splits. For example, these techniques are not able to produce models with exception handlers, which in some situations make models easier to understand because they clearly separate normal behavior from exceptional or "secondary" behavior.

In this project you will design and implement new techniques to automatically discover process models with exception handlers. The models to be produced will be captured in the BPMN notation and will include both interrupting and non-interrupting boundary events, which are the constructs available in BPMN for capturing exceptional and secondary behavior. This project requires some background knowledge in BPM (for example having completed the BPM course).

Mining Business Process Models with Multi-Instance Activities

Supervisor: Luciano García-Bañuelos (luciano dot garcia ät ut dot ee)

Process mining is a family of techniques to discover business process models and other knowledge about business processes from event logs. Existing process mining techniques are able to automatically discover large and complex process models from event logs. However, oftentimes the models produced by these techniques are difficult to understand. One reason why such models are difficult to understand is that they only capture "local" control-flow relations such as sequential flow, conditional splits and parallel splits. For example, these techniques misinterpret the case where multiple instances of a given activity are performed in parallel and then synchronized upon completion. This happens for example in a business process where raw materials need to be obtained from multiple suppliers in order to assemble a product.

In this project you will design and implement new techniques to automatically discover process models with so-called multi-instance activities as well as synchronization constraints attached to these multi-instance activities. This project requires some background knowledge in BPM (for example having completed the BPM course).

Efficient Genetic Process Mining

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

Process mining is a family of techniques to discover business process models and other knowledge about business processes from event logs. Process mining techniques need to extract process models that strike a tradeoff between accuracy (how well they reflect the behavior in the log) and understandability. A family of techniques that allows such a tradeoff to be struck effectively is that of genetic process mining algorithms. However, genetic process mining techniques are not efficient. One of the bottlenecks of these algorithms is the evaluation of the fitness of each individual in a population of process models. This is due to the fact that the fitness is evaluated from scratch for each process model, even if some models are very similar to each other. The outcome of the thesis will be an improved algorithm for genetic mining. This algorithm will improve the performance of the original one by reducing the time needed for fitness evaluation, exploiting the similarity between the individuals of a population of process models.
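The sketch below illustrates only the caching idea, on a toy encoding of process models as sets of direct-follows relations: fitness values are memoised by a canonical representation of each individual, so the event log is replayed only for models that have not been evaluated before. The actual encoding, genetic operators and fitness function of the thesis would of course be much richer.

    import random

    LOG = [("a", "b", "c"), ("a", "c", "b"), ("a", "b", "b", "c")]   # toy event log

    def fitness(model, _cache={}):
        key = frozenset(model)                  # canonical form of the individual
        if key not in _cache:                   # replay the log only for unseen models
            fitting = sum(all(pair in model for pair in zip(trace, trace[1:]))
                          for trace in LOG)
            _cache[key] = fitting / len(LOG)
        return _cache[key]

    def mutate(model):
        events = sorted({e for trace in LOG for e in trace})
        pair = (random.choice(events), random.choice(events))
        return frozenset(set(model) ^ {pair})   # toggle one direct-follows relation

    population = [frozenset({("a", "b"), ("b", "c")}) for _ in range(20)]
    for generation in range(50):
        population.sort(key=fitness, reverse=True)
        survivors = population[:10]
        population = survivors + [mutate(m) for m in survivors]

    best = max(population, key=fitness)
    print("best model:", sorted(best), "fitness:", fitness(best))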

Data-Aware Declarative Process Mining on GPUs

Supervisor: Fabrizio Maggi (f.m.maggi ät ut dot ee)

Declarative process models offer various advantages relative to imperative ones, such as the ability to represent a process model concisely and at different levels of detail. This advantage can be exploited, for example, in the context of automated discovery of process models from event logs. Instead of discovering models that tell us everything that can happen in the process, we can discover models that explain the most relevant constraints and rules of the process.

Existing techniques for automated discovery of declarative process models are rather inefficient when the data perspective of a business process is taken into consideration in addition to the control-flow perspective. A solution for improving the performance of existing algorithms for data-aware declarative process discovery is to adapt them for distributed processing and to run them on Graphics Processing Units (GPUs). The algorithm will be implemented with CUDA libraries. This thesis will be conducted in collaboration with a team at the University of Padova (Italy). There may be a possibility of obtaining a travel grant to visit the research team in Padova during the Masters project.
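To illustrate why this problem maps well onto GPUs, the sketch below checks a Declare-style response(A, B) constraint ("every A is eventually followed by B") independently per trace of a toy log, using a CPU process pool; the thesis would port the same data-parallel structure to CUDA kernels and add the data perspective.

    from multiprocessing import Pool

    LOG = [
        ["create", "check", "approve", "ship"],
        ["create", "check", "reject"],
        ["create", "approve"],
    ]

    def response_holds(args):
        a, b, trace = args
        # every occurrence of a must be followed, later in the trace, by some b
        return all(b in trace[i + 1:] for i, e in enumerate(trace) if e == a)

    if __name__ == "__main__":
        candidates = [("create", "check"), ("check", "approve"), ("approve", "ship")]
        tasks = [(a, b, trace) for (a, b) in candidates for trace in LOG]
        with Pool() as pool:                    # data-parallel evaluation of all pairs
            results = pool.map(response_holds, tasks)
        for i, (a, b) in enumerate(candidates):
            support = sum(results[i * len(LOG):(i + 1) * len(LOG)]) / len(LOG)
            print(f"response({a}, {b}): support {support:.2f}")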

Analyzing Security Requirements via Business Process Execution Logs

Supervisor: Raimundas Matulevicius (firstname dot lastname ät ut dot ee)

Business process modelling is an activity to represent and organise enterprise working processes so that they can be analysed and improved. Assuming that the business analyst concentrates on improving business performance, security analysis could help discover alternatives that do not offer sufficient security levels. The execution of business processes is in many cases captured in process logs. The major goal of this thesis is to analyse the information collected in these logs and to develop a method (and technique) to capture security concerns, requirements and potential security controls. A potential approach is to identify a set of patterns that could help define security concerns from the business logs.
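As an example of the kind of pattern such a method could look for, the sketch below checks a segregation-of-duties requirement over a toy execution log and flags cases in which the same person performed two activities that should be done by different people. The log format and activity names are invented.

    log = [  # (case id, activity, performer)
        ("c1", "Submit claim",  "alice"),
        ("c1", "Approve claim", "bob"),
        ("c2", "Submit claim",  "carol"),
        ("c2", "Approve claim", "carol"),   # violation: same person submits and approves
    ]

    def segregation_violations(log, activity_a, activity_b):
        performers = {}
        for case, activity, user in log:
            performers.setdefault(case, {})[activity] = user
        return [case for case, acts in performers.items()
                if acts.get(activity_a) and acts.get(activity_a) == acts.get(activity_b)]

    print(segregation_violations(log, "Submit claim", "Approve claim"))   # -> ['c2']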

This Masters project will consist of the following steps:

  • Performing a survey of existing literature on mining of business process logs to analyze security requirements;
  • Developing a method to identify and assess potential security concerns from business process logs given a set of security requirements;
  • Validating the developed proposals via case studies using available business process logs.

Dependability Requirements: Engineering Safe and Secure Information Systems

Supervisor: Raimundas Matulevicius (firstname dot lastname ät ut dot ee)

Dependability can be defined as the trustworthiness of an information system (IS) that allows reliance to be justifiably placed on the services it provides. Dependability initially focused on properties such as availability, reliability and maintainability, but has since been extended to include properties such as safety, security, privacy and others. This thesis will focus primarily on two of the latter types of dependability: security - or resilience to intended threats - and safety - or resilience to unintended hazards. The aim of the project is to design a systematic method for the elicitation and validation of dependability requirements, with a focus on security and safety requirements.

More specifically, the project will focus on understanding the state of the art of the techniques and approaches that suggest aggregated means to elicit and validate dependability requirements. We will also consider the complementary strengths and weaknesses of the various dependability requirements engineering techniques to better understand how they should be combined. The thesis should propose a set of targeted rules for the interplay of these techniques. These rules should be applied in case studies and/or experiments to drive the development of methods and tools and to ensure that the end results are backed empirically. Potentially, contributing to safer and more secure IS would facilitate creating trust in the social sphere, every facet of which has today become ICT-based.

The project will consist of four major steps:

  • Performing a survey of techniques and approaches for dependability requirements;
  • Analysing fine-grained quality of the techniques and approaches for dependability requirements;
  • Developing the interplay between techniques and approaches for the dependability requirements;
  • Validating the methods in the case studies and/or experiments.

Hot Deployment of Linked Data for Online Data Analytics

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The aim of this project is to design and implement a "hot" linked data deployment extension for an open source analytics server, such as RapidAnalytics. Software tools such as Weka or RapidMiner allow building analytical applications that exploit knowledge hidden in data. However, one of the bottlenecks of such toolkits, in settings where vast quantities of data with heterogeneous data models are available, is the amount of human effort required first to unify the data models during data pre-processing and then to extract relevant features for data mining. Furthermore, these steps are repeated each time a new dataset is added or an existing one is changed. In the case of open linked data, however, the uniform representation of data allows data model heterogeneity to be handled implicitly. Moreover, there exist open source toolkits, such as FeGeLOD [1], which automatically create data mining features from linked data. Unfortunately, current approaches assume that a linked dataset is already pre-processed and available as a static file, for which the features are created each time the file is loaded.

In this thesis project, an extension will first be developed for discovering and loading new datasets into an analytics server. Then existing data mining feature extraction methods will be enhanced and incorporated into the framework. Finally, the developed solution will be validated on a real-life problem.
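The following stripped-down sketch illustrates FeGeLOD-style feature generation on a few invented triples: for each entity, binary data-mining features are derived from its rdf:type statements. A real implementation would read RDF with a library such as rdflib and support further feature generators.

    triples = [
        ("ex:Tartu",   "rdf:type",    "dbo:City"),
        ("ex:Tartu",   "dbo:country", "ex:Estonia"),
        ("ex:Estonia", "rdf:type",    "dbo:Country"),
        ("ex:Emajogi", "rdf:type",    "dbo:River"),
        ("ex:Emajogi", "dbo:city",    "ex:Tartu"),
    ]

    def type_features(triples):
        entities = sorted({s for s, _, _ in triples})
        types = sorted({o for _, p, o in triples if p == "rdf:type"})
        table = []
        for e in entities:
            e_types = {o for s, p, o in triples if s == e and p == "rdf:type"}
            table.append([e] + [1 if t in e_types else 0 for t in types])
        return ["entity"] + types, table

    header, rows = type_features(triples)
    print(header)
    for row in rows:
        print(row)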

[1] Heiko Paulheim, Johannes Fürnkranz. Unsupervised Generation of Data Mining Features from Linked Open Data. Technical Report TUD–KE–2011–2 Version 1.0, Knowledge Engineering Group, Technische Universität Darmstadt, November 4th, 2011. Available at http://www.ke.tu-darmstadt.de/bibtex/attachments/single/297 .

Two-Staged Crowdsourcing Tool for Linking Data and Evaluating Linked Data Quality

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Linked data and crowdsourcing have emerged in recent years as mechanisms to enable effective and economical large-scale data collection, filtering, aggregation, and presentation on the Web. Recent initiatives such as Civil War Data 150 have had some success at combining linked data methods with crowdsourcing. However, this and other initiatives suffer from a major drawback, namely a lack of quality assurance (QA) during the crowd-sourcing process. At the same time QA has been successfully handled in reCAPTCHA, which is used to crowdsource transcription of text from digitized books in the Internet Archive.

The goal of this Masters thesis is to develop a crowdsourcing tool that will support the creation and quality-checking of links between data objects originating from a variety of sources. The main contribution will be an implementation of a two-staged method, allowing first the creation of links between datasets at the metadata level, and second, the evaluation of the created links by means of simple closed-ended questions. The question-answering process will be facilitated through a specific widget, which can be easily placed in any Web site, similarly to reCAPTCHA. As a starting point, DBpedia will be used for experimentation.

Data Source Selection Strategies for Deep Web Surfacing

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Deep Web search aims at surfacing data that is not available to mainstream Web crawlers from databases and other data sources to the visible Web for further use. During the past years, several components of a Deep Web search engine have been developed at ATI, covering different aspects of Deep Web search. These components include a SOAP-JSON proxy for retrieving data from Web services, a data visualization engine, and a SOAP cache solution for faster data retrieval. A simple prototype capable of handling a case consisting of several services has been designed and implemented. However, this solution lacks performance and scalability, which stems partly from an inefficient data source selection scheme. The aim of this project is to design a strategy for effective and efficient selection of Web services during a search session. The proposed strategy will be evaluated on a large collection of Web services providing hundreds of thousands of operations for data retrieval.

A Crawler for RESTful, SOAP Services and Web Forms

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The Deep Web, consisting of online databases hidden behind SOAP-based or RESTful Web services or Web forms, is estimated to contain about 500 times more data than the (visible) Web. Despite many advances in search technology, the full potential of the Deep Web has been left largely underexploited. This is partially due to the lack of effective solutions for surfacing and visualizing the data. The Deep Web research initiative at the University of Tartu's Institute of Computer Science has developed an experimental platform to surface and visualize Deep Web data sources hidden behind SOAP Web service endpoints. However, currently this experimental platform only supports a limited set of SOAP endpoints, updated on an ad hoc basis.

The aim of this project is to build a crawler and an indexing engine capable of recognizing endpoints behind Web forms, RESTful services and SOAP-based services, together with their explicit descriptions (e.g. WSDL interface descriptions, when available). Furthermore, the crawler should identify examples of queries that can be forwarded to those endpoints, especially for endpoints with no explicit interface descriptions such as Web forms.
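As a starting point, the sketch below crawls pages from a placeholder seed URL, records links that look like WSDL interface descriptions, and notes pages containing HTML forms; a real crawler would add politeness rules, large-scale deduplication, and analysis of forms and REST endpoints.

    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkAndFormParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links, self.has_form = [], False

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "a" and attrs.get("href"):
                self.links.append(attrs["href"])
            if tag == "form":
                self.has_form = True

    def crawl(seed, limit=20):
        queue, seen, wsdl, forms = [seed], set(), [], []
        while queue and len(seen) < limit:
            url = queue.pop(0)
            if url in seen:
                continue
            seen.add(url)
            try:
                html = urlopen(url, timeout=5).read().decode("utf-8", errors="ignore")
            except OSError:
                continue                               # unreachable page, skip it
            parser = LinkAndFormParser()
            parser.feed(html)
            if parser.has_form:
                forms.append(url)
            for link in parser.links:
                absolute = urljoin(url, link)
                if absolute.lower().endswith("?wsdl") or absolute.lower().endswith(".wsdl"):
                    wsdl.append(absolute)              # candidate SOAP interface description
                elif absolute.startswith("http"):
                    queue.append(absolute)
        return wsdl, forms

    print(crawl("http://example.com/"))                # placeholder seed URL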

This project is available both for Master and for Bachelor students. The goal of the Masters project would be to build a crawler supporting endpoints with and without explicit interfaces. The goal of the Bachelor thesis will be to crawl WSDL interfaces only.

Transforming the Web into a Knowledge Base: Linking the Estonian Web

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The aim of the project is to study automated linking opportunities for Web content in the Estonian language. Recent advances in Web crawling and indexing have resulted in effective means for finding relevant content on the Web. However, getting answers to queries that require aggregation of results is still in its infancy, since a better understanding of the content is required. At the same time, there has been a fundamental shift in content linking - instead of linking Web pages, more and more Web content is tagged and annotated to facilitate linking of smaller fragments of Web pages by means of RDFa and microformat markup. Unfortunately this technology has not been widely adopted yet and further efforts are required to advance the Web in this direction.

This project aims at providing a platform for automating this task by exploiting existing natural language technologies, such as named entity recognition for the Estonian language, in order to link the content of the entire Estonian Web. To do this, two Masters students will work closely together, first setting up the conventional crawling and indexing infrastructure for the Estonian Web and then extending the indexing mechanism with a microtagging mechanism, which will enable linking the crawled Web sites. The microtagging mechanism will take advantage of existing language technologies to extract names (such as names of persons, organizations and locations) from the crawled Web pages. In order to validate the approach, a portion of the Estonian Web will be processed and exposed in RDF form through a SPARQL query interface such as the one provided by the Virtuoso OpenSource Edition.

Automated Estimation of Company Reputation

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Reputation is recognized as a fundamental instrument of social order - a commodity which is accumulated over time, is hard to gain and easy to lose. In the case of organizations, reputation is also linked to their identity, performance and the way others respond to their behaviour. There is an intuition that the reputation of a company affects how investors perceive its value, helps to attract new customers and helps to retain existing ones. Therefore organizations focused on long-term operation care about their reputation.

Several frameworks, such as WMAC (http://money.cnn.com/magazines/fortune/most-admired/, http://www.haygroup.com/Fortune/research-and-findings/fortune-rankings.aspx), used by Fortune magazine, have been exploited to rank companies by their reputation. However, there are some serious issues associated with reputation evaluation in general. First, the existing evaluation frameworks are usually applicable to the evaluation of large companies only. Second, the costs of applying these frameworks are quite high in terms of the accumulated time of the engaged professionals. For instance, in the case of WMAC, more than 10,000 senior executives, board directors, and expert analysts were engaged to fill in questionnaires evaluating nine performance aspects of Fortune 1000 companies in 2009. Third, the evaluation is largely based on subjective opinions rather than objective criteria, which makes continuous evaluation cumbersome and increases the length of evaluation cycles.

This thesis project aims at finding a solution to these issues. More specifically, the project is expected to answer the following research question: to what degree is the reputation of a company determined by objective criteria such as its age, financial indicators, and the sentiment of news articles and comments on the Web? The more specific research questions are the following:

  1. What accuracy in reputation evaluation can be achieved by using solely objective criteria?
  2. Which objective criteria, and which combinations of them, best discriminate the reputation of organizations?
  3. To what extent does the reputation of an organization affect the reputation of another organization through people common to their management?
  4. How do temporal aspects (the organization's age, related past events, etc.) bias reputation?

In order to answer these questions, network analysis and machine learning methods will be exploited and a number of experiments will be performed with a given dataset. The dataset to be used is an aggregation of data from the Estonian Business Registry, Registry of Buildings, Land Register, Estonian Tax and Customs Board, Register of Economic Activities, news articles from major Estonian newspapers and blogs, and some proprietary data sources.
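As a sketch of the first research question, the snippet below trains a classifier on a toy feature matrix of objective criteria (company age, an equity ratio, mean news sentiment) against invented reputation labels and reports cross-validated accuracy; the real study would use the registry and media data described above.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X = np.array([            # [age in years, equity ratio, mean news sentiment]; toy data
        [25, 0.60,  0.4], [3, 0.10, -0.2], [12, 0.40, 0.1], [40, 0.70,  0.5],
        [2,  0.05, -0.4], [8, 0.30,  0.0], [18, 0.50, 0.3], [1,  0.20, -0.1],
    ])
    y = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # 1 = "good reputation" in some external ranking

    model = LogisticRegression()
    scores = cross_val_score(model, X, y, cv=4)   # accuracy from objective criteria only
    print("cross-validated accuracy:", scores.mean())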

Analyzing the Evolution of Formal Networks of Companies and Their Board Members for Bankruptcy Prediction

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

There are certain symptoms that characterize companies that are likely to face bankruptcy in the near future. One of them, in addition to various financial ratios, is a high turnover in the management. Furthermore, in case of fraudulent or strategic bankruptcies there is often a closed group of people who will be brought to the management to take over the responsibilities of the previous management.

The aim of this thesis is to design a set of heuristics able to estimate the likelihood that a company will go bankrupt in the near future. For doing this, first, the evolution of business networks around affected companies will be analysed. Based on the identified evolution patterns, suitable heuristics will be developed and, finally, validated on real-life datasets. The input data for this project will be provided by Inforegister.
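One example of the kind of heuristic to be designed and validated is sketched below: flag a company whose management turnover since a given date exceeds a threshold. The event format and the threshold are invented; the real input data would come from Inforegister.

    from datetime import date

    def turnover_rate(board_events, board_size, since):
        """Share of board positions replaced since the given date."""
        changes = sum(1 for kind, when in board_events
                      if kind == "member_replaced" and when >= since)
        return changes / board_size

    events = [("member_replaced", date(2013, 3, 1)),
              ("member_replaced", date(2013, 6, 10)),
              ("member_added",    date(2012, 1, 5))]

    rate = turnover_rate(events, board_size=3, since=date(2012, 10, 1))
    print("turnover rate:", rate,
          "-> high bankruptcy risk" if rate > 0.5 else "-> no signal")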

Tools for software project data collection and integration

Supervisor: Siim Karus (siim04 ät ut.ee)

Data generated in software projects is usually distributed across different systems (e.g. CVS, SVN, Git, Trac, Bugzilla, Hudson, Wiki, Twitter). These systems have different purposes and use different data models, formats and semantics. In order to analyze software projects, one needs to collect and integrate data from multiple systems. This is a time-consuming task. In this project, you will design a unified data model for representing data about software development projects extracted for the purpose of analysis. You will also develop a set of adapters for extracting data from some of the above systems and storing it into a database structured according to the unified model.
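A possible shape for the unified data model and the adapter interface is sketched below (all type and field names are illustrative only): each source system gets one adapter that emits the same source-agnostic record types, which can then be stored in the analysis database.

    from dataclasses import dataclass
    from datetime import datetime
    from typing import Iterable, Protocol

    @dataclass
    class Change:                 # a commit/revision in any version control system
        id: str
        author: str
        timestamp: datetime
        message: str
        files: list

    @dataclass
    class Issue:                  # a ticket in any issue tracker
        id: str
        reporter: str
        created: datetime
        status: str
        linked_changes: list

    class Adapter(Protocol):
        """Every source system gets one adapter that emits unified records."""
        def changes(self) -> Iterable[Change]: ...
        def issues(self) -> Iterable[Issue]: ...

    class InMemoryGitAdapter:
        """Stand-in for a real 'git log' parser, showing the adapter shape."""
        def __init__(self, raw_commits):
            self.raw = raw_commits
        def changes(self):
            for c in self.raw:
                yield Change(c["sha"], c["author"], c["date"], c["msg"], c["files"])
        def issues(self):
            return []

    adapter = InMemoryGitAdapter([{"sha": "abc123", "author": "alice",
                                   "date": datetime(2013, 11, 1), "msg": "fix bug",
                                   "files": ["a.py"]}])
    print(list(adapter.changes()))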

GPU-accelerated data analytics

Supervisor: Siim Karus (siim04 ät ut.ee)

In this project a set of GPU-accelerated data mining or analytics algorithms will be implemented as an extension to an analytical database solution. For this task, you will need to learn parallel processing optimisations specific to GPU programming (balancing between bandwidth and processing power), implement the analytics algorithms, and design a user interface to accompany them. As the aim is to provide an extension to analytical databases (preferably MSSQL, Oracle or PostgreSQL), you will also need to learn the extension interfaces of these databases and their native development and BI tools. Finally, you will assess the performance gains of your algorithms compared with comparable algorithms in existing analytical database tools.

Code clone detection using wavelets

Supervisor: Siim Karus (siim04 ät ut.ee)

Code clones have been identified as "bad smells" in software development, often leading to increased maintenance costs and increased code complexity. Thus, identification of such clones is a required step of code quality assurance. Wavelet analysis has been found to be extremely useful for clone detection in image processing and financial market analysis. Wavelets have the benefit of allowing comparisons that span different scales and strengths. Wavelet analysis also benefits a lot from parallelisation, which has become more affordable thanks to advances in GPU computing and cloud computing. Thus, it makes sense to evaluate wavelet analysis for solving problems in software engineering as well.

In this project you will evaluate the usefulness of wavelets for code clone detection. You will accomplish that by first designing a way to encode source code as multidimensional numeric series and then running a wavelet-based clone detection algorithm on these series. Finally, you will need to compare the performance of your solution against alternative solutions.
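Under simplifying assumptions, the pipeline could look like the sketch below: each code fragment is encoded as a numeric series (here simply tokens per line), a Haar wavelet transform turns the series into multi-scale coefficients, and fragments with close coefficient vectors are reported as clone candidates.

    import math

    def encode(fragment):
        series = [len(line.split()) for line in fragment.splitlines() if line.strip()]
        size = 1 << max(1, (len(series) - 1).bit_length())    # pad to a power of two
        return series + [0] * (size - len(series))

    def haar(series):
        coeffs, data = [], list(series)
        while len(data) > 1:
            avgs = [(data[i] + data[i + 1]) / 2 for i in range(0, len(data), 2)]
            diffs = [(data[i] - data[i + 1]) / 2 for i in range(0, len(data), 2)]
            coeffs = diffs + coeffs
            data = avgs
        return data + coeffs                                  # multi-scale coefficients

    def distance(a, b):
        n = max(len(a), len(b))
        a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    frag1 = "for i in range(10):\n    total += i\nprint(total)"
    frag2 = "for j in range(10):\n    s += j\nprint(s)"
    frag3 = "class Parser:\n    def __init__(self, text, options, flags):\n        pass"

    d12 = distance(haar(encode(frag1)), haar(encode(frag2)))
    d13 = distance(haar(encode(frag1)), haar(encode(frag3)))
    print("clone candidate" if d12 < d13 else "no signal", d12, d13)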

End-to-End Automated Validation of HPLC Analytical Procedures

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

High-performance liquid chromatography is a technique used in analytical chemistry to separate compounds out of a given mixture.

Whenever an HPLC analysis is performed on a given substance, the results provided by the HPLC equipment need to be validated in order to ensure that they are reliable with respect to the purpose of the analysis. To this end, the laboratory personnel needs to gather a significant amount of data and analyze it using various procedures.

In a previous Master's thesis, a system was developed that allows a developer to define Web forms and report generators for validation of HPLC procedures. Also, one guideline (out of about 10 possible guidelines) was implemented from start to end, including all the required forms and reports.

The aim of this new Master's project is to extend this system with the ability to automate the transfer of data from HPLC equipment to the Web-based system for HPLC validation (CVG) and to automate validation steps that currently require manual intervention, specifically those related to the identification of spikes in histograms. The goal is to have a workflow for validation that is as automated as possible.

The project will also address the problem of reusing data collected during a validation using a given guideline, in order to perform a validation using a different guideline, so as to minimize the effort required to perform validations using multiple guidelines.

This Master's project is part of a broader project on automation of HPLC validation involving UT's Institute of Computer Science and Institute of Chemistry. Funding is available to provide remuneration to Master's students who contribute to this project.

Data Analysis Toolkit for Solid State Physics

Supervisor: Sergey Omelkov (firstname.lastname ät ut.ee)

Modern experimental setups for solid state physics have approached the limits of data acquisition speed, so that the amount of data obtained is growing faster than the scientists are able to analyze using "conventional" methods. In the case of well-established experimental methods, the problem is usually somehow solved by equipment suppliers, who develop highly specialized, expensive software to do batch data analysis for a particular problem. However, this is impossible for state-of-the-art unique experimental stations, which are the main workhorses of high-end research.

The objective of this task will be to start an open-source project and develop a universal yet powerful tool for data analysis in solid state physics. A working proof-of-concept for such a tool has been developed and tested at the Institute of Physics; this concept can be used as a starting point. The tool will be based on a math scripting engine to handle the calculations (currently a symbiosis of SAGE and numpy), and a document-oriented database for storing the raw experimental data and calculation results (currently MongoDB).

The tools to be developed are (in the order of importance):

  • A data type suitable for storing the data and analysis results, serializable to the database.
  • A set of methods for data processing commonly used in spectroscopy, using the power of the underlying math scripting engine.
  • A tool to add experimental data to the DB directly from the experimental setup software (in the form of LabVIEW VIs).
  • A graphical tool to browse the DB and quickly import data into the scripting engine.
  • An interface to import calculation results (mainly images) from the DB into text processors (LaTeX, LyX, MS Word), and possibly also into conventional data analysis programs, like Origin.
  • The system should provide a multiuser environment for data exchange and protection (by means of the database).

The main requirement for the data analysis process is that the result of any calculation stored in the DB should either carry links to the initial data and the calculation procedure, or simply be a script that produces the result.
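As an illustration of how this provenance requirement could be met with a document database, the sketch below stores a raw measurement and a derived result in MongoDB, with the result carrying a reference to the raw data and the procedure that produced it. Collection and field names are assumptions, and a MongoDB instance is expected at the default local port.

    from datetime import datetime
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["physics_lab"]   # assumed database name

    raw_id = db.raw_data.insert_one({
        "setup": "station A",
        "measured_at": datetime(2013, 11, 20, 14, 30),
        "spectrum": [0.1, 0.4, 0.9, 0.5, 0.2],         # toy data points
    }).inserted_id

    db.results.insert_one({
        "derived_from": raw_id,                        # link back to the initial data
        "procedure": "result = normalise(spectrum)",   # the script that produced it
        "values": [0.11, 0.44, 1.0, 0.55, 0.22],
    })

    result = db.results.find_one({"derived_from": raw_id})
    print(result["procedure"],
          db.raw_data.find_one({"_id": result["derived_from"]})["setup"])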

This project requires that the student is willing to understand the way physicists see the data acquisition and analysis process. Experience in Python is also highly desirable.

Advanced Business Process Model Simulator

Supervisor: Marlon Dumas (firstname.lastname ät ut.ee)

In a previous Masters thesis, Madis Abel developed a fast simulator for business process models captured in the BPMN notation. The resulting simulator has been subsequently made publicly available (see the BIMP online simulator) and is used for teaching and other applications by hundreds of users worldwide.

The BIMP simulator however adopts a relatively simple approach for specifying and running simulations. In particular, it does not provide any confidence intervals to evaluate the reliability of the simulation results, and it does not provide any means for validating the simulation parameters against real execution logs. Techniques for dealing with these limitations are outlined in the following paper among others.

In this project you will extend the BIMP simulator with advanced simulation and validation features. The outcome will be an enhanced version of the BIMP simulator. The project will require strong Java programming skills and some knowledge of business process modeling and analysis.
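As an example of one missing feature, the sketch below computes a confidence interval for the mean cycle time from several independent simulation replications using the Python standard library and a normal approximation; the replication results are invented.

    from statistics import NormalDist, mean, stdev

    def confidence_interval(samples, confidence=0.95):
        m, s, n = mean(samples), stdev(samples), len(samples)
        z = NormalDist().inv_cdf((1 + confidence) / 2)   # e.g. about 1.96 for 95%
        half_width = z * s / n ** 0.5
        return m - half_width, m + half_width

    cycle_times = [41.2, 39.8, 43.5, 40.1, 42.7, 38.9, 44.0, 40.6]   # hours, per replication
    low, high = confidence_interval(cycle_times)
    print(f"mean cycle time is in [{low:.1f}, {high:.1f}] hours with 95% confidence")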

Bachelors projects

Lightning-Fast Multi-Level SOAP-JSON Caching Proxy (Bachelors topic)

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

In a previous Masters thesis, a solution was developed for proxying SOAP requests/responses to JavaScript widgets that exchange messages with JSON payloads. Although this approach was shown to be useful for surfacing Deep Web data, it suffers from some performance bottlenecks, which arise when a SOAP endpoint is frequently used.

This Bachelors thesis aims at developing a cache component, which will make dynamic creation of SOAP-JSON proxies more effective with respect to runtime latency. The resulting cache component will be evaluated from the performance point of view.
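A minimal sketch of the caching idea is shown below: responses are kept in memory, keyed by a hash of the endpoint and request body, and expire after a fixed time-to-live. The real component would also need multi-level storage, size limits and invalidation rules.

    import hashlib
    import time

    class ResponseCache:
        def __init__(self, ttl_seconds=60):
            self.ttl = ttl_seconds
            self.store = {}          # key -> (expiry time, cached response)

        def _key(self, endpoint, request_body):
            return hashlib.sha1((endpoint + request_body).encode("utf-8")).hexdigest()

        def get_or_fetch(self, endpoint, request_body, fetch):
            key = self._key(endpoint, request_body)
            hit = self.store.get(key)
            if hit and hit[0] > time.time():
                return hit[1]                          # fresh cache hit, no SOAP call
            response = fetch(endpoint, request_body)   # cache miss: call the real proxy
            self.store[key] = (time.time() + self.ttl, response)
            return response

    cache = ResponseCache(ttl_seconds=30)
    slow_soap_call = lambda ep, body: {"echo": body}   # stand-in for the actual SOAP round trip
    print(cache.get_or_fetch("http://example.com/ws", '{"q": "tartu"}', slow_soap_call))
    print(cache.get_or_fetch("http://example.com/ws", '{"q": "tartu"}', slow_soap_call))  # served from cache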

Complex Event Processing for OpenAjax Hub 2.0 Widgets

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Web widgets provide a mechanism for exposing Web application components, which can be used to compose new Web applications. Currently the majority of widgets available over the Web are rather static (typical examples are clock widgets, weather widgets, stock ticker widgets, etc.) and do not facilitate interaction with other widgets, due to constraints originating mainly from Web browsers. OpenAjax Hub 2.0 is a framework for facilitating inter-widget communication such that widgets running in the same user agent (i.e. Web browser) listen to the events thrown by one another. The main disadvantage of OpenAjax Hub is that it assumes that the event message structures used to implement inter-widget communication between independent widgets are known already at design time.

The aim of this project is to transfer the main concepts of complex event processing to the OpenAjax Hub platform such that rules for run-time processing of events can be defined. These run-time event processing rules will then be the main means to implement Web application logic while keeping the implementations of independent widgets untouched. To enable this vision, in this Bachelors project you will design and implement a rule execution engine on top of OpenAjax Hub, which will support the description of application logic at a high level of granularity.
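The toy sketch below illustrates the rule idea (in Python rather than the JavaScript an OpenAjax Hub implementation would use): widgets publish events to a hub, and a rule matches a pattern over recent events and publishes a new, derived event that other widgets can subscribe to. Topic names and the rule itself are invented.

    class Hub:
        def __init__(self):
            self.subscribers, self.history = {}, []

        def subscribe(self, topic, callback):
            self.subscribers.setdefault(topic, []).append(callback)

        def publish(self, topic, payload):
            self.history.append((topic, payload))
            for cb in self.subscribers.get(topic, []):
                cb(topic, payload)

    class Rule:
        """If a 'map.select' and a 'calendar.select' event share the same city, emit a derived event."""
        def __init__(self, hub):
            self.hub = hub
            hub.subscribe("map.select", self.check)
            hub.subscribe("calendar.select", self.check)

        def check(self, topic, payload):
            topics = {t for t, p in self.hub.history if p.get("city") == payload.get("city")}
            if {"map.select", "calendar.select"} <= topics:
                self.hub.publish("trip.suggest", {"city": payload["city"]})

    hub = Hub()
    Rule(hub)
    hub.subscribe("trip.suggest", lambda t, p: print("derived event:", t, p))
    hub.publish("map.select", {"city": "Tartu"})
    hub.publish("calendar.select", {"city": "Tartu"})   # triggers trip.suggest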

A Crawler for RESTful, SOAP Services and Web Forms

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The Deep Web, consisting of online databases hidden behind SOAP-based or RESTful Web services or Web forms, is estimated to contain about 500 times more data than the (visible) Web. Despite many advances in search technology, the full potential of the Deep Web has been left largely underexploited. This is partially due to the lack of effective solutions for surfacing and visualizing the data. The Deep Web research initiative at the University of Tartu's Institute of Computer Science has developed an experimental platform to surface and visualize Deep Web data sources hidden behind SOAP Web service endpoints. However, currently this experimental platform only supports a limited set of SOAP endpoints, updated on an ad hoc basis.

The aim of this project is to build a crawler and an indexing engine capable of recognizing endpoints behind Web forms, RESTful services and SOAP-based services, together with their explicit descriptions (e.g. WSDL interface descriptions, when available). Furthermore, the crawler should identify examples of queries that can be forwarded to those endpoints, especially for endpoints with no explicit interface descriptions such as Web forms.

This project is available both for Master and for Bachelor students. The goal of the Masters project would be to build a crawler supporting endpoints with and without explicit interfaces. The goal of the Bachelor thesis will be to crawl WSDL interfaces only.

Web Forms and Reports for Validation of HPLC Analytical Procedures

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

High-performance liquid chromatography is a technique used in analytical chemistry to separate compounds out of a given mixture.

Whenever an HPLC analysis is performed on a given substance, the results provided by the HPLC equipment need to be validated in order to ensure that they are reliable with respect to the purpose of the analysis. To this end, the laboratory personnel needs to gather a significant amount of data and analyze it using various procedures.

In a previous Master's thesis, a system was developed that allows a developer to define Web forms and report generators for validation of HPLC procedures. In this thesis, one guideline (out of about 10 possible guidelines) was implemented from start to end, including all the required forms and reports.

The aim of this Bachelor's thesis is to implement additional guidelines, using the forms generator and the reports generator developed in the previous Master's project.

This project will be undertaken in collaboration with UT's Institute of Chemistry. Supervision and access to documentation and domain experts will be facilitated by the Institute of Chemistry.

Web-Based Single-Player Project Simulation Game

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Software development is a dynamic and complex process as there are many interacting factors throughout the life-cycle that impact the cost and schedule of the development project and the quality of the developed software product. In addition, the software industry constantly faces increasing demands for quality, productivity, and time-to-market, thus making the management of software development projects one of the most difficult and challenging tasks in any software organization.

The potential of simulation models for the training of managers has long been recognized: flight-simulator-type environments (or microworlds) confront managers with realistic situations that they may encounter in practice, and allow them to develop experience without the risks incurred in the real world.

The objective of this project is to develop a simulation-based software project management game for a single player, comprising the following elements:

  • Development of a process simulation model (based on existing work), integrated into a web-based application suitable for single-player gaming sessions
  • Didactic gaming scenarios
  • Proof of concept, i.e., at least one successful game played by students with lessons learnt recorded

Development and Application of a Process Simulator Family for Analysing Software Development Processes

Supervisor: Dietmar Pfahl (firstname dot lastname at ut dot ee)

Process simulation is a common practice in many engineering disciplines. The advantage of process simulation is that the capability of existing (and future) process designs can be evaluated without executing the actual process. However, in software development, process simulation is not a common practice for analysing (and improving) the processes according to which software is developed. There are many reasons for this situation. Two important ones are the difficulty of calibrating process models with real-world data (due to the lack of such data) and the lack of stability of the processes used. Nevertheless, process simulation could be a useful research tool to analyse the capability of process paradigms (waterfall, iterative, incremental, various types of agile and lean processes, etc.) under varying assumptions about the application context (e.g., type of product, available resources, quality goals, project size).

The aims of this thesis project are the following:

  • Development of a family of process simulators representing process paradigms commonly used in industrial software development projects.
  • Systematic evaluation of process paradigms in various contexts using the developed process simulators.
  • Discussion of the advantages/disadvantages of process paradigms, based on the analysis of the evaluation results.

The project will consist of the following steps:

  • Selection of a process modeling tool (choices are to be determined; one possibility is to use a business process modeling tool which has simulation capability)
  • Selection of process paradigms to be analysed
  • Definition of contexts and application scenarios (to evaluate process paradigms)
  • Development of process simulators
  • Application of process simulators to evaluate process paradigms in various contexts

Note: This thesis topic can be worked on by several students. The task can be split with regards to the choice of the modeling tools and/or the choice of process paradigms. For students interested in a BSc thesis, this topic can be tailored to fit into the reduced time frame.

Data Analysis Toolkit for Solid State Physics

Supervisor: Sergey Omelkov (firstname.lastname ät ut.ee)

Modern experimental setups for solid state physics have approached the limits of data acquisition speed, so that the amount of data obtained is growing faster than the scientists are able to analyze using "conventional" methods. In the case of well-established experimental methods, the problem is usually somehow solved by equipment suppliers, who develop highly specialized, expensive software to do batch data analysis for a particular problem. However, this is impossible for state-of-the-art unique experimental stations, which are the main workhorses of high-end research.

The objective of this task will be to start an open-source project and develop a universal yet powerful tool for data analysis in solid state physics. A working proof-of-concept for such a tool has been developed and tested at the Institute of Physics; this concept can be used as a starting point. The tool will be based on a math scripting engine to handle the calculations (currently a symbiosis of SAGE and numpy), and a document-oriented database for storing the raw experimental data and calculation results (currently MongoDB).

The tools to be developed are (in the order of importance):

  • A data type suitable for storing the data and analysis results, serializable to the database.
  • A set of methods for data processing commonly used in spectroscopy, using the power of the underlying math scripting engine.
  • A tool to add experimental data to the DB directly from the experimental setup software (in the form of LabVIEW VIs).
  • A graphical tool to browse the DB and quickly import data into the scripting engine.
  • An interface to import calculation results (mainly images) from the DB into text processors (LaTeX, LyX, MS Word), and possibly also into conventional data analysis programs, like Origin.
  • The system should provide a multiuser environment for data exchange and protection (by means of the database).

The main requirement for the data analysis process is that the result of any calculation stored in the DB should either carry links to the initial data and the calculation procedure, or simply be a script that produces the result.

This project requires that the student is willing to understand the way physicists see the data acquisition and analysis process. Experience in Python is also highly desirable.

Web tool for typography

Supervisor: Tiit Paabo (firstname.lastname@aara.ee)

Project Description