Student Projects 2011/2012

Below is a list of project topics for Masters and Bachelors theses offered by the software engineering research group in 2011-2012. The projects are divided into Masters projects and Bachelors projects.

If you're interested in any of these projects, please contact the corresponding supervisor.

Note that the number of projects for Bachelors students is limited. For Bachelors students, we are open to student-proposed projects. So if you have an idea for your Bachelors project and your idea falls in the area of software engineering (broadly defined), please contact the group leader: Marlon Dumas (marlon.dumas ät ut.ee)


Masters projects

Tools for software project data collection and integration

Supervisor: Siim Karus (siim04 ät ut.ee)

Data generated in software projects is usually distributed across different systems (e.g. CVS, SVN, Git, Trac, Bugzilla, Hudson, Wiki, Twitter). These systems have different purposes and use different data models, formats and semantics. In order to analyze software projects, one needs to collect and integrate data from multiple systems. This is a time-consuming task. In this project, you will design a unified data model for representing data about software development projects extracted for the purpose of analysis. You will also develop a set of adapters for extracting data from some of the above systems and storing it into a database structured according to the unified model.
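
To make the idea concrete, here is a minimal sketch of what a unified model plus adapter could look like. The ChangeEvent record, the table layout and the GitAdapter are illustrative assumptions; the actual model and adapter set are what the thesis would design.

    # A minimal sketch of a unified event model with one adapter (Git).
    import sqlite3
    import subprocess
    from dataclasses import dataclass

    @dataclass
    class ChangeEvent:
        project: str
        source: str      # e.g. "git", "svn", "bugzilla"
        author: str
        timestamp: str   # ISO 8601
        summary: str

    class GitAdapter:
        """Extracts commits from a local Git repository via `git log`."""
        def __init__(self, repo_path: str, project: str):
            self.repo_path = repo_path
            self.project = project

        def events(self):
            out = subprocess.run(
                ["git", "log", "--pretty=format:%an%x09%aI%x09%s"],
                cwd=self.repo_path, capture_output=True, text=True, check=True
            ).stdout
            for line in out.splitlines():
                author, ts, summary = line.split("\t", 2)
                yield ChangeEvent(self.project, "git", author, ts, summary)

    def store(events, db_path="projects.db"):
        """Persist events into a database structured after the unified model."""
        con = sqlite3.connect(db_path)
        con.execute("""CREATE TABLE IF NOT EXISTS change_event
                       (project TEXT, source TEXT, author TEXT,
                        timestamp TEXT, summary TEXT)""")
        con.executemany("INSERT INTO change_event VALUES (?,?,?,?,?)",
                        [(e.project, e.source, e.author, e.timestamp, e.summary)
                         for e in events])
        con.commit()

Adapters for SVN, Trac, Bugzilla and the other systems would implement the same events() interface, so that all project data lands in one analyzable store.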

Application of business intelligence tools and techniques in software development

Supervisor: Siim Karus (siim04 ät ut.ee)

Business intelligence (BI) tools are used by companies to analyse business operations, financial and market data, and to generate reports in order to support decision making. BI tools have been successfully applied in domains ranging from retail to logistics and customer relations.

In this project, you will apply BI tools and techniques to data extracted from software source code repositories, in order to devise new ways to explore the source code at different levels of detail and from different perspectives. Applications of this work could be to identify areas in the source code where bugs might be concentrated, or areas of source code that are likely to cause maintenance problems in future releases.
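
As a flavour of such analyses, the sketch below ranks the files of a Git repository by change frequency, a simple proxy for maintenance hotspots; a real solution would feed measures like this into a BI tool's cubes and dashboards.

    # A minimal sketch, assuming a local Git checkout: rank files by how
    # often they change, a rough indicator of likely problem areas.
    import subprocess
    from collections import Counter

    def change_frequency(repo_path: str) -> Counter:
        out = subprocess.run(
            ["git", "log", "--name-only", "--pretty=format:"],
            cwd=repo_path, capture_output=True, text=True, check=True
        ).stdout
        # Non-empty lines are file paths touched by some commit.
        return Counter(line for line in out.splitlines() if line)

    if __name__ == "__main__":
        for path, n in change_frequency(".").most_common(10):
            print(f"{n:5d}  {path}")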

Two-staged crowdsourcing tool for linking data and evaluating linked data quality

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Linked data and crowdsourcing have emerged in recent years as mechanisms to enable effective and economical large-scale data collection, filtering, aggregation, and presentation on the Web. Recent initiatives such as Civil War Data 150 have had some success at combining linked data methods with crowdsourcing. However, this and other initiatives suffer from a major drawback, namely a lack of quality assurance (QA) during the crowdsourcing process. At the same time, QA has been successfully handled in reCAPTCHA, which is used to crowdsource the transcription of text from digitized books in the Internet Archive.

The goal of this Masters thesis is to develop a crowdsourcing tool that supports the creation and quality-checking of links between data objects originating from a variety of sources. The main contribution will be an implementation of a two-staged method, allowing first the creation of links between datasets at the metadata level, and second, the evaluation of the created links by means of simple closed-ended questions. The question-answering process will be facilitated through a specific widget, which can easily be embedded in any Web site, similarly to reCAPTCHA. As a starting point, the following two datasets will be used for experimentation: the U.S. SEC data and DBpedia.
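
A toy sketch of the two-staged idea, using hypothetical names and invented identifiers: stage one records a candidate link between two data objects, stage two phrases it as a closed-ended question and aggregates crowd answers into a confidence score.

    # Hypothetical data model for the two-staged crowdsourcing flow.
    from dataclasses import dataclass, field

    @dataclass
    class CandidateLink:
        source_uri: str            # e.g. a U.S. SEC record
        target_uri: str            # e.g. a DBpedia resource
        votes: list = field(default_factory=list)  # True = "same entity"

        def question(self) -> str:
            # Stage 2: the closed-ended question shown by the widget.
            return (f"Do {self.source_uri} and {self.target_uri} "
                    f"refer to the same entity? (yes/no)")

        def confidence(self) -> float:
            return sum(self.votes) / len(self.votes) if self.votes else 0.0

    link = CandidateLink("sec:0000320193", "dbpedia:Apple_Inc.")
    link.votes += [True, True, False]
    print(link.question(), "->", round(link.confidence(), 2))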

A crawler for RESTful services, SOAP services and Web forms

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The Deep Web, consisting of online databases hidden behind SOAP-based or RESTful Web services or Web forms, is estimated to contain about 500 times more data than the (visible) Web. Despite many advances in search technology, the full potential of the Deep Web remains largely unexploited. This is partially due to the lack of effective solutions for surfacing and visualizing the data. The Deep Web research initiative at the University of Tartu's Institute of Computer Science has developed an experimental platform to surface and visualize Deep Web data sources hidden behind SOAP Web service endpoints. However, this experimental platform currently supports only a limited set of SOAP endpoints, updated on an ad hoc basis.

The aim of this project is to build a crawler and an indexing engine capable of recognizing endpoints behind Web forms, RESTful services and SOAP-based services, together with their explicit descriptions (e.g. WSDL interface descriptions, when available). Furthermore, the crawler should identify examples of queries that can be forwarded to those endpoints, especially for endpoints with no explicit interface descriptions such as Web forms.
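
The sketch below, using only the Python standard library, illustrates the recognition step on a single page: it collects links that look like WSDL descriptions and the targets of HTML forms. A real crawler would add frontier management, heuristics for RESTful endpoints, and indexing.

    # A minimal single-page endpoint finder (illustrative, not the thesis tool).
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class EndpointFinder(HTMLParser):
        def __init__(self, base_url: str):
            super().__init__()
            self.base = base_url
            self.wsdl_urls, self.form_targets = [], []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "a" and "href" in attrs:
                href = urljoin(self.base, attrs["href"])
                if href.lower().endswith((".wsdl", "?wsdl")):
                    self.wsdl_urls.append(href)      # explicit description
            elif tag == "form":
                # A form target is an endpoint with no explicit description.
                self.form_targets.append(urljoin(self.base, attrs.get("action", "")))

    def scan(url: str) -> EndpointFinder:
        finder = EndpointFinder(url)
        finder.feed(urlopen(url).read().decode("utf-8", errors="replace"))
        return finder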

This project is available both for Masters and Bachelors students. The goal of the Masters project is to build a crawler supporting endpoints both with and without explicit interface descriptions, while the goal of the Bachelors thesis is to crawl WSDL interfaces only.

Automated generation of microsites from collections of widgets

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The aim of this thesis is to design and implement a tool that automatically generates visually appealing microsites in HTML5/JavaScript from a given set of visual and non-visual OpenAjax Hub widgets. In the context of this thesis, non-visual widgets are for data provisioning, while visual ones are for data visualization. The OpenAjax Hub widgets targeted in this project are assumed to have their messages described with JSON Schema and mapped into a unified domain model, which implicitly provides schema mappings. These schema mappings will be used to dynamically bind components together using a simple matching scheme.
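
A toy illustration of the matching scheme, with invented widget descriptors: a non-visual (data-provisioning) widget is bound to every visual widget that consumes the same domain-model concept.

    # Hypothetical widget descriptors; "dm:" names stand for domain-model concepts.
    producers = [  # non-visual widgets and the concept they publish
        {"widget": "company-search", "publishes": "dm:Company"},
    ]
    consumers = [  # visual widgets and the concept they consume
        {"widget": "company-table", "consumes": "dm:Company"},
        {"widget": "map-view", "consumes": "dm:Location"},
    ]

    # Bind producers to consumers whose concepts match.
    bindings = [(p["widget"], c["widget"])
                for p in producers for c in consumers
                if p["publishes"] == c["consumes"]]
    print(bindings)  # [('company-search', 'company-table')]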

The main contribution of this Masters thesis will be first to analyze the layout schemes of major microsites and mashups, and then to propose, implement and validate an algorithm for automating layout construction. The validation will be carried out in the form of a usability study, during which the widgets of a Deep Web search engine will be used as non-visual widgets. The thesis could build on a previous Masters thesis in which DynaForm, a tool for producing Web frontends from XML schemas, was developed.

Data and service virtualization layer for exposing Web forms as SOAP services

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Data virtualization is the process of abstracting, transforming, federating and delivering data contained within a variety of information sources, so that these data may be accessed by an application or end-users on-demand, without regard to the physical storage or heterogeneous structure of the data.

The aim of this thesis is to set up, configure and extend an existing data virtualization system to expose Web forms as SOAP-based Web services with appropriate WSDL interface descriptions. The results of this thesis will be used within an existing Deep Web search engine for surfacing resources behind Web forms and RESTful services. The proposed solution will be evaluated mainly in terms of performance.
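
As a minimal sketch of the wrapping idea, the function below invokes a Web form as if it were a service operation by posting the operation's parameters to the form's action URL. WSDL generation and response mapping, which the thesis would address, are out of scope here.

    # Illustrative only: bridge a service-style call to an HTTP form POST.
    from urllib.parse import urlencode
    from urllib.request import urlopen

    def call_form_as_operation(form_url: str, params: dict) -> str:
        """Invoke a Web form as if it were a service operation."""
        data = urlencode(params).encode("utf-8")
        with urlopen(form_url, data=data) as resp:  # providing data makes it a POST
            return resp.read().decode("utf-8", errors="replace")

A data virtualization layer would sit on top of such bridges, publishing each wrapped form as a WSDL-described operation.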

An alternative version of this thesis is available for exposing RESTful services as SOAP services.

Impact of employee migration on Web service portfolios

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Movement of key employees from one organization to another affects the way organizations operate and the services that they provide. The starting hypothesis of this thesis is that this impact can be estimated with a suitable model. Since no such model currently exists, this thesis aims to shed some light on the topic by constructing an initial model from publicly available employee migration data and a Web services network.

In this project, a dataset representing a service network containing approximately 30k Web services will be used for experimentation, together with data crawled from social networks (e.g. LinkedIn). The evolution of the services network and the employee network will be analyzed with respect to two aspects of employee migration: 1) how the composition of a service provider's portfolio affects employee migration; and 2) how individuals moving from one organization to another affect the evolution of service portfolios in organizations. The result will be a model for estimating the impact of employee migration on a service portfolio.
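
One possible ingredient of such a model, sketched with invented data: the similarity of two organizations' service portfolios (Jaccard similarity over offered service categories), which could then be correlated with observed employee moves between them.

    # Toy data; a real study would derive portfolios from the 30k-service network.
    def jaccard(a: set, b: set) -> float:
        return len(a & b) / len(a | b) if a | b else 0.0

    portfolios = {
        "org_a": {"payments", "identity", "geocoding"},
        "org_b": {"payments", "messaging"},
    }
    moves = {("org_a", "org_b"): 4}  # employees who moved from org_a to org_b

    for (src, dst), n in moves.items():
        print(src, "->", dst, "moves:", n, "portfolio similarity:",
              round(jaccard(portfolios[src], portfolios[dst]), 2))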

Improving Business Process and Security Modelling Languages for Security Risk Management

Supervisor: Raimundas Matulevicius (raimundas.matulevicius [at] ut.ee)

A number of business process modelling languages (e.g., BPMN, EPC) and security modelling languages (e.g., Secure Tropos, Misuse Cases, KAOS security extensions, Mal-activity diagrams, and SecureUML) have been analysed with respect to the domain model of information systems (IS) security risk management. The major observation is that none of these languages fully supports security risk management. The main purpose of this thesis will be to improve a selected modelling language so that it fully supports security risk management during IS development.

In this project, you will start by gaining an understanding of different (security) modelling approaches and risk management methods. You will then select a specific language and you will engineer improvements to it, both at the level of concrete and abstract syntax and at the level of semantics. The suggested improvements should be tested in a proof of concept prototype or using empirical/industrial cases.

Transformation Between Security Modelling Languages

Supervisor: Raimundas Matulevicius (raimundas.matulevicius [at] ut.ee)

A number of security modelling languages (e.g., Secure Tropos, Misuse Cases, KAOS security extensions, Mal-activity diagrams, and SecureUML) have been analysed with respect to the domain model of information systems security risk management (ISSRM). Hence, the ISSRM model can be used as the grounding ontology for developing transformations from security models created in one language to models in another language. Such transformations should help developers understand IS security from different modelling perspectives and understand system security needs at different levels of abstraction.

In this project, you will investigate approaches for language transformation in the security modeling domain. Then you will scope and select a few specific modelling approaches, and develop a set of transformation guidelines between the selected security modelling approaches for the purpose of security risk management. The transformation guidelines will be validated by means of a proof of concept prototype or using empirical/industrial cases.

Model transformation between Role-based Access Control Perspectives

Supervisor: Raimundas Matulevicius (raimundas.matulevicius [at] ut.ee)

Role-based access control (RBAC) is a security mechanism that ensures secured data is accessed only by people who have permission to access it. RBAC models can be developed using different modelling approaches, such as SecureUML, UMLsec, and others. However, it has been observed that each of these approaches addresses only one particular modelling viewpoint. The purpose of this Masters thesis will be to investigate whether it is possible to combine different modelling approaches for RBAC.
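
For readers new to RBAC, here is a minimal sketch of the mechanism itself (users are assigned roles, roles are granted permissions), with invented users and permissions; the thesis concerns how such policies are modelled, not this runtime check.

    # Users -> roles, roles -> (object, action) permissions.
    user_roles = {"alice": {"doctor"}, "bob": {"clerk"}}
    role_perms = {
        "doctor": {("patient_record", "read"), ("patient_record", "write")},
        "clerk": {("patient_record", "read")},
    }

    def allowed(user: str, obj: str, action: str) -> bool:
        # A user may act if any of their roles grants the permission.
        return any((obj, action) in role_perms.get(r, set())
                   for r in user_roles.get(user, set()))

    assert allowed("alice", "patient_record", "write")
    assert not allowed("bob", "patient_record", "write")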

In this project, you will conduct a review of the state of the art in RBAC modelling. Based on the conclusions of this analysis, you will develop and validate an approach that facilitates the combined use of different security modelling languages for RBAC, addressing various perspectives.

Greening the Wikipedia Infrastructure

Supervisor: Michele Mazzucco (michele.mazzucco ät ut.ee)

Do you know how many servers are used to run the Wikipedia infrastructure? Do you know how much electric power these servers consume? Do you know how much of this electric power is really needed in order to run the infrastructure versus how much of it goes into warming up idle servers (and then cooling down those same servers)? Can you imagine how much your future employers or customers will love you when you tell them that you can help them run their IT infrastructure with a fraction of the electric power they currently pay for, and with the same level of service?

In this project, you will deploy the Wikipedia infrastructure on a cluster or on the cloud, and you will use real logs to test several power-aware resource allocation policies. These experiments will allow you to quantify the savings that can be obtained by using on-demand server allocation as opposed to a traditional static allocation policy, which is naturally prone to over-provisioning.
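
To illustrate the comparison, the toy sketch below contrasts static peak provisioning with on-demand allocation on an invented load trace; the power figures, server capacity and trace are all assumptions, not measurements.

    # Toy comparison of static vs. on-demand server allocation.
    P_BUSY, P_IDLE = 250.0, 150.0   # watts per server (assumed)
    CAPACITY = 100.0                # requests/s one server handles (assumed)
    load = [120, 300, 80, 500, 60]  # invented request-rate trace

    def energy(trace, servers_for):
        total = 0.0
        for rate in trace:
            n = servers_for(rate)
            busy = min(n, -(-rate // CAPACITY))  # servers actually serving
            total += busy * P_BUSY + (n - busy) * P_IDLE
        return total

    static = energy(load, lambda rate: 6)                          # peak-sized
    on_demand = energy(load, lambda rate: int(-(-rate // CAPACITY)))
    print(f"static: {static:.0f}, on-demand: {on_demand:.0f} (power units)")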

Sound interesting? Drop me an e-mail (michele.mazzucco ät ut.ee) and let's have a good chat about it.

Check also the following article to see what lies behind the curtain.

Online Quantitative Analysis Tool for Business Process Models

Supervisor: Luciano García-Bañuelos (luciano.garcia ät ut.ee)

Apromore is an open-source Web-based tool suite for storing, versioning, organizing and analyzing business process models. In its current version, Apromore supports search queries on collections of process models and other operations such as merging two models into a single one or navigating through multiple versions of a model.

In this project you will extend Apromore with features for analyzing process models in terms of quantitative parameters such as mean execution time, cost and reliability. The tool would allow a user to pick a specific process model, annotate it with quantitative data, and analyze the overall properties of the annotated process model.

We have developed various components for the quantitative analysis of process models. In this project, you will adapt, extend and integrate these components, fitting them seamlessly into Apromore's architecture.
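
As a hint of the kind of computation involved, the sketch below computes the expected execution time of a block-structured model fragment: a sequence of tasks and an exclusive (XOR) choice weighted by branch probabilities. The numbers are invented.

    # Expected execution time of simple block-structured process fragments.
    def seq_time(times):                 # tasks executed one after another
        return sum(times)

    def xor_time(branches):              # [(probability, time), ...], probs sum to 1
        return sum(p * t for p, t in branches)

    # Task A (2h), then either B (1h, 70%) or C (5h, 30%).
    print(seq_time([2.0, xor_time([(0.7, 1.0), (0.3, 5.0)])]))  # 4.2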


Bachelors projects

Lightning-fast multi-level SOAP-JSON caching proxy (Bachelors topic)

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

In a previous Masters thesis, a solution was developed for proxying SOAP requests/responses to JavaScript widgets that exchange messages with JSON payloads. Although this approach was shown to be useful for surfacing Deep Web data, it suffers from performance bottlenecks, which arise when a SOAP endpoint is frequently used.

This Bachelors thesis aims at developing a cache component that reduces the runtime latency of dynamically created SOAP-JSON proxies. The resulting cache component will be evaluated from a performance point of view.
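
A minimal sketch of the caching idea, under the assumption that responses may be reused for a short time-to-live: cache entries are keyed on the endpoint plus a hash of the request payload.

    # Illustrative in-memory response cache with a TTL.
    import hashlib
    import time

    class ResponseCache:
        def __init__(self, ttl_seconds: float = 60.0):
            self.ttl, self.store = ttl_seconds, {}

        def key(self, endpoint: str, payload: bytes) -> str:
            return endpoint + ":" + hashlib.sha256(payload).hexdigest()

        def get(self, endpoint: str, payload: bytes):
            entry = self.store.get(self.key(endpoint, payload))
            if entry and time.time() - entry[0] < self.ttl:
                return entry[1]          # fresh cached response
            return None                  # miss or stale

        def put(self, endpoint: str, payload: bytes, response: bytes):
            self.store[self.key(endpoint, payload)] = (time.time(), response)

A multi-level variant could layer such an in-memory cache over a persistent one, which is where the thesis work would begin.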

A crawler for RESTful services, SOAP services and Web forms

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

The Deep Web, consisting of online databases hidden behind SOAP-based or RESTful Web services or Web forms, is estimated to contain about 500 times more data than the (visible) Web. Despite many advances in search technology, the full potential of the Deep Web remains largely unexploited. This is partially due to the lack of effective solutions for surfacing and visualizing the data. The Deep Web research initiative at the University of Tartu's Institute of Computer Science has developed an experimental platform to surface and visualize Deep Web data sources hidden behind SOAP Web service endpoints. However, this experimental platform currently supports only a limited set of SOAP endpoints, updated on an ad hoc basis.

The aim of this project is to build a crawler and an indexing engine capable of recognizing endpoints behind Web forms, RESTful services and SOAP-based services, together with their explicit descriptions (e.g. WSDL interface descriptions, when available). Furthermore, the crawler should identify examples of queries that can be forwarded to those endpoints, especially for endpoints with no explicit interface descriptions such as Web forms.

This project is available both for Masters and Bachelors students. The goal of the Masters project is to build a crawler supporting endpoints both with and without explicit interface descriptions, while the goal of the Bachelors thesis is to crawl WSDL interfaces only.

Tools for software project data collection and integration

Supervisor: Siim Karus (siim04 ät ut.ee)

Data generated in software projects is usually distributed across different systems (e.g. CVS, SVN, Git, Trac, Bugzilla, Hudson, Wiki, Twitter). These systems have different purposes and use different data models, formats and semantics. In order to analyze software projects, one needs to collect and integrate data from multiple systems. This is a time-consuming task. In this project, you will design a unified data model for representing data about software development projects extracted for the purpose of analysis. You will also develop a set of adapters for extracting data from some of the above systems and storing it into a database structured according to the unified model. For a Bachelors project, you will be asked to focus only on a small set of systems (e.g. SVN+Trac or Git+Bugzilla).