Masters Projects 2010/2011

Greening the Wikipedia Infrastructure

Supervisor: Michele Mazzucco (michele.mazzucco ät ut.ee)

Do you know how many servers are used to run the Wikipedia infrastructure? Do you know how much electric power these servers consume? Do you know how much of this electric power is really needed in order to run the infrastructure versus how much of it goes into warming up idle servers (and then cooling down those same servers)? Can you imagine how much your future employers or customers will love you when you tell them that you can help them to run their IT infrastructure with a fraction of the electric power they currently pay for and with the same level of service?

In this project, you will deploy the Wikipedia infrastructure on a cluster or on the cloud, and you will use real logs in order to test several power-aware resource allocation policies. These experiments will allow you to quantify the savings that can be obtained by using on-demand server allocation as opposed to a traditional static allocation policy, which is naturally prone to over-provisioning.
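To give a feel for the comparison, here is a minimal sketch of how the two policies could be contrasted on an hourly load trace. It assumes that every running server draws the same constant power and that capacity scales linearly with the number of servers; the function names and all numbers are hypothetical and are not taken from the project.

```python
import math

def servers_needed(load_rps, capacity_rps):
    """Smallest number of servers able to carry the load (at least one)."""
    return max(1, math.ceil(load_rps / capacity_rps))

def energy_kwh(servers_per_hour, watts_per_server):
    """Energy over the trace: one entry per hour, constant draw per running server."""
    return sum(servers_per_hour) * watts_per_server / 1000.0

def compare_policies(hourly_load_rps, capacity_rps, watts_per_server):
    """Return (static kWh, on-demand kWh, fraction of energy saved)."""
    # On-demand: run only as many servers as each hour requires.
    on_demand = [servers_needed(load, capacity_rps) for load in hourly_load_rps]
    # Static: provision for the peak hour and keep that many servers on all the time.
    static = [max(on_demand)] * len(hourly_load_rps)
    e_static = energy_kwh(static, watts_per_server)
    e_on_demand = energy_kwh(on_demand, watts_per_server)
    return e_static, e_on_demand, 1.0 - e_on_demand / e_static

if __name__ == "__main__":
    # Hypothetical four-hour trace (requests per second), 100 rps and 200 W per server.
    print(compare_policies([100, 400, 1000, 250], 100, 200))
```

A real evaluation would also have to account for the time and energy spent powering servers on and off, which this sketch ignores.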

Sounds interesting? Drop me an e-mail (michele.mazzucco ät ut.ee) and let's have a good chat about it.

Check also the following article to see what lies behind the curtain.

Probabilistic Performance Testing of Web Applications

Supervisor: Michele Mazzucco (michele.mazzucco ät ut.ee)

In this project you will investigate the problem of probabilistic assurance of performance requirements: the idea is to replace continuous performance testing with a magic box capable of answering questions like 'How long will the page take to load if the service time of component A increases from X ms to Y ms?'

In particular, by using a Web application testing framework such as Selenium, we want to build a tool that (i) automatically extracts timing information (e.g., how long it takes to generate component A, how long it takes to fetch some data from the database, etc.), and that (ii) by means of queueing networks or another performance-oriented methodology (e.g., stochastic Petri nets) computes the expected average or the Xth percentile of response times. This tool will allow performance engineers to answer practical and highly relevant questions about the expected performance of systems under a given system configuration.
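As an illustration of the kind of question such a tool could answer, the sketch below uses the simplest possible queueing model. It assumes that each component behaves as an independent M/M/1 queue (Poisson arrivals, exponential service times, first-come-first-served) and that every request visits each component exactly once; the function names and numbers are hypothetical, and a real tool would likely need a richer model.

```python
import math

def mm1_mean_response(arrival_rate, mean_service_time):
    """Mean response time of an M/M/1 queue: S / (1 - rho), with rho = lambda * S."""
    utilisation = arrival_rate * mean_service_time
    if utilisation >= 1.0:
        raise ValueError("queue is unstable: utilisation must be below 1")
    return mean_service_time / (1.0 - utilisation)

def mm1_percentile(arrival_rate, mean_service_time, p):
    """p-th quantile (0 < p < 1) of the response time of a single M/M/1 queue.

    In an M/M/1 FCFS queue the response time is exponentially distributed,
    so the quantile is -mean * ln(1 - p).
    """
    mean = mm1_mean_response(arrival_rate, mean_service_time)
    return -mean * math.log(1.0 - p)

def page_mean_response(arrival_rate, mean_service_times):
    """Mean page response time when components are visited once, in sequence."""
    return sum(mm1_mean_response(arrival_rate, s) for s in mean_service_times)

if __name__ == "__main__":
    rate = 10.0  # requests per second (hypothetical)
    # Component A slows down from 20 ms to 40 ms; the database stays at 50 ms.
    print(page_mean_response(rate, [0.020, 0.050]))
    print(page_mean_response(rate, [0.040, 0.050]))
    print(mm1_percentile(rate, 0.050, 0.95))
```

The percentile function covers a single station only; percentiles of the end-to-end response time across several components require the distribution of a sum and are not covered here.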

Mining Artifact-Centric Process Models

Supervisor: Viara Popova (viara.popova ät ut.ee)

Process Mining is a new, fast-developing field of research which aims at extracting useful business information from event logs. A variety of Data Mining algorithms and approaches are used to gain insight into the workings of a company and its business processes, which can help in evaluating and improving the company's performance.

The ACSI project concentrates on artifact-centric modelling approaches, which are focused on artifacts representing business data, products and concepts driving the company's operations. You will be working on extracting artifact-centric models and their interactions from event logs using new and existing process mining techniques. The developed methods will be integrated into the open-source codebase of the ProM process mining framework.

Complex Event Processing for OpenAjax Hub 2.0 Widgets

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Web Widgets are about to become a new paradigm for exposing Web application components, which can be used to compose new Web applications. Currently, the majority of widgets available on the Web are rather static (typical examples are clock widgets, weather widgets, stock ticker widgets, etc.) and do not facilitate interaction, mainly due to constraints imposed by Web browsers. OpenAjax Hub 2.0, however, is a framework for facilitating inter-widget communication such that widgets running in the same user agent (i.e., Web browser) can listen to the events thrown by each other. The main disadvantage of OpenAjax Hub is that it assumes that the event message structures used by independent widgets to implement inter-widget communication are already known at design time. The aim of this project is to transfer the main concepts of complex event processing to the OpenAjax Hub platform such that rules for run-time processing of events can be defined. These run-time event processing rules will then be the main means of implementing Web application logic while keeping the implementations of independent widgets untouched. The main challenge of this project is to define event processing rules that enable the description of application logic at a high level of granularity, and to implement the rule execution engine.

This project could be suitable for a Bachelors thesis.

Online Data Aggregation and Visualization Platform for Linked Data

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Spreadsheet applications allow users to visualize their data by using a set of pre-defined visualization schemes such as charts, graphs and histograms. However, the main assumption is that the data is presented in tabular format, which is typically not the case when data is retrieved from Web services. The Many Eyes demonstrator allows users to upload their data and then explore it through different visualization schemes. This project aims at designing and implementing a similar Web-based environment, which will first allow retrieval of data through arbitrary Web services and then its visualization through a predefined set of visualization components. The main challenge of this project is to design a set of methods for online aggregation of data from multiple sources.

This project could be suitable for a Bachelors thesis.

Google Maps API Extension for Public Web Services

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Recent studies have shown that about 96% of Web mashups use the Google Maps API for data visualization. Map components have proved to be easily interpreted by decision makers and mainstream Internet users and are thus one of the most effective ways to visualize data. However, using the Google Maps API for visualizing data from custom sources still requires some programming, which makes its use on arbitrary data sources more complex than end users will accept. In this project, the student is expected to design and implement a Web services proxy, which will automatically retrieve geographically annotated data from virtually any SOAP Web service, normalize it and visualize it with Google Maps. The resulting system will be exposed as a Web application for demonstration purposes. The novelty of the project lies in the methods that allow automated detection, from the WSDL interfaces of Web services, of data attributes that potentially contain geographic data.

This project could be suitable for a Bachelors thesis.

Proof-of-Concept Semantic Interoperability Strategy for Estonian Public Sector Institutions

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

Semantic interoperability is the ability of computer systems to exchange information such that the receiving system interprets it in the same sense as intended by the transmitting system. Although this may sound like science fiction, there are some practical steps that help to achieve this aim. The need for semantic interoperability has been recognized by several individual EU member countries, and a common generic framework has been specified at the EU level as well. Accordingly, a set of technology frameworks and technical guidelines has been constructed for Estonian public sector institutions, which are required to follow them. However, in practice these frameworks and guidelines have proven difficult to apply because of organizational issues. More specifically, there is currently no "golden strategy" that an organization could adopt in order to apply the frameworks and guidelines in a sustainable manner. The aim of this project is to develop a proof-of-concept strategy, which public sector organizations can adopt with minor modifications and apply in practical settings. The strategy will be evaluated on a specific public sector organization selected from a pool of potential candidates.

Detection of data redundancy in large-scale federated information systems

Supervisor: Peep Küngas (peep.kungas ät ut.ee)

In this project, the problem of detecting data redundancy in large service-oriented information systems is studied by investigating interface descriptions in WSDL. By redundant data, we mean data maintained independently in different information (sub)systems. Such data redundancy may lead to inconsistency and may hinder the reuse of services. We have previously defined two simple but effective heuristics that help to identify and pinpoint potential data redundancy. The quality of the heuristics has been evaluated in terms of precision and recall. However, these initial results should be refined further.
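For reference, precision and recall of a redundancy classifier can be computed as in the following minimal sketch. It assumes that both the classifier output and the expert judgement are available as sets of data element pairs flagged as redundant; the function name and the example pairs are hypothetical and do not come from the existing implementation.

```python
def precision_recall(predicted, actual):
    """Precision and recall of predicted redundant pairs against a reference set."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Precision: share of flagged pairs that are truly redundant.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of truly redundant pairs that were flagged.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

if __name__ == "__main__":
    # Hypothetical pairs of data elements from two service interfaces.
    flagged_by_heuristic = {
        ("personCode", "idCode"),
        ("address", "street"),
        ("birthDate", "dateOfBirth"),
        ("name", "title"),
    }
    confirmed_by_experts = {
        ("personCode", "idCode"),
        ("birthDate", "dateOfBirth"),
        ("phone", "telephone"),
    }
    print(precision_recall(flagged_by_heuristic, confirmed_by_experts))
```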

The following 3 variations of the project are available:

Collect expert opinions on potentially redundant data in selected information systems from field experts (system architects and analysts) and compare these results to the results of automated classifiers. The main emphasis of this variant is on interviewing field experts in order to gather their opinions. After collecting expert opinions, the existing redundancy detection implementation will be used to compare the precision and recall of a previously proposed classifier against expert opinions.
Identify data elements with the same meaning in a set of information system (IS) interfaces. The main emphasis of this variant is on manual analysis of given IS interfaces. After identifying elements with the same meaning, the existing redundancy detection implementation will be used to recompute the precision and recall of a previously proposed classifier in more realistic settings.
Enhance existing heuristics and algorithms for redundancy detection. The main emphasis of this variant is on elaborating concepts and algorithms from network theory and other research areas in order to propose algorithms and heuristics that give better precision and recall in redundancy detection. Within the project, the proposed algorithms and heuristics will be implemented and experiments will be performed with given data.

In all three variants, the project will be based on a collection of Web service interfaces covering both Estonian and global Web services.

This project could be suitable for a Bachelors thesis.

Lightning-Fast Business Process Simulation

Supervisor: Luciano García-Bañuelos (luciano.garcia ät ut.ee)

Business process simulation is a widely used technique for analyzing business process models with respect to performance metrics such as cycle time, cost and resource utilization. Many commercial business process modeling tools incorporate a simulation component, e.g. TIBCO Business Studio, IBM Websphere Business Modeler (WBM), ARIS, FileNet and Protos. However, these process simulators are often dead slow and sometimes can't even deal with large simulations.

In this project, you will design and implement a scalable and high-performance business process simulation engine. The project is offered in two variants (possibly to two different students):

In the first variant, the simulator will be designed and implemented from scratch.
In the second variant, you will reuse the simulation module of the CPN Tools framework. The simulation engine will take as input BPMN process models and will produce Coloured Petri nets that will be automatically deployed into CPN Tools for simulation.

Regardless of the variant chosen, you will conduct performance experiments to quantify the performance and scalability benefits of this approach, compared to traditional unoptimized business process simulation engines.

Integrating Security Modelling and Business Process Modelling

Supervisor: Raimundas Matulevicius (raimundas.matulevicius [at] ut.ee)

Nowadays, there are several established Business Process Modelling Languages (BPMLs) commonly used in industry (e.g., BPMN, EPC, YAWL and activity diagrams). Usually, to describe a business process comprehensively, many forms of information must be integrated into a business process model. Industry BPMLs differ in the extent to which their constructs represent the information that answers what is going to be done, who is going to do it, when and where it will be done, how and why it will be done, and who is dependent on the information. These differences result from the various source domains (e.g., process engineering or software engineering), and there is a need to secure the entities and activities related to the above-mentioned questions by implementing secure constructs. No work has yet been done to align business processes with the Security Risk Management (SRM) model. SRM can be addressed using different modelling techniques at different enterprise levels: asset level, risk level, and risk treatment level.

It has been shown that applying these techniques separately contributes to better security solutions. However, business process development includes multiple perspectives and viewpoints; thus, the combined application of these techniques could considerably improve the understanding of different stakeholders' needs with respect to security risks. It would also contribute to the quality of system security developed through the different development stages (e.g., from requirements to design).

The purpose of this Masters thesis is to develop a set of rules and guidelines in order to measure the suitability of existing BPMLs for capturing security concerns. You will start by reviewing the existing state of the art (in BPMLs and SRM domain models). You will then develop and validate guidelines on how to align selected business process models using different modelling approaches. The proposal will need to be validated against the risk management model of a specific given project (case study).

Quality Assessment of Security Requirements Languages

Supervisor: Raimundas Matulevicius (raimundas.matulevicius [at] ut.ee)

Security is an important aspect of systems engineering. However, the current literature reports that security concerns typically appear only once a system is already in use or, in the best case, that security is considered only during the late system development stages (e.g., late design and implementation). In order not to miss important concerns, security modelling has to start during requirements engineering and continue during system design. When engineering security, it is also critical to understand the sources of system misbehaviour and how they can be mitigated: this leads to security risk discovery and mitigation.

At the early development stages there exist a number of security modelling approaches, such as abuse frames, abuse and misuse cases, SecureUML, UMLsec, Mal-activity diagrams, Secure i*, Secure Tropos, KAOS extensions to security, and others. However, these languages help little with respect to security risk management.

The purpose of this thesis is to adapt the security modelling languages to the realm of security risk management. The driver of the study will be a security risk management (SRM) domain model. You will first select the targeted security modelling language (state-of-the-art) and investigate its suitability and applicability in the SRM domain (contribution). You will then propose language improvements and validate them: (i) theoretically by comparing the proposal with similar ones, and/or (ii) empirically by conducting case studies or experiments.

Model transformation between Role-based Access Control Perspectives

Supervisor: Raimundas Matulevicius (raimundas.matulevicius [at] ut.ee)

Role-based access control (RBAC) is a security mechanism that ensures that secured data is accessed only by people who have permission to access it. RBAC models can be developed using different modeling approaches, such as SecureUML, UMLsec, and others. However, it has been observed that each of these approaches addresses only one particular modeling viewpoint. The purpose of this Masters thesis will be to investigate whether it is possible to combine different modeling approaches for RBAC.
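For readers new to the concept, the core RBAC idea can be sketched in a few lines: permissions are attached to roles, users are assigned roles, and an access request is allowed only if one of the user's roles carries the required permission. The class, role and resource names below are hypothetical and are unrelated to SecureUML or UMLsec notation.

```python
from collections import defaultdict

class AccessPolicy:
    """Minimal RBAC model: users -> roles -> (action, resource) permissions."""

    def __init__(self):
        self._role_permissions = defaultdict(set)
        self._user_roles = defaultdict(set)

    def grant(self, role, action, resource):
        """Attach a permission to a role."""
        self._role_permissions[role].add((action, resource))

    def assign(self, user, role):
        """Give a user a role."""
        self._user_roles[user].add(role)

    def is_allowed(self, user, action, resource):
        """True if any role held by the user carries the requested permission."""
        return any(
            (action, resource) in self._role_permissions.get(role, ())
            for role in self._user_roles.get(user, ())
        )

if __name__ == "__main__":
    policy = AccessPolicy()
    policy.grant("doctor", "read", "patient_record")
    policy.grant("doctor", "write", "patient_record")
    policy.grant("nurse", "read", "patient_record")
    policy.assign("alice", "doctor")
    policy.assign("bob", "nurse")
    print(policy.is_allowed("alice", "write", "patient_record"))
    print(policy.is_allowed("bob", "write", "patient_record"))
```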

In this project, you will conduct a review of the state of the art for RBAC modeling. Based on the conclusions of this analysis, you will develop and validate an approach to facilitate use of different security languages for RBAC addressed through various perspectives.

Topics on Cloud Computing

Bachelor's and Master's thesis topics on cloud computing