Problems and algorithms for sequence analysis

Event details
Date | 23.04.2009 |
Hour | 16:15 |
Speaker | Dr Evimaria Terzi |
Location | |
Category | Conferences - Seminars |
Sequential data appear in a wide range of diverse applications like telecommunications, stock-market analysis, bioinformatics, text processing, click-stream mining and many more. In this talk, I will show how such data motivate new problems and pose novel algorithmic challenges. I will focus on some of these challenges and address them through the lens of combinatorial optimization.
The central theme of my talk is sequence segmentation, i.e., the discovery of ``homogeneous" segments from input sequences. In the first part of the talk, I will introduce the notion of segmental groupings and use it to provide comprehensive summaries for large event sequences. Using the Minimum Description Length principle, I will define the optimization problem of finding the best segmental grouping and then show that it can be solved optimally in polynomial time using a nested dynamic-programming algorithm. For the same problem, I will also present faster heuristic algorithms that perform very well in practice. In the second part of the talk, I will present a simple and efficient approximation algorithm for the k-segmentation problem on time-series data.
Finally, I will conclude by showing how the results of data-analysis algorithms for sequential data can be used as input to new "meta-analysis" algorithms.
Short bio: Evimaria Terzi has been a research scientist at IBM Almaden Research Center since June 2007. She obtained her PhD from the University of Helsinki (Finland) in January 2007, under the supervision of prof. Heikki Mannila. Before that, she got her MSc from Purdue University (USA) and her BSc from Aristotle University of Thessaloniki (Greece). Her research focuses on algorithmic aspects of data mining with applications in the analysis of sequential and graph data, ranking and clustering.
Evimaria Terzi's home page
Practical information
- General public
- Free