Home

Dottorato in Informatica

PhD course "Computational methods for handling textual data"

Crediti
5.0
Anno
2016
Periodo
29 Feb - 23 March 2016
Docenti
Ferdinando Cicalese , Giuditta Franco , Zsuzsanna Liptak

Cicli in cui è offerta

Descrizione

 

In this course, we introduce basic computational methods for handling textual data. Textual data (strings, sequences) is ubiquitous in today's world: WWW

(webpages) and biological data (genomic and other biological sequences) are only two examples of large amounts of textual data which need to be handled on a daily basis. This type of data is being produced at an ever increasing rate, and one of the major computational challenges now is to develop data structures which allow both storing the data efficiently, and at the same time extracting information from it (e.g. search, pattern matching).

In this course, directed at a general audience of computer science PhD students, we give an introduction to this topic. The course starts with an introduction to basic information theoretic measures, compression, dictionary based compression. In the second part, we introduce three data structures for strings which have been milestones in the area of string storage. These are suffix trees, suffix arrays, and the Burrows-Wheeler transform (BWT). All of these datastructures have been or are being used in mainstream software for biological and other data. In this part, we will get an insight into of some of the majorchallenges in this research area. Finally, in the third part of the course, we give an introduction to an application of these concepts to genomic sequences.

Allegati

Documenti




© 2002 - 2017  Università degli studi di Verona
Via dell'Artigliere 8, 37129 Verona  |  Partita IVA 01541040232  |  Codice Fiscale 93009870234
Credits