Stream Processing (2019/2020) - Department of Informatics
Description

In recent years there has been a huge increase in the amount of information that is continuously generated (e.g. by financial systems, telecom and utilities networks, and sensor networks). Making use of this rich information requires increasingly complex applications that operate on these streams of data, for which existing store-and-process information management systems are not appropriate. Instead, one needs systems that can, in (quasi) real time, process data streams, relate them, detect patterns, and react to the detected patterns. Such systems have been applied with significant success in diverse areas, such as traffic analysis, intrusion and fraud detection (both in financial systems and in utilities networks), and algorithmic trading.

In this course we will study the fundamentals, languages and systems for building applications that process streams of data. The course starts by focusing on general-purpose distributed real-time stream processing systems, covering both the key system aspects and the proposed programming models. It then tackles structured data models for streams and the relation of these models to the relational database model (thus combining streaming data with stored data), covering languages and systems that extend SQL and existing database systems. Finally, students will learn how to detect complex patterns in data streams and will experiment with existing systems.

Objectives

Learn the fundamentals, languages and systems for building applications that process streams of data, ranging from general-purpose distributed real-time stream processing systems to structured data models for dealing with streams.

Syllabus

Distributed Stream Processing Systems.
System models for stream processing: streams as sequences of mini-batches (e.g. Spark Streaming); continuous processing (e.g. Storm). General-purpose programming models. The problem of cyclic computations.
System aspects: distribution, scalability and fault-tolerance.
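
As an illustration of the two system models named above, the following is a minimal single-process sketch in plain Python. It does not use Spark Streaming or Storm; the function names (`mini_batches`, `count_per_batch`, `count_per_record`) and the fixed batch size are assumptions made for the example. It only shows the difference in when state is updated and results become visible.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def mini_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Cut a stream into consecutive batches of at most batch_size records."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def count_per_batch(stream: Iterable[str], batch_size: int) -> Iterator[Dict[str, int]]:
    """Mini-batch model: running counts are updated once per batch."""
    totals: Counter = Counter()
    for batch in mini_batches(stream, batch_size):
        totals.update(batch)
        yield dict(totals)


def count_per_record(stream: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Continuous model: running counts are updated on every record."""
    totals: Counter = Counter()
    for record in stream:
        totals[record] += 1
        yield dict(totals)
```

Both functions reach the same final counts; the mini-batch version emits fewer intermediate results, each covering a whole batch, while the continuous version emits one result per incoming record.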

Data Stream Management Systems (DSMS).
Structured Data Models for Streams. Algebraic operators on streams and relations (continuous queries, aggregates and blocking, time windows). Continuous query languages. Languages and systems that extend SQL and database management systems to deal with data streams.
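
To make the role of time windows concrete: an aggregate such as an average is blocking on an unbounded stream, since it could never emit a result, whereas restricting it to a window lets it produce one answer per window. The sketch below is a plain-Python illustration, not the syntax of any particular continuous query language; the function name `tumbling_avg`, the `(timestamp, value)` tuple layout, and the assumption that timestamps arrive in non-decreasing order are all choices made for the example.

```python
from typing import Iterable, Iterator, Tuple


def tumbling_avg(
    stream: Iterable[Tuple[float, float]], width: float
) -> Iterator[Tuple[float, float]]:
    """Yield (window_start, average) for each non-empty tumbling window.

    Assumes (timestamp, value) pairs arrive in non-decreasing timestamp order.
    """
    current_start = None
    total = 0.0
    count = 0
    for ts, value in stream:
        start = (ts // width) * width  # start of the window containing ts
        if current_start is not None and start != current_start:
            # The previous window is closed: emit its aggregate and reset.
            yield current_start, total / count
            total, count = 0.0, 0
        current_start = start
        total += value
        count += 1
    if count:
        # Flush the last window when a finite input ends.
        yield current_start, total / count
```

On a truly unbounded stream the final flush never runs; it is included only so that the sketch can be tried on a finite list of readings.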

Complex Event Processing.
Streams as sequences of events. Production rules, reactive rules, and event-driven computing. Event processing networks, agents and channels. Complex and derived events. Detection of event patterns. Event-processing languages and systems.
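
As a small illustration of pattern detection and derived events, the sketch below detects the pattern "an event of one kind followed by an event of another kind, for the same key, within a time bound" and emits a derived event for each match. It is a hand-written Python example under stated assumptions, not the API of any event-processing language or system: the `Event` class, the `detect_sequence` function, and the event kinds used in the usage example are hypothetical, and the two kinds are assumed to be different.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    kind: str   # event type, e.g. "login_failed"
    key: str    # correlation key, e.g. a user id
    ts: float   # event timestamp


def detect_sequence(
    events: Iterable[Event], first: str, second: str, within: float
) -> Iterator[Event]:
    """Emit a derived event when `second` follows `first` for the same key
    no more than `within` time units later. Assumes first != second."""
    last_first: Dict[str, float] = {}  # key -> timestamp of latest `first`
    for ev in events:
        if ev.kind == first:
            last_first[ev.key] = ev.ts
        elif ev.kind == second:
            started = last_first.pop(ev.key, None)
            if started is not None and ev.ts - started <= within:
                yield Event(kind=f"{first}->{second}", key=ev.key, ts=ev.ts)
```

A possible usage, with made-up event kinds:

```python
stream = [
    Event("login_failed", "u1", 0),
    Event("login_failed", "u2", 1),
    Event("password_reset", "u1", 5),    # matches: 5 time units after the failure
    Event("password_reset", "u2", 100),  # too late to match
]
alerts = list(detect_sequence(stream, "login_failed", "password_reset", within=10))
```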

Main Bibliography

Opher Etzion and Peter Niblett. Event Processing in Action. Manning Publications, 2010.

Lukasz Golab and M. Tamer Özsu. Data Stream Management. Morgan & Claypool, 2010.

Several papers will be provided for further reading.

Student Workload
  Hours per credit 28
  Hours per week Weeks Hours
Total hours 0
ECTS 6.0