Stream Processing (2021/2022) - Departamento de Informática
Description

This course teaches the fundamentals, languages and systems for building applications that process streams of data, ranging from general-purpose distributed real-time stream processing systems to structured data models for dealing with streams.

Objectives

Learn the fundamental concepts, languages, and systems for building applications that process data streams. The course presents and discusses general-purpose systems for real-time stream processing, with a focus on systems based on structured, dataflow-oriented models.

Knowledge

A. Know the main programming models for stream data processing.

B. Know the languages and understand their fundamental characteristics in order to solve problems in the stream processing domain.

C. Understand the advantages and disadvantages of stream processing platforms.

Application

A. Be able to choose the most appropriate models, languages and tools to solve a stream processing problem.

B. Be able to develop and run stream processing applications using current tools and technologies.

Syllabus

Distributed Stream Processing Systems.

System models for stream processing: streams as sequences of mini-batches (e.g., Spark Streaming); continuous processing (e.g., Apache Flink, Apache Storm).
Programming models. System aspects: distribution, scalability and fault-tolerance.
Distributed time-series databases. Systems for IoT stream processing.
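The two system models named above differ mainly in when work is triggered. The following is a minimal, single-process sketch of that difference only; it does not use the Spark or Flink APIs, and the function names `continuous` and `micro_batch` are hypothetical. It also assumes batches are cut by record count, whereas real micro-batch engines typically cut them by time interval.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def continuous(stream: Iterable[T], op: Callable[[T], R]) -> Iterator[R]:
    """Record-at-a-time model: each event is processed as soon as it arrives."""
    for event in stream:
        yield op(event)


def micro_batch(stream: Iterable[T], batch_size: int,
                op: Callable[[List[T]], R]) -> Iterator[R]:
    """Micro-batch model: events are buffered into small batches and each
    batch is processed as one unit (hypothetically cut by count, not time)."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield op(batch)


if __name__ == "__main__":
    readings = [3, 1, 4, 1, 5, 9, 2, 6]
    print(list(continuous(readings, lambda x: x * 10)))  # one result per event
    print(list(micro_batch(readings, 3, sum)))           # one result per batch
```

The continuous version yields one result per event (lower latency), while the micro-batch version yields one result per batch, trading latency for the ability to process many records at once.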

Data Stream Management Systems (DSMS).

Structured Data Models for Streams. Algebraic operators on streams and relations.
Continuous query languages (extensions to SQL and database management systems to deal with data streams).
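A typical continuous query applies a window operator to turn an unbounded stream into a bounded relation and then aggregates over it, re-emitting a result as each tuple arrives. The sketch below illustrates that idea for a time-based sliding-window average; it is not the syntax or semantics of any particular continuous query language, and the function name `sliding_avg` and its parameters are hypothetical. It assumes tuples arrive in non-decreasing timestamp order and that `range_s` is positive.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_avg(stream: Iterable[Tuple[float, float]],
                range_s: float) -> Iterator[Tuple[float, float]]:
    """For each arriving (timestamp, value) tuple, emit the average of all
    values whose timestamp lies in the window (t - range_s, t].

    Assumes timestamps are non-decreasing and range_s > 0, so the window
    always contains at least the current tuple."""
    window: Deque[Tuple[float, float]] = deque()
    total = 0.0
    for ts, value in stream:
        window.append((ts, value))
        total += value
        # Evict tuples that have slid out of the time window.
        while window and window[0][0] <= ts - range_s:
            _, old = window.popleft()
            total -= old
        yield ts, total / len(window)


if __name__ == "__main__":
    readings = [(0, 10), (4, 20), (9, 30), (12, 40)]
    for ts, avg in sliding_avg(readings, range_s=10):
        print(ts, avg)
```

Keeping a running total and evicting expired tuples from the front of a deque makes each update amortized constant time, instead of recomputing the aggregate over the whole window.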

Machine Learning for Streams.

Introduction to learning from data. Dimensionality reduction for streams.
Learning under concept drift. Incremental learning.
Learning under imbalance and learning from graphs.
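Learning under concept drift usually combines an incrementally updated statistic with a change detector that signals when the data distribution has shifted. As one illustration (the syllabus does not say which methods are covered), the sketch below implements the Page-Hinkley test for an increase in the mean. The class name and the default values of `delta` (tolerated magnitude of change) and `threshold` (detection threshold, often written λ) are assumptions chosen for the example.

```python
class PageHinkley:
    """Page-Hinkley test for detecting an upward shift in the mean of a stream.

    Maintains the cumulative deviation m_T = sum_t (x_t - mean_T - delta) and
    its running minimum M_T; a change is signalled when m_T - M_T > threshold.
    """

    def __init__(self, delta: float = 0.005, threshold: float = 50.0) -> None:
        self.delta = delta
        self.threshold = threshold
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0      # incrementally updated mean of the stream
        self.cum = 0.0       # cumulative deviation m_T
        self.min_cum = 0.0   # running minimum M_T

    def update(self, x: float) -> bool:
        """Consume one value; return True if a change is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n  # incremental mean, O(1) memory
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        if self.cum - self.min_cum > self.threshold:
            self.reset()  # start learning the new concept from scratch
            return True
        return False


if __name__ == "__main__":
    stream = [0.0] * 100 + [5.0] * 100  # abrupt drift at index 100
    detector = PageHinkley()
    print([i for i, x in enumerate(stream) if detector.update(x)])
```

Resetting the detector after an alarm is one simple design choice; in a full learner the alarm would typically also trigger retraining or replacing the model.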

Bibliography

Opher Etzion and Peter Niblett. Event Processing in Action. Manning Publications, 2010.

Lukasz Golab and Tamer Özsu. Data Stream Management. Morgan and Claypool, 2010.

Albert Bifet et al. Machine Learning for Data Streams. MIT Press, 2018.

Several papers will be provided for further reading.

Prerequisites

Adequate programming skills in Python.

Student work
  Hours per credit: 28
  Practical and laboratory classes: 24.0 hours
  Lectures: 26.0 hours
  Assessment: 6.0 hours
  Self-study: 40.0 hours
  Project: 66.0 hours
  Total hours: 162
  ECTS: 6.0