Data Analytics and Mining (11563)

This course addresses principles, methods and practical recommendations for extracting interesting and meaningful patterns from structured and unstructured data (numeric and textual data), from a perspective at the interface between Computer Science and Statistics. The course covers fundamental topics and computational methods for the growing field of Data Analytics and Mining.

The course is organized in two modules:

(i) Module I covers data pre-processing, dimensionality reduction and data-driven clustering, used to induce models from data, together with aids for interpreting those models.

(ii) Module II covers relevant information extraction, symbolic and statistical analysis of texts, document descriptors, document classification, and the distribution of words and multi-words in a Big Data context.

Knowledge

Understand the paradigms and challenges of Data Analytics and Text Mining.
Learn the fundamental methods and their applications in the extraction of patterns from data. Understand data features, model selection and the interpretation of model results.
Understand the advantages and disadvantages of the different methods.

Skills

Implement and adapt Data Analytics and Text Mining algorithms.
Model real data experimentally.
Assess and interpret experimental results.

Competences

Ability to choose methods and evaluate their suitability for case studies
Abstraction and generalisation skills
Critical analysis skills
Searching the scientific literature
Autonomy and self-reliance in applying Data Analytics and Text Mining and in pursuing further study of them

Introduction

Data Analytics

What is data? Examples of data analytics tasks and different perspectives on them

Text Mining

Structured or unstructured data? Why mine texts?

What types of problems can be solved?

Module I

Data Understanding

1D Summarization and Visualization of a Single Feature
2D Analysis: Correlation and Visualization of Two Quantitative Features
Verification of structure in data
Why normalization matters

Descriptive Modeling I

Principal Component Analysis (PCA): Model and Method

Summarization versus Correlation
Matrix spectrum and Singular Value Decomposition (SVD)
PCA as SVD. Conventional PCA.

PCA: Applications

Descriptive Modeling II

K‐means, Anomalous clusters, Intelligent K‐Means
Spectral clustering
Fuzzy clustering

Interpreting Descriptive Models

Conventional Cluster Model Interpretation
Assessing Cluster Tendency
Interpretation aids induced by the least-squares principle

Data Analytics Case Studies

Module II: Text Mining

Relevant Information Extraction

Relevant Expressions: Multi‐words and single‐words
Statistical vs symbolic extractors. Algorithms and metrics
Language‐independence

Symbolic and Statistical Analysis of texts

Tokenization, Stemming and Part‐Of‐Speech Tagging
Word and multi-word distribution in a Big Data context. Zipf's law
Metrics for word association and retrieval
Document correlation
Word Sense Disambiguation

Document Descriptors

Language‐independent Mining of Explicit and Implicit Keywords from documents.
Semantic Scope of Documents
Document Summarization

Document Classification

Relevant Expressions as features for document characterization. Feature selection and reduction.
Document Similarity
Supervised vs unsupervised Document Clustering.
Prediction and evaluation

Text Mining Case Studies (some examples)

Extraction of Named Entities
Email filtering
Language detection
Efficient Extraction of Multiwords
Polarity Detection

Zaki, M., and Meira Jr, W. (2020), Data Mining and Machine Learning: Fundamental Concepts and Algorithms, Cambridge University Press (2nd Edition)
Larose, D. T., and Larose, C. D. (2015), Data Mining and Predictive Analytics, Wiley (2nd Edition)
Mirkin, B. (2019), Core Data Analysis: Summarization, Correlation, and Visualization, Springer
Nascimento, S. (2005), Fuzzy Clustering via Proportional Membership Model, Frontiers in Artificial Intelligence and Applications, vol. 119, IOS Press
Weiss, S.M., Indurkhya, N., Zhang, T., Damerau, F. (2005), Text Mining: Predictive Methods for Analyzing

	Hours per credit		28
	Hours per week	Weeks	Hours
Practical and laboratory classes			24.0
Lectures			24.0
Assessment			6.0
Self study			54.0
Tutorial guidance			6.0
Project			54.0
Total hours			168
ECTS			6.0