John M. Noble

Mathematical Statistics

Institute of Applied Mathematics

University of Warsaw

October 2023 - January 2024

## Multivariate Statistics

## Course Information

**Language:** English

**Type of course:** elective

## Place and Time

There will be 14 lectures and 14 tutorials, all held on Mondays. The lecture runs 08.30 - 10.00 (room 5060) and the tutorial 10.15 - 11.45 (room 2044, computer lab). The dates are:

**October 2023** 2nd, 9th, 16th, 23rd

**November 2023** 6th, 13th, 20th, 27th

**December 2023** 4th, 11th, 18th

**January 2024** 8th, 15th, 22nd

**Note: NO CLASS ON MONDAY 30th OCTOBER. THIS DAY RUNS ACCORDING TO THE SCHEDULE FOR EVEN FRIDAYS.**

## Description

The course ‘Multivariate Statistics’ is a Master's-level course that presents some statistical theory together with its application in R.

The topics covered are:

- Nonparametric Density Estimation (histograms, kernel methods, projection pursuit)
- Multiple Regression: model assessment and selection, shrinkage methods (e.g. LASSO)
- Linear Dimensionality Reduction: Principal Component Analysis, Canonical Correlation Analysis, Projection Pursuit
- Linear Discriminant Analysis
- Recursive Partitioning and Tree-based Methods
- Artificial Neural Networks
- Support Vector Machines
- Clustering Techniques: hierarchical and non-hierarchical partitioning methods, self-organising maps (SOM), clustering variables, clustering based on mixture models (the EM algorithm as a tool for clustering and semi-supervised learning)
- Multidimensional Scaling and Distance Geometry
- Committee Machines, Bagging, Boosting and Random Forests
- Latent Variable Models for Blind Source Separation
- Nonlinear Dimensionality Reduction and Manifold Learning
- Correspondence Analysis
- The Multivariate Gaussian Distribution: parameter estimation, the Wishart distribution

## Introduction to R

You should learn some R programming during the course. A reasonable introduction may be found here.

## Assessment

Assessment is based on:

- two data analysis assignments
- a take-home written exam
- tutorial participation

## Lecture and Tutorial Notes

These will be placed here throughout the course.

- 2023-10-02: 08.30 - 10.00 Lecture 1: Nonparametric Density Estimation
- 2023-10-02: 10.15 - 11.45 Tutorial 1: Nonparametric Density Estimation
- 2023-10-02: 10.15 - 11.45 Tutorial 1: R script
- 2023-10-09: 08.30 - 10.00 Lecture 2: Principal Component and Partial Least Squares Regression
- 2023-10-09: 10.15 - 11.45 Tutorial 2: PC Regression and PLS Regression
- 2023-10-09: 10.15 - 11.45 Tutorial 2: R script
- 2023-10-16: 08.30 - 10.00 Lecture 3: Penalized Regression Methods
- 2023-10-16: 10.15 - 11.45 Tutorial 3: Penalized Regression Methods
- 2023-10-16: 10.15 - 11.45 Tutorial 3: R script
- 2023-10-23: 08.30 - 10.00 Lecture 4: Principal Component and Factor Analysis
- 2023-10-23: 10.15 - 11.45 Tutorial 4: Principal Component Analysis
- 2023-10-23: 10.15 - 11.45 Tutorial 4: R script
- 2023-11-06: 08.30 - 10.00 Lecture 5: Canonical Correlation Analysis
- 2023-11-06: 10.15 - 11.45 Tutorial 5: Canonical Correlation
- 2023-11-06: 10.15 - 11.45 Tutorial 5: R script
- 2023-11-13: 08.30 - 10.00 Lecture 6: Linear Discriminant Function Analysis
- 2023-11-13: 10.15 - 11.45 Tutorial 6: Linear Discriminant Function Analysis
- 2023-11-13: 10.15 - 11.45 Tutorial 6: R script
- 2023-11-20: 08.30 - 10.00 Lecture 7: Recursive Partitioning and Tree Based Methods
- 2023-11-20: 10.15 - 11.45 Tutorial 7: Recursive Partitioning and Tree Based Methods
- 2023-11-20: 10.15 - 11.45 Tutorial 7: R script
- Assignment 1 (Due date: 11th December 2023 at 13:00)
- 2023-11-27: 08.30 - 10.00 Lecture 8: Support Vector Machines
- 2023-11-27: 10.15 - 11.45 Tutorial 8: Support Vector Machines
- 2023-11-27: 10.15 - 11.45 Tutorial 8: R script
- 2023-12-04: 08.30 - 10.00 Lecture 9: Clustering
- 2023-12-04: 10.15 - 11.45 Tutorial 9: Clustering
- 2023-12-04: 10.15 - 11.45 Tutorial 9: R script
- 2023-12-11: 08.30 - 10.00 Lecture 10: Multidimensional Scaling and Distance Geometry
- 2023-12-11: 10.15 - 11.45 Tutorial 10: Multidimensional Scaling and Distance Geometry
- 2023-12-11: 10.15 - 11.45 Tutorial 10: R script
- 2023-12-18: 08.30 - 10.00 Lecture 11: Bagging, Boosting and Random Forests
- 2023-12-18: 10.15 - 11.45 Tutorial 11: Bagging, Boosting and Random Forests
- 2023-12-18: 10.15 - 11.45 Tutorial 11: R script
- 2024-01-08: 08.30 - 10.00 Lecture 12: Generalised Linear Models I
- 2024-01-08: 10.15 - 11.45 Tutorial 12: Generalised Linear Models I
- 2024-01-08: 10.15 - 11.45 Tutorial 12: R script
- Data Assignment 2 (Due date: 5th February 2024 at 13:30)
- Multivariate Final Exam: Theoretical Exercises (Due date: 5th February 2024 at 13:30)
- 2024-01-15: 08.30 - 10.00 Lecture 13: Generalised Linear Models II (Count Data)
- 2024-01-15: 10.15 - 11.45 Tutorial 13: Generalised Linear Models II
- 2024-01-15: 10.15 - 11.45 Tutorial 13: R script
- 2024-01-22: 08.30 - 10.00 Lecture 14: Model Building Criteria
- 2024-01-22: 10.15 - 11.45 Tutorial 14: Model Building
- 2024-01-22: 10.15 - 11.45 Tutorial 14: R script

## Data Files

Click here for the data directory.

*(Last updated: 15th January 2024 by John M. Noble)*