Abstract:
These course notes provide an applied introduction to multivariate data analysis methods and statistical
models using the R system for statistical computing. Currently, they are primarily aimed at students of
the “Statistics for Data Science” course of the MSc in Data Science at the University of Girona, and
serve as a basis for more specialised courses taught later on. They build on previous materials developed
by the author while delivering training courses for scientists at Biomathematics and Statistics Scotland
(BioSS) and lecturing the Multivariate Data Analysis course at the University of Edinburgh. Basic statistical
knowledge and some experience working with and managing data in the R environment are assumed. The course avoids mathematical/statistical theory as much as possible and concentrates on the underlying concepts, emphasising how to put them into practice using R as the computing tool. The notes are divided into two blocks:
Chapters 1-6: an overview of some multivariate methods aimed at data dimension reduction, classification, and the identification of similarities, associations, and patterns in data sets, with a focus on data exploration and graphical representation.
Chapters 7-12: an overview of some of the families of linear, non-linear, generalised linear and additive regression models commonly used in statistical modelling, including questions related to model validation, variable selection and dealing with high-dimensional data.