Statistical learning methods and applications to modern problems in science, industry, and society. Topics include: linear model selection, cross-validation, lasso and ridge regression, tree-based methods, bagging and boosting, support vector machines, and unsupervised learning. Prerequisites: STAT 3210 or equivalent. Cross-listed with: STAT 3880.


Prerequisites enforced by the system: STAT 3210. Cross-listed with STAT 3880 A. Total combined enrollment: 30. Open to Degree and PACE students.

This will be primarily a flipped class: you are responsible for reading the text and viewing the pre-class videos before working on the material in class. This pre-class work will be assessed each day. Class time will be a mixture of lecture, discussion, and projects that highlight particular topics. These cannot, however, encompass all of the material for the course: you will be responsible for some material in the text that we do not cover explicitly in class. I expect you to read the material in the text before we discuss it in class.

Key topics are:

* Multiple linear regression
* Classification
  - logistic regression
  - linear & quadratic discriminant analysis
  - naive Bayes
  - k-nearest neighbors
* Resampling methods
  - leave-one-out cross-validation
  - k-fold cross-validation
* Decision trees
  - classification & regression trees
  - bagging, boosting, random forests
* Principal components
* Model selection & regularization
  - subset selection
  - shrinkage methods (ridge regression & lasso)

Students should expect three or more hours of work outside of class for every hour of class time. Reading the text and viewing pre-class videos for each class will be an ongoing homework assignment throughout the course. Homework assignments will be listed on the class webpage. Late assignments will not be accepted.


Final grades will be determined by exams/quizzes, homework, and participation in class work and discussions.

