Prereqs: STAT 5870, CS 5870, or CSYS 5870; Open to Degree and PACE students; Cross listed with STAT 6870 A and CS 6870 A; Total combined enrollment: 40; Special Topics courses cannot carry CC designations.

Extracting meaning from data remains one of the biggest tasks of science. The Internet and modern computers have given us vast amounts of data, so it is more important than ever to understand how to collect, process, and analyze these data while maintaining reproducibility through data provenance, or "chain of custody" of the data. In this course, students will learn: 1. scientific computing pipelines, software testing, "defensive" data analysis, and revision control; 2. practical implementations of advanced statistical analyses; 3. how to deal with large-scale datasets, remote computing, and "big data"-ready pipelines; 4. the ethical and privacy implications of collecting and analyzing big data; 5. to explore the literature of cutting-edge data analytics; and 6. to communicate data-driven results. As with Data Science I, particular emphasis will be placed on nontraditional (non-numeric) data such as networks and text corpora, and on developing good habits for rigorous and reproducible computational science.

The best way to learn is by doing. Lectures will be used for guidance, but students will directly develop their own computer programs and workflows. Students should expect an average of 6-8 hours of work outside of class per week, depending on skill level and experience entering the course. No textbook is required. Course Prerequisites: STAT/CS 287 Data Science I.


Grades will be based on homework assignments, readings and in-class discussions, and a final research project and presentation.

