Basic Data Analysis-E2
Numbering Code | U-LAS11 10009 LE55 | Year/Term | 2022 ・ First semester | |
---|---|---|---|---|
Number of Credits | 2 | Course Type | Lecture | |
Target Year | All students | Target Student | For all majors | |
Language | English | Day/Period | Tue.2 | |
Instructor name | VANDENBON, Alexis (Institute for Frontier Life and Medical Sciences, Senior Lecturer) | |||
Outline and Purpose of the Course | Research in many fields of science is increasingly dependent on large amounts of data. The key problem is how to turn this data into new knowledge. This course covers a wide variety of data analysis and machine learning approaches. The course starts with an introduction to the basic concepts in machine learning. After that, we will introduce regression and classification methods, including linear models, tree-based methods, and support vector machines, as well as unsupervised methods such as principal component analysis. Practical applications will be demonstrated using the statistical programming language R. | |||
Course Goals | Students will learn about basic concepts in data analysis and statistical learning, such as regression and classification problems, and supervised and unsupervised machine learning. Students will become familiar with the strengths and weaknesses of several approaches, and learn how to apply them to real datasets. | |||
Schedule and Contents |
The course will be offered according to the plan below. If face-to-face lectures are not possible because of the pandemic, the course will be held online (“on demand”) and I will post new course material (videos and slides) on PandA. I will also hold a weekly Zoom meeting to take questions.
Lectures 1 and 2. Introduction to data analysis and machine learning: We will discuss data analysis in the context of scientific investigation. Using several examples, we will introduce the concepts of supervised and unsupervised learning, regression and classification problems, and assessment of model accuracy.
Lectures 3 and 4. Linear regression: Introduction to linear regression as a simple supervised learning approach. We will cover simple and multiple linear regression, discuss how to interpret models, and compare linear regression with K-nearest neighbors.
Lectures 5 and 6. Classification methods: We will introduce classification methods, including logistic regression, linear discriminant analysis, and quadratic discriminant analysis. We will discuss the differences between them, and their strong and weak points.
Lectures 7 and 8. Model assessment: We will introduce several approaches for evaluating the accuracy of models, including cross-validation and bootstrapping.
Lectures 9 and 10. Tree-based methods: Focusing on decision trees, we will introduce tree-based methods for regression and classification. After that, we will cover more advanced methods, such as bagging, random forests, and boosting.
Lecture 11. Support Vector Machines (SVMs): We will introduce maximal margin classifiers, and use them as a basis for exploring SVMs.
Lectures 12 and 13. Unsupervised learning: Introduction to unsupervised learning problems. We will introduce Principal Component Analysis, K-means clustering, and hierarchical clustering.
Lecture 14. Review of course material.
Lecture 15. Final examination, if the COVID-19 situation allows it. If a face-to-face examination is impossible, the final examination will be replaced by a number of smaller assignments.
Lecture 16. Feedback |
|||
Evaluation Methods and Policy | Grading will be based on a final examination (50%) and small assignments (50%). If the COVID-19 situation does not allow a face-to-face examination, grading will be based entirely on assignments (100%). | |||
Course Requirements | The course is intended for students who have a basic understanding of statistics. Programming experience is useful but not required. | |||
Study outside of Class (preparation and review) | The course will follow a textbook. At the end of each lecture I will specify the sections to read before the next lecture. | |||
Textbooks | Textbooks/References | An Introduction to Statistical Learning: with Applications in R, James, Witten, Hastie and Tibshirani (Springer), ISBN: 978-1461471370. The course lectures will follow the content of this textbook (Edition 1). Sections of the book to read in preparation for each class will be announced. This textbook contains theoretical parts as well as practical exercises. Please note that this textbook is also freely (legally) available for download at https://www.statlearning.com. |