Data Analysis Practice II-E2

Numbering Code U-LAS11 20006 SE55 Year/Term 2022 ・ Second semester
Number of Credits 2 Course Type Seminar
Target Year All students Target Student For all majors
Language English Day/Period Fri.3
Instructor name PATAKY, Todd (Graduate School of Medicine, Associate Professor)
Outline and Purpose of the Course This course explores a wide variety of data analysis methods, with an emphasis on data interpretation. Probability and distributions will be explored using graphical and numerical approaches. Concepts from classical hypothesis testing and machine learning will be introduced through examples. No prior knowledge of statistics or data science is required. Computer programming experience is useful but not required.
Course Goals Students will learn the basics of data science, statistics and computer programming. Students will understand when certain data science tools are useful and when they are less useful or even inappropriate. This course makes extensive use of the Python programming language (python.org) and Jupyter Notebooks (jupyter.org). The final goal of this course is to produce a Final Project, which involves (1) analysis of a real-world dataset using a variety of analysis techniques, and (2) creation of a full report of your findings in a user-friendly format, similar to a real-world report that you might one day produce for a data analysis customer.
Schedule and Contents The following weekly topics will be covered:

1) Introduction, Jupyter + Markdown
2) Python I: Basics
3) Python II: Data visualization
4) Descriptive Statistics & Correlation
5) Working With Real Data I: Getting Data
6) Working With Real Data II: Exploring Data
7) Probability I: Random Variables
8) Probability II: Hypothesis Testing
9) Probability III: Simulating Experiments
10) Machine Learning I: Classification
11) Machine Learning II: Clustering
12) Machine Learning III: Regression
13) Machine Learning IV: Preprocessing
14) Machine Learning V: Dimensionality Reduction
15) Feedback
Evaluation Methods and Policy Students are expected to produce all in-class demonstrations and to complete the regular assignments independently.

Evaluation will be based on the following criteria:

- Assignments (64%) [8 @ 8% each]
- Final Project Proposals (12%) [2 @ 6% each]
- Final Project (24%)

TOTAL: 100%
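
As a rough illustration of how the weights above combine, the following Python sketch computes an overall grade from component scores. The function name `course_grade`, the 0-100 score scale, and the example scores are assumptions made for illustration only; the syllabus specifies the weights but not how individual items are scored.

```python
# Illustrative sketch only: the weights come from the evaluation criteria above,
# but the function name, the 0-100 scale, and the example scores are assumptions.

ASSIGNMENT_WEIGHT = 8   # percent per assignment (8 assignments, 64% total)
PROPOSAL_WEIGHT = 6     # percent per proposal (2 proposals, 12% total)
PROJECT_WEIGHT = 24     # percent for the Final Project


def course_grade(assignments, proposals, final_project):
    """Return the weighted overall grade on a 0-100 scale.

    assignments   -- list of 8 scores, each on a 0-100 scale
    proposals     -- list of 2 scores, each on a 0-100 scale
    final_project -- single score on a 0-100 scale
    """
    if len(assignments) != 8:
        raise ValueError("expected 8 assignment scores")
    if len(proposals) != 2:
        raise ValueError("expected 2 proposal scores")

    weighted_sum = (
        ASSIGNMENT_WEIGHT * sum(assignments)
        + PROPOSAL_WEIGHT * sum(proposals)
        + PROJECT_WEIGHT * final_project
    )
    return weighted_sum / 100


# Hypothetical scores, not real data.
example = course_grade([80] * 8, [90, 70], 85)
print(f"Overall grade: {example:.1f}")
```

Because the assignments together carry 64% of the grade, a student who scores well on every assignment but skips the Final Project and both proposals would still be capped at 64 under this weighting.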
Course Requirements There are no specific requirements for this class. However, students must be willing to work with open-source software, which is relatively poorly documented compared to commercial software. The class instructor will help with problems, but students are also encouraged to find solutions to their problems through internet searches.

Additionally, skills in the following would be helpful:
- Computer programming: Python experience (or experience with any other language)
- HTML editing: Markdown (or any other high-level HTML-generation language)
- Statistics: basic hypothesis testing, basic machine learning, etc.
Study outside of Class (preparation and review) This course has a variety of out-of-class assignments (including a Final Project) and no exam. Students who do not pay attention to the lecture content during class will likely have difficulties completing the assignments.

The lecture content will be made available before each lecture. Students are encouraged to review it in advance.
Textbooks An open electronic textbook will be distributed to students and used in all classes. All other necessary materials will also be distributed electronically and discussed in class.
References, etc. Data Science from Scratch: First Principles with Python, Joel Grus (O'Reilly Media), ISBN: 978-1491901427. Lectures will loosely follow this textbook's content. The textbook is optional and not required for this class, but it will be useful for reviewing concepts and for independent study. Lecture notes and all other materials will be made available electronically.
Related URL https://github.com/joelgrus/data-science-from-scratch
https://www.jupyter.org