Computational Science for Big Data (ビッグデータの計算科学)
Numbering Code |
G-INF07 88035 LJ11 G-INF07 88035 LJ10 |
Year/Term | 2022 ・ Second semester | |
---|---|---|---|---|
Number of Credits | 2 | Course Type | Lecture | |
Target Year | Target Student | |||
Language | Japanese | Day/Period | Tue.3 | |
Instructor name |
YAMASHITA NOBUO (Graduate School of Informatics, Professor)
SATO HIROYUKI (Graduate School of Informatics, Program-Specific Associate Professor)
SEKIDO HIROTO (Institute for Liberal Arts and Sciences, Program-Specific Senior Lecturer)
KOYAMADA KOUJI (Academic Center for Computing and Media Studies, Professor)
NATSUKAWA HIROAKI (Academic Center for Computing and Media Studies, Program-Specific Senior Lecturer) |
|||
Outline and Purpose of the Course |
With advances in computing and information infrastructure in recent years, the amount of data generated by social activity on the Internet (for example, through cloud computing) or produced by computer simulation, an important technique in computational science, is increasing day by day. The purpose of this course is to learn techniques for analyzing and visualizing such big data.
A large-scale sparse matrix can be interpreted as the adjacency matrix of a large-scale directed graph, and such matrices and graphs can represent a wide variety of analysis targets. The most common and universal way to extract the features of a matrix, that is, the features of the analysis target, is eigenvalue decomposition or singular value decomposition. The data analysis part of the course therefore starts from the least squares method and principal component analysis, which form the basis of multivariate analysis, and covers various methods that use eigenvalue decomposition and singular value decomposition, including spectral clustering of graphs and EM algorithms for estimating missing values of matrices.
In addition, optimization problems arise frequently when data analysis methods are applied in practice. For example, the least squares method, principal component analysis, spectral clustering, and matrix missing value estimation are all formulated as optimization problems. Some of these problems can be solved by linear algebra computations alone, but in general a dedicated optimization algorithm is required. For example, missing values can be estimated by singular value decomposition when the matrix is small and dense, but this is impractical for a large-scale sparse matrix because the singular value decomposition takes too much time. Therefore, in this lecture, we explain optimization algorithms for big data, using the algorithm for estimating the missing values of a large-scale sparse matrix as a case in point. |
|||
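As a rough illustration of the missing-value estimation mentioned in the outline, the following minimal sketch repeatedly replaces the unobserved entries of a small dense matrix with the values of its rank-r truncated singular value decomposition. It is not taken from the lecture materials: the function name `impute_low_rank`, the rank, the iteration count, and the toy matrix are all assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def impute_low_rank(X, mask, rank=1, n_iter=200):
    """Estimate entries where mask is False by repeated truncated SVD.

    Hypothetical helper for illustration only. X holds the observed
    values; entries where mask is False are ignored and re-estimated.
    """
    filled = np.where(mask, X, 0.0)          # start with zeros in the gaps
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # rank-r approximation
        filled = np.where(mask, X, low_rank)  # keep observed values, update gaps
    return filled

# Toy data (assumed): a rank-1 matrix with one hidden entry.
truth = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
mask = np.ones_like(truth, dtype=bool)
mask[0, 2] = False                            # hide the entry whose true value is 4
observed = np.where(mask, truth, np.nan)

estimate = impute_low_rank(observed, mask, rank=1)
print(estimate[0, 2])                         # estimate of the hidden entry
```

Each iteration computes a full singular value decomposition, which is affordable for a 3 x 3 example but is exactly the cost that becomes prohibitive for a large-scale sparse matrix.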
Course Goals | Understand how big data is analyzed when it is given in the form of weighted directed graphs and high-dimensional sparse matrices. In particular, understand techniques for cutting graphs using singular value decomposition. In addition, understand the least squares method and principal component analysis, which are basic statistical analysis methods. Understand optimization methods for big data. | |||
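To give a concrete picture of graph cutting by matrix decomposition, the sketch below splits a small graph in two using the eigenvector of the second-smallest eigenvalue of its graph Laplacian (the Fiedler vector). Several simplifications are assumed here and do not come from the syllabus: the graph is undirected and unweighted rather than a weighted directed graph, the unnormalized Laplacian is used, an eigenvalue decomposition is used in place of a singular value decomposition, and the function name `spectral_bipartition` is hypothetical. NumPy is assumed to be available.

```python
import numpy as np

def spectral_bipartition(W):
    """Split an undirected graph into two groups via the Fiedler vector.

    Hypothetical helper for illustration only. W is a symmetric
    adjacency (weight) matrix of a connected graph.
    """
    degrees = W.sum(axis=1)
    laplacian = np.diag(degrees) - W              # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                       # 2nd-smallest eigenvalue's vector
    return fiedler >= 0                           # the sign decides the group

# Toy graph (assumed): triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

labels = spectral_bipartition(W)
print(labels)   # one triangle is labelled True, the other False
```

Which triangle receives `True` depends on the sign convention of the eigenvector returned by the solver, so only the grouping is meaningful.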
Schedule and Contents |
The schedule for all 15 classes is as follows.
・ Guidance (1 class)
・ Visualization of big data (3 classes): Techniques for visually understanding big data are explained.
・ Data analysis methods using eigenvalue decomposition and singular value decomposition of the data matrix (6 classes): Lectures cover the basics of linear algebra, including the definition of singular value decomposition. The least squares method, which is a basic data analysis method, and principal component analysis, which is the basic idea behind data analysis using singular value decomposition, are then explained. After that, various data analysis methods using eigenvalue decomposition and singular value decomposition, such as spectral clustering of graphs, are outlined.
・ Optimization methods for big data (5 classes): To learn how to approach the large-scale optimization problems that appear when analyzing big data, we start with the basics of optimization algorithms. After that, we explain the optimization problem that appears in Lasso regression for sparse estimation and the problem of estimating the missing values of a large-scale sparse matrix. |
|||
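The Lasso problem named in the schedule is commonly written as minimizing 0.5 * ||Ax - b||^2 + lam * ||x||_1. As a minimal sketch of one standard way to solve it, the code below uses the proximal gradient method (ISTA) with soft-thresholding. This is not necessarily the algorithm taught in the course; the function names `soft_threshold` and `lasso_ista`, the step-size rule, the iteration count, and the toy data are assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(A, b, lam, n_iter=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by proximal gradient steps.

    Hypothetical helper for illustration only.
    """
    # Step size 1/L, where L is the squared largest singular value of A.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)              # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)
    return x

# Toy check (assumed data): for A = I the minimizer is the soft-thresholded b,
# here [2, 0, 0.5], so the small middle coefficient is set exactly to zero.
x_hat = lasso_ista(np.eye(3), np.array([3.0, -0.5, 1.5]), lam=1.0)
print(x_hat)
```

Each iteration needs only matrix-vector products with A and its transpose, which is why methods of this kind remain usable when A is large and sparse.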
Evaluation Methods and Policy |
Students will be evaluated on report assignments. One report will be assigned for each of the following three parts: "visualization of big data", "data analysis method using eigenvalue / singular value decomposition", and "optimization method for big data". Each report is worth 100/3 points. |
|||
Course Requirements | None in particular | |||
Study outside of Class (preparation and review) | The linear algebra that is important for statistics will be explained in the course, but students are expected to prepare or review it in advance. Students are also expected to prepare or review basic statistics, in particular principal component analysis, before taking the course. | |||
Textbooks | Lecture materials will be distributed as necessary. No particular textbook is required or specified. | ||
References, etc. | Koji Koyamada, Naohisa Sakamoto, "粒子ボリュームレンダリング-理論とプログラミング" (Corona), ISBN: 978-4-339-02449-4. See http://www.coronasha.co.jp/np/detail.do?goods_id=2726 |