I recently read Wale Akinfaderin's post on the Mathematics of Machine Learning and felt compelled to write on the subject as a way to keep track of some useful resources and pointers. Despite having a decent mathematical background, I still went through the process of refreshing certain mathematical concepts in order to understand machine learning algorithms and to make sense of the strengths and weaknesses of their implementations.
Connecting mathematical concepts, some of them centuries old, to modern machine learning algorithms can be an exhilarating and very humbling experience. Let's start with the most prominent area of mathematics in machine learning, linear algebra, and close with a small list of useful references at the end of the post.
Here are some of the concepts studied in linear algebra (source: mathworld.wolfram.com):
| Concept | Definition |
| --- | --- |
| Eigenvalue | One of a set of special scalars associated with a linear system of equations that describes that system's fundamental modes. An eigenvector is associated with each eigenvalue. |
| Eigenvector | One of a special set of vectors associated with a linear system of equations. An eigenvalue is associated with each eigenvector. |
| Euclidean Space | The space of all n-tuples of real numbers. It is the generalization of the two-dimensional plane and three-dimensional space. |
| Inner Product | (1) In a vector space, a way to multiply vectors together, with the result of this multiplication being a scalar. (2) A synonym for dot product. |
| Linear Algebra | The study of linear systems of equations and their transformation properties. |
| Linear Transformation | A function from one vector space to another. If bases are chosen for the vector spaces, a linear transformation can be given by a matrix. |
| Matrix | A concise and useful way of uniquely representing and working with linear transformations. In particular, for every linear transformation there exists exactly one corresponding matrix, and every matrix corresponds to a unique linear transformation. The matrix is an extremely important concept in linear algebra. |
| Matrix Inverse | Given a matrix M, the inverse is a new matrix M⁻¹ that, when multiplied by M, gives the identity matrix. |
| Matrix Multiplication | The process of multiplying two matrices (each of which represents a linear transformation), which forms a new matrix corresponding to the matrix representation of the two transformations' composition. |
| Norm | A quantity that describes the length, size, or extent of a mathematical object. |
| Vector Space | A set that is closed under finite vector addition and scalar multiplication. The basic example is n-dimensional Euclidean space. |
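To make a few of these definitions concrete, here is a minimal NumPy sketch (the matrices and vectors are made up purely for illustration) showing an eigendecomposition, a matrix inverse and a vector norm in action:

```python
import numpy as np

# A small symmetric matrix standing in for a linear transformation.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Eigendecomposition: A @ v = lambda * v for each eigenpair.
eigenvalues, eigenvectors = np.linalg.eig(A)
v = eigenvectors[:, 0]
assert np.allclose(A @ v, eigenvalues[0] * v)

# Matrix inverse: A @ A^{-1} gives the identity matrix.
A_inv = np.linalg.inv(A)
assert np.allclose(A @ A_inv, np.eye(2))

# Norm: the Euclidean (L2) length of a vector.
x = np.array([3.0, 4.0])
print(np.linalg.norm(x))  # 5.0
```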
How do these concepts relate to machine learning?
Principal Component Analysis (PCA) is mainly used for two things:
- Dimensionality reduction, when the number of variables to analyse is too large and we need to reduce the set without losing too much information.
- Data interpretation, when we want to discover hidden relationships between variables and determine which ones are the most informative.
To understand PCA and make sense of it, a few concepts are necessary: standard deviation and covariance (statistics), and eigenvectors, eigenvalues and eigendecomposition (linear algebra).
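As a rough sketch of what this looks like in practice (toy data, illustrative names only), PCA can be implemented in a few lines of NumPy by eigendecomposing the covariance matrix of the centered data and projecting onto the leading eigenvectors:

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the covariance matrix.

    X: (n_samples, n_features) data matrix.
    Returns the data projected onto the top n_components directions.
    """
    # Center each feature (column) at zero mean.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features.
    cov = np.cov(X_centered, rowvar=False)
    # eigh is used because the covariance matrix is symmetric.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Sort eigenvectors by decreasing eigenvalue (explained variance).
    order = np.argsort(eigenvalues)[::-1]
    components = eigenvectors[:, order[:n_components]]
    # Project the centered data onto the principal components.
    return X_centered @ components

# Example: reduce 5-dimensional data to 2 principal components.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X_reduced = pca(X, n_components=2)
print(X_reduced.shape)  # (100, 2)
```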
Singular Value Decomposition (SVD) is very widely used in machine learning. A famous example is the Netflix Prize contest launched in 2006, which posed a large-scale dimensionality reduction problem: decomposing a (big) matrix M into three smaller matrices U, Σ and V such that M = UΣVᵀ, where U and V are unitary matrices and Σ is the diagonal matrix of singular values. This becomes crucial in big data applications where the matrices are extremely large (see some library references at the end of the post).
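A minimal illustration of the idea, using a small dense toy matrix in place of the huge, sparse ratings matrices used in practice: compute the SVD with NumPy and keep only the largest singular values to obtain a low-rank approximation.

```python
import numpy as np

# A toy "ratings" matrix standing in for the kind of large, sparse matrix
# used in recommender systems (purely illustrative data).
M = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [0.0, 1.0, 5.0, 4.0]])

# Full SVD: M = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Rank-k approximation: keep only the k largest singular values.
k = 2
M_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# M_approx is the best rank-2 approximation of M in the
# least-squares (Frobenius norm) sense.
print(np.round(M_approx, 2))
```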
Several matrix factorization techniques other than SVD, including LU decomposition, QR decomposition, rank-revealing QR decomposition and interpolative decomposition, are constantly being improved and challenged (see the extensive references below).
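For completeness, here is a quick sketch of two of these factorizations using SciPy (assuming scipy is available; the matrix is arbitrary illustrative data):

```python
import numpy as np
from scipy.linalg import lu, qr

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])

# LU decomposition with partial pivoting: A = P @ L @ U.
P, L, U = lu(A)
assert np.allclose(P @ L @ U, A)

# QR decomposition: A = Q @ R, with Q orthogonal and R upper triangular.
Q, R = qr(A)
assert np.allclose(Q @ R, A)
```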
Linear regression and classification algorithms make extensive use of linear algebra: ordinary least squares regression, regularization (Tikhonov, lasso). Vector norms are broadly used in areas such as sound processing (sound separation, for example), clustering and recommender systems, to name a few.
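As a small illustration of how linear algebra shows up here, both ordinary least squares and Tikhonov (ridge) regression reduce to solving a linear system built from the design matrix (synthetic data and an illustrative regularization strength below):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))            # design matrix
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=50)

# Ordinary least squares: solve (X^T X) w = X^T y.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Tikhonov (ridge) regression: solve (X^T X + alpha I) w = X^T y.
alpha = 1.0
w_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

print(w_ols, w_ridge)
```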
Time series analysis makes extensive use of linear algebra as well: autoregression (AR), moving average (MA), vector autoregression (VAR) and dynamic time warping are all about vectors and linear algebra.
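For instance, estimating the coefficient of a simple AR(1) model is nothing more than a least-squares problem on lagged values; a toy sketch with simulated data:

```python
import numpy as np

# Simulate a simple AR(1) process: x_t = phi * x_{t-1} + noise.
rng = np.random.default_rng(0)
phi_true = 0.8
x = np.zeros(200)
for t in range(1, 200):
    x[t] = phi_true * x[t - 1] + rng.normal(scale=0.1)

# Estimating phi is just linear regression of x_t on x_{t-1}.
X_lag = x[:-1].reshape(-1, 1)
y = x[1:]
phi_hat, *_ = np.linalg.lstsq(X_lag, y, rcond=None)
print(phi_hat)  # close to 0.8
```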
Finally, linear algebra is essential to the development and implementation of neural networks. Vector projections, eigendecompositions and SVD, Taylor expansions, gradient vectors and Hessian matrices of vector functions are some of the mathematical concepts that make neural networks a reality and foster deep learning's fertile and very active research.
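To illustrate, a single dense layer and its gradients are just matrix products and element-wise operations; below is a minimal NumPy sketch (random toy data, illustrative shapes) of one forward and backward pass:

```python
import numpy as np

# Forward pass of a single dense layer followed by a squared-error loss,
# written purely as matrix operations.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))      # batch of 32 inputs, 4 features
W = rng.normal(size=(4, 3))       # weight matrix of the layer
b = np.zeros(3)                   # bias vector
y = rng.normal(size=(32, 3))      # targets

H = np.tanh(X @ W + b)            # matrix multiplication + nonlinearity
loss = 0.5 * np.mean((H - y) ** 2)

# Backpropagation is the chain rule expressed with the same linear algebra:
# the gradient with respect to W is again a matrix product.
dH = (H - y) / H.size             # dLoss/dH
dZ = dH * (1 - H ** 2)            # derivative of tanh
dW = X.T @ dZ                     # dLoss/dW
db = dZ.sum(axis=0)               # dLoss/db
print(dW.shape, db.shape)         # (4, 3) (3,)
```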
Resources
- M.I.T. course on Linear Algebra by Prof. Gilbert Strang.
- Khan Academy course on Linear Algebra.
- A Tutorial on Principal Component Analysis by Lindsay I. Smith: a very straightforward tutorial on PCA.
- Coding the Matrix: Linear Algebra Through Computer Science Applications.
- Singular Value Decomposition: a geometric explanation.
- A neat explanation of the relationship between PCA and SVD.
- Netflix Prize and SVD: an interesting paper on the Netflix contest.
- Dimensionality reduction in Apache Spark's MLlib library.
- Literature survey on low-rank approximation of matrices: a recent arXiv survey on matrix factorization techniques, a very fertile research area.
- Randomized LU Decomposition: An Algorithm for Dictionaries Construction: an arXiv paper showing a novel algorithm based on LU decomposition that outperforms SVD.
- Linear Algebra for Neural Networks: a comprehensive article by Hervé Abdi.
- A novel time series library for Apache Spark by Sandy Ryza (Cloudera).