## Intro

A common problem in machine learning is the classification problem: we want a computer to classify data points on its own. One approach is to use a *support vector machine*, which separates data points with a straight line, a plane, or a hyperplane.

Everything to the left of the straight line belongs to one class and everything to the right belongs to the other.

However, what happens if the data points are not linearly separable, for instance if one class forms a ring around the other?

What we can do is to introduce a new *dimension* through a mapping $\phi$, for instance:

$$\phi(x, y) = (x, y, x^2 + y^2)$$

We see that all the points far from the origin get a high value in their third component. Doing this, we can separate the classes with a plane chosen based on the value of the $z$-component. This is called the *kernel trick*.
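As a sketch of this idea in Python: the mapping above lifts the points into three dimensions, after which a horizontal plane separates them. The radii, the two ring-shaped classes, and the separating plane $z = 1$ are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Hypothetical mapping lifting 2-D points into 3-D: points far from
# the origin get a large third component.
def phi(points):
    x, y = points[:, 0], points[:, 1]
    return np.column_stack([x, y, x**2 + y**2])

# Two illustrative classes: an inner ring (radius 0.5) and an outer
# ring (radius 2.0), which no straight line in 2-D can separate.
angles = np.linspace(0, 2 * np.pi, 100)
inner = np.column_stack([0.5 * np.cos(angles), 0.5 * np.sin(angles)])
outer = np.column_stack([2.0 * np.cos(angles), 2.0 * np.sin(angles)])

# After the lift, the horizontal plane z = 1 separates the classes.
z_inner = phi(inner)[:, 2]  # all approximately 0.25
z_outer = phi(outer)[:, 2]  # all approximately 4.0
print(z_inner.max() < 1 < z_outer.min())  # True
```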

## Concept

Consider a $3 \times 3$ matrix $A$.

If we think of each column as a vector, we have three vectors, which can lie in at most three dimensions.

However, if one column is a linear combination of the other two, the three vectors are *linearly dependent* and actually lie in a two-dimensional plane. From this we conclude that the *rank* of $A$ is 2.

The dimension theorem says that the dimension of the *null space* of a matrix, i.e. the set of vectors that become the zero vector when multiplied by the matrix, is equal to the number of columns minus the rank of the matrix.

By applying the dimension theorem to our matrix $A$, we find that the null space has $3 - 2 = 1$ dimension, forming a line.
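To make this concrete, here is a small numerical check, assuming a hypothetical $3 \times 3$ matrix of rank 2 (the entries are chosen purely for illustration):

```python
import numpy as np

# Hypothetical matrix: the third column is the sum of the first two,
# so the columns are linearly dependent.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

rank = np.linalg.matrix_rank(A)   # dimension of the column space
nullity = A.shape[1] - rank       # dimension theorem: n - rank
print(rank, nullity)              # 2 1
```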

## Math

To find the null space of a matrix $A$, we generally have to solve the equation:

$$A\mathbf{x} = \mathbf{0}$$

The dimension theorem can provide us with a shortcut, though. Let's return to the $3 \times 3$ matrix $A$ from before.

Since the dimension theorem tells us that the null space of $A$ is one-dimensional, and any vector in the null space is perpendicular to the vectors in the row space, all we have to do is find a line perpendicular to two linearly independent rows of $A$.

Recall that the cross product of two vectors in three dimensions is a vector perpendicular to both. Therefore, the cross product of two linearly independent rows of $A$ spans the null space.
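A minimal sketch of this shortcut, again assuming a hypothetical $3 \times 3$ matrix of rank 2:

```python
import numpy as np

# Hypothetical rank-2 matrix: the third row is the sum of the first
# two, so the rows span only a plane.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

# The cross product of two linearly independent rows is perpendicular
# to both, and hence to the whole row space: it spans the null space.
n = np.cross(A[0], A[1])
print(n)      # [-1. -1.  1.]
print(A @ n)  # [0. 0. 0.]
```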

## Dimension theorem

The dimension theorem comes in several variants; even though the formulations differ, they are equivalent. The theorem relates the dimensions of a matrix to the dimensions of its column space and null space.

**(The dimension theorem for matrices)** If $A$ is an $m \times n$ matrix, then:

$$\operatorname{rank}(A) + \operatorname{nullity}(A) = n$$

where $\operatorname{rank}(A)$ is the dimension of the column space and $\operatorname{nullity}(A)$ is the dimension of the null space of $A$. In other words, the same statement can be reformulated as:

$$\dim(\operatorname{col}(A)) + \dim(\operatorname{null}(A)) = n$$

## Rank theorem

The rank theorem involves the row space and the column space of a matrix $A$. It reads as follows:

**(Rank theorem)** The row space and the column space of a matrix have the same dimensions.

It is a very simple and beautiful theorem, and it leads to more insights about matrices. Let $A$ be an $m \times n$ matrix; then:

$$\operatorname{rank}(A) = \operatorname{rank}(A^T)$$

since the row space of $A$ is the column space of $A^T$.

A definition the beginner should be aware of is that an $m \times n$ matrix is said to have *full column rank* if its column vectors are linearly independent. One also speaks of *full row rank*, with the analogous definition for the row vectors.

Without deeper reasoning or proof, we end here with the following properties. Let $A$ be an $m \times n$ matrix. Then:

- $A$ and $A^TA$ have the same null space and row space
- $A^T$ and $AA^T$ have the same null space and row space
- $A$ and $AA^T$ have the same column space
- $A^T$ and $A^TA$ have the same column space
- $A$, $A^T$, $A^TA$ and $AA^T$ have the same rank
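The rank statement is easy to check numerically; a sketch with a hypothetical integer matrix (the entries are illustrative, not from the text):

```python
import numpy as np

# A hypothetical 3x4 matrix with small entries; its third row is the
# sum of the first two, so its rank is 2.
A = np.array([[1.0, 2.0, 0.0, -1.0],
              [0.0, 1.0, 1.0,  2.0],
              [1.0, 3.0, 1.0,  1.0]])

# A, A^T, A^T A and A A^T all have the same rank.
ranks = {np.linalg.matrix_rank(M) for M in (A, A.T, A.T @ A, A @ A.T)}
print(ranks)  # {2}
```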

## Pivot theorem

The pivot theorem is what motivates the algorithm to produce a basis for the column space. Before we go into the algorithm or theorem, we need to define *pivot columns*.

Pivot columns are the columns of the original matrix that correspond to the columns containing leading ones in its row-reduced form.

For example, if the leading ones of a row-reduced matrix sit in columns 1, 3 and 5, then column vectors 1, 3 and 5 of the original matrix are pivot columns.

Now we are ready for the pivot theorem.

**(The pivot theorem)** The pivot columns of a matrix $A$ form a basis for the column space of $A$.

So with the theorem, we can read off a basis for the column space directly: row reduce the matrix, note which columns get leading ones, and pick those columns from the original matrix.
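As a sketch of this procedure, SymPy's `rref` returns both the row-reduced form and the pivot column indices (0-based); the matrix here is a hypothetical example:

```python
import sympy as sp

# Hypothetical 3x4 matrix: its third column equals 2*(column 1) +
# (column 2), so that column cannot be a pivot column.
A = sp.Matrix([[1, 0, 2, 0],
               [0, 1, 1, 0],
               [1, 1, 3, 1]])

# rref() returns the reduced row echelon form and the pivot indices.
R, pivots = A.rref()
print(pivots)  # (0, 1, 3)

# By the pivot theorem, these columns of the ORIGINAL matrix A form
# a basis for its column space.
basis = [A.col(i) for i in pivots]
```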

We end this section with the column-row factorization theorem, which provides a practical application of the bases for the column space and the row space.

**(Column-row factorization)** If $A$ is a non-zero $m \times n$ matrix with rank $r$, then $A$ can be factorized as:

$$A = CR$$

where $C$ is an $m \times r$ matrix whose column vectors are the pivot columns of $A$ (the basis vectors of the column space) and $R$ is an $r \times n$ matrix whose row vectors are the basis vectors of the row space of $A$, i.e. the nonzero rows of the row-reduced form of $A$.

As an example of the theorem: take any non-zero matrix, row reduce it, and multiply its pivot columns by the nonzero rows of the reduced form; the product recovers the original matrix.
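A sketch of the factorization in SymPy, assuming a hypothetical rank-2 matrix; `C` collects the pivot columns of the matrix and `R` the nonzero rows of its row-reduced form:

```python
import sympy as sp

# Hypothetical 3x4 matrix of rank 2: columns 3 and 4 are linear
# combinations of columns 1 and 2.
A = sp.Matrix([[1, 0, 2, 1],
               [0, 1, 1, 1],
               [1, 1, 3, 2]])

R_full, pivots = A.rref()
r = len(pivots)  # the rank of A

# C: the pivot columns of A (m x r); R: the nonzero rows of the
# row-reduced form (r x n).
C = sp.Matrix.hstack(*(A.col(i) for i in pivots))
R = R_full[:r, :]

print(C * R == A)  # True
```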