## Intro

Machine learning is becoming an ever more important field. It's used pretty much everywhere: by small-scale businesses, by theoretical physicists, and in health care. The possibilities are endless!

But hang on, how does machine learning actually work? Well, machine learning is, like many other cool technologies, powered by math! In particular, you'll need the concept of gradient descent. Basically, it's a method that lets the machine become less and less wrong. So without math, machine learning would be impossible!
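To make "less and less wrong" concrete, here is a minimal sketch of gradient descent in Python. The function being minimized, $f(x) = (x - 3)^2$ with derivative $f'(x) = 2(x - 3)$, and the step size are made-up illustrations, not anything from a real machine learning system:

```python
# Minimal gradient descent sketch: repeatedly step against the
# derivative to walk downhill towards a minimum.
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)  # move opposite the gradient
    return x

# Assumed toy objective: f(x) = (x - 3)^2, so f'(x) = 2(x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(minimum, 3))  # converges towards the minimum at x = 3
```

Each step moves a little against the slope, so the error shrinks geometrically; that is the "less and less wrong" in action.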

## Concept

The gradient is basically a vector collecting all the partial derivatives of a function.

But what's cool about the gradient isn't that it's more information-rich than an ordinary partial derivative. I mean, who cares whether you're able to compute another partial derivative? So what?

Well, the gradient always points in the direction of steepest ascent, and it's perpendicular to contour lines. These properties make the gradient pretty awesome.

## Math

The gradient of a scalar function $f(x_1, \dots, x_n)$ of several variables is the vector

$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)$$

The two most important properties of the gradient are:

- $\nabla f$ points towards the direction where $f$ increases the most.

- $\nabla f$ is perpendicular to the level curves of $f$.

For example, the gradient of a function $f(x, y)$ is

$$\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$$

At a point $(a, b)$, the function increases the most if you walk in the direction of $\nabla f(a, b)$.
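A quick numerical check of this idea, using an assumed example function $f(x, y) = x^2 y$ (chosen purely for illustration) and central finite differences:

```python
# Estimate both partial derivatives of f at (x, y) numerically,
# then package them as the gradient vector.
def grad(f, x, y, h=1e-6):
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (dfdx, dfdy)

# Assumed example: f(x, y) = x^2 * y, so grad f = (2xy, x^2).
f = lambda x, y: x**2 * y
gx, gy = grad(f, 1.0, 2.0)  # analytic answer at (1, 2): (4, 1)
print(round(gx, 4), round(gy, 4))
```

The numerical estimate agrees with the analytic gradient $(2xy, x^2)$ evaluated at the point, which is a handy sanity check whenever you compute a gradient by hand.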

## Del operator

The symbol $\nabla$ is called *nabla*, and it is used to denote one of the most important constructs in the field of calculus: the *del operator*.

We have previously been exposed to the simple *differential operator* $\frac{d}{dx}$, which acts on a single-variable function $f(x)$ to produce its derivative $f'(x)$:

$$\frac{d}{dx} f(x) = f'(x)$$

Then, we extended the concept of the derivative to functions of several variables, through *partial derivatives*.

As it turns out, these are both cases of applying the del operator.

The del operator is a vector of partial derivative operators, and its number of components is decided by how many variables the function it operates on has.

Let's dig a bit deeper to see how all this ties together.

*The del operator*

The del operator in the $n$-dimensional coordinate system $\mathbb{R}^n$, with variables $x_1, \dots, x_n$, is defined as:

$$\nabla = \left( \frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \dots, \frac{\partial}{\partial x_n} \right)$$

Just like a regular vector times a scalar multiplies each component separately, each component of this vector of partial derivative operators acts on a scalar function one by one.

Applying the del operator to a function $f(x)$ in $\mathbb{R}$ results in only one component, and $\nabla$ becomes nothing but the simple differential operator:

$$\nabla f = \frac{df}{dx}$$

Similarly, applying the del operator to a function $f(x, y)$ in $\mathbb{R}^2$ gives us the partial derivatives, with respect to $x$ and $y$ respectively, as the two components of a vector:

$$\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$$

Further, we can operate on a function $f(x_1, \dots, x_n)$ in $n$ dimensions with del, to obtain a vector of $n$ partial derivatives:

$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)$$

These examples of acting on scalar functions with the del operator yield special types of vectors called *gradients*.
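The pattern above, one partial derivative per variable, can be sketched as a small numerical "nabla" in Python. The example functions are assumptions chosen only to demonstrate the one-variable and three-variable cases:

```python
# A numerical del operator: returns one central-difference partial
# derivative per variable of the function f, evaluated at `point`.
def nabla(f, point, h=1e-6):
    point = list(point)
    grad = []
    for i in range(len(point)):
        forward = point[:]
        backward = point[:]
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad

# One variable: nabla reduces to the ordinary derivative d/dx.
g1 = nabla(lambda p: p[0]**2, [3.0])            # analytic: [6.0]
# Three variables: one partial derivative per component.
g3 = nabla(lambda p: p[0]*p[1] + p[2], [1.0, 2.0, 0.0])  # analytic: [2, 1, 1]
print(g1, g3)
```

With one variable the output has a single component, exactly as the text describes; with $n$ variables it has $n$ components.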

The operator can be applied to vector-valued functions as well though, in two different ways, to produce either *divergence* or *curl*. Here, it is sometimes beneficial to represent the del operator in polar coordinates rather than in Cartesian ones as above.

Gradients, divergence, and curl will be studied in detail in future lecture notes, so stay tuned.

## Gradients

You're out climbing the Matterhorn, the famous 'Toblerone mountain', when there's suddenly a blizzard. You can't see anything. Still, if you walk in the direction of the so-called gradient, you'll gain altitude the fastest.

If you package all the partial derivatives into one single vector, you get the gradient. The gradient is written $\nabla f$, and so

$$\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$$

The gradient lies in the $xy$-plane, and it points in the direction of steepest ascent. It's perpendicular to the contour lines, as shown in the figure to the right.

If you fancy, you could create a vector field out of the gradients at each point. In such a gradient field, the magnitude of the field at a point corresponds to the length of the gradient there. Quite commonsensical, right?
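Building such a gradient field is straightforward. Here is a sketch for an assumed bowl-shaped function $f(x, y) = x^2 + y^2$, whose analytic gradient is $(2x, 2y)$ (both the function and the grid are illustrative choices):

```python
# Analytic gradient of the assumed function f(x, y) = x^2 + y^2.
def gradient(x, y):
    return (2 * x, 2 * y)

# Sample the gradient on a small integer grid to form a vector field.
field = {(x, y): gradient(x, y) for x in range(-1, 2) for y in range(-1, 2)}

# Every vector points radially away from the origin, i.e. uphill
# on the bowl, and grows longer the steeper the bowl gets.
print(field[(1, 1)])  # (2, 2)
```

Plotting these vectors as arrows would give the classic quiver picture of a gradient field: arrows perpendicular to the circular contour lines, pointing uphill.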

### Example

Find the gradient of the following function

at the point

Then you'd get something along the lines (pun intended!) of:

## Directional derivatives

### Rates of change in any direction

The partial derivatives of a function $f(x, y)$ are the derivatives of $f$ in the $x$ and $y$ directions. We have often denoted these directions by the unit vectors $\mathbf{i}$ and $\mathbf{j}$.

Now, we can actually take the derivative in any direction $\mathbf{u}$. We call the rate of change in the direction $\mathbf{u}$ a *directional derivative*.

To be able to compute this derivative, we need $\mathbf{u}$ to be a unit vector, $|\mathbf{u}| = 1$. We also need $f$ to be differentiable at the point of interest.

The directional derivative tells us how quickly the function changes in the given direction.

We have already seen examples of directional derivatives: the partial derivatives are just the directional derivatives along the coordinate axes. Together they make up the gradient, which can point in any direction. However, the gradient is always parallel to the direction of most rapid growth of the function.

If we have a gradient, throwing out the $y$-component by setting it to $0$ gives us the directional derivative in the $x$ direction. This is equivalent to projecting the gradient onto the $x$-axis, as $\mathbf{j}$ is perpendicular to $\mathbf{i}$.

Likewise, the directional derivative in any other direction is found by projecting the gradient onto the unit vector pointing in that direction. Concretely, we compute it with the scalar product:

$$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$$

The directional derivative in the direction of $\mathbf{u}$ at the point $(a, b)$ is denoted $D_{\mathbf{u}} f(a, b)$.
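The scalar-product formula can be sketched in a few lines of Python. The gradient value $(2, 2)$ below is an assumed example, corresponding to $f(x, y) = x^2 + y^2$ at the point $(1, 1)$:

```python
import math

# Directional derivative D_u f = grad f . u, where u is normalized
# to a unit vector before taking the dot product.
def directional_derivative(grad, u):
    length = math.hypot(u[0], u[1])
    ux, uy = u[0] / length, u[1] / length  # u must be a unit vector
    return grad[0] * ux + grad[1] * uy

grad_at_point = (2.0, 2.0)  # assumed: gradient of x^2 + y^2 at (1, 1)

print(directional_derivative(grad_at_point, (1.0, 0.0)))   # along x: 2.0
print(directional_derivative(grad_at_point, (1.0, 1.0)))   # along the gradient: |grad| = 2*sqrt(2)
print(directional_derivative(grad_at_point, (1.0, -1.0)))  # perpendicular to the gradient: 0.0
```

Note the pattern: the derivative is largest along the gradient itself, and zero in the perpendicular direction, i.e. along the level curve.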

### The smallest directional derivative

The directional derivative has its greatest absolute value in the direction of the gradient. The other extreme is the directional derivative in the direction of the level curve at the point. If $\mathbf{u}$ is a unit vector parallel to the level curve at $(a, b)$, then

$$D_{\mathbf{u}} f(a, b) = 0$$

Why? Well, the level curves are the curves along which the function stays constant. This is equivalent to saying its derivative is zero in that direction.

### Rate of change in scalar fields

We can regard the function $f$ as a scalar field, with $f(x, y)$ the value of the field at each point. Then,

$$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}$$

gives the rate of change of the scalar field in the direction of $\mathbf{u}$.

### Example: walking down a mountain

Imagine you are walking down a mountain towards your car. You walk at a constant speed in the $xy$-plane. The shape of the mountain follows a height function $h(x, y)$.

Your position is $(1, 1)$, and you walk a straight path towards the parking place. Calculate the rate of change in height you experience per second, as you start your walk.

The key to solving this problem is to note that we are searching for the rate of change *per second*. This implies that the speed at which we are travelling matters for the end result.

The second thing to note is that the directional derivative gives us a measure of how the height of the mountain varies as we move in the $xy$-plane. Thus, we find our change in height per second by multiplying the directional derivative by our speed $v$:

$$\frac{dh}{dt} = v \, D_{\mathbf{u}} h$$

We start by calculating the unit directional vector of our path:

Next, we calculate the gradient of the scalar field:

Then, we calculate the directional derivative at the point (1, 1):

Finally, we combine the speed and the directional derivative to obtain the rate of change in height per second.
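The whole recipe can be sketched end to end in Python. The height function $h(x, y) = 10 - x^2 - y^2$, the car position $(3, 1)$, and the walking speed of $2$ m/s below are assumptions made purely for illustration; only the start point $(1, 1)$ comes from the problem statement:

```python
import math

# Assumed height function h(x, y) = 10 - x^2 - y^2, with analytic
# gradient (-2x, -2y).
def height_gradient(x, y):
    return (-2 * x, -2 * y)

start = (1.0, 1.0)   # from the problem statement
car = (3.0, 1.0)     # assumed parking spot
speed = 2.0          # assumed walking speed in m/s

# Step 1: unit direction vector of the straight path start -> car.
dx, dy = car[0] - start[0], car[1] - start[1]
length = math.hypot(dx, dy)
u = (dx / length, dy / length)

# Step 2: gradient of the scalar field at the start point.
gx, gy = height_gradient(*start)

# Step 3: directional derivative at (1, 1) (height change per metre).
rate_per_metre = gx * u[0] + gy * u[1]

# Step 4: scale by speed to get height change per second.
rate_per_second = speed * rate_per_metre
print(rate_per_second)  # -4.0: you descend 4 m of height per second
```

The negative sign makes sense: you are walking *down* the mountain, so the height decreases along the path.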