Calculus review

Scalar derivatives

Some notation for scalar derivatives:

Scalar identities

Some of the most common identities for working with scalar derivatives:

Common functions

Here are some useful derivatives of scalar functions. In each expression, $x$ is the variable of interest; all other symbols represent constants.

Vector derivatives

It's also useful to think about functions that return vectors or take vectors as arguments. If $f$ is a vector-valued function of a real argument, $f\in\mathbb R\to\mathbb R^n$, we can write it as a vector whose components are real-valued functions,

$$f(x) = \left(\begin{array}{c}f_1(x)\\ f_2(x)\\ \vdots \\ f_n(x) \end{array}\right)$$

Its derivative is then also a vector-valued function, of the same shape as $f$. Its components are the derivatives of the component functions:

$$\frac{d}{dx} f = \left(\begin{array}{c}\frac{df_1}{dx}\\[1ex] \frac{df_2}{dx}\\ \vdots \\[.5ex] \frac{df_n}{dx} \end{array}\right)$$

We can think of $f$ as representing a curve in $\mathbb R^n$. The derivative $\frac{df}{dx}$ represents a tangent vector to this curve: the instantaneous velocity of a point moving along the curve as the argument $x$ changes at a unit rate. The length of the tangent vector tells us the speed of the point, and the components tell us its direction.
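As a rough illustration of the tangent-vector picture, here is a minimal sketch in plain Python. The helix `f` and the helper `derivative` are hypothetical names chosen for this example, and the derivative is only estimated with central differences rather than computed exactly.

```python
import math

def f(x):
    """Hypothetical example curve: a helix in R^3."""
    return (math.cos(x), math.sin(x), x)

def derivative(curve, x, h=1e-6):
    """Central-difference estimate of d(curve)/dx, one component at a time."""
    ahead = curve(x + h)
    behind = curve(x - h)
    return tuple((a - b) / (2 * h) for a, b in zip(ahead, behind))

x0 = 1.0
tangent = derivative(f, x0)                       # approximate tangent vector at x0
speed = math.sqrt(sum(c * c for c in tangent))    # length of the tangent vector
direction = tuple(c / speed for c in tangent)     # unit vector along the tangent

print("tangent:", tangent)      # close to (-sin(1), cos(1), 1)
print("speed:", speed)          # close to sqrt(2) for this helix
print("direction:", direction)
```

For this particular helix the speed is the same at every point, since the squared components of the exact derivative sum to 2 regardless of the argument.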

Here's an example of a function in $\mathbb R\to\mathbb R^3$ and its derivative at a particular point:

Note that this plot doesn't show the argument $x$ explicitly: instead it is implicit in the position of the point along the curve. If we wanted to show $x$ explicitly, we could color the curve or add grid marks to show what values of $x$ correspond to what values of $f(x)$.

More vector derivatives

If the function $f$ has multiple inputs instead of multiple outputs, $f\in\mathbb R^n\to\mathbb R$, we can collect all of the arguments into a column vector:

$$x = \left(\begin{array}{c}x_1\\ x_2 \\ \vdots \\ x_n \end{array}\right)$$

Then $\frac{df}{dx}$ means the row vector of partial derivatives of $f$:

$$\frac{df}{dx} = \left(\begin{array}{cccc}\frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \ldots & \frac{\partial f}{\partial x_n}\end{array}\right)$$

We can think of $f$ as representing a surface in $\mathbb R^{n+1}$: the argument $x$ varies across $\mathbb R^n$ while $f(x)$ determines the height. In this case the derivative, read as a vector in $\mathbb R^n$, tells us the direction of steepest increase of the function.
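The steepest-increase claim can be checked numerically. The sketch below assumes a hypothetical function `f` of two arguments and a hypothetical helper `gradient` that estimates the partial derivatives by central differences; it then compares the gradient's direction with the best of 360 evenly spaced trial directions.

```python
import math

def f(x):
    """Hypothetical example surface: f(x1, x2) = x1^2 + 3*x1*x2."""
    x1, x2 = x
    return x1 ** 2 + 3 * x1 * x2

def gradient(func, x, h=1e-6):
    """Central-difference estimate of the row vector of partial derivatives."""
    partials = []
    for i in range(len(x)):
        ahead = list(x)
        behind = list(x)
        ahead[i] += h
        behind[i] -= h
        partials.append((func(ahead) - func(behind)) / (2 * h))
    return partials

x0 = [1.0, 2.0]
grad = gradient(f, x0)   # analytically (2*x1 + 3*x2, 3*x1) = (8, 3) at x0

# Take a small step of fixed length in each trial direction (one per degree)
# and keep the direction that increases f the most.
step = 1e-3
best_angle = max(
    (2 * math.pi * k / 360 for k in range(360)),
    key=lambda a: f([x0[0] + step * math.cos(a), x0[1] + step * math.sin(a)]),
)
grad_angle = math.atan2(grad[1], grad[0])

print("gradient:", grad)
print("best trial angle:", best_angle, "gradient angle:", grad_angle)
```

The two angles should agree to within the one-degree spacing of the trial directions.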

Here's an example of a function in $\mathbb R^2\to\mathbb R$ together with its derivative at a point:

The derivative is the vector in $\mathbb R^2$ (shown in green at the bottom of the plot) that points in the direction of steepest increase. Note that it is orthogonal to a contour line.
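To make the orthogonality concrete, here is a small sketch under stated assumptions: the function `f`, its hand-derived partials in `grad_f`, and the parametrized contour are all hypothetical choices for illustration, not the function shown in the plot.

```python
import math

def f(x1, x2):
    """Hypothetical example: f(x1, x2) = x1^2 + 4*x2^2, whose contours are ellipses."""
    return x1 ** 2 + 4 * x2 ** 2

def grad_f(x1, x2):
    """Partial derivatives of f, worked out by hand."""
    return (2 * x1, 8 * x2)

def contour(t):
    """Parametrization of the contour f = 4: the ellipse (2 cos t, sin t)."""
    return (2 * math.cos(t), math.sin(t))

def contour_tangent(t):
    """Derivative of the contour parametrization with respect to t."""
    return (-2 * math.sin(t), math.cos(t))

t0 = 0.7
point = contour(t0)
g = grad_f(*point)
tangent = contour_tangent(t0)
dot = g[0] * tangent[0] + g[1] * tangent[1]

print("f at point:", f(*point))     # stays on the contour f = 4
print("gradient . tangent:", dot)   # zero up to rounding error
```

Moving along a contour leaves the function value unchanged, so the derivative has no component in that direction; the vanishing dot product is the numerical trace of that fact.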

Chain rule for vectors

With the above notation, the chain rule for vector functions looks just like it did for scalar functions. Suppose $f\in\mathbb R^n\to\mathbb R$ takes multiple arguments and $g\in\mathbb R\to\mathbb R^n$ returns multiple values, so that $f(g(x))$ makes sense. Then we have

$$\frac{df}{dx} = \frac{df}{dg} \frac{dg}{dx}$$

This looks just like the scalar chain rule (we "cancel the $dg$"). But now $\frac{df}{dg}$ is a row vector in $\mathbb R^{1\times n}$ and $\frac{dg}{dx}$ is a column vector in $\mathbb R^{n\times 1}$, so that when we multiply them we get their dot product. For clarity we can indicate the values of the arguments to each function:

$$\frac{df}{dx}\bigg|_x = \frac{df}{dg}\bigg|_{g(x)}\, \frac{dg}{dx}\bigg|_x$$

If we write out the dot product, we get

$$\frac{df}{dx} = \sum_{i=1}^n \frac{\partial f}{\partial g_i} \frac{dg_i}{dx}$$

which may be familiar as the rule for calculating the total derivative of $f$ with respect to $x$. In words, to calculate the change in $f$, we sum up the effects of all of the changes in all of the inputs to $f$.
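The sum above can be sketched in a few lines of Python. The functions `g` and `f` below, and their hand-derived derivatives `dg_dx` and `df_dg`, are hypothetical examples with $n = 2$; the finite-difference estimate is included only as an independent sanity check.

```python
import math

def g(x):
    """Hypothetical inner function R -> R^2."""
    return (math.cos(x), x ** 2)

def dg_dx(x):
    """Column vector of component derivatives of g."""
    return (-math.sin(x), 2 * x)

def f(u):
    """Hypothetical outer function R^2 -> R."""
    u1, u2 = u
    return u1 * u2 + u2 ** 2

def df_dg(u):
    """Row vector of partial derivatives of f with respect to its inputs."""
    u1, u2 = u
    return (u2, u1 + 2 * u2)

x0 = 0.5
# Chain rule: evaluate df/dg at g(x0) and dg/dx at x0, then take the dot product.
chain = sum(p * q for p, q in zip(df_dg(g(x0)), dg_dx(x0)))

# Independent check: differentiate the composition f(g(x)) by central differences.
h = 1e-6
numeric = (f(g(x0 + h)) - f(g(x0 - h))) / (2 * h)

print("chain rule:", chain)
print("finite difference:", numeric)
```

Note that the row vector is evaluated at `g(x0)` while the column vector is evaluated at `x0`, matching the argument annotations in the formula above.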