For a function f ∈ R→R, we write f′ ∈ R→R for its derivative with respect to its argument. If the argument is called x, we can also write df/dx. If the argument represents time, we sometimes write ḟ.
If a function depends on more than one variable, we write ∂f/∂x or ∂f/∂y to indicate a partial derivative: the derivative with respect to one variable while holding the others constant.
Second and higher derivatives are f′′, f̈, d²f/dx², or ∂²f/∂x∂y.
For a function f, we write f|_x̂ or f(x)|_{x=x̂} to represent evaluation at x̂. This means the same thing as f(x̂) but is sometimes clearer: it lets us keep one name (x) for the variable we are differentiating, and another name (x̂) for the value we are substituting at the end.
Scalar identities
Some of the most common identities for working with scalar derivatives:
Differentiation and partial differentiation are linear operators: for example, (af+bg)′=af′+bg′.
Chain rule: if we want (d/dx) f(g(x)), then we use df/dx = (df/dg)(dg/dx). (As a mnemonic, we can "cancel the dg"; but since df/dg isn't really division, this is just a mnemonic.) Another way to write the same thing: (d/dx) f(g(x)) = f′(g(x)) g′(x)
Product rule: (fg)′=f′g+fg′
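As a quick sanity check, the linearity, chain, and product rules can all be verified numerically with a central-difference approximation. A minimal sketch (the helper name numderiv, the particular functions, and the test point are illustrative choices, not from the text):

```python
import math

def numderiv(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = math.sin            # f' = cos
g = lambda t: t**2      # g' = 2t
x = 0.7

# Linearity: (af + bg)' = af' + bg'
assert abs(numderiv(lambda t: 2*f(t) + 3*g(t), x)
           - (2*math.cos(x) + 3*2*x)) < 1e-6

# Chain rule: d/dx f(g(x)) = f'(g(x)) g'(x)
assert abs(numderiv(lambda t: f(g(t)), x)
           - math.cos(g(x)) * 2*x) < 1e-6

# Product rule: (fg)' = f'g + fg'
assert abs(numderiv(lambda t: f(t) * g(t), x)
           - (math.cos(x)*g(x) + f(x)*2*x)) < 1e-6
```
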
Common functions
Here are some useful derivatives of scalar functions. In each expression, x is the variable of interest; all other symbols represent constants.
The derivative of a constant is zero: da/dx = 0.
The derivative of a monomial x^k is kx^(k−1). This works even for negative and fractional values of k. One special case is x^0, where by convention we treat 0x^(−1) as equal to zero everywhere.
The derivative of sin x is cos x; the derivative of cos x is −sin x.
The derivative of e^(ax) is ae^(ax). If we're using some other base b, we rewrite b^x = e^(x ln b) and then use the identity above.
The derivative of ln x is x^(−1). Again we can easily switch to another base: log_b x = ln x / ln b.
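Each entry in this table can be checked the same way, by comparing a central-difference estimate against the stated derivative. A sketch, with an arbitrary test point (the specific functions and point are illustrative):

```python
import math

def numderiv(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.3
checks = [
    (lambda t: 5.0,           lambda t: 0.0),               # constant
    (lambda t: t**3,          lambda t: 3 * t**2),          # monomial
    (lambda t: t**-2,         lambda t: -2 * t**-3),        # negative power
    (math.sin,                math.cos),                    # sin -> cos
    (math.cos,                lambda t: -math.sin(t)),      # cos -> -sin
    (lambda t: math.exp(2*t), lambda t: 2 * math.exp(2*t)), # e^(ax)
    (math.log,                lambda t: 1 / t),             # ln x -> 1/x
]
for f, fprime in checks:
    assert abs(numderiv(f, x) - fprime(x)) < 1e-5
```
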
Vector derivatives
It's also useful to think about functions that return vectors or take vectors as arguments. If f is a vector-valued function of a real argument, f ∈ R→R^n, we can write it as a vector whose components are real-valued functions,
f(x) = (f_1(x), f_2(x), …, f_n(x))^T
Its derivative is then also a vector-valued function, of the same shape as f. Its components are the derivatives of the component functions:
df/dx = (df_1/dx, df_2/dx, …, df_n/dx)^T
We can think of f as representing a curve in R^n. The derivative df/dx represents a tangent vector to this curve: the instantaneous velocity of a point moving along the curve as the argument x changes at a unit rate. The length of the tangent vector tells us the speed of the point, and the components tell us its direction.
Here's an example of a function in R→R^3 and its derivative at a particular point:
Note that this plot doesn't show the argument x explicitly: instead it is implicit in the position of the point along the curve. If we wanted to show x explicitly, we could color the curve or add grid marks to show what values of x correspond to what values of f(x).
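A minimal numeric sketch of a curve and its tangent vector, using a helix in R^3 as a stand-in for the plotted curve (this particular curve is an assumption for illustration, not necessarily the one in the figure):

```python
import math

def curve(x):
    """A helix in R^3: an illustrative curve, f(x) = (cos x, sin x, x)."""
    return [math.cos(x), math.sin(x), x]

def tangent(x, h=1e-6):
    """Component-wise central difference: df/dx has the same shape as f."""
    fp, fm = curve(x + h), curve(x - h)
    return [(p - m) / (2 * h) for p, m in zip(fp, fm)]

x = math.pi / 4
v = tangent(x)
# Analytic derivative of the helix: (-sin x, cos x, 1)
expected = [-math.sin(x), math.cos(x), 1.0]
assert all(abs(a - b) < 1e-6 for a, b in zip(v, expected))

# The length of the tangent vector is the speed of the moving point;
# for this helix it is sqrt(sin^2 + cos^2 + 1) = sqrt(2) everywhere.
speed = math.sqrt(sum(c * c for c in v))
assert abs(speed - math.sqrt(2)) < 1e-6
```
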
More vector derivatives
If the function f has multiple inputs instead of multiple outputs, f ∈ R^n→R, we can collect all of the arguments into a column vector:
x = (x_1, x_2, …, x_n)^T
Then df/dx means the row vector of partial derivatives of f:
df/dx = (∂f/∂x_1, ∂f/∂x_2, …, ∂f/∂x_n)
We can think of f as representing a surface in R^(n+1): the argument x varies across R^n while f(x) determines the height. In this case the tangent vector tells us the direction of steepest increase of the function.
Here's an example of a function in R^2→R together with its derivative at a point:
The derivative is the vector in R^2 (shown in green at the bottom of the plot) that points in the direction of steepest increase. Note that it is orthogonal to a contour line.
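The row vector of partials can be computed with one central difference per input. A sketch, using an illustrative bowl-shaped function (not necessarily the one plotted):

```python
def f(x):
    """f : R^2 -> R, an illustrative choice: f(x) = x1^2 + 3 x2^2."""
    return x[0]**2 + 3 * x[1]**2

def grad(f, x, h=1e-6):
    """Row vector of partial derivatives: perturb one input at a time."""
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

x = [1.0, 2.0]
g = grad(f, x)
# Analytic partials: (2 x1, 6 x2) = (2, 12) at this point
assert abs(g[0] - 2.0) < 1e-5
assert abs(g[1] - 12.0) < 1e-5
```
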
Chain rule for vectors
With the above notation, the chain rule for vector functions looks just like it did for scalar functions. Suppose f ∈ R^n→R takes multiple arguments and g ∈ R→R^n returns multiple values, so that f(g(x)) makes sense. Then we have
df/dx = (df/dg)(dg/dx)
This looks just like the scalar chain rule (we "cancel the dg"). But now df/dg is a row vector in R^(1×n) and dg/dx is a column vector in R^(n×1), so that when we multiply them we get their dot product. For clarity we can indicate the values of the arguments to each function:
(df/dx)|_x = (df/dg)|_{g(x)} (dg/dx)|_x
If we write out the dot product, we get
df/dx = Σ_{i=1}^n (∂f/∂g_i)(dg_i/dx)
which may be familiar as the rule for calculating the total derivative of f with respect to x. In words, to calculate the change in f, we sum up the effects of the changes in all of the inputs to f.
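The total-derivative identity above can also be checked numerically: differentiate the composition directly, then compare against the sum over inputs. The particular f, g, and test point below are illustrative choices:

```python
import math

# f : R^2 -> R and g : R -> R^2, so f(g(x)) is a scalar function of x.
f = lambda u: u[0] * u[1]            # f(u1, u2) = u1 * u2
g = lambda x: [math.sin(x), x**2]    # g(x) = (sin x, x^2)

def numderiv(fn, x, h=1e-6):
    """Central-difference approximation of fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

x = 0.9
# Left side: differentiate the composition f(g(x)) directly.
lhs = numderiv(lambda t: f(g(t)), x)

# Right side: sum_i (∂f/∂g_i)(dg_i/dx), with ∂f/∂g evaluated at g(x).
df_dg = [g(x)[1], g(x)[0]]           # partials of u1*u2: (u2, u1)
dg_dx = [math.cos(x), 2 * x]         # derivatives of (sin x, x^2)
rhs = sum(a * b for a, b in zip(df_dg, dg_dx))

assert abs(lhs - rhs) < 1e-6
```
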