Yordan Arango

This post is intended to give you all the math and statistics concepts needed to develop the theory of Deep Learning. You do not need to follow the content in order; read the sections in whatever order you prefer. The post will also keep growing as more content and tools are required.

Calculus

Additive property of derivatives

The derivative of a sum is the sum of the derivatives: $$ \frac{d}{dx} [f(x) + g(x) + h(x) + \cdots] = \frac{df(x)}{dx} + \frac{dg(x)}{dx} + \frac{dh(x)}{dx} + \cdots $$
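To see the property in action, here is a minimal numerical sketch using NumPy and a central finite difference; the choice of \(f(x) = \sin(x)\) and \(g(x) = e^x\) is arbitrary and just for illustration:

```python
import numpy as np

def numerical_derivative(func, x, h=1e-6):
    """Central finite-difference approximation of d(func)/dx at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

f = np.sin                 # f(x) = sin(x), an arbitrary example function
g = np.exp                 # g(x) = e^x, another arbitrary example function
s = lambda x: f(x) + g(x)  # s(x) = f(x) + g(x)

x0 = 1.3
lhs = numerical_derivative(s, x0)                                # d/dx [f(x) + g(x)]
rhs = numerical_derivative(f, x0) + numerical_derivative(g, x0)  # df/dx + dg/dx

print(lhs, rhs)  # both ≈ cos(1.3) + e^1.3 ≈ 3.937
```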

Chain rule for one independent variable

$$ \frac{d f(g(x))}{dx} = \frac{df}{dg} \frac{dg}{dx} $$
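As a quick worked example, take \(f(g) = \sin(g)\) and \(g(x) = x^2\): $$ \frac{d}{dx} \sin(x^2) = \underbrace{\cos(x^2)}_{df/dg} \cdot \underbrace{2x}_{dg/dx} $$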

General chain rule

The general chain rule lets us compute the derivative of a variable that is a function of several variables, each of which is in turn a function of other variables, with respect to one of those underlying variables.

Let \(z = f(x_1, x_2, \cdots, x_m)\) be a differentiable function of \(m\) variables, where each of the \(x_i\) is a differentiable function of the variables \(t_1\), \(t_2\), \(\cdots\), \(t_n\). For any \(j\) between \(1\) and \(n\) we can write:

$$ \frac{\partial z}{\partial t_j} = \frac{\partial z}{\partial x_1} \frac{\partial x_1}{\partial t_j} + \frac{\partial z}{\partial x_2} \frac{\partial x_2}{\partial t_j} + \cdots + \frac{\partial z}{\partial x_m} \frac{\partial x_m}{\partial t_j} $$ A more particular and well-known case is when \(m = n = 1\), so $$ \frac{d z}{dt} = \frac{dz}{dx} \frac{dx}{dt} $$ This is exactly the chain rule for one independent variable from the previous item.
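The same kind of numerical check works here. In this minimal sketch (again with NumPy and a central finite difference; the functions are arbitrary choices), \(m = 2\) and \(n = 1\), with \(z = x_1 x_2\), \(x_1 = \cos(t)\) and \(x_2 = \sin(t)\):

```python
import numpy as np

def numerical_derivative(func, t, h=1e-6):
    """Central finite-difference approximation of d(func)/dt at t."""
    return (func(t + h) - func(t - h)) / (2 * h)

# z = f(x1, x2) = x1 * x2, with x1 = cos(t) and x2 = sin(t)
x1, x2 = np.cos, np.sin
z_of_t = lambda t: x1(t) * x2(t)

t0 = 0.7
# Left-hand side: dz/dt computed directly on the composed function
lhs = numerical_derivative(z_of_t, t0)
# Right-hand side: (dz/dx1)(dx1/dt) + (dz/dx2)(dx2/dt),
# with dz/dx1 = x2, dz/dx2 = x1, dx1/dt = -sin(t), dx2/dt = cos(t)
rhs = x2(t0) * (-np.sin(t0)) + x1(t0) * np.cos(t0)

print(lhs, rhs)  # both ≈ cos(2 * 0.7)
```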

Linear Algebra

Dot product (Scalar product)

$$ x \cdot y = \underbrace{\left[\begin{array}{c} x_1\\ x_2\\ \vdots \\ x_j\\ \vdots \\ x_J \end{array}\right]}_{x} \cdot \underbrace{\left[\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_j \\ \vdots \\ y_J \\ \end{array}\right]}_{y} = x_1 y_1 + x_2 y_2 + \cdots + x_j y_j + \cdots + x_J y_J = \sum_{j=1}^{J} x_j y_j $$
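In code, the dot product is just an element-wise multiply followed by a sum. Here is a minimal NumPy sketch (the vector values are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Explicit sum: x_1*y_1 + x_2*y_2 + ... + x_J*y_J
explicit = sum(x_j * y_j for x_j, y_j in zip(x, y))

# NumPy's built-in dot product (equivalently: x @ y)
built_in = np.dot(x, y)

print(explicit, built_in)  # both 32.0
```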