# deep learning notes (neural network foundation)

these notes cover:

• neural network foundation
• coding notes
• precautions for avoiding bugs
• questions like ‘why use … ?’

neural network foundation
(taking a classification problem as an example)

1. logistic regression
2. loss function
3. cost function
4. back propagation: calculate the derivative of the cost with respect to each parameter, then update the parameters
5. neural network: in every layer, each neuron computes a linear value z and an output (activation) a; lecture 3.3 shows a single neuron and what happens in every layer

6. activation function
   1. sigmoid: output range (0, 1)
   2. tanh: output range (-1, 1)
      in hidden layers, tanh usually works better than sigmoid because its range is between -1 and 1, so the average activation is closer to 0. But for the output layer, sigmoid is the better choice because it makes sense for y hat (the output) to be a number between 0 and 1
   3. ReLU (rectified linear unit): a = max(z, 0). For sigmoid and tanh, if z is very large or very small, the slope of the function ends up close to zero, which slows down gradient descent. The slope of ReLU is 1 for every z > 0, so using ReLU usually accelerates learning.
   4. leaky ReLU: for example max(0.01z, z). The difference from ReLU: when z < 0, the slope is not 0
   • tips
     • if the output is 0 or 1 (binary classification), sigmoid may be suitable for the output layer. Use ReLU for all other units
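
The list above can be sketched in NumPy. This is a minimal sketch, not code from the lectures: the function names, the `slope` and `learning_rate` parameters, and the shape convention (X is (n_x, m), Y is (1, m), w is (n_x, 1)) are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    # Output in (0, 1); suited to a binary output layer.
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Output in (-1, 1); activations are centered around 0.
    return np.tanh(z)

def relu(z):
    # a = max(z, 0), applied element-wise.
    return np.maximum(z, 0)

def leaky_relu(z, slope=0.01):
    # Same as ReLU for z > 0, small non-zero slope for z < 0.
    return np.maximum(slope * z, z)

def logistic_step(w, b, X, Y, learning_rate=0.1):
    """One gradient descent step of logistic regression (assumed shapes above)."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)                 # forward pass, shape (1, m)
    loss = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # loss of each example
    cost = np.mean(loss)                            # cost: average loss
    dZ = A - Y                                      # back propagation
    dw = np.dot(X, dZ.T) / m
    db = np.sum(dZ) / m
    return w - learning_rate * dw, b - learning_rate * db, cost

# Toy data: one feature, four examples.
X = np.array([[0.0, 1.0, 2.0, 3.0]])
Y = np.array([[0.0, 0.0, 1.0, 1.0]])
w, b = np.zeros((1, 1)), 0.0
for _ in range(3):
    w, b, cost = logistic_step(w, b, X, Y)
    print(cost)
```

The printed cost should go down from one step to the next on this toy data.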

coding notes

1. np.dot(a, b):
calculates the inner product of two 1-D arrays;
equivalent to matrix multiplication when both arguments are 2-D arrays (same result as np.matmul)

2. np.exp(v): element-wise exponential of v

3. np.log(v): element-wise natural logarithm of v

4. np.abs(v): element-wise absolute value of v

5. np.maximum(v, 0): takes the max of every element of v with 0

6. v**2: squares each element of v

7. np.dot(w.T, X) + b: when b is a single number, NumPy broadcasts it to the shape of np.dot(w.T, X) before adding

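8. b = x.sum(axis=0): sums down each column, giving one total per column

9. a / b.reshape(1, 4): every row of a is divided element-wise by b (here a has 4 columns)

10. (m,n) + (1,n): NumPy broadcasts the (1,n) array to (m,n) and then adds them element-wise

11. avoid explicit for loops where possible, since a for loop is much slower than the vectorized equivalent.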
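
A short sketch of the broadcasting and vectorization notes above. The array values and the names `share`, `bias`, and `z_loop` are made up for illustration.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0, 4.0],
              [3.0, 2.0, 1.0, 0.0],
              [6.0, 6.0, 6.0, 6.0]])     # shape (3, 4)

b = a.sum(axis=0)                        # shape (4,): one total per column
share = a / b.reshape(1, 4)              # (3, 4) / (1, 4): each row divided by b

w = np.array([[0.5], [-1.0], [2.0]])     # shape (3, 1)
bias = 0.1
z = np.dot(w.T, a) + bias                # (1, 4); the scalar bias is broadcast

# Same computation with explicit loops, for comparison only.
z_loop = np.zeros((1, 4))
for j in range(a.shape[1]):
    for i in range(a.shape[0]):
        z_loop[0, j] += w[i, 0] * a[i, j]
    z_loop[0, j] += bias

print(b)                        # [10. 10. 10. 10.]
print(share.sum(axis=0))        # every column of share sums to 1
print(np.allclose(z, z_loop))   # True
```

On arrays this small the speed difference is invisible; the gap between the loop and the vectorized version only shows up on large inputs.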

precautions for avoiding bugs

1. a = np.random.randn(5)
the shape of a will be (5,), and a.T looks the same as a. Also, np.dot(a, a.T) gives a single number instead of an outer product (a matrix).
so when coding, do not use data structures whose shape is (n,) (rank 1 arrays); generate a with np.random.randn(5,1) (a 5 by 1 column vector) instead.
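
The difference is easy to check directly. A small sketch, where the names `col` and `fixed` are chosen here for illustration:

```python
import numpy as np

a = np.random.randn(5)             # rank 1 array
print(a.shape, a.T.shape)          # (5,) (5,)
print(np.dot(a, a.T).shape)        # (): a single number, not a matrix

col = np.random.randn(5, 1)        # explicit 5 by 1 column vector
print(col.T.shape)                 # (1, 5)
print(np.dot(col, col.T).shape)    # (5, 5): the outer product

# A rank 1 array can be turned into a column vector with reshape,
# and an assert documents the shape the code expects.
fixed = a.reshape(5, 1)
assert fixed.shape == (5, 1)
```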

questions

1. why do we need a nonlinear activation function?

• without one, the whole network collapses to a single linear function of the form np.matmul(w.T, x) + b, so ‘unless you throw in a non-linearity in there, then you’re not computing more interesting functions’ (lecture 3.7)
• the only place where a linear activation function makes sense is the output layer of a regression problem (y is a real number, e.g. predicting house prices)
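2. why do we need deep neural networks?

• deep neural networks can combine simple features from earlier layers to detect more complex things in later layers
• circuit theory (permutation): using more layers reduces the number of neurons needed in each layer; a shallow network may need many more neurons to compute the same function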
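
The answer to question 1 can be checked numerically: two layers with no activation function give exactly the same output as one linear layer. A minimal sketch with made-up layer sizes (3 inputs, 4 hidden units, 1 output, 5 examples); all variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 5))      # 3 features, 5 examples

W1 = rng.standard_normal((4, 3))     # hidden layer weights
b1 = rng.standard_normal((4, 1))
W2 = rng.standard_normal((1, 4))     # output layer weights
b2 = rng.standard_normal((1, 1))

# Two layers with linear (identity) activation.
two_layer = np.matmul(W2, np.matmul(W1, x) + b1) + b2

# The same function written as a single linear layer.
W = np.matmul(W2, W1)
b = np.matmul(W2, b1) + b2
one_layer = np.matmul(W, x) + b

print(np.allclose(two_layer, one_layer))   # True
```

Replacing the hidden layer with something like np.tanh(np.matmul(W1, x) + b1) breaks this equality, which is what lets the network compute more interesting functions.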