Lesson 2: Tensors and Automatic Differentiation

Please ask any questions relevant to lesson 2 here.

Hi Marc,

First, I would like to thank you very much for your class; it really helped me approach deep learning. I finally feel like I’m understanding the basic concepts, which can be tricky when you’re not an expert mathematician ^^

Then, I have a question concerning the exercise you gave us at the end of lesson 2, part 2.
I’m following the instructions there, but I’m kind of stuck now.

I don’t understand how we’re supposed to do the backpropagation of the gradient. In the example we see how it is done for the bias: in the ‘backward’ method the gradient is not changed, but where are we supposed to compute it, if not in the ‘add_bias’ class?

Here is my code:

import numpy as np
from numpy.random import random

class add_bias(object):
    def __init__(self, b):
        # initialize with a bias b
        self.b = b

    def forward(self, x):
        # return the result of adding the bias
        return x + self.b

    def backward(self, grad):
        # save the gradient (to update the bias in the step method) and return the gradient backward
        self.grad = grad
        return grad

    def step(self, learning_rate):
        # update the bias
        self.b -= learning_rate * self.grad


class dot_weight(object):
    def __init__(self, w):
        self.w = w

    def forward(self, x):
        self.x = x
        return x * self.w

    def backward(self, grad):
        self.grad = grad
        return self.grad * self.x

    def step(self, learning_rate):
        self.w -= learning_rate * self.grad


class exponential(object):
    def forward(self, x):
        self.y_est = np.exp(x)
        return self.y_est

    def backward(self, grad):
        self.grad = grad
        return grad * self.y_est

    def step(self, learning_rate):
        pass

class composition(object):
    def __init__(self, layers):
        self.layers = layers

    def forward(self, x):
        y_est = x.copy()
        for layer in self.layers:
            y_est = layer.forward(y_est)
        self.y_est = y_est
        return y_est

    def compute_loss(self, y, y_est):
        diff = y_est - y
        self.loss = np.sum(diff ** 2)
        self.grad = 2 * diff
        return self.loss

    def backward(self):
        for layer in reversed(self.layers):
            self.grad = layer.backward(self.grad)

    def step(self, learning_rate):
        for layer in self.layers:
            layer.step(learning_rate)


def launch_exo():
    print('LESSON 2 : SGD EXPONENTIAL LINEAR REGRESSION\n')
    w, b = 0.5, 2
    xx = np.arange(0, 1, .01)
    print('xx = {}'.format(xx))
    yy = np.exp(w * xx + b)
    # print('yy = {}'.format(yy))

    estimated_b = [1]
    estimated_w = [1]
    my_composition = composition([dot_weight(estimated_w[0]), add_bias(estimated_b[0]), exponential()])

    learning_rate = 1e-4

    losses = []
    ws = []
    bs = []

    for i in range(10):
        j = np.random.randint(1, len(xx))

        # compute the estimated value of y from xx[j] with the current values of the parameters
        y_est = my_composition.forward(xx[j])

        # compute the loss and save it
        losses.append(my_composition.compute_loss(yy[j], y_est))

        my_composition.backward()
        my_composition.step(learning_rate)
        ws.append(my_composition.layers[0].w)
        bs.append(my_composition.layers[1].b)

    print('Estimated : losses={} \nb={} \nw={}'.format(losses, bs, ws))

Matt

Found my error, in the backward method of the dot_weight class: I needed to multiply the gradient by x before saving and returning it:

class dot_weight(object):
    def __init__(self, w):
        self.w = w

    def forward(self, x):
        self.x = x
        return x * self.w

    def backward(self, grad):
        self.grad = grad * self.x
        return self.grad

    def step(self, learning_rate):
        self.w -= learning_rate * self.grad

Hi Matt,
thanks for starting the discussion!
Yes, your multiplication is now correct. This should work, congrats!
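As a side note, a quick way to sanity-check a backward implementation is to compare it against a finite-difference estimate of the same derivative. Here is a minimal sketch (a pared-down dot_weight, just for illustration):

```python
class dot_weight(object):
    def __init__(self, w):
        self.w = w

    def forward(self, x):
        self.x = x
        return x * self.w

    def backward(self, grad):
        # gradient of the loss with respect to the weight w
        self.grad = grad * self.x
        return self.grad

w, x = 0.5, 2.0
layer = dot_weight(w)
layer.forward(x)
analytic = layer.backward(1.0)  # d(w*x)/dw = x

# central finite-difference estimate of the same derivative
eps = 1e-6
numeric = (dot_weight(w + eps).forward(x) - dot_weight(w - eps).forward(x)) / (2 * eps)
print(abs(analytic - numeric) < 1e-6)  # True when backward is correct
```

If the two values disagree, the backward method has a bug.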
Marc


Hello,

Thanks for the course!
I have been stuck on something for some time now. I don’t see the error in my code, and it is giving me really bad results, so I suppose I am doing something wrong, but I don’t see it. Can you help me?

Here is my code:



class multiplication_weight(object):
    def __init__(self, w):
        # initialize with a weight w
        self.w = w

    def forward(self, x):
        # return the result of multiplying by weight
        self.x = x
        return self.w*x

    def backward(self, grad):
        # save the gradient and return the gradient backward
        self.grad = grad*self.x
        return self.grad

    def step(self, learning_rate):
        # update the weight
        self.w -= learning_rate*self.grad


class my_exp(object):
    # no parameter
    def forward(self, x):
        # return exp(x)
        self.x = x
        return np.exp(x)

    def backward(self, grad):
        # return the gradient backward
        self.grad = self.forward(self.x)*grad
        return self.grad

    def step(self, learning_rate):
        # any parameter to update?
        # Hint https://docs.python.org/3/reference/simple_stmts.html#the-pass-statement
        pass


class my_composition(object):
    def __init__(self, layers):
        # initialize with all the operations (called layers here!) in the right order...
        self.weight = layers[0]
        self.bias = layers[1]
        self.expo = layers[2]

    def forward(self, x):
        # apply the forward method of each layer
        self.x = x
        return self.expo.forward(self.bias.forward(self.weight.forward(x)))

    def compute_loss(self, y, y_est):
        # use the L2 loss
        # return the loss and save the gradient of the loss
        self.loss_grad = 2*(y-y_est)
        return (y-y_est)**2

    def backward(self):
        # apply backprop sequentially, starting from the gradient of the loss
        # Hint: https://docs.python.org/3/library/functions.html#reversed
        self.grad = self.weight.backward(self.bias.backward(self.expo.backward(self.loss_grad)))

    def step(self, learning_rate):
        # apply the step method of each layer
        self.expo.step(learning_rate)
        self.weight.step(learning_rate)
        self.bias.step(learning_rate)



Hi,
What do you mean by really bad results?
I would not have coded it like that, but your code runs fine…

Marc

Hello,

I launched my code like this:

my_fit = my_composition([multiplication_weight(1),add_bias(1), my_exp()])
learning_rate = 1e-4
Loss = []
estimated_w = [1]
estimated_b = [1]

for i in range(5000):
  j = np.random.randint(1, len(xx))
  y_est = my_fit.forward(xx[j])
  loss = my_fit.compute_loss(yy[j],y_est)
  Loss.append(loss)
  my_fit.backward()
  my_fit.step(learning_rate)
  estimated_w.append(my_fit.weight.w)
  estimated_b.append(my_fit.bias.b)


print('')
print(f'Optimal weight:{my_fit.weight.w}')
print(f'Optimal bias:{my_fit.bias.b}')

And I am getting w = -0.69 and b = -2.11 which is far from optimal.

OK, now I understand why I did not see your problem. I ran the same code as you, except that in the computation of the loss I interchanged the arguments. You can try it and see that this works!
Your gradient for the loss is not correct: it should be 2*(y_est-y). With this modification, your code should run fine.
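To see why the sign matters, here is a tiny sketch with made-up numbers; with the wrong sign, gradient descent pushes the estimate away from the target instead of toward it:

```python
# suppose the target is y = 8.0 and the current estimate is y_est = 1.0
y, y_est = 8.0, 1.0

grad_correct = 2 * (y_est - y)  # -14.0: negative, so descent increases y_est toward y
grad_wrong = 2 * (y - y_est)    # +14.0: positive, so descent decreases y_est away from y

print(grad_correct < 0 < grad_wrong)  # True
```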

Oh, thank you very much!!

Hi Marc,

First of all, thank you for all your lessons and the excellent course.

I have two questions, please.

  1. I have read other posts, and I did the same thing to save the input x value in the forward method, since we need it later for computing the gradient in the backward method. But because your comment in the forward method says “# return the result of multiplying by weight” rather than “# save the input x and return the result of multiplying by weight”, I am wondering whether you have another way to save the x value? If so, could you tell me, please?

  2. One more question, please: in the backward of the add_bias class, the last line is return grad rather than return self.grad (which is the same thing as the variable grad in this case). In the backward method of my code, is it fine to return self.grad instead of return self.x*grad, as in my example? Both work fine, but I just want to know whether there is a rule advising against returning a self attribute from a method?

My code is like this:

class multiplication_weight(object):
    def __init__(self, w):
        # initialize with a weight w
        self.w = w

    def forward(self, x):
        # return the result of multiplying by weight
        self.x = x
        return self.w*x

    def backward(self, grad):
        # save the gradient and return the gradient backward
        self.grad = self.x*grad
        return self.grad

    def step(self, learning_rate):
        # update the weight
        self.w -= learning_rate*self.grad

Thanks a lot and have a nice day,

Zheyu XIE

Hi

  1. No, I am doing it like you!
  2. return self.grad is simpler and more readable. You can return a self.something; I do not see any problem with that…

Hi Marc;

Thank you very much for this course series! I’m catching up a little late; I hope you will still answer my questions.

I don’t understand why the backward() method of multiplication_weight returns the gradient with respect to w. Shouldn’t it return the gradient with respect to x?
In the current example, it does not change anything, since the multiplication module appears only at the beginning of the composition, but I think it may matter if several multiplication blocks are composed together.

My implementation :

    def backward(self, grad):
        # save the gradient and return the gradient backward
        # grad w.r.t. w
        self.grad = self.x * grad
        # grad w.r.t. x
        return self.w * grad

Many thanks;

Hi Severine,
here x is the input and w is a parameter. You want to update the parameter with a gradient descent algorithm; hence, you need to compute the derivative of the loss with respect to the parameters, not the input.
But you are right: if you had another module before the multiplication, then you should also compute the derivative with respect to x, which would no longer be the input. In this case, you need to compute the full gradient, i.e. a vector with both the derivative with respect to x and the derivative with respect to w. You can do this by creating a self.grad_w and a self.grad_x. In our case, we only need to compute self.grad_w.
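To make this concrete, here is a minimal sketch of such a module (the names self.grad_w and self.grad_x are just suggestions):

```python
class multiplication_weight(object):
    def __init__(self, w):
        self.w = w

    def forward(self, x):
        self.x = x
        return self.w * x

    def backward(self, grad):
        # derivative w.r.t. the parameter, used by step()
        self.grad_w = grad * self.x
        # derivative w.r.t. the input, returned for the previous module
        self.grad_x = grad * self.w
        return self.grad_x

    def step(self, learning_rate):
        self.w -= learning_rate * self.grad_w

layer = multiplication_weight(3.0)
layer.forward(2.0)
print(layer.backward(1.0))  # 3.0, the derivative w.r.t. the input x
print(layer.grad_w)         # 2.0, the derivative w.r.t. the parameter w
```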

Does it answer your question?

Best,
Marc

Hi Marc;
Thanks for your answer. Just to be sure, backward() will thus :

  • compute and store self.grad_w = grad * self.x
  • compute and return grad_x = grad * self.w
    Only self.grad_w will be used by the parameter update step(), but the returned gradient could be used to compute the gradients of the previous modules.

Is that correct?
Thanks!

Yes, that’s it. You can try by adding another module before the multiplication!
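For example, composing two multiplication modules could look like this (a sketch with made-up values; each module must return the gradient with respect to its input for the chain rule to work):

```python
class multiplication_weight(object):
    def __init__(self, w):
        self.w = w

    def forward(self, x):
        self.x = x
        return self.w * x

    def backward(self, grad):
        self.grad_w = grad * self.x  # kept for the parameter update
        return grad * self.w         # passed backward to the previous module

# y = w2 * (w1 * x) with w1 = 2, w2 = 3, x = 5
m1, m2 = multiplication_weight(2.0), multiplication_weight(3.0)
y = m2.forward(m1.forward(5.0))
g = m1.backward(m2.backward(1.0))  # chain rule through both modules

print(m2.grad_w)  # 10.0 = w1 * x, the input seen by m2
print(m1.grad_w)  # 15.0 = w2 * x, correct only because m2 returned the grad w.r.t. its input
print(g)          # 6.0  = w2 * w1, ready for yet another module before m1
```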

OK, thanks, I’ll try it!