Lesson 1: Machine Learning Pipeline

Please ask any questions relevant to lesson 1 here.

Hi,

In practical 1, could you tell us what accuracy we should expect from the classifier (VGG16) on the test data after modifying it to classify 37 classes?

Thanks,
Madhu

Hi Madhu,
to be clear, I do not really care about performance for this first practical, but since you asked, I just ran my code and obtained: after 2 epochs, an accuracy of 0.7 on the test set (0.44 on the train set), and after 82 epochs, 0.88 on the test set / 0.9 on the train set.

What do others obtain?

Cheers,
Marc

Hi Marc,

After 80 epochs, I got the following performance:

Train: 0.0032 Loss and 0.9 Accuracy
Test: 0.0035 Loss and 0.89 Accuracy

Best,
Hari

Hi Marc,

I am wondering about the meaning of model.train() or model.train(True) and model.eval() or model.train(False). Are these just markers indicating that we are starting the training or testing process, or do they actually do something else?

Here is the relevant part of your code:

def train_model(model,dataloader,size,epochs=1,optimizer=None):
    model.train()
    ...

def test_model(model,dataloader,size):
    model.eval()
    ...

Thanks a lot and have a nice day,

Zheyu XIE

When you are training your network, i.e. updating the parameters, you need to be in train mode. For the test, you just do inference (i.e. you are not modifying your network), so you need to be in eval mode, because some layers behave differently in train and eval mode, in particular batchnorm and dropout.
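
A minimal sketch (not from the notebook) showing this difference on a dropout layer:

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

drop.train()     # train mode: each activation is zeroed with probability 0.5,
print(drop(x))   # and the survivors are scaled by 1/(1-p) = 2

drop.eval()      # eval mode: dropout is the identity
print(drop(x))   # the input passes through unchanged

Batchnorm layers show the same kind of mode dependence: in train mode they normalize with the statistics of the current batch (and update their running statistics), while in eval mode they use the stored running statistics.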

Hi Marc,

Thank you for your response. I have another question, please. The goal of the normalization is not quite clear to me: normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]). You explain it in the notebook:
As explained in the PyTorch doc, you will use a pretrained model. All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225].
My questions are:

  1. Is this normalization required by the VGG paper, or by all models pre-trained on the ImageNet dataset in PyTorch?
  2. And why do we need to do the normalization and load the data into the range [0, 1]? I mean, what problem does the normalization avoid, or what benefit does it bring?

Thanks a lot and have a nice day,

Zheyu XIE

  1. yes, required for all pre-trained models on ImageNet
  2. you first need to rescale your RGB channels to [0,1] and then normalize. The Normalize transform of PyTorch is not invariant to scaling, so this rescaling is required.
    More details here
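
A minimal sketch of the usual preprocessing pipeline (the Resize/CenterCrop sizes are the typical ImageNet ones and may differ from the notebook):

import torchvision.transforms as transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),   # PIL image with values in [0, 255] -> float tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # then shift and scale each RGB channel
                         std=[0.229, 0.224, 0.225]),
])

The order matters: Normalize assumes its input is already in [0, 1], which is exactly what ToTensor produces.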

Hi Marc,

For vgg16, I obtained: after 2 epochs, an accuracy of 0.6855 on the test (0.4370 on the train), and after 82 epochs, 0.8809 on the test (0.9005 on the train).

But I have a question, please: after all, the model knows the train set but not the test set, so I am wondering how to explain that the model can achieve better performance on the test set than on the train set, as we obtained after 2 epochs or in your video? Is it usual?

Thank you very much and have a nice day,

Zheyu XIE

Hi Marc,

For the “change of neural network model” part, with resnet34, after 30 epochs, I got an accuracy of 0.7923 on the test (0.8274 on the train). But for the “speed up the process” part, with resnet34, after 30 epochs, I got an accuracy of 0.0591 on the test (0.7052 on the train), and after 80 epochs, 0.0730 on the test (0.8565 on the train).

As you see, the accuracy on the test set is quite low for the “speed up the process” part compared to training the initial resnet34 model directly, but I can’t find the problem in my code. Could you help me, please?

My code:

model_resnet = models.resnet34(pretrained=True)
model_resnet = model_resnet.to(device)

def preconvfeat(dataloader):
    conv_features = []
    labels_list = []
    for data in dataloader:
        inputs,labels = data
        inputs = inputs.to(device)
        labels = labels.to(device)
        
        model_resnet.fc = nn.Identity()
        x = model_resnet(inputs)
        conv_features.extend(x.data.cpu().numpy())
        labels_list.extend(labels.data.cpu().numpy())
    conv_features = np.concatenate([[feat] for feat in conv_features])
    return (conv_features,labels_list)

conv_feat_train,labels_train = preconvfeat(loader_train)
conv_feat_valid,labels_valid = preconvfeat(loader_valid)

dtype=torch.float
datasetfeat_train = [[torch.from_numpy(f).type(dtype),torch.tensor(l).type(torch.long)] for (f,l) in zip(conv_feat_train,labels_train)]
datasetfeat_train = [(inputs.reshape(-1), classes) for [inputs,classes] in datasetfeat_train]
loaderfeat_train = torch.utils.data.DataLoader(datasetfeat_train, batch_size=128, shuffle=True)

model_resnet_lsm = nn.Sequential(nn.Linear(512, 37), nn.LogSoftmax(dim=1))
model_resnet_lsm = model_resnet_lsm.to(device)
optimizer_resnet_lsm = torch.optim.SGD(model_resnet_lsm[0].parameters(),lr = lr)

train_model(model_resnet_lsm,dataloader=loaderfeat_train,size=dset_sizes['trainval'],epochs=80,optimizer=optimizer_resnet_lsm)

datasetfeat_valid = [[torch.from_numpy(f).type(dtype),torch.tensor(l).type(torch.long)] for (f,l) in zip(conv_feat_valid,labels_valid)]
datasetfeat_valid = [(inputs.reshape(-1), classes) for [inputs,classes] in datasetfeat_valid]
loaderfeat_valid = torch.utils.data.DataLoader(datasetfeat_valid, batch_size=128, shuffle=False)

predictions, all_proba, all_classes = test_model(model_resnet_lsm,dataloader=loaderfeat_valid,size=dset_sizes['test'])

Thanks a lot and have a nice day,

Zheyu XIE

Hi,
remember that your network is minimizing the loss, not maximizing the accuracy. You should look at the loss more than the accuracy to understand what’s going on.
At the beginning of training, your loss decreases very rapidly, and since you are averaging loss and accuracy over one epoch, you can get a high average loss and low average accuracy simply because the loss is still high at the beginning of the epoch and decreasing. When you then compute the loss on the validation set, with the parameters obtained at the end of the epoch, you can get better results because of this effect.

In the case you mention, after 82 epochs for practical 1, you are probably already overfitting a lot; you should look at your loss. How does it compare between train and validation? If the loss is much lower on the train set, then your network is overfitting and you should be careful about the interpretation of your results…
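
One way to see the averaging effect is to compare the loss accumulated during the epoch (while the parameters are changing) with a fresh pass at the end of the epoch. A sketch, assuming model, loader, loss_fn, optimizer, and device are already defined:

import torch

def epoch_average_vs_final(model, loader, loss_fn, optimizer, device):
    # Average loss accumulated *while* the parameters are being updated.
    model.train()
    running_loss, n = 0.0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)
        n += inputs.size(0)

    # Fresh pass using the parameters as they are at the *end* of the epoch.
    model.eval()
    final_loss = 0.0
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            final_loss += loss_fn(model(inputs), targets).item() * inputs.size(0)

    # Early in training, the running average is typically the higher of the two.
    return running_loss / n, final_loss / n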

Hi,
in your preconvfeat function, before computing x, add model_resnet.eval()

It will probably fix your issue. The reason is that resnet uses batchnorm layers, and these layers behave differently in train and eval mode. To be sure your batchnorm layers always do the same computation, you need to be in eval mode.
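
A sketch of the fixed function, reusing the names from the code above (the eval() call is the actual fix; moving the fc replacement out of the loop and adding torch.no_grad() are just clean-ups):

model_resnet.fc = nn.Identity()   # replace the classifier head once, outside the loop
model_resnet.eval()               # the fix: batchnorm now always uses its running statistics

def preconvfeat(dataloader):
    conv_features = []
    labels_list = []
    with torch.no_grad():                 # no gradients needed to precompute features
        for inputs, labels in dataloader:
            inputs = inputs.to(device)
            x = model_resnet(inputs)
            conv_features.extend(x.cpu().numpy())
            labels_list.extend(labels.numpy())
    return np.stack(conv_features), labels_list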

Hi Marc,

Thank you very much for your response. I have just added model_resnet.eval() as you suggested, and it works now!

For resnet34, “speed up” part, after 80 epochs, I got an accuracy of 0.8724 on the test (0.9035 on the train).

By the way, could you tell me what you mean by “always doing the same computation” for batchnorm layers, please? Do you mean that eval mode makes these layers keep their initial behavior? If I do not specify any mode, will the behavior of these layers be random, i.e. different at every iteration of a for loop?

Thanks a lot and have a nice day,

Zheyu XIE

The statistics used by a batchnorm layer are fixed only in eval mode; in train mode, they vary from one batch to the next. In the practical, the resnet is used to precompute features, and we do not want those features to depend on which batch they were computed in.
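
A minimal sketch (not from the practical) showing that a batchnorm layer updates its running statistics in train mode but leaves them untouched in eval mode:

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(3)
x = torch.randn(8, 3)

bn.train()
bn(x)                    # train mode: normalizes with the *batch* statistics
print(bn.running_mean)   # and the running statistics have been updated

bn.eval()
before = bn.running_mean.clone()
bn(x)                    # eval mode: uses the stored running statistics
print(torch.equal(before, bn.running_mean))   # True: nothing was updated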

Hi Marc,

Thank you very much for your response. This is clear to me now, thanks a lot!

Thanks a lot and have a nice day,

Zheyu XIE

Hi Marc,

I have read your response again, but I am sorry, I am still confused about underfitting and overfitting… I hope you can help me, please.

  1. If we have a high loss, will we certainly have a low accuracy?
  2. If I understand well, you mean that at the beginning of an epoch, in the first few batches, the loss is high, but later, as we keep updating the parameters of our model, the batches achieve better and better accuracy and lower and lower loss; so when we average over one epoch, we can get a high loss and low accuracy since the first batches strongly influence the result? Is that underfitting?
  3. How large does the difference have to be before we can say that the loss is much lower on the train set? I mean, how do we measure it?
  4. If the model is overfitting or underfitting, does it make no sense to interpret the results?

Thanks a lot and have a nice day,

Zheyu XIE

  1. If we have a high loss, will we certainly have a low accuracy?
    yes
  2. If I understand well, you mean that at the beginning of an epoch, in the first few batches, the loss is high, but later, as we keep updating the parameters of our model, the batches achieve better and better accuracy and lower and lower loss; so when we average over one epoch, we can get a high loss and low accuracy since the first batches strongly influence the result?
    yes
    Is that underfitting?
    not really, underfitting is when you have a model that is too simplistic to fit your data. In the lesson, we looked at the case where you fit a polynomial of degree 3 with a line (see the sketch after this list). In a deep learning setting, when you stop the learning very early, you also have a ‘simple’ model and you will underfit.
  3. How large does the difference have to be before we can say that the loss is much lower on the train set? I mean, how do we measure it?
    not sure I understand your question…
  4. If the model is overfitting or underfitting, does it make no sense to interpret the results?
    well, you want to avoid these regimes!
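
A minimal sketch of the degree-3 vs. line example from the lesson (hypothetical data, just to make the underfitting visible):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = x**3 - x + 0.3 * rng.standard_normal(50)   # cubic ground truth plus noise

for degree in (1, 3):
    coeffs = np.polyfit(x, y, degree)
    mse = np.mean((y - np.polyval(coeffs, x))**2)
    print(degree, mse)   # the line (degree 1) has a much larger error: it underfits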

Hi Marc,

Thank you very much for your response!

My question concerns your earlier response: when can we say that the loss is much lower on the train set (when the difference between the losses on the train set and the test set is 0.01, 0.1, or 1)? For example, if the loss on the train set is 0.8 and the loss on the test set is 0.9 (so the difference is 0.1), could we say that the loss is much lower on the train set, so it’s overfitting?

Another question, please:

Sorry Marc, I am not sure I understand your response about “regimes”; do you mean that I should avoid both underfitting and overfitting?

Hope that my explanation is clear this time.

Thanks a lot Marc, and have a good night,

Zheyu XIE

Hi Marc,

I am still wondering why we do the normalization instead of directly using the raw data in the range [0, 255]. Could you clarify this for me, please?

Thanks a lot and have a nice day,

Zheyu XIE