I am using a ResNet152 model from PyTorch. I'd like to strip off the last FC layer from the model. Here's my code:
from torchvision import datasets, transforms, models
model = models.resnet152(pretrained=True)
print(model)
When I print the model, the last few lines look like this:
    (2): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
  )
  (avgpool): AvgPool2d(kernel_size=7, stride=1, padding=0)
  (fc): Linear(in_features=2048, out_features=1000, bias=True)
)
I want to remove that last fc layer from the model.
I found an answer here on SO (How to convert pretrained FC layers to CONV layers in Pytorch), where mexmex seems to provide the answer I'm looking for:
list(model.modules()) # to inspect the modules of your model
my_model = nn.Sequential(*list(model.modules())[:-1]) # strips off last linear layer
So I added those lines to my code like this:
import torch.nn as nn  # needed for nn.Sequential

model = models.resnet152(pretrained=True)
list(model.modules()) # to inspect the modules of your model
my_model = nn.Sequential(*list(model.modules())[:-1]) # strips off last linear layer
print(my_model)
But this code doesn't work as advertised -- at least not for me. The rest of this post explains in detail why that answer doesn't work, so this question doesn't get closed as a duplicate.
First, the printed model is nearly 5x larger than before. I see the same model as before, followed by what appears to be a flattened repeat of every layer in it:
    (2): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
  )
  (avgpool): AvgPool2d(kernel_size=7, stride=1, padding=0)
  (fc): Linear(in_features=2048, out_features=1000, bias=True)
)
(1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU(inplace)
(4): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(5): Sequential(
. . . this goes on for ~1600 more lines . . .
(415): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(416): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(417): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(418): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(419): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(420): ReLU(inplace)
(421): AvgPool2d(kernel_size=7, stride=1, padding=0)
)
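I think I see where the blow-up comes from: per the docs, model.modules() iterates over the module itself and then every descendant recursively, while model.children() yields only the immediate submodules. A quick check in the same session (the counts are from my run -- note that 423 total modules lines up with the (421): AvgPool2d entry above, once you account for the whole ResNet sitting at index (0) and the stripped-off fc):

print(len(list(model.modules())))   # 423 -- the ResNet itself plus every nested submodule
print(len(list(model.children())))  # 10 -- conv1, bn1, relu, maxpool, layer1-4, avgpool, fc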
Second, the fc layer is still there -- and the Conv2d layer after it looks just like the first layer of ResNet152.
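That would also follow from modules() being recursive: its first element is the whole model itself, and everything after it is just the model's flattened internals. Checking that assumption:

mods = list(model.modules())
print(mods[0] is model)  # True -- entry (0) of my_model is the entire ResNet, fc included
print(mods[1])           # Conv2d(3, 64, kernel_size=(7, 7), ...) -- the stem conv, i.e. the (1) above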
Third, if I try to invoke my_model.forward(), PyTorch complains about a size mismatch. It expects size [1, 3, 224, 224], but the input was [1, 1000]. So it looks like a copy of the entire model (minus the fc layer) is getting appended to the original model.
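Here is a minimal repro of that error, using a dummy ImageNet-sized input (assuming the my_model built above):

import torch
x = torch.randn(1, 3, 224, 224)  # standard 224x224 RGB input
out = my_model(x)                # fails: entry (0), the full ResNet, emits [1, 1000],
                                 # which entry (1), Conv2d(3, 64, ...), cannot consume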
Bottom line, the only answer I found on SO doesn't actually work.
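For what it's worth, here is what I suspect the answer actually meant, with children() swapped in for modules() -- but this is just a guess on my part, and note that the output would then be the raw [1, 2048, 1, 1] avgpool tensor rather than class scores:

my_model = nn.Sequential(*list(model.children())[:-1])  # drop only the top-level fc
out = my_model(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 2048, 1, 1]) -- still needs flattening downstream

Is that what was intended, or is there a cleaner way to strip off the fc layer?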