inception_v3 requires an input of (299, 299), while the other models require an input of (224, 224). Due to the adaptive pooling used in some models, they can run on inputs of varying size without throwing errors (but the results are usually not correct). You have to resize/crop an image to the right input size (and then apply the other necessary transformations, e.g., to_tensor and Normalize) before feeding it to a pretrained model.
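As a concrete illustration, here is a minimal preprocessing sketch using torchvision.transforms. The mean/std passed to Normalize are the standard ImageNet statistics used by the torchvision pretrained models; the image path dog.jpg is a made-up placeholder.

In [ ]:
import torch
from PIL import Image
from torchvision import models, transforms

# Standard preprocessing for the (224, 224) models; use 299 instead of 224
# for inception_v3.
preprocess = transforms.Compose([
    transforms.Resize(256),            # resize the shorter side to 256
    transforms.CenterCrop(224),        # crop to the expected (224, 224)
    transforms.ToTensor(),             # PIL image -> float tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(pretrained=True).eval()
img = Image.open("dog.jpg")            # placeholder image path
batch = preprocess(img).unsqueeze(0)   # add a batch dimension: (1, 3, 224, 224)
with torch.no_grad():
    logits = model(batch)
print(logits.argmax(dim=1))            # predicted ImageNet class index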
FINETUNING TORCHVISION MODELS is a very detailed tutorial on how to finetune pretrained models in torchvision.
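For quick orientation, a minimal finetuning sketch in the same spirit: freeze the pretrained backbone and train only a replaced classifier head. num_classes is an assumed placeholder for your own dataset.

In [ ]:
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10                        # placeholder: your dataset's class count

model = models.resnet18(pretrained=True)
for param in model.parameters():        # freeze all pretrained weights
    param.requires_grad = False

# Replace the classification head; only this new layer will be trained.
model.fc = nn.Linear(model.fc.in_features, num_classes)

optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)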
TORCHVISION OBJECT DETECTION FINETUNING TUTORIAL
Building your own object detector — PyTorch vs TensorFlow and how to even get started?