inception_v3 requires an input of (299, 299), while the other models require an input of (224, 224). Due to the adaptive pooling used in some models, they can run on inputs of varying size without throwing errors (but the results are usually not correct). You have to resize/crop an image to the right input size (and then apply the other necessary transformations, e.g., to_tensor and Normalize) before feeding it to a pretrained model.
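As a concrete illustration, here is a minimal preprocessing sketch using torchvision.transforms. The mean/std passed to Normalize are the standard ImageNet statistics used by the torchvision pretrained models; the image path dog.jpg is a made-up placeholder.

In [ ]:
import torch
from PIL import Image
from torchvision import models, transforms

# Standard preprocessing for the (224, 224) models; use 299 instead of 224
# for inception_v3.
preprocess = transforms.Compose([
    transforms.Resize(256),            # resize the shorter side to 256
    transforms.CenterCrop(224),        # crop to the expected (224, 224)
    transforms.ToTensor(),             # PIL image -> float tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(pretrained=True).eval()
img = Image.open("dog.jpg")            # placeholder image path
batch = preprocess(img).unsqueeze(0)   # add a batch dimension: (1, 3, 224, 224)
with torch.no_grad():
    logits = model(batch)
print(logits.argmax(dim=1))            # predicted ImageNet class index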
FINETUNING TORCHVISION MODELS is a very detailed tutorial on how to finetune pretrained models in torchvision.
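For quick orientation, a minimal finetuning sketch in the same spirit: freeze the pretrained backbone and train only a replaced classifier head. num_classes is an assumed placeholder for your own dataset.

In [ ]:
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10                        # placeholder: your dataset's class count

model = models.resnet18(pretrained=True)
for param in model.parameters():        # freeze all pretrained weights
    param.requires_grad = False

# Replace the classification head; only this new layer will be trained.
model.fc = nn.Linear(model.fc.in_features, num_classes)

optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)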
TORCHVISION OBJECT DETECTION FINETUNING TUTORIAL
Building your own object detector — PyTorch vs TensorFlow and how to even get started?