torchvision.transforms.Normalize(mean, std)
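Pixel values in 8-bit images typically range from 0 to 255. CNNs are trained with gradient descent, and the gradient of the loss with respect to a weight scales with the magnitude of the input that weight multiplies. Raw inputs in the hundreds therefore produce large gradient updates, which force a small learning rate and can slow convergence. Large inputs can also push saturating activations such as sigmoid or tanh into their flat regions (vanishing gradients) or produce very large updates (exploding gradients). Normalize counters this by shifting and rescaling each channel with the given per-channel mean and std: output[channel] = (input[channel] - mean[channel]) / std[channel].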
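The sketch below reimplements the two steps in plain Python so the arithmetic is visible without installing PyTorch. It assumes images are nested lists indexed as [channel][row][col]. The function names to_unit_range, normalize and grad_wrt_weight are hypothetical helpers for illustration, not torchvision APIs. to_unit_range mirrors what transforms.ToTensor does to uint8 input (divide by 255), normalize applies the per-channel formula above, and grad_wrt_weight shows, for a single linear unit with squared loss, that the gradient grows linearly with the input value.

```python
from typing import List, Sequence

# Hypothetical image layout for this sketch: [channel][row][col]
Image = List[List[List[float]]]


def to_unit_range(img_u8: Image) -> Image:
    """Scale 8-bit pixel values in [0, 255] to floats in [0, 1]."""
    return [[[p / 255.0 for p in row] for row in ch] for ch in img_u8]


def normalize(img: Image, mean: Sequence[float], std: Sequence[float]) -> Image:
    """Apply (x - mean[c]) / std[c] to every pixel of channel c."""
    if not (len(img) == len(mean) == len(std)):
        raise ValueError("need exactly one mean and one std per channel")
    return [
        [[(p - m) / s for p in row] for row in ch]
        for ch, m, s in zip(img, mean, std)
    ]


def grad_wrt_weight(x: float, w: float, target: float) -> float:
    """dL/dw for L = (w * x - target) ** 2; note the factor of x."""
    return 2.0 * x * (w * x - target)


if __name__ == "__main__":
    # A tiny 3-channel, 1x2 image with the darkest and brightest 8-bit values.
    img_u8 = [[[0, 255]], [[0, 255]], [[0, 255]]]
    unit = to_unit_range(img_u8)
    # mean = std = 0.5 is an illustrative choice that maps [0, 1] to [-1, 1].
    norm = normalize(unit, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
    print(norm)  # [[[-1.0, 1.0]], [[-1.0, 1.0]], [[-1.0, 1.0]]]

    # Same weight and target, raw vs. rescaled input: the gradient is 255x larger.
    print(grad_wrt_weight(255.0, 0.0, 1.0))  # -510.0
    print(grad_wrt_weight(1.0, 0.0, 1.0))    # -2.0
```

In torchvision itself the equivalent pipeline is usually written as transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean, std)]). For pretrained ImageNet models the commonly used statistics are mean=(0.485, 0.456, 0.406) and std=(0.229, 0.224, 0.225); for other datasets you would typically compute the per-channel mean and std from your own training set.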