Why do we use the unsqueeze() function in image processing?

2021-08-17

An answer to this question on Stack Overflow.

Question

I was working on a guided project related to image processing. During the image processing, the instructor used the unsqueeze(0) function to set up the batch size. I would like to know what happens after changing the batch size. The code is given below for your reference.

I would be very thankful for a quick response.

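from PIL import Image
from torchvision import transforms as T

def preprocess(img_path, max_size=500):
  # Load the image and force three colour channels
  image = Image.open(img_path).convert('RGB')
  # Use the longest side of the image as the target size, capped at max_size
  size = min(max(image.size), max_size)
  img_transform = T.Compose([
    # An int argument resizes the shorter side to `size`, keeping the aspect ratio
    T.Resize(size),
    # PIL image -> float tensor of shape [3, H, W] with values in [0, 1]
    T.ToTensor(),
    # Per-channel mean and standard deviation (the ImageNet statistics)
    T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
  ])
  image = img_transform(image)  # shape [3, H, W]
  image = image.unsqueeze(0)    # shape [1, 3, H, W]
  return image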

Answer

After this line:

image = Image.open(img_path).convert('RGB')

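image is a PIL image, and once T.ToTensor() has run inside img_transform it becomes a 3D tensor. That tensor is laid out with dimensions [Channel, Row, Intensity], that is, channel first, then row, then the intensity values along each row, so you have: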

an R matrix containing many rows each of which contains the intensity values of the Red channel
a G matrix containing many rows each of which contains the intensity values of the Green channel
a B matrix containing many rows each of which contains the intensity values of the Blue channel
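
As a minimal sketch of that layout, assuming PyTorch and using a tiny made-up 2x3 "image" in place of a real photo (the values and the names `image`, `red`, and `first_row` are illustrative only), indexing the first dimension picks out one colour matrix:

```python
import torch

# Hypothetical 2-row, 3-column RGB image, channel-first: shape [3, 2, 3]
image = torch.tensor([
    [[0.9, 0.8, 0.7], [0.6, 0.5, 0.4]],  # R matrix
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],  # G matrix
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],  # B matrix
])

red = image[0]      # the R matrix, shape [2, 3]
first_row = red[0]  # red intensities along the first row, shape [3]
print(image.shape, red.shape, first_row.shape)
```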

Now, in machine learning, when we are training a model we are very rarely interested in having only one example. We train on batches of examples. A batch is simply a set of images stacked on top of each other, so we need to go from [Channel, Row, Intensity] to [Batch, Channel, Row, Intensity].
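
To make the stacking concrete, here is a small sketch assuming PyTorch, with random tensors standing in for real images (the sizes 3, 4, and 5 are arbitrary choices for illustration):

```python
import torch

# Three stand-in "images", each channel-first with shape [3, 4, 5]
images = [torch.rand(3, 4, 5) for _ in range(3)]

# torch.stack creates a new leading dimension and places the images along it
batch = torch.stack(images, dim=0)
print(batch.shape)  # torch.Size([3, 3, 4, 5])
```

Stacking like this only works when every image has the same shape, which is one reason preprocessing usually resizes images first.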

This is what unsqueeze(0) does: it adds a new, zeroth dimension of size 1, turning a single image into a batch of one so that it can be stacked with other images.
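
The effect is easiest to see in the shapes. The following is a sketch assuming PyTorch, with a random tensor standing in for the output of img_transform (the variable names are illustrative, not from the original code):

```python
import torch

image = torch.rand(3, 4, 5)    # stand-in for one preprocessed image
batched = image.unsqueeze(0)   # new dimension at position 0
print(image.shape)             # torch.Size([3, 4, 5])
print(batched.shape)           # torch.Size([1, 3, 4, 5])

# Indexing with None is an equivalent way to add the dimension
same = image[None]

# squeeze(0) undoes the operation
restored = batched.squeeze(0)

# Two batches of one can now be joined along the batch dimension
other = torch.rand(3, 4, 5).unsqueeze(0)
pair = torch.cat([batched, other], dim=0)
print(pair.shape)              # torch.Size([2, 3, 4, 5])
```

No pixel data changes here; only the shape does. Most PyTorch models expect the batch dimension to be present even for a single image, which is why preprocessing functions often end with unsqueeze(0).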