Why do we use the unsqueeze() function in image processing?
An answer to this question on Stack Overflow.
Question
I was working on a guided project related to image processing. During the image preprocessing step, the instructor used the unsqueeze(0) function to set up the batch size. I would like to know what happens after the batch size is changed. The code is given below for your reference.
I would be very thankful for a quick response.
from PIL import Image
from torchvision import transforms as T

def preprocess(img_path, max_size=500):
    image = Image.open(img_path).convert('RGB')

    # Cap the resize target at max_size
    if max(image.size) > max_size:
        size = max_size
    else:
        size = max(image.size)

    img_transform = T.Compose([
        T.Resize(size),
        T.ToTensor(),  # PIL image -> float tensor of shape [3, H, W]
        T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
    ])

    image = img_transform(image)
    image = image.unsqueeze(0)  # [3, H, W] -> [1, 3, H, W]
    return image
Answer
After this line:
image = Image.open(img_path).convert('RGB')
image is a PIL image, and once the transform has converted it to a tensor it is a 3D matrix. That information is laid out with dimensions [Channel, Row, Intensity] (what PyTorch calls [C, H, W]), so you have:
- an R matrix containing many rows each of which contains the intensity values of the Red channel
- a G matrix containing many rows each of which contains the intensity values of the Green channel
- a B matrix containing many rows each of which contains the intensity values of the Blue channel
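As a small illustration of that layout, the sketch below uses a random tensor as a stand-in for a converted image. The 4x5 size and the variable names are made up for the example; a real image would simply have larger dimensions.

```python
import torch

# Stand-in for the tensor produced by T.ToTensor():
# 3 channels, 4 rows, 5 intensity values per row (sizes are arbitrary).
image = torch.rand(3, 4, 5)

# Indexing the first dimension selects one channel matrix at a time.
red, green, blue = image[0], image[1], image[2]

print(image.shape)  # torch.Size([3, 4, 5])
print(red.shape)    # torch.Size([4, 5])
```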
Now, in machine learning, when we are training a model we are very rarely interested in having only one example. We train on batches of examples. A batch is simply a set of images stacked on top of each other, so we need to go from [Channel, Row, Intensity] to [Batch, Channel, Row, Intensity].
This is what unsqueeze(0) does: it adds a new zeroth dimension of size 1, so a single image becomes a batch of one that can be stacked with other images.
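To make the shape change concrete, here is a minimal sketch using random tensors in place of real images. The sizes and variable names are assumptions chosen for illustration, and torch.cat is just one possible way of joining batches.

```python
import torch

# One "image": 3 channels, 4 rows, 5 values per row (arbitrary sizes).
image = torch.rand(3, 4, 5)

# unsqueeze(0) inserts a new dimension of size 1 at position 0.
batched = image.unsqueeze(0)
print(batched.shape)  # torch.Size([1, 3, 4, 5])

# A second image with the same shape, also turned into a batch of one.
other = torch.rand(3, 4, 5).unsqueeze(0)

# With a batch dimension in place, images can be joined along it.
batch = torch.cat([batched, other], dim=0)
print(batch.shape)  # torch.Size([2, 3, 4, 5])

# squeeze(0) removes the size-1 batch dimension again.
restored = batched.squeeze(0)
print(restored.shape)  # torch.Size([3, 4, 5])
```

The pixel data itself is unchanged by unsqueeze(0); only the shape gains a leading dimension, which is what most PyTorch models expect as input even for a single image.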