14.10. Transposed Convolution
The CNN layers we have seen so far, such as convolutional layers
(Section 7.2) and pooling layers
(Section 7.5), typically reduce (downsample) the spatial
dimensions (height and width) of the input, or keep them unchanged. In
semantic segmentation, which classifies every pixel, it is
convenient if the input and output have the same spatial dimensions.
For example, the channel dimension at one output pixel can then hold
the classification results for the input pixel at the same spatial
position.
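
To make these two points concrete, here is a minimal sketch in plain Python rather than in a deep learning framework. The helper names `output_size` and `pixel_labels` are hypothetical and used only for illustration. We assume padding is applied to each side and that the output size is rounded down, which is the usual convention. The first helper shows how convolution and pooling windows shrink or preserve a spatial dimension, and the second shows how the channel dimension of an output with the same height and width as the input can hold per-pixel class scores.

```python
def output_size(n, k, padding=0, stride=1):
    """Spatial size after sliding a window of size k over an input of size n.

    Assumes `padding` is added to each side and the result is floored.
    """
    return (n + 2 * padding - k) // stride + 1


def pixel_labels(scores):
    """Pick the highest-scoring class at every spatial position.

    `scores[c][i][j]` is the score of class c at pixel (i, j), so the
    class (channel) dimension comes first.
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][i][j])
         for j in range(width)]
        for i in range(height)
    ]


print(output_size(8, 3))             # 3x3 convolution, no padding -> 6
print(output_size(8, 3, padding=1))  # 3x3 convolution, padding 1 -> 8
print(output_size(8, 2, stride=2))   # 2x2 pooling, stride 2 -> 4

# Made-up scores for 2 classes on a 2x2 image (illustrative values only).
scores = [
    [[0.9, 0.2], [0.4, 0.1]],  # class 0
    [[0.1, 0.8], [0.6, 0.9]],  # class 1
]
print(pixel_labels(scores))          # -> [[0, 1], [1, 1]]
```

In this sketch none of the window settings increases the spatial size, which is why a separate operation is needed whenever intermediate feature maps have been downsampled but the output must match the input resolution.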
https://d2l.ai/chapter_computer-vision/transposed-conv.html