Parameter Sharing Across Spatial Locations
Convolutional layers share the same filter weights across all spatial locations, under the assumption that input statistics are stationary: a feature worth detecting in one place is worth detecting everywhere. This sharing dramatically reduces the parameter count compared to fully connected or locally connected layers.
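To get a rough sense of scale, the sketch below compares one layer's parameter count under the three schemes; the sizes (a 32×32×3 input, 5×5 filters, 8 output maps) are assumptions chosen for illustration.

```python
# Parameter counts for one layer over an assumed 32x32x3 input,
# producing 8 feature maps with 5x5 filters (stride 1, no padding).
H, W, C = 32, 32, 3              # input height, width, channels
F, K = 5, 8                      # filter size, number of filters
OH, OW = H - F + 1, W - F + 1    # 28x28 output locations

conv = (F * F * C + 1) * K               # shared filters + biases
local = (F * F * C + 1) * K * OH * OW    # locally connected: separate weights per location
fc = (H * W * C + 1) * K * OH * OW       # fully connected to the whole input

print(conv, local, fc)  # 608 vs 476,672 vs 19,273,856
```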
Key Components
- Stride: Determines the step size between filter applications
- Padding: Usually zero-padding at borders to maintain spatial dimensions
- Dilation: Controls the spacing between kernel elements
- Output Channels: Number of feature maps produced
- Kernel Size: Spatial dimensions of the convolutional filter
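These five hyperparameters map directly onto the arguments of a standard convolution layer. A minimal PyTorch sketch (the specific sizes are assumptions for illustration):

```python
import torch
import torch.nn as nn

# One convolutional layer exposing all five hyperparameters.
conv = nn.Conv2d(
    in_channels=3,    # input feature maps
    out_channels=16,  # K: number of filters / output feature maps
    kernel_size=3,    # F: spatial extent of each filter
    stride=1,         # step size between filter applications
    padding=1,        # zero-padding; preserves 32x32 here since F=3
    dilation=1,       # spacing between kernel elements
)

x = torch.randn(1, 3, 32, 32)  # (batch, channels, height, width)
print(conv(x).shape)           # torch.Size([1, 16, 32, 32])
```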
Filter Design Principles
Weights in a convolutional layer span each filter's full spatial extent and all input channels, so a filter of size F×F over C input channels holds F·F·C weights. Smaller filters are generally preferred for several reasons:
- 1×1 convolutions adjust the number of feature map channels by taking per-pixel linear combinations across the input channels (see the sketch after this list)
- Each filter combines information across all input channels to produce a single output feature map; K filters produce K maps
- The number of filters (K) typically follows powers of 2 (e.g., 32, 64, 128)
- Stacking small-filter layers grows the effective receptive field linearly with depth rather than exponentially
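To make the 1×1 case concrete: it leaves the spatial grid untouched and only remixes channels. A minimal PyTorch sketch, with an assumed 64-to-16 channel reduction:

```python
import torch
import torch.nn as nn

# A 1x1 convolution: a per-pixel linear combination across channels.
reduce = nn.Conv2d(in_channels=64, out_channels=16, kernel_size=1)

x = torch.randn(1, 64, 56, 56)
y = reduce(x)
print(y.shape)  # torch.Size([1, 16, 56, 56]) -- channels shrink, spatial size unchanged
```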
Small vs Large Filters
Multiple layers of small filters (e.g., 3×3) are preferred over single layers with large filters because they:
- Require fewer parameters
- Introduce more non-linearity
- Reduce computational cost
Following VGG's success, stacks of 3×3 filters became standard practice, replacing larger filters such as AlexNet's 11×11 or ZFNet's 7×7. The 1×1 convolution is used in bottleneck layers (as in ResNet) for dimensionality reduction.
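The parameter argument can be checked directly: with C input and output channels (biases ignored), three stacked 3×3 layers cover the same 7×7 receptive field as a single 7×7 layer while costing 27C² instead of 49C² weights, and they add two extra non-linearities. A quick sketch with an assumed C = 256:

```python
C = 256                          # assumed channel count (input = output)
stacked = 3 * (3 * 3 * C * C)    # three 3x3 layers: 27 * C^2 weights
single = 7 * 7 * C * C           # one 7x7 layer: 49 * C^2 weights
print(stacked, single)           # 1,769,472 vs 3,211,264
```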
Parameter Calculation
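A convolutional layer with K filters of size F×F over C input channels has (F·F·C + 1)·K parameters, the +1 accounting for each filter's bias. A quick check with assumed sizes:

```python
F, C, K = 3, 64, 128             # filter size, input channels, number of filters
params = (F * F * C + 1) * K     # weights plus one bias per filter
print(params)                    # 73,856
```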
Output Size Calculation
Stride determines how many pixels the filter shifts between applications. Together with padding, it fixes the output spatial size:

output size = (N - F + 2P) / S + 1

where N is the input size, F is the filter size, P is the padding size, and S is the stride.
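A direct translation into code (the example sizes are assumptions; the second call reproduces AlexNet's first layer, where a 227-pixel input with 11×11 filters at stride 4 yields 55):

```python
def conv_output_size(n, f, p, s):
    """Spatial output size of a convolution: (N - F + 2P) / S + 1."""
    assert (n - f + 2 * p) % s == 0, "filter does not tile the input evenly"
    return (n - f + 2 * p) // s + 1

print(conv_output_size(n=32, f=3, p=1, s=1))    # 32: 3x3 filter with padding 1 preserves size
print(conv_output_size(n=227, f=11, p=0, s=4))  # 55: AlexNet's first layer
```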