Parameter Sharing Across Spatial Locations
Convolutional layers share the same filter weights across all spatial locations, under the assumption that input statistics are stationary: a feature worth detecting in one place is worth detecting everywhere. This sharing dramatically reduces the parameter count compared to fully connected or locally connected layers.
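To get a rough sense of scale, the sketch below compares one layer's parameter count under the three schemes; the sizes (a 32×32×3 input, 5×5 filters, 8 output maps) are assumptions chosen for illustration.

```python
# Parameter counts for one layer over an assumed 32x32x3 input,
# producing 8 feature maps with 5x5 filters (stride 1, no padding).
H, W, C = 32, 32, 3              # input height, width, channels
F, K = 5, 8                      # filter size, number of filters
OH, OW = H - F + 1, W - F + 1    # 28x28 output locations

conv = (F * F * C + 1) * K               # shared filters + biases
local = (F * F * C + 1) * K * OH * OW    # locally connected: separate weights per location
fc = (H * W * C + 1) * K * OH * OW       # fully connected to the whole input

print(conv, local, fc)  # 608 vs 476,672 vs 19,273,856
```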
Key Components
- Stride: Determines the step size between filter applications
- Padding: Usually zero-padding at borders to maintain spatial dimensions
- Dilation: Controls the spacing between kernel elements
- Output Channels: Number of feature maps produced
- Kernel Size: Spatial dimensions of the convolutional filter
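These five hyperparameters map directly onto the arguments of a standard convolution layer. A minimal PyTorch sketch (the specific sizes are assumptions for illustration):

```python
import torch
import torch.nn as nn

# One convolutional layer exposing all five hyperparameters.
conv = nn.Conv2d(
    in_channels=3,    # input feature maps
    out_channels=16,  # K: number of filters / output feature maps
    kernel_size=3,    # F: spatial extent of each filter
    stride=1,         # step size between filter applications
    padding=1,        # zero-padding; preserves 32x32 here since F=3
    dilation=1,       # spacing between kernel elements
)

x = torch.randn(1, 3, 32, 32)  # (batch, channels, height, width)
print(conv(x).shape)           # torch.Size([1, 16, 32, 32])
```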
Filter Design Principles
Weights in a convolutional layer span each filter's full spatial extent and all input channels, so a filter of size F×F over C input channels holds F·F·C weights. Smaller filters are generally preferred for several reasons:
- 1×1 convolutions adjust the number of feature map channels by taking per-pixel linear combinations across the input channels (see the sketch after this list)
- Each filter combines information across all input channels to produce a single output feature map; K filters produce K maps
- The number of filters (K) typically follows powers of 2 (e.g., 32, 64, 128)
- Stacking small-filter layers grows the effective receptive field linearly with depth rather than exponentially
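To make the 1×1 case concrete: it leaves the spatial grid untouched and only remixes channels. A minimal PyTorch sketch, with an assumed 64-to-16 channel reduction:

```python
import torch
import torch.nn as nn

# A 1x1 convolution: a per-pixel linear combination across channels.
reduce = nn.Conv2d(in_channels=64, out_channels=16, kernel_size=1)

x = torch.randn(1, 64, 56, 56)
y = reduce(x)
print(y.shape)  # torch.Size([1, 16, 56, 56]) -- channels shrink, spatial size unchanged
```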
Small vs Large Filters
Multiple layers of small filters (e.g., 3×3) are preferred over single layers with large filters because they:
- Require fewer parameters
- Introduce more non-linearity
- Reduce computational cost
Following VGG's success, stacks of 3×3 filters became standard practice, replacing larger filters such as AlexNet's 11×11 or ZFNet's 7×7. The 1×1 convolution is used in bottleneck layers (as in ResNet) for dimensionality reduction.
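The parameter argument can be checked directly: with C input and output channels (biases ignored), three stacked 3×3 layers cover the same 7×7 receptive field as a single 7×7 layer while costing 27C² instead of 49C² weights, and they add two extra non-linearities. A quick sketch with an assumed C = 256:

```python
C = 256                          # assumed channel count (input = output)
stacked = 3 * (3 * 3 * C * C)    # three 3x3 layers: 27 * C^2 weights
single = 7 * 7 * C * C           # one 7x7 layer: 49 * C^2 weights
print(stacked, single)           # 1,769,472 vs 3,211,264
```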
Parameter Calculation
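A convolutional layer with K filters of size F×F over C input channels has (F·F·C + 1)·K parameters, the +1 accounting for each filter's bias. A quick check with assumed sizes:

```python
F, C, K = 3, 64, 128             # filter size, input channels, number of filters
params = (F * F * C + 1) * K     # weights plus one bias per filter
print(params)                    # 73,856
```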
Output Size Calculation
Stride determines how many pixels the filter shifts between applications. Together with padding, it fixes the output spatial size:

output size = (N - F + 2P) / S + 1

where N is the input size, F is the filter size, P is the padding size, and S is the stride.
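A direct translation into code (the example sizes are assumptions; the second call reproduces AlexNet's first layer, where a 227-pixel input with 11×11 filters at stride 4 yields 55):

```python
def conv_output_size(n, f, p, s):
    """Spatial output size of a convolution: (N - F + 2P) / S + 1."""
    assert (n - f + 2 * p) % s == 0, "filter does not tile the input evenly"
    return (n - f + 2 * p) // s + 1

print(conv_output_size(n=32, f=3, p=1, s=1))    # 32: 3x3 filter with padding 1 preserves size
print(conv_output_size(n=227, f=11, p=0, s=4))  # 55: AlexNet's first layer
```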