Share the same parameters across different locations (this assumes the input statistics are stationary)
- stride
- padding
- dilation
- out_channels
- kernel_size
In practice, the border is usually zero-padded
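A minimal sketch of how the hyperparameters listed above interact, assuming a single input channel and a single filter stored as plain nested lists. The function name `conv2d_single` is hypothetical, not taken from any library.

```python
def conv2d_single(image, kernel, stride=1, padding=0, dilation=1):
    """Slide one 2D kernel over one 2D channel (both lists of lists)."""
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])

    # Zero-pad the border on all four sides.
    padded = [[0.0] * (n_cols + 2 * padding) for _ in range(n_rows + 2 * padding)]
    for r in range(n_rows):
        for c in range(n_cols):
            padded[r + padding][c + padding] = image[r][c]

    # Dilation spreads the kernel taps apart, enlarging the area they cover.
    span_r = dilation * (k_rows - 1) + 1
    span_c = dilation * (k_cols - 1) + 1
    out_rows = (len(padded) - span_r) // stride + 1
    out_cols = (len(padded[0]) - span_c) // stride + 1

    out = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            acc = 0.0
            # The same kernel weights are reused at every (i, j) location.
            for u in range(k_rows):
                for v in range(k_cols):
                    acc += kernel[u][v] * padded[i * stride + u * dilation][j * stride + v * dilation]
            row.append(acc)
        out.append(row)
    return out
```

With a 4x4 image of ones, a 3x3 kernel of ones and `padding=1`, the output stays 4x4; corner values are smaller than interior values because part of the kernel lands on the zero padding.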
The number of parameters is much smaller than in a fully connected or locally connected layer
Each filter has as many weights as the filter size across all input channels (F x F x number of input channels)
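To make the comparison concrete, here is a rough parameter count for the three layer types. The function names, the 32x32x3 input and the 16 filters are illustrative assumptions, and one bias per filter (or per output unit for the non-shared layers) is assumed.

```python
def conv_params(in_channels, out_channels, kernel_size, bias=True):
    # One filter holds kernel_size * kernel_size * in_channels weights.
    per_filter = kernel_size * kernel_size * in_channels
    return out_channels * (per_filter + (1 if bias else 0))


def locally_connected_params(in_channels, out_channels, kernel_size, out_h, out_w, bias=True):
    # Same local window as a convolution, but no sharing across locations.
    per_unit = kernel_size * kernel_size * in_channels + (1 if bias else 0)
    return out_h * out_w * out_channels * per_unit


def fully_connected_params(n_inputs, n_outputs, bias=True):
    # Every output unit sees every input value.
    return n_outputs * (n_inputs + (1 if bias else 0))


# Hypothetical layer: 32x32x3 input, 16 filters of size 3x3, output 32x32x16.
conv = conv_params(3, 16, 3)
local = locally_connected_params(3, 16, 3, 32, 32)
dense = fully_connected_params(32 * 32 * 3, 32 * 32 * 16)
print(conv, local, dense)
```

Under these assumptions the convolution needs 448 parameters, the locally connected layer 458,752, and the fully connected layer about 50 million.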
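Using small filters is generally better
A 1x1 filter is also valid: it can reduce or increase the number of output channels of the feature map, weighting each input channel differently
The per-channel filter responses are summed to give the final filter output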
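A small sketch of the 1x1 case, where summing over channels is all the filter does: each output pixel is a weighted sum of the input channels at that same pixel. The name `conv1x1` and the `[channel][row][col]` layout are assumptions made for illustration.

```python
def conv1x1(feature_map, weights):
    """feature_map: [C_in][H][W]; weights: [C_out][C_in]. Returns [C_out][H][W]."""
    c_in = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    out = []
    for filter_weights in weights:  # one row of weights per output channel
        channel = [
            [
                # Sum the scaled per-channel values at this pixel.
                sum(filter_weights[c] * feature_map[c][y][x] for c in range(c_in))
                for x in range(width)
            ]
            for y in range(height)
        ]
        out.append(channel)
    return out


# Three input channels reduced to two output channels.
fm = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
    [[100, 200], [300, 400]],
]
reduced = conv1x1(fm, [[1, 1, 1], [1, 0, -1]])
```

The number of rows in `weights` sets the number of output channels, so the same operation can shrink or expand the channel dimension without touching the spatial size.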
The number of filters K is commonly a power of 2
As K grows, the set of related pixels does not grow exponentially; it grows linearly, extending to nearby distances
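One possible reading of the note above, stated as an assumption since the notes do not spell it out, is that it describes the receptive field of stacked stride-1 convolutions: each extra layer with filter size F adds F - 1 pixels, so growth is linear in depth. The helper below is a hypothetical sketch of that reading.

```python
def receptive_field(num_layers, kernel_size):
    """Receptive field (one side, in pixels) of stacked stride-1 convolutions."""
    rf = 1  # a single pixel before any convolution
    for _ in range(num_layers):
        rf += kernel_size - 1  # each layer extends the view by F - 1 pixels
    return rf


for depth in range(1, 5):
    print(depth, receptive_field(depth, 3))
```

For 3x3 filters this gives 3, 5, 7, 9 for depths 1 to 4, a constant increase of 2 per layer.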
A stack of small-filter layers has far fewer parameters than one large-filter layer (and also more nonlinearity and less compute)
A 1x1 filter can reduce this further; that is the bottleneck layer
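A rough weight count illustrating both points, ignoring biases. The channel counts (64 in and out, 16 inside the bottleneck) and the bottleneck layout of 1x1 reduce, 3x3, 1x1 expand are assumptions chosen for illustration; the notes do not fix them.

```python
def conv_weights(in_channels, out_channels, kernel_size):
    # Weights of one conv layer, biases ignored.
    return kernel_size * kernel_size * in_channels * out_channels


def stack_weights(layers):
    # layers: iterable of (in_channels, out_channels, kernel_size) tuples.
    return sum(conv_weights(i, o, k) for i, o, k in layers)


# Three stacked 3x3 layers cover a 7x7 window, like one 7x7 layer.
three_small = stack_weights([(64, 64, 3)] * 3)
one_large = conv_weights(64, 64, 7)

# Assumed bottleneck: shrink channels with 1x1, apply 3x3, expand with 1x1.
bottleneck = stack_weights([(64, 16, 1), (16, 16, 3), (16, 64, 1)])
plain_3x3 = conv_weights(64, 64, 3)

print(three_small, one_large, bottleneck, plain_3x3)
```

With these numbers the three 3x3 layers use 110,592 weights against 200,704 for the single 7x7 layer, and the bottleneck uses 4,352 against 36,864 for a plain 3x3 layer.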
total parameter
CNN Stride
How many pixels the filter moves between applications
P is the padding size, N is the input size, F is the filter size, and the count is the output size
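The symbols above fit the usual output-size relation, output = (N - F + 2P) / S + 1, where S for the stride is a symbol introduced here and not one the notes define. A minimal sketch, assuming integer division when the stride does not divide evenly:

```python
def conv_output_size(n, f, p=0, stride=1):
    """Output size along one dimension for input n, filter f, padding p."""
    span = n + 2 * p - f
    if span < 0:
        raise ValueError("filter is larger than the padded input")
    # Positions left over when the stride does not divide evenly are dropped.
    return span // stride + 1


print(conv_output_size(7, 3))            # no padding, stride 1
print(conv_output_size(7, 3, stride=2))  # stride 2
print(conv_output_size(7, 3, p=1))       # padding 1 keeps the size at 7
```

Choosing P = (F - 1) / 2 with stride 1 and an odd F keeps the output the same size as the input.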