Interactive prototype

Pooling and Downsampling

Compare max pooling and average pooling as a window slides across a feature map and reduces spatial size.

Structured teaching notes

Connect the interaction to the core idea.

These notes are written to sit below the interactive prototype, preserve the same teaching flow, and help the learner name what the visualization is showing.

Background

Pooling is a local summarization step, not a local pattern-learning step. The prototype shows this directly: the learner sees a small window move across an existing feature map, then sees a single summarized value written into a smaller output map. That visual contrast with convolution is the key lesson. Convolution detects patterns with trainable weights; pooling compresses an existing map by applying a fixed rule such as maximum or average.

The output becomes spatially smaller, so later layers work with fewer positions, and each position summarizes a larger region of the original input (a broader receptive field). In this prototype, the beginner should notice the current window, the mapped output cell, the compression ratio, and the fact that the strongest local signal is preserved in max pooling. The ideas to take away are therefore spatial compression, local summary, and the absence of learned parameters.

Important formulas

Max pooling(window) = max(window)

Max pooling keeps the strongest activation inside the current local window.

Average pooling(window) = mean(window)

Average pooling keeps the mean of the values inside the current local window.
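
The two rules above can be sketched in a few lines of Python. This is a minimal illustration, not the prototype's own code: the function names and the example window values are hypothetical, and the window is assumed to be already flattened into a list.

```python
from statistics import mean

def max_pool_window(window):
    """Return the largest value in one flattened pooling window."""
    return max(window)

def avg_pool_window(window):
    """Return the arithmetic mean of one flattened pooling window."""
    return mean(window)

# Hypothetical 2x2 window, flattened row by row.
window = [1.0, 3.0, 2.0, 6.0]
print(max_pool_window(window))  # 6.0
print(avg_pool_window(window))  # 3.0
```

Neither function has anything to train: the rule is fixed before any data is seen.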

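Output size per side = floor((N - F) / S) + 1, for input size N, window size F, and stride S (no padding)

The pooling window size F and stride S determine how much the feature map shrinks along each side. When F equals S, the windows do not overlap and each side shrinks by a factor of about S.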
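
To make the dependence on F and S concrete, here is a small sketch of the usual output-size rule, floor((N - F) / S) + 1. It assumes no padding and that any leftover rows or columns that do not fill a whole window are dropped; the function name pooled_size is hypothetical.

```python
def pooled_size(n, f, s):
    """Output length along one side for input length n, window f, stride s.

    Assumes no padding; positions that do not fill a full window are dropped.
    """
    if f > n:
        raise ValueError("window is larger than the input")
    return (n - f) // s + 1

print(pooled_size(6, 2, 2))  # 3
print(pooled_size(7, 2, 2))  # 3, the last row/column is dropped
print(pooled_size(6, 3, 1))  # 4, overlapping windows shrink the map less
```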

Pros

Reduces spatial size, which lowers memory use and later computation.
Preserves strong local evidence in max pooling, which often helps later layers focus on salient signals.
Introduces some tolerance to small spatial shifts because nearby values are summarized together.
Has no learned weights, so its behavior is simple, fixed, and easy to explain visually.
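
The shift-tolerance point can be checked with a one-dimensional sketch. The helper max_pool_1d and the example rows are hypothetical; the window is 2 with stride 2 and no padding. The tolerance is only partial: a shift that stays inside the same window leaves the output unchanged, while a shift that crosses a window boundary moves the output too.

```python
def max_pool_1d(row, f=2, s=2):
    """Max-pool a 1D list with window f and stride s (no padding)."""
    return [max(row[i:i + f]) for i in range(0, len(row) - f + 1, s)]

original = [0, 9, 0, 0, 0, 0]
shifted = [9, 0, 0, 0, 0, 0]   # peak moved one step left, still in the first window
crossed = [0, 0, 9, 0, 0, 0]   # peak moved one step right, into the second window

print(max_pool_1d(original))  # [9, 0, 0]
print(max_pool_1d(shifted))   # [9, 0, 0]
print(max_pool_1d(crossed))   # [0, 9, 0]
```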

Cons

Throws away spatial detail, which can hurt tasks that need precise localization.
Can be too aggressive if the window or stride is large, making the output lose useful structure.
Average pooling may weaken sharp local signals, while max pooling may ignore moderate but still meaningful responses.
Because pooling is fixed, it cannot adapt its summarization rule to the data the way convolution can.
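
The trade-off between the two rules shows up when two contrasting windows are compared. The values below are made up for illustration: one window holds a single sharp spike, the other holds broadly moderate responses.

```python
from statistics import mean

spike = [0.0, 0.0, 0.0, 8.0]      # one sharp activation, nothing else
moderate = [5.0, 5.0, 5.0, 6.0]   # consistently moderate responses

for name, window in [("spike", spike), ("moderate", moderate)]:
    print(name, "max:", max(window), "mean:", mean(window))
# spike max: 8.0 mean: 2.0
# moderate max: 6.0 mean: 5.25
```

Average pooling reduces the spike from 8.0 to 2.0, while max pooling ranks the spike window above the moderate one even though three of the four moderate values are far larger than the spike window's background.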

Quick example

A 6x6 map pooled with a 2x2 window and stride 2 becomes a 3x3 map. In max pooling, each output cell is simply the largest value from its corresponding 2x2 region, so the map becomes smaller while keeping the strongest local activation in each patch.
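
The same example can be run as a short sketch. The function pool2d and the feature-map values are hypothetical, chosen only for illustration; plain Python lists are used so that it runs without extra libraries, and no padding is assumed.

```python
def pool2d(feature_map, f=2, s=2, reduce=max):
    """Slide an f x f window with stride s over a 2D list and summarize each window."""
    rows, cols = len(feature_map), len(feature_map[0])
    out = []
    for r in range(0, rows - f + 1, s):
        out_row = []
        for c in range(0, cols - f + 1, s):
            # Collect the values under the current window, row by row.
            window = [feature_map[r + dr][c + dc]
                      for dr in range(f) for dc in range(f)]
            out_row.append(reduce(window))
        out.append(out_row)
    return out

feature_map = [
    [1, 2, 0, 1, 3, 0],
    [4, 0, 2, 5, 1, 1],
    [0, 1, 7, 2, 0, 2],
    [3, 2, 1, 0, 6, 1],
    [2, 8, 0, 3, 1, 0],
    [1, 0, 4, 2, 2, 9],
]

pooled = pool2d(feature_map)
for row in pooled:
    print(row)
# [4, 5, 3]
# [3, 7, 6]
# [8, 4, 9]
```

Passing a different reduce function, for example `lambda w: sum(w) / len(w)`, turns the same loop into average pooling.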

Common mistake

A common mistake is to describe pooling as if it were learning a new detector. It is not learning a new pattern; it is compressing an existing map. Another mistake is to forget that pooling works channel by channel, so it shrinks height and width but usually keeps the number of channels the same.
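
The channel-by-channel behavior can be sketched as follows. The helper names and the two 4x4 channels are hypothetical; the feature maps are assumed to be stored as a list of 2D lists, one per channel, and pooled with a 2x2 window and stride 2.

```python
def max_pool_channel(channel, f=2, s=2):
    """Max-pool one 2D channel with an f x f window and stride s (no padding)."""
    rows, cols = len(channel), len(channel[0])
    return [
        [max(channel[r + dr][c + dc] for dr in range(f) for dc in range(f))
         for c in range(0, cols - f + 1, s)]
        for r in range(0, rows - f + 1, s)
    ]

def max_pool_channels(feature_maps, f=2, s=2):
    """Apply the same fixed rule to every channel independently."""
    return [max_pool_channel(ch, f, s) for ch in feature_maps]

feature_maps = [
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    [[0, 1, 0, 1], [1, 0, 1, 0], [0, 0, 0, 0], [2, 0, 0, 3]],
]

pooled = max_pool_channels(feature_maps)
print(len(feature_maps), len(feature_maps[0]), len(feature_maps[0][0]))  # 2 4 4
print(len(pooled), len(pooled[0]), len(pooled[0][0]))                    # 2 2 2
print(pooled[0])  # [[6, 8], [14, 16]]
print(pooled[1])  # [[1, 1], [2, 3]]
```

Height and width halve, the channel count stays at 2, and values from different channels are never mixed.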