Visual notes

The deep learning blog.

Conceptual visual essays, tutorials, and deep dives that make the mechanisms behind modern AI easier to understand.

Batch Normalization

How normalization, running statistics, gamma, and beta keep activations stable so deep networks are easier to train.

Why skip connections help information and gradients move through deeper networks more reliably.

Why gradients can shrink across many layers or timesteps, and why long-range learning becomes difficult.

How words become vectors that can capture semantic similarity, direction, and reusable language structure.

How Query, Key, and Value vectors let a model decide which tokens matter most for the current context.

How residual addition and normalization stabilize transformer layers after attention or feed-forward steps.

Compare how sequence order, receptive field, parallelism, and context flow differ across the three families.

How pretrained features can be reused, adapted, and fine-tuned for a smaller or more specific target task.

Learn how Momentum gives gradient descent memory, reducing zig-zag movement and building speed in consistent directions.

Learn how RMSProp gives each parameter an adaptive step size by tracking recent squared gradients.