Visual notes

The deep learning blog.

Conceptual visual essays, tutorials, and deep dives that make the mechanisms behind modern AI easier to understand.

Deep Dive: Training stability

Batch Normalization

How normalization, running statistics, gamma, and beta keep activations well scaled so that deep networks are easier to train.

Deep Dive: CNN architecture

Residual Block / ResNet Intuition

Why skip connections help information and gradients move through deeper networks more reliably.

Deep Dive: Sequence limits

Vanishing Gradient Problem

Why gradients can shrink across many layers or timesteps, and why long-range learning becomes difficult.

Deep Dive: Representation

Word Embeddings

How words become vectors that can capture semantic similarity, direction, and reusable language structure.

Deep Dive: Attention

Attention Mechanism

How Query, Key, and Value vectors let a model decide which tokens matter most for the current context.

Deep Dive: Transformer block

Add & Norm

How residual addition and normalization stabilize transformer layers after attention or feed-forward steps.

Deep Dive: Architecture comparison

Self-Attention vs RNN vs CNN

Compare how sequence order, receptive field, parallelism, and context flow differ across the three families.

Deep Dive: Model reuse

Transfer Learning

How pretrained features can be reused, adapted, and fine-tuned for a smaller or more specific target task.

Deep Dive: Optimization

Momentum

How momentum gives gradient descent a memory of past gradients, reducing zig-zag movement and building speed in consistent directions.

Deep Dive: Optimization

RMSProp

How RMSProp gives each parameter an adaptive step size by tracking a running average of recent squared gradients.
