Transfer Learning

How pretrained features can be reused, adapted, and fine-tuned for a smaller or more specific target task.

Large pretrained model reused for a smaller new task
Image 1: We do not start from zero.

Background

Training a deep model from scratch usually requires a large labeled dataset, substantial compute, and many rounds of hyperparameter tuning.

Many real projects have only a small amount of labeled data, so training from randomly initialized weights tends to be slow and prone to overfitting.

Transfer learning exists because knowledge learned from one large task can often help another related task. Instead of teaching a model from zero, we start with a pretrained model and adapt it to the new problem.

Idea

A pretrained model has already learned useful patterns from a large dataset. We reuse most of that model as a feature extractor, then replace the old prediction layer with a new head for our task.

At first, we usually freeze the pretrained layers so their weights do not change. Then we train only the new head, because it starts from random weights and has to learn the new labels.

Later, we may unfreeze some of the later pretrained layers and fine-tune them carefully, with a small learning rate, if the new task needs more adaptation. The mental model is simple: reuse general knowledge, learn only what is new.

Frozen pretrained backbone with a new trainable head
Image 2: Reuse general features, learn task-specific output.

Formula

\[h_{\text{pretrained}}(x)=\text{features}\]

The pretrained model turns the input into reusable features.

\[\hat{y}=g_{\text{new}}\left(h_{\text{pretrained}}(x)\right)\]

The new head maps pretrained features to the new task prediction.

\[\theta=[\theta_{\text{frozen}},\ \theta_{\text{trainable}}]\]

Frozen parameters stay fixed, while trainable parameters are updated for the new task.
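
One way to make the frozen/trainable split concrete is to write out a single training step. The sketch below assumes plain gradient descent with a learning rate eta, a loss L, and a true label y. None of these three symbols appear in the formulas above, and other optimizers would change the exact form of the update.

\[\theta_{\text{trainable}}\leftarrow\theta_{\text{trainable}}-\eta\,\nabla_{\theta_{\text{trainable}}}L\left(\hat{y},\ y\right)\]

\[\theta_{\text{frozen}}\leftarrow\theta_{\text{frozen}}\]

Only the trainable parameters move along the gradient of the loss. The frozen parameters are carried over unchanged from the pretrained model.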

Symbols

  1. x: input example, such as an image.
  2. h_pretrained: pretrained feature extractor.
  3. g_new: new task-specific head.
  4. y-hat: prediction for the new task.
  5. theta_frozen: parameters kept fixed during training.
  6. theta_trainable: parameters updated for the new task.
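
The symbols above map onto code fairly directly. The following is a minimal pure-Python sketch, not a real pretrained network: the weights in `THETA_FROZEN` are made-up stand-ins for a backbone, and the logistic-regression head is an assumed choice for a two-class task.

```python
import math

# Stand-in for a pretrained backbone: fixed weights we pretend were learned elsewhere.
# (Illustrative values only; a real backbone would be loaded from a checkpoint.)
THETA_FROZEN = [[0.5, -0.2, 0.1],
                [-0.3, 0.8, 0.4]]   # maps a 3-dimensional input to 2 features


def h_pretrained(x):
    """Turn an input x into features using the frozen weights (linear + ReLU)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in THETA_FROZEN]


def g_new(features, theta_trainable):
    """New head: logistic regression on top of the pretrained features."""
    weights, bias = theta_trainable
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, theta_trainable):
    """y_hat = g_new(h_pretrained(x))."""
    return g_new(h_pretrained(x), theta_trainable)


theta_trainable = ([0.0, 0.0], 0.0)   # freshly initialized head: weights and bias
x = [1.0, 2.0, 3.0]
features = h_pretrained(x)
y_hat = predict(x, theta_trainable)
```

With a head initialized to zeros, the prediction is 0.5 for any input. Training changes only `theta_trainable`, while `THETA_FROZEN` is never written to.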

Example

Suppose a beginner wants to build a cat-vs-dog classifier but has only 500 labeled images. A CNN trained from scratch on so few images will likely memorize the training set instead of learning useful visual patterns.

A better starting point is a model pretrained on ImageNet. It has already been trained on more than a million labeled images across 1,000 classes and has learned general visual features such as edges, textures, fur-like patterns, and object parts.

The old ImageNet head predicts 1,000 classes, so we remove it. We keep the convolutional backbone and add a new binary classification head that predicts only cat or dog.

At first, the backbone is frozen and only the new head is trained. Later, we may unfreeze the last few blocks and fine-tune them with a small learning rate.
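
A quick parameter count shows why freezing helps with only 500 images. The arithmetic below assumes a ResNet-18 style backbone that outputs 512 features per image and has roughly 11.2 million parameters without its final layer. The article does not name an architecture, so treat these numbers as one plausible example.

```python
# Assumption: a ResNet-18 style backbone that outputs 512 features per image.
FEATURE_DIM = 512


def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


old_head = linear_params(FEATURE_DIM, 1000)   # 1,000-class ImageNet head
new_head = linear_params(FEATURE_DIM, 2)      # two-class cat vs dog head

# Assumption: parameter count of ResNet-18 without its final layer.
backbone_params = 11_176_512

# Share of all parameters that are trained while the backbone is frozen.
trainable_fraction = new_head / (backbone_params + new_head)

print(f"old head: {old_head:,} parameters")
print(f"new head: {new_head:,} parameters")
print(f"trainable while frozen: {trainable_fraction:.4%}")
```

Under these assumptions, the frozen stage fits about a thousand parameters to the 500 images, rather than more than eleven million.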

Transfer learning workflow
Image 3: Transfer learning is a practical training pipeline.

Workflow

  1. Choose a pretrained model trained on a large dataset.
  2. Remove or ignore the old task head.
  3. Keep the pretrained backbone as a feature extractor.
  4. Add a new head for the new task.
  5. Freeze the backbone first to protect useful pretrained features.
  6. Train the new head.
  7. Fine-tune selected layers if the task needs more adaptation.
  8. Evaluate on held-out validation data, and keep the test set for a final check.
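
The steps above can be sketched end to end on a toy problem. This is a pure-Python illustration under loud assumptions: the "pretrained" weights in `backbone` are invented, the six training points stand in for a small labeled dataset, and `train`, `mean_loss`, and `accuracy` are hypothetical helpers written for this example. In practice a deep learning framework would handle the gradients.

```python
import copy
import math

# Steps 1-3: a stand-in "pretrained" backbone with two blocks (made-up weights).
backbone = {
    "block1": [[1.0, -1.0], [0.5, 0.5]],   # early block, frozen throughout
    "block2": [[0.8, -0.1], [0.2, 0.6]],   # last block, unfrozen in step 7
}
# Step 4: new head (logistic regression) for the two-class target task.
head = {"w": [0.0, 0.0], "b": 0.0}


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def forward(x):
    a = [max(0.0, z) for z in matvec(backbone["block1"], x)]    # block1: linear + ReLU
    f = [math.tanh(z) for z in matvec(backbone["block2"], a)]   # block2: linear + tanh
    z = sum(w * fi for w, fi in zip(head["w"], f)) + head["b"]
    return a, f, 1.0 / (1.0 + math.exp(-z))


def mean_loss(data):
    """Mean binary cross-entropy over a dataset."""
    total = 0.0
    for x, y in data:
        p = forward(x)[2]
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(data)


def train(data, steps, lr, tune_block2=False):
    """Full-batch gradient descent. block1 is never updated."""
    n = len(data)
    for _ in range(steps):
        gw, gb = [0.0, 0.0], 0.0
        g2 = [[0.0, 0.0], [0.0, 0.0]]
        for x, y in data:
            a, f, p = forward(x)
            err = p - y   # gradient of the loss w.r.t. the logit
            for j in range(2):
                gw[j] += err * f[j] / n
                dpre = err * head["w"][j] * (1.0 - f[j] ** 2)   # back through tanh
                for k in range(2):
                    g2[j][k] += dpre * a[k] / n
            gb += err / n
        for j in range(2):
            head["w"][j] -= lr * gw[j]
            if tune_block2:   # only touch block2 once it is unfrozen
                for k in range(2):
                    backbone["block2"][j][k] -= lr * g2[j][k]
        head["b"] -= lr * gb


def accuracy(data):
    return sum((forward(x)[2] >= 0.5) == (y == 1) for x, y in data) / len(data)


train_set = [([2.0, 0.5], 1), ([1.5, 0.2], 1), ([1.8, 0.9], 1),
             ([0.3, 1.5], 0), ([0.5, 2.0], 0), ([0.2, 1.1], 0)]
val_set = [([1.7, 0.4], 1), ([0.4, 1.6], 0)]

snapshot = copy.deepcopy(backbone)
loss_start = mean_loss(train_set)

# Steps 5-6: backbone frozen, train only the new head.
train(train_set, steps=200, lr=0.5)
loss_head = mean_loss(train_set)
block2_untouched_in_stage1 = backbone["block2"] == snapshot["block2"]

# Step 7: unfreeze block2 and fine-tune with a 10x smaller learning rate.
train(train_set, steps=100, lr=0.05, tune_block2=True)

# Step 8: evaluate on held-out data.
val_accuracy = accuracy(val_set)
print(f"loss: {loss_start:.3f} -> {loss_head:.3f}, validation accuracy: {val_accuracy:.2f}")
```

The smaller learning rate in the fine-tuning stage is a deliberate choice: it lets the last block adapt without moving far from its pretrained values. Comparing `backbone` with `snapshot` after each stage is a cheap way to confirm that the intended layers, and only those, were updated.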

Pros

  1. Needs less labeled data.
  2. Usually trains faster than starting from scratch.
  3. Often reaches better accuracy on small datasets than training from scratch.
  4. Reuses general visual features.
  5. Saves compute.

Cons

  1. Works best when source and target tasks are related.
  2. Can fail when the new domain is very different.
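  3. Fine-tuning too many layers, or with too large a learning rate, can overfit a small dataset and overwrite useful pretrained features.
  4. Freezing too many layers can underfit when the target task needs features the backbone never learned.
  5. Pretrained models may carry over biases from their original training data.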

Takeaway

Transfer learning is about reusing useful knowledge instead of starting from zero.

The model keeps general features from a pretrained backbone and learns a new task-specific head. It is especially useful when the new dataset is small but related to what the pretrained model has already learned.