Module 01 / Generalization
Overfitting vs. Underfitting
Learn how model complexity changes the fitted curve, training error, validation error, and generalization gap.
Background
A model is trained on data, but the real goal is to perform well on new, unseen data, not to memorize the training set.
Underfitting happens when the model is too simple to capture the real pattern in the data. In this case, both training error and validation error stay high because the model cannot even learn the training set well.
Overfitting happens when the model is too flexible and starts fitting noise instead of the true signal. In this case, training error becomes very low, but validation error becomes high because the noise the model has memorized does not repeat in new data.
The best model usually sits between these two extremes. It is flexible enough to learn the main pattern, but not so flexible that it memorizes random fluctuations.
The visualization shows this balance directly: model complexity changes → fitted curve changes → training and validation errors change → generalization quality changes.
Important formulas
Generalization Gap
\(\mathrm{Gap} = E_{\mathrm{val}} - E_{\mathrm{train}}\)
A small gap usually means the model generalizes better. A large gap often means the model is overfitting the training data.
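Here is a small sketch of the gap as validation error minus training error. It assumes mean squared error (MSE) as the error measure, since the lesson does not fix one. The helper names `mse` and `generalization_gap` are illustrative and do not come from any library.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def generalization_gap(train_true, train_pred, val_true, val_pred):
    """Validation error minus training error, both measured as MSE."""
    return mse(val_true, val_pred) - mse(train_true, train_pred)


# Toy numbers: perfect on the training points, off by 1.0 on one validation point.
print(generalization_gap([1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [2.0, 2.0]))  # → 0.5
```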
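Bias-Variance Decomposition
\(\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \mathrm{Bias}\big[\hat{f}(x)\big]^2 + \mathrm{Var}\big[\hat{f}(x)\big] + \sigma^2\)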
High bias usually means the model is too simple and underfits. High variance usually means the model is too sensitive to the training data and overfits. Noise is the part of the data that cannot be perfectly predicted.
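The decomposition can be estimated numerically by redrawing the training set many times. The sketch assumes a hypothetical setup: a true function \(f(x) = \sin(2\pi x)\) on [0, 1] with additive Gaussian noise. Squared bias measures how far the average fitted curve lies from \(f\). Variance measures how much individual fits scatter around that average. The function `bias_variance` and its parameter values are illustrative choices, and the fitting uses NumPy's `Polynomial.fit`.

```python
import numpy as np


def bias_variance(degree, n_train=30, n_repeats=200, noise_std=0.3, seed=0):
    """Monte Carlo estimate of (squared bias, variance, noise) for polynomial fits.

    Assumed (hypothetical) setup: f(x) = sin(2*pi*x) on [0, 1] plus
    Gaussian noise with standard deviation `noise_std`.
    """
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.0, 1.0, 50)
    f_test = np.sin(2 * np.pi * x_test)
    preds = np.empty((n_repeats, x_test.size))
    for r in range(n_repeats):
        # Draw a fresh noisy training set for every repeat.
        x = rng.uniform(0.0, 1.0, n_train)
        y = np.sin(2 * np.pi * x) + rng.normal(0.0, noise_std, n_train)
        # Polynomial.fit maps x to [-1, 1] internally, which helps conditioning.
        model = np.polynomial.Polynomial.fit(x, y, deg=degree)
        preds[r] = model(x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - f_test) ** 2)   # average fit vs. true curve
    variance = np.mean(preds.var(axis=0))          # spread of fits around their mean
    return bias_sq, variance, noise_std ** 2


for d in (1, 5, 15):
    b, v, n = bias_variance(d)
    print(f"degree {d:2d}: bias^2={b:.3f}  variance={v:.3f}  noise={n:.3f}")
```

In this setup the degree-1 fit should show large squared bias and small variance, and the high-degree fit the reverse. The exact numbers depend on the seed, sample size, and noise level.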
Best Model Complexity
\(c^* = \arg\min_c E_{\mathrm{val}}(c)\)
The best model complexity is usually where validation error is lowest, not where training error is lowest.
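Choosing the complexity is a single argmin over validation errors. The error values below are made up for illustration and only mimic the pattern in the quick example. `best_complexity` is a hypothetical helper.

```python
def best_complexity(val_errors):
    """Return c* = argmin_c E_val(c) from a {complexity: validation error} mapping."""
    return min(val_errors, key=val_errors.get)


# Hypothetical errors indexed by polynomial degree, for illustration only.
val_errors = {1: 0.42, 3: 0.15, 5: 0.11, 10: 0.19, 20: 0.58}
train_errors = {1: 0.40, 3: 0.12, 5: 0.08, 10: 0.03, 20: 0.001}

print(best_complexity(val_errors))               # → 5
print(min(train_errors, key=train_errors.get))   # → 20 (lowest training error, worst validation)
```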
Regularized Training Objective
\(J(\theta) = E_{\mathrm{train}}(\theta) + \lambda\, R(\theta), \quad \lambda \ge 0\)
Regularization adds a penalty term to discourage overly complex or unstable solutions. This can reduce overfitting even if the training error becomes slightly higher.
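Here is a minimal sketch of that trade-off. It assumes an L2 (ridge) penalty \(R(\theta) = \lVert\theta\rVert^2\) on polynomial coefficients, mean squared error as the training error, and a penalty strength \(\lambda\). The lesson does not say which penalty it uses. The data and the helper `ridge_fit` are hypothetical. As \(\lambda\) grows, the training error can only stay the same or rise while the coefficients shrink.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Minimize mean((X @ theta - y)**2) + lam * ||theta||^2 (L2 penalty, assumed)."""
    n, p = X.shape
    # Stacking sqrt(n * lam) * I under X turns the penalized problem into ordinary
    # least squares: ||X theta - y||^2 + n * lam * ||theta||^2 is n times the objective.
    X_aug = np.vstack([X, np.sqrt(n * lam) * np.eye(p)])
    y_aug = np.concatenate([y, np.zeros(p)])
    theta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return theta


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 40)
y = np.sin(np.pi * x) + rng.normal(0.0, 0.2, x.size)
# Degree-9 polynomial features; for simplicity the constant column is penalized too.
X = np.vander(x, N=10, increasing=True)

for lam in (0.0, 1e-4, 1e-2, 1.0):
    theta = ridge_fit(X, y, lam)
    train_mse = np.mean((X @ theta - y) ** 2)
    print(f"lambda={lam:g}: train MSE={train_mse:.4f}  ||theta||^2={theta @ theta:.3f}")
```

The stacked least-squares form avoids explicitly inverting \(X^\top X\), which is poorly conditioned for high-degree polynomial features.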
Pros / Cons
Pros
- Clear diagnosis. Comparing training error and validation error helps identify whether the model is underfitting, overfitting, or generalizing well.
- Practical model selection. The validation curve helps choose a better model complexity instead of simply choosing the most powerful model.
- Connects many training ideas. Dataset size, regularization, model complexity, and validation performance can all be understood in one framework.
- Works across many models. The same idea applies to linear regression, decision trees, and neural networks, including deep learning models.
Cons
- Validation results can be noisy. If the validation set is too small or poorly chosen, the validation error may give a misleading signal.
- The sweet spot is problem-dependent. There is no fixed model complexity that works best for every dataset.
- Training and validation error are not the whole story. A model can have good validation performance but still fail on real-world data if the data distribution changes.
- Regularization needs tuning. Too little regularization may not fix overfitting. Too much regularization may cause underfitting.
Quick example and common mistakes
Quick Example
Imagine fitting a polynomial curve to noisy one-dimensional data.
| Model | Training Error | Validation Error | Meaning |
|---|---|---|---|
| Degree 1 | High | High | Underfitting |
| Degree 5 | Medium-low | Low | Good fit |
| Degree 20 | Very low | High | Overfitting |
A degree-1 model may be too simple, so it misses the curved pattern. A degree-20 model may bend too much and pass close to almost every training point, including noise.
A moderate-degree model often gives the best validation error because it captures the main trend without memorizing every random fluctuation.
The goal is not to make the training error as small as possible. The goal is to make the validation error low while keeping the train-validation gap small.
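The quick example can be reproduced with a short script. This is a sketch under assumed, hypothetical settings: true curve \(\sin(2\pi x)\), noise standard deviation 0.25, 25 training points, and 200 validation points. It uses NumPy's `Polynomial.fit`, which rescales inputs internally so that a degree-20 fit stays numerically manageable. `make_data` and `poly_errors` are illustrative helper names.

```python
import numpy as np


def make_data(rng, n, noise_std=0.25):
    """Hypothetical toy data: y = sin(2*pi*x) + Gaussian noise, with x in [0, 1]."""
    x = rng.uniform(0.0, 1.0, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0.0, noise_std, n)


def poly_errors(degree, x_train, y_train, x_val, y_val):
    """Least-squares polynomial fit; returns (train MSE, validation MSE)."""
    model = np.polynomial.Polynomial.fit(x_train, y_train, deg=degree)
    train_mse = np.mean((model(x_train) - y_train) ** 2)
    val_mse = np.mean((model(x_val) - y_val) ** 2)
    return train_mse, val_mse


rng = np.random.default_rng(42)
x_tr, y_tr = make_data(rng, 25)   # small training set
x_va, y_va = make_data(rng, 200)  # held-out validation set

for d in (1, 5, 20):
    tr, va = poly_errors(d, x_tr, y_tr, x_va, y_va)
    print(f"degree {d:2d}: train MSE={tr:.3f}  val MSE={va:.3f}  gap={va - tr:.3f}")
```

Training error can only fall or stay equal as the degree rises, because each polynomial family contains the lower-degree ones. Validation error has no such guarantee, which is why it is the quantity used to choose the degree. The specific numbers depend on the random seed.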
Common Mistakes
- Only looking at training error. Low training error does not always mean the model is good. A model can perform extremely well on training data but poorly on new data.
- Thinking bigger models are always better. A more flexible model can learn more complex patterns, but it can also memorize noise more easily.
- Thinking regularization simply makes the model worse. Regularization may increase training error, but it can lower validation error by making the model less sensitive to noise.
- Confusing noise with signal. Overfitting happens when the model treats random fluctuations in the training data as if they were meaningful patterns.
- Choosing the model with the lowest training error. The best model is usually selected by validation performance, not training performance: \(c^* = \arg\min_c E_{\mathrm{val}}(c)\).
Takeaway
Underfitting means the model is too simple to learn the real pattern.
Overfitting means the model is too sensitive to the training data and starts memorizing noise.
A good fit balances both sides: the validation error is low, and the gap between training error and validation error stays small.
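The takeaway can be turned into a rough diagnostic rule. The thresholds `high_error` and `max_gap` are hypothetical and problem-dependent, and `diagnose` is an illustrative helper. Treat this as a sketch of the reasoning rather than a standard procedure.

```python
def diagnose(train_error, val_error, high_error=0.3, max_gap=0.1):
    """Rough label from training/validation errors; thresholds are hypothetical."""
    gap = val_error - train_error
    if train_error >= high_error:
        # The model cannot even fit the training set well.
        return "underfitting"
    if gap > max_gap:
        # Low training error but much worse on unseen data.
        return "overfitting"
    return "good fit"


# Made-up errors matching the quick-example pattern (degree 1, 5, 20).
for train_err, val_err in [(0.40, 0.42), (0.08, 0.11), (0.001, 0.58)]:
    print(train_err, val_err, diagnose(train_err, val_err))
```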