Overfitting vs. Underfitting

Interactive lesson

[Interactive visualization. Controls: Model Complexity (Degree), Regularization (Lambda), Dataset Noise, Training Data Points, quick presets, and toggles to show the true function and the validation gap. Snapshot settings: degree 4, lambda 0.08, noise 1.2, 22 points. Panels: "Polynomial Fit on Training Data" and "Training vs. Validation Error Landscape". The Model Diagnosis readout reports a fit label (for example, "Good fit"), Train Error, Val Error, and Bias and Variance scores out of 5.]

Takeaway

Compare the Train and Validation errors: the gap between them tells you how well the model generalizes to new data.

Background

A model is trained on data, but the real goal is not to memorize the training set. The real goal is to perform well on new, unseen data.

Underfitting happens when the model is too simple to capture the real pattern in the data. In this case, both training error and validation error stay high because the model cannot even learn the training set well.

Overfitting happens when the model is too flexible and starts fitting noise instead of the true signal. In this case, training error becomes very low, but validation error becomes high because the model does not generalize well.

The best model usually sits between these two extremes. It is flexible enough to learn the main pattern, but not so flexible that it memorizes random fluctuations.

The visualization shows this balance directly: model complexity changes → fitted curve changes → training and validation errors change → generalization quality changes.

Important formulas

Generalization Gap

\[ \mathrm{Gap} = E_{\mathrm{val}} - E_{\mathrm{train}} \]

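A small gap usually means the model generalizes better. A large gap often means the model is overfitting the training data. The gap should be read together with the errors themselves: an underfit model can also show a small gap, because both errors are high.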
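
As a minimal sketch of the gap formula, the snippet below assumes mean squared error as the error measure (the lesson does not name one) and uses made-up targets and predictions. The helper names `mse` and `generalization_gap` are illustrative, not part of the lesson.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error; an assumed choice of error measure."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def generalization_gap(train_error, val_error):
    """Gap = E_val - E_train."""
    return val_error - train_error

# Hypothetical targets and predictions, for illustration only.
train_error = mse([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
val_error = mse([1.5, 2.5], [2.0, 1.7])
gap = generalization_gap(train_error, val_error)

print(f"train={train_error:.3f}  val={val_error:.3f}  gap={gap:.3f}")
```

Here the validation error is much larger than the training error, so the gap is large relative to the training error, which is the pattern the formula is meant to flag.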

Bias-Variance Decomposition

\[ E_{\mathrm{expected}} = \mathrm{Bias}^2 + \mathrm{Variance} + \mathrm{Noise} \]

High bias usually means the model is too simple and underfits. High variance usually means the model is too sensitive to the training data and overfits. Noise is the part of the data that cannot be perfectly predicted.
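
The decomposition can be estimated numerically by refitting the same model on many noisy copies of the training set. The sketch below is one way to do that under stated assumptions: the true function `sin(2*pi*x)`, the noise level 0.3, and the number of trials are all made up for illustration, and `bias_variance` is a hypothetical helper.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_function(x):
    # Assumed ground truth; the lesson does not specify one.
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_points=22, noise_std=0.3, n_trials=200, seed=0):
    """Monte Carlo estimate of squared bias and variance for a polynomial fit."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_points)
    x_eval = np.linspace(0.0, 1.0, 50)
    preds = np.empty((n_trials, x_eval.size))
    for t in range(n_trials):
        # Each trial draws fresh noise, so each fit sees a different training set.
        y_train = true_function(x_train) + rng.normal(0.0, noise_std, n_points)
        preds[t] = Polynomial.fit(x_train, y_train, degree)(x_eval)
    mean_pred = preds.mean(axis=0)
    # Squared bias: how far the average fit is from the true function.
    bias_sq = float(np.mean((mean_pred - true_function(x_eval)) ** 2))
    # Variance: how much individual fits scatter around the average fit.
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

bias_sq_1, var_1 = bias_variance(degree=1)
bias_sq_9, var_9 = bias_variance(degree=9)
print(f"degree 1: bias^2={bias_sq_1:.4f}  variance={var_1:.4f}")
print(f"degree 9: bias^2={bias_sq_9:.4f}  variance={var_9:.4f}")
```

Under these assumptions the straight line should show the larger squared bias and the degree-9 polynomial the larger variance, matching the description above.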

Best Model Complexity

\[ c^* = \arg\min_c E_{\mathrm{val}}(c) \]

The best model complexity is usually where validation error is lowest, not where training error is lowest.
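
The selection rule is just an argmin over validation errors. The sketch below uses hypothetical error values (not measured results) to show how selecting by validation error and selecting by training error can disagree. `best_complexity` is an illustrative helper name.

```python
def best_complexity(errors_by_complexity):
    """Return the complexity c with the smallest error: argmin_c E(c)."""
    return min(errors_by_complexity, key=errors_by_complexity.get)

# Hypothetical errors keyed by polynomial degree, for illustration only.
train_errors = {1: 0.30, 3: 0.12, 5: 0.09, 10: 0.05, 20: 0.01}
val_errors = {1: 0.33, 3: 0.15, 5: 0.13, 10: 0.21, 20: 0.90}

chosen_by_val = best_complexity(val_errors)
chosen_by_train = best_complexity(train_errors)
print(f"lowest validation error: degree {chosen_by_val}")
print(f"lowest training error:   degree {chosen_by_train}")
```

With these made-up numbers, training error keeps falling as the degree grows, so it would pick the most flexible model, while validation error picks a moderate degree.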

Regularized Training Objective

\[ L_{\mathrm{total}} = L_{\mathrm{train}} + \lambda \Omega(\theta) \]

Regularization adds a penalty term to discourage overly complex or unstable solutions. This can reduce overfitting even if the training error becomes slightly higher.
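
One concrete instance of this objective is ridge regression on polynomial features. The sketch below assumes mean squared error for the training loss and an L2 penalty for \(\Omega(\theta)\); the lesson does not say which penalty its visualization uses, and the data, degree, and helper names are made up. For simplicity the intercept is penalized along with the other coefficients.

```python
import numpy as np

def design_matrix(x, degree):
    # Columns are 1, x, x^2, ..., x^degree.
    return np.vander(x, degree + 1, increasing=True)

def total_loss(theta, X, y, lam):
    # L_total = L_train + lambda * Omega(theta), with MSE and an L2 penalty.
    residual = X @ theta - y
    return float(np.mean(residual ** 2) + lam * (theta @ theta))

def fit_ridge(X, y, lam):
    # Setting the gradient of total_loss to zero gives
    # (X^T X / n + lam * I) theta = X^T y / n.
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

# Synthetic data: an assumed true function plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 22)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)
X = design_matrix(x, degree=9)

theta_weak = fit_ridge(X, y, lam=1e-6)    # almost no regularization
theta_strong = fit_ridge(X, y, lam=0.08)  # stronger penalty

# With lam=0.0, total_loss reduces to the plain training MSE.
mse_weak = total_loss(theta_weak, X, y, 0.0)
mse_strong = total_loss(theta_strong, X, y, 0.0)
print(f"lam=1e-6: train MSE={mse_weak:.4f}  ||theta||={np.linalg.norm(theta_weak):.2f}")
print(f"lam=0.08: train MSE={mse_strong:.4f}  ||theta||={np.linalg.norm(theta_strong):.2f}")
```

The larger penalty shrinks the coefficient vector and gives up some training error in exchange, which is the trade described above.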

Pros / Cons

Pros

Clear diagnosis. Comparing training error and validation error helps identify whether the model is underfitting, overfitting, or generalizing well.
Practical model selection. The validation curve helps choose a better model complexity instead of simply choosing the most powerful model.
Connects many training ideas. Dataset size, regularization, model complexity, and validation performance can all be understood in one framework.
Works across many models. The same idea applies to linear regression, decision trees, and neural networks, including deep learning models.

Cons

Validation results can be noisy. If the validation set is too small or poorly chosen, the validation error may give a misleading signal.
The sweet spot is problem-dependent. There is no fixed model complexity that works best for every dataset.
Training and validation error are not the whole story. A model can have good validation performance but still fail on real-world data if the data distribution changes.
Regularization needs tuning. Too little regularization may not fix overfitting. Too much regularization may cause underfitting.

Example / Mistake

Quick example and common mistakes

Quick Example

Imagine fitting a polynomial curve to noisy one-dimensional data.

Model	Training Error	Validation Error	Meaning
Degree 1	High	High	Underfitting
Degree 5	Medium-low	Low	Good fit
Degree 20	Very low	High	Overfitting

A degree-1 model may be too simple, so it misses the curved pattern. A degree-20 model may bend too much and pass close to almost every training point, including noise.

A moderate-degree model often gives the best validation error because it captures the main trend without memorizing every random fluctuation.

The goal is not to make the training error as small as possible. The goal is to make the validation error low while keeping the train-validation gap small.
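
The three rows of the table can be reproduced with a small experiment. This is a sketch on synthetic data, not the lesson's own dataset: the true function `sin(2*pi*x)`, the noise level, the random seed, and the validation set size are all assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_function(x):
    # Assumed ground truth; the lesson does not specify one.
    return np.sin(2 * np.pi * x)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(0)
noise_std = 0.3  # assumed noise level

# 22 training points, plus a separate validation set drawn the same way.
x_train = np.linspace(0.0, 1.0, 22)
y_train = true_function(x_train) + rng.normal(0.0, noise_std, x_train.size)
x_val = rng.uniform(0.0, 1.0, 200)
y_val = true_function(x_val) + rng.normal(0.0, noise_std, x_val.size)

results = {}
for degree in (1, 5, 20):
    # Least-squares polynomial fit on the training data only.
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(y_train, model(x_train)), mse(y_val, model(x_val)))

for degree, (train_err, val_err) in results.items():
    print(f"degree {degree:2d}: train={train_err:.4f}  val={val_err:.4f}")
```

Training error can only fall as the degree rises, because each higher-degree model contains the lower-degree ones. Validation error is the number that separates the moderate model from the other two.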

Common Mistakes

Only looking at training error. Low training error does not always mean the model is good. A model can perform extremely well on training data but poorly on new data.
Thinking bigger models are always better. A more flexible model can learn more complex patterns, but it can also memorize noise more easily.
Thinking regularization simply makes the model worse. Regularization may increase training error, but it can lower validation error by making the model less sensitive to noise.
Confusing noise with signal. Overfitting happens when the model treats random fluctuations in the training data as if they were meaningful patterns.
Choosing the model with the lowest training error. The best model is usually selected by validation performance, not training performance: \(c^* = \arg\min_c E_{\mathrm{val}}(c)\).

Takeaway

Underfitting means the model is too simple to learn the real pattern.

Overfitting means the model is too sensitive to the training data and starts memorizing noise.

A good fit balances both sides: the validation error is low, and the gap between training error and validation error stays small.
