Module 3 · ML Strategy

Train / Val / Test Split
Data Partitioning

This topic is not only about percentages. It is about assigning different jobs to each split and making sure your dev and test sets represent the future data you actually care about. Use the controls to see how a split can look good on paper but still fail in the real world.

Train: learn parameters
Val / Dev: choose ideas
Test: final unbiased check
Core risk: wrong target or test leakage
Key concepts
What to watch for
Different jobs: Train learns, dev compares experiments, and test should stay locked until the very end.
Future-user match matters: A dev/test set that looks clean but does not reflect future users can push you toward the wrong model.
Do not tune on test: If you repeatedly look at test while iterating, your final number becomes less trustworthy.

Controls

Pick a preset, adjust each split's source mix, change the data scale, and simulate repeated development cycles.

Quick presets
Jump between the main teaching cases without adjusting everything manually.
Data scale: Big data
This changes the overall split sizes, not the source mix inside each split.
Train app data: 20%
How much of the training set comes from real mobile-app style data instead of clean web images.
Dev / Val app data: 0%
This split should represent the data you care about when choosing models.
Test app data: 0%
Test should usually match the same future distribution as dev, but stay unopened during iteration.
Development cycles: 6
A higher value means you tried more ideas, tuned more settings, and pushed harder on dev performance.
Test usage
Leave this off for a proper final check. Turn it on to see how looking at test during iteration collapses confidence in the final score.
Split builder
Future users vs. what each split actually contains
Future users: 90% app
Legend: web images (cleaner, easier, more professional); app images (blurrier, noisier, more realistic)
Split sizes: Train 98%, Dev 1%, Test 1%
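
To make the mismatch concrete, here is a minimal Python sketch that turns the settings shown on this page (98% / 1% / 1% split sizes, 20% / 0% / 0% app data, 90% app among future users) into example counts and a distribution gap per split. The total of 1,000,000 examples is an assumption chosen only to illustrate a "big data" scale, and `split_report` is a hypothetical helper, not part of the page.

```python
# Settings read from the page; TOTAL_EXAMPLES is an illustrative assumption.
TOTAL_EXAMPLES = 1_000_000
FUTURE_APP_FRACTION = 0.90

split_fraction = {"train": 0.98, "dev": 0.01, "test": 0.01}
app_fraction = {"train": 0.20, "dev": 0.00, "test": 0.00}


def split_report(total, split_fraction, app_fraction, future_app):
    """Return size, app-example count, and gap to the future-user mix per split."""
    report = {}
    for name, fraction in split_fraction.items():
        size = round(total * fraction)
        app_examples = round(size * app_fraction[name])
        # Gap: how far this split's app share is from the future-user app share.
        gap = round(abs(future_app - app_fraction[name]), 2)
        report[name] = {"size": size, "app_examples": app_examples, "gap": gap}
    return report


report = split_report(
    TOTAL_EXAMPLES, split_fraction, app_fraction, FUTURE_APP_FRACTION
)
for name, row in report.items():
    print(name, row)
```

With these settings the dev and test splits contain no app images at all, so the gap to the future-user mix is largest exactly on the splits used to choose and to judge the model.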
Role of each split
This is why train / dev / test are not interchangeable
Train: learn parameters
Use this split for gradient updates and fitting weights. More training data usually helps, but the training distribution alone does not define the target.
Dev / Val: choose ideas
Use this split to compare architectures, hyperparameters, and data strategies. It should reflect the future data you care about.
Test: final honest check
Open this only at the end. If you keep using it during iteration, the final score stops being an unbiased estimate.
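
One way to act on these roles is to carve dev and test out first, at the source mix you expect from future users, and send everything that remains to train. The sketch below assumes two pools of example IDs (web and app) and a hypothetical `build_splits` helper. The pool sizes and the holdout size of 50 are illustrative assumptions, not values from the page.

```python
import random


def build_splits(web_ids, app_ids, holdout_size, target_app_fraction, seed=0):
    """Build disjoint train/dev/test splits.

    Dev and test each get `holdout_size` examples at the target app share;
    all remaining examples go to train.
    """
    rng = random.Random(seed)
    web, app = list(web_ids), list(app_ids)
    rng.shuffle(web)
    rng.shuffle(app)

    n_app = round(holdout_size * target_app_fraction)
    n_web = holdout_size - n_app
    if 2 * n_app > len(app) or 2 * n_web > len(web):
        raise ValueError("not enough examples to fill dev and test")

    dev = app[:n_app] + web[:n_web]
    test = app[n_app:2 * n_app] + web[n_web:2 * n_web]
    train = app[2 * n_app:] + web[2 * n_web:]
    rng.shuffle(train)
    return {"train": train, "dev": dev, "test": test}


# Illustrative pools: mostly web data, a smaller amount of app data.
web_ids = [f"web_{i}" for i in range(800)]
app_ids = [f"app_{i}" for i in range(200)]
splits = build_splits(web_ids, app_ids, holdout_size=50, target_app_fraction=0.9)
print({name: len(ids) for name, ids in splits.items()})
```

Train ends up with a different mix than dev and test, which is acceptable here: train only has to help the model learn, while dev and test define the target.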
Development timeline
More cycles mean more tuning pressure on the split you are watching
Healthy workflow: iterate on dev, then check test once at the end.
Unhealthy workflow: keep peeking at test and slowly overfit your project decisions to it.
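
The effect of peeking can be illustrated with a small simulation. This is a sketch under strong simplifying assumptions: every candidate model has the same true accuracy (0.5), and scores on dev and test are independent noisy measurements of it. The healthy workflow picks the candidate with the best dev score and reports its test score once. The unhealthy workflow reports the best test score seen across all cycles. The function names, split sizes, and trial count are hypothetical.

```python
import random


def measured_accuracy(rng, true_accuracy, n_examples):
    """Accuracy observed on a finite split: a noisy estimate of true_accuracy."""
    correct = sum(rng.random() < true_accuracy for _ in range(n_examples))
    return correct / n_examples


def run_project(rng, cycles, n_dev=100, n_test=100, true_accuracy=0.5):
    """Return the test score reported by the healthy and the peeking workflow."""
    # Each development cycle yields one candidate with a (dev, test) score pair.
    candidates = [
        (
            measured_accuracy(rng, true_accuracy, n_dev),
            measured_accuracy(rng, true_accuracy, n_test),
        )
        for _ in range(cycles)
    ]
    # Healthy: select on dev, then read that candidate's test score once.
    _, healthy = max(candidates, key=lambda pair: pair[0])
    # Unhealthy: select directly on test, so the reported score is a maximum.
    peeking = max(test_score for _, test_score in candidates)
    return healthy, peeking


def average_reported_scores(trials=200, cycles=6, seed=0):
    """Average the reported test score of each workflow over many projects."""
    rng = random.Random(seed)
    results = [run_project(rng, cycles) for _ in range(trials)]
    mean_healthy = sum(h for h, _ in results) / trials
    mean_peeking = sum(p for _, p in results) / trials
    return mean_healthy, mean_peeking


mean_healthy, mean_peeking = average_reported_scores()
print(f"healthy workflow, mean reported test accuracy: {mean_healthy:.3f}")
print(f"peeking workflow, mean reported test accuracy: {mean_peeking:.3f}")
```

Because no candidate is actually better than chance in this setup, any reported score above 0.5 is pure selection bias. The peeking workflow inflates the number, and raising `cycles` inflates it further.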