Finding the Optimal Number of Scores for an Average-Based Handicap
When golfers are asked for their handicap, many answer with the score they usually shoot, for example:
“I usually shoot 90 for 18 holes, so subtract par 72 and that gives me a handicap of 18.”

In their mind, that makes perfect sense. Golfers instinctively understand that handicaps should reflect what they typically score, not just what they shoot on their best day.
But that leads to an important question:
If handicaps are based on averages…
how many scores should be included in the average?
Too few scores and the handicap becomes overly reactive, swinging wildly from round to round.
Too many scores and the handicap becomes diluted, slow to recognize improvement or decline.
Somewhere in the middle lies an optimal balance between:
- accuracy,
- precision,
- responsiveness,
- and fairness.
So we decided to test it.
The Test
Using a large set of league scores, we calculated handicaps by averaging each golfer’s last:
- 1 score
- 2 scores
- 3 scores
- …
- all the way to 20 scores
We then measured:
- Mean → bias relative to par
- Standard Deviation → consistency and precision
- Absolute Difference → closeness to actual outcomes (accuracy)
The lower the Standard Deviation and Absolute Difference, the better the formula performs.
The testing was separated into:
- Men’s 18-hole
- Men’s 9-hole
- Women’s 18-hole
- Women’s 9-hole
The results were remarkably consistent across all four datasets.
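The test procedure can be sketched in Python. This is a minimal illustration under stated assumptions, not the code behind our results. It assumes:
- each golfer's scores are listed oldest first,
- every round is compared against a single par value,
- each handicap is judged against the golfer's next round.

The error for a round is its net score relative to par (gross score minus handicap minus par). Mean measures bias and Standard Deviation measures spread. Absolute Difference is treated here as the average size of the miss, which is an assumption about how that metric is defined. The names `average_handicap`, `evaluate_window` and `sweep` are hypothetical.

```python
from statistics import mean, pstdev


def average_handicap(scores, par, n):
    """Hypothetical helper: average of the last n scores minus par."""
    return mean(scores[-n:]) - par


def evaluate_window(rounds_by_golfer, par, n, start=None):
    """Judge an n-score handicap against each golfer's next round.

    Returns (mean, standard deviation, mean absolute difference) of the
    net score relative to par, or None if no round could be evaluated.
    Assumes each golfer's scores are listed oldest first.
    """
    first = n if start is None else max(start, n)
    errors = []
    for scores in rounds_by_golfer:
        for i in range(first, len(scores)):
            handicap = average_handicap(scores[:i], par, n)  # uses earlier rounds only
            errors.append(scores[i] - handicap - par)  # net score relative to par
    if not errors:
        return None
    return mean(errors), pstdev(errors), mean(abs(e) for e in errors)


def sweep(rounds_by_golfer, par, max_n=20):
    """Evaluate windows 1..max_n on the same rounds (those after the first max_n)."""
    return {
        n: evaluate_window(rounds_by_golfer, par, n, start=max_n)
        for n in range(1, max_n + 1)
    }


# Toy data (hypothetical): one golfer alternating 88 and 92 on a par-72 course.
results = sweep([[88, 92] * 15], par=72)
for n in (1, 2, 13, 20):
    print(n, results[n])
```

Every window starts scoring at the same round. This keeps the comparison fair, because otherwise a 1-score window would be judged on more rounds than a 20-score window. On this alternating toy data, the 1-score handicap misses by 4 strokes every round, and the 2-score handicap misses by 2. With real league scores, the window with the lowest Standard Deviation and Absolute Difference is the one to prefer.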


The Results
Across all four groups, the same overall pattern emerged:
- Very small score samples (1–3 rounds) produced highly volatile handicaps.
- As additional scores were included, variability dropped and predictions tightened.
- Around the middle range, performance stabilized.
- Beyond that point, additional scores produced diminishing returns while responsiveness began to decline.
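A simple statistical model helps explain these diminishing returns. Assume, as a simplification, that a golfer's rounds are independent and share a common standard deviation σ. Real scores drift with form, so this is only a rough guide. Under that assumption, the noise in an N-score average shrinks with the square root of N:

$$
\mathrm{SD}(\bar{x}_N) = \frac{\sigma}{\sqrt{N}}, \qquad
\frac{\sigma}{\sqrt{4}} = 0.50\,\sigma, \quad
\frac{\sigma}{\sqrt{13}} \approx 0.28\,\sigma, \quad
\frac{\sigma}{\sqrt{20}} \approx 0.22\,\sigma
$$

Going from 1 score to 4 halves the noise. Going from 13 scores to 20 reduces it by only about 19%, since √(13/20) ≈ 0.81. Those seven extra rounds are also older rounds, which dilute recent form.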
The strongest overall performance consistently clustered around:
13 scores
That consistency across:
- genders,
- hole formats,
- and golfer populations
is important.
It strongly suggests this is not a coincidence but a natural balance point between randomness and stability.
By the time the calculation reached roughly 13 rounds:
- Mean remained close to neutral,
- Standard deviation flattened,
- Absolute difference stabilized,
- and additional scores added very little predictive benefit.
In short:
13 scores appeared to provide the best balance between stability and responsiveness.
Why 13 Makes Sense
What makes this especially interesting is that the number 13 appears repeatedly in other predictive industries.
Not because 13 is magical, but because many systems naturally converge on an optimal balance between:
- signal,
- noise,
- recency,
- and stability
when using roughly that amount of historical information.
Examples include:
Finance
Financial trend models frequently rely on:
- 12- to 14-period moving averages,
- 13-week trend windows,
- and quarterly rolling models
to smooth volatility while remaining responsive to change.
Insurance & Actuarial Modeling
Risk models seek enough historical observations to stabilize prediction without overweighting outdated behavior.
Sports Analytics
Player projection systems often use rolling windows in this range because:
- too little data overreacts,
- too much data ignores trend.
Machine Learning & Time-Series Forecasting
Predictive systems repeatedly discover that:
- recent behavior matters most,
- but enough history is needed to separate randomness from true pattern.
Golf appears to behave similarly.
Why Not Use More Scores?
This is where many handicap systems begin to struggle.
Using too many scores creates a problem best described as:
score dilution.
As more and more rounds are added into the calculation, each individual score carries less weight. The result is a handicap that becomes increasingly anchored to long-term history instead of current reality.
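A small worked example shows the dilution effect. The sketch uses a hypothetical golfer, not drawn from our league data. This golfer has shot 90 on a par-72 course for many rounds and then starts shooting 85.

Every score in an N-score average carries a weight of 1/N. Until the window holds only new rounds, each improved round therefore lowers a 13-score handicap by 5/13 of a stroke, but a 20-score handicap by only 5/20. The function name `handicaps_after_improvement` is illustrative.

```python
from statistics import mean


def handicaps_after_improvement(old_score, new_score, par, improved_rounds, windows):
    """Average-based handicaps for a hypothetical golfer who improves.

    The golfer has a long history at old_score, followed by
    improved_rounds rounds at new_score (most recent last).
    """
    history = [old_score] * 40 + [new_score] * improved_rounds
    # Handicap for each window size n: average of the last n scores minus par.
    return {n: mean(history[-n:]) - par for n in windows}


for k in (1, 5, 13, 20):
    h = handicaps_after_improvement(90, 85, 72, k, windows=(13, 20))
    print(f"after {k:2d} improved rounds: 13-score {h[13]:.2f}, 20-score {h[20]:.2f}")
```

After 13 improved rounds, the 13-score handicap has fully caught up at 13.00. The 20-score handicap still reads 14.75 and needs seven more rounds to reach the same value.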
Older rounds keep influencing the handicap long after they stop representing the golfer, because they may reflect:
- old injuries,
- old swing mechanics,
- previous skill levels,
- different playing frequency,
- or even entirely different competitive environments.
The problem isn’t merely slow reaction.
The problem is dilution of meaningful change.
A golfer improves…
but the handicap remains inflated by older poor rounds.
Or the golfer declines…
and the handicap stays artificially low because stronger historical scores continue pulling it down.
Either way, the handicap stops reflecting who the golfer is now.
That’s why the optimal number of scores matters:
- too few creates volatility,
- too many creates dilution,
- and somewhere in the middle lies the balance between responsiveness and stability.
In our testing, that balance consistently landed around:
13 scores.
Why This Matters
Most golfers instinctively think of handicaps as averages, and they’re not wrong.
But averages only work well when:
- the playing environment remains reasonably consistent,
- enough rounds are included to smooth out randomness,
- but not so many that current ability becomes diluted.
That’s especially true in contained environments like golf leagues, where:
- golfers often play the same course,
- rotate between familiar tees,
- compete under similar conditions,
- and play at a consistent frequency.
In those settings, average-based handicapping can work surprisingly well — provided the number of scores used is optimized correctly.
Our testing suggests that balance point is:
13 scores.
Not 3.
Not 20.
Not “best 8 of last 20.”
About 13.
Final Thought
Golf handicapping has spent decades trying to estimate future performance from past scores.
The interesting part is this:
Even before AI enters the discussion, the data itself quietly points toward balance, trend, and prediction, not simply raw averaging.
And in our testing, that balance consistently landed on:
13 scores.
Not because 13 is special, but because it appears to represent the point where:
- randomness becomes smoothed,
- trends still remain visible,
- and the handicap best reflects who the golfer actually is right now.
September 5, 2025
Stu Healey, President