[Validation] Your Backtest Drawdown Might Just Be Luck — Block Bootstrap

A realized equity curve is just one of countless possible histories. Strip away the luck of ordering and see the real risk — block bootstrap, walked through with a 12-day toy example.

One History Happened — Is That the Truth?

In my last post, I said my portfolio’s 5-year max drawdown (MDD) was −8.8%. But should I take that number at face value?

Here’s the problem. −8.8% comes from just one path, the one that actually happened. The order in which the good days and bad days arrived is largely a matter of luck. If a handful of bad days had happened to scatter, the drawdown would have been shallower; if they had happened to cluster, it would have been far deeper.

In other words, −8.8% might be the product of a path that got lucky. I didn’t want to lean on a single lucky outcome for important decisions, like whether to crank up leverage.

 

How to Peek at Histories That Didn’t Happen

The tool we use here is the bootstrap. The idea is surprisingly simple.

Using the actual return data as raw material, generate thousands of imaginary histories that “could have happened but didn’t,” and look at the distribution of the outcomes.

The most naive approach is to drop each day into a hat and redraw at random. But there’s a trap hiding right here.

 

The Trap — Shuffle Day by Day and Risk Evaporates

Market crashes don’t wrap up after one bad day. Bad days come in clusters (so-called volatility clustering). But shuffle day by day and that “worst week” gets shredded and scattered among up days. The crash gets diluted.

Let’s see this with a toy 12-day return series. Sitting in the middle is a textbook crash window where −5%, −6%, −4% hit in a row.

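| Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Return (%) | +2 | +1 | +3 | −1 | +2 | −5 | −6 | −4 | +3 | +2 | +1 | +2 |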

The original max drawdown is −14.3%: −5/−6/−4 land in quick succession right after the peak, so equity falls to 0.95 × 0.94 × 0.96 ≈ 0.857 of its high.
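As a quick check on that figure, here is a minimal Python sketch that compounds the 12 toy returns and tracks the deepest fall from a running peak. The helper name `max_drawdown` is hypothetical, and starting the equity curve at 1.0 is an assumption made for illustration.

```python
def max_drawdown(returns_pct):
    """Return the max drawdown (a negative fraction) for a list of daily % returns."""
    equity = 1.0   # assumed starting capital
    peak = 1.0     # highest equity seen so far
    worst = 0.0    # deepest peak-to-trough drop seen so far
    for r in returns_pct:
        equity *= 1 + r / 100
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst


toy_returns = [2, 1, 3, -1, 2, -5, -6, -4, 3, 2, 1, 2]
print(f"{max_drawdown(toy_returns):.1%}")  # -14.3%
```

The peak comes at the end of day 5, and the three losing days that follow account for the whole drop.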

Now keep the same 12 numbers and change only how we shuffle them.

| Method | MDD of one sample path | What happened |
|---|---|---|
| Original (the history that actually happened) | −14.3% | −5/−6/−4 stacked into a deep trough |
| Shuffle one day at a time | around −7% | The crash gets scattered among up days, shrunk by half ❌ |
| Shuffle in 3-day 'blocks' | around −10 to −14% | The crash chunk stays intact ✅ |

Shuffle day by day and a +3% slips in after a −6%, absorbing the shock. The result: risk gets underestimated by half. But shuffle in chunks (blocks) and −5/−6/−4 travel together, so deep drawdowns reappear at realistic frequencies.
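The comparison can be reproduced with a short sketch. It assumes the 3-day blocks are non-overlapping and aligned from day 1, which gives four blocks; with that alignment −6/−4 always travel together, and the full −5/−6/−4 run reappears whenever the second and third blocks land back to back. The helper names, the seed, and the −9% threshold are illustrative choices, not part of the original analysis.

```python
import random
from itertools import permutations


def max_drawdown(returns_pct):
    """Max drawdown (negative fraction) for a list of daily % returns."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns_pct:
        equity *= 1 + r / 100
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst


toy_returns = [2, 1, 3, -1, 2, -5, -6, -4, 3, 2, 1, 2]

# Day-by-day shuffle: 5,000 random orderings of the 12 individual days.
rng = random.Random(0)  # fixed seed so the run is repeatable
day_mdds = []
for _ in range(5000):
    shuffled = toy_returns[:]
    rng.shuffle(shuffled)
    day_mdds.append(max_drawdown(shuffled))

# Block shuffle: all 24 orderings of the four non-overlapping 3-day blocks.
blocks = [toy_returns[i:i + 3] for i in range(0, len(toy_returns), 3)]
block_mdds = [
    max_drawdown([r for block in order for r in block])
    for order in permutations(blocks)
]


def share_shallower(mdds, threshold=-0.09):
    """Fraction of paths whose max drawdown is shallower than the threshold."""
    return sum(m > threshold for m in mdds) / len(mdds)


print(f"day shuffle:   {share_shallower(day_mdds):.0%} of paths shallower than -9%")
print(f"block shuffle: {share_shallower(block_mdds):.0%} of paths shallower than -9%")
```

Under these assumptions no block-shuffled path can come in shallower than about −9.8%, because −6 and −4 always sit next to each other, while the day-by-day shuffle produces shallower paths as well.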

 

Block Bootstrap — One-Line Definition

Cut the return time series into fixed-length blocks (preserving the clustering of crashes), glue those blocks back together at random to build thousands of imaginary histories, then look at the distribution of the metric you care about (here, drawdown).

The block length is set to match “how long a bad stretch typically lasts.” I use roughly 20 days. Too short and the clusters break (back to day-by-day shuffling); too long and the paths stop diversifying.
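A minimal sketch of building one such imaginary history, assuming non-overlapping blocks drawn with replacement. The post does not say whether blocks overlap or whether they are drawn with or without replacement, so treat those choices, the function name `block_bootstrap_path`, and the placeholder data as assumptions.

```python
import random


def block_bootstrap_path(returns, block_len, rng):
    """Glue randomly chosen blocks together into one path as long as the original."""
    # Cut the series into consecutive blocks (the last one may be shorter).
    blocks = [returns[i:i + block_len] for i in range(0, len(returns), block_len)]
    path = []
    while len(path) < len(returns):
        path.extend(rng.choice(blocks))  # draw a whole block, with replacement
    return path[:len(returns)]           # trim any overshoot


# Placeholder data only: 100 made-up daily returns, not a real portfolio.
returns = [0.001 * (i % 7 - 3) for i in range(100)]
rng = random.Random(0)
path = block_bootstrap_path(returns, block_len=20, rng=rng)
print(len(path))  # 100
```

Days inside each 20-day block keep their original order, so a bad stretch shorter than the block can survive intact; only the order of the blocks is randomized.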

 

Running It on My Actual Portfolio

I resampled my portfolio’s daily returns into 5,000 imaginary histories, then lined up each one’s max drawdown.

| Benchmark | Value | Meaning |
|---|---|---|
| MDD that actually happened | −8.8% | A single path with luck baked in |
| Block bootstrap 5th percentile (p5) | −11.7% | 95% of imaginary histories were shallower than this; only the unlucky 5% went deeper |

The reading goes like this: “−8.8% was a slightly lucky number; realistically, the drawdown to brace for is closer to −11.7%.” So when I size up risk, I treat p5 (−11.7%) as my baseline — not the realized value (−8.8%). Crank leverage to 1.5× and that brace line grows roughly 1.5× too (somewhere around −18%) — and whether you can stomach that becomes the real question.
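The whole procedure can be sketched end to end as follows. Everything here is an assumption for illustration: the returns are synthetic Gaussian draws rather than my portfolio, blocks are non-overlapping and drawn with replacement, leverage is modeled as multiplying each daily return by 1.5 with no financing cost, and the path count is cut to 1,000 to keep the pure-Python run short. The numbers it prints will not match the table above.

```python
import random
import statistics


def max_drawdown(returns):
    """Max drawdown (negative fraction) for a list of fractional daily returns."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst


def bootstrap_mdds(returns, block_len=20, n_paths=1000, seed=0):
    """Max drawdown of each block-bootstrapped path."""
    rng = random.Random(seed)
    blocks = [returns[i:i + block_len] for i in range(0, len(returns), block_len)]
    mdds = []
    for _ in range(n_paths):
        path = []
        while len(path) < len(returns):
            path.extend(rng.choice(blocks))  # whole blocks, with replacement
        mdds.append(max_drawdown(path[:len(returns)]))
    return mdds


# Synthetic stand-in for about five years of daily returns (1,260 days).
data_rng = random.Random(42)
returns = [data_rng.gauss(0.0004, 0.01) for _ in range(1260)]

mdds = bootstrap_mdds(returns)
p5 = statistics.quantiles(mdds, n=20)[0]  # 5th percentile: the deep tail

# Same seed, so the leveraged run reuses the same block choices.
levered_mdds = bootstrap_mdds([1.5 * r for r in returns])
p5_levered = statistics.quantiles(levered_mdds, n=20)[0]

print(f"realized MDD:      {max_drawdown(returns):.1%}")
print(f"bootstrap p5:      {p5:.1%}")
print(f"p5 at 1.5x:        {p5_levered:.1%} ({p5_levered / p5:.2f}x the unlevered p5)")
```

Because drawdowns are negative numbers, the 5th percentile is the deep end of the distribution. Recomputing the drawdowns on leveraged returns, instead of multiplying p5 by 1.5, is one way to check how rough the "roughly 1.5×" shortcut is for a given series.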

 

Takeaways

  1. A single backtest curve is just one snapshot from “the histories that could have been.” Trust that one drawdown as your hard limit and you’re betting on a path that got lucky.
  2. Shuffle in chunks, not in single days. Break the clustering of crashes and risk vanishes as if by magic — exactly when it matters most.
  3. Brace yourself against the tail of the distribution (p5), not the realized value. It’s a surprisingly cheap one-liner of a check that separates “luck” from “skill.”

※ This post is meant to share a validation methodology; specific signals, tickers, and sizing are not disclosed. It is not investment advice.
