- Deep Reinforcement Learning for Financial Trading Enhanced by Cluster Embedding and Zero-Shot Prediction
- Markov and Hidden Markov Models for Regime Detection in Cryptocurrency Markets: Evidence from Bitcoin (2024–2026)
- Regime-Aware LightGBM for Stock Market Forecasting: A Validated Walk-Forward Framework with Statistical Rigor and Explainable AI Analysis
I read three interesting papers and tested whether bolting their ideas onto the crypto bot I’m running right now would actually make it better. The punchline first: I ran 26 variants, and not a single one passed on the out-of-sample holdout.
1. What the papers said
All three shared the same theme: “Use a model to figure out what state (regime) the market is in, then adjust your trade sizing accordingly.”
- Pagliaro 2026: Don’t throttle everything at once; identify only the strategies that genuinely struggle in a given regime and throttle just those
- Markov HMM BTC: Use external information such as volume to catch state transitions faster, via a non-homogeneous HMM (NHHMM) whose transition probabilities depend on those covariates
- DRL + Cluster Embedding: Combine reinforcement learning with (zero-shot) prediction of the future for a richer state representation
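The NHHMM idea can be sketched as a two-state HMM forward filter whose transition probabilities are a logistic function of a covariate such as a volume z-score. This is a minimal sketch under my own assumptions, not the paper’s specification. The Gaussian emissions, the logistic link, the function names, and every parameter value (`means`, `stds`, `a`, `b`) are hypothetical. In practice they would be fitted rather than hand-set.

```python
import math
from typing import List, Sequence, Tuple


def logistic(x: float) -> float:
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution, used here as the emission model."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def nhhmm_filter(
    returns: Sequence[float],
    volume_z: Sequence[float],
    means: Tuple[float, float],
    stds: Tuple[float, float],
    a: Tuple[float, float],
    b: Tuple[float, float],
) -> List[float]:
    """Forward-filter P(state = volatile | data up to t) for a 2-state NHHMM.

    Hypothetical parameterisation: the probability of leaving state k at
    step t is logistic(a[k] + b[k] * volume_z[t]), so the transition matrix
    changes with the covariate instead of staying fixed.
    """
    probs = [0.5, 0.5]  # assumed prior: state 0 = calm, state 1 = volatile
    p_volatile = []
    for r, v in zip(returns, volume_z):
        p01 = logistic(a[0] + b[0] * v)  # calm -> volatile
        p10 = logistic(a[1] + b[1] * v)  # volatile -> calm
        # Predict: push yesterday's belief through today's transition matrix.
        pred = [
            probs[0] * (1.0 - p01) + probs[1] * p10,
            probs[0] * p01 + probs[1] * (1.0 - p10),
        ]
        # Update: weight each state by how well it explains today's return.
        unnorm = [pred[k] * gaussian_pdf(r, means[k], stds[k]) for k in range(2)]
        total = sum(unnorm)
        probs = [u / total for u in unnorm]
        p_volatile.append(probs[1])
    return p_volatile


# Toy inputs only: a quiet stretch, then large moves on heavy volume.
returns = [0.001, -0.002, 0.03, -0.04, 0.035, 0.001]
volume_z = [0.0, -0.5, 2.5, 3.0, 2.0, 0.0]
p_volatile = nhhmm_filter(
    returns, volume_z,
    means=(0.0, 0.0), stds=(0.01, 0.04),
    a=(-3.0, -3.0), b=(1.0, -1.0),
)
print([round(p, 3) for p in p_volatile])
```

Setting both entries of `b` to zero recovers an ordinary (homogeneous) HMM with a fixed transition matrix. That is the knob the NHHMM adds: high volume can make a switch into the volatile state more likely before the returns alone would confirm it.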
2. How I tested it
I simulated layering a regime throttle on top of the 10-strategy portfolio I’m currently running live.
- 6 timeframes: 5m / 15m / 1h / 4h / 12h / 1d
- 2 HMM training approaches: offline (one-shot training) / rolling (periodic retraining using only past data)
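- 3 throttle modes: none / blanket half-size / selective (throttle only the strategies flagged as weak in the current regime)
- 3 periods: OOS2, an older out-of-sample window (2021–22) / IS, in-sample (2023) / OOS, the real holdout (2024–26)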
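The three throttle modes differ only in which strategies get their size cut. Here is a minimal sketch of that logic. The function name, the `"high_vol"` regime label, the strategy identifiers, and the weakness map are all hypothetical. I am also assuming the blanket mode applies its half-size cut only in the high-volatility regime.

```python
from typing import Dict, List, Set


def size_multipliers(
    strategies: List[str],
    regime: str,
    mode: str,
    weak_in_regime: Dict[str, Set[str]],
    cut: float = 0.5,
) -> Dict[str, float]:
    """Return a position-size multiplier per strategy for the current regime.

    mode="none":      every strategy trades at full size.
    mode="blanket":   every strategy is cut to `cut` in the "high_vol" regime.
    mode="selective": only strategies listed as weak in this regime are cut.
    """
    if mode == "none":
        return {s: 1.0 for s in strategies}
    if mode == "blanket":
        factor = cut if regime == "high_vol" else 1.0
        return {s: factor for s in strategies}
    if mode == "selective":
        weak = weak_in_regime.get(regime, set())
        return {s: (cut if s in weak else 1.0) for s in strategies}
    raise ValueError(f"unknown throttle mode: {mode!r}")


# Hypothetical strategy names and weakness map, for illustration only.
strategies = ["pairs", "funding", "trend", "rsi_intraday", "breakout"]
weak_in_regime = {"high_vol": {"breakout", "rsi_intraday"}, "low_vol": {"trend"}}

print(size_multipliers(strategies, "high_vol", "selective", weak_in_regime))
```

The selective mode is only as good as `weak_in_regime`. That map is learned from past data, which is exactly where the overfitting risk lives.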
The non-negotiable rule: parameter selection happens only on OOS2 + IS, and OOS stays untouched as “a future I’ve never seen.” Break this and you’re just memorizing past patterns.
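One way to make that rule mechanical is to route every evaluation through a period label. Selection then never sees the holdout, and the holdout is scored once, after the choice is frozen. This is a sketch under assumptions. The exact period boundaries are hypothetical, since the post gives only years. Ranking by the worse of OOS2 and IS is one possible selection criterion, not necessarily the one I used. The toy scores are made up and are not results.

```python
from datetime import date
from typing import Callable, Iterable, Tuple, TypeVar

P = TypeVar("P")

# Assumed boundaries: only the years come from the post; exact dates are hypothetical.
PERIODS = {
    "OOS2": (date(2021, 1, 1), date(2022, 12, 31)),
    "IS": (date(2023, 1, 1), date(2023, 12, 31)),
    "OOS": (date(2024, 1, 1), date(2026, 12, 31)),
}


def period_of(d: date) -> str:
    """Label a timestamp with the evaluation period it belongs to."""
    for name, (start, end) in PERIODS.items():
        if start <= d <= end:
            return name
    raise ValueError(f"{d} falls outside every period")


def select_then_test(
    candidates: Iterable[P],
    evaluate: Callable[[P, str], float],
) -> Tuple[P, float]:
    """Pick parameters on OOS2 + IS only, then score the winner once on OOS."""

    def dev_score(params: P) -> float:
        # Rank by the worse of the two development periods (one possible choice).
        return min(evaluate(params, "OOS2"), evaluate(params, "IS"))

    best = max(candidates, key=dev_score)
    # The holdout is touched exactly once, after the choice is frozen.
    return best, evaluate(best, "OOS")


# Toy scores, not results from the post; `calls` records what was evaluated.
calls = []
TOY = {
    "a": {"OOS2": 1.0, "IS": 3.0, "OOS": 0.5},
    "b": {"OOS2": 2.0, "IS": 2.5, "OOS": 2.2},
}


def toy_evaluate(params: str, period: str) -> float:
    calls.append((params, period))
    return TOY[params][period]


best, holdout = select_then_test(["a", "b"], toy_evaluate)
print(best, holdout)
```

Candidate `"a"` has the best single-period score (3.0 on IS), but the worse-of-two rule prefers `"b"`. Either way, `calls` shows the OOS period was evaluated only once, for the chosen candidate.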
3. The result — 0/26
| Direction | OOS Calmar (selective) | OOS Calmar (do nothing) | Diff |
|---|---|---|---|
| A. Pagliaro selective | 4.57 | 7.37 | **-2.80** |
| B. NHHMM | 4.57 | 7.37 | **-2.80** |
| C. enriched 7-feature | 4.06 | 7.37 | **-3.31** |
- Calmar = CAGR / |max drawdown|. Higher is better.
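For concreteness, here is a small sketch of that definition computed from an equity curve. The helper names are my own. The drawdown is reported as a negative fraction, and the toy curve is illustrative only.

```python
from typing import Sequence


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline as a negative fraction (e.g. -0.25)."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def cagr(equity: Sequence[float], years: float) -> float:
    """Compound annual growth rate between the first and last equity values."""
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0


def calmar(equity: Sequence[float], years: float) -> float:
    """Calmar = CAGR / |max drawdown|; infinite if the curve never draws down."""
    mdd = max_drawdown(equity)
    if mdd == 0.0:
        return float("inf")
    return cagr(equity, years) / abs(mdd)


# Toy equity curve over two years: +20%, then a 25% drawdown, then a recovery.
equity = [100.0, 120.0, 90.0, 150.0]
print(max_drawdown(equity))
print(calmar(equity, years=2.0))
```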
The most promising-looking candidate was the 1h rolling variant, which posted a Calmar of 22.5 on the older data (2021–23). On the unseen future (2024–26), it dropped to 4.57. It had fit the noise in the past, not a real signal.
4. Why it didn’t work
When I sat with it, the reasons were clear.
- The bot is already too well diversified for a regime throttle to help: six strategies, each built on a different mechanism (pairs / funding / trend / intraday RSI / breakout). There’s no shared vulnerability that a single volatility-regime throttle can address in one stroke.
- The cost of throttling exceeds the protection it buys. “Cut size in half when volatility is high” does soften the bad stretches, but it also halves the good stretches, and cumulative returns end up worse. Even a flat 0.5× throttle lowered Calmar by 1.66.
- The post-2024 market is a different animal (overfitting). ETF approval, the halving, AI-coin rotation: the “this strategy is weak in this regime” patterns learned from 2021–23 have shifted into different patterns since 2024.
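The “good stretches get cut too” effect shows up even in a tiny toy example. The sketch below is under stated assumptions and does not reproduce the backtest. The regime rule is a hypothetical stand-in for the HMM: high volatility means the previous period’s move exceeded 5%. The return path is invented, and `years=1.0` is an arbitrary toy horizon. Because the signal can only react after a shock, the crash is taken at full size and the rebound is halved.

```python
from typing import List, Sequence


def equity_curve(returns: Sequence[float], start: float = 1.0) -> List[float]:
    """Compound per-period returns into an equity curve."""
    curve = [start]
    for r in returns:
        curve.append(curve[-1] * (1.0 + r))
    return curve


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline as a negative fraction."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def calmar(returns: Sequence[float], years: float) -> float:
    """CAGR / |max drawdown|; assumes the path has at least one drawdown."""
    eq = equity_curve(returns)
    growth = (eq[-1] / eq[0]) ** (1.0 / years) - 1.0
    return growth / abs(max_drawdown(eq))


def lagged_vol_throttle(
    returns: Sequence[float], threshold: float = 0.05, cut: float = 0.5
) -> List[float]:
    """Scale exposure by `cut` on periods after a move larger than `threshold`.

    The regime is decided from the previous return only (no look-ahead),
    which also means the throttle can only react after a shock.
    """
    sized = []
    for i, r in enumerate(returns):
        high_vol = i > 0 and abs(returns[i - 1]) > threshold
        sized.append(r * cut if high_vol else r)
    return sized


# Toy path: calm days, a crash, then a sharp rebound (illustrative numbers only).
base = [0.01, 0.01, -0.10, 0.12, 0.08, 0.01, 0.01]
throttled = lagged_vol_throttle(base)

print("calmar base     :", round(calmar(base, years=1.0), 3))
print("calmar throttled:", round(calmar(throttled, years=1.0), 3))
```

In this toy path the maximum drawdown is identical with and without the throttle, while the final equity is lower, so Calmar falls. A real HMM is smoother than a one-period rule. Any filter that infers the regime from realized moves, however, faces some version of this detection lag.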
5. Takeaways
Two things got reconfirmed.
- The OOS holdout is genuinely sacred. “It looked good on the old data” means almost nothing. It has to survive on a future you’ve never peeked at — that’s the only test that counts.
- The hardest thing is adding something to a baseline that’s already strong. Slap a filter on a weak strategy and improvement comes easily; add a throttle to a system that’s already running well and you almost always just accumulate cost.
6. So What?
I’m closing the regime-throttle direction. If I were to try again, it would have to be through a different mechanism — for example, using signals from another asset class (outside crypto) to modulate sizing, or just adding an entirely new sleeve. That would actually be worth doing.
The infrastructure I built (the 4-hour capital base engine, the 6-timeframe HMM code) can be reused as-is for the next hypothesis, which is a small consolation — the time wasn’t a total write-off.