Started Derby Day 2026. Picked the winner (Golden Tempo, 25-1). Net +$435 on an $85 ticket. Now generalized for any race: drop in a config + parsed past performances (PPs) and the same pipeline runs.
🎯 What this is
A weighted softmax handicap and a Kelly-style portfolio builder. Parse the PPs, score the field, strip takeout from live odds, find horses where our number says the public is wrong, bet the overlays. Pure Python, one TOML config per race.
```mermaid
flowchart LR
    A[PP files] --> B[🧮 Handicap engine]
    C[💰 Live odds] --> B
    B --> D[Overlays]
    D --> E[🎫 Three-layer ticket]
    E --> F[Bet]
```
The premise: strip the takeout, find the overlays, size with discipline, document everything.
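As a rough illustration of the first three steps (score the field, strip takeout, find overlays), here is a minimal sketch. The feature names, weights, horses, and odds are invented for the example, the function names are hypothetical rather than the ones in src/, and proportional normalization is only one simple way to remove takeout from win odds.

```python
import math

def softmax(scores, temperature=1.0):
    """Turn raw handicap scores into win probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {h: math.exp((s - top) / temperature) for h, s in scores.items()}
    total = sum(exps.values())
    return {h: e / total for h, e in exps.items()}

def strip_takeout(odds_to_one):
    """Convert X-1 tote odds to implied probabilities, then renormalize.

    Raw implied probabilities sum to more than 1 because of takeout;
    dividing by that sum spreads the takeout proportionally (an assumption).
    """
    raw = {h: 1.0 / (o + 1.0) for h, o in odds_to_one.items()}
    overround = sum(raw.values())
    return {h: p / overround for h, p in raw.items()}

def find_overlays(model, market, min_edge=0.0):
    """Horses whose model probability exceeds the takeout-free market probability."""
    return {h: model[h] - market[h] for h in model if model[h] - market[h] > min_edge}

# Hypothetical weights and per-horse features (not taken from any real config).
weights = {"speed": 0.6, "pace": 0.4}
features = {
    "A": {"speed": 1.0, "pace": 0.5},
    "B": {"speed": 1.2, "pace": 0.9},
    "C": {"speed": 0.8, "pace": 1.4},
}
scores = {h: sum(weights[k] * v for k, v in f.items()) for h, f in features.items()}

model = softmax(scores)
market = strip_takeout({"A": 1.0, "B": 2.0, "C": 3.0})  # 1-1, 2-1, 3-1
overlays = find_overlays(model, market)

for horse, edge in sorted(overlays.items(), key=lambda kv: -kv[1]):
    print(f"{horse}: model {model[horse]:.3f} vs market {market[horse]:.3f} (edge {edge:+.3f})")
```

In this toy field the public favorite A is an underlay, while B and C come out as overlays.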
⚡ Quick start
```bash
git clone https://github.com/nigelglenday/horse-math
cd horse-math
pip install -e .   # installs pydantic + matplotlib

# Run the full pipeline against frozen Derby data
python3 src/handicap.py --race 2026-kentucky-derby
python3 src/sensitivity.py --race 2026-kentucky-derby
python3 src/exacta.py --race 2026-kentucky-derby
python3 src/trifecta.py --race 2026-kentucky-derby
python3 src/portfolio.py --race 2026-kentucky-derby --bankroll 85 \
    --target-spend 70 --include-tri \
    --top-pick-wheel 8 --longshot-scan 7
python3 src/charts.py --race 2026-kentucky-derby

# Verify the Derby behavior is preserved
python3 tests/test_derby_regression.py
```
You'll get the same overlays and ticket structure that picked the 2026 Derby winner. Optional: pip install -e ".[fit]" to get scikit-learn for the historical-weight-fitting module.
🛠️ Run on a new race
Status: the engine is ready. Field and PP CSVs are LLM-parsed: Claude Code reads the rendered PDF and extracts structured rows. Workflow:
| Step | Action |
|---|---|
| 1 | Create data/races/<your-race-slug>/ |
| 2 | Copy data/races/2026-kentucky-derby/config.toml as a template; tune for your race (post bias, weights, preferred-prep race class) |
| 3 | Parse the Equibase PP into field.csv + past_performances.csv (see schema in either file) |
| 4 | Run python3 src/fetch_odds.py --race <slug>, which prints instructions for pulling live odds |
| 5 | Pull live odds into live_odds.csv (most tote sites are JS-rendered; use Claude Code's WebFetch or paste manually) |
| 6 | Pull exacta probables from a tote source into exacta_probables.txt (24×24 grid) |
| 7 | Run the pipeline (handicap → sensitivity → exacta → trifecta → portfolio → charts) |
| 8 | Review against learnings/index.md (accumulated cross-race wisdom) |
| 9 | Place bets, then write your own learnings/<slug>.md post-race |
A scaffolded Preakness 2026 config is ready at data/races/2026-preakness/config.toml.
Architecture
Three layers, none collapsing into another:
| Layer | What | Why |
|---|---|---|
| 1️⃣ | Kelly core: fractional Kelly per positive-EV bet | Variance-managed, but under-deploys the bankroll at our edge sizes |
| 2️⃣ | Satellite spread: minimum-stake bets on high-EV combos Kelly says skip | Catches combos too small for Kelly to size |
| 3️⃣ | Heuristics: top-pick wheel + longshot scan | Top overlay keys top of trifecta; under-bet placers go in the exacta wheel |
| + | Human judgment: story features, live-day context, risk tolerance | Always overrides. |
Derby check: a 3-layer ticket caught 96% of the hand-tuned upside ($418 of $435). Quarter-Kelly alone caught 3%.
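To see why fractional Kelly alone under-deploys and what the satellite layer adds, here is a minimal sketch of layers 1 and 2. The function names, the `min_stake` and `min_ev` thresholds, and every probability and payout below are hypothetical; only the $85 bankroll and the quarter-Kelly multiplier come from this README. Layer 3 (wheels and the longshot scan) is not modeled.

```python
def kelly_fraction(p, net_odds):
    """Full-Kelly bankroll fraction for win probability p at net odds b-to-1."""
    q = 1.0 - p
    return (net_odds * p - q) / net_odds

def expected_value(p, net_odds):
    """Expected profit per $1 staked."""
    return p * (net_odds + 1.0) - 1.0

def size_ticket(candidates, bankroll, kelly_mult=0.25, min_stake=1.0, min_ev=0.5):
    """Split positive-EV bets into a Kelly core and a minimum-stake satellite.

    candidates maps a bet label to (win probability, net odds).
    """
    core, satellite = {}, {}
    for label, (p, b) in candidates.items():
        f = kelly_fraction(p, b)
        if f <= 0:
            continue  # zero or negative edge: no bet in either layer
        stake = kelly_mult * f * bankroll
        if stake >= min_stake:
            core[label] = round(stake, 2)      # layer 1: fractional Kelly
        elif expected_value(p, b) >= min_ev:
            satellite[label] = min_stake       # layer 2: too small for Kelly to size
    return core, satellite

# Hypothetical candidates: (model probability, net odds to 1).
candidates = {
    "win: longshot X": (0.10, 25.0),
    "exacta: X over Y": (0.01, 300.0),
    "win: favorite": (0.30, 2.0),
}
core, satellite = size_ticket(candidates, bankroll=85.0)
print("core:", core)
print("satellite:", satellite)
```

Under these made-up numbers quarter-Kelly stakes well under two dollars of the $85 bankroll, the high-EV exacta falls below the minimum stake and lands in the satellite layer, and the favorite is skipped as a negative-EV bet.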
For full architecture details, system diagrams, and module reference, see docs/ARCHITECTURE.md.
Repository layout
```text
horse-math/
├── CLAUDE.md                # AI assistant orientation
├── README.md                # this file
├── src/                     # race-agnostic engine (pure Python)
├── data/races/<slug>/       # per-race config + data + outputs
├── analysis/
│   ├── case-studies/        # frozen race narratives
│   └── figures/             # per-race visualizations
├── learnings/               # cross-race priors, compounding wisdom
├── docs/                    # architecture + diagrams
└── prompts/                 # the Derby-Day seed prompt (frozen)
```
🛣️ Status
| Phase | Description | Status |
|---|---|---|
| v1 | Single-race hardcoded build (Derby Day) | Done |
| v2 | Generalized config-driven engine | Done |
| v2.1 | Kelly portfolio + trifectas + 3-layer wagering | Done |
| v2.1 | AE-penalty bug fix, fetch-odds helper | Done |
| v2.1 | Preakness 2026 scaffold | Done |
| v2.2 | Pydantic config schema validation | Done |
| v2.2 | Regression test suite (8 tests, locks Derby behavior) | Done |
| v3 | Historical-Derbies weight fitting (sklearn scaffolded) | In progress |
| v3 | Story features (owner / trainer firsts) | Planned |
| v3 | Live-odds reactive bet sizing | Planned |
🧠 Earned priors (read these)
This is wisdom carried forward from races we've actually bet. Don't relearn it.
- Sensitivity "ROCK SOLID" ≠ model is right. Necessary, not sufficient.
- Live-tote drift on the favorite is signal, not noise. Public sees what linemakers miss.
- Star-jockey/trainer overbet is bigger than the model alone captures. Live odds reveal it directly.
- AE-activated horses are real starters. Drop the AE penalty when live odds confirm.
- Top-overlay horses deserve to be top-of-trifecta. Asymmetry rule.
- Bankroll is a scalar, not a structural constraint. Optimize first, scale second.
- Story matters; surface it. Biographical features predict public-money flow.
- The model is not the race. The Beyer is an estimate of speed. Plackett-Luce is an approximation of ordering. Always ask what the model isn't seeing.
Full version: learnings/index.md
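The last prior names Plackett-Luce as the ordering approximation. A minimal sketch of what that means for exacta and trifecta probabilities follows; the win probabilities are invented, and whether src/exacta.py and src/trifecta.py compute it exactly this way is an assumption.

```python
from itertools import permutations

def ordering_probability(win_probs, order):
    """Plackett-Luce probability that `order` is the exact top-k finish.

    Each position is filled by drawing from the horses still remaining,
    in proportion to their win probabilities.
    """
    remaining = sum(win_probs.values())
    prob = 1.0
    for horse in order:
        prob *= win_probs[horse] / remaining
        remaining -= win_probs[horse]
    return prob

# Hypothetical four-horse field with win probabilities summing to 1.
win_probs = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}

exacta_ab = ordering_probability(win_probs, ("A", "B"))          # A first, B second
trifecta_abc = ordering_probability(win_probs, ("A", "B", "C"))  # A, B, C in order

# Sanity check: probabilities over all possible exactas should sum to 1.
exacta_total = sum(ordering_probability(win_probs, o) for o in permutations(win_probs, 2))
print(f"exacta A-B: {exacta_ab:.4f}, trifecta A-B-C: {trifecta_abc:.4f}, total: {exacta_total:.4f}")
```

The approximation assumes a horse's chance of running second depends only on its win probability among the remaining field, which is exactly the kind of thing the model may not be seeing.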
⚠️ Limitations
For honest disclaimers about what this project does and does not establish, see LIMITATIONS.md. Short version: the headline result is N=1, weights are hand-set priors not MLE-fit, the sensitivity scan is self-consistency rather than statistical validation, and the philosophical framing is design pragmatism dressed up. The methodology is shaped correctly; the empirical grounding is the v3 project.
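To make "self-consistency rather than statistical validation" concrete, here is a minimal sketch of a one-at-a-time weight perturbation scan. It is an illustration only: the ±20% bump, the feature names, and the horses are assumptions, and src/sensitivity.py may work differently.

```python
def top_pick(weights, features):
    """Horse with the highest weighted score (softmax preserves this ranking)."""
    scores = {h: sum(weights[k] * v for k, v in f.items()) for h, f in features.items()}
    return max(scores, key=scores.get)

def sensitivity_scan(weights, features, bump=0.2):
    """Nudge each weight down and up by `bump`; record whether the top pick holds."""
    base = top_pick(weights, features)
    results = {}
    for name in weights:
        for factor in (1.0 - bump, 1.0 + bump):
            trial = dict(weights)
            trial[name] = weights[name] * factor
            results[(name, factor)] = top_pick(trial, features) == base
    return base, results

# Hypothetical weights and features, not taken from any real config.
weights = {"speed": 0.6, "pace": 0.4}
features = {
    "A": {"speed": 1.0, "pace": 0.5},
    "B": {"speed": 1.3, "pace": 0.9},
    "C": {"speed": 0.8, "pace": 1.2},
}
base, results = sensitivity_scan(weights, features)
stable = all(results.values())
print(f"top pick {base}; stable under all perturbations: {stable}")
```

A stable top pick here only says the ranking does not hinge on one hand-set weight. It says nothing about whether the weights match reality, which is the gap the v3 fitting work is meant to close.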
🤖 For new AI sessions
CLAUDE.md is the orientation file. It is auto-loaded by Claude Code and intended as the entry point for any new AI assistant session. It carries project context, common workflows, accumulated wisdom, and a replaceable user-context section.
The Derby-Day seed prompt stays frozen at prompts/derby-day.md as the historical artifact.
Data attribution
Past performance source data is from Equibase Company LLC, copyright 2026, all rights reserved. Raw PP files (data/races/*/raw/*.pdf and intermediate text dumps) are gitignored; get your own. The structured CSVs in data/races/<slug>/ are derivative analytical extracts: factual fields (dates, distances, Beyer figures, finish positions) reorganized into our schema for non-commercial analytical and educational purposes. Beyer Speed Figures are a registered analytical product of Daily Racing Form / Equibase. This repository is fair-use, academic-style analysis. It is not a substitute for a paid PP subscription, not a republication of Equibase's compiled data, and not commercial.
MIT licensed. Made on Derby Day 2026, generalized for the Preakness and beyond.