
🏇 horse-math

A generalized handicapping engine for thoroughbred racing.



Started Derby Day 2026. Picked the winner (Golden Tempo, 25-1). Net +$435 on an $85 ticket. Now generalized for any race: drop in a config and parsed past performances (PPs), and the same pipeline runs.


🎯 What this is

A weighted softmax handicap and a Kelly-style portfolio builder. Parse the PPs, score the field, strip takeout from live odds, find horses where our number says the public is wrong, bet the overlays. Pure Python, one TOML config per race.
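
As a rough illustration of the "weighted softmax" step, the sketch below turns per-horse feature values into win probabilities. The feature names, weights, and temperature are hypothetical placeholders, not the values in config.toml or the logic in src/handicap.py.

```python
import math

# Hypothetical feature weights; the real ones live in each race's config.toml.
WEIGHTS = {"speed": 0.6, "pace_fit": 0.3, "post_bias": 0.1}

def win_probabilities(field, weights=WEIGHTS, temperature=1.0):
    """Map {horse: {feature: value}} to {horse: win probability} via softmax."""
    scores = {
        horse: sum(weights[name] * feats[name] for name in weights)
        for horse, feats in field.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {h: math.exp((s - top) / temperature) for h, s in scores.items()}
    total = sum(exps.values())
    return {h: e / total for h, e in exps.items()}

# Made-up three-horse field, for illustration only.
field = {
    "A": {"speed": 1.2, "pace_fit": 0.5, "post_bias": 0.0},
    "B": {"speed": 0.8, "pace_fit": 0.9, "post_bias": 0.2},
    "C": {"speed": 0.1, "pace_fit": 0.2, "post_bias": -0.3},
}
probs = win_probabilities(field)
```

A lower temperature concentrates probability on the top-scoring horse; a higher one flattens the field.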

flowchart LR
    A[📄 PP files] --> B[🧮 Handicap engine]
    C[💰 Live odds] --> B
    B --> D[📊 Overlays]
    D --> E[🎫 Three-layer ticket]
    E --> F[🐎 Bet]

The premise: strip the takeout, find the overlays, size with discipline, document everything.
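
A minimal sketch of "strip the takeout, find the overlays". It assumes odds quoted as X-to-1 and removes the takeout by proportional normalization, which is one common convention and may not be what the repo does. Function names and numbers are illustrative.

```python
def strip_takeout(odds_to_one):
    """Convert tote odds (X-to-1) into implied probabilities that sum to 1."""
    raw = {h: 1.0 / (o + 1.0) for h, o in odds_to_one.items()}
    overround = sum(raw.values())  # above 1 when the pool carries takeout
    return {h: p / overround for h, p in raw.items()}

def overlays(model_probs, odds_to_one):
    """Return {horse: expected profit per $1 win bet} for positive-EV horses."""
    out = {}
    for h, p in model_probs.items():
        ev = p * odds_to_one[h] - (1.0 - p)  # win pays the odds, loss costs the stake
        if ev > 0:
            out[h] = ev
    return out

# Made-up four-horse field; raw implied probabilities sum to about 1.13.
odds = {"A": 1.0, "B": 2.0, "C": 4.0, "D": 9.0}
public = strip_takeout(odds)
model = {"A": 0.40, "B": 0.30, "C": 0.18, "D": 0.12}
edges = overlays(model, odds)  # only D clears zero EV at these odds
```

Comparing model with public shows where the model disagrees with the crowd; the EV check decides whether the disagreement is large enough to beat the posted price.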


⚡ Quick start

git clone https://github.com/nigelglenday/horse-math
cd horse-math
pip install -e .                       # installs pydantic + matplotlib

# Run the full pipeline against frozen Derby data
python3 src/handicap.py    --race 2026-kentucky-derby
python3 src/sensitivity.py --race 2026-kentucky-derby
python3 src/exacta.py      --race 2026-kentucky-derby
python3 src/trifecta.py    --race 2026-kentucky-derby
python3 src/portfolio.py   --race 2026-kentucky-derby --bankroll 85 \
                           --target-spend 70 --include-tri \
                           --top-pick-wheel 8 --longshot-scan 7
python3 src/charts.py      --race 2026-kentucky-derby

# Verify the Derby behavior is preserved
python3 tests/test_derby_regression.py

You'll get the same overlays and ticket structure that picked the 2026 Derby winner. Optional: pip install -e ".[fit]" to get scikit-learn for the historical-weight-fitting module.


🛠️ Run on a new race

Status: the engine is ready. The field and PP CSVs are parsed by an LLM: Claude Code reads the rendered PDF and extracts structured rows. Workflow:

1. Create data/races/<your-race-slug>/
2. Copy data/races/2026-kentucky-derby/config.toml as a template; tune for your race (post bias, weights, preferred-prep race class)
3. Parse the Equibase PP into field.csv + past_performances.csv (see schema in either file)
4. Run python3 src/fetch_odds.py --race <slug>, which prints instructions for pulling live odds
5. Pull live odds into live_odds.csv (most tote sites are JS-rendered, so use Claude Code's WebFetch or paste manually)
6. Pull exacta probables from a tote source into exacta_probables.txt (24×24 grid)
7. Run the pipeline (handicap → sensitivity → exacta → trifecta → portfolio → charts)
8. Review against learnings/index.md, the accumulated cross-race wisdom
9. Place bets, then write your own learnings/<slug>.md post-race

A scaffolded Preakness 2026 config is ready at data/races/2026-preakness/config.toml.
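
Step 7 can be scripted. The sketch below is a hypothetical driver, not part of the repo: it builds the six commands in pipeline order from the script names and the --race flag shown in the quick start, and passes any extra flags only to portfolio.py. It defaults to a dry run that just prints the commands.

```python
import subprocess
import sys

# Stage order taken from the workflow above.
STAGES = ["handicap", "sensitivity", "exacta", "trifecta", "portfolio", "charts"]

def pipeline_commands(slug, portfolio_args=()):
    """Build one command per stage; extra flags go to the portfolio stage only."""
    cmds = []
    for stage in STAGES:
        cmd = [sys.executable, f"src/{stage}.py", "--race", slug]
        if stage == "portfolio":
            cmd += list(portfolio_args)
        cmds.append(cmd)
    return cmds

def run_pipeline(slug, portfolio_args=(), dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for cmd in pipeline_commands(slug, portfolio_args):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing stage

commands = pipeline_commands("2026-preakness", ["--bankroll", "85"])
```

Call run_pipeline("<slug>", [...], dry_run=False) from the repo root to execute the stages for real.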


📐 Architecture

Three layers, none collapsing into another:

| Layer | What | Why |
|---|---|---|
| 1️⃣ | Kelly core: fractional Kelly per positive-EV bet | Variance-managed, under-deploys with our edge sizes |
| 2️⃣ | Satellite spread: minimum-stake bets on high-EV combos Kelly says skip | Catches combos too small for Kelly to size |
| 3️⃣ | Heuristics: top-pick wheel + longshot scan | Top overlay keys top of trifecta; under-bet placers go in the exacta wheel |
| ➕ | Human judgment: story features, live-day context, risk tolerance | Always overrides. |

Derby check: a 3-layer ticket caught 96% of the hand-tuned upside ($418 of $435). Quarter-Kelly alone caught 3%.
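
To make the first two layers concrete, here is a sketch under stated assumptions: the standard Kelly formula for a single bet at X-to-1 odds, a quarter-Kelly multiplier, and a $1 minimum stake. The function names, the minimum, and the sample bets are illustrative; src/portfolio.py may size bets differently, and the heuristics layer is not modeled here.

```python
def kelly_fraction(p, odds_to_one):
    """Full-Kelly bankroll fraction for a bet paying odds_to_one with win prob p."""
    edge = p * odds_to_one - (1.0 - p)  # expected profit per $1 staked
    return max(edge / odds_to_one, 0.0)

def size_bets(candidates, bankroll, kelly_mult=0.25, min_stake=1.0):
    """Split positive-EV bets into a Kelly core and a minimum-stake satellite layer.

    candidates: {bet_name: (win_prob, odds_to_one)}
    """
    core, satellite = {}, {}
    for name, (p, odds) in candidates.items():
        f = kelly_fraction(p, odds)
        if f <= 0:
            continue  # zero or negative EV: skipped by both layers
        stake = bankroll * kelly_mult * f
        if stake >= min_stake:
            core[name] = round(stake, 2)
        else:
            satellite[name] = min_stake  # Kelly sizes it below the minimum
    return core, satellite

# Made-up candidates on an $85 bankroll.
core, satellite = size_bets(
    {"win D": (0.20, 9.0), "exacta D-A": (0.03, 60.0), "win A": (0.40, 1.0)},
    bankroll=85,
)
```

On a small bankroll, quarter-Kelly stakes on exotic combos fall below the minimum ticket. That is the gap the satellite layer is meant to cover.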

For full architecture details, system diagrams, and module reference: docs/ARCHITECTURE.md


📚 Repository layout

horse-math/
├── 📄 CLAUDE.md          # AI assistant orientation
├── 📄 README.md          # this file
├── 📁 src/               # race-agnostic engine (pure Python)
├── 📁 data/races/<slug>/ # per-race config + data + outputs
├── 📁 analysis/
│   ├── case-studies/     # frozen race narratives
│   └── figures/          # per-race visualizations
├── 📁 learnings/         # cross-race priors, compounding wisdom
├── 📁 docs/              # architecture + diagrams
└── 📁 prompts/           # the Derby-Day seed prompt (frozen)

🛣️ Status

| Phase | Description | Status |
|---|---|---|
| v1 | Single-race hardcoded build (Derby Day) | ✅ |
| v2 | Generalized config-driven engine | ✅ |
| v2.1 | Kelly portfolio + trifectas + 3-layer wagering | ✅ |
| v2.1 | AE-penalty bug fix, fetch-odds helper | ✅ |
| v2.1 | Preakness 2026 scaffold | ✅ |
| v2.2 | Pydantic config schema validation | ✅ |
| v2.2 | Regression test suite (8 tests, locks Derby behavior) | ✅ |
| v3 | Historical-Derbies weight fitting (sklearn scaffolded) | 🚧 |
| v3 | Story features (owner / trainer firsts) | 🔜 |
| v3 | Live-odds reactive bet sizing | 🔜 |

🧠 Earned priors (read these)

This is wisdom carried forward from races we've actually bet. Don't relearn it.

  • Sensitivity "ROCK SOLID" ≠ model is right. Necessary, not sufficient.
  • Live-tote drift on the favorite is signal, not noise. Public sees what linemakers miss.
  • Star-jockey/trainer overbet is bigger than the model alone captures. Live odds reveal it directly.
  • AE-activated horses are real starters. Drop the AE penalty when live odds confirm.
  • Top-overlay horses deserve to be top-of-trifecta. Asymmetry rule.
  • Bankroll is a scalar, not a structural constraint. Optimize first, scale second.
  • Story matters; surface it. Biographical features predict public-money flow.
  • The model is not the race. The Beyer is an estimate of speed. Plackett-Luce is an approximation of ordering. Always ask what the model isn't seeing.

Full version: learnings/index.md
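
For reference, the Plackett-Luce approximation named in the last prior builds an exacta or trifecta probability from win probabilities alone: after each pick, the remaining horses' probabilities are renormalized. A minimal sketch, assuming win probabilities that sum to 1 (the function name is illustrative, not the repo's API):

```python
def ordering_probability(win_probs, order):
    """Plackett-Luce probability that the horses in `order` finish 1st, 2nd, ... in that order."""
    remaining = 1.0  # probability mass of horses not yet placed
    prob = 1.0
    for horse in order:
        prob *= win_probs[horse] / remaining
        remaining -= win_probs[horse]
    return prob

# Made-up three-horse field.
win_probs = {"A": 0.5, "B": 0.3, "C": 0.2}
exacta_ab = ordering_probability(win_probs, ["A", "B"])         # 0.5 * (0.3 / 0.5)
trifecta_cab = ordering_probability(win_probs, ["C", "A", "B"])  # 0.2 * (0.5 / 0.8) * 1
```

The model assumes that a horse's chance of running second depends only on its win probability, which is exactly the kind of thing "the model isn't seeing" when a horse is a habitual placer.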


⚠️ Limitations

For honest disclaimers about what this project does and does not establish, see LIMITATIONS.md. Short version: the headline result is N=1, weights are hand-set priors not MLE-fit, the sensitivity scan is self-consistency rather than statistical validation, and the philosophical framing is design pragmatism dressed up. The methodology is shaped correctly; the empirical grounding is the v3 project.
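
To show what "self-consistency" means here, the sketch below jitters each weight by up to ±20% and reports how often the top pick survives. This is an assumed form of the scan, not the code in src/sensitivity.py. A score of 1.0 says the pick is stable under the model's own weights; it says nothing about whether those weights are right.

```python
import random

def top_pick(field, weights):
    """Horse with the highest weighted score (softmax preserves this ranking)."""
    return max(field, key=lambda h: sum(weights[k] * field[h][k] for k in weights))

def stability(field, weights, jitter=0.2, trials=1000, seed=0):
    """Fraction of random weight perturbations that leave the top pick unchanged."""
    rng = random.Random(seed)  # seeded so the scan is reproducible
    baseline = top_pick(field, weights)
    same = 0
    for _ in range(trials):
        shaken = {k: w * (1 + rng.uniform(-jitter, jitter)) for k, w in weights.items()}
        same += top_pick(field, shaken) == baseline
    return same / trials

# Hypothetical features and weights; A leads on every feature.
weights = {"speed": 0.7, "pace_fit": 0.3}
field = {
    "A": {"speed": 1.0, "pace_fit": 1.0},
    "B": {"speed": 0.5, "pace_fit": 0.5},
}
score = stability(field, weights)
```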


🤖 For new AI sessions

CLAUDE.md is the orientation file: auto-loaded by Claude Code and intended as the entry point for any new AI assistant session. It carries project context, common workflows, accumulated wisdom, and a replaceable user-context section.

The Derby-Day seed prompt stays frozen at prompts/derby-day.md as the historical artifact.


📋 Data attribution

Past performance source data is from Equibase Company LLC, copyright 2026, all rights reserved. Raw PP files (data/races/*/raw/*.pdf and intermediate text dumps) are gitignored; get your own. The structured CSVs in data/races/<slug>/ are derivative analytical extracts: factual fields (dates, distances, Beyer figures, finish positions) reorganized into our schema for non-commercial analytical and educational purposes. Beyer Speed Figures are a registered analytical product of Daily Racing Form / Equibase. This repository is fair-use, academic-style analysis: it is not a substitute for a paid PP subscription, not a republication of Equibase's compiled data, and not commercial.


MIT licensed. Made on Derby Day 2026, generalized for the Preakness and beyond.

🐎🐎🐎