🏇 horse-math

A generalized handicapping engine for thoroughbred racing.

Started Derby Day 2026. Picked the winner (Golden Tempo, 25-1). Net +$435 on an $85 ticket. Now generalized for any race: drop in a config and parsed past performances (PPs), and the same pipeline runs.

📦 View the code on GitHub →

🎯 What this is

A weighted softmax handicap and a Kelly-style portfolio builder. Parse the PPs, score the field, strip takeout from live odds, find horses where our number says the public is wrong, bet the overlays. Pure Python, one TOML config per race.

```mermaid
flowchart LR
    A[📄 PP files] --> B[🧮 Handicap engine]
    C[💰 Live odds] --> B
    B --> D[📊 Overlays]
    D --> E[🎫 Three-layer ticket]
    E --> F[🐎 Bet]
```

The premise: strip the takeout, find the overlays, size with discipline, document everything.
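As a rough illustration of that premise, the sketch below scores a toy field with a weighted softmax, removes takeout from tote odds by proportional renormalization, and flags overlays. The feature names, weights, odds, and function names are all hypothetical, and proportional renormalization is only one simple way to strip takeout; the code in `src/` may differ on every point.

```python
import math

def weighted_score(features, weights):
    """Linear handicap score: sum of weight * feature value."""
    return sum(weights[name] * features[name] for name in weights)

def softmax_probs(scores):
    """Turn scores into win probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {h: math.exp(s - top) for h, s in scores.items()}
    total = sum(exps.values())
    return {h: e / total for h, e in exps.items()}

def strip_takeout(odds_to_one):
    """Convert X-to-1 tote odds into probabilities that sum to 1.

    Raw implied probabilities 1 / (odds + 1) sum to more than 1 because of
    takeout; dividing by that sum rescales them proportionally.
    """
    raw = {h: 1.0 / (o + 1.0) for h, o in odds_to_one.items()}
    overround = sum(raw.values())
    return {h: p / overround for h, p in raw.items()}

def find_overlays(model, market):
    """Horses the model rates higher than the takeout-free market does."""
    return {h: model[h] - market[h] for h in model if model[h] > market[h]}

# Made-up field, for illustration only.
weights = {"speed": 1.0, "class": 0.5}
field = {
    "A": {"speed": 1.0, "class": 0.4},
    "B": {"speed": 0.8, "class": 0.2},
    "C": {"speed": 0.2, "class": 0.2},
    "D": {"speed": 0.0, "class": 0.0},
}
live_odds = {"A": 1.0, "B": 1.8, "C": 5.0, "D": 7.0}  # odds-to-1

scores = {h: weighted_score(f, weights) for h, f in field.items()}
model = softmax_probs(scores)
market = strip_takeout(live_odds)
overlays = find_overlays(model, market)

for horse, edge in sorted(overlays.items(), key=lambda kv: -kv[1]):
    print(f"{horse}: model {model[horse]:.3f} vs market {market[horse]:.3f} (edge {edge:+.3f})")
```

In this toy field the two longshots come out as overlays while the two favorites are overbet, which is the situation the rest of the pipeline is built to exploit.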

⚡ Quick start

```bash
git clone https://github.com/nigelglenday/horse-math
cd horse-math
pip install -e .                       # installs pydantic + matplotlib

# Run the full pipeline against frozen Derby data
python3 src/handicap.py    --race 2026-kentucky-derby
python3 src/sensitivity.py --race 2026-kentucky-derby
python3 src/exacta.py      --race 2026-kentucky-derby
python3 src/trifecta.py    --race 2026-kentucky-derby
python3 src/portfolio.py   --race 2026-kentucky-derby --bankroll 85 \
                           --target-spend 70 --include-tri \
                           --top-pick-wheel 8 --longshot-scan 7
python3 src/charts.py      --race 2026-kentucky-derby

# Verify the Derby behavior is preserved
python3 tests/test_derby_regression.py
```

You'll get the same overlays and ticket structure that picked the 2026 Derby winner. Optional: run `pip install -e ".[fit]"` to add scikit-learn for the historical-weight-fitting module.

🛠️ Run on a new race

Status: the engine is ready. The field and PP CSVs are LLM-parsed: Claude Code reads the rendered PDF and extracts structured rows. Workflow:

1.	Create `data/races/<your-race-slug>/`
2.	Copy `data/races/2026-kentucky-derby/config.toml` as a template; tune for your race (post bias, weights, preferred-prep race class)
3.	Parse the Equibase PP into `field.csv` + `past_performances.csv` (see schema in either file)
4.	Run `python3 src/fetch_odds.py --race <slug>`, which prints instructions for pulling live odds
5.	Pull live odds (most tote sites are JS-rendered → use Claude Code's WebFetch or paste manually) into `live_odds.csv`
6.	Pull exacta probables from a tote source into `exacta_probables.txt` (24×24 grid)
7.	Run the pipeline (handicap → sensitivity → exacta → trifecta → portfolio → charts)
8.	Review against `learnings/index.md` (accumulated cross-race wisdom)
9.	Place bets, then write your own `learnings/<slug>.md` post-race

A scaffolded Preakness 2026 config is ready at data/races/2026-preakness/config.toml.
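Steps 1 and 2 of the workflow are mechanical enough to script. The helper below is a hypothetical convenience, not something shipped in the repo: it assumes only the `data/races/<slug>/` layout and the Derby `config.toml` template named above, and the function name `scaffold_race` is invented for this example.

```python
import shutil
from pathlib import Path

TEMPLATE_SLUG = "2026-kentucky-derby"  # the template race named in step 2

def scaffold_race(repo_root, slug):
    """Create data/races/<slug>/ and seed it with a copy of the Derby config.toml."""
    races = Path(repo_root) / "data" / "races"
    template = races / TEMPLATE_SLUG / "config.toml"
    if not template.is_file():
        raise FileNotFoundError(f"template config not found: {template}")

    target_dir = races / slug
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / "config.toml"
    if not target.exists():
        # Never overwrite a config that has already been tuned for this race.
        shutil.copyfile(template, target)
    return target

# Example, run from the repository root:
#   scaffold_race(".", "<your-race-slug>")
```

The copied file still needs hand-tuning for post bias, weights, and preferred-prep race class before the pipeline is run.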

📐 Architecture

Three layers, none collapsing into another:

| Layer | What | Why |
|---|---|---|
| 1️⃣ | Kelly core: fractional Kelly per positive-EV bet | Variance-managed; under-deploys with our edge sizes |
| 2️⃣ | Satellite spread: minimum-stake bets on high-EV combos Kelly says skip | Catches combos too small for Kelly to size |
| 3️⃣ | Heuristics: top-pick wheel + longshot scan | Top overlay keys top of trifecta; under-bet placers go in the exacta wheel |
| ➕ | Human judgment: story features, live-day context, risk tolerance | Always overrides. |

Derby check: a 3-layer ticket caught 96% of the hand-tuned upside ($418 of $435). Quarter-Kelly alone caught 3%.
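To see why the Kelly core under-deploys on a small bankroll, here is a minimal sketch of layers 1 and 2 for a single win bet. It uses the standard Kelly formula f* = (bp - q) / b for a bet at b-to-1 odds with win probability p and q = 1 - p. The quarter-Kelly multiplier echoes the Derby check above; the $2 minimum stake, the EV threshold, the probabilities, and the function names are assumptions made for illustration, not values taken from `src/portfolio.py`.

```python
def kelly_fraction(p, odds_to_one):
    """Full-Kelly fraction of bankroll: f* = (b*p - q) / b, floored at zero."""
    b = odds_to_one
    return max(0.0, (b * p - (1.0 - p)) / b)

def expected_value(p, odds_to_one):
    """Expected profit per $1 staked."""
    return p * (odds_to_one + 1.0) - 1.0

def size_bet(p, odds_to_one, bankroll,
             kelly_mult=0.25,        # quarter-Kelly
             min_stake=2.0,          # assumed tote minimum
             satellite_min_ev=0.25): # assumed "high-EV" cutoff
    """Return (layer, stake) for one bet."""
    f = kelly_fraction(p, odds_to_one)
    if f <= 0.0:
        return ("skip", 0.0)                 # not positive EV
    stake = kelly_mult * f * bankroll
    if stake >= min_stake:
        return ("kelly", round(stake, 2))    # layer 1: Kelly core
    if expected_value(p, odds_to_one) >= satellite_min_ev:
        return ("satellite", min_stake)      # layer 2: too small for Kelly to size
    return ("skip", 0.0)

bankroll = 85.0
print(size_bet(0.45, 2.0, bankroll))    # short-priced overlay
print(size_bet(0.06, 25.0, bankroll))   # hypothetical 25-1 longshot overlay
print(size_bet(0.20, 3.0, bankroll))    # underlay
```

With an $85 bankroll, a hypothetical 25-1 horse rated at 6% has a clearly positive edge, yet quarter-Kelly sizes it at under fifty cents. A separate minimum-stake layer is what gets that bet onto the ticket at all.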

For full architecture details, system diagrams, and module reference: docs/ARCHITECTURE.md

📚 Repository layout

```text
horse-math/
├── 📄 CLAUDE.md          # AI assistant orientation
├── 📄 README.md          # this file
├── 📁 src/               # race-agnostic engine (pure Python)
├── 📁 data/races/<slug>/ # per-race config + data + outputs
├── 📁 analysis/
│   ├── case-studies/     # frozen race narratives
│   └── figures/          # per-race visualizations
├── 📁 learnings/         # cross-race priors, compounding wisdom
├── 📁 docs/              # architecture + diagrams
└── 📁 prompts/           # the Derby-Day seed prompt (frozen)
```

🛣️ Status

| Phase | Description | Status |
|---|---|---|
| v1 | Single-race hardcoded build (Derby Day) | ✅ |
| v2 | Generalized config-driven engine | ✅ |
| v2.1 | Kelly portfolio + trifectas + 3-layer wagering | ✅ |
| v2.1 | AE-penalty bug fix, fetch-odds helper | ✅ |
| v2.1 | Preakness 2026 scaffold | ✅ |
| v2.2 | Pydantic config schema validation | ✅ |
| v2.2 | Regression test suite (8 tests, locks Derby behavior) | ✅ |
| v3 | Historical-Derbies weight fitting (sklearn scaffolded) | 🚧 |
| v3 | Story features (owner / trainer firsts) | 🔜 |
| v3 | Live-odds reactive bet sizing | 🔜 |

🧠 Earned priors (read these)

These are lessons carried forward from races we've actually bet. Don't relearn them.

1.	Sensitivity "ROCK SOLID" ≠ model is right. Necessary, not sufficient.
2.	Live-tote drift on the favorite is signal, not noise. Public sees what linemakers miss.
3.	Star-jockey/trainer overbet is bigger than the model alone captures. Live odds reveal it directly.
4.	AE-activated horses are real starters. Drop the AE penalty when live odds confirm.
5.	Top-overlay horses deserve to be top-of-trifecta. Asymmetry rule.
6.	Bankroll is a scalar, not a structural constraint. Optimize first, scale second.
7.	Story matters; surface it. Biographical features predict public-money flow.
8.	The model is not the race. The Beyer is an estimate of speed. Plackett-Luce is an approximation of ordering. Always ask what the model isn't seeing.

Full version: learnings/index.md
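The last prior calls Plackett-Luce an approximation of ordering. For readers who have not met the model, here is a minimal sketch of how it turns win probabilities into exacta and trifecta probabilities: each finishing position is filled by drawing from the horses still remaining, in proportion to their win probabilities. The win probabilities and the function name are made up for illustration; this is not the code in `src/exacta.py` or `src/trifecta.py`.

```python
from itertools import permutations

def ordering_prob(win_probs, order):
    """Plackett-Luce probability that `order` fills the top spots in sequence.

    P(i, j, k, ...) = p_i * p_j / (1 - p_i) * p_k / (1 - p_i - p_j) * ...
    """
    remaining = 1.0  # probability mass of horses not yet placed
    prob = 1.0
    for horse in order:
        prob *= win_probs[horse] / remaining
        remaining -= win_probs[horse]
    return prob

# Made-up win probabilities for a four-horse field.
win_probs = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}

exacta_ab = ordering_prob(win_probs, ("A", "B"))          # 0.4 * 0.3 / 0.6
trifecta_abc = ordering_prob(win_probs, ("A", "B", "C"))  # ... * 0.2 / 0.3
all_exactas = {pair: ordering_prob(win_probs, pair)
               for pair in permutations(win_probs, 2)}

print(f"exacta A-B:     {exacta_ab:.4f}")
print(f"trifecta A-B-C: {trifecta_abc:.4f}")
print(f"sum over all exactas: {sum(all_exactas.values()):.4f}")
```

The model assumes that a horse's chance of running second depends only on its win probability relative to the rest of the field, which is exactly the kind of simplification the prior warns about.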

⚠️ Limitations

For honest disclaimers about what this project does and does not establish, see LIMITATIONS.md. Short version: the headline result is N=1, the weights are hand-set priors rather than MLE-fit, the sensitivity scan is self-consistency rather than statistical validation, and the philosophical framing is design pragmatism dressed up. The methodology is shaped correctly; the empirical grounding is the v3 project.
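To make "self-consistency rather than statistical validation" concrete, the sketch below shows one plausible shape for a weight-sensitivity scan: nudge each weight up and down and check whether the top pick changes. The 20% bump, the feature names, the weights, and the function names are all assumptions; `src/sensitivity.py` may work quite differently.

```python
def top_pick(field, weights):
    """Horse with the highest weighted score (also the softmax favorite)."""
    scores = {h: sum(weights[k] * feats[k] for k in weights)
              for h, feats in field.items()}
    return max(scores, key=scores.get)

def sensitivity_scan(field, weights, bump=0.2):
    """Scale each weight by (1 - bump) and (1 + bump); list perturbations that flip the top pick."""
    baseline = top_pick(field, weights)
    flips = []
    for name in weights:
        for factor in (1.0 - bump, 1.0 + bump):
            trial = dict(weights)
            trial[name] = weights[name] * factor
            if top_pick(field, trial) != baseline:
                flips.append((name, factor))
    return baseline, flips

# Made-up field and weights, for illustration only.
weights = {"speed": 1.0, "class": 0.5}
field = {
    "A": {"speed": 1.0, "class": 0.4},
    "B": {"speed": 0.8, "class": 0.2},
    "C": {"speed": 0.2, "class": 0.2},
}

baseline, flips = sensitivity_scan(field, weights)
verdict = "ROCK SOLID" if not flips else f"fragile under {flips}"
print(f"top pick {baseline}: {verdict}")
```

A scan like this only shows that the pick is stable in the neighborhood of the chosen weights. It says nothing about whether those weights are right, which is why fitting them to historical results is listed as v3 work.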

🤖 For new AI sessions

CLAUDE.md is the orientation file: Claude Code auto-loads it, and it is intended as the entry point for any new AI assistant session. It carries project context, common workflows, accumulated wisdom, and a replaceable user-context section.

The Derby-Day seed prompt stays frozen at prompts/derby-day.md as the historical artifact.

📋 Data attribution

Past performance source data is from Equibase Company LLC, copyright 2026, all rights reserved. Raw PP files (data/races/*/raw/*.pdf and intermediate text dumps) are gitignored; get your own. The structured CSVs in data/races/<slug>/ are derivative analytical extracts: factual fields (dates, distances, Beyer figures, finish positions) reorganized into our schema for non-commercial analytical and educational purposes. Beyer Speed Figures are a registered analytical product of Daily Racing Form / Equibase. This repository is fair-use academic-style analysis: not a substitute for a paid PP subscription, not a republication of Equibase's compiled data, and not commercial.

MIT licensed. Made on Derby Day 2026, generalized for the Preakness and beyond.

🐎🐎🐎