Clean year-over-year prediction: FRED scores computed entirely from 2024 data, tested against 2025 crash outcomes. No data leakage. No circular validation. Honest temporal prediction.
| Metric | Value |
|---|---|
| Total eligible carriers | 441,199 |
| Y2 crash rate | 10.2% |
| Small (1-5 trucks) | 366,822 (83.1%) |
| Medium (6-20) | 54,515 (12.4%) |
| Large (21-100) | 16,813 (3.8%) |
| XLarge (101+) | 3,049 (0.7%) |
AUC measures the probability that a randomly chosen crash carrier ranks higher than a randomly chosen non-crash carrier. All models are tested against the same Y2 (2025) binary outcome: did the carrier have any crash?
| Band | n | AUC | 95% CI |
|---|---|---|---|
| Small (1-5) | 366,822 | 0.582 | 0.577–0.586 |
| Medium (6-20) | 54,515 | 0.553 | 0.547–0.558 |
| Large (21-100) | 16,813 | 0.582 | 0.574–0.591 |
| XLarge (101+) | 3,049 | 0.649 | 0.620–0.683 |
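AUC is highest for the largest fleets (0.649 for XLarge), although the trend is not monotonic: Medium carriers (0.553) score below Small carriers (0.582). Larger carriers have more stable EB estimates (more data per carrier), and the mismatch between a rate-based score and a binary any-crash outcome is reduced.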
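The AUC definition above (the probability that a randomly chosen crash carrier ranks higher than a randomly chosen non-crash carrier) can be sketched directly as a pairwise comparison. This is a minimal illustration, not the study's implementation: the function name `auc` and the toy inputs are assumptions, and counting ties as half a win is a common convention rather than something the source specifies.

```python
def auc(scores, outcomes):
    """Share of (crash, non-crash) pairs in which the crash carrier scores higher.

    Ties between a crash and a non-crash carrier count as half a win.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]  # crash carriers
    neg = [s for s, y in zip(scores, outcomes) if y == 0]  # non-crash carriers
    if not pos or not neg:
        raise ValueError("need at least one carrier in each outcome class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data for illustration only; these are not values from the study.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
toy_crashed = [1, 0, 1, 0, 0]
toy_auc = auc(toy_scores, toy_crashed)
```

The pairwise loop is quadratic in the number of carriers, so at the scale of this study a rank-based formulation would be the practical choice. The loop is shown here only because it mirrors the definition term by term.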
Grades assigned from Y1-only peer_index using production thresholds. Two metrics shown: binary crash rate (% with any crash) and EB crash rate (crashes per 100k miles).
| Grade | n | Crashed | % Crashed | 95% CI | EB Rate |
|---|---|---|---|---|---|
| Excellent | 21,950 | 2,386 | 10.87% | 10.46–11.27 | 0.01374 |
| Strong | 34,099 | 2,700 | 7.92% | 7.63–8.21 | 0.02113 |
| Satisfactory | 188,361 | 13,096 | 6.95% | 6.84–7.07 | 0.03272 |
| Marginal | 81,744 | 10,730 | 13.13% | 12.89–13.35 | 0.04552 |
| Poor | 54,758 | 8,905 | 16.26% | 15.97–16.56 | 0.05888 |
| Critical | 60,287 | 6,990 | 11.59% | 11.34–11.85 | 0.27295 |
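The gap between the two metrics can be made concrete by taking the ratio of the worst grade to the best grade under each one. The sketch below assumes that a grade ratio means Critical divided by Excellent, which the source does not state explicitly; the variable and function names are illustrative.

```python
# Values copied from the grade table above.
pct_crashed = {"Excellent": 10.87, "Critical": 11.59}    # % with any Y2 crash
eb_rate = {"Excellent": 0.01374, "Critical": 0.27295}    # EB crashes per 100k miles


def worst_to_best_ratio(metric):
    """Ratio of the Critical grade to the Excellent grade for one metric."""
    return metric["Critical"] / metric["Excellent"]


binary_ratio = worst_to_best_ratio(pct_crashed)
eb_ratio = worst_to_best_ratio(eb_rate)
print(f"binary: {binary_ratio:.1f}x, EB rate: {eb_ratio:.1f}x")
```

Under that assumption the binary metric barely separates the extreme grades, while the EB rate separates them by roughly a factor of twenty. This matches the 1.1× and 19.9× grade ratios listed in the comparison table.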
2024 crashes counted directly from FMCSA crash CSV, tested against 2025 crash outcomes. This is the strongest single predictor.
| Y1 Crashes | n | Y2 Crash Rate | 95% CI | RR vs 0 |
|---|---|---|---|---|
| 0 | 397,206 | 7.3% | 7.3–7.4 | — |
| 1 | 32,019 | 24.7% | 24.2–25.2 | 3.38× |
| 2 | 6,346 | 49.0% | 47.8–50.3 | 6.71× |
| 3+ | 5,628 | 82.3% | 81.3–83.3 | 11.27× |
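The "RR vs 0" column appears to be each group's Y2 crash rate divided by the rate for carriers with no Y1 crashes. A short sketch of that arithmetic follows, using the rounded rates from the table; the helper name `relative_risk` is an assumption for illustration.

```python
def relative_risk(rate_group, rate_baseline):
    """Ratio of a group's crash rate to the baseline group's crash rate."""
    if rate_baseline <= 0:
        raise ValueError("baseline rate must be positive")
    return rate_group / rate_baseline


# Y2 crash rates (%) by number of Y1 crashes, copied from the table above.
y2_crash_rate = {"0": 7.3, "1": 24.7, "2": 49.0, "3+": 82.3}

rr_vs_zero = {
    group: round(relative_risk(rate, y2_crash_rate["0"]), 2)
    for group, rate in y2_crash_rate.items()
    if group != "0"
}
print(rr_vs_zero)
```

Dividing the rounded percentages reproduces the published 3.38×, 6.71×, and 11.27× values.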
2024 violation types from raw CSV, tested against 2025 crash outcomes. Clean temporal separation — no score contamination. Behavioral violations show a strong, monotonic dose-response relationship with future crash risk.
| Violation Factor | n with | Crash % (with) | Crash % (without) | RR |
|---|---|---|---|---|
| Reckless Driving | 219 | 53.9% | 10.1% | 5.32× |
| Drugs / Alcohol | 2,381 | 36.8% | 10.0% | 3.68× |
| Any Speeding | 41,520 | 34.3% | 7.6% | 4.49× |
| 1+ behavioral types | 88,138 | 25.1% | 6.4% | 3.92× |
| 2+ behavioral types | 31,918 | 41.3% | 7.7% | 5.35× |
| 3+ behavioral types | 13,861 | 57.8% | 8.6% | 6.72× |
| 4+ behavioral types | 6,896 | 71.9% | 9.2% | 7.84× |
| 1+ equipment types | 132,567 | 20.0% | 5.9% | 3.36× |
| 2+ equipment types | 75,511 | 25.5% | 7.0% | 3.65× |
| Factor | n | RR |
|---|---|---|
| Reckless Driving | 219 | 5.32× |
| Drugs / Alcohol | 2,381 | 3.68× |
Standardized coefficients from Y1 component RRs predicting Y2 binary crash.
| Component | Importance | OR |
|---|---|---|
| Equipment RR | 42.0% | 0.997 |
| Severe RR | 29.2% | 1.011 |
| Crash RR | 28.1% | 1.036 |
| Behavioral RR | 0.7% | 1.000 |
Single-component AUC and impact of removing each component from the full model.
| Component | Alone AUC | AUC Drop |
|---|---|---|
| Behavioral RR | 0.626 | +0.000 |
| Crash RR | 0.542 | +0.025 |
| Equipment RR | 0.529 | +0.011 |
| Severe RR | 0.528 | +0.017 |
Among small and medium carriers, younger companies have higher crash rates. The apparent reversal for large/xlarge fleets is a sample artifact: "young large fleets" barely exist (n=195 and n=21), since fleets grow over time and rarely appear as large carriers from day one. The few that do are typically corporate restructurings or rebrands, not genuinely new operations.
| Fleet Band | Young (0-2yr) | Old (21+yr) | n (young) | Direction |
|---|---|---|---|---|
| Small (1-5) | 5.9% | 4.6% | 30,390 | Young riskier (1.28×) |
| Medium (6-20) | 21.6% | 19.9% | 1,341 | Young riskier (1.09×) |
| Large (21-100) | 39.0% | 53.0% | 195 | Unreliable (tiny n) |
| XLarge (101+) | 71.4% | 90.8% | 21 | Unreliable (tiny n) |
The aggregate "old riskier" pattern (0.45×) is a textbook Simpson's paradox: large carriers are both older and have higher binary crash rates (due to mileage exposure), dragging the aggregate in the opposite direction from the within-band relationship.
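The reversal can be illustrated by pooling the bands with different size mixes. In the sketch below, the young-carrier rates and counts and the old-carrier rates come from the table above, but the old-carrier counts (`old_n`) are hypothetical placeholders, since the source does not report them. The pooled ratio it produces is therefore only directional and is not the study's 0.45× figure.

```python
# young_rate, young_n and old_rate are from the table above.
# old_n values are HYPOTHETICAL placeholders chosen only to show the mechanism.
bands = {
    "Small":  {"young_rate": 0.059, "young_n": 30390, "old_rate": 0.046, "old_n": 20000},
    "Medium": {"young_rate": 0.216, "young_n": 1341,  "old_rate": 0.199, "old_n": 8000},
    "Large":  {"young_rate": 0.390, "young_n": 195,   "old_rate": 0.530, "old_n": 4000},
    "XLarge": {"young_rate": 0.714, "young_n": 21,    "old_rate": 0.908, "old_n": 1000},
}


def pooled_rate(rate_key, n_key):
    """Crash rate across all bands, weighted by each band's carrier count."""
    crashed = sum(b[rate_key] * b[n_key] for b in bands.values())
    total = sum(b[n_key] for b in bands.values())
    return crashed / total


within_band_ratio = {name: b["young_rate"] / b["old_rate"] for name, b in bands.items()}
pooled_ratio = pooled_rate("young_rate", "young_n") / pooled_rate("old_rate", "old_n")

print({name: round(r, 2) for name, r in within_band_ratio.items()})
print(round(pooled_ratio, 2))
```

Young carriers are concentrated almost entirely in the Small band, where crash rates are lowest, so their pooled rate stays low. Any old-carrier mix with a meaningful share of large fleets pushes the pooled old rate up, which is enough to flip the aggregate ratio below 1 even though young carriers are riskier within the Small and Medium bands.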
Carriers ranked by Y1-only peer_index (highest risk first), then split into deciles.
| Decile | n | Y2 Crashes | % of Total | Cumulative | Crash Rate |
|---|---|---|---|---|---|
| D1 (riskiest) | 44,119 | 6,629 | 7.3% | 7.3% | 11.1% |
| D2 | 44,120 | 10,795 | 11.8% | 19.1% | 14.0% |
| D3 | 44,120 | 22,118 | 24.2% | 43.3% | 19.4% |
| D4 | 44,120 | 14,491 | 15.9% | 59.2% | 11.0% |
| D5 | 44,120 | 11,573 | 12.7% | 71.9% | 11.0% |
| D6 | 44,120 | 8,029 | 8.8% | 80.7% | 8.2% |
| D7 | 44,120 | 3,699 | 4.0% | 84.7% | 5.5% |
| D8 | 44,120 | 4,778 | 5.2% | 89.9% | 5.4% |
| D9 | 44,120 | 3,527 | 3.9% | 93.8% | 6.3% |
| D10 (safest) | 44,120 | 5,760 | 6.3% | 100.0% | 9.7% |
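The "% of Total" and "Cumulative" columns follow from the Y2 crash counts alone. The sketch below recomputes them from the table; the variable names are illustrative.

```python
# Y2 crash counts for D1 (riskiest) through D10 (safest), from the table above.
y2_crashes = [6629, 10795, 22118, 14491, 11573, 8029, 3699, 4778, 3527, 5760]

total_crashes = sum(y2_crashes)

cumulative_pct = []
running = 0
for count in y2_crashes:
    running += count
    cumulative_pct.append(100 * running / total_crashes)

top3_capture = cumulative_pct[2]  # share of all Y2 crashes in D1 through D3
print(total_crashes, [round(c, 1) for c in cumulative_pct])
```

The top three deciles hold about 43.3% of Y2 crashes, compared with the 30% a random ordering of carriers would be expected to capture.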
How this clean temporal validation compares with the original empirical validation.
| Metric | Original Study | This Study (Y1-only) | Notes |
|---|---|---|---|
| FRED AUC | 0.852 | 0.585 | Original used 24-mo window (includes Y2 data) |
| Grade RR (% crashed) | 18.2× | 1.1× | Binary rate confounded by exposure |
| Grade RR (EB rate) | — | 19.9× | New metric: EB crash rate per 100k mi |
| Crash history RR | 1.41× | 4.85× | Clean CSV-based vs DB (leaked) |
| Behavioral 4+ types RR | 8.17× | 7.84× | Both clean temporal — consistent |
| Reckless driving RR | 3.94× | 5.32× | Both clean temporal — consistent |
| Population | 180,402 | 441,199 | Different eligibility filters |
| Score source | DB FRED scores | Y1-only EB recomputation | Key methodological difference |
| Predictor | Spearman rho | p-value |
|---|---|---|
| ML20 crash prob | 0.264 | < 10⁻³⁰⁰ |
| ISS score | 0.184 | < 10⁻³⁰⁰ |
| Behavioral RR (Y1) | 0.132 | < 10⁻³⁰⁰ |
| FRED Peer Index (Y1) | 0.089 | < 10⁻³⁰⁰ |
| Crash RR (Y1) | 0.044 | 1.2 × 10⁻¹⁸⁵ |
| Equipment RR (Y1) | 0.030 | 6.2 × 10⁻⁸⁹ |
| Severe RR (Y1) | 0.029 | 4.3 × 10⁻⁸² |
All correlations are highly significant. Behavioral violations (rho=0.132) stand out as the strongest FRED component: about 3× stronger than crash history (0.044) and more than 4× stronger than equipment (0.030) or severe (0.029). The composite peer_index (0.089) is diluted by the heavy crash weight (56%), even though crash history carries only a third of the temporal signal of behavioral violations.