How We Validated the FRED Score
A Forward-Looking Crash-Burden Model, Tested Out-of-Time
Published: February 2026 • Study Period: 2024-2025
The Question
How much harm will a motor carrier do on the road over the next year? The FRED Score answers that directly: it predicts each carrier's severity-weighted crash burden over the forward 12 months — not just how many crashes, but how bad — then grades it against similarly-sized peers.
We fit the model on one year of carrier history (2024) and test it against the crash burden that actually followed (2025). Because the model is never judged on the same data it learned from, the validation is genuinely out-of-time. Across roughly 1.1 million graded carriers, the fitted model ranks forward risk far better than fleet size alone and stays unbiased within every size band.
What the Model Predicts:
Severity weight = 1 + 12×fatal + 4×injury + 3×hazmat
A frequency × severity GLM (Poisson / Tweedie) with a log-exposure offset predicts each carrier's expected crash burden per mile driven, cross-checked against a gradient-boosted XGBoost-Tweedie model. A fatal crash counts far more than a minor tow-away, so the score reflects expected harm, not just event counts. Thin-data carriers are shrunk toward their size-band prior via Bühlmann-Straub credibility and tagged with a confidence tier.
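The severity weight above can be sketched in a few lines of Python. This is a minimal illustration, assuming fatal, injury, and hazmat are 0/1 indicators on each crash record; the `Crash` class and function names are hypothetical and not taken from the production pipeline.

```python
from dataclasses import dataclass


@dataclass
class Crash:
    """One crash record; the three flags are assumed to be 0/1 indicators."""
    fatal: bool = False
    injury: bool = False
    hazmat: bool = False


def severity_weight(crash: Crash) -> int:
    # Severity weight = 1 + 12*fatal + 4*injury + 3*hazmat (formula from the article)
    return 1 + 12 * crash.fatal + 4 * crash.injury + 3 * crash.hazmat


def crash_burden(crashes) -> int:
    # A carrier's crash burden is the sum of its severity-weighted crashes.
    return sum(severity_weight(c) for c in crashes)


# Toy example: a tow-away, an injury crash, and a fatal hazmat crash.
crashes = [Crash(), Crash(injury=True), Crash(fatal=True, hazmat=True)]
burden = crash_burden(crashes)  # 1 + 5 + 16 = 22
```

Three crashes here produce a burden of 22 rather than a count of 3, which is the sense in which the target reflects expected harm.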
How Well Does It Rank Forward Risk?
On a carrier-disjoint holdout of the following year, the fitted model reaches a normalized Gini of about 0.59–0.61 — nearly double the ~0.33 achievable from fleet size alone — while staying unbiased within every fleet-size band (observed-over-expected burden lands at ≈ 1.0). It ranks risk sharply and calibrates honestly, across roughly 1.1 million graded carriers.
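For readers unfamiliar with the metric, here is one common way to compute a normalized Gini. It is a sketch under assumptions: the version below is unweighted, while the study's exact definition (for example, whether carriers are exposure-weighted) is not stated here. Function names and the toy data are illustrative only.

```python
def gini(actual, predicted):
    """Gini of `actual` when carriers are ranked by `predicted`, highest first."""
    n = len(actual)
    # Sort by predicted value descending; ties keep their original order.
    order = sorted(range(n), key=lambda i: (-predicted[i], i))
    total = sum(actual)
    cumulative = 0.0
    area = 0.0
    for i in order:
        cumulative += actual[i]
        area += cumulative / total
    # Subtract the area a random ordering would give, then average.
    return (area - (n + 1) / 2) / n


def normalized_gini(actual, predicted):
    # Divide by the Gini of a perfect ranking so 1.0 means perfect ordering.
    return gini(actual, predicted) / gini(actual, actual)


# Toy forward-year burdens and two candidate rankings (not study data).
actual = [0, 1, 2, 5]
perfect = normalized_gini(actual, actual)
partial = normalized_gini(actual, [0.1, 0.3, 0.2, 0.9])
reversed_rank = normalized_gini(actual, [4, 3, 2, 1])
```

A value of 1.0 means the model ranks carriers exactly in order of their realized burden, 0 is no better than random, and negative values indicate a ranking that runs backwards.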
Do Grades Predict Future Crashes?
Year 2 (2025) Crash Rates by Safety Grade
242,969 graded carriers • EB-adjusted rates per 100k miles
Carriers graded Critical in Year 1 had an EB-adjusted Year 2 crash rate 3.35× that of Excellent carriers, a comparison that controls for fleet size. Measured instead as the share of carriers with any Year 2 crash, the gap is narrower: 9.7% of Excellent carriers crashed, compared to 18.7% of Critical carriers.
Left number: EB-adjusted crash rate per 100k miles. Right number: % of carriers with any Year 2 crash.
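The two numbers on each card measure different things, so they give different Critical-to-Excellent ratios. The short calculation below uses only the shares quoted above; the variable names are illustrative.

```python
# Share of carriers with any Year 2 crash, as quoted in the article (percent).
excellent_share = 9.7
critical_share = 18.7

# Ratio of "any crash" shares: Critical relative to Excellent.
share_ratio = critical_share / excellent_share
share_ratio_rounded = round(share_ratio, 2)
```

The share-based ratio rounds to 1.93×, the figure reported in the Grade Distribution section. The 3.35× figure is the ratio of EB-adjusted rates per 100k miles, which also reflects how often carriers crash relative to their mileage, not just whether they crashed at all.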
How the Model Is Built and Tested
Our Methodology
Rather than hand-assigning weights, we let the data set them. The pipeline:
- 1. Define the target: Each Year-2 crash is severity-weighted (1 + 12×fatal + 4×injury + 3×hazmat) and summed into a carrier's crash burden.
- 2. Build Year-1 features: Prior-crash EB relativity, fleet size, inspection exposure, and behavioral / equipment / severe violation rates.
- 3. Fit a frequency × severity GLM: A Poisson / Tweedie model with a log-exposure offset learns each feature's weight from how strongly it predicts next-year burden; an XGBoost-Tweedie model runs alongside as a cross-check.
- 4. Apply credibility: Bühlmann-Straub shrinkage blends each carrier toward its size-band prior, yielding a confidence tier.
- 5. Validate out-of-time: Score the forward year and measure discrimination (normalized Gini) and calibration (per-band observed-over-expected).
Critical: Exposure enters as a log-offset so the model predicts a rate of crash burden, not a raw count, and calibration is enforced band-by-band. That is what lets a 3-truck owner-operator and a 3,000-truck fleet be compared fairly.
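To make the log-offset idea concrete, here is a deliberately small sketch: a Poisson regression with one feature, fit by Newton's method in plain Python. It is not the production model (which uses Poisson / Tweedie with several features and an XGBoost cross-check); the function name, the single binary feature, and the toy data are all assumptions for illustration.

```python
import math


def fit_poisson_with_offset(x, y, exposure, max_iter=100, tol=1e-12):
    """Fit log(mu) = log(exposure) + b0 + b1 * x by Newton's method."""
    b0 = math.log(sum(y) / sum(exposure))  # start at the pooled rate
    b1 = 0.0
    for _ in range(max_iter):
        # The offset means exposure multiplies the predicted rate directly.
        mu = [e * math.exp(b0 + b1 * xi) for xi, e in zip(x, exposure)]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Fisher information (2x2), inverted by hand.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Toy data: x flags a hypothetical risk feature, exposure is in 100k miles,
# y is the severity-weighted burden observed the following year.
x = [0, 0, 1, 1]
exposure = [1.0, 4.0, 2.0, 3.0]
y = [1, 3, 4, 8]

b0, b1 = fit_poisson_with_offset(x, y, exposure)
rate_unflagged = math.exp(b0)      # burden per 100k miles when x = 0
rate_flagged = math.exp(b0 + b1)   # burden per 100k miles when x = 1
```

Because exposure sits in the offset, the coefficients describe burden per unit of exposure. In this toy data the fitted rates match the group totals (4 burden over 5 exposure units, and 12 over 5), regardless of how that exposure is split between small and large carriers.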
1. Prior-Crash EB Relativity
The single strongest feature in the fit
Past crashes are the single strongest predictor of future crashes. Carriers with elevated crash rates in Year 1 consistently had elevated rates in Year 2. This signal is robust across all fleet sizes and carrier ages.
Why it leads: Prior crashes are the most direct evidence of forward burden, entered as a credibility-shrunk relativity rather than a raw count. A single crash doesn't doom a small carrier, because Bühlmann-Straub shrinkage blends each carrier's record toward its size-band prior — so the signal is reliable for both 3-truck operations and 3,000-truck fleets.
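The shrinkage described above follows the standard credibility form: a weight Z between 0 and 1 that grows with exposure, blending the carrier's own record with its band prior. The sketch below assumes exposure is measured in 100k miles; the value of `k` and the confidence-tier cutoffs are hypothetical placeholders, since the fitted values are not given in this article.

```python
def credibility_weight(exposure: float, k: float) -> float:
    # Buhlmann-Straub form: Z = w / (w + k), where w is the carrier's exposure
    # and k is the ratio of within-carrier noise to between-carrier variance.
    return exposure / (exposure + k)


def shrunk_relativity(observed: float, prior: float, exposure: float, k: float) -> float:
    z = credibility_weight(exposure, k)
    return z * observed + (1.0 - z) * prior


def confidence_tier(z: float) -> str:
    # Hypothetical cutoffs, for illustration only.
    if z >= 0.7:
        return "high"
    if z >= 0.3:
        return "medium"
    return "low"


K = 10.0  # assumed credibility constant, in the same units as exposure

# A small carrier with one bad year versus a large fleet with a long record.
small = shrunk_relativity(observed=4.0, prior=1.0, exposure=0.5, k=K)
large = shrunk_relativity(observed=1.5, prior=1.0, exposure=500.0, k=K)
small_tier = confidence_tier(credibility_weight(0.5, K))
large_tier = confidence_tier(credibility_weight(500.0, K))
```

With these assumed inputs, the small carrier's observed 4.0× relativity is pulled most of the way back toward the band prior of 1.0×, while the large fleet's 1.5× is left almost unchanged.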
2. Behavioral Violation Rate
The best leading indicator the fit recovers
Driver decision violations — speeding, reckless driving, HOS violations, drug and alcohol offenses — are the strongest leading indicator among violation types. These capture the culture and discipline of a carrier's operation before crashes actually happen.
Empirical Relative Risk by Violation Type:
Cumulative Behavioral Dose-Response (EB-Adjusted):
% of carriers with any Year 2 crash, by number of distinct behavioral violation types in Year 1. EB RR is size-band normalized.
Why it ranks high: Behavioral violations are the best leading indicator — they reveal risk before crashes happen. A carrier whose drivers regularly speed or drive fatigued carries a higher forward burden. The behavioral rate enters the GLM as a feature whose weight is learned from how strongly it predicts the following year's crash burden; the relativities above are illustrative of the ordering the fit recovers and align with the Violation Types study.
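The dose-response tabulation described in the caption can be sketched as follows. The input layout (a set of violation types plus a Year 2 crash flag per carrier) and the type labels are assumptions, the toy records are not study data, and the EB size-band normalization is omitted for brevity.

```python
from collections import defaultdict


def dose_response(carriers):
    """Share of carriers with any Year 2 crash, by count of distinct
    behavioral violation types seen in Year 1.

    carriers: iterable of (violation_types, crashed_in_year2) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # bucket -> [carriers, carriers_with_crash]
    for violation_types, crashed in carriers:
        bucket = len(set(violation_types))
        counts[bucket][0] += 1
        counts[bucket][1] += int(crashed)
    return {bucket: crashed / n for bucket, (n, crashed) in sorted(counts.items())}


# Toy records only.
carriers = [
    (set(), False),
    (set(), False),
    ({"speeding"}, False),
    ({"speeding"}, True),
    ({"speeding", "hos"}, True),
    ({"speeding", "drugs_alcohol"}, True),
]
shares = dose_response(carriers)
```

A rising share across buckets is what "cumulative dose-response" refers to: each additional distinct violation type is associated with a higher chance of a forward-year crash.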
3. Equipment Violation Rate
An independent maintenance signal
Equipment violations — brakes, tires, lighting, cargo securement — reflect a carrier's maintenance standards and operational discipline. Unlike behavioral violations (driver choices), equipment condition reveals systemic quality.
Why it matters: Equipment violations are a moderate but consistent predictor. A carrier that regularly fails brake inspections has systemic maintenance issues. The equipment rate carries less weight in the fit than the behavioral rate (about 23% lower forward relative risk on average) but remains an important independent signal of operational quality.
4. Severe-Violation Rate
Captures tail risk other features miss
The severe-violation rate summarizes the most dangerous violations — those with FMCSA severity weight ≥ 7. It enters the model as its own exposure-normalized feature, so carriers with these violations are predicted to carry a higher forward crash burden than their other signals alone would suggest.
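As a rough sketch of the feature, the function below counts violations at or above the severity threshold and divides by exposure. The threshold of 7 comes from the text; using the number of inspections as the denominator is our assumption for this example, and the function name is hypothetical.

```python
def severe_violation_rate(severity_weights, inspections, threshold=7):
    """Severe violations per inspection.

    severity_weights: FMCSA severity weight of each violation on record.
    inspections: assumed exposure denominator (number of inspections).
    """
    if inspections <= 0:
        return 0.0  # no exposure, so no rate can be estimated
    severe = sum(1 for weight in severity_weights if weight >= threshold)
    return severe / inspections


# Toy carrier: four violations across eight inspections, two of them severe.
rate = severe_violation_rate([2, 7, 10, 5], inspections=8)
```

Normalizing by exposure keeps a heavily inspected fleet from looking worse simply because it has had more chances to be cited.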
Critical Behavioral Flags:
These violations are flagged on carrier records and feed the severe-violation rate the GLM consumes — no hard score caps, just a learned feature weight:
Lifts the severe-violation rate — carriers with these violations show a markedly higher predicted burden
Substance violations raise both the behavioral and the severe rate — a double signal that amplifies the predicted burden
98.4% of carriers with 10+ speeding violations had a crash in the following year, a strong upward push on predicted burden
Why it matters: Severity captures tail risk the other features might miss. A carrier with critical violations (severity weight ≥ 7) represents a qualitatively different risk profile. Because the rate is exposure-normalized and the carrier is credibility-shrunk toward its size-band prior, this signal is reliable across fleet sizes without disproportionately penalizing small carriers for isolated incidents.
Supporting Evidence: The Carrier Age Effect
Why Empirical Bayes Matters
Carrier age shows a moderate risk gradient (about 1.37× peak-to-low) captured through EB shrinkage, not a separate weight
Crash Rate by Years in Operation
Carrier age remains relevant, but the effect is more modest in this cohort: the highest rates are among the newest carriers, and the lowest rates are in the 10-19 year range after normalizing for fleet size and exposure.
Rather than giving age its own weight in the formula, our Empirical Bayes shrinkage naturally handles this effect. New carriers with limited data get pulled toward their peer-group average (which includes the age-related risk), while mature carriers with extensive records keep their observed rates. This is more principled than adding experience as an arbitrary weighted factor.
FMCSA's BASIC Score Problem: BASIC treats all carriers equally regardless of age. A 20-year carrier with 2 crashes gets the same treatment as a 1-year carrier with 2 crashes. When we tested BASIC scores as a predictor, they showed an inverse correlation (RR=0.17×): carriers with better BASIC scores had higher crash rates, likely because established carriers accumulate more inspection history. For that reason, we don't use BASIC in our model.
Supporting Evidence: The Fleet Size Effect
Small Carriers Have Higher Crash Rates Across All Ages
The interactive chart above reveals a consistent pattern: smaller fleets have higher crash rates regardless of experience. This is why we normalize per 100k miles and apply Empirical Bayes adjustment by fleet-size peer group — it controls for this effect rather than penalizing small carriers arbitrarily.
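The per-100k-mile normalization is simple arithmetic, shown here with made-up carriers. The function name and figures are illustrative only.

```python
def crashes_per_100k_miles(crashes: float, miles: float) -> float:
    # Scale the raw count by mileage so fleets of any size share one unit.
    return crashes * 100_000 / miles


# Toy figures: a 3-truck operation and a 3,000-truck fleet.
small_rate = crashes_per_100k_miles(crashes=2, miles=250_000)
fleet_rate = crashes_per_100k_miles(crashes=600, miles=300_000_000)
```

In this example the large fleet has 300 times as many crashes but a quarter of the rate (0.2 versus 0.8 per 100k miles). That is why raw counts cannot be compared across fleet sizes, and also why small-carrier rates are noisy enough to need the EB adjustment.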
Crash Rates by Fleet Size (1-2 Year Carriers)
Small carriers in their first two years have crash rates 4.3× higher than enterprise carriers of the same age.
How Our Model Handles This
Empirical Bayes by size band: Each fleet-size peer group (small, medium, large, enterprise) has its own prior. A small carrier's rate is shrunk toward the small-carrier average, not the overall fleet average.
Within-band grading: Grades compare each carrier to others of the same size. A risk relativity of 1.00× means "typical for your size band." This prevents small carriers from being automatically graded worse simply for being small.
Band-calibrated expectation: Each fleet-size band is calibrated so total predicted burden matches total observed burden (observed-over-expected ≈ 1.0). A given grade therefore means the same forward risk regardless of fleet size.
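The band-level calibration check can be sketched as below. Computing observed-over-expected per band follows the description above; rescaling predictions by a single multiplicative factor per band is one simple way to enforce it and is our assumption here, not necessarily how the production pipeline does it. The rows are toy data.

```python
from collections import defaultdict


def observed_over_expected(rows):
    """rows: iterable of (band, observed_burden, predicted_burden)."""
    observed = defaultdict(float)
    expected = defaultdict(float)
    for band, obs, pred in rows:
        observed[band] += obs
        expected[band] += pred
    return {band: observed[band] / expected[band] for band in observed}


def calibrate_by_band(rows):
    # Rescale each band's predictions so its totals match what was observed.
    factors = observed_over_expected(rows)
    return [(band, obs, pred * factors[band]) for band, obs, pred in rows]


rows = [
    ("small", 3.0, 2.0),
    ("small", 0.0, 2.0),
    ("enterprise", 40.0, 50.0),
    ("enterprise", 70.0, 50.0),
]
before = observed_over_expected(rows)
after = observed_over_expected(calibrate_by_band(rows))
```

Before rescaling, the toy model over-predicts for the small band (O/E of 0.75) and under-predicts for the enterprise band (O/E of 1.10). After rescaling, both bands sit at 1.0 while the ranking of carriers within each band is unchanged.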
What Predicts Risk for Mature Carriers?
The Predictive Hierarchy Shifts After 10 Years
Once carriers reach 10+ years, operational performance metrics dominate. Crash history becomes the primary differentiator, while behavioral violations remain the best leading indicator of emerging risk.
Behavioral vs Equipment Violations (10+ Year Carriers)
When normalized per 100,000 miles to control for fleet size, behavioral violations emerge as the stronger predictor:
Speeding, reckless driving, HOS, drugs/alcohol
Worst quintile: 0.381 crashes/100k mi
Brakes, tires, lights, cargo securement
Worst quintile: 0.312 crashes/100k mi
Key insight: For mature carriers, behavioral violations are 23% more predictive than equipment violations. This is why the model carries behavioral and equipment rates as distinct features rather than lumping all violations together — the fit gives the behavioral rate the heavier weight on its own.
Implications for Insurance Underwriting
For new carriers (<10 years): The EB shrinkage toward peer-group priors provides natural conservatism. Limited data means the score reflects the peer average more than individual history.
For mature carriers (10+ years): Crash history carries the most weight because these carriers have enough data for reliable rate estimation. Behavioral violations provide the best early warning of deteriorating safety culture before crashes materialize.
Grade Distribution
Grades are assigned from each carrier's within-band burden percentile — where its credibility-shrunk predicted crash burden sits among same-size peers. Alongside the grade, a risk relativity of 1.00× means the carrier is typical for its size band; below 1 is safer, above is riskier. Because the comparison is within band, small and large fleets are graded fairly against their own peers.
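A minimal sketch of the within-band comparison follows. It computes each carrier's percentile and relativity against same-size peers. Taking the band median as "typical" is an assumption, and the percentile cut points that map to grade labels are not listed in this article, so the sketch stops short of assigning grades. Carrier IDs and rates are toy values.

```python
from statistics import median


def within_band_scores(carriers):
    """carriers: list of (carrier_id, band, shrunk_predicted_burden_rate).

    Returns {carrier_id: (percentile, relativity)} computed within each band.
    """
    rates_by_band = {}
    for _, band, rate in carriers:
        rates_by_band.setdefault(band, []).append(rate)

    scores = {}
    for carrier_id, band, rate in carriers:
        peers = rates_by_band[band]
        # Share of same-band peers with a predicted rate at or below this one.
        percentile = sum(1 for r in peers if r <= rate) / len(peers)
        # 1.00 means typical for the band (median is assumed as "typical").
        relativity = rate / median(peers)
        scores[carrier_id] = (percentile, relativity)
    return scores


carriers = [
    ("a", "small", 0.2), ("b", "small", 0.4), ("c", "small", 0.8),
    ("d", "large", 0.1), ("e", "large", 0.3), ("f", "large", 0.2),
]
scores = within_band_scores(carriers)
```

Carrier "b" (0.4 in the small band) and carrier "f" (0.2 in the large band) both come out at a relativity of 1.00 even though their raw rates differ by a factor of two, because each is typical for its own band.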
The study population contains 242,969 graded carriers. The distribution peaks at Excellent (46.1%), with a right-skewed tail reflecting that most carriers have zero or very few crashes. Each grade card shows the Year 2 crash rate, defined as the percentage of carriers in that grade who experienced at least one crash in 2025. By this any-crash measure, Critical carriers crash at 1.93× the rate of Excellent carriers.
Key Takeaways
1. Prior Crash Burden Is the Dominant Predictor
A carrier's credibility-shrunk prior-crash relativity is the single strongest feature in the fit. Bühlmann-Straub shrinkage makes this robust for all fleet sizes — small carriers aren't penalized by statistical noise, and large carriers keep their reliable observed history.
2. Behavioral Violations Are the Best Leading Indicator
Driver decision violations (speeding, reckless driving, substances, HOS fraud) are the strongest leading indicator of future crash burden. They capture risk before it materializes. For mature carriers, behavioral violations are 23% more predictive than equipment violations.
3. Equipment Condition Reflects Systemic Quality
Brake failures, tire issues, and lighting problems signal maintenance standards and operational discipline. The equipment rate carries less weight than the behavioral rate, but it contributes an independent and consistent signal.
4. Severity Captures Tail Risk
The most dangerous violations — reckless driving, substance offenses, extreme speeding — carry high severity weights and lift the carrier's exposure-normalized severe-violation rate. Carriers with these violations are predicted to carry a higher forward burden, catching qualitatively different risk that crash history alone might miss.
5. Fleet Size Is Controlled, Not Penalized
Small carriers (1-5 trucks) have crash rates 2-4× higher than enterprise carriers, but within-band grading, the log-exposure offset, and per-band calibration ensure carriers are compared to peers of similar size. A small carrier with clean records can still earn an Excellent grade.