How We Score Safety
A fair, transparent model that normalizes for fleet size and mileage so carriers are compared against true peers — not penalized for being small or large.
Scoring Pipeline at a Glance
Data Sources
Seven FMCSA datasets feed the scoring pipeline
Every score starts with public data from the Federal Motor Carrier Safety Administration. We pull seven distinct datasets, join them on DOT number, and filter to the population of active for-hire property carriers.
| Dataset | What It Contains | Updates |
|---|---|---|
| Census | Carrier registration, fleet size, address, authority type, officer names | Daily |
| Inspections | Roadside inspections with driver and vehicle out-of-service (OOS) counts | Daily |
| Crashes | Reportable crash events with fatality, injury, and tow-away details | Monthly |
| Violations | Individual violation records with category, severity weight, and inspection date | Monthly |
| BASIC Scores | SMS safety measure scores — SMS AB (interstate + intrastate hazmat) and SMS C (intrastate non-hazmat) merged by DOT number | Monthly |
| SMS Census | Authority classification fields: for-hire, exempt, private property, government, etc. | Monthly |
| History | Operating authority orders, revocations, and docket status changes | Daily |
Carrier Eligibility
Who gets scored — and who doesn't
Not every FMCSA-registered entity is a for-hire trucking carrier. We apply two layers of filtering: hard exclusions that remove carriers entirely, and eligibility thresholds that determine whether enough data exists to produce a reliable score.
Pre-Scoring Exclusions
Carriers matching any of the pre-scoring exclusion criteria are removed before scoring begins.
Scoring Eligibility Thresholds
Carriers that pass the exclusion filters must also have enough observable activity to produce a meaningful score.
Data & Exposure Normalization
Why raw counts are misleading
Public FMCSA records — crashes, inspections, and violations — are aggregated for each carrier over a fixed observation window. But comparing a 3-truck fleet to a 500-truck fleet by raw event counts is inherently unfair. A larger fleet drives more miles and naturally encounters more events.
We solve this by dividing every metric by exposure — the carrier's total mileage over the 24-month scoring window, in 100k-mile units. Every rate then reads as events per 100k miles driven, regardless of fleet size.
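As a sketch of the normalization (function names are mine, not from the pipeline), each raw count is divided by mileage expressed in 100k-mile units; the 50k-mile exposure floor described under Data Quality Adjustments is included:

```python
def exposure_units(total_miles: float) -> float:
    """24-month mileage in 100k-mile units, floored at 0.5 units
    (50k miles) to keep tiny denominators from producing wild rates."""
    return max(total_miles / 100_000, 0.5)

def rate_per_100k(events: int, total_miles: float) -> float:
    """Exposure-normalized event rate: events per 100k miles."""
    return events / exposure_units(total_miles)

# The 3-truck vs. 500-truck comparison from the text, now on equal footing:
small_fleet = rate_per_100k(1, 150_000)     # 1 crash / 1.5 units
large_fleet = rate_per_100k(10, 5_000_000)  # 10 crashes / 50 units
```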
Empirical Bayes Stabilization
Taming noise without losing signal
Even after normalizing by miles, a tiny fleet with 1 crash in 50k miles looks far worse than a large fleet with 10 crashes in 5 million miles — even though the small fleet's rate is mostly just noise. One lucky or unlucky year can swing their rate wildly.
For each size band, we fit a Gamma-Poisson prior and blend each carrier's observed rate with the group average. Small fleets get pulled ("shrunk") more toward the mean; large fleets keep their observed rates because the signal is reliable.
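Under a Gamma-Poisson model the blended ("shrunk") rate has a closed form: the posterior mean. A minimal sketch, with prior parameters assumed to be fit per size band (the fitting step itself is omitted):

```python
def eb_shrunk_rate(events: int, exposure: float, alpha: float, beta: float) -> float:
    """Posterior mean rate for a Gamma(alpha, beta) prior and a
    Poisson(rate * exposure) likelihood. The prior mean alpha/beta is
    the size band's average rate; beta acts as pseudo-exposure, so
    low-exposure carriers are pulled toward the band mean while
    high-exposure carriers keep roughly their observed rate."""
    return (alpha + events) / (beta + exposure)

# Hypothetical band prior: mean rate 0.5 per 100k-mile unit, strength beta = 10 units.
alpha, beta = 5.0, 10.0
tiny = eb_shrunk_rate(1, 0.5, alpha, beta)   # pulled near the prior mean 0.5
big = eb_shrunk_rate(10, 50.0, alpha, beta)  # stays near observed 10/50 = 0.2
```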
Point Scale & Weighted Score
Turning rates into a composite number
Each stabilized metric is expressed as a rate ratio against the peer-group average, then converted to a symmetric log scale. A carrier exactly at the peer mean scores 0; twice the mean scores +100; half the mean scores −100.
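One formula consistent with those anchors (0 at the mean, +100 at twice the mean, −100 at half) is a base-2 log scaled by 100; the exact base used in production is an assumption here:

```python
import math

def point_score(stabilized_rate: float, peer_mean_rate: float) -> float:
    """Symmetric log score of the rate ratio: 0 at the peer mean,
    +100 at twice the mean, -100 at half the mean."""
    return 100.0 * math.log2(stabilized_rate / peer_mean_rate)
```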
Four safety dimensions are then blended into a single underwriting score using weights tuned for both predictive power (AUC) and size-fairness:
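The blend itself is a weighted sum over the four component scores. The weights below are placeholders only; the tuned production weights are not listed in this document:

```python
# Placeholder weights -- illustrative only, not the tuned production values.
WEIGHTS = {
    "crash_rate": 0.40,
    "behavioral_violations": 0.25,
    "equipment_defects": 0.20,
    "critical_severe": 0.15,
}

def combined_score(components: dict) -> float:
    """Weighted blend of the four per-dimension point scores."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```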
Violation Classification & Risk Weights
Violations are classified into behavioral (driver conduct) and equipment (vehicle condition) categories. Each type carries an empirical relative risk weight from temporal crash-correlation validation:
| Tier | Violation Type | RR Weight |
|---|---|---|
| **Critical** — Immediate danger behaviors | | |
| | Reckless Driving | 1.49 |
| | Dangerous Driving | 1.37 |
| | Jumping OOS / Driving Fatigued | 1.36 |
| **High** — Serious behavioral risks | | |
| | Speeding (high & excessive) | 1.30 |
| | Drugs / Alcohol | 1.29 |
| | Alcohol Possession | 1.27 |
| | Moderate Speeding | 1.20 |
| | Phone Call / Texting | 1.16–1.18 |
| **Moderate** — Concerning behaviors | | |
| | False Log | 1.12 |
| | Seat Belt | 1.11 |
| **Equipment** — Vehicle condition | | |
| | Lighting | 1.17 |
| | Tires | 1.16 |
| | Brakes (all types) | 1.13 |
RR weights represent the empirical relative risk (crash rate ratio) for carriers with each violation type vs. those without. In the FRED pipeline, violations are counted by category (behavioral vs. equipment) and smoothed via Empirical Bayes — the per-type weights inform the classification and are used in the legacy behavioral score.
Expected Loss Calibration
From score to dollars
A fair score is useful for ranking, but underwriters also need a concrete expected event count for pricing. The peer index directly scales the size-band baseline crash rate to produce an expected loss rate for each carrier.
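In code form (names are mine), the calibration is a direct scaling: the band's baseline crash rate times the carrier's peer index gives an expected rate, and multiplying by exposure gives an expected event count:

```python
def expected_crash_count(peer_index: float,
                         band_baseline_rate: float,
                         exposure_units: float) -> float:
    """Expected crashes over the scoring window.

    band_baseline_rate: the size band's baseline crashes per 100k-mile unit.
    peer_index: scales the baseline up or down for this carrier.
    exposure_units: the carrier's mileage in 100k-mile units."""
    return peer_index * band_baseline_rate * exposure_units

# A carrier at 1.2x its peers, in a band with a baseline of 0.05 crashes
# per unit, running 2M miles (20 units) over the window:
expected = expected_crash_count(1.2, 0.05, 20.0)
```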
Peer-Relative Safety Grades
Apples-to-apples comparison
Instead of grading on the raw score, Fleetidy assigns safety grades from a peer index — a weighted composite of how each safety component compares to same-size peers. A peer index of 1.0 means each component rate matches the band mean. Because most carriers have zero crashes, the distribution is right-skewed and the typical band average is below 1.0 (e.g., ~0.85–0.90).
These cutoffs are the same for all size bands — a Peer Index of 0.30 means "Strong" whether you run 3 trucks or 3,000.
Rolling Window
24-month observation period
All scoring data — crashes, violations, inspections — uses a rolling 24-month window from the most recent FMCSA data snapshot. As the window advances, older events naturally drop out and are replaced by more recent observations.
This approach ensures that every carrier's score reflects their current safety posture. A carrier that improves its practices will see those improvements reflected as older violations age out. Conversely, a carrier that deteriorates will see the impact within months, not years.
All four scoring components — crash rate, behavioral violations, equipment defects, and critical/severe violations — are measured over the same 24-month window, then compared against the carrier's peer group using Empirical Bayes shrinkage. The peer index is computed directly from these rolling-window rates; no additional time adjustments are applied.
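The window logic itself is simple date arithmetic. A sketch approximating 24 months as 730 days (the production cutoff may be computed differently):

```python
from datetime import date, timedelta

WINDOW_DAYS = 730  # ~24 months; an approximation for this sketch

def in_scoring_window(event_date: date, snapshot_date: date) -> bool:
    """True if the event falls inside the rolling window ending at the
    most recent FMCSA snapshot. As the snapshot advances, older events
    drop out of the window automatically."""
    return snapshot_date - timedelta(days=WINDOW_DAYS) <= event_date <= snapshot_date
```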
Automatic Rules & Flags
Hard overrides, eligibility gates, and informational flags
Beyond the statistical model, a set of deterministic rules handle edge cases where the math alone isn't sufficient. These fall into three categories: score overrides, data-quality adjustments, and informational flags.
Score Overrides
These rules supersede the calculated score entirely:
If a carrier holds a Conditional (C) or Unsatisfactory (U) FMCSA safety rating, the combined score is forced to 0 and the grade is set to Critical, regardless of the calculated peer index.
If a carrier has interstate-only scope, no active operating authority, and is not exempt — all FRED scores are set to NULL (no score produced).
Eligibility Gates
These block FRED scoring entirely when data is insufficient or implausible:
Fewer than 100k miles and zero inspections in the 24-month window. Not enough data to estimate a rate.
Reported mileage is implausible: >300k miles per truck, <1k miles per truck for fleets ≥10, or >500M total miles. Scoring is blocked to prevent extreme rates.
Reports >300k miles per truck and has fewer than 2 inspections — high mileage can't be corroborated by inspection activity.
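The three gates translate directly into threshold checks. A sketch applying them in the order listed (the real pipeline may combine or order them differently, and the >300k-per-truck conditions overlap):

```python
def passes_eligibility_gates(total_miles: float, trucks: int, inspections: int) -> bool:
    """Return False if any gate blocks FRED scoring; thresholds are
    the ones stated in the text."""
    # Gate 1: fewer than 100k miles and zero inspections in the window.
    if total_miles < 100_000 and inspections == 0:
        return False
    miles_per_truck = total_miles / trucks if trucks else 0.0
    # Gate 2: implausible mileage (>300k per truck, <1k per truck for
    # fleets of 10+, or >500M total).
    if (miles_per_truck > 300_000
            or (trucks >= 10 and miles_per_truck < 1_000)
            or total_miles > 500_000_000):
        return False
    # Gate 3: >300k miles per truck with fewer than 2 inspections --
    # subsumed by Gate 2 as written above, kept here to mirror the text.
    if miles_per_truck > 300_000 and inspections < 2:
        return False
    return True
```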
Data Quality Adjustments
Modifications to component scores when data quality is degraded:
When mileage is missing or implausible but the carrier has observed activity (inspections, crashes, or violations), the crash and violation rate components are excluded from the composite score.
Calculated exposure is floored at 50k miles (0.5 units) to prevent extremely volatile rates from tiny denominators.
Informational Flags
These flags are attached to carrier records for context but do not directly alter the score.
Validation Standards
How we know it works
Before any score update goes live, we run a battery of fairness and calibration checks. If any gate fails, the update is held until the issue is resolved.
Observed-over-Expected ratio must stay within 0.98 – 1.02 for each size band.
Cross-tab of size band and safety group must stay within 0.90 – 1.10.
Observed crash rates must rise from Excellent to Critical within every size band.
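As a sketch of the release gate (the data structures are mine), all three checks must pass before an update ships:

```python
def validation_gates_pass(oe_by_band: dict,
                          size_grade_crosstab: dict,
                          crash_rates_by_grade: dict) -> bool:
    """oe_by_band: Observed/Expected ratio per size band.
    size_grade_crosstab: ratio per (size band, safety group) cell.
    crash_rates_by_grade: per size band, observed crash rates ordered
    Excellent -> Critical, which must rise strictly."""
    if not all(0.98 <= r <= 1.02 for r in oe_by_band.values()):
        return False
    if not all(0.90 <= r <= 1.10 for r in size_grade_crosstab.values()):
        return False
    return all(
        all(a < b for a, b in zip(rates, rates[1:]))
        for rates in crash_rates_by_grade.values()
    )
```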
All scoring and calibration are performed by the reproducible pipeline `fred_postgres_v4.py`.