How We Score Safety

A fair, transparent model that normalizes for fleet size and mileage so carriers are compared against true peers — not penalized for being small or large.

Scoring Pipeline at a Glance

1. Collect FMCSA Data: crashes, inspections, violations, census records
2. Normalize by Exposure: convert counts to per-100k-mile rates
3. Stabilize with Empirical Bayes: shrink noisy small-fleet rates toward the group mean
4. Score & Weight: log-scale peer-relative points combined in a weighted blend
5. Calibrate Expected Loss: scale the baseline crash rate by peer performance
6. Assign Peer Safety Grades: Excellent through Critical, relative to same-size peers
7. Apply a Rolling Window: 24-month observation period; older data drops out naturally
The pipeline produces three outputs: the FRED Score (0–100, for ranking and display), the Expected Loss (for pricing economics), and the Peer Index (census safety grouping).

Data Sources

Seven FMCSA datasets feed the scoring pipeline

Every score starts with public data from the Federal Motor Carrier Safety Administration. We pull seven distinct datasets, join them on DOT number, and filter to the population of active for-hire property carriers.

Census: carrier registration, fleet size, address, authority type, and officer names (updated daily)
Inspections: roadside inspections with driver and vehicle out-of-service (OOS) counts (updated daily)
Crashes: reportable crash events with fatality, injury, and tow-away details (updated monthly)
Violations: individual violation records with category, severity weight, and inspection date (updated monthly)
BASIC Scores: SMS safety measure scores, with SMS AB (interstate + intrastate hazmat) and SMS C (intrastate non-hazmat) merged by DOT number (updated monthly)
SMS Census: authority classification fields such as for-hire, exempt, private property, and government (updated monthly)
History: operating authority orders, revocations, and docket status changes (updated daily)
The raw census file contains ~9 million rows. After joining with SMS Census authority fields and applying exclusion filters, roughly 500k–1M active for-hire property carriers remain for scoring.

Carrier Eligibility

Who gets scored — and who doesn't

Not every FMCSA-registered entity is a for-hire trucking carrier. We apply two layers of filtering: hard exclusions that remove carriers entirely, and eligibility thresholds that determine whether enough data exists to produce a reliable score.

Pre-Scoring Exclusions

Carriers matching any of these criteria are removed before scoring begins:

Inactive Status — carrier's FMCSA status is not "Active"
Passenger Operations — operates buses, coaches, school buses, vans, or limos (Fleetidy covers freight/property only)
No Truck Power Units — zero trucks or tractors across all ownership types (owned, leased, trip-leased)
No For-Hire Authority — private-property-only carriers without authorized or exempt for-hire classification

Scoring Eligibility Thresholds

Carriers that pass exclusion filters must also have enough observable activity to produce a meaningful score:

100,000+ miles in the 24-month window OR at least 1 inspection
Not a mileage outlier — excludes carriers reporting >300k miles/truck (implausible) or <1k miles/truck for fleets of 10+ (suspiciously low)
Carriers that fail eligibility still appear in search results with their census data, but won't receive a FRED score or safety grade.

Data & Exposure Normalization

Why raw counts are misleading

Public FMCSA records — crashes, inspections, and violations — are aggregated for each carrier over a fixed observation window. But comparing a 3-truck fleet to a 500-truck fleet by raw event counts is inherently unfair. A larger fleet drives more miles and naturally encounters more events.

For example: a 3-truck fleet with 2 crashes looks dangerous, while a 500-truck fleet with 30 crashes is actually much safer per mile.

We solve this by dividing every metric by exposure — the carrier's total mileage over the 24-month scoring window, in 100k-mile units:

$$E_i = \text{window\_miles}_i / 100{,}000$$
Exposure is floored at 0.5 (50k miles) to prevent extreme, meaningless rates for carriers reporting very low mileage.
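The exposure calculation, including the 0.5-unit floor described above, can be sketched as follows (the function name is illustrative, not the pipeline's):

```python
def exposure_units(window_miles: float) -> float:
    """Convert 24-month mileage into 100k-mile exposure units, floored at 0.5."""
    return max(window_miles / 100_000, 0.5)

exposure_units(250_000)  # 2.5 units
exposure_units(20_000)   # floored at 0.5 to avoid meaningless rates
```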

Empirical Bayes Stabilization

Taming noise without losing signal

Even after normalizing by miles, a tiny fleet with 1 crash in 50k miles looks far worse than a large fleet with 10 crashes in 5 million miles — even though the small fleet's rate is mostly just noise. One lucky or unlucky year can swing their rate wildly.

[Diagram: Empirical Bayes shrinkage. Raw small-fleet rates, whether unusually low or high, are pulled toward the peer mean; large-fleet rates stay close to their observed values.]

For each size band, we fit a Gamma-Poisson prior and blend each carrier's observed rate with the group average. Small fleets get pulled ("shrunk") more toward the mean; large fleets keep their observed rates because the signal is reliable.

$$y_i \mid \lambda_i \sim \text{Poisson}(E_i\lambda_i), \quad \lambda_i \sim \text{Gamma}(\alpha,\beta)$$
$$\hat{\lambda}_i = \frac{y_i + \alpha}{E_i + \beta}$$
When $E_i$ is large relative to $\beta$, the formula reduces to the raw rate. When $E_i$ is small, $\alpha/\beta$ (the group mean) dominates. Bigger sample → less shrinkage.
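The posterior-mean formula above is a one-liner. A minimal sketch, with illustrative prior values $\alpha = 0.2$, $\beta = 4.0$ (group mean 0.05 crashes per 100k miles):

```python
def eb_rate(events: int, exposure: float, alpha: float, beta: float) -> float:
    """Gamma-Poisson posterior mean: blends the observed rate with the
    prior mean alpha/beta, weighted by exposure."""
    return (events + alpha) / (exposure + beta)

# Small fleet: 1 crash over 0.5 units is shrunk heavily toward the group mean.
small = eb_rate(1, 0.5, 0.2, 4.0)    # ~0.27, not the raw 2.0
# Large fleet: 10 crashes over 50 units barely moves.
large = eb_rate(10, 50.0, 0.2, 4.0)  # ~0.19, close to the raw 0.2
```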

Point Scale & Weighted Score

Turning rates into a composite number

Each stabilized metric is expressed as a rate ratio against the peer-group average, then converted to a symmetric log scale. A carrier exactly at the peer mean scores 0; twice the mean scores +100; half the mean scores −100.

$$RR = \frac{\hat{\lambda}_i}{\mu_{\text{size}}}, \quad \text{Points} = 100\cdot\log_2(RR)$$

Four safety dimensions are then blended into a single underwriting score using weights tuned for both predictive power (AUC) and size-fairness:

Crash: 56% · Behavioral: 18% · Equipment: 14% · Severe: 12%
$$\text{UnderwritingScore} = 0.56\,P_{\text{crash}}+0.18\,P_{\text{behavioral}}+0.14\,P_{\text{equipment}}+0.12\,P_{\text{severe}}$$
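The two formulas above translate directly into code. A minimal sketch (function names are ours; the rates fed in are the EB-stabilized rates from the previous section):

```python
import math

# Blend weights from the formula above.
WEIGHTS = {"crash": 0.56, "behavioral": 0.18, "equipment": 0.14, "severe": 0.12}

def points(stabilized_rate: float, peer_mean: float) -> float:
    """Symmetric log scale: 0 at the peer mean, +100 at twice it, -100 at half."""
    return 100.0 * math.log2(stabilized_rate / peer_mean)

def underwriting_score(component_points: dict) -> float:
    """Weighted blend of the four component point values."""
    return sum(w * component_points[k] for k, w in WEIGHTS.items())

points(0.10, 0.05)   # +100: twice the peer mean
points(0.025, 0.05)  # -100: half the peer mean
```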

Violation Classification & Risk Weights

Violations are classified into behavioral (driver conduct) and equipment (vehicle condition) categories. Each type carries an empirical relative risk weight from temporal crash-correlation validation:

Critical (immediate danger behaviors): Reckless Driving 1.49 · Dangerous Driving 1.37 · Jumping OOS / Driving Fatigued 1.36
High (serious behavioral risks): Speeding (high & excessive) 1.30 · Drugs / Alcohol 1.29 · Alcohol Possession 1.27 · Moderate Speeding 1.20 · Phone Call / Texting 1.16–1.18
Moderate (concerning behaviors): False Log 1.12 · Seat Belt 1.11
Equipment (vehicle condition): Lighting 1.17 · Tires 1.16 · Brakes (all types) 1.13

RR weights represent the empirical relative risk (crash rate ratio) for carriers with each violation type vs. those without. In the FRED pipeline, violations are counted by category (behavioral vs. equipment) and smoothed via Empirical Bayes — the per-type weights inform the classification and are used in the legacy behavioral score.
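The classification can be sketched as a lookup table. This is an illustrative subset built from the tiers above; the keys, the default for unknown types, and the function name are our assumptions, not the pipeline's schema:

```python
# Illustrative mapping: violation type -> (category, RR weight).
RR_WEIGHTS = {
    "reckless_driving":  ("behavioral", 1.49),
    "dangerous_driving": ("behavioral", 1.37),
    "speeding_high":     ("behavioral", 1.30),
    "false_log":         ("behavioral", 1.12),
    "lighting":          ("equipment", 1.17),
    "tires":             ("equipment", 1.16),
    "brakes":            ("equipment", 1.13),
}

def classify(violation_type: str):
    """Return (category, RR weight); unknown types default to a neutral
    weight of 1.0 (the behavioral default here is an assumption)."""
    return RR_WEIGHTS.get(violation_type, ("behavioral", 1.0))
```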

Expected Loss Calibration

From score to dollars

A fair score is useful for ranking, but underwriters also need a concrete expected event count for pricing. The peer index directly scales the size-band baseline crash rate to produce an expected loss rate for each carrier:

$$\text{EL\_rate}_i = \mu_{\text{crash,\,size}} \times \text{PeerIndex}_i$$
$$\mathbb{E}[\text{crashes}_i] = \text{EL\_rate}_i \times E_i$$
$\mu_{\text{crash,\,size}}$ is the EB-estimated crash rate for the carrier's size band. A carrier with PeerIndex = 2.0 is expected to have twice the baseline crash rate for its band. The actual band-average peer index (shown on each carrier's detail page) is typically below 1.0 due to the right-skewed crash distribution. Observed-over-Expected (O/E) ratios are used to verify calibration.
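The two calibration formulas above are a direct multiplication. A minimal sketch with illustrative values:

```python
def expected_crashes(band_baseline: float, peer_index: float, exposure: float) -> float:
    """EL_rate = band baseline x peer index; expected count = EL_rate x exposure."""
    el_rate = band_baseline * peer_index
    return el_rate * exposure

# Band baseline of 0.05 crashes per 100k miles, PeerIndex 2.0,
# and 1,000,000 window miles (10 exposure units):
expected_crashes(0.05, 2.0, 10.0)  # 1.0 expected crash over the window
```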

Peer-Relative Safety Grades

Apples-to-apples comparison

Instead of grading on the raw score, Fleetidy assigns safety grades from a peer index — a weighted composite of how each safety component compares to same-size peers. A peer index of 1.0 means each component rate matches the band mean. Because most carriers have zero crashes, the distribution is right-skewed and the typical band average is below 1.0 (e.g., ~0.85–0.90).

$$\text{PeerIndex}_i = 0.56\,RR_{\text{crash}}+0.18\,RR_{\text{behavioral}}+0.14\,RR_{\text{equipment}}+0.12\,RR_{\text{severe}}$$
Safety Grade Scale: Excellent ≤ 0.25 · Strong 0.25–0.35 · Satisfactory 0.35–0.80 · Marginal 0.80–1.40 · Poor 1.40–3.00 · Critical > 3.00

These cutoffs are the same for all size bands — a Peer Index of 0.30 means "Strong" whether you run 3 trucks or 3,000.
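Grade assignment reduces to a threshold walk over the scale above. A sketch (treating each upper bound as inclusive is our assumption; the published scale doesn't specify boundary handling):

```python
# Cutoffs from the Safety Grade Scale; identical for every size band.
GRADE_CUTOFFS = [
    (0.25, "Excellent"),
    (0.35, "Strong"),
    (0.80, "Satisfactory"),
    (1.40, "Marginal"),
    (3.00, "Poor"),
]

def safety_grade(peer_index: float) -> str:
    for cutoff, grade in GRADE_CUTOFFS:
        if peer_index <= cutoff:  # inclusive upper bound: an assumption
            return grade
    return "Critical"

safety_grade(0.30)  # "Strong", whether you run 3 trucks or 3,000
```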

Rolling Window

24-month observation period

All scoring data — crashes, violations, inspections — uses a rolling 24-month window from the most recent FMCSA data snapshot. As the window advances, older events naturally drop out and are replaced by more recent observations.

This approach ensures that every carrier's score reflects their current safety posture. A carrier that improves its practices will see those improvements reflected as older violations age out. Conversely, a carrier that deteriorates will see the impact within months, not years.

All four scoring components — crash rate, behavioral violations, equipment defects, and critical/severe violations — are measured over the same 24-month window, then compared against the carrier's peer group using Empirical Bayes shrinkage. The peer index is computed directly from these rolling-window rates; no additional time adjustments are applied.
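The rolling-window filter amounts to a date comparison against the snapshot. A sketch, approximating 24 months as 730 days (the real pipeline's calendar-month arithmetic may differ):

```python
from datetime import date, timedelta

def in_window(event_date: date, snapshot_date: date, days: int = 730) -> bool:
    """True if the event falls inside the rolling window ending at the snapshot."""
    return snapshot_date - timedelta(days=days) <= event_date <= snapshot_date

in_window(date(2024, 6, 1), date(2025, 6, 1))  # True: within the window
in_window(date(2022, 1, 1), date(2025, 6, 1))  # False: aged out
```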

Automatic Rules & Flags

Hard overrides, eligibility gates, and informational flags

Beyond the statistical model, a set of deterministic rules handle edge cases where the math alone isn't sufficient. These fall into three categories: score overrides, data-quality adjustments, and informational flags.

Score Overrides

These rules supersede the calculated score entirely:

1. FMCSA Adverse Safety Rating: if a carrier holds a Conditional (C) or Unsatisfactory (U) FMCSA safety rating, the combined score is forced to 0 and the grade is set to Critical, regardless of the calculated peer index.

2. Defunct Interstate Carrier: if a carrier has interstate-only scope, no active operating authority, and is not exempt, all FRED scores are set to NULL (no score produced).

Eligibility Gates

These block FRED scoring entirely when data is insufficient or implausible:

3. Insufficient Exposure: fewer than 100k miles and zero inspections in the 24-month window; there is not enough data to estimate a rate.

4. Mileage Outlier: reported mileage is implausible (more than 300k miles per truck, fewer than 1k miles per truck for fleets of 10 or more, or more than 500M total miles); scoring is blocked to prevent extreme rates.

5. Unverifiable High Mileage: the carrier reports more than 300k miles per truck but has fewer than 2 inspections, so the high mileage can't be corroborated by inspection activity.
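The three gates above can be expressed as boolean checks. A sketch using the thresholds from the text (the function name is ours, and gates 4 and 5 are merged where their per-truck cap overlaps):

```python
def passes_eligibility(miles: float, trucks: int, inspections: int) -> bool:
    """True if the carrier clears gates 3-5 and can receive a FRED score."""
    if miles < 100_000 and inspections == 0:
        return False  # Gate 3: insufficient exposure
    per_truck = miles / trucks if trucks else 0.0
    if per_truck > 300_000 or miles > 500_000_000:
        return False  # Gates 4 & 5: implausible or uncorroborated high mileage
    if trucks >= 10 and per_truck < 1_000:
        return False  # Gate 4: suspiciously low mileage for a 10+ truck fleet
    return True

passes_eligibility(500_000, 5, 3)  # True: 100k miles/truck, inspected
passes_eligibility(50_000, 2, 0)   # False: gate 3
```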

Data Quality Adjustments

Modifications to component scores when data quality is degraded:

6. Unreliable Mileage: when mileage is missing or implausible but the carrier has observed activity (inspections, crashes, or violations), the crash and violation rate components are excluded from the composite score.

7. Exposure Floor: calculated exposure is floored at 50k miles (0.5 units) to prevent extremely volatile rates from tiny denominators.

Informational Flags

These flags are attached to carrier records for context but do not directly alter the score:

NO_OPERATING_AUTHORITY — docket revoked or inactive
LOW_RELIABILITY — fewer than 5 inspections
INSPECTION_PER_PU_OUTLIER — inspection rate outside 1st–99th percentile
GOVERNMENT_ENTITY — federal, state, or local government carrier
HHG_ONLY — exclusively hauls household goods
MEXICAN_CARRIER — domiciled in Mexico
CANADIAN_CARRIER — domiciled in Canada

Validation Standards

How we know it works

Before any score update goes live, we run a battery of fairness and calibration checks. If any gate fails, the update is held until the issue is resolved.

Size-Band O/E: the Observed-over-Expected ratio must stay within 0.98–1.02 for each size band.

Size × Group O/E: the cross-tab of size band and safety group must stay within 0.90–1.10.

Monotonicity: observed crash rates must rise from Excellent to Critical within every size band.
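These release gates can be sketched as simple checks (band keys and rates below are illustrative, not real calibration data):

```python
def oe_in_range(observed: dict, expected: dict, lo: float, hi: float) -> bool:
    """True if every band's Observed/Expected ratio falls in [lo, hi]."""
    return all(lo <= observed[b] / expected[b] <= hi for b in observed)

def rates_monotonic(rates_by_grade: list) -> bool:
    """True if observed crash rates strictly rise from Excellent to Critical."""
    return all(a < b for a, b in zip(rates_by_grade, rates_by_grade[1:]))

oe_in_range({"small": 101.0}, {"small": 100.0}, 0.98, 1.02)  # True: O/E = 1.01
rates_monotonic([0.02, 0.04, 0.09, 0.15, 0.40, 1.10])        # True
```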

All scoring and calibration is performed by the reproducible pipeline fred_postgres_v4.py.