Methodology
How we analyze 227 million Medicaid billing records across 617,503 providers to identify statistical anomalies that may indicate fraud, waste, or abuse.
1. Our Approach
We start with the complete Medicaid Provider Spending dataset released by HHS on February 13, 2026 — 227 million aggregated billing records spanning 2018 through 2024, covering $1.09 trillion in total payments across 617,503 unique providers and 10,881 procedure codes.
Rather than relying on a single metric (like total spending), we run 13 independent statistical tests organized into six categories. Each test targets a different dimension of billing behavior: spending levels, claim volume, billing patterns, growth trajectories, code-specific pricing, and exclusion-list status.
Our most important innovation is code-specific benchmarking. Instead of comparing a dermatologist's billing to a dialysis center's, we compare each provider's cost per claim against the national median for that exact procedure code. We compute full percentile distributions (p10, p25, p50, p75, p90, p95, and p99) for 9,578 HCPCS codes, placing every provider in a precise percentile range for every service they bill.
Providers are flagged only when they trip one or more of these tests. Multiple overlapping flags from independent tests substantially increase confidence that the anomaly is worth investigating.
In addition to these statistical tests, we run a machine learning model trained on 514 OIG-excluded providers to score all 617,503 providers for fraud similarity. The 1,860 flagged providers comprise 1,360 flagged by the statistical tests and 500 additional providers detected only by the ML model.
Statistical tests: 13 (across 6 categories)
Providers flagged: 1,860 (statistical + ML-detected)
Critical risk: 18 (3+ independent flags)
High risk: 156 (2 independent flags)
Codes benchmarked: 9,578 (with full percentile data)
2. 13 Statistical Tests
Each test is designed to catch a specific type of anomaly. No single test is sufficient to allege fraud — but when multiple independent tests flag the same provider, the probability of a legitimate explanation decreases.
Spending Outliers
Identifying providers whose total spending or per-claim costs deviate far from peers.
Unusually High Spending
Threshold
Total payments exceed 3 standard deviations above the mean for all providers
What it catches
Providers receiving disproportionately large sums of Medicaid funding relative to the entire provider population.
Real example from our data
A single LLC billing $239M when the average provider bills under $2M.
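A minimal sketch of the spending test, assuming per-provider total payments are already aggregated into a dict keyed by provider ID. The function name `flag_high_spenders` is hypothetical, and the source does not say whether it uses the population or sample standard deviation; this sketch assumes the population form.

```python
import statistics


def flag_high_spenders(totals, k=3.0):
    """Return IDs whose total payments exceed mean + k standard deviations.

    totals: dict mapping provider ID (e.g. NPI) -> total payments.
    """
    values = list(totals.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD across all providers
    cutoff = mean + k * sd
    return sorted(pid for pid, total in totals.items() if total > cutoff)


# Usage: 19 small providers and one very large one
totals = {f"p{i}": 1.0 for i in range(19)}
totals["big"] = 100.0
print(flag_high_spenders(totals))  # ['big']
```

Billing totals are heavily right-skewed, so the largest providers inflate the standard deviation that defines the cutoff; a log transform or a median-based rule are common alternatives.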
High Cost Per Claim
Threshold
Average cost per claim exceeds 3× the overall median cost per claim
What it catches
Providers charging far more per service than peers — potential upcoding, inflated rates, or billing for services not rendered.
Real example from our data
A provider averaging $2,400/claim when the median across all providers is $78.
Volume Anomalies
Detecting impossible or suspicious claim volumes relative to patient counts.
High Claims Per Patient
Threshold
Claims-per-beneficiary ratio significantly exceeds the peer average for that provider type
What it catches
Providers filing an excessive number of claims per patient — potential phantom billing or service unbundling.
Real example from our data
A behavioral health provider filing 48 claims per patient per month when peers average 4.
Beneficiary Stuffing
Threshold
More than 100 claims filed per beneficiary
What it catches
Extreme cases where the claim volume per patient is physically implausible for most service types.
Real example from our data
A provider filing 312 claims per beneficiary in a single year: six claims per patient every week, or nearly one per calendar day.
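A sketch of the more-than-100-claims-per-beneficiary rule. It assumes each provider has a deduplicated beneficiary count for the period; the dataset reports beneficiary counts per code per month, and summing those can count the same patient many times, which would understate the ratio. Function and argument names are illustrative.

```python
def flag_beneficiary_stuffing(claims, beneficiaries, limit=100.0):
    """Return {provider_id: claims_per_beneficiary} for ratios above `limit`.

    claims, beneficiaries: dicts mapping provider ID -> count for the same period.
    """
    flagged = {}
    for pid, n_claims in claims.items():
        n_bene = beneficiaries.get(pid, 0)
        if n_bene <= 0:
            continue  # ratio undefined without beneficiaries; skip
        ratio = n_claims / n_bene
        if ratio > limit:
            flagged[pid] = ratio
    return flagged


print(flag_beneficiary_stuffing({"a": 3120, "b": 400}, {"a": 10, "b": 10}))
# {'a': 312.0}
```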
Pattern Analysis
Identifying billing patterns that deviate from natural clinical variation.
Single-Code
Threshold
Only 1–2 unique procedure codes billed despite high total payments (>$1M)
What it catches
Providers with extremely narrow service offerings at high volume — potential "mills" focused on a single lucrative code.
Real example from our data
A provider billing $47M through a single PCA code (T1019) with zero diversification.
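The single-code test needs only two aggregates per provider: the set of distinct HCPCS codes and the total paid. A minimal sketch, assuming billing rows arrive as `(provider_id, hcpcs_code, paid)` tuples (a hypothetical layout).

```python
from collections import defaultdict


def flag_single_code(rows, max_codes=2, min_total=1_000_000):
    """rows: iterable of (provider_id, hcpcs_code, paid) tuples."""
    codes = defaultdict(set)     # provider -> distinct codes billed
    totals = defaultdict(float)  # provider -> total paid
    for pid, code, paid in rows:
        codes[pid].add(code)
        totals[pid] += paid
    return sorted(
        pid for pid in totals
        if len(codes[pid]) <= max_codes and totals[pid] > min_total
    )


rows = [
    ("a", "T1019", 600_000), ("a", "T1019", 600_000),  # one code, $1.2M
    ("b", "T1019", 2_000_000), ("b", "G0506", 1), ("b", "H2017", 1),
    ("c", "T1019", 500_000),                           # one code, under $1M
]
print(flag_single_code(rows))  # ['a']
```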
Consistent Billing
Threshold
Coefficient of variation below 0.1 across all active months
What it catches
Unnaturally consistent monthly billing — real clinical demand varies seasonally and month-to-month. Near-zero variation suggests automated or manufactured claims.
Real example from our data
A provider billing about $1.23M every month for 36 consecutive months (CV = 0.02).
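The coefficient of variation (CV) is the standard deviation of monthly payments divided by their mean. A sketch of the test, assuming "active months" means months with positive payments; the population standard deviation and the two-month minimum are this sketch's own choices, not documented parameters.

```python
import statistics


def is_too_consistent(monthly_paid, threshold=0.1, min_months=2):
    """True if the CV of payments over active months is below `threshold`."""
    active = [x for x in monthly_paid if x > 0]
    if len(active) < min_months:
        return False  # too few active months to judge variability
    cv = statistics.pstdev(active) / statistics.fmean(active)
    return cv < threshold


print(is_too_consistent([1.23e6] * 36))   # True: CV is 0
print(is_too_consistent([50, 150, 100]))  # False: CV is about 0.41
```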
Billing Swing
290 providers flagged
Threshold
Year-over-year change exceeding 200% AND absolute change exceeding $1M
What it catches
Dramatic billing swings that cannot be explained by gradual growth — potential acquisition of new billing codes, new scheme deployment, or data entry anomalies.
Real example from our data
A provider going from $34.6M to $107M in a single year (209% increase).
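Both conditions must hold, which keeps small providers with large percentage jumps from being flagged. A sketch, assuming annual totals for two consecutive years; the function name is hypothetical.

```python
def flag_billing_swing(prev_total, curr_total, pct_limit=200.0, abs_limit=1_000_000):
    """True if the year-over-year change exceeds both the percent and dollar limits."""
    if prev_total <= 0:
        return False  # percent change is undefined without a prior-year base
    change = abs(curr_total - prev_total)
    pct = 100.0 * change / prev_total
    return pct > pct_limit and change > abs_limit


print(flag_billing_swing(34.6e6, 107e6))     # True: +209%, +$72.4M
print(flag_billing_swing(200_000, 700_000))  # False: +250% but only +$0.5M
```

Since a year-over-year decline can be at most 100%, the 200% condition can only be met by increases.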
Growth Signals
Flagging rapid billing growth that outpaces normal business expansion.
Explosive Growth
Threshold
Year-over-year billing growth exceeding 500%
What it catches
Providers whose billing skyrockets far beyond what organic patient growth could explain.
Real example from our data
A provider going from $800K to $5.2M in one year (550% growth).
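Growth here is the percentage change over the prior year, so a 500% threshold means the new year's billing is more than six times the old year's. With $B_t$ denoting a provider's total billing in year $t$ (notation introduced here):

```latex
g_t = \frac{B_t - B_{t-1}}{B_{t-1}} \times 100\%, \qquad
g_t > 500\% \iff B_t > 6\,B_{t-1} \quad (B_{t-1} > 0)

\text{Example: } g = \frac{5.2 - 0.8}{0.8} \times 100\% = 550\%, \quad \text{i.e. } B_t = 6.5\,B_{t-1}
```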
Instant Volume
Threshold
New provider billing over $1M in its first year in the dataset
What it catches
Brand-new entities that arrive billing at the level of established organizations — potential shell companies or recycled provider identities.
Real example from our data
A newly registered home health agency billing $3.2M in its first 8 months of operation.
New Entrant
200 providers flagged
Threshold
First appeared in 2022 or later and already billing over $5M total
What it catches
Very new entities that have rapidly accumulated large Medicaid payments — especially concerning in fraud-prone categories like home care and behavioral health.
Real example from our data
A health home LLC that appeared in September 2022 and has already billed $239M across 28 months.
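Instant Volume and New Entrant both key off a provider's first year with billing in the dataset. A sketch, assuming per-provider annual totals. One assumption worth stating: a first year equal to the dataset's start (2018) is left-censored, since such a provider may have been billing long before the data begins, so this sketch does not treat it as new. The function name and flag labels are hypothetical.

```python
def entry_flags(yearly_paid, dataset_start=2018, new_since=2022,
                instant_limit=1_000_000, entrant_limit=5_000_000):
    """Return a set of entry-related flags for one provider.

    yearly_paid: dict mapping year -> total paid that year.
    """
    active_years = sorted(y for y, paid in yearly_paid.items() if paid > 0)
    if not active_years:
        return set()
    first = active_years[0]
    flags = set()
    # Instant Volume: over $1M in the first year (2018 starts are left-censored)
    if first > dataset_start and yearly_paid[first] > instant_limit:
        flags.add("instant_volume")
    # New Entrant: first appeared in 2022 or later, over $5M in total
    if first >= new_since and sum(yearly_paid.values()) > entrant_limit:
        flags.add("new_entrant")
    return flags


print(sorted(entry_flags({2022: 3.2e6, 2023: 4.0e6})))  # ['instant_volume', 'new_entrant']
print(sorted(entry_flags({2018: 9.0e6})))               # []
```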
Code-Specific Analysis
Comparing each provider’s cost per claim against the national benchmark for that exact procedure code.
Cost Outlier
257 providers flagged
Threshold
Cost per claim exceeds 3× the national MEDIAN for a specific HCPCS code
What it catches
Providers charging far above what other providers charge for the exact same service — the strongest signal of potential upcoding or inflated rates.
Real example from our data
A provider billing $296/claim for G9005 when the national median is $47 (6.3× higher).
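The code-specific comparison reduces each provider-code pair to one cost-per-claim value and compares it with that code's median across providers. A sketch, assuming rows of `(provider_id, hcpcs_code, paid, claims)` (a hypothetical layout) and an unweighted median across providers; the source does not say whether its medians are weighted by claim volume.

```python
import statistics
from collections import defaultdict


def cost_outliers(rows, multiple=3.0):
    """Return (provider_id, code, ratio_to_median) where cost per claim > multiple x median.

    rows: iterable of (provider_id, hcpcs_code, paid, claims) tuples.
    """
    paid = defaultdict(float)
    claims = defaultdict(int)
    for pid, code, amount, n in rows:
        paid[(pid, code)] += amount
        claims[(pid, code)] += n
    # One cost-per-claim value per provider-code pair
    cpc = {key: paid[key] / claims[key] for key in paid if claims[key] > 0}
    by_code = defaultdict(list)
    for (_, code), value in cpc.items():
        by_code[code].append(value)
    medians = {code: statistics.median(vals) for code, vals in by_code.items()}
    out = []
    for (pid, code), value in cpc.items():
        med = medians[code]
        if med > 0 and value > multiple * med:
            out.append((pid, code, value / med))
    return sorted(out)


rows = [("a", "G9005", 470.0, 10), ("b", "G9005", 470.0, 10),
        ("c", "G9005", 470.0, 10), ("d", "G9005", 2960.0, 10)]
for pid, code, ratio in cost_outliers(rows):
    print(pid, code, round(ratio, 1))  # d G9005 6.3
```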
Rate Outlier
320 providers flagged
Threshold
Billing above the 90th percentile for 2 or more procedure codes simultaneously
What it catches
Providers who are expensive across multiple services: a pattern rather than a one-code anomaly, and a much stronger signal than a single outlier.
Real example from our data
A provider above p90 for both T2022 ($610/claim vs $203 median) and G0506 ($186/claim vs $7 median).
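A sketch of the multi-code rule, assuming a precomputed cost-per-claim value for each provider-code pair. The 90th percentile comes from Python's `statistics.quantiles` with the `inclusive` method (linear interpolation between order statistics); the source does not specify its interpolation method. Codes billed by a single provider are skipped, since a percentile of one value says nothing.

```python
import statistics
from collections import defaultdict


def rate_outliers(cpc, min_codes=2):
    """Providers above their code's 90th percentile on at least `min_codes` codes.

    cpc: dict mapping (provider_id, hcpcs_code) -> cost per claim.
    """
    by_code = defaultdict(list)
    for (pid, code), value in cpc.items():
        by_code[code].append(value)
    p90 = {}
    for code, values in by_code.items():
        if len(values) >= 2:
            # n=10 yields 9 cut points (p10..p90); index 8 is the 90th percentile
            p90[code] = statistics.quantiles(values, n=10, method="inclusive")[8]
    above = defaultdict(list)
    for (pid, code), value in cpc.items():
        if code in p90 and value > p90[code]:
            above[pid].append(code)
    return {pid: sorted(codes) for pid, codes in above.items() if len(codes) >= min_codes}


cpc = {}
for i in range(10):
    pid = f"p{i}"
    cpc[(pid, "T2022")] = 610.0 if i == 9 else 200.0
    cpc[(pid, "G0506")] = 186.0 if i == 9 else 7.0
    cpc[(pid, "H2017")] = 500.0 if i == 1 else 50.0  # p1 is high on one code only
print(rate_outliers(cpc))  # {'p9': ['G0506', 'T2022']}
```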
Cross-Reference
Checking flagged providers against external federal exclusion databases.
OIG Exclusion Check
Threshold
NPI appears on the HHS-OIG List of Excluded Individuals and Entities (LEIE)
What it catches
Providers already excluded from federal healthcare programs for prior fraud, abuse, or misconduct who may still be receiving Medicaid payments.
Real example from our data
Cross-referenced all flagged NPIs against 82,715 excluded providers. Result: zero current matches, suggesting our flags surface new activity that has not yet led to exclusion.
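The cross-reference is an exact set intersection on NPI. A sketch, assuming both inputs are iterables of NPI strings; treating an all-zero value as a placeholder for a missing NPI is this sketch's assumption. Exact-NPI matching cannot catch excluded individuals or entities listed without an NPI, so covering those would need name- and address-based matching.

```python
def _valid_npis(values):
    """Normalize to a set of 10-digit NPI strings, skipping blanks and placeholders."""
    out = set()
    for v in values:
        v = (v or "").strip()
        # Assumption: an all-zero value stands in for a missing NPI
        if len(v) == 10 and v.isdigit() and v != "0000000000":
            out.add(v)
    return out


def match_exclusions(flagged_npis, leie_npis):
    """Flagged NPIs that also appear on the exclusion list, sorted."""
    return sorted(_valid_npis(flagged_npis) & _valid_npis(leie_npis))


print(match_exclusions(["1234567890", " 1111111111"], ["1111111111", "0000000000", ""]))
# ['1111111111']
```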
3. Advanced Detection Methods
Beyond the 13 core statistical tests, we apply additional analytical techniques drawn from forensic accounting, time-series analysis, and information theory to surface patterns invisible to threshold-based tests.
Billing Velocity (Impossible Volume)
Calculates claims per working day for every provider and flags those filing 50+ claims daily, a pace that leaves under 10 minutes per claim across an 8-hour day.
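Claims per working day can be computed from the monthly claim counts. A sketch, assuming working days are Monday through Friday with no holiday calendar and that the rate is averaged over the provider's active months; both choices, and the helper names, are assumptions.

```python
import datetime as dt


def working_days(year, month):
    """Count Monday-Friday dates in a calendar month (public holidays ignored)."""
    day = dt.date(year, month, 1)
    count = 0
    while day.month == month:
        if day.weekday() < 5:  # 0-4 are Monday-Friday
            count += 1
        day += dt.timedelta(days=1)
    return count


def claims_per_working_day(monthly_claims):
    """monthly_claims: dict mapping (year, month) -> claim count for active months."""
    total_days = sum(working_days(y, m) for (y, m) in monthly_claims)
    if total_days == 0:
        return 0.0
    return sum(monthly_claims.values()) / total_days


rate = claims_per_working_day({(2024, 1): 1150, (2024, 2): 1050})
print(rate, rate >= 50)  # 50.0 True (23 + 21 weekdays)
```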
Benford's Law Analysis
Tests whether each provider's claim amounts follow the expected leading-digit distribution. Fabricated numbers often deviate from Benford's Law, while many kinds of naturally occurring financial data approximately follow it.
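Under Benford's Law the expected share of leading digit d is log10(1 + 1/d), about 30.1% for 1 and 4.6% for 9. A sketch that scores one provider's amounts by the mean absolute deviation (MAD) between observed and expected first-digit shares; MAD is one common conformity measure, chosen here as an assumption. Amounts set by fixed fee schedules, or spanning less than an order of magnitude, may deviate for benign reasons, so any cutoff needs calibration.

```python
import math
from collections import Counter

# Expected first-digit shares under Benford's Law: P(d) = log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading significant digit of a non-zero number, via scientific notation."""
    return int(f"{abs(x):e}"[0])


def benford_mad(amounts):
    """Mean absolute deviation between observed and Benford first-digit shares."""
    digits = [first_digit(a) for a in amounts if a != 0]
    if not digits:
        raise ValueError("no non-zero amounts")
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts[d] / n - p) for d, p in BENFORD.items()) / len(BENFORD)


log_uniform = [10 ** (k / 1000) for k in range(3000)]  # spread over three decades
print(benford_mad(log_uniform) < benford_mad([5.0] * 100))  # True
```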
CUSUM Change Point Detection
Estimates the month in which each provider's billing behavior structurally shifted, using cumulative sum (CUSUM) analysis. Flags providers whose monthly billing jumped 3× or more at that change point.
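A minimal single-change-point CUSUM: accumulate each month's deviation from the provider's overall mean, place the shift right after the month where the absolute cumulative sum peaks, and compare average billing before and after. This textbook variant is assumed here, since the exact CUSUM formulation behind the flags is not documented; detecting several shifts would need repeated splitting (e.g. binary segmentation). The series needs at least two months.

```python
import statistics


def cusum_change_point(series):
    """Index of the first month after the most likely single level shift.

    S_k is the running sum of deviations from the overall mean; the shift is
    placed right after the month where |S_k| is largest.
    """
    mean = statistics.fmean(series)
    running, best_k, best_abs = 0.0, 0, -1.0
    for k, x in enumerate(series[:-1]):  # keep at least one month after the split
        running += x - mean
        if abs(running) > best_abs:
            best_abs, best_k = abs(running), k
    return best_k + 1


def jump_ratio(series):
    """Mean billing after the change point divided by mean billing before it."""
    cp = cusum_change_point(series)
    before = statistics.fmean(series[:cp])
    after = statistics.fmean(series[cp:])
    return float("inf") if before == 0 else after / before


monthly = [100.0] * 12 + [400.0] * 12
print(cusum_change_point(monthly), jump_ratio(monthly))  # 12 4.0
```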
Billing Pattern Similarity
Computes cosine similarity between flagged providers' HCPCS billing distributions to identify clusters of providers billing nearly identical code mixes — a potential indicator of coordinated fraud rings.
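Each provider becomes a vector of payments by HCPCS code, and cosine similarity measures how closely two vectors point in the same direction regardless of size. A sketch, assuming dollar-weighted code mixes and a hypothetical 0.95 similarity cutoff.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two {hcpcs_code: paid} vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(code, 0.0) for code, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_pairs(mixes, threshold=0.95):
    """All provider pairs whose code mixes have cosine similarity >= threshold."""
    pairs = []
    for p, q in combinations(sorted(mixes), 2):
        sim = cosine(mixes[p], mixes[q])
        if sim >= threshold:
            pairs.append((p, q, sim))
    return pairs


mixes = {
    "a": {"T1019": 90.0, "T1020": 10.0},
    "b": {"T1019": 900.0, "T1020": 100.0},  # same mix, 10x the volume
    "c": {"J2326": 1.0},
}
for p, q, sim in similar_pairs(mixes):
    print(p, q, round(sim, 3))  # a b 1.0
```

Because cosine similarity ignores scale, a $50M provider and a $500K provider with the same code mix both score 1.0 against each other. Comparing all 1,860 flagged providers pairwise means about 1.7 million pairs, which this direct loop can handle.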
HCPCS Concentration Index
Measures how concentrated a provider's billing is across procedure codes using the Herfindahl-Hirschman Index. Extreme concentration in a single code — especially a high-value one — is a common fraud pattern.
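The Herfindahl-Hirschman Index is the sum of squared shares. With shares expressed as fractions it ranges from 1/n for a provider spread evenly over n codes up to 1.0 for a single-code provider (antitrust practice often uses percentage shares, giving a scale that tops out at 10,000). A sketch, assuming payments by code for one provider; weighting by dollars rather than claim counts is an assumption.

```python
def hhi(paid_by_code):
    """Herfindahl-Hirschman Index of one provider's payments across codes (0-1 scale)."""
    total = sum(paid_by_code.values())
    if total <= 0:
        raise ValueError("provider has no positive payments")
    return sum((v / total) ** 2 for v in paid_by_code.values())


print(hhi({"T1019": 100.0}))                  # 1.0: all billing in one code
print(hhi({"T1019": 750.0, "G0506": 250.0}))  # 0.625
```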
4. Decile Analysis & Risk Levels
National Cost Percentiles
For each of 9,578 procedure codes, we compute the full cost-per-claim distribution across all providers billing that code. Each provider is then placed in a percentile tier.
Normal Range
Below 75th percentile: typical pricing for this code
Top 25%
75th–90th percentile: above average but within range
Top 10%
90th–95th percentile: notably expensive
Top 5% / Top 1%
Above 95th or 99th percentile: extreme outlier territory
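Placing a provider in a tier requires the code's percentile cut points and a comparison against them. A sketch using `statistics.quantiles(..., n=100, method="inclusive")`, which returns 99 cut points and needs at least two values. Treating a value exactly at a cut point as belonging to the lower tier, the interpolation method, and splitting the combined Top 5% / Top 1% tier into two labels are all this sketch's conventions.

```python
import statistics


def tier_cutoffs(values):
    """p75, p90, p95 and p99 of one code's cost-per-claim values (needs 2+ values)."""
    q = statistics.quantiles(values, n=100, method="inclusive")  # 99 cut points
    return {75: q[74], 90: q[89], 95: q[94], 99: q[98]}


def tier(value, cuts):
    """Map a provider's cost per claim for one code to a percentile tier label."""
    if value > cuts[99]:
        return "Top 1%"
    if value > cuts[95]:
        return "Top 5%"
    if value > cuts[90]:
        return "Top 10%"
    if value > cuts[75]:
        return "Top 25%"
    return "Normal Range"


cuts = tier_cutoffs(list(range(1, 101)))
print([tier(v, cuts) for v in (50, 80, 92, 97, 100)])
# ['Normal Range', 'Top 25%', 'Top 10%', 'Top 5%', 'Top 1%']
```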
Provider Risk Levels
Risk levels are based on how many independent tests flag a provider. More flags from different test categories mean higher confidence the anomaly is real.
CRITICAL
3+ independent flags. Highest priority — multiple independent anomalies detected across different test dimensions.
18 providers in this tier
HIGH
2 independent flags. Two separate tests independently identified unusual billing behavior.
156 providers in this tier
MODERATE
Single flag. One anomaly detected — may have a legitimate explanation such as specialized services or geographic pricing.
1,186 providers in this tier
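The tier rule is a count of distinct tests that fired. A sketch, assuming flags are stored as a set of test names per provider (the names here are hypothetical); it counts tests rather than categories, which is one reading of "independent flags". The three tiers sum to 1,360 providers (18 + 156 + 1,186), the statistically flagged total.

```python
from collections import Counter


def risk_level(flags):
    """Map the set of tests a provider tripped to a risk tier (None if unflagged)."""
    n = len(flags)
    if n >= 3:
        return "CRITICAL"
    if n == 2:
        return "HIGH"
    if n == 1:
        return "MODERATE"
    return None


providers = {
    "p1": {"cost_outlier", "rate_outlier", "new_entrant"},
    "p2": {"billing_swing", "single_code"},
    "p3": {"consistent_billing"},
    "p4": set(),
}
print(Counter(risk_level(f) for f in providers.values() if f))
```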
What This Is NOT
This is not an accusation of fraud. Every provider on our watchlist is flagged because their billing patterns are statistically unusual — not because we have evidence of wrongdoing. Statistical outliers have explanations that range from data errors to legitimate specialized services to actual fraud.
This is not a replacement for investigation. Our analysis identifies patterns that warrant a closer look. Actual fraud determination requires claims-level review, medical record audits, patient interviews, and legal proceedings — none of which we perform.
This is not a comprehensive fraud detection system. Sophisticated fraud schemes (phantom patients, identity fraud, kickback arrangements) may not produce the statistical signatures our tests detect. The absence of flags does not indicate a clean provider.
Bottom line: Statistical flags ≠ fraud. Treat every flag as a question (“Why is this unusual?”), not an answer (“This is fraud.”).
5. Known Limitations
We believe transparency about limitations strengthens credibility. Here is what our analysis can and cannot do.
Aggregate data only
We see provider-level totals, not individual claim lines. We cannot determine whether a specific patient visit was medically necessary or billed correctly.
Government entities appear anomalous
State agencies, county health departments, and fiscal intermediaries aggregate billing for thousands of individual providers. Their high volumes are often legitimate but look extreme in our analysis.
Per diem codes have different economics
Codes like T2016 (residential habilitation) cover an entire day of care. High per-diem rates may reflect bundled services for complex patients, not provider markup.
Specialty drug costs reflect drug pricing
J-codes (injectable drugs) have legitimately high per-claim costs driven by pharmaceutical pricing, not provider behavior. A very high per-claim payment for a drug such as Spinraza mostly reflects what the drug itself costs.
No web validation or OSINT
We have not verified provider addresses, corporate registrations, or online presence. Some flagged NPIs may correspond to dissolved entities or incorrect registrations.
LEIE labels lag actual fraud
The OIG exclusion list reflects outcomes from investigations that began 1–5 years earlier. A provider can be actively committing fraud for years before appearing on the LEIE.
Medicaid only
T-MSIS captures Medicaid and CHIP data only. A provider billing $50M in Medicaid might also bill $200M in Medicare and private insurance, and we cannot see that broader picture.
Self-directed care programs
Organizations like Public Partnerships LLC are legitimate fiscal management entities for self-directed care. They aggregate billing on behalf of thousands of individual caregivers, so their high totals are by design — though the self-directed care category is fraud-prone.
6. Data Source
CMS T-MSIS Medicaid Provider Spending
All data comes from the HHS Open Data Platform — the Medicaid Provider Spending dataset released by the HHS DOGE team on February 13, 2026. This is derived from the Transformed Medicaid Statistical Information System (T-MSIS), the federal data system that collects Medicaid and CHIP data from all 50 states, DC, and territories.
Total records: 227M
Total payments: $1.09T
Providers: 617,503
Procedure codes: 10,881
Benchmarked codes: 9,578
Date range: 2018–2024
Fields available in the dataset
NPI (National Provider Identifier)
Provider name, city, state, specialty
HCPCS procedure code per row
Monthly payment amounts and claim counts
Beneficiary counts per code per month
Date range of billing activity
Fields NOT available (we cannot see)
Individual claim-line detail
Patient diagnosis codes (ICD-10)
Referring provider information
Place of service detail
Claim denial/adjustment history
Medicare or private insurance billing
7. How We Compare
Most Medicaid fraud analysis is either locked behind academic paywalls, published as static PDFs with no interactive exploration, or uses opaque ML models that cannot explain their predictions. We built something different.
Code-specific benchmarks
We compare each provider to the median for THAT exact procedure code — not a generic overall average. A provider billing H2017 is compared only to other H2017 billers.
Full decile distributions
For 9,578 codes we compute p10, p25, p50, p75, p90, p95, and p99 — giving a complete picture of where any provider falls in the national distribution.
13 independent tests
Each test catches a different anomaly type. Multiple overlapping flags from different test categories are far more significant than a single flag.
Interactive exploration
12,800+ pages covering every provider, procedure code, and state — with search, filtering, and drill-down into specific billing codes.
Open and free
All analysis is publicly accessible at no cost. No paywall, no subscription, no login required. Built from public data for public accountability.
Explainable flags
Every statistical flag includes a plain-English explanation with specific numbers: which codes, what ratios, how much money. These flags are not black-box ML scores.
Key Finding: OIG Exclusion List Cross-Reference
We cross-referenced all 1,860 flagged providers against the HHS Office of Inspector General's List of Excluded Individuals and Entities (LEIE) — 82,715 providers excluded from federal healthcare programs for fraud, abuse, or misconduct.
Result: Zero matches
None of our flagged providers appear on the current OIG exclusion list. This suggests our analysis is surfacing activity that has not yet led to exclusion, rather than re-flagging known bad actors.