Download Data
All datasets are derived from HHS Open Data (227 million Medicaid billing records, 2018–2024). Files are in JSON format and can be opened with any text editor, Python, R, or other data analysis tool.
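As a starting point for working with the files in Python, the following is a minimal sketch that loads a JSON file and summarizes its top-level structure. The function names (load_json, describe) and the sample record are illustrative assumptions; the layout of each download is not documented on this page, so inspect a file before relying on any particular field.

```python
import json


def load_json(path):
    """Parse a downloaded JSON file and return the resulting Python object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def describe(data):
    """Return a one-line summary of the top-level structure of parsed JSON."""
    if isinstance(data, list):
        first_is_dict = bool(data) and isinstance(data[0], dict)
        keys = sorted(data[0].keys()) if first_is_dict else []
        return f"list of {len(data)} items; first item keys: {keys}"
    if isinstance(data, dict):
        return f"object with keys: {sorted(data.keys())}"
    return f"{type(data).__name__} value"


# Made-up record standing in for a parsed file; real field names may differ.
sample = [{"npi": "1000000001", "total_paid": 10.0}]
summary = describe(sample)
```

For a real download, call describe(load_json(path)) with the path to the saved file.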
Unified Risk Watchlist
Combine the statistical watchlist and the ML scores file (both listed below) to build a unified view of all flagged providers. The statistical watchlist contains 880 providers flagged by code-specific billing tests, while the ML scores file contains fraud-similarity scores from a model trained on 514 confirmed fraud cases. Join the two files on the npi field to merge statistical flags with ML scores for a complete risk picture.
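The join on npi can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the site's own method: it assumes each file parses to a list of records, only the npi field name comes from this page, and the other fields (flags, score) and all NPI values are made-up placeholders. NPIs are normalized to strings in case one file stores them as numbers.

```python
def merge_on_npi(watchlist, ml_scores):
    """Outer-join two lists of provider records on the 'npi' field.

    Providers that appear in only one file are kept, with None on the
    missing side.
    """
    merged = {}
    for rec in watchlist:
        key = str(rec["npi"])  # normalize: NPI may be stored as str or int
        merged[key] = {"npi": key, "statistical": rec, "ml": None}
    for rec in ml_scores:
        key = str(rec["npi"])
        entry = merged.setdefault(
            key, {"npi": key, "statistical": None, "ml": None}
        )
        entry["ml"] = rec
    return list(merged.values())


# Placeholder records standing in for the two parsed JSON files.
watchlist = [
    {"npi": "1000000001", "flags": ["example_flag"]},
    {"npi": "1000000002", "flags": ["example_flag"]},
]
ml_scores = [
    {"npi": 1000000002, "score": 0.5},
    {"npi": "1000000003", "score": 0.9},
]

unified = merge_on_npi(watchlist, ml_scores)
# Providers present in both files carry a statistical record and an ML record.
in_both = [e["npi"] for e in unified if e["statistical"] and e["ml"]]
```

An outer join is used here so that providers flagged by only one method still appear in the unified view; filter on both sides being present, as in in_both, to keep only the overlap.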
Risk Watchlist (Statistical)
880 providers flagged by 4 code-specific fraud detection tests. Includes flag types, flag details with specific codes and ratios, provider demographics, and total spending.
Risk Watchlist (Legacy)
788 providers flagged by 9 legacy fraud detection tests including outlier spending, explosive growth, beneficiary stuffing, and billing consistency anomalies.
ML Fraud Scores
Machine learning fraud-similarity scores for top providers, produced by a Random Forest model trained on 514 OIG-excluded providers. Includes feature values such as cost per claim, code concentration, and self-billing ratio.
Top 1,000 Providers
The 1,000 highest-spending Medicaid providers ranked by total payments. Includes NPI, name, specialty, city, state, total paid, claims, beneficiaries, and flag counts.
State Summaries
Aggregated Medicaid spending data for all 50 states. Includes total payments, claims, beneficiaries, provider counts, and top procedures by state.
Procedure Codes
All 10,881 HCPCS procedure codes billed to Medicaid with total payments, claim counts, provider counts, and average cost per claim.
Code Benchmarks
National cost-per-claim benchmarks for 9,578 procedure codes. Includes average, median, standard deviation, and percentile distributions (p10 through p99).
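One way to use the percentile distributions is to place a provider's cost per claim within the benchmark for a given code. The sketch below assumes each benchmark record stores its percentiles under keys such as p10 or p99; those key names, the function name, and every value shown are hypothetical, so check the actual file before adapting it.

```python
def percentile_band(cost_per_claim, benchmark):
    """Return the highest percentile whose value is at or below the cost.

    Reads any keys of the form 'p<number>' from the benchmark record and
    returns None when the cost falls below the lowest available percentile.
    """
    points = sorted(
        (int(key[1:]), value)
        for key, value in benchmark.items()
        if key.startswith("p") and key[1:].isdigit()
    )
    highest = None
    for pct, value in points:
        if cost_per_claim >= value:
            highest = pct
    return highest


# Made-up benchmark record; real keys and values may differ.
example_benchmark = {
    "code": "X0000",
    "p10": 10.0,
    "p50": 40.0,
    "p90": 120.0,
    "p99": 300.0,
}

band = percentile_band(150.0, example_benchmark)
```

A high band is only a prompt for a closer look at that code's billing, not a finding in itself.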
Yearly Trends
Annual Medicaid spending totals from 2018 to 2024. Includes total payments, claims, beneficiaries, and provider counts per year.
Data Usage & Citation
This data is derived from publicly available U.S. Department of Health & Human Services Medicaid provider spending records. The underlying data is in the public domain. Our analysis, risk scores, and derived datasets may be freely used with attribution.
Suggested citation: OpenMedicaid by TheDataProject.ai. Analysis of HHS Medicaid Provider Spending data (2018–2024). Available at openmedicaid.org.
Important caveats: Statistical flags and ML scores indicate unusual billing patterns worth investigating — they are not proof of fraud or wrongdoing. Government entities, home care programs, hospitals, and large care organizations may legitimately bill at high rates. See our methodology page for details on how flags are calculated.