Per-NPI per-HCPCS service counts + payment data, free + open. The dataset buyers pay $30K-$185K/yr for.
Fonteum ingests the CMS Medicare Provider Utilization & Payment Data (Physician & Other Practitioners by Provider and Service) on the daily HEAD-probe pattern, with a full ingest firing on each annual mid-June release. ~9.5M rows per data year. The 14-tuple provenance contract ships inline with every API response so consumers can verify what they’re looking at without a second round-trip.
Per-NPI per-HCPCS Medicare service-count + payment aggregates.
CMS publishes the “Medicare Physician & Other Practitioners by Provider and Service” file annually, in mid-June. One row per (NPI, HCPCS code, place of service, data year). Each row carries the count of services rendered, the count of distinct Medicare beneficiaries served, and the submitted / allowed / paid / standardized dollar amounts. Coverage extends back to data year 2018 in the current schema (the legacy 2013-2017 schema differs and is not ingested in Phase 1).
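The row grain described above can be sketched as a TypeScript type plus a composite key. The field names here are illustrative assumptions, not the production schema; the place-of-service values F (facility) and O (non-facility) follow the CMS file's convention.

```typescript
// Hypothetical row shape: field names are assumptions for illustration only.
interface UtilizationRow {
  npi: string;
  hcpcsCode: string;
  placeOfService: "F" | "O"; // F = facility, O = non-facility (office)
  dataYear: number;
  serviceCount: number | null; // null when the count is suppressed
  beneficiaryCount: number | null; // null when the count is suppressed
  submittedAmt: number;
  allowedAmt: number;
  paidAmt: number;
  standardizedAmt: number;
}

type RowGrain = Pick<
  UtilizationRow,
  "npi" | "hcpcsCode" | "placeOfService" | "dataYear"
>;

// One row per (NPI, HCPCS code, place of service, data year):
// joining the four fields gives a key that should be unique per row.
function rowKey(r: RowGrain): string {
  return [r.npi, r.hcpcsCode, r.placeOfService, String(r.dataYear)].join("|");
}
```

A key like this is handy for spotting duplicate rows while loading a data year, since two rows with the same key would violate the stated grain.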
For context: this is the dataset H1, Definitive Healthcare, and Trilliant Health all sell as their flagship product at $30,000-$185,000 per buyer per year. Fonteum publishes it free, open, with full 14-tuple provenance + Dataset JSON-LD discoverability + free .edu/.gov researcher tier.
No PHI. No claims. No patient records.
CMS pre-aggregates this file to provider-level rows before public release. The dataset contains:
- NO patient identifiers — no names, no addresses, no dates of birth.
- NO claim-level rows — only annual rollups per (NPI, HCPCS, POS).
- NO procedure dates — just the data year.
- NO cells with beneficiary counts under 11 — CMS pre-suppresses these per its privacy policy. Our schema preserves the suppression by allowing NULL in the count fields.
We additionally drop the provider-name and provider-address columns CMS ships with the file: those facts already live in NPPES (the canonical name + address per NPI), and dual-storage would create drift. Joins back to NPPES happen at query time via the federated identity bridge from PR #145.
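A minimal sketch of the two ingest rules above: drop the name and address columns, and keep suppressed counts as NULL. The column names and the assumption that a suppressed count arrives as an empty cell are hypothetical; the real CMS headers and the production parser may differ.

```typescript
type RawRecord = Record<string, string>;
type StoredRecord = Record<string, string | number | null>;

// Hypothetical column names, used only for this sketch.
const DROPPED_COLUMNS = new Set(["provider_name", "provider_address"]);
const COUNT_COLUMNS = new Set(["service_count", "beneficiary_count"]);

// Assumption: a suppressed count shows up as an empty cell.
// Anything that is not a finite number is stored as null.
function parseCount(cell: string): number | null {
  if (cell.trim() === "") return null;
  const n = Number(cell);
  return Number.isFinite(n) ? n : null;
}

// Copies a raw CSV record, skipping dropped columns and
// converting count columns to number-or-null.
function toStoredRecord(raw: RawRecord): StoredRecord {
  const out: StoredRecord = {};
  for (const [col, value] of Object.entries(raw)) {
    if (DROPPED_COLUMNS.has(col)) continue;
    out[col] = COUNT_COLUMNS.has(col) ? parseCount(value) : value;
  }
  return out;
}
```

Storing null rather than 0 keeps a suppressed cell distinguishable from a genuine zero when downstream queries aggregate counts.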
Daily HEAD probe at 06:00 UTC. Full ingest on annual mid-June release.
The Inngest cron runs daily on the schedule 0 6 * * *. The HEAD probe is cheap and short-circuits via the UNIQUE(source_id, snapshot_date) constraint when nothing has changed. Full ingest only fires when CMS publishes a new data year, typically once a year in mid-June. The next 364 daily firings are no-ops at the database level.
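The short-circuit can be sketched as a pure decision function. This assumes the snapshot date is derived from the HEAD response's Last-Modified time, which the text does not state; in production the UNIQUE(source_id, snapshot_date) constraint does the deduplication, and the in-memory set below only stands in for it. The dates in the usage note are made up.

```typescript
// Mirrors the UNIQUE(source_id, snapshot_date) constraint as a string key.
function snapshotKey(sourceId: string, snapshotDate: string): string {
  return `${sourceId}:${snapshotDate}`;
}

// Decides whether a daily firing should run a full ingest.
// `seen` stands in for the rows already recorded in the database;
// `lastModified` is assumed to come from the HEAD probe.
function planIngest(
  seen: ReadonlySet<string>,
  sourceId: string,
  lastModified: Date,
): "ingest" | "noop" {
  const snapshotDate = lastModified.toISOString().slice(0, 10); // YYYY-MM-DD (UTC)
  return seen.has(snapshotKey(sourceId, snapshotDate)) ? "noop" : "ingest";
}
```

For example, with `cms-provider-utilization:2024-06-13` already recorded, a probe that still reports 13 June 2024 yields "noop", while a probe reporting a later date yields "ingest".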
CMS rotates the bulk-download URL with each annual release. On the first 06:00 UTC fire after publication, if the existing URL returns 404, the operator updates the registry entry at src/lib/sources/cron-sources.ts with the new pattern and manually re-runs the cron via the Inngest dashboard.
NPI is the bridge to NPPES, PECOS, LEIE, HRSA.
Every utilization row carries an NPI. The federated identity layer (/identity) joins NPPES (canonical name + address + taxonomy), PECOS (Medicare enrollment status), LEIE (OIG exclusion status), HRSA HPSA (shortage-area assignments), and now utilization in a single query.
Format guard: the parser validates every NPI at ingest time, checking both the 10-digit format and the Luhn check digit. Rows that fail are dropped before they reach provider_utilization_summary. This mirrors the Phase-1 PECOS / LEIE validation policy.
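For reference, the NPI check can be sketched as follows. This is not the production parser; it follows the published NPI check-digit rule, which applies the Luhn algorithm to the 10-digit NPI prefixed with the card-issuer constant 80840.

```typescript
// Returns true when `npi` is exactly 10 digits and its last digit is a
// valid Luhn check digit over "80840" + npi.
function isValidNpi(npi: string): boolean {
  if (!/^\d{10}$/.test(npi)) return false;

  const digits = ("80840" + npi).split("").map(Number);
  let sum = 0;
  let double = false; // the rightmost digit (the check digit) is not doubled

  // Walk right to left, doubling every second digit.
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits[i];
    if (double) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of the product
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}
```

The sample NPI used in the API response, 1234567893, passes this check, while 1234567890 and any non-10-digit string fail it.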
GET /api/v1/utilization/[npi]
Returns the top-10 HCPCS codes by service count for the given NPI, with the full 14-tuple provenance contract attached inline. Auth flows through the standard withApi handler — bearer token, rate limit, tier resolution. The free .edu/.gov researcher tier gets the same envelope as the paid tiers.
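The top-10 selection can be pictured as a sort-and-slice over one provider's aggregates. This is a sketch, not the endpoint's actual query; skipping rows whose service count is NULL is an assumption made here for illustration.

```typescript
interface HcpcsAggregate {
  hcpcs_code: string;
  service_count: number | null;
}

// Returns up to `limit` rows ordered by service_count, highest first.
// Rows with a null count are skipped (an assumption of this sketch).
function topByServiceCount<T extends HcpcsAggregate>(
  rows: readonly T[],
  limit = 10,
): T[] {
  return rows
    .filter((r) => r.service_count !== null)
    .sort((a, b) => (b.service_count as number) - (a.service_count as number))
    .slice(0, limit);
}
```

Because `filter` returns a new array, the caller's input is left unmodified by the sort.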
{
  "data": {
    "npi": "1234567893",
    "data_year": 2022,
    "top_hcpcs": [
      {
        "hcpcs_code": "99214",
        "hcpcs_description": "Office or other outpatient visit, est patient, lvl 4",
        "place_of_service": "O",
        "service_count": 412,
        "beneficiary_count": 287,
        "payment_amt": 44480,
        "data_year": 2022
      }
    ],
    "provenance": {
      "_source": "CMS Medicare Provider Utilization & Payment Data ...",
      "_dataset_id": "cms-provider-utilization",
      "_snapshot": "2022-12-31",
      "_methodology": "v2026.05.0",
      "_license": "US-Government-Works",
      "_coverage_period_start": "2018-01-01",
      "_coverage_period_end": "ongoing"
    }
  },
  "meta": { "request_id": "req_...", "api_version": "v1", "...": "..." }
}
US-Government-Works. Anyone can redistribute.
CMS publishes this file as a federal-government work, in the public domain in the U.S. under 17 U.S.C. §105 and the Open Government Data Act. The identifier US-Government-Works is what Fonteum surfaces in the provenance contract’s _license field for every row derived from this dataset. Anyone (researcher, journalist, competing buyer-tool) can reuse the data with no restriction other than attribution courtesy.
APA-ish, with the upstream CMS source named.
Fonteum. (2026). CMS Medicare Provider
Utilization & Payment Data — Physician & Other Practitioners by
Provider and Service [data set]. https://fonteum.com/docs/provider-utilization.
Retrieved [date]. Original source: Centers for Medicare & Medicaid
Services. License: US-Government-Works.
Detailed researcher citation guidance lives at /cite; the researcher-api docs describe the citation TOS for the free tier. A future Zenodo DOI will pin specific methodology versions per /chain.
SHA-256 attestation + S3 cache mirror.
Every snapshot lands with a SHA-256 attestation written by writeAttestation (PR #135). When the source-cache mirror (PR #154) is provisioned, every snapshot also mirrors to S3 — verifiers can re-download the original CSV from the cache and recompute the hash to confirm byte-exact provenance. Use /verify to walk the chain for any snapshot.
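The recompute-and-compare step a verifier would run can be sketched with Node's built-in crypto module. How the attested hash is fetched (for example from /verify) is not shown; `matchesAttestation` is a hypothetical helper name, and the attestation is assumed to be a hex-encoded digest.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of the given bytes (or UTF-8 string).
function sha256Hex(bytes: Uint8Array | string): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// True when the downloaded file hashes to the attested value.
// Assumes the attestation is a hex string; case and surrounding
// whitespace are normalized before comparing.
function matchesAttestation(
  bytes: Uint8Array | string,
  attestedHex: string,
): boolean {
  return sha256Hex(bytes) === attestedHex.trim().toLowerCase();
}
```

In practice the bytes would be the original CSV as downloaded from the cache mirror; any change to the file, however small, produces a different digest.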
Phase 1 ships ingest + API. Phases 2-5 add inpatient + benchmarking + per-specialty rollups.
- Phase 1 (this wave): Physician & Other Practitioners by Provider and Service ingest + /api/v1/utilization/[npi] endpoint + per-NPI summary card on /v/[vertical] detail pages + Dataset JSON-LD on /coverage + /data.
- §sprint3-cms-utilization-inpatient (queued): CMS Inpatient/Outpatient DRG-level utilization.
- §sprint3-cms-utilization-hospice (queued): CMS Hospice utilization.
- §sprint3-utilization-specialty-rollups (queued): per-specialty utilization patterns by NUCC taxonomy.
- §sprint3-utilization-benchmarks (queued): comparative benchmarking — “this provider is 2.3x median for 99214” type insights.
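To make the queued benchmarking idea concrete, the "2.3x median" figure is a ratio of one provider's service count to the median across a peer group. The sketch below is only an illustration of that arithmetic; how peers will be selected (specialty, geography, data year) is not specified in the roadmap.

```typescript
// Median of a non-empty list of numbers.
function median(values: readonly number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Ratio of one provider's count to the peer-group median.
// A peer median of 0 yields Infinity; callers should guard for that.
function ratioToMedian(
  providerCount: number,
  peerCounts: readonly number[],
): number {
  return providerCount / median(peerCounts);
}
```

With made-up peer counts of 50, 100, and 150 for a given HCPCS code, a provider with 230 services sits at 2.3 times the median.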