Provider utilization · Reference

Per-NPI per-HCPCS service counts + payment data, free + open. The dataset buyers pay $30K-$185K/yr for.

Fonteum ingests the CMS Medicare Provider Utilization & Payment Data (Physician & Other Practitioners by Provider and Service) on the daily HEAD-probe pattern, with a full ingest firing on each annual mid-June release. ~9.5M rows per data year. The 14-tuple provenance contract ships inline with every API response, so consumers can verify what they’re looking at without a second round-trip.


1. What this dataset is

Per-NPI per-HCPCS Medicare service-count + payment aggregates.

CMS publishes the “Medicare Physician & Other Practitioners by Provider and Service” file annually, in mid-June. One row per (NPI, HCPCS code, place of service, data year). Each row carries the count of services rendered, the count of distinct Medicare beneficiaries served, and the submitted / allowed / paid / standardized dollar amounts. Coverage extends back to data year 2018 in the current schema (the legacy 2013-2017 schema differs and is not ingested in Phase 1).
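
A minimal sketch of that row grain in TypeScript. Field names follow the sample API response where one exists; the interface itself, `rowKey`, and `findDuplicateKeys` are illustrative assumptions, not the published schema, and the submitted / allowed / standardized dollar columns are left out because their names are not documented here.

```typescript
// Hypothetical row shape: names mirror the API sample, not a confirmed schema.
interface UtilizationRow {
  npi: string;
  hcpcs_code: string;
  place_of_service: string;
  data_year: number;
  service_count: number | null; // NULL when CMS suppressed the cell
  beneficiary_count: number | null;
  payment_amt: number | null;
}

type RowGrain = Pick<
  UtilizationRow,
  "npi" | "hcpcs_code" | "place_of_service" | "data_year"
>;

// Composite key for the "one row per (NPI, HCPCS, POS, data year)" grain.
function rowKey(r: RowGrain): string {
  return [r.npi, r.hcpcs_code, r.place_of_service, String(r.data_year)].join("|");
}

// Returns every key that appears more than once, i.e. a grain violation.
function findDuplicateKeys(rows: RowGrain[]): string[] {
  const counts = new Map<string, number>();
  for (const r of rows) {
    const k = rowKey(r);
    counts.set(k, (counts.get(k) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, n]) => n > 1)
    .map(([k]) => k);
}
```

A check like `findDuplicateKeys` is one cheap way to confirm that a downloaded extract really has the stated grain before aggregating it.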

For context: this is the dataset H1, Definitive Healthcare, and Trilliant Health all sell as their flagship product at $30,000-$185,000 per buyer per year. Fonteum publishes it free, open, with full 14-tuple provenance + Dataset JSON-LD discoverability + free .edu/.gov researcher tier.

2. What this dataset is NOT

No PHI. No claims. No patient records.

CMS pre-aggregates this file to provider-level rows before public release. The dataset contains:

  • NO patient identifiers — no names, no addresses, no dates of birth.
  • NO claim-level rows — only annual rollups per (NPI, HCPCS, POS).
  • NO procedure dates — just the data year.
  • NO cells with beneficiary counts under 11 — CMS pre-suppresses these per its privacy policy. Our schema preserves the suppression by allowing NULL in the count fields.
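
Because suppressed cells arrive as NULL rather than zero, consumers probably want to total only the known values and report how many cells were withheld. A rough sketch, assuming counts are exposed as `number | null` (the helper name `sumKnownCounts` is hypothetical):

```typescript
type NullableCount = number | null;

// Sums the non-suppressed counts and reports how many cells were NULL.
// A NULL is treated as "withheld by CMS", never as zero.
function sumKnownCounts(values: NullableCount[]): { total: number; suppressed: number } {
  let total = 0;
  let suppressed = 0;
  for (const v of values) {
    if (v === null) {
      suppressed += 1;
    } else {
      total += v;
    }
  }
  return { total, suppressed };
}
```

This is best applied to `service_count`. Beneficiary counts are distinct-per-row, so adding them across HCPCS codes can count the same beneficiary more than once.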

We additionally drop the provider-name and provider-address columns CMS ships with the file: those facts already live in NPPES (the canonical name + address per NPI), and dual-storage would create drift. Joins back to NPPES happen at query time via the federated identity bridge from PR #145.

3. Refresh schedule

Daily HEAD probe at 06:00 UTC. Full ingest on annual mid-June release.

The Inngest cron runs daily on the schedule 0 6 * * *. The HEAD probe is cheap and short-circuits via the UNIQUE(source_id, snapshot_date) constraint when nothing has changed. Full ingest only fires when CMS publishes a new data year, typically once a year in mid-June. The next 364 daily firings are no-ops at the database level.
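
The short-circuit can be pictured with an in-memory stand-in for the UNIQUE(source_id, snapshot_date) constraint. This is a sketch only: `SnapshotRegistry` and `runDailyProbe` are hypothetical names, and how the probe derives the upstream snapshot date (for example from a `Last-Modified` header) is an assumption the source does not spell out.

```typescript
// In-memory model of a table with UNIQUE(source_id, snapshot_date).
class SnapshotRegistry {
  private seen = new Set<string>();

  // Returns true when the row is new, false when the constraint would reject it.
  tryInsert(sourceId: string, snapshotDate: string): boolean {
    const key = `${sourceId}::${snapshotDate}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}

// One daily firing: ingest only if this (source, snapshot) pair is unseen.
function runDailyProbe(
  registry: SnapshotRegistry,
  sourceId: string,
  upstreamSnapshotDate: string, // assumed to come from the HEAD probe
  ingest: () => void,
): "ingested" | "noop" {
  if (!registry.tryInsert(sourceId, upstreamSnapshotDate)) return "noop";
  ingest();
  return "ingested";
}
```

Running the probe twice against the same upstream snapshot yields one `"ingested"` followed by `"noop"`, which is the behaviour the 364 idle firings rely on.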

CMS rotates the bulk-download URL with each annual release. On the first 06:00 UTC fire after publication, if the existing URL returns 404, the operator updates the registry entry at src/lib/sources/cron-sources.ts with the new URL pattern and manually re-runs the cron from the Inngest dashboard.

4. How it joins to other sources

NPI is the bridge to NPPES, PECOS, LEIE, HRSA.

Every utilization row carries an NPI. The federated identity layer (/identity) joins NPPES (canonical name + address + taxonomy), PECOS (Medicare enrollment status), LEIE (OIG exclusion status), HRSA HPSA (shortage-area assignments), and now utilization in a single query.

Format guard: every NPI is Luhn-validated by the parser at ingest time. Rows whose NPI fails this 10-digit check are dropped before they reach provider_utilization_summary. This mirrors the Phase-1 PECOS / LEIE validation policy.
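
For reference, the standard NPI check-digit rule is a Luhn check over the 10-digit NPI prefixed with the constant `80840`. The sketch below implements that rule; it is not Fonteum's parser, and the function name `isValidNpi` is an assumption.

```typescript
// Validates an NPI: exactly 10 digits, and Luhn-valid once prefixed with "80840".
function isValidNpi(npi: string): boolean {
  if (!/^\d{10}$/.test(npi)) return false;

  const digits = ("80840" + npi).split("").map(Number);
  let sum = 0;
  digits.forEach((digit, i) => {
    // Position counted from the right; the check digit is position 0.
    const fromRight = digits.length - 1 - i;
    let d = digit;
    if (fromRight % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of the product
    }
    sum += d;
  });
  return sum % 10 === 0;
}
```

The sample NPI used in the API response, `1234567893`, passes this check, while `1234567890` does not.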

5. The API

GET /api/v1/utilization/[npi]

Returns the top-10 HCPCS codes by service count for the given NPI, with the full 14-tuple provenance contract attached inline. Auth flows through the standard withApi handler — bearer token, rate limit, tier resolution. The free .edu/.gov researcher tier gets the same envelope as the paid tiers.

{
  "data": {
    "npi": "1234567893",
    "data_year": 2022,
    "top_hcpcs": [
      {
        "hcpcs_code": "99214",
        "hcpcs_description": "Office or other outpatient visit, est patient, lvl 4",
        "place_of_service": "O",
        "service_count": 412,
        "beneficiary_count": 287,
        "payment_amt": 44480,
        "data_year": 2022
      }
    ],
    "provenance": {
      "_source": "CMS Medicare Provider Utilization & Payment Data ...",
      "_dataset_id": "cms-provider-utilization",
      "_snapshot": "2022-12-31",
      "_methodology": "v2026.05.0",
      "_license": "US-Government-Works",
      "_coverage_period_start": "2018-01-01",
      "_coverage_period_end": "ongoing"
    }
  },
  "meta": { "request_id": "req_...", "api_version": "v1", "...": "..." }
}
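
A hedged client-side sketch for this endpoint. The path and the bearer-token scheme come from the description above; the base URL, the helper names, and the exact envelope typing are assumptions made for illustration.

```typescript
interface HcpcsSummary {
  hcpcs_code: string;
  service_count: number | null;
  beneficiary_count: number | null;
}

interface UtilizationEnvelope {
  data: {
    npi: string;
    data_year: number;
    top_hcpcs: HcpcsSummary[];
    provenance: Record<string, string>;
  };
  meta: Record<string, string>;
}

// Builds the URL and headers; baseUrl is supplied by the caller (hypothetical here).
function buildUtilizationRequest(baseUrl: string, npi: string, token: string) {
  const root = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${root}/api/v1/utilization/${encodeURIComponent(npi)}`,
    headers: { Authorization: `Bearer ${token}`, Accept: "application/json" },
  };
}

// Parses a response body and checks the few fields this sketch relies on.
function parseUtilizationResponse(body: string): UtilizationEnvelope {
  const parsed = JSON.parse(body) as UtilizationEnvelope;
  if (typeof parsed?.data?.npi !== "string" || !Array.isArray(parsed.data.top_hcpcs)) {
    throw new Error("unexpected utilization envelope");
  }
  return parsed;
}
```

The `url` and `headers` can be passed to any HTTP client (for example the built-in `fetch`), with the response text handed to `parseUtilizationResponse`.
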
6. License + redistribution

US-Government-Works. Anyone can redistribute.

CMS publishes this file as a federal-government work, public domain in the U.S. under 17 U.S.C. §105 and the OPEN Government Data Act. The SPDX identifier US-Government-Works is what Fonteum surfaces in the provenance contract’s _license field for every row derived from this dataset. Anyone (researcher, journalist, competing buyer-tool) can re-use the data with no restriction other than attribution courtesy.

7. How to cite

APA-ish, with the upstream CMS source named.

Fonteum. (2026). CMS Medicare Provider
Utilization & Payment Data — Physician & Other Practitioners by
Provider and Service [data set]. https://fonteum.com/docs/provider-utilization.
Retrieved [date]. Original source: Centers for Medicare & Medicaid
Services. License: US-Government-Works.

Detailed researcher citation guidance lives at /cite; the researcher-api docs describe the citation TOS for the free tier. A future Zenodo DOI will pin specific methodology versions per /chain.

8. Verify the snapshot

SHA-256 attestation + S3 cache mirror.

Every snapshot lands with a SHA-256 attestation written by writeAttestation (PR #135). When the source-cache mirror (PR #154) is provisioned, every snapshot also mirrors to S3 — verifiers can re-download the original CSV from the cache and recompute the hash to confirm byte-exact provenance. Use /verify to walk the chain for any snapshot.
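
Recomputing the hash locally takes only a few lines with Node's built-in `crypto` module. This is a sketch under the assumption that the attestation exposes the digest as a hex string; `sha256Hex` and `matchesAttestation` are hypothetical helper names, not part of writeAttestation.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of the given bytes (or UTF-8 string).
function sha256Hex(bytes: Uint8Array | string): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// True when the recomputed digest equals the attested one (case-insensitive).
function matchesAttestation(bytes: Uint8Array | string, expectedHex: string): boolean {
  return sha256Hex(bytes) === expectedHex.trim().toLowerCase();
}
```

In practice the bytes would be the CSV as downloaded from the cache mirror, read without any re-encoding so the comparison stays byte-exact.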

Phase roadmap

Phase 1 ships ingest + API. Phases 2-5 add inpatient + benchmarking + per-specialty rollups.

  • Phase 1 (this wave): Physician & Other Practitioners by Provider and Service ingest + /api/v1/utilization/[npi] endpoint + per-NPI summary card on /v/[vertical] detail pages + Dataset JSON-LD on /coverage + /data.
  • §sprint3-cms-utilization-inpatient (queued): CMS Inpatient/Outpatient DRG-level utilization.
  • §sprint3-cms-utilization-hospice (queued): CMS Hospice utilization.
  • §sprint3-utilization-specialty-rollups (queued): per-specialty utilization patterns by NUCC taxonomy.
  • §sprint3-utilization-benchmarks (queued): comparative benchmarking — “this provider is 2.3x median for 99214” type insights.

Compliance posture

Methodology · Corrections log · Editorial policy

Sourced from federal agencies. Fonteum, Inc., Delaware C-corp. © 2026.
