Provider utilization · Reference

Per-NPI per-HCPCS service counts + payment data, free + open. The dataset buyers pay $30K-$185K/yr for.

The CMS Medicare Provider Utilization & Payment Data file is annual. Check the loaded snapshot shown by the endpoint rather than inferring freshness from the publisher cadence. Supported responses identify available source and methodology fields; not every field has a complete 14-element provenance tuple.


1. What this dataset is

Per-NPI per-HCPCS Medicare service-count + payment aggregates.

CMS publishes the “Medicare Physician & Other Practitioners by Provider and Service” file once a year, in mid-June. It contains one row per (NPI, HCPCS code, place of service, data year). Each row carries the count of services rendered, the count of distinct Medicare beneficiaries served, and the submitted / allowed / paid / standardized dollar amounts. In the current schema, coverage extends back to data year 2018; the legacy 2013-2017 schema differs and is not ingested in Phase 1.
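To make the row grain concrete, here is a minimal sketch of one row as a Python dataclass. The field names `service_count`, `beneficiary_count`, and `payment_amt` match the API sample later on this page; `submitted_amt`, `allowed_amt`, and `standardized_amt` are hypothetical names chosen for illustration and may not match the stored schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class UtilizationRow:
    # Grain: one row per (NPI, HCPCS code, place of service, data year).
    npi: str
    hcpcs_code: str
    place_of_service: str
    data_year: int
    # Counts are Optional because the schema allows NULL for suppressed cells.
    service_count: Optional[int]
    beneficiary_count: Optional[int]
    # Dollar amounts. Only payment_amt appears in the API sample; the other
    # three names are hypothetical placeholders.
    submitted_amt: Optional[float] = None
    allowed_amt: Optional[float] = None
    payment_amt: Optional[float] = None
    standardized_amt: Optional[float] = None

    def key(self) -> Tuple[str, str, str, int]:
        """Return the composite key that identifies one row."""
        return (self.npi, self.hcpcs_code, self.place_of_service, self.data_year)
```

Two rows with the same `key()` would describe the same aggregate, so the tuple is a natural deduplication key when loading the file.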

Fonteum publishes this public-use CMS dataset with Dataset JSON-LD and the source, observation, methodology, and license fields available for the endpoint. Optional provenance fields can be null.

2. What this dataset is NOT

No PHI. No claims. No patient records.

CMS pre-aggregates this file to provider-level rows before public release. The dataset contains:

NO patient identifiers — no names, no addresses, no dates of birth.
NO claim-level rows — only annual rollups per (NPI, HCPCS, POS).
NO procedure dates — just the data year.
NO cells with beneficiary counts under 11 — CMS pre-suppresses these per its privacy policy. Our schema preserves the suppression by allowing NULL in the count fields.
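Because suppressed cells arrive as NULL, client code should treat them as unknown rather than as zero. The sketch below shows one way to total a count column while reporting how many cells were suppressed. The function name `sum_known` is hypothetical, and the input is assumed to be a list of already-decoded values where NULL has become `None`.

```python
from typing import Iterable, Optional, Tuple


def sum_known(counts: Iterable[Optional[int]]) -> Tuple[int, int]:
    """Return (sum of non-null counts, number of suppressed cells)."""
    total = 0
    suppressed = 0
    for count in counts:
        if count is None:
            # Suppressed by CMS: unknown, not zero.
            suppressed += 1
        else:
            total += count
    return total, suppressed


known_total, suppressed_cells = sum_known([287, None, 54])
print(known_total, suppressed_cells)  # prints "341 1"
```

Returning the suppressed-cell count alongside the total lets a caller flag the figure as incomplete. Note that distinct-beneficiary counts are not additive across HCPCS codes in any case, since one patient can appear under several codes.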

We additionally drop the provider-name and provider-address columns CMS ships with the file: those facts already live in NPPES (the canonical name + address per NPI), and dual-storage would create drift. Joins back to NPPES happen at query time via the federated identity bridge from PR #145.

3. Refresh schedule

Configured probe schedule: 06:00 UTC daily. Loaded freshness is reported separately.

The configured Inngest cron schedule is `0 6 * * *`. A configured probe does not prove a current load or a successful run. Check the endpoint's dated loaded snapshot; a full ingest is intended only when CMS publishes a new annual data year.
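For readers less familiar with cron syntax, the following sketch computes the next time a daily 06:00 UTC schedule would fire. It only illustrates the schedule; it says nothing about whether a probe ran or a snapshot was loaded. The function name `next_probe` is hypothetical and is not part of the ingest code.

```python
from datetime import datetime, time, timedelta, timezone


def next_probe(now: datetime) -> datetime:
    """Next firing of the cron expression `0 6 * * *`, evaluated in UTC.

    `now` should be timezone-aware; a naive datetime would be interpreted
    as local time by astimezone().
    """
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), time(6, 0), tzinfo=timezone.utc)
    if candidate <= now_utc:
        # 06:00 has already passed today, so the next firing is tomorrow.
        candidate += timedelta(days=1)
    return candidate
```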

CMS rotates the bulk-download URL with each annual release. If the configured URL changes, an operator must update the registry pattern and run the ingest. That operational path is not a claim that the newest publisher release is already loaded.

4. How it joins to other sources

NPI is the bridge to NPPES, PECOS, LEIE, HRSA.

Every utilization row carries an NPI. The federated identity layer (/identity) joins NPPES (canonical name + address + taxonomy), PECOS (Medicare enrollment status), LEIE (OIG exclusion status), HRSA HPSA (shortage-area assignments), and now utilization in a single query.

Format guard: the parser validates every NPI at ingest time, checking that it is 10 digits long and that its Luhn check digit is correct. Rows that fail either check are dropped before they reach provider_utilization_summary. This mirrors the Phase-1 PECOS / LEIE validation policy.
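The ingest parser itself is not shown here, but the check it describes can be sketched as follows. This is an illustrative implementation, not the production code: it applies the standard NPI convention of prepending the prefix 80840 before running the Luhn algorithm. The function name `is_valid_npi` is hypothetical.

```python
def is_valid_npi(npi: str) -> bool:
    """Check the 10-digit format and the Luhn check digit of an NPI."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    # The NPI check digit is computed over the prefix 80840 plus the NPI.
    digits = [int(ch) for ch in "80840" + npi]
    total = 0
    # Walk right to left, doubling every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(is_valid_npi("1234567893"))  # prints "True"
print(is_valid_npi("1234567890"))  # prints "False"
```

The sample NPI `1234567893` used in the API example on this page passes this check.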

5. The API

`GET /api/v1/utilization/[npi]`

Returns the top-10 HCPCS codes by service count for the given NPI. The response can include source-level provenance metadata; field availability is endpoint-specific and a response-level tuple is not an individual-fact signature.

{
  "data": {
    "npi": "1234567893",
    "data_year": 2022,
    "top_hcpcs": [
      {
        "hcpcs_code": "99214",
        "hcpcs_description": "Office or other outpatient visit, est patient, lvl 4",
        "place_of_service": "O",
        "service_count": 412,
        "beneficiary_count": 287,
        "payment_amt": 44480,
        "data_year": 2022
      }
    ],
    "provenance": {
      "_source": "CMS Medicare Provider Utilization & Payment Data ...",
      "_dataset_id": "cms-provider-utilization",
      "_snapshot": "2022-12-31",
      "_methodology": "v2026.05.0",
      "_license": "US-Government-Works",
      "_coverage_period_start": "2018-01-01",
      "_coverage_period_end": "ongoing"
    }
  },
  "meta": { "request_id": "req_...", "api_version": "v1", "...": "..." }
}
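A minimal client sketch for this endpoint follows, using only the Python standard library. Several details are assumptions: the host `https://fonteum.com` is taken from the citation URL on this page and may not be the API host, no authentication header is sent, and the helper names are hypothetical. The parsing step tolerates a missing or null `provenance` object, since those fields are optional.

```python
import json
from typing import Any, Dict
from urllib.request import urlopen

BASE_URL = "https://fonteum.com"  # assumption: host taken from the citation URL


def utilization_url(npi: str, base_url: str = BASE_URL) -> str:
    """Build the endpoint URL for one NPI."""
    return f"{base_url}/api/v1/utilization/{npi}"


def fetch_utilization(npi: str) -> Dict[str, Any]:
    """Fetch and decode the JSON response (performs a network request)."""
    with urlopen(utilization_url(npi), timeout=10) as resp:
        return json.load(resp)


def summarize(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Pull out a few fields, tolerating null or absent provenance."""
    data = payload.get("data") or {}
    provenance = data.get("provenance") or {}
    rows = data.get("top_hcpcs") or []
    return {
        "npi": data.get("npi"),
        "snapshot": provenance.get("_snapshot"),  # None if not supplied
        "license": provenance.get("_license"),    # None if not supplied
        "hcpcs_codes": [row.get("hcpcs_code") for row in rows],
    }
```

Calling `summarize(fetch_utilization("1234567893"))` would return a small dictionary. Checking `snapshot` on that result is the way to see which data year is actually loaded.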

6. License + redistribution

US-Government-Works. Anyone can redistribute.

CMS publishes this file as a federal-government work, public domain in the U.S. under 17 U.S.C. §105. When the endpoint supplies _license, it uses US-Government-Works for rows derived from this dataset; callers must still handle a null or absent field.

7. How to cite

APA-ish, with the upstream CMS source named.

Fonteum. (2026). CMS Medicare Provider
Utilization & Payment Data — Physician & Other Practitioners by
Provider and Service [data set]. https://fonteum.com/docs/provider-utilization.
Retrieved [date]. Original source: Centers for Medicare & Medicaid
Services. License: US-Government-Works.

Detailed researcher citation guidance lives at /cite; the researcher-api docs describe the citation TOS for the free tier. Methodology versions are pinned per attestation chain at /chain.

8. Verify the snapshot

SHA-256 attestation + S3 cache mirror.

For a snapshot that has an attestation and retained source bytes, the /verify surface exposes the recorded digest. Coverage and source-byte retention are not universal.
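As a rough illustration of what checking a recorded digest involves, the sketch below hashes source bytes in chunks and compares the result with a recorded value. It assumes the recorded digest is a hex-encoded SHA-256 of the raw source bytes; the exact format exposed by /verify is not specified on this page, and the function names are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Hash an iterable of byte chunks without loading the file at once."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def matches_recorded(chunks: Iterable[bytes], recorded_hex: str) -> bool:
    """Compare the computed digest with the recorded one.

    Assumes the recorded value is lowercase or uppercase hex; surrounding
    whitespace is ignored.
    """
    expected = recorded_hex.strip().lower()
    return hmac.compare_digest(sha256_hex(chunks), expected)
```

For a downloaded file, the chunks could come from reading the file in binary mode in fixed-size blocks, which keeps memory use flat for a large annual file.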

Phase roadmap

Phase 1 ships ingest + API. Phases 2-5 add inpatient + hospice + per-specialty rollups + benchmarking.

Phase 1 (this wave): Physician & Other Practitioners by Provider and Service ingest + /api/v1/utilization/[npi] endpoint + per-NPI summary card on /providers/[npi] + Dataset JSON-LD on /coverage + /data.
§sprint3-cms-utilization-inpatient (queued): CMS Inpatient/Outpatient DRG-level utilization.
§sprint3-cms-utilization-hospice (queued): CMS Hospice utilization.
§sprint3-utilization-specialty-rollups (queued): per-specialty utilization patterns by NUCC taxonomy.
§sprint3-utilization-benchmarks (queued): comparative benchmarking — “this provider is 2.3x median for 99214” type insights.
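The benchmarking item is queued, so its method is not defined yet. Purely as an illustration of the arithmetic behind a statement like “2.3x median”, the sketch below divides one provider's service count by the median of a peer group. The function name and the choice of peer group are assumptions, not a description of the planned feature.

```python
from statistics import median
from typing import Optional, Sequence


def ratio_to_median(provider_count: float, peer_counts: Sequence[float]) -> Optional[float]:
    """Return provider_count divided by the peer median, or None if undefined."""
    if not peer_counts:
        return None
    peer_median = median(peer_counts)
    if peer_median == 0:
        # Avoid dividing by zero when the peer median is zero.
        return None
    return provider_count / peer_median
```

With a provider count of 230 and peer counts of 50, 100, and 150, the median is 100 and the ratio is 2.3.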