Provider utilization · Reference

Per-NPI per-HCPCS service counts + payment data, free + open. The dataset buyers pay $30K-$185K/yr for.

Fonteum ingests the CMS Medicare Provider Utilization & Payment Data (Physician & Other Practitioners by Provider and Service) on the daily HEAD-probe pattern, with full ingestion firing on each annual mid-June release. ~9.5M rows per data year. The 14-tuple provenance contract ships inline with every API response so consumers can verify what they’re looking at without a second round-trip.

1. What this dataset is

Per-NPI per-HCPCS Medicare service-count + payment aggregates.

CMS publishes the “Medicare Physician & Other Practitioners by Provider and Service” file annually, in mid-June. One row per (NPI, HCPCS code, place of service, data year). Each row carries the count of services rendered, the count of distinct Medicare beneficiaries served, and the submitted / allowed / paid / standardized dollar amounts. Coverage extends back to data year 2018 in the current schema (the legacy 2013-2017 schema differs and is not ingested in Phase 1).

For context: this is the dataset that H1, Definitive Healthcare, and Trilliant Health all sell as their flagship product at $30,000-$185,000 per buyer per year. Fonteum publishes it free and open, with full 14-tuple provenance + Dataset JSON-LD discoverability + a free .edu/.gov researcher tier.

2. What this dataset is NOT

No PHI. No claims. No patient records.

CMS pre-aggregates this file to provider-level rows before public release. The dataset contains:

  • NO patient identifiers — no names, no addresses, no dates of birth.
  • NO claim-level rows — only annual rollups per (NPI, HCPCS, POS).
  • NO procedure dates — just the data year.
  • NO cells with beneficiary counts under 11 — CMS pre-suppresses these per its privacy policy. Our schema preserves the suppression by allowing NULL in the count fields.
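Because suppressed cells arrive as NULL rather than zero, a consumer aggregating rows has to keep the two apart. A minimal TypeScript sketch of one way to do that, assuming a hypothetical row shape: the field names mirror the sample API response, but the UtilizationRow interface and the totalServices helper are illustrations, not Fonteum's published schema or code.

```typescript
// Hypothetical row shape; field names follow the sample API response,
// but this interface is an illustration, not a published schema.
interface UtilizationRow {
  npi: string;
  hcpcs_code: string;
  place_of_service: string;
  data_year: number;
  service_count: number | null;      // null = suppressed upstream
  beneficiary_count: number | null;  // null = suppressed upstream
}

interface ServiceTotals {
  totalServices: number;   // sum over rows with a known count
  suppressedRows: number;  // rows whose count was withheld
}

// Sum service counts while tracking suppressed rows separately,
// so a NULL is never silently treated as zero.
function totalServices(rows: UtilizationRow[]): ServiceTotals {
  let total = 0;
  let suppressed = 0;
  for (const row of rows) {
    if (row.service_count === null) {
      suppressed += 1;
    } else {
      total += row.service_count;
    }
  }
  return { totalServices: total, suppressedRows: suppressed };
}
```

Reporting the suppressed-row count next to the total lets a reader see how much of the provider's activity the sum cannot account for.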

We additionally drop the provider-name and provider-address columns CMS ships with the file: those facts already live in NPPES (the canonical name + address per NPI), and dual-storage would create drift. Joins back to NPPES happen at query time via the federated identity bridge from PR #145.

3. Refresh schedule

Daily HEAD probe at 06:00 UTC. Full ingest on annual mid-June release.

The Inngest cron runs daily on the schedule 0 6 * * *. The HEAD probe is cheap and short-circuits via the UNIQUE(source_id, snapshot_date) constraint when nothing has changed. Full ingest fires only when CMS publishes a new data year, typically once a year in mid-June. The next 364 daily firings are no-ops at the database level.

CMS rotates the bulk-download URL with each annual release. On the first 06:00 UTC fire after publication, if the existing URL returns 404, the operator updates the registry entry at src/lib/sources/cron-sources.ts with the new pattern and then manually re-runs the cron via the Inngest dashboard.
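The daily decision described above can be sketched as a pure function. This is an illustration under stated assumptions, not Fonteum's cron code: every name here (ProbeResult, snapshotKey, decideProbeAction) is hypothetical, the in-memory set stands in for the UNIQUE(source_id, snapshot_date) constraint, and how a snapshot date is derived from the HEAD response is left open.

```typescript
// All names are hypothetical; the Set models the database constraint.
type ProbeAction = "noop" | "ingest" | "update_registry_url";

interface ProbeResult {
  status: number;               // HTTP status returned by the HEAD request
  snapshotDate: string | null;  // assumed to be derived from the response; null if unavailable
}

// Builds the same (source_id, snapshot_date) pair the UNIQUE constraint covers.
function snapshotKey(sourceId: string, snapshotDate: string): string {
  return `${sourceId}|${snapshotDate}`;
}

function decideProbeAction(
  sourceId: string,
  probe: ProbeResult,
  ingested: ReadonlySet<string>,
): ProbeAction {
  // A 404 suggests the download URL was rotated: an operator must update the registry.
  if (probe.status === 404) return "update_registry_url";
  // Any other non-success status, or no usable date: do nothing on this run.
  if (probe.status < 200 || probe.status >= 300 || probe.snapshotDate === null) {
    return "noop";
  }
  // Snapshot already recorded: this daily firing is a no-op.
  if (ingested.has(snapshotKey(sourceId, probe.snapshotDate))) return "noop";
  return "ingest";
}
```

Keeping the decision separate from the HTTP call and the database write makes the three outcomes easy to test without network access.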

4. How it joins to other sources

NPI is the bridge to NPPES, PECOS, LEIE, HRSA.

Every utilization row carries an NPI. The federated identity layer (/identity) joins NPPES (canonical name + address + taxonomy), PECOS (Medicare enrollment status), LEIE (OIG exclusion status), HRSA HPSA (shortage-area assignments), and now utilization in a single query.

Format guard: the parser validates every NPI at ingest time, checking that it is exactly 10 digits and that its Luhn check digit is correct. Rows that fail are dropped before they reach provider_utilization_summary. This mirrors the Phase-1 PECOS / LEIE validation policy.
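For readers who want to apply the same guard client-side, here is a minimal sketch of the standard NPI check: the Luhn algorithm run over the 10 digits with the card-issuer prefix 80840 prepended. The function name isValidNpi is hypothetical, and whether Fonteum's parser is written this way is an assumption.

```typescript
// Standard NPI check digit: Luhn over "80840" + the 10-digit NPI.
// isValidNpi is an illustrative name, not a Fonteum export.
function isValidNpi(npi: string): boolean {
  // Format check first: exactly 10 ASCII digits.
  if (!/^\d{10}$/.test(npi)) return false;

  const digits = "80840" + npi;
  let sum = 0;
  let shouldDouble = false; // the rightmost (check) digit is not doubled

  // Walk right to left, doubling every second digit.
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (shouldDouble) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of the product
    }
    sum += d;
    shouldDouble = !shouldDouble;
  }
  return sum % 10 === 0;
}
```

The NPI used in the sample API response, 1234567893, passes this check; changing its last digit makes it fail.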

5. The API

GET /api/v1/utilization/[npi]

Returns the top-10 HCPCS codes by service count for the given NPI, with the full 14-tuple provenance contract attached inline. Auth flows through the standard withApi handler — bearer token, rate limit, tier resolution. The free .edu/.gov researcher tier gets the same envelope as the paid tiers.

{
  "data": {
    "npi": "1234567893",
    "data_year": 2022,
    "top_hcpcs": [
      {
        "hcpcs_code": "99214",
        "hcpcs_description": "Office or other outpatient visit, est patient, lvl 4",
        "place_of_service": "O",
        "service_count": 412,
        "beneficiary_count": 287,
        "payment_amt": 44480,
        "data_year": 2022
      }
    ],
    "provenance": {
      "_source": "CMS Medicare Provider Utilization & Payment Data ...",
      "_dataset_id": "cms-provider-utilization",
      "_snapshot": "2022-12-31",
      "_methodology": "v2026.05.0",
      "_license": "US-Government-Works",
      "_coverage_period_start": "2018-01-01",
      "_coverage_period_end": "ongoing"
    }
  },
  "meta": { "request_id": "req_...", "api_version": "v1", "...": "..." }
}
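
A minimal client-side sketch for the endpoint above, written as two pure helpers so it can be read without a network call. The helper names, the base URL argument, and the `Authorization: Bearer` header format are assumptions; the envelope type covers only the fields visible in the sample response.

```typescript
// Hypothetical helpers; base URL and token handling are assumptions.
interface UtilizationRequest {
  url: string;
  headers: Record<string, string>;
}

function buildUtilizationRequest(
  baseUrl: string,
  npi: string,
  token: string,
): UtilizationRequest {
  if (!/^\d{10}$/.test(npi)) {
    throw new Error(`NPI must be 10 digits, got "${npi}"`);
  }
  // Strip trailing slashes so the path joins cleanly.
  const url = `${baseUrl.replace(/\/+$/, "")}/api/v1/utilization/${npi}`;
  return {
    url,
    headers: { Authorization: `Bearer ${token}`, Accept: "application/json" },
  };
}

// Partial view of the envelope shown above; only the fields used here are typed.
interface UtilizationEnvelope {
  data: {
    npi: string;
    data_year: number;
    top_hcpcs: { hcpcs_code: string; service_count: number | null }[];
    provenance: Record<string, string>;
  };
}

// Returns the HCPCS code with the highest known service count, or null.
function busiestCode(envelope: UtilizationEnvelope): string | null {
  let best: { code: string; count: number } | null = null;
  for (const item of envelope.data.top_hcpcs) {
    if (item.service_count === null) continue; // suppressed count
    if (best === null || item.service_count > best.count) {
      best = { code: item.hcpcs_code, count: item.service_count };
    }
  }
  return best === null ? null : best.code;
}
```

The result of buildUtilizationRequest can be passed to the standard fetch API as `fetch(req.url, { headers: req.headers })`, and the parsed JSON body handed to busiestCode.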

6. License + redistribution

US-Government-Works. Anyone can redistribute.

CMS publishes this file as a federal-government work, public domain in the U.S. under 17 U.S.C. §105 and the OPEN Government Data Act. The license identifier US-Government-Works is what Fonteum surfaces in the provenance contract’s _license field for every row derived from this dataset. Anyone (researcher, journalist, competing buyer-tool) can re-use the data with no restriction other than attribution courtesy.

7. How to cite

APA-ish, with the upstream CMS source named.

Fonteum. (2026). CMS Medicare Provider
Utilization & Payment Data — Physician & Other Practitioners by
Provider and Service [data set]. https://fonteum.com/docs/provider-utilization.
Retrieved [date]. Original source: Centers for Medicare & Medicaid
Services. License: US-Government-Works.

Detailed researcher citation guidance lives at /cite; the researcher-api docs describe the citation TOS for the free tier. Methodology versions are pinned per attestation chain at /chain.

8. Verify the snapshot

SHA-256 attestation + S3 cache mirror.

Every snapshot lands with a SHA-256 attestation written by writeAttestation (PR #135). When the source-cache mirror (PR #154) is provisioned, every snapshot also mirrors to S3 — verifiers can re-download the original CSV from the cache and recompute the hash to confirm byte-exact provenance. Use /verify to walk the chain for any snapshot.
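The recompute step can be sketched in a few lines of TypeScript using Node's built-in crypto module. This assumes the attestation is a hex-encoded SHA-256 digest of the raw CSV bytes; the function names are hypothetical and the code is not Fonteum's writeAttestation.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of the given bytes.
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// True when the recomputed digest matches the attested one.
// Assumes the attested value is hex; comparison ignores case and whitespace.
function matchesAttestation(bytes: Uint8Array, attestedHex: string): boolean {
  return sha256Hex(bytes) === attestedHex.trim().toLowerCase();
}
```

In practice the bytes would come from the re-downloaded CSV (for example via `fs.readFileSync(path)`), and a mismatch means the file is not byte-identical to the attested snapshot.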

Phase roadmap

Phase 1 ships ingest + API. Phases 2-5 add inpatient + benchmarking + per-specialty rollups.

  • Phase 1 (this wave): Physician & Other Practitioners by Provider and Service ingest + /api/v1/utilization/[npi] endpoint + per-NPI summary card on /v/[vertical] detail pages + Dataset JSON-LD on /coverage + /data.
  • §sprint3-cms-utilization-inpatient (queued): CMS Inpatient/Outpatient DRG-level utilization.
  • §sprint3-cms-utilization-hospice (queued): CMS Hospice utilization.
  • §sprint3-utilization-specialty-rollups (queued): per-specialty utilization patterns by NUCC taxonomy.
  • §sprint3-utilization-benchmarks (queued): comparative benchmarking — “this provider is 2.3x median for 99214” type insights.

Built on the authoritative federal record

The primary sources, named on every page.

These are the federal agencies whose public datasets Fonteum ingests and attributes — the issuing authorities, not customers or partners. Every figure on the site links back to one of them.

  • CMS
  • HHS-OIG
  • HRSA
  • FDA
  • NLM
  • NUCC
  • Census
  • BLS
  • BEA

See the full source registry, with license and refresh cadence for each →

Reproducible by design

Every figure traces to its federal source.

14-tuple provenance

Every rendered fact ties to a source URL, dataset ID, snapshot date, row key, and SHA-256 — the full chain-of-custody record.

Reproducible SQL

Each study ships the exact query behind its figures, run against the cited federal snapshot. Re-run it yourself.

Daily reconciliation

Published counts are reconciled against the upstream federal datasets on a daily cadence, with drift logged.

Named medical review

Reviewed by Jennifer Montecillo, MD, non-practicing medical reviewer.

Read the full provenance and attestation methodology →

Two doors

Use the free API and open data

Query providers, facilities, sanctions, and quality scores — each field carrying its federal source. Self-serve, no call to start.

Talk to us

Managed pilots, enterprise terms, and audit-ready, signed attestation packages for compliance, risk, and research teams.


© 2026 Fonteum LLC. All rights reserved.

The U.S. healthcare graph AI can cite — every fact carries its source.

The substrate, by the numbers

  • 9.2M graph entities: Providers, organizations, owners, and facilities
  • 13.3M linked identifiers: NPIs, CCNs, LEIs and more, resolved to entities
  • 4.7M graph edges: Source-attested relationships between entities
  • 44 federal source families: Distinct CMS, OIG, HRSA, FDA and peer datasets
  • 35 dataset pages: Citable, downloadable /data catalog pages
  • 52 reproducible studies: Each shipping the SQL behind its figures