Per-CCN per-MS-DRG inpatient utilization, free + open. The dataset Definitive sells as Hospital Performance.
Fonteum ingests the CMS Medicare Inpatient Hospitals by Provider and Service file on the daily HEAD-probe pattern, with full ingestion firing on each annual mid-June release. ~60M rows per data year. The 14-tuple provenance contract ships inline with every API response so consumers can verify what they’re looking at without a second round-trip.
Per-CCN per-MS-DRG Medicare inpatient discharge + payment + length-of-stay aggregates.
CMS publishes the “Medicare Inpatient Hospitals by Provider and Service” file once a year, in mid-June. It has one row per (CCN, MS-DRG code, data year). Each row carries the count of Medicare fee-for-service discharges, the average covered charges, the average total and Medicare payment amounts, and the average length of stay. Coverage extends back to data year 2018 in the current schema (the legacy 2013-2017 schema is different and is not ingested in Phase 2).
For context: this is the dataset Definitive Healthcare sells as their flagship “Hospital Performance” module at $45,000-$95,000 per buyer per year (depending on facility count). Fonteum publishes it free, open, with full 14-tuple provenance + Dataset JSON-LD discoverability + free .edu/.gov researcher tier.
No PHI. No claims. No patient records.
CMS pre-aggregates this file to facility-level rows before public release. The dataset contains:
- NO patient identifiers — no names, no addresses, no dates of birth.
- NO claim-level rows — only annual rollups per (CCN, MS-DRG).
- NO discharge dates — just the data year.
- NO cells with discharge counts under 11 — CMS pre-suppresses these per its privacy policy. Our schema preserves the suppression by allowing NULL in the count fields.
We additionally drop the provider-name and provider-address columns CMS ships with the file: those facts already live in the CMS Provider of Services (POS) file (canonical name + address per CCN), and dual-storage would create drift. Joins back to POS happen at query time via the federated identity bridge.
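The suppression and column-drop rules above can be sketched as a small row normalizer. This is a minimal illustration, not Fonteum's ingest code: the column names (`provider_name`, `provider_address`, `total_discharges`) are hypothetical, and it assumes a suppressed count arrives as an empty cell.

```python
from typing import Optional

# Hypothetical column names, chosen for illustration only.
DROPPED_COLUMNS = {"provider_name", "provider_address"}
COUNT_FIELDS = {"total_discharges"}


def parse_count(raw: Optional[str]) -> Optional[int]:
    """Return the count as an int, or None when the cell is blank (suppressed)."""
    if raw is None or raw.strip() == "":
        return None
    return int(raw)


def normalize_row(row: dict) -> dict:
    """Drop name/address columns and keep suppressed counts as None (SQL NULL)."""
    out = {}
    for key, value in row.items():
        if key in DROPPED_COLUMNS:
            continue  # these facts live in the POS file instead
        out[key] = parse_count(value) if key in COUNT_FIELDS else value
    return out
```

Keeping the suppressed value as `None` rather than `0` matters: a zero would claim a fact about the facility that CMS deliberately withheld.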
Daily HEAD probe at 06:00 UTC. Full ingest on annual mid-June release.
The Inngest cron runs daily on the schedule 0 6 * * *. The HEAD probe is cheap and short-circuits via the UNIQUE(source_id, snapshot_date) constraint when nothing has changed. Full ingest only fires when CMS publishes a new data year — typically once a year mid-June.
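The short-circuit can be illustrated with an in-memory SQLite table. This is a sketch under stated assumptions: the table name `snapshots` and the helper names are hypothetical, and the snapshot date is passed in rather than derived from the HEAD response, since the source does not say how it is computed.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory table with the UNIQUE(source_id, snapshot_date) constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snapshots ("  # hypothetical table name
        " source_id TEXT NOT NULL,"
        " snapshot_date TEXT NOT NULL,"
        " UNIQUE(source_id, snapshot_date))"
    )
    return conn


def record_snapshot(conn: sqlite3.Connection, source_id: str, snapshot_date: str) -> bool:
    """Return True when the snapshot is new, i.e. a full ingest should run."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO snapshots (source_id, snapshot_date) VALUES (?, ?)",
        (source_id, snapshot_date),
    )
    # rowcount is 0 when the UNIQUE constraint caused the insert to be ignored.
    return cur.rowcount == 1
```

With this shape, the daily run is idempotent: repeating a probe for an already-recorded snapshot returns `False` and nothing downstream fires.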
CCN is the bridge to POS, Care Compare, ownership chains.
Every utilization row carries a CMS Certification Number (CCN). The federated identity layer (/identity) joins the CMS Provider of Services (POS) file (canonical facility name + address + type), Care Compare quality ratings, NH ownership chains, and now utilization in a single query.
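A query-time join on CCN might look like the following sketch. Only `inpatient_utilization_summary` is a name taken from this page; the `pos_facilities` table, its columns, and the helper functions are hypothetical stand-ins for the identity layer, and SQLite is used purely so the example is self-contained.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create two minimal tables that share a CCN key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE pos_facilities (            -- hypothetical POS table
            ccn TEXT PRIMARY KEY,
            facility_name TEXT NOT NULL
        );
        CREATE TABLE inpatient_utilization_summary (
            ccn TEXT NOT NULL,
            ms_drg_code TEXT NOT NULL,
            data_year INTEGER NOT NULL,
            total_discharges INTEGER             -- NULL when suppressed
        );
        """
    )
    return conn


def utilization_with_facility(conn: sqlite3.Connection, ccn: str) -> list:
    """Join utilization rows to the canonical facility name for one CCN."""
    query = """
        SELECT p.facility_name, u.ms_drg_code, u.data_year, u.total_discharges
        FROM inpatient_utilization_summary AS u
        JOIN pos_facilities AS p ON p.ccn = u.ccn
        WHERE u.ccn = ?
        ORDER BY u.ms_drg_code
    """
    return conn.execute(query, (ccn,)).fetchall()
```

Because the facility name is read from the POS side at query time, a renamed facility shows up correctly without re-ingesting utilization rows.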
Format guard: every CCN is validated against the ^[A-Z0-9]{6}$ pattern at ingest time. Rows failing the check are dropped before they reach inpatient_utilization_summary.
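A minimal sketch of that guard, assuming rows are dictionaries with a `ccn` key (the function names are illustrative, not Fonteum's):

```python
import re

# Same pattern as above; fullmatch() supplies the ^...$ anchoring and,
# unlike "$" with match(), does not tolerate a trailing newline.
CCN_PATTERN = re.compile(r"[A-Z0-9]{6}")


def is_valid_ccn(value: object) -> bool:
    """True only for a string of exactly six uppercase letters or digits."""
    return isinstance(value, str) and CCN_PATTERN.fullmatch(value) is not None


def filter_rows(rows: list) -> list:
    """Keep only rows whose CCN passes the format guard."""
    return [row for row in rows if is_valid_ccn(row.get("ccn"))]
```

Note that the guard is strict: a lowercase or whitespace-padded CCN is rejected rather than normalized, which matches the "dropped" behavior described above.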
GET /api/v1/utilization/inpatient/[ccn]
Returns the top-10 MS-DRGs by discharge count for the given CCN, with the full 14-tuple provenance contract attached inline. Auth flows through the standard withApi handler — bearer token, rate limit, tier resolution. The free .edu/.gov researcher tier gets the same envelope as the paid tiers.
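The ranking step can be sketched as follows. This is an illustration only: it assumes rows with a suppressed (NULL) discharge count are left out of the ranking, which the source does not specify, and the function name is hypothetical.

```python
def top_ms_drgs(rows: list, limit: int = 10) -> list:
    """Return up to `limit` rows ordered by total_discharges, highest first."""
    # Assumption: suppressed counts (None) cannot be ranked, so they are skipped.
    counted = [r for r in rows if r.get("total_discharges") is not None]
    # sorted() is stable, so rows with equal counts keep their input order.
    return sorted(counted, key=lambda r: -r["total_discharges"])[:limit]
```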
{
  "data": {
    "ccn": "010001",
    "data_year": 2022,
    "top_ms_drgs": [
      {
        "ms_drg_code": "470",
        "ms_drg_description": "MAJOR JOINT REPLACEMENT OR REATTACHMENT...",
        "total_discharges": 342,
        "avg_covered_charges": 65000,
        "avg_total_payments": 13800,
        "avg_medicare_payments": 12100,
        "avg_length_of_stay": 2.5,
        "data_year": 2022
      }
    ],
    "provenance": {
      "_source": "CMS Medicare Inpatient Hospitals by Provider and Service",
      "_dataset_id": "cms-inpatient-utilization",
      "_snapshot": "2022-12-31",
      "_methodology": "v2026.05.0",
      "_license": "US-Government-Works",
      "_coverage_period_start": "2018-01-01",
      "_coverage_period_end": "ongoing"
    }
  },
  "meta": { "request_id": "req_...", "api_version": "v1", "...": "..." }
}
US-Government-Works. Anyone can redistribute.
CMS publishes this file as a federal-government work, public domain in the U.S. under 17 U.S.C. §105 and the OPEN Government Data Act. The license identifier US-Government-Works is what Fonteum surfaces in the provenance contract’s _license field for every row derived from this dataset.
APA-ish, with the upstream CMS source named.
Fonteum. (2026). CMS Medicare Inpatient
Hospitals by Provider and Service [data set]. https://fonteum.com/docs/utilization-inpatient.
Retrieved [date]. Original source: Centers for Medicare & Medicaid
Services. License: US-Government-Works.
Detailed researcher citation guidance lives at /cite; the researcher-api docs describe the citation TOS for the free tier.
SHA-256 attestation + S3 cache mirror.
Every snapshot lands with a SHA-256 attestation written by writeAttestation (PR #135). When the source-cache mirror (PR #154) is provisioned, every snapshot also mirrors to S3 — verifiers can re-download the original CSV from the cache and recompute the hash to confirm byte-exact provenance. Use /verify to walk the chain for any snapshot.
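Recomputing the hash on a re-downloaded file could look like the sketch below. It assumes the attestation exposes the digest as a hex string; the function names are illustrative and are not the `writeAttestation` or /verify interfaces.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large CSV never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_attestation(path: str, attested_hex: str) -> bool:
    """Compare the recomputed digest with the attested one (assumed hex-encoded)."""
    expected = attested_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

A match confirms the cached bytes are identical to the ones that were hashed at ingest; any single-byte difference in the CSV yields a different digest.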