FOR · ACADEMIC RESEARCHERS
Cite-able federal data, prepared.
Analysis-ready CMS and HHS-OIG datasets cross-joined on CCN and NPI. Methodology documented per version. Public-use, no IRB review required.
Federal upstream data: U.S. Government Works. Fonteum compilation: CC BY 4.0.
Federal datasets
CMS, HHS-OIG, and HHS sub-agencies. All public-domain upstream.
Total rows
Across all 13 datasets. Cross-joinable on CCN and NPI.
IRB requirements
Federal administrative data is public-use. No patient records. No de-identification required.
What's cite-able
13 federal datasets with upstream source URLs.
Every dataset row carries the federal source URL at the time of ingest. The table below lists the upstream origin for each dataset — the URL a peer reviewer can independently retrieve. Fonteum's compilation layer adds cross-source joins, field typing, and methodology versioning; it does not alter or clean the source values.
| Dataset | Federal source | Rows | Grain | Join key | License |
|---|---|---|---|---|---|
| CMS Provider of Services (POS) iQIES | data.cms.gov ↗ | 68,211 | Per certified facility | CCN | U.S. Government Works |
| CMS Care Compare — Home Health | data.cms.gov ↗ | 12,392 | Per CCN-keyed agency | CCN | U.S. Government Works |
| CMS Care Compare — Hospice | data.cms.gov ↗ | 6,943 | Per CCN-keyed facility | CCN | U.S. Government Works |
| CMS Care Compare — Nursing Home Penalties | data.cms.gov ↗ | 16,832 | Per enforcement action | CCN + Survey Event ID | U.S. Government Works |
| CMS NH Health Deficiencies | data.cms.gov ↗ | 418,148 | Per citation | CCN + Survey Event ID | U.S. Government Works |
| OIG LEIE Exclusions | oig.hhs.gov ↗ | 68,055 | Per excluded individual / entity | NPI (joined) | U.S. Government Works |
| CMS PECOS PPEF (Medical Enrichment) | data.cms.gov ↗ | Varies | Per enrolled provider | NPI | U.S. Government Works |
| CMS QPP MIPS Individual Scores | qpp.cms.gov ↗ | 477,137 | Per clinician, per performance year | NPI | U.S. Government Works |
| HCRIS Hospital Cost Reports | www.cms.gov ↗ | 6,102 | Per facility, per cost report period | CCN | U.S. Government Works |
| CMS Open Payments | openpaymentsdata.cms.gov ↗ | Varies | Per payment record | NPI | U.S. Government Works |
| Federally Qualified Health Centers (HHS annual utilization data) | data.hrsa.gov ↗ | ~9,000 | Per FQHC site | FQHC ID (NPI joinable) | U.S. Government Works |
| NSA IDR + MRF Compliance | www.cms.gov ↗ | Derived | Per entity | NPI / Tax ID | U.S. Government Works (upstream); CC BY 4.0 (Fonteum scoring layer) |
| CMS NSA Surprise Billing IDR Filings | www.cms.gov ↗ | Derived | Per filing, per initiating party | NPI / Tax ID | U.S. Government Works |
Methods-section boilerplate
Drop-in paragraph for journal submission.
Replace the bracketed tokens with the specific dataset, federal source name, methodology version string (e.g. snf-owners/v1), and snapshot date. The methodology version is pinned at export time and retrievable from /methodology indefinitely.
Data for this analysis were obtained from Fonteum (fonteum.com), a federally sourced healthcare data infrastructure layer. The dataset used, [DATASET], is derived from [FEDERAL SOURCE], methodology version [VERSION], snapshot date [DATE]. The upstream federal data are public-domain (U.S. Government Works); the Fonteum compilation is available under CC BY 4.0. Cross-source joins are performed on CMS Certification Number (CCN) or National Provider Identifier (NPI) as documented in the methodology version cited above. No patient-level data are included. No IRB review was required for this analysis.
The methodology version in your data export matches the version page at fonteum.com/methodology/[dataset]. That page is durable — the same URL is retrievable after publication so peer reviewers and journal editors can independently verify the methods.
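The token replacement described above can be scripted so that a submission never ships with an unfilled bracket. The following is a minimal sketch, not a Fonteum tool: the helper name `fill_methods`, the shortened template, and the example values (including the version string and snapshot date) are all hypothetical.

```python
import re

# Matches bracketed tokens such as [DATASET] or [FEDERAL SOURCE].
TOKEN = re.compile(r"\[([A-Z ]+)\]")

# Shortened excerpt of the boilerplate paragraph, for illustration only.
TEMPLATE = (
    "The dataset used, [DATASET], is derived from [FEDERAL SOURCE], "
    "methodology version [VERSION], snapshot date [DATE]."
)


def fill_methods(template: str, values: dict[str, str]) -> str:
    """Replace every bracketed token, failing loudly if any value is missing."""
    missing = sorted(set(TOKEN.findall(template)) - set(values))
    if missing:
        raise KeyError(f"no value supplied for tokens: {missing}")
    return TOKEN.sub(lambda match: values[match.group(1)], template)


# Hypothetical values; take the real ones from your own export.
paragraph = fill_methods(
    TEMPLATE,
    {
        "DATASET": "CMS Provider of Services (POS) iQIES",
        "FEDERAL SOURCE": "data.cms.gov",
        "VERSION": "example/v1",
        "DATE": "2026-01-15",
    },
)
print(paragraph)
```

Raising on a missing token, rather than leaving it in place, is a deliberate choice here: a stray `[VERSION]` in a published methods section defeats the point of pinning the version.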
Reproducibility
Stata, R, and Python codebooks on request.
Statistical codebooks are available on request as part of pilot access. Each codebook includes variable descriptions, dtype contracts, and worked join examples that replicate the cross-source joins documented in the methodology.
Stata
Value labels, variable descriptions, and import scripts for all 13 datasets.
R
Tidyverse-compatible tibble import, column typing, and join vignettes.
Python
pandas / polars import scripts with dtype contracts and CCN ↔ NPI join examples.
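As an illustration of what a CCN join with a dtype contract might look like in pandas, here is a small sketch. It is not taken from the Fonteum codebooks: the column names (`ccn`, `facility_name`, `survey_event_id`, `tag`) and the two inline extracts are hypothetical. The one firm point is that CCNs should be read as strings, because they can begin with a zero and a numeric dtype would silently drop it.

```python
import io

import pandas as pd

# Hypothetical extracts; real exports will have different columns and values.
pos_csv = "ccn,facility_name\n015009,Example Nursing Center\n105001,Sample Care Home\n"
deficiency_csv = "ccn,survey_event_id,tag\n015009,EV1,F0689\n015009,EV2,F0880\n"

# Read identifiers as strings so leading zeros in the CCN survive import.
pos = pd.read_csv(io.StringIO(pos_csv), dtype={"ccn": "string"})
deficiencies = pd.read_csv(
    io.StringIO(deficiency_csv),
    dtype={"ccn": "string", "survey_event_id": "string", "tag": "string"},
)

# Left join keeps every facility, including those with no citations.
# validate= raises MergeError if the facility table has duplicate CCNs.
joined = pos.merge(
    deficiencies, on="ccn", how="left", validate="one_to_many", indicator=True
)

# count() ignores missing values, so facilities without citations get 0.
citations_per_facility = joined.groupby("ccn")["survey_event_id"].count()
print(citations_per_facility)
```

The `indicator=True` column (`_merge`) records which rows matched, which is useful when reporting join coverage in a methods section.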
Why this works as a reproducibility reference
The methodology page is the audit artifact.
Every Fonteum dataset ships with a public methodology page at /methodology/[dataset]. The page documents the source family and Tier classification, ingest cadence, field schema with per-field confidence levels, join logic, known limitations, and version history with change rationale.
A peer reviewer who questions your methods gets a URL, not a vendor statement. The methodology version in the URL matches the version in your data export — and it does not change after you publish.
Browse all methodology pages →
Pre-publication data dictionary
Every field documented before you commit to a methodology.
Pilot access includes the full pre-publication data dictionary: field names, types, null rates, known edge cases, and the CMS/HHS source column they map to. The dictionary is delivered as a machine-readable JSON alongside the CSV export so your analysis scripts can validate dtypes at import time without manual inspection.
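Import-time validation against the dictionary could be sketched as follows. The JSON layout used here (a `fields` list of objects with `name` and `dtype` keys) is an assumption made for illustration; the actual dictionary format is defined by the pilot deliverable, and `load_with_contract` is a hypothetical helper.

```python
import io
import json

import pandas as pd

# Hypothetical dictionary layout: a "fields" list of {"name", "dtype"} objects.
DICTIONARY_JSON = """
{"fields": [
  {"name": "ccn", "dtype": "string"},
  {"name": "fine_amount", "dtype": "Float64"}
]}
"""
# Hypothetical export with one missing fine amount.
EXPORT_CSV = "ccn,fine_amount\n015009,6500.0\n105001,\n"


def load_with_contract(csv_text: str, dictionary_text: str) -> pd.DataFrame:
    """Load a CSV export, enforcing the column set and dtypes from the dictionary."""
    fields = json.loads(dictionary_text)["fields"]
    contract = {field["name"]: field["dtype"] for field in fields}

    # Read only the header first so a schema drift fails before the full load.
    header = pd.read_csv(io.StringIO(csv_text), nrows=0).columns
    mismatch = set(header) ^ set(contract)
    if mismatch:
        raise ValueError(f"columns differ from the dictionary: {sorted(mismatch)}")

    return pd.read_csv(io.StringIO(csv_text), dtype=contract)


frame = load_with_contract(EXPORT_CSV, DICTIONARY_JSON)
print(frame.dtypes)
```

Using pandas nullable dtypes such as `Float64` keeps missing values as `<NA>` without forcing the column to a different type.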
Field-level null rates
Per field, per dataset snapshot. Null rate changes flagged across versions.
Source column mapping
Every Fonteum field traces to the originating federal column name and file.
Known edge cases
CMS suppression sentinels ("*", "DS"), partial-year cost reports, facility closures mid-year.
Version diff
Field additions, removals, and type changes are documented between methodology versions.
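The suppression sentinels listed under known edge cases are a common source of import errors, since a single "*" turns a numeric column into text. One way to handle them, sketched below under assumed column names (`ccn`, `measure_score`) and with a hypothetical helper `split_suppressed`, is to keep a flag for suppressed cells and convert the rest to numbers.

```python
import io

import pandas as pd

# Sentinels named in the data dictionary's edge-case notes.
SUPPRESSION_SENTINELS = ["*", "DS"]

# Hypothetical extract with two suppressed cells.
csv_text = "ccn,measure_score\n015009,87.5\n105001,*\n235002,DS\n"


def split_suppressed(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy with `column` numeric and a boolean `<column>_suppressed` flag."""
    out = frame.copy()
    suppressed = out[column].isin(SUPPRESSION_SENTINELS)
    out[f"{column}_suppressed"] = suppressed
    # Blank out sentinels, then convert; any other non-numeric text still raises.
    out[column] = pd.to_numeric(out[column].where(~suppressed))
    return out


# Read everything as text first so sentinels are not coerced or dropped silently.
raw = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)
clean = split_suppressed(raw, "measure_score")
print(clean)
```

Keeping the flag column separates "suppressed by CMS" from "missing for another reason", a distinction that passing the sentinels to `na_values` at read time would erase.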
How to cite Fonteum
Four citation formats. One per data export.
Every data export from the Fonteum API includes a four-format citation block in the response envelope: APA, Chicago, plain text, and BibTeX. The citation pins the methodology version and snapshot date so the reference is reproducible regardless of when a reader retrieves it.
// APA
Fonteum, Inc. (2026). [Dataset name], methodology
version [VERSION], snapshot [DATE].
Fonteum. https://fonteum.com/methodology/[dataset]
// BibTeX
@dataset{fonteum_[dataset]_[year],
  author = {{Fonteum, Inc.}},
  title = {[Dataset name]},
  year = {[year]},
  version = {[VERSION]},
  publisher = {Fonteum},
  url = {https://fonteum.com/methodology/[dataset]},
  note = {Snapshot date: [DATE]. CC BY 4.0.}
}
The citation block in your API response is generated from the pinned methodology version, so it does not change when a new methodology version is released. Your published citation remains valid.
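If you need to regenerate the BibTeX entry locally, for example to check it against the one in your export, the template above can be filled with a few lines of Python. This is a sketch only: `bibtex_entry` is a hypothetical helper, not part of the Fonteum API, and the example slug, version, and snapshot date are made up.

```python
def bibtex_entry(
    dataset_slug: str, dataset_name: str, year: int, version: str, snapshot_date: str
) -> str:
    """Fill the @dataset template with the pinned version and snapshot date."""
    fields = {
        # Extra braces keep BibTeX from parsing the corporate name as "Last, First".
        "author": "{Fonteum, Inc.}",
        "title": dataset_name,
        "year": str(year),
        "version": version,
        "publisher": "Fonteum",
        "url": f"https://fonteum.com/methodology/{dataset_slug}",
        "note": f"Snapshot date: {snapshot_date}. CC BY 4.0.",
    }
    body = ",\n".join(f"  {key} = {{{value}}}" for key, value in fields.items())
    return f"@dataset{{fonteum_{dataset_slug}_{year},\n{body}\n}}"


# Hypothetical values; use the version and date from your own export.
entry = bibtex_entry(
    dataset_slug="pos",
    dataset_name="CMS Provider of Services (POS) iQIES",
    year=2026,
    version="example/v1",
    snapshot_date="2026-01-15",
)
print(entry)
```

Note that `@dataset` is a biblatex entry type; classic BibTeX styles typically fall back to `@misc` handling for it.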
Data access
Request data access for your study.
Describe the study scope (datasets, analysis period, research question), and we will send a scoped data access agreement within 2 business days. Academic research requests receive a reduced pilot rate. No procurement loop is required for single-PI studies.