FONTEUM · USE CASE · RESEARCHERS

Reproducible federal datasets, free for research.

A free researcher API for .edu and .gov, modeled on WRDS, ResDAC, and HCUP — with methodology, a limitations block, and a source-to-snapshot citation chain on every dataset.

Register for the researcher API →
Research access stack

Datasets · researcher API · citation chain

  • No account

    Research dataset corpus

    Free CSV + JSON

    Every published study at /research is freely downloadable as CSV and JSON with no account. One example is the Nursing Home Deficiency & Harm Rate study, with 418,148 citation records across facilities (source: https://data.cms.gov/provider-data/dataset/r9s8-i3pj · dataset: cms-care-compare-nh-deficiencies/v1 · snapshot: 2026-05-01). Each ships with methodology, an explicit limitations block, and a Schema.org Dataset JSON-LD record.

    Explore →

  • Citation, not a license fee

    Researcher API

    Free for .edu / .gov

    Programmatic access to research snapshots and the provider graph, free for academic and government institutions at /signup/researcher. This is the same access model as Wharton WRDS, CMS ResDAC, and AHRQ HCUP, where the price is a standard citation rather than a license fee. Includes semantic search over 6.8M+ active NPI embeddings (source: https://npiregistry.cms.hhs.gov/ · dataset: nppes/v1 · snapshot: 2026-05-01).

    Explore →

  • Source → snapshot → method

    Citation chain

    Reproducible by design

    Each result resolves source → dataset identifier → snapshot date → methodology version, so a finding can be re-derived from the public federal file at any later date. The corpus draws on 44 federal source families, each documented at /sources with tier, cadence, and redistribution posture.

    Explore →
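
The chain described in the card above can be pictured as a small immutable record. The sketch below is an illustration only: the class name `CitationChain`, its field names, and the methodology version value `"v1"` are assumptions, not Fonteum's schema. The source URL, dataset identifier, and snapshot date are the ones shown on this page.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CitationChain:
    """Hypothetical record: source -> dataset identifier -> snapshot date -> methodology version."""

    source_url: str
    dataset_id: str
    snapshot: date
    methodology_version: str  # assumed format; the page does not show one

    def as_arrow_string(self) -> str:
        # Render the chain in the same left-to-right order the page uses.
        parts = [
            self.source_url,
            self.dataset_id,
            self.snapshot.isoformat(),
            self.methodology_version,
        ]
        return " -> ".join(parts)


chain = CitationChain(
    source_url="https://data.cms.gov/provider-data/dataset/r9s8-i3pj",
    dataset_id="cms-care-compare-nh-deficiencies/v1",
    snapshot=date(2026, 5, 1),
    methodology_version="v1",  # placeholder value, not taken from the page
)
print(chain.as_arrow_string())
```

Freezing the dataclass mirrors the idea that a pinned chain should not change after publication; a refreshed federal file would get a new record rather than an edited one.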

Why researchers choose federal primary sources

Free, citable, reproducible — to the snapshot

Free access, the WRDS / ResDAC / HCUP way

Academic and government teams get free programmatic access via the researcher tier at /signup/researcher. This is the same model as Wharton WRDS, CMS ResDAC, and AHRQ HCUP, where the price is a standard citation rather than a license fee. It exposes research snapshots, the provider graph, and semantic search across 6.8M+ active NPI embeddings (source: https://npiregistry.cms.hhs.gov/ · dataset: nppes/v1 · snapshot: 2026-05-01). Static study files at /research need no account at all.

Reproducible to a pinned snapshot

Every study resolves a citation chain (source → dataset identifier → snapshot date → methodology version) and pins the method to a snapshot, so a figure stays reproducible against a fixed point in time even after the federal file refreshes. Because the data is public federal record under 17 U.S.C. § 105, any result can be re-derived from the original government file. The Fonteum-authored layer (source: https://fonteum.com/research · dataset: fonteum-research/v1 · snapshot: 2026-05-27) is CC-BY-4.0; the underlying records stay public domain.

Methodology and limitations, in the open

Each dataset ships with its methodology written out, an explicit limitations block describing what the source does and does not support, and a Schema.org Dataset JSON-LD record. Findings span the federal corpus: for instance, 418,148 nursing-home deficiency citations with 5.59% at G+ actual harm (source: https://data.cms.gov/provider-data/dataset/r9s8-i3pj · dataset: cms-care-compare-nh-deficiencies/v1 · snapshot: 2026-05-01). Fonteum does not mint external academic identifiers it has not registered; the citation chain points to the live study page and its pinned method.
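
For readers unfamiliar with Schema.org Dataset JSON-LD, the sketch below builds a minimal record of that type. The property names (`name`, `version`, `dateModified`, `license`, `isBasedOn`, `distribution`, `encodingFormat`, `contentUrl`) are standard Schema.org terms, but which of them Fonteum actually emits is not shown on this page. Mapping the snapshot date to `dateModified` is an assumption, and the download URLs are placeholders.

```python
import json

# Hypothetical JSON-LD record; values marked "placeholder" are not from the page.
record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Nursing Home Deficiency & Harm Rate",
    "version": "cms-care-compare-nh-deficiencies/v1",
    "dateModified": "2026-05-01",  # assumed mapping of the snapshot date
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "isBasedOn": "https://data.cms.gov/provider-data/dataset/r9s8-i3pj",
    "distribution": [
        {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": "https://example.org/study.csv",  # placeholder
        },
        {
            "@type": "DataDownload",
            "encodingFormat": "application/json",
            "contentUrl": "https://example.org/study.json",  # placeholder
        },
    ],
}

jsonld = json.dumps(record, indent=2)
print(jsonld)
```

A record like this is typically embedded in the study page inside a `<script type="application/ld+json">` element so that crawlers and reference managers can read it.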

How it works

From federal portal to citable dataset

Step 1 / Ingest

Fonteum pulls directly from the federal portals on each source's native cadence: CMS NPPES weekly (6.8M+ active providers; source: https://npiregistry.cms.hhs.gov/ · dataset: nppes/v1 · snapshot: 2026-05-01), Care Compare deficiency citations (418,148 records), and the OIG LEIE exclusions list monthly. No commercial aggregator sits between the study and the federal record, so a published figure traces to the government file.

Step 2 / Provenance

Each study is pinned to a methodology version with a complete citation chain — source, dataset identifier, snapshot date, method — plus an explicit limitations block and a Schema.org Dataset JSON-LD record. Because the snapshot is fixed, a result stays reproducible against that point in time even after the underlying federal file refreshes, which is what peer review and replication require.

Step 3 / Deliver

Start with free CSV and JSON at /research — no account, no API key. Academic and government teams register for the free researcher API at /signup/researcher for programmatic access to research snapshots, the provider graph, and semantic search. Every dataset is cataloged in the DCAT-US 3.0 catalog at /data. Custom cohort extracts beyond the published studies are available via the pilot tier.
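
As a rough illustration of working with a downloaded study file, the sketch below computes the share of citations at scope/severity G or above from CSV text. The column names `ccn` and `scope_severity` and the five sample rows are invented for the example; the real file layout is documented with each study at /research.

```python
import csv
import io

# Stand-in for a downloaded study file. Rows and column names are invented.
SAMPLE_CSV = """ccn,scope_severity
999901,D
999901,G
999902,J
999902,E
999903,B
"""

# Letters G through L on the A-L scope/severity grid ("G or above").
HARM_LEVELS = set("GHIJKL")


def harm_rate(csv_text: str) -> float:
    """Return the fraction of rows whose scope/severity is G or above."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0.0
    harmful = sum(
        1 for row in rows if row["scope_severity"].strip().upper() in HARM_LEVELS
    )
    return harmful / len(rows)


rate = harm_rate(SAMPLE_CSV)
print(f"{rate:.2%}")  # 2 of the 5 sample rows are G or above
```

Run against the full published file, the same calculation is the kind of check a reader could use to compare against the 5.59% figure quoted on this page, subject to the study's own methodology and limitations block.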

FAQ

Common questions

Who qualifies for the free researcher API?
The researcher tier at /signup/researcher is free for academic and government institutions (.edu and .gov affiliations). It is modeled directly on the access pattern of Wharton Research Data Services (WRDS), the CMS Research Data Assistance Center (ResDAC), and the AHRQ Healthcare Cost and Utilization Project (HCUP). In each of those, the price of access is a standard citation in resulting work rather than a license fee, and Fonteum follows the same model. The researcher API provides programmatic access to research snapshots and the provider graph, including semantic search over 6.8M+ active NPI embeddings for natural-language queries by specialty, geography, or clinical context. Researchers who only need the published study files can use the static CSV and JSON downloads at /research, which require no account or API key. An exploratory analysis can begin immediately, and the API credential is needed only for programmatic or graph-scale work.
What makes Fonteum's datasets reproducible?
Reproducibility rests on a complete citation chain attached to every result: source name → dataset identifier → snapshot date → methodology version. Because Fonteum draws exclusively from public federal files (CMS, OIG HHS, HRSA, BLS, Census) that are US Government Works in the public domain under 17 U.S.C. § 105, any figure Fonteum publishes can be re-derived from the original government file rather than taken on faith. Each study ships with its methodology written out, an explicit limitations block describing what the source does and does not support, and a Schema.org Dataset JSON-LD record. The methodology version is pinned per snapshot, so a result stays reproducible against a fixed point in time even after the underlying federal file refreshes. The standard in academic healthcare work is that a published statistic must be traceable to its authoritative origin. The citation chain is what satisfies that, with the federal record and the pinned methodology version both recorded.
What research studies are already published?
The /research corpus covers studies built on the federal source families. Representative examples: the Nursing Home Deficiency & Harm Rate study (418,148 citation records across facilities, 5.59% at scope/severity G or above indicating actual harm, with a 14.7× state-level disparity between Illinois and New Hampshire); the MIPS Score Distribution by specialty study (477,137 PY2023 clinician scores); Open Payments recipient concentration (top 1% of doctors taking two-thirds of industry money in PY2024); and Medicare Part D prescriber concentration (top 5% of prescribers driving half of the drug bill). Each study carries its full methodology, limitations block, dataset downloads, and citation footer. The complete, current list, which grows over time, lives at /research, and each dataset is cataloged in the DCAT-US 3.0 catalog at /data.
How should I cite a Fonteum dataset?
Each published study includes a citation footer and a Schema.org Dataset JSON-LD record carrying the dataset name, the pinned methodology version, the snapshot date, and the underlying federal source. A citation should name the Fonteum study, its methodology version, and the access date, and — because the data is public federal record — the underlying government source the study draws from (for example CMS Care Compare or OIG LEIE). Fonteum's datasets are released under CC-BY-4.0 for the Fonteum-authored layer, while the underlying federal records remain US Government Works in the public domain. Citing both the Fonteum study and its federal source is the practice that keeps a result traceable: a reviewer can follow the citation to the pinned snapshot and re-derive the figure from the public file. Fonteum does not mint external academic identifiers it has not actually registered — the citation chain points to the live study page and its methodology version, which is what is asserted to exist.
What scale and breadth of data can a researcher expect?
The provider graph is anchored by CMS NPPES at 6.8M+ active providers, drawn from roughly 8M total NPI records and refreshed weekly. The federal study layer carries substantial volume: the Nursing Home Deficiency & Harm Rate dataset holds 418,148 citation records across facilities; CMS PBJ Daily Nurse Staffing contributes daily records each quarter across 14,537 facilities; and the OIG LEIE exclusions list covers excluded individuals and entities, refreshed monthly. All are documented at /sources with tier, refresh cadence, jurisdiction coverage, and redistribution posture. For a researcher this means provider identity, facility quality, staffing, ownership, and sanction status are joinable by NPI or CCN within one auditable layer, rather than reconstructed from separately licensed commercial vendors with opaque derivation.
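
To make "joinable by NPI or CCN" concrete, here is a minimal sketch of a key-based join in plain Python. The facility names, CCN values, and field names are invented for the example and do not come from any Fonteum file. Keys are kept as strings on the assumption that identifiers such as CCNs can carry leading zeros, which an integer type would drop.

```python
from collections import defaultdict
from typing import Dict, List

# Invented sample rows; real files will have different columns.
facilities = [
    {"ccn": "019901", "name": "Facility A"},
    {"ccn": "999902", "name": "Facility B"},
]
deficiencies = [
    {"ccn": "019901", "scope_severity": "G"},
    {"ccn": "019901", "scope_severity": "D"},
]


def join_by_ccn(facility_rows: List[Dict], deficiency_rows: List[Dict]) -> List[Dict]:
    """Attach each facility's scope/severity codes, matching on the CCN string."""
    grouped = defaultdict(list)
    for row in deficiency_rows:
        grouped[row["ccn"]].append(row["scope_severity"])
    # .get() avoids creating empty entries for facilities with no citations.
    return [
        {**facility, "scope_severities": grouped.get(facility["ccn"], [])}
        for facility in facility_rows
    ]


joined = join_by_ccn(facilities, deficiencies)
for row in joined:
    print(row["ccn"], row["name"], row["scope_severities"])
```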
Can I use Fonteum data in a published paper or grant deliverable?
Yes. The Fonteum-authored dataset layer is released under CC-BY-4.0, and the underlying federal records are US Government Works in the public domain, so there is no license barrier to using the data in a published paper, a grant deliverable, or a class. The expectation in return — consistent with the WRDS / ResDAC / HCUP model the researcher tier is built on — is a standard citation to the Fonteum study and its federal source. Every dataset ships with the methodology, the limitations block, and the citation footer needed to document the source rigorously, and the pinned methodology version keeps the result reproducible against a fixed snapshot for peer review. For work that needs a custom extract beyond the published studies — a specific cut of the provider graph keyed to a cohort — the pilot tier provides scoped delivery, though most academic work is fully served by the free researcher API and the static downloads.
Register →

Register for the free researcher API.

Free for .edu and .gov institutions — citation, not a license fee. Browse studies free at /research or the catalog at /data.

Request access → or register for the researcher API →

FONTEUM · PILOT

Run a 90-day pilot. Public data only. No PHI.

Request access → Read the methodology
See also
  • /research → All published studies — free CSV + JSON downloads.
  • /signup/researcher → Free researcher API for .edu/.gov institutions.
  • /data → DCAT-US 3.0 catalog — all Fonteum datasets in one place.
  • /use-cases/healthcare-analytics → Pipeline-ready bulk export for analytics teams.

Built on the authoritative federal record

The primary sources, named on every page.

These are the federal agencies whose public datasets Fonteum ingests and attributes — the issuing authorities, not customers or partners. Every figure on the site links back to one of them.

  • CMS
  • HHS-OIG
  • HRSA
  • FDA
  • NLM
  • NUCC
  • Census
  • BLS
  • BEA

See the full source registry, with license and refresh cadence for each →

Reproducible by design

Every figure traces to its federal source.

14-tuple provenance

Every rendered fact ties to a source URL, dataset ID, snapshot date, row key, and SHA-256 — the full chain-of-custody record.
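
The SHA-256 element of that record can be checked with the standard library. The sketch below is only an illustration of the idea: what Fonteum actually hashes (a row, a file, or a canonicalized form) is not stated on this page, so hashing raw row bytes is an assumption, as are the dictionary keys and the sample row.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def matches_recorded_hash(data: bytes, recorded_hex: str) -> bool:
    """Compare freshly computed bytes against a hash stored in a provenance record."""
    return sha256_hex(data) == recorded_hex.strip().lower()


# Invented sample row; the key names below are illustrative, not Fonteum's.
row_bytes = b"999901,G,2026-05-01\n"
provenance = {
    "source_url": "https://data.cms.gov/provider-data/dataset/r9s8-i3pj",
    "dataset_id": "cms-care-compare-nh-deficiencies/v1",
    "snapshot_date": "2026-05-01",
    "row_key": "999901",
    "sha256": sha256_hex(row_bytes),
}

unchanged = matches_recorded_hash(row_bytes, provenance["sha256"])
tampered = matches_recorded_hash(row_bytes.replace(b"G", b"D"), provenance["sha256"])
print(unchanged, tampered)  # True False
```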

Reproducible SQL

Each study ships the exact query behind its figures, run against the cited federal snapshot. Re-run it yourself.
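
The following sketch shows what re-running a snapshot-pinned query might look like, using an in-memory SQLite database from Python. The table name, columns, state codes, and rows are all invented, and the query is not one of Fonteum's published queries; it only demonstrates filtering to a single snapshot date before aggregating.

```python
import sqlite3

# Invented rows: (ccn, state, scope_severity, snapshot).
rows = [
    ("999901", "AA", "D", "2026-05-01"),
    ("999901", "AA", "G", "2026-05-01"),
    ("999902", "BB", "B", "2026-05-01"),
    ("999902", "BB", "E", "2026-05-01"),
    ("999903", "BB", "J", "2026-05-01"),
    ("999903", "BB", "C", "2026-05-01"),
    ("999901", "AA", "K", "2026-06-01"),  # later snapshot, excluded by the filter
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deficiencies (ccn TEXT, state TEXT, scope_severity TEXT, snapshot TEXT)"
)
conn.executemany("INSERT INTO deficiencies VALUES (?, ?, ?, ?)", rows)

# Single uppercase letters compare alphabetically, so >= 'G' selects G through L.
QUERY = """
SELECT state,
       COUNT(*) AS citations,
       SUM(CASE WHEN scope_severity >= 'G' THEN 1 ELSE 0 END) AS harm_citations
FROM deficiencies
WHERE snapshot = ?
GROUP BY state
ORDER BY state
"""

result = conn.execute(QUERY, ("2026-05-01",)).fetchall()
conn.close()
print(result)  # [('AA', 2, 1), ('BB', 4, 1)]
```

Passing the snapshot date as a bound parameter keeps the pinned date visible in one place, so the same query text can be pointed at a different snapshot for comparison.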

Daily reconciliation

Published counts are reconciled against the upstream federal datasets on a daily cadence, with drift logged.
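
One way to picture such a reconciliation is a comparison of published counts against freshly fetched upstream counts, logging any difference. The sketch below is a guess at the shape of that check, not Fonteum's process: the function name, the log fields, and the upstream count of 418,200 are invented. Only the published count of 418,148 comes from this page.

```python
from typing import Dict, List, Optional


def reconcile(published: Dict[str, int], upstream: Dict[str, int]) -> List[dict]:
    """Return one log entry per dataset whose published and upstream counts differ."""
    log = []
    for dataset_id in sorted(published.keys() | upstream.keys()):
        pub: Optional[int] = published.get(dataset_id)
        up: Optional[int] = upstream.get(dataset_id)
        if pub != up:
            # Drift is undefined when a dataset is missing on one side.
            drift = None if pub is None or up is None else up - pub
            log.append(
                {"dataset": dataset_id, "published": pub, "upstream": up, "drift": drift}
            )
    return log


published_counts = {"cms-care-compare-nh-deficiencies/v1": 418_148}
upstream_counts = {"cms-care-compare-nh-deficiencies/v1": 418_200}  # invented value

drift_log = reconcile(published_counts, upstream_counts)
print(drift_log)
```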

Named medical review

Reviewed by Jennifer Montecillo, MD, non-practicing medical reviewer.

Read the full provenance and attestation methodology →

Two doors

Use the free API and open data

Query providers, facilities, sanctions, and quality scores — each field carrying its federal source. Self-serve, no call to start.

Explore the API → Browse the data catalog →

Talk to us

Managed pilots, enterprise terms, and audit-ready, signed attestation packages for compliance, risk, and research teams.

Talk to us →

© 2026 Fonteum LLC. All rights reserved.

The U.S. healthcare graph AI can cite — every fact carries its source.

Request access→

The substrate, by the numbers

  • 9.2M graph entities: providers, organizations, owners, and facilities
  • 15.7M linked identifiers: NPIs, CCNs, LEIs and more, resolved to entities
  • 5M graph edges: source-attested relationships between entities
  • 44 federal source families: distinct CMS, OIG, HRSA, FDA and peer datasets
  • 35 dataset pages: citable, downloadable /data catalog pages
  • 65 reproducible studies: each shipping the SQL behind its figures