Datasets · researcher API · citation chain
- , no account
Research dataset corpus
Free CSV + JSON
Every published study at /research is freely downloadable as CSV and JSON with no account — for example the Nursing Home Deficiency & Harm Rate study ( citation records across facilities). Each ships with methodology, an explicit limitations block, and a Schema.org Dataset JSON-LD record.
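A Schema.org Dataset record is ordinary JSON-LD. On a web page it is typically embedded in a `<script type="application/ld+json">` tag. The sketch below builds one in Python with the standard `json` module. Every value (study name, URLs, version string, date) is a hypothetical placeholder, and the exact set of properties Fonteum emits is an assumption. The property names themselves (`distribution`, `DataDownload`, `license`, `version`, `isBasedOn`, `dateModified`, `encodingFormat`, `contentUrl`) are real Schema.org terms.

```python
import json

def dataset_jsonld(name, description, methodology_version, snapshot_date,
                   federal_source, csv_url, json_url):
    """Build a Schema.org Dataset record as a JSON-LD dict (illustrative fields only)."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Methodology version pinned to this snapshot (hypothetical versioning scheme).
        "version": methodology_version,
        "dateModified": snapshot_date,
        # CC-BY-4.0 covers the derived layer; the federal records stay public domain.
        "license": "https://creativecommons.org/licenses/by/4.0/",
        # The public federal file the study is derived from.
        "isBasedOn": federal_source,
        # One DataDownload per published format.
        "distribution": [
            {"@type": "DataDownload", "encodingFormat": "text/csv", "contentUrl": csv_url},
            {"@type": "DataDownload", "encodingFormat": "application/json", "contentUrl": json_url},
        ],
    }

# All values below are placeholders, not a real Fonteum study.
record = dataset_jsonld(
    name="Example study (hypothetical)",
    description="Placeholder description.",
    methodology_version="v1.0",
    snapshot_date="2024-01-01",
    federal_source="https://example.org/federal-source-file",
    csv_url="https://example.org/study.csv",
    json_url="https://example.org/study.json",
)
print(json.dumps(record, indent=2))
```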
- Citation, not a license fee
Researcher API
Free for .edu / .gov
Programmatic access to research snapshots and the provider graph, free for academic and government institutions at /signup/researcher — the same access model as Wharton WRDS, CMS ResDAC, and AHRQ HCUP, where the price is a standard citation rather than a license fee. Includes semantic search over active NPI embeddings.
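Semantic search over embeddings generally means ranking stored vectors by their similarity to a query vector. The sketch below shows that general technique with cosine similarity over toy 3-dimensional vectors keyed by placeholder NPI strings. It assumes the query text has already been embedded, and it is not Fonteum's implementation, embedding model, or API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=2):
    """Return the k (npi, score) pairs most similar to query_vec, best first."""
    scored = [(npi, cosine(query_vec, vec)) for npi, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy index: placeholder NPI keys mapped to made-up embedding vectors.
index = {
    "NPI-A": [0.9, 0.1, 0.0],
    "NPI-B": [0.1, 0.9, 0.0],
    "NPI-C": [0.8, 0.2, 0.1],
}
results = top_k([1.0, 0.0, 0.0], index, k=2)
print(results)
```

A linear scan like this is only practical for a small index; a large provider index would typically sit behind an approximate nearest-neighbor structure, with the same scoring idea.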
- Source → snapshot → method
Citation chain
Reproducible by design
Each result resolves source → dataset identifier → snapshot date → methodology version, so a finding can be re-derived from the public federal file at any later date. One of 44 federal source families feeds the corpus, each documented at /sources with tier, cadence, and redistribution posture.
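The four-link chain maps naturally onto a small immutable record. A minimal sketch, assuming the hypothetical class and field names below (they are not a published Fonteum schema), renders the chain in the same source → dataset identifier → snapshot date → methodology version order described above.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class CitationChain:
    """Hypothetical provenance record for one result."""
    source: str               # federal source family
    dataset_id: str           # dataset identifier within that source
    snapshot_date: date       # date of the pinned federal snapshot
    methodology_version: str  # method version applied to that snapshot

    def render(self) -> str:
        """Render the chain as source → dataset → snapshot → method."""
        return " → ".join([
            self.source,
            self.dataset_id,
            self.snapshot_date.isoformat(),
            self.methodology_version,
        ])

# Placeholder values for illustration only.
chain = CitationChain("Example Federal Source", "example-dataset-id",
                      date(2024, 1, 1), "method-v1")
print(chain.render())
```

`frozen=True` makes a chain immutable once created, which mirrors the idea that a published result stays attached to one snapshot and one method version.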
Free, citable, reproducible — to the snapshot
Free access, the WRDS / ResDAC / HCUP way
Academic and government teams get free programmatic access via the researcher tier at /signup/researcher — the same model as Wharton WRDS, CMS ResDAC, and AHRQ HCUP, where the price is a standard citation rather than a license fee. It exposes research snapshots, the provider graph, and semantic search across active NPI embeddings. Static study files at /research need no account at all.
Reproducible to a pinned snapshot
Every study resolves a citation chain (source → dataset identifier → snapshot date → methodology version) and pins the method to a snapshot, so a figure stays reproducible against a fixed point in time even after the federal file refreshes. Because the data is a public federal record under 17 U.S.C. § 105, any result can be re-derived from the original government file. The Fonteum-authored layer is CC-BY-4.0; the underlying records stay public domain.
Methodology and limitations, in the open
Each dataset ships with its methodology written out, an explicit limitations block describing what the source does and does not support, and a Schema.org Dataset JSON-LD record. Findings span the federal corpus: for instance, 5.59% of nursing-home deficiency citations fall at scope/severity G or above, indicating actual harm. Fonteum does not mint external academic identifiers it has not registered; the citation chain points to the live study page and its pinned method.
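The 5.59% figure is a share: citations at scope/severity G or above divided by all citations. On the CMS A–L scope/severity grid, G through I denote actual harm and J through L immediate jeopardy. A minimal sketch of that arithmetic, assuming each downloaded row carries one scope/severity letter in a hypothetical `scope_severity` column (the real column name in the study file may differ):

```python
import csv
import io

# G-I: actual harm; J-L: immediate jeopardy. Together, "G or above".
HARM_LEVELS = set("GHIJKL")

def harm_rate(rows, column="scope_severity"):
    """Share of citations whose scope/severity letter is G or above."""
    letters = [row[column].strip().upper() for row in rows]
    if not letters:
        return 0.0
    harmed = sum(1 for letter in letters if letter in HARM_LEVELS)
    return harmed / len(letters)

# Toy CSV standing in for a downloaded study file (made-up rows).
sample = io.StringIO("id,scope_severity\n1,D\n2,G\n3,E\n4,J\n")
rate = harm_rate(csv.DictReader(sample))
print(f"{rate:.2%}")
```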
From federal portal to citable dataset
Ingest
Fonteum pulls directly from the federal portals on each source's native cadence — CMS NPPES weekly ( active providers), Care Compare deficiency citations ( records), and the OIG LEIE monthly ( exclusions). No commercial aggregator sits between the study and the federal record, so a published figure traces to the government file.
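Because NPPES and the OIG LEIE both identify providers by NPI, a sanction flag can be attached to provider records with a keyed lookup. The sketch assumes both files are already parsed into lists of dicts with a hypothetical `npi` field, uses made-up ten-digit values, and treats blank or all-zero NPIs in exclusion rows as missing (an assumption about how absent identifiers appear).

```python
def flag_exclusions(providers, exclusions):
    """Attach an 'excluded' flag to each provider record by matching on NPI."""
    # Keyed set of excluded NPIs; blank or all-zero values are treated as missing.
    excluded_npis = {
        row["npi"] for row in exclusions
        if row.get("npi") and row["npi"].strip("0")
    }
    # Copy each provider dict and add the flag, leaving the inputs unchanged.
    return [{**p, "excluded": p["npi"] in excluded_npis} for p in providers]

# Hypothetical, already-parsed rows with made-up NPI values.
providers = [
    {"npi": "1111111111", "name": "Provider A"},
    {"npi": "2222222222", "name": "Provider B"},
]
exclusions = [{"npi": "2222222222"}, {"npi": "0000000000"}, {"npi": ""}]

flagged = flag_exclusions(providers, exclusions)
for row in flagged:
    print(row["name"], row["excluded"])
```

A set gives constant-time membership checks, so the join stays linear in the size of the two files; the same keyed-set pattern applies to facility files that share a CCN.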
Provenance
Each study is pinned to a methodology version with a complete citation chain — source, dataset identifier, snapshot date, method — plus an explicit limitations block and a Schema.org Dataset JSON-LD record. Because the snapshot is fixed, a result stays reproducible against that point in time even after the underlying federal file refreshes, which is what peer review and replication require.
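Pinning means a study reads one named snapshot rather than whatever is newest. A minimal sketch, assuming snapshots are held in a dict keyed by ISO date (a hypothetical layout): the lookup fails loudly if the pinned date is missing instead of silently falling back to a newer file, so a figure recomputed later still comes from the same data.

```python
def load_pinned(snapshots, pinned_date):
    """Return rows for exactly the pinned snapshot; never fall back to the latest."""
    if pinned_date not in snapshots:
        raise KeyError(f"pinned snapshot {pinned_date} is not available")
    return snapshots[pinned_date]

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

# Hypothetical snapshot store: ISO date -> values derived from that federal file.
snapshots = {"2024-01-01": [1.0, 2.0, 3.0]}
before = mean(load_pinned(snapshots, "2024-01-01"))

# The federal file refreshes; the new snapshot is stored alongside the old one.
snapshots["2024-02-01"] = [10.0, 20.0]
after = mean(load_pinned(snapshots, "2024-01-01"))
print(before, after)
```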
Deliver
Start with free CSV and JSON at /research — no account, no API key. Academic and government teams register for the free researcher API at /signup/researcher for programmatic access to research snapshots, the provider graph, and semantic search. Every dataset is cataloged in the DCAT-US 3.0 catalog at /data. Custom cohort extracts beyond the published studies are available via the pilot tier.
Common questions
- Who qualifies for the free researcher API?
- The researcher tier at /signup/researcher is free for academic and government institutions — .edu and .gov affiliations — modeled directly on the access pattern of Wharton Research Data Services (WRDS), the CMS Research Data Assistance Center (ResDAC), and the AHRQ Healthcare Cost and Utilization Project (HCUP). In each of those, the price of access is a standard citation in resulting work rather than a license fee, and Fonteum follows the same model. The researcher API provides programmatic access to research snapshots and the provider graph, including semantic search over active NPI embeddings for natural-language queries by specialty, geography, or clinical context. For researchers who only need the published study files, the static CSV and JSON downloads at /research require no account or API key at all, so an exploratory analysis can begin immediately and the API credential is needed only for programmatic or graph-scale work.
- What makes Fonteum's datasets reproducible?
- Reproducibility rests on a complete citation chain attached to every result: source name → dataset identifier → snapshot date → methodology version. Because Fonteum draws exclusively from public federal files (CMS, OIG HHS, HRSA, BLS, Census) that are US Government Works in the public domain under 17 U.S.C. § 105, any figure Fonteum publishes can be re-derived from the original government file rather than taken on faith. Each study ships with its methodology written out, an explicit limitations block describing what the source does and does not support, and a Schema.org Dataset JSON-LD record. The methodology version is pinned per snapshot, so a result stays reproducible against a fixed point in time even after the underlying federal file refreshes. The standard in academic healthcare work is that a published statistic must be traceable to its authoritative origin; the citation chain meets that standard by recording both the federal record and the pinned methodology version.
- What research studies are already published?
- The /research corpus covers studies built on the federal source families. Representative examples: the Nursing Home Deficiency & Harm Rate study ( citation records across facilities, 5.59% at scope/severity G or above indicating actual harm, with a 14.7× state-level disparity between Illinois and New Hampshire); the MIPS Score Distribution by specialty study (477,137 PY2023 clinician scores); Open Payments recipient concentration (top 1% of doctors taking two-thirds of industry money in PY2024); and Medicare Part D prescriber concentration (top 5% of prescribers driving half of the drug bill). Each study carries its full methodology, limitations block, dataset downloads, and citation footer. The complete, current list — these grow over time — lives at /research, and each dataset is cataloged in the DCAT-US 3.0 catalog at /data.
- How should I cite a Fonteum dataset?
- Each published study includes a citation footer and a Schema.org Dataset JSON-LD record carrying the dataset name, the pinned methodology version, the snapshot date, and the underlying federal source. A citation should name the Fonteum study, its methodology version, and the access date, and, because the data is public federal record, the underlying government source the study draws from (for example CMS Care Compare or OIG LEIE). Fonteum's datasets are released under CC-BY-4.0 for the Fonteum-authored layer, while the underlying federal records remain US Government Works in the public domain. Citing both the Fonteum study and its federal source is the practice that keeps a result traceable: a reviewer can follow the citation to the pinned snapshot and re-derive the figure from the public file. Fonteum does not mint external academic identifiers it has not actually registered; the citation chain points only to what demonstrably exists, namely the live study page and its methodology version.
- What scale and breadth of data can a researcher expect?
- The provider graph is anchored by CMS NPPES at active providers, drawn from roughly 8M total NPI records and refreshed weekly. The federal study layer carries substantial volume: the Nursing Home Deficiency & Harm Rate dataset holds citation records across facilities; CMS PBJ Daily Nurse Staffing contributes daily records per quarter across 14,537 facilities; and the OIG LEIE exclusions list covers excluded individuals and entities, refreshed monthly. All are documented at /sources with tier, refresh cadence, jurisdiction coverage, and redistribution posture. For a researcher this means provider identity, facility quality, staffing, ownership, and sanction status are joinable by NPI or CCN within one auditable layer — rather than reconstructed from separately licensed commercial vendors with opaque derivation.
- Can I use Fonteum data in a published paper or grant deliverable?
- Yes. The Fonteum-authored dataset layer is released under CC-BY-4.0, and the underlying federal records are US Government Works in the public domain, so there is no license barrier to using the data in a published paper, a grant deliverable, or a class. The expectation in return — consistent with the WRDS / ResDAC / HCUP model the researcher tier is built on — is a standard citation to the Fonteum study and its federal source. Every dataset ships with the methodology, the limitations block, and the citation footer needed to document the source rigorously, and the pinned methodology version keeps the result reproducible against a fixed snapshot for peer review. For work that needs a custom extract beyond the published studies — a specific cut of the provider graph keyed to a cohort — the pilot tier provides scoped delivery, though most academic work is fully served by the free researcher API and the static downloads.
- /research → All published studies — free CSV + JSON downloads.
- /signup/researcher → Free researcher API for .edu/.gov institutions.
- /data → DCAT-US 3.0 catalog — all Fonteum datasets in one place.
- /use-cases/healthcare-analytics → Pipeline-ready bulk export for analytics teams.
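Concentration findings such as "the top 1% of doctors take two-thirds of industry money" are top-share ratios: rank recipients by amount, take the top slice, and divide its total by the overall total. A state-level disparity such as 14.7× is likewise a ratio of two group rates. A minimal sketch with toy numbers follows; how Fonteum rounds the top-slice cutoff and handles ties is not stated, so the round-up rule here is an assumption.

```python
def top_share(amounts, top_percent):
    """Share of the total held by the top `top_percent` percent of recipients."""
    ranked = sorted(amounts, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    # Integer ceiling division; at least one recipient always forms the top group.
    k = max(1, (len(ranked) * top_percent + 99) // 100)
    return sum(ranked[:k]) / total

def disparity(rate_high, rate_low):
    """Ratio of two group rates, e.g. the highest to the lowest state harm rate."""
    return rate_high / rate_low

# Toy payments (made-up values): one large recipient and 99 small ones.
payments = [200.0] + [1.0] * 99
share = top_share(payments, 1)
print(f"top 1% share: {share:.1%}")
print(f"disparity: {disparity(0.12, 0.04):.1f}x")
```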