1,322,867 nurse-staffing records · CMS PBJ
fonteum

FONTEUM · USE CASE · HEALTHCARE ANALYTICS

Healthcare provider data, ready for ETL.

Free CSV and JSON datasets, FHIR R4 bulk NDJSON export, and semantic search over the NPI graph.

Request access →
Data access stack

Bulk export · datasets · semantic search

  • Async NDJSON $export

    FHIR R4 Bulk Export

    ETL-ready NDJSON

    HL7 FHIR R4 Bulk Data Access ($export) — async job queue, NDJSON output per resource type, SMART Backend Services auth (JWT/RS384). Directly loadable by Spark, Pandas, DuckDB, and BigQuery. All 5 USCDI v3 Provider resources available as bulk NDJSON with a 14-tuple provenance tag per resource.

    Explore →

  • Free CSV + JSON downloads

    22 Federal Source Families

    Primary-source dataset layer

    CMS NPPES (6.8M+ active providers; source: https://npiregistry.cms.hhs.gov/, dataset nppes/v1, snapshot 2026-05-01), PECOS, Care Compare (8 modules, including nursing-home deficiency citation records), OIG LEIE (excluded parties), HRSA HPSA, HRSA UDS, CMS QPP MIPS, CMS HCRIS, BLS OEWS, Census. Freely downloadable as CSV and JSON from /research. Methodology and limitations are documented per study.

    Explore →

  • NPI embeddings

    Semantic Search

    Natural-language provider queries

    Embeddings power natural-language provider search across 6.8M+ active NPI records (source: https://npiregistry.cms.hhs.gov/, dataset nppes/v1, snapshot 2026-05-01). Query by specialty, geography, or clinical context. Public surface at /search; researcher API available for .edu/.gov institutions.

    Explore →

Why analytics teams choose federal primary sources

Free, citable, auditable — no vendor dependency

22 federal source families, free to download
Source: https://fonteum.com/methodology · Dataset: fonteum-methodology/v1 · Snapshot: 2026-05-27

Every research dataset Fonteum publishes is freely downloadable as CSV and JSON from /research: no account, no API key, no rate limit on static files. That covers CMS NPPES (6.8M+ active providers; source: https://npiregistry.cms.hhs.gov/, dataset nppes/v1, snapshot 2026-05-01), PECOS, Care Compare deficiency history (citation records, with 5.59% at scope/severity G or above, indicating actual harm), OIG LEIE exclusions (excluded parties), HRSA shortage-area designations, and CMS QPP MIPS quality scores. All are free and citable, each with its methodology and an explicit limitations block documented per study, and each reproducible from the public CMS file.

Pipeline-ready: NDJSON bulk export

For teams building production ETL pipelines, the FHIR R4 Bulk Data Access $export endpoint outputs NDJSON per resource type — directly loadable by Spark, Pandas, DuckDB, and BigQuery without an intermediate transformation. The manifest file includes per-type resource counts, the export timestamp, and a 14-tuple provenance tag on each resource's meta.tag, so a downstream notebook can cite the exact federal file a row came from. All 5 USCDI v3 Provider resources export against US Core 6.1.0, with SMART Backend Services auth (JWT/RS384) for unattended CI/CD integrations.
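As a minimal sketch of how a pipeline might sanity-check a finished export, the snippet below compares per-type counts from a manifest against the number of lines in each downloaded NDJSON file. The manifest shape (an `output` list whose entries carry `type`, `url`, and `count`) follows the general FHIR Bulk Data convention and is an assumption about the actual Fonteum manifest; the helper names and the demo data are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def count_ndjson_lines(path):
    """Count non-empty lines (one FHIR resource per line) in an NDJSON file."""
    with open(path, encoding="utf-8") as fh:
        return sum(1 for line in fh if line.strip())


def verify_counts(manifest, local_files):
    """Return {resource_type: (expected, actual)} for every mismatch.

    Assumption: `manifest["output"]` entries have `type` and `count` keys.
    `local_files` maps a resource type to the path of its downloaded file.
    """
    mismatches = {}
    for entry in manifest.get("output", []):
        rtype = entry["type"]
        expected = entry.get("count")
        if expected is None or rtype not in local_files:
            continue  # nothing to compare against
        actual = count_ndjson_lines(local_files[rtype])
        if actual != expected:
            mismatches[rtype] = (expected, actual)
    return mismatches


# Self-contained demo with made-up data (not real Fonteum output).
demo_dir = Path(tempfile.mkdtemp())
practitioner_file = demo_dir / "Practitioner.ndjson"
practitioner_file.write_text(
    json.dumps({"resourceType": "Practitioner", "id": "a"}) + "\n"
    + json.dumps({"resourceType": "Practitioner", "id": "b"}) + "\n",
    encoding="utf-8",
)
demo_manifest = {
    "transactionTime": "2026-05-01T00:00:00Z",
    "output": [{"type": "Practitioner", "url": "Practitioner.ndjson", "count": 2}],
}
demo_mismatches = verify_counts(demo_manifest, {"Practitioner": practitioner_file})
print(demo_mismatches)
```

An empty result means every counted type matched; a non-empty one is a reasonable signal to re-download before loading into the warehouse.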

Researcher API: free for .edu/.gov teams

Academic and government research teams get free API access via the researcher tier at /signup/researcher. This is the same access model as Wharton WRDS, CMS ResDAC, and AHRQ HCUP, where the price is a standard citation rather than a license fee. The researcher API provides programmatic access to research snapshots and the provider graph, including semantic search across 6.8M+ active NPI embeddings (source: https://npiregistry.cms.hhs.gov/, dataset nppes/v1, snapshot 2026-05-01). Fonteum datasets are also on the HuggingFace Hub for load_dataset() access with pinned methodology versions, so a result stays reproducible against a fixed snapshot.

How it works

Ingest · Provenance · Deliver

Step 1 / Ingest

Pull directly from federal data portals

Fonteum re-pulls each of the 22 federal source families (source: https://fonteum.com/methodology, dataset fonteum-methodology/v1, snapshot 2026-05-27) on its native cadence: CMS NPPES active providers weekly as a full-replacement file, OIG LEIE excluded parties monthly, and PBJ daily staffing records quarterly. For an analytics pipeline that means your scheduled extract reflects the same currency as the federal source, with no opaque vendor refresh lag between the public file and the row you load.

Step 2 / Provenance

Attach source, date, and limitation to every field

Each value ties to a provider_field_provenance row recording source name, last-checked date, and limitation, referencing a named federal dataset. In the FHIR bulk export the same chain rides inline as a 14-tuple provenance tag on each resource's meta.tag, so a notebook reading the NDJSON can cite the exact federal file a feature came from — making a model input or a published statistic reproducible from the public CMS file rather than taking it on faith.
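A notebook might pull that inline provenance out of a resource along these lines. This is only a sketch: the layout of the 14-tuple is not spelled out here, so the code assumes each element is a FHIR Coding in meta.tag whose `system` names the field and whose `code` carries the value. The `PROVENANCE_PREFIX` URL, the field names, and the sample values are all hypothetical.

```python
import json

# Hypothetical tag-system prefix; the real systems used in the export
# are not given in the text above.
PROVENANCE_PREFIX = "https://example.org/provenance/"


def provenance_from_resource(resource):
    """Collect provenance codings from meta.tag into a plain dict.

    Assumes each provenance element is a FHIR Coding whose `system`
    starts with PROVENANCE_PREFIX and whose `code` holds the value.
    """
    tags = resource.get("meta", {}).get("tag", [])
    result = {}
    for coding in tags:
        system = coding.get("system", "")
        if system.startswith(PROVENANCE_PREFIX):
            field = system[len(PROVENANCE_PREFIX):]
            result[field] = coding.get("code")
    return result


# One NDJSON line with made-up values, for illustration only.
sample_line = json.dumps({
    "resourceType": "Practitioner",
    "id": "example",
    "meta": {"tag": [
        {"system": PROVENANCE_PREFIX + "source-name", "code": "CMS NPPES"},
        {"system": PROVENANCE_PREFIX + "dataset", "code": "nppes/v1"},
        {"system": PROVENANCE_PREFIX + "last-checked", "code": "2026-05-01"},
        {"system": "https://example.org/other", "code": "ignored"},
    ]},
})

sample_provenance = provenance_from_resource(json.loads(sample_line))
print(sample_provenance)
```

Flattening the tags into a dict keeps the citation fields next to the row when the resource is loaded into a DataFrame or a warehouse table.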

Step 3 / Deliver

Pipeline-ready access for analytics teams

Start with free CSV and JSON at /research (no account, no API key). Build production ETL on the FHIR R4 US Core 6.1.0 bulk NDJSON $export, which loads into Spark, Pandas, DuckDB, and BigQuery and uses SMART Backend Services auth, or query semantic search over 6.8M+ NPI embeddings (source: https://npiregistry.cms.hhs.gov/, dataset nppes/v1, snapshot 2026-05-01). The researcher API is free for .edu/.gov, datasets are on HuggingFace Hub via load_dataset(), and scoped pilot exports start at $2,500/mo.

FAQ

Common questions

Does Fonteum support bulk data export for analytics pipelines?
Yes. Fonteum implements HL7 FHIR R4 Bulk Data Access ($export), the standard asynchronous bulk export protocol. A request triggers an Inngest-backed job queue that serializes every provider record matching the request scope as NDJSON, one file per resource type, and returns a manifest once the job completes. The export covers all 5 USCDI v3 Provider resources defined in US Core 6.1.0: Practitioner, Organization, Location, PractitionerRole, and HealthcareService. SMART Backend Services auth (a JWT client assertion signed with RS384, exchanged for a short-lived token) is supported for unattended pipeline integrations, so a scheduled extraction needs no interactive login. Because the underlying directory is drawn from CMS NPPES (active providers, refreshed weekly as a full-replacement file), a bulk pull reflects the same currency as the federal source. The async pattern means a large extract does not block your pipeline on a single synchronous HTTP request: you poll the status endpoint and collect the NDJSON files when the job is ready.
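The kick-off and polling flow can be sketched as below, following the general FHIR Bulk Data pattern: a 202 response with a Content-Location header at kick-off, then 202 while the job is running and 200 with the manifest when it is done. The `run_bulk_export` helper, the injected `http_get` callable, and the example.org URLs are hypothetical stand-ins, so treat this as a sketch of the pattern rather than Fonteum's client.

```python
import time


def run_bulk_export(http_get, kickoff_url, poll_interval=5.0, max_polls=120,
                    sleep=time.sleep):
    """Kick off a FHIR bulk export and poll until the manifest is ready.

    `http_get(url, headers)` is a caller-supplied function returning
    (status_code, response_headers, json_body_or_None).
    """
    status, headers, _ = http_get(
        kickoff_url,
        {"Accept": "application/fhir+json", "Prefer": "respond-async"},
    )
    if status != 202:
        raise RuntimeError(f"kick-off failed with HTTP {status}")
    status_url = headers["Content-Location"]

    for _ in range(max_polls):
        status, headers, body = http_get(status_url, {"Accept": "application/json"})
        if status == 200:
            return body  # job complete: the body is the manifest
        if status != 202:
            raise RuntimeError(f"export failed with HTTP {status}")
        # Use Retry-After when it is given in whole seconds, else the default.
        retry_after = headers.get("Retry-After", "")
        sleep(float(retry_after) if retry_after.isdigit() else poll_interval)
    raise TimeoutError("export did not finish within max_polls")


def make_fake_server():
    """Canned responses standing in for a real server (demo only)."""
    responses = iter([
        (202, {"Content-Location": "https://example.org/status/1"}, None),
        (202, {"Retry-After": "1"}, None),
        (200, {}, {"output": [{"type": "Practitioner",
                               "url": "https://example.org/p.ndjson"}]}),
    ])
    return lambda url, headers: next(responses)


manifest = run_bulk_export(
    make_fake_server(),
    "https://example.org/api/fhir/$export",  # hypothetical host
    sleep=lambda seconds: None,              # skip real waiting in the demo
)
print(manifest["output"][0]["type"])
```

Injecting `http_get` keeps the polling logic independent of any particular HTTP library, so the same function can be exercised in a test and then wired to a real client in a scheduled job.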
What format does Fonteum's bulk export use?
NDJSON — newline-delimited JSON — the format specified by the HL7 FHIR R4 Bulk Data Access specification. Each line is one valid FHIR R4 resource JSON object, and files are grouped by resource type, which is the layout Spark, Pandas, DuckDB, and BigQuery expect for streaming or partitioned loads. The export also returns a manifest file recording per-type resource counts and the export timestamp, plus a 14-tuple provenance tag embedded in each resource's meta.tag carrying source name, dataset identifier, last-checked date, and methodology version. That means provenance travels with the data into your warehouse rather than living in separate documentation — a downstream notebook can read meta.tag and cite the exact federal file the row came from. NDJSON loads directly with pandas.read_json(lines=True), Spark's spark.read.json, DuckDB's read_json_auto, or a BigQuery NEWLINE_DELIMITED_JSON load job, with no intermediate transformation step.
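For teams that want to stream a file without a DataFrame library, a minimal reader using only the Python standard library might look like this. The `iter_ndjson` helper and the sample records are illustrative, not part of any Fonteum SDK.

```python
import io
import json
from collections import Counter


def iter_ndjson(stream):
    """Yield one parsed JSON object per non-empty line of `stream`."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON") from exc


# Made-up sample standing in for an exported file.
sample = io.StringIO(
    '{"resourceType": "Organization", "id": "org-1"}\n'
    '{"resourceType": "Organization", "id": "org-2"}\n'
    '\n'
    '{"resourceType": "Location", "id": "loc-1"}\n'
)
type_counts = Counter(r["resourceType"] for r in iter_ndjson(sample))
print(type_counts)
```

Because the reader is a generator, memory use stays flat regardless of file size; for tabular work the pandas.read_json(lines=True) route mentioned above is usually the shorter path.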
Can data science teams access Fonteum's provider data via API?
Yes, through two paths. First, the FHIR R4 REST API: individual resource queries and the asynchronous bulk $export at /api/fhir/* with SMART Backend Services auth (JWT/RS384) for unattended access. Second, the Researcher API: a free access tier for .edu and .gov institutions at /signup/researcher, citation-required, modeled on the same access pattern as Wharton WRDS, CMS ResDAC, and AHRQ HCUP — programmatic access in exchange for a standard citation rather than a license fee. The researcher tier exposes research snapshots and the provider graph, including semantic search over active NPI embeddings for natural-language queries by specialty, geography, or clinical context. For production teams outside academia, scoped pilot exports start at $2,500/mo. Static research files at /research need no account or API key at all — they are plain CSV and JSON downloads, so an exploratory analysis can begin before any credential exchange.
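For the unattended path, the token request can be sketched as follows. The claim set (iss and sub set to the client ID, aud set to the token URL, a short exp, a unique jti) and the client_credentials form fields follow the SMART Backend Services pattern; the client ID, token URL, and scope are hypothetical. Signing is left out so the example stays standard-library only: in practice a JWT library would sign the claims with RS384 (for example PyJWT's jwt.encode(claims, private_key, algorithm="RS384")).

```python
import time
import uuid
from urllib.parse import urlencode

ASSERTION_TYPE = "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"


def build_assertion_claims(client_id, token_url, now=None, lifetime_seconds=300):
    """Build the claim set for a SMART Backend Services client assertion."""
    now = int(time.time()) if now is None else int(now)
    return {
        "iss": client_id,                 # issuer: the registered client
        "sub": client_id,                 # subject: same client
        "aud": token_url,                 # audience: the token endpoint
        "exp": now + lifetime_seconds,    # keep short, five minutes at most
        "jti": str(uuid.uuid4()),         # unique ID to prevent replay
    }


def build_token_request_body(signed_jwt, scope):
    """Form-encode the client_credentials request carrying the signed JWT."""
    return urlencode({
        "grant_type": "client_credentials",
        "scope": scope,
        "client_assertion_type": ASSERTION_TYPE,
        "client_assertion": signed_jwt,
    })


# Demo with placeholder values; the JWT string is a stand-in, not a real token.
claims = build_assertion_claims(
    "demo-client", "https://example.org/token", now=1_700_000_000
)
body = build_token_request_body("header.payload.signature", "system/*.read")
print(claims["exp"])
print(body)
```

The form body is posted to the token endpoint with Content-Type application/x-www-form-urlencoded, and the short-lived access token in the response is then sent as a Bearer token on the export requests.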
Does Fonteum publish research datasets for academic use?
Yes. Every published research snapshot is freely downloadable as CSV and JSON at /research with no account required. One example is the Nursing Home Deficiency & Harm Rate study (citation records across facilities, with 5.59% of citations at scope/severity G or above, indicating actual harm). The researcher API tier at /signup/researcher gives .edu and .gov institutions programmatic access in exchange for a standard citation. Fonteum is also published on the HuggingFace Hub, so a team can pull a dataset directly with load_dataset() against a pinned methodology version; see /docs/huggingface. Every dataset ships with its methodology, an explicit limitations block, and the federal source citation chain, which is what makes the figures reproducible: a reader can re-derive them from the public CMS file rather than taking the snapshot on faith. PBJ Daily Nurse Staffing (daily records per quarter) and OIG LEIE (excluded parties) are available the same way.
What scale of provider data can analytics teams expect from Fonteum?
The provider graph is anchored by CMS NPPES at 6.8M+ active providers, drawn from roughly 8M total NPI records and refreshed weekly as a full-replacement file, so a team can join against the same identifiers CMS publishes. Beyond the registry, the federal study layer carries substantial volume: the Nursing Home Deficiency & Harm Rate dataset holds citation records across facilities; CMS PBJ Daily Nurse Staffing contributes daily records per quarter across 14,537 facilities (CY2025Q2); CMS SNF All Owners adds ownership rows across 14,425 facilities; and the OIG LEIE exclusions list covers excluded individuals and entities, refreshed monthly. All federal source families are documented at /sources with tier, refresh cadence, and redistribution posture. For data-science teams, this means a single auditable layer spans provider identity, facility quality, staffing, ownership, and sanction status, joinable by NPI or CCN, rather than stitching those signals from separately licensed vendors.
How does Fonteum compare to a commercial provider-data vendor for analytics?
The core difference is provenance and cost structure. Commercial provider-data vendors typically deliver a blended file with limited visibility into which underlying source produced a given field and charge per-seat or per-record. Fonteum draws exclusively from public federal records — CMS NPPES, PECOS, Care Compare, OIG LEIE, HRSA, BLS, Census — which are US Government Works in the public domain under 17 U.S.C. § 105, and attaches a field-level citation chain (source, dataset identifier, last-checked date, methodology version) to every value. For an analytics team that means a model feature or a published finding can be traced back to its authoritative origin and reproduced from the public file. Static research datasets at /research are free with no account; .edu and .gov teams get free programmatic access through the researcher API, the same access model as WRDS, ResDAC, and AHRQ HCUP. Production pipelines use the FHIR R4 bulk NDJSON $export with 14-tuple provenance per resource, and scoped pilot exports start at $2,500/mo.
Request access

Start with the free datasets.

Browse free research datasets at /research. Academic teams: free researcher API at /signup/researcher. Production pipelines: pilot tier from $2,500/mo.

Request access → or register for the researcher API →

FONTEUM · PILOT

Run a 90-day pilot. Public data only. No PHI.

Request access→ Read the methodology
See also
  • /research → All published datasets — free CSV + JSON downloads.
  • /docs/bulk-export → HL7 FHIR R4 Bulk Data Access $export reference.
  • /signup/researcher → Free researcher API for .edu/.gov institutions.
  • /data → DCAT-US 3.0 catalog — all Fonteum datasets in one place.

Compliance posture

Methodology · Corrections log · Editorial policy


Sourced from federal agencies. Fonteum, Inc., Delaware C-corp. © 2026.
