Does Fonteum support bulk data export for analytics pipelines?

Fonteum documents an asynchronous FHIR R4 bulk-export surface for supported provider resources. An export reflects the serving tables at job time, not necessarily the publisher's latest release; the NPPES table's newest system timestamp was June 10 when checked July 12, 2026.

What format does Fonteum's bulk export use?

NDJSON — newline-delimited JSON — is the format specified by the HL7 FHIR R4 Bulk Data Access specification. Each line is one FHIR R4 resource JSON object, and files are grouped by supported resource type. The export manifest records per-type resource counts and the export timestamp. When a resource includes the nullable provenance tag in meta.tag, it can carry available source name, dataset identifier, observation date, and methodology version; some responses omit some or all of those fields. The tag is citation metadata, not a fact-to-signature link. NDJSON loads directly with pandas.read_json(lines=True), Spark's spark.read.json, DuckDB's read_json_auto, or a BigQuery NEWLINE_DELIMITED_JSON load job.
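As a rough illustration of the nullable tag described above, the sketch below reads NDJSON line by line and pulls out whichever citation fields are present. The tag `system` URIs are placeholders: how the provenance fields are actually encoded inside meta.tag is not documented here, so treat the mapping as an assumption to replace with the real one.

```python
import io
import json

# Hypothetical tag systems: the encoding of provenance fields inside
# meta.tag is an assumption, so these URIs are placeholders only.
PROVENANCE_SYSTEMS = {
    "https://example.org/provenance/source": "source",
    "https://example.org/provenance/dataset": "dataset_id",
    "https://example.org/provenance/observed": "observation_date",
    "https://example.org/provenance/methodology": "methodology_version",
}


def read_ndjson(stream):
    """Yield one parsed resource per non-empty line of an NDJSON stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def provenance_fields(resource):
    """Return the four citation fields, using None for anything missing."""
    fields = dict.fromkeys(PROVENANCE_SYSTEMS.values())
    tags = (resource.get("meta") or {}).get("tag") or []
    for tag in tags:
        name = PROVENANCE_SYSTEMS.get(tag.get("system"))
        if name is not None:
            fields[name] = tag.get("code")
    return fields


# Two made-up resources: one with a partial tag, one with no meta at all.
SAMPLE = io.StringIO(
    '{"resourceType": "Practitioner", "id": "a", "meta": {"tag": ['
    '{"system": "https://example.org/provenance/source", "code": "CMS NPPES"}]}}\n'
    '{"resourceType": "Practitioner", "id": "b"}\n'
)
rows = [{"id": r["id"], **provenance_fields(r)} for r in read_ndjson(SAMPLE)]
```

Every row ends up with the same four keys, so a missing tag becomes an explicit None rather than a missing column when the rows are handed to a DataFrame or a warehouse load.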

Can data science teams access Fonteum's provider data via API?

Yes, through two paths. First, the FHIR R4 REST API: individual resource queries and the asynchronous bulk $export at /api/fhir/*, with SMART Backend Services auth (JWT/RS384) for unattended access. Second, the Researcher API: a free, citation-required access tier for .edu and .gov institutions at /signup/researcher. It is modeled on the access pattern of Wharton WRDS, CMS ResDAC, and AHRQ HCUP: programmatic access in exchange for a standard citation rather than a license fee. The researcher tier exposes research snapshots and the provider graph, including semantic search over 6.8M+ active NPI embeddings for natural-language queries by specialty, geography, or clinical context. For production teams outside academia, scoped pilot exports start at $2,500/mo. Static research files at /research need no account or API key at all. They are plain CSV and JSON downloads, so an exploratory analysis can begin before any credential exchange.
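The asynchronous $export flow can be sketched as a kick-off request followed by polling, following the general FHIR Bulk Data pattern (202 with a Content-Location header, then 200 with a manifest). This is a minimal sketch, not Fonteum's client: the host `example.org`, the `http_get` callable and its signature, and the fake server are all hypothetical. A real client would also send an `Authorization: Bearer` token obtained through SMART Backend Services.

```python
import json
import time


def run_bulk_export(http_get, base_url, poll_seconds=0.0, max_polls=10):
    """Kick off $export, poll the status URL, and return the manifest dict.

    http_get is a caller-supplied function (hypothetical signature):
    http_get(url, headers) -> (status_code, response_headers, body_text)
    """
    kickoff = {"Accept": "application/fhir+json", "Prefer": "respond-async"}
    status, headers, _ = http_get(f"{base_url}/$export", kickoff)
    if status != 202:
        raise RuntimeError(f"kick-off failed with HTTP {status}")
    status_url = headers["Content-Location"]

    for _ in range(max_polls):
        status, headers, body = http_get(status_url, {"Accept": "application/json"})
        if status == 200:  # job complete: the body is the manifest
            return json.loads(body)
        if status != 202:  # anything other than "in progress" is an error
            raise RuntimeError(f"export failed with HTTP {status}")
        time.sleep(poll_seconds)
    raise TimeoutError("export did not finish within max_polls")


def fake_http_get(url, headers):
    """In-memory stand-in for a server: one 'in progress' poll, then done."""
    if url.endswith("/$export"):
        return 202, {"Content-Location": "https://example.org/jobs/1"}, ""
    fake_http_get.polls += 1
    if fake_http_get.polls < 2:
        return 202, {"X-Progress": "in progress"}, ""
    manifest = {"transactionTime": "2026-07-12T00:00:00Z", "output": []}
    return 200, {}, json.dumps(manifest)


fake_http_get.polls = 0
manifest = run_bulk_export(fake_http_get, "https://example.org/api/fhir")
```

Injecting the HTTP function keeps the polling logic testable offline; in practice it could wrap any HTTP client the pipeline already uses.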

Does Fonteum publish research datasets for academic use?

Yes. Static research files listed at /research are downloadable without an account where a CSV or JSON distribution is published — for example the Nursing Home Deficiency & Harm Rate study (418,148 citation records across 14,635 facilities, with 5.59% of citations at scope/severity G or above, indicating actual harm). The researcher API tier at /signup/researcher gives .edu and .gov institutions programmatic access in exchange for a standard citation. Published study pages identify their available methodology, limitations, and source files; record-level citation fields can still be null or absent.
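For readers who want to recompute a harm-rate figure like the one above from a downloaded CSV, the calculation is a simple share of citations at scope/severity G through L. The column names and sample rows below are invented for illustration and are not the published file's schema.

```python
import csv
import io

# Scope/severity letters G through L cover actual harm and immediate jeopardy.
HARM_LEVELS = set("GHIJKL")


def harm_rate(rows, column="scope_severity"):
    """Share of citations at G or above; rows without a level are skipped."""
    total = harm = 0
    for row in rows:
        code = (row.get(column) or "").strip().upper()
        if not code:
            continue  # no recorded level, so it cannot be classified
        total += 1
        harm += code in HARM_LEVELS
    return harm / total if total else None


# Made-up sample: column names are hypothetical, not the published schema.
SAMPLE_CSV = "facility_id,scope_severity\n1,D\n1,G\n2,F\n2,\n3,J\n"
rate = harm_rate(csv.DictReader(io.StringIO(SAMPLE_CSV)))
```

Whether blank levels should be dropped or counted in the denominator is a methodology choice; the study page's own documentation should decide it.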

What scale of provider data can analytics teams expect from Fonteum?

The loaded NPPES layer includes 6.8M+ active provider records and had a newest system date of June 10 when checked July 12, 2026. Named study snapshots include nursing-home deficiency, PBJ staffing, SNF ownership, and OIG LEIE data. The /sources catalog reports publisher cadence, loaded observation where available, and redistribution posture; catalog presence does not establish an ingest. Join keys and provenance fields vary by source.

How does Fonteum compare to a commercial provider-data vendor for analytics?

The core difference is provenance and cost structure. Fonteum draws from named public-record sources — including CMS NPPES, PECOS, Care Compare, OIG LEIE, HRSA, BLS, and Census — and exposes available citation fields such as source, dataset identifier, observation date, and methodology version. Those fields vary by record and can be null or absent; no deterministic signature link is implied. Static research datasets at /research are free where published; .edu and .gov teams can use the researcher API. Production pipelines use the FHIR R4 bulk NDJSON $export for supported resources, and scoped pilot exports start at $2,500/mo.

FONTEUM · USE CASE · HEALTHCARE ANALYTICS

Healthcare provider data, ready for ETL.

Free CSV and JSON datasets, FHIR R4 bulk NDJSON export, and semantic search over the NPI graph.

Request access →

Data access stack

Bulk export · datasets · semantic search

Async NDJSON $export
FHIR R4 Bulk Export
ETL-ready NDJSON
HL7 FHIR R4 Bulk Data Access ($export) — async job queue, NDJSON output per supported resource type, SMART Backend Services auth (JWT/RS384). Directly loadable by Spark, Pandas, DuckDB, and BigQuery. A nullable provenance tag is returned only when the resource exposes those fields.
Explore →
Free CSV + JSON downloads
the documented federal-source catalog
Primary-source dataset layer
CMS NPPES (6.8M+ active providers), PECOS, Care Compare (8 module families, including NH deficiencies with 418,148 citation records), OIG LEIE (excluded parties), HRSA, BLS, and Census. Published study files are downloadable from /research where a distribution exists, with study-specific methodology and limitations.
Explore →
NPI embeddings
Semantic Search
Natural-language provider queries
Embeddings powering natural-language provider search across active NPI records. Query by specialty, geography, or clinical context. Public surface at /search; researcher API available for .edu/.gov institutions.
Explore →

Why analytics teams choose federal primary sources

Free, citable, auditable — no vendor dependency

Documented source pages with dataset-specific downloads

Static research files are freely downloadable from /research where a CSV or JSON distribution exists. Named study pages document their available methodology, source, and limitations; record-level citation fields can still be null or absent.

Pipeline-ready: NDJSON bulk export

For teams building production ETL pipelines, the FHIR R4 Bulk Data Access $export endpoint outputs NDJSON per supported resource type — directly loadable by Spark, Pandas, DuckDB, and BigQuery. The manifest includes per-type resource counts and the export timestamp. When a resource returns nullable provenance metadata in meta.tag, a downstream notebook can read its available citation fields; some responses omit them, and the tag is not a signature link.
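One way a pipeline might use the manifest's per-type counts is as a load check: sum the counts by resource type and compare them with the rows that actually landed. The sketch assumes the standard Bulk Data manifest shape (`output` entries with `type`, `url`, and an optional `count`); the manifest contents and the loaded-row numbers are made up.

```python
from collections import Counter


def manifest_counts(manifest):
    """Sum the optional per-file `count` values by resource type."""
    counts = Counter()
    for entry in manifest.get("output", []):
        counts[entry["type"]] += entry.get("count") or 0
    return dict(counts)


def reconcile(manifest, loaded_rows):
    """Return {type: (expected, loaded)} for every type whose totals differ."""
    expected = manifest_counts(manifest)
    return {
        rtype: (n, loaded_rows.get(rtype, 0))
        for rtype, n in expected.items()
        if n != loaded_rows.get(rtype, 0)
    }


# Illustrative manifest: URLs and numbers are invented for the example.
MANIFEST = {
    "transactionTime": "2026-07-12T00:00:00Z",
    "output": [
        {"type": "Practitioner", "url": "https://example.org/f/1.ndjson", "count": 2},
        {"type": "Practitioner", "url": "https://example.org/f/2.ndjson", "count": 3},
        {"type": "Organization", "url": "https://example.org/f/3.ndjson", "count": 1},
    ],
}
LOADED_ROWS = {"Practitioner": 5, "Organization": 0}
mismatches = reconcile(MANIFEST, LOADED_ROWS)
```

An empty result means every type loaded the number of rows the manifest promised; anything else is worth investigating before downstream jobs run.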

Researcher API: free for .edu/.gov teams

Academic and government research teams can use the citation-required researcher tier at /signup/researcher. Available endpoints include research snapshots and provider-graph access. Where a delivery pins a methodology version and saved snapshot, that artifact can be rechecked against those named inputs; this is not universal historical replay.

How it works

Ingest · Provenance · Deliver

Step 1 / Ingest

Pull directly from federal data portals

Load named public files and record the loaded source or observation date. On July 12, 2026, NPPES was observed at a June 10 system timestamp, OIG LEIE at a May 8 source date, and PBJ at a June 30, 2025 source date. Publisher cadence does not prove serving-table currency.
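The gap between the check date and each loaded date is plain date arithmetic, sketched below. The text gives only month and day for NPPES and OIG LEIE, so the year 2026 for those two is an assumption; the PBJ date is stated in full.

```python
from datetime import date

CHECKED_ON = date(2026, 7, 12)

# The year 2026 for NPPES and OIG LEIE is assumed; only month and day are given.
LOADED = {
    "NPPES": date(2026, 6, 10),
    "OIG LEIE": date(2026, 5, 8),
    "PBJ": date(2025, 6, 30),
}


def days_behind(loaded, checked=CHECKED_ON):
    """Days between each loaded date and the date the tables were checked."""
    return {name: (checked - loaded_on).days for name, loaded_on in loaded.items()}


lag = days_behind(LOADED)
# lag == {"NPPES": 32, "OIG LEIE": 65, "PBJ": 377}
```

A pipeline could compare these lags against each publisher's cadence to flag tables that have fallen more than one release behind.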

Step 2 / Provenance

Expose available record-level provenance

Where a value has a provider_field_provenance row, available source, observation-date, and limitation fields can be returned. They vary by record and can be null or absent. Supported FHIR responses can include the same nullable citation metadata in meta.tag; it is not a fact-to-signature link.

Step 3 / Deliver

Pipeline-ready access for analytics teams

Start with free CSV and JSON at /research (no account, no API key). Build production ETL on the FHIR R4 (US Core 6.1.0) bulk NDJSON $export, which is secured with SMART Backend Services auth and loads into Spark, pandas, DuckDB, and BigQuery, or query semantic search over NPI embeddings. The researcher API is free for .edu/.gov, and scoped pilot exports start at $2,500/mo.

Request access

Start with the free datasets.

Browse free research datasets at /research. Academic teams: free researcher API at /signup/researcher. Production pipelines: pilot tier from $2,500/mo.

Request access → or register for the researcher API →

FONTEUM · PILOT

Run a 90-day pilot. Public data only. No PHI.

Request access → Read the methodology