
FOR · RAG DEVELOPERS

FHIR R4 built for retrieval.

Pre-resolved references, flat JSON, retrieval-optimized Markdown at /md. p99 under 300 ms. Citation-ready meta.source on every resource.


52% fewer tokens via /md · pre-resolved refs · citation-ready provenance

The problem · FHIR + RAG

FHIR JSON is reference-heavy. The /md endpoint fixes that.

Nested references

In FHIR R4, a PractitionerRole links a Practitioner to an Organization and a Location. Assembling full context for one provider from a standard server can therefore take 4 round trips, one per resource.

Fonteum: All references are pre-resolved. The response you get is a flat, fully populated bundle ready for chunking.

Token bloat

Raw FHIR JSON carries coding system URIs, meta fields, and extension blocks your model doesn't need. A single Practitioner can run 300+ tokens.

Fonteum: The /md endpoint serializes the same resource as structured Markdown — same clinical data, 52% fewer tokens on average.

Missing citations

When an LLM cites a provider fact, you need a traceable source and date. Standard FHIR responses often leave those out.

Fonteum: Every Fonteum resource carries meta.source and a provenance tag block: source name, last-checked date, and display rule.


Token efficiency · JSON vs /md

The same clinical data. Half the tokens.

Compare the standard FHIR JSON response with the ?_format=md Markdown serialization. Both carry the same provenance and the same clinical data; the Markdown version takes fewer tokens in your context window.
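A read URL for either serialization might be built as below. This is a sketch under assumptions: read_url is a hypothetical helper, the {type}/{id} path follows the usual FHIR read pattern, and the base URL is the one used in this page's integration examples. Confirm the exact routes against the capability statement.

```python
from urllib.parse import urlencode

# Base URL as used in this page's integration examples.
BASE_URL = "https://fonteum.com/api/fhir/r4"


def read_url(resource_type: str, resource_id: str, markdown: bool = False) -> str:
    """Build a read URL; markdown=True appends the ?_format=md query parameter."""
    url = f"{BASE_URL}/{resource_type}/{resource_id}"
    if markdown:
        url += "?" + urlencode({"_format": "md"})
    return url


json_url = read_url("Practitioner", "prac-1003894328")
md_url = read_url("Practitioner", "prac-1003894328", markdown=True)
```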

Standard FHIR JSON response (58% more tokens than /md):
{
  "resourceType": "Practitioner",
  "id": "prac-1003894328",
  "meta": {
    "tag": [
      { "system": "fonteum:provenance", "code": "cms-nppes" },
      { "system": "fonteum:last-checked", "code": "2026-05-24" }
    ]
  },
  "identifier": [
    { "system": "http://hl7.org/fhir/sid/us-npi", "value": "1003894328" }
  ],
  "name": [{ "family": "Nguyen", "given": ["Emily"], "suffix": ["MD"] }],
  "address": [
    {
      "use": "work",
      "line": ["400 Park Ave"],
      "city": "New York",
      "state": "NY",
      "postalCode": "10022"
    }
  ],
  "qualification": [
    {
      "code": {
        "coding": [
          {
            "system": "http://nucc.org/provider-taxonomy",
            "code": "207RC0000X",
            "display": "Cardiovascular Disease"
          }
        ]
      }
    }
  ]
}
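The provenance tags in the response above can be pulled into a plain dictionary for citation use. A minimal sketch, assuming every provenance tag uses a system that starts with "fonteum:" as in the sample; the provenance function name is hypothetical.

```python
def provenance(resource: dict) -> dict:
    """Map each fonteum:* tag in meta.tag to its code, keyed by the tag name."""
    prefix = "fonteum:"
    tags = resource.get("meta", {}).get("tag", [])
    return {
        tag["system"][len(prefix):]: tag["code"]
        for tag in tags
        if tag.get("system", "").startswith(prefix) and "code" in tag
    }


# Trimmed copy of the sample Practitioner shown above.
sample = {
    "resourceType": "Practitioner",
    "id": "prac-1003894328",
    "meta": {
        "tag": [
            {"system": "fonteum:provenance", "code": "cms-nppes"},
            {"system": "fonteum:last-checked", "code": "2026-05-24"},
        ]
    },
}

tags = provenance(sample)
```

Resources without a meta block simply yield an empty dictionary, so the helper is safe to run over mixed bundles.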

LangChain · integration walkthrough

Retrieve providers with LangChain.

The FHIR retriever accepts natural-language queries and translates them to FHIR search parameters. Results come back as LangChain Document objects with metadata.source pre-populated from the Fonteum provenance block.

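import os

# Import path as given in this walkthrough; confirm it against your installed LangChain packages.
from langchain.retrievers import FHIRRetriever

retriever = FHIRRetriever(
    base_url="https://fonteum.com/api/fhir/r4",
    api_key=os.environ["FONTEUM_API_KEY"],  # read the key from the environment, not a literal "$..." string
    resource_type="Practitioner",
)
# Natural-language query; the retriever maps it to FHIR search parameters.
docs = retriever.get_relevant_documents("cardiologist New York")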

Each returned Document carries metadata.source (CMS federal registry), metadata.last_checked, and metadata.npi for citation generation.
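Those three metadata fields are enough to render a citation line. The sketch below works on a plain dictionary so it runs without LangChain; cite is a hypothetical helper, and the values are illustrative ones taken from the sample record on this page.

```python
def cite(metadata: dict) -> str:
    """Format a one-line citation from Document metadata; missing fields show as 'unknown'."""
    npi = metadata.get("npi", "unknown")
    source = metadata.get("source", "unknown")
    checked = metadata.get("last_checked", "unknown")
    return f"NPI {npi} (source: {source}; last checked {checked})"


# Illustrative metadata shaped like the fields described above.
example = {
    "source": "CMS federal registry",
    "last_checked": "2026-05-24",
    "npi": "1003894328",
}
line = cite(example)
```

With a real retriever result, the same helper would be called as cite(doc.metadata) for each returned Document.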


LlamaIndex · retriever

Index provider data with LlamaIndex.

Use the FHIR reader with use_markdown_endpoint=True to pull Markdown-serialized resources directly into a LlamaIndex VectorStoreIndex. Each document carries provenance metadata for downstream citation generation.

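import os

from llama_index.core import VectorStoreIndex
# Reader package as named in this walkthrough; confirm it is installed in your environment.
from llama_index.readers.fhir import FHIRReader

reader = FHIRReader(
    base_url="https://fonteum.com/api/fhir/r4",
    api_key=os.environ["FONTEUM_API_KEY"],  # read the key from the environment, not a literal "$..." string
    resource_types=["Practitioner", "Organization"],
    use_markdown_endpoint=True,  # fetch /md for 52% fewer tokens
)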

documents = reader.load_data(
    search_params={"address-state": "NY", "_count": 50}
)

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("cardiologists accepting Medicare in Manhattan")

Token counts · by resource type

Average token counts per resource.

Resource            JSON tokens   /md tokens   Reduction
Practitioner        312           148          −53%
Organization        298           131          −56%
Location            187           89           −52%
PractitionerRole    224           104          −54%
HealthcareService   341           162          −52%

Token counts measured with tiktoken cl100k_base on a representative sample of 500 records per resource type. Actual counts vary by record.
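The Reduction column follows directly from the two token columns. A short sketch of the arithmetic, using the averages from the table; reduction_pct is a hypothetical helper name.

```python
# (JSON tokens, /md tokens) per resource type, copied from the table above.
COUNTS = {
    "Practitioner": (312, 148),
    "Organization": (298, 131),
    "Location": (187, 89),
    "PractitionerRole": (224, 104),
    "HealthcareService": (341, 162),
}


def reduction_pct(json_tokens: int, md_tokens: int) -> int:
    """Percentage of tokens saved by /md relative to JSON, rounded to a whole percent."""
    return round(100 * (json_tokens - md_tokens) / json_tokens)


reductions = {name: reduction_pct(j, m) for name, (j, m) in COUNTS.items()}
```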


Latency benchmarks · under load

Sub-300 ms at p99.

Percentile   JSON endpoint   /md endpoint
p50          38 ms           22 ms
p95          142 ms          68 ms
p99          290 ms          138 ms
p99.9        480 ms          210 ms

Measured at the Vercel edge with 50 concurrent connections. Latency is measured from the gateway to the completed response. Source data is served from a warm CDN cache; a cold cache adds ~80 ms.
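To compare your own measurements with these figures, percentiles can be computed from raw latency samples. The sketch below uses the nearest-rank method, which is an assumption: the page does not say which percentile definition the benchmark used. The sample values are synthetic.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Synthetic latencies in ms: 1, 2, ..., 100 (not real benchmark data).
latencies = [float(ms) for ms in range(1, 101)]
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
p99 = percentile(latencies, 99)
```

Tail percentiles such as p99.9 need at least a thousand samples before they describe anything other than the single slowest request.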

Sourced from CMS and HHS-OIG. Fonteum, Inc., Delaware C-corp. © 2026.