We watch OpenAlex for papers that cite us. Operator-moderated, then published.
Fonteum runs a daily OpenAlex poll that surfaces newly published papers referencing the Fonteum apex domain, our Zenodo methodology DOIs, or our HuggingFace datasets. Hits land in an operator moderation queue. Approved citations publish to /researchers within 24 hours of review. Rejected citations are soft-deleted (audit trail preserved). The model is intentionally human-in-the-loop, so false positives don’t reach the public surface.
Citations are the proof bar for a data layer.
Fonteum is positioned as a Bloomberg-grade healthcare data layer. The credibility loop closes when independent researchers cite our data. Manually grepping Google Scholar weekly doesn’t scale; the §sprint3-citation-discovery-pipeline wave automates discovery so we surface citations within days of publication, not months.
The pipeline cannot publish a false positive on its own: every discovered hit is gated through operator review at /admin/citations before it appears on /researchers.
Daily 09:00 UTC. 50-query daily budget.
The cron is an Inngest function (citation-discoverer) running on schedule 0 9 * * * (daily 09:00 UTC). On each fire it:
- Resets the in-process daily query budget (default 50 queries).
- Enumerates registered references via listFonteumReferences().
- Calls the OpenAlex polite-pool API per reference.
- Upserts new hits to discovered_citations with onConflict='openalex_id' + ignoreDuplicates=true.
- Emits a fonteum/citation.discovered event per NEW row (no event for re-discovered rows).
The OpenAlex polite-pool User-Agent includes a mailto: contact so we sit in the more generous rate-limit lane. Network failures, rate-limit responses, and OpenAlex 5xx errors degrade gracefully: the cron logs the failure but does not crash, and the next firing retries.
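The per-firing steps listed above can be sketched without Inngest or a database. This is a minimal illustration under stated assumptions, not the production function: `DailyBudget`, `ingestHits`, `toEvents`, and the in-memory `Map` standing in for the discovered_citations table are hypothetical names. Only the default budget of 50, the conflict key, and the event name come from the description above.

```typescript
// Hypothetical sketch of one cron pass; all names and shapes are assumptions.

interface Hit {
  openalex_id: string;
  title: string;
}

interface StoredRow extends Hit {
  status: "pending" | "approved" | "rejected";
}

// In-process query budget, reset at the start of each firing (default 50).
class DailyBudget {
  private readonly limit: number;
  private remaining: number;

  constructor(limit: number = 50) {
    this.limit = limit;
    this.remaining = limit;
  }

  reset(): void {
    this.remaining = this.limit;
  }

  // Spends one query and returns true if budget remains, otherwise false.
  trySpend(): boolean {
    if (this.remaining <= 0) return false;
    this.remaining -= 1;
    return true;
  }
}

// Mimics an upsert keyed on openalex_id with ignoreDuplicates=true:
// existing rows are left untouched, and only newly inserted rows are returned.
function ingestHits(table: Map<string, StoredRow>, hits: Hit[]): StoredRow[] {
  const inserted: StoredRow[] = [];
  for (const hit of hits) {
    if (table.has(hit.openalex_id)) continue; // re-discovered: no write, no event
    const row: StoredRow = { ...hit, status: "pending" };
    table.set(hit.openalex_id, row);
    inserted.push(row);
  }
  return inserted;
}

// One event per NEW row; re-discovered rows never reach this function.
function toEvents(rows: StoredRow[]): { name: string; data: { openalex_id: string } }[] {
  return rows.map((row) => ({
    name: "fonteum/citation.discovered",
    data: { openalex_id: row.openalex_id },
  }));
}
```

Because `ingestHits` returns only the rows it actually inserted, the event step needs no separate "is this new?" check, and a previously rejected row keeps its status when the same paper is found again.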
Three reference types, one registry.
The set of things we ask OpenAlex to find citations to is a typed registry at src/lib/citations/references-registry.ts. There are three kinds of reference value:
- apex_url — the Fonteum apex domain (fonteum.com). Catches papers that cite a research page, sample dataset, or methodology page hosted on the brand hub.
- zenodo_doi — DOIs for the Fonteum methodology releases on Zenodo. Catches papers that pin a specific methodology version (preferred for reproducibility).
- hf_dataset — HuggingFace dataset identifiers for Fonteum-published datasets. Catches papers that cite datasets directly, often in ML/AI research contexts.
Adding a new reference is registry-only — no cron-side code changes. New entries pick up on the next 09:00 UTC firing.
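To illustrate why a new reference needs no cron-side change, here is a minimal sketch of what a typed registry and a per-reference request builder could look like. The real file at src/lib/citations/references-registry.ts may be shaped differently. The `FonteumReference` interface, `buildOpenAlexRequest`, the User-Agent string, the `per-page` value, and the choice of the `fulltext.search` filter are all assumptions, and the Zenodo DOI and HuggingFace identifier below are placeholders, not real Fonteum identifiers.

```typescript
// Hypothetical registry shape; only the three type names, the apex domain,
// and listFonteumReferences() are taken from the document.

type ReferenceType = "apex_url" | "zenodo_doi" | "hf_dataset";

interface FonteumReference {
  type: ReferenceType;
  value: string;
}

const REFERENCES: readonly FonteumReference[] = [
  { type: "apex_url", value: "fonteum.com" },
  { type: "zenodo_doi", value: "10.5281/zenodo.0000000" }, // placeholder DOI
  { type: "hf_dataset", value: "example-org/example-dataset" }, // placeholder id
];

function listFonteumReferences(): readonly FonteumReference[] {
  return REFERENCES;
}

interface OpenAlexRequest {
  url: string;
  headers: Record<string, string>;
}

// Builds one polite-pool request for a reference. The filter choice and the
// User-Agent product name are assumptions made for this sketch.
function buildOpenAlexRequest(ref: FonteumReference, contactEmail: string): OpenAlexRequest {
  const url =
    "https://api.openalex.org/works" +
    `?filter=fulltext.search:${encodeURIComponent(ref.value)}` +
    "&per-page=25";
  return {
    url,
    headers: { "User-Agent": `fonteum-citation-discoverer (mailto:${contactEmail})` },
  };
}

// The cron side only iterates the registry, so a new entry is picked up as-is.
function buildAllRequests(contactEmail: string): OpenAlexRequest[] {
  return listFonteumReferences().map((ref) => buildOpenAlexRequest(ref, contactEmail));
}
```

With this shape, appending a fourth entry to `REFERENCES` changes the output of `buildAllRequests` without touching any other function.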
Operator review at /admin/citations.
Every discovered row lands with status='pending'. The operator reviews at /admin/citations (cookie-auth, behind isAdminAuthed). Two actions are available:
- Approve → flips status to approved. The row is now publicly visible on /researchers (RLS enforces this; see §6).
- Reject → flips status to rejected. Soft-delete: the row stays in the database with full audit trail (discovered_at, reviewed_at, reviewed_by). It will not be re-presented to the operator unless the OpenAlex ID is intentionally re-queued.
The /admin/citations surface is Disallowed in robots.txt; the public anon client cannot read pending or rejected rows even if it tried.
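The two operator actions amount to a small state transition that stamps the audit fields. The sketch below is one way to express that as a pure function. `reviewCitation` and the `CitationRow` shape are hypothetical, and the rule that only pending rows can be reviewed is an assumption added for the example rather than documented behavior.

```typescript
type Status = "pending" | "approved" | "rejected";
type ReviewAction = "approve" | "reject";

// Subset of a discovered_citations row; the shape is assumed for this sketch.
interface CitationRow {
  openalex_id: string;
  status: Status;
  discovered_at: string;
  reviewed_at: string | null;
  reviewed_by: string | null;
}

// Returns a new row with the decision and audit fields applied.
// Nothing is deleted: a rejection is just another status value.
function reviewCitation(
  row: CitationRow,
  action: ReviewAction,
  reviewer: string,
  now: Date,
): CitationRow {
  if (row.status !== "pending") {
    // Assumption: reviewed rows must be re-queued before a second review.
    throw new Error(`Row ${row.openalex_id} has already been reviewed`);
  }
  return {
    ...row,
    status: action === "approve" ? "approved" : "rejected",
    reviewed_at: now.toISOString(),
    reviewed_by: reviewer,
  };
}
```

Passing `now` in as a parameter keeps the function deterministic, which makes the audit fields easy to assert on in tests.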
discovered_citations row anatomy.
One row per discovered paper. UNIQUE(openalex_id) prevents duplicate ingestion. Sample row in JSON form, shown as it lands in the moderation queue (the public anon client only ever sees rows with status='approved'):
    {
      "openalex_id": "W4406325891",
      "title": "Geographic disparities in dermatology access...",
      "authors_json": [
        { "display_name": "A. Researcher", "orcid": "0000-0001-...", "institutions": ["Harvard Medical School"] }
      ],
      "journal": "JAMA Dermatology",
      "published_date": "2026-04-12",
      "doi": "10.1001/jamadermatol.2026.0123",
      "fonteum_reference_type": "apex_url",
      "fonteum_reference_value": "fonteum.com",
      "status": "pending"
    }

authors_json is a JSONB array of { id, orcid, display_name, institutions[] } objects projected from OpenAlex’s authorships structure. citation_context (when present) is the surrounding-text snippet OpenAlex returns for the citation; we surface it on the moderation queue to help the operator confirm the reference is real.
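The projection from OpenAlex authorships into authors_json could look roughly like the sketch below. `projectAuthors` and both interfaces are hypothetical. The input type covers only the fields used here and is an assumed subset of an OpenAlex authorship entry, and any normalization of the ORCID value is deliberately left out.

```typescript
// Assumed subset of an OpenAlex authorship entry; only the fields used here.
interface OpenAlexAuthorship {
  author: { id: string; display_name: string; orcid: string | null };
  institutions: { display_name: string }[];
}

// Shape of one authors_json element, following the field list above.
interface AuthorJson {
  id: string;
  orcid: string | null;
  display_name: string;
  institutions: string[];
}

// Flattens each authorship into the compact object stored in authors_json.
function projectAuthors(authorships: OpenAlexAuthorship[]): AuthorJson[] {
  return authorships.map((a) => ({
    id: a.author.id,
    orcid: a.author.orcid, // passed through unchanged in this sketch
    display_name: a.author.display_name,
    institutions: a.institutions.map((inst) => inst.display_name),
  }));
}
```

Keeping institutions as plain display names matches the sample row, at the cost of dropping any institution identifiers the upstream record may carry.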
Public reads only see approved.
The discovered_citations table is RLS-enabled with one policy:
- SELECT for anon: allowed only where status = 'approved'. Pending and rejected rows are invisible to the public anon client.
- INSERT / UPDATE / DELETE for anon: denied. Only the service-role client (cron + admin actions) can mutate.
The /researchers page reads via the public anon client, so the RLS policy is the publishing boundary. The /admin/citations page reads via the service-role client to bypass RLS for the moderation queue.
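The access rules above are enforced in the database, not in application code. Purely as an aid for reasoning about them (or for unit-testing code that depends on them), they can be modeled as a small predicate. `isAllowed` and the `Role` and `Operation` types are hypothetical names for this sketch, which is a model of the policy and not the policy itself.

```typescript
type Role = "anon" | "service_role";
type Operation = "select" | "insert" | "update" | "delete";
type RowStatus = "pending" | "approved" | "rejected";

// Models the publishing boundary described above:
// - service_role bypasses RLS entirely (cron and admin actions);
// - anon may only SELECT rows whose status is 'approved'.
function isAllowed(role: Role, op: Operation, rowStatus: RowStatus): boolean {
  if (role === "service_role") return true;
  return op === "select" && rowStatus === "approved";
}

// Convenience helper: what an anon read of a mixed set of rows would return.
function visibleToAnon<T extends { status: RowStatus }>(rows: T[]): T[] {
  return rows.filter((row) => isAllowed("anon", "select", row.status));
}
```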
/researchers, alongside operator-curated.
Approved discovered citations render in the “Cited in” section of /researchers alongside any operator-curated FEATURED_CITATIONS entries. Each auto-discovered card carries an “auto-discovered via OpenAlex” badge so the surface is honest about provenance: readers can distinguish operator-curated entries from automated detection.
Each discovered card cross-links to the DOI URL (preferred), then the citing paper’s URL, then the OpenAlex page as last-resort fallback.
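The link preference order above is a simple fallback chain. A minimal sketch follows, assuming a hypothetical `url` column for the citing paper's own URL and a hypothetical `resolveCitationLink` helper. The doi.org and openalex.org URL patterns are the standard public resolvers for those identifiers.

```typescript
// Fields needed to pick a link; `url` is a hypothetical column name.
interface LinkFields {
  openalex_id: string;
  doi: string | null;
  url: string | null;
}

// Prefers the DOI URL, then the paper's own URL, then the OpenAlex page.
function resolveCitationLink(row: LinkFields): string {
  if (row.doi) return `https://doi.org/${row.doi}`;
  if (row.url) return row.url;
  return `https://openalex.org/${row.openalex_id}`;
}
```

Since openalex_id is always present on a row, the function always returns a usable link even when both optional fields are missing.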
Reject is soft-delete; audit trail preserved.
OpenAlex full-text search has higher recall than its strict referenced_works filter, which means we will sometimes pull in papers that mention Fonteum in passing without actually citing the data (e.g. a methodology comparison paper, an editorial referencing our brand). The operator rejects these.
Rejected rows stay in the database with status='rejected', reviewed_at, and reviewed_by populated. This preserves the operator’s decision and prevents the same paper from re-surfacing on the next cron firing (the onConflict='openalex_id' upsert with ignoreDuplicates=true sees the existing row and does nothing).
Phase 1 ships discovery + moderation. Phases 2-3 add notifications + analytics.
- Phase 1 (this wave): daily OpenAlex poll across three reference types + moderation queue at /admin/citations + RLS-enforced publish to /researchers.
- §sprint3-citation-notifications (queued): email-on-discovery to the operator (so review can happen within hours of detection rather than waiting for the next /admin/citations check).
- §sprint3-citation-analytics (queued): /admin/citations/analytics dashboard — citations-per-month, top citing journals, geographic distribution of citing institutions.
- §sprint3-citation-attestation (queued): ORCID-linked attestation flow — citing researchers can claim their own citation entry to enrich author metadata.