The Short Version
Every price on ClearPrice traces back to a machine-readable file published by the hospital itself, as required by federal law. We do not generate, estimate, or modify prices. We parse, index, and display what the hospital has already made public.
Data Sources
Hospital Directory
Our master list of U.S. hospitals comes from the CMS Hospital General Information dataset (data.cms.gov/provider-data/dataset/xubh-q36u), which contains roughly 5,400 acute-care and critical-access hospitals. For each hospital we store: name, CMS Certification Number (CCN), street address, city, state, and ZIP.
Pricing Files
Under 45 CFR Part 180, every hospital must publish:
- A comprehensive machine-readable file of all items and services with standard charges
- A consumer-friendly list of at least 300 shoppable services
We locate each hospital's file via three mechanisms, in order of preference:
- Direct health system scrapers: For major systems such as Providence, we parse the system's transparency landing page
- Community URL datasets: We cross-reference the public DoltHub hospital-price-transparency repository
- Manual curation: Missing URLs can be added via our admin interface
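The "in order of preference" lookup can be pictured as a simple fallback chain. The sketch below is illustrative only: the function name, the resolver callables, and the sample CCNs and URLs are hypothetical stand-ins, not our production code.

```python
from typing import Callable, Optional, Sequence

# A resolver takes a CMS Certification Number (CCN) and returns a URL or None.
Resolver = Callable[[str], Optional[str]]


def resolve_file_url(ccn: str, resolvers: Sequence[Resolver]) -> Optional[str]:
    """Return the first URL found, trying resolvers in order of preference."""
    for resolver in resolvers:
        url = resolver(ccn)
        if url:
            return url
    return None


# Hypothetical in-memory stand-ins for the three sources (made-up CCNs and URLs).
system_scraper = {"000001": "https://example.org/system/mrf.json"}.get
community_dataset = {
    "000001": "https://example.org/stale.json",
    "000002": "https://example.org/community.json",
}.get
manual_entries = {"000003": "https://example.org/manual.json"}.get

RESOLVERS = [system_scraper, community_dataset, manual_entries]
```

With these stand-ins, a hospital known to both the system scraper and the community dataset resolves to the scraper's URL, because that source is tried first.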
Ingestion Pipeline
Each hospital goes through the same deterministic pipeline:
- URL validation: We send a HEAD request to the hospital's file URL to confirm it is reachable.
- Change detection: We compare the Last-Modified HTTP header to our stored value. If unchanged, we skip the hospital.
- Streaming parse: We stream the file (which can be 50 MB to 800 MB) through an incremental JSON parser, never loading it fully into memory.
- Schema mapping: We map each record to our normalized schema (procedure description, CPT/HCPCS/DRG code, gross charge, cash price, min/max negotiated, per-payer rates).
- Upsert: We insert new procedures and upsert charges keyed on (hospital, procedure, setting).
- Logging: Every run is recorded with row count, duration, and any error.
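As a rough illustration of the validation, change-detection, and upsert steps, here is a minimal sketch using only the Python standard library. Several things are assumptions made for the example: SQLite stands in for the production database, the table and column names are hypothetical, and a missing Last-Modified header is treated as "changed" so the file is re-ingested.

```python
import sqlite3
import urllib.request
from typing import Optional


def fetch_last_modified(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send a HEAD request; return the Last-Modified header if the server sets one."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Last-Modified")


def should_skip(stored: Optional[str], current: Optional[str]) -> bool:
    """Skip only when the server reports a value and it equals the stored one.

    Assumption: a missing header counts as 'changed'.
    """
    return current is not None and current == stored


# Hypothetical table; the composite key mirrors (hospital, procedure, setting).
SCHEMA = """
CREATE TABLE charges (
    hospital_id  TEXT NOT NULL,
    procedure_id TEXT NOT NULL,
    setting      TEXT NOT NULL,
    gross_charge REAL,
    cash_price   REAL,
    PRIMARY KEY (hospital_id, procedure_id, setting)
)
"""

UPSERT = """
INSERT INTO charges (hospital_id, procedure_id, setting, gross_charge, cash_price)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (hospital_id, procedure_id, setting) DO UPDATE SET
    gross_charge = excluded.gross_charge,
    cash_price   = excluded.cash_price
"""


def upsert_charge(conn: sqlite3.Connection, row: tuple) -> None:
    """Insert a charge, or overwrite the prices if the key already exists."""
    conn.execute(UPSERT, row)
```

Keying the upsert on the composite key means a nightly re-run over an updated file overwrites prices in place rather than accumulating duplicate rows.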
Update Frequency
Our scheduler re-runs the full pipeline nightly at 2:00 AM Pacific. A hospital's data on ClearPrice is therefore at most ~24 hours behind whatever the hospital most recently published.
Hospitals vary widely in how often they update: some republish monthly, others quarterly, and some have not updated since the initial 2021 deadline. We display the last_fetched date on every hospital page so you can judge freshness.
What We Store Per Charge
- Gross charge: The "rack rate": the hospital's undiscounted list price, before any insurer or self-pay discount is applied
- Cash/discounted price: The self-pay rate, typically much lower than gross
- Minimum negotiated rate: The lowest rate any insurer has negotiated
- Maximum negotiated rate: The highest rate any insurer has negotiated
- Per-payer negotiated rates: Specific rates for each named insurance plan, where published
- Setting: inpatient, outpatient, or other context where provided
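To make the shape of a stored charge concrete, the fields above could be modeled roughly as follows. This is a sketch, not our actual schema: the class name, field names, and the consistency check are illustrative assumptions, and floats are used for brevity where a real system would more likely use a decimal type.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Charge:
    """One normalized charge row (hypothetical field names)."""

    hospital_ccn: str
    procedure_description: str
    code: Optional[str] = None            # CPT, HCPCS, or DRG code, if published
    setting: Optional[str] = None         # e.g. "inpatient" or "outpatient"
    gross_charge: Optional[float] = None
    cash_price: Optional[float] = None
    min_negotiated: Optional[float] = None
    max_negotiated: Optional[float] = None
    # Payer or plan name -> negotiated rate, where the hospital publishes it.
    payer_rates: Dict[str, float] = field(default_factory=dict)

    def negotiated_range_is_consistent(self) -> bool:
        """True if every per-payer rate falls within the published min/max.

        Rows missing either bound are treated as consistent (nothing to check).
        """
        if self.min_negotiated is None or self.max_negotiated is None:
            return True
        if self.min_negotiated > self.max_negotiated:
            return False
        return all(
            self.min_negotiated <= rate <= self.max_negotiated
            for rate in self.payer_rates.values()
        )
```

Almost every field is optional because hospitals differ in which values they publish; only the hospital identifier and the procedure description are assumed to always be present.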
Format Handling
Hospitals publish in a variety of formats. Our parser currently supports:
- ✅ JSON (CMS 2024 standard) — fully supported
- ⏳ CSV, XLSX, ZIP — planned
- ❌ HTML landing pages — not parseable (no structured data)
Search
We index every procedure description with a PostgreSQL tsvector for full-text search and a pg_trgm index for fuzzy autocomplete. Results are ranked by ts_rank and secondarily by cash price.
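For readers curious why trigram matching tolerates typos, the following pure-Python sketch approximates the similarity score that pg_trgm computes: each word is lowercased, padded with two leading spaces and one trailing space, split into three-character windows, and the two trigram sets are compared by shared trigrams over total distinct trigrams. It is an illustration only; in production the scoring happens inside PostgreSQL, and the function names here are our own.

```python
import re
from typing import Set


def trigrams(text: str) -> Set[str]:
    """Return the set of padded trigrams for every alphanumeric word in text."""
    grams: Set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rank_suggestions(query: str, descriptions: list) -> list:
    """Order candidate descriptions from most to least similar to the query."""
    return sorted(descriptions, key=lambda d: trigram_similarity(query, d), reverse=True)
```

A misspelled query such as "colonscopy" still shares most of its trigrams with "colonoscopy", so that description ranks ahead of unrelated procedures that merely end in "scopy".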
AI Summaries and Q&A
When you ask the AI assistant a question, we:
- Run a similarity search to find the top 5 matching procedures
- Pull the top 10 charges across hospitals for each match
- Send the question plus this structured context to Claude
- Return Claude's plain-English response with a standard disclaimer
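The context-assembly part of these steps might look something like the sketch below. It deliberately stops before any model call, and it rests on assumptions: the function name and dictionary keys are hypothetical, and "top 10 charges" is interpreted here as the ten lowest cash prices, which the steps above do not specify.

```python
from typing import Dict, List


def build_prompt(
    question: str,
    matches: List[Dict],
    max_matches: int = 5,
    max_charges: int = 10,
) -> str:
    """Combine the user's question with a bounded block of pricing context.

    Each match is assumed to look like:
    {"description": str, "code": str,
     "charges": [{"hospital": str, "cash_price": float}, ...]}
    """
    lines = [f"Question: {question}", "", "Pricing context:"]
    for match in matches[:max_matches]:
        lines.append(f"- {match['description']} ({match['code']})")
        # Assumption: "top" charges means the lowest cash prices first.
        cheapest = sorted(match["charges"], key=lambda c: c["cash_price"])
        for charge in cheapest[:max_charges]:
            lines.append(f"    {charge['hospital']}: cash ${charge['cash_price']:,.2f}")
    return "\n".join(lines)
```

Capping both the number of matches and the charges per match keeps the context small and predictable: at most 50 price lines accompany any question.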
Beyond the question you type, the AI sees only the numerical pricing context. It does not see your identity, browsing history, or any medical information other than what you include in your question. AI responses may contain errors; always verify with the source.
Accuracy Caveats
Be aware of the following:
- Hospitals sometimes publish incorrect or malformed data. Our parser records these errors in that hospital's ingest_log.
- Many hospitals list charges under internal "billing codes" rather than CPT/HCPCS codes, making cross-hospital comparison harder.
- A payer's "negotiated rate" in the file may not reflect bundled discounts, value-based contracts, or site-of-service adjustments.
- The "gross charge" is rarely what anyone actually pays.
Corrections
If you spot an error — wrong price, outdated URL, missing hospital, broken link — email us at corrections@clearpricehealth.org with a link to the hospital and the specific issue. We investigate and correct verified issues promptly.
Open Data
The underlying machine-readable files, published by hospitals under CMS rules, are public record. Nothing in our data is proprietary; it all came from the hospitals themselves. Our value-add is parsing, indexing, and presentation.