EXP-0016 — Mistral OCR 4: a hosted-SaaS pattern note (not clonable)
David Olsson
When you build a chatbot that needs to answer questions from a stack of PDFs (contracts, invoices, research papers), the first step is always the same: extract the text from those PDFs. That sounds easy but isn't. A real document has columns, tables, headers, footers, footnotes, scanned images of pages, handwritten margin notes, and the occasional rotated page. Getting all of that into clean structured text is called OCR (Optical Character Recognition), and getting it right enough to cite back to the specific paragraph in the original document, so a user can verify what the chatbot said, is harder still.
Mistral OCR 4 is a hosted commercial product from Mistral, the French AI company, that addresses this directly. You send it a document; it returns the text, the bounding boxes of where every block sits on the page, the type of each block (paragraph, table, footnote, etc.), and a per-word confidence score. The "4" indicates it's their fourth-generation OCR model. The pricing is $4 per 1,000 pages on the standard tier, $2 on a batch tier, and $5 on a higher-fidelity "Document AI" tier.
Forge is our experiment harness. It picked this up from a MarkTechPost article that David flagged with 🧪. Forge's job in these cases is to bench the project: clone, install, run, write up. But Mistral OCR 4 isn't a project; it's a hosted commercial service. There's no GitHub repo to clone, no library to install, and no way to use it without paying Mistral. Forge's sandbox is no-secrets by design (it never holds API keys), so the bench is hard-gated at the credential layer.
The honest result is a pattern note: explain what Mistral OCR 4 is, why forge can't bench it, what would change the verdict, and what reasonable open-source alternatives look like.
Status: experimented, result build-failed (not-clonable-hosted-saas). Source resolved to Mistral's hosted Document Processing API. No GitHub repo, no open weights, no self-hosted edition. The bench is hard-gated at the credential layer; forge's sandbox is no-secrets by design.
This is a forge writeup of Mistral OCR 4, sourced via a MarkTechPost article David flagged with 🧪 in #development.
TL;DR
- What it is. A hosted OCR + document-understanding API from Mistral that returns text + typed blocks + bounding boxes + per-word confidence for PDF / DOC / PPT / OpenDocument inputs.
- Why forge can't bench it. Hosted SaaS, no open weights, no self-hosted option, requires an API key. Forge's no-secrets sandbox cannot exercise it.
- Headline claims (per Mistral / MarkTechPost). 170 languages across 10 language groups; 72% win rate vs competitors per independent annotators; OlmOCRBench 85.20; OmniDocBench 93.07.
- Pricing. Standard $4 / 1k pages; Batch API $2 / 1k pages; Document AI mode $5 / 1k pages.
- Citation mechanics. Returns typed blocks (paragraph / table / heading / footnote / etc.) with bounding-box coordinates, enabling downstream RAG systems to ground citations to specific document locations.
What it actually does that's interesting
The benchmark numbers and the multi-language coverage matter, but the design choice that's most interesting to anyone building agentic RAG systems is the block typing + spatial coordinates combo. Most OCR APIs return text; this one returns text + a schema. That schema is what enables three downstream patterns:
- Source-grounded citations. A RAG chatbot can answer "the contract says X" with a clickable highlight that opens the original PDF at the exact paragraph the answer came from. The bounding box is the pointer.
- Redactions. Apply policy rules to typed blocks ("redact every block of type=signature in pages 1-3"). The spatial coordinates tell the redactor exactly which region of the page to black out, so the rule does not depend on fuzzy text matching.
- Human-in-the-loop verification. A reviewer sees the highlighted source for every machine-generated claim and can accept / reject per claim, not per document.
These aren't OCR features in the classical sense — they're agentic-RAG features that happen to be enabled by a particular OCR output shape.
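To make that output shape concrete, here is a minimal Python sketch of how typed blocks with bounding boxes could drive both citations and redaction. The `Block` fields, the block type names, and the `cite` and `redaction_regions` helpers are all hypothetical illustrations; they are not Mistral's documented response schema, which forge could not exercise.

```python
from dataclasses import dataclass

# Hypothetical block shape: field names are assumptions for illustration,
# not Mistral's documented response schema.
@dataclass(frozen=True)
class Block:
    page: int                                # 1-based page number
    kind: str                                # e.g. "paragraph", "table", "signature"
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates
    text: str


def cite(block: Block) -> dict:
    """Build a citation pointer a PDF viewer could use to highlight the source."""
    return {"page": block.page, "bbox": block.bbox, "quote": block.text[:80]}


def redaction_regions(blocks: list[Block], kind: str, pages: range) -> list[tuple[int, tuple]]:
    """Select the (page, bbox) regions to black out for one block type."""
    return [(b.page, b.bbox) for b in blocks if b.kind == kind and b.page in pages]


blocks = [
    Block(1, "paragraph", (72.0, 100.0, 540.0, 160.0), "The term of this agreement is 24 months."),
    Block(3, "signature", (72.0, 650.0, 300.0, 700.0), "J. Doe"),
    Block(5, "signature", (72.0, 650.0, 300.0, 700.0), "A. Roe"),
]

print(cite(blocks[0]))
# "Redact every block of type=signature in pages 1-3" selects only the page-3 block.
print(redaction_regions(blocks, "signature", range(1, 4)))
```

The point of the sketch is that neither helper touches the text content to decide where to act: the page number and bounding box are the pointer, which is what makes per-claim review and region-level redaction possible.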
Why this is a pattern note
Forge's operationalization rule says: for every non-build source, ask in this order: (1) can I implement the system the source describes? (2) can I implement a portable wrapper around what it offers? (3) is the source too thin to operationalize?
For Mistral OCR 4:
- (1) Implement the system. The system is a 1B+ parameter multimodal OCR model trained on proprietary data. Forge cannot reproduce this; the moat is the model, not the spec.
- (2) Implement a portable wrapper. Yes: the `tpa-pin-and-bench` template applies. A 50-line Python wrapper around the `/document/ocr` endpoint, with secrets-from-env and a friendly Pydantic schema, would be a real artifact. But to verify it works, forge needs an API key, which the sandbox doesn't carry.
- (3) Too thin? No: the spec is substantive enough to wrap.
So the right outcome is somewhere between (2) and a pattern note. We'd ship the wrapper if the sandbox had a key. We don't, so we ship the note and recommend the follow-up.
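The ordering of the three questions can be sketched as a small decision function. This is an illustrative reading of the rule, not forge's actual code: the `Outcome` names and the `can_verify_wrapper` flag (standing in for "the sandbox has what it needs to test the wrapper") are assumptions introduced here.

```python
from enum import Enum


class Outcome(Enum):
    IMPLEMENT_SYSTEM = "implement the system"
    PORTABLE_WRAPPER = "implement a portable wrapper"
    PATTERN_NOTE = "pattern note"


def operationalize(can_implement: bool, can_wrap: bool, can_verify_wrapper: bool) -> Outcome:
    """Walk the questions in order and return the first outcome that holds."""
    if can_implement:
        return Outcome.IMPLEMENT_SYSTEM
    if can_wrap and can_verify_wrapper:
        return Outcome.PORTABLE_WRAPPER
    # Either the source is too thin to wrap, or the wrapper cannot be
    # verified (here: no API key in the sandbox), so only the note ships.
    return Outcome.PATTERN_NOTE


# Mistral OCR 4 as described above: not reproducible, wrappable, not verifiable.
verdict = operationalize(can_implement=False, can_wrap=True, can_verify_wrapper=False)
print(verdict.value)
```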
What would change the verdict
Three concrete unlocks:
- An API key passed into the bench. Forge accepts secrets via env vars on the host plane (never in the sandbox data plane), but for safety the no-secrets default is locked in. The right move is a one-off opt-in bench with `MISTRAL_API_KEY` exposed via the orchestrator's host-only env, and a budget cap so a runaway probe doesn't burn $50 in test pages.
- Self-hostable open weights. Mistral has historically open-weighted their text models (Mistral 7B, Mixtral, etc.) under permissive licenses. If OCR 4 follows that pattern, even with a smaller distilled version, forge can bench self-hosted.
- A reference implementation in someone else's repo. If a competent open-source project ships a wrapper around the Mistral OCR API (with sample fixtures and recorded API responses for offline testing), forge can bench that without needing the key.
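The budget cap in the first unlock is simple arithmetic over the quoted prices: at $4 per 1,000 pages, $50 corresponds to 12,500 standard-tier pages. A minimal sketch of such a guard follows; the `PageBudget` class and `BudgetExceeded` exception are hypothetical names, not part of forge or of any Mistral client, and a real bench would call `charge` before each request.

```python
# Prices per 1,000 pages, taken from the pricing quoted in this note.
PRICE_PER_1K_USD = {"standard": 4.0, "batch": 2.0, "document_ai": 5.0}


class BudgetExceeded(RuntimeError):
    """Raised when a request would push the bench past its page allowance."""


class PageBudget:
    """Hypothetical guard that converts a dollar cap into a page allowance."""

    def __init__(self, cap_usd: float, tier: str = "standard") -> None:
        # Work in whole pages so repeated charges do not accumulate float error.
        self.pages_allowed = int(cap_usd * 1000 / PRICE_PER_1K_USD[tier])
        self.pages_used = 0

    def charge(self, pages: int) -> None:
        """Record pages about to be sent, refusing if the cap would be passed."""
        if self.pages_used + pages > self.pages_allowed:
            raise BudgetExceeded(
                f"{pages} more pages would exceed the {self.pages_allowed}-page allowance"
            )
        self.pages_used += pages


budget = PageBudget(cap_usd=1.00)   # $1.00 buys 250 standard-tier pages
budget.charge(200)                  # allowed: 200 of 250 pages used
try:
    budget.charge(100)              # refused: would need 300 pages
except BudgetExceeded as exc:
    print("stopped:", exc)
```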
Open alternatives worth a 🧪
If the value forge sees here is "OCR + typed blocks + bounding boxes for grounded RAG citations," there are open candidates that ship the same shape:
| Project | Posture |
|---|---|
| allenai/olmocr | Open-weight OCR specifically benchmarked at OlmOCRBench (which Mistral OCR 4 scores 85.20 on). Self-hostable. |
| mindee/doctr | Apache 2.0; ships text detection + recognition + table understanding. Mature. |
| opendatalab/MinerU | Active in 2026, focused on RAG-ready PDF extraction. |
| unstructured-io/unstructured | Hybrid open-core; the open library has been the de facto choice for "extract everything from anything for RAG" since 2024. |
Any of these would be a clean forge bench — clone, install, run on a fixture PDF, write up the typed-block output, and compare to what Mistral claims. If David 🧪s one in #development, forge picks it up automatically.
Reproducibility
| Item | Value |
|---|---|
| product page | https://mistral.ai/news/ocr-4 |
| model card | https://docs.mistral.ai/models/model-cards/ocr-4-0 |
| API docs | https://docs.mistral.ai/studio-api/document-processing/basic_ocr |
| pricing | $4 / 1k pages (standard), $2 (batch), $5 (Document AI) |
| forge verdict | build-failed (not-clonable-hosted-saas) → pattern note |
| benchmarks claimed | OlmOCRBench 85.20, OmniDocBench 93.07, 72% win rate per independent annotators |
There is no companion gist — there's no source to anchor. The article and Mistral's own docs are the canonical references; this post is the audit trail of forge's decision not to bench a hosted SaaS in the no-secrets sandbox.
See also
- EXP-0001 — AutoWiki (Factory.ai) — the first hosted-SaaS pattern note forge published.
- EXP-0007 — Pinokio — a related pattern: install verified, runtime gated. Pinokio gated on display server; Mistral OCR 4 gated on API key.
- Meet forge — the operationalization rule (decision tree).
Built and verified by forge. A hosted SaaS without open weights is a pattern note in our model; we don't fork or shadow the vendor, we document what we'd run if we could and point at open alternatives that exist today.