Documents & OCR — overview
Inbox-driven document capture with AI classification and extraction.
The Documents module is the single front door for every piece of paper that hits your business. It runs an OCR + AI classification pipeline so that PDFs, scanned receipts and email attachments turn into structured records (bills, expenses, sales orders, contracts) with minimal human typing.
What it does
- Ingests documents from email, manual upload, or API.
- Stores them encrypted at rest, with PII fields encrypted under a separate key.
- Runs OCR and extracts the text content via Mistral Document AI.
- Classifies the document (supplier invoice, expense, contract, delivery note, etc.) with a confidence score.
- Pre-fills the matching entity (bill / expense / quote) and waits for your one-click validation.
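The last two steps can be pictured with a minimal sketch. It is illustrative only and not the module's actual data model: the class names, the `ENTITY_FOR` mapping, the status values and the assumption that confidence is a number between 0 and 1 are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class DocType(Enum):
    SUPPLIER_INVOICE = "supplier_invoice"
    EXPENSE = "expense"
    CONTRACT = "contract"
    DELIVERY_NOTE = "delivery_note"


class Status(Enum):
    AWAITING_VALIDATION = "awaiting_validation"
    VALIDATED = "validated"


@dataclass
class Classification:
    doc_type: DocType
    confidence: float  # assumption: a score between 0.0 and 1.0


@dataclass
class Draft:
    entity: str  # e.g. "bill" or "expense"
    fields: dict = field(default_factory=dict)
    status: Status = Status.AWAITING_VALIDATION


# Hypothetical mapping from document class to the entity that gets pre-filled.
# Only two classes are mapped here; the others are left out on purpose.
ENTITY_FOR = {
    DocType.SUPPLIER_INVOICE: "bill",
    DocType.EXPENSE: "expense",
}


def prefill(classification, extracted):
    """Build a draft from extracted fields; it always starts unvalidated."""
    entity = ENTITY_FOR.get(classification.doc_type)
    if entity is None:
        return None  # no matching entity in this sketch
    return Draft(entity=entity, fields=dict(extracted))


def validate(draft):
    """The one-click validation: the only way a draft becomes validated."""
    draft.status = Status.VALIDATED
    return draft
```

Note that `prefill` never looks at the confidence score when setting the status: in this sketch, a draft only becomes validated through an explicit call to `validate`.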
Why we don't auto-publish
Even when the confidence score is high, a human reviews every extraction before it posts to the ledger. This is a deliberate design choice for fiduciary-grade compliance: a single misread VAT amount could trigger a tax adjustment. The review step is fast (10–15 seconds per document) because everything is pre-filled.
Supported formats
- PDF (single or multi-page)
- PNG, JPG, HEIC (mobile photos welcome)
- EML (full email including HTML body and attachments)
- TIFF (legacy scans)
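If you upload through a script, a quick local check against this list can avoid sending files that will be rejected. The helper below is a sketch of such a check, not part of the product: the function name is hypothetical, and treating `.jpeg` and `.tif` as accepted alternate spellings is an assumption.

```python
from pathlib import Path

# Extensions taken from the list above.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".heic", ".eml", ".tiff"}

# Assumption: common alternate spellings are accepted as well.
ASSUMED_ALIASES = {".jpeg", ".tif"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension matches a supported format."""
    suffix = Path(filename).suffix.lower()
    return suffix in SUPPORTED_EXTENSIONS or suffix in ASSUMED_ALIASES
```

The check is case-insensitive, so `receipt.HEIC` from a phone passes, while a file with no extension does not.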
How long it takes
Most documents are processed in 5–30 seconds. Complex multi-page bills and image-only PDFs (scans saved as PDF) can take up to 2 minutes. You'll see a progress indicator on the document card.
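For API integrations, these timings suggest polling with a timeout of about 2 minutes. The sketch below shows one way to do that. It makes several assumptions: `fetch_status` is a callable you supply, the `"processing"` status string is hypothetical, and the 5-second interval is an arbitrary choice.

```python
import time


def wait_until_processed(fetch_status, timeout_s=120.0, interval_s=5.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it stops returning "processing".

    fetch_status is supplied by the caller (hypothetical); sleep and clock
    are injectable so the function can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status != "processing":
            return status
        if clock() >= deadline:
            raise TimeoutError("document still processing after timeout")
        sleep(interval_s)
```

The 120-second default matches the upper bound quoted above; a document that is still processing after that is worth checking by hand.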
Where to go next
- Uploading documents
- Validating extracted data