Documents & OCR — overview

Inbox-driven document capture with AI classification and extraction.

The Documents module is the single front door for every piece of paper that hits your business. It runs an OCR + AI classification pipeline so that PDFs, scanned receipts and email attachments turn into structured records (bills, expenses, sales orders, contracts) with minimal human typing.

What it does

  1. Ingests documents from email, manual upload, or API.
  2. Stores them encrypted at rest (PII fields with a separate key).
  3. Runs OCR and extracts the text content via Mistral Document AI.
  4. Classifies the document (supplier invoice, expense, contract, delivery note, etc.) with a confidence score.
  5. Pre-fills the matching entity (bill / expense / quote) and waits for your one-click validation.
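
As a rough illustration of how the five steps above chain together, here is a minimal Python sketch. Every name in it (`Document`, `Status`, `run_pipeline`, and the `ocr`, `classify` and `prefill` callables) is hypothetical and chosen for this example; it is not the module's actual API, and encryption at rest is not modelled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional, Tuple


class Status(Enum):
    RECEIVED = "received"
    STORED = "stored"
    EXTRACTED = "extracted"
    CLASSIFIED = "classified"
    AWAITING_VALIDATION = "awaiting_validation"


@dataclass
class Document:
    source: str                      # "email", "upload" or "api"
    payload: bytes                   # raw file content
    status: Status = Status.RECEIVED
    text: str = ""
    doc_type: Optional[str] = None
    confidence: float = 0.0
    draft: dict = field(default_factory=dict)


def run_pipeline(
    doc: Document,
    ocr: Callable[[bytes], str],
    classify: Callable[[str], Tuple[str, float]],
    prefill: Callable[[str, str], dict],
) -> Document:
    """Walk one document through the stages, stopping before validation."""
    doc.status = Status.STORED       # step 2 (encryption is not modelled here)
    doc.text = ocr(doc.payload)      # step 3: OCR and text extraction
    doc.status = Status.EXTRACTED
    doc.doc_type, doc.confidence = classify(doc.text)   # step 4
    doc.status = Status.CLASSIFIED
    doc.draft = prefill(doc.doc_type, doc.text)         # step 5
    doc.status = Status.AWAITING_VALIDATION             # a human takes over
    return doc
```

The stages are passed in as plain callables so the sketch runs with stubs; the real OCR step would call Mistral Document AI, which is deliberately not reproduced here.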

Why we don't auto-publish

Even with high confidence, every extraction is reviewed by a human before it posts to the ledger. This is a deliberate design choice for fiduciary-grade compliance: a single misread VAT amount could trigger a tax adjustment. The review step is fast (10–15 seconds per document) because everything is pre-filled.
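
The rule described here can be pictured as a gate that looks only at human approval and never at the confidence score. The following Python sketch is an assumption-laden illustration: `Extraction`, `approve` and `can_post_to_ledger` are hypothetical names, not part of the product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Extraction:
    doc_type: str
    confidence: float                  # classifier score between 0.0 and 1.0
    fields: dict                       # pre-filled values awaiting review
    approved_by: Optional[str] = None  # set only when a human validates


def approve(extraction: Extraction, reviewer: str) -> Extraction:
    """Record the one-click validation by a named reviewer."""
    if not reviewer:
        raise ValueError("a reviewer is required")
    extraction.approved_by = reviewer
    return extraction


def can_post_to_ledger(extraction: Extraction) -> bool:
    # Confidence is deliberately ignored: only human approval unlocks posting.
    return extraction.approved_by is not None
```

In this sketch an extraction with a confidence of 0.99 is still blocked until `approve` has been called.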

Supported formats

  • PDF (single or multi-page)
  • PNG, JPG, HEIC (mobile photos welcome)
  • EML (full email including HTML body and attachments)
  • TIFF (legacy scans)
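
If you want to filter files on your side before sending them, a check along these lines may help. It is a minimal Python sketch: `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical names, and the set contains only the extensions listed above. Whether aliases such as `.jpeg` or `.tif` are accepted is not stated here, so they are left out.

```python
from pathlib import Path

# Extensions taken from the list above; aliases are intentionally omitted.
SUPPORTED_EXTENSIONS = frozenset({".pdf", ".png", ".jpg", ".heic", ".eml", ".tiff"})


def is_supported(filename: str) -> bool:
    """Return True when the file extension is in the documented list."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

The comparison is case-insensitive, so a phone photo named `receipt.HEIC` passes the check.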

How long it takes

Most documents are processed in 5–30 seconds. Complex multi-page bills or image-only PDFs (scans saved as PDF) can take up to 2 minutes. You'll see a progress indicator on the document card.
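
For integrations that wait on a result, the timings above suggest a polling loop with a ceiling of about two minutes. The Python sketch below is only an illustration under assumptions: `get_status` stands in for whatever status lookup you have, and the `"processing"` status string is invented for this example.

```python
import time
from typing import Callable


def wait_for_processing(
    get_status: Callable[[], str],
    timeout_s: float = 120.0,     # the 2-minute upper bound mentioned above
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the status leaves "processing" or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status != "processing":
            return status
        if clock() >= deadline:
            raise TimeoutError("document still processing after timeout")
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist so the loop can be exercised in tests without real waiting.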

Where to go next

  • Uploading documents
  • Validating extracted data