Validating extracted data

Review OCR output and convert documents into accounting entities.

After OCR, every document lands in one of three states:

| State | Meaning |
| --- | --- |
| extracted_high_confidence | Ready to convert; minimal review needed. |
| extracted_needs_review | Confidence below the auto-route threshold. |
| extraction_failed | OCR could not read enough text (rare; usually bad scan quality). |

The side-by-side validation view

Click any document to open the validation view. The left pane shows the original scan or PDF; the right pane shows the extracted fields. For each field:

  • Green check: confidence ≥ 90%, extracted verbatim.
  • Amber: confidence from 70% up to (but not including) 90%; verify visually.
  • Red: confidence < 70%; you must confirm.
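
The three markers can be sketched as a small banding function. This is a minimal illustration, not Numezis code: the function name `confidence_band` and the 0.0 to 1.0 confidence scale are assumptions. A value of exactly 90% is treated as green, following the "≥ 90%" rule, and exactly 70% as amber.

```python
def confidence_band(confidence: float) -> str:
    """Map a field confidence (assumed scale: 0.0 to 1.0) to a marker colour."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence >= 0.90:
        return "green"  # extracted verbatim
    if confidence >= 0.70:
        return "amber"  # verify visually
    return "red"        # user must confirm
```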

Click any value to edit. Corrections feed back into the classifier (per-tenant, never shared across customers) so accuracy improves over time.

Field-level extraction (what's captured)

| Field | Source | Notes |
| --- | --- | --- |
| Supplier name | OCR text + best match in supplier list | Fuzzy match within current tenant |
| VAT number | OCR pattern + UID format check | CHE-XXX.XXX.XXX format validated |
| IBAN | OCR pattern + checksum | Both IBAN and QR-IBAN supported |
| Invoice number | OCR | Used for duplicate detection (12-month window) |
| Issue/due date | OCR + locale parsing | DD.MM.YYYY (CH/EU) and ISO formats |
| Total + VAT | OCR + math check | Sum-of-lines is reconciled against total |
| QR-bill reference | OCR of the QR code | Numezis decodes the QR-bill payload per the SPC v2 spec |
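
To make the checks in the table concrete, here is a rough Python sketch of four of them: the UID format check, the IBAN checksum, date parsing, and the sum-of-lines reconciliation. It is illustrative only and does not reflect how Numezis implements them. All function names are hypothetical, the IBAN check is the standard ISO 13616 mod-97 test, and the 0.05 rounding tolerance is an assumption that does not come from this page.

```python
import re
from datetime import date, datetime
from decimal import Decimal

# Shape check only (CHE-XXX.XXX.XXX); does not verify the UID check digit
# and does not accept a trailing MWST/TVA/IVA suffix.
UID_PATTERN = re.compile(r"CHE-\d{3}\.\d{3}\.\d{3}")


def uid_format_ok(value: str) -> bool:
    return UID_PATTERN.fullmatch(value.strip()) is not None


def iban_checksum_ok(iban: str) -> bool:
    """Standard mod-97 IBAN check; QR-IBANs use the same checksum."""
    compact = iban.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", compact):
        return False
    # Move country code and check digits to the end, then map A=10 ... Z=35.
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def parse_invoice_date(text: str) -> date:
    """Try DD.MM.YYYY first, then ISO YYYY-MM-DD."""
    for fmt in ("%d.%m.%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


def totals_reconcile(line_amounts, stated_total, tolerance=Decimal("0.05")):
    """True if the line amounts sum to the stated total within `tolerance`.

    The tolerance is an assumed allowance for rounding, not a documented value.
    """
    return abs(sum(line_amounts, Decimal("0")) - stated_total) <= tolerance
```

For example, `iban_checksum_ok("CH93 0076 2011 6238 5295 7")` accepts the commonly cited Swiss sample IBAN, and changing its last digit makes the check fail.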

Conversion

Once you're happy with the data, click Convert to bill (or expense / quote / order, depending on classification). Numezis creates the entity, pre-filled, and links the original document to it. The document state becomes converted.

If the extraction is wrong and cannot be recovered (corrupt PDF, blurry photo with no usable text), mark the document as Discard and give a reason. Discarded documents are kept for 90 days, then deleted.
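
As a small worked example of the 90-day retention rule, the sketch below computes the earliest deletion date for a discarded document. The function name `purge_date` is hypothetical, and counting the 90 days from the calendar day of the discard is an assumption; the page does not say exactly when the window starts.

```python
from datetime import date, timedelta

RETENTION_DAYS = 90  # retention period stated for discarded documents


def purge_date(discarded_on: date) -> date:
    """Earliest date on which a discarded document may be deleted."""
    return discarded_on + timedelta(days=RETENTION_DAYS)
```

Under these assumptions, a document discarded on 1 January 2025 would become eligible for deletion on 1 April 2025.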