Validating extracted data

After OCR, every document lands in one of three states:

State	Meaning
`extracted_high_confidence`	Ready to convert; minimal review needed.
`extracted_needs_review`	Confidence below the auto-route threshold.
`extraction_failed`	OCR could not read enough text (rare; usually bad scan quality).

The side-by-side validation view

Click any document to open the validation view. The left pane shows the original scan or PDF; the right pane shows the extracted fields. For each field:

Green check: confidence ≥ 90%, extracted verbatim.
Amber: confidence from 70% to just under 90%; verify visually against the original.
Red: confidence < 70%; you must confirm the value.
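The banding above can be sketched in a few lines of Python. This is only an illustration of the thresholds listed here, not Numezis code: the function names `confidence_band` and `fields_requiring_confirmation` are hypothetical, and representing confidence as a fraction between 0.0 and 1.0 is an assumption.

```python
def confidence_band(confidence: float) -> str:
    """Map a field confidence (assumed to be a fraction in [0.0, 1.0]) to a band."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence >= 0.90:
        return "green"  # extracted verbatim
    if confidence >= 0.70:
        return "amber"  # verify visually
    return "red"        # must be confirmed by the user


def fields_requiring_confirmation(fields: dict[str, float]) -> list[str]:
    """Return the names of red fields, sorted alphabetically."""
    return sorted(
        name for name, conf in fields.items() if confidence_band(conf) == "red"
    )
```

For example, a document whose IBAN was read at 0.65 and whose supplier name was read at 0.95 would report only the IBAN as requiring confirmation.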

Click any value to edit. Corrections feed back into the classifier (per-tenant, never shared across customers) so accuracy improves over time.

Field-level extraction (what's captured)

Field	Source	Notes
Supplier name	OCR text + best match in supplier list	Fuzzy match within current tenant
VAT number	OCR pattern + UID format check	CHE-XXX.XXX.XXX format validated
IBAN	OCR pattern + checksum	Both IBAN and QR-IBAN supported
Invoice number	OCR	Used for duplicate detection (12-month window)
Issue/due date	OCR + locale parsing	DD.MM.YYYY (CH/EU) and ISO formats
Total + VAT	OCR + math check	Sum-of-lines is reconciled against total
QR-bill reference	OCR of the QR code	Payload is decoded according to the QR-bill SPC v2 spec
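To make the checks in the table more concrete, here is a minimal Python sketch of four of them: the UID format check, the IBAN checksum, date parsing, and the total reconciliation. It is not the Numezis implementation. All function names are hypothetical, the IBAN check is the standard ISO 13616 mod-97 test, the QR-IBAN test relies on the QR-IID range 30000 to 31999 used for Swiss and Liechtenstein QR-IBANs, and the rounding tolerance in `totals_reconcile` is an assumption.

```python
import re
from datetime import date, datetime
from decimal import Decimal

# Format only: CHE-XXX.XXX.XXX with digits in place of X.
UID_PATTERN = re.compile(r"CHE-\d{3}\.\d{3}\.\d{3}")


def is_uid_format(value: str) -> bool:
    """Check the CHE-XXX.XXX.XXX shape (format only, no registry lookup)."""
    return UID_PATTERN.fullmatch(value.strip()) is not None


def is_valid_iban(value: str) -> bool:
    """Standard mod-97 IBAN checksum; spaces are ignored."""
    iban = re.sub(r"\s+", "", value).upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", iban):
        return False
    rearranged = iban[4:] + iban[:4]  # move country code and check digits to the end
    digits = "".join(str(int(ch, 36)) for ch in rearranged)  # A=10 ... Z=35
    return int(digits) % 97 == 1


def is_qr_iban(value: str) -> bool:
    """A QR-IBAN is a valid CH/LI IBAN whose institution ID is 30000-31999."""
    iban = re.sub(r"\s+", "", value).upper()
    if not is_valid_iban(iban) or iban[:2] not in ("CH", "LI"):
        return False
    iid = iban[4:9]
    return iid.isdigit() and 30000 <= int(iid) <= 31999


def parse_invoice_date(text: str) -> date:
    """Try DD.MM.YYYY first, then ISO YYYY-MM-DD."""
    for fmt in ("%d.%m.%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


def totals_reconcile(
    line_amounts: list[Decimal],
    total: Decimal,
    tolerance: Decimal = Decimal("0.01"),  # assumed rounding tolerance
) -> bool:
    """Compare the sum of line amounts with the stated total."""
    return abs(sum(line_amounts, Decimal("0")) - total) <= tolerance
```

Using `Decimal` rather than `float` for amounts avoids binary rounding errors in the sum-of-lines comparison. A field that fails one of these checks would be a natural candidate for an amber or red flag in the validation view, although how Numezis combines such checks with OCR confidence is not specified here.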

Conversion

Once you're happy with the data, click Convert to bill (or expense / quote / order, depending on classification). Numezis creates the entity, pre-filled, and links the original document to it. The document's state then changes to `converted`.

If the extraction is wrong and cannot be recovered (corrupt PDF, blurry photo with no usable text), mark the document as Discard and give a reason. Discarded documents are kept for 90 days, then deleted.
