Validating extracted data
Review OCR output and convert documents into accounting entities.
After OCR, every document lands in one of three states:
| State | Meaning |
|---|---|
| extracted_high_confidence | Ready to convert; minimal review needed. |
| extracted_needs_review | Confidence below the auto-route threshold. |
| extraction_failed | OCR could not read enough text (rare; usually bad scan quality). |
The side-by-side validation view
Click any document to open the validation view. The left pane shows the original scan or PDF; the right pane shows the extracted fields. Each field carries one of three indicators:
- Green check: confidence ≥ 90%; value extracted verbatim.
- Amber: confidence from 70% up to (but not including) 90%; verify visually.
- Red: confidence < 70%; you must confirm the value.
Click any value to edit. Corrections feed back into the classifier (per-tenant, never shared across customers) so accuracy improves over time.
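The colour bands above can be sketched as a small lookup. This is an illustrative sketch, not Numezis code: the function name `confidence_band` and the 0-to-1 confidence scale are assumptions, and a value of exactly 90% is treated as green.

```python
def confidence_band(confidence: float) -> str:
    """Map a field confidence (assumed to be in [0, 1]) to a review band."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.90:
        return "green"   # extracted verbatim, no action needed
    if confidence >= 0.70:
        return "amber"   # verify visually
    return "red"         # reviewer must confirm


# Hypothetical extracted fields with their confidence scores.
fields = {"supplier_name": 0.97, "iban": 0.82, "total": 0.55}
bands = {name: confidence_band(score) for name, score in fields.items()}
```

Here `bands` flags `iban` for a visual check and `total` for explicit confirmation, while `supplier_name` passes through.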
Field-level extraction (what's captured)
| Field | Source | Notes |
|---|---|---|
| Supplier name | OCR text + best match in supplier list | Fuzzy match within current tenant |
| VAT number | OCR pattern + UID format check | CHE-XXX.XXX.XXX format validated |
| IBAN | OCR pattern + checksum | Both IBAN and QR-IBAN supported |
| Invoice number | OCR | Used for duplicate detection (12-month window) |
| Issue/due date | OCR + locale parsing | DD.MM.YYYY (CH/EU) and ISO formats |
| Total + VAT | OCR + math check | Sum-of-lines is reconciled against total |
| QR-bill reference | OCR of the QR code | Numezis decodes the payload according to the QR-bill SPC v2 spec |
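Several of the checks in the table are standard and easy to reproduce when testing an integration. The sketch below shows one possible version of each; it is not Numezis's implementation. All function names are hypothetical, the UID check covers the layout only (no check digit), the IBAN check is the generic ISO 13616 mod-97 test, the QR-IBAN test relies on the institution ID range 30000 to 31999 reserved for QR-IBANs, and the 0.05 tolerance in the totals check is an assumed value.

```python
import re
from datetime import date, datetime
from decimal import Decimal

UID_PATTERN = re.compile(r"CHE-\d{3}\.\d{3}\.\d{3}")


def is_valid_uid_format(value: str) -> bool:
    """Check the CHE-XXX.XXX.XXX layout only (no check-digit test)."""
    return UID_PATTERN.fullmatch(value.strip()) is not None


def is_valid_iban(value: str) -> bool:
    """ISO 13616 mod-97 check: move the first four characters to the end,
    map letters to numbers (A=10 ... Z=35) and expect a remainder of 1."""
    iban = value.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", iban):
        return False
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def is_qr_iban(value: str) -> bool:
    """A QR-IBAN is a valid CH/LI IBAN whose institution ID (the five
    digits after the check digits) lies in the range 30000-31999."""
    iban = value.replace(" ", "").upper()
    if not is_valid_iban(iban) or iban[:2] not in ("CH", "LI"):
        return False
    return iban[4:9].isdigit() and 30000 <= int(iban[4:9]) <= 31999


def parse_invoice_date(text: str) -> date:
    """Accept DD.MM.YYYY (CH/EU) or ISO YYYY-MM-DD."""
    for fmt in ("%d.%m.%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


def totals_reconcile(line_amounts, total, tolerance=Decimal("0.05")):
    """Compare the sum of line amounts with the stated total.
    The tolerance is an assumed value, not a documented one."""
    return abs(sum(line_amounts, Decimal("0")) - total) <= tolerance
```

Amounts use `Decimal` rather than `float` so that sums such as 100.00 + 8.10 compare exactly against the invoice total.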
Conversion
Once you're happy with the data, click Convert to bill (or expense, quote, or order, depending on the classification). Numezis creates the entity with the extracted fields pre-filled and links the original document to it. The document state then becomes converted.
If the extraction is wrong and cannot be recovered (corrupt PDF, blurry photo with no usable text), mark the document as Discard and give a reason. Discarded documents are kept for 90 days, then deleted.