| Time to complete | 45 minutes |
| Difficulty | Intermediate |
| Prerequisites | Access to your document source, access to your target system and master data |
| You’ll build | An assignment that extracts fields from invoices, receipts, or IDs, validates them against your data, and routes low-confidence or anomalous records for human review |
Why Automate This?
The Problem: Documents arrive in bulk, such as scanned invoices from vendors, receipts submitted by employees, and ID copies from customers. Someone has to open each file, read the fields, key them into a system, cross-check every entry against vendor lists or PO registers, and flag anything that looks wrong. At any volume, this is slow, error-prone, and expensive. A misread invoice number, a vendor missing from the system, or a VAT rate that does not match each creates downstream problems in finance, compliance, or operations.
The Solution: A Duvo assignment that reads every document using the Intelligent Document Reader, extracts the fields you need, validates each field against your master data, and routes anything below your confidence threshold or outside your rules to a human reviewer. Clean documents go straight to your system of record. Borderline ones get a human decision before anything is written.
Expected Results:
- Eliminate manual data entry for clean, well-formatted documents
- Catch extraction errors and validation failures before they reach your system
- Create a full audit trail: what was extracted, what was changed by a reviewer, and why
- Scale document processing without adding headcount
What You’ll Build
By the end of this playbook, you’ll have an assignment that:
- Reads incoming documents (PDFs, scanned images, or Word files) from a folder, email, or upload
- Extracts structured fields: vendor, invoice number, line items, amounts, dates, tax, and any custom fields you need
- Checks field-level confidence scores and flags anything the extraction was uncertain about
- Cross-validates extracted values against your master data: vendor list, PO register, employee directory, or product catalog
- Handles tax, currency, and formatting edge cases (VAT rates, multi-currency, multi-page documents)
- Routes low-confidence or anomalous documents to a human reviewer with full extraction context
- Writes validated records to your system of record: ERP, spreadsheet, finance system, or database
- Maintains a complete audit trail of every extraction, review decision, and write action
Tools and connections this playbook uses:
- Intelligent Document Reader: extracts text and structured fields from PDFs, Word files, scanned images, and photos
- Email Attachments Reader (if documents arrive by email): pulls attachments from incoming messages
- Your source system: Google Drive, SharePoint, email inbox, or a file upload trigger
- Your target system: NetSuite, SAP, Google Sheets, Salesforce, Dynamics 365, or your ERP
- Human-in-the-Loop: pauses for review on low-confidence extractions and validation failures
Before You Start
Make sure you have these ready:
- Sample documents: Prepare a test set of 8–12 documents covering:
  - Clean digital PDF invoice
  - Scanned image invoice (JPG or PNG)
  - Multi-page PDF
  - Document with multiple tax rates
  - Zero-tax or tax-exempt document
  - At least one intentionally imperfect scan (low resolution, skewed, or photographed)
- Master data access: Your vendor list, PO register, employee directory, or product catalog. It should be either accessible via API in your target system or exportable as a CSV you can upload to Files.
- Target system login: A login with read and write access to the system where validated records will land. Store logins securely.
- Field list: Know exactly which fields you need to extract. For invoices: vendor name, vendor ID, invoice number, invoice date, due date, line items (description, quantity, unit price, total), subtotal, tax amount, tax rate, and total amount payable. Adjust for your document type.
- Validation rules: Document your rules: which vendors are in your system, what PO number formats look valid, and what quantity or amount ranges are normal.
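Validation rules are easier to review when they are written down as data rather than kept in someone's head. The sketch below is one hypothetical way to record them in Python. The vendor IDs, the `PO-` plus six digits format, the amount range, and the record keys are illustrative assumptions, not values from Duvo or from your systems.

```python
import re
from decimal import Decimal

# All values below are illustrative placeholders; replace them with your own rules.
KNOWN_VENDOR_IDS = {"V-1001", "V-1002"}
PO_PATTERN = re.compile(r"^PO-\d{6}$")                 # assumed PO number format
AMOUNT_RANGE = (Decimal("0.01"), Decimal("50000.00"))  # assumed normal range


def validation_failures(record: dict) -> list:
    """Return a human-readable reason for every rule the record breaks."""
    failures = []
    if record["vendor_id"] not in KNOWN_VENDOR_IDS:
        failures.append(f"unknown vendor: {record['vendor_id']}")
    if not PO_PATTERN.match(record["po_number"]):
        failures.append(f"unexpected PO format: {record['po_number']}")
    low, high = AMOUNT_RANGE
    total = Decimal(record["total"])
    if not low <= total <= high:
        failures.append(f"total outside normal range: {total}")
    return failures


print(validation_failures(
    {"vendor_id": "V-9999", "po_number": "PO-12", "total": "120.00"}
))
```

Returning every failure, rather than stopping at the first, gives a reviewer the full picture in one pass.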
Step 1: Create Your Assignment
- Click “+ Create Assignment” from your dashboard.
- Select “Use Assignment Builder”.
Step 2: Describe Your Workflow
Paste this prompt into the Assignment Builder and replace the bracketed placeholders with your specifics:
Step 3: Review Generated SOP
Duvo will generate a structured SOP from your description. Before continuing, confirm:
- The document type (invoice, receipt, ID) is correctly named throughout.
- The field list matches what your target system actually requires.
- The validation rules reflect your real business rules — not overly strict (causing too many flags) or too loose (letting bad data through).
- The anomaly routing describes what your reviewers will see and what actions they can take.
Step 4: Configure Connections
Click “Connections” and add:
- Intelligent Document Reader: Already available by default; no additional setup required.
- Email Attachments Reader: Already available by default; needed if documents arrive as email attachments.
- Source connection: Gmail or Outlook (if triggered by email), Google Drive or SharePoint (if polling a folder), or leave unset if using manual file upload.
- Target system: The ERP, spreadsheet, or database where validated records land. See My Logins to store your login.
- Human-in-the-Loop: Already available by default.
Step 5: Set Confidence Thresholds
The Intelligent Document Reader returns an extraction for each field. Add explicit confidence handling to your SOP:
Step 6: Cross-Validate Against Master Data
Add validation against your reference data to catch mismatches before they reach your system:
Vendor validation
Step 7: Handle Tax, Currency, and Multi-Page Edge Cases
Real-world documents have edge cases that cause silent errors if not handled explicitly.
Tax and VAT edge cases
Step 8: Configure HITL Review Requests
When the assignment routes a document for human review, give reviewers everything they need to decide quickly. Update your SOP with this format:
Step 9: Test with Sample Documents
Before processing real documents, run tests that cover the range of scenarios your assignment will encounter:
- Click “Start Work” to run the assignment manually with a test document.
- Cover these test cases:
| Test case | What to verify |
|---|---|
| Clean digital PDF, known vendor | All fields extracted, validation passes, record written |
| Scanned image (photo of invoice) | OCR runs, fields extracted, confidence scores visible |
| Multi-page PDF | All pages read, line items from all pages captured |
| Unknown vendor | Flagged for review, record not written |
| Mismatched total (math error) | Discrepancy flagged with specific values |
| Low-confidence field | Field flagged, reviewer sees raw text alongside extracted value |
| Duplicate document number | Duplicate flagged with link to existing record |
| Multi-currency document | Currency extracted correctly, no conversion applied, flagged if non-base |
| Tax-exempt document | Zero tax verified, no spurious tax flag triggered |
| Corrupted or unreadable file | Assignment flags document for manual intervention, no record written |
| Wrong document type (e.g. receipt submitted as invoice) | Document type mismatch surfaced as a clear error, document routed to review |
Step 10: Write to Your Target System
The method the assignment uses to create records depends on your system:
API-connected systems (NetSuite, Salesforce, Dynamics 365, Supabase)
The assignment calls the system’s API directly. Confirm write permissions are granted to the login you added in Step 4.
Spreadsheet-based systems (Google Sheets, Excel)
The assignment appends a row per document. Upload your column headers as a file in Files so the assignment knows the schema. See Google Sheets or Microsoft Excel.
Web-only systems (browser-based ERPs)
The assignment navigates the system using Computer Use: it opens the browser, logs in, fills in the record form, and submits. Add to your SOP:
Expected Results
When your assignment is running successfully:
In your target system:
- A new record for each validated document, with all extracted fields populated and source document reference included.
- No partial or incorrect records — anomalous documents are held for human review before anything is written.
- Pending review requests for each flagged document, with extracted fields, confidence scores, and specific failure reasons clearly described.
- For each document processed: document number, vendor, date, extracted fields, validation outcome (auto-validated or reviewed), reviewer name and corrections (if any), and timestamp.
- A complete session log for each job showing each extraction step, validation check, and write action.
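To make the audit fields listed above concrete, here is a minimal sketch of one audit entry as a Python dataclass serialized to a JSON line. This is not Duvo's audit format; the class name `AuditEntry`, the field names, and the sample values are hypothetical and only mirror the items the list names.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _utc_now() -> str:
    """ISO 8601 timestamp in UTC, so entries from different systems sort consistently."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AuditEntry:
    # Hypothetical structure mirroring the audit fields listed above.
    document_number: str
    vendor: str
    document_date: str
    extracted_fields: dict
    validation_outcome: str            # "auto-validated" or "reviewed"
    reviewer: Optional[str] = None     # set only when a human reviewed the document
    corrections: dict = field(default_factory=dict)
    timestamp: str = field(default_factory=_utc_now)


def to_json_line(entry: AuditEntry) -> str:
    """Serialize one entry as a single JSON line, suitable for an append-only log."""
    return json.dumps(asdict(entry), sort_keys=True)


entry = AuditEntry(
    document_number="INV-1",
    vendor="Acme Corp",
    document_date="2024-03-01",
    extracted_fields={"total": "120.00"},
    validation_outcome="reviewed",
    reviewer="j.doe",
    corrections={"total": {"from": "12O.00", "to": "120.00"}},
)
print(to_json_line(entry))
```

Keeping the original and corrected value side by side in `corrections` preserves what was extracted, what the reviewer changed, and in what direction.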
Troubleshooting
Extraction is missing fields or returning wrong values
- Scan quality: Low-resolution scans (below 150 DPI) reduce extraction accuracy. Where possible, request native digital PDFs from vendors rather than scans.
- Complex layouts: Documents with heavy formatting, watermarks, or multi-column tables can confuse extraction. Test your most complex document types and add explicit SOP instructions:
"The vendor name appears in the top-left corner of the first page."
Spatial hints like this improve accuracy.
- Non-standard field labels: Some vendors use non-standard labels (“Bill-to Party” instead of “Vendor Name”). Update your SOP to list the alternative labels your documents use.
- Multi-page line items: If line items spanning multiple pages are only partially extracted, add:
"Read all pages before extracting line items. Line items continue until the subtotal row."
Too many documents are going to human review
- Confidence threshold too high: Lower your threshold and re-run your test set to see the effect. A threshold of 85% is a reasonable starting point for most typed PDFs; scanned documents may need 75%.
- Validation rules too strict: Check which rule is generating the most flags. Common culprits: vendor name matching (names differ slightly between document and master data), date format expectations, or amount ranges set too narrow for the actual data.
- Vendor name normalization: Add a normalization step to your SOP:
"Trim whitespace, remove punctuation, and convert to title case before looking up the vendor name."
This treats “ACME CORP.”, “Acme Corp”, and “Acme Corp.” as the same vendor.
Records are written with wrong values
- Confidence threshold too low: If low-confidence fields are passing through undetected, raise your threshold, raise the lower bound below which fields are treated as missing, or both.
- Math check not firing: Confirm your SOP explicitly instructs the assignment to verify totals. Add a separate validation step:
"After extracting all amounts, verify: total = subtotal + tax. If not, flag the specific discrepancy."
Duplicate detection is generating false positives
- Blanket/standing invoices: Some vendors send monthly invoices with the same base reference number. Add an exception:
"Treat document numbers ending in -YYYY-MM as recurring monthly invoices; do not flag these as duplicates."
- Reused document numbers across vendors: Narrow the duplicate check to same vendor + same document number rather than document number alone.
Assignment processing stalls or stops mid-run
- Session log: Open the job in Duvo and check the session log for timeout or connection errors. The log shows exactly which step the assignment stopped at.
- Source permissions: Verify the source folder or inbox is still accessible and that the connected login has not expired. Re-authorize the connection if needed.
- Activity Inbox: Check the Activity Inbox for any pending Human-in-the-Loop requests that may be holding up the job.
Records validated but not written to target system
- Write permissions: Confirm the login used for the target system has create/write access, not just read.
- API rate limits: If processing a large backlog, your target system may be throttling writes. Add a pause between writes in your SOP:
"Wait 2 seconds between each record creation."
- Schema mismatch: Your target system may reject records missing required fields. Identify which fields are mandatory and add them to the extraction and validation steps so documents are flagged before a write is attempted.
Take It Further
Once your assignment is processing documents reliably, consider these enhancements:
Route to BI reporting
Related
- Intelligent Document Reader — Extracts text, tables, and fields from PDFs, Word files, scanned images, and photos
- Human-in-the-Loop — How to pause for human review and resume after a decision
- Designing Human-in-the-Loop Workflows — Patterns for effective review requests and approval gates
- Email Attachments Reader — Extracts content from email attachments, including PDFs and spreadsheets
- Files — Upload master data (vendor lists, PO registers) for the assignment to validate against
- Computer Use — For writing to UI-only systems: browser automation for web ERPs, remote desktop for SAP GUI
- My Logins — Store target system logins securely
- Email Order Intake — Related playbook for processing orders arriving by email