Time to complete: 45 minutes
Difficulty: Intermediate
Prerequisites: Access to your document source, access to your target system and master data
You’ll build: An assignment that extracts fields from invoices, receipts, or IDs, validates them against your data, and routes low-confidence or anomalous records for human review

Why Automate This?

The Problem: Documents arrive in bulk: scanned invoices from vendors, receipts submitted by employees, ID copies from customers. Someone has to open each file, read the fields, and key them into a system. They cross-check every entry against vendor lists or PO registers and flag anything that looks wrong. At any volume, this is slow, error-prone, and expensive. A misread invoice number, a vendor not in the system, or a VAT rate that does not match can each create downstream problems in finance, compliance, or operations.

The Solution: A Duvo assignment that reads every document using the Intelligent Document Reader, extracts the fields you need, validates each field against your master data, and routes anything below your confidence threshold or outside your rules to a human reviewer. Clean documents go straight to your system of record. Borderline ones get a human decision before anything is written.

Expected Results:
  • Eliminate manual data entry for clean, well-formatted documents
  • Catch extraction errors and validation failures before they reach your system
  • Create a full audit trail: what was extracted, what was changed by a reviewer, and why
  • Scale document processing without adding headcount

What You’ll Build

By the end of this playbook, you’ll have an assignment that:
  1. Reads incoming documents (PDFs, scanned images, or Word files) from a folder, email, or upload
  2. Extracts structured fields: vendor, invoice number, line items, amounts, dates, tax, and any custom fields you need
  3. Checks field-level confidence scores and flags anything the extraction was uncertain about
  4. Cross-validates extracted values against your master data: vendor list, PO register, employee directory, or product catalog
  5. Handles tax, currency, and formatting edge cases (VAT rates, multi-currency, multi-page documents)
  6. Routes low-confidence or anomalous documents to a human reviewer with full extraction context
  7. Writes validated records to your system of record: ERP, spreadsheet, finance system, or database
  8. Maintains a complete audit trail of every extraction, review decision, and write action
Connections used:
  • Intelligent Document Reader — extract text and structured fields from PDFs, Word files, scanned images, and photos
  • Email Attachments Reader (if documents arrive by email) — pull attachments from incoming messages
  • Your source system — Google Drive, SharePoint, email inbox, or a file upload trigger
  • Your target system — NetSuite, SAP, Google Sheets, Salesforce, Dynamics 365, or your ERP
  • Human-in-the-Loop — pause for review on low-confidence extractions and validation failures

Before You Start

Make sure you have these ready:
  • Sample documents — Prepare a test set of 8–12 documents covering:
    • Clean digital PDF invoice
    • Scanned image invoice (JPG or PNG)
    • Multi-page PDF
    • Document with multiple tax rates
    • Zero-tax or tax-exempt document
    • At least one intentionally imperfect scan (low resolution, skewed, or photographed)
    Anonymize each file before use: redact or replace vendor names, amounts, and personal data with fictional values. Upload the set to Files so the whole team can run tests with consistent inputs. Reuse this canonical test suite whenever you change the SOP.
  • Master data access — Your vendor list, PO register, employee directory, or product catalog. Either accessible via API in your target system, or exportable as a CSV you can upload to Files.
  • Target system login — A login with read and write access to the system where validated records will land. Store logins securely.
  • Field list — Know exactly which fields you need to extract. For invoices: vendor name, vendor ID, invoice number, invoice date, due date, line items (description, quantity, unit price, total), subtotal, tax amount, tax rate, and total amount payable. Adjust for your document type.
  • Validation rules — Document your rules: which vendors are in your system, what PO number formats look valid, what quantity or amount ranges are normal.

Step 1: Create Your Assignment

  1. Click “+ Create Assignment” from your dashboard.
  2. Select “Use Assignment Builder”.

Step 2: Describe Your Workflow

Paste this prompt into the Assignment Builder and replace the bracketed placeholders with your specifics:
For each document I provide (or that arrives in [your source — email inbox / Google Drive folder / file upload]):

1. Use the Intelligent Document Reader to extract:
   - Vendor name and vendor ID (if present)
   - Invoice/receipt/document number
   - Document date and due date
   - Line items: description, quantity, unit price, line total
   - Subtotal, tax amount, tax rate, and total amount
   - Currency
   - Any additional fields: [list your custom fields]

2. For each extracted field, note the extraction confidence. Flag any field where confidence is below [your threshold, e.g. 85%]. (You will refine this into high/medium/low confidence bands in Step 5.)

3. Validate the extracted data:
   - Check that the vendor exists in [your master data source]
   - Check that the invoice/document number matches your expected format: [your format, e.g. INV-YYYY-NNNN]
   - Check that the total amount equals subtotal + tax (within a rounding tolerance of 0.01)
   - Check that line item totals equal quantity × unit price for each line
   - Check that the tax rate is one of the valid rates: [your valid rates, e.g. 0%, 5%, 20%]
   - Check that amounts are within normal range: [your range, e.g. between $1 and $500,000]

4. If all fields pass validation and all confidence scores are above threshold:
   - Write the record to [your target system]
   - Record: document number, extracted fields, date processed, and outcome "auto-validated"

5. If any field fails validation or is below confidence threshold:
   - Do not write the record
   - Request human review. In the title include: "[failure type] — [document number] — [vendor name]". In the description include: all extracted fields (marking which failed and why), the raw text from the document for failed fields, and suggested corrections if confidence is above 60%.
   - After the human resolves the review (approves with corrections or rejects), write the corrected record or discard it. Record the reviewer's name, decision, and any corrections made.
Click “Generate” to create the assignment SOP.
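
To make the rule-based checks in item 3 of the prompt concrete, here is a minimal Python sketch of three of them: document number format, valid tax rate, and amount range. It is an illustration only, not how Duvo evaluates the SOP. The pattern, rates, and range come from the "e.g." values in the prompt, and the name `rule_failures` is hypothetical.

```python
import re
from decimal import Decimal

# Assumed example values taken from the prompt placeholders; replace with your own.
DOC_NUMBER_PATTERN = re.compile(r"INV-\d{4}-\d{4}")  # INV-YYYY-NNNN
VALID_TAX_RATES = {Decimal("0"), Decimal("0.05"), Decimal("0.20")}  # 0%, 5%, 20%
MIN_AMOUNT, MAX_AMOUNT = Decimal("1"), Decimal("500000")


def rule_failures(doc_number: str, tax_rate: Decimal, total: Decimal) -> list:
    """Return the names of the rules this document fails (empty list = all pass)."""
    failures = []
    if not DOC_NUMBER_PATTERN.fullmatch(doc_number):
        failures.append("document number format")
    if tax_rate not in VALID_TAX_RATES:
        failures.append("tax rate")
    if not (MIN_AMOUNT <= total <= MAX_AMOUNT):
        failures.append("amount range")
    return failures
```

Returning the list of failed rule names, rather than a single pass/fail flag, matches the prompt's requirement that the review request states which fields failed and why.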

Step 3: Review Generated SOP

Duvo will generate a structured SOP from your description. Before continuing, confirm:
  • The document type (invoice, receipt, ID) is correctly named throughout.
  • The field list matches what your target system actually requires.
  • The validation rules reflect your real business rules — not overly strict (causing too many flags) or too loose (letting bad data through).
  • The anomaly routing describes what your reviewers will see and what actions they can take.
Edit the SOP directly to adjust anything that does not match.

Step 4: Configure Connections

Click “Connections” and add:
  1. Intelligent Document Reader — Already available by default; no additional setup required.
  2. Email Attachments Reader — Already available by default; needed if documents arrive as email attachments.
  3. Source connection — Gmail or Outlook (if triggered by email), Google Drive or SharePoint (if polling a folder), or leave unset if using manual file upload.
  4. Target system — The ERP, spreadsheet, or database where validated records land. See My Logins to store your login.
  5. Human-in-the-Loop — Already available by default.

Step 5: Set Confidence Thresholds

The Intelligent Document Reader returns an extraction for each field. Add explicit confidence handling to your SOP:
For each extracted field:
- If extraction confidence is above [your high threshold, e.g. 90%]: treat the value as reliable.
- If confidence is between [your low threshold, e.g. 60%] and [your high threshold]: include the value in the Human-in-the-Loop review request but do not write it to the target system yet. Flag it for human verification. In the review request, show the extracted value alongside the raw text it came from.
- If confidence is below [your low threshold, e.g. 60%]: treat the field as missing. Do not populate it automatically. Flag for human to provide the correct value.
Adjust thresholds based on what your team finds in testing (Step 9). Tighter thresholds mean more human reviews; looser thresholds mean more automated processing with some risk of errors slipping through.
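
The three bands above can be sketched as a small Python function. This is an illustration under assumptions: confidence is taken to be a fraction between 0 and 1, the defaults mirror the example thresholds (60% and 90%), and the name `confidence_band` and its return labels are hypothetical.

```python
def confidence_band(confidence: float, low: float = 0.60, high: float = 0.90) -> str:
    """Map a field's extraction confidence to one of the three handling bands."""
    if confidence > high:
        return "reliable"  # use the value as extracted
    if confidence >= low:
        return "verify"    # show value and raw text to a reviewer before writing
    return "missing"       # do not populate; ask the reviewer for the value


def needs_review(field_confidences: dict) -> bool:
    """A document goes to review if any field falls outside the reliable band."""
    return any(confidence_band(c) != "reliable" for c in field_confidences.values())
```

A value exactly at a threshold is treated as belonging to the middle band here; decide explicitly which way your boundaries fall and state it in the SOP.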

Step 6: Cross-Validate Against Master Data

Add validation against your reference data to catch mismatches before they reach your system:
Vendor validation
Look up the extracted vendor name in [your vendor list or system].
- If an exact match is found, use the system vendor ID.
- If no exact match but a close match exists (edit distance ≤ 2 characters), flag for human confirmation: "Possible vendor match: [matched name] — confirm or correct."
- If no match at all, flag as unknown vendor.
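
The vendor lookup above can be sketched in Python as follows. This is a minimal illustration, assuming the vendor list fits in memory as a name-to-ID mapping and that "edit distance" means Levenshtein distance; `edit_distance` and `match_vendor` are hypothetical names.

```python
from typing import Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def match_vendor(extracted: str, vendors: dict) -> Tuple[str, Optional[str]]:
    """vendors maps vendor name -> vendor ID. Returns (outcome, value)."""
    if extracted in vendors:
        return ("exact", vendors[extracted])  # use the system vendor ID
    close = [name for name in vendors if edit_distance(extracted, name) <= 2]
    if close:
        best = min(close, key=lambda name: edit_distance(extracted, name))
        return ("confirm", best)  # reviewer confirms or corrects this name
    return ("unknown", None)
```

The exact-match test here is case-sensitive, so a lookup like this benefits from normalizing names on both sides first.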
PO/reference number validation
If a PO number is present on the document:
- Look up the PO number in [your PO system or spreadsheet].
- Check that the invoice amount does not exceed the PO's remaining balance.
- If the PO is closed or does not exist, flag as anomaly.
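
As an illustration of the PO checks above, here is a minimal Python sketch. The `PurchaseOrder` shape and the `po_anomaly` function are hypothetical; a real PO system will expose these fields under its own names.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class PurchaseOrder:
    number: str
    remaining_balance: Decimal
    is_closed: bool


def po_anomaly(po: Optional[PurchaseOrder], invoice_total: Decimal) -> Optional[str]:
    """Return a reason to flag the document, or None if the PO checks pass.

    po is None when the lookup found no PO with the number on the document.
    """
    if po is None:
        return "PO does not exist"
    if po.is_closed:
        return "PO is closed"
    if invoice_total > po.remaining_balance:
        return "Invoice amount exceeds PO remaining balance"
    return None
```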
Amount cross-checks
Verify internally:
- Total = subtotal + tax (allow ±0.01 for rounding)
- Each line total = quantity × unit price (allow ±0.01)
- Tax amount = subtotal × tax rate (allow ±0.01)
If any check fails, flag the discrepancy with the specific values that do not reconcile.
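
The three reconciliation checks can be sketched in Python as below. This is an illustration, not the assignment's internal logic. It assumes amounts are parsed into `Decimal` (binary floats can push a sum just past a 0.01 tolerance) and that the tax rate is a fraction such as 0.20; `reconcile` is a hypothetical name.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.01")  # rounding allowance from the checks above


def reconcile(lines, subtotal, tax_rate, tax, total):
    """Return a list of discrepancy messages; an empty list means everything reconciles.

    lines is a list of (quantity, unit_price, line_total) tuples of Decimal values.
    """
    problems = []
    for number, (quantity, unit_price, line_total) in enumerate(lines, start=1):
        expected = quantity * unit_price
        if abs(line_total - expected) > TOLERANCE:
            problems.append(f"Line {number}: found {line_total}, expected {expected}")
    expected_tax = subtotal * tax_rate
    if abs(tax - expected_tax) > TOLERANCE:
        problems.append(f"Tax: found {tax}, expected {expected_tax}")
    expected_total = subtotal + tax
    if abs(total - expected_total) > TOLERANCE:
        problems.append(f"Total: found {total}, expected {expected_total}")
    return problems
```

Each message carries both the found and the expected value, so the reviewer sees the specific numbers that do not reconcile.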

Step 7: Handle Tax, Currency, and Multi-Page Edge Cases

Real-world documents have edge cases that cause silent errors if not handled explicitly.
Tax and VAT edge cases
If the document contains multiple tax rates on different line items, extract each rate separately and validate each one.
If the document is marked as tax-exempt or zero-rated, confirm no tax amount is present.
If the currency differs from your base currency, extract the currency code and apply no conversion — flag for the reviewer to handle FX manually.
Multi-page documents
Read all pages of the document. Header fields (vendor, document number, date) typically appear on page 1. Line items may span multiple pages. Footer totals appear on the last page.
If the same field appears on multiple pages with different values, flag as a conflict and include both values for review.
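
One way to picture the conflict rule above is the Python sketch below. It assumes each page's extraction arrives as a dictionary of field name to value, which is an illustrative shape rather than the Intelligent Document Reader's actual output; `merge_header_fields` is a hypothetical name.

```python
def merge_header_fields(pages):
    """Combine per-page field values and separate out conflicting ones.

    pages is a list of dicts (one per page) mapping field name -> extracted value.
    Returns (merged, conflicts): merged holds fields with a single consistent
    value, conflicts maps each disputed field to all distinct values seen.
    """
    seen = {}
    for page in pages:
        for field, value in page.items():
            values = seen.setdefault(field, [])
            if value not in values:
                values.append(value)
    merged = {field: values[0] for field, values in seen.items() if len(values) == 1}
    conflicts = {field: values for field, values in seen.items() if len(values) > 1}
    return merged, conflicts
```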
Duplicate detection
Before writing any record, check [your target system] for an existing record with the same document number from the same vendor received within the last 90 days. If found, flag as a possible duplicate with a link to the existing record.
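
The duplicate rule can be sketched in Python as follows. It is a minimal illustration, assuming existing records are available as dictionaries with the hypothetical keys `vendor_id`, `doc_number`, `received`, and `record_id`; in practice the lookup would be a query against your target system.

```python
from datetime import date, timedelta


def find_duplicate(vendor_id, doc_number, received, existing, window_days=90):
    """Return the record_id of a matching earlier record, or None.

    A match has the same vendor and document number and was received
    within window_days before (or on) the new document's received date.
    """
    cutoff = received - timedelta(days=window_days)
    for record in existing:
        if (record["vendor_id"] == vendor_id
                and record["doc_number"] == doc_number
                and cutoff <= record["received"] <= received):
            return record["record_id"]
    return None
```

Returning the existing record's ID gives the review request something to link to.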

Step 8: Configure HITL Review Requests

When the assignment routes a document for human review, give reviewers everything they need to decide quickly. Update your SOP with this format:
When requesting human review:
- Title format: "[Failure type] — [Document number] — [Vendor name]"
  Examples: "Low confidence: total amount — INV-2024-0847 — Acme Supplies"
            "Unknown vendor — REC-20240312-004 — (not in vendor list)"
            "Math discrepancy — INV-2024-0901 — Beta Industries"
            "Possible duplicate — INV-2024-0851 — Gamma Corp"
- Description: show all extracted fields with their confidence scores. For each flagged field, show the raw text from the document and the extracted value side by side. For validation failures, state the specific rule that failed and the expected vs. actual value.
- Options: (a) Approve with corrections — the reviewer updates any wrong fields and the record is written. Record which fields were changed. (b) Reject — the document is discarded and logged as rejected with the reviewer's reason.
See Human-in-the-Loop and Designing Human-in-the-Loop Workflows for guidance on structuring effective review requests.
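
If you generate review requests from your own tooling, or want to check that titles follow the format, the title rule can be sketched in Python like this. `review_title` is a hypothetical helper, not a Duvo API. The separator is the em dash used in the examples above, and the fallback text for an unmatched vendor follows the "Unknown vendor" example.

```python
from typing import Optional

SEPARATOR = " \u2014 "  # em dash with spaces, as in the title format above


def review_title(failure_type: str, doc_number: str, vendor_name: Optional[str]) -> str:
    """Build "[Failure type] - [Document number] - [Vendor name]" with em dash separators."""
    vendor = vendor_name or "(not in vendor list)"
    return SEPARATOR.join([failure_type, doc_number, vendor])
```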

Step 9: Test with Sample Documents

Before processing real documents, run tests that cover the range of scenarios your assignment will encounter:
  1. Click “Start Work” to run the assignment manually with a test document.
  2. Cover these test cases:
Test case | What to verify
Clean digital PDF, known vendor | All fields extracted, validation passes, record written
Scanned image (photo of invoice) | OCR runs, fields extracted, confidence scores visible
Multi-page PDF | All pages read, line items from all pages captured
Unknown vendor | Flagged for review, record not written
Mismatched total (math error) | Discrepancy flagged with specific values
Low-confidence field | Field flagged, reviewer sees raw text alongside extracted value
Duplicate document number | Duplicate flagged with link to existing record
Multi-currency document | Currency extracted correctly, no conversion applied, flagged if non-base
Tax-exempt document | Zero tax verified, no spurious tax flag triggered
Corrupted or unreadable file | Assignment flags document for manual intervention, no record written
Wrong document type (e.g. receipt submitted as invoice) | Document type mismatch surfaced as a clear error, document routed to review
Review the session log after each test. Check that the extraction pulled the correct values and that validation fired on the right cases. Adjust confidence thresholds and validation rules in your SOP based on what you observe.

Step 10: Write to Your Target System

The method the assignment uses to create records depends on your system:
API-connected systems (NetSuite, Salesforce, Dynamics 365, Supabase)
The assignment calls the system’s API directly. Confirm write permissions are granted to the login you added in Step 4.
Spreadsheet-based systems (Google Sheets, Excel)
The assignment appends a row per document. Upload your column headers as a Files file so the assignment knows the schema. See Google Sheets or Microsoft Excel.
Web-only systems (browser-based ERPs)
The assignment navigates the system using Computer Use: it opens the browser, logs in, fills in the record form, and submits. Add to your SOP:
Open [your system URL] and log in using the stored login.
Navigate to Create [Invoice / Receipt / Document] and fill in the fields as extracted.
Submit the form and confirm the record number displayed on screen.
Record the system-assigned ID in the audit log.
Desktop client systems (SAP GUI, legacy ERPs)
Connect via Windows Remote Desktop to a machine running the client.

Expected Results

When your assignment is running successfully:
In your target system:
  • A new record for each validated document, with all extracted fields populated and source document reference included.
  • No partial or incorrect records — anomalous documents are held for human review before anything is written.
In your Activity Inbox:
  • Pending review requests for each flagged document, with extracted fields, confidence scores, and specific failure reasons clearly described.
In your audit log (spreadsheet, ERP, or Files):
  • For each document processed: document number, vendor, date, extracted fields, validation outcome (auto-validated or reviewed), reviewer name and corrections (if any), and timestamp.
In Duvo:
  • A complete session log for each job showing each extraction step, validation check, and write action.

Troubleshooting

Extraction is missing fields or returning wrong values

  • Scanned quality: Low-resolution scans (below 150 DPI) reduce extraction accuracy. Where possible, request native digital PDFs from vendors rather than scans.
  • Complex layouts: Documents with heavy formatting, watermarks, or multi-column tables can confuse extraction. Test your most complex document types and add explicit SOP instructions: "The vendor name appears in the top-left corner of the first page." — spatial hints improve accuracy.
  • Non-standard field labels: Some vendors use non-standard labels (“Bill-to Party” instead of “Vendor Name”). Update your SOP to list the alternative labels your documents use.
  • Multi-page line items: If line items spanning multiple pages are only partially extracted, add: "Read all pages before extracting line items. Line items continue until the subtotal row."

Too many documents are going to human review

  • Confidence threshold too high: Lower your threshold and re-run your test set to see the effect. A threshold of 85% is a reasonable starting point for most typed PDFs; scanned documents may need 75%.
  • Validation rules too strict: Check which rule is generating the most flags. Common culprits: vendor name matching (names differ slightly between document and master data), date format expectations, or amount ranges set too narrow for the actual data.
  • Vendor name normalization: Add a normalization step to your SOP: "Trim whitespace, remove punctuation, and convert to title case before looking up the vendor name." This handles “ACME CORP.”, “Acme Corp”, and “Acme Corp.” as the same vendor.
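
The normalization step in the last bullet can be sketched in Python as below. This is an illustration of the quoted rule only; `normalize_vendor_name` is a hypothetical name, and "punctuation" is assumed to mean ASCII punctuation.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize_vendor_name(name: str) -> str:
    """Remove punctuation, collapse and trim whitespace, convert to title case."""
    without_punctuation = name.translate(_STRIP_PUNCTUATION)
    return " ".join(without_punctuation.split()).title()
```

Apply the same function to both the extracted name and the master-data names before comparing. Note that stripping punctuation also removes characters such as "&", so names that differ only in punctuation become indistinguishable.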

Records are written with wrong values

  • Confidence threshold too low: If low-confidence fields are passing through undetected, increase your threshold and/or raise the lower bound below which fields are treated as missing.
  • Math check not firing: Confirm your SOP explicitly instructs the assignment to verify totals. Add a separate validation step: "After extracting all amounts, verify: total = subtotal + tax. If not, flag the specific discrepancy."

Duplicate detection is generating false positives

  • Blanket/standing invoices: Some vendors send monthly invoices with the same base reference number. Add an exception: "Treat document numbers ending in -YYYY-MM as recurring monthly invoices; do not flag these as duplicates."
  • Reused document numbers across vendors: Narrow the duplicate check to same vendor + same document number rather than document number alone.
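
The recurring-invoice exception in the first bullet can be expressed as a pattern. The Python sketch below assumes "-YYYY-MM" means a four-digit year followed by a two-digit month from 01 to 12 at the very end of the document number; `is_recurring_monthly` is a hypothetical name.

```python
import re

# Matches a trailing "-YYYY-MM" where MM is 01..12.
RECURRING_SUFFIX = re.compile(r"-\d{4}-(0[1-9]|1[0-2])$")


def is_recurring_monthly(doc_number: str) -> bool:
    """True if the document number ends in -YYYY-MM and should skip the duplicate flag."""
    return RECURRING_SUFFIX.search(doc_number) is not None
```

A number in the INV-YYYY-NNNN format does not match, because its last group has four digits rather than two.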

Assignment processing stalls or stops mid-run

  • Session log: Open the job in Duvo and check the session log for timeout or connection errors. The log shows exactly which step the assignment stopped at.
  • Source permissions: Verify the source folder or inbox is still accessible and that the connected login has not expired. Re-authorize the connection if needed.
  • Activity Inbox: Check the Activity Inbox for any pending Human-in-the-Loop requests that may be holding up the job.

Records validated but not written to target system

  • Write permissions: Confirm the login used for the target system has create/write access, not just read.
  • API rate limits: If processing a large backlog, your target system may be throttling writes. Add a pause between writes in your SOP: "Wait 2 seconds between each record creation."
  • Schema mismatch: Your target system may reject records missing required fields. Identify which fields are mandatory and add them to the extraction and validation steps so documents are flagged before a write is attempted.
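
For teams scripting their own backlog loads, the pause-between-writes advice above can be sketched in Python as follows. `write_record` stands in for whatever call creates a record in your target system and is a hypothetical placeholder; the 2-second default mirrors the SOP wording.

```python
import time


def write_with_pause(records, write_record, pause_seconds=2.0, sleep=time.sleep):
    """Write each record in order, pausing between consecutive writes.

    sleep is injectable so the pacing can be tested without real delays.
    Returns the number of records written.
    """
    written = 0
    for record in records:
        if written:  # no pause before the first write
            sleep(pause_seconds)
        write_record(record)
        written += 1
    return written
```

A fixed pause is the simplest form of throttling. If your target system documents its rate limits, set the pause from those numbers rather than guessing.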

Take It Further

Once your assignment is processing documents reliably, consider these enhancements:
Route to BI reporting
After writing each validated record, also append a row to [your reporting spreadsheet or BI dataset]
with: document type, vendor, date, total amount, currency, validation outcome, and processing time.
This feeds your accounts payable or expense dashboard without additional manual export.
Close-the-loop notifications
When a document is written to your system:
- Send a Slack message to #finance-ops with: vendor name, document number, amount, and a link to the record.
- For documents that required human review, include the reviewer's name and what was corrected.
Batch processing from a shared folder
Run the assignment on a schedule — every hour, or at 7am and 1pm each day.
Each run: list all new files in [your Google Drive folder / SharePoint folder] added since the last run.
Process each one, write validated records, and send a daily summary to the team.
Escalate aging review requests
If a Human-in-the-Loop review request has been open for more than 4 hours without a response:
Send a Slack DM to [the team lead] with the document details and a link to the pending request.
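
The aging rule above amounts to a simple time comparison. The Python sketch below assumes review requests are available as dictionaries with the hypothetical keys `id`, `opened_at`, and `resolved_at`; sending the Slack message is left out.

```python
from datetime import datetime, timedelta


def overdue_reviews(requests, now, max_age=timedelta(hours=4)):
    """Return the IDs of unresolved review requests open longer than max_age."""
    return [
        request["id"]
        for request in requests
        if request["resolved_at"] is None and now - request["opened_at"] > max_age
    ]
```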
Connect to spend analytics
See the ROI and business-impact reporting guide, once available, for how to surface document processing metrics (volume processed, auto-validated rate, review rate, and error rate) to your finance team.