Before you enable a schedule or trigger, you need confidence that your agent behaves correctly and that a bad Run cannot cause irreversible damage. This guide gives you a concrete test plan, patterns for isolating side effects, and strategies for keeping test and production environments separate.

This guide is the testing half of the journey to production. Once testing is signed off, use Promote to Production to enable your schedule or trigger and hand the agent to your team.

Test Plan Template

Copy this template for each agent you are preparing for production. Work through every section before enabling autonomous Runs.
Agent: [Name]
Builder: [Name]
Test date: [Date]

HAPPY PATH
[ ] Ran with realistic production-like input
[ ] Output is correct and complete
[ ] Run finished within expected duration

EDGE CASES
[ ] Missing or empty input fields: agent handles them gracefully (logs, skips, or escalates)
[ ] Unexpected data format (e.g., date as text, wrong currency symbol): no crash
[ ] Duplicate input: idempotency check prevents duplicate actions

FAILURE INJECTION
[ ] Connection Login revoked mid-Run: agent fails cleanly without partial writes
[ ] Rate limit or timeout from an external system: agent retries or escalates appropriately
[ ] Required field absent in source data: agent stops and surfaces the gap rather than guessing

HUMAN-IN-THE-LOOP COVERAGE
[ ] Every HITL approval branch has been triggered at least once
[ ] Approval request denied: agent handles the denial gracefully (stops, logs, does not retry automatically)
[ ] HITL request left unanswered past the configured window: escalation fires

COST AND DURATION
[ ] Estimated cost per Run is within budget
[ ] Typical Run duration fits within the business process SLA

DATA IMPACT
[ ] All writes verified to have landed in the correct records / accounts
[ ] No duplicate records created
[ ] No unintended recipients received emails or notifications
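
The duplicate-input item in the template is easier to verify when you know what an idempotency check looks like. The Python sketch below is one possible illustration, not Duvo functionality: `idempotency_key` and `process_once` are hypothetical names, and it assumes each input record is a JSON-serializable dictionary.

```python
import hashlib
import json


def idempotency_key(record: dict) -> str:
    """Derive a stable key from the record's content (field order does not matter)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def process_once(records: list[dict], seen: set[str]) -> list[dict]:
    """Return only records not processed before, recording their keys in `seen`."""
    fresh = []
    for record in records:
        key = idempotency_key(record)
        if key in seen:
            continue  # duplicate input: skip instead of acting twice
        seen.add(key)
        fresh.append(record)
    return fresh
```

Feeding the same batch through `process_once` twice with the same `seen` set should return an empty list the second time. That is the behavior the template item asks you to confirm.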

How to Mock Dangerous Side Effects

The goal during testing is to observe what the agent would do without allowing it to write to live systems. Use the patterns below for the connections your agent touches.

Read-only mode via AOP instruction

Add a shadow-mode block at the top of the AOP before you start any test Runs. Remove it only after you are satisfied with the outputs.
SHADOW MODE — DO NOT WRITE:
Read all data and log what actions you would have taken, but do not
submit, send, update, or delete anything. Summarize at the end:
"Would have [action] for [count] records."
See Guardrails for High-Risk Automations for the full shadow-mode pattern.
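
If part of your workflow runs in your own scripts, the same shadow-mode idea can be mirrored in code. The following is a minimal Python sketch under that assumption. `ShadowLog` and `perform` are hypothetical names, not part of Duvo; the summary wording follows the instruction above.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ShadowLog:
    """Collects intended actions instead of executing them."""
    counts: Counter = field(default_factory=Counter)

    def record(self, action: str) -> None:
        self.counts[action] += 1

    def summary(self) -> list[str]:
        """One line per action type, in alphabetical order."""
        return [
            f"Would have {action} for {count} records."
            for action, count in sorted(self.counts.items())
        ]


def perform(action: str, execute: Callable[[], object],
            shadow: Optional[ShadowLog] = None):
    """Call `execute()` for real, or only log the action when a ShadowLog is given."""
    if shadow is not None:
        shadow.record(action)  # shadow mode: nothing is written
        return None
    return execute()
```

Passing a `ShadowLog` to every `perform` call keeps all writes disabled. Dropping the argument is the code equivalent of removing the shadow-mode block from the AOP.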

Per-connection safe testing

| Connection | Safe testing approach |
| --- | --- |
| Gmail / Outlook | Send emails to your own address or a team test alias instead of real recipients. Add the target override to the AOP: “Send all emails to test@yourcompany.com during testing.” |
| Google Sheets / Excel | Point the agent at a copy of the production spreadsheet, not the live one. Rename it clearly: [COPY — TEST ONLY] Monthly Sales. |
| Snowflake / BigQuery | Use a read-only Login during testing. Grant write access only after output verification. |
| Slack | Post to a private test channel (e.g., #assignment-testing) instead of customer-facing channels. Update the AOP with the channel name before testing. |
| HubSpot / Salesforce / CRM | Use a sandbox or developer instance if your plan includes one. If not, add a test-contact filter: “Only update records tagged TEST during testing.” |
| Database writes | Create a staging schema or table prefixed with test_. Point the agent at it during testing. |
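
Before a test Run, it can help to double-check that every target follows the naming conventions above. The Python sketch below illustrates one way to do that in a helper script. The `TEST_RULES` mapping and `unsafe_targets` function are hypothetical, and the rules simply reuse the example values from this section (`test@yourcompany.com`, `TEST ONLY`, `#assignment-testing`, `test_`).

```python
# Hypothetical rules: one predicate per target kind, based on the examples above.
TEST_RULES = {
    "email": lambda target: target == "test@yourcompany.com",
    "sheet": lambda target: "TEST ONLY" in target,
    "slack": lambda target: target == "#assignment-testing",
    "table": lambda target: target.startswith("test_"),
}


def unsafe_targets(targets: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the (kind, target) pairs that do not look like test targets."""
    flagged = []
    for kind, target in targets:
        is_test = TEST_RULES.get(kind)
        # Kinds without a rule are treated as unsafe rather than silently allowed.
        if is_test is None or not is_test(target):
            flagged.append((kind, target))
    return flagged
```

An empty result means every listed target matches a test convention. Anything returned should be fixed in the AOP or the connection settings before the Run starts.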

Inspecting proposed actions before they land

Some agent types can log their intended actions before executing. Add this instruction to the AOP when you want to review a batch before committing:
Before performing any write operations, list every action you intend to take.
Pause and request approval:
Title: "Pre-flight review — [count] actions pending"
Description: List each action with the target record and the proposed change.
Only proceed after the review is approved.
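
The control flow this instruction asks for (list everything, request one approval, write only if approved) can be sketched in Python as follows. This is an illustration of the pattern, not a Duvo API. `run_with_preflight`, `approve`, and `execute` are hypothetical names, and each action is assumed to be a dictionary with `target` and `change` keys.

```python
from typing import Callable


def run_with_preflight(
    actions: list[dict],
    approve: Callable[[dict], bool],
    execute: Callable[[dict], None],
) -> int:
    """Ask for one approval covering every planned write; return how many ran."""
    if not actions:
        return 0  # nothing to review or write
    request = {
        "title": f"Pre-flight review - {len(actions)} actions pending",
        "description": "\n".join(
            f"{action['target']}: {action['change']}" for action in actions
        ),
    }
    if not approve(request):
        return 0  # denied: stop without writing anything
    for action in actions:
        execute(action)
    return len(actions)
```

The important property is that no write happens before the approval decision, so a denied review leaves the target systems untouched.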

Environment Separation Patterns

These are the three approaches teams use most often to keep test traffic separate from production systems.

Pattern 1 — Two builds, one AOP

Create a second build of the agent by duplicating it. Name the copy clearly (e.g., [TEST] Invoice Processor).
  • The test build uses sandbox accounts, test spreadsheets, and a test Slack channel.
  • The production build uses production credentials and live targets.
  • When the AOP changes, update both builds. Promote the change to the production build only after the test build confirms it.
This is the lowest-friction pattern for most teams. See Duplicating Agents for how to duplicate a build.

Pattern 2 — Two teams, one AOP

Some teams run a dedicated sandbox Duvo team (or workspace) alongside the production team.
  • The sandbox team holds test connections and test data sources.
  • Builders develop and test in the sandbox team, then recreate the agent in the production team.
  • Connections in the production team are scoped to live accounts.
This pattern provides the strongest isolation and is common in organizations with strict change-control requirements.

Pattern 3 — Connections scoped per environment

Use a single build but maintain two connection profiles: one for testing and one for production. Before a test Run, switch the agent to the sandbox connections. After testing, switch back. This approach requires discipline, because it is easy to start a Run with the wrong connections. Reserve it for simple agents where the risk of a mistaken production write is low.
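
Because Pattern 3 relies on discipline, an explicit check before each Run can reduce the chance of a mix-up. The Python sketch below shows the idea for a team that tracks the active profile in its own tooling. `ConnectionProfile`, `WrongEnvironmentError`, and `require_environment` are hypothetical names and do not correspond to Duvo features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionProfile:
    """Hypothetical record of which connections a build is currently using."""
    name: str
    environment: str  # "test" or "production"


class WrongEnvironmentError(RuntimeError):
    """Raised when a Run is about to start with the wrong connection profile."""


def require_environment(profile: ConnectionProfile, expected: str) -> None:
    """Refuse to continue unless the active profile matches the intended environment."""
    if profile.environment != expected:
        raise WrongEnvironmentError(
            f"Active profile '{profile.name}' is {profile.environment}, "
            f"expected {expected}"
        )
```

Calling `require_environment(profile, "test")` as the first step of a test procedure turns a silent mistake into an immediate, visible failure.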

Using Agent Versions for Safe Iteration

Every time you save a change to your agent, Duvo creates a new revision. Use revisions to iterate safely during testing without losing working configurations.

When to create a new revision vs edit in place

Create a meaningful revision when you make a change that materially affects behavior, such as a new HITL gate, an updated output format, or a different connection target. Small wording tweaks to an AOP that is already working can be saved in place.

Testing a change before making it live

  1. Open the agent and make your changes.
  2. Save. Duvo creates a new revision.
  3. Start a test Run on the new revision to verify behavior.
  4. If the test Run fails, revert to the previous revision immediately — do not leave a broken revision as live while you investigate.

Reverting to a previous revision

  1. Open the agent.
  2. Click the revision selector in the builder toolbar.
  3. Select the last known-good revision.
  4. The agent now uses the reverted AOP for the next manual Run, scheduled Run, or trigger.
Note: Scheduled Runs and triggers always execute the live revision, not the one you are currently viewing. A warning banner appears at the top of the builder whenever you are viewing a non-live revision.
See Agent Versions for the full versioning reference.

Pre-Production Checklist

Work through this list when testing is complete and you are ready to go live. If any item is not checked, address it before enabling the schedule or trigger.
TESTING SIGNED OFF
[ ] Test plan completed — all sections checked
[ ] At least three end-to-end test Runs have completed without unexpected failures
[ ] HITL branches exercised: approval granted, approval denied, escalation triggered
[ ] Edge cases identified during testing are handled in the AOP or accepted as known gaps

ENVIRONMENT READY
[ ] Agent points to production connections, not sandbox or personal accounts
[ ] Production connections are scoped to the minimum permissions required
[ ] Logins and Secrets belong to a service account or shared Login, not a personal account
[ ] Shadow-mode instruction removed from the AOP

RISK CONTROLS IN PLACE
[ ] Every irreversible or externally visible action has a HITL approval gate
[ ] Hard caps on action volume or spend are set in the AOP
[ ] Idempotency checks prevent duplicate actions on retry

TEAM READY
[ ] Runbook complete — owner, backup contact, failure response, pause procedure
[ ] Operators know how to find Runs and respond to HITL requests
[ ] Rollback path tested — previous working revision identified
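
If you keep the completed checklist as plain text, a small script can confirm that nothing is left open. The Python sketch below assumes that a checked item is marked `[x]` (or `[X]`) and an open item `[ ]`. That marking convention and the `unchecked_items` helper are assumptions made for illustration.

```python
import re

# Matches "[ ] text", "[x] text", or "[X] text" at the start of a line.
ITEM = re.compile(r"^\[( |x|X)\]\s+(.*)$")


def unchecked_items(checklist: str) -> list[str]:
    """Return the text of every checklist item still marked `[ ]`."""
    open_items = []
    for line in checklist.splitlines():
        match = ITEM.match(line.strip())
        if match and match.group(1) == " ":
            open_items.append(match.group(2))
    return open_items
```

Section headings such as `TEAM READY` are ignored because they do not start with a checkbox. An empty result means every item has been checked.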
Ready to go? Continue with Promote to Production.