Test Safely Before Going Live

Before you enable a schedule or trigger, you need confidence that your agent behaves correctly — and that a bad Run cannot cause irreversible damage. This guide gives you a concrete test plan, patterns for isolating side effects, and strategies for keeping test and production environments separate. This is the testing half of the journey to production. Once testing is signed off, use Promote to Production to enable your schedule or trigger and hand the agent to your team.

Test Plan Template

Copy this template for each agent you are preparing for production. Work through every section before enabling autonomous Runs.

Agent: [Name]
Builder: [Name]
Test date: [Date]

HAPPY PATH
[ ] Ran with realistic production-like input
[ ] Output is correct and complete
[ ] Run finished within expected duration

EDGE CASES
[ ] Missing or empty input fields — agent handles gracefully (logs, skips, or escalates)
[ ] Unexpected data format (e.g., date as text, wrong currency symbol) — no crash
[ ] Duplicate input — idempotency check prevents duplicate actions

FAILURE INJECTION
[ ] Connection Login revoked mid-Run — agent fails cleanly without partial writes
[ ] Rate limit or timeout from an external system — agent retries or escalates appropriately
[ ] Required field absent in source data — agent stops and surfaces the gap rather than guessing

HUMAN-IN-THE-LOOP COVERAGE
[ ] Every HITL approval branch has been triggered at least once
[ ] Approval request denied — agent handles denial gracefully (stops, logs, does not retry automatically)
[ ] HITL request left unanswered past the configured window — escalation fires

COST AND DURATION
[ ] Estimated cost per Run is within budget
[ ] Typical Run duration fits within the business process SLA

DATA IMPACT
[ ] All writes verified to have landed in the correct records / accounts
[ ] No duplicate records created
[ ] No unintended recipients received emails or notifications
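
The duplicate-input item in the template can be rehearsed outside the agent with a small check. What follows is a minimal Python sketch under stated assumptions, not Duvo functionality: `idempotency_key`, `process_once`, and the in-memory `seen` set are hypothetical names, and a real setup would keep the keys in durable storage so they survive between Runs.

```python
import hashlib
import json


def idempotency_key(record: dict) -> str:
    """Derive a stable key from the record's content (hypothetical helper)."""
    # Sorting keys makes the key independent of field order.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def process_once(records: list[dict], seen: set[str]) -> list[dict]:
    """Return only the records that have not been processed before."""
    fresh = []
    for record in records:
        key = idempotency_key(record)
        if key in seen:
            continue  # duplicate input: skip instead of acting twice
        seen.add(key)
        fresh.append(record)
    return fresh
```

Feeding the same batch through `process_once` twice should return an empty list the second time. That is the behavior the "Duplicate input" checkbox asks you to confirm in the agent itself.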

How to Mock Dangerous Side Effects

The goal during testing is to observe what the agent would do without allowing it to write to live systems. Use the patterns below for the connections your agent touches.

Read-only mode via AOP instruction

Add a shadow-mode block at the top of the AOP before running test Runs. Remove it only after you are satisfied with the outputs.

SHADOW MODE — DO NOT WRITE:
Read all data and log what actions you would have taken, but do not
submit, send, update, or delete anything. Summarize at the end:
"Would have [action] for [count] records."

See Guardrails for High-Risk Automations for the full shadow-mode pattern.
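
If part of your workflow runs in your own code (for example, a script the agent calls), the same shadow-mode idea can be sketched as a thin wrapper around every write. This is an illustrative Python sketch only: `ShadowRecorder` and its methods are hypothetical names, not part of Duvo.

```python
from collections import Counter
from typing import Callable


class ShadowRecorder:
    """Records intended writes instead of performing them when shadow is True."""

    def __init__(self, shadow: bool = True) -> None:
        self.shadow = shadow
        self.would_have: Counter = Counter()

    def write(self, action: str, perform: Callable[[], None]) -> None:
        """Perform the write, or only count it when running in shadow mode."""
        if self.shadow:
            self.would_have[action] += 1  # log only, never call perform()
        else:
            perform()

    def summary(self) -> list[str]:
        """Produce one summary line per action type, in the format used above."""
        return [
            f"Would have {action} for {count} records."
            for action, count in sorted(self.would_have.items())
        ]
```

Defaulting `shadow` to `True` is a deliberate choice in this sketch: a forgotten flag then results in no writes, not accidental ones.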

Per-connection safe testing

Connection	Safe testing approach
Gmail / Outlook	Send emails to your own address or a team test alias instead of real recipients. Add the target override to the AOP: “Send all emails to test@yourcompany.com during testing.”
Google Sheets / Excel	Point the agent at a copy of the production spreadsheet, not the live one. Rename it clearly: `[COPY — TEST ONLY] Monthly Sales`.
Snowflake / BigQuery	Use a read-only Login during testing. Grant write access only after output verification.
Slack	Post to a private test channel (e.g., `#agent-testing`) instead of customer-facing channels. Update the AOP with the channel name before testing.
HubSpot / Salesforce / CRM	Use a sandbox or developer instance if your plan includes one. If not, add a test-contact filter: “Only update records tagged `TEST` during testing.”
Database writes	Create a staging schema or table prefixed with `test_`. Point the agent at it during testing.
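
The overrides in the table all follow one rule: while testing, every outbound target is replaced with a safe one. A minimal Python sketch of that rule is shown here for illustration. `SafeTargets` and `redirect` are hypothetical names, and the default values simply reuse the examples from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafeTargets:
    """Safe destinations used while testing (values taken from the table above)."""
    email: str = "test@yourcompany.com"
    slack_channel: str = "#agent-testing"
    table_prefix: str = "test_"


def redirect(kind: str, target: str, testing: bool,
             targets: SafeTargets = SafeTargets()) -> str:
    """Return the destination to use for a write of the given kind."""
    if not testing:
        return target
    if kind == "email":
        return targets.email
    if kind == "slack":
        return targets.slack_channel
    if kind == "table":
        # Avoid double-prefixing a table that already points at staging.
        if target.startswith(targets.table_prefix):
            return target
        return targets.table_prefix + target
    # Fail closed: an unknown connection type must not reach a live target.
    raise ValueError(f"no test override defined for {kind!r}")
```

For example, `redirect("table", "orders", testing=True)` returns `test_orders`, while the same call with `testing=False` returns `orders` unchanged.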

Inspecting proposed actions before they land

Some agent types can log their intended actions before executing. Add this instruction to the AOP when you want to review a batch before committing:

Before performing any write operations, list every action you intend to take.
Pause and request approval:
Title: "Pre-flight review — [count] actions pending"
Description: List each action with the target record and the proposed change.
Only proceed after the review is approved.
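
The control flow this instruction asks for (plan first, request approval, then execute) can be sketched in Python as follows. This is an assumption-laden illustration, not how Duvo implements approvals: `PlannedAction`, `preflight`, and the `approve` callback are hypothetical names standing in for the HITL request.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PlannedAction:
    target: str                  # the record the action would touch
    change: str                  # human-readable description of the change
    perform: Callable[[], None]  # the write itself, deferred until approval


def preflight(actions: list[PlannedAction],
              approve: Callable[[str, str], bool]) -> int:
    """List every planned write, ask for approval, and execute only if approved.

    Returns the number of actions that were executed.
    """
    title = f"Pre-flight review: {len(actions)} actions pending"
    description = "\n".join(f"- {a.target}: {a.change}" for a in actions)
    if not approve(title, description):
        return 0  # denied: nothing is written
    for action in actions:
        action.perform()
    return len(actions)
```

Because each write is wrapped in `perform` and deferred, a denied review leaves every target untouched. This is the property to verify during testing.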

Environment Separation Patterns

These are the three approaches teams use most often to keep test traffic separate from production systems.

Pattern 1 — Two builds, one AOP

Create a second build of the agent by duplicating it. Name the copy clearly (e.g., [TEST] Invoice Processor).

The test build uses sandbox accounts, test spreadsheets, and a test Slack channel.
The production build uses production credentials and live targets.
When the AOP changes, update the test build first. Apply the same change to the production build only after the test build confirms it.

This is the lowest-friction pattern for most teams. See Duplicating Agents for how to duplicate a build.

Pattern 2 — Two teams, one AOP

Some teams run a dedicated sandbox Duvo team (or workspace) alongside the production team.

The sandbox team holds test connections and test data sources.
Builders develop and test in the sandbox team, then recreate the agent in the production team.
Connections in the production team are scoped to live accounts.

This pattern provides the strongest isolation and is common in organizations with strict change-control requirements.

Pattern 3 — Connections scoped per environment

Use a single build but maintain two connection profiles: one for testing and one for production. Before a test Run, switch the agent to the sandbox connections. After testing, switch back. This approach requires discipline to avoid accidentally running with the wrong connections. Reserve it for simple agents where the risk of a mistaken production write is low.
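
The main risk of Pattern 3 is a Run that starts with the wrong profile active. One way to reduce that risk in any supporting code you own is a guard that refuses a mismatched combination. The Python sketch below is hypothetical throughout: the `PROFILES` mapping, the connection names, and `connections_for` are invented for illustration and do not describe a Duvo feature.

```python
# Hypothetical connection profiles; the names are placeholders.
PROFILES = {
    "test": {"crm": "sandbox-crm", "sheet": "sandbox-sheet"},
    "production": {"crm": "live-crm", "sheet": "live-sheet"},
}


def connections_for(run_kind: str, active_profile: str) -> dict[str, str]:
    """Return the active profile's connections, refusing a mismatched Run.

    A Run of kind "test" must use the "test" profile; every other kind
    (manual, scheduled, triggered) must use "production".
    """
    expected = "test" if run_kind == "test" else "production"
    if active_profile != expected:
        raise RuntimeError(
            f"{run_kind} Run requested while the {active_profile!r} profile is active"
        )
    return PROFILES[active_profile]
```

With this guard, forgetting to switch back after testing causes an immediate error on the next scheduled Run, not a silent Run against the sandbox connections.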

Using Agent Versions for Safe Iteration

Every time you save a change to your agent, Duvo creates a new revision. Use revisions to iterate safely during testing without losing working configurations.

When to create a new revision vs edit in place

Create a meaningful revision when you make a change that materially affects behavior — new HITL gate, updated output format, different connection target. Small wording tweaks to an AOP that is already working can be saved in place.

Testing a change before making it live

1. Open the agent and make your changes.
2. Save. Duvo creates a new revision.
3. Run a test Run using the new revision to verify behavior.
4. If the test Run fails, revert to the previous revision immediately. Do not leave a broken revision live while you investigate.

Reverting to a previous revision

1. Open the agent.
2. Click the revision selector in the builder toolbar.
3. Select the last known-good revision.
4. Confirm the reverted AOP is in effect. The agent now uses it for the next manual Run, scheduled Run, or trigger.

Scheduled Runs and triggers always execute the live revision, not the one you are currently viewing. A warning banner appears at the top of the builder whenever you are viewing a non-live revision.

See Agent Versions for the full versioning reference.

Pre-Production Checklist

Work through this list when testing is complete and you are ready to go live. If any item is not checked, address it before enabling the schedule or trigger.

TESTING SIGNED OFF
[ ] Test plan completed — all sections checked
[ ] At least three end-to-end test Runs have completed without unexpected failures
[ ] HITL branches exercised: approval granted, approval denied, escalation triggered
[ ] Edge cases identified during testing are handled in the AOP or accepted as known gaps

ENVIRONMENT READY
[ ] Agent points to production connections, not sandbox or personal accounts
[ ] Production connections are scoped to the minimum permissions required
[ ] Logins and Secrets belong to a service account or shared Login, not a personal account
[ ] Shadow-mode instruction removed from the AOP

RISK CONTROLS IN PLACE
[ ] Every irreversible or externally visible action has a HITL approval gate
[ ] Hard caps on action volume or spend are set in the AOP
[ ] Idempotency checks prevent duplicate actions on retry

TEAM READY
[ ] Runbook complete — owner, backup contact, failure response, pause procedure
[ ] Operators know how to find Runs and respond to HITL requests
[ ] Rollback path tested — previous working revision identified
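
If you keep the completed checklist as a text file, a short script can confirm that nothing is left unchecked before you go live. This Python sketch assumes a checked item is written as `[x]` (the template only shows the unchecked `[ ]` form). `unchecked_items` and `ready_for_production` are hypothetical helper names.

```python
import re

# Matches "[ ] text" (pending) and "[x] text" / "[X] text" (done).
ITEM = re.compile(r"^\[( |x|X)\]\s+(.*)$")


def unchecked_items(checklist: str) -> list[str]:
    """Return the text of every checklist item that is still '[ ]'."""
    pending = []
    for line in checklist.splitlines():
        match = ITEM.match(line.strip())
        if match and match.group(1) == " ":
            pending.append(match.group(2))
    return pending


def ready_for_production(checklist: str) -> bool:
    """Ready only when the text contains checklist items and none are pending."""
    has_items = any(ITEM.match(line.strip()) for line in checklist.splitlines())
    return has_items and not unchecked_items(checklist)
```

Section headings such as `TEAM READY` are ignored because they do not match the item pattern. An empty file counts as not ready, so a missing checklist cannot pass by accident.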

Ready to go? Continue with Promote to Production.

Related

Promote to Production

Enable your schedule or trigger and hand the agent to your team.

Guardrails for High-Risk Automations

Risk scoring, shadow mode, and hard caps for high-stakes agents.

Agent Versions

How versioning works and how to revert a change.

Designing Human-in-the-Loop Workflows

When to add approval gates and how to configure escalation.

Duplicating Agents

How to create a test build alongside your production build.
