Monitor the First Week

The first five to seven days after an agent goes live are the highest-risk period. Volume is real, edge cases you did not test for will surface, and small problems compound quickly if you are not watching. This guide gives you a daily review cadence, the specific signals to track in Duvo, and a clear playbook for the most common anomalies — so you can tell the difference between normal early noise and something that genuinely needs your attention.

Why the First Week Is Different

During testing you controlled the inputs. In production, inputs are unpredictable — real emails, real files, real volumes. Agents that worked perfectly in tests often surface new failure modes in the first week simply because the variety of real-world data is much wider than your test cases covered. The goal of first-week monitoring is not to watch every Run in real time. It is to catch pattern changes early enough to act before they become problems, and to graduate off intensive monitoring once you have evidence that the agent is stable.

Daily Checks: Day 1 to Day 7

Run through this list each day while the agent is in its first week of production use. It takes about five to ten minutes.

Requests

Open Requests and check for pending approval requests.

Are there more pending requests than expected? A spike in Needs Input usually means the agent is hitting an ambiguous case that the AOP does not resolve. See Spike in Needs Input below.
Are requests sitting unanswered for more than a few hours? Unanswered requests pause the Runs that created them. If your team is not seeing the notifications, adjust the escalation path in the AOP or your Slack/email notification setup.

Queue health (if applicable)

If the agent uses Queue, open the Cases view and check the status breakdown for the last 24 hours.

Metric	Healthy range	Watch for
Failed rate	Under 5% of cases processed	Any sudden spike above your baseline
Needs Input rate	Stable or declining over the week	Rising rate signals an AOP gap
Postponed rate	Stable	Rising rate may indicate an upstream system is slow
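
As a rough illustration of the thresholds in the table, the sketch below computes each status's share of the cases and compares the Failed rate to the 5% line. It assumes you have copied the case statuses out of the Cases view by hand. The `status_rates` helper, the sample data, and the "Completed" status name are hypothetical and not part of Duvo.

```python
from collections import Counter

def status_rates(statuses):
    """Return each status's share of the cases, as a fraction between 0 and 1."""
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: count / total for status, count in counts.items()}

# Hypothetical statuses noted from the Cases view for the last 24 hours.
cases = ["Completed"] * 90 + ["Failed"] * 4 + ["Needs Input"] * 5 + ["Postponed"] * 1

rates = status_rates(cases)
failed_rate = rates.get("Failed", 0.0)
failed_ok = failed_rate < 0.05  # 5% threshold from the table above
print(f"Failed rate: {failed_rate:.1%}, healthy: {failed_ok}")
```

Recording the same rates each day gives you the baseline that the "Watch for" column refers to.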

Runs List

Open the Runs List (Past Runs) and apply the Needs attention filter to see any Failed or Needs Input Runs from the past 24 hours.

For each failed Run, open it and note which step failed. Look for clustering: if five Runs all fail at the same step, that is an AOP or Connection issue, not random noise.
If a Run failed due to a Connection error, go to Connections, check whether the Connection is still authorized, and re-authorize if needed.
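
The clustering check described above can be sketched as a simple tally of failing steps. This assumes you have noted the failing step of each failed Run by hand. The record shape, the step names, and the `min_cluster` cutoff are illustrative assumptions rather than anything Duvo defines.

```python
from collections import Counter

def failing_step_clusters(failed_runs, min_cluster=3):
    """Return (step, count) pairs for steps where at least `min_cluster`
    failed Runs stopped, most frequent first."""
    counts = Counter(run["failed_step"] for run in failed_runs)
    return [(step, n) for step, n in counts.most_common() if n >= min_cluster]

# Hypothetical notes taken while reviewing failed Runs.
failed_runs = [
    {"run_id": "r1", "failed_step": "Look up invoice"},
    {"run_id": "r2", "failed_step": "Look up invoice"},
    {"run_id": "r3", "failed_step": "Send summary email"},
    {"run_id": "r4", "failed_step": "Look up invoice"},
]

print(failing_step_clusters(failed_runs))
```

A step that appears in the result is a candidate AOP or Connection issue. Steps that fail only once are more likely to be noise.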

Cost per Run trend

In Team Insights, check the average cost per run for this agent over the past 24 hours against your baseline from testing.

A cost spike often means the agent is making more tool calls than expected — common when the AOP does not give the agent a clear stopping point and it keeps searching or retrying.
A cost spike can also indicate a tool-call loop: the agent calls a tool, gets an unexpected result, and tries again repeatedly. Open the high-cost Run from the Runs List and scroll through the steps to find where calls are repeating.
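
One way to decide which Runs to open first is to flag those that cost well above the testing baseline. The sketch below assumes per-Run costs noted by hand from Team Insights or the Runs List. The `factor` of 2.0, the run IDs, and the cost figures are hypothetical.

```python
def high_cost_runs(costs_by_run, baseline, factor=2.0):
    """Return (run_id, cost) pairs costing more than `factor` times the
    baseline, most expensive first."""
    flagged = [
        (run_id, cost)
        for run_id, cost in costs_by_run.items()
        if cost > factor * baseline
    ]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

# Hypothetical per-Run costs for the past 24 hours, in whatever unit you track.
costs = {"run-101": 0.12, "run-102": 0.11, "run-103": 0.48, "run-104": 0.31}
baseline = 0.10  # average cost per Run observed during testing

print(high_cost_runs(costs, baseline))
```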

Output spot-check

Pick three to five completed Runs at random and open them. Read the output the agent produced.

Does the output look correct? Would a human reviewer accept it?
Are there patterns in what looks off? Edge cases cluster — if you find one, look for others like it.

Team Insights Signals to Watch

Open Team Insights and set the time period to Last 7 days. Use these signals to identify trends early.

Run volume vs expected

Compare the number of Runs triggered against what you expected when you set up the schedule or trigger.

Significantly fewer Runs than expected may mean the trigger is not firing (check the trigger configuration) or Runs are failing before they complete (check the Failed count).
Significantly more Runs than expected may mean the trigger is firing on events you did not intend. Check the trigger source configuration and consider adding a filter in the AOP.
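
The comparison above can be made explicit with a small ratio check. This is only a sketch: the 25% tolerance is an assumed value for what counts as "significantly" different, and the function name is hypothetical.

```python
def volume_check(actual_runs, expected_runs, tolerance=0.25):
    """Classify Run volume as 'fewer', 'more', or 'as expected' relative to
    the expected count, allowing a fractional tolerance either way."""
    if expected_runs <= 0:
        raise ValueError("expected_runs must be positive")
    ratio = actual_runs / expected_runs
    if ratio < 1 - tolerance:
        return "fewer"
    if ratio > 1 + tolerance:
        return "more"
    return "as expected"

# Hypothetical example: a schedule expected to produce 140 Runs in 7 days.
print(volume_check(actual_runs=92, expected_runs=140))
```

A "fewer" result points toward the trigger configuration or the Failed count. A "more" result points toward the trigger source configuration.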

Source breakdown

The source breakdown shows where runs originate — manual, scheduled, or from a specific trigger.

If the source mix changes unexpectedly (for example, a scheduled agent suddenly shows a spike in manual runs), investigate whether team members are retriggering Runs manually because they do not trust the automatic output.

Failure clusters

If the failure rate is above your baseline, look for clustering before making AOP changes.

Single Connection failing repeatedly: the external service may be experiencing issues, or the Connection credentials expired. Check the third-party service status page and re-authorize the Connection if needed.
Failures spread across multiple Connections: this usually points to an AOP logic issue — the agent is attempting something in the wrong order or with the wrong data.
Failures only on certain input types: your AOP needs to handle that input shape. Add a HITL gate to catch it while you refine the AOP.

HITL approval rate by branch

If the agent has multiple approval gates, check whether one is generating significantly more requests than others. A single gate driving most of the Needs Input volume is usually the first AOP gap to fix.
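
To see whether one gate dominates, you can tally the requests by the gate that raised them. The sketch below assumes you have noted the gate name for each recent request by hand. The gate names and the 50% share threshold are illustrative assumptions.

```python
from collections import Counter

def dominant_gate(request_gates, share_threshold=0.5):
    """Return (gate, share) if a single gate produced more than
    `share_threshold` of the requests, otherwise None."""
    counts = Counter(request_gates)
    total = sum(counts.values())
    if total == 0:
        return None
    gate, n = counts.most_common(1)[0]
    share = n / total
    return (gate, share) if share > share_threshold else None

# Hypothetical gate names for the last ten Needs Input requests.
gates = ["Approve refund"] * 7 + ["Confirm vendor"] * 2 + ["Send email"] * 1

print(dominant_gate(gates))
```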

Anomaly Response Patterns

Spike in Needs Input

What it means: The agent is frequently reaching a decision point it is not confident about. It is asking for help rather than deciding autonomously.

How to respond:

Read recent Needs Input requests

Open several recent Needs Input requests in Requests and read them carefully.

Identify what they have in common

Identify what they have in common — the same type of question, the same step in the workflow, the same data condition.

Update the AOP to handle the condition

Update the AOP to handle that condition explicitly. Either give the agent a rule to follow, or widen the HITL gate with a clearer decision framework so reviewers can respond consistently.

Tighten the gate if approval is always required

If the condition genuinely should always require human approval, tighten the HITL gate description so reviewers understand what they are approving.

Spike in Failed Runs

What it means: Runs are ending without completing successfully. This requires immediate investigation.

How to respond:

Read through failed Runs

Open two or three failed Runs from the Runs List (Past Runs in the sidebar) and read through the steps of each one to find where it failed.

Check for a shared failing step

If they all fail at the same step, that step has a problem — a Connection issue, a data format mismatch, or a logic error in the AOP.

Look for a common thread across steps

If they fail at different steps, look for a common thread: the same Connection, the same input type, or the same time of day (which may indicate an external service outage).

Pause if the rate is high

If the failure rate is high and you cannot immediately fix the root cause, consider pausing the agent (disable the schedule or trigger) and communicating to your team while you investigate.

Cost spike

What it means: Individual Runs are consuming significantly more than your baseline cost estimate.

How to respond:

Open a high-cost Run

Open one of the high-cost Runs from the Runs List and scroll through the steps.

Count the tool calls

Count how many tool calls the agent made. A healthy Run typically makes a predictable number of calls. A high-cost Run often shows many repeated calls to the same tool — a sign of a loop.

Add an exit condition for loops

If you see a tool-call loop, update the AOP to give the agent a clear exit condition. For example: “If you do not find the record after searching twice, stop and request HITL.”

Check for oversized inputs

If the cost is high but the tool calls look reasonable, the inputs may be much larger than expected. Check whether you can batch the input or pre-filter it before the agent processes it.
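
Counting repeated calls by eye is tedious on long Runs. As a minimal sketch, assuming you have listed the tool names from a Run's steps in order, the longest streak of consecutive calls to the same tool is a reasonable proxy for a loop. The tool names below are hypothetical.

```python
from itertools import groupby

def longest_repeat(tool_calls):
    """Return (tool, count) for the longest streak of consecutive calls to
    the same tool, or (None, 0) for an empty list."""
    best = (None, 0)
    for tool, group in groupby(tool_calls):
        streak = sum(1 for _ in group)
        if streak > best[1]:
            best = (tool, streak)
    return best

# Hypothetical ordered tool calls noted from one high-cost Run.
calls = [
    "search_record", "search_record", "search_record", "search_record",
    "update_record",
]

print(longest_repeat(calls))
```

A streak longer than the retry limit you wrote into the AOP (two searches in the example above) suggests the exit condition is missing or is not being followed.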

Latency drift

What it means: Runs are taking significantly longer to complete than during testing.

How to respond:

Check whether the drift is consistent

Check whether the latency drift is consistent across all Runs or isolated to specific runs.

Investigate isolated slowdowns

If isolated, the upstream system may have been slow at that time (check external service status pages or your own infrastructure logs).

Compare production vs testing volume

If consistent, compare the volume of data the agent is processing in production versus testing. Real-world volumes are often larger. Update the AOP to process in smaller batches, or add a volume cap while you investigate.

Adjust the schedule against the SLA

If Run duration is approaching your business process SLA, consider whether you need to adjust the schedule (run more frequently in smaller batches) or add a volume limit to the AOP.
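
The "consistent versus isolated" distinction can be approximated by comparing the median Run duration with the testing baseline. This is a sketch under stated assumptions: durations are noted by hand in seconds, and the 1.5x `factor` is an arbitrary illustrative cutoff for "significantly longer".

```python
from statistics import median

def classify_latency(durations, baseline, factor=1.5):
    """Classify latency as 'normal', 'isolated', or 'consistent' drift.

    A Run counts as slow when it exceeds `factor` times the baseline.
    Drift is 'consistent' when the median Run is slow, and 'isolated'
    when only some Runs are slow."""
    slow = [d for d in durations if d > factor * baseline]
    if not slow:
        return "normal"
    if median(durations) > factor * baseline:
        return "consistent"
    return "isolated"

# Hypothetical Run durations in seconds, with a 60-second testing baseline.
print(classify_latency([58, 62, 61, 240, 59], baseline=60))
print(classify_latency([130, 140, 125, 150], baseline=60))
```

An "isolated" result points toward an upstream slowdown at a specific time. A "consistent" result points toward production data volume.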

Emergency Stop

If something is clearly going wrong and you need to stop the agent immediately, use one of these methods.

Disable the schedule or trigger
Revoke a connection
Revert to a previous version

Open the agent

Open the agent.

Open the schedule controls

Click the Schedule button in the agent header.

Toggle off the schedule or trigger

Toggle the schedule off, or navigate to the Triggers tab and disable the trigger.

Let running Runs finish

Runs already running will complete. No new Runs will start.

To revert to a previous version, see Agent Versions for details on browsing and restoring revisions.

Post-Incident Template

When something goes wrong and you need to communicate or document it, use this template. To identify the root cause before filling it in: open one of the failed Runs from the Runs List and read the step where it failed. A Connection error points to a credential issue. Repeated failures on the same step with similar inputs point to an AOP logic gap. A failure on the very first step often means the input data was missing or malformed.

Date: [date]
Agent: [name]
Owner: [name]

What happened:
[One paragraph: what the agent did, what it should have done, and what the impact was.]

How we detected it:
[Daily check / alert / team member reported / ...]

Root cause:
[What specifically caused the failure — e.g.:
 - AOP gap: the agent hit a case the AOP did not handle
 - Connection issue: a Connection was disconnected or its credentials expired
 - Input data issue: the data passed to the agent was missing or malformed
 - External service outage: the third-party service was unavailable]

What we changed:
[AOP update / Connection fix / volume cap added / ...]

What we will watch:
[The specific metric or signal we will monitor over the next [N] days to confirm the fix worked.]

Graduating Off First-Week Mode

You can move from daily checks to a routine operational rhythm when all of the following are true for three consecutive days:

Failed rate is under 5% and stable or declining.
Needs Input rate is stable or declining.
No cost or latency spikes.
Output spot-checks look correct.
No unresolved anomalies in Requests.
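
The graduation criteria can be expressed as a check over your daily notes. The sketch below assumes one record per day filled in by hand. It reads "stable or declining" strictly as "did not rise from the previous day", which is an assumption you may want to loosen with a small tolerance. The `DayCheck` fields are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class DayCheck:
    failed_rate: float        # fraction of cases or Runs that failed
    needs_input_rate: float   # fraction that ended in Needs Input
    spikes: bool              # True if any cost or latency spike was seen
    spot_checks_ok: bool      # True if sampled outputs looked correct
    open_anomalies: int       # unresolved anomalies in Requests

def ready_to_graduate(days, window=3):
    """True when the last `window` days all meet the criteria and neither
    rate rose from one day to the next within that window."""
    recent = days[-window:]
    if len(recent) < window:
        return False
    for prev, cur in zip(recent, recent[1:]):
        if cur.failed_rate > prev.failed_rate:
            return False
        if cur.needs_input_rate > prev.needs_input_rate:
            return False
    return all(
        d.failed_rate < 0.05
        and not d.spikes
        and d.spot_checks_ok
        and d.open_anomalies == 0
        for d in recent
    )

# Hypothetical daily notes for days 4 to 6.
log = [
    DayCheck(0.04, 0.10, False, True, 0),
    DayCheck(0.03, 0.08, False, True, 0),
    DayCheck(0.03, 0.08, False, True, 0),
]
print(ready_to_graduate(log))
```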

Handoff to ongoing operational rhythm

When you graduate off first-week monitoring:

Move to weekly checks

Move to weekly rather than daily checks, using Team Insights to review the past seven days.

Set up persistent alerts

Set up persistent alerts if your agent uses a notification step in the AOP — for example, a Slack message after each run summarizing the outcome.

Schedule a 30-day review

Schedule a 30-day review with the agent owner to reassess whether the AOP needs refinement based on accumulated production experience.

Update the runbook

Update the agent runbook with anything you learned in the first week — owner contact, typical cost per Run, known edge cases, and the escalation path.

Related

Promote to Production

The checklist and setup steps before going live.

Runs List

Where to find failed and in-progress Runs across all agents.

Team Insights

Run volume, failure rates, and cost trends.

Requests

Where HITL requests land and how to respond.

Queue

Case statuses and failure rates for queue-based agents.

Agent Versions

Reviewing and reverting to a previous revision.
