This guide is for teams who have Assignments running in production and need to answer operational questions through the public API: Is this Job healthy? Did it complete? Where did it fail? Why is the failure rate rising? It covers patterns for checking Job status, diagnosing failures, building aggregated views, and integrating Duvo data into your existing observability tools. Where the API has gaps, each section notes the current limitation and the recommended workaround. All endpoints are on the base URL https://api.duvo.ai/v2; see Running Assignments via API for authentication and the error model.

Checking Whether a Job Is Healthy

Poll a single Job until it finishes

Jobs are asynchronous. Start a Job, capture the run_id from the response, then poll GET /runs/{run_id} until the status reaches a terminal state.
  • Terminal statuses: completed, failed, stopped, interrupted
  • Non-terminal statuses: pending, starting, running, waiting
#!/bin/bash
API_KEY="dv_your_api_key"
RUN_ID="550e8400-e29b-41d4-a716-446655440000"
BASE_URL="https://api.duvo.ai/v2"

while true; do
  RESPONSE=$(curl -s "$BASE_URL/runs/$RUN_ID" \
    -H "Authorization: Bearer $API_KEY")
  STATUS=$(echo "$RESPONSE" | jq -r '.run.status')

  echo "$(date -u +%H:%M:%S) — status: $STATUS"

  case "$STATUS" in
    completed|failed|stopped|interrupted)
      echo "Job finished: $STATUS"
      break
      ;;
  esac

  sleep 10
done
A waiting status means the Assignment paused for human input. Check pending_human_request on the same response to see what it is waiting on.

Polling interval guidance:

| Job type | Suggested interval |
| --- | --- |
| Short-running (< 2 min) | 5–10 seconds |
| Medium (2–15 min) | 30 seconds |
| Long-running (> 15 min) | 60–120 seconds |
Each API key is limited to 300 requests per minute. Polling one Job every 5 seconds uses 12 requests per minute, so about 25 Jobs polled in parallel at that interval consume the whole budget. If you are monitoring many Jobs in parallel, increase the interval or use a webhook instead.
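The same polling loop can be sketched in Python with a timeout and an early exit on waiting, so a paused Job does not keep the poller spinning. This is a minimal sketch, not an official client: fetch_run is a hypothetical callable that returns the parsed body of GET /runs/{run_id}, assumed to contain run.status as in the shell example above, and the default interval and timeout are illustrative.

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed", "stopped", "interrupted"}


def poll_run(
    fetch_run: Callable[[], dict],
    interval: float = 10.0,
    timeout: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the run is terminal or waiting, then return the response body.

    fetch_run is a hypothetical callable wrapping GET /runs/{run_id}.
    Raises TimeoutError if the run is still in progress after `timeout` seconds.
    """
    waited = 0.0
    while True:
        body = fetch_run()
        status = body["run"]["status"]
        # Stop on waiting as well: the Job needs human input and will not
        # progress by itself, so the caller should inspect the response.
        if status in TERMINAL or status == "waiting":
            return body
        if waited >= timeout:
            raise TimeoutError(f"run still {status} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Injecting fetch_run and sleep keeps the loop free of HTTP details and lets you exercise it without touching the network.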

Get notified when a Job finishes (webhooks)

Pass a webhook_url when starting a Job to receive a POST the moment the Job changes state. This avoids polling entirely and is the preferred pattern for production pipelines. Duvo sends run_completed, run_failed, and run_interrupted events (plus human_request events when the Assignment needs input), so filter by the event field for the state you care about.
curl -X POST "https://api.duvo.ai/v2/teams/$TEAM_ID/runs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "webhook_url": "https://your-service.example.com/hooks/duvo"
  }'
Your endpoint receives a payload that identifies the Job and its outcome — at minimum the event type, the run_id, and the run status. Use the run_id to call GET /runs/{run_id} for the full run record if you need more detail.
{
  "event": "run_completed",
  "run_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed"
}
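On the receiving side, a handler typically parses the body and branches on the event field. The sketch below covers only that routing step, assuming the payload shape shown above. The action names are hypothetical labels for your own pipeline, and the exact name of the human-input event is an assumption, so it is matched by prefix.

```python
import json

# Hypothetical mapping from Duvo event names to actions in your own pipeline.
ACTIONS = {
    "run_completed": "record_success",
    "run_failed": "alert",
    "run_interrupted": "alert",
}


def classify_event(raw_body: bytes) -> tuple:
    """Return (run_id, action) for one webhook POST body."""
    payload = json.loads(raw_body)
    event = payload.get("event", "")
    run_id = payload.get("run_id", "")
    if event in ACTIONS:
        return run_id, ACTIONS[event]
    # Assumption: human-input events start with "human_request".
    if event.startswith("human_request"):
        return run_id, "notify_reviewer"
    # Unknown events are ignored rather than treated as failures.
    return run_id, "ignore"
```

Keeping the routing in a pure function makes it easy to unit test separately from whichever web framework receives the POST.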
For event-driven triggers that start Jobs automatically, see Event-Driven Triggers.

Diagnosing a Failed Job

Identify where a Job went wrong

When a Job shows status: failed, retrieve its message log to find the point of failure. GET /runs/{run_id}/messages returns every step the Assignment took in chronological order — model decisions, tool calls and their results, HITL events, and the final output.
curl -s "https://api.duvo.ai/v2/runs/$RUN_ID/messages?limit=100" \
  -H "Authorization: Bearer $API_KEY" | jq '.messages[-10:]'
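Reading the last few messages usually shows what happened at the end of the Job. If the Job produced more than 100 messages, page forward with offset first so that you are reading the final page. Look for: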
  • A tool_call whose tool_result carries an error
  • A message where the Assignment describes why it is stopping
  • A human_request near the end with no follow-up response (the run status shows waiting via GET /runs/{run_id} when this happens)
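The first and third checks in the list above can be automated once the messages are loaded. The sketch below is illustrative only: it assumes tool_call and tool_result sit on the same message (as the jq examples in this guide suggest), that a failed tool_result carries an error field, and that a human-input step has type human_request. Verify these field names against a real response before relying on it.

```python
def failure_clues(messages: list, tail: int = 10) -> list:
    """Scan the last `tail` messages for likely causes of a failed Job."""
    clues = []
    recent = messages[-tail:]
    for i, msg in enumerate(recent):
        call = msg.get("tool_call") or {}
        result = msg.get("tool_result") or {}
        # Assumption: a failed tool step exposes its error under "error".
        if result.get("error"):
            clues.append(f"tool error in {call.get('name', 'unknown')}: {result['error']}")
        # Assumption: an unanswered request is one that is the final message.
        if msg.get("type") == "human_request" and i == len(recent) - 1:
            clues.append("human_request with no follow-up response")
    return clues
```

The second check, a message in which the Assignment explains why it is stopping, is free text and is easier to read by eye than to detect in code.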

Inspect the Assignment’s tool activity

The messages endpoint returns up to 100 messages per page; use limit and offset to page through longer Jobs (total in the response tells you how many there are). Each message has a type, role, timestamp, and — for tool steps — tool_call and tool_result objects.
curl -s "https://api.duvo.ai/v2/runs/$RUN_ID/messages?limit=100" \
  -H "Authorization: Bearer $API_KEY" \
  | jq '.messages[] | select(.tool_call != null) | {tool: .tool_call.name, result: .tool_result}'
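For Jobs longer than one page, the limit, offset, and total fields described above are enough to walk the whole log. A minimal sketch, where fetch_page is a hypothetical callable that performs GET /runs/{run_id}/messages with the given limit and offset and returns the parsed body:

```python
from typing import Callable, Iterator


def iter_messages(fetch_page: Callable[[int, int], dict], limit: int = 100) -> Iterator[dict]:
    """Yield every message of a Job in order, one page at a time."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        messages = page.get("messages", [])
        yield from messages
        offset += len(messages)
        # Stop when the page is empty or every message has been read.
        if not messages or offset >= page.get("total", 0):
            break
```

Because this is a generator, a caller that only needs the first tool error can stop iterating early without fetching the remaining pages.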

Distinguish a retried success from a Job that never recovered

Duvo retries transient errors automatically. A Job that retried and then succeeded shows status: completed — you will not see intermediate retry attempts in the status field. To confirm whether a Job went through retries, check the message log for repeated identical tool_calls with error results followed eventually by a successful result. A Job that exhausted all retries shows status: failed with the final tool error in the message log. See Retries, Failures, and Skipped Steps for the full retry behavior reference.
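The check for "repeated identical tool calls with errors, then a success" can be sketched as a single pass over the message log. This is a rough heuristic under stated assumptions: a failed tool_result carries an error field, and two attempts count as identical when their tool_call objects serialize to the same JSON (which would not hold if each attempt carried its own ID).

```python
import json


def retried_then_recovered(messages: list) -> dict:
    """Return {tool_name: failed_attempts} for calls that errored, then succeeded."""
    pending_errors = {}  # serialized tool_call -> errors seen so far
    recovered = {}
    for msg in messages:
        call = msg.get("tool_call")
        if not call:
            continue
        key = json.dumps(call, sort_keys=True)
        result = msg.get("tool_result") or {}
        if result.get("error"):  # assumption: failures expose an "error" field
            pending_errors[key] = pending_errors.get(key, 0) + 1
        elif key in pending_errors:
            # The same call succeeded after one or more failed attempts.
            name = call.get("name", "unknown")
            recovered[name] = recovered.get(name, 0) + pending_errors.pop(key)
    return recovered
```

An empty result on a completed Job suggests it succeeded without retries; an empty result on a failed Job is consistent with retries that never recovered.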

Surface Jobs that need attention

The Jobs list supports a has_issues filter and an issue_severity filter (critical, medium, low), so you can pull the Jobs Duvo flagged for quality or reliability concerns without reading every log:
curl -s "https://api.duvo.ai/v2/teams/$TEAM_ID/runs?has_issues=true&issue_severity=critical&limit=50" \
  -H "Authorization: Bearer $API_KEY" | jq '.data[] | {run_id: .id, status: .status, agent_id: .agent_id}'
The full evaluation detail behind a flagged Job is shown in the Jobs List UI; the API exposes the flags for filtering, not the per-criterion scores.

Count Jobs by status for an Assignment

Use GET /teams/{teamId}/runs with the agent_id filter to retrieve Jobs for a specific Assignment, then count by status. The list response returns Jobs under data with a total count.
#!/bin/bash
TEAM_ID="your-team-id"
AGENT_ID="7c9e6679-7425-40de-944b-e07fc1f90ae7"
API_KEY="dv_your_api_key"
BASE_URL="https://api.duvo.ai/v2"

# Fetch the last 100 Jobs for this Assignment
curl -s "$BASE_URL/teams/$TEAM_ID/runs?agent_id=$AGENT_ID&limit=100&sort_by=created_at&sort_order=desc" \
  -H "Authorization: Bearer $API_KEY" \
  | jq '
    .data
    | group_by(.status)
    | map({ status: .[0].status, count: length })
  '
Sample output:
[
  { "status": "completed", "count": 87 },
  { "status": "failed", "count": 9 },
  { "status": "stopped", "count": 4 }
]
A failure rate above 10% for a well-established Assignment usually signals a Connection Login issue, a change in the upstream data format, or an external API outage.
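To turn the grouped counts into the failure rate discussed above, divide the failed count by the total. A small sketch using the sample output, where the 10% threshold comes from this guide and the function name is illustrative:

```python
def failure_rate(counts: list) -> float:
    """Compute failed / total from a list of {"status", "count"} rows."""
    total = sum(row["count"] for row in counts)
    failed = sum(row["count"] for row in counts if row["status"] == "failed")
    return failed / total if total else 0.0


sample = [
    {"status": "completed", "count": 87},
    {"status": "failed", "count": 9},
    {"status": "stopped", "count": 4},
]
rate = failure_rate(sample)
needs_attention = rate > 0.10  # threshold suggested in this guide
print(f"failure rate: {rate:.1%}, needs attention: {needs_attention}")
```

Note that stopped Jobs count toward the denominator here; exclude them if manual stops would otherwise dilute the rate.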

Page through Jobs in a time window

The list endpoint uses offset-based pagination and accepts a since parameter (an ISO timestamp) to bound results to a time window. Page with limit=100 until you have read total rows.
#!/bin/bash
TEAM_ID="your-team-id"
API_KEY="dv_your_api_key"
BASE_URL="https://api.duvo.ai/v2"
SINCE="2026-05-17T00:00:00Z"   # window start (ISO 8601, UTC); set this to 24 hours before the report time
OFFSET=0

while true; do
  RESPONSE=$(curl -s "$BASE_URL/teams/$TEAM_ID/runs?since=$SINCE&limit=100&offset=$OFFSET&sort_by=created_at&sort_order=desc" \
    -H "Authorization: Bearer $API_KEY")

  echo "$RESPONSE" | jq '.data[] | {run_id: .id, status: .status, agent_id: .agent_id, started_at: .started_at, completed_at: .completed_at}'

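  # Fall back to 0 when total is missing (for example, on an error response) so the loop cannot run forever
  TOTAL=$(echo "$RESPONSE" | jq -r '.total // 0')
  OFFSET=$((OFFSET + 100))
  [ "$OFFSET" -ge "${TOTAL:-0}" ] && break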
done
Use this as the basis for a nightly summary report or a feed into a BI tool.
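For a scheduled report, the since value is usually computed rather than hardcoded. A minimal sketch that produces a UTC timestamp in the same format as the example above; the function name and the injectable now argument are illustrative:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def since_param(hours: int = 24, now: Optional[datetime] = None) -> str:
    """Return an ISO 8601 UTC timestamp `hours` before `now`."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(hours=hours)).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Passing now explicitly keeps the window reproducible when a report is re-run for a past day.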

Measure duration and spot latency regressions

The run record exposes started_at and completed_at timestamps; compute duration from the two. A jump in average duration usually means an external service your Assignment depends on has slowed down.
curl -s "https://api.duvo.ai/v2/teams/$TEAM_ID/runs?agent_id=$AGENT_ID&status=completed&limit=100" \
  -H "Authorization: Bearer $API_KEY" \
  | jq '
    [ .data[]
      | select(.started_at != null and .completed_at != null)
      | ((.completed_at | fromdateiso8601) - (.started_at | fromdateiso8601)) ] as $secs
    | if ($secs | length) > 0 then
        { count: ($secs | length),
          avg_seconds: (($secs | add) / ($secs | length) | round),
          max_seconds: ($secs | max),
          min_seconds: ($secs | min) }
      else
        { count: 0, avg_seconds: 0, max_seconds: 0, min_seconds: 0 }
      end
  '
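To spot a regression rather than just report an average, compare a recent batch of Jobs against a baseline batch. The sketch below computes durations from started_at and completed_at and flags the recent batch when its mean exceeds the baseline mean by a chosen factor. The 1.5 factor is an arbitrary illustration, and timestamps are assumed to be ISO 8601 with a Z or explicit offset.

```python
from datetime import datetime
from statistics import mean
from typing import Optional


def _parse(ts: str) -> datetime:
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def duration_seconds(run: dict) -> Optional[float]:
    """Return completed_at - started_at in seconds, or None if either is missing."""
    if not run.get("started_at") or not run.get("completed_at"):
        return None
    return (_parse(run["completed_at"]) - _parse(run["started_at"])).total_seconds()


def is_regression(recent: list, baseline: list, factor: float = 1.5) -> bool:
    """True when the recent mean duration exceeds factor x the baseline mean."""
    r = [d for d in map(duration_seconds, recent) if d is not None]
    b = [d for d in map(duration_seconds, baseline) if d is not None]
    if not r or not b:
        return False  # not enough data to compare
    return mean(r) > factor * mean(b)
```

A mean is sensitive to a single very slow Job; a median or percentile comparison may suit spiky workloads better.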

Integrating with Your Observability Stack

Send Job events to Datadog, Grafana, or Splunk

Duvo does not have a native push connector for third-party observability tools. The supported pattern is a pull-based pipeline: a scheduled process that pages through new Jobs and forwards them to your tool.

Example: forward recent Jobs to the Datadog Logs API
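import json
from pathlib import Path

import requests

DUVO_API_KEY = "dv_your_api_key"
DATADOG_API_KEY = "your_datadog_api_key"
TEAM_ID = "your-team-id"
OFFSET_FILE = Path("duvo_offset.json")  # persists the read position between invocations

def load_offset():
    """Return the offset saved by the previous invocation, or 0 on the first run."""
    if OFFSET_FILE.exists():
        return json.loads(OFFSET_FILE.read_text())["offset"]
    return 0

def save_offset(offset):
    OFFSET_FILE.write_text(json.dumps({"offset": offset}))

def fetch_new_runs(offset):
    """Fetch one page of Jobs, oldest first, starting at the given offset."""
    resp = requests.get(
        f"https://api.duvo.ai/v2/teams/{TEAM_ID}/runs",
        headers={"Authorization": f"Bearer {DUVO_API_KEY}"},
        params={"limit": 100, "offset": offset, "sort_by": "created_at", "sort_order": "asc"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def send_to_datadog(runs):
    """Forward a batch of Jobs to the Datadog Logs intake, one log entry per Job."""
    logs = [
        {
            "ddsource": "duvo",
            "ddtags": f"agent:{r['agent_id']},status:{r['status']}",
            "service": "duvo-assignments",
            "message": f"Job {r['id']} {r['status']}",
            "run_id": r["id"],
            "agent_id": r["agent_id"],
            "status": r["status"],
            "started_at": r.get("started_at"),
            "completed_at": r.get("completed_at"),
        }
        for r in runs
    ]
    requests.post(
        "https://http-intake.logs.datadoghq.com/api/v2/logs",
        headers={"DD-API-KEY": DATADOG_API_KEY, "Content-Type": "application/json"},
        json=logs,
        timeout=30,
    ).raise_for_status()

# Forward every page created since the last invocation, saving progress after each page
offset = load_offset()
while True:
    runs = fetch_new_runs(offset)["data"]
    if not runs:
        break
    send_to_datadog(runs)
    offset += len(runs)
    save_offset(offset)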
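Run this as a cron job every 5 minutes to keep your Datadog dashboard current with a maximum 5-minute lag. Each Job is forwarded once, with the status it had when it was read, so a Job that is still running at that point is recorded with its in-progress status. For Splunk, replace the send_to_datadog call with an HTTP Event Collector (HEC) POST. For Grafana Loki, use the Loki push API; it expects the same fields wrapped in a streams object with labels and timestamped log lines rather than the flat Datadog payload.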
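For Loki specifically, the push endpoint (/loki/api/v1/push) takes a streams envelope in which each entry is a pair of a nanosecond timestamp string and a log line. A minimal sketch of reshaping Job records into that envelope follows. The label names are illustrative choices, and timestamps are assumed to be ISO 8601 with a Z or explicit offset.

```python
import json
import time
from datetime import datetime


def _to_ns(ts, fallback_ns):
    """Convert an ISO 8601 timestamp to whole-second epoch nanoseconds."""
    if not ts:
        return fallback_ns
    seconds = int(datetime.fromisoformat(ts.replace("Z", "+00:00")).timestamp())
    return seconds * 1_000_000_000


def to_loki_payload(runs, fallback_ns=None):
    """Build a Loki push body with one stream and one log line per Job."""
    if fallback_ns is None:
        fallback_ns = time.time_ns()  # used for Jobs with no timestamps yet
    values = []
    for r in runs:
        ts = r.get("completed_at") or r.get("started_at")
        line = json.dumps({"run_id": r["id"], "agent_id": r["agent_id"], "status": r["status"]})
        values.append([str(_to_ns(ts, fallback_ns)), line])
    # Illustrative static labels; status stays in the log line to keep
    # the number of distinct streams small.
    labels = {"source": "duvo", "service": "duvo-assignments"}
    return {"streams": [{"stream": labels, "values": values}]}
```

POST the returned dictionary as JSON to your Loki instance in place of the send_to_datadog call.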

Use run_id as your correlation key

Duvo does not currently expose OpenTelemetry trace IDs in API responses. Use run_id as the stable identifier to correlate Duvo events with records in your SIEM or log tool.
# In your pipeline, always include run_id in the log record
log_record = {
    "run_id": run["id"],            # use this to join Duvo events with your SIEM
    "agent_id": run["agent_id"],
    "status": run["status"],
    "timestamp": run.get("completed_at") or run.get("started_at"),
}
For a full SIEM integration walkthrough — including exporting actor events (logins, role changes) and builder events (SOP edits, publishes) — see Audit Log and Activity Tracking.

Known Monitoring Gaps

These capabilities are not yet available via the public API. Each row includes the current workaround.
| What you may want | Current state | Workaround |
| --- | --- | --- |
| Per-criterion evaluation scores via API | Flags are filterable; full scores are UI only | Filter with has_issues / issue_severity; read full evaluation detail in the Jobs List |
| Real-time streaming of Job events | Not available; polling or webhook only | Poll GET /runs/{run_id} every 10–30 seconds; use the webhook_url events for state changes |
| Job duration as a single field | Not a field on the run object | Compute from started_at and completed_at |
| Per-step tool timing | Not exposed via the API | Use message timestamps to estimate the gap between steps |
| Per-step cost breakdown | Not exposed via the API | Use Team Insights for aggregated cost trends |
| OpenTelemetry trace IDs in responses | Not currently exposed | Use run_id as the stable correlation key across your systems |
| Actor and builder event export via API | Not available; in-product audit log only | Contact security@duvo.ai for a data extract |