Base URL: https://api.duvo.ai/v2. See Running Assignments via API for authentication and the error model.
Checking Whether a Job Is Healthy
Poll a single Job until it finishes
Jobs are asynchronous. Start a Job, capture the run_id from the response, then poll GET /runs/{run_id} until the status reaches a terminal state.
Terminal statuses: completed, failed, stopped, interrupted
Non-terminal statuses: pending, starting, running, waiting
A waiting status means the Assignment has paused for human input. Check pending_human_request on the same response to see what it is waiting on.
Polling interval guidance:
| Job type | Suggested interval |
|---|---|
| Short-running (< 2 min) | 5–10 seconds |
| Medium (2–15 min) | 30 seconds |
| Long-running (> 15 min) | 60–120 seconds |
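The polling loop described above can be sketched as follows. This is a minimal illustration, not Duvo's client: `fetch_run` is a hypothetical caller-supplied function that performs GET /runs/{run_id} and returns the parsed JSON, and `max_polls` is an assumed safety limit that the API itself does not define.

```python
import time
from typing import Callable

# Terminal statuses as listed in this section.
TERMINAL_STATUSES = {"completed", "failed", "stopped", "interrupted"}


def poll_until_terminal(
    run_id: str,
    fetch_run: Callable[[str], dict],
    interval_seconds: float = 30.0,
    max_polls: int = 240,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a run until its status is terminal, then return the run record.

    fetch_run is a hypothetical helper that wraps GET /runs/{run_id}.
    """
    for _ in range(max_polls):
        run = fetch_run(run_id)
        if run["status"] in TERMINAL_STATUSES:
            return run
        # Non-terminal (pending, starting, running, waiting): wait and retry.
        sleep(interval_seconds)
    raise TimeoutError(f"run {run_id} not terminal after {max_polls} polls")
```

Pick `interval_seconds` from the table above. Injecting `sleep` keeps the loop easy to test without real delays.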
Get notified when a Job finishes (webhooks)
Pass a webhook_url when starting a Job to receive a POST the moment the Job changes state. This avoids polling entirely and is the preferred pattern for production pipelines. Duvo sends run_completed, run_failed, and run_interrupted events (plus human_request events when the Assignment needs input), so filter by the event field for the state you care about.
Each webhook payload includes the event type, the run_id, and the run status. Use the run_id to call GET /runs/{run_id} for the full run record if you need more detail.
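A receiver might filter incoming events along these lines. This is a sketch under assumptions: the event names come from the text above, but the JSON keys `event`, `run_id`, and `status` are guesses at the payload shape, and `handle_webhook` is a hypothetical function you would call from your own HTTP handler.

```python
import json
from typing import Optional

# Event names as listed in this section.
FINISHED_EVENTS = {"run_completed", "run_failed", "run_interrupted"}


def handle_webhook(body: bytes) -> Optional[dict]:
    """Return a summary for run-finished events, or None for anything else.

    The keys "event", "run_id", and "status" are assumed payload field names.
    """
    payload = json.loads(body)
    if payload.get("event") not in FINISHED_EVENTS:
        return None  # for example, human_request events go to another handler
    return {
        "event": payload["event"],
        "run_id": payload["run_id"],
        "status": payload["status"],
    }
```

Returning `None` for unrelated events lets the caller acknowledge the POST without acting on it.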
Diagnosing a Failed Job
Identify where a Job went wrong
When a Job shows status: failed, retrieve its message log to find the point of failure. GET /runs/{run_id}/messages returns every step the Assignment took in chronological order: model decisions, tool calls and their results, HITL events, and the final output.
Signs to look for:
- A tool_call whose tool_result carries an error
- A message where the Assignment describes why it is stopping
- A human_request near the end with no follow-up response (the run status shows waiting via GET /runs/{run_id} when this happens)
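Two of the signals above can be checked mechanically once the message log is loaded. The sketch below assumes each message is a dict, that a failing `tool_result` carries a truthy `error` key, and that HITL messages have `type` equal to `human_request`. Those field values are assumptions, not confirmed API details.

```python
from typing import Optional


def find_tool_error(messages: list) -> Optional[dict]:
    """Return the most recent message whose tool_result reports an error."""
    for message in reversed(messages):
        result = message.get("tool_result") or {}
        if result.get("error"):  # "error" is an assumed field name
            return message
    return None


def ends_on_unanswered_request(messages: list) -> bool:
    """True when the final message is a human_request with nothing after it."""
    return bool(messages) and messages[-1].get("type") == "human_request"
```

Scanning from the end finds the error closest to the point where the Job stopped, which is usually the one that matters.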
Inspect the Assignment’s tool activity
The messages endpoint returns up to 100 messages per page; use limit and offset to page through longer Jobs (total in the response tells you how many there are). Each message has a type, role, timestamp, and, for tool steps, tool_call and tool_result objects.
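Paging with limit and offset could look like the generator below. It is a sketch: `fetch_page` is a hypothetical function wrapping GET /runs/{run_id}/messages, and the `messages` key for the list of items is an assumption (only `total` is named in the text).

```python
from typing import Callable, Iterator


def iter_messages(
    fetch_page: Callable[[int, int], dict],
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield every message of a run, one page at a time.

    fetch_page(limit, offset) is a hypothetical helper returning the parsed
    response; the "messages" key is an assumed name for the item list.
    """
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        items = page["messages"]
        yield from items
        offset += len(items)
        # Stop when the page is empty or all rows reported by total are read.
        if not items or offset >= page["total"]:
            break
```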
Distinguish a retried success from a Job that never recovered
Duvo retries transient errors automatically. A Job that retried and then succeeded shows status: completed, so you will not see intermediate retry attempts in the status field. To confirm whether a Job went through retries, check the message log for repeated identical tool_call entries with error results, followed eventually by a successful result. A Job that exhausted all retries shows status: failed with the final tool error in the message log. See Retries, Failures, and Skipped Steps for the full retry behavior reference.
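The retry check described above can be approximated in code. This sketch assumes that `tool_call` and `tool_result` sit on the same message, that a failed result has a truthy `error` key, and that a retried call is byte-for-byte identical once serialized. If tool calls carry a unique ID per attempt, the comparison key would need to exclude it.

```python
import json


def recovered_tool_calls(messages: list) -> list:
    """Return tool_calls that failed at least once and later succeeded."""
    failed_keys = set()
    recovered = []
    for message in messages:
        call = message.get("tool_call")
        if call is None:
            continue  # not a tool step
        key = json.dumps(call, sort_keys=True)  # identity of the call
        result = message.get("tool_result") or {}
        if result.get("error"):  # "error" is an assumed field name
            failed_keys.add(key)
        elif key in failed_keys:
            recovered.append(call)
            failed_keys.discard(key)
    return recovered
```

An empty return value on a completed Job suggests no retries were needed. On a failed Job, combine this with a scan for the final tool error.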
Surface Jobs that need attention
The Jobs list supports a has_issues filter and an issue_severity filter (critical, medium, low), so you can pull the Jobs Duvo flagged for quality or reliability concerns without reading every log.
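A request URL using these filters might be built as shown here. It is a sketch that assumes the filters are query parameters on GET /teams/{teamId}/runs, and that `has_issues` takes the string `true`. Neither detail is confirmed by this section.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.duvo.ai/v2"
SEVERITIES = {"critical", "medium", "low"}


def flagged_runs_url(team_id: str, severity: str = "critical", limit: int = 100) -> str:
    """Build a Jobs list URL restricted to flagged Jobs of one severity."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    # has_issues="true" is an assumed encoding of the boolean filter.
    query = urlencode({"has_issues": "true", "issue_severity": severity, "limit": limit})
    return f"{BASE_URL}/teams/{team_id}/runs?{query}"
```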
Tracking Trends Across Multiple Jobs
Count Jobs by status for an Assignment
Use GET /teams/{teamId}/runs with the agent_id filter to retrieve Jobs for a specific Assignment, then count by status. The list response returns Jobs under data with a total count.
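Counting by status is a one-liner once a list response is parsed. The sketch below relies on the `data` key named above and assumes each Job in it has a `status` field, as the run record does.

```python
from collections import Counter


def count_by_status(list_response: dict) -> Counter:
    """Tally the Jobs in one list response by their status."""
    return Counter(run["status"] for run in list_response["data"])
```

Note that `data` holds a single page. When `total` exceeds the page size, tally each page and add the counters together.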
Page through Jobs in a time window
The list endpoint uses offset-based pagination and accepts a since parameter (an ISO timestamp) to bound results to a time window. Page with limit=100 until you have read total rows.
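A time-windowed pager could be sketched like this. `fetch_page` is a hypothetical function that sends the given query parameters to GET /teams/{teamId}/runs, and the parameter name `offset` is an assumption based on the offset-based pagination mentioned above.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterator, Optional


def since_hours_ago(hours: float, now: Optional[datetime] = None) -> str:
    """Return an ISO 8601 UTC timestamp for the start of the window."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(hours=hours)).isoformat()


def iter_runs(
    fetch_page: Callable[[dict], dict],
    since: str,
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield every Job started in the window, paging until total is read."""
    offset = 0
    while True:
        params = {"since": since, "limit": page_size, "offset": offset}
        page = fetch_page(params)  # hypothetical helper wrapping the list call
        runs = page["data"]
        yield from runs
        offset += len(runs)
        if not runs or offset >= page["total"]:
            break
```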
Measure duration and spot latency regressions
The run record exposes started_at and completed_at timestamps; compute duration from the two. A jump in average duration often means an external service your Assignment depends on has slowed down.
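The duration calculation can be sketched as below, assuming both timestamps are ISO 8601 strings. The `Z` handling is there because `datetime.fromisoformat` in older Python versions rejects a trailing `Z`.

```python
from datetime import datetime
from statistics import mean


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, tolerating a trailing Z for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def duration_seconds(run: dict) -> float:
    """Seconds between started_at and completed_at on a run record."""
    delta = parse_ts(run["completed_at"]) - parse_ts(run["started_at"])
    return delta.total_seconds()


def average_duration(runs: list) -> float:
    """Mean duration over runs that have both timestamps set."""
    finished = [r for r in runs if r.get("started_at") and r.get("completed_at")]
    return mean(duration_seconds(r) for r in finished)
```

Comparing `average_duration` for the current window against a previous one gives a simple regression signal. `mean` raises `StatisticsError` when no run in the list has finished.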
Integrating with Your Observability Stack
Send Job events to Datadog, Grafana, or Splunk
Duvo does not have a native push connector for third-party observability tools. The supported pattern is a pull-based pipeline: a scheduled process that pages through new Jobs and forwards them to your tool. For example, a forwarder can post recent Jobs to the Datadog Logs API. For Splunk, send the same records with an HTTP Event Collector (HEC) POST instead. For Grafana Loki, use the Loki push API with the same payload shape.
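The Datadog leg of such a pipeline might be sketched as follows. The intake URL and `DD-API-KEY` header follow Datadog's v2 Logs API for the default US site; other Datadog regions use a different host. The `run_id` key on each Job, the `duvo` source name, and the `duvo-jobs` service name are assumptions for illustration.

```python
import json
import urllib.request

# Default US intake endpoint; adjust the host for other Datadog sites.
DATADOG_LOGS_URL = "https://http-intake.logs.datadoghq.com/api/v2/logs"


def build_datadog_request(runs: list, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a Logs API request carrying one entry per Job."""
    entries = [
        {
            "ddsource": "duvo",      # assumed source label
            "service": "duvo-jobs",  # assumed service label
            "message": json.dumps(run),
            # run_id doubles as the correlation key in tags.
            "ddtags": f"run_id:{run['run_id']},status:{run['status']}",
        }
        for run in runs
    ]
    return urllib.request.Request(
        DATADOG_LOGS_URL,
        data=json.dumps(entries).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )
```

Send the result with `urllib.request.urlopen(request)` from the scheduled process. Keeping construction separate from sending makes the payload easy to inspect and test.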
Use run_id as your correlation key
Duvo does not currently expose OpenTelemetry trace IDs in API responses. Use run_id as the stable identifier to correlate Duvo events with records in your SIEM or log tool.
Known Monitoring Gaps
These capabilities are not yet available via the public API. Each row includes the current workaround.

| What you may want | Current state | Workaround |
|---|---|---|
| Per-criterion evaluation scores via API | Flags are filterable; full scores are UI only | Filter with has_issues / issue_severity; read full evaluation detail in the Jobs List |
| Real-time streaming of Job events | Not available — polling or webhook only | Poll GET /runs/{run_id} every 10–30 seconds; use the webhook_url events for state changes |
| Job duration as a single field | Not a field on the run object | Compute from started_at and completed_at |
| Per-step tool timing | Not exposed via the API | Use message timestamps to estimate the gap between steps |
| Per-step cost breakdown | Not exposed via the API | Use Team Insights for aggregated cost trends |
| OpenTelemetry trace IDs in responses | Not currently exposed | Use run_id as the stable correlation key across your systems |
| Actor and builder event export via API | Not available — in-product audit log only | Contact security@duvo.ai for a data extract |
Related
- Running Assignments via API — Starting Jobs, uploading files, and HITL webhook details
- Retries, Failures, and Skipped Steps — How Duvo handles transient errors and permanent failures
- Audit Log and Activity Tracking — SIEM integration, actor and builder events, CSV/JSON export
- Event-Driven Triggers — Trigger Jobs automatically from file drops, status changes, or Slack messages
- Team Insights — Aggregated metrics, completion rates, and cost trends across your team