Guardrails for High-Risk Automations

This guide helps you identify when an agent carries meaningful financial, regulatory, or reputational risk, or can take actions that are hard to undo, and apply the right guardrails before letting it run autonomously.

Risk Classification

Before configuring guardrails, score your agent against the five risk dimensions below. The highest single dimension determines the overall risk tier.

Risk dimensions

Dimension	Low	Medium	High	Critical
Dollar impact	Under $500 per action	$500–$5,000	$5,000–$50,000	Over $50,000 or any recurring charge
Irreversibility	Fully reversible (e.g., tag, draft, status)	Recoverable with effort (e.g., archive, field overwrite)	Hard to undo (e.g., sent email, posted record)	Cannot be undone (e.g., payment settled, contract signed)
External visibility	Internal only	Shared with internal teams	Customer-facing	Public or regulatory submission
Regulatory exposure	None	Internal policy only	PII / GDPR / HIPAA / SOC 2	Financial (SOX, PCI-DSS), legal, or contractual
System criticality	Experimental or sandbox	Non-critical production	Important but not mission-critical	Mission-critical (billing, payroll, ERP)

Tier definitions and required guardrails

Tier	Overall score	Required guardrails
Low	All dimensions score Low	None required; routine reversible operations
Medium	Highest dimension score is Medium	HITL gate on the risky branch; anomaly alerts enabled
High	Highest dimension score is High	HITL gate; hard caps on action volume and spend; allow/deny lists
Critical	Any dimension scores Critical	All High guardrails; shadow mode before first autonomous run; two-person AOP review before going live
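The "highest single dimension wins" rule can be sketched in a few lines of Python. This is only an illustration of the scoring logic, not a platform API; the `Tier` enum, the `overall_tier` function, and the dimension keys are hypothetical names chosen for this example.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Risk tiers ordered so that a higher value means higher risk."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def overall_tier(scores: dict[str, Tier]) -> Tier:
    """Return the highest tier found across all scored dimensions."""
    if not scores:
        raise ValueError("score at least one dimension")
    return max(scores.values())


# Hypothetical agent: one High dimension is enough to make the agent High.
scores = {
    "dollar_impact": Tier.MEDIUM,
    "irreversibility": Tier.HIGH,
    "external_visibility": Tier.LOW,
    "regulatory_exposure": Tier.LOW,
    "system_criticality": Tier.MEDIUM,
}
print(overall_tier(scores).name)
```

Because the tiers are ordered integers, averaging is never involved: four Low scores do not offset a single Critical one.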

Guardrail Patterns

HITL approval on high-risk branches

Add a Human-in-the-Loop approval step before any action that scores Medium or above. The approval request should give the reviewer enough context to decide without opening another system.

Before submitting the payment, request approval.
Title: "Payment — [vendor] — $[amount] — [invoice ID]"
Description: vendor name, bank details, invoice due date, budget line.
If denied, stop and log the reason. Do not retry without a new instruction.

See Designing Human-in-the-Loop Workflows for choosing the right approval shape and setting escalation chains.

Hard caps

Limit the blast radius of a runaway run by setting explicit caps in your AOP.

Process at most 50 records per run.
Send at most 20 external emails per hour.
Do not submit more than one payment per invoice ID per day.
If any cap is reached, stop and log: "Cap reached — [cap name] — [count] actions taken."

Use the Runs list to verify cap behavior during testing before removing gates.
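If you want to reason about what a cap should do before writing it into an AOP, a small counter is a useful mental model. The sketch below is illustrative only; `ActionCap` and `CapReached` are hypothetical names, and the message wording is simplified rather than copied from the AOP log line.

```python
class CapReached(Exception):
    """Raised when an action would exceed its cap."""


class ActionCap:
    """Counts actions and refuses any action beyond the limit."""

    def __init__(self, name: str, limit: int) -> None:
        self.name = name
        self.limit = limit
        self.count = 0

    def record(self) -> None:
        """Register one action, or raise if the cap is already used up."""
        if self.count >= self.limit:
            raise CapReached(
                f"{self.name}: cap of {self.limit} reached after {self.count} actions"
            )
        self.count += 1


# A run that is handed 120 records but is capped at 50.
cap = ActionCap("records per run", limit=50)
processed = 0
try:
    for _record in range(120):
        cap.record()  # check the cap before doing the work
        processed += 1
except CapReached as stop:
    print(stop)
```

The check happens before the action, so the run stops at exactly the limit instead of one action past it.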

Allow and deny lists

Restrict write operations to approved targets. Any action outside the list requires a HITL approval step.

Email domain allowlist:

Only send outbound emails to addresses ending in @acme.com, @partner.com,
or addresses listed in the approved-recipients.csv file.
If the recipient is outside this list, request approval before sending.

Account allowlist for financial actions:

Only process payments to vendors in the approved-vendors list in Files.
If the vendor is not in the list, flag for procurement review and stop.
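The email allowlist rule above amounts to an exact match on the domain plus a lookup in an approved list. The following sketch shows that check under stated assumptions: `recipient_allowed` is a hypothetical helper, and `approved_recipients` stands in for the contents of approved-recipients.csv, already loaded into a set.

```python
ALLOWED_DOMAINS = {"acme.com", "partner.com"}


def recipient_allowed(address: str, approved_recipients: set[str]) -> bool:
    """Return True if the address is explicitly approved or on an allowed domain."""
    address = address.strip().lower()
    if address in approved_recipients:
        return True
    # Compare the whole domain after the last "@", not a loose suffix,
    # so "user@notacme.com" does not pass as "acme.com".
    _local, separator, domain = address.rpartition("@")
    return bool(separator) and domain in ALLOWED_DOMAINS


approved = {"buyer@supplier.example"}
print(recipient_allowed("ana@acme.com", approved))
print(recipient_allowed("ana@notacme.com", approved))
```

Anything that returns False here corresponds to the branch where the agent requests approval instead of sending.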

Idempotency checks

Prevent duplicate actions when a run retries or runs more than once.

Before creating the record, check whether a record with this [reference ID]
already exists for today's date. If it does, skip creation and log:
"Skipped — record already exists — [reference ID]".

Idempotency checks are especially important for financial writes, external API calls, and email sends where duplicates cause downstream problems.
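The check described above is a lookup on a key built from the reference ID and the date. A minimal sketch follows, assuming an in-memory set stands in for whatever system actually holds the records; `idempotency_key` and `create_once` are hypothetical names.

```python
from datetime import date


def idempotency_key(reference_id: str, day: date) -> str:
    """Build a key that identifies one record per reference ID per day."""
    return f"{reference_id}:{day.isoformat()}"


def create_once(reference_id: str, day: date, seen: set[str]) -> bool:
    """Create the record only if its key is new. Return True if created."""
    key = idempotency_key(reference_id, day)
    if key in seen:
        return False  # duplicate: skip creation and log the skip
    seen.add(key)
    return True


seen: set[str] = set()
today = date(2024, 1, 8)
first = create_once("INV-1001", today, seen)
retry = create_once("INV-1001", today, seen)
print(first, retry)
```

A retried run produces the same key as the original run, so the second attempt is skipped rather than duplicated.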

Shadow mode before enabling writes

For new high-risk agents, run the agent in read-only mode first. Observe what it would have done, then enable write access once the outputs look correct.

SHADOW MODE — DO NOT WRITE:
Read the data and log what actions you would have taken, but do not
submit, send, or update anything. Summarize in the Run output:
"Would have [action] for [count] records."

After reviewing a week of shadow-mode outputs with no surprises, remove the SHADOW MODE instruction and enable the write connections.
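Conceptually, shadow mode routes every write through a single switch that either performs the action or only counts it. The sketch below illustrates that idea in Python; `ShadowRunner` is a hypothetical class for this example and does not reflect how the platform implements read-only connections.

```python
from collections import Counter
from typing import Callable


class ShadowRunner:
    """Performs writes when live; only counts them when in shadow mode."""

    def __init__(self, shadow: bool) -> None:
        self.shadow = shadow
        self.would_have: Counter[str] = Counter()

    def write(self, action: str, do_write: Callable[[], None]) -> None:
        """Run the write, or record that it would have happened."""
        if self.shadow:
            self.would_have[action] += 1
            return
        do_write()

    def summary(self) -> list[str]:
        """One line per action type, in the format used in the Run output."""
        return [
            f"Would have {action} for {count} records."
            for action, count in sorted(self.would_have.items())
        ]


runner = ShadowRunner(shadow=True)
sent: list[str] = []
for invoice_id in ["INV-1", "INV-2", "INV-3"]:
    runner.write("sent a reminder", lambda: sent.append(invoice_id))
print(runner.summary())
```

Flipping `shadow` to False is the only change needed to go live, which mirrors removing the SHADOW MODE instruction from the AOP.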

Two-person AOP review for Critical agents

For Critical-tier agents, require a second team member to review the AOP before any change goes live. In practice this means:

1. Builder drafts the revision. The Builder drafts and saves the updated AOP revision.
2. Reviewer inspects and tests. A Manager or Administrator opens the agent, reviews the diff, and performs a test run.
3. Reviewer promotes the change. Only after the reviewer is satisfied does that reviewer update the live revision.

See Roles and Permissions for who can create revisions versus who can promote changes on agents owned by others.

Time-window restrictions

Limit autonomous write operations to business hours to ensure a human can respond quickly to anomalies.

Only perform write actions (updates, sends, submissions) between 08:00
and 18:00 Monday–Friday in [timezone]. Outside these hours, queue the
action and log it. On the next business day, request approval before
executing queued actions.
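The time-window rule reduces to a weekday check plus a time-of-day check in the agent's timezone. Here is a minimal sketch, assuming the window is inclusive at 08:00 and exclusive at 18:00 (the guide does not say which); `in_write_window` is a hypothetical helper. The example uses UTC; for a named zone you could pass a `zoneinfo.ZoneInfo` object instead.

```python
from datetime import datetime, time, timezone, tzinfo

WINDOW_START = time(8, 0)
WINDOW_END = time(18, 0)


def in_write_window(moment: datetime, zone: tzinfo) -> bool:
    """Return True if a timezone-aware moment falls in the weekday window."""
    local = moment.astimezone(zone)
    is_weekday = local.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and WINDOW_START <= local.time() < WINDOW_END


monday_morning = datetime(2024, 1, 8, 9, 0, tzinfo=timezone.utc)
saturday_morning = datetime(2024, 1, 6, 9, 0, tzinfo=timezone.utc)
print(in_write_window(monday_morning, timezone.utc))
print(in_write_window(saturday_morning, timezone.utc))
```

Actions for which this returns False are the ones the AOP queues and later submits for approval.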

Detection and Response

Anomaly signals to monitor

Review run outputs regularly for these patterns, especially in the first two weeks after a new agent goes live:

Signal	What it may indicate
Run duration much longer than usual	Stuck loop, external system slowdown, runaway processing
Action count far above the average	Cap not applied; duplicate records being created
Cost spike without corresponding output	Repeated retries, large context being sent to the model
Output drift — format or content changes without AOP change	Upstream data schema change; model behavior change
HITL denial rate rising above 20%	AOP is generating incorrect outputs; refine before removing gates

The Runs list and Team Insights show duration, action counts, and run history. Use these to spot unusual patterns.
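If you export run history for your own analysis, two of the signals above are easy to compute. The sketch below is a rough illustration: the 20% denial threshold comes from the table, while the "three times the average" factor is an assumption, since the guide only says "much longer than usual" and "far above the average". The function names and input shapes are hypothetical.

```python
from statistics import mean

DENIAL_RATE_THRESHOLD = 0.20  # from the anomaly signals table


def denial_rate(decisions: list[str]) -> float:
    """Share of approval decisions that were denials."""
    if not decisions:
        return 0.0
    return decisions.count("denied") / len(decisions)


def is_outlier(value: float, history: list[float], factor: float = 3.0) -> bool:
    """Flag a value that exceeds `factor` times the historical average."""
    return bool(history) and value > factor * mean(history)


decisions = ["approved"] * 7 + ["denied"] * 3
durations = [100.0, 110.0, 90.0]  # seconds, from earlier runs

print(denial_rate(decisions) > DENIAL_RATE_THRESHOLD)
print(is_outlier(400.0, durations))
```

The same `is_outlier` check applies to action counts and cost; tune the factor to how noisy your runs normally are.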

Kill switch: pausing an agent

If something is going wrong, you can stop an agent immediately in one of three ways: disable its schedule, disable its trigger, or revoke its connection.

Revoking the connection is the fastest way to stop a write-heavy agent when you are not sure whether an active Run has already done damage.

Disable a schedule:

1. Open the agent.
2. Go to Agent Settings > Schedule.
3. Toggle the schedule off. The agent stops running automatically. Runs already in progress continue to completion.

Disable a trigger:

1. Open the agent.
2. Go to Agent Settings > Triggers.
3. Disable the trigger. No new runs start from that trigger.

Revoke a connection:

1. Go to the Connections page.
2. Find the connection used by the agent.
3. Disconnect it. The agent will fail immediately if it tries to use this connection, which prevents further writes to that system.

Rollback via Agent Versions

If an AOP change caused the problem, revert to the previous working version:

1. Open the agent.
2. Click the revision selector in the builder toolbar.
3. Select the last known-good revision.

The agent now runs the reverted AOP on the next trigger or manual start.

See Agent Versions for details on how versioning works.

Communication template

When you pause a high-risk agent, notify stakeholders immediately. Waiting for a full post-mortem before communicating increases risk.

Subject: [Agent name] paused — [date]

We have paused the [agent name] agent as a precaution.

What happened: [one sentence — e.g., "The agent sent 12 duplicate
emails to the procurement team before the cap triggered."]

Current status: The agent is paused. No further automated actions
will occur until we have reviewed and restarted it.

Impact: [e.g., "12 emails were sent that should not have been. We are
contacting the recipients to clarify."]

Next step: We are reviewing the AOP and expect to have an update
by [date/time]. We will notify you before restarting.

Worked Example: Purchase Order Processing

The Purchase Order Processing tutorial covers the basic flow. Here is how to apply the full risk classification and guardrail set before enabling it in production.

Risk classification

Dimension	Score	Reason
Dollar impact	Critical	Individual POs can exceed $50,000
Irreversibility	High	Approved POs trigger downstream procurement actions
External visibility	High	Confirmations go to vendors, who are external parties
Regulatory exposure	Medium	Internal procurement policy; financial audit trail required
System criticality	Critical	ERP is mission-critical

Overall tier: Critical (highest single dimension score)

Guardrails applied

HITL gates:

For POs between $1,000 and $5,000: request approval from department manager.
Title: "Approve PO — [vendor] — $[amount] — [PO number]"
Description: vendor, department, line items, budget line.
For POs over $5,000: request approval from finance director.
Include budget impact analysis in the description.
If not approved within 48 hours, escalate to the next level.
If denied twice, halt and mark the PO as Needs Manual Review.
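The threshold routing in these instructions can be expressed as a small function, which makes the boundary cases explicit. This is a sketch of the routing logic only; `approver_for` is a hypothetical name, and the guide does not specify a gate for POs under $1,000, so the sketch returns `None` for them instead of guessing.

```python
from typing import Optional


def approver_for(amount: float) -> Optional[str]:
    """Return who must approve a PO of this amount, per the thresholds above."""
    if amount > 5000:
        return "finance director"
    if amount >= 1000:
        return "department manager"
    # Below $1,000 the instructions above define no approval gate.
    return None


for amount in (999.99, 1000, 5000, 5000.01):
    print(amount, approver_for(amount))
```

A PO of exactly $5,000 goes to the department manager, because the finance director rule applies only to amounts over $5,000.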

Hard cap:

Process at most 30 POs per run.
Do not submit more than one approval request per PO number per calendar day.
If the cap is reached, stop and log: "Run cap reached — [count] POs processed."

Vendor allowlist:

Only process POs for vendors in the approved-vendors.csv file in Files.
If the vendor is not found, flag for procurement review and skip.
Do not send any communications to unapproved vendors.

Idempotency check:

Before sending an approval request, check whether an approval request
for this PO number is already pending. If yes, skip and log:
"Skipped — approval already pending — [PO number]".

Shadow mode (first two weeks): Enable the agent with all connections in read-only mode. Add the SHADOW MODE instruction to the AOP. Review the weekly run outputs with the procurement lead before enabling write access.

Time window:

Only submit approvals and send notifications between 07:00 and 19:00 UTC,
Monday–Friday. Queue any actions triggered outside these hours and process
them at 07:00 on the next business day after requesting approval.

Two-person review: A Manager reviewed and approved the AOP before the agent was promoted to live.

Related

Designing Human-in-the-Loop Workflows

Risk tiers, approval shapes, escalation chains, and ramping toward autonomy

Agent Versions

How to review and revert to a previous AOP version

Roles and Permissions

Who can create, edit, and promote agent revisions

Security & Privacy

Platform-level security controls, SOC 2, and data handling

Purchase Order Processing

Full tutorial showing threshold-based approval routing

Expense Report Approval

Full tutorial showing HITL approval with dollar thresholds
