AI Native Builders

The Safety Playbook for Agentic AI in Production: What Nobody Tells You Until Something Breaks

Five safety layers for agentic AI in production, anchored on real incidents. Permission scoping, dry-run gates, deletion protection, blast radius limits, and audit design.

Strategy & Operating Model · Intermediate · Oct 4, 2025 · 6 min read
Five safety layers between your agent and a production catastrophe: permission scoping, dry-run gates, deletion protection, blast radius estimation, and audit design.

In July 2025, an engineer at a software company used an agent-assisted coding tool to make what should have been a routine change. The agent — operating during an active code freeze — deleted the production database. Not a staging copy. Not a test instance. The live production database that customers depended on. The incident report described it as a "catastrophic failure."[1]

This was not a hypothetical scenario from a safety research paper. It was a real incident, documented in the AI Incident Database[2], covered by Fortune, and discussed across the engineering community for weeks. And it was not the only one. By late 2025, the pattern was clear: agentic systems were being deployed into production environments with permissions that far exceeded what their tasks required, without meaningful guardrails between intent and execution.

The safety playbook for agentic AI in production is not about slowing down adoption. It is about building the five layers of protection that prevent a helpful automation from becoming an expensive disaster. Every layer addresses a specific failure mode observed in real production incidents.

  • 64% of large companies lost $1M+ to AI failures (EY survey); actual figures vary widely by industry and company size.
  • 40%+ of agentic AI projects are expected to be cancelled by 2027 (Gartner); results vary by organizational maturity.
  • 5 layers in the safety architecture for production agents.
  • 3 authorization tiers: read, write-confirm, write-auto.

Real Incidents, Real Lessons: Why Safety Layers for Agentic AI Matter

Every safety recommendation here is anchored on something that actually went wrong.

The Replit database deletion incident[1] is the most widely reported, but it is not an isolated case. According to the AI Incident Database (Incident 1152)[2], the pattern repeats across industries and tooling:

An agent with broad Terraform permissions applies a configuration change that includes a destroy operation on a production database. The infrastructure-as-code tool executes faithfully — it does exactly what the agent asked. The problem was never the execution; it was the permission scope that allowed a routine task to cascade into a destructive operation.

An automated deployment agent rolls back a failed canary release but does not realize the rollback target is three versions old, reintroducing a security vulnerability that was patched two releases ago. The blast radius of the rollback was never estimated.[5]

A data pipeline agent, tasked with cleaning up stale records, interprets "older than 90 days" differently than intended and deletes 18 months of audit logs that were under a legal hold. No dry-run step existed to show what would be deleted before it was deleted.

These are not failures of intelligence. They are failures of architecture. The agents performed their tasks competently within the permissions they were given. The problem was that nobody built the safety layers between the agent's capability and the consequences of its actions.

The Five Safety Layers for Production Agentic AI

Permission scoping, dry-run gates, deletion protection, blast radius estimation, and audit log design.

Five-Layer Safety Architecture for Agentic AI
Each layer provides defense-in-depth: an agent must pass through all five before executing a high-impact action.

Layer 1: Permission Scoping — Least Privilege Per Agent

The single most effective safety measure. An agent cannot break what it cannot touch.

Permission scoping is the practice of giving each agent the minimum set of credentials required for its specific task — and nothing more. This sounds obvious, but in practice most production agents operate with overly broad permissions because it is easier to configure one service account with admin access than to create scoped credentials for each task.[4]

The Replit incident happened in part because the agent had write access to production infrastructure during a code freeze.[1] A properly scoped agent performing code review would have had read-only access to the codebase and no infrastructure permissions at all.

Scoped permissions work at three levels. First, tool-level scoping: which tools can the agent invoke? A documentation agent should not have access to deployment tools. Second, resource-level scoping: which resources can each tool affect? A database query agent should have read access to specific tables, not all tables. Third, action-level scoping: which operations are permitted? Read and list operations carry near-zero risk. Create operations carry moderate risk. Update and delete operations carry high risk and should require additional authorization.[3]

| Tier | Permission Level | Actions Allowed | Human Approval | Examples |
|------|------------------|-----------------|----------------|----------|
| Tier 1 | Read-Only | Read, list, search, analyze | None required | Log analysis, code review, report generation |
| Tier 2 | Write-with-Confirmation | Create, update (with preview) | Required before execution | Config changes, PR creation, ticket updates |
| Tier 3 | Write-Autonomous | Create, update (within guardrails) | Post-hoc audit only | Formatting, dependency updates, test generation |
| NEVER | Destructive | Delete, drop, destroy, force-push | Always blocked for agents | Database drops, infrastructure teardown, data purge |
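The tier model above can be sketched as a small classifier. This is a minimal sketch, not a specific framework API: the verb lists mirror the table, while the function and constant names are illustrative assumptions.

```typescript
// Illustrative sketch of the tiered authorization model. The tier names
// mirror the table; DESTRUCTIVE_ACTIONS and classifyAction are assumed
// names for illustration, not a real library API.
type Tier = 'read' | 'write-confirm' | 'write-auto' | 'blocked';

// Destructive verbs are denied outright, before any tier lookup (the NEVER row).
const DESTRUCTIVE_ACTIONS = new Set(['delete', 'drop', 'destroy', 'force-push']);

// Tier 1 verbs carry near-zero risk and need no approval gate.
const READ_ACTIONS = new Set(['read', 'list', 'search', 'analyze']);

function classifyAction(verb: string, withinGuardrails: boolean): Tier {
  if (DESTRUCTIVE_ACTIONS.has(verb)) return 'blocked';     // NEVER tier
  if (READ_ACTIONS.has(verb)) return 'read';               // Tier 1
  // Remaining writes: Tier 3 if pre-approved guardrails cover them, else Tier 2.
  return withinGuardrails ? 'write-auto' : 'write-confirm';
}
```

With this shape, `classifyAction('drop', true)` comes back `'blocked'` no matter what guardrails claim, which is the point: destructive verbs never reach the tier logic at all.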

Layer 2: Dry-Run Gates — Show Before You Do

Every write operation should have a preview mode. No exceptions.

A dry-run gate is a mandatory preview step between the agent's proposed action and its execution. The agent generates a plan showing exactly what will change — files modified, records affected, infrastructure resources created or destroyed — and a human reviews the plan before approving execution.

Terraform got this right with terraform plan. Before any apply runs, you see a detailed diff of what will be created, modified, and destroyed. The tragedy of Terraform-related agent incidents is that the plan step existed but was either skipped or auto-approved.[2]

For agentic systems, the dry-run gate must be mandatory and non-bypassable for any operation classified as Tier 2 or above. The gate should show:

  • What will change (specific files, records, resources)
  • How many entities are affected ("3 files" vs. "4,200 database rows")
  • Whether the change is reversible (and if so, how)
  • What the rollback path looks like if something goes wrong
dry-run-gate.ts
interface DryRunResult {
  action: string;
  tier: 'read' | 'write-confirm' | 'write-auto';
  affectedResources: {
    type: string;
    count: number;
    identifiers: string[];
  }[];
  reversible: boolean;
  rollbackPlan?: string;
  estimatedBlastRadius: 'none' | 'low' | 'medium' | 'high' | 'critical';
  requiresApproval: boolean;
}

async function executeSafely(
  action: AgentAction,
  context: ExecutionContext
): Promise<ExecutionResult> {
  // Step 1: Generate dry-run preview
  const preview = await generateDryRun(action, context);

  // Step 2: Escalate critical blast radius before requesting approval --
  // critical operations go to human review regardless of tier
  if (preview.estimatedBlastRadius === 'critical') {
    return { status: 'escalated', reason: 'Blast radius exceeds threshold' };
  }

  // Step 3: Check if approval is required
  if (preview.requiresApproval) {
    const approval = await requestHumanApproval(preview);
    if (!approval.granted) {
      return { status: 'denied', reason: approval.reason };
    }
  }

  // Step 4: Execute with audit logging
  return await executeWithAudit(action, preview, context);
}

Layer 3: Deletion Protection at the Infrastructure Level

Even if the agent has permission and the dry-run is approved, some things should be physically impossible to delete.

Deletion protection is the last line of defense — a hard infrastructure-level constraint that prevents destruction of critical resources regardless of who or what issues the command. This is not about permissions or approvals. It is about making certain catastrophic actions physically impossible without a separate, deliberate process.

AWS offers deletion protection on RDS instances, DynamoDB tables, and CloudFormation stacks. GCP has similar features for Cloud SQL and GKE clusters. Terraform has prevent_destroy lifecycle rules. Every major cloud provider recognized that some resources need protection beyond access controls — and every production environment running agentic systems should enable these protections.[6]

The data pipeline agent that deleted 18 months of audit logs could not have done so if the storage bucket had object lock enabled with a retention policy. The Terraform agent could not have destroyed the database if prevent_destroy = true was set in the resource configuration. These are five-minute configuration changes that prevent million-dollar incidents.

Infrastructure Deletion Protection Checklist

  • Enable deletion protection on all production databases (RDS, Cloud SQL, etc.)

  • Set prevent_destroy lifecycle on critical Terraform resources

  • Enable object lock on audit log storage buckets

  • Protect production Kubernetes namespaces with admission webhooks

  • Enable branch protection on main/production git branches

  • Configure MFA delete on S3 buckets containing backups

  • Set termination protection on production EC2 instances and ECS services

  • Enable soft-delete with retention on all data stores used by agents

Layer 4: Blast Radius Estimation — How Bad Could This Get?

Before any write operation, estimate what happens if it goes wrong.

Blast radius estimation is the practice of calculating the worst-case impact of an action before executing it. It answers the question: if this operation fails or behaves unexpectedly, what is the maximum damage it could cause?[5]

The estimation considers three dimensions:

Scope — how many resources, records, or systems are affected? An operation touching 5 files has a smaller blast radius than one touching 5,000 database rows. An operation affecting one service has a smaller blast radius than one affecting a shared database used by twelve services. These thresholds are illustrative starting points — calibrate based on your workload.

Reversibility — can the action be undone? A file modification backed by version control is fully reversible. A database DELETE without a prior backup is effectively irreversible in most cases. The blast radius of irreversible operations is categorically higher.

Downstream dependencies — what depends on the resources being modified? Changing a shared API contract affects every downstream consumer. Modifying a configuration file affects every service that reads it. The blast radius includes not just the direct target but everything that depends on it.

  • None/Low — Read-only operations or changes to isolated, versioned files
  • Medium — Write to scoped resources with rollback path available
  • High — Write to shared resources or operations affecting 100+ entities
  • Critical — Irreversible operations on production data or shared infrastructure
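The three dimensions map onto this scale with a straightforward scorer. A minimal sketch, using the article's illustrative thresholds: the field names are assumptions, and treating every irreversible write as critical is a deliberate simplification on the safe side.

```typescript
// Illustrative blast radius scorer. Thresholds follow the article's
// examples (100+ entities = high); field names are assumed for this sketch.
type BlastRadius = 'none' | 'low' | 'medium' | 'high' | 'critical';

interface ImpactEstimate {
  readOnly: boolean;           // pure read/list/search operation
  reversible: boolean;         // is there a rollback path?
  sharedResource: boolean;     // do other services depend on the target?
  versionControlled: boolean;  // isolated files backed by version control
  affectedCount: number;       // scope: entities touched
}

function estimateBlastRadius(e: ImpactEstimate): BlastRadius {
  if (e.readOnly) return 'none';
  // Simplification: any irreversible write scores critical (err on the safe side).
  if (!e.reversible) return 'critical';
  // Shared resources or large scope push reversible writes to high.
  if (e.sharedResource || e.affectedCount >= 100) return 'high';
  // Isolated, versioned files are trivially rolled back.
  if (e.versionControlled) return 'low';
  return 'medium';  // scoped write with a rollback path
}
```

The "3 files vs. 4,200 database rows" contrast from the dry-run section falls out naturally: 3 versioned files score low, 4,200 rows score high even with a rollback path.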

Layer 5: Audit Log Design — What Happened, and Why?

When something goes wrong (and it will), the audit log is the only source of truth.

Every agent action — including actions that were denied or escalated — must be logged with enough context to reconstruct what happened during an incident review. The audit log is not just a compliance requirement. It is the forensic tool that turns a mysterious production failure into a diagnosable, preventable event.[3]

A good agent audit log captures seven fields per action: the agent identity, the action requested, the timestamp, the authorization decision (approved/denied/escalated), the dry-run preview that was shown, the execution result, and the rollback status if applicable.

The critical design decision is where the audit log lives. It must be in a separate system that the agent cannot modify. If the agent writes its own audit logs, a malfunctioning agent can corrupt or delete the evidence of its malfunction. Use an append-only log in a separate account or service — one that the agent has zero write access to.

audit-logger.ts
interface AgentAuditEntry {
  id: string;                    // Unique event ID
  timestamp: string;             // ISO 8601
  agentId: string;               // Which agent
  sessionId: string;             // Conversation/task session
  action: {
    type: string;                // e.g., 'database.query', 'file.write'
    target: string;              // What resource
    parameters: Record<string, unknown>;
  };
  authorization: {
    tier: 'read' | 'write-confirm' | 'write-auto';
    decision: 'approved' | 'denied' | 'escalated';
    approvedBy?: string;         // Human approver if applicable
    denialReason?: string;
  };
  dryRun?: {
    affectedCount: number;
    blastRadius: string;
    preview: string;             // Summary of planned changes
  };
  execution: {
    status: 'success' | 'failure' | 'partial' | 'not-executed';
    duration: number;            // Milliseconds
    error?: string;
  };
  rollback?: {
    available: boolean;
    executed: boolean;
    result?: string;
  };
}
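The article's core control is storage the agent cannot write to. One complementary safeguard, offered here as a supplementary sketch rather than anything the article prescribes, is hash-chaining entries so that tampering is detectable even if the store is somehow reached: each entry's hash covers its payload plus the previous entry's hash, so modifying any record breaks every link after it.

```typescript
// Supplementary sketch (not from the article): tamper-evident hash chaining
// over serialized audit entries. Function and type names are illustrative.
import { createHash } from 'node:crypto';

const GENESIS_HASH = '0'.repeat(64);  // sentinel previous-hash for entry 0

interface ChainedEntry {
  payload: string;   // a serialized AgentAuditEntry
  prevHash: string;
  hash: string;
}

function appendEntry(chain: ChainedEntry[], payload: string): ChainedEntry[] {
  const prevHash = chain.length ? chain[chain.length - 1].hash : GENESIS_HASH;
  const hash = createHash('sha256').update(prevHash + payload).digest('hex');
  return [...chain, { payload, prevHash, hash }];
}

// Recompute every link; one modified payload fails verification from that point on.
function verifyChain(chain: ChainedEntry[]): boolean {
  return chain.every((entry, i) => {
    const prevHash = i === 0 ? GENESIS_HASH : chain[i - 1].hash;
    const expected = createHash('sha256').update(prevHash + entry.payload).digest('hex');
    return entry.prevHash === prevHash && entry.hash === expected;
  });
}
```

This does not replace the separate-account requirement; it only makes after-the-fact tampering visible during an incident review.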

Putting It All Together: The Safety Architecture in Practice

How the five layers work as an integrated system.

  1. Define the tiered authorization model for your agents. Catalog every tool and resource your agents can access. Classify each into read-only, write-with-confirmation, or write-autonomous. Mark destructive operations (delete, drop, destroy) as permanently blocked for agents.

  2. Implement dry-run gates for all write operations. Build a mandatory preview step that shows what will change before any write operation executes. For Tier 2 operations, require explicit human approval. For Tier 3 operations, log the preview but allow auto-execution within guardrails.

  3. Enable deletion protection on all critical production resources. This is a one-time infrastructure hardening step. Enable every available deletion protection mechanism on databases, storage buckets, infrastructure stacks, and git branches. This layer is independent of the agent — it protects against any deletion source.

  4. Build blast radius estimation into the dry-run output. Extend the dry-run preview to include a blast radius score. Count affected entities, check reversibility, and map downstream dependencies. Operations that score 'critical' are automatically escalated regardless of tier classification.

  5. Deploy the audit log in a separate, agent-inaccessible system. Set up an append-only audit log in a separate account or service. Log every action (including denials and escalations) with the seven required fields. Set retention policies that exceed your compliance requirements.

Without Safety Layers
  • Agent has broad service account with admin permissions

  • Write operations execute immediately without preview

  • Production databases can be dropped by any authenticated caller

  • No estimation of downstream impact before execution

  • Audit logs stored alongside application data — agent-accessible

  • Incident response starts from zero: what happened?

With Safety Layers
  • Agent has scoped credentials matching its specific task

  • All writes go through dry-run preview with approval for Tier 2+

  • Deletion protection makes database drops physically impossible

  • Blast radius scored and critical operations auto-escalated

  • Immutable audit logs in separate system with full action context

  • Incident response starts from the audit trail: here is exactly what happened

Does this slow down agent execution too much?

Tier 1 (read-only) operations pass through all layers in milliseconds — there is no approval gate, just logging. Tier 3 (write-autonomous) operations add a dry-run step and blast radius check, adding roughly 1-3 seconds. Only Tier 2 (write-with-confirmation) operations require human approval, and those are the operations where a few minutes of delay prevents potential hours of incident response.

How do I handle agents that need to chain multiple operations together?

Treat chains as a single unit of work with a combined blast radius. If an agent needs to modify a configuration file, restart a service, and verify health — the dry-run should show the entire chain, the blast radius should reflect the combined impact, and the approval should cover the full sequence. Do not allow individual approvals for steps in a chain, as this creates approval fatigue.
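The combination rule can be sketched directly: a chain's blast radius is its worst step, and affected counts are summed across steps. A minimal sketch; the names and the severity ordering are assumptions for illustration.

```typescript
// Illustrative combination of per-step dry-run previews into one chain-level
// preview: worst-case blast radius, summed scope. Names are assumed.
type BlastRadius = 'none' | 'low' | 'medium' | 'high' | 'critical';

const SEVERITY: BlastRadius[] = ['none', 'low', 'medium', 'high', 'critical'];

interface StepPreview {
  blastRadius: BlastRadius;
  affectedCount: number;
}

function combineChain(steps: StepPreview[]): StepPreview {
  return steps.reduce(
    (acc, step) => ({
      // The chain is as dangerous as its most dangerous step.
      blastRadius:
        SEVERITY.indexOf(step.blastRadius) > SEVERITY.indexOf(acc.blastRadius)
          ? step.blastRadius
          : acc.blastRadius,
      // Scope accumulates across the whole sequence.
      affectedCount: acc.affectedCount + step.affectedCount,
    }),
    { blastRadius: 'none' as BlastRadius, affectedCount: 0 }
  );
}
```

A config edit (low, 1 file) chained with a service restart (high, 12 dependents) yields one high-severity preview covering 13 entities, and one approval covers the whole sequence.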

What about agents running in CI/CD pipelines where no human is available?

CI/CD agents should be classified as Tier 3 (write-autonomous within guardrails) with strict scope limits. They can create PRs, run tests, and update non-production resources autonomously. But production deployments and infrastructure changes should still require human approval — configured as a pipeline gate, not an interactive prompt. If no human approves within the timeout window, the pipeline pauses rather than proceeding.

Is five layers overkill for internal tools that only affect non-critical data?

You can simplify for low-risk environments, but never drop below three layers: permission scoping (always), audit logging (always), and at least one of the three middle layers. The Tier 1/2/3 classification handles this naturally — most low-risk operations are Tier 1 or 3 and never hit the approval gate. The layers add minimal overhead to safe operations while providing maximum protection against dangerous ones.

We deployed agents into our deployment pipeline without the dry-run gate because we trusted the permission scoping alone. Three weeks later, an agent auto-approved a rollback to a version with a known CVE. Now every write operation goes through preview first. The extra 2 seconds of latency is nothing compared to the 14-hour incident response we avoided.

Marcus Torres, Staff SRE, Series C Infrastructure Company

Non-Negotiable Safety Rules for Production Agents

No agent gets standing destructive permissions in production

Delete, drop, destroy, and force-push operations are permanently blocked for agent service accounts. If a legitimate destructive operation is needed, a human performs it manually with their own credentials.

Every write operation must have a dry-run preview

No write goes directly to execution. The preview shows what will change, how many entities are affected, and whether the operation is reversible. Skipping the preview is a safety violation, not a performance optimization.

Audit logs live in a separate system the agent cannot access

If the agent can write to its own audit trail, a failure can erase the evidence needed to diagnose it. Append-only storage in a separate account with its own access controls.

Blast radius 'critical' automatically escalates — no exceptions

When the blast radius estimation returns critical, the operation is escalated to human review regardless of the agent's authorization tier. This catches the edge cases that permission scoping alone cannot prevent.

Build the cage before you need it, not after the first escape

Deploy all five safety layers before granting production access, not incrementally after incidents. Every real-world production agent incident could have been prevented by safety layers that were known but not yet implemented.

Key terms in this piece
agentic AI safety · production safety playbook · AI permission scoping · dry-run gates · blast radius estimation · agent audit logs · deletion protection · tiered authorization
Sources
  [1] Fortune — Replit AI Coding Tool Wiped Production Database (fortune.com)
  [2] AI Incident Database — Incident 1152 (incidentdatabase.ai)
  [3] OWASP GenAI — Top 10 Risks and Mitigations for Agentic AI Security (genai.owasp.org)
  [4] Cobbai — AI Agent Tool Security Support (cobbai.com)
  [5] LoginRadius — Limiting Data Exposure and Blast Radius for AI Agents (loginradius.com)
  [6] Google ADK — Safety Documentation (google.github.io)