Here is a pattern that plays out in every organization deploying AI agents: a customer-facing bot quotes a discount the company has never offered. An internal copilot approves a vendor spend that violates procurement policy. A support agent promises a refund window that expired two quarters ago.
The AI wasn't hallucinating in the traditional sense. It wasn't confusing one fact with another. It was filling a vacuum. The model had no structured representation of the business rule, so it did what language models do — it generated a plausible-sounding answer based on pattern matching across its training data.
The fix is not better prompts. Prompts are natural language, and natural language is ambiguous by design. The fix is encoding your business rules in machine-readable formats that sit alongside the model at inference time, constraining its outputs against verified organizational logic.
The Implicit Knowledge Gap
Why organizations underestimate what AI needs to know explicitly
Every company runs on thousands of rules that nobody has written down in a structured format. Pricing tiers, approval thresholds, regional compliance variations, contractual obligations, escalation procedures, SLA definitions — these live in employee heads, scattered Confluence pages, PDF policy manuals, and tribal knowledge passed down during onboarding.
Humans navigate this fine. A senior account manager knows that enterprise clients in the EU get net-60 terms but APAC clients get net-30. A compliance officer knows that transactions above $10,000 need a second signature. This knowledge is implicit, contextual, and usually correct because humans have years of reinforcement learning from organizational feedback.
AI systems have none of that. They get a prompt, maybe some retrieved context from a vector store, and they generate. If the vector store contains a two-year-old pricing PDF, the model will quote two-year-old prices with complete confidence.
The gap between what humans know implicitly and what AI needs explicitly is the single largest source of business-logic errors in deployed AI systems. RAG helps with factual recall, but it does not help with conditional logic. Knowing that "the refund policy is 30 days" is not the same as knowing that "the refund policy is 30 days for consumer accounts, 90 days for enterprise with an active support contract, and 0 days for custom-built integrations after acceptance testing completes."
That kind of branching, conditional, exception-laden logic requires structure. Not paragraphs — structure.
Four Patterns for Encoding Business Rules
From simple lookup tables to full policy engines
Not every business rule needs the same level of formalization. A simple lookup table works for pricing tiers. A full policy engine is warranted for multi-jurisdiction compliance checks. Here are the four patterns, ordered by complexity and when each one earns its overhead.
| Pattern | Best For | Format | Runtime Cost | Maintenance Burden |
|---|---|---|---|---|
| Decision Tables | Pricing, eligibility, tier assignment | JSON / YAML / CSV | Microseconds | Low — business teams can edit |
| Rule Engines | Multi-step logic, chained conditions | Drools / DMN / GoRules | Milliseconds | Medium — needs rule authoring skills |
| Policy-as-Code | Access control, compliance, guardrails | OPA Rego / Cedar / Cerbos | Milliseconds | Medium — needs policy engineering |
| Constraint Solvers | Scheduling, resource allocation, optimization | OR-Tools / Z3 / MiniZinc | Seconds | High — needs mathematical modeling |
Pattern 1: Decision Tables
The simplest encoding that solves most business rule problems
Decision tables map input conditions to output actions. They are the oldest formalization of business rules — insurance companies have used them since the 1960s — and they remain the most practical starting point for AI-accessible business logic[3].
The idea is straightforward: define your conditions as columns, define your outputs as columns, and each row is a rule. At inference time, the AI system (or a middleware layer) evaluates the input against the table and returns the matching output. No ambiguity, no generation, no guessing.
rules/pricing-tiers.json

```json
{
  "table": "pricing_tiers",
  "version": "2026-03-01",
  "conditions": ["account_type", "annual_spend", "region"],
  "outputs": ["discount_pct", "payment_terms", "support_tier"],
  "rules": [
    {
      "when": {
        "account_type": "enterprise",
        "annual_spend": ">= 500000",
        "region": "NA"
      },
      "then": {
        "discount_pct": 25,
        "payment_terms": "net-60",
        "support_tier": "dedicated"
      }
    },
    {
      "when": {
        "account_type": "enterprise",
        "annual_spend": "< 500000",
        "region": "NA"
      },
      "then": {
        "discount_pct": 15,
        "payment_terms": "net-45",
        "support_tier": "priority"
      }
    },
    {
      "when": {
        "account_type": "startup",
        "annual_spend": "*",
        "region": "*"
      },
      "then": {
        "discount_pct": 10,
        "payment_terms": "net-30",
        "support_tier": "standard"
      }
    }
  ]
}
```

The key advantage of decision tables for AI systems: they can be injected directly into an agent's context window or called as a tool. An LLM does not need to understand the table — it needs to look up the correct row based on the customer's attributes and return the result. This is a retrieval task, not a generation task, and retrieval is where LLMs are reliable.
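To make the lookup step concrete, here is a minimal evaluator sketch for tables in the format above. It is illustrative only, assuming first-match semantics and the small operator set (`>=`, `<=`, `>`, `<`, `*`) used in the example; a production system would use a tested library or a rules service.

```typescript
// Minimal decision-table evaluator sketch (hypothetical helper, not a
// specific library): match a row by evaluating each condition against
// the input attributes.
type Rule = { when: Record<string, string | number>; then: Record<string, unknown> };

function conditionMatches(cond: string | number, value: unknown): boolean {
  if (cond === "*") return true; // wildcard matches anything
  if (typeof cond === "string") {
    const m = cond.match(/^(>=|<=|>|<)\s*(\d+(?:\.\d+)?)$/);
    if (m) {
      const n = Number(m[2]);
      const v = Number(value);
      switch (m[1]) {
        case ">=": return v >= n;
        case "<=": return v <= n;
        case ">":  return v > n;
        default:   return v < n;
      }
    }
  }
  return cond === value; // exact match
}

function lookup(rules: Rule[], input: Record<string, unknown>): Record<string, unknown> | null {
  for (const rule of rules) {
    const matched = Object.entries(rule.when).every(
      ([attr, cond]) => conditionMatches(cond, input[attr])
    );
    if (matched) return rule.then; // first-match semantics
  }
  return null; // no rule matched: escalate, never guess
}

const rules: Rule[] = [
  { when: { account_type: "enterprise", annual_spend: ">= 500000", region: "NA" },
    then: { discount_pct: 25, payment_terms: "net-60" } },
  { when: { account_type: "startup", annual_spend: "*", region: "*" },
    then: { discount_pct: 10, payment_terms: "net-30" } },
];

console.log(lookup(rules, { account_type: "enterprise", annual_spend: 750000, region: "NA" }));
// → { discount_pct: 25, payment_terms: 'net-60' }
```

Note that a `null` result is itself useful signal: an input no row covers should route to a human, not to the model's imagination.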
Pattern 2: Rule Engines for Chained Logic
When a single lookup table is not enough
Decision tables break down when rules depend on each other. If the discount depends on the payment terms, which depend on the credit score, which depends on the account age — you need forward chaining or backward chaining through a dependency graph. That is what rule engines do.
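To illustrate what forward chaining means in practice, here is a toy sketch (not any specific engine's API): rules fire when their conditions hold over a fact store, each firing may derive new facts that enable further rules, and evaluation repeats until nothing new fires.

```typescript
// Toy forward-chaining sketch. Each rule fires at most once; derived facts
// are merged back into the store so dependent rules can fire in turn.
type Facts = Record<string, unknown>;

interface ChainRule {
  name: string;
  when: (f: Facts) => boolean;
  then: (f: Facts) => Facts; // returns derived facts
}

function forwardChain(rules: ChainRule[], initial: Facts): { facts: Facts; fired: string[] } {
  const facts: Facts = { ...initial };
  const fired: string[] = [];
  let changed = true;
  while (changed) {
    changed = false;
    for (const rule of rules) {
      if (!fired.includes(rule.name) && rule.when(facts)) {
        Object.assign(facts, rule.then(facts));
        fired.push(rule.name);
        changed = true; // a newly derived fact may enable other rules
      }
    }
  }
  return { facts, fired };
}

// Example dependency chain: account age → credit band → payment terms → discount
const chainRules: ChainRule[] = [
  { name: "credit_band", when: f => typeof f.account_age_years === "number",
    then: f => ({ credit_band: (f.account_age_years as number) >= 3 ? "good" : "new" }) },
  { name: "terms", when: f => f.credit_band === "good",
    then: () => ({ payment_terms: "net-60" }) },
  { name: "discount", when: f => f.payment_terms === "net-60",
    then: () => ({ discount_pct: 20 }) },
];
```

Running `forwardChain(chainRules, { account_age_years: 5 })` fires all three rules and derives `discount_pct: 20`; a one-year-old account stops after the credit-band rule.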
The Decision Model and Notation (DMN) standard, maintained by the Object Management Group, provides a vendor-neutral way to express these chained decisions[3]. DMN uses a graphical notation for decision dependencies and FEEL (Friendly Enough Expression Language) for the actual rule logic. Think of it as SQL for business rules — constrained enough to be deterministic, expressive enough for real-world conditions.
rules/refund-eligibility.feel

```feel
// DMN decision logic in FEEL
// Determines refund eligibility based on account and purchase context
if account.type = "consumer" then
  if days_since_purchase <= 30 then "full_refund"
  else if days_since_purchase <= 90 then "store_credit"
  else "no_refund"
else if account.type = "enterprise" then
  if has_active_support_contract then
    if days_since_purchase <= 90 then "full_refund"
    else "prorated_refund"
  else
    if days_since_purchase <= 30 then "full_refund"
    else "no_refund"
else if account.type = "custom_integration" then
  if acceptance_testing_complete then "no_refund"
  else "full_refund"
else "escalate_to_manager"
```

When an AI agent needs to answer "Can this customer get a refund?", it does not generate an answer. It calls the rule engine with the customer's attributes, gets a deterministic result, and then uses the LLM only to communicate that result in natural language. The reasoning is offloaded to verified logic; the model handles tone and phrasing.
This is the hybrid pattern that works: rules for reasoning, LLMs for communication. According to IBM's research on combining rule-based engines with LLMs, this separation yields both higher accuracy and better explainability[5] — the system can cite exactly which rule produced the decision. Specific accuracy gains depend on the complexity of the rule set and the LLM used.
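A sketch of that separation, with the refund logic above re-expressed as a deterministic function; the `llm.generate` call in the final comment is a hypothetical stand-in for whatever model client you use:

```typescript
// Rules-for-reasoning / LLM-for-communication split. decideRefund mirrors
// the FEEL table above and is fully deterministic; the model would only
// phrase the verdict, never compute it.
interface RefundInput {
  accountType: string; // "consumer" | "enterprise" | "custom_integration" | other
  daysSincePurchase: number;
  hasActiveSupportContract?: boolean;
  acceptanceTestingComplete?: boolean;
}

function decideRefund(i: RefundInput): string {
  switch (i.accountType) {
    case "consumer":
      if (i.daysSincePurchase <= 30) return "full_refund";
      return i.daysSincePurchase <= 90 ? "store_credit" : "no_refund";
    case "enterprise":
      if (i.hasActiveSupportContract) {
        return i.daysSincePurchase <= 90 ? "full_refund" : "prorated_refund";
      }
      return i.daysSincePurchase <= 30 ? "full_refund" : "no_refund";
    case "custom_integration":
      return i.acceptanceTestingComplete ? "no_refund" : "full_refund";
    default:
      return "escalate_to_manager"; // no rule matched: never guess
  }
}

// Hypothetical model call: the LLM sees only the verdict, not the rules.
// await llm.generate(`Tell the customer the refund outcome is: ${decideRefund(input)}`);
```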
| Without encoded rules | With encoded rules |
|---|---|
| Model invents discount percentages from training data | Discount pulled from versioned decision table |
| Different answers for the same customer depending on phrasing | Deterministic output for identical inputs every time |
| No audit trail for why a specific decision was made | Full trace: input → matched rule → output → version |
| Updating a policy means rewriting prompts and hoping | Updating policy means changing one row in a table |
| Compliance team cannot review what the AI 'knows' | Compliance team reviews rules directly in the table format |
Pattern 3: Policy-as-Code for Guardrails
Enforcing compliance constraints at the execution boundary
Policy-as-code takes a different approach from decision tables and rule engines. Instead of encoding what the AI should do, it encodes what the AI must not do. It is a constraint system — a set of allow/deny rules evaluated against every action the AI attempts to take.
Open Policy Agent (OPA) and its Rego language have become the de facto standard for this pattern[2]. Originally built for Kubernetes admission control and API authorization, OPA's architecture maps perfectly to AI guardrails: evaluate a structured request against a policy bundle, return allow or deny, and log the decision for audit.
policies/agent-guardrails.rego

```rego
package agent.guardrails

import rego.v1

# Deny discount above maximum for account tier
deny contains msg if {
    input.action == "apply_discount"
    input.discount_pct > max_discount[input.account_type]
    msg := sprintf("Discount %d%% exceeds max %d%% for %s accounts",
        [input.discount_pct, max_discount[input.account_type], input.account_type])
}

max_discount := {
    "enterprise": 30,
    "startup": 15,
    "consumer": 10,
}

# Deny transactions above threshold without approval
deny contains msg if {
    input.action == "approve_transaction"
    input.amount > 10000
    not input.has_second_signature
    msg := "Transactions above $10,000 require dual approval"
}

# Deny PII disclosure in response (contains_pii is a helper defined elsewhere)
deny contains msg if {
    input.action == "send_response"
    contains_pii(input.response_text)
    msg := "Response contains PII — must be redacted before sending"
}
```

AWS Bedrock's Automated Reasoning checks demonstrate this pattern at scale — translating natural language policies into formal logic that validates AI outputs with high verification accuracy for well-defined policy domains (AWS cites up to 99% in specific benchmark settings)[6]. The approach works because formal logic is complete in a way that prompt engineering cannot be. You can mathematically prove that a policy covers all edge cases. You cannot prove that about a system prompt.
The practical architecture looks like this: the AI agent generates a proposed action, the policy engine evaluates it against the current rule set, and only approved actions reach the user. Denied actions get routed to a fallback — a human escalation, a safe default response, or a re-generation with tighter constraints.
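For illustration, the same guardrail checks can be sketched as an in-process function; in production these rules would live in OPA and be evaluated via its API, but the allow/deny contract is the same:

```typescript
// In-process mirror of the Rego guardrails above (illustrative only).
interface ProposedAction {
  action: string;
  account_type?: string;
  discount_pct?: number;
  amount?: number;
  has_second_signature?: boolean;
}

const MAX_DISCOUNT: Record<string, number> = {
  enterprise: 30,
  startup: 15,
  consumer: 10,
};

function checkGuardrails(input: ProposedAction): string[] {
  const denials: string[] = [];

  // Mirror of the discount-cap rule
  if (input.action === "apply_discount") {
    const max = MAX_DISCOUNT[input.account_type ?? ""] ?? 0;
    if ((input.discount_pct ?? 0) > max) {
      denials.push(`Discount ${input.discount_pct}% exceeds max ${max}% for ${input.account_type} accounts`);
    }
  }

  // Mirror of the dual-approval rule
  if (input.action === "approve_transaction" && (input.amount ?? 0) > 10000 && !input.has_second_signature) {
    denials.push("Transactions above $10,000 require dual approval");
  }

  return denials; // empty = allowed; non-empty = route to the fallback path
}
```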
Extracting Rules From Your Organization
A practical workflow for turning tribal knowledge into machine-readable rules
The hardest part of business-rules-as-code is not the encoding format. It is the extraction. Most organizations do not have a clean inventory of their own rules. They have policy documents, employee handbooks, Slack threads where exceptions were negotiated, and institutional memory held by people who might leave next quarter.
Here is a four-step extraction workflow that works across industries.
1. Audit existing policy documents. Collect every document that contains conditional logic: pricing sheets, compliance manuals, SLA agreements, approval matrices, HR policies, vendor contracts. Do not try to formalize yet — just gather. Most organizations discover 3-5x more rule-bearing documents than they expected.
2. Classify rules by type and volatility. Not all rules change at the same rate. Tax thresholds change annually. Pricing changes quarterly. Compliance rules change when regulations update. Classify each rule as stable (changes yearly or less), volatile (changes quarterly), or dynamic (changes weekly or per-transaction). This determines your encoding format.
3. Formalize in a machine-readable format. Pick the simplest encoding pattern that handles the rule's complexity. Start with decision tables — you will be surprised how many rules fit into a simple condition-to-output mapping. Graduate to rule engines only when you hit genuine chained dependencies.
4. Wire into the AI execution path. Make the rules available at inference time. This means either injecting them into the agent's tool set (the agent calls a rules API) or embedding them as middleware between the model and the output. Do not put business rules in system prompts — they get lost in long contexts, they cannot be versioned independently, and the model might override them.
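One way to carry the classification from step 2 into the encoding from step 3 is a metadata envelope on every rule. The fields below are illustrative, not a standard schema:

```json
{
  "rule_id": "pricing.enterprise_na_high_spend",
  "version": "2026-03-01",
  "effective_date": "2026-03-01",
  "expires_at": "2026-09-01",
  "volatility": "volatile",
  "owner": "jane.doe@example.com",
  "source_document": "pricing-policy-2026.pdf"
}
```

The `volatility` and `owner` fields make the later review cadence and drift checks mechanical rather than a matter of memory.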
Implementation Architecture
How decision tables, rule engines, and policy checks fit together at runtime
A production business-rules-as-code setup typically has three layers operating together. The decision layer resolves lookups — pricing, eligibility, categorization. The logic layer handles chained reasoning — multi-step approval workflows, complex eligibility with dependencies. The policy layer acts as the final guardrail — blocking actions that violate hard constraints regardless of what the other layers returned.
Rules Repository Structure
```text
rules/
├── decision-tables/
│   ├── pricing-tiers.json
│   ├── support-eligibility.json
│   ├── shipping-rates.json
│   └── discount-matrix.json
├── rule-engine/
│   ├── refund-eligibility.dmn
│   ├── approval-workflow.dmn
│   └── credit-assessment.dmn
├── policies/
│   ├── agent-guardrails.rego
│   ├── pii-protection.rego
│   ├── spend-limits.rego
│   └── regional-compliance.rego
├── schemas/
│   ├── rule-schema.json
│   └── audit-log-schema.json
└── CHANGELOG.md
```

lib/rules-middleware.ts

```typescript
import { evaluate } from './rule-engine';
import { checkPolicy } from './policy-engine';
import { lookupDecisionTable } from './decision-tables';

interface AgentAction {
  type: string;
  params: Record<string, unknown>;
  context: Record<string, unknown>;
}

interface RuleResult {
  allowed: boolean;
  values: Record<string, unknown>;
  appliedRules: string[];
  deniedBy?: string;
}

export async function enforceBusinessRules(
  action: AgentAction
): Promise<RuleResult> {
  // Layer 1: Decision table lookup
  const tableResult = await lookupDecisionTable(
    action.type,
    action.params
  );

  // Layer 2: Rule engine for chained logic
  const engineResult = await evaluate({
    ...action.params,
    ...tableResult.values,
  });

  // Layer 3: Policy guardrail — final check
  const policyResult = await checkPolicy({
    action: action.type,
    ...action.params,
    ...engineResult.values,
  });

  if (policyResult.denied) {
    return {
      allowed: false,
      values: {},
      appliedRules: policyResult.violatedPolicies,
      deniedBy: policyResult.violatedPolicies[0],
    };
  }

  return {
    allowed: true,
    values: { ...tableResult.values, ...engineResult.values },
    appliedRules: [
      ...tableResult.matchedRules,
      ...engineResult.firedRules,
    ],
  };
}
```

Versioning, Auditing, and the Compliance Trail
Every AI decision needs a traceable chain back to a specific rule version
When a customer disputes a charge or a regulator asks why an application was denied, "the AI decided" is not an acceptable answer. You need a full trace: the input data, the rule version that was active, the specific rule that matched, and the output it produced.
This is where business-rules-as-code pays for itself beyond accuracy. Every rule evaluation produces an audit record that a human can read, verify, and defend. The rule engine does not have a "mood" — it applied rule version 2026.03.15, row 7, which maps enterprise accounts with annual spend above $500K in the NA region to a 25% discount with net-60 terms.
Store rules in Git. Treat every rule change like a code change — pull request, review by the domain owner, automated tests, and deployment through CI/CD. When rules live in a database or a UI without version control, you lose the ability to answer "what were the rules on March 3rd when this decision was made?"
The best implementations tag each AI response with the rule version hash that was active during evaluation. If the rules have changed since, the system can flag the response as potentially stale — a pattern that becomes critical for long-running conversations where a pricing update might happen mid-session.
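A minimal sketch of that tagging, assuming the active rule set is available as a JSON string; `ruleSetHash` and `auditRecord` are hypothetical helpers, not a specific library's API:

```typescript
import { createHash } from "node:crypto";

// Stamp every evaluation with a content hash of the active rule set,
// so an audit can reconstruct exactly which bytes produced a decision.
function ruleSetHash(ruleSetJson: string): string {
  return createHash("sha256").update(ruleSetJson).digest("hex").slice(0, 12);
}

interface AuditRecord {
  timestamp: string;
  ruleVersion: string; // e.g. the table's "version" field
  ruleHash: string;    // content hash of the rule file at evaluation time
  input: unknown;
  matchedRule: string;
  output: unknown;
}

function auditRecord(
  ruleSetJson: string,
  version: string,
  input: unknown,
  matchedRule: string,
  output: unknown
): AuditRecord {
  return {
    timestamp: new Date().toISOString(),
    ruleVersion: version,
    ruleHash: ruleSetHash(ruleSetJson),
    input,
    matchedRule,
    output,
  };
}
```

Because the hash is derived from the rule file's content, a mid-session pricing update changes the hash, which is exactly the staleness signal described above.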
Business Rules Worth Encoding First
High-impact categories where formalization prevents the most expensive errors
Financial rules (highest blast radius)

- ✓ Pricing tiers, discount maximums, and volume break points
- ✓ Payment terms by account type, region, and contract status
- ✓ Approval thresholds — what dollar amounts require which sign-offs
- ✓ Tax calculation rules by jurisdiction
- ✓ Refund and credit policies with all exception paths

Compliance and regulatory rules

- ✓ Data residency requirements by customer region (GDPR, CCPA, PIPL)
- ✓ KYC/AML thresholds and documentation requirements
- ✓ Industry-specific regulations (HIPAA, SOX, PCI-DSS)
- ✓ Record retention periods and deletion obligations
- ✓ Mandatory disclosure and disclaimer language by jurisdiction

Operational rules (death by a thousand cuts)

- ✓ SLA definitions — response times, resolution times, escalation paths
- ✓ Eligibility criteria for features, programs, or services
- ✓ Routing logic — which team handles which request type
- ✓ Capacity limits — max users, API rate limits, storage quotas
- ✓ Scheduling constraints — business hours, blackout periods, maintenance windows
Testing Business Rules Like Software
Decision tables and policies need test suites, not just spot checks
The same rigor you apply to application code applies to business rules. Every decision table needs a test suite that verifies correct outputs for known inputs, including edge cases and boundary conditions. Every policy needs negative tests that confirm denied actions are actually denied.
OPA has built-in test tooling (opa test) that makes this straightforward. For decision tables in JSON, write a simple test harness that evaluates every row against sample inputs and asserts expected outputs.
1. Test every row in every decision table against known inputs

```typescript
// test/pricing-tiers.test.ts
import { lookupDecisionTable } from '../lib/decision-tables';

test('enterprise NA high-spend gets 25% discount', async () => {
  const result = await lookupDecisionTable('pricing_tiers', {
    account_type: 'enterprise',
    annual_spend: 750000,
    region: 'NA',
  });
  expect(result.discount_pct).toBe(25);
  expect(result.payment_terms).toBe('net-60');
});
```

2. Test boundary conditions — the row edges where rules change

```typescript
test('exactly $500K triggers high-spend tier', async () => {
  const result = await lookupDecisionTable('pricing_tiers', {
    account_type: 'enterprise',
    annual_spend: 500000, // boundary
    region: 'NA',
  });
  expect(result.discount_pct).toBe(25);
});

test('$499,999 stays in standard tier', async () => {
  const result = await lookupDecisionTable('pricing_tiers', {
    account_type: 'enterprise',
    annual_spend: 499999, // just below boundary
    region: 'NA',
  });
  expect(result.discount_pct).toBe(15);
});
```

3. Test policy denials with OPA's built-in test runner

```rego
# policies/agent-guardrails_test.rego
package agent.guardrails

test_deny_excessive_discount if {
    deny with input as {
        "action": "apply_discount",
        "account_type": "consumer",
        "discount_pct": 20,
    }
}

test_allow_valid_discount if {
    count(deny) == 0 with input as {
        "action": "apply_discount",
        "account_type": "enterprise",
        "discount_pct": 25,
    }
}
```
Detecting Rules Drift
When the real world changes but your encoded rules do not
Encoding rules is a point-in-time activity. The rules are correct when you encode them. Six months later, the pricing has changed, the compliance threshold has shifted, and two new product tiers were added — but nobody updated the decision table.
Rules drift is the silent killer of business-rules-as-code systems. The AI is dutifully enforcing rules that are no longer accurate, and because the enforcement is deterministic, nobody notices until a customer complains or an audit flags a discrepancy.
Three mechanisms prevent drift.
Drift Prevention Mechanisms

1. Expiry dates on every rule. Every decision table row and policy rule gets a mandatory expires_at field. When a rule expires, the system does not silently continue using it — it fails loudly and routes to a human. This forces periodic review without relying on humans remembering to check.
2. Automated conflict detection between rules and observed behavior. Run a weekly job that compares rule outputs against actual business outcomes. If the rule says the discount should be 15% but the last 50 transactions averaged 22%, something is wrong — either the rule is outdated or it is being bypassed. Flag it.
3. Domain owner sign-off on a cadence matching volatility. Stable rules get annual review. Volatile rules get quarterly review. Dynamic rules get reviewed with every deployment. Assign a named owner — not a team, a person — to each rule category. Ownerless rules are rules that will drift.
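The first mechanism, failing loudly on expiry, can be sketched in a few lines (names are illustrative):

```typescript
// Fail-loud expiry check: an expired rule is never applied silently.
interface VersionedRule {
  id: string;
  expires_at: string; // mandatory ISO date on every rule
}

function assertNotExpired(rule: VersionedRule, now: Date = new Date()): void {
  if (new Date(rule.expires_at).getTime() <= now.getTime()) {
    // Never keep serving a stale rule: escalate instead
    throw new Error(`Rule ${rule.id} expired on ${rule.expires_at}; route to human review`);
  }
}
```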
Wiring Rules Into Agent Frameworks
Practical integration patterns for LangChain, Vercel AI SDK, and custom agents
Most agent frameworks support tool calling. This is your integration point. Define your rules engine as a tool that the agent can (and must) call before taking actions that involve business logic.
The key architectural decision: should the agent call rules proactively, or should middleware intercept the agent's output and validate it? Both patterns have trade-offs.
| Pattern | How It Works | Pros | Cons |
|---|---|---|---|
| Agent-calls-rules (tool) | Agent has a 'check_business_rules' tool and calls it before responding | Agent learns when to check; lower latency for simple queries | Agent might skip the call; requires reliable tool-use behavior |
| Middleware interception | Every agent output passes through rules engine before delivery | 100% coverage; agent cannot bypass | Added latency on every response; some checks are unnecessary |
| Hybrid (recommended) | Middleware catches high-risk actions; agent calls tools for lookups | Best coverage-to-latency ratio; defense in depth | More complex to set up; two systems to maintain |
tools/business-rules-tool.ts

```typescript
import { tool } from 'ai';
import { z } from 'zod';
import { enforceBusinessRules } from '../lib/rules-middleware';

export const checkBusinessRules = tool({
  description:
    'Check business rules before quoting prices, ' +
    'applying discounts, or making commitments to customers. ' +
    'ALWAYS call this before responding with any financial figures.',
  parameters: z.object({
    action: z.string().describe('The action type: pricing, discount, refund, approval'),
    account_type: z.string().describe('Customer account tier'),
    region: z.string().optional().describe('Customer region code'),
    amount: z.number().optional().describe('Transaction amount if applicable'),
    additional_context: z.record(z.unknown()).optional(),
  }),
  execute: async (params) => {
    const result = await enforceBusinessRules({
      type: params.action,
      params,
      context: params.additional_context ?? {},
    });
    return {
      allowed: result.allowed,
      values: result.values,
      rules_applied: result.appliedRules,
      denied_reason: result.deniedBy ?? null,
    };
  },
});
```

Measuring Rules-as-Code Effectiveness
Track whether your encoded rules are actually preventing errors
You need a small set of metrics to know whether your business-rules-as-code system is working: for example, the share of agent actions that pass through a rule check, the rate of policy denials, and the count of expired or stale rules. Track them from day one. They are your justification for the investment and your early warning system for drift.
Where to Start Tomorrow Morning
A practical first-week plan for encoding your most critical business rules
First Week: Business Rules as Code

- List the 10 most common customer-facing decisions your AI makes (pricing, eligibility, refunds, routing)
- For each decision, find the current source of truth (PDF, spreadsheet, person's head)
- Encode the top 3 as JSON decision tables with version and effective_date fields
- Write a test for each decision table covering the happy path and two edge cases
- Wire one decision table into your agent as a tool call — start with pricing
- Add a single OPA policy blocking discounts above the maximum for each account tier
- Set up audit logging that records rule version, input, matched rule, and output for every evaluation
- Schedule a monthly review meeting with domain owners to verify rules freshness
Do I need a separate rules engine, or can I use the LLM to interpret rules?
Use a separate engine. LLMs interpreting rules in natural language will occasionally get them wrong — and 'occasionally wrong' on pricing or compliance is not acceptable. The LLM's job is communication: taking the deterministic output from your rules engine and explaining it clearly to the user. Keep reasoning and communication as separate concerns.
How do I handle rules that have exceptions or require human judgment?
Encode the rule, and make the exception path explicit. If a rule is 'net-30 terms except when VP of Sales approves an override,' encode both the rule and the override path. The override becomes a separate rule that requires a human-approval flag in the input. Your system should never guess about exceptions — it should either apply the rule or route to a human.
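A minimal sketch of that pattern, with hypothetical names: the override is its own rule, gated on an explicit approval flag that only a human workflow can set.

```typescript
// The exception path is encoded, not inferred: without the approval flag,
// the default rule applies.
interface TermsInput {
  vp_override_approved?: boolean;
  override_terms?: string;
}

function paymentTerms(input: TermsInput): string {
  if (input.vp_override_approved === true && input.override_terms) {
    return input.override_terms; // override rule: requires explicit human approval
  }
  return "net-30"; // default rule
}
```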
What about rules that change frequently, like promotional pricing?
Use a deployment pipeline. Promotional rules go into a separate decision table with start and end dates. A CI/CD pipeline validates the table, runs tests, and deploys it to the rules service. The agent always reads the current version. When the promo expires, the table returns to base pricing automatically. No manual intervention needed.
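Sketched minimally (field names are illustrative), the promo override carries its own validity window, and the base rule reapplies automatically outside it:

```typescript
// Promotional override with an explicit validity window: inside the window
// the promo discount applies; outside it, the base rule with no manual step.
interface PromoRule {
  discount_pct: number;
  starts_at: string; // ISO date
  ends_at: string;   // ISO date
}

function effectiveDiscount(base: number, promo: PromoRule, now: Date): number {
  const active = now >= new Date(promo.starts_at) && now <= new Date(promo.ends_at);
  return active ? promo.discount_pct : base;
}
```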
How do decision tables scale to thousands of rules?
Partition by domain. One table per decision category — pricing, eligibility, routing, compliance. Each table stays small and maintainable. The middleware layer knows which table to query based on the action type. Most organizations find they need 20-50 tables covering 500-2000 rules total. That is manageable with standard tooling.
Can I use an LLM to help extract rules from policy documents?
Yes, with verification. Use the LLM to propose structured rules from unstructured policy text, but always have a domain expert validate the output before it enters production. Georgetown's Beeck Center research shows LLMs achieve good policy-to-code conversion when the logic is simple, but struggle with complex multi-condition rules[7]. Treat LLM extraction as a drafting tool, not a finalization tool.
Further Reading and Tools
Standards: DMN Specification (Object Management Group), OPA Documentation
Rule Engines: Drools (Java), GoRules (cloud-native), json-rules-engine (Node.js)
Policy Engines: Open Policy Agent (Rego), Cedar (AWS), Cerbos (API-first)
Research: IBM's rule-based-llms repo, Georgetown Beeck Center AI-Powered Rules as Code, AWS Automated Reasoning Checks
- [1] Digital Government Hub — AI-Powered Rules as Code: Experiments with Public Benefits Policy (digitalgovernmenthub.org)
- [2] Open Policy Agent — Open Policy Agent Documentation (openpolicyagent.org)
- [3] Object Management Group — Decision Model and Notation (DMN) Standard (omg.org)
- [4] GoRules — Cloud-Native Rule Engine (gorules.io)
- [5] IBM DecisionsDev — Rule-Based LLMs (github.com)
- [6] AWS — Minimize AI Hallucinations and Deliver Up to 99% Verification Accuracy with Automated Reasoning Checks (aws.amazon.com)
- [7] Georgetown Beeck Center — AI-Powered Rules as Code: Experiments with Public Benefits Policy (beeckcenter.georgetown.edu)
- [8] Combining Rule-Based and LLM-Based Approaches for Decision Making (arxiv.org)