Here is a pattern that plays out in every organization deploying AI agents: a customer-facing bot quotes a discount the company has never offered. An internal copilot approves a vendor spend that violates procurement policy. A support agent promises a refund window that expired two quarters ago.
The AI wasn't hallucinating in the traditional sense. It wasn't confusing one fact with another. It was filling a vacuum. The model had no structured representation of the business rule, so it did what language models do — it generated a plausible-sounding answer based on pattern matching across its training data.
The fix is not better prompts. Prompts are natural language, and natural language is ambiguous by design. The fix is encoding your business rules in machine-readable formats that sit alongside the model at inference time, constraining its outputs against verified organizational logic.
The Implicit Knowledge Gap
Why organizations underestimate what AI needs to know explicitly
Every company runs on thousands of rules that nobody has written down in a structured format. Pricing tiers, approval thresholds, regional compliance variations, contractual obligations, escalation procedures, SLA definitions — these live in employee heads, scattered Confluence pages, PDF policy manuals, and tribal knowledge passed down during onboarding.
Humans navigate this fine. A senior account manager knows that enterprise clients in the EU get net-60 terms but APAC clients get net-30. A compliance officer knows that transactions above $10,000 need a second signature. This knowledge is implicit, contextual, and usually correct because humans have years of reinforcement learning from organizational feedback.
AI systems have none of that. They get a prompt, maybe some retrieved context from a vector store, and they generate. If the vector store contains a two-year-old pricing PDF, the model will quote two-year-old prices with complete confidence.
The gap between what humans know implicitly and what AI needs explicitly is the single largest source of business-logic errors in deployed AI systems. RAG helps with factual recall, but it does not help with conditional logic. Knowing that "the refund policy is 30 days" is not the same as knowing that "the refund policy is 30 days for consumer accounts, 90 days for enterprise with an active support contract, and 0 days for custom-built integrations after acceptance testing completes."
That kind of branching, conditional, exception-laden logic requires structure. Not paragraphs — structure.
Four Patterns for Encoding Business Rules
From simple lookup tables to full policy engines
Not every business rule needs the same level of formalization. A simple lookup table works for pricing tiers. A full policy engine is warranted for multi-jurisdiction compliance checks. Here are the four patterns, ordered by complexity and when each one earns its overhead.
| Pattern | Best For | Format | Runtime Cost | Maintenance Burden |
|---|---|---|---|---|
| Decision Tables | Pricing, eligibility, tier assignment | JSON / YAML / CSV | Microseconds | Low — business teams can edit |
| Rule Engines | Multi-step logic, chained conditions | Drools / DMN / GoRules | Milliseconds | Medium — needs rule authoring skills |
| Policy-as-Code | Access control, compliance, guardrails | OPA Rego / Cedar / Cerbos | Milliseconds | Medium — needs policy engineering |
| Constraint Solvers | Scheduling, resource allocation, optimization | OR-Tools / Z3 / MiniZinc | Seconds | High — needs mathematical modeling |
Pattern 1: Decision Tables
The simplest encoding that solves most business rule problems
Decision tables map input conditions to output actions. They are the oldest formalization of business rules — insurance companies have used them since the 1960s — and they remain the most practical starting point for AI-accessible business logic[3].
The idea is straightforward: define your conditions as columns, define your outputs as columns, and each row is a rule. At inference time, the AI system (or a middleware layer) evaluates the input against the table and returns the matching output. No ambiguity, no generation, no guessing.
rules/pricing-tiers.json

```json
{
  "table": "pricing_tiers",
  "version": "2026-03-01",
  "conditions": ["account_type", "annual_spend", "region"],
  "outputs": ["discount_pct", "payment_terms", "support_tier"],
  "rules": [
    {
      "when": {
        "account_type": "enterprise",
        "annual_spend": ">= 500000",
        "region": "NA"
      },
      "then": {
        "discount_pct": 25,
        "payment_terms": "net-60",
        "support_tier": "dedicated"
      }
    },
    {
      "when": {
        "account_type": "enterprise",
        "annual_spend": "< 500000",
        "region": "NA"
      },
      "then": {
        "discount_pct": 15,
        "payment_terms": "net-45",
        "support_tier": "priority"
      }
    },
    {
      "when": {
        "account_type": "startup",
        "annual_spend": "*",
        "region": "*"
      },
      "then": {
        "discount_pct": 10,
        "payment_terms": "net-30",
        "support_tier": "standard"
      }
    }
  ]
}
```

The key advantage of decision tables for AI systems: they can be injected directly into an agent's context window or called as a tool. An LLM does not need to understand the table — it needs to look up the correct row based on the customer's attributes and return the result. This is a retrieval task, not a generation task, and retrieval is where LLMs are reliable.
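To make the lookup step concrete, here is a minimal evaluator sketch for tables in the format above. It is illustrative only, assuming first-match semantics and the small operator set (`>=`, `<=`, `>`, `<`, `*`) used in the example; a production system would use a tested library or a rules service.

```typescript
// Minimal decision-table evaluator sketch (hypothetical helper, not a
// specific library): match a row by evaluating each condition against
// the input attributes.
type Rule = { when: Record<string, string | number>; then: Record<string, unknown> };

function conditionMatches(cond: string | number, value: unknown): boolean {
  if (cond === "*") return true; // wildcard matches anything
  if (typeof cond === "string") {
    const m = cond.match(/^(>=|<=|>|<)\s*(\d+(?:\.\d+)?)$/);
    if (m) {
      const n = Number(m[2]);
      const v = Number(value);
      switch (m[1]) {
        case ">=": return v >= n;
        case "<=": return v <= n;
        case ">":  return v > n;
        default:   return v < n;
      }
    }
  }
  return cond === value; // exact match
}

function lookup(rules: Rule[], input: Record<string, unknown>): Record<string, unknown> | null {
  for (const rule of rules) {
    const matched = Object.entries(rule.when).every(
      ([attr, cond]) => conditionMatches(cond, input[attr])
    );
    if (matched) return rule.then; // first-match semantics
  }
  return null; // no rule matched: escalate, never guess
}

const rules: Rule[] = [
  { when: { account_type: "enterprise", annual_spend: ">= 500000", region: "NA" },
    then: { discount_pct: 25, payment_terms: "net-60" } },
  { when: { account_type: "startup", annual_spend: "*", region: "*" },
    then: { discount_pct: 10, payment_terms: "net-30" } },
];

console.log(lookup(rules, { account_type: "enterprise", annual_spend: 750000, region: "NA" }));
// → { discount_pct: 25, payment_terms: 'net-60' }
```

Note that a `null` result is itself useful signal: an input no row covers should route to a human, not to the model's imagination.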
Pattern 2: Rule Engines for Chained Logic
When a single lookup table is not enough
Decision tables break down when rules depend on each other. If the discount depends on the payment terms, which depend on the credit score, which depends on the account age — you need forward chaining or backward chaining through a dependency graph. That is what rule engines do.
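To illustrate what forward chaining means in practice, here is a toy sketch (not any specific engine's API): rules fire when their conditions hold over a fact store, each firing may derive new facts that enable further rules, and evaluation repeats until nothing new fires.

```typescript
// Toy forward-chaining sketch. Each rule fires at most once; derived facts
// are merged back into the store so dependent rules can fire in turn.
type Facts = Record<string, unknown>;

interface ChainRule {
  name: string;
  when: (f: Facts) => boolean;
  then: (f: Facts) => Facts; // returns derived facts
}

function forwardChain(rules: ChainRule[], initial: Facts): { facts: Facts; fired: string[] } {
  const facts: Facts = { ...initial };
  const fired: string[] = [];
  let changed = true;
  while (changed) {
    changed = false;
    for (const rule of rules) {
      if (!fired.includes(rule.name) && rule.when(facts)) {
        Object.assign(facts, rule.then(facts));
        fired.push(rule.name);
        changed = true; // a newly derived fact may enable other rules
      }
    }
  }
  return { facts, fired };
}

// Example dependency chain: account age → credit band → payment terms → discount
const chainRules: ChainRule[] = [
  { name: "credit_band", when: f => typeof f.account_age_years === "number",
    then: f => ({ credit_band: (f.account_age_years as number) >= 3 ? "good" : "new" }) },
  { name: "terms", when: f => f.credit_band === "good",
    then: () => ({ payment_terms: "net-60" }) },
  { name: "discount", when: f => f.payment_terms === "net-60",
    then: () => ({ discount_pct: 20 }) },
];
```

Running `forwardChain(chainRules, { account_age_years: 5 })` fires all three rules and derives `discount_pct: 20`; a one-year-old account stops after the credit-band rule.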
The Decision Model and Notation (DMN) standard, maintained by the Object Management Group, provides a vendor-neutral way to express these chained decisions[3]. DMN uses a graphical notation for decision dependencies and FEEL (Friendly Enough Expression Language) for the actual rule logic. Think of it as SQL for business rules — constrained enough to be deterministic, expressive enough for real-world conditions.
rules/refund-eligibility.feel

```feel
// DMN decision logic in FEEL
// Determines refund eligibility based on account and purchase context
if account.type = "consumer" then
  if days_since_purchase <= 30 then "full_refund"
  else if days_since_purchase <= 90 then "store_credit"
  else "no_refund"
else if account.type = "enterprise" then
  if has_active_support_contract then
    if days_since_purchase <= 90 then "full_refund"
    else "prorated_refund"
  else
    if days_since_purchase <= 30 then "full_refund"
    else "no_refund"
else if account.type = "custom_integration" then
  if acceptance_testing_complete then "no_refund"
  else "full_refund"
else "escalate_to_manager"
```

When an AI agent needs to answer "Can this customer get a refund?", it does not generate an answer. It calls the rule engine with the customer's attributes, gets a deterministic result, and then uses the LLM only to communicate that result in natural language. The reasoning is offloaded to verified logic; the model handles tone and phrasing.
This is the hybrid pattern that works: rules for reasoning, LLMs for communication. According to IBM's research on combining rule-based engines with LLMs, this separation yields both higher accuracy and better explainability[5] — the system can cite exactly which rule produced the decision. Specific accuracy gains depend on the complexity of the rule set and the LLM used.
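A sketch of that separation, with the refund logic above re-expressed as a deterministic function; the `llm.generate` call in the final comment is a hypothetical stand-in for whatever model client you use:

```typescript
// Rules-for-reasoning / LLM-for-communication split. decideRefund mirrors
// the FEEL table above and is fully deterministic; the model would only
// phrase the verdict, never compute it.
interface RefundInput {
  accountType: string; // "consumer" | "enterprise" | "custom_integration" | other
  daysSincePurchase: number;
  hasActiveSupportContract?: boolean;
  acceptanceTestingComplete?: boolean;
}

function decideRefund(i: RefundInput): string {
  switch (i.accountType) {
    case "consumer":
      if (i.daysSincePurchase <= 30) return "full_refund";
      return i.daysSincePurchase <= 90 ? "store_credit" : "no_refund";
    case "enterprise":
      if (i.hasActiveSupportContract) {
        return i.daysSincePurchase <= 90 ? "full_refund" : "prorated_refund";
      }
      return i.daysSincePurchase <= 30 ? "full_refund" : "no_refund";
    case "custom_integration":
      return i.acceptanceTestingComplete ? "no_refund" : "full_refund";
    default:
      return "escalate_to_manager"; // no rule matched: never guess
  }
}

// Hypothetical model call: the LLM sees only the verdict, not the rules.
// await llm.generate(`Tell the customer the refund outcome is: ${decideRefund(input)}`);
```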
| Without encoded rules | With encoded rules |
|---|---|
| Model invents discount percentages from training data | Discount pulled from versioned decision table |
| Different answers for the same customer depending on phrasing | Deterministic output for identical inputs every time |
| No audit trail for why a specific decision was made | Full trace: input → matched rule → output → version |
| Updating a policy means rewriting prompts and hoping | Updating policy means changing one row in a table |
| Compliance team cannot review what the AI 'knows' | Compliance team reviews rules directly in the table format |
Pattern 3: Policy-as-Code for Guardrails
Enforcing compliance constraints at the execution boundary
Policy-as-code takes a different approach from decision tables and rule engines. Instead of encoding what the AI should do, it encodes what the AI must not do. It is a constraint system — a set of allow/deny rules evaluated against every action the AI attempts to take.
Open Policy Agent (OPA) and its Rego language have become the de facto standard for this pattern[2]. Originally built for Kubernetes admission control and API authorization, OPA's architecture maps perfectly to AI guardrails: evaluate a structured request against a policy bundle, return allow or deny, and log the decision for audit.
policies/agent-guardrails.rego

```rego
package agent.guardrails

import rego.v1

# Deny discount above maximum for account tier
deny contains msg if {
    input.action == "apply_discount"
    input.discount_pct > max_discount[input.account_type]
    msg := sprintf("Discount %d%% exceeds max %d%% for %s accounts",
        [input.discount_pct, max_discount[input.account_type], input.account_type])
}

max_discount := {
    "enterprise": 30,
    "startup": 15,
    "consumer": 10,
}

# Deny transactions above threshold without approval
deny contains msg if {
    input.action == "approve_transaction"
    input.amount > 10000
    not input.has_second_signature
    msg := "Transactions above $10,000 require dual approval"
}

# Deny PII disclosure in response (contains_pii is a helper defined elsewhere)
deny contains msg if {
    input.action == "send_response"
    contains_pii(input.response_text)
    msg := "Response contains PII — must be redacted before sending"
}
```

AWS Bedrock's Automated Reasoning checks demonstrate this pattern at scale — translating natural language policies into formal logic that validates AI outputs with high verification accuracy for well-defined policy domains (AWS cites up to 99% in specific benchmark settings)[6]. The approach works because formal logic is complete in a way that prompt engineering cannot be. You can mathematically prove that a policy covers all edge cases. You cannot prove that about a system prompt.
The practical architecture looks like this: the AI agent generates a proposed action, the policy engine evaluates it against the current rule set, and only approved actions reach the user. Denied actions get routed to a fallback — a human escalation, a safe default response, or a re-generation with tighter constraints.
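For illustration, the same guardrail checks can be sketched as an in-process function; in production these rules would live in OPA and be evaluated via its API, but the allow/deny contract is the same:

```typescript
// In-process mirror of the Rego guardrails above (illustrative only).
interface ProposedAction {
  action: string;
  account_type?: string;
  discount_pct?: number;
  amount?: number;
  has_second_signature?: boolean;
}

const MAX_DISCOUNT: Record<string, number> = {
  enterprise: 30,
  startup: 15,
  consumer: 10,
};

function checkGuardrails(input: ProposedAction): string[] {
  const denials: string[] = [];

  // Mirror of the discount-cap rule
  if (input.action === "apply_discount") {
    const max = MAX_DISCOUNT[input.account_type ?? ""] ?? 0;
    if ((input.discount_pct ?? 0) > max) {
      denials.push(`Discount ${input.discount_pct}% exceeds max ${max}% for ${input.account_type} accounts`);
    }
  }

  // Mirror of the dual-approval rule
  if (input.action === "approve_transaction" && (input.amount ?? 0) > 10000 && !input.has_second_signature) {
    denials.push("Transactions above $10,000 require dual approval");
  }

  return denials; // empty = allowed; non-empty = route to the fallback path
}
```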
Extracting Rules From Your Organization
A practical workflow for turning tribal knowledge into machine-readable rules
The hardest part of business-rules-as-code is not the encoding format. It is the extraction. Most organizations do not have a clean inventory of their own rules. They have policy documents, employee handbooks, Slack threads where exceptions were negotiated, and institutional memory held by people who might leave next quarter.
Here is a four-step extraction workflow that works across industries.
1. Audit existing policy documents. Collect every document that contains conditional logic: pricing sheets, compliance manuals, SLA agreements, approval matrices, HR policies, vendor contracts. Do not try to formalize yet — just gather. Most organizations discover 3-5x more rule-bearing documents than they expected.
2. Classify rules by type and volatility. Not all rules change at the same rate. Tax thresholds change annually. Pricing changes quarterly. Compliance rules change when regulations update. Classify each rule as stable (changes yearly or less), volatile (changes quarterly), or dynamic (changes weekly or per-transaction). This determines your encoding format.
3. Formalize in a machine-readable format. Pick the simplest encoding pattern that handles the rule's complexity. Start with decision tables — you will be surprised how many rules fit into a simple condition-to-output mapping. Graduate to rule engines only when you hit genuine chained dependencies.
4. Wire into the AI execution path. Make the rules available at inference time. This means either injecting them into the agent's tool set (the agent calls a rules API) or embedding them as middleware between the model and the output. Do not put business rules in system prompts — they get lost in long contexts, they cannot be versioned independently, and the model might override them.
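One way to carry the classification from step 2 into the encoding from step 3 is a metadata envelope on every rule. The fields below are illustrative, not a standard schema:

```json
{
  "rule_id": "pricing.enterprise_na_high_spend",
  "version": "2026-03-01",
  "effective_date": "2026-03-01",
  "expires_at": "2026-09-01",
  "volatility": "volatile",
  "owner": "jane.doe@example.com",
  "source_document": "pricing-policy-2026.pdf"
}
```

The `volatility` and `owner` fields make the later review cadence and drift checks mechanical rather than a matter of memory.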
Implementation Architecture
How decision tables, rule engines, and policy checks fit together at runtime
A production business-rules-as-code setup typically has three layers operating together. The decision layer resolves lookups — pricing, eligibility, categorization. The logic layer handles chained reasoning — multi-step approval workflows, complex eligibility with dependencies. The policy layer acts as the final guardrail — blocking actions that violate hard constraints regardless of what the other layers returned.
Rules Repository Structure
```text
rules/
├── decision-tables/
│   ├── pricing-tiers.json
│   ├── support-eligibility.json
│   ├── shipping-rates.json
│   └── discount-matrix.json
├── rule-engine/
│   ├── refund-eligibility.dmn
│   ├── approval-workflow.dmn
│   └── credit-assessment.dmn
├── policies/
│   ├── agent-guardrails.rego
│   ├── pii-protection.rego
│   ├── spend-limits.rego
│   └── regional-compliance.rego
├── schemas/
│   ├── rule-schema.json
│   └── audit-log-schema.json
└── CHANGELOG.md
```

lib/rules-middleware.ts

```typescript
import { evaluate } from './rule-engine';
import { checkPolicy } from './policy-engine';
import { lookupDecisionTable } from './decision-tables';

interface AgentAction {
  type: string;
  params: Record<string, unknown>;
  context: Record<string, unknown>;
}

interface RuleResult {
  allowed: boolean;
  values: Record<string, unknown>;
  appliedRules: string[];
  deniedBy?: string;
}

export async function enforceBusinessRules(
  action: AgentAction
): Promise<RuleResult> {
  // Layer 1: Decision table lookup
  const tableResult = await lookupDecisionTable(
    action.type,
    action.params
  );

  // Layer 2: Rule engine for chained logic
  const engineResult = await evaluate({
    ...action.params,
    ...tableResult.values,
  });

  // Layer 3: Policy guardrail — final check
  const policyResult = await checkPolicy({
    action: action.type,
    ...action.params,
    ...engineResult.values,
  });

  if (policyResult.denied) {
    return {
      allowed: false,
      values: {},
      appliedRules: policyResult.violatedPolicies,
      deniedBy: policyResult.violatedPolicies[0],
    };
  }

  return {
    allowed: true,
    values: { ...tableResult.values, ...engineResult.values },
    appliedRules: [
      ...tableResult.matchedRules,
      ...engineResult.firedRules,
    ],
  };
}
```

Versioning, Auditing, and the Compliance Trail
Every AI decision needs a traceable chain back to a specific rule version
When a customer disputes a charge or a regulator asks why an application was denied, "the AI decided" is not an acceptable answer. You need a full trace: the input data, the rule version that was active, the specific rule that matched, and the output it produced.
This is where business-rules-as-code pays for itself beyond accuracy. Every rule evaluation produces an audit record that a human can read, verify, and defend. The rule engine does not have a "mood" — it applied rule version 2026.03.15, row 7, which maps enterprise accounts with annual spend above $500K in the NA region to a 25% discount with net-60 terms.
Store rules in Git. Treat every rule change like a code change — pull request, review by the domain owner, automated tests, and deployment through CI/CD. When rules live in a database or a UI without version control, you lose the ability to answer "what were the rules on March 3rd when this decision was made?"
The best implementations tag each AI response with the rule version hash that was active during evaluation. If the rules have changed since, the system can flag the response as potentially stale — a pattern that becomes critical for long-running conversations where a pricing update might happen mid-session.
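A minimal sketch of that tagging, assuming the active rule set is available as a JSON string; `ruleSetHash` and `auditRecord` are hypothetical helpers, not a specific library's API:

```typescript
import { createHash } from "node:crypto";

// Stamp every evaluation with a content hash of the active rule set,
// so an audit can reconstruct exactly which bytes produced a decision.
function ruleSetHash(ruleSetJson: string): string {
  return createHash("sha256").update(ruleSetJson).digest("hex").slice(0, 12);
}

interface AuditRecord {
  timestamp: string;
  ruleVersion: string; // e.g. the table's "version" field
  ruleHash: string;    // content hash of the rule file at evaluation time
  input: unknown;
  matchedRule: string;
  output: unknown;
}

function auditRecord(
  ruleSetJson: string,
  version: string,
  input: unknown,
  matchedRule: string,
  output: unknown
): AuditRecord {
  return {
    timestamp: new Date().toISOString(),
    ruleVersion: version,
    ruleHash: ruleSetHash(ruleSetJson),
    input,
    matchedRule,
    output,
  };
}
```

Because the hash is derived from the rule file's content, a mid-session pricing update changes the hash, which is exactly the staleness signal described above.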
Business Rules Worth Encoding First
High-impact categories where formalization prevents the most expensive errors
Financial rules (highest blast radius)

- ✓ Pricing tiers, discount maximums, and volume break points
- ✓ Payment terms by account type, region, and contract status
- ✓ Approval thresholds — what dollar amounts require which sign-offs
- ✓ Tax calculation rules by jurisdiction
- ✓ Refund and credit policies with all exception paths

Compliance and regulatory rules

- ✓ Data residency requirements by customer region (GDPR, CCPA, PIPL)
- ✓ KYC/AML thresholds and documentation requirements
- ✓ Industry-specific regulations (HIPAA, SOX, PCI-DSS)
- ✓ Record retention periods and deletion obligations
- ✓ Mandatory disclosure and disclaimer language by jurisdiction

Operational rules (death by a thousand cuts)

- ✓ SLA definitions — response times, resolution times, escalation paths
- ✓ Eligibility criteria for features, programs, or services
- ✓ Routing logic — which team handles which request type
- ✓ Capacity limits — max users, API rate limits, storage quotas
- ✓ Scheduling constraints — business hours, blackout periods, maintenance windows
Testing Business Rules Like Software
Decision tables and policies need test suites, not just spot checks
The same rigor you apply to application code applies to business rules. Every decision table needs a test suite that verifies correct outputs for known inputs, including edge cases and boundary conditions. Every policy needs negative tests that confirm denied actions are actually denied.
OPA has built-in test tooling (opa test) that makes this straightforward. For decision tables in JSON, write a simple test harness that evaluates every row against sample inputs and asserts expected outputs.
1. Test every row in every decision table against known inputs

```typescript
// test/pricing-tiers.test.ts
import { lookupDecisionTable } from '../lib/decision-tables';

test('enterprise NA high-spend gets 25% discount', async () => {
  const result = await lookupDecisionTable('pricing_tiers', {
    account_type: 'enterprise',
    annual_spend: 750000,
    region: 'NA',
  });
  expect(result.discount_pct).toBe(25);
  expect(result.payment_terms).toBe('net-60');
});
```

2. Test boundary conditions — the row edges where rules change

```typescript
test('exactly $500K triggers high-spend tier', async () => {
  const result = await lookupDecisionTable('pricing_tiers', {
    account_type: 'enterprise',
    annual_spend: 500000, // boundary
    region: 'NA',
  });
  expect(result.discount_pct).toBe(25);
});

test('$499,999 stays in standard tier', async () => {
  const result = await lookupDecisionTable('pricing_tiers', {
    account_type: 'enterprise',
    annual_spend: 499999, // just below boundary
    region: 'NA',
  });
  expect(result.discount_pct).toBe(15);
});
```

3. Test policy denials with OPA's built-in test runner

```rego
# policies/agent-guardrails_test.rego
package agent.guardrails

test_deny_excessive_discount if {
    deny with input as {
        "action": "apply_discount",
        "account_type": "consumer",
        "discount_pct": 20,
    }
}

test_allow_valid_discount if {
    count(deny) == 0 with input as {
        "action": "apply_discount",
        "account_type": "enterprise",
        "discount_pct": 25,
    }
}
```
Detecting Rules Drift
When the real world changes but your encoded rules do not
Encoding rules is a point-in-time activity. The rules are correct when you encode them. Six months later, the pricing has changed, the compliance threshold has shifted, and two new product tiers were added — but nobody updated the decision table.
Rules drift is the silent killer of business-rules-as-code systems. The AI is dutifully enforcing rules that are no longer accurate, and because the enforcement is deterministic, nobody notices until a customer complains or an audit flags a discrepancy.
Three mechanisms prevent drift.
Drift Prevention Mechanisms

1. Expiry dates on every rule. Every decision table row and policy rule gets a mandatory expires_at field. When a rule expires, the system does not silently continue using it — it fails loudly and routes to a human. This forces periodic review without relying on humans remembering to check.
2. Automated conflict detection between rules and observed behavior. Run a weekly job that compares rule outputs against actual business outcomes. If the rule says the discount should be 15% but the last 50 transactions averaged 22%, something is wrong — either the rule is outdated or it is being bypassed. Flag it.
3. Domain owner sign-off on a cadence matching volatility. Stable rules get annual review. Volatile rules get quarterly review. Dynamic rules get reviewed with every deployment. Assign a named owner — not a team, a person — to each rule category. Ownerless rules are rules that will drift.
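The first mechanism, failing loudly on expiry, can be sketched in a few lines (names are illustrative):

```typescript
// Fail-loud expiry check: an expired rule is never applied silently.
interface VersionedRule {
  id: string;
  expires_at: string; // mandatory ISO date on every rule
}

function assertNotExpired(rule: VersionedRule, now: Date = new Date()): void {
  if (new Date(rule.expires_at).getTime() <= now.getTime()) {
    // Never keep serving a stale rule: escalate instead
    throw new Error(`Rule ${rule.id} expired on ${rule.expires_at}; route to human review`);
  }
}
```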
Wiring Rules Into Agent Frameworks
Practical integration patterns for LangChain, Vercel AI SDK, and custom agents
Most agent frameworks support tool calling. This is your integration point. Define your rules engine as a tool that the agent can (and must) call before taking actions that involve business logic.
The key architectural decision: should the agent call rules proactively, or should middleware intercept the agent's output and validate it? Both patterns have trade-offs.
| Pattern | How It Works | Pros | Cons |
|---|---|---|---|
| Agent-calls-rules (tool) | Agent has a 'check_business_rules' tool and calls it before responding | Agent learns when to check; lower latency for simple queries | Agent might skip the call; requires reliable tool-use behavior |
| Middleware interception | Every agent output passes through rules engine before delivery | 100% coverage; agent cannot bypass | Added latency on every response; some checks are unnecessary |
| Hybrid (recommended) | Middleware catches high-risk actions; agent calls tools for lookups | Best coverage-to-latency ratio; defense in depth | More complex to set up; two systems to maintain |
tools/business-rules-tool.ts

```typescript
import { tool } from 'ai';
import { z } from 'zod';
import { enforceBusinessRules } from '../lib/rules-middleware';

export const checkBusinessRules = tool({
  description:
    'Check business rules before quoting prices, ' +
    'applying discounts, or making commitments to customers. ' +
    'ALWAYS call this before responding with any financial figures.',
  parameters: z.object({
    action: z.string().describe('The action type: pricing, discount, refund, approval'),
    account_type: z.string().describe('Customer account tier'),
    region: z.string().optional().describe('Customer region code'),
    amount: z.number().optional().describe('Transaction amount if applicable'),
    additional_context: z.record(z.unknown()).optional(),
  }),
  execute: async (params) => {
    const result = await enforceBusinessRules({
      type: params.action,
      params,
      context: params.additional_context ?? {},
    });
    return {
      allowed: result.allowed,
      values: result.values,
      rules_applied: result.appliedRules,
      denied_reason: result.deniedBy ?? null,
    };
  },
});
```

Measuring Rules-as-Code Effectiveness
Track whether your encoded rules are actually preventing errors
You need a small set of metrics to know whether your business-rules-as-code system is working: for example, the share of agent actions that pass through a rule check, the rate of policy denials, and the count of expired or stale rules. Track them from day one. They are your justification for the investment and your early warning system for drift.
Where to Start Tomorrow Morning
A practical first-week plan for encoding your most critical business rules
First Week: Business Rules as Code

- List the 10 most common customer-facing decisions your AI makes (pricing, eligibility, refunds, routing)
- For each decision, find the current source of truth (PDF, spreadsheet, person's head)
- Encode the top 3 as JSON decision tables with version and effective_date fields
- Write a test for each decision table covering the happy path and two edge cases
- Wire one decision table into your agent as a tool call — start with pricing
- Add a single OPA policy blocking discounts above the maximum for each account tier
- Set up audit logging that records rule version, input, matched rule, and output for every evaluation
- Schedule a monthly review meeting with domain owners to verify rules freshness
Do I need a separate rules engine, or can I use the LLM to interpret rules?
Use a separate engine. LLMs interpreting rules in natural language will occasionally get them wrong — and 'occasionally wrong' on pricing or compliance is not acceptable. The LLM's job is communication: taking the deterministic output from your rules engine and explaining it clearly to the user. Keep reasoning and communication as separate concerns.
How do I handle rules that have exceptions or require human judgment?
Encode the rule, and make the exception path explicit. If a rule is 'net-30 terms except when VP of Sales approves an override,' encode both the rule and the override path. The override becomes a separate rule that requires a human-approval flag in the input. Your system should never guess about exceptions — it should either apply the rule or route to a human.
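A minimal sketch of that pattern, with hypothetical names: the override is its own rule, gated on an explicit approval flag that only a human workflow can set.

```typescript
// The exception path is encoded, not inferred: without the approval flag,
// the default rule applies.
interface TermsInput {
  vp_override_approved?: boolean;
  override_terms?: string;
}

function paymentTerms(input: TermsInput): string {
  if (input.vp_override_approved === true && input.override_terms) {
    return input.override_terms; // override rule: requires explicit human approval
  }
  return "net-30"; // default rule
}
```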
What about rules that change frequently, like promotional pricing?
Use a deployment pipeline. Promotional rules go into a separate decision table with start and end dates. A CI/CD pipeline validates the table, runs tests, and deploys it to the rules service. The agent always reads the current version. When the promo expires, the table returns to base pricing automatically. No manual intervention needed.
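Sketched minimally (field names are illustrative), the promo override carries its own validity window, and the base rule reapplies automatically outside it:

```typescript
// Promotional override with an explicit validity window: inside the window
// the promo discount applies; outside it, the base rule with no manual step.
interface PromoRule {
  discount_pct: number;
  starts_at: string; // ISO date
  ends_at: string;   // ISO date
}

function effectiveDiscount(base: number, promo: PromoRule, now: Date): number {
  const active = now >= new Date(promo.starts_at) && now <= new Date(promo.ends_at);
  return active ? promo.discount_pct : base;
}
```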
How do decision tables scale to thousands of rules?
Partition by domain. One table per decision category — pricing, eligibility, routing, compliance. Each table stays small and maintainable. The middleware layer knows which table to query based on the action type. Most organizations find they need 20-50 tables covering 500-2000 rules total. That is manageable with standard tooling.
Can I use an LLM to help extract rules from policy documents?
Yes, with verification. Use the LLM to propose structured rules from unstructured policy text, but always have a domain expert validate the output before it enters production. Georgetown's Beeck Center research shows LLMs achieve good policy-to-code conversion when the logic is simple, but struggle with complex multi-condition rules[7]. Treat LLM extraction as a drafting tool, not a finalization tool.
Further Reading and Tools
Standards: DMN Specification (Object Management Group), OPA Documentation
Rule Engines: Drools (Java), GoRules (cloud-native), json-rules-engine (Node.js)
Policy Engines: Open Policy Agent (Rego), Cedar (AWS), Cerbos (API-first)
Research: IBM's rule-based-llms repo, Georgetown Beeck Center AI-Powered Rules as Code, AWS Automated Reasoning Checks
- [1] Digital Government Hub — AI-Powered Rules as Code: Experiments with Public Benefits Policy (digitalgovernmenthub.org)
- [2] Open Policy Agent — Open Policy Agent Documentation (openpolicyagent.org)
- [3] Object Management Group — Decision Model and Notation (DMN) Standard (omg.org)
- [4] GoRules — Cloud-Native Rule Engine (gorules.io)
- [5] IBM DecisionsDev — Rule-Based LLMs (github.com)
- [6] AWS — Minimize AI Hallucinations and Deliver Up to 99% Verification Accuracy with Automated Reasoning Checks (aws.amazon.com)
- [7] Georgetown Beeck Center — AI-Powered Rules as Code: Experiments with Public Benefits Policy (beeckcenter.georgetown.edu)
- [8] Combining Rule-Based and LLM-Based Approaches for Decision Making (arxiv.org)