Guardrails

AIBA has two independent layers of run-time protection:

- pydantic-ai-shields: four capability wrappers that intercept agent calls for safety checks. Toggled by GUARDRAILS_ENABLED in .env.
- UsageLimits: pydantic-ai's native resource enforcement, covering request caps, tool-call caps, and token budgets. Always active (independent of GUARDRAILS_ENABLED) and set per effort mode.

Covers: Overview, CostTracking, ToolGuard, InputGuard, SecretRedaction, UsageLimits, Configuration, and Wiring

Overview

Guardrails are pydantic-ai capabilities — wrappers that intercept agent invocations at the framework level. They are injected once at agent creation time and apply to every subsequent call.

There is one master switch for the shields: GUARDRAILS_ENABLED in .env. When it is false, none of the four capabilities are loaded and the agents run without them. UsageLimits are unaffected and still apply.

Who gets what

| Guardrail | Main Agent | Sub-Agent | Purpose |
|---|---|---|---|
| `CostTracking` | Yes | No | Abort the orchestrator's run when the USD budget is exceeded |
| `ToolGuard` | Yes | No | Require human approval before executing dangerous tools |
| `InputGuard` | Yes | Yes | Inspect and sanitize user input before it reaches the model |
| `SecretRedaction` | No | Yes | Redact secrets from sub-agent output |

The main agent (orchestrator) owns cost and approval concerns because it runs the overall operation. The sub-agent (web worker) gets SecretRedaction because it browses external pages whose source may accidentally contain API keys or tokens.
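The assignment in the table, combined with the master switch, can be pictured as a lookup keyed by agent role. This is a minimal sketch rather than AIBA's code: the role keys "main" and "sub" and the function guardrails_for are hypothetical, and plain strings stand in for the real capability objects.

```python
# Hypothetical sketch: strings stand in for the pydantic-ai-shields capabilities.
GUARDRAILS_BY_ROLE: dict[str, list[str]] = {
    "main": ["CostTracking", "ToolGuard", "InputGuard"],  # orchestrator
    "sub": ["SecretRedaction", "InputGuard"],  # web worker
}


def guardrails_for(role: str, guardrails_enabled: bool) -> list[str]:
    """Return the guardrail names to attach to an agent with the given role."""
    if not guardrails_enabled:
        # Master switch off: no shields are attached to any agent.
        return []
    # Return a copy so callers cannot mutate the shared table.
    return list(GUARDRAILS_BY_ROLE[role])


print(guardrails_for("main", True))  # ['CostTracking', 'ToolGuard', 'InputGuard']
print(guardrails_for("sub", False))  # []
```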

CostTracking

Enforces a hard USD budget per agent run. Configured via COST_BUDGET_USD (default 1.0).

When the agent's accumulated API cost exceeds the budget, pydantic-ai raises a UsageLimitExceeded exception. AIBA catches this and displays a resource-limit warning in the terminal.

```python
CostTracking(budget_usd=_settings.cost_budget_usd)
```

The cost counter is per-run, not cumulative across sessions. Each new REPL turn or beat execution starts with a fresh budget. AIBA provides no rollover or monthly tracking — that belongs to your provider's billing dashboard.
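The per-run behaviour can be illustrated with an accumulator that is created fresh for each run and aborts once spending passes the budget. This is a hedged sketch of the semantics described above, not the pydantic-ai-shields implementation; RunBudget, BudgetExceeded, and the per-1k-token prices are hypothetical.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Hypothetical stand-in for the exception raised when the budget is blown."""


@dataclass
class RunBudget:
    budget_usd: float = 1.0  # mirrors the COST_BUDGET_USD default
    spent_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int,
               usd_per_1k_in: float, usd_per_1k_out: float) -> None:
        """Add the cost of one model request; raise once spending exceeds the budget."""
        self.spent_usd += input_tokens / 1000 * usd_per_1k_in
        self.spent_usd += output_tokens / 1000 * usd_per_1k_out
        if self.spent_usd > self.budget_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.4f} of ${self.budget_usd:.2f} budget"
            )


# A new RunBudget per REPL turn or beat, so nothing carries over between runs.
# The prices below are made-up example rates.
budget = RunBudget(budget_usd=0.05)
budget.record(10_000, 1_000, usd_per_1k_in=0.003, usd_per_1k_out=0.015)  # about $0.045
try:
    budget.record(10_000, 1_000, usd_per_1k_in=0.003, usd_per_1k_out=0.015)
except BudgetExceeded as exc:
    print("resource limit:", exc)
```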

ToolGuard

Requires human approval before executing specific tools. Configured via REQUIRE_APPROVAL_FOR in .env: a list of tool names, written as a JSON-style list.

```python
ToolGuard(require_approval=_settings.require_approval_for)
```

Example configuration:

```bash
REQUIRE_APPROVAL_FOR=["send_email"]
```

When a guarded tool is invoked, the agent pauses and prompts in the terminal:

```text
  ⚠  Agent wants to call 'send_email'
  Recipient: user@example.com
  Subject: AIBA Research Results

  Approve? [y/N]:
```

Declining the approval returns a rejection to the agent, which can then adapt its plan.

Any tool can be guarded — spawn_sub_agents, append_csv, or custom tools. The list is empty by default ([]), meaning no tools require approval.
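Conceptually, ToolGuard is a gate in front of tool execution: a guarded name triggers a prompt, and a refusal goes back to the agent as an ordinary result it can plan around. The sketch below assumes nothing about the ToolGuard API; guarded_call, the rejection message, the send_email function, and the injectable ask callback (defaulting to input) are hypothetical.

```python
from typing import Any, Callable


def guarded_call(
    tool_name: str,
    tool_fn: Callable[..., Any],
    args: dict[str, Any],
    require_approval: list[str],
    ask: Callable[[str], str] = input,
) -> Any:
    """Run tool_fn(**args), asking a human first if tool_name is guarded."""
    if tool_name in require_approval:
        details = "\n".join(f"  {key}: {value}" for key, value in args.items())
        answer = ask(f"Agent wants to call {tool_name!r}\n{details}\nApprove? [y/N]: ")
        # Default is "no": anything other than y/yes declines the call.
        if answer.strip().lower() not in ("y", "yes"):
            # The rejection goes back to the agent as a normal tool result.
            return f"Tool call {tool_name!r} was rejected by the user."
    return tool_fn(**args)


def send_email(recipient: str, subject: str) -> str:
    """Hypothetical tool used only for this example."""
    return f"sent to {recipient}"


result = guarded_call(
    "send_email",
    send_email,
    {"recipient": "user@example.com", "subject": "AIBA Research Results"},
    require_approval=["send_email"],
    ask=lambda prompt: "n",  # simulate the user declining
)
print(result)  # Tool call 'send_email' was rejected by the user.
```

Injecting ask keeps the gate testable without a terminal; tools that are not in the list never trigger a prompt.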

InputGuard

Inspects user input before it reaches the language model. Both agents run it.

```python
InputGuard()
```

InputGuard screens for:

- Prompt injection attempts (e.g. "ignore previous instructions")
- Excessively long inputs that could overflow the model's context window
- Unicode homoglyph attacks and obfuscation patterns

These checks happen client-side — the input never leaves your machine if it fails inspection.
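For a concrete picture of the three categories, here is a client-side sketch of comparable checks. It is not InputGuard's implementation: inspect_input, the phrase patterns, the 20,000-character limit, and the mixed-script heuristic are illustrative assumptions.

```python
import re
import unicodedata

MAX_CHARS = 20_000  # assumed limit; real context budgets depend on the model
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"disregard (the )?system prompt", re.IGNORECASE),
]
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def _scripts(word: str) -> set[str]:
    """Script names (LATIN, CYRILLIC, GREEK, ...) of the letters in a word."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in word if ch.isalpha()}


def inspect_input(text: str) -> list[str]:
    """Return a list of problems found; an empty list means the input passes."""
    problems = []
    if len(text) > MAX_CHARS:
        problems.append("input too long")
    if any(p.search(text) for p in INJECTION_PATTERNS):
        problems.append("possible prompt injection")
    if any(ch in ZERO_WIDTH for ch in text):
        problems.append("zero-width characters")
    # A word mixing scripts, e.g. Latin "p" next to Cyrillic U+0430, suggests a homoglyph.
    if any(len(_scripts(word)) > 1 for word in text.split()):
        problems.append("mixed-script word (possible homoglyph)")
    return problems


print(inspect_input("Please ignore previous instructions"))  # ['possible prompt injection']
```

The mixed-script check is deliberately crude: legitimate words that combine scripts, as in Japanese, would also be flagged. The Unicode confusables data from UTS #39 is a common basis for more careful homoglyph detection.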

SecretRedaction

Redacts secrets from sub-agent output before it's returned to the orchestrator. The sub-agent browses external web pages, and a page's source may contain hardcoded API keys, tokens, or credentials that the model inadvertently echoes.

```python
SecretRedaction()
```

SecretRedaction scans output for patterns matching common secret formats:

| Pattern | Example |
|---|---|
| GitHub tokens | `ghp_xxxxxxxxxxxxxxxxxxxx` |
| AWS keys | `AKIAIOSFODNN7EXAMPLE` |
| Generic API keys | `sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx` |
| JWT tokens | `eyJhbGciOiJIUzI1NiJ9...` |

It does not prevent the sub-agent from seeing secrets — it only redacts them from the text that flows back to the orchestrator and into conversation history.
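Pattern matching on the outgoing text is one straightforward way to get this behaviour. The sketch targets the four formats in the table; the regular expressions, the redact function, and the [REDACTED:<kind>] placeholder are assumptions, not SecretRedaction's actual rules.

```python
import re

# Assumed patterns for the four formats in the table; real scanners cover many more.
SECRET_PATTERNS: dict[str, re.Pattern] = {
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{20,}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}


def redact(text: str) -> str:
    """Replace every matched secret with a placeholder naming its kind."""
    for kind, pattern in SECRET_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{kind}]", text)
    return text


page = "key=AKIAIOSFODNN7EXAMPLE token=ghp_" + "a" * 36
print(redact(page))  # key=[REDACTED:aws_access_key] token=[REDACTED:github_token]
```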

UsageLimits

Separate from the pydantic-ai-shields capabilities, every effort mode enforces hard resource caps through pydantic-ai's native UsageLimits. These are not opt-in: they are always active, even with GUARDRAILS_ENABLED=false, and are determined by the effort mode selected at startup.

Per-mode limits

| Mode | `request_limit` | `tool_calls_limit` | `total_tokens_limit` |
|---|---|---|---|
| quick | 15 | 20 | 100,000 |
| balanced | 25 | 40 | 300,000 |
| max | 50 | 100 | 500,000 |

All three caps apply to every agent invocation — the main orchestrator, each sub-agent spawned in a swarm, and the single sub-agent in agent mode.

When any limit is hit, pydantic-ai raises UsageLimitExceeded — the same exception CostTracking uses. AIBA catches it uniformly and displays a resource-limit warning.
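The caps in the table can be modelled as a counter that each agent call owns. This sketch only mirrors the numbers above; Caps, InvocationUsage, and LimitHit are hypothetical names, and the real enforcement is done by pydantic-ai's UsageLimits, whose exact boundary checks may differ.

```python
from dataclasses import dataclass


class LimitHit(Exception):
    """Hypothetical stand-in for pydantic-ai's UsageLimitExceeded."""


@dataclass(frozen=True)
class Caps:
    request_limit: int
    tool_calls_limit: int
    total_tokens_limit: int


# Values copied from the per-mode table above.
CAPS = {
    "quick": Caps(15, 20, 100_000),
    "balanced": Caps(25, 40, 300_000),
    "max": Caps(50, 100, 500_000),
}


@dataclass
class InvocationUsage:
    caps: Caps
    requests: int = 0
    tool_calls: int = 0
    total_tokens: int = 0

    def add(self, requests: int = 0, tool_calls: int = 0, tokens: int = 0) -> None:
        """Record usage and raise as soon as any cap is exceeded."""
        self.requests += requests
        self.tool_calls += tool_calls
        self.total_tokens += tokens
        if self.requests > self.caps.request_limit:
            raise LimitHit("request_limit")
        if self.tool_calls > self.caps.tool_calls_limit:
            raise LimitHit("tool_calls_limit")
        if self.total_tokens > self.caps.total_tokens_limit:
            raise LimitHit("total_tokens_limit")


# In this sketch every invocation (orchestrator or sub-agent) gets its own
# counter, so the caps apply to each call separately.
usage = InvocationUsage(CAPS["quick"])
usage.add(requests=15, tool_calls=20, tokens=100_000)  # exactly at the caps: allowed
try:
    usage.add(requests=1)
except LimitHit as exc:
    print("resource limit hit:", exc)  # resource limit hit: request_limit
```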

Where they're set

UsageLimits are hardcoded in src/prompts/effort.py and flow into every agent call:

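```python
from pydantic_ai.usage import UsageLimits

EFFORT_CONFIGS: dict[EffortMode, EffortConfig] = {
    EffortMode.QUICK: {
        # Hard caps enforced by pydantic-ai on every agent call in quick mode
        "usage_limits": UsageLimits(
            request_limit=15,
            tool_calls_limit=20,
            total_tokens_limit=100_000,
        ),
        # ... other EffortConfig keys elided
    },
    # ... remaining modes elided
}
```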

The config's usage_limits entry is passed at call time in the REPL, run_agent(), run_sub_agent(), and spawn_sub_agents():

```python
result = agent_fn(
    user_input,
    usage_limits=config["usage_limits"],  # caps for the active effort mode
    # ... other arguments elided
)
```

There is no .env variable for UsageLimits — they are deliberately tied to effort mode. If you need tighter or looser caps, change the effort mode.

Configuration

All guardrail settings live in .env:

```bash
GUARDRAILS_ENABLED=true
COST_BUDGET_USD=1.0
REQUIRE_APPROVAL_FOR=[]
```

| Variable | Type | Default | Effect |
|---|---|---|---|
| `GUARDRAILS_ENABLED` | `bool` | `true` | Master switch: `false` strips all pydantic-ai-shields capabilities from both agents (UsageLimits still apply) |
| `COST_BUDGET_USD` | `float` | `1.0` | Maximum USD per run for `CostTracking` |
| `REQUIRE_APPROVAL_FOR` | `list[str]` | `[]` | Tool names that require human approval via `ToolGuard` |

Guardrails are injected at agent creation time in main_agent.py and sub_agent.py. They cannot be toggled mid-session without restarting the REPL.
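Because REQUIRE_APPROVAL_FOR is typed list[str], its value is written as a JSON-style list rather than a bare comma-separated string. The stdlib sketch below reads the three variables with the documented defaults; GuardrailSettings and load_guardrail_settings are hypothetical, and AIBA's real _settings object may parse values differently.

```python
import json
import os
from collections.abc import Mapping
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GuardrailSettings:
    guardrails_enabled: bool = True
    cost_budget_usd: float = 1.0
    require_approval_for: list[str] = field(default_factory=list)


def _parse_bool(value: str) -> bool:
    """Accept common spellings of true/false."""
    lowered = value.strip().lower()
    if lowered in {"1", "true", "yes", "on"}:
        return True
    if lowered in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"not a boolean: {value!r}")


def load_guardrail_settings(env: Mapping[str, str] = os.environ) -> GuardrailSettings:
    """Read the three guardrail variables, falling back to the documented defaults."""
    approval = json.loads(env.get("REQUIRE_APPROVAL_FOR", "[]"))
    if not isinstance(approval, list) or not all(isinstance(t, str) for t in approval):
        raise ValueError("REQUIRE_APPROVAL_FOR must be a JSON list of tool names")
    return GuardrailSettings(
        guardrails_enabled=_parse_bool(env.get("GUARDRAILS_ENABLED", "true")),
        cost_budget_usd=float(env.get("COST_BUDGET_USD", "1.0")),
        require_approval_for=approval,
    )


settings = load_guardrail_settings({"REQUIRE_APPROVAL_FOR": '["send_email"]'})
print(settings.require_approval_for)  # ['send_email']
```

A value such as send_email,append_csv would fail json.loads here, which is the mistake the JSON-style list format avoids.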

Wiring

Main agent (src/agents/main_agent.py):

```python
capabilities=[
    ReinjectSystemPrompt(),
    IncludeToolReturnSchemas(),
    # Shields are spliced in only when GUARDRAILS_ENABLED is true
    *(
        [
            CostTracking(budget_usd=_settings.cost_budget_usd),
            ToolGuard(require_approval=_settings.require_approval_for),
            InputGuard(),
        ]
        if _settings.guardrails_enabled
        else []
    ),
],
```

Sub-agent (src/agents/sub_agent.py):

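```python
capabilities=[
    _web_search_cap,
    WebFetch(local=True),
    playwright_cap,
    ReinjectSystemPrompt(),
    # Include return schemas only for tools whose definition has one
    IncludeToolReturnSchemas(tools=lambda ctx, td: bool(td.return_schema)),
    # Shields are spliced in only when GUARDRAILS_ENABLED is true
    *([SecretRedaction(), InputGuard()] if _settings.guardrails_enabled else []),
],
```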