Commit 4816b89204 by Michael Bolin — permissions: make profiles represent enforcement (#19231)
## Why

`PermissionProfile` is becoming the canonical permissions abstraction,
but the old shape only carried optional filesystem and network fields.
It could describe allowed access, but not who is responsible for
enforcing it. That made `DangerFullAccess` and `ExternalSandbox` lossy
when profiles were exported, cached, or round-tripped through app-server
APIs.

The important model change is that active permissions are now a disjoint
union over the enforcement mode. Conceptually:

```rust
pub enum PermissionProfile {
    Managed {
        file_system: FileSystemSandboxPolicy,
        network: NetworkSandboxPolicy,
    },
    Disabled,
    External {
        network: NetworkSandboxPolicy,
    },
}
```

This distinction matters because `Disabled` means Codex should apply no
outer sandbox at all, while `External` means filesystem isolation is
owned by an outside caller. Those are not equivalent to a broad managed
sandbox. For example, macOS cannot nest Seatbelt inside Seatbelt, so an
inner sandbox may require the outer Codex layer to use no sandbox rather
than a permissive one.

## How Existing Modeling Maps

Legacy `SandboxPolicy` remains a boundary projection, but it now maps
into the higher-fidelity profile model:

- `ReadOnly` and `WorkspaceWrite` map to `PermissionProfile::Managed`
with restricted filesystem entries plus the corresponding network
policy.
- `DangerFullAccess` maps to `PermissionProfile::Disabled`, preserving
the “no outer sandbox” intent instead of treating it as a lax managed
sandbox.
- `ExternalSandbox { network_access }` maps to
`PermissionProfile::External { network }`, preserving external
filesystem enforcement while still carrying the active network policy.
- Split runtime policies that legacy `SandboxPolicy` cannot faithfully
express, such as managed unrestricted filesystem plus restricted
network, stay `Managed` instead of being collapsed into
`ExternalSandbox`.
- Per-command/session/turn grants remain partial overlays via
`AdditionalPermissionProfile`; full `PermissionProfile` is reserved for
complete active runtime permissions.
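Under the conceptual enum above, this mapping can be sketched as one total `match`. This is a simplified sketch: the stand-in `SandboxPolicy` and filesystem variants below are assumptions for illustration, and the real codex-rs types carry richer restricted-entry lists.

```rust
#[derive(Debug, Clone, PartialEq)]
enum NetworkSandboxPolicy {
    Restricted,
    Unrestricted,
}

#[derive(Debug, Clone, PartialEq)]
enum FileSystemSandboxPolicy {
    Restricted, // stands in for the real restricted-entry list
    Unrestricted,
}

#[derive(Debug, Clone, PartialEq)]
enum SandboxPolicy {
    ReadOnly,
    WorkspaceWrite { network_access: bool },
    DangerFullAccess,
    ExternalSandbox { network_access: bool },
}

#[derive(Debug, Clone, PartialEq)]
enum PermissionProfile {
    Managed {
        file_system: FileSystemSandboxPolicy,
        network: NetworkSandboxPolicy,
    },
    Disabled,
    External { network: NetworkSandboxPolicy },
}

fn network(access: bool) -> NetworkSandboxPolicy {
    if access {
        NetworkSandboxPolicy::Unrestricted
    } else {
        NetworkSandboxPolicy::Restricted
    }
}

fn to_profile(policy: &SandboxPolicy) -> PermissionProfile {
    match policy {
        // Managed sandboxes keep their filesystem and network detail.
        SandboxPolicy::ReadOnly => PermissionProfile::Managed {
            file_system: FileSystemSandboxPolicy::Restricted,
            network: NetworkSandboxPolicy::Restricted,
        },
        SandboxPolicy::WorkspaceWrite { network_access } => PermissionProfile::Managed {
            file_system: FileSystemSandboxPolicy::Restricted,
            network: network(*network_access),
        },
        // "No outer sandbox" survives instead of becoming a lax managed sandbox.
        SandboxPolicy::DangerFullAccess => PermissionProfile::Disabled,
        // Filesystem enforcement is owned by the caller; network policy survives.
        SandboxPolicy::ExternalSandbox { network_access } => PermissionProfile::External {
            network: network(*network_access),
        },
    }
}

fn main() {
    assert_eq!(
        to_profile(&SandboxPolicy::DangerFullAccess),
        PermissionProfile::Disabled
    );
    assert_eq!(
        to_profile(&SandboxPolicy::ExternalSandbox { network_access: false }),
        PermissionProfile::External { network: NetworkSandboxPolicy::Restricted }
    );
}
```

Because `Disabled` and `External` are distinct variants rather than degenerate `Managed` values, exporting and re-importing a profile cannot silently turn "no sandbox" into "permissive sandbox".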

## What Changed

- Change active `PermissionProfile` into a tagged union: `managed`,
`disabled`, and `external`.
- Keep partial permission grants separate with
`AdditionalPermissionProfile` for command/session/turn overlays.
- Represent managed filesystem permissions as either `restricted`
entries or `unrestricted`; `glob_scan_max_depth` is non-zero when
present.
- Preserve old rollout compatibility by accepting the older untagged `{
network, file_system }` profile shape during deserialization.
- Preserve fidelity for important edge cases: `DangerFullAccess`
round-trips as `disabled`, `ExternalSandbox` round-trips as `external`,
and managed unrestricted filesystem + restricted network stays managed
instead of being mistaken for external enforcement.
- Preserve configured deny-read entries and bounded glob scan depth when
full profiles are projected back into runtime policies, including
unrestricted replacements that now become `:root = write` plus deny
entries.
- Regenerate the experimental app-server v2 JSON/TypeScript schema and
update the `command/exec` README example for the tagged
`permissionProfile` shape.
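The "restricted entries or unrestricted" filesystem shape, together with the rule that `glob_scan_max_depth` is non-zero when present, can be encoded directly in the types. A sketch with assumed field names (the real codex-protocol definitions differ):

```rust
use std::num::NonZeroU32;

#[derive(Debug, PartialEq)]
enum FileSystemPermissions {
    // Explicit entries plus an optional bounded glob scan depth.
    // Option<NonZeroU32> makes the "non-zero when present" invariant
    // unrepresentable to violate: a zero depth cannot be constructed.
    Restricted {
        entries: Vec<String>,
        glob_scan_max_depth: Option<NonZeroU32>,
    },
    Unrestricted,
}

fn main() {
    let fs = FileSystemPermissions::Restricted {
        entries: vec![":root = write".to_string()],
        glob_scan_max_depth: NonZeroU32::new(4), // NonZeroU32::new(0) yields None
    };
    assert!(matches!(fs, FileSystemPermissions::Restricted { .. }));
    assert_eq!(NonZeroU32::new(0), None);
}
```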

## Compatibility

Legacy `SandboxPolicy` remains available at config/API boundaries as the
compatibility projection. Existing rollout lines with the old
`PermissionProfile` shape continue to load. The app-server
`permissionProfile` field is experimental, so its v2 wire shape is
intentionally updated to match the higher-fidelity model.

## Verification

- `just write-app-server-schema`
- `cargo check --tests`
- `cargo test -p codex-protocol permission_profile`
- `cargo test -p codex-protocol preserving_deny_entries_keeps_unrestricted_policy_enforceable`
- `cargo test -p codex-app-server-protocol permission_profile_file_system_permissions`
- `cargo test -p codex-app-server-protocol serialize_client_response`
- `cargo test -p codex-core session_configured_reports_permission_profile_for_external_sandbox`
- `just fix`
- `just fix -p codex-protocol`
- `just fix -p codex-app-server-protocol`
- `just fix -p codex-core`
- `just fix -p codex-app-server`
Committed 2026-04-23 23:02:18 -07:00

# Rollout Trace

**Privacy:** Rollout tracing is not telemetry. Codex does not upload or report these traces; it writes local bundles only when `CODEX_ROLLOUT_TRACE_ROOT` is set. Those local bundles can contain prompts, responses, tool inputs/outputs, terminal output, and paths, so treat them as sensitive.

Rollout tracing is an opt-in diagnostic path for understanding what happened during a Codex session. It records raw runtime evidence into a local bundle on disk, then replays that bundle into a semantic graph that a debugger or UI can inspect.

The key design choice is: observe first, interpret later.

Hot-path Codex code does not try to build the final graph while the session is running. It writes ordered raw events and payload references. The offline reducer then decides which events became model-visible conversation, which events were runtime work, and how information moved between threads, tools, code cells, and terminal sessions.
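A minimal sketch of that write ordering, with in-memory stand-ins for the real file I/O; the `TraceWriter` name and `record` signature here are assumptions, not the crate's actual API:

```rust
use std::collections::BTreeMap;

// Hypothetical writer: assigns seq and persists the payload before the
// event line that references it, so trace.jsonl never points at a
// payload that does not exist yet.
#[derive(Default)]
struct TraceWriter {
    next_seq: u64,
    payloads: BTreeMap<String, String>, // payload path -> raw JSON
    events: Vec<String>,                // trace.jsonl lines in seq order
}

impl TraceWriter {
    fn record(&mut self, kind: &str, payload_json: &str) -> u64 {
        let seq = self.next_seq;
        self.next_seq += 1;
        // 1. Write the payload first...
        let file = format!("payloads/{seq:04}.json");
        self.payloads.insert(file.clone(), payload_json.to_string());
        // 2. ...then append the ordered raw event that references it.
        self.events
            .push(format!(r#"{{"seq":{seq},"kind":"{kind}","payload":"{file}"}}"#));
        seq
    }
}

fn main() {
    let mut w = TraceWriter::default();
    let a = w.record("inference_request", "{\"model\":\"gpt\"}");
    let b = w.record("tool_dispatch", "{\"name\":\"exec\"}");
    assert_eq!((a, b), (0, 1));
    assert!(w.events[0].contains("payloads/0000.json"));
    assert!(w.payloads.contains_key("payloads/0000.json"));
}
```

The interpretation step (which events were model-visible, which were runtime work) is deliberately absent here: it belongs to the offline reducer, not the hot path.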

## What This Gives Us

Rollout traces make failures debuggable when the normal transcript is not enough. They preserve enough evidence to answer questions like:

- Which model request produced this tool call?
- Did this output come from the model-visible transcript, a code-mode runtime value, a terminal operation, or an agent notification?
- Which code-mode exec cell issued a nested tool call?
- Which terminal operation created or reused a running process?
- Which multi-agent v2 tool call spawned, messaged, received from, or closed a child thread?

The reduced `state.json` is intentionally not just a transcript. It is a graph of model-visible conversation plus the runtime objects that explain how Codex got there.

## System Shape

```mermaid
flowchart TD
    subgraph Runtime["codex-core runtime"]
        Protocol["protocol lifecycle\nthread start/end, turn start/end"]
        Inference["inference + compaction\nrequests, responses, checkpoints"]
        Tools["tool dispatch\ndirect model tools + code-mode nested tools"]
        CodeMode["code-mode runtime\nexec cells, yields, waits, termination"]
        Terminal["terminal runtime\nexec_command / write_stdin operations"]
        Agents["multi_agent_v2\nspawn, task delivery, result, close"]
    end

    Context["ThreadTraceContext\nroot/child no-op-capable producer"]
    Writer["TraceWriter\nassigns seq and writes payloads before events"]

    subgraph Bundle["trace bundle"]
        Manifest["manifest.json\ntrace_id, rollout_id, root_thread_id"]
        Events["trace.jsonl\nordered raw event spine"]
        Payloads["payloads/*.json\nlarge raw evidence"]
    end

    Reducer["replay_bundle\ndeterministic offline reducer"]

    subgraph State["state.json"]
        Threads["threads + turns"]
        Conversation["conversation_items\nwhat the model saw"]
        RuntimeObjects["inference_calls, tool_calls,\ncode_cells, terminals, compactions"]
        Edges["interaction_edges\nspawn, task, result, close"]
        RawRefs["raw_payload refs"]
    end

    Protocol --> Context
    Inference --> Context
    Tools --> Context
    CodeMode --> Context
    Terminal --> Context
    Agents --> Context

    Context --> Writer
    Writer --> Manifest
    Writer --> Payloads
    Writer --> Events

    Manifest --> Reducer
    Events --> Reducer
    Payloads --> Reducer

    Reducer --> Threads
    Reducer --> Conversation
    Reducer --> RuntimeObjects
    Reducer --> Edges
    Reducer --> RawRefs
```

The thread context is deliberately small and no-op capable. A root session starts one from `CODEX_ROLLOUT_TRACE_ROOT`; freshly spawned child threads derive their own context from the parent's so the whole rollout tree shares one writer. Disabled contexts accept the same calls and record nothing.

Trace startup and writes are best-effort. Rollout tracing must never make a Codex session fail just because diagnostic recording failed. Core emits raw observations; this crate owns the bundle schema, trace-context APIs, writer, and reducer.
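The no-op-capable shape can be sketched as an optional shared writer. The names and methods below are assumptions for illustration; the real `ThreadTraceContext` API differs:

```rust
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct TraceWriter {
    next_seq: u64,
    events: Vec<(u64, String)>, // (writer-assigned seq, raw event)
}

impl TraceWriter {
    fn write_event(&mut self, event: &str) {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.events.push((seq, event.to_string()));
    }
}

#[derive(Clone, Default)]
struct ThreadTraceContext {
    // None => tracing disabled; every call becomes a no-op.
    writer: Option<Arc<Mutex<TraceWriter>>>,
}

impl ThreadTraceContext {
    fn enabled() -> Self {
        Self { writer: Some(Arc::new(Mutex::new(TraceWriter::default()))) }
    }

    fn disabled() -> Self {
        Self::default()
    }

    // Child threads reuse the parent's writer, so the whole rollout
    // tree shares one ordered event spine.
    fn child(&self) -> Self {
        self.clone()
    }

    fn record(&self, event: &str) {
        if let Some(writer) = &self.writer {
            // Best-effort: a poisoned lock must never fail the session.
            if let Ok(mut w) = writer.lock() {
                w.write_event(event);
            }
        }
    }
}

fn main() {
    let root = ThreadTraceContext::enabled();
    let child = root.child();
    root.record("turn_started");
    child.record("tool_dispatched");
    ThreadTraceContext::disabled().record("ignored");
    let events = root.writer.as_ref().unwrap().lock().unwrap().events.clone();
    assert_eq!(events.len(), 2);
    assert_eq!(events[0].0, 0);
}
```

Callers never branch on whether tracing is on: enabled and disabled contexts expose the same `record` call, which is what keeps the hot path clean.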

## Bundle Layout

A trace bundle contains:

- `manifest.json`: trace identity and bundle metadata.
- `trace.jsonl`: append-only raw events ordered by writer-assigned `seq`.
- `payloads/*.json`: raw requests, responses, tool inputs/results, runtime events, terminal output, compaction data, and protocol snapshots.
- `state.json`: optional reducer output written by `codex debug trace-reduce`.

`trace_id` identifies this diagnostic artifact. `rollout_id` identifies the Codex rollout/session being observed. Keeping those separate lets us reason about the stored trace without confusing it with the product-level session identity.

To reduce a bundle:

```shell
codex debug trace-reduce <trace-bundle>
```

By default this writes `<trace-bundle>/state.json`. Rust callers can also call `codex_rollout_trace::replay_bundle` directly.

## Raw Evidence vs Reduced Graph

```mermaid
flowchart LR
    Model["model-visible payloads\nrequests and response output items"]
    Runtime["runtime observations\ntool dispatch, terminal output, code-mode JSON"]
    RawPayloads["payloads/*.json\nexact evidence"]
    Reducer["reducer"]
    Conversation["ConversationItem\nwhat the model saw"]
    ToolCall["ToolCall\nruntime tool boundary"]
    CodeCell["CodeCell\nmodel-authored exec cell"]
    TerminalOperation["TerminalOperation\ncommand/write/poll"]
    InteractionEdge["InteractionEdge\ninformation flow"]

    Model --> RawPayloads
    Runtime --> RawPayloads
    RawPayloads --> Reducer

    Reducer --> Conversation
    Reducer --> ToolCall
    Reducer --> CodeCell
    Reducer --> TerminalOperation
    Reducer --> InteractionEdge

    CodeCell --> ToolCall
    ToolCall --> TerminalOperation
    ToolCall --> InteractionEdge
    Conversation --> InteractionEdge
```

This distinction is the reason the reduced model has both raw payload references and semantic objects. A code-mode nested tool call, for example, has JSON input and output at the JavaScript runtime boundary, but the model-visible transcript only contains the surrounding exec custom tool call and its eventual output.

The reducer keeps those facts separate:

- `ConversationItem` records what appeared in model-facing requests/responses.
- `ToolCall`, `CodeCell`, `TerminalOperation`, `InferenceCall`, and `Compaction` record runtime/debug boundaries.
- `InteractionEdge` records information flow between objects, such as a `spawn_agent` tool call delivering a task into a child thread.
- `RawPayloadRef` points back to exact evidence when a viewer needs more detail than the reduced graph stores inline.
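One way to picture that separation in types; these definitions are illustrative stand-ins, not the crate's actual schema:

```rust
// Every reduced object can point back at exact on-disk evidence.
#[derive(Debug, Clone, PartialEq)]
struct RawPayloadRef {
    file: String, // e.g. "payloads/0007.json"
}

// Semantic objects: model-visible conversation vs runtime boundaries
// vs information-flow edges, kept as distinct kinds.
#[derive(Debug)]
enum ReducedObject {
    ConversationItem { role: String, raw: RawPayloadRef },
    ToolCall { name: String, raw: RawPayloadRef },
    InteractionEdge { from: String, to: String, kind: String },
}

fn main() {
    let raw = RawPayloadRef { file: "payloads/0007.json".to_string() };
    let item = ReducedObject::ConversationItem {
        role: "assistant".to_string(),
        raw: raw.clone(),
    };
    // A viewer can always dereference back to the evidence file.
    if let ReducedObject::ConversationItem { raw, .. } = &item {
        assert_eq!(raw.file, "payloads/0007.json");
    }
}
```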

## Multi-Agent v2

Multi-agent v2 child threads share the root trace writer. That means one root bundle reduces into one graph containing the parent thread, child threads, and the edges between them.

```mermaid
flowchart LR
    RootTool["root ToolCall\nspawn_agent / followup_task / send_message"]
    ChildInput["child ConversationItem\ninjected task/message"]
    ChildThread["child AgentThread"]
    ChildResult["child assistant ConversationItem\nresult message"]
    RootNotice["root ConversationItem\nsubagent notification"]
    CloseTool["root ToolCall\nclose_agent"]
    TargetThread["target AgentThread"]

    RootTool -- "spawn/task edge" --> ChildInput
    ChildInput --> ChildThread
    ChildThread --> ChildResult
    ChildResult -- "agent_result edge" --> RootNotice
    CloseTool -- "close_agent edge" --> TargetThread
```

Top-level independent threads still get independent bundles. Spawned child threads are different: they are part of the same rollout tree, so they belong in the same raw event log, payload directory, and reduced state.json.

## Reducer Invariants

The reducer is strict where the raw evidence should be self-consistent:

- raw events are replayed in `seq` order;
- payload files must exist before events refer to them;
- reduced object IDs are stable within one replay;
- runtime events may be queued until the model-visible source or delivery target has been observed;
- model-visible conversation is derived from model-facing payloads, not from runtime convenience output;
- runtime payloads are evidence, not proof that the model saw the same bytes.

Those invariants let the reduced graph stay small while preserving a path back to the original evidence whenever a debugger needs to explain why an object or edge exists.
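A toy reducer loop can illustrate the ordering, payload-existence, and queueing invariants. The event shape here (payload reference, defined object, dependency) is an assumption for illustration, not the real bundle schema:

```rust
use std::collections::{HashSet, VecDeque};

struct RawEvent {
    seq: u64,
    payload: Option<String>,    // payload file this event references, if any
    depends_on: Option<String>, // object that must be observed first
    defines: Option<String>,    // object this event introduces
}

fn replay(mut events: Vec<RawEvent>, payload_files: &HashSet<String>) -> Result<Vec<u64>, String> {
    events.sort_by_key(|e| e.seq); // invariant: replay in seq order
    let mut seen: HashSet<String> = HashSet::new();
    let mut queued: VecDeque<RawEvent> = VecDeque::new();
    let mut applied = Vec::new();
    for event in events {
        // invariant: payload files must exist before events refer to them
        if let Some(p) = &event.payload {
            if !payload_files.contains(p) {
                return Err(format!("payload {p} missing for seq {}", event.seq));
            }
        }
        queued.push_back(event);
        // invariant: queue events until their source object is observed
        let mut progress = true;
        while progress {
            progress = false;
            for _ in 0..queued.len() {
                let e = queued.pop_front().expect("queue is non-empty");
                if e.depends_on.as_ref().map_or(true, |d| seen.contains(d)) {
                    if let Some(id) = &e.defines {
                        seen.insert(id.clone());
                    }
                    applied.push(e.seq);
                    progress = true;
                } else {
                    queued.push_back(e);
                }
            }
        }
    }
    if !queued.is_empty() {
        return Err(format!("{} events never became deliverable", queued.len()));
    }
    Ok(applied)
}

fn main() {
    let payloads: HashSet<String> = ["p0.json".to_string()].into_iter().collect();
    let events = vec![
        RawEvent { seq: 0, payload: Some("p0.json".into()), depends_on: None, defines: Some("thread".into()) },
        // seq 1 arrives before its source object exists, so it is queued...
        RawEvent { seq: 1, payload: None, depends_on: Some("call-1".into()), defines: None },
        // ...until seq 2 defines "call-1", after which seq 1 is applied.
        RawEvent { seq: 2, payload: None, depends_on: None, defines: Some("call-1".into()) },
    ];
    assert_eq!(replay(events, &payloads).unwrap(), vec![0, 2, 1]);
}
```

The strictness is asymmetric on purpose: a missing payload is an error (the evidence should be self-consistent), while an early-arriving runtime event is merely deferred until its source appears.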