## Summary
- add `ForkSnapshotMode` to `ThreadManager::fork_thread` so callers can
request either a committed snapshot or an interrupted snapshot
- share the model-visible `<turn_aborted>` history marker between the
live interrupt path and interrupted forks
- update the small set of direct fork callsites to pass
`ForkSnapshotMode::Committed`
Note: this enables `/btw` to behave similarly to an Esc interrupt
(hopefully reasonably close in distribution)
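
For illustration, a minimal sketch of the new surface; everything beyond `ForkSnapshotMode`, the `fork_thread` parameter, and the `<turn_aborted>` marker is a hypothetical stand-in:

```rust
/// Hypothetical sketch: how a caller might choose the snapshot taken at fork time.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ForkSnapshotMode {
    /// Fork from the last committed turn boundary.
    Committed,
    /// Fork mid-turn, recording the same `<turn_aborted>` marker the live
    /// interrupt path emits so the model sees a consistent history.
    Interrupted,
}

// Illustrative history item; the real type lives elsewhere in the codebase.
#[derive(Clone, Debug)]
enum HistoryItem {
    Marker(&'static str),
}

const TURN_ABORTED_MARKER: &str = "<turn_aborted>";

fn snapshot_history(mode: ForkSnapshotMode, mut history: Vec<HistoryItem>) -> Vec<HistoryItem> {
    if mode == ForkSnapshotMode::Interrupted {
        // Share the model-visible marker with the live interrupt path.
        history.push(HistoryItem::Marker(TURN_ABORTED_MARKER));
    }
    history
}
```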
---------
Co-authored-by: Codex <noreply@openai.com>
## What changed
- adds a targeted snapshot test for rollback with contextual diffs in
`codex_tests.rs`
- snapshots the exact model-visible request input before the rolled-back
turn and on the follow-up request after rollback
- shows the duplicate developer and environment context pair appearing
again before the follow-up user message
## Why
Rollback currently rewinds the reference context baseline without
rewinding the live session overrides. On the next turn, the same
contextual diff is emitted again and duplicated in the request sent to
the model.
## Impact
- makes the regression visible in a canonical snapshot test
- keeps the snapshot on the shared `context_snapshot` path without
adding new formatting helpers
- gives a direct repro for future fixes to rollback/context
reconstruction
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
Adds support for `approvals_reviewer` to `Op::UserTurn` so we can migrate
`CodexMessageProcessor::turn_start` to use `Op::UserTurn`.
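
Roughly the shape of the addition, as a minimal sketch; the surrounding field and the reviewer type are hypothetical stand-ins for the real `Op::UserTurn` variant:

```rust
use serde::{Deserialize, Serialize};

// Illustrative stand-in for the real reviewer identifier type.
#[derive(Serialize, Deserialize, Clone, Debug)]
pub struct ApprovalsReviewer(pub String);

#[derive(Serialize, Deserialize, Debug)]
pub enum Op {
    UserTurn {
        text: String, // hypothetical; the real variant carries more fields
        // New: lets `turn_start` callers carry their reviewer through `Op::UserTurn`.
        #[serde(skip_serializing_if = "Option::is_none")]
        approvals_reviewer: Option<ApprovalsReviewer>,
    },
}
```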
## Testing
- [x] Added a quick test for the new field
Co-authored-by: Codex <noreply@openai.com>
Use `serde` to encode inter-agent communication into an assistant
message, and use decoding to detect whether an assistant message is such
a message.
Note: this assumes `serde` on small payloads is fast enough
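
A minimal sketch of the encode/decode-as-detection idea, assuming a JSON encoding (the actual wire format and type names may differ):

```rust
use serde::{Deserialize, Serialize};

// Hypothetical payload embedded in an assistant message's body.
#[derive(Serialize, Deserialize, Debug)]
struct InterAgentMessage {
    author: String,
    recipient: String,
    other_recipients: Vec<String>,
    content: String,
}

/// Encode the message into the assistant-message body.
fn encode(msg: &InterAgentMessage) -> String {
    serde_json::to_string(msg).expect("plain string fields always serialize")
}

/// Decoding doubles as detection: if parsing succeeds, the assistant
/// message is an inter-agent message; otherwise it is ordinary text.
fn try_decode(body: &str) -> Option<InterAgentMessage> {
    serde_json::from_str(body).ok()
}
```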
- add a `PreToolUse` hook, initially for bash-like tool execution only
- block shell execution before dispatch with deny-only hook behavior
- introduce a `common.rs` matcher framework for deciding when hooks run
(sketched after the example run below)
example run:
```
› run three parallel echo commands, and the second one should echo "[block-pre-tool-use]" as a test
• Running the three echo commands in parallel now and I’ll report the output directly.
• Running PreToolUse hook: name for demo pre tool use hook
• Running PreToolUse hook: name for demo pre tool use hook
• Running PreToolUse hook: name for demo pre tool use hook
PreToolUse hook (completed)
warning: wizard-tower PreToolUse demo inspected Bash: echo "first parallel echo"
PreToolUse hook (blocked)
warning: wizard-tower PreToolUse demo blocked a Bash command on purpose.
feedback: PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue.
PreToolUse hook (completed)
warning: wizard-tower PreToolUse demo inspected Bash: echo "third parallel echo"
• Ran echo "first parallel echo"
└ first parallel echo
• Ran echo "third parallel echo"
└ third parallel echo
• Three little waves went out in parallel.
1. printed first parallel echo
2. was blocked before execution because it contained the exact test string [block-pre-tool-use]
3. printed third parallel echo
There was also an unrelated macOS defaults warning around the successful commands, but the echoes
themselves worked fine. If you want, I can rerun the second one with a slightly modified string so
it passes cleanly.
```
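
The deny-only gate reduces to roughly this shape; this is a sketch with hypothetical type and field names, while the real matcher framework lives in `common.rs`:

```rust
/// Hypothetical hook verdict: hooks can only block, never auto-approve.
enum PreToolUseDecision {
    Continue,
    Block { feedback: String },
}

/// Illustrative matcher: decides whether a hook runs for this tool call.
struct Matcher {
    tool_name: String,
}

impl Matcher {
    fn matches(&self, tool: &str) -> bool {
        self.tool_name == tool
    }
}

/// Deny-only dispatch gate: run matching hooks before executing the tool,
/// and refuse dispatch if any hook blocks.
fn gate_tool_call(
    hooks: &[(Matcher, fn(&str) -> PreToolUseDecision)],
    tool: &str,
    command: &str,
) -> Result<(), String> {
    for (matcher, hook) in hooks {
        if matcher.matches(tool) {
            if let PreToolUseDecision::Block { feedback } = hook(command) {
                return Err(feedback); // surfaced to the model as hook feedback
            }
        }
    }
    Ok(())
}
```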
## Summary
- route /realtime, Ctrl+C, and deleted realtime meters through the same
realtime stop path
- keep generic transcription placeholder cleanup free of realtime
shutdown side effects
## Testing
- Relied on CI for verification; did not run local tests
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- capture the last guardian `EventMsg::Error` while waiting for review
completion
- reuse that error as the denial rationale when the review turn
completes without an assessment payload
- add a regression test for the `/responses` HTTP 400 path
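
A minimal sketch of the capture-and-reuse idea, with hypothetical type names simplified from the real event stream:

```rust
// Illustrative event shape; the real `EventMsg` has many more variants.
enum EventMsg {
    Error { message: String },
    Other,
}

struct ReviewWait {
    last_error: Option<String>,
}

impl ReviewWait {
    fn observe(&mut self, event: &EventMsg) {
        if let EventMsg::Error { message } = event {
            // Remember the most recent guardian error while waiting.
            self.last_error = Some(message.clone());
        }
    }

    /// When the review turn completes without an assessment payload, reuse
    /// the captured error as the denial rationale.
    fn denial_rationale(&self) -> String {
        self.last_error
            .clone()
            .unwrap_or_else(|| "review completed without an assessment".to_string())
    }
}
```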
## Testing
- `just fmt`
- `cargo test -p codex-core
guardian_review_surfaces_responses_api_errors_in_rejection_reason`
- `just argument-comment-lint -p codex-core`
## Notes
- `cargo test -p codex-core` still fails on the pre-existing unrelated
test
`tools::js_repl::tests::js_repl_imported_local_files_can_access_repl_globals`
in this environment (`mktemp ... Operation not permitted` while
downloading `dotslash`)
Co-authored-by: Codex <noreply@openai.com>
## Summary
Fix a managed ChatGPT auth bug where a stale Codex process could
proactively refresh using an old in-memory refresh token even after
another process had already rotated auth on disk.
This changes the proactive `AuthManager::auth()` path to reuse the
existing guarded `refresh_token()` flow instead of calling the refresh
endpoint directly from cached auth state.
## Original Issue
Users reported repeated `codexd` log lines like:
```text
ERROR codex_core::auth: Failed to refresh token: error sending request for url (https://auth.openai.com/oauth/token)
```
In practice this showed up most often when multiple `codexd` processes
were left running. Killing the extra processes stopped the noise, which
suggested the issue was caused by stale auth state across processes
rather than invalid user credentials.
## Diagnosis
The bug was in the proactive refresh path used by `AuthManager::auth()`:
- Process A could refresh successfully, rotate refresh token `R0` to
`R1`, and persist the updated auth state plus `last_refresh` to disk.
- Process B could keep an older auth snapshot cached in memory, still
holding `R0` and the old `last_refresh`.
- Later, when Process B called `auth()`, it checked staleness from its
cached in-memory auth instead of first reloading from disk.
- Because that cached `last_refresh` was stale, Process B would
proactively call `/oauth/token` with stale refresh token `R0`.
- On failure, `auth()` logged the refresh error but kept returning the
same stale cached auth, so repeated `auth()` calls could keep retrying
with dead state.
This differed from the existing unauthorized-recovery flow, which
already did the safer thing: guarded reload from disk first, then
refresh only if the on-disk auth was unchanged.
## What Changed
- Switched proactive refresh in `AuthManager::auth()` to:
  - do a pure staleness check on cached auth
  - call `refresh_token()` when stale
  - return the original cached auth on genuine refresh failure, preserving
    existing outward behavior
- Removed the direct proactive refresh-from-cached-state path
- Added regression tests covering:
  - stale cached auth with newer same-account auth already on disk
  - the same scenario even when the refresh endpoint would fail if called
## Why This Fix
`refresh_token()` already contains the right cross-process safety
behavior:
- guarded reload from disk
- same-account verification
- skip-refresh when another process already changed auth
Reusing that path makes proactive refresh consistent with unauthorized
recovery and prevents stale processes from trying to refresh
already-rotated tokens.
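
As a minimal sketch of the reworked path (synchronous, lock-free stand-ins for the real async, guarded types; the staleness threshold is illustrative):

```rust
use std::time::{Duration, SystemTime};

// Illustrative, heavily simplified stand-ins for the real auth types.
#[derive(Clone)]
struct CodexAuth {
    last_refresh: SystemTime,
}

struct AuthManager {
    cached: CodexAuth,
}

const STALE_AFTER: Duration = Duration::from_secs(60 * 60 * 24);

impl AuthManager {
    /// Pure staleness check on the cached snapshot; no network, no disk.
    fn is_stale(&self) -> bool {
        self.cached
            .last_refresh
            .elapsed()
            .map(|age| age > STALE_AFTER)
            .unwrap_or(false)
    }

    /// Stand-in for the existing guarded flow: reload from disk, verify the
    /// same account, and refresh only if the on-disk auth is unchanged.
    fn refresh_token(&mut self) -> Result<(), ()> {
        Ok(())
    }

    fn auth(&mut self) -> CodexAuth {
        if self.is_stale() {
            // On genuine refresh failure, fall through and keep returning
            // the cached auth, preserving the old outward behavior.
            let _ = self.refresh_token();
        }
        self.cached.clone()
    }
}
```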
## Testing
Test shape:
- create a fresh temp `CODEX_HOME` from `~/.codex/auth.json`
- force `last_refresh` to an old timestamp so proactive refresh is
required
- start two long-lived helper processes against the same auth file:
  - start `B` first so it caches stale auth and sleeps
  - start `A` second so it refreshes first
- point both at a local mock `/oauth/token` server
- inspect whether `B` makes a second refresh request with the stale
in-memory token, or reloads the rotated token from disk
### Before the fix
The repro showed the bug clearly: the mock server saw two refreshes with
the same stale token, `A` rotated to a new token, and `B` still returned
the stale token instead of reloading from disk.
```text
POST /oauth/token refresh_token=rt_j6s0...
POST /oauth/token refresh_token=rt_j6s0...
B:cached_before=rt_j6s0...
B:cached_after=rt_j6s0...
B:returned=rt_j6s0...
A:cached_before=rt_j6s0...
A:cached_after=rotated-refresh-token-logged-run-v2
A:returned=rotated-refresh-token-logged-run-v2
```
### After the fix
After the fix, the mock server saw only one refresh request. `A`
refreshed once, and `B` started with the stale token but reloaded and
returned the rotated token.
```text
POST /oauth/token refresh_token=rt_j6s0...
B:cached_before=rt_j6s0...
B:cached_after=rotated-refresh-token-fix-branch
B:returned=rotated-refresh-token-fix-branch
A:cached_before=rt_j6s0...
A:cached_after=rotated-refresh-token-fix-branch
A:returned=rotated-refresh-token-fix-branch
```
This shows the new behavior: `A` refreshes once, then `B` reuses the
updated auth from disk instead of making a second refresh request with
the stale token.
Send input now sends messages as assistant messages in this format:
```
author: /root/worker_a
recipient: /root/worker_a/tester
other_recipients: []
Content: bla bla bla. Actual content. Only text for now
```
## Summary
- queue input after the user submits `/compact` until that manual
compact turn ends
- mirror the same behavior in the app-server TUI
- add regression tests for input queued before compact starts and while
it is running
Co-authored-by: Codex <noreply@openai.com>
- Duplicate app mentions are now suppressed when they’re plugin-backed
with the same display name.
- Remaining connector mentions now label category as [Plugin] when
plugin metadata is present, otherwise [App].
- Mention result lists are now capped to 8 rows after filtering.
- Updates both tui and tui_app_server with the same changes.
## Why
Fixes [#15283](https://github.com/openai/codex/issues/15283), where
sandboxed tool calls fail on older distro `bubblewrap` builds because
`/usr/bin/bwrap` does not understand `--argv0`. The upstream [bubblewrap
v0.9.0 release
notes](https://github.com/containers/bubblewrap/releases/tag/v0.9.0)
explicitly call out `Add --argv0`. Flipping `use_legacy_landlock`
globally works around that compatibility bug, but it also weakens the
default Linux sandbox and breaks proxy-routed and split-policy cases
called out in review.
The follow-up Linux CI failure was in the new launcher test rather than
the launcher logic: the fake `bwrap` helper stayed open for writing, so
Linux would not exec it. This update also closes the user-visibility gap
from review by surfacing the same startup warning when `/usr/bin/bwrap`
is present but too old for `--argv0`, not only when it is missing.
## What Changed
- keep `use_legacy_landlock` default-disabled
- teach `codex-rs/linux-sandbox/src/launcher.rs` to fall back to the
vendored bubblewrap build when `/usr/bin/bwrap` does not advertise
`--argv0` support
- add launcher tests for supported, unsupported, and missing system
`bwrap`
- write the fake `bwrap` test helper to a closed temp path so the
supported-path launcher test works on Linux too
- extend the startup warning path so Codex warns when `/usr/bin/bwrap`
is missing or too old to support `--argv0`
- mirror the warning/fallback wording across
`codex-rs/linux-sandbox/README.md` and `codex-rs/core/README.md`,
including that the fallback is the vendored bubblewrap compiled into the
binary
- cite the upstream `bubblewrap` release that introduced `--argv0`
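
Something like this probe shape is implied; this is a sketch assuming detection via `--help` output, with illustrative function names and fallback path:

```rust
use std::process::Command;

/// Probe whether the system bubblewrap understands `--argv0`
/// (added upstream in bubblewrap v0.9.0) by scanning `--help` output.
fn system_bwrap_supports_argv0(bwrap: &str) -> bool {
    Command::new(bwrap)
        .arg("--help")
        .output()
        .ok()
        .map(|out| {
            // Check both streams; help text placement varies by build.
            let text = format!(
                "{}{}",
                String::from_utf8_lossy(&out.stdout),
                String::from_utf8_lossy(&out.stderr)
            );
            text.contains("--argv0")
        })
        .unwrap_or(false)
}

/// Fall back to the vendored bubblewrap compiled into the binary when the
/// system `bwrap` is missing or too old to advertise `--argv0`.
fn choose_bwrap() -> &'static str {
    if system_bwrap_supports_argv0("/usr/bin/bwrap") {
        "/usr/bin/bwrap"
    } else {
        "vendored-bwrap" // illustrative placeholder for the embedded build
    }
}
```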
## Verification
- `bazel test --config=remote --platforms=//:rbe
//codex-rs/linux-sandbox:linux-sandbox-unit-tests
--test_filter=launcher::tests::prefers_system_bwrap_when_help_lists_argv0
--test_output=errors`
- `cargo test -p codex-core system_bwrap_warning`
- `cargo check -p codex-exec -p codex-tui -p codex-tui-app-server -p
codex-app-server`
- `just argument-comment-lint`
## Summary
- use Shift+Left to edit the most recent queued message when running
under tmux
- mirror the same binding change in the app-server TUI
- add tmux-specific tests and snapshot coverage for the rendered
queued-message hint
## Testing
- `just fmt`
- `cargo test -p codex-tui`
- `cargo test -p codex-tui-app-server`
- `just argument-comment-lint -p codex-tui -p codex-tui-app-server`
Co-authored-by: Codex <noreply@openai.com>
## Summary
- add a snapshot-style core test for fork startup context injection
followed by first-turn diff injection
- capture the current duplicated startup-plus-turn context behavior
without changing runtime logic
## Testing
- `just fmt`
- other tests not run locally; relying on CI
---------
Co-authored-by: Codex <noreply@openai.com>
Remove the legacy `smart_approvals` config migration from core config
loading.
This change:
- stops rewriting `smart_approvals` into `guardian_approval`
- stops backfilling `approvals_reviewer = "guardian_subagent"`
- replaces the migration tests with regression coverage that asserts the
deprecated key is ignored in root and profile scopes
Verification:
- `just fmt`
- `cargo test -p codex-core smart_approvals_alias_is_ignored`
- `cargo test -p codex-core approvals_reviewer_`
- `just argument-comment-lint`
Notes:
- `cargo test -p codex-core` still hits an unrelated existing failure in
`tools::js_repl::tests::js_repl_imported_local_files_can_access_repl_globals`;
the JS REPL kernel exits after `mktemp` fails under the current
environment.
Enhancement request: internal cleanup request to delete the
`smart_approvals` alias migration; no public issue link is available.
Co-authored-by: Codex <noreply@openai.com>
## Summary
- remove `tui_app_server` handling for legacy app-server notifications
- drop the local ChatGPT auth refresh request path from `tui_app_server`
- remove the now-unused refresh response helper from local auth loading
Split out of #15106 so the `tui_app_server` cleanup can land separately
from the larger `codex-exec` app-server migration.
As part of moving the TUI onto the app server, we added temporary
handling for certain legacy events. We've confirmed that these do not
need to be supported, so this PR removes that support from the
tui_app_server, allowing additional simplifications in follow-on PRs.
These events are needed only for very old rollouts, and none of the
other app-server clients (IDE extension or app) support them either.
## Summary
- stop translating legacy `codex/event/*` notifications inside
`tui_app_server`
- remove the TUI-side legacy warning and rollback buffering/replay paths
that were only fed by those notifications
- keep the lower-level app-server and app-server-client legacy event
plumbing intact so PR #15106 can rebase on top and handle the remaining
exec/lower-layer migration separately
Moves Code Mode to a new crate with no dependencies on codex. This
crate encodes the code mode semantics that we want for lifetime,
mounting, and tool calling.
The model-facing surface is mostly unchanged. `exec` still runs raw
JavaScript, `wait` still resumes or terminates a `cell_id`, nested tools
are still available through `tools.*`, and helpers like `text`, `image`,
`store`, `load`, `notify`, `yield_control`, and `exit` still exist.
The major change is underneath that surface:
- Old code mode was an external Node runtime.
- New code mode is an in-process V8 runtime embedded directly in Rust.
- Old code mode managed cells inside a long-lived Node runner process.
- New code mode manages cells in Rust, with one V8 runtime thread per
active `exec`.
- Old code mode used JSON protocol messages over child stdin/stdout plus
Node worker-thread messages.
- New code mode uses Rust channels and direct V8 callbacks/events.
This PR also fixes the two migration regressions that fell out of that
substrate change:
- `wait { terminate: true }` now waits for the V8 runtime to actually
stop before reporting termination.
- synchronous top-level `exit()` now succeeds again instead of surfacing
as a script error.
---
- `core/src/tools/code_mode/*` is now mostly an adapter layer for the
public `exec` / `wait` tools.
- `code-mode/src/service.rs` owns cell sessions and async control flow
in Rust.
- `code-mode/src/runtime/*.rs` owns the embedded V8 isolate and
JavaScript execution.
- each `exec` spawns a dedicated runtime thread plus a Rust
session-control task.
- helper globals are installed directly into the V8 context instead of
being injected through a source prelude.
- helper modules like `tools.js` and `@openai/code_mode` are synthesized
through V8 module resolution callbacks in Rust.
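
A rough sketch of the per-`exec` wiring, with hypothetical message types; the real service connects V8 callbacks and events into these channels:

```rust
use std::sync::mpsc;
use std::thread;

// Illustrative control messages between the session task and the runtime thread.
enum ToRuntime {
    Exec { source: String },
    Terminate,
}

enum FromRuntime {
    Output(String),
    Stopped,
}

/// Each `exec` gets a dedicated runtime thread; the Rust side talks to it
/// over channels instead of a child process's stdin/stdout.
fn spawn_runtime() -> (mpsc::Sender<ToRuntime>, mpsc::Receiver<FromRuntime>) {
    let (tx_in, rx_in) = mpsc::channel::<ToRuntime>();
    let (tx_out, rx_out) = mpsc::channel::<FromRuntime>();
    thread::spawn(move || {
        // Stand-in for creating the embedded V8 isolate and installing
        // helper globals directly into its context.
        for msg in rx_in {
            match msg {
                ToRuntime::Exec { source } => {
                    let _ = tx_out.send(FromRuntime::Output(format!("ran: {source}")));
                }
                ToRuntime::Terminate => break,
            }
        }
        // Only report termination once the runtime has actually stopped,
        // mirroring the `wait { terminate: true }` fix described above.
        let _ = tx_out.send(FromRuntime::Stopped);
    });
    (tx_in, rx_out)
}
```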
---
Also added a benchmark showing the speed of initializing and using a
code mode environment:
```
$ cargo bench -p codex-code-mode --bench exec_overhead -- --samples 30 --warm-iterations 25 --tool-counts 0,32,128
Finished `bench` profile [optimized] target(s) in 0.18s
Running benches/exec_overhead.rs (target/release/deps/exec_overhead-008c440d800545ae)
exec_overhead: samples=30, warm_iterations=25, tool_counts=[0, 32, 128]
scenario tools samples warmups iters mean/exec p95/exec rssΔ p50 rssΔ max
cold_exec 0 30 0 1 1.13ms 1.20ms 8.05MiB 8.06MiB
warm_exec 0 30 1 25 473.43us 512.49us 912.00KiB 1.33MiB
cold_exec 32 30 0 1 1.03ms 1.15ms 8.08MiB 8.11MiB
warm_exec 32 30 1 25 509.73us 545.76us 960.00KiB 1.30MiB
cold_exec 128 30 0 1 1.14ms 1.19ms 8.30MiB 8.34MiB
warm_exec 128 30 1 25 575.08us 591.03us 736.00KiB 864.00KiB
memory uses a fresh-process max RSS delta for each scenario
```
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
If we are in a mode that is already explicitly un-sandboxed, then
`ApprovalPolicy::Never` should not block dangerous commands.
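
The decision reduces to roughly this shape (hypothetical enum names mirroring the description, not the exact config types):

```rust
#[derive(PartialEq)]
enum ApprovalPolicy {
    Never,
    OnRequest,
}

#[derive(PartialEq)]
enum SandboxMode {
    DangerFullAccess, // explicitly un-sandboxed
    Restricted,
}

/// With no sandbox by explicit choice, `ApprovalPolicy::Never` no longer
/// vetoes commands that would otherwise be flagged as dangerous.
fn blocks_dangerous(policy: &ApprovalPolicy, sandbox: &SandboxMode) -> bool {
    *policy == ApprovalPolicy::Never && *sandbox != SandboxMode::DangerFullAccess
}
```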
## Testing
- [x] Existing unit test covers old behavior
- [x] Added a unit test for this new case
## Summary
This PR fixes restricted filesystem permission profiles so Codex's
runtime-managed helper executables remain readable without requiring
explicit user configuration.
- add implicit readable roots for the configured `zsh` helper path and
the main execve wrapper
- allowlist the shared `$CODEX_HOME/tmp/arg0` root when the execve
wrapper lives there, so session-specific helper paths keep working
- dedupe injected paths and avoid adding duplicate read entries to the
sandbox policy
- add regression coverage for restricted read mode with helper
executable overrides
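
A minimal sketch of the injection-plus-dedupe step, assuming hypothetical names and simplified types:

```rust
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

/// Add the helper-executable roots to the readable set without creating
/// duplicate read entries in the sandbox policy.
fn inject_readable_roots(
    configured: Vec<PathBuf>,
    zsh_helper: PathBuf,
    execve_wrapper: PathBuf,
    codex_home: &Path,
) -> Vec<PathBuf> {
    // A set dedupes injected paths against existing entries.
    let mut roots: BTreeSet<PathBuf> = configured.into_iter().collect();
    roots.insert(zsh_helper);
    // If the wrapper lives under the shared arg0 dir, allow the whole root
    // so session-specific helper paths keep working.
    let arg0_root = codex_home.join("tmp").join("arg0");
    if execve_wrapper.starts_with(&arg0_root) {
        roots.insert(arg0_root);
    } else {
        roots.insert(execve_wrapper);
    }
    roots.into_iter().collect()
}
```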
## Testing
Before this change, executing a shell command via the zsh fork produced
this error:
```
"sandbox error: sandbox denied exec error, exit code: 127, stdout: , stderr: /etc/zprofile:11: operation not permitted: /usr/libexec/path_helper\nzsh:1: operation not permitted: .codex/skills/proxy-a/scripts/fetch_example.sh\n"
```
After this change, the error went away, meaning the readable roots were
injected correctly.
- emit a typed `thread/realtime/transcriptUpdated` notification from
live realtime transcript deltas
- expose that notification as flat `threadId`, `role`, and `text` fields
instead of a nested transcript array
- continue forwarding raw `handoff_request` items on
`thread/realtime/itemAdded`, including the accumulated
`active_transcript`
- update app-server docs, tests, and generated protocol schema artifacts
to match the delta-based payloads
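
The flat payload implied above reduces to roughly this shape (serialization details and the Rust-side struct name are assumptions):

```rust
use serde::{Deserialize, Serialize};

/// Sketch of the flat `thread/realtime/transcriptUpdated` params: one
/// delta per notification rather than a nested transcript array.
#[derive(Serialize, Deserialize, Debug)]
#[serde(rename_all = "camelCase")]
struct RealtimeTranscriptUpdated {
    thread_id: String,
    role: String, // e.g. "user" or "assistant"
    text: String, // the incremental transcript delta
}
```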
---------
Co-authored-by: Codex <noreply@openai.com>
This adds a dummy v8-poc project that, in Cargo, links against our
prebuilt binaries and the ones provided by rusty_v8 for non-musl
platforms. This demonstrates that we can successfully link and use V8 on
all platforms that we want to target.
In Bazel, things are slightly more complicated. Since the published
libraries already have libc++ linked in, we end up with many doubly
linked symbols if we try to use them in Bazel. Instead, we fall back to
building rusty_v8 and V8 from source (cached, of course) on the
platforms we ship to.
There is likely some compatibility drift in the Windows Bazel builders
that we'll need to reconcile before we can re-enable them. I'm happy to
be on the hook to unwind that.
This PR adds a URI-based system for referencing agents within a tree.
This comes from a sync between research and engineering.
The main agent (the one manually spawned by a user) is always called
`/root`. Any sub-agent it spawns is named, for example, `/root/agent_1`,
where `agent_1` is chosen by the model.
Any agent can contact any other agent using its path.
Paths can be absolute or relative to the calling agent.
Resume is not supported on these new paths for now.
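
For illustration, a tiny resolver showing how relative paths might resolve against the calling agent's URI (assumed semantics; the real rules live in the agent tree code):

```rust
/// Resolve a possibly-relative agent path against the caller's absolute path.
fn resolve_agent_path(caller: &str, target: &str) -> String {
    if target.starts_with('/') {
        target.to_string() // already absolute, e.g. "/root/agent_1"
    } else {
        format!("{caller}/{target}") // relative to the calling agent
    }
}

// resolve_agent_path("/root/agent_1", "tester") == "/root/agent_1/tester"
// resolve_agent_path("/root/agent_1", "/root/agent_2") == "/root/agent_2"
```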
## Summary
- make app-server treat `clientInfo.name == "codex-tui"` as a legacy
compatibility case
- fall back to `DEFAULT_ORIGINATOR` instead of sending `codex-tui` as
the originator header
- add a TODO noting this is a temporary workaround that should be
removed later
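
The fallback reduces to something like this; apart from `DEFAULT_ORIGINATOR`, the names and the default value are illustrative:

```rust
const DEFAULT_ORIGINATOR: &str = "codex_app_server"; // illustrative value

/// Treat the legacy `codex-tui` client name as a compatibility case and
/// fall back to the default originator header.
// TODO: temporary workaround; remove once legacy clients are gone.
fn originator_for(client_name: &str) -> &str {
    if client_name == "codex-tui" {
        DEFAULT_ORIGINATOR
    } else {
        client_name
    }
}
```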
## Testing
- Not run (not requested)
`CODEX_TEST_REMOTE_ENV` makes `test_codex` start the executor
"remotely" (inside a Docker container), turning any integration test
into a remote test.
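For example, something like `CODEX_TEST_REMOTE_ENV=1 cargo test -p codex-core` would run those integration tests against the containerized executor (the exact value handling is an assumption; only the variable name comes from this change).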
## Summary
- add a short guardian follow-up developer reminder before reused
reviews
- cache prior-review state on the guardian session instead of rescanning
full history on each request
- update guardian follow-up coverage and snapshot expectations
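
A minimal sketch of the caching idea, with hypothetical types:

```rust
// Illustrative cached state; the real session stores richer review data.
#[derive(Clone)]
struct PriorReview {
    verdict: String,
}

#[derive(Default)]
struct GuardianSession {
    prior_review: Option<PriorReview>,
}

impl GuardianSession {
    /// Reuse the cached prior review instead of rescanning full history
    /// on every request; populate the cache on first computation.
    fn prior_review(
        &mut self,
        scan_history: impl FnOnce() -> Option<PriorReview>,
    ) -> Option<PriorReview> {
        if self.prior_review.is_none() {
            self.prior_review = scan_history();
        }
        self.prior_review.clone()
    }
}
```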
---------
Co-authored-by: Codex <noreply@openai.com>