codex

rad/codex

mirror of https://github.com/openai/codex.git synced 2026-03-05 21:45:28 +03:00

Author	SHA1	Message	Date
sayan-oai	033ef9cb9d	feat: add debug clear-memories command to hard-wipe memories state (#13085 ) #### what adds a `codex debug clear-memories` command to help with clearing all memories state from disk, sqlite db, and marking threads as `memory_mode=disabled` so they don't get resummarized when the `memories` feature is re-enabled. #### tests add tests	2026-02-27 17:45:55 -08:00
Ruslan Nigmatullin	8c1e3f3e64	app-server: Add `ephemeral` field to `Thread` object (#13084 ) Currently there is no other way to know that a thread is ephemeral: only the client that created it has that knowledge.	2026-02-27 17:42:25 -08:00
Michael Bolin	1a8d930267	core: adopt host_executable() rules in zsh-fork (#13046 ) ## Why [#12964](https://github.com/openai/codex/pull/12964) added `host_executable()` support to `codex-execpolicy`, but the zsh-fork interception path in `unix_escalation.rs` was still evaluating commands with the default exact-token matcher. That meant an intercepted absolute executable such as `/usr/bin/git status` could still miss basename rules like `prefix_rule(pattern = ["git", "status"])`, even when the policy also defined a matching `host_executable(name = "git", ...)` entry. This PR adopts the new matching behavior in the zsh-fork runtime only. That keeps the rollout intentionally narrow: zsh-fork already requires explicit user opt-in, so it is a safer first caller to exercise the new `host_executable()` scheme before expanding it to other execpolicy call sites. It also brings zsh-fork back in line with the current `prefix_rule()` execution model. Until prefix rules can carry their own permission profiles, a matched `prefix_rule()` is expected to rerun the intercepted command unsandboxed on `allow`, or after the user accepts `prompt`, instead of merely continuing inside the inherited shell sandbox. 
## What Changed - added `evaluate_intercepted_exec_policy()` in `core/src/tools/runtimes/shell/unix_escalation.rs` to centralize execpolicy evaluation for intercepted commands - switched intercepted direct execs in the zsh-fork path to `check_multiple_with_options(...)` with `MatchOptions { resolve_host_executables: true }` - added `commands_for_intercepted_exec_policy()` so zsh-fork policy evaluation works from intercepted `(program, argv)` data instead of reconstructing a synthetic command before matching - left shell-wrapper parsing intentionally disabled by default behind `ENABLE_INTERCEPTED_EXEC_POLICY_SHELL_WRAPPER_PARSING`, so path-sensitive matching relies on later direct exec interception rather than shell-script parsing - made matched `prefix_rule()` decisions rerun intercepted commands with `EscalationExecution::Unsandboxed`, while unmatched-command fallback keeps the existing sandbox-preserving behavior - extracted the zsh-fork test harness into `core/tests/common/zsh_fork.rs` so both the skill-focused and approval-focused integration suites can exercise the same runtime setup - limited this change to the intercepted zsh-fork path rather than changing every execpolicy caller at once - added runtime coverage in `core/src/tools/runtimes/shell/unix_escalation_tests.rs` for allowed and disallowed `host_executable()` mappings and the wrapper-parsing modes - added integration coverage in `core/tests/suite/approvals.rs` to verify a saved `prefix_rule(pattern=["touch"], decision="allow")` reruns under zsh-fork outside a restrictive `WorkspaceWrite` sandbox	2026-02-28 01:41:23 +00:00
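The rerun rule described above can be sketched as a small mapping from "did a `prefix_rule()` match, and with which decision" to "how is the intercepted command re-executed". This is a minimal sketch under stated assumptions: `Decision`, `Execution`, and `execution_for` are hypothetical stand-ins, not the real `codex-execpolicy` or `EscalationExecution` types, and unmatched commands are simplified to "keep the inherited sandbox".

```rust
/// Hypothetical stand-in for the decision attached to a matched prefix rule.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Decision {
    Allow,
    Prompt,
    Forbidden,
}

/// Hypothetical stand-in for how the intercepted command is (re)executed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Execution {
    /// Rerun outside the sandbox inherited from the shell.
    Unsandboxed,
    /// Keep running inside the sandbox inherited from the shell.
    InheritedSandbox,
    /// Do not run the command at all.
    Denied,
}

/// `matched` is the decision of a matched `prefix_rule()`, or `None` when no
/// rule matched. `user_approved` is the user's answer if a prompt was shown.
fn execution_for(matched: Option<Decision>, user_approved: bool) -> Execution {
    match matched {
        // A matched rule reruns the command unsandboxed on `allow`...
        Some(Decision::Allow) => Execution::Unsandboxed,
        // ...or after the user accepts a `prompt`.
        Some(Decision::Prompt) if user_approved => Execution::Unsandboxed,
        Some(Decision::Prompt) | Some(Decision::Forbidden) => Execution::Denied,
        // Unmatched commands keep the sandbox-preserving behavior (simplified).
        None => Execution::InheritedSandbox,
    }
}
```

The key design point is that a match changes *where* the command runs, not just *whether* it runs. That is why the commit limits this behavior to the opt-in zsh-fork path first.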
Owen Lin	8fa792868c	fix(app-server): make thread/start non-blocking (#13033 ) Stop `thread/start` from blocking other app-server requests. Before this change, `thread/start` ran inline on the request loop, so slow startup paths like MCP auth checks could hold up unrelated requests on the same connection, including `thread/loaded/list`. This moves `thread/start` into a background task. Doing so revealed nested locking, along with possible race conditions that could introduce a "phantom listener". This PR also refactors the listener/subscription bookkeeping: listener/subscription state is now centralized in `ThreadStateManager` instead of being split across multiple lock domains. That makes late auto-attach on `thread/start` race-safe and avoids reintroducing disconnected clients as phantom subscribers.	2026-02-28 01:40:08 +00:00
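The race-safety argument rests on keeping "is this connection still live?" and "subscribe it to the thread" under one lock. Here is a minimal sketch of that idea, assuming simple integer IDs. The struct name echoes `ThreadStateManager` from the commit, but its fields and methods are hypothetical and are not the real app-server API.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

type ThreadId = u64;
type ConnectionId = u64;

#[derive(Default)]
struct State {
    live_connections: HashSet<ConnectionId>,
    subscribers: HashMap<ThreadId, HashSet<ConnectionId>>,
}

/// Hypothetical manager: all listener/subscription state sits behind one lock,
/// so liveness checks and subscription changes cannot interleave.
#[derive(Default)]
struct ThreadStateManager {
    state: Mutex<State>,
}

impl ThreadStateManager {
    fn connection_opened(&self, conn: ConnectionId) {
        self.state.lock().unwrap().live_connections.insert(conn);
    }

    /// Drops the connection and every subscription it held, atomically.
    fn connection_closed(&self, conn: ConnectionId) {
        let mut state = self.state.lock().unwrap();
        state.live_connections.remove(&conn);
        for subs in state.subscribers.values_mut() {
            subs.remove(&conn);
        }
    }

    /// Late auto-attach (e.g. when a background `thread/start` finishes).
    /// Refuses to re-add a connection that has already disconnected.
    fn try_attach(&self, thread: ThreadId, conn: ConnectionId) -> bool {
        let mut state = self.state.lock().unwrap();
        if !state.live_connections.contains(&conn) {
            return false;
        }
        state.subscribers.entry(thread).or_default().insert(conn);
        true
    }

    fn subscribers(&self, thread: ThreadId) -> Vec<ConnectionId> {
        let state = self.state.lock().unwrap();
        let mut subs: Vec<ConnectionId> = state
            .subscribers
            .get(&thread)
            .map(|s| s.iter().copied().collect::<Vec<_>>())
            .unwrap_or_default();
        subs.sort_unstable();
        subs
    }
}
```

If liveness and subscriptions lived behind separate locks, a disconnect could land between the check and the insert, leaving a phantom subscriber.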
Eric Traut	6604608bad	Suppress duplicate assistant output on stdout in interactive sessions (#13082 ) Addresses #12566 Summary - stop printing the final assistant message on stdout when the process is running in a terminal so interactive users only see it once - add a helper that gates the stdout emission and cover it with unit tests	2026-02-27 18:31:17 -07:00
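A minimal sketch of such a gate, assuming the check is whether stdout is attached to a terminal (via the standard `std::io::IsTerminal` trait). Both function names are hypothetical, not the helper added in the commit.

```rust
use std::io::IsTerminal;

/// Pure decision, easy to unit-test: print the final assistant message to
/// stdout only when stdout is NOT a terminal (e.g. it is piped to a file or
/// another program). Interactive users already saw the message rendered.
fn should_print_final_message(stdout_is_terminal: bool) -> bool {
    !stdout_is_terminal
}

/// Thin wrapper that reads the real environment.
fn maybe_print_final_message(message: &str) {
    if should_print_final_message(std::io::stdout().is_terminal()) {
        println!("{message}");
    }
}
```

Splitting the environment probe from the pure decision keeps the decision testable without a real TTY.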
Ruslan Nigmatullin	70ed6cbc71	app-server: Add an ability to watch events in the test client (#13080 ) Add a `watch` subcommand to the `codex-app-server-test-client` binary to help with manual testing of the event flow.	2026-02-27 17:19:53 -08:00
Ahmed Ibrahim	ec6f6aacbf	Add model availability NUX tooltips (#13021 ) - override startup tooltips with model availability NUX and persist per-model show counts in config - stop showing each model after four exposures and fall back to normal tooltips	2026-02-27 17:14:06 -08:00
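The exposure cap can be sketched as a per-model counter that is checked before choosing a tooltip. This is a minimal sketch: the commit persists the counts in config, while this version keeps them in a `HashMap`. `pick_tooltip`, `MAX_NUX_SHOWS`, and the model names in the usage are hypothetical.

```rust
use std::collections::HashMap;

/// Cap taken from the description above: stop after four exposures.
const MAX_NUX_SHOWS: u32 = 4;

/// Returns the first model-availability tooltip that is still under its cap,
/// bumping that model's count. Falls back to the normal tooltip otherwise.
/// `nux` is a list of (model, tooltip) pairs.
fn pick_tooltip(
    nux: &[(&str, &str)],
    counts: &mut HashMap<String, u32>,
    fallback: &str,
) -> String {
    for (model, tip) in nux {
        let shown = counts.entry(model.to_string()).or_insert(0);
        if *shown < MAX_NUX_SHOWS {
            *shown += 1;
            return tip.to_string();
        }
    }
    fallback.to_string()
}
```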
Eric Traut	ff5cbfd7d4	Handle missing plan info for ChatGPT accounts (#13072 ) Addresses https://github.com/openai/codex/issues/13007 and https://github.com/openai/codex/issues/12170 There are situations where the ChatGPT auth backend might return a JWT that contains no plan information. Most code paths already handle this case well, but the internal implementation of the "account/read" app server call was failing in this case (returning an error rather than properly returning None for the plan). This resulted in a situation where users needed to log in every time the extension or app started even if they successfully logged in the last time. Summary - allow ChatGPT-authenticated accounts to fall back to `AccountPlanType::Unknown` when the token omits the plan claim - add regression coverage in `app-server/tests/suite/v2/account.rs` to confirm `account/read` returns `plan_type: Unknown` when the claim is absent - ensure the Rust auth helpers and fixtures treat missing plan claims as Optional and default to `Unknown`	2026-02-27 17:51:21 -07:00
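The fix treats the plan claim as optional and maps its absence to `Unknown` instead of returning an error. A minimal sketch, assuming the claim arrives as an optional string. The `Free`/`Plus`/`Pro` variants and their claim spellings are hypothetical. Mapping unrecognized values to `Unknown` is also an assumption; the commit only covers the missing case.

```rust
/// Simplified stand-in for the account plan type.
#[derive(Debug, Clone, PartialEq, Eq)]
enum AccountPlanType {
    Free,
    Plus,
    Pro,
    Unknown,
}

/// Never fails: a missing (or, in this sketch, unrecognized) plan claim
/// yields `Unknown`, so `account/read` can still return the account.
fn plan_type_from_claim(claim: Option<&str>) -> AccountPlanType {
    match claim {
        Some("free") => AccountPlanType::Free,
        Some("plus") => AccountPlanType::Plus,
        Some("pro") => AccountPlanType::Pro,
        Some(_) | None => AccountPlanType::Unknown,
    }
}
```

Returning a value rather than an error is what stops the client from treating a valid login as a failed one.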
Eric Traut	61c42396ab	Keep large-paste placeholders intact during file completion (#13070 ) Addresses https://github.com/openai/codex/issues/13040 Fixes a regression in 0.106.0 introduced in https://github.com/openai/codex/pull/9393 Summary - replace only the active completion range so unrelated text elements (e.g., large-paste placeholders) stay atomic and can still expand - add a regression test verifying large paste placeholders persist through completions and submit - could not fetch issue details via GitHub API because network access is disabled in this sandboxed environment	2026-02-27 17:19:11 -07:00
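The core of the fix is to splice the completion into the active token's byte range only, rather than rebuilding the whole composer text. A minimal sketch follows. `apply_completion` and the placeholder text in the usage are hypothetical; the real composer also tracks text elements separately.

```rust
use std::ops::Range;

/// Replaces only `active` (the byte range of the token being completed) with
/// `completion`. Everything outside that range, such as a large-paste
/// placeholder, is copied through unchanged.
fn apply_completion(text: &str, active: Range<usize>, completion: &str) -> String {
    let mut out = String::with_capacity(text.len() + completion.len());
    out.push_str(&text[..active.start]);
    out.push_str(completion);
    out.push_str(&text[active.end..]);
    out
}
```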
Felipe Coury	c3c75878e8	fix(tui): theme-aware diff backgrounds with fallback behavior (#13037 ) ## Problem The TUI diff renderer uses hardcoded background palettes for insert/delete lines that don't respect the user's chosen syntax theme. When a theme defines `markup.inserted` / `markup.deleted` scope backgrounds (the convention used by GitHub, Solarized, Monokai, and most VS Code themes), those colors are ignored — the diff always renders with the same green/red tints regardless of theme selection. Separately, ANSI-16 terminals (and Windows Terminal sessions misreported as ANSI-16) rendered diff backgrounds as full-saturation blocks that obliterated syntax token colors, making highlighted diffs unreadable. ## Mental model Diff backgrounds are resolved in three layers: 1. Color level detection — `diff_color_level_for_terminal()` maps the raw `supports-color` probe + Windows Terminal heuristics to a `DiffColorLevel` (TrueColor / Ansi256 / Ansi16). Windows Terminal gets promoted from Ansi16 to TrueColor when `WT_SESSION` is present. 2. Background resolution — `resolve_diff_backgrounds()` queries the active syntax theme for `markup.inserted`/`markup.deleted` (falling back to `diff.inserted`/`diff.deleted`), then overlays those on top of the hardcoded palette. For ANSI-256, theme RGB values are quantized to the nearest xterm-256 index. For ANSI-16, backgrounds are `None` (foreground-only). 3. Style composition — The resolved `ResolvedDiffBackgrounds` is threaded through every call to `style_add`, `style_del`, `style_sign_`, and `style_line_bg_for`, which decide how to compose foreground+background for each line kind and theme variant. A new `RichDiffColorLevel` type (a subset of `DiffColorLevel` without Ansi16) encodes the invariant "we have enough depth for tinted backgrounds" at the type level, so background-producing functions have exhaustive matches without unreachable arms. 
## Non-goals - No change to gutter (line number column) styling — gutter backgrounds still use the hardcoded palette. - No per-token scope background resolution — this is line-level background only; syntax token colors come from the existing `highlight_code_to_styled_spans` path. - No dark/light theme auto-switching from scope backgrounds — `DiffTheme` is still determined by querying the terminal's background color. ## Tradeoffs - Theme trust vs. visual safety: When a theme defines scope backgrounds, we trust them unconditionally for rich color levels. A badly authored theme could produce illegible combinations. The fallback for `None` backgrounds (foreground-only) is intentionally conservative. - Quantization quality: ANSI-256 quantization uses perceptual distance across indices 16–255, skipping system colors. The result is approximate — a subtle theme tint may land on a noticeably different xterm index. - Single-query caching: `resolve_diff_backgrounds` is called once per `render_change` invocation (i.e., once per file in a diff). If the theme changes mid-render (live preview), the next file picks up the new backgrounds. ## Architecture Files changed:

| File | Role |
|---|---|
| `tui/src/render/highlight.rs` | New: `DiffScopeBackgroundRgbs`, `diff_scope_background_rgbs()`, scope extraction helpers |
| `tui/src/diff_render.rs` | New: `RichDiffColorLevel`, `ResolvedDiffBackgrounds`, `resolve_diff_backgrounds`, `quantize_rgb_to_ansi256`, Windows Terminal promotion; modified: all style helpers to accept/thread `ResolvedDiffBackgrounds` |

The scope-extraction code lives in `highlight.rs` because it uses `syntect::highlighting::Highlighter` and the theme singleton. The resolution and quantization logic lives in `diff_render.rs` because it depends on diff-specific types (`DiffTheme`, `DiffColorLevel`, ratatui `Color`). ## Observability No runtime logging was added.
The most useful debugging aid is the `diff_color_level_for_terminal` function, which is pure and fully unit-tested — to diagnose a color-depth mismatch, log its four inputs (`StdoutColorLevel`, `TerminalName`, `WT_SESSION` presence, `FORCE_COLOR` presence). Scope resolution can be tested by loading a custom `.tmTheme` with known `markup.inserted` / `markup.deleted` backgrounds and checking the diff output in a truecolor terminal. ## Tests - Windows Terminal promotion: 7 unit tests cover every branch of `diff_color_level_for_terminal` (ANSI-16 promotion, `WT_SESSION` unconditional promotion, `FORCE_COLOR` suppression, conservative `Unknown` level). - ANSI-16 foreground-only: Tests verify that `style_add`, `style_del`, `style_sign_`, `style_line_bg_for`, and `style_gutter_for` all return `None` backgrounds on ANSI-16. - Scope resolution: Tests verify `markup.` preference over `diff.`, `None` when no scope matches, bundled theme resolution, and custom `.tmTheme` round-trip. - Quantization: Test verifies ANSI-256 quantization of a known RGB triple. - Insta snapshots: 2 new snapshot tests (`ansi16_insert_delete_no_background`, `theme_scope_background_resolution`) lock visual output.	2026-02-27 16:44:56 -07:00
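The ANSI-256 quantization step can be sketched as a nearest-neighbor search over xterm indices 16 to 255 (the 6x6x6 color cube plus the 24-step grayscale ramp), skipping system colors 0 to 15 as the commit describes. The commit says only "perceptual distance". The weighted "redmean" RGB distance used here is an assumed stand-in, not necessarily the metric in `quantize_rgb_to_ansi256`.

```rust
/// Channel levels of the xterm 6x6x6 color cube (indices 16..=231).
const CUBE_LEVELS: [u8; 6] = [0, 95, 135, 175, 215, 255];

/// RGB value of an xterm-256 index. Precondition: `index >= 16`.
fn xterm_rgb(index: u8) -> (u8, u8, u8) {
    if index >= 232 {
        // Grayscale ramp: 8, 18, ..., 238.
        let v = 8 + 10 * (index - 232);
        (v, v, v)
    } else {
        let i = index - 16;
        (
            CUBE_LEVELS[(i / 36) as usize],
            CUBE_LEVELS[((i / 6) % 6) as usize],
            CUBE_LEVELS[(i % 6) as usize],
        )
    }
}

/// "Redmean" weighted RGB distance, a cheap approximation of perceived difference.
fn redmean_distance(a: (u8, u8, u8), b: (u8, u8, u8)) -> f64 {
    let rmean = (a.0 as f64 + b.0 as f64) / 2.0;
    let dr = a.0 as f64 - b.0 as f64;
    let dg = a.1 as f64 - b.1 as f64;
    let db = a.2 as f64 - b.2 as f64;
    ((2.0 + rmean / 256.0) * dr * dr
        + 4.0 * dg * dg
        + (2.0 + (255.0 - rmean) / 256.0) * db * db)
        .sqrt()
}

/// Nearest xterm-256 index in 16..=255; ties go to the lowest index.
fn quantize_rgb_to_ansi256(rgb: (u8, u8, u8)) -> u8 {
    (16..=255u8)
        .min_by(|&a, &b| {
            redmean_distance(rgb, xterm_rgb(a))
                .partial_cmp(&redmean_distance(rgb, xterm_rgb(b)))
                .unwrap()
        })
        .unwrap()
}
```

Colors that exist exactly in the palette map to themselves. A subtle theme tint between cube levels snaps to the closest entry, which is the "approximate" tradeoff the commit calls out.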
viyatb-oai	a39d76dc45	feat(linux-sandbox): support restricted ReadOnlyAccess in bwrap (#12369 ) ## Summary Implements Linux bubblewrap support for restricted `ReadOnlyAccess` (introduced in #11387) by honoring `readable_roots` and `include_platform_defaults` instead of failing closed. ## What changed - Added a Linux platform-default read allowlist for common system/runtime paths (e.g. /usr, /etc, /lib*, Nix store roots). - Updated the bwrap filesystem mount builder to support restricted read access: - Full-read policies still use `--ro-bind / /` - Restricted-read policies now start from `--tmpfs /` and add scoped `--ro-bind` mounts - Preserved existing writable-root and protected-subpath behavior (`.git`, `.codex`, etc.). `ReadOnlyAccess::Restricted` was already modeled in protocol, but Linux bwrap still returned `UnsupportedOperation` for restricted read access. This closes that gap for the active Linux filesystem backend. ## Notes Legacy Linux Landlock fallback still fails closed for restricted read access (unchanged).	2026-02-27 15:25:50 -08:00
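The two read-mount strategies can be sketched as building the bwrap argument list: bind the whole root read-only, or start from an empty tmpfs root and bind in only the allowed roots. This is a minimal sketch. `ReadAccess`, `read_mount_args`, and the two-entry `PLATFORM_DEFAULTS` list are hypothetical simplifications, since the real allowlist is larger (e.g. `/lib*` and Nix store roots). Writable roots and protected subpaths are omitted.

```rust
/// Simplified read policy mirroring the two cases described above.
enum ReadAccess {
    Full,
    Restricted {
        readable_roots: Vec<String>,
        include_platform_defaults: bool,
    },
}

/// Hypothetical, abbreviated platform-default allowlist.
const PLATFORM_DEFAULTS: &[&str] = &["/usr", "/etc"];

fn read_mount_args(access: &ReadAccess) -> Vec<String> {
    let mut args: Vec<String> = Vec::new();
    match access {
        ReadAccess::Full => {
            // Full read: bind the host root read-only.
            args.extend(["--ro-bind", "/", "/"].map(String::from));
        }
        ReadAccess::Restricted { readable_roots, include_platform_defaults } => {
            // Restricted read: start from an empty tmpfs root...
            args.extend(["--tmpfs", "/"].map(String::from));
            let defaults: &[&str] = if *include_platform_defaults { PLATFORM_DEFAULTS } else { &[] };
            let roots = defaults
                .iter()
                .map(|s| s.to_string())
                .chain(readable_roots.iter().cloned());
            // ...then bind in each allowed root read-only at the same path.
            for root in roots {
                args.push("--ro-bind".to_string());
                args.push(root.clone());
                args.push(root);
            }
        }
    }
    args
}
```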
Matthew Zeng	392fa7de50	[apps] Stabilize app list updated event. (#13067 ) Stabilize the app list updated event so that we only send 2 updates: one when installed apps become available, and one when all directory apps are available. Previously it also sent an update when directory apps became available before installed apps, which cut off installed apps.	2026-02-27 15:23:24 -08:00
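One way to guarantee exactly two updates regardless of arrival order is a small state machine that never emits a directory-driven update before installed apps are known. This is a minimal sketch of that interpretation. `AppListState`, `Update`, and the merge order (installed first, then directory-only entries) are all hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Update {
    /// First update: installed apps only.
    InstalledOnly(Vec<String>),
    /// Second update: installed plus all directory apps.
    All(Vec<String>),
}

#[derive(Default)]
struct AppListState {
    installed: Option<Vec<String>>,
    directory: Option<Vec<String>>,
}

impl AppListState {
    /// Installed apps first, then directory apps not already listed.
    fn merged(&self) -> Vec<String> {
        let mut all = self.installed.clone().unwrap_or_default();
        for app in self.directory.iter().flatten() {
            if !all.contains(app) {
                all.push(app.clone());
            }
        }
        all
    }

    fn on_installed(&mut self, apps: Vec<String>) -> Vec<Update> {
        self.installed = Some(apps.clone());
        let mut updates = vec![Update::InstalledOnly(apps)];
        if self.directory.is_some() {
            updates.push(Update::All(self.merged()));
        }
        updates
    }

    /// Emits nothing until installed apps are known, so a directory-only
    /// update can never cut off the installed list.
    fn on_directory(&mut self, apps: Vec<String>) -> Vec<Update> {
        self.directory = Some(apps);
        if self.installed.is_some() {
            vec![Update::All(self.merged())]
        } else {
            Vec::new()
        }
    }
}
```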
Charley Cunningham	695957a348	Unify rollout reconstruction with resume/fork TurnContext hydration (#12612 ) ## Summary This PR unifies rollout history reconstruction and resume/fork metadata hydration under a single `Session::reconstruct_history_from_rollout` implementation. The key change from main is that replay metadata now comes from the same reconstruction pass that rebuilds model-visible history, instead of doing a second bespoke rollout scan to recover `previous_model` / `reference_context_item`. ## What Changed ### Unified reconstruction output `reconstruct_history_from_rollout` now returns a single `RolloutReconstruction` bundle containing: - rebuilt `history` - `previous_model` - `reference_context_item` Resume and fork both consume that shared output directly. ### Reverse replay core The reconstruction logic moved into `codex-rs/core/src/codex/rollout_reconstruction.rs` and now scans rollout items newest-to-oldest. That reverse pass: - derives `previous_model` - derives whether `reference_context_item` is preserved or cleared - stops early once it has both resume metadata and a surviving `replacement_history` checkpoint History materialization is still bridged eagerly for now by replaying only the surviving suffix forward, which keeps the history result stable while moving the control flow toward the future lazy reverse loader design. ### Removed bespoke context lookup This deletes `last_rollout_regular_turn_context_lookup` and its separate compaction-aware scan. The previous model / baseline metadata is now computed from the same replay state that rebuilds history, so resume/fork cannot drift from the reconstructed transcript view. ### `TurnContextItem` persistence contract `TurnContextItem` is now treated as the replay source of truth for durable model-visible baselines. 
This PR keeps the following contract explicit: - persist `TurnContextItem` for the first real user turn so resume can recover `previous_model` - persist it for later turns that emit model-visible context updates - if mid-turn compaction reinjects full initial context into replacement history, persist a fresh `TurnContextItem` after `Compacted` so resume/fork can re-establish the baseline from the rewritten history - do not treat manual compaction or pre-sampling compaction as creating a new durable baseline on their own ## Behavior Preserved - rollback replay stays aligned with `drop_last_n_user_turns` - rollback skips only user turns - incomplete active user turns are dropped before older finalized turns when rollback applies - unmatched aborts do not consume the current active turn - missing abort IDs still conservatively clear stale compaction state - compaction clears `reference_context_item` until a later `TurnContextItem` re-establishes it - `previous_model` still comes from the newest surviving user turn that established one ## Tests Targeted validation run for the current branch shape: - `cd codex-rs && cargo test -p codex-core --lib codex::rollout_reconstruction_tests -- --nocapture` - `cd codex-rs && just fmt` The branch also extracts the rollout reconstruction tests into `codex-rs/core/src/codex/rollout_reconstruction_tests.rs` so this logic has a dedicated home instead of living inline in `codex.rs`.	2026-02-27 13:50:45 -08:00
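The newest-to-oldest pass that derives resume metadata can be sketched over a simplified item list. This is a minimal sketch under stated assumptions. `Item`, `ResumeMetadata`, and `resume_metadata` are hypothetical, not the real `RolloutItem` or `RolloutReconstruction`. Rollback and `replacement_history` handling are left out. The sketch also approximates "newest user turn that established a model" as "newest `TurnContext`".

```rust
/// Simplified rollout items (hypothetical).
#[derive(Debug)]
enum Item {
    UserMessage(&'static str),
    TurnContext { model: &'static str },
    Compacted,
}

#[derive(Debug, PartialEq)]
struct ResumeMetadata {
    previous_model: Option<String>,
    reference_context_preserved: bool,
}

fn resume_metadata(items: &[Item]) -> ResumeMetadata {
    let mut previous_model: Option<String> = None;
    let mut preserved: Option<bool> = None;
    // Walk newest-to-oldest.
    for item in items.iter().rev() {
        match item {
            Item::TurnContext { model } => {
                if previous_model.is_none() {
                    previous_model = Some(model.to_string());
                }
                // A TurnContext newer than every compaction keeps the baseline.
                if preserved.is_none() {
                    preserved = Some(true);
                }
            }
            // A compaction newer than every TurnContext clears the baseline.
            Item::Compacted => {
                if preserved.is_none() {
                    preserved = Some(false);
                }
            }
            Item::UserMessage(_) => {}
        }
        // Stop early once both pieces of metadata are known.
        if previous_model.is_some() && preserved.is_some() {
            break;
        }
    }
    ResumeMetadata {
        previous_model,
        reference_context_preserved: preserved.unwrap_or(false),
    }
}
```

Because one pass answers both questions, resume and fork cannot disagree with the reconstructed history about which baseline survived.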
daniel-oai	6046ca19ba	Clarify escalation guidance for sandbox-related network failures (#13051 ) This updates the on-request permissions instructions so likely sandbox-related network failures during dependency installation are treated as escalation candidates. Repro: - Run `codex -a on-request -s workspace-write` in a fresh temp dir. - Prompt: `Build a new rust app with one dependency, anyhow, and try installing the dependency`. - Before this change, DNS/registry failures like `Could not resolve host: index.crates.io` could be treated like ordinary transient failures and not escalate. Fix: - Clarify that likely sandbox-related network errors such as DNS/host resolution, registry/index access, and dependency download failures should trigger escalation. Validation: - Rebuild the CLI and rerun the same repro. The same instructions should now be more likely to trigger escalation instead of silently stopping. Related Slack canvas: - https://openai.enterprise.slack.com/docs/T0BQTNSUF/F0ACVNJAV09	2026-02-27 13:48:52 -08:00
Michael Bolin	b148d98e0e	execpolicy: add host_executable() path mappings (#12964 ) ## Why `execpolicy` currently keys `prefix_rule()` matching off the literal first token. That works for rules like `["/usr/bin/git"]`, but it means shared basename rules such as `["git"]` do not help when a caller passes an absolute executable path like `/usr/bin/git`. This PR lays the groundwork for basename-aware matching without changing existing callers yet. It adds typed host-executable metadata and an opt-in resolution path in `codex-execpolicy`, so a follow-up PR can adopt the new behavior in `unix_escalation.rs` and other call sites without having to redesign the policy layer first. ## What Changed - added `host_executable(name = ..., paths = [...])` to the execpolicy parser and validated it with `AbsolutePathBuf` - stored host executable mappings separately from prefix rules inside `Policy` - added `MatchOptions` and opt-in `*_with_options()` APIs that preserve existing behavior by default - implemented exact-first matching with optional basename fallback, gated by `host_executable()` allowlists when present - normalized executable names for cross-platform matching so Windows paths like `git.exe` can satisfy `host_executable(name = "git", ...)` - updated `match` / `not_match` example validation to exercise the host-executable resolution path instead of only raw prefix-rule matching - preserved source locations for deferred example-validation errors so policy load failures still point at the right file and line - surfaced `resolvedProgram` on `RuleMatch` so callers can tell when a basename rule matched an absolute executable path - preserved host executable metadata when requirements policies overlay file-based policies in `core/src/exec_policy.rs` - documented the new rule shape and CLI behavior in `execpolicy/README.md` ## Verification - `cargo test -p codex-execpolicy` - added coverage in `execpolicy/tests/basic.rs` for parsing, precedence, empty allowlists, basename 
fallback, exact-match precedence, and host-executable-backed `match` / `not_match` examples - added a regression test in `core/src/exec_policy.rs` to verify requirements overlays preserve `host_executable()` metadata - verified `cargo test -p codex-core --lib`, including source-rendering coverage for deferred validation errors	2026-02-27 12:59:24 -08:00
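The matching order described above (exact token first, then basename fallback gated by any `host_executable()` allowlist for that name, with `git.exe` normalized to `git`) can be sketched as follows. This is a minimal sketch, not the `codex-execpolicy` implementation. `HostExecutable`, `rule_matches`, `normalized_name`, and `looks_absolute` are hypothetical. Restricting the fallback to absolute paths is an assumption.

```rust
/// Simplified `host_executable(name = ..., paths = [...])` entry.
struct HostExecutable {
    name: &'static str,
    paths: Vec<&'static str>,
}

/// Heuristic absolute-path check that accepts Unix and Windows forms.
fn looks_absolute(program: &str) -> bool {
    program.starts_with('/') || program.starts_with('\\') || program.get(1..3) == Some(":\\")
}

/// File name with directories and a trailing `.exe` (any case) removed.
fn normalized_name(program: &str) -> &str {
    let base = program.rsplit(|c: char| c == '/' || c == '\\').next().unwrap_or(program);
    match base.len().checked_sub(4) {
        Some(i) if i > 0 && base.get(i..).map_or(false, |ext| ext.eq_ignore_ascii_case(".exe")) => {
            &base[..i]
        }
        _ => base,
    }
}

/// `pattern` is a prefix rule; `argv[0]` is the program as invoked.
fn rule_matches(pattern: &[&str], argv: &[&str], hosts: &[HostExecutable]) -> bool {
    if pattern.is_empty() || argv.len() < pattern.len() || argv[1..pattern.len()] != pattern[1..] {
        return false;
    }
    let program = argv[0];
    if program == pattern[0] {
        return true; // exact token match always wins
    }
    if !looks_absolute(program) || normalized_name(program) != pattern[0] {
        return false;
    }
    let mut entries = hosts.iter().filter(|h| h.name == pattern[0]).peekable();
    if entries.peek().is_none() {
        return true; // no allowlist for this name: plain basename fallback
    }
    // An allowlist exists (possibly empty): the exact path must be listed.
    entries.any(|h| h.paths.iter().any(|p| *p == program))
}
```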
Michael Bolin	6e0f1e9469	fix: disable Bazel builds in CI on ubuntu-24.04-arm until we can stabilize them (#13055 ) The other three Bazel builds have experienced low flakiness in my experience whereas I find myself re-running the `ubuntu-24.04-arm` jobs often to shake out the flakes. Disabling for now.	2026-02-27 12:49:13 -08:00
Ruslan Nigmatullin	69d7a456bb	app-server: Replay pending item requests on `thread/resume` (#12560 ) Replay pending client requests after `thread/resume` and emit resolved notifications when those requests clear so approval/input UI state stays in sync after reconnects and across subscribed clients. Affected RPCs: - `item/commandExecution/requestApproval` - `item/fileChange/requestApproval` - `item/tool/requestUserInput` Motivation: - Resumed clients need to see pending approval/input requests that were already outstanding before the reconnect. - Clients also need an explicit signal when a pending request resolves or is cleared so stale UI can be removed on turn start, completion, or interruption. Implementation notes: - Use pending client requests from `OutgoingMessageSender` in order to replay them after `thread/resume` attaches the connection, using original request ids. - Emit `serverRequest/resolved` when pending requests are answered or cleared by lifecycle cleanup. - Update the app-server protocol schema, generated TypeScript bindings, and README docs for the replay/resolution flow. High-level test plan: - Added automated coverage for replaying pending command execution and file change approval requests on `thread/resume`. - Added automated coverage for resolved notifications in command approval, file change approval, request_user_input, turn start, and turn interrupt flows. - Verified schema/docs updates in the relevant protocol and app-server tests. Manual testing: - Tested reconnect/resume with multiple connections. - Confirmed state stayed in sync between connections.	2026-02-27 12:45:59 -08:00
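The replay/resolve flow can be sketched as an ordered map of pending requests keyed by their original IDs. On `thread/resume` every entry is replayed in order; answering or clearing an entry emits a resolved notification. This is a minimal sketch. `PendingRequests` and `Outgoing` are hypothetical, not `OutgoingMessageSender`. Monotonically increasing IDs are assumed, so map order equals send order.

```rust
use std::collections::BTreeMap;

/// Hypothetical outgoing message shapes.
#[derive(Debug, Clone, PartialEq)]
enum Outgoing {
    Request { id: u64, method: String },
    /// Models a `serverRequest/resolved` notification.
    Resolved { request_id: u64 },
}

#[derive(Default)]
struct PendingRequests {
    next_id: u64,
    pending: BTreeMap<u64, String>,
}

impl PendingRequests {
    fn send(&mut self, method: &str) -> Outgoing {
        let id = self.next_id;
        self.next_id += 1;
        self.pending.insert(id, method.to_string());
        Outgoing::Request { id, method: method.to_string() }
    }

    /// On `thread/resume`: replay still-pending requests with their original ids.
    fn replay(&self) -> Vec<Outgoing> {
        self.pending
            .iter()
            .map(|(id, method)| Outgoing::Request { id: *id, method: method.clone() })
            .collect()
    }

    /// Answered or cleared by lifecycle cleanup: emit a resolved notification once.
    fn resolve(&mut self, id: u64) -> Option<Outgoing> {
        self.pending.remove(&id).map(|_| Outgoing::Resolved { request_id: id })
    }
}
```

Reusing the original IDs lets a reconnected client correlate the replayed request with any UI it already showed.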
Michael Bolin	66b0adb34c	app-server: deflake running thread resume tests (#13047 ) ## Why CI has been intermittently failing in `suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch` because these running-thread resume tests treated `turn/started` as proof that the thread was already active. That signal is too early for this path. `turn/started` is emitted optimistically from `turn_start` (`codex-rs/app-server/src/codex_message_processor.rs`, L5757-L5767 at `1103d0037e`). In `single_client_mode`, the listener skips `current_turn_history` tracking in `codex_message_processor.rs` (L6461-L6465 at `1103d0037e`), so running-thread resume still depends on `ThreadWatchManager` observing the core `TurnStarted` event in `bespoke_event_handling.rs` (`codex-rs/app-server/src/bespoke_event_handling.rs`, L152-L156 at `1103d0037e`). If `thread/resume` lands in that window, the thread can still look `Idle` and the assertion flakes. ## What - Add a helper in `codex-rs/app-server/tests/suite/v2/thread_resume.rs` that waits for `thread/status/changed` to report `Active` for the target thread. - Use that public v2 notification as the synchronization barrier in the four running-thread resume tests instead of relying on `turn/started`. ## Follow-up This PR keeps the fix at the test layer so we can remove the flake without changing server behavior.
A broader runtime fix should still be considered separately, for example: - make `turn/start` eagerly transition the thread to `Active` so `turn/started` and `thread/status/changed` are coherent - or revisit the `single_client_mode` guard that skips current-turn tracking for running-thread resume ## Testing - `cargo test -p codex-app-server thread_resume -- --nocapture` - `for i in $(seq 1 10); do cargo test -p codex-app-server 'suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch' -- --exact --nocapture; done`	2026-02-27 19:47:30 +00:00
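The synchronization barrier amounts to "consume notifications until the target thread is reported `Active`", ignoring `turn/started` and other threads' status changes. This is a minimal synchronous sketch. The real helper is async and reads from the test client with timeouts. `Notification`, `Status`, and `wait_for_active` are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Notification {
    TurnStarted { thread_id: String },
    ThreadStatusChanged { thread_id: String, status: Status },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Idle,
    Active,
}

/// Reads notifications until `thread/status/changed` reports `Active` for
/// `thread_id`. Returns how many were consumed, or `None` if the stream ends.
fn wait_for_active<I: Iterator<Item = Notification>>(stream: &mut I, thread_id: &str) -> Option<usize> {
    let mut seen = 0;
    for n in stream {
        seen += 1;
        if let Notification::ThreadStatusChanged { thread_id: id, status: Status::Active } = &n {
            if id == thread_id {
                return Some(seen);
            }
        }
    }
    None
}
```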
Jeremy Rose	bc0a5843df	Align TUI voice transcription audio with 4o ASR (#13030 ) ## Summary - switch TUI push-to-talk transcription requests to `gpt-4o-mini-transcribe` - prefer 24 kHz mono `i16` microphone configs and normalize voice input to 24 kHz mono before upload/send - add unit coverage for the new downmix/resample path ## Testing - `just fmt` - `cargo test -p codex-tui`	2026-02-27 18:22:48 +00:00
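The normalization step (downmix to mono, then resample to 24 kHz) can be sketched on interleaved `i16` samples. This is a minimal sketch. Averaging channels and linear interpolation are assumed techniques, since the commit does not say which downmix or resampling method it uses. Both function names are hypothetical.

```rust
/// Averages each interleaved frame of `channels` samples into one mono sample.
fn downmix_to_mono(interleaved: &[i16], channels: usize) -> Vec<i16> {
    assert!(channels > 0, "channel count must be positive");
    interleaved
        .chunks_exact(channels)
        .map(|frame| (frame.iter().map(|&s| s as i32).sum::<i32>() / channels as i32) as i16)
        .collect()
}

/// Resamples mono audio from `from_hz` to `to_hz` by linear interpolation.
fn resample_linear(input: &[i16], from_hz: u32, to_hz: u32) -> Vec<i16> {
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = (pos.floor() as usize).min(input.len() - 1);
            let frac = pos - idx as f64;
            let a = input[idx] as f64;
            let b = input[(idx + 1).min(input.len() - 1)] as f64;
            (a + (b - a) * frac).round() as i16
        })
        .collect()
}
```

A production resampler would normally low-pass filter before downsampling to avoid aliasing. Linear interpolation keeps the sketch short.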
Felipe Coury	3b5996f988	fix(tui): promote windows terminal diff ansi16 to truecolor (#13016 ) ## Summary - Promote ANSI-16 to truecolor for diff rendering when running inside Windows Terminal - Respect explicit `FORCE_COLOR` override, skipping promotion when set - Extract a pure `diff_color_level_for_terminal` function for testability - Strip background tints from ANSI-16 diff output, rendering add/delete lines with foreground color only - Introduce `RichDiffColorLevel` to type-safely restrict background fills to truecolor and ansi256 ## Problem Windows Terminal fully supports 24-bit (truecolor) rendering but often does not provide the usual TERM metadata (`TERM`, `TERM_PROGRAM`, `COLORTERM`) in `cmd.exe`/PowerShell sessions. In those environments, `supports-color` can report only ANSI-16 support. The diff renderer therefore falls back to a 16-color palette, producing washed-out, hard-to-read diffs. The screenshots below demonstrate that neither PowerShell nor cmd.exe sets any `TERM` environment variables. [Screenshots: PowerShell, cmd.exe] ## Mental model `StdoutColorLevel` (from `supports-color`) is the _detected_ capability. `DiffColorLevel` is the _intended_ capability for diff rendering. A new intermediary — `diff_color_level_for_terminal` — maps one to the other and is the single place where terminal-specific overrides live. Windows Terminal is detected two independent ways: the `TerminalName` parsed by `terminal_info()` and the raw presence of `WT_SESSION`. When `WT_SESSION` is present and `FORCE_COLOR` is not set, we promote unconditionally to truecolor.
When `WT_SESSION` is absent but `TerminalName::WindowsTerminal` is detected, we promote only the ANSI-16 level (not `Unknown`). A single override helper — `has_force_color_override()` — checks whether `FORCE_COLOR` is set. When it is, both the `WT_SESSION` fast-path and the `TerminalName`-based promotion are suppressed, preserving explicit user intent. [Screenshots: PowerShell, cmd.exe, WSL, Bash for Windows] ## Non-goals - This does not change color detection for anything outside the diff renderer (e.g. the chat widget, markdown rendering). - This does not add a user-facing config knob; `FORCE_COLOR` already serves that role. ## Tradeoffs - The `has_wt_session` signal is intentionally kept separate from `TerminalName::WindowsTerminal`. `terminal_info()` is derived with `TERM_PROGRAM` precedence, so it can differ from raw `WT_SESSION`. - Real-world validation in this issue: in both `cmd.exe` and PowerShell, `TERM`/`TERM_PROGRAM`/`COLORTERM` were absent, so TERM-based capability hints were unavailable in those sessions. - Checking `FORCE_COLOR` for presence rather than parsing its value is a simplification. In practice `supports-color` has already parsed it, so our check is a coarse "did the user set _anything_?" gate. The effective color level still comes from `supports-color`. - When `WT_SESSION` is present without `FORCE_COLOR`, we promote to truecolor regardless of `stdout_level` (including `Unknown`). This is aggressive but correct: `WT_SESSION` is a strong signal that we're in Windows Terminal.
- ANSI-16 add/delete backgrounds (bright green/red) overpower syntax-highlighted token colors, making diffs harder to read. Foreground-only cues (colored text, gutter signs) preserve readability on low-color terminals. ## Architecture
```
stdout_color_level()  ──┐
terminal_info().name  ──┤
WT_SESSION presence   ──┼──▶ diff_color_level_for_terminal() ──▶ DiffColorLevel
FORCE_COLOR presence  ──┘                                              │
                                                                       ▼
                                                  RichDiffColorLevel::from_diff_color_level()
                                                                       │
                                                            ┌──────────┴──────────┐
                                                            │ Some(TrueColor|256) │ → bg tints
                                                            │ None (Ansi16)       │ → fg only
                                                            └─────────────────────┘
```
`diff_color_level()` is the environment-reading entry point; it gathers the four runtime signals and delegates to the pure, testable `diff_color_level_for_terminal()`. ## Observability No new logs or metrics. Incorrect color selection is immediately visible as broken diff rendering; the test suite covers the decision matrix exhaustively. ## Tests Six new unit tests exercise every branch of `diff_color_level_for_terminal`:

| Test | Inputs | Expected |
|------|--------|----------|
| `windows_terminal_promotes_ansi16_to_truecolor_for_diffs` | Ansi16 + WindowsTerminal name | TrueColor |
| `wt_session_promotes_ansi16_to_truecolor_for_diffs` | Ansi16 + WT_SESSION only | TrueColor |
| `non_windows_terminal_keeps_ansi16_diff_palette` | Ansi16 + WezTerm | Ansi16 |
| `wt_session_promotes_unknown_color_level_to_truecolor` | Unknown + WT_SESSION | TrueColor |
| `explicit_force_override_keeps_ansi16_on_windows_terminal` | Ansi16 + WindowsTerminal + FORCE_COLOR | Ansi16 |
| `explicit_force_override_keeps_ansi256_on_windows_terminal` | Ansi256 + WT_SESSION + FORCE_COLOR | Ansi256 |
| `ansi16_add_style_uses_foreground_only` | Dark + Ansi16 | fg=Green, bg=None |
| (and any other new snapshot/assertion tests from commits `d757fee` and `d7c78b3`) | | |

## Test plan - [x] Verify all new unit tests pass (`cargo test -p codex-tui --lib`) - [x] On Windows Terminal: confirm diffs render with truecolor
backgrounds - [x] On Windows Terminal with `FORCE_COLOR` set: confirm promotion is disabled and output follows the forced `supports-color` level - [x] On macOS/Linux terminals: confirm no behavior change Fixes https://github.com/openai/codex/issues/12904 Fixes https://github.com/openai/codex/issues/12890 Fixes https://github.com/openai/codex/issues/12912 Fixes https://github.com/openai/codex/issues/12840	2026-02-27 10:45:59 -07:00
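The decision matrix in the test table above can be sketched as a pure function over the four signals, plus the `RichDiffColorLevel` narrowing that removes ANSI-16 from background-producing code. This is a minimal sketch reconstructed from the described behavior. The parameter list, the `TerminalName::Other` variant, and mapping a non-promoted `Unknown` level to `Ansi16` are assumptions, not the exact signatures in `diff_render.rs`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StdoutColorLevel { TrueColor, Ansi256, Ansi16, Unknown }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TerminalName { WindowsTerminal, WezTerm, Other }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DiffColorLevel { TrueColor, Ansi256, Ansi16 }

fn diff_color_level_for_terminal(
    stdout_level: StdoutColorLevel,
    terminal_name: TerminalName,
    has_wt_session: bool,
    has_force_color_override: bool,
) -> DiffColorLevel {
    // FORCE_COLOR suppresses both promotions and preserves user intent.
    if !has_force_color_override {
        // WT_SESSION: promote unconditionally, even from Unknown.
        if has_wt_session {
            return DiffColorLevel::TrueColor;
        }
        // Name-based detection: promote only the ANSI-16 level.
        if terminal_name == TerminalName::WindowsTerminal && stdout_level == StdoutColorLevel::Ansi16 {
            return DiffColorLevel::TrueColor;
        }
    }
    match stdout_level {
        StdoutColorLevel::TrueColor => DiffColorLevel::TrueColor,
        StdoutColorLevel::Ansi256 => DiffColorLevel::Ansi256,
        // Conservative fallback (assumed): treat Unknown like ANSI-16.
        StdoutColorLevel::Ansi16 | StdoutColorLevel::Unknown => DiffColorLevel::Ansi16,
    }
}

/// Levels deep enough for tinted backgrounds; ANSI-16 is unrepresentable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RichDiffColorLevel { TrueColor, Ansi256 }

impl RichDiffColorLevel {
    fn from_diff_color_level(level: DiffColorLevel) -> Option<Self> {
        match level {
            DiffColorLevel::TrueColor => Some(Self::TrueColor),
            DiffColorLevel::Ansi256 => Some(Self::Ansi256),
            DiffColorLevel::Ansi16 => None, // foreground-only rendering
        }
    }
}
```

Keeping the environment reads out of this function is what makes the full matrix testable without touching real environment variables.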
Michael Bolin	d09a7535ed	fix: use AbsolutePathBuf for permission profile file roots (#12970 ) ## Why `PermissionProfile` should describe filesystem roots as absolute paths at the type level. Using `PathBuf` in `FileSystemPermissions` made the shared type too permissive and blurred together three different deserialization cases: - skill metadata in `agents/openai.yaml`, where relative paths should resolve against the skill directory - app-server API payloads, where callers should have to send absolute paths - local tool-call payloads for commands like `shell_command` and `exec_command`, where `additional_permissions.file_system` may legitimately be relative to the command `workdir` This change tightens the shared model without regressing the existing local command flow. ## What Changed - changed `protocol::models::FileSystemPermissions` and the app-server `AdditionalFileSystemPermissions` mirror to use `AbsolutePathBuf` - wrapped skill metadata deserialization in `AbsolutePathBufGuard`, so relative permission roots in `agents/openai.yaml` resolve against the containing skill directory - kept app-server/API deserialization strict, so relative `additionalPermissions.fileSystem.*` paths are rejected at the boundary - restored cwd/workdir-relative deserialization for local tool-call payloads by parsing `shell`, `shell_command`, and `exec_command` arguments under an `AbsolutePathBufGuard` rooted at the resolved command working directory - simplified runtime additional-permission normalization so it only canonicalizes and deduplicates absolute roots instead of trying to recover relative ones later - updated the app-server schema fixtures, `app-server/README.md`, and the affected transport/TUI tests to match the final behavior	2026-02-27 17:42:52 +00:00
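The three deserialization cases reduce to one rule: absolute paths always pass; a relative path resolves against a base directory when one is in scope (skill directory or command workdir); otherwise it is rejected. This is a minimal sketch. It models the guard as an explicit `base` argument rather than the scoped `AbsolutePathBufGuard` the commit uses. `AbsPath` and `parse_root` are hypothetical names, and `base` is assumed to be absolute already.

```rust
use std::path::{Path, PathBuf};

/// Minimal stand-in for an absolute-path newtype.
#[derive(Debug, Clone, PartialEq, Eq)]
struct AbsPath(PathBuf);

/// `Some(base)`: lenient mode, where relative paths resolve against `base`.
/// `None`: strict API-boundary mode, where relative paths are rejected.
fn parse_root(raw: &str, base: Option<&Path>) -> Result<AbsPath, String> {
    let path = Path::new(raw);
    if path.is_absolute() {
        Ok(AbsPath(path.to_path_buf()))
    } else if let Some(base) = base {
        Ok(AbsPath(base.join(path)))
    } else {
        Err(format!("expected an absolute path, got `{raw}`"))
    }
}
```

Encoding "absolute" in the type means downstream code only has to canonicalize and deduplicate, as the commit's simplified normalization does.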
jif-oai	8cf5b00aef	fix: more stable notify script (#13011 )	2026-02-27 16:05:44 +01:00
jif-oai	fe439afb81	chore: tmp remove awaiter (#13001 )	2026-02-27 13:22:17 +01:00
jif-oai	c76bc8d1ce	feat: use the memory mode for phase 1 extraction (#13002 )	2026-02-27 12:49:03 +01:00
jif-oai	bbd237348d	feat: gen memories config (#12999 )	2026-02-27 12:38:47 +01:00
jif-oai	a63d8bd569	feat: add use memories config (#12997 )	2026-02-27 11:40:54 +01:00
Michael Bolin	e6cd75a684	notify: include client in legacy hook payload (#12968 ) ## Why The `notify` hook payload did not identify which Codex client started the turn. That meant downstream notification hooks could not distinguish between completions coming from the TUI and completions coming from app-server clients such as VS Code or Xcode. Now that the Codex App provides its own desktop notifications, it would be nice to be able to filter those out. This change adds that context without changing the existing payload shape for callers that do not know the client name, and keeps the new end-to-end test cross-platform. ## What changed - added an optional top-level `client` field to the legacy `notify` JSON payload - threaded that value through `core` and `hooks`; the internal session and turn state now carries it as `app_server_client_name` - set the field to `codex-tui` for TUI turns - captured `initialize.clientInfo.name` in the app server and applied it to subsequent turns before dispatching hooks - replaced the notify integration test hook with a `python3` script so the test does not rely on Unix shell permissions or `bash` - documented the new field in `docs/config.md` ## Testing - `cargo test -p codex-hooks` - `cargo test -p codex-tui` - `cargo test -p codex-app-server suite::v2::initialize::turn_start_notify_payload_includes_initialize_client_name -- --exact --nocapture` - `cargo test -p codex-core` (`src/lib.rs` passed; `core/tests/all.rs` still has unrelated existing failures in this environment) ## Docs The public config reference on `developers.openai.com/codex` should mention that the legacy `notify` payload may include a top-level `client` field. The TUI reports `codex-tui`, and the app server reports `initialize.clientInfo.name` when it is available.	2026-02-26 22:27:34 -08:00
Ahmed Ibrahim	53e28f18cf	Add realtime websocket tracing (#12981 ) - add transport and conversation logs around connect, close, and parse flow - log realtime transport failures as errors for easier debugging	2026-02-26 22:15:18 -08:00
Ahmed Ibrahim	4d180ae428	Add model availability NUX metadata (#12972 ) - replace show_nux with structured availability_nux model metadata - expose availability NUX data through the app-server model API - update shared fixtures and tests for the new field	2026-02-26 22:02:57 -08:00
alexsong-oai	f53612d3b2	Add a background job to refresh the requirements local cache (#12936 ) - Update the cloud requirements cache TTL to 30 minutes. - Add a background job to refresh the cache every 5 minutes. - Ensure there is only one refresh job per process.	2026-02-27 04:16:19 +00:00
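Here is a minimal sketch of the refresh scheme, assuming a plain-thread job; the real implementation may use an async runtime instead. The constants mirror the 30-minute TTL and 5-minute interval from the commit. `claim_refresh_job`, `start_refresh_job`, and `is_fresh` are hypothetical names. An atomic compare-and-swap ensures that only the first caller in a process spawns the job.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

/// Values from the commit message: 30-minute TTL, 5-minute refresh interval.
pub const CACHE_TTL: Duration = Duration::from_secs(30 * 60);
pub const REFRESH_INTERVAL: Duration = Duration::from_secs(5 * 60);

static REFRESH_JOB_STARTED: AtomicBool = AtomicBool::new(false);

/// Returns true only for the first caller in this process.
pub fn claim_refresh_job() -> bool {
    REFRESH_JOB_STARTED
        .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
}

/// Spawns the periodic refresh loop unless one is already running.
pub fn start_refresh_job<F>(refresh: F)
where
    F: Fn() + Send + 'static,
{
    if !claim_refresh_job() {
        return; // another caller already owns the refresh job
    }
    thread::spawn(move || loop {
        refresh();
        thread::sleep(REFRESH_INTERVAL);
    });
}

/// A cached entry is fresh while its age is below the TTL.
pub fn is_fresh(age: Duration) -> bool {
    age < CACHE_TTL
}
```

Because the refresh interval is shorter than the TTL, a running job should refresh the cache several times before any entry expires.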
Eric Traut	cee009d117	Add oauth_resource handling for MCP login flows (#12866 ) Addresses bug https://github.com/openai/codex/issues/12589 Builds on community PR #12763. This adds `oauth_resource` support for MCP `streamable_http` servers and wires it through the relevant config and login paths. It fixes the bug where the configured OAuth resource was not reliably included in the authorization request, causing MCP login to omit the expected `resource` parameter.	2026-02-26 20:10:12 -08:00
Matthew Zeng	6fe3dc2e22	[apps] Improve app/list with force_fetch=true (#12745 ) - [x] Improve app/list with force_fetch=true: we now keep the cached snapshot until both installed apps and directory apps have loaded.	2026-02-27 03:54:03 +00:00
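The snapshot rule can be sketched as a pure function. `next_snapshot` and the use of `String` app identifiers are hypothetical. If either source is still `None`, the function keeps returning the cached list rather than a partial one.

```rust
/// Hypothetical sketch: hold on to the previously cached app list until both
/// the installed-apps load and the directory-apps load have completed.
pub fn next_snapshot(
    cached: &[String],
    installed: Option<&[String]>,
    directory: Option<&[String]>,
) -> Vec<String> {
    match (installed, directory) {
        (Some(installed), Some(directory)) => {
            // Both sources loaded: build a fresh, de-duplicated snapshot.
            let mut apps: Vec<String> = installed.iter().chain(directory).cloned().collect();
            apps.sort();
            apps.dedup();
            apps
        }
        // Either load still pending: keep serving the cached snapshot.
        _ => cached.to_vec(),
    }
}
```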
Curtis 'Fjord' Hawthorne	7e980d7db6	Support multimodal custom tool outputs (#12948 ) ## Summary This changes `custom_tool_call_output` to use the same output payload shape as `function_call_output`, so freeform tools can return either plain text or structured content items. The main goal is to let `js_repl` return image content from nested `view_image` calls in its own `custom_tool_call_output`, instead of relying on a separate injected message. ## What changed - Changed `custom_tool_call_output.output` from `string` to `FunctionCallOutputPayload` - Updated freeform tool plumbing to preserve structured output bodies - Updated `js_repl` to aggregate nested tool content items and attach them to the outer `js_repl` result - Removed the old `js_repl` special case that injected `view_image` results as a separate pending user image message - Updated normalization/history/truncation paths to handle multimodal `custom_tool_call_output` - Regenerated app-server protocol schema artifacts ## Behavior Direct `view_image` calls still return a `function_call_output` with image content. When `view_image` is called inside `js_repl`, the outer `js_repl` `custom_tool_call_output` now carries: - an `input_text` item if the JS produced text output - one or more `input_image` items from nested tool results So the nested image result now stays inside the `js_repl` tool output instead of being injected as a separate message. ## Compatibility This is intended to be backward-compatible for resumed conversations. Older histories that stored `custom_tool_call_output.output` as a plain string still deserialize correctly, and older histories that used the previous injected-image-message flow also continue to resume. 
Added regression coverage for resuming a pre-change rollout containing: - string-valued `custom_tool_call_output` - legacy injected image message history #### [git stack](https://github.com/magus/git-stack-cli) - 👉 `1` https://github.com/openai/codex/pull/12948	2026-02-26 18:17:46 -08:00
Ahmed Ibrahim	f90e97e414	Add realtime audio device picker (#12850 ) ## Summary - add a dedicated /audio picker for realtime microphone and speaker selection - persist realtime audio choices and prompt to restart only local audio when voice is live - add snapshot coverage for the new picker surfaces ## Validation - cargo test -p codex-tui - cargo insta accept - just fix -p codex-tui - just fmt	2026-02-26 17:27:44 -08:00
Shijie Rao	8715a6ef84	Feat: cxa-1833 update model/list (#12958 ) ### Summary Update `model/list` in app server to include more upgrade information.	2026-02-26 17:02:24 -08:00
Ahmed Ibrahim	a11da86b37	Make realtime audio test deterministic (#12959 ) ## Summary - add a websocket test-server request waiter so tests can synchronize on recorded client messages - use that waiter in the realtime delegation test instead of a fixed audio timeout - add temporary timing logs in the test and websocket mock to inspect where the flake stalls	2026-02-26 16:09:00 -08:00
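The request-waiter idea can be sketched with `std::sync::Condvar`. `RequestLog` and `wait_for` are hypothetical names, not the test server's real API. The server side calls `record` for every client message, and the test blocks until the expected number has arrived. A generous timeout acts as a safety net instead of a fixed sleep.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical log of client messages seen by a test server.
#[derive(Default)]
pub struct RequestLog {
    messages: Mutex<Vec<String>>,
    arrived: Condvar,
}

impl RequestLog {
    /// Called by the server for every recorded client message.
    pub fn record(&self, message: String) {
        self.messages.lock().unwrap().push(message);
        self.arrived.notify_all();
    }

    /// Blocks until at least `count` messages are recorded, returning them,
    /// or returns `None` if `timeout` elapses first.
    pub fn wait_for(&self, count: usize, timeout: Duration) -> Option<Vec<String>> {
        let guard = self.messages.lock().unwrap();
        let (guard, result) = self
            .arrived
            .wait_timeout_while(guard, timeout, |messages| messages.len() < count)
            .unwrap();
        if result.timed_out() {
            None
        } else {
            Some(guard.clone())
        }
    }
}
```

`wait_timeout_while` re-checks the condition after every wakeup. Spurious wakeups and messages recorded before the test starts waiting are therefore both handled.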
Celia Chen	90cc4e79a2	feat: add local date/timezone to turn environment context (#12947 ) ## Summary This PR includes the session's local date and timezone in the model-visible environment context and persists that data in `TurnContextItem`. ## What changed - captures the current local date and IANA timezone when building a turn context, with a UTC fallback if the timezone lookup fails - includes current_date and timezone in the serialized <environment_context> payload - stores those fields on TurnContextItem so they survive rollout/history handling, subagent review threads, and resume flows - treats date/timezone changes as environment updates, so prompt caching and context refresh logic do not silently reuse stale time context - updates tests to validate the new environment fields without depending on a single hardcoded environment-context string ## test built a local build and saw it in the rollout file: ``` {"timestamp":"2026-02-26T21:39:50.737Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"<environment_context>\n <shell>zsh</shell>\n <current_date>2026-02-26</current_date>\n <timezone>America/Los_Angeles</timezone>\n</environment_context>"}]}} ```	2026-02-26 23:17:35 +00:00
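Here is a minimal sketch of rendering the block with a UTC fallback. `render_environment_context` is a hypothetical helper. The tag names come from the rollout sample above; the two-space indentation is an assumption.

```rust
/// Hypothetical helper that renders the model-visible environment block.
pub fn render_environment_context(
    shell: &str,
    current_date: &str,
    timezone: Option<&str>,
) -> String {
    // Fall back to UTC when the IANA timezone lookup failed.
    let timezone = timezone.unwrap_or("UTC");
    format!(
        "<environment_context>\n  <shell>{shell}</shell>\n  <current_date>{current_date}</current_date>\n  <timezone>{timezone}</timezone>\n</environment_context>"
    )
}
```

Because date and timezone are part of the rendered text, a change in either yields a different block. This is why the commit treats such changes as environment updates rather than reusing a cached context.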
Michael Bolin	4cb086d96f	test: move unix_escalation tests into sibling file (#12957 ) ## Why `unix_escalation.rs` had a large inline `mod tests` block that made the implementation harder to scan. This change moves those tests into a sibling file while keeping them as a child module, so they can still exercise private items without widening visibility. ## What Changed - replaced the inline `#[cfg(test)] mod tests` block in `codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs` with a path-based test module declaration - moved the existing unit tests into `codex-rs/core/src/tools/runtimes/shell/unix_escalation_tests.rs` - kept the extracted tests using `super::...` imports so they continue to access private helpers and types from `unix_escalation.rs` ## Testing - `cargo test -p codex-core unix_escalation::tests`	2026-02-26 23:15:28 +00:00
Ahmed Ibrahim	a0e86c69fe	Add realtime audio device config (#12849 ) ## Summary - add top-level realtime audio config for microphone and speaker selection - apply configured devices when starting realtime capture and playback - keep missing-device behavior on the system default fallback path ## Validation - just write-config-schema - cargo test -p codex-core realtime_audio - cargo test -p codex-tui - just fix -p codex-core - just fix -p codex-tui - just fmt --------- Co-authored-by: Codex <noreply@openai.com>	2026-02-26 15:08:21 -08:00
Michael Bolin	fd719d3828	fix: sort codex features list alphabetically (#12944 ) ## Why `codex features list` currently prints features in declaration order from `codex_core::features::FEATURES`. That makes the output harder to scan when looking for a specific flag, and the order can change for reasons unrelated to the CLI. ## What changed - Sort the `codex features list` rows by feature key before printing them in `codex-rs/cli/src/main.rs`. - Add an integration test in `codex-rs/cli/tests/features.rs` that runs `codex features list` and asserts the feature-name column is alphabetized. ## Verification - Added `features_list_is_sorted_alphabetically_by_feature_name`. - Ran `cargo test -p codex-cli`.	2026-02-26 14:44:39 -08:00
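The fix amounts to sorting rows by key before printing. Here is a minimal sketch with a hypothetical `FeatureRow` type and illustrative feature names; the real rows are built from `codex_core::features::FEATURES`, and their fields may differ.

```rust
/// Hypothetical row type for the `codex features list` output.
#[derive(Debug, Clone)]
pub struct FeatureRow {
    pub key: &'static str,
    pub enabled: bool,
}

/// Sort rows by feature key so output no longer follows declaration order.
pub fn sorted_rows(mut rows: Vec<FeatureRow>) -> Vec<FeatureRow> {
    rows.sort_by_key(|row| row.key);
    rows
}
```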
pakrym-oai	951a389654	Allow clients not to send summary as an option (#12950 ) Summary is a required parameter on UserTurn. Ideally we'd like the core to decide the appropriate summary level. Make the summary optional and don't send it when not needed.	2026-02-26 14:37:38 -08:00
Charley Cunningham	c1afb8815a	tui: use thread_id for resume/fork cwd resolution (#12727 ) ## Summary - make resume/fork targets explicit and typed as `SessionTarget { path, thread_id }` (non-optional `thread_id`) - resolve `thread_id` centrally via `resolve_session_thread_id(...)`: - use CLI input directly when it is a UUID (`--resume <uuid>` / `--fork <uuid>`) - otherwise read `thread_id` from rollout `SessionMeta` for path-based selections (picker, `--resume-last`, name-based resume/fork) - use `thread_id` to read cwd from SQLite first during resume/fork cwd resolution - keep rollout fallback for cwd resolution when SQLite is unavailable or does not return thread metadata (`TurnContext` tail, then `SessionMeta`) - keep the resume picker open when a selected row has unreadable session metadata, and show an inline recoverable error instead of aborting the TUI ## Why This removes ad-hoc rollout filename parsing and makes resume/fork target identity explicit. The resume/fork cwd check can use indexed SQLite lookup by `thread_id` in the common path, while preserving rollout-based fallback behavior. It also keeps malformed legacy rows recoverable in the picker instead of letting a selection failure unwind the app. ## Notes - minimal TUI-only change; no schema/protocol changes - includes TUI test coverage for SQLite cwd precedence when `thread_id` is available - includes TUI regression coverage for picker inline error rendering / non-fatal unreadable session rows ## Codex author `codex resume 019c9205-7f8b-7173-a2a2-f082d4df3de3`	2026-02-26 12:52:31 -08:00
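The thread-id resolution order can be sketched as follows. `resolve_thread_id` is a hypothetical function that mirrors the described `resolve_session_thread_id(...)`, but its signature is assumed. Rollout metadata lookup is modeled as a caller-supplied closure. UUID detection is a simple 8-4-4-4-12 hex check rather than a full parser.

```rust
/// Hypothetical sketch: a UUID given on the CLI is used directly; any other
/// input (picker path, `--resume-last`, name) goes through a metadata lookup.
pub fn resolve_thread_id<F>(input: &str, read_session_meta: F) -> Option<String>
where
    F: FnOnce(&str) -> Option<String>,
{
    if looks_like_uuid(input) {
        Some(input.to_string())
    } else {
        read_session_meta(input)
    }
}

/// Checks the canonical 8-4-4-4-12 hex-digit layout of a UUID string.
fn looks_like_uuid(s: &str) -> bool {
    let groups: Vec<&str> = s.split('-').collect();
    let expected = [8, 4, 4, 4, 12];
    groups.len() == expected.len()
        && groups
            .iter()
            .zip(expected)
            .all(|(g, len)| g.len() == len && g.chars().all(|c| c.is_ascii_hexdigit()))
}
```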
jif-oai	a6065d30f4	feat: add git info to memories (#12940 )	2026-02-26 20:14:13 +00:00
Michael Bolin	7fa9d9ae35	feat: include sandbox config with escalation request (#12839 ) ## Why Before this change, an escalation approval could say that a command should be rerun, but it could not carry the sandbox configuration that should still apply when the escalated command is actually spawned. That left an unsafe gap in the `zsh-fork` skill path: skill scripts under `scripts/` that did not declare permissions could be escalated without a sandbox, and scripts that did declare permissions could lose their bounded sandbox on rerun or cached session approval. This PR extends the escalation protocol so approvals can optionally carry sandbox configuration all the way through execution. That lets the shell runtime preserve the intended sandbox instead of silently widening access. We likely want a single permissions type for this codepath eventually, probably centered on `Permissions`. For now, the protocol needs to represent both the existing `PermissionProfile` form and the fuller `Permissions` form, so this introduces a temporary disjoint union, `EscalationPermissions`, to carry either one. Further, this means that today, a skill either: - does not declare any permissions, in which case it is run using the default sandbox for the turn - specifies permissions, in which case the skill is run using that exact sandbox, which might be more restrictive than the default sandbox for the turn We will likely change the skill's permissions to be additive to the existing permissions for the turn. ## What Changed - Added `EscalationPermissions` to `codex-protocol` so escalation requests can carry either a `PermissionProfile` or a full `Permissions` payload. - Added an explicit `EscalationExecution` mode to the shell escalation protocol so reruns distinguish between `Unsandboxed`, `TurnDefault`, and `Permissions(...)` instead of overloading `None`. 
- Updated `zsh-fork` shell reruns to resolve `TurnDefault` at execution time, which keeps ordinary `UseDefault` commands on the turn sandbox and preserves turn-level macOS seatbelt profile extensions. - Updated the `zsh-fork` skill path so a skill with no declared permissions inherits the conversation's effective sandbox instead of escalating unsandboxed. - Updated the `zsh-fork` skill path so a skill with declared permissions reruns with exactly those permissions, including when a cached session approval is reused. ## Testing - Added unit coverage in `core/src/tools/runtimes/shell/unix_escalation.rs` for the explicit `UseDefault` / `RequireEscalated` / `WithAdditionalPermissions` execution mapping. - Added unit coverage in `core/src/tools/runtimes/shell/unix_escalation.rs` for macOS seatbelt extension preservation in both the `TurnDefault` and explicit-permissions rerun paths. - Added integration coverage in `core/tests/suite/skill_approval.rs` for permissionless skills inheriting the turn sandbox and explicit skill permissions remaining bounded across cached approval reuse.	2026-02-26 12:00:18 -08:00
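The three rerun modes and the skill rule can be modeled with a simplified enum. In this hypothetical sketch, `RerunMode` stands in for `EscalationExecution` and `SandboxPolicy` is a placeholder string wrapper; the real types carry richer permission data. `TurnDefault` is resolved only when the command is spawned. A skill without declared permissions maps to `TurnDefault`, never to `Unsandboxed`.

```rust
/// Placeholder for a concrete sandbox configuration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SandboxPolicy(pub String);

/// Simplified model of the rerun modes named in the commit.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RerunMode {
    /// Rerun with no sandbox at all.
    Unsandboxed,
    /// Rerun under the turn's sandbox, looked up at spawn time.
    TurnDefault,
    /// Rerun under exactly these permissions.
    Permissions(SandboxPolicy),
}

/// A skill with declared permissions keeps them; otherwise it inherits the turn.
pub fn skill_rerun_mode(declared: Option<SandboxPolicy>) -> RerunMode {
    match declared {
        Some(policy) => RerunMode::Permissions(policy),
        None => RerunMode::TurnDefault,
    }
}

/// Resolve the mode at execution time; `None` means "spawn unsandboxed".
pub fn sandbox_for_rerun(mode: &RerunMode, turn_sandbox: &SandboxPolicy) -> Option<SandboxPolicy> {
    match mode {
        RerunMode::Unsandboxed => None,
        RerunMode::TurnDefault => Some(turn_sandbox.clone()),
        RerunMode::Permissions(policy) => Some(policy.clone()),
    }
}
```

An explicit `Unsandboxed` variant means `None` no longer has to carry two meanings ("no sandbox" versus "use the default"). That ambiguity is the overloading the commit removes.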
iceweasel-oai	6b879fe248	don't grant sandbox read access to ~/.ssh and a few other dirs. (#12835 ) OpenSSH complains if any other users have read access to SSH keys. See https://github.com/openai/codex/issues/12226	2026-02-26 11:35:55 -08:00
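A sandbox carve-out of this kind can be sketched as a path check: a path is readable when it sits under a granted root and not under a denied directory. `is_readable` is hypothetical, and real sandbox enforcement is platform-specific. The commit does not list the other excluded directories, so the deny list is a parameter here.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical check: granted by some read root and not inside a denied dir.
pub fn is_readable(path: &Path, read_roots: &[PathBuf], denied: &[PathBuf]) -> bool {
    let granted = read_roots.iter().any(|root| path.starts_with(root));
    let blocked = denied.iter().any(|dir| path.starts_with(dir));
    granted && !blocked
}
```

`Path::starts_with` compares whole components, so a `~/.ssh` entry does not accidentally block a sibling such as `~/.sshx`.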
pakrym-oai	717cbe354f	Remove noisy log (#12929 ) This log message floods logs on Windows.	2026-02-26 11:34:14 -08:00
jif-oai	3404ecff15	feat: add post-compaction sub-agent infos (#12774 ) Co-authored-by: Codex <noreply@openai.com>	2026-02-26 18:55:34 +00:00
Curtis 'Fjord' Hawthorne	eb77db2957	Log js_repl nested tool responses in rollout history (#12837 ) ## Summary - add tracing-based diagnostics for nested `codex.tool(...)` calls made from `js_repl` - emit a bounded, sanitized summary at `info!` - emit the exact raw serialized response object or error string seen by JavaScript at `trace!` - document how to enable these logs and where to find them, especially for `codex app-server` ## Why Nested `codex.tool(...)` calls inside `js_repl` are a debugging boundary: JavaScript sees the tool result, but that result is otherwise hard to inspect from outside the kernel. This change adds explicit tracing for that path using the repo’s normal observability pattern: - `info` for compact summaries - `trace` for exact raw payloads when deep debugging is needed ## What changed - `js_repl` now summarizes nested tool-call results across the response shapes it can receive: - message content - function-call outputs - custom tool outputs - MCP tool results and MCP error results - direct error strings - each nested `codex.tool(...)` completion logs: - `exec_id` - `tool_call_id` - `tool_name` - `ok` - a bounded summary struct describing the payload shape - at `trace`, the same path also logs the exact serialized response object or error string that JavaScript received - docs now include concrete logging examples for `codex app-server` - unit coverage was added for multimodal function output summaries and error summaries ## How to use it ### Summary-only logging Set: ```sh RUST_LOG=codex_core::tools::js_repl=info ``` For `codex app-server`, tracing output is written to the server process `stderr`. Example: ```sh RUST_LOG=codex_core::tools::js_repl=info \ LOG_FORMAT=json \ codex app-server \ 2> /tmp/codex-app-server.log ``` This emits bounded summary lines for nested `codex.tool(...)` calls. 
### Full raw debugging Set: ```sh RUST_LOG=codex_core::tools::js_repl=trace ``` Example: ```sh RUST_LOG=codex_core::tools::js_repl=trace \ LOG_FORMAT=json \ codex app-server \ 2> /tmp/codex-app-server.log ``` At `trace`, you get: - the same `info` summary line - a `trace` line with the exact serialized response object seen by JavaScript - or the exact error string if the nested tool call failed ### Where the logs go For `codex app-server`, these logs go to process `stderr`, so redirect or capture `stderr` to inspect them. Example: ```sh RUST_LOG=codex_core::tools::js_repl=trace \ LOG_FORMAT=json \ /Users/fjord/code/codex/codex-rs/target/debug/codex app-server \ 2> /tmp/codex-app-server.log ``` Then inspect: ```sh rg "js_repl nested tool call" /tmp/codex-app-server.log ``` Without an explicit `RUST_LOG` override, these `js_repl` nested tool-call logs are typically not visible.	2026-02-26 10:12:28 -08:00
jif-oai	d3603ae5d3	feat: fork thread multi agent (#12499 )	2026-02-26 18:01:53 +00:00
jif-oai	c53c08f8f9	chore: calm down awaiter (#12925 )	2026-02-26 17:54:48 +00:00

4219 Commits