Commit Graph

4137 Commits

Author SHA1 Message Date
Channing Conger
160b7ce0d3 Add new visibility flag to DyanmicToolSpecs 2026-03-02 21:34:28 +00:00
Eric Traut
d94f0b6ce7 Fix issue deduplication workflow for Codex issues (#13215)
Fixes #13203

Summary
- split the duplicate-finding workflow into two jobs so we gather all
issues first
- add an open-issue fallback job that runs only when the full scan finds
nothing
- centralize final selection so `comment-on-issue` always sees the best
dedupe output
2026-03-01 22:45:50 -07:00
Ahmed Ibrahim
0aeb55bf08 Record realtime close marker on replacement (#13058)
## Summary
- record a realtime close developer message when a new realtime session
replaces an active one
- assert the replacement marker through the mocked responses request
path

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Charles Cunningham <ccunningham@openai.com>
2026-03-01 13:54:12 -08:00
Thibault Sottiaux
c9cef6ba9e [codex] include plan type in account updates (#13181)
This change fixes a Codex app account-state sync bug where clients could
know the user was signed in but still miss the ChatGPT subscription
tier, which could lead to incorrect upgrade messaging for paid users.

The root cause was that `account/updated` only carried `authMode` while
plan information was available separately via `account/read` and
rate-limit snapshots, so this update adds `planType` to
`account/updated`, populates it consistently across login and refresh
paths.
2026-03-01 13:43:37 -08:00
Leo Shimonaka
4ae60cf03c fix: MacOSAutomationPermission::BundleIDs should allow communicating … (#12989)
…with launchservicesd

Add mach lookup for `launchservicesd` when extending the sandbox for
`MacOSAutomationPermission::BundleIDs`. This is necessary so that the
target application can be launched for automation.

This omission was due to a spec error in a document, which has been
fixed.
2026-03-01 11:00:54 -08:00
xl-openai
752402c4fe feat: load from plugins (#12864)
Support loading plugins.

Plugins can now be enabled via [plugins.<name>] in config.toml. They are
loaded as first-class entities through PluginsManager, and their default
skills/ and .mcp.json contributions are integrated into the existing
skills and MCP flows.
2026-03-01 10:50:56 -08:00
Michael Bolin
6a673e7339 core: resolve host_executable() rules during preflight (#13065)
## Why

[#12964](https://github.com/openai/codex/pull/12964) added
`host_executable()` support to `codex-execpolicy`, and
[#13046](https://github.com/openai/codex/pull/13046) adopted it in the
zsh-fork interception path.

The remaining gap was the preflight execpolicy check in
`core/src/exec_policy.rs`. That path derives approval requirements
before execution for `shell`, `shell_command`, and `unified_exec`, but
it was still using the default exact-token matcher.

As a result, a command that already included an absolute executable
path, such as `/usr/bin/git status`, could still miss a basename rule
like `prefix_rule(pattern = ["git"], ...)` during preflight even when
the policy also defined a matching `host_executable(name = "git", ...)`
entry.

This PR brings the same opt-in `host_executable()` resolution to the
preflight approval path when an absolute program path is already present
in the parsed command.

## What Changed

- updated
`ExecPolicyManager::create_exec_approval_requirement_for_command()` in
`core/src/exec_policy.rs` to use `check_multiple_with_options(...)` with
`MatchOptions { resolve_host_executables: true }`
- kept the existing shell parsing flow for approval derivation, but now
allow basename rules to match absolute executable paths during preflight
when `host_executable()` permits it
- updated requested-prefix amendment evaluation to use the same
host-executable-aware matching mode, so suggested `prefix_rule()`
amendments are checked consistently for absolute-path commands
- added preflight coverage for:
- absolute-path commands that should match basename rules through
`host_executable()`
- absolute-path commands whose paths are not in the allowed
`host_executable()` mapping
  - requested prefix-rule amendments for absolute-path commands

## Verification

- `just fix -p codex-core`
- `cargo test -p codex-core --lib exec_policy::tests::`
2026-02-28 17:25:30 +00:00
jif-oai
74e5150b1e fix: package models.json for Bazel tests (#13129) 2026-02-28 17:21:02 +01:00
jif-oai
84b662e74f nit: disable on windows (#13127) 2026-02-28 14:55:16 +01:00
daveaitel-openai
eec3b1e235 Speed up subagent startup (#12935)
## Summary
- skip online model refresh for subagent sessions
- avoid rollout flushes during subagent startup
- keep /models refresh for non-subagent sessions

## Testing
- cargo test -p codex-core --test all
suite::models_etag_responses::refresh_models_on_models_etag_mismatch_and_avoid_duplicate_models_fetch
- cargo test -p codex-core --test all
suite::remote_models::remote_models_long_model_slug_is_sent_with_high_reasoning
- cargo test -p codex-core --test all
suite::model_switching::model_switch_to_smaller_model_updates_token_context_window
- cargo test -p codex-core --test all
suite::compact::pre_sampling_compact_runs_on_switch_to_smaller_context_model
- cargo test -p codex-core --test all
suite::compact::pre_sampling_compact_runs_after_resume_and_switch_to_smaller_model
- cargo test -p codex-core --test all
suite::personality::remote_model_friendly_personality_instructions_with_feature

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-28 14:54:08 +01:00
jif-oai
3bfee6fcb5 nit: ignore `resume_startup_does_not_consume_model_availability_nux_c… (#13128) 2026-02-28 14:50:41 +01:00
Andi Liu
5f7c38baa9 Tune memory read-path for stale facts (#13088)
## Why
- tighten Codex memory-read behavior around stale facts and conflicting
memory
- encode the risk-of-drift vs verification-effort decision rule directly
in the read-path prompt
- make partial stale-detail updates explicit so correcting only the
answer is not treated as sufficient

## What changed
- update `codex-rs/core/templates/memories/read_path.md`
- add guidance for when to verify cheap local facts vs when to answer
from older memory with visible provenance
- strengthen same-turn `MEMORY.md` updates when stored concrete details
are stale

## Notes
- this is based on some staleness eval work
2026-02-28 14:48:47 +01:00
jif-oai
bee93ca2f3 chore: change mem default (#13125) 2026-02-28 14:45:27 +01:00
jif-oai
d33f4b54ac feat: skill disable respect config layer (#13027) 2026-02-28 14:17:05 +01:00
jif-oai
2b38b4e03b feat: approval for sub-agent in the TUI (#12995)
<img width="766" height="290" alt="Screenshot 2026-02-27 at 10 50 48"
src="https://github.com/user-attachments/assets/3bc96cd9-ed2c-4d67-a317-8f7b60abbbb1"
/>
2026-02-28 14:07:07 +01:00
Eric Traut
83177ed7a8 Enable analytics in codex exec and codex mcp-server (#13083)
Addresses #12913

`codex exec` was not correctly defaulting to Otel metrics to enabled 
`codex mcp-server` completely lacked an Otel collector

Summary:
- default to enabling analytics when `codex exec` initializes
OpenTelemetry so the CLI actually reports metrics again
- add a regression test that proves the flag remains enabled by default
- added Otel collector to `codex mcp-server`
2026-02-27 19:22:54 -07:00
alexsong-oai
e2fef7a3d2 Make cloud_requirements fail close (#13063)
Make it fail-close only for CLI for now
Will extend this for app-server later
2026-02-27 18:22:05 -08:00
Eric Traut
e6032eb0b7 Fix CLI feedback link (#13086)
Addresses #12967

About a month ago, I updated the Github bug report templates to
accommodate the (at the time) new Codex app. The `/feedback` code path
in the CLI was referencing one of the old templates, and I didn't
realize it at the time. This PR updates the link so users don't get an
empty bug template when using `/feedback`.
2026-02-27 19:02:40 -07:00
sayan-oai
033ef9cb9d feat: add debug clear-memories command to hard-wipe memories state (#13085)
#### what
adds a `codex debug clear-memories` command to help with clearing all
memories state from disk, sqlite db, and marking threads as
`memory_mode=disabled` so they don't get resummarized when the
`memories` feature is re-enabled.

#### tests
add tests
2026-02-27 17:45:55 -08:00
Ruslan Nigmatullin
8c1e3f3e64 app-server: Add ephemeral field to Thread object (#13084)
Currently there is no alternative way to know that thread is ephemeral,
only client which did create it has the knowledge.
2026-02-27 17:42:25 -08:00
Michael Bolin
1a8d930267 core: adopt host_executable() rules in zsh-fork (#13046)
## Why

[#12964](https://github.com/openai/codex/pull/12964) added
`host_executable()` support to `codex-execpolicy`, but the zsh-fork
interception path in `unix_escalation.rs` was still evaluating commands
with the default exact-token matcher.

That meant an intercepted absolute executable such as `/usr/bin/git
status` could still miss basename rules like `prefix_rule(pattern =
["git", "status"])`, even when the policy also defined a matching
`host_executable(name = "git", ...)` entry.

This PR adopts the new matching behavior in the zsh-fork runtime only.
That keeps the rollout intentionally narrow: zsh-fork already requires
explicit user opt-in, so it is a safer first caller to exercise the new
`host_executable()` scheme before expanding it to other execpolicy call
sites.

It also brings zsh-fork back in line with the current `prefix_rule()`
execution model. Until prefix rules can carry their own permission
profiles, a matched `prefix_rule()` is expected to rerun the intercepted
command unsandboxed on `allow`, or after the user accepts `prompt`,
instead of merely continuing inside the inherited shell sandbox.

## What Changed

- added `evaluate_intercepted_exec_policy()` in
`core/src/tools/runtimes/shell/unix_escalation.rs` to centralize
execpolicy evaluation for intercepted commands
- switched intercepted direct execs in the zsh-fork path to
`check_multiple_with_options(...)` with `MatchOptions {
resolve_host_executables: true }`
- added `commands_for_intercepted_exec_policy()` so zsh-fork policy
evaluation works from intercepted `(program, argv)` data instead of
reconstructing a synthetic command before matching
- left shell-wrapper parsing intentionally disabled by default behind
`ENABLE_INTERCEPTED_EXEC_POLICY_SHELL_WRAPPER_PARSING`, so
path-sensitive matching relies on later direct exec interception rather
than shell-script parsing
- made matched `prefix_rule()` decisions rerun intercepted commands with
`EscalationExecution::Unsandboxed`, while unmatched-command fallback
keeps the existing sandbox-preserving behavior
- extracted the zsh-fork test harness into
`core/tests/common/zsh_fork.rs` so both the skill-focused and
approval-focused integration suites can exercise the same runtime setup
- limited this change to the intercepted zsh-fork path rather than
changing every execpolicy caller at once
- added runtime coverage in
`core/src/tools/runtimes/shell/unix_escalation_tests.rs` for allowed and
disallowed `host_executable()` mappings and the wrapper-parsing modes
- added integration coverage in `core/tests/suite/approvals.rs` to
verify a saved `prefix_rule(pattern=["touch"], decision="allow")` reruns
under zsh-fork outside a restrictive `WorkspaceWrite` sandbox

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13046).
* #13065
* __->__ #13046
2026-02-28 01:41:23 +00:00
Owen Lin
8fa792868c fix(app-server): make thread/start non-blocking (#13033)
Stop `thread/start` from blocking other app-server requests.

Before this change, `thread/start ran` inline on the request loop, so
slow startup paths like MCP auth checks could hold up unrelated requests
on the same connection, including `thread/loaded/list`. This moves
`thread/start` into a background task.

While doing so, it revealed an issue where we were doing nested locking
(and there were some race conditions possible that could introduce a
"phantom listener"). This PR also refactors the listener/subscription
bookkeeping - listener/subscription state is now centralized in
`ThreadStateManager` instead of being split across multiple lock
domains. That makes late auto-attach on `thread/start` race-safe and
avoids reintroducing disconnected clients as phantom subscribers.
2026-02-28 01:40:08 +00:00
Eric Traut
6604608bad Suppress duplicate assistant output on stdout in interactive sessions (#13082)
Addresses #12566

Summary
- stop printing the final assistant message on stdout when the process
is running in a terminal so interactive users only see it once
- add a helper that gates the stdout emission and cover it with unit
tests
2026-02-27 18:31:17 -07:00
Ruslan Nigmatullin
70ed6cbc71 app-server: Add an ability to watch events in the test client (#13080)
Add a `watch` subcommand to `codex-app-server-test-client` binary to
help in manual testing of events flow.
2026-02-27 17:19:53 -08:00
Ahmed Ibrahim
ec6f6aacbf Add model availability NUX tooltips (#13021)
- override startup tooltips with model availability NUX and persist
per-model show counts in config
- stop showing each model after four exposures and fall back to normal
tooltips
2026-02-27 17:14:06 -08:00
Eric Traut
ff5cbfd7d4 Handle missing plan info for ChatGPT accounts (#13072)
Addresses https://github.com/openai/codex/issues/13007 and
https://github.com/openai/codex/issues/12170

There are situations where the ChatGPT auth backend might return a JWT
that contains no plan information. Most code paths already handle this
case well, but the internal implementation of the "account/read" app
server call was failing in this case (returning an error rather than
properly returning None for the plan).

This resulted in a situation where users needed to log in every time the
extension or app started even if they successfully logged in the last
time.

Summary
- allow ChatGPT-authenticated accounts to fall back to
`AccountPlanType::Unknown` when the token omits the plan claim
- add regression coverage in `app-server/tests/suite/v2/account.rs` to
confirm `account/read` returns `plan_type: Unknown` when the claim is
absent
- ensure the Rust auth helpers and fixtures treat missing plan claims as
Optional and default to `Unknown`
2026-02-27 17:51:21 -07:00
Eric Traut
61c42396ab Keep large-paste placeholders intact during file completion (#13070)
Addresses https://github.com/openai/codex/issues/13040

Fixes a regression in 0.106.0 introduced in
https://github.com/openai/codex/pull/9393

Summary
- replace only the active completion range so unrelated text elements
(e.g., large-paste placeholders) stay atomic and can still expand
- add a regression test verifying large paste placeholders persist
through completions and submit
- could not fetch issue details via GitHub API because network access is
disabled in this sandboxed environment
2026-02-27 17:19:11 -07:00
Felipe Coury
c3c75878e8 fix(tui): theme-aware diff backgrounds with fallback behavior (#13037)
## Problem

The TUI diff renderer uses hardcoded background palettes for
insert/delete lines that don't respect the user's chosen syntax theme.
When a theme defines `markup.inserted` / `markup.deleted` scope
backgrounds (the convention used by GitHub, Solarized, Monokai, and most
VS Code themes), those colors are ignored — the diff always renders with
the same green/red tints regardless of theme selection.

Separately, ANSI-16 terminals (and Windows Terminal sessions misreported
as ANSI-16) rendered diff backgrounds as full-saturation blocks that
obliterated syntax token colors, making highlighted diffs unreadable.

## Mental model

Diff backgrounds are resolved in three layers:

1. **Color level detection** — `diff_color_level_for_terminal()` maps
the raw `supports-color` probe + Windows Terminal heuristics to a
`DiffColorLevel` (TrueColor / Ansi256 / Ansi16). Windows Terminal gets
promoted from Ansi16 to TrueColor when `WT_SESSION` is present.

2. **Background resolution** — `resolve_diff_backgrounds()` queries the
active syntax theme for `markup.inserted`/`markup.deleted` (falling back
to `diff.inserted`/`diff.deleted`), then overlays those on top of the
hardcoded palette. For ANSI-256, theme RGB values are quantized to the
nearest xterm-256 index. For ANSI-16, backgrounds are `None`
(foreground-only).

3. **Style composition** — The resolved `ResolvedDiffBackgrounds` is
threaded through every call to `style_add`, `style_del`, `style_sign_*`,
and `style_line_bg_for`, which decide how to compose
foreground+background for each line kind and theme variant.

A new `RichDiffColorLevel` type (a subset of `DiffColorLevel` without
Ansi16) encodes the invariant "we have enough depth for tinted
backgrounds" at the type level, so background-producing functions have
exhaustive matches without unreachable arms.

## Non-goals

- No change to gutter (line number column) styling — gutter backgrounds
still use the hardcoded palette.
- No per-token scope background resolution — this is line-level
background only; syntax token colors come from the existing
`highlight_code_to_styled_spans` path.
- No dark/light theme auto-switching from scope backgrounds —
`DiffTheme` is still determined by querying the terminal's background
color.

## Tradeoffs

- **Theme trust vs. visual safety:** When a theme defines scope
backgrounds, we trust them unconditionally for rich color levels. A
badly authored theme could produce illegible combinations. The fallback
for `None` backgrounds (foreground-only) is intentionally conservative.
- **Quantization quality:** ANSI-256 quantization uses perceptual
distance across indices 16–255, skipping system colors. The result is
approximate — a subtle theme tint may land on a noticeably different
xterm index.
- **Single-query caching:** `resolve_diff_backgrounds` is called once
per `render_change` invocation (i.e., once per file in a diff). If the
theme changes mid-render (live preview), the next file picks up the new
backgrounds.

## Architecture

Files changed:

| File | Role |
|---|---|
| `tui/src/render/highlight.rs` | New: `DiffScopeBackgroundRgbs`,
`diff_scope_background_rgbs()`, scope extraction helpers |
| `tui/src/diff_render.rs` | New: `RichDiffColorLevel`,
`ResolvedDiffBackgrounds`, `resolve_diff_backgrounds*`,
`quantize_rgb_to_ansi256`, Windows Terminal promotion; modified: all
style helpers to accept/thread `ResolvedDiffBackgrounds` |

The scope-extraction code lives in `highlight.rs` because it uses
`syntect::highlighting::Highlighter` and the theme singleton. The
resolution and quantization logic lives in `diff_render.rs` because it
depends on diff-specific types (`DiffTheme`, `DiffColorLevel`, ratatui
`Color`).

## Observability

No runtime logging was added. The most useful debugging aid is the
`diff_color_level_for_terminal` function, which is pure and fully
unit-tested — to diagnose a color-depth mismatch, log its four inputs
(`StdoutColorLevel`, `TerminalName`, `WT_SESSION` presence,
`FORCE_COLOR` presence).

Scope resolution can be tested by loading a custom `.tmTheme` with known
`markup.inserted` / `markup.deleted` backgrounds and checking the diff
output in a truecolor terminal.

## Tests

- **Windows Terminal promotion:** 7 unit tests cover every branch of
`diff_color_level_for_terminal` (ANSI-16 promotion, `WT_SESSION`
unconditional promotion, `FORCE_COLOR` suppression, conservative
`Unknown` level).
- **ANSI-16 foreground-only:** Tests verify that `style_add`,
`style_del`, `style_sign_*`, `style_line_bg_for`, and `style_gutter_for`
all return `None` backgrounds on ANSI-16.
- **Scope resolution:** Tests verify `markup.*` preference over
`diff.*`, `None` when no scope matches, bundled theme resolution, and
custom `.tmTheme` round-trip.
- **Quantization:** Test verifies ANSI-256 quantization of a known RGB
triple.
- **Insta snapshots:** 2 new snapshot tests
(`ansi16_insert_delete_no_background`,
`theme_scope_background_resolution`) lock visual output.
2026-02-27 16:44:56 -07:00
viyatb-oai
a39d76dc45 feat(linux-sandbox): support restricted ReadOnlyAccess in bwrap (#12369)
## Summary
Implements Linux bubblewrap support for restricted `ReadOnlyAccess`
(introduced in #11387) by honoring `readable_roots` and
`include_platform_defaults` instead of failing closed.

## What changed
- Added a Linux platform-default read allowlist for common
system/runtime paths (e.g. /usr, /etc, /lib*, Nix store roots).
- Updated the bwrap filesystem mount builder to support restricted read
access:
  - Full-read policies still use `--ro-bind / /`
- Restricted-read policies now start from` --tmpfs `/ and add scoped
`--ro-bind` mounts
- Preserved existing writable-root and protected-subpath behavior
(`.git`, `.codex`, etc.).

`ReadOnlyAccess::Restricted` was already modeled in protocol, but Linux
bwrap still returned `UnsupportedOperation` for restricted read access.
This closes that gap for the active Linux filesystem backend.


## Notes
Legacy Linux Landlock fallback still fail-closes for restricted read
access (unchanged).
2026-02-27 15:25:50 -08:00
Matthew Zeng
392fa7de50 [apps] Stablize app list updated event. (#13067)
Stablize app list updated event so that we only send 2 updates: 1 when
installed apps become available, one when all directory apps are
available. Previously it also updates when directory apps become
available before installed apps, which cuts off installed apps.
2026-02-27 15:23:24 -08:00
Charley Cunningham
695957a348 Unify rollout reconstruction with resume/fork TurnContext hydration (#12612)
## Summary

This PR unifies rollout history reconstruction and resume/fork metadata
hydration under a single `Session::reconstruct_history_from_rollout`
implementation.

The key change from main is that replay metadata now comes from the same
reconstruction pass that rebuilds model-visible history, instead of
doing a second bespoke rollout scan to recover `previous_model` /
`reference_context_item`.

## What Changed

### Unified reconstruction output

`reconstruct_history_from_rollout` now returns a single
`RolloutReconstruction` bundle containing:

- rebuilt `history`
- `previous_model`
- `reference_context_item`

Resume and fork both consume that shared output directly.

### Reverse replay core

The reconstruction logic moved into
`codex-rs/core/src/codex/rollout_reconstruction.rs` and now scans
rollout items newest-to-oldest.

That reverse pass:

- derives `previous_model`
- derives whether `reference_context_item` is preserved or cleared
- stops early once it has both resume metadata and a surviving
`replacement_history` checkpoint

History materialization is still bridged eagerly for now by replaying
only the surviving suffix forward, which keeps the history result stable
while moving the control flow toward the future lazy reverse loader
design.

### Removed bespoke context lookup

This deletes `last_rollout_regular_turn_context_lookup` and its separate
compaction-aware scan.

The previous model / baseline metadata is now computed from the same
replay state that rebuilds history, so resume/fork cannot drift from the
reconstructed transcript view.

### `TurnContextItem` persistence contract

`TurnContextItem` is now treated as the replay source of truth for
durable model-visible baselines.

This PR keeps the following contract explicit:

- persist `TurnContextItem` for the first real user turn so resume can
recover `previous_model`
- persist it for later turns that emit model-visible context updates
- if mid-turn compaction reinjects full initial context into replacement
history, persist a fresh `TurnContextItem` after `Compacted` so
resume/fork can re-establish the baseline from the rewritten history
- do not treat manual compaction or pre-sampling compaction as creating
a new durable baseline on their own

## Behavior Preserved

- rollback replay stays aligned with `drop_last_n_user_turns`
- rollback skips only user turns
- incomplete active user turns are dropped before older finalized turns
when rollback applies
- unmatched aborts do not consume the current active turn
- missing abort IDs still conservatively clear stale compaction state
- compaction clears `reference_context_item` until a later
`TurnContextItem` re-establishes it
- `previous_model` still comes from the newest surviving user turn that
established one

## Tests

Targeted validation run for the current branch shape:

- `cd codex-rs && cargo test -p codex-core --lib
codex::rollout_reconstruction_tests -- --nocapture`
- `cd codex-rs && just fmt`

The branch also extracts the rollout reconstruction tests into
`codex-rs/core/src/codex/rollout_reconstruction_tests.rs` so this logic
has a dedicated home instead of living inline in `codex.rs`.
2026-02-27 13:50:45 -08:00
daniel-oai
6046ca19ba Clarify escalation guidance for sandbox-related network failures (#13051)
This updates the on-request permissions instructions so likely
sandbox-related network failures during dependency installation are
treated as escalation candidates.

Repro:
- Run `codex -a on-request -s workspace-write` in a fresh temp dir.
- Prompt: `Build a new rust app with one dependency, anyhow, and try
installing the dependency`.
- Before this change, DNS/registry failures like `Could not resolve
host: index.crates.io` could be treated like ordinary transient failures
and not escalate.

Fix:
- Clarify that likely sandbox-related network errors such as DNS/host
resolution, registry/index access, and dependency download failures
should trigger escalation.

Validation:
- Rebuild the CLI and rerun the same repro. The same instructions should
now be more likely to trigger escalation instead of silently stopping.

Related Slack canvas:
- https://openai.enterprise.slack.com/docs/T0BQTNSUF/F0ACVNJAV09
2026-02-27 13:48:52 -08:00
Michael Bolin
b148d98e0e execpolicy: add host_executable() path mappings (#12964)
## Why

`execpolicy` currently keys `prefix_rule()` matching off the literal
first token. That works for rules like `["/usr/bin/git"]`, but it means
shared basename rules such as `["git"]` do not help when a caller passes
an absolute executable path like `/usr/bin/git`.

This PR lays the groundwork for basename-aware matching without changing
existing callers yet. It adds typed host-executable metadata and an
opt-in resolution path in `codex-execpolicy`, so a follow-up PR can
adopt the new behavior in `unix_escalation.rs` and other call sites
without having to redesign the policy layer first.

## What Changed

- added `host_executable(name = ..., paths = [...])` to the execpolicy
parser and validated it with `AbsolutePathBuf`
- stored host executable mappings separately from prefix rules inside
`Policy`
- added `MatchOptions` and opt-in `*_with_options()` APIs that preserve
existing behavior by default
- implemented exact-first matching with optional basename fallback,
gated by `host_executable()` allowlists when present
- normalized executable names for cross-platform matching so Windows
paths like `git.exe` can satisfy `host_executable(name = "git", ...)`
- updated `match` / `not_match` example validation to exercise the
host-executable resolution path instead of only raw prefix-rule matching
- preserved source locations for deferred example-validation errors so
policy load failures still point at the right file and line
- surfaced `resolvedProgram` on `RuleMatch` so callers can tell when a
basename rule matched an absolute executable path
- preserved host executable metadata when requirements policies overlay
file-based policies in `core/src/exec_policy.rs`
- documented the new rule shape and CLI behavior in
`execpolicy/README.md`

## Verification

- `cargo test -p codex-execpolicy`
- added coverage in `execpolicy/tests/basic.rs` for parsing, precedence,
empty allowlists, basename fallback, exact-match precedence, and
host-executable-backed `match` / `not_match` examples
- added a regression test in `core/src/exec_policy.rs` to verify
requirements overlays preserve `host_executable()` metadata
- verified `cargo test -p codex-core --lib`, including source-rendering
coverage for deferred validation errors
2026-02-27 12:59:24 -08:00
Michael Bolin
6e0f1e9469 fix: disable Bazel builds in CI on ubuntu-24.04-arm until we can stabilize them (#13055)
The other three Bazel builds have experienced low flakiness in my
experience whereas I find myself re-running the `ubuntu-24.04-arm` jobs
often to shake out the flakes. Disabling for now.
2026-02-27 12:49:13 -08:00
Ruslan Nigmatullin
69d7a456bb app-server: Replay pending item requests on thread/resume (#12560)
Replay pending client requests after `thread/resume` and emit resolved
notifications when those requests clear so approval/input UI state stays
in sync after reconnects and across subscribed clients.

Affected RPCs:
- `item/commandExecution/requestApproval`
- `item/fileChange/requestApproval`
- `item/tool/requestUserInput`

Motivation:
- Resumed clients need to see pending approval/input requests that were
already outstanding before the reconnect.
- Clients also need an explicit signal when a pending request resolves
or is cleared so stale UI can be removed on turn start, completion, or
interruption.

Implementation notes:
- Use pending client requests from `OutgoingMessageSender` in order to
replay them after `thread/resume` attaches the connection, using
original request ids.
- Emit `serverRequest/resolved` when pending requests are answered
or cleared by lifecycle cleanup.
- Update the app-server protocol schema, generated TypeScript bindings,
and README docs for the replay/resolution flow.

High-level test plan:
- Added automated coverage for replaying pending command execution and
file change approval requests on `thread/resume`.
- Added automated coverage for resolved notifications in command
approval, file change approval, request_user_input, turn start, and turn
interrupt flows.
- Verified schema/docs updates in the relevant protocol and app-server
tests.

Manual testing:
- Tested reconnect/resume with multiple connections.
- Confirmed state stayed in sync between connections.
2026-02-27 12:45:59 -08:00
Michael Bolin
66b0adb34c app-server: deflake running thread resume tests (#13047)
## Why

CI has been intermittently failing in
`suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch`
because these running-thread resume tests treated `turn/started` as
proof that the thread was already active.

That signal is too early for this path. `turn/started` is emitted
optimistically from
[`turn_start`](1103d0037e/codex-rs/app-server/src/codex_message_processor.rs (L5757-L5767)).
In `single_client_mode`, the listener skips `current_turn_history`
tracking in
[`codex_message_processor.rs`](1103d0037e/codex-rs/app-server/src/codex_message_processor.rs (L6461-L6465)),
so running-thread resume still depends on `ThreadWatchManager` observing
the core `TurnStarted` event in
[`bespoke_event_handling.rs`](1103d0037e/codex-rs/app-server/src/bespoke_event_handling.rs (L152-L156)).
If `thread/resume` lands in that window, the thread can still look
`Idle` and the assertion flakes.

## What

- Add a helper in `codex-rs/app-server/tests/suite/v2/thread_resume.rs`
that waits for `thread/status/changed` to report `Active` for the target
thread.
- Use that public v2 notification as the synchronization barrier in the
four running-thread resume tests instead of relying on `turn/started`.

## Follow-up

This PR keeps the fix at the test layer so we can remove the flake
without changing server behavior. A broader runtime fix should still be
considered separately, for example:

- make `turn/start` eagerly transition the thread to `Active` so
`turn/started` and `thread/status/changed` are coherent
- or revisit the `single_client_mode` guard that skips current-turn
tracking for running-thread resume

## Testing

- `cargo test -p codex-app-server thread_resume -- --nocapture`
- `for i in $(seq 1 10); do cargo test -p codex-app-server
'suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch'
-- --exact --nocapture; done`
2026-02-27 19:47:30 +00:00
Jeremy Rose
bc0a5843df Align TUI voice transcription audio with 4o ASR (#13030)
## Summary
- switch TUI push-to-talk transcription requests to
`gpt-4o-mini-transcribe`
- prefer 24 kHz mono `i16` microphone configs and normalize voice input
to 24 kHz mono before upload/send
- add unit coverage for the new downmix/resample path

## Testing
- `just fmt`
- `cargo test -p codex-tui`
2026-02-27 18:22:48 +00:00
Felipe Coury
3b5996f988 fix(tui): promote windows terminal diff ansi16 to truecolor (#13016)
## Summary

- Promote ANSI-16 to truecolor for diff rendering when running inside
Windows Terminal
- Respect explicit `FORCE_COLOR` override, skipping promotion when set
- Extract a pure `diff_color_level_for_terminal` function for
testability
- Strip background tints from ANSI-16 diff output, rendering add/delete
lines with foreground color only
- Introduce `RichDiffColorLevel` to type-safely restrict background
fills to truecolor and ansi256

## Problem

Windows Terminal fully supports 24-bit (truecolor) rendering but often
does not provide the usual TERM metadata (`TERM`, `TERM_PROGRAM`,
`COLORTERM`) in `cmd.exe`/PowerShell sessions. In those environments,
`supports-color` can report only ANSI-16 support. The diff renderer
therefore falls back to a 16-color palette, producing washed-out,
hard-to-read diffs.

The screenshots below demonstrate that both PowerShell and cmd.exe don't
set any `*TERM*` environment variables.

| PowerShell | cmd.exe |
|---|---|
| <img width="2032" height="1162" alt="SCR-20260226-nfvy"
src="https://github.com/user-attachments/assets/59e968cc-4add-4c7b-a415-07163297e86a"
/> | <img width="2032" height="1162" alt="SCR-20260226-nfyc"
src="https://github.com/user-attachments/assets/d06b3e39-bf91-4ce3-9705-82bf9563a01b"
/> |


## Mental model

`StdoutColorLevel` (from `supports-color`) is the _detected_ capability.
`DiffColorLevel` is the _intended_ capability for diff rendering. A new
intermediary — `diff_color_level_for_terminal` — maps one to the other
and is the single place where terminal-specific overrides live.

Windows Terminal is detected two independent ways: the `TerminalName`
parsed by `terminal_info()` and the raw presence of `WT_SESSION`. When
`WT_SESSION` is present and `FORCE_COLOR` is not set, we promote
unconditionally to truecolor. When `WT_SESSION` is absent but
`TerminalName::WindowsTerminal` is detected, we promote only the ANSI-16
level (not `Unknown`).

A single override helper — `has_force_color_override()` — checks whether
`FORCE_COLOR` is set. When it is, both the `WT_SESSION` fast-path and
the `TerminalName`-based promotion are suppressed, preserving explicit
user intent.

| PowerShell | cmd.exe | WSL | Bash for Windows |
|---|---|---|---|
|
![SCR-20260226-msrh](https://github.com/user-attachments/assets/0f6297a6-4241-4dbf-b7ff-cf02da8941b0)
|
![SCR-20260226-nbao](https://github.com/user-attachments/assets/bb5ff8a9-903c-4677-a2de-1f6e1f34b18e)
|
![SCR-20260226-nbej](https://github.com/user-attachments/assets/26ecec2c-a7e9-410a-8702-f73995b490a6)
|
![SCR-20260226-nbkz](https://github.com/user-attachments/assets/80c4bf9a-3b41-40e1-bc87-f5c565f96075)
|

## Non-goals

- This does not change color detection for anything outside the diff
renderer (e.g. the chat widget, markdown rendering).
- This does not add a user-facing config knob; `FORCE_COLOR` already
serves that role.

## Tradeoffs

- The `has_wt_session` signal is intentionally kept separate from
`TerminalName::WindowsTerminal`. `terminal_info()` is derived with
`TERM_PROGRAM` precedence, so it can differ from raw `WT_SESSION`.
- Real-world validation in this issue: in both `cmd.exe` and PowerShell,
`TERM`/`TERM_PROGRAM`/`COLORTERM` were absent, so TERM-based capability
hints were unavailable in those sessions.
- Checking `FORCE_COLOR` for presence rather than parsing its value is a
simplification. In practice `supports-color` has already parsed it, so
our check is a coarse "did the user set _anything_?" gate. The effective
color level still comes from `supports-color`.
- When `WT_SESSION` is present without `FORCE_COLOR`, we promote to
truecolor regardless of `stdout_level` (including `Unknown`). This is
aggressive but correct: `WT_SESSION` is a strong signal that we're in
Windows Terminal.
- ANSI-16 add/delete backgrounds (bright green/red) overpower
syntax-highlighted token colors, making diffs harder to read.
Foreground-only cues (colored text, gutter signs) preserve readability
on low-color terminals.

## Architecture

```
stdout_color_level()  ──┐
terminal_info().name  ──┤
WT_SESSION presence   ──┼──▶ diff_color_level_for_terminal() ──▶ DiffColorLevel
FORCE_COLOR presence  ──┘                                            │
                                                                     ▼
                                                          RichDiffColorLevel::from_diff_color_level()
                                                                     │
                                                          ┌──────────┴──────────┐
                                                          │ Some(TrueColor|256) │ → bg tints
                                                          │ None (Ansi16)       │ → fg only
                                                          └─────────────────────┘
```

`diff_color_level()` is the environment-reading entry point; it gathers
the four runtime signals and delegates to the pure, testable
`diff_color_level_for_terminal()`.

## Observability

No new logs or metrics. Incorrect color selection is immediately visible
as broken diff rendering; the test suite covers the decision matrix
exhaustively.

## Tests

Six new unit tests exercise every branch of
`diff_color_level_for_terminal`:

| Test | Inputs | Expected |
|------|--------|----------|
| `windows_terminal_promotes_ansi16_to_truecolor_for_diffs` | Ansi16 +
WindowsTerminal name | TrueColor |
| `wt_session_promotes_ansi16_to_truecolor_for_diffs` | Ansi16 +
WT_SESSION only | TrueColor |
| `non_windows_terminal_keeps_ansi16_diff_palette` | Ansi16 + WezTerm |
Ansi16 |
| `wt_session_promotes_unknown_color_level_to_truecolor` | Unknown +
WT_SESSION | TrueColor |
| `explicit_force_override_keeps_ansi16_on_windows_terminal` | Ansi16 +
WindowsTerminal + FORCE_COLOR | Ansi16 |
| `explicit_force_override_keeps_ansi256_on_windows_terminal` | Ansi256
+ WT_SESSION + FORCE_COLOR | Ansi256 |
| `ansi16_add_style_uses_foreground_only` | Dark + Ansi16 | fg=Green,
bg=None |
| (and any other new snapshot/assertion tests from commits d757fee and
d7c78b3) | | |

## Test plan

- [x] Verify all new unit tests pass (`cargo test -p codex-tui --lib`)
- [x] On Windows Terminal: confirm diffs render with truecolor
backgrounds
- [x] On Windows Terminal with `FORCE_COLOR` set: confirm promotion is
disabled and output follows the forced `supports-color` level
- [x] On macOS/Linux terminals: confirm no behavior change

Fixes https://github.com/openai/codex/issues/12904 
Fixes https://github.com/openai/codex/issues/12890
Fixes https://github.com/openai/codex/issues/12912
Fixes https://github.com/openai/codex/issues/12840
2026-02-27 10:45:59 -07:00
Michael Bolin
d09a7535ed fix: use AbsolutePathBuf for permission profile file roots (#12970)
## Why
`PermissionProfile` should describe filesystem roots as absolute paths
at the type level. Using `PathBuf` in `FileSystemPermissions` made the
shared type too permissive and blurred together three different
deserialization cases:

- skill metadata in `agents/openai.yaml`, where relative paths should
resolve against the skill directory
- app-server API payloads, where callers should have to send absolute
paths
- local tool-call payloads for commands like `shell_command` and
`exec_command`, where `additional_permissions.file_system` may
legitimately be relative to the command `workdir`

This change tightens the shared model without regressing the existing
local command flow.

## What Changed
- changed `protocol::models::FileSystemPermissions` and the app-server
`AdditionalFileSystemPermissions` mirror to use `AbsolutePathBuf`
- wrapped skill metadata deserialization in `AbsolutePathBufGuard`, so
relative permission roots in `agents/openai.yaml` resolve against the
containing skill directory
- kept app-server/API deserialization strict, so relative
`additionalPermissions.fileSystem.*` paths are rejected at the boundary
- restored cwd/workdir-relative deserialization for local tool-call
payloads by parsing `shell`, `shell_command`, and `exec_command`
arguments under an `AbsolutePathBufGuard` rooted at the resolved command
working directory
- simplified runtime additional-permission normalization so it only
canonicalizes and deduplicates absolute roots instead of trying to
recover relative ones later
- updated the app-server schema fixtures, `app-server/README.md`, and
the affected transport/TUI tests to match the final behavior
2026-02-27 17:42:52 +00:00
jif-oai
8cf5b00aef fix: more stable notify script (#13011) 2026-02-27 16:05:44 +01:00
jif-oai
fe439afb81 chore: tmp remove awaiter (#13001) 2026-02-27 13:22:17 +01:00
jif-oai
c76bc8d1ce feat: use the memory mode for phase 1 extraction (#13002) 2026-02-27 12:49:03 +01:00
jif-oai
bbd237348d feat: gen memories config (#12999) 2026-02-27 12:38:47 +01:00
jif-oai
a63d8bd569 feat: add use memories config (#12997) 2026-02-27 11:40:54 +01:00
Michael Bolin
e6cd75a684 notify: include client in legacy hook payload (#12968)
## Why

The `notify` hook payload did not identify which Codex client started
the turn. That meant downstream notification hooks could not distinguish
between completions coming from the TUI and completions coming from
app-server clients such as VS Code or Xcode. Now that the Codex App
provides its own desktop notifications, it would be nice to be able to
filter those out.

This change adds that context without changing the existing payload
shape for callers that do not know the client name, and keeps the new
end-to-end test cross-platform.

## What changed

- added an optional top-level `client` field to the legacy `notify` JSON
payload
- threaded that value through `core` and `hooks`; the internal session
and turn state now carries it as `app_server_client_name`
- set the field to `codex-tui` for TUI turns
- captured `initialize.clientInfo.name` in the app server and applied it
to subsequent turns before dispatching hooks
- replaced the notify integration test hook with a `python3` script so
the test does not rely on Unix shell permissions or `bash`
- documented the new field in `docs/config.md`

## Testing

- `cargo test -p codex-hooks`
- `cargo test -p codex-tui`
- `cargo test -p codex-app-server
suite::v2::initialize::turn_start_notify_payload_includes_initialize_client_name
-- --exact --nocapture`
- `cargo test -p codex-core` (`src/lib.rs` passed; `core/tests/all.rs`
still has unrelated existing failures in this environment)

## Docs

The public config reference on `developers.openai.com/codex` should
mention that the legacy `notify` payload may include a top-level
`client` field. The TUI reports `codex-tui`, and the app server reports
`initialize.clientInfo.name` when it is available.
2026-02-26 22:27:34 -08:00
Ahmed Ibrahim
53e28f18cf Add realtime websocket tracing (#12981)
- add transport and conversation logs around connect, close, and parse
flow
- log realtime transport failures as errors for easier debugging
2026-02-26 22:15:18 -08:00
Ahmed Ibrahim
4d180ae428 Add model availability NUX metadata (#12972)
- replace show_nux with structured availability_nux model metadata
- expose availability NUX data through the app-server model API
- update shared fixtures and tests for the new field
2026-02-26 22:02:57 -08:00
alexsong-oai
f53612d3b2 Add a background job to refresh the requirements local cache (#12936)
- Update the cloud requirements cache TTL to 30 minutes.
- Add a background job to refresh the cache every 5 minutes.
- Ensure there is only one refresh job per process.
2026-02-27 04:16:19 +00:00
Eric Traut
cee009d117 Add oauth_resource handling for MCP login flows (#12866)
Addresses bug https://github.com/openai/codex/issues/12589

Builds on community PR #12763.

This adds `oauth_resource` support for MCP `streamable_http` servers and
wires it through the relevant config and login paths. It fixes the bug
where the configured OAuth resource was not reliably included in the
authorization request, causing MCP login to omit the expected
`resource` parameter.
2026-02-26 20:10:12 -08:00
Matthew Zeng
6fe3dc2e22 [apps] Improve app/list with force_fetch=true (#12745)
- [x] Improve app/list with force_fetch=true, we now keep cached
snapshot until both install apps and directory apps load.
2026-02-27 03:54:03 +00:00