Commit Graph

710 Commits

Author SHA1 Message Date
daveaitel-openai
dcab40123f Agent jobs (spawn_agents_on_csv) + progress UI (#10935)
## Summary
- Add agent job support: spawn a batch of sub-agents from CSV, auto-run,
auto-export, and store results in SQLite.
- Simplify workflow: remove run/resume/get-status/export tools; spawn is
deterministic and completes in one call.
- Improve exec UX: stable, single-line progress bar with ETA; suppress
sub-agent chatter in exec.

## Why
Enables map-reduce style workflows over arbitrarily large repos using
the existing Codex orchestrator. This addresses review feedback about
overly complex job controls and non-deterministic monitoring.

## Demo (progress bar)
```
./codex-rs/target/debug/codex exec \
  --enable collab \
  --enable sqlite \
  --full-auto \
  --progress-cursor \
  -c agents.max_threads=16 \
  -C /Users/daveaitel/code/codex \
  - <<'PROMPT'
Create /tmp/agent_job_progress_demo.csv with columns: path,area and 30 rows:
path = item-01..item-30, area = test.

Then call spawn_agents_on_csv with:
- csv_path: /tmp/agent_job_progress_demo.csv
- instruction: "Run `python - <<'PY'` to sleep a random 0.3–1.2s, then output JSON with keys: path, score (int). Set score = 1."
- output_csv_path: /tmp/agent_job_progress_demo_out.csv
PROMPT
```

## Review feedback addressed
- Auto-start jobs on spawn; removed run/resume/status/export tools.
- Auto-export on success.
- More descriptive tool spec + clearer prompts.
- Avoid deadlocks on spawn failure; pending/running handled safely.
- Progress bar no longer scrolls; stable single-line redraw.

## Tests
- `cd codex-rs && cargo test -p codex-exec`
- `cd codex-rs && cargo build -p codex-cli`
2026-02-24 21:00:19 +00:00
pakrym-oai
daf0f03ac8 Ensure shell command skills trigger approval (#12697)
Summary
- detect skill-invoking shell commands based on the original command
string, request approvals when needed, and cache positive decisions per
session
- keep implicit skill invocation emitted after approval and keep skill
approval decline messaging centralized to the shell handler
- expand and adjust skill approval tests to cover shell-based skill
scripts while matching the new detection expectations

Testing
- Not run (not requested)
2026-02-24 12:13:20 -08:00
sayan-oai
0b6c2e5652 fix: also try matching namespaced prefix for modelinfo candidate (#12658)
#### What
Try matching `\w+`-namespaced model after `longest prefix` as heuristic
to match `ModelInfo` from list of candidates.

This shouldn't regress existing behavior:
- `gpt-5.2-codex` -> `gpt-5.2` if `gpt-5.2-codex` not present
- `gpt-5.3` -> `gpt-5` if `gpt-5.3` not present
- `gpt-9` still doesn't match anything

while being more forgiving for custom prefixes:
- `oai/gpt-5.3-codex` -> `gpt-5.3-codex`

#### Tests
Added unit test.
2026-02-24 10:57:26 -08:00
Dylan Hurd
f6053fdfb3 feat(core) Introduce Feature::RequestPermissions (#11871)
## Summary
Introduces the initial implementation of Feature::RequestPermissions.
RequestPermissions allows the model to request that a command be run
inside the sandbox, with additional permissions, like writing to a
specific folder. Eventually this will include other rules as well, and
the ability to persist these permissions, but this PR is already quite
large - let's get the core flow working and go from there!

<img width="1279" height="541" alt="Screenshot 2026-02-15 at 2 26 22 PM"
src="https://github.com/user-attachments/assets/0ee3ec0f-02ec-4509-91a2-809ac80be368"
/>

## Testing
- [x] Added tests
- [x] Tested locally
- [x] Feature
2026-02-24 09:48:57 -08:00
pakrym-oai
97d0068658 Send warmup request (#11258)
Send a request with `generate: falls` but a full set of tools and
instructions to pre-warm inference.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-24 08:15:47 -08:00
sayan-oai
7e46e5b9c2 chore: rm hardcoded PRESETS list (#12650)
rm `PRESETS` list harcoded in `model_presets` as we now have bundled
`models.json` with equivalent info.

update logic to rely on bundled models instead, update tests.
2026-02-23 22:35:51 -08:00
pakrym-oai
58763afa0f Add skill approval event/response (#12633)
Set the stage for skill-level permission approval in addition to
command-level.

Behind a feature flag.
2026-02-23 22:28:58 -08:00
viyatb-oai
c3048ff90a feat(core): persist network approvals in execpolicy (#12357)
## Summary
Persist network approval allow/deny decisions as `network_rule(...)`
entries in execpolicy (not proxy config)

It adds `network_rule` parsing + append support in `codex-execpolicy`,
including `decision="prompt"` (parse-only; not compiled into proxy
allow/deny lists)
- compile execpolicy network rules into proxy allow/deny lists and
update the live proxy state on approval
- preserve requirements execpolicy `network_rule(...)` entries when
merging with file-based execpolicy
- reject broad wildcard hosts (for example `*`) for persisted
`network_rule(...)`
2026-02-23 21:37:46 -08:00
Ahmed Ibrahim
10a3adad8e Handle realtime spawn_transcript delegation (#12619) 2026-02-23 14:39:07 -08:00
pakrym-oai
335a4e1cbc Return image content from view_image (#12553)
Responses API supports image content
2026-02-22 23:00:08 -08:00
Michael Bolin
e8949f4507 test: vendor zsh fork via DotSlash and stabilize zsh-fork tests (#12518)
## Why

The zsh integration tests were still brittle in two ways:

- they relied on `CODEX_TEST_ZSH_PATH` / environment-specific setup, so
they often did not exercise the patched zsh fork that `shell-tool-mcp`
ships
- once the tests consistently used the vendored zsh fork, they exposed
real Linux-specific zsh-fork issues in CI

In particular, the Linux failures were not just test noise:

- the zsh-fork launch path was dropping `ExecRequest.arg0`, so Linux
`codex-linux-sandbox` arg0 dispatch did not run and zsh wrapper-mode
could receive malformed arguments
- the
`turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
test uses the zsh exec bridge (which talks to the parent over a Unix
socket), but Linux restricted sandbox seccomp denies `connect(2)`,
causing timeouts on `ubuntu-24.04` x86/arm

This PR makes the zsh tests consistently run against the intended
vendored zsh fork and fixes/hardens the zsh-fork path so the Linux CI
signal is meaningful.

## What Changed

- Added a single shared test-only DotSlash file for the patched zsh fork
at `codex-rs/exec-server/tests/suite/zsh` (analogous to the existing
`bash` test resource).
- Updated both app-server and exec-server zsh tests to use that shared
DotSlash zsh (no duplicate zsh DotSlash file, no `CODEX_TEST_ZSH_PATH`
dependency).
- Updated the app-server zsh-fork test helper to resolve the shared
DotSlash zsh and avoid silently falling back to host zsh.
- Kept the app-server zsh-fork tests configured via `config.toml`, using
a test wrapper path where needed to force `zsh -df` (and rewrite `-lc`
to `-c`) for the subcommand-decline test.
- Hardened the app-server subcommand-decline zsh-fork test for CI
variability:
  - tolerate an extra `/responses` POST with a no-op mock response
- tolerate non-target approval ordering while remaining strict on the
two `/usr/bin/true` approvals and decline behavior
- use `DangerFullAccess` on Linux for this one test because it validates
zsh approval flow, not Linux sandbox socket restrictions
- Fixed zsh-fork process launching on Linux by preserving `req.arg0` in
`ZshExecBridge::execute_shell_request(...)` so `codex-linux-sandbox`
arg0 dispatch continues to work.
- Moved `maybe_run_zsh_exec_wrapper_mode()` under
`arg0_dispatch_or_else(...)` in `app-server` and `cli` so wrapper-mode
handling coexists correctly with arg0-dispatched helper modes.
- Consolidated duplicated `dotslash -- fetch` resolution logic into
shared test support (`core/tests/common/lib.rs`).
- Updated `codex-rs/exec-server/tests/suite/accept_elicitation.rs` to
use DotSlash zsh and hardened the zsh elicitation test for Bazel/zsh
differences by:
  - resolving an absolute `git` path
  - running `git init --quiet .`
- asserting success / `.git` creation instead of relying on banner text

## Verification

- `cargo test -p codex-app-server turn_start_zsh_fork -- --nocapture`
- `cargo test -p codex-exec-server accept_elicitation -- --nocapture`
- `bazel test //codex-rs/exec-server:exec-server-all-test
--test_output=streamed --test_arg=--nocapture
--test_arg=accept_elicitation_for_prompt_rule_with_zsh`
- CI (`rust-ci`) on the final cleaned commit: `Tests — ubuntu-24.04 -
x86_64-unknown-linux-gnu` and `Tests — ubuntu-24.04-arm -
aarch64-unknown-linux-gnu` passed in [run
22291424358](https://github.com/openai/codex/actions/runs/22291424358)
2026-02-22 19:39:56 -08:00
Ahmed Ibrahim
e00fa19328 Revert "Revert "Route inbound realtime text into turn start or steer"" (#12480)
With working tests this time

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-22 11:54:16 -08:00
Ahmed Ibrahim
55fc075723 Send events to realtime api (#12423)
- Send assistant messages, ExecCommandBegin, and
PatchApplyBegin/PatchApplyEnd
2026-02-21 23:24:51 -08:00
Michael Bolin
b73c4b50a2 fix: make realtime conversation flake test order-insensitive (#12475)
## Why

`codex-core::all` has a flaky test,
`suite::realtime_conversation::conversation_start_audio_text_close_round_trip`,
that assumes a fixed ordering between `conversation.item.create` and
`response.input_audio.delta` requests.

That ordering is not guaranteed: realtime text and audio input are
forwarded through separate queues and a background task, so either
request can be observed first while still being correct behavior.

## What Changed

- Updated the assertion in
`codex-rs/core/tests/suite/realtime_conversation.rs` to compare the two
observed request types order-independently.
- Kept the existing checks that `session.create` is sent first and that
exactly two follow-up requests are recorded.

## Verification

- Re-ran `cargo test -p codex-core --test all
conversation_start_audio_text_close_round_trip` 10 times locally.
2026-02-21 17:06:35 -08:00
Ahmed Ibrahim
5e505ff877 Revert "Route inbound realtime text into turn start or steer" (#12479)
Reverts openai/codex#12469
2026-02-21 15:46:03 -08:00
Ahmed Ibrahim
031d701705 Route inbound realtime text into turn start or steer (#12469)
- Route inbound realtime websocket text into normal user input handling
so it steers an active turn or starts a new one
2026-02-21 15:45:27 -08:00
pakrym-oai
b17148f13a Prefer v2 websockets if available (#12428)
And also cleanup settings flow to avoid reading many separate flags.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-21 20:08:04 +00:00
Michael Bolin
1af2a37ada chore: remove codex-core public protocol/shell re-exports (#12432)
## Why

`codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
from `codex-protocol` and `codex-shell-command`. That made it easy for
workspace crates to import those APIs through `codex-core`, which in
turn hides dependency edges and makes it harder to reduce compile-time
coupling over time.

This change removes those public re-exports so call sites must import
from the source crates directly. Even when a crate still depends on
`codex-core` today, this makes dependency boundaries explicit and
unblocks future work to drop `codex-core` dependencies where possible.

## What Changed

- Removed public re-exports from `codex-rs/core/src/lib.rs` for:
- `codex_protocol::protocol` and related protocol/model types (including
`InitialHistory`)
  - `codex_protocol::config_types` (`protocol_config_types`)
- `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
parse_command, powershell}`
- Migrated workspace Rust call sites to import directly from:
  - `codex_protocol::protocol`
  - `codex_protocol::config_types`
  - `codex_protocol::models`
  - `codex_shell_command`
- Added explicit `Cargo.toml` dependencies (`codex-protocol` /
`codex-shell-command`) in crates that now import those crates directly.
- Kept `codex-core` internal modules compiling by using `pub(crate)`
aliases in `core/src/lib.rs` (internal-only, not part of the public
API).
- Updated the two utility crates that can already drop a `codex-core`
dependency edge entirely:
  - `codex-utils-approval-presets`
  - `codex-utils-cli`

## Verification

- `cargo test -p codex-utils-approval-presets`
- `cargo test -p codex-utils-cli`
- `cargo check --workspace --all-targets`
- `just clippy`
2026-02-20 23:45:35 -08:00
Michael Bolin
1a220ad77d chore: move config diagnostics out of codex-core (#12427)
## Why

Compiling `codex-rs/core` is a bottleneck for local iteration, so this
change continues the ongoing extraction of config-related functionality
out of `codex-core` and into `codex-config`.

The goal is not just to move code, but to reduce `codex-core` ownership
and indirection so more code depends on `codex-config` directly.

## What Changed

- Moved config diagnostics logic from
`core/src/config_loader/diagnostics.rs` into
`config/src/diagnostics.rs`.
- Updated `codex-core` to use `codex-config` diagnostics types/functions
directly where possible.
- Removed the `core/src/config_loader/diagnostics.rs` shim module
entirely; the remaining `ConfigToml`-specific calls are in
`core/src/config_loader/mod.rs`.
- Moved `CONFIG_TOML_FILE` into `codex-config` and updated existing
references to use `codex_config::CONFIG_TOML_FILE` directly.
- Added a direct `codex-config` dependency to `codex-cli` for its
`CONFIG_TOML_FILE` use.
2026-02-20 23:19:29 -08:00
Charley Cunningham
bb0ac5be70 Fix compaction context reinjection and model baselines (#12252)
## Summary
- move regular-turn context diff/full-context persistence into
`run_turn` so pre-turn compaction runs before incoming context updates
are recorded
- after successful pre-turn compaction, rely on a cleared
`reference_context_item` to trigger full context reinjection on the
follow-up regular turn (manual `/compact` keeps replacement history
summary-only and also clears the baseline)
- preserve `<model_switch>` when full context is reinjected, and inject
it *before* the rest of the full-context items
- scope `reference_context_item` and `previous_model` to regular user
turns only so standalone tasks (`/compact`, shell, review, undo) cannot
suppress future reinjection or `<model_switch>` behavior
- make context-diff persistence + `reference_context_item` updates
explicit in the regular-turn path, with clearer docs/comments around the
invariant
- stop persisting local `/compact` `RolloutItem::TurnContext` snapshots
(only regular turns persist `TurnContextItem` now)
- simplify resume/fork previous-model/reference-baseline hydration by
looking up the last surviving turn context from rollout lifecycle
events, including rollback and compaction-crossing handling
- remove the legacy fallback that guessed from bare `TurnContext`
rollouts without lifecycle events
- update compaction/remote-compaction/model-visible snapshots and
compact test assertions (including remote compaction mock response
shape)

## Why
We were persisting incoming context items before spawning the regular
turn task, which let pre-turn compaction requests accidentally include
incoming context diffs without the new user message. Fixing that exposed
follow-on baseline issues around `/compact`, resume/fork, and standalone
tasks that could cause duplicate context injection or suppress
`<model_switch>` instructions.

This PR re-centers the invariants around regular turns:
- regular turns persist model-visible context diffs/full reinjection and
update the `reference_context_item`
- standalone tasks do not advance those regular-turn baselines
- compaction clears the baseline when replacement history may have
stripped the referenced context diffs

## Follow-ups (TODOs left in code)
- `TODO(ccunningham)`: fix rollback/backtracking baseline handling more
comprehensively
- `TODO(ccunningham)`: include pending incoming context items in
pre-turn compaction threshold estimation
- `TODO(ccunningham)`: inject updated personality spec alongside
`<model_switch>` so some model-switch paths can avoid forced full
reinjection
- `TODO(ccunningham)`: review task turn lifecycle
(`TurnStarted`/`TurnComplete`) behavior and emit task-start context
diffs for task types that should have them (excluding `/compact`)

## Validation
- `just fmt`
- CI should cover the updated compaction/resume/model-visible snapshot
expectations and rollout-hydration behavior
- I did **not** rerun the full local test suite after the latest
resume-lookup / rollout-persistence simplifications
2026-02-20 23:13:08 -08:00
Dylan Hurd
a8b4b569fb fix(core) Filter non-matching prefix rules (#12314)
## Summary
`gpt-5.3-codex` really likes to write complicated shell scripts, and
suggest a partial prefix_rule that wouldn't actually approve the
command. We should only show the `prefix_rule` suggestion from the model
if it would actually fully approve the command the user is seeing.

This will technically cause more instances of overly-specific
suggestions when we fallback, but I think the UX is clearer,
particularly when the model doesn't necessarily understand the current
limitations of execpolicy parsing.

## Testing
 - [x] Add unit tests
 - [x] Add integration tests
2026-02-20 22:02:35 -08:00
Ahmed Ibrahim
b237f7cbb1 Add experimental realtime websocket backend prompt override (#12418)
- add top-level `experimental_realtime_ws_backend_prompt` config key
(experimental / do not use) and include it in config schema
- apply the override only to `Op::RealtimeConversation` websocket
`backend_prompt`, with config + realtime tests
2026-02-20 20:10:51 -08:00
Ahmed Ibrahim
7ae5d88016 Add experimental realtime websocket URL override (#12416)
- add top-level `experimental_realtime_ws_base_url` config key
(experimental / do not use) and include it in config schema
- apply the override only to `Op::RealtimeConversation` websocket
transport, with config + realtime tests
2026-02-20 19:51:20 -08:00
Ahmed Ibrahim
6817f0be8a Wire realtime api to core (#12268)
- Introduce `RealtimeConversationManager` for realtime API management 
- Add `op::conversation` to start conversation, insert audio, insert
text, and close conversation.
- emit conversation lifecycle and realtime events.
- Move shared realtime payload types into codex-protocol and add core
e2e websocket tests for start/replace/transport-close paths.

Things to consider:
- Should we use the same `op::` and `Events` channel to carry audio? I
think we should try this simple approach and later we can create
separate one if the channels got congested.
- Sending text updates to the client: we can start simple and later
restrict that.
- Provider auth isn't wired for now intentionally
2026-02-20 19:06:35 -08:00
viyatb-oai
60c2b7beca core tests: use hermetic mock server in review suite (#12291)
## Summary
- switch the review test SSE mock helper to use the shared hermetic mock
server setup
- ensure review tests always have a default `/v1/models` stub during
Codex session bootstrap
- remove the race that caused intermittent `/v1/models` connection
failures and flaky ETag refresh assertions

## Testing
- `just fmt`
- `cargo test -p codex-core --test all
refresh_models_on_models_etag_mismatch_and_avoid_duplicate_models_fetch`
- `cargo test -p codex-core --test all
review_uses_custom_review_model_from_config`
- repeated both targeted tests 5x in a loop
- `cargo clippy -p codex-core --tests -- -D warnings`
2026-02-20 12:50:12 -08:00
viyatb-oai
e8afaed502 Refactor network approvals to host/protocol/port scope (#12140)
## Summary
Simplify network approvals by removing per-attempt proxy correlation and
moving to session-level approval dedupe keyed by (host, protocol, port).
Instead of encoding attempt IDs into proxy credentials/URLs, we now
treat approvals as a destination policy decision.

- Concurrent calls to the same destination share one approval prompt.
- Different destinations (or same host on different ports) get separate
prompts.
- Allow once approves the current queued request group only.
- Allow for session caches that (host, protocol, port) and auto-allows
future matching requests.
- Never policy continues to deny without prompting.

Example:
- 3 calls: 
  - a.com (line 443)
  - b.com (line 443)
  - a.com (line 443)
=> 2 prompts total (a, b), second a waits on the first decision.
- a.com:80 is treated separately from a.com line 443

## Testing
- `just fmt` (in `codex-rs`)
- `cargo test -p codex-core tools::network_approval::tests`
- `cargo test -p codex-core` (unit tests pass; existing
integration-suite failures remain in this environment)
2026-02-20 10:39:55 -08:00
pakrym-oai
86803ca9bf Reuse connection between turns (#12294)
Add a pool of one to the model client to reuse connections across turns.
2026-02-20 10:09:46 -08:00
colby-oai
2036a5f5e0 Add MCP server context to otel tool_result logs (#12267)
Summary
- capture the origin for each configured MCP server and expose it via
the connection manager
- plumb MCP server name/origin into tool logging and emit
codex.tool_result events with those fields
- add unit coverage for origin parsing and extend OTEL tests to assert
empty MCP fields for non-MCP tools
- currently not logging full urls or url paths to prevent logging
potentially sensitive data

Testing
- Not run (not requested)
2026-02-20 10:26:19 -05:00
jif-oai
0f9eed3a6f feat: add nick name to sub-agents (#12320)
Adding random nick name to sub-agents. Used for UX

At the same time, also storing and wiring the role of the sub-agent
2026-02-20 14:39:49 +00:00
dkumar-oai
1070a0a712 Add configurable MCP OAuth callback URL for MCP login (#11382)
## Summary

Implements a configurable MCP OAuth callback URL override for `codex mcp
login` and app-server OAuth login flows, including support for non-local
callback endpoints (for example, devbox ingress URLs).

## What changed

- Added new config key: `mcp_oauth_callback_url` in
`~/.codex/config.toml`.
- OAuth authorization now uses `mcp_oauth_callback_url` as
`redirect_uri` when set.
- Callback handling validates the callback path against the configured
redirect URI path.
- Listener bind behavior is now host-aware:
- local callback URL hosts (`localhost`, `127.0.0.1`, `::1`) bind to
`127.0.0.1`
  - non-local callback URL hosts bind to `0.0.0.0`
- `mcp_oauth_callback_port` remains supported and is used for the
listener port.
- Wired through:
  - CLI MCP login flow
  - App-server MCP OAuth login flow
  - Skill dependency OAuth login flow
- Updated config schema and config tests.

## Why

Some environments need OAuth callbacks to land on a specific reachable
URL (for example ingress in remote devboxes), not loopback. This change
allows that while preserving local defaults for existing users.

## Backward compatibility

- No behavior change when `mcp_oauth_callback_url` is unset.
- Existing `mcp_oauth_callback_port` behavior remains intact.
- Local callback flows continue binding to loopback by default.

## Testing

- `cargo test -p codex-rmcp-client callback -- --nocapture`
- `cargo test -p codex-core --lib mcp_oauth_callback -- --nocapture`
- `cargo check -p codex-cli -p codex-app-server -p codex-rmcp-client`

## Example config

```toml
mcp_oauth_callback_port = 5555
mcp_oauth_callback_url = "https://<devbox>-<namespace>.gateway.<cluster>.internal.api.openai.org/callback"
2026-02-19 13:32:10 -08:00
pash-openai
429cc4860e ws turn metadata via client_metadata (#11953) 2026-02-19 12:28:15 -08:00
jif-oai
f6c06108b1 try fix 2 (#12264) 2026-02-19 19:36:42 +00:00
jif-oai
928be5f515 Revert "feat: no timeout mode on ue" (#12256)
Reverts openai/codex#12250
2026-02-19 19:02:29 +00:00
jif-oai
9719dc502c feat: no timeout mode on ue (#12250) 2026-02-19 18:58:13 +00:00
sayan-oai
d54999d006 client side modelinfo overrides (#12101)
TL;DR
Add top-level `model_catalog_json` config support so users can supply a
local model catalog override from a JSON file path (including adding new
models) without backend changes.

### Problem
Codex previously had no clean client-side way to replace/overlay model
catalog data for local testing of model metadata and new model entries.

### Fix
- Add top-level `model_catalog_json` config field (JSON file path).
- Apply catalog entries when resolving `ModelInfo`:
  1. Base resolved model metadata (remote/fallback)
  2. Catalog overlay from `model_catalog_json`
3. Existing global top-level overrides (`model_context_window`,
`model_supports_reasoning_summaries`, etc.)

### Note
Will revisit per-field overrides in a follow-up

### Tests
Added tests
2026-02-19 10:38:57 -08:00
jif-oai
2daa3fd44f feat: sub-agent injection (#12152)
This PR adds parent-thread sub-agent completion notifications and change
the prompt of the model to prevent if from being confused
2026-02-19 11:32:10 +00:00
Eric Traut
227352257c Update docs links for feature flag notice (#12164)
Summary
- replace the stale `docs/config.md#feature-flags` reference in the
legacy feature notice with the canonical published URL
- align the deprecation notice test to expect the new link

This addresses #12123
2026-02-19 00:00:44 -08:00
Eric Traut
999576f7b8 Fixed a hole in token refresh logic for app server (#11802)
We've continued to receive reports from users that they're seeing the
error message "Your access token could not be refreshed because your
refresh token was already used. Please log out and sign in again." This
PR fixes two holes in the token refresh logic that lead to this
condition.

Background: A previous change in token refresh introduced the
`UnauthorizedRecovery` object. It implements a state machine in the core
agent loop that first performs a load of the on-disk auth information
guarded by a check for matching account ID. If it finds that the on-disk
version has been updated by another instance of codex, it uses the
reloaded auth tokens. If the on-disk version hasn't been updated, it
issues a refresh request from the token authority.

There are two problems that this PR addresses:

Problem 1: We weren't doing the same thing for the code path used by the
app server interface. This PR effectively replicates the
`UnauthorizedRecovery` logic for that code path.

Problem 2: The `UnauthorizedRecovery` logic contained a hole in the
`ReloadOutcome::Skipped` case. Here's the scenario. A user starts two
instances of the CLI. Instance 1 is active (working on a task), instance
2 is idle. Both instances have the same in-memory cached tokens. The
user then runs `codex logout` or `codex login` to log in to a separate
account, which overwrites the `auth.json` file. Instance 1 receives a
401 and refreshes its token, but it doesn't write the new token to the
`auth.json` file because the account ID doesn't match. Instance 2 is
later activated and presented with a new task. It immediately hits a 401
and attempts to refresh its token but fails because its cached refresh
token is now invalid. To avoid this situation, I've changed the logic to
immediately fail a token refresh if the user has since logged out or
logged in to another account. This will still be seen as an error by the
user, but the cause will be clearer.

I also took this opportunity to clean up the names of existing functions
to make their roles clearer.
* `try_refresh_token` is renamed `request_chatgpt_token_refresh`
* the existing `refresh_token` is renamed `refresh_token_from_authority`
(there's a new higher-level function named `refresh_token` now)
* `refresh_tokens` is renamed `refresh_and_persist_chatgpt_token`, and
it now implicitly reloads
* `update_tokens` is renamed `persist_tokens`
2026-02-18 09:27:04 -08:00
Charley Cunningham
c16f9daaaf Add model-visible context layout snapshot tests (#12073)
## Summary
- add a dedicated `core/tests/suite/model_visible_layout.rs` snapshot
suite to materialize model-visible request layout in high-value
scenarios
- add three reviewer-focused snapshot scenarios:
  - turn-level context updates (cwd / permissions / personality)
  - first post-resume turn with model hydration + personality change
- first post-resume turn where pre-turn model override matches rollout
model
- wire the new suite into `core/tests/suite/mod.rs`
- commit generated `insta` snapshots under `core/tests/suite/snapshots/`

## Why
This creates a stable, reviewable baseline of model-visible context
layout against `main` before follow-on context-management refactors. It
lets subsequent PRs show focused snapshot diffs for behavior changes
instead of introducing the test surface and behavior changes at once.

## Testing
- `just fmt`
- `INSTA_UPDATE=always cargo test -p codex-core model_visible_layout`
2026-02-17 22:30:29 -08:00
pakrym-oai
fc810ba045 Use V2 websockets if feature enabled (#12071) 2026-02-17 18:32:16 -08:00
Charley Cunningham
eb68767f2f Unify remote compaction snapshot mocks around default endpoint behavior (#12050)
## Summary
- standardize remote compaction test mocking around one default behavior
in shared helpers
- make default remote compact mocks mirror production shape: keep
`message/user` + `message/developer`, drop assistant/tool artifacts,
then append a summary user message
- switch non-special `compact_remote` tests to the shared default mock
instead of ad-hoc JSON payloads

## Special-case tests that still use explicit mocks
- remote compaction error payload / HTTP failure behavior
- summary-only compact output behavior
- manual `/compact` with no prior user messages
- stale developer-instruction injection coverage

## Why
This removes inconsistent manual remote compaction fixtures and gives us
one source of truth for normal remote compact behavior, while preserving
explicit mocks only where tests intentionally cover non-default
behavior.
2026-02-17 18:18:47 -08:00
Owen Lin
db4d2599b5 feat(core): plumb distinct approval ids for command approvals (#12051)
zsh fork PR stack:
- https://github.com/openai/codex/pull/12051 👈 
- https://github.com/openai/codex/pull/12052

With upcoming support for a fork of zsh that allows us to intercept
`execve` and run execpolicy checks for each subcommand as part of a
`CommandExecution`, it will be possible for there to be multiple
approval requests for a shell command like `/path/to/zsh -lc 'git status
&& rg \"TODO\" src && make test'`.

To support that, this PR introduces a new `approval_id` field across
core, protocol, and app-server so that we can associate approvals
properly for subcommands.
2026-02-18 01:55:57 +00:00
Shijie Rao
b3a8571219 Chore: remove response model check and rely on header model for downgrade (#12061)
### Summary
Ensure that we use the model value from the response header only so that
we are guaranteed with the correct slug name. We are no longer checking
against the model value from response so that we are less likely to have
false positive.

There are two different treatments - for SSE we use the header from the
response and for websocket we check top-level events.
2026-02-18 01:50:06 +00:00
sayan-oai
41800fc876 chore: rm remote models fflag (#11699)
rm `remote_models` feature flag.

We see issues like #11527 when a user has `remote_models` disabled, as
we always use the default fallback `ModelInfo`. This causes issues with
model performance.

Builds on #11690, which helps by warning the user when they are using
the default fallback. This PR will make that happen much less frequently
as an accidental consequence of disabling `remote_models`.
2026-02-17 11:43:16 -08:00
Shijie Rao
48018e9eac Feat: add model reroute notification (#12001)
### Summary
Builiding off
5c75aa7b89 (diff-058ae8f109a8b84b4b79bbfa45f522c2233b9d9e139696044ae374d50b6196e0),
we have created a `model/rerouted` notification that captures the event
so that consumers can render as expected. Keep the `EventMsg::Warning`
path in core so that this does not affect TUI rendering.

`model/rerouted` is meant to be generic to account for future usage
including capacity planning etc.
2026-02-17 11:02:23 -08:00
sayan-oai
a1b8e34938 chore: clarify web_search deprecation notices and consolidate tests (#11224)
follow up to #10406, clarify default-enablement of web_search.

also consolidate pseudo-redundant tests

Tests pass
2026-02-17 18:20:24 +00:00
Dylan Hurd
0fbe10a807 fix(core) exec_policy parsing fixes (#11951)
## Summary
Fixes a few things in our exec_policy handling of prefix_rules:
1. Correctly match redirects specifically for exec_policy parsing. i.e.
if you have `prefix_rule(["echo"], decision="allow")` then `echo hello >
output.txt` should match - this should fix #10321
2. If there already exists any rule that would match our prefix rule
(not just a prompt), then drop it, since it won't do anything.


## Testing
- [x] Updated unit tests, added approvals ScenarioSpecs
2026-02-16 23:11:59 -08:00
Fouad Matin
02e9006547 add(core): safety check downgrade warning (#11964)
Add per-turn notice when a request is downgraded to a fallback model due
to cyber safety checks.

**Changes**

- codex-api: Emit a ServerModel event based on the openai-model response
header and/or response payload (SSE + WebSocket), including when the
model changes mid-stream.
- core: When the server-reported model differs from the requested model,
emit a single per-turn warning explaining the reroute to gpt-5.2 and
directing users to Trusted
    Access verification and the cyber safety explainer.
- app-server (v2): Surface these cyber model-routing warnings as
synthetic userMessage items with text prefixed by Warning: (and document
this behavior).
2026-02-16 22:13:36 -08:00
Dylan Hurd
19afbc35c1 chore(core) rm Feature::RequestRule (#11866)
## Summary
This feature is now reasonably stable, let's remove it so we can
simplify our upcoming iterations here.

## Testing 
- [x] Existing tests pass
2026-02-16 22:30:23 +00:00
jif-oai
af434b4f71 feat: drop MCP managing tools if no MCP servers (#11900)
Drop MCP tools if no MCP servers to save context

For this https://github.com/openai/codex/issues/11049
2026-02-16 18:40:45 +00:00