codex

mirror of https://github.com/openai/codex.git synced 2026-05-02 04:11:39 +03:00

Author	SHA1	Message	Date
Fouad Matin	32c4993c8a	fix(core): default approval behavior for mcp missing annotations (#15519 ) - Changed `requires_mcp_tool_approval` to apply MCP spec defaults when annotations are missing. - Unannotated tools now default to: - `readOnlyHint = false` - `destructiveHint = true` - `openWorldHint = true` - This means unannotated MCP tools now go through approval/ARC monitoring instead of silently bypassing it. - Explicitly read-only tools still skip approval unless they are also explicitly marked destructive. Previous behavior Failed open for missing annotations, which was unsafe for custom MCP tools that omitted or forgot annotations. --------- Co-authored-by: colby-oai <228809017+colby-oai@users.noreply.github.com>	2026-03-25 07:55:41 -07:00
Matthew Zeng	e590fad50b	[plugins] Add a flag for tool search. (#15722 ) - [x] Add a flag for tool search.	2026-03-25 07:00:25 +00:00
Charley Cunningham	d72fa2a209	[codex] Defer fork context injection until first turn (#15699 ) ## Summary - remove the fork-startup `build_initial_context` injection - keep the reconstructed `reference_context_item` as the fork baseline until the first real turn - update fork-history tests and the request snapshot, and add a `TODO(ccunningham)` for remaining nondiffable initial-context inputs ## Why Fork startup was appending current-session initial context immediately after reconstructing the parent rollout, then the first real turn could emit context updates again. That duplicated model-visible context in the child rollout. ## Impact Forked sessions now behave like resume for context seeding: startup reconstructs history and preserves the prior baseline, and the first real turn handles any current-session context emission. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-24 18:34:44 -07:00
Ahmed Ibrahim	0f957a93cd	Move git utilities into a dedicated crate (#15564 ) - create `codex-git-utils` and move the shared git helpers into it with file moves preserved for diff readability - move the `GitInfo` helpers out of `core` so stacked rollout work can depend on the shared crate without carrying its own git info module --------- Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-24 13:26:23 -07:00
Charley Cunningham	2d61357c76	Trim pre-turn context updates during rollback (#15577 ) ## Summary - trim contiguous developer/contextual-user pre-turn updates when rollback cuts back to a user turn - add a focused history regression test for the trim behavior - update the rollback request-boundary snapshots to show the fixed non-duplicating context shape --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-24 12:43:53 -07:00
Celia Chen	88694e8417	chore: stop app-server auth refresh storms after permanent token failure (#15530 ) built from #14256. PR description from @etraut-openai: This PR addresses a hole in [PR 11802](https://github.com/openai/codex/pull/11802). The previous PR assumed that app server clients would respond to token refresh failures by presenting the user with an error ("you must log in again") and then not making further attempts to call network endpoints using the expired token. While they do present the user with this error, they don't prevent further attempts to call network endpoints and can repeatedly call `getAuthStatus(refreshToken=true)` resulting in many failed calls to the token refresh endpoint. There are three solutions I considered here: 1. Change the getAuthStatus app server call to return a null auth if the caller specified "refreshToken" on input and the refresh attempt fails. This will cause clients to immediately log out the user and return them to the log in screen. This is a really bad user experience. It's also a breaking change in the app server contract that could break third-party clients. 2. Augment the getAuthStatus app server call to return an additional field that indicates the state of "token could not be refreshed". This is a non-breaking change to the app server API, but it requires non-trivial changes for all clients to properly handle this new field properly. 3. Change the getAuthStatus implementation to handle the case where a token refresh fails by marking the AuthManager's in-memory access and refresh tokens as "poisoned" so it they are no longer used. This is the simplest fix that requires no client changes. I chose option 3. Here's Codex's explanation of this change: When an app-server client asks `getAuthStatus(refreshToken=true)`, we may try to refresh a stale ChatGPT access token. If that refresh fails permanently (for example `refresh_token_reused`, expired, or revoked), the old behavior was bad in two ways: 1. We kept the in-memory auth snapshot alive as if it were still usable. 2. Later auth checks could retry refresh again and again, creating a storm of doomed `/oauth/token` requests and repeatedly surfacing the same failure. This is especially painful for app-server clients because they poll auth status and can keep driving the refresh path without any real chance of recovery. This change makes permanent refresh failures terminal for the current managed auth snapshot without changing the app-server API contract. What changed: - `AuthManager` now poisons the current managed auth snapshot in memory after a permanent refresh failure, keyed to the unchanged `AuthDotJson`. - Once poisoned, later refresh attempts for that same snapshot fail fast locally without calling the auth service again. - The poison is cleared automatically when auth materially changes, such as a new login, logout, or reload of different auth state from storage. - `getAuthStatus(includeToken=true)` now omits `authToken` after a permanent refresh failure instead of handing out the stale cached bearer token. This keeps the current auth method visible to clients, avoids forcing an immediate logout flow, and stops repeated refresh attempts for credentials that cannot recover. --------- Co-authored-by: Eric Traut <etraut@openai.com>	2026-03-24 12:39:58 -07:00
Celia Chen	7dc2cd2ebe	chore: use access token expiration for proactive auth refresh (#15545 ) Follow up to #15357 by making proactive ChatGPT auth refresh depend on the access token's JWT expiration instead of treating `last_refresh` age as the primary source of truth.	2026-03-24 19:34:48 +00:00
Charley Cunningham	910cf49269	[codex] Stabilize second compaction history test (#15605 ) ## Summary - replace the second-compaction test fixtures with a single ordered `/responses` sequence - assert against the real recorded request order instead of aggregating per-mock captures - realign the second-summary assertion to the first post-compaction user turn where the summary actually appears ## Root cause `compact_resume_after_second_compaction_preserves_history` collected requests from multiple `mount_sse_once_match` recorders. Overlapping matchers could record the same HTTP request more than once, so the test indexed into a duplicated synthetic list rather than the true request stream. That made the summary assertion depend on matcher evaluation order and platform-specific behavior. ## Impact - makes the flaky test deterministic by removing duplicate request capture from the assertion path - keeps the change scoped to the test only ## Validation - `just fmt` - `just argument-comment-lint` - `env -u CODEX_SANDBOX_NETWORK_DISABLED cargo test -p codex-core compact_resume_after_second_compaction_preserves_history -- --nocapture` - repeated the same targeted test 10 times --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-24 10:14:21 -07:00
pakrym-oai	f49eb8e9d7	Extract sandbox manager and transforms into codex-sandboxing (#15603 ) Extract sandbox manager	2026-03-24 08:20:57 -07:00
dhruvgupta-oai	c2410060ea	[codex-cli][app-server] Update self-serve business usage limit copy in error returned (#15478 ) ## Summary - update the self-serve business usage-based limit message to direct users to their admin for additional credits - add a focused unit test for the self_serve_business_usage_based plan branch Added also: If you are at a rate limit but you still have credits, codex cli would tell you to switch the model. We shouldnt do this if you have credits so fixed this. ## Test - launched the source-built CLI and verified the updated message is shown for the self-serve business usage-based plan ![Test screenshot](https://raw.githubusercontent.com/openai/codex/5cc3c013ef17ac5c66dfd9395c0d3c4837602231/docs/images/self-serve-business-usage-limit.png)	2026-03-24 04:41:38 +00:00
Charley Cunningham	f547b79bd0	Add fork snapshot modes (#15239 ) ## Summary - add `ForkSnapshotMode` to `ThreadManager::fork_thread` so callers can request either a committed snapshot or an interrupted snapshot - share the model-visible `<turn_aborted>` history marker between the live interrupt path and interrupted forks - update the small set of direct fork callsites to pass `ForkSnapshotMode::Committed` Note: this enables /btw to work similarly as Esc to interrupt (hopefully somewhat in distribution) --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-23 19:05:42 -07:00
Charley Cunningham	0f34b14b41	[codex] Add rollback context duplication snapshot (#15562 ) ## What changed - adds a targeted snapshot test for rollback with contextual diffs in `codex_tests.rs` - snapshots the exact model-visible request input before the rolled-back turn and on the follow-up request after rollback - shows the duplicate developer and environment context pair appearing again before the follow-up user message ## Why Rollback currently rewinds the reference context baseline without rewinding the live session overrides. On the next turn, the same contextual diff is emitted again and duplicated in the request sent to the model. ## Impact - makes the regression visible in a canonical snapshot test - keeps the snapshot on the shared `context_snapshot` path without adding new formatting helpers - gives a direct repro for future fixes to rollback/context reconstruction --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-23 15:36:23 -07:00
Dylan Hurd	67c1c7c054	chore(core) Add approvals reviewer to UserTurn (#15426 ) ## Summary Adds support for approvals_reviewer to `Op::UserTurn` so we can migrate `[CodexMessageProcessor::turn_start]` to use Op::UserTurn ## Testing - [x] Adds quick test for the new field Co-authored-by: Codex <noreply@openai.com>	2026-03-23 15:19:01 -07:00
Andrei Eternal	73bbb07ba8	[hooks] add non-streaming (non-stdin style) shell-only PreToolUse support (#15211 ) - add `PreToolUse` hook for bash-like tool execution only at first - block shell execution before dispatch with deny-only hook behavior - introduces common.rs matcher framework for matching when hooks are run example run: ``` › run three parallel echo commands, and the second one should echo "[block-pre-tool-use]" as a test • Running the three echo commands in parallel now and I’ll report the output directly. • Running PreToolUse hook: name for demo pre tool use hook • Running PreToolUse hook: name for demo pre tool use hook • Running PreToolUse hook: name for demo pre tool use hook PreToolUse hook (completed) warning: wizard-tower PreToolUse demo inspected Bash: echo "first parallel echo" PreToolUse hook (blocked) warning: wizard-tower PreToolUse demo blocked a Bash command on purpose. feedback: PreToolUse demo blocked the command. Remove [block-pre-tool-use] to continue. PreToolUse hook (completed) warning: wizard-tower PreToolUse demo inspected Bash: echo "third parallel echo" • Ran echo "first parallel echo" └ first parallel echo • Ran echo "third parallel echo" └ third parallel echo • Three little waves went out in parallel. 1. printed first parallel echo 2. was blocked before execution because it contained the exact test string [block-pre-tool-use] 3. printed third parallel echo There was also an unrelated macOS defaults warning around the successful commands, but the echoes themselves worked fine. If you want, I can rerun the second one with a slightly modified string so it passes cleanly. ```	2026-03-23 14:32:59 -07:00
Celia Chen	f55f5c258f	Fix: proactive auth refresh to reload guarded disk state first (#15357 ) ## Summary Fix a managed ChatGPT auth bug where a stale Codex process could proactively refresh using an old in-memory refresh token even after another process had already rotated auth on disk. This changes the proactive `AuthManager::auth()` path to reuse the existing guarded `refresh_token()` flow instead of calling the refresh endpoint directly from cached auth state. ## Original Issue Users reported repeated `codexd` log lines like: ```text ERROR codex_core::auth: Failed to refresh token: error sending request for url (https://auth.openai.com/oauth/token) ``` In practice this showed up most often when multiple `codexd` processes were left running. Killing the extra processes stopped the noise, which suggested the issue was caused by stale auth state across processes rather than invalid user credentials. ## Diagnosis The bug was in the proactive refresh path used by `AuthManager::auth()`: - Process A could refresh successfully, rotate refresh token `R0` to `R1`, and persist the updated auth state plus `last_refresh` to disk. - Process B could keep an older auth snapshot cached in memory, still holding `R0` and the old `last_refresh`. - Later, when Process B called `auth()`, it checked staleness from its cached in-memory auth instead of first reloading from disk. - Because that cached `last_refresh` was stale, Process B would proactively call `/oauth/token` with stale refresh token `R0`. - On failure, `auth()` logged the refresh error but kept returning the same stale cached auth, so repeated `auth()` calls could keep retrying with dead state. This differed from the existing unauthorized-recovery flow, which already did the safer thing: guarded reload from disk first, then refresh only if the on-disk auth was unchanged. ## What Changed - Switched proactive refresh in `AuthManager::auth()` to: - do a pure staleness check on cached auth - call `refresh_token()` when stale - return the original cached auth on genuine refresh failure, preserving existing outward behavior - Removed the direct proactive refresh-from-cached-state path - Added regression tests covering: - stale cached auth with newer same-account auth already on disk - the same scenario even when the refresh endpoint would fail if called ## Why This Fix `refresh_token()` already contains the right cross-process safety behavior: - guarded reload from disk - same-account verification - skip-refresh when another process already changed auth Reusing that path makes proactive refresh consistent with unauthorized recovery and prevents stale processes from trying to refresh already-rotated tokens. ## Testing Test shape: - create a fresh temp `CODEX_HOME` from `~/.codex/auth.json` - force `last_refresh` to an old timestamp so proactive refresh is required - start two long-lived helper processes against the same auth file - start `B` first so it caches stale auth and sleeps - start `A` second so it refreshes first - point both at a local mock `/oauth/token` server - inspect whether `B` makes a second refresh request with the stale in-memory token, or reloads the rotated token from disk ### Before the fix The repro showed the bug clearly: the mock server saw two refreshes with the same stale token, `A` rotated to a new token, and `B` still returned the stale token instead of reloading from disk. ```text POST /oauth/token refresh_token=rt_j6s0... POST /oauth/token refresh_token=rt_j6s0... B:cached_before=rt_j6s0... B:cached_after=rt_j6s0... B:returned=rt_j6s0... A:cached_before=rt_j6s0... A:cached_after=rotated-refresh-token-logged-run-v2 A:returned=rotated-refresh-token-logged-run-v2 ``` ### After the fix After the fix, the mock server saw only one refresh request. `A` refreshed once, and `B` started with the stale token but reloaded and returned the rotated token. ```text POST /oauth/token refresh_token=rt_j6s0... B:cached_before=rt_j6s0... B:cached_after=rotated-refresh-token-fix-branch B:returned=rotated-refresh-token-fix-branch A:cached_before=rt_j6s0... A:cached_after=rotated-refresh-token-fix-branch A:returned=rotated-refresh-token-fix-branch ``` This shows the new behavior: `A` refreshes once, then `B` reuses the updated auth from disk instead of making a second refresh request with the stale token.	2026-03-23 12:07:59 -07:00
Dylan Hurd	0d9bb8ea58	chore(context) Include guardian approval context (#15366 ) ## Summary Include the guardian context in the developer message for approvals ## Testing - [x] Updated unit tests	2026-03-21 16:31:22 +00:00
Channing Conger	e4eedd6170	Code mode on v8 (#15276 ) Moves Code Mode to a new crate with no dependencies on codex. This create encodes the code mode semantics that we want for lifetime, mounting, tool calling. The model-facing surface is mostly unchanged. `exec` still runs raw JavaScript, `wait` still resumes or terminates a `cell_id`, nested tools are still available through `tools.`, and helpers like `text`, `image`, `store`, `load`, `notify`, `yield_control`, and `exit` still exist. The major change is underneath that surface: - Old code mode was an external Node runtime. - New code mode is an in-process V8 runtime embedded directly in Rust. - Old code mode managed cells inside a long-lived Node runner process. - New code mode manages cells in Rust, with one V8 runtime thread per active `exec`. - Old code mode used JSON protocol messages over child stdin/stdout plus Node worker-thread messages. - New code mode uses Rust channels and direct V8 callbacks/events. This PR also fixes the two migration regressions that fell out of that substrate change: - `wait { terminate: true }` now waits for the V8 runtime to actually stop before reporting termination. - synchronous top-level `exit()` now succeeds again instead of surfacing as a script error. --- - `core/src/tools/code_mode/` is now mostly an adapter layer for the public `exec` / `wait` tools. - `code-mode/src/service.rs` owns cell sessions and async control flow in Rust. - `code-mode/src/runtime/*.rs` owns the embedded V8 isolate and JavaScript execution. - each `exec` spawns a dedicated runtime thread plus a Rust session-control task. - helper globals are installed directly into the V8 context instead of being injected through a source prelude. - helper modules like `tools.js` and `@openai/code_mode` are synthesized through V8 module resolution callbacks in Rust. --- Also added a benchmark for showing the speed of init and use of a code mode env: ``` $ cargo bench -p codex-code-mode --bench exec_overhead -- --samples 30 --warm-iterations 25 --tool-counts 0,32,128 Finished [`bench` profile [optimized]](https://doc.rust-lang.org/cargo/reference/profiles.html#default-profiles) target(s) in 0.18s Running benches/exec_overhead.rs (target/release/deps/exec_overhead-008c440d800545ae) exec_overhead: samples=30, warm_iterations=25, tool_counts=[0, 32, 128] scenario tools samples warmups iters mean/exec p95/exec rssΔ p50 rssΔ max cold_exec 0 30 0 1 1.13ms 1.20ms 8.05MiB 8.06MiB warm_exec 0 30 1 25 473.43us 512.49us 912.00KiB 1.33MiB cold_exec 32 30 0 1 1.03ms 1.15ms 8.08MiB 8.11MiB warm_exec 32 30 1 25 509.73us 545.76us 960.00KiB 1.30MiB cold_exec 128 30 0 1 1.14ms 1.19ms 8.30MiB 8.34MiB warm_exec 128 30 1 25 575.08us 591.03us 736.00KiB 864.00KiB memory uses a fresh-process max RSS delta for each scenario ``` --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-20 23:36:58 -07:00
Dylan Hurd	ea8b07e680	chore(core) Remove Feature::PowershellUtf8 (#15128 ) ## Summary This feature has been enabled for powershell for a while now, let's get rid of the logic ## Testing - [x] Unit tests	2026-03-20 22:03:31 +00:00
jif-oai	79ad7b247b	feat: change multi-agent to use path-like system instead of uuids (#15313 ) This PR add an URI-based system to reference agents within a tree. This comes from a sync between research and engineering. The main agent (the one manually spawned by a user) is always called `/root`. Any sub-agent spawned by it will be `/root/agent_1` for example where `agent_1` is chosen by the model. Any agent can contact any agents using the path. Paths can be used either in absolute or relative to the calling agents Resume is not supported for now on this new path	2026-03-20 18:23:48 +00:00
pakrym-oai	4ddde54c19	Add remote test skill (#15324 ) Teach codex to run remote tests.	2026-03-20 10:37:57 -07:00
pakrym-oai	ba85a58039	Add remote env CI matrix and integration test (#14869 ) `CODEX_TEST_REMOTE_ENV` will make `test_codex` start the executor "remotely" (inside a docker container) turning any integration test into remote test.	2026-03-20 08:02:50 -07:00
Ahmed Ibrahim	2e22885e79	Split features into codex-features crate (#15253 ) - Split the feature system into a new `codex-features` crate. - Cut `codex-core` and workspace consumers over to the new config and warning APIs. Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-19 20:12:07 -07:00
Michael Bolin	a3e59e9e85	core: add a full-buffer exec capture policy (#15254 )	2026-03-20 02:38:12 +00:00
Ahmed Ibrahim	2aa4873802	Move auth code into login crate (#15150 ) - Move the auth implementation and token data into codex-login. - Keep codex-core re-exporting that surface from codex-login for existing callers. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-19 18:58:17 -07:00
Won Park	6b8175c734	changed save directory to codex_home (#15222 ) saving image gen default save directory to codex_home/imagegen/thread_id/	2026-03-19 15:16:26 -07:00
nicholasclark-openai	2bee37fe69	Plumb MCP turn metadata through _meta (#15190 ) ## Summary Some background. We're looking to instrument GA turns end to end. Right now a big gap is grouping mcp tool calls with their codex sessions. We send session id and turn id headers to the responses call but not the mcp/wham calls. Ideally we could pass the args as headers like with responses, but given the setup of the rmcp client, we can't send as headers without either changing the rmcp package upstream to allow per request headers or introducing a mutex which break concurrency. An earlier attempt made the assumption that we had 1 client per thread, which allowed us to set headers at the start of a turn. @pakrym mentioned that this assumption might break in the near future. So the solution now is to package the turn metadata/session id into the _meta field in the post body and pull out in codex-backend. - send turn metadata to MCP servers via `tools/call` `_meta` instead of assuming per-thread request headers on shared clients - preserve the existing `_codex_apps` metadata while adding `x-codex-turn-metadata` for all MCP tool calls - extend tests to cover both custom MCP servers and the codex apps search flow --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-19 22:05:13 +00:00
Andrei Eternal	267499bed8	[hooks] use a user message > developer message for prompt continuation (#14867 ) ## Summary Persist Stop-hook continuation prompts as `user` messages instead of hidden `developer` messages + some requested integration tests This is a followup to @pakrym 's comment in https://github.com/openai/codex/pull/14532 to make sure stop-block continuation prompts match training for turn loops - Stop continuation now writes `<hook_prompt hook_run_id="...">stop hook's user prompt<hook_prompt>` - Introduces quick-xml dependency, though we already indirectly depended on it anyway via syntect - This PR only has about 500 lines of actual logic changes, the rest is tests/schema ## Testing Example run (with a sessionstart hook and 3 stop hooks) - this shows context added by session start, then two stop hooks sending their own additional prompts in a new turn. The model responds with a single message addressing both. Then when that turn ends, the hooks detect that they just ran using `stop_hook_active` and decide not to infinite loop test files for this (unzip, move codex -> .codex): [codex.zip](https://github.com/user-attachments/files/26075806/codex.zip) ``` › cats • Running SessionStart hook: lighting the observatory SessionStart hook (completed) warning: Hi, I'm a session start hook for wizard-tower (startup). hook context: A wimboltine stonpet is an exotic cuisine from hyperspace • Cats are tiny zen wizards, my friend: equal parts nap, mystery, and chaos. If you want, we can talk cat facts, cat breeds, cat names, or build something cat-themed in this repo. • Running Stop hook: checking the tower wards • Running Stop hook: sacking the guards • Running Stop hook: hiring the guards Stop hook (completed) warning: Wizard Tower Stop hook reviewed the completed reply (177 chars). Stop hook (blocked) warning: Wizard Tower Stop hook continuing conversation feedback: cook the stonpet Stop hook (blocked) warning: Wizard Tower Stop hook continuing conversation feedback: eat the cooked stonpet • Stonpet’s cooked, aloha style: flash-seared over a blue quasiflame, glazed with nebula salt, and rested until the hyperspace juices settle. Now we eat with gratitude, my friend. One mindful bite in, and the flavor is pure cosmic surf: smoky, bright, and totally out of this dimension. • Running Stop hook: checking the tower wards • Running Stop hook: sacking the guards • Running Stop hook: hiring the guards Stop hook (completed) warning: Wizard Tower Stop hook reviewed the completed reply (285 chars). Stop hook (completed) warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop. Stop hook (completed) warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop. ```	2026-03-19 10:53:08 -07:00
Andrei Eternal	42e932d7bf	[hooks] turn_id extension for Stop & UserPromptSubmit (#15118 ) ## Description Adding an extension to the spec that exposes the turn_id to hook scripts. This is a codex-specific mechanic that allows connecting the hook runs with particular turns ## Testing hooks config / sample hooks to use. Extract this, rename codex -> .codex, and place this into a repo or your home folder. It includes: config.toml that enables hooks, hooks.json, and sample python hooks: [codex.zip](https://github.com/user-attachments/files/26102671/codex.zip) example run (note the turn_ids change between turns): ``` › hello • Running SessionStart hook: lighting the observatory SessionStart hook (completed) warning: Hi, I'm a session start hook for wizard-tower (startup). hook context: Startup context: A wimboltine stonpet is an exotic cuisine from hyperspace • Running UserPromptSubmit hook: lighting the observatory lanterns UserPromptSubmit hook (completed) warning: wizard-tower UserPromptSubmit demo inspected: hello for turn: 019d036d-c7fa-72d2-b6fd- 78878bfe34e4 hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact phrase 'observatory lanterns lit' near the end. • Aloha! Grateful to be here and ready to build with you. Show me what you want to tackle in wizard- tower, and we’ll surf the next wave together. observatory lanterns lit • Running Stop hook: back to shore Stop hook (completed) warning: Wizard Tower Stop hook reviewed the completed reply (170 chars) for turn: 019d036d-c7fa- 72d2-b6fd-78878bfe34e4 › what's a stonpet? • Running UserPromptSubmit hook: lighting the observatory lanterns UserPromptSubmit hook (completed) warning: wizard-tower UserPromptSubmit demo inspected: what's a stonpet? for turn: 019d036e-3164- 72c3-a170-98925564c4fc hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact phrase 'observatory lanterns lit' near the end. • A stonpet isn’t a standard real-world word, brah. In our shared context here, a wimboltine stonpet is an exotic cuisine from hyperspace, so “stonpet” sounds like the dish or food itself. If you want, we can totally invent the lore for it next. observatory lanterns lit • Running Stop hook: back to shore Stop hook (completed) warning: Wizard Tower Stop hook reviewed the completed reply (271 chars) for turn: 019d036e-3164- 72c3-a170-98925564c4fc ```	2026-03-18 21:48:31 -07:00
Owen Lin	20f2a216df	feat(core, tracing): create turn spans over websockets (#14632 ) ## Description Dependent on: - [responsesapi] https://github.com/openai/openai/pull/760991 - [codex-backend] https://github.com/openai/openai/pull/760985 `codex app-server -> codex-backend -> responsesapi` now reuses a persistent websocket connection across many turns. This PR updates tracing when using websockets so that each `response.create` websocket request propagates the current tracing context, so we can get a holistic end-to-end trace for each turn. Tracing is propagated via special keys (`ws_request_header_traceparent`, `ws_request_header_tracestate`) set in the `client_metadata` param in Responses API. Currently tracing on websockets is a bit broken because we only set tracing context on ws connection time, so it's detached from a `turn/start` request.	2026-03-19 03:41:06 +00:00
pakrym-oai	56d0c6bf67	Add apply_patch code mode result (#15100 ) It's empty !	2026-03-18 16:11:10 -07:00
pakrym-oai	3590e181fa	Add update_plan code mode result (#15103 ) It's empty!	2026-03-18 16:10:51 -07:00
Charley Cunningham	ebbbc52ce4	Align SQLite feedback logs with feedback formatter (#13494 ) ## Summary - store a pre-rendered `feedback_log_body` in SQLite so `/feedback` exports keep span prefixes and structured event fields - render SQLite feedback exports with timestamps and level prefixes to match the old in-memory feedback formatter, while preserving existing trailing newlines - count `feedback_log_body` in the SQLite retention budget so structured or span-prefixed rows still prune correctly - bound `/feedback` row loading in SQL with the retention estimate, then apply exact whole-line truncation in Rust so uploads stay capped without splitting lines ## Details - add a `feedback_log_body` column to `logs` and backfill it from `message` for existing rows - capture span names plus formatted span and event fields at write time, since SQLite does not retain enough structure to reconstruct the old formatter later - keep SQLite feedback queries scoped to the requested thread plus same-process threadless rows - restore a SQL-side cumulative `estimated_bytes` cap for feedback export queries so over-retained partitions do not load every matching row before truncation - add focused formatting coverage for exported feedback lines and parity coverage against `tracing_subscriber` ## Testing - cargo test -p codex-state - just fix -p codex-state - just fmt codex author: `codex resume 019ca1b0-0ecc-78b1-85eb-6befdd7e4f1f` --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-18 22:44:31 +00:00
Ahmed Ibrahim	7b37a0350f	Add final message prefix to realtime handoff output (#15077 ) - prefix realtime handoff output with the agent final message label for both realtime v1 and v2 - update realtime websocket and core expectations to match	2026-03-18 15:19:49 -07:00
pakrym-oai	5cada46ddf	Return image URL from view_image tool (#15072 ) Cleanup image semantics in code mode. `view_image` now returns `{image_url:string, details?: string}` `image()` now allows both string parameter and `{image_url:string, details?: string}`	2026-03-18 13:58:20 -07:00
pakrym-oai	88e5382fc4	Propagate tool errors to code mode (#15075 ) Clean up error flow to push the FunctionCallError all the way up to dispatcher and allow code mode to surface as exception.	2026-03-18 13:57:55 -07:00
pakrym-oai	606d85055f	Add notify to code-mode (#14842 ) Allows model to send an out-of-band notification. The notification is injected as another tool call output for the same call_id.	2026-03-18 09:37:13 -07:00
Dylan Hurd	84f4e7b39d	fix(subagents) share execpolicy by default (#13702 ) ## Summary If a subagent requests approval, and the user persists that approval to the execpolicy, it should (by default) propagate. We'll need to rethink this a bit in light of coming Permissions changes, though I think this is closer to the end state that we'd want, which is that execpolicy changes to one permissions profile should be synced across threads. ## Testing - [x] Added integration test --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-18 06:42:26 +00:00
Andrei Eternal	6fef421654	[hooks] userpromptsubmit - hook before user's prompt is executed (#14626 ) - this allows blocking the user's prompts from executing, and also prevents them from entering history - handles the edge case where you can both prevent the user's prompt AND add n amount of additionalContexts - refactors some old code into common.rs where hooks overlap functionality - refactors additionalContext being previously added to user messages, instead we use developer messages for them - handles queued messages correctly Sample hook for testing - if you write "[block-user-submit]" this hook will stop the thread: example run ``` › sup • Running UserPromptSubmit hook: reading the observatory notes UserPromptSubmit hook (completed) warning: wizard-tower UserPromptSubmit demo inspected: sup hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact phrase 'observatory lanterns lit' exactly once near the end. • Just riding the cosmic wave and ready to help, my friend. What are we building today? observatory lanterns lit › and [block-user-submit] • Running UserPromptSubmit hook: reading the observatory notes UserPromptSubmit hook (stopped) warning: wizard-tower UserPromptSubmit demo blocked the prompt on purpose. stop: Wizard Tower demo block: remove [block-user-submit] to continue. ``` .codex/config.toml ``` [features] codex_hooks = true ``` .codex/hooks.json ``` { "hooks": { "UserPromptSubmit": [ { "hooks": [ { "type": "command", "command": "/usr/bin/python3 .codex/hooks/user_prompt_submit_demo.py", "timeoutSec": 10, "statusMessage": "reading the observatory notes" } ] } ] } } ``` .codex/hooks/user_prompt_submit_demo.py ``` #!/usr/bin/env python3 import json import sys from pathlib import Path def prompt_from_payload(payload: dict) -> str: prompt = payload.get("prompt") if isinstance(prompt, str) and prompt.strip(): return prompt.strip() event = payload.get("event") if isinstance(event, dict): user_prompt = event.get("user_prompt") if isinstance(user_prompt, str): return user_prompt.strip() return "" def main() -> int: payload = json.load(sys.stdin) prompt = prompt_from_payload(payload) cwd = Path(payload.get("cwd", ".")).name or "wizard-tower" if "[block-user-submit]" in prompt: print( json.dumps( { "systemMessage": ( f"{cwd} UserPromptSubmit demo blocked the prompt on purpose." ), "decision": "block", "reason": ( "Wizard Tower demo block: remove [block-user-submit] to continue." ), } ) ) return 0 prompt_preview = prompt or "(empty prompt)" if len(prompt_preview) > 80: prompt_preview = f"{prompt_preview[:77]}..." print( json.dumps( { "systemMessage": ( f"{cwd} UserPromptSubmit demo inspected: {prompt_preview}" ), "hookSpecificOutput": { "hookEventName": "UserPromptSubmit", "additionalContext": ( "Wizard Tower UserPromptSubmit demo fired. " "For this reply only, include the exact phrase " "'observatory lanterns lit' exactly once near the end." ), }, } ) ) return 0 if __name__ == "__main__": raise SystemExit(main()) ```	2026-03-17 22:09:22 -07:00
Ahmed Ibrahim	3ce879c646	Handle realtime conversation end in the TUI (#14903 ) - close live realtime sessions on errors, ctrl-c, and active meter removal - centralize TUI realtime cleanup and avoid duplicate follow-up close info --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>	2026-03-17 21:04:58 -07:00
pakrym-oai	770616414a	Prefer websockets when providers support them (#13592 ) Remove all flags and model settings. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-17 19:46:44 -07:00
Ahmed Ibrahim	98be562fd3	Unify realtime shutdown in core (#14902 ) - route realtime startup, input, and transport failures through a single shutdown path - emit one realtime error/closed lifecycle while clearing session state once --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>	2026-03-17 15:58:52 -07:00
Ahmed Ibrahim	c6ab4ee537	Gate realtime audio interruption logic to v2 (#14984 ) - thread the realtime version into conversation start and app-server notifications - keep playback-aware mic gating and playback interruption behavior on v2 only, leaving v1 on the legacy path	2026-03-17 15:24:37 -07:00
pakrym-oai	ee756eb80f	Rename exec_wait tool to wait (#14983 ) Summary - document that code mode only exposes `exec` and the renamed `wait` tool - update code mode tool spec and descriptions to match the new tool name - rename tests and helper references from `exec_wait` to `wait` Testing - Not run (not requested)	2026-03-17 14:22:26 -07:00
Ahmed Ibrahim	4d9d4b7b0f	Stabilize approval matrix write-file command (#14968 ) ## What is flaky The approval-matrix `WriteFile` scenario is flaky. It sometimes fails in CI even though the approval logic is unchanged, because the test delegates the file write and readback to shell parsing instead of deterministic file I/O. ## Why it was flaky The test generated a command shaped like `printf ... > file && cat file`. That means the scenario depended on shell quoting, redirection, newline handling, and encoding behavior in addition to the approval system it was actually trying to validate. If the shell interpreted the payload differently, the test would report an approval failure even though the product logic was fine. That also made failures hard to diagnose, because the test did not log the exact generated command or the parsed result payload. ## How this PR fixes it This PR replaces the shell-redirection path with a deterministic `python3 -c` script that writes the file with `Path.write_text(..., encoding='utf-8')` and then reads it back with the same UTF-8 path. It also logs the generated command and the resulting exit code/stdout for the approval scenario so any future failure is directly attributable. ## Why this fix fixes the flakiness The scenario no longer depends on shell parsing and redirection semantics. The file contents are produced and read through explicit UTF-8 file I/O, so the approval test is measuring approval behavior instead of shell behavior. The added diagnostics mean a future failure will show the exact command/result pair instead of looking like a generic intermittent mismatch. Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 13:52:36 -07:00
Ahmed Ibrahim	b02388672f	Stabilize Windows cmd-based shell test harnesses (#14958 ) ## What is flaky The Windows shell-driven integration tests in `codex-rs/core` were intermittently unstable, especially: - `apply_patch_cli_can_use_shell_command_output_as_patch_input` - `websocket_test_codex_shell_chain` - `websocket_v2_test_codex_shell_chain` ## Why it was flaky These tests were exercising real shell-tool flows through whichever shell Codex selected on Windows, and the `apply_patch` test also nested a PowerShell read inside `cmd /c`. There were multiple independent sources of nondeterminism in that setup: - The test harness depended on the model-selected Windows shell instead of pinning the shell it actually meant to exercise. - `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI that could leave the read command wrapped as a literal string instead of executing it. - Even after getting the quoting right, PowerShell could emit CLIXML progress records like module-initialization output onto stdout. - The `apply_patch` test was building a patch directly from shell stdout, so any quoting artifact or progress noise corrupted the patch input. So the failures were driven by shell startup and output-shape variance, not by the `apply_patch` or websocket logic themselves. ## How this PR fixes it - Add a test-only `user_shell_override` path so Windows integration tests can pin `cmd.exe` explicitly. - Use that override in the websocket shell-chain tests and in the `apply_patch` harness. - Change the nested Windows file read in `apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8 PowerShell `-EncodedCommand` script. - Run that nested PowerShell process with `-NonInteractive`, set `$ProgressPreference = 'SilentlyContinue'`, and read the file with `[System.IO.File]::ReadAllText(...)`. ## Why this fix fixes the flakiness The outer harness now runs under a deterministic shell, and the inner PowerShell read no longer depends on fragile `cmd` quoting or on progress output staying quiet by accident. The shell tool returns only the file contents, so patch construction and websocket assertions depend on stable test inputs instead of on runner-specific shell behavior. --------- Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 20:21:46 +00:00
Owen Lin	6ea041032b	fix(core): prevent hanging turn/start due to websocket warming issues (#14838 ) ## Description This PR fixes a bad first-turn failure mode in app-server when the startup websocket prewarm hangs. Before this change, `initialize -> thread/start -> turn/start` could sit behind the prewarm for up to five minutes, so the client would not see `turn/started`, and even `turn/interrupt` would block because the turn had not actually started yet. Now, we: - set a (configurable) timeout of 15s for websocket startup time, exposed as `websocket_startup_timeout_ms` in config.toml - `turn/started` is sent immediately on `turn/start` even if the websocket is still connecting - `turn/interrupt` can be used to cancel a turn that is still waiting on the websocket warmup - the turn task will wait for the full 15s websocket warming timeout before falling back ## Why The old behavior made app-server feel stuck at exactly the moment the client expects turn lifecycle events to start flowing. That was especially painful for external clients, because from their point of view the server had accepted the request but then went silent for minutes. ## Configuring the websocket startup timeout Can set it in config.toml like this: ``` [model_providers.openai] supports_websockets = true websocket_connect_timeout_ms = 15000 ```	2026-03-17 10:07:46 -07:00
Ahmed Ibrahim	fbd7f9b986	[stack 2/4] Align main realtime v2 wire and runtime flow (#14830 ) ## Stack Position 2/4. Built on top of #14828. ## Base - #14828 ## Unblocks - #14829 - #14827 ## Scope - Port the realtime v2 wire parsing, session, app-server, and conversation runtime behavior onto the split websocket-method base. - Branch runtime behavior directly on the current realtime session kind instead of parser-derived flow flags. - Keep regression coverage in the existing e2e suites. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-16 21:38:07 -07:00
Michael Bolin	b77fe8fefe	Apply argument comment lint across codex-rs (#14652 ) ## Why Once the repo-local lint exists, `codex-rs` needs to follow the checked-in convention and CI needs to keep it from drifting. This commit applies the fallback `/param/` style consistently across existing positional literal call sites without changing those APIs. The longer-term preference is still to avoid APIs that require comments by choosing clearer parameter types and call shapes. This PR is intentionally the mechanical follow-through for the places where the existing signatures stay in place. After rebasing onto newer `main`, the rollout also had to cover newly introduced `tui_app_server` call sites. That made it clear the first cut of the CI job was too expensive for the common path: it was spending almost as much time installing `cargo-dylint` and re-testing the lint crate as a representative test job spends running product tests. The CI update keeps the full workspace enforcement but trims that extra overhead from ordinary `codex-rs` PRs. ## What changed - keep a dedicated `argument_comment_lint` job in `rust-ci` - mechanically annotate remaining opaque positional literals across `codex-rs` with exact `/param/` comments, including the rebased `tui_app_server` call sites that now fall under the lint - keep the checked-in style aligned with the lint policy by using `/param/` and leaving string and char literals uncommented - cache `cargo-dylint`, `dylint-link`, and the relevant Cargo registry/git metadata in the lint job - split changed-path detection so the lint crate's own `cargo test` step runs only when `tools/argument-comment-lint/` or `rust-ci.yml` changes - continue to run the repo wrapper over the `codex-rs` workspace, so product-code enforcement is unchanged Most of the code changes in this commit are intentionally mechanical comment rewrites or insertions driven by the lint itself. ## Verification - `./tools/argument-comment-lint/run.sh --workspace` - `cargo test -p codex-tui-app-server -p codex-tui` - parsed `.github/workflows/rust-ci.yml` locally with PyYAML --- -> #14652 * #14651	2026-03-16 16:48:15 -07:00
pakrym-oai	a3ba10b44b	Add exit helper to code mode scripts (#14851 ) - Summary - expose `exit` through the code mode bridge and module so scripts can stop mid-flight - surface the helper in the description documentation - add a regression test ensuring `exit()` terminates execution cleanly - Testing - Not run (not requested)	2026-03-16 22:07:58 +00:00
Charley Cunningham	6fdeb1d602	Reuse guardian session across approvals (#14668 ) ## Summary - reuse a guardian subagent session across approvals so reviews keep a stable prompt cache key and avoid one-shot startup overhead - clear the guardian child history before each review so prior guardian decisions do not leak into later approvals - include the `smart_approvals` -> `guardian_approval` feature flag rename in the same PR to minimize release latency on a very tight timeline - add regression coverage for prompt-cache-key reuse without prior-review prompt bleed ## Request - Bug/enhancement request: internal guardian prompt-cache and latency improvement request --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-15 22:56:18 -07:00

1 2 3 4 5 ...

972 Commits