codex

mirror of https://github.com/openai/codex.git synced 2026-05-01 03:42:05 +03:00

Author	SHA1	Message	Date
jif-oai	5441130e0a	feat: adding stream parser (#12666 ) Add a stream parser to extract citations (and others) from a stream. This support cases where markers are split in differen tokens. Codex never manage to make this code work so everything was done manually. Please review correctly and do not touch this part of the code without a very clear understanding of it	2026-02-25 13:27:58 +00:00
jif-oai	5a9a5b51b2	feat: add large stack test macro (#12768 ) This PR adds the macro `#[large_stack_test]` This spawns the tests in a dedicated tokio runtime with a larger stack. It is useful for tests that needs the full recursion on the harness (which is now too deep for windows for example)	2026-02-25 13:19:21 +00:00
jif-oai	10c04e11b8	feat: add service name to app-server (#12319 ) Add service name to the app-server so that the app can use it's own service name This is on thread level because later we might plan the app-server to become a singleton on the computer	2026-02-25 09:51:42 +00:00
Celia Chen	6a3233da64	Surface skill permission profiles in zsh-fork exec approvals (#12753 ) ## Summary - Preserve each skill’s raw permissions block as a permission_profile on SkillMetadata during skill loading. - Keep compiling that same metadata into the existing runtime Permissions object, so current enforcement behavior stays intact. - When zsh-fork intercepts execution of a script that belongs to a skill, include the skill’s permission_profile in the exec approval request. - This lets approval UIs show the extra filesystem access the skill declared when prompting for approval.	2026-02-25 01:23:10 -08:00
Michael Bolin	59398125f6	feat: zsh-fork forces scripts/*/ for skills to trigger a prompt (#12730 ) Direct skill-script matches force `Decision::Prompt`, so skill-backed scripts require explicit approval before they run. (Note "allow for session" is not supported in this PR, but will be done in a follow-up.) In the process of implementing this, I fixed an important bug: `ShellZshFork` is supposed to keep ordinary allowed execs on the client-side `Run` path so later `execve()` calls are still intercepted and reviewed. After the shell-escalation port, `Decision::Allow` still mapped to `Escalate`, which moved `zsh` to server-side execution too early. That broke the intended flow for skill-backed scripts and made the approval prompt depend on the wrong execution path. ## What changed - In `codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs`, `Decision::Allow` now returns `Run` unless escalation is actually required. - Removed the zsh-specific `argv[0]` fallback. With the `Allow -> Run` fix in place, zsh's later `execve()` of the script is intercepted normally, so the skill match happens on the script path itself. - Kept the skill-path handling in `determine_action()` focused on the direct `program` match path. ## Verification - Updated `shell_zsh_fork_prompts_for_skill_script_execution` in `codex-rs/core/tests/suite/skill_approval.rs` (gated behind `cfg(unix)`) to: - run under `SandboxPolicy::new_workspace_write_policy()` instead of `DangerFullAccess` - assert the approval command contains only the script path - assert the approved run returns both stdout and stderr markers in the shell output - Ran `cargo test -p codex-core shell_zsh_fork_prompts_for_skill_script_execution -- --nocapture` ## Manual Testing Run the dev build: ``` just codex --config zsh_path=/Users/mbolin/code/codex2/codex-rs/app-server/tests/suite/zsh --enable shell_zsh_fork ``` I have created `/Users/mbolin/.agents/skills/mbolin-test-skill` with: ``` ├── scripts │ └── hello-mbolin.sh └── SKILL.md ``` The skill: ``` --- name: mbolin-test-skill description: Used to exercise various features of skills. --- When this skill is invoked, run the `hello-mbolin.sh` script and report the output. ``` The script: ``` set -e # Note this script will fail if run with network disabled. curl --location openai.com ``` Use `$mbolin-test-skill` to invoke the skill manually and verify that I get prompted to run `hello-mbolin.sh`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12730). * #12750 * __->__ #12730	2026-02-24 23:51:26 -08:00
viyatb-oai	c086b36b58	feat(ui): add network approval persistence plumbing (#12358 ) ## Summary - add TUI approval options for persistent network host rules - add app-server v2 approval payload plumbing for network approval context + proposed network policy amendments - add app-server handling to translate `applyNetworkPolicyAmendment` decisions back into core review decisions - update docs/test client output and generated app-server schemas/types	2026-02-25 07:06:19 +00:00
Curtis 'Fjord' Hawthorne	9501669a24	tests(js_repl): remove node-related skip paths from js_repl tests (#12185 ) ## Summary Remove js_repl/node test-skip paths and make Node setup explicit in CI so js_repl tests always run instead of silently skipping. ## Why We had multiple “expediency” skip paths that let js_repl tests pass without actually exercising Node-backed behavior. This reduced CI signal and hid runtime/environment regressions. ## What changed ### CI - Added Node setup using `codex-rs/node-version.txt` in: - `.github/workflows/rust-ci.yml` - `.github/workflows/bazel.yml` - Added a Unix PATH copy step in Bazel workflow to expose the setup-node binary in common paths. ### js_repl test harness - Added explicit js_repl sandbox test configuration helpers in: - `codex-rs/core/src/tools/js_repl/mod.rs` - `codex-rs/core/src/tools/handlers/js_repl.rs` - Added Linux arg0 dispatch glue for js_repl tests so sandbox subprocess entrypoint behavior is correct under Linux test execution. ### Removed skip behavior - Deleted runtime guard function and early-return skips in js_repl tests (`can_run_js_repl_runtime_tests` and related per-test short-circuits). - Removed view_image integration test skip behavior: - dropped `skip_if_no_network!(Ok(()))` - removed “skip on Node missing/too old” branch after js_repl output inspection. ## Impact - js_repl/node tests now consistently execute and fail loudly when the environment is not correctly provisioned. - CI has stronger signal for js_repl regressions instead of false green from conditional skips. ## Testing - `cargo test -p codex-core` (locally) to validate js_repl unit/integration behavior with skips removed. - CI expected to surface any remaining environment/runtime gaps directly (rather than masking them). #### [git stack](https://github.com/magus/git-stack-cli) - ✅ `1` https://github.com/openai/codex/pull/12300 - ✅ `2` https://github.com/openai/codex/pull/12275 - ✅ `3` https://github.com/openai/codex/pull/12205 - ✅ `4` https://github.com/openai/codex/pull/12407 - ✅ `5` https://github.com/openai/codex/pull/12372 - 👉 `6` https://github.com/openai/codex/pull/12185 - ⏳ `7` https://github.com/openai/codex/pull/10673	2026-02-24 22:52:14 -08:00
Curtis 'Fjord' Hawthorne	8f3f2c3c02	tests(js_repl): stabilize CI runtime test execution (#12407 ) ## Summary Stabilize `js_repl` runtime test setup in CI and move tool-facing `js_repl` behavior coverage into integration tests. This is a test/CI change only. No production `js_repl` behavior change is intended. ## Why - Bazel test sandboxes (especially on macOS) could resolve a different `node` than the one installed by `actions/setup-node`, which caused `js_repl` runtime/version failures. - `js_repl` runtime tests depend on platform-specific sandbox/test-harness behavior, so they need explicit gating in a base-stability commit. - Several tests in the `js_repl` unit test module were actually black-box/tool-level behavior tests and fit better in the integration suite. ## Changes - Add `actions/setup-node` to the Bazel and Rust `Tests` workflows, using the exact version pinned in the repo’s Node version file. - In Bazel (non-Windows), pass `CODEX_JS_REPL_NODE_PATH=$(which node)` into test env so `js_repl` uses the `actions/setup-node` runtime inside Bazel tests. - Add a new integration test suite for `js_repl` tool behavior and register it in the core integration test suite module. - Move black-box `js_repl` behavior tests into the integration suite (persistence/TLA, builtin tool invocation, recursive self-call rejection, `process` isolation, blocked builtin imports). - Keep white-box manager/kernel tests in the `js_repl` unit test module. - Gate `js_repl` runtime tests to run only on macOS and only when a usable Node runtime is available (skip on other platforms / missing Node in this commit). ## Impact - Reduces `js_repl` CI failures caused by Node resolution drift in Bazel. - Improves test organization by separating tool-facing behavior tests from white-box manager/kernel tests. - Keeps the base commit stable while expanding `js_repl` runtime coverage. #### [git stack](https://github.com/magus/git-stack-cli) - ✅ `1` https://github.com/openai/codex/pull/12372 - 👉 `2` https://github.com/openai/codex/pull/12407 - ⏳ `3` https://github.com/openai/codex/pull/12185 - ⏳ `4` https://github.com/openai/codex/pull/10673	2026-02-24 21:04:34 -08:00
Celia Chen	16ca527c80	chore: migrate additional permissions to PermissionProfile (#12731 ) This PR replaces the old `additional_permissions.fs_read/fs_write` shape with a shared `PermissionProfile` model and wires it through the command approval, sandboxing, protocol, and TUI layers. The schema is adopted from the `SkillManifestPermissions`, which is also refactored to use this unified struct. This helps us easily expose permission profiles in app server/core as a follow-up.	2026-02-25 03:35:28 +00:00
Curtis 'Fjord' Hawthorne	125fbec317	Fix js_repl view_image attachments in nested tool calls (#12725 ) ## Summary - Fix `js_repl` so `await codex.tool("view_image", { path })` actually attaches the image to the active turn when called from inside the JS REPL. - Restore the behavior expected by the existing `js_repl` image-attachment test. - This is a follow-up to [#12553](https://github.com/openai/codex/pull/12553), which changed `view_image` to return structured image content. ## Root Cause - [#12553](https://github.com/openai/codex/pull/12553) changed `view_image` from directly injecting a pending user image message to returning structured `function_call_output` content items. - The nested tool-call bridge inside `js_repl` serialized that tool response back to the JS runtime, but it did not mirror returned image content into the active turn. - As a result, `view_image` appeared to succeed inside `js_repl`, but no `input_image` was actually attached for the outer turn. ## What Changed - Updated the nested tool-call path in `js_repl` to inspect function tool responses for structured content items. - When a nested tool response includes `input_image` content, `js_repl` now injects a corresponding user `Message` into the active turn before returning the raw tool result back to the JS runtime. - Kept the normal JSON result flow intact, so `codex.tool(...)` still returns the original tool output object to JavaScript. ## Why - `js_repl` documentation and tests already assume that `view_image` can be used from inside the REPL to attach generated images to the model. - Without this fix, the nested call path silently dropped that attachment behavior.	2026-02-24 18:23:53 -08:00
daveaitel-openai	dcab40123f	Agent jobs (spawn_agents_on_csv) + progress UI (#10935 ) ## Summary - Add agent job support: spawn a batch of sub-agents from CSV, auto-run, auto-export, and store results in SQLite. - Simplify workflow: remove run/resume/get-status/export tools; spawn is deterministic and completes in one call. - Improve exec UX: stable, single-line progress bar with ETA; suppress sub-agent chatter in exec. ## Why Enables map-reduce style workflows over arbitrarily large repos using the existing Codex orchestrator. This addresses review feedback about overly complex job controls and non-deterministic monitoring. ## Demo (progress bar) ``` ./codex-rs/target/debug/codex exec \ --enable collab \ --enable sqlite \ --full-auto \ --progress-cursor \ -c agents.max_threads=16 \ -C /Users/daveaitel/code/codex \ - <<'PROMPT' Create /tmp/agent_job_progress_demo.csv with columns: path,area and 30 rows: path = item-01..item-30, area = test. Then call spawn_agents_on_csv with: - csv_path: /tmp/agent_job_progress_demo.csv - instruction: "Run `python - <<'PY'` to sleep a random 0.3–1.2s, then output JSON with keys: path, score (int). Set score = 1." - output_csv_path: /tmp/agent_job_progress_demo_out.csv PROMPT ``` ## Review feedback addressed - Auto-start jobs on spawn; removed run/resume/status/export tools. - Auto-export on success. - More descriptive tool spec + clearer prompts. - Avoid deadlocks on spawn failure; pending/running handled safely. - Progress bar no longer scrolls; stable single-line redraw. ## Tests - `cd codex-rs && cargo test -p codex-exec` - `cd codex-rs && cargo build -p codex-cli`	2026-02-24 21:00:19 +00:00
pakrym-oai	daf0f03ac8	Ensure shell command skills trigger approval (#12697 ) Summary - detect skill-invoking shell commands based on the original command string, request approvals when needed, and cache positive decisions per session - keep implicit skill invocation emitted after approval and keep skill approval decline messaging centralized to the shell handler - expand and adjust skill approval tests to cover shell-based skill scripts while matching the new detection expectations Testing - Not run (not requested)	2026-02-24 12:13:20 -08:00
sayan-oai	0b6c2e5652	fix: also try matching namespaced prefix for modelinfo candidate (#12658 ) #### What Try matching `\w+`-namespaced model after `longest prefix` as heuristic to match `ModelInfo` from list of candidates. This shouldn't regress existing behavior: - `gpt-5.2-codex` -> `gpt-5.2` if `gpt-5.2-codex` not present - `gpt-5.3` -> `gpt-5` if `gpt-5.3` not present - `gpt-9` still doesn't match anything while being more forgiving for custom prefixes: - `oai/gpt-5.3-codex` -> `gpt-5.3-codex` #### Tests Added unit test.	2026-02-24 10:57:26 -08:00
Dylan Hurd	f6053fdfb3	feat(core) Introduce Feature::RequestPermissions (#11871 ) ## Summary Introduces the initial implementation of Feature::RequestPermissions. RequestPermissions allows the model to request that a command be run inside the sandbox, with additional permissions, like writing to a specific folder. Eventually this will include other rules as well, and the ability to persist these permissions, but this PR is already quite large - let's get the core flow working and go from there! <img width="1279" height="541" alt="Screenshot 2026-02-15 at 2 26 22 PM" src="https://github.com/user-attachments/assets/0ee3ec0f-02ec-4509-91a2-809ac80be368" /> ## Testing - [x] Added tests - [x] Tested locally - [x] Feature	2026-02-24 09:48:57 -08:00
pakrym-oai	97d0068658	Send warmup request (#11258 ) Send a request with `generate: falls` but a full set of tools and instructions to pre-warm inference. --------- Co-authored-by: Codex <noreply@openai.com>	2026-02-24 08:15:47 -08:00
sayan-oai	7e46e5b9c2	chore: rm hardcoded PRESETS list (#12650 ) rm `PRESETS` list harcoded in `model_presets` as we now have bundled `models.json` with equivalent info. update logic to rely on bundled models instead, update tests.	2026-02-23 22:35:51 -08:00
pakrym-oai	58763afa0f	Add skill approval event/response (#12633 ) Set the stage for skill-level permission approval in addition to command-level. Behind a feature flag.	2026-02-23 22:28:58 -08:00
viyatb-oai	c3048ff90a	feat(core): persist network approvals in execpolicy (#12357 ) ## Summary Persist network approval allow/deny decisions as `network_rule(...)` entries in execpolicy (not proxy config) It adds `network_rule` parsing + append support in `codex-execpolicy`, including `decision="prompt"` (parse-only; not compiled into proxy allow/deny lists) - compile execpolicy network rules into proxy allow/deny lists and update the live proxy state on approval - preserve requirements execpolicy `network_rule(...)` entries when merging with file-based execpolicy - reject broad wildcard hosts (for example `*`) for persisted `network_rule(...)`	2026-02-23 21:37:46 -08:00
Ahmed Ibrahim	10a3adad8e	Handle realtime spawn_transcript delegation (#12619 )	2026-02-23 14:39:07 -08:00
pakrym-oai	335a4e1cbc	Return image content from view_image (#12553 ) Responses API supports image content	2026-02-22 23:00:08 -08:00
Ahmed Ibrahim	e00fa19328	Revert "Revert "Route inbound realtime text into turn start or steer"" (#12480 ) With working tests this time --------- Co-authored-by: Codex <noreply@openai.com>	2026-02-22 11:54:16 -08:00
Ahmed Ibrahim	55fc075723	Send events to realtime api (#12423 ) - Send assistant messages, ExecCommandBegin, and PatchApplyBegin/PatchApplyEnd	2026-02-21 23:24:51 -08:00
Michael Bolin	b73c4b50a2	fix: make realtime conversation flake test order-insensitive (#12475 ) ## Why `codex-core::all` has a flaky test, `suite::realtime_conversation::conversation_start_audio_text_close_round_trip`, that assumes a fixed ordering between `conversation.item.create` and `response.input_audio.delta` requests. That ordering is not guaranteed: realtime text and audio input are forwarded through separate queues and a background task, so either request can be observed first while still being correct behavior. ## What Changed - Updated the assertion in `codex-rs/core/tests/suite/realtime_conversation.rs` to compare the two observed request types order-independently. - Kept the existing checks that `session.create` is sent first and that exactly two follow-up requests are recorded. ## Verification - Re-ran `cargo test -p codex-core --test all conversation_start_audio_text_close_round_trip` 10 times locally.	2026-02-21 17:06:35 -08:00
Ahmed Ibrahim	5e505ff877	Revert "Route inbound realtime text into turn start or steer" (#12479 ) Reverts openai/codex#12469	2026-02-21 15:46:03 -08:00
Ahmed Ibrahim	031d701705	Route inbound realtime text into turn start or steer (#12469 ) - Route inbound realtime websocket text into normal user input handling so it steers an active turn or starts a new one	2026-02-21 15:45:27 -08:00
pakrym-oai	b17148f13a	Prefer v2 websockets if available (#12428 ) And also cleanup settings flow to avoid reading many separate flags. --------- Co-authored-by: Codex <noreply@openai.com>	2026-02-21 20:08:04 +00:00
Michael Bolin	1af2a37ada	chore: remove codex-core public protocol/shell re-exports (#12432 ) ## Why `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules from `codex-protocol` and `codex-shell-command`. That made it easy for workspace crates to import those APIs through `codex-core`, which in turn hides dependency edges and makes it harder to reduce compile-time coupling over time. This change removes those public re-exports so call sites must import from the source crates directly. Even when a crate still depends on `codex-core` today, this makes dependency boundaries explicit and unblocks future work to drop `codex-core` dependencies where possible. ## What Changed - Removed public re-exports from `codex-rs/core/src/lib.rs` for: - `codex_protocol::protocol` and related protocol/model types (including `InitialHistory`) - `codex_protocol::config_types` (`protocol_config_types`) - `codex_shell_command::{bash, is_dangerous_command, is_safe_command, parse_command, powershell}` - Migrated workspace Rust call sites to import directly from: - `codex_protocol::protocol` - `codex_protocol::config_types` - `codex_protocol::models` - `codex_shell_command` - Added explicit `Cargo.toml` dependencies (`codex-protocol` / `codex-shell-command`) in crates that now import those crates directly. - Kept `codex-core` internal modules compiling by using `pub(crate)` aliases in `core/src/lib.rs` (internal-only, not part of the public API). - Updated the two utility crates that can already drop a `codex-core` dependency edge entirely: - `codex-utils-approval-presets` - `codex-utils-cli` ## Verification - `cargo test -p codex-utils-approval-presets` - `cargo test -p codex-utils-cli` - `cargo check --workspace --all-targets` - `just clippy`	2026-02-20 23:45:35 -08:00
Michael Bolin	1a220ad77d	chore: move config diagnostics out of codex-core (#12427 ) ## Why Compiling `codex-rs/core` is a bottleneck for local iteration, so this change continues the ongoing extraction of config-related functionality out of `codex-core` and into `codex-config`. The goal is not just to move code, but to reduce `codex-core` ownership and indirection so more code depends on `codex-config` directly. ## What Changed - Moved config diagnostics logic from `core/src/config_loader/diagnostics.rs` into `config/src/diagnostics.rs`. - Updated `codex-core` to use `codex-config` diagnostics types/functions directly where possible. - Removed the `core/src/config_loader/diagnostics.rs` shim module entirely; the remaining `ConfigToml`-specific calls are in `core/src/config_loader/mod.rs`. - Moved `CONFIG_TOML_FILE` into `codex-config` and updated existing references to use `codex_config::CONFIG_TOML_FILE` directly. - Added a direct `codex-config` dependency to `codex-cli` for its `CONFIG_TOML_FILE` use.	2026-02-20 23:19:29 -08:00
Charley Cunningham	bb0ac5be70	Fix compaction context reinjection and model baselines (#12252 ) ## Summary - move regular-turn context diff/full-context persistence into `run_turn` so pre-turn compaction runs before incoming context updates are recorded - after successful pre-turn compaction, rely on a cleared `reference_context_item` to trigger full context reinjection on the follow-up regular turn (manual `/compact` keeps replacement history summary-only and also clears the baseline) - preserve `<model_switch>` when full context is reinjected, and inject it before the rest of the full-context items - scope `reference_context_item` and `previous_model` to regular user turns only so standalone tasks (`/compact`, shell, review, undo) cannot suppress future reinjection or `<model_switch>` behavior - make context-diff persistence + `reference_context_item` updates explicit in the regular-turn path, with clearer docs/comments around the invariant - stop persisting local `/compact` `RolloutItem::TurnContext` snapshots (only regular turns persist `TurnContextItem` now) - simplify resume/fork previous-model/reference-baseline hydration by looking up the last surviving turn context from rollout lifecycle events, including rollback and compaction-crossing handling - remove the legacy fallback that guessed from bare `TurnContext` rollouts without lifecycle events - update compaction/remote-compaction/model-visible snapshots and compact test assertions (including remote compaction mock response shape) ## Why We were persisting incoming context items before spawning the regular turn task, which let pre-turn compaction requests accidentally include incoming context diffs without the new user message. Fixing that exposed follow-on baseline issues around `/compact`, resume/fork, and standalone tasks that could cause duplicate context injection or suppress `<model_switch>` instructions. This PR re-centers the invariants around regular turns: - regular turns persist model-visible context diffs/full reinjection and update the `reference_context_item` - standalone tasks do not advance those regular-turn baselines - compaction clears the baseline when replacement history may have stripped the referenced context diffs ## Follow-ups (TODOs left in code) - `TODO(ccunningham)`: fix rollback/backtracking baseline handling more comprehensively - `TODO(ccunningham)`: include pending incoming context items in pre-turn compaction threshold estimation - `TODO(ccunningham)`: inject updated personality spec alongside `<model_switch>` so some model-switch paths can avoid forced full reinjection - `TODO(ccunningham)`: review task turn lifecycle (`TurnStarted`/`TurnComplete`) behavior and emit task-start context diffs for task types that should have them (excluding `/compact`) ## Validation - `just fmt` - CI should cover the updated compaction/resume/model-visible snapshot expectations and rollout-hydration behavior - I did not rerun the full local test suite after the latest resume-lookup / rollout-persistence simplifications	2026-02-20 23:13:08 -08:00
Dylan Hurd	a8b4b569fb	fix(core) Filter non-matching prefix rules (#12314 ) ## Summary `gpt-5.3-codex` really likes to write complicated shell scripts, and suggest a partial prefix_rule that wouldn't actually approve the command. We should only show the `prefix_rule` suggestion from the model if it would actually fully approve the command the user is seeing. This will technically cause more instances of overly-specific suggestions when we fallback, but I think the UX is clearer, particularly when the model doesn't necessarily understand the current limitations of execpolicy parsing. ## Testing - [x] Add unit tests - [x] Add integration tests	2026-02-20 22:02:35 -08:00
Ahmed Ibrahim	b237f7cbb1	Add experimental realtime websocket backend prompt override (#12418 ) - add top-level `experimental_realtime_ws_backend_prompt` config key (experimental / do not use) and include it in config schema - apply the override only to `Op::RealtimeConversation` websocket `backend_prompt`, with config + realtime tests	2026-02-20 20:10:51 -08:00
Ahmed Ibrahim	7ae5d88016	Add experimental realtime websocket URL override (#12416 ) - add top-level `experimental_realtime_ws_base_url` config key (experimental / do not use) and include it in config schema - apply the override only to `Op::RealtimeConversation` websocket transport, with config + realtime tests	2026-02-20 19:51:20 -08:00
Ahmed Ibrahim	6817f0be8a	Wire realtime api to core (#12268 ) - Introduce `RealtimeConversationManager` for realtime API management - Add `op::conversation` to start conversation, insert audio, insert text, and close conversation. - emit conversation lifecycle and realtime events. - Move shared realtime payload types into codex-protocol and add core e2e websocket tests for start/replace/transport-close paths. Things to consider: - Should we use the same `op::` and `Events` channel to carry audio? I think we should try this simple approach and later we can create separate one if the channels got congested. - Sending text updates to the client: we can start simple and later restrict that. - Provider auth isn't wired for now intentionally	2026-02-20 19:06:35 -08:00
viyatb-oai	60c2b7beca	core tests: use hermetic mock server in review suite (#12291 ) ## Summary - switch the review test SSE mock helper to use the shared hermetic mock server setup - ensure review tests always have a default `/v1/models` stub during Codex session bootstrap - remove the race that caused intermittent `/v1/models` connection failures and flaky ETag refresh assertions ## Testing - `just fmt` - `cargo test -p codex-core --test all refresh_models_on_models_etag_mismatch_and_avoid_duplicate_models_fetch` - `cargo test -p codex-core --test all review_uses_custom_review_model_from_config` - repeated both targeted tests 5x in a loop - `cargo clippy -p codex-core --tests -- -D warnings`	2026-02-20 12:50:12 -08:00
viyatb-oai	e8afaed502	Refactor network approvals to host/protocol/port scope (#12140 ) ## Summary Simplify network approvals by removing per-attempt proxy correlation and moving to session-level approval dedupe keyed by (host, protocol, port). Instead of encoding attempt IDs into proxy credentials/URLs, we now treat approvals as a destination policy decision. - Concurrent calls to the same destination share one approval prompt. - Different destinations (or same host on different ports) get separate prompts. - Allow once approves the current queued request group only. - Allow for session caches that (host, protocol, port) and auto-allows future matching requests. - Never policy continues to deny without prompting. Example: - 3 calls: - a.com (line 443) - b.com (line 443) - a.com (line 443) => 2 prompts total (a, b), second a waits on the first decision. - a.com:80 is treated separately from a.com line 443 ## Testing - `just fmt` (in `codex-rs`) - `cargo test -p codex-core tools::network_approval::tests` - `cargo test -p codex-core` (unit tests pass; existing integration-suite failures remain in this environment)	2026-02-20 10:39:55 -08:00
pakrym-oai	86803ca9bf	Reuse connection between turns (#12294 ) Add a pool of one to the model client to reuse connections across turns.	2026-02-20 10:09:46 -08:00
colby-oai	2036a5f5e0	Add MCP server context to otel tool_result logs (#12267 ) Summary - capture the origin for each configured MCP server and expose it via the connection manager - plumb MCP server name/origin into tool logging and emit codex.tool_result events with those fields - add unit coverage for origin parsing and extend OTEL tests to assert empty MCP fields for non-MCP tools - currently not logging full urls or url paths to prevent logging potentially sensitive data Testing - Not run (not requested)	2026-02-20 10:26:19 -05:00
jif-oai	0f9eed3a6f	feat: add nick name to sub-agents (#12320 ) Adding random nick name to sub-agents. Used for UX At the same time, also storing and wiring the role of the sub-agent	2026-02-20 14:39:49 +00:00
dkumar-oai	1070a0a712	Add configurable MCP OAuth callback URL for MCP login (#11382 ) ## Summary Implements a configurable MCP OAuth callback URL override for `codex mcp login` and app-server OAuth login flows, including support for non-local callback endpoints (for example, devbox ingress URLs). ## What changed - Added new config key: `mcp_oauth_callback_url` in `~/.codex/config.toml`. - OAuth authorization now uses `mcp_oauth_callback_url` as `redirect_uri` when set. - Callback handling validates the callback path against the configured redirect URI path. - Listener bind behavior is now host-aware: - local callback URL hosts (`localhost`, `127.0.0.1`, `::1`) bind to `127.0.0.1` - non-local callback URL hosts bind to `0.0.0.0` - `mcp_oauth_callback_port` remains supported and is used for the listener port. - Wired through: - CLI MCP login flow - App-server MCP OAuth login flow - Skill dependency OAuth login flow - Updated config schema and config tests. ## Why Some environments need OAuth callbacks to land on a specific reachable URL (for example ingress in remote devboxes), not loopback. This change allows that while preserving local defaults for existing users. ## Backward compatibility - No behavior change when `mcp_oauth_callback_url` is unset. - Existing `mcp_oauth_callback_port` behavior remains intact. - Local callback flows continue binding to loopback by default. ## Testing - `cargo test -p codex-rmcp-client callback -- --nocapture` - `cargo test -p codex-core --lib mcp_oauth_callback -- --nocapture` - `cargo check -p codex-cli -p codex-app-server -p codex-rmcp-client` ## Example config ```toml mcp_oauth_callback_port = 5555 mcp_oauth_callback_url = "https://<devbox>-<namespace>.gateway.<cluster>.internal.api.openai.org/callback"	2026-02-19 13:32:10 -08:00
pash-openai	429cc4860e	ws turn metadata via client_metadata (#11953 )	2026-02-19 12:28:15 -08:00
jif-oai	f6c06108b1	try fix 2 (#12264 )	2026-02-19 19:36:42 +00:00
jif-oai	928be5f515	Revert "feat: no timeout mode on ue" (#12256 ) Reverts openai/codex#12250	2026-02-19 19:02:29 +00:00
jif-oai	9719dc502c	feat: no timeout mode on ue (#12250 )	2026-02-19 18:58:13 +00:00
sayan-oai	d54999d006	client side modelinfo overrides (#12101 ) TL;DR Add top-level `model_catalog_json` config support so users can supply a local model catalog override from a JSON file path (including adding new models) without backend changes. ### Problem Codex previously had no clean client-side way to replace/overlay model catalog data for local testing of model metadata and new model entries. ### Fix - Add top-level `model_catalog_json` config field (JSON file path). - Apply catalog entries when resolving `ModelInfo`: 1. Base resolved model metadata (remote/fallback) 2. Catalog overlay from `model_catalog_json` 3. Existing global top-level overrides (`model_context_window`, `model_supports_reasoning_summaries`, etc.) ### Note Will revisit per-field overrides in a follow-up ### Tests Added tests	2026-02-19 10:38:57 -08:00
jif-oai	2daa3fd44f	feat: sub-agent injection (#12152 ) This PR adds parent-thread sub-agent completion notifications and change the prompt of the model to prevent if from being confused	2026-02-19 11:32:10 +00:00
Eric Traut	227352257c	Update docs links for feature flag notice (#12164 ) Summary - replace the stale `docs/config.md#feature-flags` reference in the legacy feature notice with the canonical published URL - align the deprecation notice test to expect the new link This addresses #12123	2026-02-19 00:00:44 -08:00
Eric Traut	999576f7b8	Fixed a hole in token refresh logic for app server (#11802 ) We've continued to receive reports from users that they're seeing the error message "Your access token could not be refreshed because your refresh token was already used. Please log out and sign in again." This PR fixes two holes in the token refresh logic that lead to this condition. Background: A previous change in token refresh introduced the `UnauthorizedRecovery` object. It implements a state machine in the core agent loop that first performs a load of the on-disk auth information guarded by a check for matching account ID. If it finds that the on-disk version has been updated by another instance of codex, it uses the reloaded auth tokens. If the on-disk version hasn't been updated, it issues a refresh request from the token authority. There are two problems that this PR addresses: Problem 1: We weren't doing the same thing for the code path used by the app server interface. This PR effectively replicates the `UnauthorizedRecovery` logic for that code path. Problem 2: The `UnauthorizedRecovery` logic contained a hole in the `ReloadOutcome::Skipped` case. Here's the scenario. A user starts two instances of the CLI. Instance 1 is active (working on a task), instance 2 is idle. Both instances have the same in-memory cached tokens. The user then runs `codex logout` or `codex login` to log in to a separate account, which overwrites the `auth.json` file. Instance 1 receives a 401 and refreshes its token, but it doesn't write the new token to the `auth.json` file because the account ID doesn't match. Instance 2 is later activated and presented with a new task. It immediately hits a 401 and attempts to refresh its token but fails because its cached refresh token is now invalid. To avoid this situation, I've changed the logic to immediately fail a token refresh if the user has since logged out or logged in to another account. This will still be seen as an error by the user, but the cause will be clearer. I also took this opportunity to clean up the names of existing functions to make their roles clearer. * `try_refresh_token` is renamed `request_chatgpt_token_refresh` * the existing `refresh_token` is renamed `refresh_token_from_authority` (there's a new higher-level function named `refresh_token` now) * `refresh_tokens` is renamed `refresh_and_persist_chatgpt_token`, and it now implicitly reloads * `update_tokens` is renamed `persist_tokens`	2026-02-18 09:27:04 -08:00
Charley Cunningham	c16f9daaaf	Add model-visible context layout snapshot tests (#12073 ) ## Summary - add a dedicated `core/tests/suite/model_visible_layout.rs` snapshot suite to materialize model-visible request layout in high-value scenarios - add three reviewer-focused snapshot scenarios: - turn-level context updates (cwd / permissions / personality) - first post-resume turn with model hydration + personality change - first post-resume turn where pre-turn model override matches rollout model - wire the new suite into `core/tests/suite/mod.rs` - commit generated `insta` snapshots under `core/tests/suite/snapshots/` ## Why This creates a stable, reviewable baseline of model-visible context layout against `main` before follow-on context-management refactors. It lets subsequent PRs show focused snapshot diffs for behavior changes instead of introducing the test surface and behavior changes at once. ## Testing - `just fmt` - `INSTA_UPDATE=always cargo test -p codex-core model_visible_layout`	2026-02-17 22:30:29 -08:00
pakrym-oai	fc810ba045	Use V2 websockets if feature enabled (#12071 )	2026-02-17 18:32:16 -08:00
Charley Cunningham	eb68767f2f	Unify remote compaction snapshot mocks around default endpoint behavior (#12050 ) ## Summary - standardize remote compaction test mocking around one default behavior in shared helpers - make default remote compact mocks mirror production shape: keep `message/user` + `message/developer`, drop assistant/tool artifacts, then append a summary user message - switch non-special `compact_remote` tests to the shared default mock instead of ad-hoc JSON payloads ## Special-case tests that still use explicit mocks - remote compaction error payload / HTTP failure behavior - summary-only compact output behavior - manual `/compact` with no prior user messages - stale developer-instruction injection coverage ## Why This removes inconsistent manual remote compaction fixtures and gives us one source of truth for normal remote compact behavior, while preserving explicit mocks only where tests intentionally cover non-default behavior.	2026-02-17 18:18:47 -08:00

1 2 3 4 5 ...

615 Commits