codex

mirror of https://github.com/openai/codex.git synced 2026-04-30 11:21:34 +03:00

Author	SHA1	Message	Date
Curtis 'Fjord' Hawthorne	40ab71a985	Disable js_repl when Node is incompatible at startup (#12824 ) ## Summary - validate `js_repl` Node compatibility during session startup when the experiment is enabled - if Node is missing or too old, disable `js_repl` and `js_repl_tools_only` for the session before tools and instructions are built - surface that startup disablement to users through the existing startup warning flow instead of only logging it - reuse the same compatibility check in js_repl kernel startup so startup gating and runtime behavior stay aligned - add a regression test that verifies the warning is emitted and that the first advertised tool list omits `js_repl` and `js_repl_reset` when Node is incompatible ## Why Today `js_repl` can be advertised based only on the feature flag, then fail later when the kernel starts. That makes the available tool list inaccurate at the start of a conversation, and users do not get a clear explanation for why the tool is unavailable. This change makes tool availability reflect real startup checks, keeps the advertised tool set stable for the lifetime of the session, and gives users a visible warning when `js_repl` is disabled. ## Testing - `just fmt` - `cargo test -p codex-core --test all js_repl_is_not_advertised_when_startup_node_is_incompatible`	2026-02-26 01:14:51 +00:00
Michael Bolin	14116ade8d	feat: include available decisions in command approval requests (#12758 ) Command-approval clients currently infer which choices to show from side-channel fields like `networkApprovalContext`, `proposedExecpolicyAmendment`, and `additionalPermissions`. That makes the request shape harder to evolve, and it forces each client to replicate the server's heuristics instead of receiving the exact decision list for the prompt. This PR introduces a mapping between `CommandExecutionApprovalDecision` and `codex_protocol::protocol::ReviewDecision`: ```rust impl From<CoreReviewDecision> for CommandExecutionApprovalDecision { fn from(value: CoreReviewDecision) -> Self { match value { CoreReviewDecision::Approved => Self::Accept, CoreReviewDecision::ApprovedExecpolicyAmendment { proposed_execpolicy_amendment, } => Self::AcceptWithExecpolicyAmendment { execpolicy_amendment: proposed_execpolicy_amendment.into(), }, CoreReviewDecision::ApprovedForSession => Self::AcceptForSession, CoreReviewDecision::NetworkPolicyAmendment { network_policy_amendment, } => Self::ApplyNetworkPolicyAmendment { network_policy_amendment: network_policy_amendment.into(), }, CoreReviewDecision::Abort => Self::Cancel, CoreReviewDecision::Denied => Self::Decline, } } } ``` And updates `CommandExecutionRequestApprovalParams` to have a new field: ```rust available_decisions: Option<Vec<CommandExecutionApprovalDecision>> ``` when, if specified, should make it easier for clients to display an appropriate list of options in the UI. This makes it possible for `CoreShellActionProvider::prompt()` in `unix_escalation.rs` to specify the `Vec<ReviewDecision>` directly, adding support for `ApprovedForSession` when approving a skill script, which was previously missing in the TUI. Note this results in a significant change to `exec_options()` in `approval_overlay.rs`, as the displayed options are now derived from `available_decisions: &[ReviewDecision]`. ## What Changed - Add `available_decisions` to [`ExecApprovalRequestEvent`](`de00e932dd/codex-rs/protocol/src/approvals.rs (L111-L175)`), including helpers to derive the legacy default choices when older senders omit the field. - Map `codex_protocol::protocol::ReviewDecision` to app-server `CommandExecutionApprovalDecision` and expose the ordered list as experimental `availableDecisions` in [`CommandExecutionRequestApprovalParams`](`de00e932dd/codex-rs/app-server-protocol/src/protocol/v2.rs (L3798-L3807)`). - Thread optional `available_decisions` through the core approval path so Unix shell escalation can explicitly request `ApprovedForSession` for session-scoped approvals instead of relying on client heuristics. [`unix_escalation.rs`](`de00e932dd/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs (L194-L214)`) - Update the TUI approval overlay to build its buttons from the ordered decision list, while preserving the legacy fallback when `available_decisions` is missing. - Update the app-server README, test client output, and generated schema artifacts to document and surface the new field. ## Testing - Add `approval_overlay.rs` coverage for explicit decision lists, including the generic `ApprovedForSession` path and network approval options. - Update `chatwidget/tests.rs` and app-server protocol tests to populate the new optional field and keep older event shapes working. ## Developers Docs - If we document `item/commandExecution/requestApproval` on [developers.openai.com/codex](https://developers.openai.com/codex), add experimental `availableDecisions` as the preferred source of approval choices and note that older servers may omit it.	2026-02-26 01:10:46 +00:00
Celia Chen	4f45668106	Revert "Add skill approval event/response (#12633 )" (#12811 ) This reverts commit https://github.com/openai/codex/pull/12633. We no longer need this PR, because we favor sending normal exec command approval server request with `additional_permissions` of skill permissions instead	2026-02-26 01:02:42 +00:00
pakrym-oai	4fedef88e0	Use websocket v2 as model-preferred websocket protocol (#12838 )	2026-02-25 16:35:53 -08:00
Charley Cunningham	2f4d6ded1d	Enable request_user_input in Default mode (#12735 ) ## Summary - allow `request_user_input` in Default collaboration mode as well as Plan - update the Default-mode instructions to prefer assumptions first and use `request_user_input` only when a question is unavoidable - update request_user_input and app-server tests to match the new Default-mode behavior - refactor collaboration-mode availability plumbing into `CollaborationModesConfig` for future mode-related flags ## Codex author `codex resume 019c9124-ed28-7c13-96c6-b916b1c97d49`	2026-02-25 15:20:46 -08:00
Celia Chen	b6d20748e0	Revert "Ensure shell command skills trigger approval (#12697 )" (#12721 ) This reverts commit `daf0f03ac8`. # External (non-OpenAI) Pull Request Requirements Before opening this Pull Request, please read the dedicated "Contributing" markdown file or your PR may be closed: https://github.com/openai/codex/blob/main/docs/contributing.md If your PR conforms to our contribution guidelines, replace this text with a detailed and high quality description of your changes. Include a link to a bug report or enhancement request.	2026-02-25 22:49:53 +00:00
sayan-oai	d45ffd5830	make 5.3-codex visible in cli for api users (#12808 ) 5.3-codex released in api, mark it visible for API users via bundled `models.json`.	2026-02-25 13:01:40 -08:00
Rasmus Rygaard	73eaebbd1c	Propagate session ID when compacting (#12802 ) We propagate the session ID when sending requests for inference but we don't do the same for compaction requests. This makes it hard to link compaction requests to their session for debugging purposes	2026-02-25 19:17:38 +00:00
Michael Bolin	648a420cbf	fix: enforce sandbox envelope for zsh fork execution (#12800 ) ## Why Zsh fork execution was still able to bypass the `WorkspaceWrite` model in edge cases because the fork path reconstructed command execution without preserving sandbox wrappers, and command extraction only accepted shell invocations in a narrow positional shape. This can allow commands to run with broader filesystem access than expected, which breaks the sandbox safety model. ## What changed - Preserved the sandboxed `ExecRequest` produced by `attempt.env_for(...)` when entering the zsh fork path in [`unix_escalation.rs`](https://github.com/openai/codex/blob/main/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs). - Updated `CoreShellCommandExecutor` to execute the sandboxed command and working directory captured from `attempt.env_for(...)`, instead of re-running a freshly reconstructed shell command. - Made zsh-fork script extraction robust to wrapped invocations by scanning command arguments for `-c`/`-lc` rather than only matching the first positional form. - Added unit tests in `unix_escalation.rs` to lock in wrapper-tolerant parsing behavior and keep unsupported shell forms rejected. - Tightened the regression in [`skill_approval.rs`](https://github.com/openai/codex/blob/main/codex-rs/core/tests/suite/skill_approval.rs): - `shell_zsh_fork_still_enforces_workspace_write_sandbox` now uses an explicit `WorkspaceWrite` policy with `exclude_tmpdir_env_var: true` and `exclude_slash_tmp: true`. - The test attempts to write to `/tmp/...`, which is only reliably outside writable roots with those explicit exclusions set. ## Verification - Added and passed the new unit tests around `extract_shell_script` parsing behavior with wrapped command shapes. - `extract_shell_script_supports_wrapped_command_prefixes` - `extract_shell_script_rejects_unsupported_shell_invocation` - Verified the regression with the focused integration test: `shell_zsh_fork_still_enforces_workspace_write_sandbox`. ## Manual Testing Prior to this change, if I ran Codex via: ``` just codex --config zsh_path=/Users/mbolin/code/codex2/codex-rs/app-server/tests/suite/zsh --enable shell_zsh_fork ``` and asked: ``` what is the output of /bin/ps ``` it would run it, even though the default sandbox should prevent the agent from running `/bin/ps` because it is setuid on MacOS. But with this change, I now see the expected failure because it is blocked by the sandbox: ``` /bin/ps exited with status 1 and produced no output in this environment. ```	2026-02-25 11:05:27 -08:00
pakrym-oai	9d7013eab0	Handle websocket timeout (#12791 ) Sometimes websockets will timeout with 400 error, ensure we retry it.	2026-02-25 10:31:37 -08:00
jif-oai	5441130e0a	feat: adding stream parser (#12666 ) Add a stream parser to extract citations (and others) from a stream. This support cases where markers are split in differen tokens. Codex never manage to make this code work so everything was done manually. Please review correctly and do not touch this part of the code without a very clear understanding of it	2026-02-25 13:27:58 +00:00
jif-oai	5a9a5b51b2	feat: add large stack test macro (#12768 ) This PR adds the macro `#[large_stack_test]` This spawns the tests in a dedicated tokio runtime with a larger stack. It is useful for tests that needs the full recursion on the harness (which is now too deep for windows for example)	2026-02-25 13:19:21 +00:00
jif-oai	10c04e11b8	feat: add service name to app-server (#12319 ) Add service name to the app-server so that the app can use it's own service name This is on thread level because later we might plan the app-server to become a singleton on the computer	2026-02-25 09:51:42 +00:00
Celia Chen	6a3233da64	Surface skill permission profiles in zsh-fork exec approvals (#12753 ) ## Summary - Preserve each skill’s raw permissions block as a permission_profile on SkillMetadata during skill loading. - Keep compiling that same metadata into the existing runtime Permissions object, so current enforcement behavior stays intact. - When zsh-fork intercepts execution of a script that belongs to a skill, include the skill’s permission_profile in the exec approval request. - This lets approval UIs show the extra filesystem access the skill declared when prompting for approval.	2026-02-25 01:23:10 -08:00
Michael Bolin	59398125f6	feat: zsh-fork forces scripts/*/ for skills to trigger a prompt (#12730 ) Direct skill-script matches force `Decision::Prompt`, so skill-backed scripts require explicit approval before they run. (Note "allow for session" is not supported in this PR, but will be done in a follow-up.) In the process of implementing this, I fixed an important bug: `ShellZshFork` is supposed to keep ordinary allowed execs on the client-side `Run` path so later `execve()` calls are still intercepted and reviewed. After the shell-escalation port, `Decision::Allow` still mapped to `Escalate`, which moved `zsh` to server-side execution too early. That broke the intended flow for skill-backed scripts and made the approval prompt depend on the wrong execution path. ## What changed - In `codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs`, `Decision::Allow` now returns `Run` unless escalation is actually required. - Removed the zsh-specific `argv[0]` fallback. With the `Allow -> Run` fix in place, zsh's later `execve()` of the script is intercepted normally, so the skill match happens on the script path itself. - Kept the skill-path handling in `determine_action()` focused on the direct `program` match path. ## Verification - Updated `shell_zsh_fork_prompts_for_skill_script_execution` in `codex-rs/core/tests/suite/skill_approval.rs` (gated behind `cfg(unix)`) to: - run under `SandboxPolicy::new_workspace_write_policy()` instead of `DangerFullAccess` - assert the approval command contains only the script path - assert the approved run returns both stdout and stderr markers in the shell output - Ran `cargo test -p codex-core shell_zsh_fork_prompts_for_skill_script_execution -- --nocapture` ## Manual Testing Run the dev build: ``` just codex --config zsh_path=/Users/mbolin/code/codex2/codex-rs/app-server/tests/suite/zsh --enable shell_zsh_fork ``` I have created `/Users/mbolin/.agents/skills/mbolin-test-skill` with: ``` ├── scripts │ └── hello-mbolin.sh └── SKILL.md ``` The skill: ``` --- name: mbolin-test-skill description: Used to exercise various features of skills. --- When this skill is invoked, run the `hello-mbolin.sh` script and report the output. ``` The script: ``` set -e # Note this script will fail if run with network disabled. curl --location openai.com ``` Use `$mbolin-test-skill` to invoke the skill manually and verify that I get prompted to run `hello-mbolin.sh`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12730). * #12750 * __->__ #12730	2026-02-24 23:51:26 -08:00
viyatb-oai	c086b36b58	feat(ui): add network approval persistence plumbing (#12358 ) ## Summary - add TUI approval options for persistent network host rules - add app-server v2 approval payload plumbing for network approval context + proposed network policy amendments - add app-server handling to translate `applyNetworkPolicyAmendment` decisions back into core review decisions - update docs/test client output and generated app-server schemas/types	2026-02-25 07:06:19 +00:00
Curtis 'Fjord' Hawthorne	9501669a24	tests(js_repl): remove node-related skip paths from js_repl tests (#12185 ) ## Summary Remove js_repl/node test-skip paths and make Node setup explicit in CI so js_repl tests always run instead of silently skipping. ## Why We had multiple “expediency” skip paths that let js_repl tests pass without actually exercising Node-backed behavior. This reduced CI signal and hid runtime/environment regressions. ## What changed ### CI - Added Node setup using `codex-rs/node-version.txt` in: - `.github/workflows/rust-ci.yml` - `.github/workflows/bazel.yml` - Added a Unix PATH copy step in Bazel workflow to expose the setup-node binary in common paths. ### js_repl test harness - Added explicit js_repl sandbox test configuration helpers in: - `codex-rs/core/src/tools/js_repl/mod.rs` - `codex-rs/core/src/tools/handlers/js_repl.rs` - Added Linux arg0 dispatch glue for js_repl tests so sandbox subprocess entrypoint behavior is correct under Linux test execution. ### Removed skip behavior - Deleted runtime guard function and early-return skips in js_repl tests (`can_run_js_repl_runtime_tests` and related per-test short-circuits). - Removed view_image integration test skip behavior: - dropped `skip_if_no_network!(Ok(()))` - removed “skip on Node missing/too old” branch after js_repl output inspection. ## Impact - js_repl/node tests now consistently execute and fail loudly when the environment is not correctly provisioned. - CI has stronger signal for js_repl regressions instead of false green from conditional skips. ## Testing - `cargo test -p codex-core` (locally) to validate js_repl unit/integration behavior with skips removed. - CI expected to surface any remaining environment/runtime gaps directly (rather than masking them). #### [git stack](https://github.com/magus/git-stack-cli) - ✅ `1` https://github.com/openai/codex/pull/12300 - ✅ `2` https://github.com/openai/codex/pull/12275 - ✅ `3` https://github.com/openai/codex/pull/12205 - ✅ `4` https://github.com/openai/codex/pull/12407 - ✅ `5` https://github.com/openai/codex/pull/12372 - 👉 `6` https://github.com/openai/codex/pull/12185 - ⏳ `7` https://github.com/openai/codex/pull/10673	2026-02-24 22:52:14 -08:00
Curtis 'Fjord' Hawthorne	8f3f2c3c02	tests(js_repl): stabilize CI runtime test execution (#12407 ) ## Summary Stabilize `js_repl` runtime test setup in CI and move tool-facing `js_repl` behavior coverage into integration tests. This is a test/CI change only. No production `js_repl` behavior change is intended. ## Why - Bazel test sandboxes (especially on macOS) could resolve a different `node` than the one installed by `actions/setup-node`, which caused `js_repl` runtime/version failures. - `js_repl` runtime tests depend on platform-specific sandbox/test-harness behavior, so they need explicit gating in a base-stability commit. - Several tests in the `js_repl` unit test module were actually black-box/tool-level behavior tests and fit better in the integration suite. ## Changes - Add `actions/setup-node` to the Bazel and Rust `Tests` workflows, using the exact version pinned in the repo’s Node version file. - In Bazel (non-Windows), pass `CODEX_JS_REPL_NODE_PATH=$(which node)` into test env so `js_repl` uses the `actions/setup-node` runtime inside Bazel tests. - Add a new integration test suite for `js_repl` tool behavior and register it in the core integration test suite module. - Move black-box `js_repl` behavior tests into the integration suite (persistence/TLA, builtin tool invocation, recursive self-call rejection, `process` isolation, blocked builtin imports). - Keep white-box manager/kernel tests in the `js_repl` unit test module. - Gate `js_repl` runtime tests to run only on macOS and only when a usable Node runtime is available (skip on other platforms / missing Node in this commit). ## Impact - Reduces `js_repl` CI failures caused by Node resolution drift in Bazel. - Improves test organization by separating tool-facing behavior tests from white-box manager/kernel tests. - Keeps the base commit stable while expanding `js_repl` runtime coverage. #### [git stack](https://github.com/magus/git-stack-cli) - ✅ `1` https://github.com/openai/codex/pull/12372 - 👉 `2` https://github.com/openai/codex/pull/12407 - ⏳ `3` https://github.com/openai/codex/pull/12185 - ⏳ `4` https://github.com/openai/codex/pull/10673	2026-02-24 21:04:34 -08:00
Celia Chen	16ca527c80	chore: migrate additional permissions to PermissionProfile (#12731 ) This PR replaces the old `additional_permissions.fs_read/fs_write` shape with a shared `PermissionProfile` model and wires it through the command approval, sandboxing, protocol, and TUI layers. The schema is adopted from the `SkillManifestPermissions`, which is also refactored to use this unified struct. This helps us easily expose permission profiles in app server/core as a follow-up.	2026-02-25 03:35:28 +00:00
Curtis 'Fjord' Hawthorne	125fbec317	Fix js_repl view_image attachments in nested tool calls (#12725 ) ## Summary - Fix `js_repl` so `await codex.tool("view_image", { path })` actually attaches the image to the active turn when called from inside the JS REPL. - Restore the behavior expected by the existing `js_repl` image-attachment test. - This is a follow-up to [#12553](https://github.com/openai/codex/pull/12553), which changed `view_image` to return structured image content. ## Root Cause - [#12553](https://github.com/openai/codex/pull/12553) changed `view_image` from directly injecting a pending user image message to returning structured `function_call_output` content items. - The nested tool-call bridge inside `js_repl` serialized that tool response back to the JS runtime, but it did not mirror returned image content into the active turn. - As a result, `view_image` appeared to succeed inside `js_repl`, but no `input_image` was actually attached for the outer turn. ## What Changed - Updated the nested tool-call path in `js_repl` to inspect function tool responses for structured content items. - When a nested tool response includes `input_image` content, `js_repl` now injects a corresponding user `Message` into the active turn before returning the raw tool result back to the JS runtime. - Kept the normal JSON result flow intact, so `codex.tool(...)` still returns the original tool output object to JavaScript. ## Why - `js_repl` documentation and tests already assume that `view_image` can be used from inside the REPL to attach generated images to the model. - Without this fix, the nested call path silently dropped that attachment behavior.	2026-02-24 18:23:53 -08:00
daveaitel-openai	dcab40123f	Agent jobs (spawn_agents_on_csv) + progress UI (#10935 ) ## Summary - Add agent job support: spawn a batch of sub-agents from CSV, auto-run, auto-export, and store results in SQLite. - Simplify workflow: remove run/resume/get-status/export tools; spawn is deterministic and completes in one call. - Improve exec UX: stable, single-line progress bar with ETA; suppress sub-agent chatter in exec. ## Why Enables map-reduce style workflows over arbitrarily large repos using the existing Codex orchestrator. This addresses review feedback about overly complex job controls and non-deterministic monitoring. ## Demo (progress bar) ``` ./codex-rs/target/debug/codex exec \ --enable collab \ --enable sqlite \ --full-auto \ --progress-cursor \ -c agents.max_threads=16 \ -C /Users/daveaitel/code/codex \ - <<'PROMPT' Create /tmp/agent_job_progress_demo.csv with columns: path,area and 30 rows: path = item-01..item-30, area = test. Then call spawn_agents_on_csv with: - csv_path: /tmp/agent_job_progress_demo.csv - instruction: "Run `python - <<'PY'` to sleep a random 0.3–1.2s, then output JSON with keys: path, score (int). Set score = 1." - output_csv_path: /tmp/agent_job_progress_demo_out.csv PROMPT ``` ## Review feedback addressed - Auto-start jobs on spawn; removed run/resume/status/export tools. - Auto-export on success. - More descriptive tool spec + clearer prompts. - Avoid deadlocks on spawn failure; pending/running handled safely. - Progress bar no longer scrolls; stable single-line redraw. ## Tests - `cd codex-rs && cargo test -p codex-exec` - `cd codex-rs && cargo build -p codex-cli`	2026-02-24 21:00:19 +00:00
pakrym-oai	daf0f03ac8	Ensure shell command skills trigger approval (#12697 ) Summary - detect skill-invoking shell commands based on the original command string, request approvals when needed, and cache positive decisions per session - keep implicit skill invocation emitted after approval and keep skill approval decline messaging centralized to the shell handler - expand and adjust skill approval tests to cover shell-based skill scripts while matching the new detection expectations Testing - Not run (not requested)	2026-02-24 12:13:20 -08:00
sayan-oai	0b6c2e5652	fix: also try matching namespaced prefix for modelinfo candidate (#12658 ) #### What Try matching `\w+`-namespaced model after `longest prefix` as heuristic to match `ModelInfo` from list of candidates. This shouldn't regress existing behavior: - `gpt-5.2-codex` -> `gpt-5.2` if `gpt-5.2-codex` not present - `gpt-5.3` -> `gpt-5` if `gpt-5.3` not present - `gpt-9` still doesn't match anything while being more forgiving for custom prefixes: - `oai/gpt-5.3-codex` -> `gpt-5.3-codex` #### Tests Added unit test.	2026-02-24 10:57:26 -08:00
Dylan Hurd	f6053fdfb3	feat(core) Introduce Feature::RequestPermissions (#11871 ) ## Summary Introduces the initial implementation of Feature::RequestPermissions. RequestPermissions allows the model to request that a command be run inside the sandbox, with additional permissions, like writing to a specific folder. Eventually this will include other rules as well, and the ability to persist these permissions, but this PR is already quite large - let's get the core flow working and go from there! <img width="1279" height="541" alt="Screenshot 2026-02-15 at 2 26 22 PM" src="https://github.com/user-attachments/assets/0ee3ec0f-02ec-4509-91a2-809ac80be368" /> ## Testing - [x] Added tests - [x] Tested locally - [x] Feature	2026-02-24 09:48:57 -08:00
pakrym-oai	97d0068658	Send warmup request (#11258 ) Send a request with `generate: falls` but a full set of tools and instructions to pre-warm inference. --------- Co-authored-by: Codex <noreply@openai.com>	2026-02-24 08:15:47 -08:00
sayan-oai	7e46e5b9c2	chore: rm hardcoded PRESETS list (#12650 ) rm `PRESETS` list harcoded in `model_presets` as we now have bundled `models.json` with equivalent info. update logic to rely on bundled models instead, update tests.	2026-02-23 22:35:51 -08:00
pakrym-oai	58763afa0f	Add skill approval event/response (#12633 ) Set the stage for skill-level permission approval in addition to command-level. Behind a feature flag.	2026-02-23 22:28:58 -08:00
viyatb-oai	c3048ff90a	feat(core): persist network approvals in execpolicy (#12357 ) ## Summary Persist network approval allow/deny decisions as `network_rule(...)` entries in execpolicy (not proxy config) It adds `network_rule` parsing + append support in `codex-execpolicy`, including `decision="prompt"` (parse-only; not compiled into proxy allow/deny lists) - compile execpolicy network rules into proxy allow/deny lists and update the live proxy state on approval - preserve requirements execpolicy `network_rule(...)` entries when merging with file-based execpolicy - reject broad wildcard hosts (for example `*`) for persisted `network_rule(...)`	2026-02-23 21:37:46 -08:00
Ahmed Ibrahim	10a3adad8e	Handle realtime spawn_transcript delegation (#12619 )	2026-02-23 14:39:07 -08:00
pakrym-oai	335a4e1cbc	Return image content from view_image (#12553 ) Responses API supports image content	2026-02-22 23:00:08 -08:00
Michael Bolin	e8949f4507	test: vendor zsh fork via DotSlash and stabilize zsh-fork tests (#12518 ) ## Why The zsh integration tests were still brittle in two ways: - they relied on `CODEX_TEST_ZSH_PATH` / environment-specific setup, so they often did not exercise the patched zsh fork that `shell-tool-mcp` ships - once the tests consistently used the vendored zsh fork, they exposed real Linux-specific zsh-fork issues in CI In particular, the Linux failures were not just test noise: - the zsh-fork launch path was dropping `ExecRequest.arg0`, so Linux `codex-linux-sandbox` arg0 dispatch did not run and zsh wrapper-mode could receive malformed arguments - the `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2` test uses the zsh exec bridge (which talks to the parent over a Unix socket), but Linux restricted sandbox seccomp denies `connect(2)`, causing timeouts on `ubuntu-24.04` x86/arm This PR makes the zsh tests consistently run against the intended vendored zsh fork and fixes/hardens the zsh-fork path so the Linux CI signal is meaningful. ## What Changed - Added a single shared test-only DotSlash file for the patched zsh fork at `codex-rs/exec-server/tests/suite/zsh` (analogous to the existing `bash` test resource). - Updated both app-server and exec-server zsh tests to use that shared DotSlash zsh (no duplicate zsh DotSlash file, no `CODEX_TEST_ZSH_PATH` dependency). - Updated the app-server zsh-fork test helper to resolve the shared DotSlash zsh and avoid silently falling back to host zsh. - Kept the app-server zsh-fork tests configured via `config.toml`, using a test wrapper path where needed to force `zsh -df` (and rewrite `-lc` to `-c`) for the subcommand-decline test. - Hardened the app-server subcommand-decline zsh-fork test for CI variability: - tolerate an extra `/responses` POST with a no-op mock response - tolerate non-target approval ordering while remaining strict on the two `/usr/bin/true` approvals and decline behavior - use `DangerFullAccess` on Linux for this one test because it validates zsh approval flow, not Linux sandbox socket restrictions - Fixed zsh-fork process launching on Linux by preserving `req.arg0` in `ZshExecBridge::execute_shell_request(...)` so `codex-linux-sandbox` arg0 dispatch continues to work. - Moved `maybe_run_zsh_exec_wrapper_mode()` under `arg0_dispatch_or_else(...)` in `app-server` and `cli` so wrapper-mode handling coexists correctly with arg0-dispatched helper modes. - Consolidated duplicated `dotslash -- fetch` resolution logic into shared test support (`core/tests/common/lib.rs`). - Updated `codex-rs/exec-server/tests/suite/accept_elicitation.rs` to use DotSlash zsh and hardened the zsh elicitation test for Bazel/zsh differences by: - resolving an absolute `git` path - running `git init --quiet .` - asserting success / `.git` creation instead of relying on banner text ## Verification - `cargo test -p codex-app-server turn_start_zsh_fork -- --nocapture` - `cargo test -p codex-exec-server accept_elicitation -- --nocapture` - `bazel test //codex-rs/exec-server:exec-server-all-test --test_output=streamed --test_arg=--nocapture --test_arg=accept_elicitation_for_prompt_rule_with_zsh` - CI (`rust-ci`) on the final cleaned commit: `Tests — ubuntu-24.04 - x86_64-unknown-linux-gnu` and `Tests — ubuntu-24.04-arm - aarch64-unknown-linux-gnu` passed in [run 22291424358](https://github.com/openai/codex/actions/runs/22291424358)	2026-02-22 19:39:56 -08:00
Ahmed Ibrahim	e00fa19328	Revert "Revert "Route inbound realtime text into turn start or steer"" (#12480 ) With working tests this time --------- Co-authored-by: Codex <noreply@openai.com>	2026-02-22 11:54:16 -08:00
Ahmed Ibrahim	55fc075723	Send events to realtime api (#12423 ) - Send assistant messages, ExecCommandBegin, and PatchApplyBegin/PatchApplyEnd	2026-02-21 23:24:51 -08:00
Michael Bolin	b73c4b50a2	fix: make realtime conversation flake test order-insensitive (#12475 ) ## Why `codex-core::all` has a flaky test, `suite::realtime_conversation::conversation_start_audio_text_close_round_trip`, that assumes a fixed ordering between `conversation.item.create` and `response.input_audio.delta` requests. That ordering is not guaranteed: realtime text and audio input are forwarded through separate queues and a background task, so either request can be observed first while still being correct behavior. ## What Changed - Updated the assertion in `codex-rs/core/tests/suite/realtime_conversation.rs` to compare the two observed request types order-independently. - Kept the existing checks that `session.create` is sent first and that exactly two follow-up requests are recorded. ## Verification - Re-ran `cargo test -p codex-core --test all conversation_start_audio_text_close_round_trip` 10 times locally.	2026-02-21 17:06:35 -08:00
Ahmed Ibrahim	5e505ff877	Revert "Route inbound realtime text into turn start or steer" (#12479 ) Reverts openai/codex#12469	2026-02-21 15:46:03 -08:00
Ahmed Ibrahim	031d701705	Route inbound realtime text into turn start or steer (#12469 ) - Route inbound realtime websocket text into normal user input handling so it steers an active turn or starts a new one	2026-02-21 15:45:27 -08:00
pakrym-oai	b17148f13a	Prefer v2 websockets if available (#12428 ) And also cleanup settings flow to avoid reading many separate flags. --------- Co-authored-by: Codex <noreply@openai.com>	2026-02-21 20:08:04 +00:00
Michael Bolin	1af2a37ada	chore: remove codex-core public protocol/shell re-exports (#12432 ) ## Why `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules from `codex-protocol` and `codex-shell-command`. That made it easy for workspace crates to import those APIs through `codex-core`, which in turn hides dependency edges and makes it harder to reduce compile-time coupling over time. This change removes those public re-exports so call sites must import from the source crates directly. Even when a crate still depends on `codex-core` today, this makes dependency boundaries explicit and unblocks future work to drop `codex-core` dependencies where possible. ## What Changed - Removed public re-exports from `codex-rs/core/src/lib.rs` for: - `codex_protocol::protocol` and related protocol/model types (including `InitialHistory`) - `codex_protocol::config_types` (`protocol_config_types`) - `codex_shell_command::{bash, is_dangerous_command, is_safe_command, parse_command, powershell}` - Migrated workspace Rust call sites to import directly from: - `codex_protocol::protocol` - `codex_protocol::config_types` - `codex_protocol::models` - `codex_shell_command` - Added explicit `Cargo.toml` dependencies (`codex-protocol` / `codex-shell-command`) in crates that now import those crates directly. - Kept `codex-core` internal modules compiling by using `pub(crate)` aliases in `core/src/lib.rs` (internal-only, not part of the public API). - Updated the two utility crates that can already drop a `codex-core` dependency edge entirely: - `codex-utils-approval-presets` - `codex-utils-cli` ## Verification - `cargo test -p codex-utils-approval-presets` - `cargo test -p codex-utils-cli` - `cargo check --workspace --all-targets` - `just clippy`	2026-02-20 23:45:35 -08:00
Michael Bolin	1a220ad77d	chore: move config diagnostics out of codex-core (#12427 ) ## Why Compiling `codex-rs/core` is a bottleneck for local iteration, so this change continues the ongoing extraction of config-related functionality out of `codex-core` and into `codex-config`. The goal is not just to move code, but to reduce `codex-core` ownership and indirection so more code depends on `codex-config` directly. ## What Changed - Moved config diagnostics logic from `core/src/config_loader/diagnostics.rs` into `config/src/diagnostics.rs`. - Updated `codex-core` to use `codex-config` diagnostics types/functions directly where possible. - Removed the `core/src/config_loader/diagnostics.rs` shim module entirely; the remaining `ConfigToml`-specific calls are in `core/src/config_loader/mod.rs`. - Moved `CONFIG_TOML_FILE` into `codex-config` and updated existing references to use `codex_config::CONFIG_TOML_FILE` directly. - Added a direct `codex-config` dependency to `codex-cli` for its `CONFIG_TOML_FILE` use.	2026-02-20 23:19:29 -08:00
Charley Cunningham	bb0ac5be70	Fix compaction context reinjection and model baselines (#12252 ) ## Summary - move regular-turn context diff/full-context persistence into `run_turn` so pre-turn compaction runs before incoming context updates are recorded - after successful pre-turn compaction, rely on a cleared `reference_context_item` to trigger full context reinjection on the follow-up regular turn (manual `/compact` keeps replacement history summary-only and also clears the baseline) - preserve `<model_switch>` when full context is reinjected, and inject it before the rest of the full-context items - scope `reference_context_item` and `previous_model` to regular user turns only so standalone tasks (`/compact`, shell, review, undo) cannot suppress future reinjection or `<model_switch>` behavior - make context-diff persistence + `reference_context_item` updates explicit in the regular-turn path, with clearer docs/comments around the invariant - stop persisting local `/compact` `RolloutItem::TurnContext` snapshots (only regular turns persist `TurnContextItem` now) - simplify resume/fork previous-model/reference-baseline hydration by looking up the last surviving turn context from rollout lifecycle events, including rollback and compaction-crossing handling - remove the legacy fallback that guessed from bare `TurnContext` rollouts without lifecycle events - update compaction/remote-compaction/model-visible snapshots and compact test assertions (including remote compaction mock response shape) ## Why We were persisting incoming context items before spawning the regular turn task, which let pre-turn compaction requests accidentally include incoming context diffs without the new user message. Fixing that exposed follow-on baseline issues around `/compact`, resume/fork, and standalone tasks that could cause duplicate context injection or suppress `<model_switch>` instructions. This PR re-centers the invariants around regular turns: - regular turns persist model-visible context diffs/full reinjection and update the `reference_context_item` - standalone tasks do not advance those regular-turn baselines - compaction clears the baseline when replacement history may have stripped the referenced context diffs ## Follow-ups (TODOs left in code) - `TODO(ccunningham)`: fix rollback/backtracking baseline handling more comprehensively - `TODO(ccunningham)`: include pending incoming context items in pre-turn compaction threshold estimation - `TODO(ccunningham)`: inject updated personality spec alongside `<model_switch>` so some model-switch paths can avoid forced full reinjection - `TODO(ccunningham)`: review task turn lifecycle (`TurnStarted`/`TurnComplete`) behavior and emit task-start context diffs for task types that should have them (excluding `/compact`) ## Validation - `just fmt` - CI should cover the updated compaction/resume/model-visible snapshot expectations and rollout-hydration behavior - I did not rerun the full local test suite after the latest resume-lookup / rollout-persistence simplifications	2026-02-20 23:13:08 -08:00
Dylan Hurd	a8b4b569fb	fix(core) Filter non-matching prefix rules (#12314 ) ## Summary `gpt-5.3-codex` really likes to write complicated shell scripts, and suggest a partial prefix_rule that wouldn't actually approve the command. We should only show the `prefix_rule` suggestion from the model if it would actually fully approve the command the user is seeing. This will technically cause more instances of overly-specific suggestions when we fallback, but I think the UX is clearer, particularly when the model doesn't necessarily understand the current limitations of execpolicy parsing. ## Testing - [x] Add unit tests - [x] Add integration tests	2026-02-20 22:02:35 -08:00
Ahmed Ibrahim	b237f7cbb1	Add experimental realtime websocket backend prompt override (#12418 ) - add top-level `experimental_realtime_ws_backend_prompt` config key (experimental / do not use) and include it in config schema - apply the override only to `Op::RealtimeConversation` websocket `backend_prompt`, with config + realtime tests	2026-02-20 20:10:51 -08:00
Ahmed Ibrahim	7ae5d88016	Add experimental realtime websocket URL override (#12416 ) - add top-level `experimental_realtime_ws_base_url` config key (experimental / do not use) and include it in config schema - apply the override only to `Op::RealtimeConversation` websocket transport, with config + realtime tests	2026-02-20 19:51:20 -08:00
Ahmed Ibrahim	6817f0be8a	Wire realtime api to core (#12268 ) - Introduce `RealtimeConversationManager` for realtime API management - Add `op::conversation` to start conversation, insert audio, insert text, and close conversation. - emit conversation lifecycle and realtime events. - Move shared realtime payload types into codex-protocol and add core e2e websocket tests for start/replace/transport-close paths. Things to consider: - Should we use the same `op::` and `Events` channel to carry audio? I think we should try this simple approach and later we can create separate one if the channels got congested. - Sending text updates to the client: we can start simple and later restrict that. - Provider auth isn't wired for now intentionally	2026-02-20 19:06:35 -08:00
viyatb-oai	60c2b7beca	core tests: use hermetic mock server in review suite (#12291 ) ## Summary - switch the review test SSE mock helper to use the shared hermetic mock server setup - ensure review tests always have a default `/v1/models` stub during Codex session bootstrap - remove the race that caused intermittent `/v1/models` connection failures and flaky ETag refresh assertions ## Testing - `just fmt` - `cargo test -p codex-core --test all refresh_models_on_models_etag_mismatch_and_avoid_duplicate_models_fetch` - `cargo test -p codex-core --test all review_uses_custom_review_model_from_config` - repeated both targeted tests 5x in a loop - `cargo clippy -p codex-core --tests -- -D warnings`	2026-02-20 12:50:12 -08:00
viyatb-oai	e8afaed502	Refactor network approvals to host/protocol/port scope (#12140 ) ## Summary Simplify network approvals by removing per-attempt proxy correlation and moving to session-level approval dedupe keyed by (host, protocol, port). Instead of encoding attempt IDs into proxy credentials/URLs, we now treat approvals as a destination policy decision. - Concurrent calls to the same destination share one approval prompt. - Different destinations (or same host on different ports) get separate prompts. - Allow once approves the current queued request group only. - Allow for session caches that (host, protocol, port) and auto-allows future matching requests. - Never policy continues to deny without prompting. Example: - 3 calls: - a.com (line 443) - b.com (line 443) - a.com (line 443) => 2 prompts total (a, b), second a waits on the first decision. - a.com:80 is treated separately from a.com line 443 ## Testing - `just fmt` (in `codex-rs`) - `cargo test -p codex-core tools::network_approval::tests` - `cargo test -p codex-core` (unit tests pass; existing integration-suite failures remain in this environment)	2026-02-20 10:39:55 -08:00
pakrym-oai	86803ca9bf	Reuse connection between turns (#12294 ) Add a pool of one to the model client to reuse connections across turns.	2026-02-20 10:09:46 -08:00
colby-oai	2036a5f5e0	Add MCP server context to otel tool_result logs (#12267 ) Summary - capture the origin for each configured MCP server and expose it via the connection manager - plumb MCP server name/origin into tool logging and emit codex.tool_result events with those fields - add unit coverage for origin parsing and extend OTEL tests to assert empty MCP fields for non-MCP tools - currently not logging full urls or url paths to prevent logging potentially sensitive data Testing - Not run (not requested)	2026-02-20 10:26:19 -05:00
jif-oai	0f9eed3a6f	feat: add nick name to sub-agents (#12320 ) Adding random nick name to sub-agents. Used for UX At the same time, also storing and wiring the role of the sub-agent	2026-02-20 14:39:49 +00:00
dkumar-oai	1070a0a712	Add configurable MCP OAuth callback URL for MCP login (#11382 ) ## Summary Implements a configurable MCP OAuth callback URL override for `codex mcp login` and app-server OAuth login flows, including support for non-local callback endpoints (for example, devbox ingress URLs). ## What changed - Added new config key: `mcp_oauth_callback_url` in `~/.codex/config.toml`. - OAuth authorization now uses `mcp_oauth_callback_url` as `redirect_uri` when set. - Callback handling validates the callback path against the configured redirect URI path. - Listener bind behavior is now host-aware: - local callback URL hosts (`localhost`, `127.0.0.1`, `::1`) bind to `127.0.0.1` - non-local callback URL hosts bind to `0.0.0.0` - `mcp_oauth_callback_port` remains supported and is used for the listener port. - Wired through: - CLI MCP login flow - App-server MCP OAuth login flow - Skill dependency OAuth login flow - Updated config schema and config tests. ## Why Some environments need OAuth callbacks to land on a specific reachable URL (for example ingress in remote devboxes), not loopback. This change allows that while preserving local defaults for existing users. ## Backward compatibility - No behavior change when `mcp_oauth_callback_url` is unset. - Existing `mcp_oauth_callback_port` behavior remains intact. - Local callback flows continue binding to loopback by default. ## Testing - `cargo test -p codex-rmcp-client callback -- --nocapture` - `cargo test -p codex-core --lib mcp_oauth_callback -- --nocapture` - `cargo check -p codex-cli -p codex-app-server -p codex-rmcp-client` ## Example config ```toml mcp_oauth_callback_port = 5555 mcp_oauth_callback_url = "https://<devbox>-<namespace>.gateway.<cluster>.internal.api.openai.org/callback"	2026-02-19 13:32:10 -08:00

1 2 3 4 5 ...

730 Commits