codex

mirror of https://github.com/openai/codex.git synced 2026-05-05 05:42:33 +03:00

Author	SHA1	Message	Date
Owen Lin	0b9e42f6f7	fix(guardian): don't throw away transcript when over budget (#16956 ) ## Description This PR changes guardian transcript compaction so oversized conversations no longer collapse into a nearly empty placeholder. Before this change, if the retained user history alone exceeded the message budget, guardian would replace the entire transcript with `<transcript omitted to preserve budget for planned action>`! That meant approvals, especially network approvals, could lose the recent tool call and tool result that explained what guardian was actually reviewing. Now we keep a compact but usable transcript instead of dropping it all. ### Before ``` The following is the Codex agent history whose request action you are assessing... >>> TRANSCRIPT START <transcript omitted to preserve budget for planned action> >>> TRANSCRIPT END Conversation transcript omitted due to size. The Codex agent has requested the following action: >>> APPROVAL REQUEST START Retry reason: Sandbox blocked outbound network access. Assess the exact planned action below. Use read-only tool checks when local state matters. Planned action JSON: { "tool": "network_access", "target": "https://example.com:443", "host": "example.com", "protocol": "https", "port": 443 } >>> APPROVAL REQUEST END ``` ### After ``` The following is the Codex agent history whose request action you are assessing... >>> TRANSCRIPT START [1] user: Please investigate why uploads to example.com are failing and retry if needed. [8] user: If the request looks correct, go ahead and try again with network access. [9] tool shell call: {"command":["curl","-X","POST","https://example.com/upload"],"cwd":"/repo"} [10] tool shell result: sandbox blocked outbound network access >>> TRANSCRIPT END Some conversation entries were omitted. The Codex agent has requested the following action: >>> APPROVAL REQUEST START Retry reason: Sandbox blocked outbound network access. Assess the exact planned action below. 
Use read-only tool checks when local state matters. Planned action JSON: { "tool": "network_access", "target": "https://example.com:443", "host": "example.com", "protocol": "https", "port": 443 } >>> APPROVAL REQUEST END ```	2026-04-07 10:19:16 -07:00
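The behavior change in this commit can be pictured as a budgeted selection instead of an all-or-nothing placeholder. The sketch below is a simplified illustration, not the guardian implementation: the `Entry` type, the byte-based budget, and the "keep the newest contiguous entries" rule are all assumptions made for the example.

```rust
/// Hypothetical transcript entry; the real guardian types are not shown here.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    index: usize,
    text: String,
}

/// Keep the most recent entries whose combined size (in bytes) fits in
/// `budget`, rather than replacing the whole transcript with a placeholder.
/// Returns the kept entries in original order and whether any were omitted.
fn compact_transcript(entries: &[Entry], budget: usize) -> (Vec<Entry>, bool) {
    let mut kept = Vec::new();
    let mut used = 0;
    // Walk from newest to oldest and stop at the first entry that does not fit.
    for entry in entries.iter().rev() {
        let cost = entry.text.len();
        if used + cost > budget {
            break;
        }
        used += cost;
        kept.push(entry.clone());
    }
    kept.reverse();
    let omitted = kept.len() < entries.len();
    (kept, omitted)
}
```

The `omitted` flag corresponds to the "Some conversation entries were omitted." note in the After example; the real change also retains some earlier user messages, which this sketch does not model.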
Owen Lin	5d1671ca70	feat(analytics): generate an installation_id and pass it in responsesapi client_metadata (#16912 ) ## Summary This adds a stable Codex installation ID and includes it on Responses API requests via `x-codex-installation-id` passed in via the `client_metadata` field for analytics/debugging. The main pieces are: - persist a UUID in `$CODEX_HOME/installation_id` - thread the installation ID into `ModelClient` - send it in `client_metadata` on Responses requests so it works consistently across HTTP and WebSocket transports	2026-04-07 09:52:17 -07:00
Ahmed Ibrahim	cd591dc457	Preserve null developer instructions (#16976 ) Preserve explicit null developer-instruction overrides across app-server resume and fork flows.	2026-04-07 09:32:14 -07:00
Eric Traut	feb4f0051a	Fix nested exec thread ID restore (#16882 ) Addresses #15527 Problem: Nested `codex exec` commands could source a shell snapshot that re-exported the parent `CODEX_THREAD_ID`, so commands inside the nested session were attributed to the wrong thread. Solution: Reapply the live command env's `CODEX_THREAD_ID` after sourcing the snapshot.	2026-04-07 09:26:22 -07:00
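The fix described above amounts to an ordering rule: apply the snapshot first, then reapply the live value. A minimal sketch, assuming environments are modeled as plain maps; `apply_snapshot` is a hypothetical helper, and removing the key when the live environment has none is an assumption rather than documented behavior.

```rust
use std::collections::HashMap;

/// Merge a sourced shell snapshot into the command environment, then
/// reapply the live `CODEX_THREAD_ID` so a stale value re-exported by the
/// snapshot cannot win. (Hypothetical helper for illustration.)
fn apply_snapshot(
    live_env: &HashMap<String, String>,
    snapshot: &HashMap<String, String>,
) -> HashMap<String, String> {
    const THREAD_ID: &str = "CODEX_THREAD_ID";
    let mut env = live_env.clone();
    // Sourcing the snapshot overwrites every variable it exports.
    env.extend(snapshot.iter().map(|(k, v)| (k.clone(), v.clone())));
    // Restore the live value; drop the key if the live env had none (assumed).
    match live_env.get(THREAD_ID) {
        Some(id) => {
            env.insert(THREAD_ID.to_string(), id.clone());
        }
        None => {
            env.remove(THREAD_ID);
        }
    }
    env
}
```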
Eric Traut	82506527f1	Fix read-only apply_patch rejection message (#16885 ) Addresses #15532 Problem: Nested read-only `apply_patch` rejections report in-project files as outside the project. Solution: Choose the rejection message based on sandbox mode so read-only sessions report a read-only-specific reason, and add focused safety coverage.	2026-04-07 09:25:39 -07:00
Eric Traut	3b32de4fab	Stabilize flaky multi-agent followup interrupt test (#16739 ) Problem: The multi-agent followup interrupt test polled history before interrupt cleanup and mailbox wakeup were guaranteed to settle, which made it flaky under CI scheduling variance. Solution: Wait for the child turn's `TurnAborted(Interrupted)` event before asserting that the redirected assistant envelope is recorded and no plain user message is left behind.	2026-04-07 09:24:14 -07:00
jif-oai	4cc6818996	chore: keep request_user_input tool to persist cache on multi-agents (#17009 )	2026-04-07 16:53:31 +01:00
pakrym-oai	413c1e1fdf	[codex] reduce module visibility (#16978 ) ## Summary - reduce public module visibility across Rust crates, preferring private or crate-private modules with explicit crate-root public exports - update external call sites and tests to use the intended public crate APIs instead of reaching through module trees - add the module visibility guideline to AGENTS.md ## Validation - `cargo check --workspace --all-targets --message-format=short` passed before the final fix/format pass - `just fix` completed successfully - `just fmt` completed successfully - `git diff --check` passed	2026-04-07 08:03:35 -07:00
jif-oai	89f1a44afa	feat: /feedback cascade (#16442 ) Example here: https://openai.sentry.io/issues/7380240430/?project=4510195390611458&query=019d498f-bec4-7ba2-96d2-612b1e4507df&referrer=issue-stream	2026-04-07 12:47:37 +01:00
jif-oai	99f167e6bf	chore: hide nickname for debug flag (#17007 )	2026-04-07 11:31:13 +01:00
jif-oai	68e16baabe	chore: send_message and followup_task do not return anything (#17008 )	2026-04-07 11:26:36 +01:00
jif-oai	2a8c3a2a52	feat: drop agent ID from v2 (#17005 )	2026-04-07 10:56:01 +01:00
jif-oai	51f75e2f56	feat: empty role ok (#16999 )	2026-04-07 10:34:08 +01:00
starr-openai	741e2fdeb8	[codex] ez - rename env=>request in codex-rs/core/src/unified_exec/process_manager.rs (#16724 )	2026-04-07 10:17:31 +01:00
Won Park	90320fc51a	collapse dev message into one (#16988 ) collapse image-gen dev message into one	2026-04-06 23:49:47 -07:00
Ahmed Ibrahim	24c598e8a9	Honor null thread instructions (#16964 ) - Treat explicit null thread instructions as a blank-slate override while preserving omitted-field fallback behavior. - Preserve null through rollout resume/fork and keep explicit empty strings distinct. - Add app-server v2 start/fork coverage for the tri-state instruction params.	2026-04-07 04:10:19 +00:00
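The tri-state distinction in this commit (omitted, explicit null, explicit string) can be sketched with a small enum. The type and function names below are hypothetical and only illustrate the semantics described in the message.

```rust
/// Tri-state override for thread instructions (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum InstructionsOverride {
    /// Field omitted: fall back to the configured default.
    Omitted,
    /// Explicit null: blank-slate override, no instructions at all.
    Null,
    /// Explicit string; an empty string stays distinct from null.
    Value(String),
}

/// Resolve the effective instructions given an override and a default.
fn resolve(over: &InstructionsOverride, default: Option<&str>) -> Option<String> {
    match over {
        InstructionsOverride::Omitted => default.map(str::to_string),
        InstructionsOverride::Null => None,
        InstructionsOverride::Value(s) => Some(s.clone()),
    }
}
```

Keeping `Null` and `Value(String::new())` as separate variants is what lets the distinction survive a resume or fork round trip.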
pakrym-oai	4bb507d2c4	Make AGENTS.md discovery FS-aware (#15826 ) ## Summary - make AGENTS.md discovery and loading fully FS-aware and remove the non-FS discover helper - migrate remote-aware codex-core tests to use TestEnv workspace setup instead of syncing a local workspace copy - add AGENTS.md corner-case coverage, including directory fallbacks and remote-aware integration coverage ## Testing - cargo test -p codex-core project_doc -- --nocapture - cargo test -p codex-core hierarchical_agents -- --nocapture - cargo test -p codex-core agents_md -- --nocapture - cargo test -p codex-tui status -- --nocapture - cargo test -p codex-tui-app-server status -- --nocapture - just fix - just fmt - just bazel-lock-update - just bazel-lock-check - just argument-comment-lint - remote Linux executor tests in progress via scripts/test-remote-env.sh	2026-04-06 20:26:21 -07:00
viyatb-oai	9d13d29acd	[codex] Add danger-full-access denylist-only network mode (#16946 ) ## Summary This adds `experimental_network.danger_full_access_denylist_only` for orgs that want yolo / danger-full-access sessions to keep full network access while still enforcing centrally managed deny rules. When the flag is true and the session sandbox is `danger-full-access`, the network proxy starts with: - domain allowlist set to `*` - managed domain `deny` entries enforced - upstream proxy use allowed - all Unix sockets allowed - local/private binding allowed Caveat: the denylist is best effort only. In yolo / danger-full-access mode, Codex or the model can use an allowed socket or other local/private network path to bypass the proxy denylist, so this should not be treated as a hard security boundary. The flag is intentionally scoped to `SandboxPolicy::DangerFullAccess`. Read-only and workspace-write modes keep the existing managed/user allowlist, denylist, Unix socket, and local-binding behavior. This does not enable the non-loopback proxy listener setting; that still requires its own explicit config. This also threads the new field through config requirements parsing, app-server protocol/schema output, config API mapping, and the TUI debug config output. ## How to use Add the flag under `[experimental_network]` in the network policy config that is delivered to Codex. The setting is not under `[permissions]`. ```toml [experimental_network] enabled = true danger_full_access_denylist_only = true [experimental_network.domains] "blocked.example.com" = "deny" ".blocked.example.com" = "deny" ``` With that configuration, yolo / danger-full-access sessions get broad network access except for the managed denied domains above. The denylist remains a best-effort proxy policy because the session may still use allowed sockets to bypass it. Other sandbox modes do not get the wildcard domain allowlist or the socket/local-binding relaxations from this flag.
## Verification - `cargo test -p codex-config network_requirements` - `cargo test -p codex-core network_proxy_spec` - `cargo test -p codex-app-server map_requirements_toml_to_api` - `cargo test -p codex-tui debug_config_output` - `cargo test -p codex-app-server-protocol` - `just write-app-server-schema` - `just fmt` - `just fix -p codex-config -p codex-core -p codex-app-server-protocol -p codex-app-server -p codex-tui` - `just fix -p codex-core -p codex-config` - `git diff --check` - `cargo clean`	2026-04-06 19:38:51 -07:00
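The two deny entries in the example config suggest one pattern for the exact host and one for its subdomains. The sketch below illustrates that reading only; the matching rule for a leading dot is an assumption, `is_denied` is a hypothetical helper, and none of this is taken from the Codex network proxy.

```rust
/// Return true if `host` matches any deny pattern.
/// Assumed semantics: "example.com" matches that host exactly, while
/// ".example.com" matches subdomains such as "api.example.com".
fn is_denied(host: &str, deny_patterns: &[&str]) -> bool {
    let host = host.to_ascii_lowercase();
    deny_patterns.iter().any(|pattern| {
        let pattern = pattern.to_ascii_lowercase();
        if pattern.starts_with('.') {
            // The leading dot forces a label boundary before the suffix.
            host.ends_with(pattern.as_str())
        } else {
            host == pattern
        }
    })
}
```

As the caveat in the commit message notes, a check like this only constrains traffic that actually goes through the proxy.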
Matthew Zeng	5fe9ef06ce	[mcp] Support MCP Apps part 1. (#16082 ) - [x] Add `mcpResource/read` method to read mcp resource.	2026-04-06 19:17:14 -07:00
pakrym-oai	1f2411629f	Refactor config types into a separate crate (#16962 ) Move config types into a separate crate because their macros expand into a lot of new code.	2026-04-07 00:32:41 +00:00
starr-openai	a504d8f0fa	Disable env-bound tools when exec server is none (#16349 ) ## Summary - make `CODEX_EXEC_SERVER_URL=none` map to an explicit disabled environment mode instead of inferring from a missing URL - expose environment capabilities (`exec_enabled`, `filesystem_enabled`) so tool building can gate behavior explicitly and future multi-environment work has a clearer seam - suppress env-backed tools when the relevant capability is unavailable, including exec tools, `js_repl`, `apply_patch`, `list_dir`, and `view_image` - keep handler/runtime backstops so disabled environments still reject execution if a tool path somehow bypasses registration ## Testing - `just fmt` - `cargo test -p codex-exec-server` - `cargo test -p codex-tools disabled_environment_omits_environment_backed_tools` - `cargo test -p codex-tools environment_capabilities_gate_exec_and_filesystem_tools_independently` - remote devbox Bazel build via `codex-applied-devbox`: `//codex-rs/cli:cli`	2026-04-06 17:22:06 -07:00
rhan-oai	756c45ec61	[codex-analytics] add protocol-native turn timestamps (#16638 ) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/16638). * #16870 * #16706 * #16659 * #16641 * #16640 * __->__ #16638	2026-04-06 16:22:59 -07:00
xl-openai	e62d645e67	feat: refresh non-curated cache from plugin list. (#16191 ) 1. Use versions for non-curated plugins (defined in plugin.json) for cache refresh. 2. Trigger refresh from plugin/list roots.	2026-04-06 15:40:00 -07:00
xl-openai	03edd4fbee	feat: fallback curated plugin download from backend endpoint. (#16947 ) Add one more fallback for downloading the curated plugin repo from chatgpt.com. It has to be the last fallback for now because it is a lagging backup.	2026-04-06 15:36:20 -07:00
Ruslan Nigmatullin	1525bbdb9a	app-server: centralize AuthManager initialization (#16764 ) Extract a shared helper that builds AuthManager from Config and applies the forced ChatGPT workspace override in one place. Create the shared AuthManager at MessageProcessor call sites so that upcoming new transport's initialization can reuse the same handle, and keep only external auth refresher wiring inside `MessageProcessor`. Remove the now-unused `AuthManager::shared_with_external_auth` helper.	2026-04-06 12:46:55 -07:00
Owen Lin	ded559680d	feat(requirements): support allowed_approval_reviewers (#16701 ) ## Description Add requirements.toml support for `allowed_approvals_reviewers = ["user", "guardian_subagent"]`, so admins can now restrict the use of guardian mode. Note: If a user sets a reviewer that isn’t allowed by requirements.toml, config loading falls back to the first allowed reviewer and emits a startup warning. The table below describes the possible admin controls. \| Admin intent \| `requirements.toml` \| User `config.toml` \| End result \| \|---\|---\|---\|---\| \| Leave Guardian optional \| omit `allowed_approvals_reviewers` or set `["user", "guardian_subagent"]` \| user chooses `approvals_reviewer = "user"` or `"guardian_subagent"` \| Guardian off for `user`, on for `guardian_subagent` + `approval_policy = "on-request"` \| \| Force Guardian off \| `allowed_approvals_reviewers = ["user"]` \| any user value \| Effective reviewer is `user`; Guardian off \| \| Force Guardian on \| `allowed_approvals_reviewers = ["guardian_subagent"]` and usually `allowed_approval_policies = ["on-request"]` \| any user reviewer value; user should also have `approval_policy = "on-request"` unless policy is forced \| Effective reviewer is `guardian_subagent`; Guardian on when effective approval policy is `on-request` \| \| Allow both, but default to manual if user does nothing \| `allowed_approvals_reviewers = ["user", "guardian_subagent"]` \| omit `approvals_reviewer` \| Effective reviewer is `user`; Guardian off \| \| Allow both, and user explicitly opts into Guardian \| `allowed_approvals_reviewers = ["user", "guardian_subagent"]` \| `approvals_reviewer = "guardian_subagent"` and `approval_policy = "on-request"` \| Guardian on \| \| Invalid admin config \| `allowed_approvals_reviewers = []` \| anything \| Config load error \|	2026-04-06 11:11:44 -07:00
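The fallback rule and the table above can be condensed into a small resolution function. This is a sketch under stated assumptions: `effective_reviewer` is a hypothetical helper, an omitted user value is treated as `"user"`, and the warning text is invented for illustration.

```rust
/// Resolve the effective approvals reviewer (hypothetical helper).
/// `allowed` is the admin's `allowed_approvals_reviewers`, if set;
/// `requested` is the user's `approvals_reviewer`, if set.
/// Returns the reviewer plus an optional startup warning.
fn effective_reviewer(
    allowed: Option<&[&str]>,
    requested: Option<&str>,
) -> Result<(String, Option<String>), String> {
    // Assumed default when the user sets nothing.
    let requested = requested.unwrap_or("user");
    // No admin restriction: the user's choice stands.
    let Some(allowed) = allowed else {
        return Ok((requested.to_string(), None));
    };
    // An empty allow-list is an invalid admin config.
    let Some(first) = allowed.first() else {
        return Err("allowed_approvals_reviewers must not be empty".to_string());
    };
    if allowed.contains(&requested) {
        Ok((requested.to_string(), None))
    } else {
        // Fall back to the first allowed reviewer and warn.
        let warning = format!("reviewer `{requested}` is not allowed; using `{first}`");
        Ok((first.to_string(), Some(warning)))
    }
}
```

Whether Guardian is actually on also depends on the effective `approval_policy`, which this sketch leaves out.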
Eric Traut	b5edeb98a0	Fix flaky permissions escalation test on Windows (#16825 ) Problem: `rejects_escalated_permissions_when_policy_not_on_request` retried a real shell command after asserting the escalation rejection, so Windows CI could fail on command startup timing instead of approval behavior. Solution: Keep the rejection assertion, verify no turn permissions were granted, and assert through exec-policy evaluation that the same command would be allowed without escalation instead of timing a subprocess.	2026-04-05 10:51:01 -07:00
rhan-oai	4fd5c35c4f	[codex-analytics] subagent analytics (#15915 ) - creates custom event that emits subagent thread analytics from core - wires client metadata (`product_client_id`, `client_name`, `client_version`) through from app-server - creates `created_at` timestamp in core - subagent analytics are behind `FeatureFlag::GeneralAnalytics` PR stack - [[telemetry] thread events #15690](https://github.com/openai/codex/pull/15690) - --> [[telemetry] subagent events #15915](https://github.com/openai/codex/pull/15915) - [[telemetry] turn events #15591](https://github.com/openai/codex/pull/15591) - [[telemetry] steer events #15697](https://github.com/openai/codex/pull/15697) - [[telemetry] queued prompt data #15804](https://github.com/openai/codex/pull/15804) Notes: - core does not spawn a subagent thread for compact, but it is represented in the mapping for consistency `INFO \| 2026-04-01 13:08:12 \| codex_backend.routers.analytics_events \| analytics_events.track_analytics_events:399 \| Tracked codex_thread_initialized event params={'thread_id': '019d4aa9-233b-70f2-a958-c3dbae1e30fa', 'product_surface': 'codex', 'app_server_client': {'product_client_id': 'CODEX_CLI', 'client_name': 'codex-tui', 'client_version': '0.0.0', 'rpc_transport': 'in_process', 'experimental_api_enabled': None}, 'runtime': {'codex_rs_version': '0.0.0', 'runtime_os': 'macos', 'runtime_os_version': '26.4.0', 'runtime_arch': 'aarch64'}, 'model': 'gpt-5.3-codex', 'ephemeral': False, 'initialization_mode': 'new', 'created_at': 1775074091, 'thread_source': 'subagent', 'subagent_source': 'thread_spawn', 'parent_thread_id': '019d4aa8-51ec-77e3-bafb-2c1b8e29e385'} \| ` `INFO \| 2026-04-01 13:08:41 \| codex_backend.routers.analytics_events \| analytics_events.track_analytics_events:399 \| Tracked codex_thread_initialized event params={'thread_id': '019d4aa9-94e3-75f1-8864-ff8ad0e55e1e', 'product_surface': 'codex', 'app_server_client': {'product_client_id': 'CODEX_CLI', 'client_name': 'codex-tui', 'client_version':
'0.0.0', 'rpc_transport': 'in_process', 'experimental_api_enabled': None}, 'runtime': {'codex_rs_version': '0.0.0', 'runtime_os': 'macos', 'runtime_os_version': '26.4.0', 'runtime_arch': 'aarch64'}, 'model': 'gpt-5.3-codex', 'ephemeral': False, 'initialization_mode': 'new', 'created_at': 1775074120, 'thread_source': 'subagent', 'subagent_source': 'review', 'parent_thread_id': None} \| ` --------- Co-authored-by: jif-oai <jif@openai.com> Co-authored-by: Michael Bolin <mbolin@openai.com>	2026-04-04 11:06:43 -07:00
Thibault Sottiaux	9e19004bc2	[codex] add context-window lineage headers (#16758 ) This change adds client-owned context-window and parent thread id headers to all requests to responses api.	2026-04-04 05:54:31 +00:00
Ahmed Ibrahim	8a19dbb177	Add spawn context for MultiAgentV2 children (#16746 )	2026-04-03 19:56:59 -07:00
Ahmed Ibrahim	e4f1b3a65e	Preempt mailbox mail after reasoning/commentary items (#16725 ) Send pending mailbox mail after completed reasoning or commentary items so follow-up requests can pick it up mid-turn. --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-03 18:29:05 -07:00
Thibault Sottiaux	91ca49e53c	[codex] allow disabling environment context injection (#16745 ) This adds an `include_environment_context` config/profile flag that defaults on, and guards both initial injection and later environment updates to allow skipping injection of `<environment_context>`.	2026-04-03 18:06:52 -07:00
Thibault Sottiaux	8d19646861	[codex] allow disabling prompt instruction blocks (#16735 ) This PR adds root and profile config switches to omit the generated `<permissions instructions>` and `<apps_instructions>` prompt blocks while keeping both enabled by default, and it gates both the initial developer-context injection and later permissions diff injection so turning the permissions block off stays effective across turn-context overrides. Also added a prompt debug tool that can be used as `codex debug prompt-input "hello"` and dumps the constructed items list.	2026-04-03 23:47:56 +00:00
Eric Traut	4b8bab6ad3	Remove OPENAI_BASE_URL config fallback (#16720 ) The `OPENAI_BASE_URL` environment variable has been a significant support issue, so we decided to deprecate it in favor of an `openai_base_url` config key. We've had the deprecation warning in place for about a month, so users have had time to migrate to the new mechanism. This PR removes support for `OPENAI_BASE_URL` entirely.	2026-04-03 15:03:21 -07:00
Michael Bolin	a70aee1a1e	Fix Windows Bazel app-server trust tests (#16711 ) ## Why Extracted from [#16528](https://github.com/openai/codex/pull/16528) so the Windows Bazel app-server test failures can be reviewed independently from the rest of that PR. This PR targets: - `suite::v2::thread_shell_command::thread_shell_command_runs_as_standalone_turn_and_persists_history` - `suite::v2::thread_start::thread_start_with_elevated_sandbox_trusts_project_and_followup_loads_project_config` - `suite::v2::thread_start::thread_start_with_nested_git_cwd_trusts_repo_root` There were two Windows-specific assumptions baked into those tests and the underlying trust lookup: - project trust keys were persisted and looked up using raw path strings, but Bazel's Windows test environment can surface canonicalized paths with `\\?\` / UNC prefixes or normalized symlink/junction targets, so follow-up `thread/start` requests no longer matched the project entry that had just been written - `item/commandExecution/outputDelta` assertions compared exact trailing line endings even though shell output chunk boundaries and CRLF handling can differ on Windows, and Bazel made that timing-sensitive mismatch visible There was also one behavior bug separate from the assertion cleanup: `thread/start` decided whether to persist trust from the final resolved sandbox policy, but on Windows an explicit `workspace-write` request may be downgraded to `read-only`. That incorrectly skipped writing trust even though the request had asked to elevate the project, so the new logic also keys off the requested sandbox mode. ## What - Canonicalize project trust keys when persisting/loading `[projects]` entries, while still accepting legacy raw keys for existing configs. - Persist project trust when `thread/start` explicitly requests `workspace-write` or `danger-full-access`, even if the resolved policy is later downgraded on Windows. 
- Make the Windows app-server tests compare persisted trust paths and command output deltas in a path/newline-normalized way. ## Verification - Existing app-server v2 tests cover the three failing Windows Bazel cases above.	2026-04-03 21:41:25 +00:00
Ahmed Ibrahim	567d2603b8	Sanitize forked child history (#16709 ) - Keep only parent system/developer/user messages plus assistant final-answer messages in forked child history. - Strip parent tool/reasoning items and remove the unmatched synthetic spawn output.	2026-04-03 21:13:34 +00:00
Michael Bolin	1d4b5f130c	fix windows-only clippy lint violation (#16722 ) I missed this in https://github.com/openai/codex/pull/16707.	2026-04-03 21:00:24 +00:00
Michael Bolin	faab4d39e1	fix: preserve platform-specific core shell env vars (#16707 ) ## Why We were seeing failures in the following tests as part of trying to get all the tests running under Bazel on Windows in CI (https://github.com/openai/codex/pull/16528): ``` suite::shell_command::unicode_output::with_login suite::shell_command::unicode_output::without_login ``` Certainly `PATHEXT` should have been included in the extra `CORE_VARS` list, so we fix that up here, but also take things a step further for now by forcibly ensuring it is set on Windows in the return value of `create_env()`. Once we get the Windows Bazel build working reliably (i.e., after #16528 is merged), we should come back to this and confirm we can remove the special case in `create_env()`. ## What - Split core env inheritance into `COMMON_CORE_VARS` plus platform-specific allowlists for Windows and Unix in [`exec_env.rs`](`1b55c88fbf/codex-rs/core/src/exec_env.rs (L45-L81)`). - Preserve `PATHEXT`, `USERNAME`, and `USERPROFILE` on Windows, and `HOME` / locale vars on Unix. - Backfill a default `PATHEXT` in `create_env()` on Windows if the parent env does not provide one, so child process launch still works in stripped-down Bazel environments. - Extend the Windows exec-env test to assert mixed-case `PathExt` survives case-insensitive core filtering, and document why the shell-command Unicode test goes through a child process. ## Verification - `cargo test -p codex-core exec_env::tests`	2026-04-03 12:07:07 -07:00
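The allowlist split and the `PATHEXT` backfill described above can be sketched as follows. Only `COMMON_CORE_VARS`, `PATHEXT`, `USERNAME`, `USERPROFILE`, and `HOME` come from the commit message; the other constant names, the list contents, the default `PATHEXT` value, and `filter_core_env` itself are assumptions for illustration and do not mirror `create_env()`.

```rust
use std::collections::HashMap;

// Illustrative subsets; the real lists live in exec_env.rs.
const COMMON_CORE_VARS: &[&str] = &["PATH"];
const WINDOWS_CORE_VARS: &[&str] = &["PATHEXT", "USERNAME", "USERPROFILE"];
const UNIX_CORE_VARS: &[&str] = &["HOME", "LANG"];
// Assumed fallback value, not taken from the source.
const DEFAULT_PATHEXT: &str = ".COM;.EXE;.BAT;.CMD";

/// Keep only core variables for the target platform. Names are compared
/// case-insensitively on Windows, and `PATHEXT` is backfilled there if
/// the parent environment does not provide it.
fn filter_core_env(parent: &HashMap<String, String>, windows: bool) -> HashMap<String, String> {
    let platform: &[&str] = if windows { WINDOWS_CORE_VARS } else { UNIX_CORE_VARS };
    let mut env = HashMap::new();
    for (name, value) in parent {
        let allowed = COMMON_CORE_VARS.iter().chain(platform).any(|var| {
            if windows {
                var.eq_ignore_ascii_case(name)
            } else {
                *var == name.as_str()
            }
        });
        if allowed {
            // Preserve the original spelling, e.g. mixed-case `PathExt`.
            env.insert(name.clone(), value.clone());
        }
    }
    if windows && !env.keys().any(|k| k.eq_ignore_ascii_case("PATHEXT")) {
        env.insert("PATHEXT".to_string(), DEFAULT_PATHEXT.to_string());
    }
    env
}
```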
Ahmed Ibrahim	af8a9d2d2b	remove temporary ownership re-exports (#16626 ) Stacked on #16508. This removes the temporary `codex-core` / `codex-login` re-export shims from the ownership split and rewrites callsites to import directly from `codex-model-provider-info`, `codex-models-manager`, `codex-api`, `codex-protocol`, `codex-feedback`, and `codex-response-debug-context`. No behavior change intended; this is the mechanical import cleanup layer split out from the ownership move. --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-03 00:33:34 -07:00
Ahmed Ibrahim	6fff9955f1	extract models manager and related ownership from core (#16508 ) ## Summary - split `models-manager` out of `core` and add `ModelsManagerConfig` plus `Config::to_models_manager_config()` so model metadata paths stop depending on `core::Config` - move login-owned/auth-owned code out of `core` into `codex-login`, move model provider config into `codex-model-provider-info`, move API bridge mapping into `codex-api`, move protocol-owned types/impls into `codex-protocol`, and move response debug helpers into a dedicated `response-debug-context` crate - move feedback tag emission into `codex-feedback`, relocate tests to the crates that now own the code, and keep broad temporary re-exports so this PR avoids a giant import-only rewrite ## Major moves and decisions - created `codex-models-manager` as the owner for model cache/catalog/config/model info logic, including the new `ModelsManagerConfig` struct - created `codex-model-provider-info` as the owner for provider config parsing/defaults and kept temporary `codex-login`/`codex-core` re-exports for old import paths - moved `api_bridge` error mapping + `CoreAuthProvider` into `codex-api`, while `codex-login::api_bridge` temporarily re-exports those symbols and keeps the `auth_provider_from_auth` wrapper - moved `auth_env_telemetry` and `provider_auth` ownership to `codex-login` - moved `CodexErr` ownership to `codex-protocol::error`, plus `StreamOutput`, `bytes_to_string_smart`, and network policy helpers to protocol-owned modules - created `codex-response-debug-context` for `extract_response_debug_context`, `telemetry_transport_error_message`, and related response-debug plumbing instead of leaving that behavior in `core` - moved `FeedbackRequestTags`, `emit_feedback_request_tags`, and `emit_feedback_request_tags_with_auth_env` to `codex-feedback` - deferred removal of temporary re-exports and the mechanical import rewrites to a stacked follow-up PR so this PR stays reviewable ## Test moves - moved 
auth refresh coverage from `core/tests/suite/auth_refresh.rs` to `login/tests/suite/auth_refresh.rs` - moved text encoding coverage from `core/tests/suite/text_encoding_fix.rs` to `protocol/src/exec_output_tests.rs` - moved model info override coverage from `core/tests/suite/model_info_overrides.rs` to `models-manager/src/model_info_overrides_tests.rs` --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-02 23:00:02 -07:00
Michael Bolin	beb3978a3b	test: use cmd.exe for ProviderAuthScript on Windows (#16629 ) ## Why The Windows `ProviderAuthScript` test helpers do not need PowerShell. Running them through `cmd.exe` is enough to emit the next fixture token and rotate `tokens.txt`, and it avoids a PowerShell-specific dependency in these tests. ## What changed - Replaced the Windows `print-token.ps1` fixtures with `print-token.cmd` in `codex-rs/core/src/models_manager/manager_tests.rs` and `codex-rs/login/src/auth/auth_tests.rs`. - Switched the failing external-auth helper in `codex-rs/login/src/auth/auth_tests.rs` from `powershell.exe -Command 'exit 1'` to `cmd.exe /d /s /c 'exit /b 1'`. - Updated Windows timeout comments so they no longer call out PowerShell specifically. ## Verification - `cargo test -p codex-login` - `cargo test -p codex-core` (fails in unrelated `core/src/config/config_tests.rs` assertions in this checkout)	2026-04-02 17:33:07 -07:00
Michael Bolin	7a3eec6fdb	core: cut codex-core compile time 48% with native async SessionTask (#16631 ) ## Why This continues the compile-time cleanup from #16630. `SessionTask` implementations are monomorphized, but `Session` stores the task behind a `dyn` boundary so it can drive and abort heterogenous turn tasks uniformly. That means we can move the `#[async_trait]` expansion off the implementation trait, keep a small boxed adapter only at the storage boundary, and preserve the existing task lifecycle semantics while reducing the amount of generated async-trait glue in `codex-core`. One measurement caveat showed up while exploring this: a warm incremental benchmark based on `touch core/src/tasks/mod.rs && cargo check -p codex-core --lib` was basically flat, but that was the wrong benchmark for this change. Using package-clean `codex-core` rebuilds, like #16630, shows the real win. Relevant pre-change code: - [`SessionTask` with `#[async_trait]`](`3c7f013f97/codex-rs/core/src/tasks/mod.rs (L129-L182)`) - [`RunningTask` storing `Arc<dyn SessionTask>`](`3c7f013f97/codex-rs/core/src/state/turn.rs (L69-L77)`) ## What changed - Switched `SessionTask::{run, abort}` to native RPITIT futures with explicit `Send` bounds. - Added a private `AnySessionTask` adapter that boxes those futures only at the `Arc<dyn ...>` storage boundary. - Updated `RunningTask` to store `Arc<dyn AnySessionTask>` and removed `#[async_trait]` from the concrete task impls plus test-only `SessionTask` impls. 
## Timing Benchmarked package-clean `codex-core` rebuilds with dependencies left warm: ```shell cargo check -p codex-core --lib >/dev/null cargo clean -p codex-core >/dev/null /usr/bin/time -p cargo +nightly rustc -p codex-core --lib -- \ -Z time-passes \ -Z time-passes-format=json >/dev/null ``` \| revision \| rustc `total` \| process `real` \| `generate_crate_metadata` \| `MIR_borrow_checking` \| `monomorphization_collector_graph_walk` \| \| --- \| ---: \| ---: \| ---: \| ---: \| ---: \| \| parent `3c7f013f9735` \| 67.21s \| 67.71s \| 24.61s \| 23.43s \| 22.43s \| \| this PR `2cafd783ac22` \| 35.08s \| 35.60s \| 8.01s \| 7.25s \| 7.15s \| \| delta \| -47.8% \| -47.4% \| -67.5% \| -69.1% \| -68.1% \| For completeness, the warm touched-file benchmark stayed flat (`1.96s` parent vs `1.97s` this PR), which is why that benchmark should not be used to evaluate this refactor. ## Verification - Ran `cargo test -p codex-core`; this change compiled and task-related tests passed before hitting the same unrelated 5 `config::tests::guardian` failures already present on the parent stack.	2026-04-02 23:39:56 +00:00
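The pattern described in this commit (native return-position `impl Future` on the implementation trait, boxing only in an object-safe adapter) can be sketched in standard-library Rust. The trait names follow the commit message, but the `u32` signature, the `Doubler` task, and the tiny `block_on` executor are simplifications invented for the example; it assumes a toolchain with return-position `impl Trait` in traits (Rust 1.75 or later).

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Implementation-facing trait: native `impl Future` return with an
/// explicit `Send` bound, no `#[async_trait]`.
trait SessionTask: Send + Sync + 'static {
    fn run(&self, input: u32) -> impl Future<Output = u32> + Send;
}

/// Object-safe adapter used only at the `Arc<dyn ...>` storage boundary.
trait AnySessionTask: Send + Sync {
    fn run_boxed(&self, input: u32) -> Pin<Box<dyn Future<Output = u32> + Send + '_>>;
}

/// Every `SessionTask` gets the boxed form for free.
impl<T: SessionTask> AnySessionTask for T {
    fn run_boxed(&self, input: u32) -> Pin<Box<dyn Future<Output = u32> + Send + '_>> {
        Box::pin(self.run(input))
    }
}

/// Hypothetical concrete task.
struct Doubler;

impl SessionTask for Doubler {
    fn run(&self, input: u32) -> impl Future<Output = u32> + Send {
        async move { input * 2 }
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor for the demo: polls in a loop until the future is
/// ready. Only suitable for futures that complete without real waiting.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

A caller would store `Arc<dyn AnySessionTask>` and drive it with `run_boxed`, so the only allocation per call happens at the type-erased boundary.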
Michael Bolin	3c7f013f97	core: cut codex-core compile time 63% with native async ToolHandler (#16630 ) ## Why `ToolHandler` was still paying a large compile-time tax from `#[async_trait]` on every concrete handler impl, even though the only object-safe boundary the registry actually stores is the internal `AnyToolHandler` adapter. This PR removes that macro-generated async wrapper layer from concrete `ToolHandler` impls while keeping the existing object-safe shim in `AnyToolHandler`. In practice, that gets essentially the same compile-time win as the larger type-erasure refactor in #16627, but with a much smaller diff and without changing the public shape of `ToolHandler<Output = T>`. That tradeoff matters here because this is a broad `codex-core` hotspot and reviewers should be able to judge the compile-time impact from hard numbers, not vibes. ## Headline result On a clean `codex-core` package rebuild (`cargo clean -p codex-core` before each command), rustc `total` dropped from 187.15s to 68.98s versus the shared `0bd31dc382bd` baseline: -63.1%. The biggest hot passes dropped by roughly 71-72%: \| Metric \| Baseline `0bd31dc382bd` \| This PR `41f7ac0adeac` \| Delta \| \|---\|---:\|---:\|---:\| \| `total` \| 187.15s \| 68.98s \| -63.1% \| \| `generate_crate_metadata` \| 84.53s \| 24.49s \| -71.0% \| \| `MIR_borrow_checking` \| 84.13s \| 24.58s \| -70.8% \| \| `monomorphization_collector_graph_walk` \| 79.74s \| 22.19s \| -72.2% \| \| `evaluate_obligation` self-time \| 180.62s \| 46.91s \| -74.0% \| Important caveat: `-Z time-passes` timings are nested, so `generate_crate_metadata` and `monomorphization_collector_graph_walk` are mostly overlapping, not additive. ## Why this PR over #16627 #16627 already proved that the `ToolHandler` stack was the right hotspot, but it got there by making `ToolHandler` object-safe and changing every handler to return `BoxFuture<Result<AnyToolResult, _>>` directly. 

This PR keeps the lower-churn shape:

- `ToolHandler` remains generic over `type Output`.
- Concrete handlers use native RPITIT futures with explicit `Send` bounds.
- `AnyToolHandler` remains the only object-safe adapter and still does the boxing at the registry boundary, as before.
- The implementation diff is only 33 files, +28/-77.

The measurements are at least comparable, and in this run this PR is slightly faster than #16627 on the pass-level total:

| Metric | #16627 | This PR | Delta |
|---|---:|---:|---:|
| `total` | 79.90s | 68.98s | -13.7% |
| `generate_crate_metadata` | 25.88s | 24.49s | -5.4% |
| `monomorphization_collector_graph_walk` | 23.54s | 22.19s | -5.7% |
| `evaluate_obligation` self-time | 43.29s | 46.91s | +8.4% |

## Profile data

### Crate-level timings

`cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json` after `cargo clean -p codex-core`.

Baseline data below is reused from the shared parent `0bd31dc382bd` profile because this PR and #16627 are both one commit on top of that same parent.

| Crate | Baseline `duration` | This PR `duration` | Delta | Baseline `rmeta_time` | This PR `rmeta_time` | Delta |
|---|---:|---:|---:|---:|---:|---:|
| `codex_core` | 187.380776583s | 69.171113833s | -63.1% | 174.474507208s | 55.873015583s | -68.0% |
| `starlark` | 17.90s | 16.773824125s | -6.3% | n/a | 8.8999965s | n/a |

### Pass-level timings

`cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json` after `cargo clean -p codex-core`.

| Pass | Baseline | This PR | Delta |
|---|---:|---:|---:|
| `total` | 187.150662083s | 68.978770375s | -63.1% |
| `generate_crate_metadata` | 84.531864625s | 24.487462958s | -71.0% |
| `MIR_borrow_checking` | 84.131389375s | 24.575553875s | -70.8% |
| `monomorphization_collector_graph_walk` | 79.737515042s | 22.190207417s | -72.2% |
| `codegen_crate` | 12.362532292s | 12.695237625s | +2.7% |
| `type_check_crate` | 4.4765405s | 5.442019542s | +21.6% |
| `coherence_checking` | 3.311121208s | 4.239935292s | +28.0% |
| process `real` / `user` / `sys` | 187.70s / 201.87s / 4.99s | 69.52s / 85.90s / 2.92s | n/a |

### Self-profile query summary

`cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes` after `cargo clean -p codex-core`, summarized with `measureme summarize -p 0.5`.

| Query / phase | Baseline self time | This PR self time | Delta | Baseline total time | This PR total time | Baseline item count | This PR item count | Baseline cache hits | This PR cache hits |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| `evaluate_obligation` | 180.62s | 46.91s | -74.0% | 182.08s | 48.37s | 572,234 | 388,659 | 1,130,998 | 1,058,553 |
| `mir_borrowck` | 1.42s | 1.49s | +4.9% | 93.77s | 29.59s | n/a | 6,184 | n/a | 15,298 |
| `typeck` | 1.84s | 1.87s | +1.6% | 2.38s | 2.44s | n/a | 9,367 | n/a | 79,247 |
| `LLVM_module_codegen_emit_obj` | n/a | 17.12s | n/a | 17.01s | 17.12s | n/a | 256 | n/a | 0 |
| `LLVM_passes` | n/a | 13.07s | n/a | 12.95s | 13.07s | n/a | 1 | n/a | 0 |
| `codegen_module` | n/a | 12.33s | n/a | 12.22s | 13.64s | n/a | 256 | n/a | 0 |
| `items_of_instance` | n/a | 676.00ms | n/a | n/a | 24.96s | n/a | 99,990 | n/a | 0 |
| `type_op_prove_predicate` | n/a | 660.79ms | n/a | n/a | 24.78s | n/a | 78,762 | n/a | 235,877 |

| Summary | Baseline | This PR |
|---|---:|---:|
| `evaluate_obligation` % of total CPU | 70.821% | 38.880% |
| self-profile total CPU time | 255.042999997s | 120.661175956s |
| process `real` / `user` / `sys` | 220.96s / 235.02s / 7.09s | 86.35s / 103.66s / 3.54s |

### Artifact sizes

From the same `measureme summarize` output:

| Artifact | Baseline | This PR | Delta |
|---|---:|---:|---:|
| `crate_metadata` | 26,534,471 bytes | 26,545,248 bytes | +10,777 |
| `dep_graph` | 253,181,425 bytes | 239,240,806 bytes | -13,940,619 |
| `linked_artifact` | 565,366,624 bytes | 562,673,176 bytes | -2,693,448 |
| `object_file` | 513,127,264 bytes | 510,464,096 bytes | -2,663,168 |
| `query_cache` | 137,440,945 bytes | 136,982,566 bytes | -458,379 |
| `cgu_instructions` | 3,586,307 bytes | 3,575,121 bytes | -11,186 |
| `codegen_unit_size_estimate` | 2,084,846 bytes | 2,078,773 bytes | -6,073 |
| `work_product_index` | 19,565 bytes | 19,565 bytes | 0 |

### Baseline hotspots before this change

These are the top normalized obligation buckets from the shared baseline profile:

| Obligation bucket | Samples | Duration |
|---|---:|---:|
| `outlives:tasks::review::ReviewTask` | 1,067 | 6.33s |
| `outlives:tools::handlers::unified_exec::UnifiedExecHandler` | 896 | 5.63s |
| `trait:T as tools::registry::ToolHandler` | 876 | 5.45s |
| `outlives:tools::handlers::shell::ShellHandler` | 888 | 5.37s |
| `outlives:tools::handlers::shell::ShellCommandHandler` | 870 | 5.29s |
| `outlives:tools::runtimes::shell::unix_escalation::CoreShellActionProvider` | 637 | 3.73s |
| `outlives:tools::handlers::mcp::McpHandler` | 695 | 3.61s |
| `outlives:tasks::regular::RegularTask` | 726 | 3.57s |

Top `items_of_instance` entries before this change were mostly concrete async handler/task impls:

| Instance | Duration |
|---|---:|
| `tasks::regular::{impl#2}::run` | 3.79s |
| `tools::handlers::mcp::{impl#0}::handle` | 3.27s |
| `tools::runtimes::shell::unix_escalation::{impl#2}::determine_action` | 3.09s |
| `tools::handlers::agent_jobs::{impl#11}::handle` | 3.07s |
| `tools::handlers::multi_agents::spawn::{impl#1}::handle` | 2.84s |
| `tasks::review::{impl#4}::run` | 2.82s |
| `tools::handlers::multi_agents_v2::spawn::{impl#2}::handle` | 2.80s |
| `tools::handlers::multi_agents::resume_agent::{impl#1}::handle` | 2.73s |
| `tools::handlers::unified_exec::{impl#2}::handle` | 2.54s |
| `tasks::compact::{impl#4}::run` | 2.45s |

## What changed

Relevant pre-change registry shape: [`codex-rs/core/src/tools/registry.rs`](`0bd31dc382/codex-rs/core/src/tools/registry.rs (L38-L219)`)

Current registry shape in this PR: [`codex-rs/core/src/tools/registry.rs`](`41f7ac0ade/codex-rs/core/src/tools/registry.rs (L38-L203)`)

- `ToolHandler::{is_mutating, handle}` now return native `impl Future + Send` futures instead of using `#[async_trait]`.
- `AnyToolHandler` remains the object-safe adapter and boxes those futures at the registry boundary with explicit lifetimes.
- Concrete handlers and the registry test handler drop `#[async_trait]` but otherwise keep their async method bodies intact.
- Representative examples: [`codex-rs/core/src/tools/handlers/shell.rs`](`41f7ac0ade/codex-rs/core/src/tools/handlers/shell.rs (L223-L379)`), [`codex-rs/core/src/tools/handlers/unified_exec.rs`](`41f7ac0ade/codex-rs/core/src/tools/handlers/unified_exec.rs`), [`codex-rs/core/src/tools/registry_tests.rs`](`41f7ac0ade/codex-rs/core/src/tools/registry_tests.rs`)

## Tradeoff

This is intentionally less invasive than #16627: it does not move result boxing into every concrete handler and does not change `ToolHandler` into an object-safe trait. Instead, it keeps the existing registry-level type-erasure boundary and only removes the macro-generated async wrapper layer from concrete impls.

So the runtime boxing story stays basically the same as before, while the compile-time savings are still large.

## Verification

Existing verification for this branch still applies:

- Ran `cargo test -p codex-core`; this change compiled and the suite reached the known unrelated `config::tests::guardian` failures, with no local diff under `codex-rs/core/src/config/`.

Profiling commands used for the tables above:

- `cargo clean -p codex-core`
- `cargo +nightly build -p codex-core --lib -Z unstable-options --timings=json`
- `cargo +nightly rustc -p codex-core --lib -- -Z time-passes -Z time-passes-format=json`
- `cargo +nightly rustc -p codex-core --lib -- -Z self-profile=... -Z self-profile-events=default,query-keys,args,llvm,artifact-sizes`
- `measureme summarize -p 0.5`	2026-04-02 16:03:52 -07:00
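The shape described in #16630 above (a generic trait with native `impl Future + Send` methods, plus one object-safe adapter that boxes at the registry boundary) can be sketched roughly as follows. This is a simplified illustration and not the code in `registry.rs`: the method signatures, the `String` input and error types, `EchoHandler`, `block_on`, and `run_registry_demo` are all assumptions made so the example stays self-contained, and `is_mutating` is omitted.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Generic handler trait (simplified, hypothetical signature). The async
/// method is a native return-position `impl Future` with an explicit `Send`
/// bound, so no `#[async_trait]` macro expansion is needed per impl.
trait ToolHandler: Send + Sync {
    type Output: ToString + Send;

    fn handle(&self, input: String) -> impl Future<Output = Result<Self::Output, String>> + Send;
}

/// Object-safe adapter: the only place where a future gets boxed.
trait AnyToolHandler: Send + Sync {
    fn handle_any<'a>(
        &'a self,
        input: String,
    ) -> Pin<Box<dyn Future<Output = Result<String, String>> + Send + 'a>>;
}

/// Every concrete `ToolHandler` gets the boxing shim through a blanket impl.
impl<T: ToolHandler> AnyToolHandler for T {
    fn handle_any<'a>(
        &'a self,
        input: String,
    ) -> Pin<Box<dyn Future<Output = Result<String, String>> + Send + 'a>> {
        Box::pin(async move { self.handle(input).await.map(|out| out.to_string()) })
    }
}

/// Hypothetical concrete handler; its body stays a plain `async fn`.
struct EchoHandler;

impl ToolHandler for EchoHandler {
    type Output = String;

    async fn handle(&self, input: String) -> Result<String, String> {
        Ok(format!("echo: {input}"))
    }
}

/// Waker that does nothing; enough for futures that never actually suspend.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny polling loop so the sketch runs without an async runtime.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// The registry stores only the object-safe adapter type.
fn run_registry_demo() -> Vec<String> {
    let registry: Vec<Box<dyn AnyToolHandler>> = vec![Box::new(EchoHandler)];
    registry
        .iter()
        .map(|h| block_on(h.handle_any("hi".to_string())).unwrap())
        .collect()
}
```

In this sketch the concrete handler never mentions `Box` or `Pin`; only the blanket `AnyToolHandler` impl does, which mirrors the "boxing stays at the registry boundary" tradeoff the PR describes. It needs Rust 1.75 or newer for return-position `impl Trait` in traits.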
Michael Bolin	93380a6fac	fix: add shell fallback paths for pwsh/powershell that work on GitHub Actions Windows runners (#16617)

Recently, I merged a number of PRs to increase startup timeouts for scripts that ran under PowerShell, but in the failure for `suite::codex_tool::test_shell_command_approval_triggers_elicitation`, I found this in the error logs when running on Bazel with BuildBuddy:

```
[mcp stderr] 2026-04-02T19:54:10.758951Z ERROR codex_core::tools::router: error=Exit code: 1
[mcp stderr] Wall time: 0.2 seconds
[mcp stderr] Output:
[mcp stderr] 'New-Item' is not recognized as an internal or external command,
[mcp stderr] operable program or batch file.
[mcp stderr]
```

This error implies that the command was run under `cmd.exe` instead of `pwsh.exe`. Under GitHub Actions, I suspect that the `%PATH%` that is passed to our Bazel builder is scrubbed such that our tests cannot find PowerShell where GitHub installs it.

Having these explicit fallback paths should help. While we could enable these only for tests, I don't see any harm in keeping them in production, as well.	2026-04-02 13:47:10 -07:00
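The fallback idea in #16617 above (prefer whatever `%PATH%` resolves, otherwise try known install locations) might look something like the sketch below. The function name `resolve_shell`, the constant `PWSH_FALLBACKS`, and the specific paths are assumptions for illustration; the commit message does not list the paths it actually adds. The existence check is passed in as a closure so the logic can be exercised on any platform.

```rust
use std::path::{Path, PathBuf};

/// Illustrative fallback locations (assumed, not taken from the commit):
/// the default PowerShell 7 install path and the Windows PowerShell path.
const PWSH_FALLBACKS: &[&str] = &[
    r"C:\Program Files\PowerShell\7\pwsh.exe",
    r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
];

/// Returns the shell found on PATH if there is one; otherwise the first
/// fallback candidate for which `exists` reports true.
fn resolve_shell<F>(on_path: Option<PathBuf>, fallbacks: &[&str], exists: F) -> Option<PathBuf>
where
    F: Fn(&Path) -> bool,
{
    on_path.or_else(|| {
        fallbacks
            .iter()
            .map(|candidate| PathBuf::from(*candidate))
            .find(|candidate| exists(candidate.as_path()))
    })
}
```

In real use the closure would typically be `|p| p.is_file()`, and `on_path` would come from whatever PATH lookup the caller already performs.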
Michael Bolin	30ee9e769e	fix: increase another startup timeout for PowerShell (#16613)	2026-04-02 13:16:16 -07:00
jif-oai	7fc36249b5	chore: rename assign_task for followup_task (#16571)	2026-04-02 16:51:17 +02:00
jif-oai	ea27d861b2	nit: state machine desc (#16569)	2026-04-02 16:18:53 +02:00
jif-oai	ab6cce62b8	chore: rework state machine further (#16567)	2026-04-02 16:15:28 +02:00
jif-oai	e47ed5e57f	fix: races in end of turn (#16566)	2026-04-02 15:55:55 +02:00
jif-oai	bd50496411	nit: lint (#16564)	2026-04-02 15:41:18 +02:00


2313 Commits