codex

mirror of https://github.com/openai/codex.git synced 2026-05-04 13:21:54 +03:00

Author	SHA1	Message	Date
sayan-oai	85c1500569	fix: filter dynamic deferred tools from model_visible_specs (#19771 ) fixes #19486 ### Problem Right now dynamic deferred tools are filtered at normal-turn prompt building time, rather than upstream while building the `ToolRouter` itself. This causes issues because dynamic deferred tools are then wrongly included in the router's `model_visible_specs`, which is what the compaction request-building flow relies on. ### Fix Move the dynamic deferred tool filtering to `ToolRouter` creation time to solve this problem for every request that relies on `ToolRouter` for `model_visible_specs`, which solves the issue generically. ### Tests Added unit + integration tests to ensure dynamic deferred tools are omitted from `model_visible_specs` and compaction request respectively. Tested against live `/compact` endpoint; raw deferred dynamic tools without `tool_search` returned `400` (current bug), while the filtered payload (this fix) returns `200`.	2026-04-27 19:09:02 +00:00
efrazer-oai	2009f6e894	refactor: make auth loading async (#19762 ) ## Summary Auth loading used to expose synchronous construction helpers in several places even though some auth sources now need async work. This PR makes the auth-loading surface async and updates the callers to await it. This is intentionally only plumbing. It does not change how AgentIdentity tokens are decoded, how task runtime ids are allocated, or how JWT signatures are verified. ## Stack 1. This PR: [refactor: make auth loading async](https://github.com/openai/codex/pull/19762) 2. [refactor: load AgentIdentity runtime eagerly](https://github.com/openai/codex/pull/19763) 3. [feat: verify AgentIdentity JWTs with JWKS](https://github.com/openai/codex/pull/19764) ## Important call sites \| Area \| Change \| \| --- \| --- \| \| `codex-login` auth loading \| `CodexAuth` and `AuthManager` construction paths now await auth loading. \| \| app-server startup \| Auth manager construction is awaited during initialization. \| \| CLI/TUI/exec/MCP/chatgpt callers \| Existing auth-loading calls now await the same behavior. \| \| cloud requirements storage loader \| The loader becomes async so it can share the same auth construction path. \| \| auth tests \| Tests that load auth now run in async contexts. \| ## Testing Tests: targeted Rust auth test compilation, formatter, scoped Clippy fix, and Bazel lock check.	2026-04-27 11:00:27 -07:00
jif-oai	01ab25dbb5	feat: use git-backed workspace diffs for memory consolidation (#18982 ) ## Why This PR make the `morpheus` agent (memory phase 2) use a git diff to start it's consolidation. The workflow is the following: 1. The agent acquire a lock 2. If `.codex/memories` does not exist or is not a git root, initialize everything (and make a first empty commit) 3. Update `raw_memories.md` and `rollout_summaries/` as before. Basically we select max N phase 1 memories based on a given policy 4. We use git (`gix`) to get a diff between the current state of `.codex/memories` and the last commit. 5. Dump the diff in `phase2_workspace_diff.md` 6. Spawn `morpheus` and point it to `phase2_workspace_diff.md` 7. Wait for `morpheus` to be done 8. Re-create a new `.git` and make one single commit on it. We do this because we don't want to preserve history through `.git` and this is cheap anyway 9. We release the lock On top of this, we keep the retry policies etc etc The goals of this new workflow are: * Better support of any memory extensions such as `chronicle` * Allow the user to manually edit memories and this will be considered by the phase 2 agent As a follow-up we will need to add support for user's edition while `morpheus` is running ## What Changed - Added memory workspace helpers that prepare the git baseline, compute the diff, write `phase2_workspace_diff.md`, and reset the baseline after successful consolidation. - Updated Phase 2 to sync current inputs into `raw_memories.md` and `rollout_summaries/`, prune old extension resources, skip clean workspaces, and run the consolidation subagent only when the workspace has changes. - Tightened Phase 2 job ownership around long-running consolidation with heartbeats and an ownership check before resetting the baseline. - Simplified the prompt and state APIs so DB watermarks are bookkeeping, while workspace dirtiness decides whether consolidation work exists. - Updated the memory pipeline README and tests for workspace diffs, extension-resource cleanup, pollution-driven forgetting, selection ranking, and baseline persistence. ## Verification - Added/updated coverage in `core/src/memories/tests.rs`, `core/src/memories/workspace_tests.rs`, `state/src/runtime/memories.rs`, and `core/tests/suite/memories.rs`. --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-27 14:32:44 +02:00
Michael Bolin	0ccd659b4b	permissions: store only constrained permission profiles (#19735 )	2026-04-26 20:59:58 -07:00
Andrey Mishchenko	35bc6e3d01	Delete unused ResponseItem::Message.end_turn (#19605 ) This field is unused. Delete it.	2026-04-26 17:18:09 -07:00
pakrym-oai	9c3abcd46c	[codex] Move config loading into codex-config (#19487 ) ## Why Config loading had become split across crates: `codex-config` owned the config types and merge logic, while `codex-core` still owned the loader that assembled the layer stack. This change consolidates that responsibility in `codex-config`, so the crate that defines config behavior also owns how configs are discovered and loaded. To make that move possible without reintroducing the old dependency cycle, the shell-environment policy types and helpers that `codex-exec-server` needs now live in `codex-protocol` instead of flowing through `codex-config`. This also makes the migrated loader tests more deterministic on machines that already have managed or system Codex config installed by letting tests override the system config and requirements paths instead of reading the host's `/etc/codex`. ## What Changed - moved the config loader implementation from `codex-core` into `codex-config::loader` and deleted the old `core::config_loader` module instead of leaving a compatibility shim - moved shell-environment policy types and helpers into `codex-protocol`, then updated `codex-exec-server` and other downstream crates to import them from their new home - updated downstream callers to use loader/config APIs from `codex-config` - added test-only loader overrides for system config and requirements paths so loader-focused tests do not depend on host-managed config state - cleaned up now-unused dependency entries and platform-specific cfgs that were surfaced by post-push CI ## Testing - `cargo test -p codex-config` - `cargo test -p codex-core config_loader_tests::` - `cargo test -p codex-protocol -p codex-exec-server -p codex-cloud-requirements -p codex-rmcp-client --lib` - `cargo test --lib -p codex-app-server-client -p codex-exec` - `cargo test --no-run --lib -p codex-app-server` - `cargo test -p codex-linux-sandbox --lib` - `cargo shear` - `just bazel-lock-check` ## Notes - I did not chase unrelated full-suite failures outside the migrated loader surface. - `cargo test -p codex-core --lib` still hits unrelated proxy-sensitive failures on this machine, and Windows CI still shows unrelated long-running/timeouting test noise outside the loader migration itself.	2026-04-26 15:10:53 -07:00
Michael Bolin	deaa307fb2	permissions: derive compatibility policies from profiles (#19392 ) ## Why After #19391, `PermissionProfile` and the split filesystem/network policies could still be stored in parallel. That creates drift risk: a profile can preserve deny globs, external enforcement, or split filesystem entries while a cached projection silently loses those details. This PR makes the profile the runtime source and derives compatibility views from it. ## What Changed - Removes stored filesystem/network sandbox projections from `Permissions` and `SessionConfiguration`; their accessors now derive from the canonical `PermissionProfile`. - Derives legacy `SandboxPolicy` snapshots from profiles only where an older API still needs that field. - Updates MCP connection and elicitation state to track `PermissionProfile` instead of `SandboxPolicy` for auto-approval decisions. - Adds semantic filesystem-policy comparison so cwd changes can preserve richer profiles while still recognizing equivalent legacy projections independent of entry ordering. - Updates config/session tests to assert profile-derived projections instead of parallel stored fields. ## Verification - `cargo test -p codex-core direct_write_roots` - `cargo test -p codex-core runtime_roots_to_legacy_projection` - `cargo test -p codex-app-server requested_permissions_trust_project_uses_permission_profile_intent` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19392). * #19395 * #19394 * #19393 * __->__ #19392	2026-04-26 15:06:42 -07:00
Michael Bolin	4d7ce3447d	permissions: make runtime config profile-backed (#19606 ) ## Why This supersedes #19391. During stack repair, GitHub marked #19391 as merged into a temporary stack branch rather than into `main`, so the runtime-config change needed a fresh PR. `PermissionProfile` is now the canonical permissions shape after #19231 because it can distinguish `Managed`, `Disabled`, and `External` enforcement while also carrying filesystem rules that legacy `SandboxPolicy` cannot represent cleanly. Core config and session state still needed to accept profile-backed permissions without forcing every profile through the strict legacy bridge, which rejected valid runtime profiles such as direct write roots. The unrelated CI/test hardening that previously rode along with this PR has been split into #19683 so this PR stays focused on the permissions model migration. ## What Changed - Adds `Permissions.permission_profile` and `SessionConfiguration.permission_profile` as constrained runtime state, while keeping `sandbox_policy` as a legacy compatibility projection. - Introduces profile setters that keep `PermissionProfile`, split filesystem/network policies, and legacy `SandboxPolicy` projections synchronized. - Uses a compatibility projection for requirement checks and legacy consumers instead of rejecting profiles that cannot round-trip through `SandboxPolicy` exactly. - Updates config loading, config overrides, session updates, turn context plumbing, prompt permission text, sandbox tags, and exec request construction to carry profile-backed runtime permissions. - Preserves configured deny-read entries and `glob_scan_max_depth` when command/session profiles are narrowed. - Adds `PermissionProfile::read_only()` and `PermissionProfile::workspace_write()` presets that match legacy defaults. ## Verification - `cargo test -p codex-core direct_write_roots` - `cargo test -p codex-core runtime_roots_to_legacy_projection` - `cargo test -p codex-app-server requested_permissions_trust_project_uses_permission_profile_intent` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19606). * #19395 * #19394 * #19393 * #19392 * __->__ #19606	2026-04-26 13:29:54 -07:00
Dylan Hurd	f5497f4d65	Split approval matrix test groups (#19454 ) ## Why Recent `main` CI repeatedly timed out in: - `codex-core::all suite::approvals::approval_matrix_covers_all_modes` It failed in runs [24909500958](https://github.com/openai/codex/actions/runs/24909500958), [24908076251](https://github.com/openai/codex/actions/runs/24908076251), [24906197645](https://github.com/openai/codex/actions/runs/24906197645), [24905823212](https://github.com/openai/codex/actions/runs/24905823212), [24903439629](https://github.com/openai/codex/actions/runs/24903439629), [24903336028](https://github.com/openai/codex/actions/runs/24903336028), and [24898949647](https://github.com/openai/codex/actions/runs/24898949647). The failure pattern was a 60s Linux remote timeout. Logs showed many approval scenarios completing before the single matrix test timed out. ## Root Cause `approval_matrix_covers_all_modes` packed every approval/sandbox/tool scenario into one test case. That made the test vulnerable to normal CI variance: one slow scenario or a slow process startup could push the whole monolithic case past the 60s per-test timeout. It also hid which part of the matrix was slow because the runner only reported the one large matrix test. ## What Changed - Keep the shared `scenarios()` table as the single source of approval matrix coverage. - Use one `#[test_case]` per `ScenarioGroup` to generate five async Tokio tests: danger/full-access, read-only, workspace-write, apply-patch, and unified-exec. - Keep the group runner small and add per-scenario error context so a failure still reports the specific scenario name. ## Why This Should Be Reliable Each scenario group now has its own test harness timeout instead of sharing one timeout window with the full matrix. That removes the long sequential loop from a single test while keeping the implementation compact and easy to scan. The tests still run through the same scenario definitions and runner, so this preserves coverage. `test-case` already composes with `#[tokio::test]` in this crate and is already available for test code. ## Verification - `cargo test -p codex-core --test all approval_matrix_ -- --list` - `cargo test -p codex-core --test all approval_matrix_`	2026-04-24 21:38:27 -07:00
Curtis 'Fjord' Hawthorne	8a559e7938	Remove js_repl feature (#19410 )	2026-04-24 17:49:29 -07:00
Michael Bolin	789f387982	permissions: remove legacy read-only access modes (#19449 ) ## Why `ReadOnlyAccess` was a transitional legacy shape on `SandboxPolicy`: `FullAccess` meant the historical read-only/workspace-write modes could read the full filesystem, while `Restricted` tried to carry partial readable roots. The partial-read model now belongs in `FileSystemSandboxPolicy` and `PermissionProfile`, so keeping it on `SandboxPolicy` makes every legacy projection reintroduce lossy read-root bookkeeping and creates unnecessary noise in the rest of the permissions migration. This PR makes the legacy policy model narrower and explicit: `SandboxPolicy::ReadOnly` and `SandboxPolicy::WorkspaceWrite` represent the old full-read sandbox modes only. Split readable roots, deny-read globs, and platform-default/minimal read behavior stay in the runtime permissions model. ## What changed - Removes `ReadOnlyAccess` from `codex_protocol::protocol::SandboxPolicy`, including the generated `access` and `readOnlyAccess` API fields. - Updates legacy policy/profile conversions so restricted filesystem reads are represented only by `FileSystemSandboxPolicy` / `PermissionProfile` entries. - Keeps app-server v2 compatible with legacy `fullAccess` read-access payloads by accepting and ignoring that no-op shape, while rejecting legacy `restricted` read-access payloads instead of silently widening them to full-read legacy policies. - Carries Windows sandbox platform-default read behavior with an explicit override flag instead of depending on `ReadOnlyAccess::Restricted`. - Refreshes generated app-server schema/types and updates tests/docs for the simplified legacy policy shape. ## Verification - `cargo check -p codex-app-server-protocol --tests` - `cargo check -p codex-windows-sandbox --tests` - `cargo test -p codex-app-server-protocol sandbox_policy_` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19449). * #19395 * #19394 * #19393 * #19392 * #19391 * __->__ #19449	2026-04-24 17:16:58 -07:00
rreichel3-oai	219c65dc2f	[codex] Forward Codex Apps tool call IDs to backend metadata (#19207 ) ## Summary - include the outer tool `call_id` in Codex Apps MCP request metadata under `_meta._codex_apps.call_id` - preserve existing Codex Apps metadata like `resource_uri` and `contains_mcp_source` - add request metadata coverage for both the existing-metadata and no-existing-metadata cases ## Why The paired backend change in [openai/openai#850796](https://github.com/openai/openai/pull/850796) updates MCP compliance logging to prefer `_meta._codex_apps.call_id` instead of the JSON-RPC request id. This client change sends that outer tool call id so the backend can record the model/tool call identifier when it is available. This is wire-compatible with older backends because `_meta._codex_apps` is already reserved backend-only metadata. Backends that do not read `call_id` will ignore the extra field. ## Testing - `cargo test -p codex-core request_meta` - `just fmt` - `just fix -p codex-core`	2026-04-24 18:49:34 -04:00
xl-openai	1e560f33e1	feat: Compress skill paths with root aliases (#19098 ) Add skill root tracking so model-visible skill lists can use short path aliases when absolute paths would exceed the metadata budget.	2026-04-24 15:49:07 -07:00
Ahmed Ibrahim	6de6eaa0c1	[4/4] Honor Streamable HTTP MCP placement (#18584 )	2026-04-24 15:03:55 -07:00
Tom	0a9b559c0b	Migrate fork and resume reads to thread store (#18900 ) - Route cold thread/resume and thread/fork source loading through ThreadStore reads instead of direct rollout path operations - Keep lookups that explicitly specify a rollout-path using the local thread store methods but return an invalid-request error for remote ThreadStore configurations - Add some additional unit tests for code path coverage	2026-04-24 13:51:37 -07:00
sayan-oai	c10f95ddac	Update models.json and related fixtures (#19323 ) Supersedes #18735. The scheduled rust-release-prepare workflow force-pushed `bot/update-models-json` back to the generated models.json-only diff, which dropped the test and snapshot updates needed for CI. This PR keeps the latest generated `models.json` from #18735 and adds the corresponding fixture updates: - preserve model availability NUX in the app-server model cache fixture - update core/TUI expectations for the new `gpt-5.4` `xhigh` default reasoning - refresh affected TUI chatwidget snapshots for the `gpt-5.5` default/model copy changes Validation run locally while preparing the fix: - `just fmt` - `cargo test -p codex-app-server model_list` - `cargo test -p codex-core includes_no_effort_in_request` - `cargo test -p codex-core includes_default_reasoning_effort_in_request_when_defined_by_model_info` - `cargo test -p codex-tui --lib chatwidget::tests` - `cargo insta pending-snapshots` --------- Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com>	2026-04-24 11:14:13 +02:00
sayan-oai	e083b6c757	chore: apply truncation policy to unified_exec (#19247 ) we were not respecting turn's `truncation_policy` to clamp output tokens for `unified_exec` and `write_stdin`. this meant truncation was only being applied by `ContextManager` before the output was stored in-memory (so it _was_ being truncated from model-visible context), but the full output was persisted to rollout on disk. now we respect that `truncation_policy` and `ContextManager`-level truncation remains a backup. ### Tests added tests, tested locally.	2026-04-24 00:17:39 -07:00
Michael Bolin	4816b89204	permissions: make profiles represent enforcement (#19231 ) ## Why `PermissionProfile` is becoming the canonical permissions abstraction, but the old shape only carried optional filesystem and network fields. It could describe allowed access, but not who is responsible for enforcing it. That made `DangerFullAccess` and `ExternalSandbox` lossy when profiles were exported, cached, or round-tripped through app-server APIs. The important model change is that active permissions are now a disjoint union over the enforcement mode. Conceptually: ```rust pub enum PermissionProfile { Managed { file_system: FileSystemSandboxPolicy, network: NetworkSandboxPolicy, }, Disabled, External { network: NetworkSandboxPolicy, }, } ``` This distinction matters because `Disabled` means Codex should apply no outer sandbox at all, while `External` means filesystem isolation is owned by an outside caller. Those are not equivalent to a broad managed sandbox. For example, macOS cannot nest Seatbelt inside Seatbelt, so an inner sandbox may require the outer Codex layer to use no sandbox rather than a permissive one. ## How Existing Modeling Maps Legacy `SandboxPolicy` remains a boundary projection, but it now maps into the higher-fidelity profile model: - `ReadOnly` and `WorkspaceWrite` map to `PermissionProfile::Managed` with restricted filesystem entries plus the corresponding network policy. - `DangerFullAccess` maps to `PermissionProfile::Disabled`, preserving the “no outer sandbox” intent instead of treating it as a lax managed sandbox. - `ExternalSandbox { network_access }` maps to `PermissionProfile::External { network }`, preserving external filesystem enforcement while still carrying the active network policy. - Split runtime policies that legacy `SandboxPolicy` cannot faithfully express, such as managed unrestricted filesystem plus restricted network, stay `Managed` instead of being collapsed into `ExternalSandbox`. - Per-command/session/turn grants remain partial overlays via `AdditionalPermissionProfile`; full `PermissionProfile` is reserved for complete active runtime permissions. ## What Changed - Change active `PermissionProfile` into a tagged union: `managed`, `disabled`, and `external`. - Keep partial permission grants separate with `AdditionalPermissionProfile` for command/session/turn overlays. - Represent managed filesystem permissions as either `restricted` entries or `unrestricted`; `glob_scan_max_depth` is non-zero when present. - Preserve old rollout compatibility by accepting the pre-tagged `{ network, file_system }` profile shape during deserialization. - Preserve fidelity for important edge cases: `DangerFullAccess` round-trips as `disabled`, `ExternalSandbox` round-trips as `external`, and managed unrestricted filesystem + restricted network stays managed instead of being mistaken for external enforcement. - Preserve configured deny-read entries and bounded glob scan depth when full profiles are projected back into runtime policies, including unrestricted replacements that now become `:root = write` plus deny entries. - Regenerate the experimental app-server v2 JSON/TypeScript schema and update the `command/exec` README example for the tagged `permissionProfile` shape. ## Compatibility Legacy `SandboxPolicy` remains available at config/API boundaries as the compatibility projection. Existing rollout lines with the old `PermissionProfile` shape continue to load. The app-server `permissionProfile` field is experimental, so its v2 wire shape is intentionally updated to match the higher-fidelity model. ## Verification - `just write-app-server-schema` - `cargo check --tests` - `cargo test -p codex-protocol permission_profile` - `cargo test -p codex-protocol preserving_deny_entries_keeps_unrestricted_policy_enforceable` - `cargo test -p codex-app-server-protocol permission_profile_file_system_permissions` - `cargo test -p codex-app-server-protocol serialize_client_response` - `cargo test -p codex-core session_configured_reports_permission_profile_for_external_sandbox` - `just fix` - `just fix -p codex-protocol` - `just fix -p codex-app-server-protocol` - `just fix -p codex-core` - `just fix -p codex-app-server`	2026-04-23 23:02:18 -07:00
Celia Chen	e8d8080818	feat: let model providers own model discovery (#18950 ) ## Why `codex-models-manager` had grown to own provider-specific concerns: constructing OpenAI-compatible `/models` requests, resolving provider auth, emitting request telemetry, and deciding how provider catalogs should be sourced. That made the manager harder to reuse for providers whose model catalog is not fetched from the OpenAI `/models` endpoint, such as Amazon Bedrock. This change moves provider-specific model discovery behind provider-owned implementations, so the models manager can focus on refresh policy, cache behavior, picker ordering, and model metadata merging. ## What Changed - Introduced a `ModelsManager` trait with separate `OpenAiModelsManager` and `StaticModelsManager` implementations. - Added `ModelsEndpointClient` so OpenAI-compatible HTTP fetching lives outside `codex-models-manager`. - Moved `/models` request construction, provider auth resolution, timeout handling, and request telemetry into `codex-model-provider` via `OpenAiModelsEndpoint`. - Added provider-owned `models_manager(...)` construction so configured OpenAI-compatible providers use `OpenAiModelsManager`, while static/catalog-backed providers can return `StaticModelsManager`. - Added an Amazon Bedrock static model catalog for the GPT OSS Bedrock model IDs. - Updated core/session/thread manager code and tests to depend on `Arc<dyn ModelsManager>`. - Moved offline model test helpers into `codex_models_manager::test_support`. ## Metadata References The Bedrock catalog metadata is based on the official Amazon Bedrock OpenAI model documentation: - [Amazon Bedrock OpenAI models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-openai.html) lists the Bedrock model IDs, text input/output modalities, and `128,000` token context window for `gpt-oss-20b` and `gpt-oss-120b`. - [Amazon Bedrock `gpt-oss-120b` model card](https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-openai-gpt-oss-120b.html) lists the `bedrock-runtime` model ID `openai.gpt-oss-120b-1:0`, the `bedrock-mantle` model ID `openai.gpt-oss-120b`, text-only modalities, and `128K` context window. - [OpenAI `gpt-oss-120b` model docs](https://developers.openai.com/api/docs/models/gpt-oss-120b) document configurable reasoning effort with `low`, `medium`, and `high`, plus text input/output modality. The display names, default reasoning effort, and priority ordering are Codex-local catalog choices. ## Test Plan - Manually verified app-server model listing with an AWS profile: ```shell CODEX_HOME="$(mktemp -d)" cargo run -p codex-app-server-test-client -- \ --codex-bin ./target/debug/codex \ -c 'model_provider="amazon-bedrock"' \ -c 'model_providers.amazon-bedrock.aws.profile="codex-bedrock"' \ -c 'model_providers.amazon-bedrock.aws.region="us-west-2"' \ model-list ``` The response returned the Bedrock catalog with `openai.gpt-oss-120b-1:0` as the default model and `openai.gpt-oss-20b-1:0` as the second listed model, both text-only and supporting low/medium/high reasoning effort.	2026-04-24 04:28:25 +00:00
Michael Bolin	040976b218	tests: isolate approval fixtures from host rules (#18288 ) ## Why Several approval-focused tests were unintentionally sensitive to host-level rule files. On machines with broader allowed command prefixes, commonly allowed commands such as `/bin/date` could bypass the approval path these tests were meant to exercise, making the fixtures depend on the developer or CI host configuration. ## What changed - Pins the approval matrix fixture to the explicit user reviewer so it does not inherit a host reviewer. - Changes OTel approval fixtures to request `/usr/bin/touch codex-otel-approval-test`, avoiding a command that may be pre-approved by local rules. - Clears the config layer stack for the permissions-message assertion that needs to compare only the permissions text under test. ## Verification - `env -u CODEX_SANDBOX_NETWORK_DISABLED cargo test -p codex-core --test all approval_matrix_covers_all_modes -- --nocapture` - `env -u CODEX_SANDBOX_NETWORK_DISABLED cargo test -p codex-core --test all permissions_messages -- --nocapture`	2026-04-23 14:12:09 -07:00
Eric Traut	a50cb205b7	Stabilize plugin MCP tools test (#19191 ) ## Summary The plugin MCP tool-listing test could hide MCP startup failures by polling `ListMcpTools` until its own 30s deadline. If the plugin MCP server startup had already failed or timed out, the session-owned MCP manager would keep returning an empty tool list, so CI only reported `discovered tools: []` instead of the startup state that mattered. This makes the test synchronize on `McpStartupComplete` for the sample plugin MCP server before asserting listed tools, and gives the Bazel-launched test server a larger startup window. ## Notes Confidence is about 80%. The source path strongly supports the RCA: a failed MCP startup is represented as an empty tool list through `ListMcpTools`, so the old polling contract could not distinguish "not ready yet" from "startup already failed." I could not retrieve the CI execution-log artifact to confirm the exact hidden startup error, but the observed Ubuntu Bazel failure matches this path: repeated `ListMcpTools` responses with no tools until the test-local timeout fired. I think this is the right solution because it keeps plugin behavior unchanged and fixes only the test contract. Future startup failures should now report the `McpStartupComplete` failure/cancellation instead of timing out on an empty tool snapshot. This test was introduced in https://github.com/openai/codex/pull/12864.	2026-04-23 14:08:40 -07:00
Michael Bolin	f90cc0ee64	tui: carry permission profiles on user turns (#18285 ) ## Why Per-turn permission overrides should use the same canonical profile abstraction as session configuration. That lets TUI submissions preserve exact configured permissions without round-tripping through legacy sandbox fields. ## What changed This adds `permission_profile` to user-turn operations, threads it through TUI/app-server submission paths, fills the new field in existing test fixtures, and adds coverage that composer submission includes the configured profile. ## Verification - `cargo test -p codex-tui permissions -- --nocapture` - `cargo test -p codex-core --test all permissions_messages -- --nocapture` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18285). * #18288 * #18287 * #18286 * __->__ #18285	2026-04-23 11:54:17 -07:00
jif-oai	a2f868c9d6	feat: drop spawned-agent context instructions (#19127 ) ## Why MultiAgentV2 children should not receive an extra model-visible developer fragment just because they were spawned. The parent/configured developer instructions should carry through normally, but the dedicated `<spawned_agent_context>` block is no longer desired. ## What changed - Removed the `SpawnAgentInstructions` context fragment and its `<spawned_agent_context>` wrapper. - Stopped appending spawned-agent instructions in `codex-rs/core/src/tools/handlers/multi_agents_v2/spawn.rs`. - Updated subagent notification coverage to assert inherited parent developer instructions without expecting the spawned-agent wrapper. ## Verification - `cargo test -p codex-core --test all spawned_multi_agent_v2_child_inherits_parent_developer_context -- --nocapture` - `cargo test -p codex-core --test all skills_toggle_skips_instructions_for_parent_and_spawned_child -- --nocapture` - `cargo test -p codex-core --test all subagent_notifications -- --nocapture`	2026-04-23 18:54:45 +02:00
Abhinav	305825abd9	Support MCP tools in hooks (#18385 ) ## Summary Lifecycle hooks currently treat `PreToolUse`, `PostToolUse`, and `PermissionRequest` as Bash-only flows - hook schema constrains `tool_name` to `Bash` - hook input assumes a command-shaped `tool_input` - core hook dispatch path passes only shell command strings That means hooks cannot target MCP tools even though MCP tool names are model-visible and stable This change generalizes those hook paths so they can match and receive payloads for MCP tools while preserving the existing Bash behavior. ## Reviewer Notes I think these are the key files - `codex-rs/core/src/tools/handlers/mcp.rs` - `codex-rs/core/src/mcp_tool_call.rs` Otherwise the changes across apply_patch, shell, and unified_exec are mainly to rewire everything to be `tool_input` based instead of just `command` so that it'll make sense for MCP tools. ## Changes - Allow `PreToolUse`, `PostToolUse`, and `PermissionRequest` hook inputs to carry arbitrary `tool_name` and `tool_input` values instead of hard-coding `Bash` and command-only payloads. - Add MCP hook payload support through `McpHandler`, using the model-visible tool name from `ToolInvocation` and the raw MCP arguments as `tool_input`. - Include MCP tool responses in `PostToolUse` by serializing `McpToolOutput` into the hook response payload. - Run `PermissionRequest` hooks for MCP approval requests after remembered approval checks and before falling back to user-facing MCP elicitation. - Preserve exact matching for literal hook matchers like `Bash` and `mcp__memory__create_entities`, while keeping regex matcher support for patterns like `mcp__memory__.` and `mcp__.__write.*`. --------- Co-authored-by: Andrei Eternal <eternal@openai.com> Co-authored-by: Codex <noreply@openai.com>	2026-04-23 07:33:57 +00:00
Eric Traut	bbff4ee61a	Add safety check notification and error handling (#19055 ) Adds a new app-server notification that fires when a user account has been flagged for potential safety reasons.	2026-04-22 22:24:12 -07:00
Andrei Eternal	2b2de3f38b	codex: support hooks in config.toml and requirements.toml (#18893 ) ## Summary Support the existing hooks schema in inline TOML so hooks can be configured from both `config.toml` and enterprise-managed `requirements.toml` without requiring a separate `hooks.json` payload. This gives enterprise admins a way to ship managed hook policy through the existing requirements channel while still leaving script delivery to MDM or other device-management tooling, and it keeps `hooks.json` working unchanged for existing users. This also lays the groundwork for follow-on managed filtering work such as #15937, while continuing to respect project trust gating from #14718. It does not implement `allow_managed_hooks_only` itself. NOTE: yes, it's a bit unfortunate that the toml isn't formatted as closely as normal to our default styling. This is because we're trying to stay compatible with the spec for plugins/hooks that we'll need to support & the main usecase here is embedding into requirements.toml ## What changed - moved the shared hook serde model out of `codex-rs/hooks` into `codex-rs/config` so the same schema can power `hooks.json`, inline `config.toml` hooks, and managed `requirements.toml` hooks - added `hooks` support to both `ConfigToml` and `ConfigRequirementsToml`, including requirements-side `managed_dir` / `windows_managed_dir` - treated requirements-managed hooks as one constrained value via `Constrained`, so managed hook policy is merged atomically and cannot drift across requirement sources - updated hook discovery to load requirements-managed hooks first, then per-layer `hooks.json`, then per-layer inline TOML hooks, with a warning when a single layer defines both representations - threaded managed hook metadata through discovered handlers and exposed requirements hooks in app-server responses, generated schemas, and `/debug-config` - added hook/config coverage in `codex-rs/config`, `codex-rs/hooks`, `codex-rs/core/src/config_loader/tests.rs`, and `codex-rs/core/tests/suite/hooks.rs` ## Testing - `cargo test -p codex-config` - `cargo test -p codex-hooks` - `cargo test -p codex-app-server config_api` ## Documentation Companion updates are needed in the developers website repo for: - the hooks guide - the config reference, sample, basic, and advanced pages - the enterprise managed configuration guide --------- Co-authored-by: Michael Bolin <mbolin@openai.com>	2026-04-22 21:20:09 -07:00
Dylan Hurd	5e71da1424	feat(request-permissions) approve with strict review (#19050 ) ## Summary Allow the user to approve a request_permissions_tool request with the condition that all commands in the rest of the turn are reviewed by guardian, regardless of sandbox status. ## Testing - [x] Added unit tests - [x] Ran locally	2026-04-23 01:56:32 +00:00
Matthew Zeng	8f0a92c1e5	Fix relative stdio MCP cwd fallback (#19031 )	2026-04-22 17:52:17 -07:00
Andrei Eternal	eed0e07825	hooks: emit Bash PostToolUse when exec_command completes via write_stdin (#18888 ) Fixes #16246. ## Why `exec_command` already emits `PreToolUse`, but long-running unified exec commands that finish on a later `write_stdin` poll could miss the matching `PostToolUse`. That left the Bash hook lifecycle inconsistent, broke expectations around `tool_use_id` and `tool_input.command`, and meant `PostToolUse` block/replacement feedback could fail to replace the final session output before it reached model context. This keeps the fix scoped to the `exec_command` / `write_stdin` lifecycle. Broader non-Bash hook expansion is still out of scope here and remains tracked separately in #16732. ## What changed - Compute and store `PostToolUsePayload` while handlers still have access to their concrete output type, and carry `tool_use_id` through that payload. - Preserve the original hook-facing `exec_command` string through unified exec state (`ExecCommandRequest`, `ProcessEntry`, `PreparedProcessHandles`, and `ExecCommandToolOutput`) via `hook_command`, and remove the now-unused `session_command` output metadata. - Emit exactly one Bash `PostToolUse` for long-running `exec_command` sessions when a later `write_stdin` poll observes final completion, using the original `exec_command` call id and hook-facing command. - Keep one-shot `exec_command` behavior aligned with the same payload construction, including interactive completions that return a final result directly. - Apply `PostToolUse` block/replacement feedback before the final `write_stdin` completion output is sent back to the model. - Keep `write_stdin` itself out of `PreToolUse` matching so it continues to act as transport/polling for the original Bash tool call. - Restore plain matcher behavior for tool-name matchers such as `Bash` and `Edit\|Write`, while still treating patterns with regex characters (for example `mcp__.*`) as regexes. - Add unit coverage for unified exec payload construction and parallel session separation, plus a core integration regression that verifies a blocked `PostToolUse` replaces the final `write_stdin` output in model context. ## Testing - `cargo test -p codex-hooks` - `cargo test -p codex-core post_tool_use_payload` - `cargo test -p codex-core post_tool_use_blocks_when_exec_session_completes_via_write_stdin`	2026-04-22 17:14:22 -07:00
Michael Bolin	6ca038bbd1	rollout: persist turn permission profiles (#18281 ) ## Why Resume and reconstruction need to preserve the permissions that were active for each user turn. If rollouts only keep legacy sandbox fields, replay cannot faithfully represent profile-shaped overrides introduced earlier in the stack. ## What changed This records `permission_profile` on user-turn rollout events, reconstructs it through history/state extraction, and updates rollout reconstruction and related fixtures to keep the field explicit. ## Verification - `cargo test -p codex-core --test all permissions_messages -- --nocapture` - `cargo test -p codex-core --test all request_permissions -- --nocapture` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18281). * #18288 * #18287 * #18286 * #18285 * #18284 * #18283 * #18282 * __->__ #18281	2026-04-22 17:00:29 -07:00
Michael Bolin	d3dd0d759b	exec-server: expose arg0 alias root to fs sandbox (#19016 ) ## Why The post-merge `rust-ci-full` run for #18999 still failed the Ubuntu remote `suite::remote_env` sandboxed filesystem tests. That run checked out merge commit `ddde50c611e4800cb805f243ed3c50bbafe7d011`, so the arg0 guard lifetime fix was present. The Docker-backed failure had two remaining pieces: - The sandboxed filesystem helper needs to execute Codex through the `codex-linux-sandbox` arg0 alias path. The helper sandbox was only granting read access to the real Codex executable parent, so the alias parent also has to be visible inside the helper sandbox. - The remote-env tests were building sandbox contexts with `FileSystemSandboxContext::new()`, which captures the local test runner cwd. In the Docker remote exec-server, that host checkout path does not exist, so spawning the filesystem helper failed with `No such file or directory` before the helper could process the request. ## What Changed - Track all helper runtime read roots instead of a single root. - Add both the real Codex executable parent and the `codex-linux-sandbox` alias parent to sandbox readable roots. - Avoid sending an unused local cwd in remote filesystem sandbox contexts when the permission profile has no cwd-dependent entries. - Build the Docker remote-env test sandbox contexts with a cwd path that exists inside the container. - Add unit coverage for the alias-parent root and remote sandbox cwd handling. ## Verification - `cargo test -p codex-exec-server` - `cargo test -p codex-core remote_test_env_sandboxed_read_allows_readable_root` - `just fix -p codex-exec-server` - `just fix -p codex-core`	2026-04-22 21:34:22 +00:00
Michael Bolin	18a26d7bbc	app-server: accept permission profile overrides (#18279 ) ## Why `PermissionProfile` is becoming the canonical permissions shape shared by core and app-server. After app-server responses expose the active profile, clients need to be able to send that same shape back when starting, resuming, forking, or overriding a turn instead of translating through the legacy `sandbox`/`sandboxPolicy` shorthands. This still needs to preserve the existing requirements/platform enforcement model. A profile-shaped request can be downgraded or rejected by constraints, but the server should keep the user's elevated-access intent for project trust decisions. Turn-level profile overrides also need to retain existing read protections, including deny-read entries and bounded glob-scan metadata, so a permission override cannot accidentally drop configured protections such as `*/.env = deny`. ## What changed - Adds optional `permissionProfile` request fields to `thread/start`, `thread/resume`, `thread/fork`, and `turn/start`. - Rejects ambiguous requests that specify both `permissionProfile` and the legacy `sandbox`/`sandboxPolicy` fields, including running-thread resume requests. - Converts profile-shaped overrides into core runtime filesystem/network permissions while continuing to derive the constrained legacy sandbox projection used by existing execution paths. - Preserves project-trust intent for profile overrides that are equivalent to workspace-write or full-access sandbox requests. - Preserves existing deny-read entries and `globScanMaxDepth` when applying turn-level `permissionProfile` overrides. - Updates app-server docs plus generated JSON/TypeScript schema fixtures and regression coverage. ## Verification - `cargo test -p codex-app-server-protocol schema_fixtures` - `cargo test -p codex-core session_configuration_apply_permission_profile_preserves_existing_deny_read_entries` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18279). * #18288 * #18287 * #18286 * #18285 * #18284 * #18283 * #18282 * #18281 * #18280 * __->__ #18279	2026-04-22 13:34:33 -07:00
cassirer-openai	f67383bcba	[rollout_trace] Record core session rollout traces (#18877 ) ## Summary Wires rollout trace recording into `codex-core` session and turn execution. This records the core model request/response, compaction, and session lifecycle boundaries needed for replay without yet tracing every nested runtime/tool boundary. ## Stack This is PR 2/5 in the rollout trace stack. - [#18876](https://github.com/openai/codex/pull/18876): Add rollout trace crate - [#18877](https://github.com/openai/codex/pull/18877): Record core session rollout traces - [#18878](https://github.com/openai/codex/pull/18878): Trace tool and code-mode boundaries - [#18879](https://github.com/openai/codex/pull/18879): Trace sessions and multi-agent edges - [#18880](https://github.com/openai/codex/pull/18880): Add debug trace reduction command ## Review Notes This layer is the first live integration point. The important review question is whether trace recording is isolated from normal session behavior: trace failures should not become user-visible execution failures, and recording should preserve the existing turn/session lifecycle semantics. The PR depends on the reducer/data model from the first stack entry and only introduces the core recorder surface that later PRs use for richer runtime and relationship events.	2026-04-22 17:00:48 +00:00
rhan-oai	213b17b7a3	[codex-analytics] guardian review TTFT plumbing and emission (#17696 ) ## Why Guardian analytics includes time-to-first-token, but the Guardian reviewer runs as a normal Codex session and `TurnCompleteEvent` did not expose TTFT. The timing needs to flow through the standard turn-completion protocol so Guardian review analytics can consume the same value as the rest of the session machinery. ## What changed Adds optional `time_to_first_token_ms` to `TurnCompleteEvent` and populates it from `TurnTiming`. The value is carried through app-server thread history, rollout reconstruction, TUI/app-server adapters, and Guardian review session handling. Guardian review analytics now captures TTFT from the reviewer turn-complete event when available. Existing tests and fixtures are updated to set the new optional field to `None` where TTFT is not relevant. ## Verification - `cargo clippy -p codex-tui --tests -- -D warnings` - `cargo clippy -p codex-core --lib --tests -- -D warnings` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17696). * __->__ #17696 * #17695 * #17693 * #18278 * #18953	2026-04-22 01:52:48 -07:00
Michael Bolin	e18fe7a07f	test(core): move prompt debug coverage to integration suite (#18916 ) ## Why `build_prompt_input` now initializes `ExecServerRuntimePaths`, which requires a configured Codex executable path. The previous inline unit test in `core/src/prompt_debug.rs` built a bare `test_config()` and then failed before it could assert anything useful: ```text Codex executable path is not configured ``` This coverage is also integration-shaped: it drives the public `build_prompt_input` entry point through config, thread, and session setup rather than testing a small internal helper in isolation. Bazel CI did not catch this earlier because the affected test was behind the same wrapped Rust unit-test path fixed by #18913. Before that launcher/sharding fix, the outer `workspace_root_test` changed the working directory for Insta compatibility while the inner `rules_rust` sharding wrapper still expected its runfiles working directory. In practice, Bazel could report success without executing the Rust test cases in that shard. Once #18913 makes the wrapper run the Rust test binary directly and shard with libtest arguments, this stale unit test actually runs and exposes the missing `codex_self_exe` setup. ## What Changed - Moved `build_prompt_input_includes_context_and_user_message` out of `core/src/prompt_debug.rs`. - Added `core/tests/suite/prompt_debug_tests.rs` and registered it from `core/tests/suite/mod.rs`. - Builds the test config with `ConfigBuilder` and provides `codex_self_exe` using the current test executable, matching the runtime-path invariant required by prompt debug setup. - Preserves the existing assertions that the generated prompt input includes both the debug user message and project-specific user instructions. ## Verification - `cargo test -p codex-core --test all prompt_debug_tests::build_prompt_input_includes_context_and_user_message` - `bazel test //codex-rs/core:core-all-test --test_arg=prompt_debug_tests::build_prompt_input_includes_context_and_user_message --test_output=errors` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18916). * #18913 * __->__ #18916	2026-04-22 01:08:25 +00:00
Felipe Coury	09ebc34f17	fix(core): emit hooks for apply_patch edits (#18391 ) Fixes https://github.com/openai/codex/issues/16732. ## Why `apply_patch` is Codex's primary file edit path, but it was not emitting `PreToolUse` or `PostToolUse` hook events. That meant hook-based policy, auditing, and write coordination could observe shell commands while missing the actual file mutation performed by `apply_patch`. The issue also exposed that the hook runtime serialized command hook payloads with `tool_name: "Bash"` unconditionally. Even if `apply_patch` supplied hook payloads, hooks would either fail to match it directly or receive misleading stdin that identified the edit as a Bash tool call. ## What Changed - Added `PreToolUse` and `PostToolUse` payload support to `ApplyPatchHandler`. - Exposed the raw patch body as `tool_input.command` for both JSON/function and freeform `apply_patch` calls. - Taught tool hook payloads to carry a handler-supplied hook-facing `tool_name`. - Preserved existing shell compatibility by continuing to emit `Bash` for shell-like tools. - Serialized the selected hook `tool_name` into hook stdin instead of hardcoding `Bash`. - Relaxed the generated hook command input schema so `tool_name` can represent tools other than `Bash`. ## Verification Added focused handler coverage for: - JSON/function `apply_patch` calls producing a `PreToolUse` payload. - Freeform `apply_patch` calls producing a `PreToolUse` payload. - Successful `apply_patch` output producing a `PostToolUse` payload. - Shell and `exec_command` handlers continuing to expose `Bash`. Added end-to-end hook coverage for: - A `PreToolUse` hook matching `^apply_patch$` blocking the patch before the target file is created. - A `PostToolUse` hook matching `^apply_patch$` receiving the patch input and tool response, then adding context to the follow-up model request. - Non-participating tools such as the plan tool continuing not to emit `PreToolUse`/`PostToolUse` hook events. Also validated manually with a live `codex exec` smoke test using an isolated temp workspace and temp `CODEX_HOME`. The smoke test confirmed that a real `apply_patch` edit emits `PreToolUse`/`PostToolUse` with `tool_name: "apply_patch"`, a shell command still emits `tool_name: "Bash"`, and a denying `PreToolUse` hook prevents the blocked patch file from being created.	2026-04-21 22:00:40 -03:00
starr-openai	1d4cc494c9	Add turn-scoped environment selections (#18416 ) ## Summary - add experimental turn/start.environments params for per-turn environment id + cwd selections - pass selections through core protocol ops and resolve them with EnvironmentManager before TurnContext creation - treat omitted selections as default behavior, empty selections as no environment, and non-empty selections as first environment/cwd as the turn primary ## Testing - ran `just fmt` - ran `just write-app-server-schema` - not run: unit tests for this stacked PR --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-21 17:48:33 -07:00
starr-openai	ddbe2536be	Support multiple managed environments (#18401 ) ## Summary - refactor EnvironmentManager to own keyed environments with default/local lookup helpers - keep remote exec-server client creation lazy until exec/fs use - preserve disabled agent environment access separately from internal local environment access ## Validation - not run (per Codex worktree instruction to avoid tests/builds unless requested) --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-21 15:29:35 -07:00
efrazer-oai	be75785504	fix: fully revert agent identity runtime wiring (#18757 ) ## Summary This PR fully reverts the previously merged Agent Identity runtime integration from the old stack: https://github.com/openai/codex/pull/17387/changes It removes the Codex-side task lifecycle wiring, rollout/session persistence, feature flag plumbing, lazy `auth.json` mutation, background task auth paths, and request callsite changes introduced by that stack. This leaves the repo in a clean pre-AgentIdentity integration state so the follow-up PRs can reintroduce the pieces in smaller reviewable layers. ## Stack 1. This PR: full revert 2. https://github.com/openai/codex/pull/18871: move Agent Identity business logic into a crate 3. https://github.com/openai/codex/pull/18785: add explicit AgentIdentity auth mode and startup task allocation 4. https://github.com/openai/codex/pull/18811: migrate auth callsites through AuthProvider ## Testing Tests: targeted Rust checks, cargo-shear, Bazel lock check, and CI.	2026-04-21 14:30:55 -07:00
jif-oai	15b8cde2a4	chore: default multi-agent v2 fork to all (#18873 ) Default sub-agents v2 to `all` for the fork mode	2026-04-21 21:54:58 +01:00
Michael Bolin	f8562bd47b	sandboxing: intersect permission profiles semantically (#18275 ) ## Why Permission approval responses must not be able to grant more access than the tool requested. Moving this flow to `PermissionProfile` means the comparison must be profile-shaped instead of `SandboxPolicy`-shaped, and cwd-relative special paths such as `:cwd` and `:project_roots` must stay anchored to the turn that produced the request. ## What changed This implements semantic `PermissionProfile` intersection in `codex-sandboxing` for file-system and network permissions. The intersection accepts narrower path grants, rejects broader grants, preserves deny-read carve-outs and glob scan depth, and materializes cwd-dependent special-path grants to absolute paths before they can be recorded for reuse. The request-permissions response paths now use that intersection consistently. App-server captures the request turn cwd before waiting for the client response, includes that cwd in the v2 approval params, and core stores the requested profile plus cwd for direct TUI/client responses and Guardian decisions before recording turn- or session-scoped grants. The TUI app-server bridge now preserves the app-server request cwd when converting permission approval params into core events. ## Verification - `cargo test -p codex-sandboxing intersect_permission_profiles -- --nocapture` - `cargo test -p codex-app-server request_permissions_response -- --nocapture` - `cargo test -p codex-core request_permissions_response_materializes_session_cwd_grants_before_recording -- --nocapture` - `cargo check -p codex-tui --tests` - `cargo check --tests` - `cargo test -p codex-tui app_server_request_permissions_preserves_file_system_permissions`	2026-04-21 10:23:01 -07:00
pakrym-oai	2a226096f6	Split DeveloperInstructions into individual fragments. (#18813 ) Split DeveloperInstructions into individual fragments.	2026-04-21 10:22:36 -07:00
pash-openai	dc1a8f2190	[tool search] support namespaced deferred dynamic tools (#18413 ) Deferred dynamic tools need to round-trip a namespace so a tool returned by `tool_search` can be called through the same registry key that core uses for dispatch. This change adds namespace support for dynamic tool specs/calls, persists it through app-server thread state, and routes dynamic tool calls by full `ToolName` while still sending the app the leaf tool name. Deferred dynamic tools must provide a namespace; non-deferred dynamic tools may remain top-level. It also introduces `LoadableToolSpec` as the shared function-or-namespace Responses shape used by both `tool_search` output and dynamic tool registration, so dynamic tools use the same wrapping logic in both paths. Validation: - `cargo test -p codex-tools` - `cargo test -p codex-core tool_search` --------- Co-authored-by: Sayan Sisodiya <sayan@openai.com>	2026-04-21 14:13:08 +08:00
Celia Chen	cefcfe43b9	feat: add a built-in Amazon Bedrock model provider (#18744 ) ## Why Codex needs a first-class `amazon-bedrock` model provider so users can select Bedrock without copying a full provider definition into `config.toml`. The provider has Codex-owned defaults for the pieces that should stay consistent across users: the display `name`, Bedrock `base_url`, and `wire_api`. At the same time, users still need a way to choose the AWS credential profile used by their local environment. This change makes `amazon-bedrock` a partially modifiable built-in provider: code owns the provider identity and endpoint defaults, while user config can set `model_providers.amazon-bedrock.aws.profile`. For example: ```toml model_provider = "amazon-bedrock" [model_providers.amazon-bedrock.aws] profile = "codex-bedrock" ``` ## What Changed - Added `amazon-bedrock` to the built-in model provider map with: - `name = "Amazon Bedrock"` - `base_url = "https://bedrock-mantle.us-east-1.api.aws/v1"` - `wire_api = "responses"` - Added AWS provider auth config with a profile-only shape: `model_providers.<id>.aws.profile`. - Kept AWS auth config restricted to `amazon-bedrock`; custom providers that set `aws` are rejected. - Allowed `model_providers.amazon-bedrock` through reserved-provider validation so it can act as a partial override. - During config loading, only `aws.profile` is copied from the user-provided `amazon-bedrock` entry onto the built-in provider. Other Bedrock provider fields remain hard-coded by the built-in definition. - Updated the generated config schema for the new provider AWS profile config.	2026-04-21 00:54:05 +00:00
guinness-oai	ca3246f77a	[codex] Send realtime transcript deltas on handoff (#18761 ) ## Summary - Track how many realtime transcript entries have already been attached to a background-agent handoff. - Attach only entries added since the previous handoff as `<transcript_delta>` instead of resending the accumulated transcript snapshot. - Update the realtime integration test so the second delegation carries only the second transcript delta. ## Validation - `just fmt` - `cargo test -p codex-api` - `cargo test -p codex-core inbound_handoff_request_sends_transcript_delta_after_each_handoff` - `cargo build -p codex-cli -p codex-app-server` ## Manual testing Built local debug binaries at: - `codex-rs/target/debug/codex` - `codex-rs/target/debug/codex-app-server`	2026-04-20 16:46:15 -07:00
viyatb-oai	33fa952426	fix: fix stale proxy env restoration after shell snapshots (#17271 ) ## Summary This fixes a stale-environment path in shell snapshot restoration. A sandboxed command can source a shell snapshot that was captured while an older proxy process was running. If that proxy has died and come back on a different port, the snapshot can otherwise put old proxy values back into the command environment, which is how tools like `pip` end up talking to a dead proxy. The wrapper now captures the live process environment before sourcing the snapshot and then restores or clears every proxy env var from the proxy crate's canonical list. That makes proxy state after shell snapshot restoration match the current command environment, rather than whatever proxy values happened to be present in the snapshot. On macOS, the Codex-generated `GIT_SSH_COMMAND` is refreshed when the SOCKS listener changes, while custom SSH wrappers are still left alone. --------- Co-authored-by: Codex <noreply@openai.com>	2026-04-20 16:39:17 -07:00
guinness-oai	1029742cf7	Add realtime silence tool (#18635 ) ## Summary Adds a second realtime v2 function tool, `remain_silent`, so the realtime model has an explicit non-speaking action when the collaboration mode or latest context says it should not answer aloud. This is stacked on #18597. ## Design - Advertise `remain_silent` alongside `background_agent` in realtime v2 conversational sessions. - Parse `remain_silent` function calls into a typed `RealtimeEvent::NoopRequested` event. - Have core answer that function call with an empty `function_call_output` and deliberately avoid `response.create`, so no follow-up realtime response is requested. - Keep the event hidden from app-server/TUI surfaces; it is operational plumbing, not user-visible conversation content.	2026-04-20 15:43:20 -07:00
Thibault Sottiaux	54bd07d28c	[codex] prefer inherited spawn agent model (#18701 ) This updates the spawn-agent tool contract so subagents are presented as inheriting the parent model by default. The visible model list is now framed as optional overrides, the model parameter tells callers to leave it unset and the delegation guidance no longer nudges models toward picking a smaller/mini override. Fixes reports that 5.4 would occasionally pick 5.2 or lower as sub-agents.	2026-04-20 22:34:08 +00:00
guinness-oai	126bd6e7a8	Update realtime handoff transcript handling (#18597 ) ## Summary This PR aims to improve integration between the realtime model and the codex agent by sharing more context with each other. In particular, we now share full realtime conversation transcript deltas in addition to the delegation message. realtime_conversation.rs now turns a handoff into: ``` <realtime_delegation> <input>...</input> <transcript_delta>...</transcript_delta> </realtime_delegation> ``` ## Implementation notes The transcript is accumulated in the realtime websocket layer as parsed realtime events arrive. When a background-agent handoff is requested, the current transcript snapshot is copied onto the handoff event and then serialized by `realtime_conversation.rs` into the hidden realtime delegation envelope that Codex receives as user-turn context. For Realtime V2, the session now explicitly enables input audio transcription, and the parser handles the relevant input/output transcript completion events so the snapshot includes both user speech and realtime model responses. The delegation `<input>` remains the actual handoff request, while `<transcript_delta>` carries the surrounding conversation history for context. Reviewers should note that the transcript payload is intended for Codex context sharing, not UI rendering. The realtime delegation envelope should stay hidden from the user-facing transcript surface, while still being included in the background-agent turn so Codex can answer with the same conversational context the realtime model had.	2026-04-20 14:04:09 -07:00
Ahmed Ibrahim	316cf0e90b	Update models.json (#18586 ) - Replace the active models-manager catalog with the deleted core catalog contents. - Replace stale hardcoded test model slugs with current bundled model slugs. - Keep this as a stacked change on top of the cleanup PR.	2026-04-20 10:27:01 -07:00

1 2 3 4 5 ...

986 Commits