codex

mirror of https://github.com/openai/codex.git synced 2026-05-02 12:21:26 +03:00

Author	SHA1	Message	Date
pakrym-oai	3d41ff0b77	Add model-controlled truncation for code mode results (#14258 ) Summary - document that `@openai/code_mode` exposes `set_max_output_tokens_per_exec_call` and that `code_mode` truncates the final Rust-side output when the budget is exceeded - enforce the configured budget in the Rust tool runner, reusing truncation helpers so text-only outputs follow the unified-exec wrapper and mixed outputs still fit within the limit - ensure the new behavior is covered by a code-mode integration test and string spec update Testing - Not run (not requested)	2026-03-11 12:33:08 -07:00
pakrym-oai	ee8f84153e	Add output schema to MCP tools and expose MCP tool results in code mode (#14236 ) Summary - drop `McpToolOutput` in favor of `CallToolResult`, moving its helpers to keep MCP tooling focused on the final result shape - wire the new schema definitions through code mode, context, handlers, and spec modules so MCP tools serialize the exact output shape expected by the model - extend code mode tests to cover multiple MCP call scenarios and ensure the serialized data matches the new schema - refresh JS runner helpers and protocol models alongside the schema changes Testing - Not run (not requested)	2026-03-11 12:33:08 -07:00
Won Park	722e8f08e1	unifying all image saves to /tmp to bug-proof (#14149 ) image-gen feature will have the model saving to /tmp by default + at all times	2026-03-11 12:33:08 -07:00
Ahmed Ibrahim	91ca20c7c3	Add spawn_agent model overrides (#14160 ) - add `model` and `reasoning_effort` to the `spawn_agent` schema so the values pass through - validate requested models against `model.model` and only check that the selected model supports the requested reasoning effort --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-11 12:33:08 -07:00
pakrym-oai	00ea8aa7ee	Expose strongly-typed result for exec_command (#14183 ) Summary - document output types for the various tool handlers and registry so the API exposes richer descriptions - update unified execution helpers and client tests to align with the new output metadata - clean up unused helpers across tool dispatch paths Testing - Not run (not requested)	2026-03-11 12:33:07 -07:00
Eric Traut	7144f84c69	Fix release-mode integration test compiler failure (#13603 ) Addresses #13586 This doesn't affect our CI scripts. It was user-reported. Summary - add `wiremock::ResponseTemplate` and `body_string_contains` imports behind `#[cfg(not(debug_assertions))]` in `codex-rs/core/tests/suite/view_image.rs` so release builds only pull the helpers they actually use	2026-03-11 12:33:07 -07:00
Ahmed Ibrahim	aa6a57dfa2	Stabilize incomplete SSE retry test (#13879 ) ## What changed - The retry test now uses the same streaming SSE test server used by production-style tests instead of a wiremock sequence. - The fixture is resolved via `find_resource!`, and the test asserts that exactly two outbound requests were sent. ## Why this fixes the flake - The old wiremock sequence approximated early-close behavior, but it did not reproduce the same streaming semantics the real client sees. - That meant the retry path depended on mock implementation details instead of on the actual transport behavior we care about. - Switching to the streaming SSE helper makes the test exercise the real early-close/retry contract, and counting requests directly verifies that we retried exactly once rather than merely hoping the sequence aligned. ## Scope - Test-only change.	2026-03-09 22:34:44 -07:00
Ahmed Ibrahim	2e24be2134	Use realtime transcript for handoff context (#14132 ) - collect input/output transcript deltas into active handoff transcript state - attach and clear that transcript on each handoff, and regenerate schema/tests	2026-03-09 22:30:03 -07:00
Matthew Zeng	566e4cee4b	[apps] Fix apps enablement condition. (#14011 ) - [x] Fix apps enablement condition to check both the feature flag and that the user is not an API key user.	2026-03-09 22:25:43 -07:00
xl-openai	0c33af7746	feat: support disabling bundled system skills (#13792 ) Support disable bundled system skills with a config: [skills.bundled] enabled = false	2026-03-09 22:02:53 -07:00
pakrym-oai	710682598d	Export tools module into code mode runner (#14167 ) Summary - allow `code_mode` to pass enabled tools metadata to the runner and expose them via `tools.js` - import tools inside JavaScript rather than relying only on globals or proxies for nested tool calls - update specs, docs, and tests to exercise the new bridge and explain the tooling changes Testing - Not run (not requested)	2026-03-09 21:59:09 -07:00
pakrym-oai	d71e042694	Enforce single tool output type in codex handlers (#14157 ) We'll need to associate output schema with each tool. Each tool can only have on output type.	2026-03-09 21:49:44 -07:00
pakrym-oai	da616136cc	Add code_mode experimental feature (#13418 ) A much narrower and more isolated (no node features) version of js_repl	2026-03-09 20:56:27 -07:00
Dylan Hurd	6da84efed8	feat(approvals) RejectConfig for request_permissions (#14118 ) ## Summary We need to support allowing request_permissions calls when using `Reject` policy <img width="1133" height="588" alt="Screenshot 2026-03-09 at 12 06 40 PM" src="https://github.com/user-attachments/assets/a8df987f-c225-4866-b8ab-5590960daec5" /> Note that this is a backwards-incompatible change for Reject policy. I'm not sure if we need to add a default based on our current use/setup ## Testing - [x] Added tests - [x] Tested locally	2026-03-09 18:16:54 -07:00
Dylan Hurd	c1defcc98c	fix(core) RequestPermissions + ApplyPatch (#14055 ) ## Summary The apply_patch tool should also respect AdditionalPermissions ## Testing - [x] Added unit tests --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-09 16:11:19 -07:00
Dylan Hurd	d241dc598c	feat(core) Persist request_permission data across turns (#14009 ) ## Summary request_permissions flows should support persisting results for the session. Open Question: Still deciding if we need within-turn approvals - this adds complexity but I could see it being useful ## Testing - [x] Updated unit tests --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-09 14:36:38 -07:00
Ahmed Ibrahim	44ecc527cb	Stabilize RMCP streamable HTTP readiness tests (#13880 ) ## What changed - The RMCP streamable HTTP tests now wait for metadata and tool readiness before issuing tool calls. - OAuth state is isolated per test home. - The helper server startup path now uses bounded bind retries so transient `AddrInUse` collisions do not fail the test immediately. ## Why this fixes the flake - The old tests could begin issuing tool requests before the helper server had finished advertising its metadata and tools, so the first request sometimes raced the server startup sequence. - On top of that, shared OAuth state and occasional bind collisions on CI runners introduced cross-test environmental noise unrelated to the functionality under test. - Readiness polling makes the client wait for an observable “server is ready” signal, while isolated state and bounded bind retries remove external contention that was causing intermittent failures. ## Scope - Test-only change.	2026-03-09 19:52:55 +00:00
Dylan Hurd	0334ddeccb	fix(ci) Faster shell_command::unicode_output test (#14114 ) ## Summary Alternative to #14061 - we need to use a child process on windows to correctly validate Powershell behavior. ## Testing - [x] These are tests	2026-03-09 19:09:56 +00:00
Ahmed Ibrahim	fefd01b9e0	Stabilize resumed rollout messages (#14060 ) ## What changed - add a bounded `resume_until_initial_messages` helper in `core/tests/suite/resume.rs` - retry the resume call until `initial_messages` contains the fully persisted final turn shape before asserting ## Why this fixes flakiness The old test resumed once immediately after `TurnComplete` and sometimes read rollout state before the final turn had been persisted. That made the assertion race persistence timing instead of checking the resumed message shape. The new helper polls for up to two seconds in 10ms steps and only asserts once the expected message sequence is actually present, so the test waits for the real readiness condition instead of depending on a lucky timing window. ## Scope - test-only - no production logic change	2026-03-09 11:48:13 -07:00
Ahmed Ibrahim	75e608343c	Stabilize realtime startup context tests (#13876 ) ## What changed - The realtime startup-context tests no longer assume the interesting websocket payload is always `connection 1 / request 0`. - Instead, they now wait for the first outbound websocket request that actually carries `session.instructions`, regardless of which websocket connection won the accept-order race on the runner. - The env-key fallback test stays serialized because it mutates process environment. ## Why this fixes the flake - The old test synchronized on the mirrored `session.updated` client event and then inspected a fixed websocket slot. - On CI, the response websocket and the realtime websocket can race each other during startup. When the response websocket wins that race, the fixed slot can contain `response.create` instead of the startup-context-bearing `session.update` request the test actually cares about. - That made the test fail nondeterministically by inspecting the wrong request, or by timing out waiting on a secondary event even though the real outbound request path was correct. - Waiting directly on the first request whose payload includes `session.instructions` removes both ordering assumptions and makes the assertion line up with the actual contract under test. - Separately, serializing the environment-mutating fallback case prevents unrelated tests from seeing partially updated auth state. ## Scope - Test-only change.	2026-03-09 10:57:43 -07:00
Charley Cunningham	f23fcd6ced	guardian initial feedback / tweaks (#13897 ) ## Summary - remove the remaining model-visible guardian-specific `on-request` prompt additions so enabling the feature does not change the main approval-policy instructions - neutralize user-facing guardian wording to talk about automatic approval review / approval requests rather than a second reviewer or only sandbox escalations - tighten guardian retry-context handling so agent-authored `justification` stays in the structured action JSON and is not also injected as raw retry context - simplify guardian review plumbing in core by deleting dead prompt-append paths and trimming some request/transcript setup code ## Notable Changes - delete the dead `permissions/approval_policy/guardian.md` append path and stop threading `guardian_approval_enabled` through model-facing developer-instruction builders - rename the experimental feature copy to `Automatic approval review` and update the `/experimental` snapshot text accordingly - make approval-review status strings generic across shell, patch, network, and MCP review types - forward real sandbox/network retry reasons for shell and unified-exec guardian review, but do not pass agent-authored justification as raw retry context - simplify `guardian.rs` by removing the one-field request wrapper, deduping reasoning-effort selection, and cleaning up transcript entry collection ## Testing - `just fmt` - full validation left to CI --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-09 09:25:24 -07:00
Jack Mousseau	e6b93841c5	Add request permissions tool (#13092 ) Adds a built-in `request_permissions` tool and wires it through the Codex core, protocol, and app-server layers so a running turn can ask the client for additional permissions instead of relying on a static session policy. The new flow emits a `RequestPermissions` event from core, tracks the pending request by call ID, forwards it through app-server v2 as an `item/permissions/requestApproval` request, and resumes the tool call once the client returns an approved subset of the requested permission profile.	2026-03-08 20:23:06 -07:00
Dylan Hurd	f41b1638c9	fix(core) patch otel test (#14014 ) ## Summary This test was missing the turn completion event in the responses stream, so it was hanging. This PR fixes the issue ## Testing - [x] This does update the test	2026-03-08 19:06:30 -07:00
Celia Chen	340f9c9ecb	app-server: include experimental skill metadata in exec approval requests (#13929 ) ## Summary This change surfaces skill metadata on command approval requests so app-server clients can tell when an approval came from a skill script and identify the originating `SKILL.md`. - add `skill_metadata` to exec approval events in the shared protocol - thread skill metadata through core shell escalation and delegated approval handling for skill-triggered approvals - expose the field in app-server v2 as experimental `skillMetadata` - regenerate the JSON/TypeScript schemas and cover the new field in protocol, transport, core, and TUI tests ## Why Skill-triggered approvals already carry skill context inside core, but app-server clients could not see which skill caused the prompt. Sending the skill metadata with the approval request makes it possible for clients to present better approval UX and connect the prompt back to the relevant skill definition. ## example event in app-server-v2 verified that we see this event when experimental api is on: ``` < { < "id": 11, < "method": "item/commandExecution/requestApproval", < "params": { < "additionalPermissions": { < "fileSystem": null, < "macos": { < "accessibility": false, < "automations": { < "bundle_ids": [ < "com.apple.Notes" < ] < }, < "calendar": false, < "preferences": "read_only" < }, < "network": null < }, < "approvalId": "25d600ee-5a3c-4746-8d17-e2e61fb4c563", < "availableDecisions": [ < "accept", < "acceptForSession", < "cancel" < ], < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "commandActions": [ < { < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "type": "unknown" < } < ], < "cwd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes", < "itemId": "call_jZp3xFpNg4D8iKAD49cvEvZy", < "skillMetadata": { < "pathToSkillsMd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/SKILL.md" < }, < "threadId": "019ccc10-b7d3-7ff2-84fe-3a75e7681e69", < "turnId": "019ccc10-b848-76f1-81b3-4a1fa225493f" < } < }` ``` & verified that this is the event when experimental api is off: ``` < { < "id": 13, < "method": "item/commandExecution/requestApproval", < "params": { < "approvalId": "5fbbf776-261b-4cf8-899b-c125b547f2c0", < "availableDecisions": [ < "accept", < "acceptForSession", < "cancel" < ], < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "commandActions": [ < { < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "type": "unknown" < } < ], < "cwd": "/Users/celia/code/codex/codex-rs", < "itemId": "call_OV2DHzTgYcbYtWaTTBWlocOt", < "threadId": "019ccc16-2a2b-7be1-8500-e00d45b892d4", < "turnId": "019ccc16-2a8e-7961-98ec-649600e7d06a" < } < } ```	2026-03-08 18:07:46 -07:00
Ahmed Ibrahim	1f150eda8b	Stabilize shell serialization tests (#13877 ) ## What changed - The duration-recording fixture sleep was reduced from a large artificial delay to `0.2s`, and the assertion floor was lowered to `0.1s`. - The shell tool fixtures now force `login = false` so they do not invoke login-shell startup paths. ## Why this fixes the flake - The old tests were paying for two kinds of noise that had nothing to do with the feature being validated: oversized sleep time and variable shell initialization cost. - Login shells can pick up runner-specific startup files and incur inconsistent startup latency. - The test only needs to prove that we record a nontrivial duration and preserve shell output. A shorter fixture delay plus a non-login shell keeps that coverage while removing runner-dependent wall-clock variance. ## Scope - Test-only change.	2026-03-08 13:37:41 -07:00
Michael Bolin	bf5c2f48a5	seatbelt: honor split filesystem sandbox policies (#13448 ) ## Why After `#13440` and `#13445`, macOS Seatbelt policy generation was still deriving filesystem and network behavior from the legacy `SandboxPolicy` projection. That projection loses explicit unreadable carveouts and conflates split network decisions, so the generated Seatbelt policy could still be wider than the split policy that Codex had already computed. ## What changed - added Seatbelt entrypoints that accept `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` directly - built read and write policy stanzas from access roots plus excluded subpaths so explicit unreadable carveouts survive into the generated Seatbelt policy - switched network policy generation to consult `NetworkSandboxPolicy` directly - failed closed when managed-network or proxy-constrained sessions do not yield usable loopback proxy endpoints - updated the macOS callers and test helpers that now need to carry the split policies explicitly ## Verification - added regression coverage in `core/src/seatbelt.rs` for unreadable carveouts under both full-disk and scoped-readable policies - verified the current PR state with `just clippy` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13448). * #13453 * #13452 * #13451 * #13449 * __->__ #13448 * #13445 * #13440 * #13439 --------- Co-authored-by: viyatb-oai <viyatb@openai.com>	2026-03-08 00:35:19 +00:00
Dylan Hurd	92f7541624	fix(ci) fix guardian ci (#13911 ) ## Summary #13910 was merged with some unused imports, let's fix this ## Testing - [x] Let's make sure CI is green --------- Co-authored-by: Charles Cunningham <ccunningham@openai.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-07 23:34:56 +00:00
Charley Cunningham	e84ee33cc0	Add guardian approval MVP (#13692 ) ## Summary - add the guardian reviewer flow for `on-request` approvals in command, patch, sandbox-retry, and managed-network approval paths - keep guardian behind `features.guardian_approval` instead of exposing a public `approval_policy = guardian` mode - route ordinary `OnRequest` approvals to the guardian subagent when the feature is enabled, without changing the public approval-mode surface ## Public model - public approval modes stay unchanged - guardian is enabled via `features.guardian_approval` - when that feature is on, `approval_policy = on-request` keeps the same approval boundaries but sends those approval requests to the guardian reviewer instead of the user - `/experimental` only persists the feature flag; it does not rewrite `approval_policy` - CLI and app-server no longer expose a separate `guardian` approval mode in this PR ## Guardian reviewer - the reviewer runs as a normal subagent and reuses the existing subagent/thread machinery - it is locked to a read-only sandbox and `approval_policy = never` - it does not inherit user/project exec-policy rules - it prefers `gpt-5.4` when the current provider exposes it, otherwise falls back to the parent turn's active model - it fail-closes on timeout, startup failure, malformed output, or any other review error - it currently auto-approves only when `risk_score < 80` ## Review context and policy - guardian mirrors `OnRequest` approval semantics rather than introducing a separate approval policy - explicit `require_escalated` requests follow the same approval surface as `OnRequest`; the difference is only who reviews them - managed-network allowlist misses that enter the approval flow are also reviewed by guardian - the review prompt includes bounded recent transcript history plus recent tool call/result evidence - transcript entries and planned-action strings are truncated with explicit `<guardian_truncated ... />` markers so large payloads stay bounded - apply-patch reviews include the full patch content (without duplicating the structured `changes` payload) - the guardian request layout is snapshot-tested using the same model-visible Responses request formatter used elsewhere in core ## Guardian network behavior - the guardian subagent inherits the parent session's managed-network allowlist when one exists, so it can use the same approved network surface while reviewing - exact session-scoped network approvals are copied into the guardian session with protocol/port scope preserved - those copied approvals are now seeded before the guardian's first turn is submitted, so inherited approvals are available during any immediate review-time checks ## Out of scope / follow-ups - the sandbox-permission validation split was pulled into a separate PR and is not part of this diff - a future follow-up can enable `serde_json` preserve-order in `codex-core` and then simplify the guardian action rendering further --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-07 05:40:10 -08:00
jif-oai	cf143bf71e	feat: simplify DB further (#13771 )	2026-03-07 03:48:36 -08:00
Celia Chen	b0ce16c47a	fix(core): respect reject policy by approval source for skill scripts (#13816 ) ## Summary - distinguish reject-policy handling for prefix-rule approvals versus sandbox approvals in Unix shell escalation - keep prompting for skill-script execution when `rules=true` but `sandbox_approval=false`, instead of denying the command up front - add regression coverage for both skill-script reject-policy paths in `codex-rs/core/tests/suite/skill_approval.rs`	2026-03-06 21:43:14 -08:00
Michael Bolin	22ac6b9aaa	sandboxing: plumb split sandbox policies through runtime (#13439 ) ## Why `#13434` introduces split `FileSystemSandboxPolicy` and `NetworkSandboxPolicy`, but the runtime still made most execution-time sandbox decisions from the legacy `SandboxPolicy` projection. That projection loses information about combinations like unrestricted filesystem access with restricted network access. In practice, that means the runtime can choose the wrong platform sandbox behavior or set the wrong network-restriction environment for a command even when config has already separated those concerns. This PR carries the split policies through the runtime so sandbox selection, process spawning, and exec handling can consult the policy that actually matters. ## What changed - threaded `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` through `TurnContext`, `ExecRequest`, sandbox attempts, shell escalation state, unified exec, and app-server exec overrides - updated sandbox selection in `core/src/sandboxing/mod.rs` and `core/src/exec.rs` to key off `FileSystemSandboxPolicy.kind` plus `NetworkSandboxPolicy`, rather than inferring behavior only from the legacy `SandboxPolicy` - updated process spawning in `core/src/spawn.rs` and the platform wrappers to use `NetworkSandboxPolicy` when deciding whether to set `CODEX_SANDBOX_NETWORK_DISABLED` - kept additional-permissions handling and legacy `ExternalSandbox` compatibility projections aligned with the split policies, including explicit user-shell execution and Windows restricted-token routing - updated callers across `core`, `app-server`, and `linux-sandbox` to pass the split policies explicitly ## Verification - added regression coverage in `core/tests/suite/user_shell_cmd.rs` to verify `RunUserShellCommand` does not inherit `CODEX_SANDBOX_NETWORK_DISABLED` from the active turn - added coverage in `core/src/exec.rs` for Windows restricted-token sandbox selection when the legacy projection is `ExternalSandbox` - updated Linux sandbox coverage in `linux-sandbox/tests/suite/landlock.rs` to exercise the split-policy exec path - verified the current PR state with `just clippy` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13439). * #13453 * #13452 * #13451 * #13449 * #13448 * #13445 * #13440 * __->__ #13439 --------- Co-authored-by: viyatb-oai <viyatb@openai.com>	2026-03-07 02:30:21 +00:00
Rohan Mehta	61098c7f51	Allow full web search tool config (#13675 ) Previously, we could only configure whether web search was on/off. This PR enables sending along a web search config, which includes all the stuff responsesapi supports: filters, location, etc.	2026-03-07 00:50:50 +00:00
Celia Chen	8b81284975	fix(core): skip exec approval for permissionless skill scripts (#13791 ) ## Summary - Treat skill scripts with no permission profile, or an explicitly empty one, as permissionless and run them with the turn's existing sandbox instead of forcing an exec approval prompt. - Keep the approval flow unchanged for skills that do declare additional permissions. - Update the skill approval tests to assert that permissionless skill scripts do not prompt on either the initial run or a rerun. ## Why Permissionless skills should inherit the current turn sandbox directly. Prompting for exec approval in that case adds friction without granting any additional capability.	2026-03-06 16:40:41 -08:00
Owen Lin	289ed549cf	chore(otel): rename OtelManager to SessionTelemetry (#13808 ) ## Summary This is a purely mechanical refactor of `OtelManager` -> `SessionTelemetry` to better convey what the struct is doing. No behavior change. ## Why `OtelManager` ended up sounding much broader than what this type actually does. It doesn't manage OTEL globally; it's the session-scoped telemetry surface for emitting log/trace events and recording metrics with consistent session metadata (`app_version`, `model`, `slug`, `originator`, etc.). `SessionTelemetry` is a more accurate name, and updating the call sites makes that boundary a lot easier to follow. ## Validation - `just fmt` - `cargo test -p codex-otel` - `cargo test -p codex-core`	2026-03-06 16:23:30 -08:00
Ahmed Ibrahim	a11c59f634	Add realtime startup context override (#13796 ) - add experimental_realtime_ws_startup_context to override or disable realtime websocket startup context - preserve generated startup context when unset and cover the new override paths in tests	2026-03-06 16:00:30 -08:00
Michael Bolin	f82678b2a4	config: add initial support for the new permission profile config language in config.toml (#13434 ) ## Why `SandboxPolicy` currently mixes together three separate concerns: - parsing layered config from `config.toml` - representing filesystem sandbox state - carrying basic network policy alongside filesystem choices That makes the existing config awkward to extend and blocks the new TOML proposal where `[permissions]` becomes a table of named permission profiles selected by `default_permissions`. (The idea is that if `default_permissions` is not specified, we assume the user is opting into the "traditional" way to configure the sandbox.) This PR adds the config-side plumbing for those profiles while still projecting back to the legacy `SandboxPolicy` shape that the current macOS and Linux sandbox backends consume. It also tightens the filesystem profile model so scoped entries only exist for `:project_roots`, and so nested keys must stay within a project root instead of using `.` or `..` traversal. This drops support for the short-lived `[permissions.network]` in `config.toml` because now that would be interpreted as a profile named `network` within `[permissions]`. ## What Changed - added `PermissionsToml`, `PermissionProfileToml`, `FilesystemPermissionsToml`, and `FilesystemPermissionToml` so config can parse named profiles under `[permissions.<profile>.filesystem]` - added top-level `default_permissions` selection, validation for missing or unknown profiles, and compilation from a named profile into split `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` values - taught config loading to choose between the legacy `sandbox_mode` path and the profile-based path without breaking legacy users - introduced `codex-protocol::permissions` for the split filesystem and network sandbox types, and stored those alongside the legacy projected `sandbox_policy` in runtime `Permissions` - modeled `FileSystemSpecialPath` so only `ProjectRoots` can carry a nested `subpath`, matching the intended config syntax instead of allowing invalid states for other special paths - restricted scoped filesystem maps to `:project_roots`, with validation that nested entries are non-empty descendant paths and cannot use `.` or `..` to escape the project root - kept existing runtime consumers working by projecting `FileSystemSandboxPolicy` back into `SandboxPolicy`, with an explicit error for profiles that request writes outside the workspace root - loaded proxy settings from top-level `[network]` - regenerated `core/config.schema.json` ## Verification - added config coverage for profile deserialization, `default_permissions` selection, top-level `[network]` loading, network enablement, rejection of writes outside the workspace root, rejection of nested entries for non-`:project_roots` special paths, and rejection of parent-directory traversal in `:project_roots` maps - added protocol coverage for the legacy bridge rejecting non-workspace writes ## Docs - update the Codex config docs on developers.openai.com/codex to document named `[permissions.<profile>]` entries, `default_permissions`, scoped `:project_roots` syntax, the descendant-path restriction for nested `:project_roots` entries, and top-level `[network]` proxy configuration --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13434). * #13453 * #13452 * #13451 * #13449 * #13448 * #13445 * #13440 * #13439 * __->__ #13434	2026-03-06 15:39:13 -08:00
sayan-oai	8a54d3caaa	feat: structured plugin parsing (#13711 ) #### What Add structured `@plugin` parsing and TUI support for plugin mentions. - Core: switch from plain-text `@display_name` parsing to structured `plugin://...` mentions via `UserInput::Mention` and `[$...](plugin://...)` links in text, same pattern as apps/skills. - TUI: add plugin mention popup, autocomplete, and chips when typing `$`. Load plugin capability summaries and feed them into the composer; plugin mentions appear alongside skills and apps. - Generalize mention parsing to a sigil parameter, still defaults to `$` <img width="797" height="119" alt="image" src="https://github.com/user-attachments/assets/f0fe2658-d908-4927-9139-73f850805ceb" /> Builds on #13510. Currently clients have to build their own `id` via `plugin@marketplace` and filter plugins to show by `enabled`, but we will add `id` and `available` as fields returned from `plugin/list` soon. ####Tests Added tests, verified locally.	2026-03-06 11:08:36 -08:00
Charley Cunningham	cb1a182bbe	Clarify sandbox permission override helper semantics (#13703 ) ## Summary Today `SandboxPermissions::requires_additional_permissions()` does not actually mean "is `WithAdditionalPermissions`". It returns `true` for any non-default sandbox override, including `RequireEscalated`. That broad behavior is relied on in multiple `main` callsites. The naming is security-sensitive because `SandboxPermissions` is used on shell-like tool calls to tell the executor how a single command should relate to the turn sandbox: - `UseDefault`: run with the turn sandbox unchanged - `RequireEscalated`: request execution outside the sandbox - `WithAdditionalPermissions`: stay sandboxed but widen permissions for that command only ## Problem The old helper name reads as if it only applies to the `WithAdditionalPermissions` variant. In practice it means "this command requested any explicit sandbox override." That ambiguity made it easy to read production checks incorrectly and made the guardian change look like a standalone `main` fix when it is not. On `main` today: - `shell` and `unified_exec` intentionally reject any explicit `sandbox_permissions` request unless approval policy is `OnRequest` - `exec_policy` intentionally treats any explicit sandbox override as prompt-worthy in restricted sandboxes - tests intentionally serialize both `RequireEscalated` and `WithAdditionalPermissions` as explicit sandbox override requests So changing those callsites from the broad helper to a narrow `WithAdditionalPermissions` check would be a behavior change, not a pure cleanup. ## What This PR Does - documents `SandboxPermissions` as a per-command sandbox override, not a generic permissions bag - adds `requests_sandbox_override()` for the broad meaning: anything except `UseDefault` - adds `uses_additional_permissions()` for the narrow meaning: only `WithAdditionalPermissions` - keeps `requires_additional_permissions()` as a compatibility alias to the broad meaning for now - updates the current broad callsites to use the accurately named broad helper - adds unit coverage that locks in the semantics of all three helpers ## What This PR Does Not Do This PR does not change runtime behavior. That is intentional. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-06 09:57:48 -08:00
Won Park	ee1a20258a	Enabling CWD Saving for Image-Gen (#13607 ) Codex now saves the generated image on to your current working directory.	2026-03-06 00:47:21 -08:00
Ahmed Ibrahim	629cb15bc6	Replay thread rollback from rollout history (#13615 ) - Replay thread rollback from the persisted rollout history instead of truncating in-memory state.\n- Add rollback coverage, including rollback-behind-compaction snapshot coverage.	2026-03-05 16:40:09 -08:00
Ahmed Ibrahim	6cf0ed4e79	Refine realtime startup context formatting (#13560 ) ## Summary - group recent work by git repo when available, otherwise by directory - render recent work as bounded user asks with per-thread cwd context - exclude hidden files and directories from workspace trees	2026-03-05 16:31:20 -08:00
sayan-oai	4e77ea0ec7	add @plugin mentions (#13510 ) ## Note-- added plugin mentions via @, but that conflicts with file mentions depends and builds upon #13433. - introduces explicit `@plugin` mentions. this injects the plugin's mcp servers, app names, and skill name format into turn context as a dev message. - we do not yet have UI for these mentions, so we currently parse raw text (as opposed to skills and apps which have UI chips, autocomplete, etc.) this depends on a `plugins/list` app-server endpoint we can feed the UI with, which is upcoming - also annotate mcp and app tool descriptions with the plugin(s) they come from. this gives the model a first class way of understanding what tools come from which plugins, which will help implicit invocation. ### Tests Added and updated tests, unit and integration. Also confirmed locally a raw `@plugin` injects the dev message, and the model knows about its apps, mcps, and skills.	2026-03-06 00:03:39 +00:00
Owen Lin	aa3fe8abf8	feat(core): persist trace_id for turns in RolloutItem::TurnContext (#13602 ) This PR adds a durable trace linkage for each turn by storing the active trace ID on the rollout TurnContext record stored in session rollout files. Before this change, we propagated trace context at runtime but didn’t persist a stable per-turn trace key in rollout history. That made after-the-fact debugging harder (for example, mapping a historical turn to the corresponding trace in datadog). This sets us up for much easier debugging in the future. ### What changed - Added an optional `trace_id` to TurnContextItem (rollout schema). - Added a small OTEL helper to read the current span trace ID. - Captured `trace_id` when creating `TurnContext` and included it in `to_turn_context_item()`. - Updated tests and fixtures that construct TurnContextItem so older/no-trace cases still work. ### Why this approach TurnContext is already the canonical durable per-turn metadata in rollout. This keeps ownership clean: trace linkage lives with other persisted turn metadata.	2026-03-05 13:26:48 -08:00
Curtis 'Fjord' Hawthorne	657841e7f5	Persist initialized js_repl bindings after failed cells (#13482 ) ## Summary - Change `js_repl` failed-cell persistence so later cells keep prior bindings plus only the current-cell bindings whose initialization definitely completed before the throw. - Preserve initialized lexical bindings across failed cells via module-namespace readability, including top-level destructuring that partially succeeds before a later throw. - Preserve hoisted `var` and `function` bindings only when execution clearly reached their declaration site, and preserve direct top-level pre-declaration `var` writes and updates through explicit write-site markers. - Preserve top-level `for...in` / `for...of` `var` bindings when the loop body executes at least once, using a first-iteration guard to avoid per-iteration bookkeeping overhead. - Keep prior module state intact across link-time failures and evaluation failures before the prelude runs, while still allowing failed cells that already recreated prior bindings to persist updates to those existing bindings. - Hide internal commit hooks from user `js_repl` code after the prelude aliases them, so snippets cannot spoof committed bindings by calling the raw `import.meta` hooks directly. - Add focused regression coverage for the supported failed-cell behaviors and the intentionally unsupported boundaries. - Update `js_repl` docs and generated instructions to describe the new, narrower failed-cell persistence model. ## Motivation We saw `js_repl` drop bindings that had already been initialized successfully when a later statement in the same cell threw, for example: const { context: liveContext, session } = await initializeGoogleSheetsLiveForTab(tab); // later statement throws That was surprising in practice because successful earlier work disappeared from the next cell. This change makes failed-cell persistence more useful without trying to model every possible partially executed JavaScript edge case. The resulting behavior is narrower and easier to reason about: - prior bindings are always preserved - lexical bindings persist when their initialization completed before the throw - hoisted `var` / `function` bindings persist only when execution clearly reached their declaration or a supported top-level `var` write site - failed cells that already recreated prior bindings can persist writes to those existing bindings even if they introduce no new bindings The detailed edge-case matrix stays in `docs/js_repl.md`. The model-facing `project_doc` guidance is intentionally shorter and focused on generation-relevant behavior. ## Supported Failed-Cell Behavior - Prior bindings remain available after a failed cell. - Initialized lexical bindings remain available after a failed cell. - Top-level destructuring like `const { a, b } = ...` preserves names whose initialization completed before a later throw. - Hoisted `function` bindings persist when execution reached the declaration statement before the throw. - Direct top-level pre-declaration `var` writes and updates persist, for example: - `x = 1` - `x += 1` - `x++` - short-circuiting logical assignments only persist when the write branch actually runs - Non-empty top-level `for...in` / `for...of` `var` loops persist their loop bindings. - Failed cells can persist updates to existing carried bindings after the prelude has run, even when the cell commits no new bindings. - Link failures and eval failures before the prelude do not poison `@prev`. ## Intentionally Unsupported Failed-Cell Cases - Hoisted function reads before the declaration, such as `foo(); ...; function foo() {}` - Aliasing or inference-based recovery from reads before declaration - Nested writes inside already-instrumented assignment RHS expressions - Destructuring-assignment recovery for hoisted `var` - Partial `var` destructuring recovery - Pre-declaration `undefined` reads for hoisted `var` - Empty top-level `for...in` / `for...of` loop vars - Nested or scope-sensitive pre-declaration `var` writes outside direct top-level expression statements	2026-03-05 11:01:46 -08:00
sayan-oai	03d55f0e6f	chore: add web_search_tool_type for image support (#13538 ) add `web_search_tool_type` on model_info that can be populated from backend. will be used to filter which models can use `web_search` with images and which cant. added small unit test.	2026-03-05 07:02:27 +00:00
sayan-oai	d44398905b	feat: track plugins mcps/apps and add plugin info to user_instructions (#13433 ) ### first half of changes, followed by #13510 Track plugin capabilities as derived summaries on `PluginLoadOutcome` for enabled plugins with at least one skill/app/mcp. Also add `Plugins` section to `user_instructions` injected on session start. These introduce the plugins concept and list enabled plugins, but do NOT currently include paths to enabled plugins or details on what apps/mcps the plugins contain (current plan is to inject this on @-mention). that can be adjusted in a follow up and based on evals. ### tests Added/updated tests, confirmed locally that new `Plugins` section + currently enabled plugins show up in `user_instructions`.	2026-03-04 19:46:13 -08:00
Won Park	229e6d0347	image-gen-event/client_processing (#13512 ) enabling client-side to process with image-generation capabilities (setting app-server)	2026-03-04 16:54:38 -08:00
Ahmed Ibrahim	294079b0b1	Prefix handoff messages with role (#13505 ) Format handoff context by prefixing each message with its role (for example "user:" and "assistant:") before forwarding to the agent.	2026-03-04 15:37:31 -08:00
Won Park	fa2306b303	image-gen-core (#13290 ) Core tool-calling for image-gen, handles requesting and receiving logic for images using response API	2026-03-03 23:11:28 -08:00
Val Kharitonov	4f6c4bb143	support 'flex' tier in app-server in addition to 'fast' (#13391 )	2026-03-03 22:46:05 -08:00

... 3 4 5 6 7 ...

1014 Commits