codex

mirror of https://github.com/openai/codex.git synced 2026-05-05 22:01:37 +03:00

Author	SHA1	Message	Date
pakrym-oai	da616136cc	Add code_mode experimental feature (#13418 ) A much narrower and more isolated (no node features) version of js_repl	2026-03-09 20:56:27 -07:00
pakrym-oai	aa04ea6bd7	Refactor tool output into trait implementations (#14152 ) First step toward making tool outputs strongly typed (and `renderable`).	2026-03-09 19:38:32 -07:00
viyatb-oai	1165a16e6f	fix: keep permissions profiles forward compatible (#14107 ) ## Summary - preserve unknown `:special_path` tokens, including nested entries, so older Codex builds warn and ignore instead of failing config load - fail closed with a startup warning when a permissions profile has missing or empty filesystem entries instead of aborting profile compilation - normalize Windows verbatim paths like `\\?\C:\...` before absolute-path validation while keeping explicit errors for truly invalid paths ## Testing - just fmt - cargo test -p codex-core permissions_profiles_allow - cargo test -p codex-core normalize_absolute_path_for_platform_simplifies_windows_verbatim_paths - cargo test -p codex-protocol unknown_special_paths_are_ignored_by_legacy_bridge - cargo clippy -p codex-core -p codex-protocol --all-targets -- -D warnings - cargo clean	2026-03-09 18:43:38 -07:00
viyatb-oai	b0cbc25a48	fix(protocol): preserve legacy workspace-write semantics (#13957 ) ## Summary This is a fast follow to the initial `[permissions]` structure. - keep the new split-policy carveout behavior for narrower non-write entries under broader writable roots - preserve legacy `WorkspaceWrite` semantics by using a cwd-aware bridge that drops only redundant nested readable roots when projecting from `SandboxPolicy` - route the legacy macOS seatbelt adapter through that same legacy bridge so redundant nested readable roots do not become read-only carveouts on macOS - derive the legacy bridge for `command_exec` using the sandbox root cwd rather than the request cwd so policy derivation matches later sandbox enforcement - add regression coverage for the legacy macOS nested-readable-root case ## Examples ### Legacy `workspace-write` on macOS A legacy `workspace-write` policy can redundantly list a nested readable root under an already-writable workspace root. For example, legacy config can effectively mean: - workspace root (`.` / `cwd`) is writable - `docs/` is also listed in `readable_roots` The new shared split-policy helper intentionally treats a narrower non-write entry under a broader writable root as a carveout for real `[permissions]` configs. Without this fast follow, the unchanged macOS seatbelt legacy adapter could project that legacy shape into a `FileSystemSandboxPolicy` that treated `docs/` like a read-only carveout under the writable workspace root. In practice, legacy callers on macOS could unexpectedly lose write access inside `docs/`, even though that path was writable before the `[permissions]` migration work. This change fixes that by routing the legacy seatbelt path through the cwd-aware legacy bridge, so: - legacy `workspace-write` keeps `docs/` writable when `docs/` was only a redundant readable root - explicit `[permissions]` entries like `'.' 
= 'write'` and `'docs' = 'read'` still make `docs/` read-only, which is the new intended split-policy behavior ### Legacy `command_exec` with a subdirectory cwd `command_exec` can run a command from a request cwd that is narrower than the sandbox root cwd. For example: - sandbox root cwd is `/repo` - request cwd is `/repo/subdir` - legacy policy is still `workspace-write` rooted at `/repo` Before this fast follow, `command_exec` derived the legacy bridge using the request cwd, but the sandbox was later built using the sandbox root cwd. That mismatch could miss redundant legacy readable roots during projection and accidentally reintroduce read-only carveouts for paths that should still be writable under the legacy model. This change fixes that by deriving the legacy bridge with the same sandbox root cwd that sandbox enforcement later uses. ## Verification - `just fmt` - `cargo test -p codex-core seatbelt_legacy_workspace_write_nested_readable_root_stays_writable` - `cargo test -p codex-core test_sandbox_config_parsing` - `cargo clippy -p codex-core -p codex-app-server --all-targets -- -D warnings` - `cargo clean`	2026-03-09 18:43:27 -07:00
Dylan Hurd	6da84efed8	feat(approvals) RejectConfig for request_permissions (#14118 ) ## Summary We need to support allowing request_permissions calls when using `Reject` policy <img width="1133" height="588" alt="Screenshot 2026-03-09 at 12 06 40 PM" src="https://github.com/user-attachments/assets/a8df987f-c225-4866-b8ab-5590960daec5" /> Note that this is a backwards-incompatible change for Reject policy. I'm not sure if we need to add a default based on our current use/setup ## Testing - [x] Added tests - [x] Tested locally	2026-03-09 18:16:54 -07:00
Dylan Hurd	c1defcc98c	fix(core) RequestPermissions + ApplyPatch (#14055 ) ## Summary The apply_patch tool should also respect AdditionalPermissions ## Testing - [x] Added unit tests --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-09 16:11:19 -07:00
Owen Lin	d309c102ef	fix(core): use dedicated types for responsesapi web search tool config (#14136 ) This changes the web_search tool spec in codex-core to use dedicated Responses-API payload structs instead of shared config types and custom serializers. Previously, `ToolSpec::WebSearch` stored `WebSearchFilters` and `WebSearchUserLocation` directly and relied on hand-written serializers to shape the outgoing JSON. This worked, but it mixed config/schema types with the OpenAI Responses payload contract and created an easy place for drift if those shared types changed later. ### Why This keeps the boundary clearer: - app-server/config/schema types stay focused on config - Responses tool payload types stay focused on the OpenAI wire format It also makes the serialization behavior obvious from the structs themselves, instead of hiding it in custom serializer functions.	2026-03-09 14:58:33 -07:00
Dylan Hurd	d241dc598c	feat(core) Persist request_permission data across turns (#14009 ) ## Summary request_permissions flows should support persisting results for the session. Open Question: Still deciding if we need within-turn approvals - this adds complexity but I could see it being useful ## Testing - [x] Updated unit tests --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-09 14:36:38 -07:00
Won Park	42f20a6845	pass on save info to model + ui tweaks (#14123 ) Passing on more information to the model for context purposes, to streamline image-identification.	2026-03-09 20:10:15 +00:00
Owen Lin	da991bdf3a	feat(otel): Centralize OTEL metric names and shared tag builders (#14117 ) This cleans up a bunch of metric plumbing that had started to drift. The main change is making `codex-otel` the canonical home for shared metric definitions and metric tag helpers. I moved the `turn/thread` metric names that were still duplicated into the OTEL metric registry, added a shared `metrics::tags` module for common tag keys and session tag construction, and updated `SessionTelemetry` to build its metadata tags through that shared path. On the codex-core side, TTFT/TTFM now use the shared metric-name constants instead of local string definitions. I also switched the obvious remaining turn/thread metric callsites over to the shared constants, and added a small helper so TTFT/TTFM can attach an optional sanitized client.name tag from TurnContext. This should make follow-on telemetry work less ad hoc: - one canonical place for metric names - one canonical place for common metric tag keys/builders - less duplication between `codex-core` and `codex-otel`	2026-03-09 12:46:42 -07:00
sayan-oai	6ad448b658	chore: plugin/uninstall endpoint (#14111 ) add `plugin/uninstall` app-server endpoint to fully rm plugin from plugins cache dir and rm entry from user config file. plugin-enablement is session-scoped, so uninstalls are only picked up in new sessions (like installs). added tests.	2026-03-09 12:40:25 -07:00
Ahmed Ibrahim	e03e9b63ea	Stabilize guardian approval coverage (#14103 ) ## Summary - align the guardian permission test with the actual sandbox policy it widens and use a slightly larger Windows-only timeout budget - expose the additional-permissions normalization helper to the guardian test module - replace the guardian popup snapshot assertion with targeted string assertions ## Why this fixes the flake This group was carrying two separate sources of drift. The guardian core test widened derived sandbox policies without updating the source sandbox policy, and it used a Windows command/timeout combination that was too tight on slower runners. Separately, the TUI test was snapshotting the full popup even though unrelated feature text changes were the only thing moving. The new assertions keep coverage on the guardian entry itself while removing unrelated snapshot churn.	2026-03-09 11:23:20 -07:00
Ahmed Ibrahim	ad57505ef5	Stabilize interrupted task approval cleanup (#14102 ) ## Summary - drain the active turn tasks before clearing pending approvals during interruption - keep the turn in hand long enough for interrupted tasks to observe cancellation first ## Why this fixes the flake Interrupted turns could clear pending approvals too early, which let an in-flight approval wait surface as a model-visible rejection before the turn emitted `TurnAborted`. Reordering the cleanup removes that race without changing the steady-state task model.	2026-03-09 11:22:51 -07:00
Ahmed Ibrahim	4a0e6dc916	Serialize shell snapshot stdin test (#13878 ) ## What changed - `snapshot_shell_does_not_inherit_stdin` now runs under its own serial key. - The change isolates it from other Unix shell-snapshot tests that also interact with stdin. ## Why this fixes the flake - The failure was not a shell-snapshot logic bug. It was shared-stdin interference between concurrently executing tests. - When multiple tests compete for inherited stdin at the same time, one test can observe EOF or consumed input that actually belongs to a different test. - Running this specific test in a dedicated serial bucket guarantees exclusive ownership of stdin, which makes the assertion deterministic without weakening coverage. ## Scope - Test-only change.	2026-03-09 10:44:13 -07:00
Charley Cunningham	f23fcd6ced	guardian initial feedback / tweaks (#13897 ) ## Summary - remove the remaining model-visible guardian-specific `on-request` prompt additions so enabling the feature does not change the main approval-policy instructions - neutralize user-facing guardian wording to talk about automatic approval review / approval requests rather than a second reviewer or only sandbox escalations - tighten guardian retry-context handling so agent-authored `justification` stays in the structured action JSON and is not also injected as raw retry context - simplify guardian review plumbing in core by deleting dead prompt-append paths and trimming some request/transcript setup code ## Notable Changes - delete the dead `permissions/approval_policy/guardian.md` append path and stop threading `guardian_approval_enabled` through model-facing developer-instruction builders - rename the experimental feature copy to `Automatic approval review` and update the `/experimental` snapshot text accordingly - make approval-review status strings generic across shell, patch, network, and MCP review types - forward real sandbox/network retry reasons for shell and unified-exec guardian review, but do not pass agent-authored justification as raw retry context - simplify `guardian.rs` by removing the one-field request wrapper, deduping reasoning-effort selection, and cleaning up transcript entry collection ## Testing - `just fmt` - full validation left to CI --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-09 09:25:24 -07:00
Jack Mousseau	e6b93841c5	Add request permissions tool (#13092 ) Adds a built-in `request_permissions` tool and wires it through the Codex core, protocol, and app-server layers so a running turn can ask the client for additional permissions instead of relying on a static session policy. The new flow emits a `RequestPermissions` event from core, tracks the pending request by call ID, forwards it through app-server v2 as an `item/permissions/requestApproval` request, and resumes the tool call once the client returns an approved subset of the requested permission profile.	2026-03-08 20:23:06 -07:00
Celia Chen	340f9c9ecb	app-server: include experimental skill metadata in exec approval requests (#13929 ) ## Summary This change surfaces skill metadata on command approval requests so app-server clients can tell when an approval came from a skill script and identify the originating `SKILL.md`. - add `skill_metadata` to exec approval events in the shared protocol - thread skill metadata through core shell escalation and delegated approval handling for skill-triggered approvals - expose the field in app-server v2 as experimental `skillMetadata` - regenerate the JSON/TypeScript schemas and cover the new field in protocol, transport, core, and TUI tests ## Why Skill-triggered approvals already carry skill context inside core, but app-server clients could not see which skill caused the prompt. Sending the skill metadata with the approval request makes it possible for clients to present better approval UX and connect the prompt back to the relevant skill definition. ## example event in app-server-v2 verified that we see this event when experimental api is on: ``` < { < "id": 11, < "method": "item/commandExecution/requestApproval", < "params": { < "additionalPermissions": { < "fileSystem": null, < "macos": { < "accessibility": false, < "automations": { < "bundle_ids": [ < "com.apple.Notes" < ] < }, < "calendar": false, < "preferences": "read_only" < }, < "network": null < }, < "approvalId": "25d600ee-5a3c-4746-8d17-e2e61fb4c563", < "availableDecisions": [ < "accept", < "acceptForSession", < "cancel" < ], < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "commandActions": [ < { < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "type": "unknown" < } < ], < "cwd": 
"/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes", < "itemId": "call_jZp3xFpNg4D8iKAD49cvEvZy", < "skillMetadata": { < "pathToSkillsMd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/SKILL.md" < }, < "threadId": "019ccc10-b7d3-7ff2-84fe-3a75e7681e69", < "turnId": "019ccc10-b848-76f1-81b3-4a1fa225493f" < } < }` ``` & verified that this is the event when experimental api is off: ``` < { < "id": 13, < "method": "item/commandExecution/requestApproval", < "params": { < "approvalId": "5fbbf776-261b-4cf8-899b-c125b547f2c0", < "availableDecisions": [ < "accept", < "acceptForSession", < "cancel" < ], < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "commandActions": [ < { < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "type": "unknown" < } < ], < "cwd": "/Users/celia/code/codex/codex-rs", < "itemId": "call_OV2DHzTgYcbYtWaTTBWlocOt", < "threadId": "019ccc16-2a2b-7be1-8500-e00d45b892d4", < "turnId": "019ccc16-2a8e-7961-98ec-649600e7d06a" < } < } ```	2026-03-08 18:07:46 -07:00
Charley Cunningham	7ba1fccfc1	fix(ci): restore guardian coverage and bazel unit tests (#13912 ) ## Summary - restore the guardian review request snapshot test and its tracked snapshot after it was dropped from `main` - make Bazel Rust unit-test wrappers resolve runfiles correctly on manifest-only platforms like macOS and point Insta at the real workspace root - harden the shell-escalation socket-closure assertion so the musl Bazel test no longer depends on fd reuse behavior ## Verification - cargo test -p codex-core guardian_review_request_layout_matches_model_visible_request_snapshot - cargo test -p codex-shell-escalation - bazel test //codex-rs/exec:exec-unit-tests //codex-rs/shell-escalation:shell-escalation-unit-tests Supersedes #13894. --------- Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com> Co-authored-by: viyatb-oai <viyatb@openai.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-08 12:05:19 -07:00
Ahmed Ibrahim	dc19e78962	Stabilize abort task follow-up handling (#13874 ) - production logic plus tests; cancel running tasks before clearing pending turn state - suppress follow-up model requests after cancellation and assert on stabilized request counts instead of fixed sleeps	2026-03-07 22:56:00 -08:00
Michael Bolin	3b5fe5ca35	protocol: keep root carveouts sandboxed (#13452 ) ## Why A restricted filesystem policy that grants `:root` read or write access but also carries explicit deny entries should still behave like scoped access with carveouts, not like unrestricted disk access. Without that distinction, later platform backends cannot preserve blocked subpaths under root-level permissions because the protocol layer reports the policy as fully unrestricted. ## What changed - taught `FileSystemSandboxPolicy` to treat root access plus explicit deny entries as scoped access rather than full-disk access - derived readable and writable roots from the filesystem root when root access is combined with carveouts, while preserving the denied paths as read-only subpaths - added protocol coverage for root-write policies with carveouts and a core sandboxing regression so those policies still require platform sandboxing ## Verification - added protocol coverage in `protocol/src/permissions.rs` and `protocol/src/protocol.rs` for root access with explicit carveouts - added platform-sandbox regression coverage in `core/src/sandboxing/mod.rs` - verified the current PR state with `just clippy` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13452). * #13453 * __->__ #13452 * #13451 * #13449 * #13448 * #13445 * #13440 * #13439 --------- Co-authored-by: viyatb-oai <viyatb@openai.com>	2026-03-07 21:15:47 -08:00
Michael Bolin	46b8d127cf	sandboxing: preserve denied paths when widening permissions (#13451 ) ## Why After the split-policy plumbing landed, additional-permissions widening still rebuilt filesystem access through the legacy projection in a few places. That can erase explicit deny entries and make the runtime treat a policy as fully writable even when it still has blocked subpaths, which in turn can skip the platform sandbox when it is still needed. ## What changed - preserved explicit deny entries when merging additional read and write permissions into `FileSystemSandboxPolicy` - switched platform-sandbox selection to rely on `FileSystemSandboxPolicy::has_full_disk_write_access()` instead of ad hoc root-write checks - kept the widened policy path in `core/src/exec.rs` and `core/src/sandboxing/mod.rs` aligned so denied subpaths survive both policy merging and sandbox selection - added regression coverage for root-write policies that still carry carveouts ## Verification - added regression coverage in `core/src/sandboxing/mod.rs` showing that root write plus carveouts still requires the platform sandbox - verified the current PR state with `just clippy` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13451). * #13453 * #13452 * __->__ #13451 * #13449 * #13448 * #13445 * #13440 * #13439 --------- Co-authored-by: viyatb-oai <viyatb@openai.com>	2026-03-08 04:29:35 +00:00
Michael Bolin	07a30da3fb	linux-sandbox: plumb split sandbox policies through helper (#13449 ) ## Why The Linux sandbox helper still only accepted the legacy `SandboxPolicy` payload. That meant the runtime could compute split filesystem and network policies, but the helper would immediately collapse them back to the compatibility projection before applying seccomp or staging the bubblewrap inner command. ## What changed - added hidden `--file-system-sandbox-policy` and `--network-sandbox-policy` flags alongside the legacy `--sandbox-policy` flag so the helper can migrate incrementally - updated the core-side Landlock wrapper to pass the split policies explicitly when launching `codex-linux-sandbox` - added helper-side resolution logic that accepts either the legacy policy alone or a complete split-policy pair and normalizes that into one effective configuration - switched Linux helper network decisions to use `NetworkSandboxPolicy` directly - added `FromStr` support for the split policy types so the helper can parse them from CLI JSON ## Verification - added helper coverage in `linux-sandbox/src/linux_run_main_tests.rs` for split-policy flags and policy resolution - added CLI argument coverage in `core/src/landlock.rs` - verified the current PR state with `just clippy` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13449). * #13453 * #13452 * #13451 * __->__ #13449 * #13448 * #13445 * #13440 * #13439 --------- Co-authored-by: viyatb-oai <viyatb@openai.com>	2026-03-07 19:40:10 -08:00
Matthew Zeng	a4a9536fd7	[elicitations] Support always allow option for mcp tool calls. (#13807 ) - [x] Support always allow option for mcp tool calls, writes to config.toml. - [x] Fix config hot-reload after starting a new thread for TUI.	2026-03-08 01:46:40 +00:00
sayan-oai	590cfa6176	chore: use @plugin instead of $plugin for plaintext mentions (#13921 ) change plaintext plugin-mentions from `$plugin` to `@plugin`, ensure TUI can correctly decode these from history. tested locally, added/updated tests.	2026-03-08 01:36:39 +00:00
Michael Bolin	bf5c2f48a5	seatbelt: honor split filesystem sandbox policies (#13448 ) ## Why After `#13440` and `#13445`, macOS Seatbelt policy generation was still deriving filesystem and network behavior from the legacy `SandboxPolicy` projection. That projection loses explicit unreadable carveouts and conflates split network decisions, so the generated Seatbelt policy could still be wider than the split policy that Codex had already computed. ## What changed - added Seatbelt entrypoints that accept `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` directly - built read and write policy stanzas from access roots plus excluded subpaths so explicit unreadable carveouts survive into the generated Seatbelt policy - switched network policy generation to consult `NetworkSandboxPolicy` directly - failed closed when managed-network or proxy-constrained sessions do not yield usable loopback proxy endpoints - updated the macOS callers and test helpers that now need to carry the split policies explicitly ## Verification - added regression coverage in `core/src/seatbelt.rs` for unreadable carveouts under both full-disk and scoped-readable policies - verified the current PR state with `just clippy` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13448). * #13453 * #13452 * #13451 * #13449 * __->__ #13448 * #13445 * #13440 * #13439 --------- Co-authored-by: viyatb-oai <viyatb@openai.com>	2026-03-08 00:35:19 +00:00
Dylan Hurd	92f7541624	fix(ci) fix guardian ci (#13911 ) ## Summary #13910 was merged with some unused imports, let's fix this ## Testing - [x] Let's make sure CI is green --------- Co-authored-by: Charles Cunningham <ccunningham@openai.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-07 23:34:56 +00:00
Dylan Hurd	1c888709b5	fix(core) rm guardian snapshot test (#13910 ) ## Summary This test is good, but flaky, and we have to figure out some bazel build issues. Let's get CI back to green and then land a stable version! ## Test Summary - [x] CI Passes	2026-03-07 14:28:54 -08:00
Charley Cunningham	e84ee33cc0	Add guardian approval MVP (#13692 ) ## Summary - add the guardian reviewer flow for `on-request` approvals in command, patch, sandbox-retry, and managed-network approval paths - keep guardian behind `features.guardian_approval` instead of exposing a public `approval_policy = guardian` mode - route ordinary `OnRequest` approvals to the guardian subagent when the feature is enabled, without changing the public approval-mode surface ## Public model - public approval modes stay unchanged - guardian is enabled via `features.guardian_approval` - when that feature is on, `approval_policy = on-request` keeps the same approval boundaries but sends those approval requests to the guardian reviewer instead of the user - `/experimental` only persists the feature flag; it does not rewrite `approval_policy` - CLI and app-server no longer expose a separate `guardian` approval mode in this PR ## Guardian reviewer - the reviewer runs as a normal subagent and reuses the existing subagent/thread machinery - it is locked to a read-only sandbox and `approval_policy = never` - it does not inherit user/project exec-policy rules - it prefers `gpt-5.4` when the current provider exposes it, otherwise falls back to the parent turn's active model - it fail-closes on timeout, startup failure, malformed output, or any other review error - it currently auto-approves only when `risk_score < 80` ## Review context and policy - guardian mirrors `OnRequest` approval semantics rather than introducing a separate approval policy - explicit `require_escalated` requests follow the same approval surface as `OnRequest`; the difference is only who reviews them - managed-network allowlist misses that enter the approval flow are also reviewed by guardian - the review prompt includes bounded recent transcript history plus recent tool call/result evidence - transcript entries and planned-action strings are truncated with explicit `<guardian_truncated ... 
/>` markers so large payloads stay bounded - apply-patch reviews include the full patch content (without duplicating the structured `changes` payload) - the guardian request layout is snapshot-tested using the same model-visible Responses request formatter used elsewhere in core ## Guardian network behavior - the guardian subagent inherits the parent session's managed-network allowlist when one exists, so it can use the same approved network surface while reviewing - exact session-scoped network approvals are copied into the guardian session with protocol/port scope preserved - those copied approvals are now seeded before the guardian's first turn is submitted, so inherited approvals are available during any immediate review-time checks ## Out of scope / follow-ups - the sandbox-permission validation split was pulled into a separate PR and is not part of this diff - a future follow-up can enable `serde_json` preserve-order in `codex-core` and then simplify the guardian action rendering further --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-07 05:40:10 -08:00
jif-oai	cf143bf71e	feat: simplify DB further (#13771 )	2026-03-07 03:48:36 -08:00
Michael Bolin	5ceff6588e	safety: honor filesystem policy carveouts in apply_patch (#13445 ) ## Why `apply_patch` safety approval was still checking writable paths through the legacy `SandboxPolicy` projection. That can hide explicit `none` carveouts when a split filesystem policy projects back to compatibility `ExternalSandbox`, which leaves one more approval path that can auto-approve writes inside paths that are intentionally blocked. ## What changed - passed `turn.file_system_sandbox_policy` into `assess_patch_safety` - changed writable-path checks to derive effective access from `FileSystemSandboxPolicy` instead of the legacy `SandboxPolicy` - made those checks reject explicit unreadable roots before considering broad write access or writable roots - added regression coverage showing that an `ExternalSandbox` compatibility projection still asks for approval when the split filesystem policy blocks a subpath ## Verification - `cargo test -p codex-core safety::tests::` - `cargo test -p codex-core test_sandbox_config_parsing` - `cargo clippy -p codex-core --all-targets -- -D warnings` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13445). * #13453 * #13452 * #13451 * #13449 * #13448 * __->__ #13445 * #13440 * #13439 --------- Co-authored-by: viyatb-oai <viyatb@openai.com>	2026-03-07 08:01:08 +00:00
Celia Chen	b0ce16c47a	fix(core): respect reject policy by approval source for skill scripts (#13816 ) ## Summary - distinguish reject-policy handling for prefix-rule approvals versus sandbox approvals in Unix shell escalation - keep prompting for skill-script execution when `rules=true` but `sandbox_approval=false`, instead of denying the command up front - add regression coverage for both skill-script reject-policy paths in `codex-rs/core/tests/suite/skill_approval.rs`	2026-03-06 21:43:14 -08:00
Michael Bolin	22ac6b9aaa	sandboxing: plumb split sandbox policies through runtime (#13439 ) ## Why `#13434` introduces split `FileSystemSandboxPolicy` and `NetworkSandboxPolicy`, but the runtime still made most execution-time sandbox decisions from the legacy `SandboxPolicy` projection. That projection loses information about combinations like unrestricted filesystem access with restricted network access. In practice, that means the runtime can choose the wrong platform sandbox behavior or set the wrong network-restriction environment for a command even when config has already separated those concerns. This PR carries the split policies through the runtime so sandbox selection, process spawning, and exec handling can consult the policy that actually matters. ## What changed - threaded `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` through `TurnContext`, `ExecRequest`, sandbox attempts, shell escalation state, unified exec, and app-server exec overrides - updated sandbox selection in `core/src/sandboxing/mod.rs` and `core/src/exec.rs` to key off `FileSystemSandboxPolicy.kind` plus `NetworkSandboxPolicy`, rather than inferring behavior only from the legacy `SandboxPolicy` - updated process spawning in `core/src/spawn.rs` and the platform wrappers to use `NetworkSandboxPolicy` when deciding whether to set `CODEX_SANDBOX_NETWORK_DISABLED` - kept additional-permissions handling and legacy `ExternalSandbox` compatibility projections aligned with the split policies, including explicit user-shell execution and Windows restricted-token routing - updated callers across `core`, `app-server`, and `linux-sandbox` to pass the split policies explicitly ## Verification - added regression coverage in `core/tests/suite/user_shell_cmd.rs` to verify `RunUserShellCommand` does not inherit `CODEX_SANDBOX_NETWORK_DISABLED` from the active turn - added coverage in `core/src/exec.rs` for Windows restricted-token sandbox selection when the legacy projection is `ExternalSandbox` - 
updated Linux sandbox coverage in `linux-sandbox/tests/suite/landlock.rs` to exercise the split-policy exec path - verified the current PR state with `just clippy` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13439). * #13453 * #13452 * #13451 * #13449 * #13448 * #13445 * #13440 * __->__ #13439 --------- Co-authored-by: viyatb-oai <viyatb@openai.com>	2026-03-07 02:30:21 +00:00
viyatb-oai	25fa974166	fix: support managed network allowlist controls (#12752 ) ## Summary - treat `requirements.toml` `allowed_domains` and `denied_domains` as managed network baselines for the proxy - in restricted modes by default, build the effective runtime policy from the managed baseline plus user-configured allowlist and denylist entries, so common hosts can be pre-approved without blocking later user expansion - add `experimental_network.managed_allowed_domains_only = true` to pin the effective allowlist to managed entries, ignore user allowlist additions, and hard-deny non-managed domains without prompting - apply `managed_allowed_domains_only` anywhere managed network enforcement is active, including full access, while continuing to respect denied domains from all sources - add regression coverage for merged-baseline behavior, managed-only behavior, and full-access managed-only enforcement ## Behavior Assuming `requirements.toml` defines both `experimental_network.allowed_domains` and `experimental_network.denied_domains`. ### Default mode - By default, the effective allowlist is `experimental_network.allowed_domains` plus user or persisted allowlist additions. - By default, the effective denylist is `experimental_network.denied_domains` plus user or persisted denylist additions. - Allowlist misses can go through the network approval flow. - Explicit denylist hits and local or private-network blocks are still hard-denied. - When `experimental_network.managed_allowed_domains_only = true`, only managed `allowed_domains` are respected, user allowlist additions are ignored, and non-managed domains are hard-denied without prompting. - Denied domains continue to be respected from all sources. ### Full access - With managed requirements present, the effective allowlist is pinned to `experimental_network.allowed_domains`. - With managed requirements present, the effective denylist is pinned to `experimental_network.denied_domains`. 
- There is no allowlist-miss approval path in full access. - Explicit denylist hits are hard-denied. - `experimental_network.managed_allowed_domains_only = true` now also applies in full access, so managed-only behavior remains in effect anywhere managed network enforcement is active.	2026-03-06 17:52:54 -08:00
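The default-mode rules in #12752 can be sketched as a small decision function. This is only an illustration under stated assumptions, not the Codex implementation: `Decision`, `DomainLists`, and `decide_restricted` are hypothetical names, hosts are matched by exact string only, and the local or private-network blocks and the full-access path are left out.

```rust
use std::collections::BTreeSet;

/// Hypothetical outcome type for one outbound host check.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    /// Allowlist miss that may go through the network approval flow.
    Prompt,
    /// Hard deny, no prompt.
    Deny,
}

/// Hypothetical container for one source of allow/deny entries
/// (managed `requirements.toml` baseline, or user/persisted additions).
#[derive(Debug, Default)]
pub struct DomainLists {
    pub allowed: BTreeSet<String>,
    pub denied: BTreeSet<String>,
}

/// Default (restricted) mode only; exact-match hosts are an assumption.
pub fn decide_restricted(
    managed: &DomainLists,
    user: &DomainLists,
    managed_allowed_domains_only: bool,
    host: &str,
) -> Decision {
    // Denied domains are respected from all sources.
    if managed.denied.contains(host) || user.denied.contains(host) {
        return Decision::Deny;
    }
    if managed.allowed.contains(host) {
        return Decision::Allow;
    }
    // Managed-only: user additions are ignored and misses are hard-denied.
    if managed_allowed_domains_only {
        return Decision::Deny;
    }
    if user.allowed.contains(host) {
        return Decision::Allow;
    }
    Decision::Prompt
}
```

Checking the deny lists first keeps an explicit deny from being overridden by an allow entry from another source, which matches the "denied domains continue to be respected from all sources" rule.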
viyatb-oai	5deaf9409b	fix: avoid invoking git before project trust is established (#13804 ) ## Summary - resolve trust roots by inspecting `.git` entries on disk instead of spawning `git rev-parse --git-common-dir` - keep regular repo and linked-worktree trust inheritance behavior intact - add a synthetic regression test that proves worktree trust resolution works without a real git command ## Testing - `just fmt` - `cargo test -p codex-core resolve_root_git_project_for_trust` - `cargo clippy -p codex-core --all-targets -- -D warnings` - `cargo test -p codex-core` (fails in this environment on unrelated managed-config `DangerFullAccess` tests in `codex::tests`, `tools::js_repl::tests`, and `unified_exec::tests`)	2026-03-06 17:46:23 -08:00
Ruslan Nigmatullin	e9bd8b20a1	app-server: Add streaming and tty/pty capabilities to `command/exec` (#13640 ) * Add an ability to stream stdin, stdout, and stderr * Streaming of stdout and stderr has a configurable cap for total amount of transmitted bytes (with an ability to disable it) * Add support for overriding environment variables * Add an ability to terminate running applications (using `command/exec/terminate`) * Add TTY/PTY support, with an ability to resize the terminal (using `command/exec/resize`)	2026-03-06 17:30:17 -08:00
Rohan Mehta	61098c7f51	Allow full web search tool config (#13675 ) Previously, we could only configure whether web search was on or off. This PR enables sending along a web search config, which includes everything the Responses API supports: filters, location, etc.	2026-03-07 00:50:50 +00:00
Celia Chen	8b81284975	fix(core): skip exec approval for permissionless skill scripts (#13791 ) ## Summary - Treat skill scripts with no permission profile, or an explicitly empty one, as permissionless and run them with the turn's existing sandbox instead of forcing an exec approval prompt. - Keep the approval flow unchanged for skills that do declare additional permissions. - Update the skill approval tests to assert that permissionless skill scripts do not prompt on either the initial run or a rerun. ## Why Permissionless skills should inherit the current turn sandbox directly. Prompting for exec approval in that case adds friction without granting any additional capability.	2026-03-06 16:40:41 -08:00
xl-openai	0243734300	feat: Add curated plugin marketplace + Metadata Cleanup. (#13712 ) 1. Add a synced curated plugin marketplace and include it in marketplace discovery. 2. Expose optional plugin.json interface metadata in plugin/list 3. Tighten plugin and marketplace path handling using validated absolute paths. 4. Let manifests override skill, MCP, and app config paths. 5. Restrict plugin enablement/config loading to the user config layer so plugin enablement is at global level	2026-03-06 19:39:35 -05:00
Owen Lin	289ed549cf	chore(otel): rename OtelManager to SessionTelemetry (#13808 ) ## Summary This is a purely mechanical refactor of `OtelManager` -> `SessionTelemetry` to better convey what the struct is doing. No behavior change. ## Why `OtelManager` ended up sounding much broader than what this type actually does. It doesn't manage OTEL globally; it's the session-scoped telemetry surface for emitting log/trace events and recording metrics with consistent session metadata (`app_version`, `model`, `slug`, `originator`, etc.). `SessionTelemetry` is a more accurate name, and updating the call sites makes that boundary a lot easier to follow. ## Validation - `just fmt` - `cargo test -p codex-otel` - `cargo test -p codex-core`	2026-03-06 16:23:30 -08:00
Ahmed Ibrahim	a11c59f634	Add realtime startup context override (#13796 ) - add experimental_realtime_ws_startup_context to override or disable realtime websocket startup context - preserve generated startup context when unset and cover the new override paths in tests	2026-03-06 16:00:30 -08:00
Michael Bolin	f82678b2a4	config: add initial support for the new permission profile config language in config.toml (#13434 ) ## Why `SandboxPolicy` currently mixes together three separate concerns: - parsing layered config from `config.toml` - representing filesystem sandbox state - carrying basic network policy alongside filesystem choices That makes the existing config awkward to extend and blocks the new TOML proposal where `[permissions]` becomes a table of named permission profiles selected by `default_permissions`. (The idea is that if `default_permissions` is not specified, we assume the user is opting into the "traditional" way to configure the sandbox.) This PR adds the config-side plumbing for those profiles while still projecting back to the legacy `SandboxPolicy` shape that the current macOS and Linux sandbox backends consume. It also tightens the filesystem profile model so scoped entries only exist for `:project_roots`, and so nested keys must stay within a project root instead of using `.` or `..` traversal. This drops support for the short-lived `[permissions.network]` in `config.toml` because now that would be interpreted as a profile named `network` within `[permissions]`. 
## What Changed - added `PermissionsToml`, `PermissionProfileToml`, `FilesystemPermissionsToml`, and `FilesystemPermissionToml` so config can parse named profiles under `[permissions.<profile>.filesystem]` - added top-level `default_permissions` selection, validation for missing or unknown profiles, and compilation from a named profile into split `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` values - taught config loading to choose between the legacy `sandbox_mode` path and the profile-based path without breaking legacy users - introduced `codex-protocol::permissions` for the split filesystem and network sandbox types, and stored those alongside the legacy projected `sandbox_policy` in runtime `Permissions` - modeled `FileSystemSpecialPath` so only `ProjectRoots` can carry a nested `subpath`, matching the intended config syntax instead of allowing invalid states for other special paths - restricted scoped filesystem maps to `:project_roots`, with validation that nested entries are non-empty descendant paths and cannot use `.` or `..` to escape the project root - kept existing runtime consumers working by projecting `FileSystemSandboxPolicy` back into `SandboxPolicy`, with an explicit error for profiles that request writes outside the workspace root - loaded proxy settings from top-level `[network]` - regenerated `core/config.schema.json` ## Verification - added config coverage for profile deserialization, `default_permissions` selection, top-level `[network]` loading, network enablement, rejection of writes outside the workspace root, rejection of nested entries for non-`:project_roots` special paths, and rejection of parent-directory traversal in `:project_roots` maps - added protocol coverage for the legacy bridge rejecting non-workspace writes ## Docs - update the Codex config docs on developers.openai.com/codex to document named `[permissions.<profile>]` entries, `default_permissions`, scoped `:project_roots` syntax, the descendant-path restriction for 
nested `:project_roots` entries, and top-level `[network]` proxy configuration --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13434). * #13453 * #13452 * #13451 * #13449 * #13448 * #13445 * #13440 * #13439 * __->__ #13434	2026-03-06 15:39:13 -08:00
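The descendant-path restriction that #13434 describes for nested `:project_roots` entries might look roughly like the check below. It is a sketch under assumptions: `is_valid_project_root_subpath` is a hypothetical name, only `/` is treated as a separator, and rejecting trailing or doubled slashes is a choice made here rather than something the commit message states.

```rust
/// Returns true when `subpath` is a non-empty relative path whose
/// segments never use `.` or `..` (hypothetical helper, not the Codex API).
pub fn is_valid_project_root_subpath(subpath: &str) -> bool {
    // Empty entries are rejected outright.
    if subpath.is_empty() {
        return false;
    }
    // Inspect raw segments rather than `Path::components()`, because
    // `components()` silently drops interior `.` segments.
    subpath
        .split('/')
        .all(|segment| !segment.is_empty() && segment != "." && segment != "..")
}
```

With this check, `docs` and `docs/api` pass, while `.`, `..`, `docs/../secrets`, and `/etc` are rejected (the last one because a leading slash produces an empty first segment).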
Curtis 'Fjord' Hawthorne	d6c8186195	Clarify js_repl binding reuse guidance (#13803 ) ## Summary Clarify the `js_repl` prompt guidance around persistent bindings and redeclaration recovery. This updates the generated `js_repl` instructions in `core/src/project_doc.rs` to prefer this order when a name is already bound: 1. Reuse the existing binding 2. Reassign a previously declared `let` 3. Pick a new descriptive name 4. Use `{ ... }` only for short-lived scratch scope 5. Reset the kernel only when a clean state is actually needed The prompt now also explicitly warns against wrapping an entire cell in block scope when the goal is to reuse names across later cells. ## Why The previous wording still left too much room for low-value workarounds like whole-cell block wrapping. In downstream browser rollouts, that pattern was adding tokens and preventing useful state reuse across `js_repl` cells. This change makes the preferred behavior more explicit without changing runtime semantics. ## Scope - Prompt/documentation change only - No runtime behavior changes - Updates the matching string-backed `project_doc` tests	2026-03-06 15:19:06 -08:00
Ruslan Nigmatullin	5b04cc657f	utils/pty: add streaming spawn and terminal sizing primitives (#13695 ) Enhance pty utils: * Support closing stdin * Separate stderr and stdout streams to allow consumers to differentiate them * Provide a compatibility helper to merge both streams back into a combined one * Support specifying terminal size for pty, including on-demand resizes while the process is already running * Support terminating the process while still consuming its outputs	2026-03-06 15:13:12 -08:00
Michael Bolin	488875f24d	fix: move unit tests in codex-rs/core/src/codex.rs into their own file (#13783 ) This is analogous to https://github.com/openai/codex/pull/13780.	2026-03-06 11:56:49 -08:00
Michael Bolin	39869f7443	fix: move unit tests in codex-rs/core/src/config/mod.rs into their own file (#13780 ) At over 7,000 lines, `codex-rs/core/src/config/mod.rs` was getting a bit unwieldy. This PR does the same type of move as https://github.com/openai/codex/pull/12957 to put unit tests in their own file, though I decided `config_tests.rs` is a more intuitive name than `mod_tests.rs`. Ultimately, I'll codemod the rest of the codebase to follow suit, but I want to do it in stages to reduce merge conflicts for people.	2026-03-06 11:21:58 -08:00
sayan-oai	8a54d3caaa	feat: structured plugin parsing (#13711 ) #### What Add structured `@plugin` parsing and TUI support for plugin mentions. - Core: switch from plain-text `@display_name` parsing to structured `plugin://...` mentions via `UserInput::Mention` and `[$...](plugin://...)` links in text, same pattern as apps/skills. - TUI: add plugin mention popup, autocomplete, and chips when typing `$`. Load plugin capability summaries and feed them into the composer; plugin mentions appear alongside skills and apps. - Generalize mention parsing to a sigil parameter, still defaults to `$` <img width="797" height="119" alt="image" src="https://github.com/user-attachments/assets/f0fe2658-d908-4927-9139-73f850805ceb" /> Builds on #13510. Currently clients have to build their own `id` via `plugin@marketplace` and filter plugins to show by `enabled`, but we will add `id` and `available` as fields returned from `plugin/list` soon. #### Tests Added tests, verified locally.	2026-03-06 11:08:36 -08:00
jif-oai	0e41a5c4a8	chore: improve DB flushing (#13620 ) This branch: * Avoids flushing the DB when not necessary * Filters the events for which we perform an `upsert` into the DB * Adds a dedicated, lighter update function for `thread:updated_at` This should significantly reduce DB lock contention. If that is not sufficient, we can de-sync the DB flush for `updated_at`.	2026-03-06 19:58:14 +01:00
Owen Lin	3449e00bc9	feat(otel, core): record turn TTFT and TTFM metrics in codex-core (#13630 ) ### Summary This adds turn-level latency metrics for the first model output and the first completed agent message. - `codex.turn.ttft.duration_ms` starts at turn start and records on the first output signal we see from the model. That includes normal assistant text, reasoning deltas, and non-text outputs like tool-call items. - `codex.turn.ttfm.duration_ms` also starts at turn start, but it records when the first agent message finishes streaming rather than when its first delta arrives. ### Implementation notes The timing is tracked in codex-core, not app-server, so the definition stays consistent across CLI, TUI, and app-server clients. I reused the existing turn lifecycle boundary that already drives `codex.turn.e2e_duration_ms`, stored the turn start timestamp in turn state, and record each metric once per turn. I also wired the new metric names into the OTEL runtime metrics summary so they show up in the same in-memory/debug snapshot path as the existing timing metrics.	2026-03-06 10:23:48 -08:00
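The "record each metric once per turn" behavior from #13630 can be illustrated with a small state holder. This is a sketch, not the codex-core code: `TurnTiming` and its methods are hypothetical names, and callers are assumed to pass in the current `Instant` so the logic stays deterministic.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-turn timing state; field and method names are assumptions.
pub struct TurnTiming {
    started_at: Instant,
    ttft: Option<Duration>,
    ttfm: Option<Duration>,
}

impl TurnTiming {
    /// Both metrics are measured from the same turn-start timestamp.
    pub fn start(now: Instant) -> Self {
        Self { started_at: now, ttft: None, ttfm: None }
    }

    /// Call on every model output signal (text, reasoning delta, tool call).
    /// Returns the TTFT value only the first time it is called in a turn.
    pub fn on_model_output(&mut self, now: Instant) -> Option<Duration> {
        if self.ttft.is_some() {
            return None;
        }
        let elapsed = now.duration_since(self.started_at);
        self.ttft = Some(elapsed);
        Some(elapsed)
    }

    /// Call when an agent message finishes streaming.
    /// Returns the TTFM value only for the first completed message.
    pub fn on_agent_message_completed(&mut self, now: Instant) -> Option<Duration> {
        if self.ttfm.is_some() {
            return None;
        }
        let elapsed = now.duration_since(self.started_at);
        self.ttfm = Some(elapsed);
        Some(elapsed)
    }
}
```

A caller would emit `codex.turn.ttft.duration_ms` or `codex.turn.ttfm.duration_ms` only when the corresponding method returns `Some`, so later deltas or messages in the same turn do not record again.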
Charley Cunningham	cb1a182bbe	Clarify sandbox permission override helper semantics (#13703 ) ## Summary Today `SandboxPermissions::requires_additional_permissions()` does not actually mean "is `WithAdditionalPermissions`". It returns `true` for any non-default sandbox override, including `RequireEscalated`. That broad behavior is relied on in multiple `main` callsites. The naming is security-sensitive because `SandboxPermissions` is used on shell-like tool calls to tell the executor how a single command should relate to the turn sandbox: - `UseDefault`: run with the turn sandbox unchanged - `RequireEscalated`: request execution outside the sandbox - `WithAdditionalPermissions`: stay sandboxed but widen permissions for that command only ## Problem The old helper name reads as if it only applies to the `WithAdditionalPermissions` variant. In practice it means "this command requested any explicit sandbox override." That ambiguity made it easy to read production checks incorrectly and made the guardian change look like a standalone `main` fix when it is not. On `main` today: - `shell` and `unified_exec` intentionally reject any explicit `sandbox_permissions` request unless approval policy is `OnRequest` - `exec_policy` intentionally treats any explicit sandbox override as prompt-worthy in restricted sandboxes - tests intentionally serialize both `RequireEscalated` and `WithAdditionalPermissions` as explicit sandbox override requests So changing those callsites from the broad helper to a narrow `WithAdditionalPermissions` check would be a behavior change, not a pure cleanup. 
## What This PR Does - documents `SandboxPermissions` as a per-command sandbox override, not a generic permissions bag - adds `requests_sandbox_override()` for the broad meaning: anything except `UseDefault` - adds `uses_additional_permissions()` for the narrow meaning: only `WithAdditionalPermissions` - keeps `requires_additional_permissions()` as a compatibility alias to the broad meaning for now - updates the current broad callsites to use the accurately named broad helper - adds unit coverage that locks in the semantics of all three helpers ## What This PR Does Not Do This PR does not change runtime behavior. That is intentional. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-06 09:57:48 -08:00
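The three helpers described in #13703 can be sketched as follows. The variant and method names come from the commit message; everything else is an assumption made for illustration, in particular the unit-like variants (the real `WithAdditionalPermissions` may carry data) and the by-value `self` receivers.

```rust
/// Per-command sandbox override relative to the turn sandbox.
/// Variant payloads, if any, are omitted in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum SandboxPermissions {
    /// Run with the turn sandbox unchanged.
    #[default]
    UseDefault,
    /// Request execution outside the sandbox.
    RequireEscalated,
    /// Stay sandboxed but widen permissions for this command only.
    WithAdditionalPermissions,
}

impl SandboxPermissions {
    /// Broad meaning: anything except `UseDefault`.
    pub fn requests_sandbox_override(self) -> bool {
        !matches!(self, Self::UseDefault)
    }

    /// Narrow meaning: only `WithAdditionalPermissions`.
    pub fn uses_additional_permissions(self) -> bool {
        matches!(self, Self::WithAdditionalPermissions)
    }

    /// Compatibility alias that keeps the broad meaning.
    pub fn requires_additional_permissions(self) -> bool {
        self.requests_sandbox_override()
    }
}
```

The point of the split is visible with `RequireEscalated`: the broad helper and the alias return `true` for it, while the narrow helper returns `false`.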
jif-oai	f891f516a5	feat: drop discrepancy metrics (#13753 )	2026-03-06 18:32:25 +01:00


2329 Commits