codex

mirror of https://github.com/openai/codex.git synced 2026-05-05 22:01:37 +03:00

Author	SHA1	Message	Date
Ahmed Ibrahim	528e7fde9d	merge	2025-11-20 19:37:47 -08:00
Michael Bolin	f56d1dc8fc	feat: update process_exec_tool_call() to take a cancellation token (#6972 ) This updates `ExecParams` so that instead of taking `timeout_ms: Option<u64>`, it now takes a more general cancellation mechanism, `ExecExpiration`, which is an enum that includes a `Cancellation(tokio_util::sync::CancellationToken)` variant. If the cancellation token is fired, then `process_exec_tool_call()` returns in the same way as if a timeout was exceeded. This is necessary so that in #6973, we can manage the timeout logic external to the `process_exec_tool_call()` because we want to "suspend" the timeout when an elicitation from a human user is pending. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6972). * #7005 * #6973 * __->__ #6972	2025-11-20 16:29:57 -08:00
pakrym-oai	856f97f449	Delete shell_command feature (#7024 )	2025-11-20 14:14:56 -08:00
pakrym-oai	30ca89424c	Always fallback to real shell (#6953 ) Either cmd.exe or `/bin/sh`.	2025-11-20 10:58:46 -08:00
Owen Lin	d6c30ed25e	[app-server] feat: v2 apply_patch approval flow (#6760 ) This PR adds the API V2 version of the apply_patch approval flow, which centers around `ThreadItem::FileChange`. This PR wires the new RPC (`item/fileChange/requestApproval`, V2 only) and related events (`item/started`, `item/completed` for `ThreadItem::FileChange`, which are emitted in both V1 and V2) through the app-server protocol. The new approval RPC is only sent when the user initiates a turn with the new `turn/start` API so we don't break backwards compatibility with VSCE. Similar to https://github.com/openai/codex/pull/6758, the approach I took was to make as few changes to the Codex core as possible, leveraging existing `EventMsg` core events, and translating those in app-server. I did have to add a few additional fields to `EventMsg::PatchApplyBegin` and `EventMsg::PatchApplyEnd`, but those were fairly lightweight. However, the `EventMsg`s emitted by core are the following: ``` 1) Auto-approved (no request for approval)  - EventMsg::PatchApplyBegin - EventMsg::PatchApplyEnd 2) Approved by user - EventMsg::ApplyPatchApprovalRequest - EventMsg::PatchApplyBegin - EventMsg::PatchApplyEnd 3) Declined by user - EventMsg::ApplyPatchApprovalRequest - EventMsg::PatchApplyBegin - EventMsg::PatchApplyEnd ``` For a request triggering an approval, this would result in: ``` item/fileChange/requestApproval item/started item/completed ``` which is different from the `ThreadItem::CommandExecution` flow introduced in https://github.com/openai/codex/pull/6758, which does the below and is preferable: ``` item/started item/commandExecution/requestApproval item/completed ``` To fix this, we leverage `TurnSummaryStore` on codex_message_processor to store a little bit of state, allowing us to fire `item/started` and `item/fileChange/requestApproval` whenever we receive the underlying `EventMsg::ApplyPatchApprovalRequest`, and no-oping when we receive the `EventMsg::PatchApplyBegin` later. This is much less invasive than modifying the order of EventMsg within core (I tried). The resulting payloads: ``` { "method": "item/started", "params": { "item": { "changes": [ { "diff": "Hello from Codex!\n", "kind": "add", "path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt" } ], "id": "call_Nxnwj7B3YXigfV6Mwh03d686", "status": "inProgress", "type": "fileChange" } } } ``` ``` { "id": 0, "method": "item/fileChange/requestApproval", "params": { "grantRoot": null, "itemId": "call_Nxnwj7B3YXigfV6Mwh03d686", "reason": null, "threadId": "019a9e11-8295-7883-a283-779e06502c6f", "turnId": "1" } } ``` ``` { "id": 0, "result": { "decision": "accept" } } ``` ``` { "method": "item/completed", "params": { "item": { "changes": [ { "diff": "Hello from Codex!\n", "kind": "add", "path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt" } ], "id": "call_Nxnwj7B3YXigfV6Mwh03d686", "status": "completed", "type": "fileChange" } } } ```	2025-11-19 20:13:31 -08:00
zhao-oai	65c13f1ae7	execpolicy2 core integration (#6641 ) This PR threads execpolicy2 into codex-core. activated via feature flag: exec_policy (on by default) reads and parses all .codexpolicy files in `codex_home/codex` refactored tool runtime API to integrate execpolicy logic --------- Co-authored-by: Michael Bolin <mbolin@openai.com>	2025-11-19 16:50:43 -08:00
Ahmed Ibrahim	2e44082a30	shell	2025-11-19 16:44:45 -08:00
Ahmed Ibrahim	efebc62fb7	Move shell to use `truncate_text` (#6842 ) Move shell to use the configurable `truncate_text` --------- Co-authored-by: pakrym-oai <pakrym@openai.com>	2025-11-19 01:56:08 -08:00
pakrym-oai	ee0484a98c	shell_command returns freeform output (#6860 ) Instead of returning structured out and then re-formatting it into freeform, return the freeform output from shell_command tool. Keep `shell` as the default tool for GPT-5.	2025-11-18 23:38:43 -08:00
Dylan Hurd	29ca89c414	chore(config) enable shell_command (#6843 ) ## Summary Enables shell_command as default for `gpt-5` and `codex-` models. ## Testing - [x] Updated unit tests	2025-11-18 12:46:02 -08:00
Ahmed Ibrahim	3de8790714	Add the utility to truncate by tokens (#6746 ) - This PR is to make it on path for truncating by tokens. This path will be initially used by unified exec and context manager (responsible for MCP calls mainly). - We are exposing new config `calls_output_max_tokens` - Use `tokens` as the main budget unit but truncate based on the model family by Introducing `TruncationPolicy`. - Introduce `truncate_text` as a router for truncation based on the mode. In next PRs: - remove truncate_with_line_bytes_budget - Add the ability to the model to override the token budget.	2025-11-18 11:36:23 -08:00
zhao-oai	e9e644a119	fixing localshell tool calls (#6823 ) - Local-shell tool responses were always tagged as `ExecCommandSource::UserShell` because handler would call `run_exec_like` with `is_user_shell_cmd` set to true. - Treat `ToolPayload::LocalShell` the same as other model generated shell tool calls by deleting `is_user_shell_cmd` from `run_exec_like` (since actual user shell commands follow a separate code path)	2025-11-18 17:28:26 +00:00
Dylan Hurd	28ebe1c97a	fix(windows) shell_command on windows, minor parsing (#6811 ) ## Summary Enables shell_command for windows users, and starts adding some basic command parsing here, to at least remove powershell prefixes. We'll follow this up with command parsing but I wanted to land this change separately with some basic UX. NOTE: This implementation parses bash and powershell on both platforms. In theory this is possible, since you can use git bash on windows or powershell on linux. In practice, this may not be worth the complexity of supporting, so I don't feel strongly about the current approach vs. platform-specific branching. ## Testing - [x] Added a bunch of tests - [x] Ran on both windows and os x	2025-11-17 22:23:53 -08:00
Owen Lin	cecbd5b021	[app-server] feat: add v2 command execution approval flow (#6758 ) This PR adds the API V2 version of the command‑execution approval flow for the shell tool. This PR wires the new RPC (`item/commandExecution/requestApproval`, V2 only) and related events (`item/started`, `item/completed`, and `item/commandExecution/delta`, which are emitted in both V1 and V2) through the app-server protocol. The new approval RPC is only sent when the user initiates a turn with the new `turn/start` API so we don't break backwards compatibility with VSCE. The approach I took was to make as few changes to the Codex core as possible, leveraging existing `EventMsg` core events, and translating those in app-server. I did have to add additional fields to `EventMsg::ExecCommandEndEvent` to capture the command's input so that app-server can statelessly transform these events to a `ThreadItem::CommandExecution` item for the `item/completed` event. Once we stabilize the API and it's complete enough for our partners, we can work on migrating the core to be aware of command execution items as a first-class concept. Note: We'll need followup work to make sure these APIs work for the unified exec tool, but will wait til that's stable and landed before doing a pass on app-server. Example payloads below: ``` { "method": "item/started", "params": { "item": { "aggregatedOutput": null, "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'", "cwd": "/Users/owen/repos/codex/codex-rs", "durationMs": null, "exitCode": null, "id": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "status": "inProgress", "type": "commandExecution" } } } ``` ``` { "id": 0, "method": "item/commandExecution/requestApproval", "params": { "itemId": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "reason": "Need to create file in /tmp which is outside workspace sandbox", "risk": null, "threadId": "019a93e8-0a52-7fe3-9808-b6bc40c0989a", "turnId": "1" } } ``` ``` { "id": 0, "result": { "acceptSettings": { "forSession": false }, "decision": "accept" } } ``` ``` { "params": { "item": { "aggregatedOutput": null, "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'", "cwd": "/Users/owen/repos/codex/codex-rs", "durationMs": 224, "exitCode": 0, "id": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "status": "completed", "type": "commandExecution" } } } ```	2025-11-18 00:23:54 +00:00
Jeremy Rose	ab2e7499f8	core: add a feature to disable the shell tool (#6481 ) `--disable shell_tool` disables the built-in shell tool. This is useful for MCP-only operation. --------- Co-authored-by: Michael Bolin <mbolin@openai.com>	2025-11-17 22:56:19 +00:00
Dylan Hurd	daf77b8452	chore(core) Update shell instructions (#6679 ) ## Summary Consolidates `shell` and `shell_command` tool instructions. ## Testing - [x] Updated tests, tested locally	2025-11-17 13:05:15 -08:00
Jeremy Rose	03ffe4d595	core/tui: non-blocking MCP startup (#6334 ) This makes MCP startup not block TUI startup. Messages sent while MCPs are booting will be queued. https://github.com/user-attachments/assets/96e1d234-5d8f-4932-a935-a675d35c05e0 Fixes #6317 --------- Co-authored-by: pakrym-oai <pakrym@openai.com>	2025-11-17 11:26:11 -08:00
Dylan Hurd	497fb4a19c	fix(core) serialize shell_command (#6744 ) ## Summary Ensures we're serializing calls to `shell_command` ## Testing - [x] Added unit test	2025-11-16 23:16:51 -08:00
jif-oai	63c8c01f40	feat: better UI for unified_exec (#6515 ) <img width="376" height="132" alt="Screenshot 2025-11-12 at 17 36 22" src="https://github.com/user-attachments/assets/ce693f0d-5ca0-462e-b170-c20811dcc8d5" />	2025-11-14 16:31:12 +01:00
Ahmed Ibrahim	9890ceb939	Avoid double truncation (#6631 ) 1. Avoid double truncation by giving 10% above the tool default constant 2. Add tests that fails when const = 1	2025-11-13 16:59:31 -08:00
pakrym-oai	7b027e7536	Revert "Revert "Overhaul shell detection and centralize command generation for unified exec"" (#6607 ) Reverts openai/codex#6606	2025-11-13 16:45:17 -08:00
pakrym-oai	0792a7953d	Update default yield time (#6610 ) 10s for exec and 250ms for write_stdin	2025-11-13 10:24:41 -08:00
pakrym-oai	e6995174c1	Revert "Overhaul shell detection and centralize command generation for unified exec" (#6606 ) Reverts openai/codex#6577	2025-11-13 08:43:00 -08:00
pakrym-oai	d28e912214	Overhaul shell detection and centralize command generation for unified exec (#6577 ) This fixes command display for unified exec. All `cd`s and `ls`es are now parsed. <img width="452" height="237" alt="image" src="https://github.com/user-attachments/assets/ce92d81f-f74c-485a-9b34-1eaa29290ec6" /> Deletes a ton of tests that were doing nothing from shell.rs. --------- Co-authored-by: Pavel Krymets <pavel@krymets.com>	2025-11-13 08:28:09 -08:00
Ahmed Ibrahim	b1979b70a8	remove porcupine model slug (#6580 )	2025-11-13 04:43:31 +00:00
Ahmed Ibrahim	ad7eaa80f9	Change model picker to include gpt5.1 (#6569 ) - Change the presets - Change the tests that make sure we keep the list of tools updated - Filter out deprecated models	2025-11-12 19:44:53 -08:00
jif-oai	e00eb50db3	feat: only wait for mutating tools for ghost commit (#6534 )	2025-11-12 18:16:32 +00:00
Michael Bolin	29364f3a9b	feat: shell_command tool (#6510 ) This adds support for a new variant of the shell tool behind a flag. To test, run `codex` with `--enable shell_command_tool`, which will register the tool with Codex under the name `shell_command` that accepts the following shape: ```python { command: str workdir: str \| None, timeout_ms: int \| None, with_escalated_permissions: bool \| None, justification: str \| None, } ``` This is comparable to the existing tool registered under `shell`/`container.exec`. The primary difference is that it accepts `command` as a `str` instead of a `str[]`. The `shell_command` tool executes by running `execvp(["bash", "-lc", command])`, though the exact arguments to `execvp(3)` depend on the user's default shell. The hypothesis is that this will simplify things for the model. For example, on Windows, instead of generating: ```json {"command": ["pwsh.exe", "-NoLogo", "-Command", "ls -Name"]} ``` The model could simply generate: ```json {"command": "ls -Name"} ``` As part of this change, I extracted some logic out of `user_shell.rs` as `Shell::derive_exec_args()` so that it can be reused in `codex-rs/core/src/tools/handlers/shell.rs`. Note the original code generated exec arg lists like: ```javascript ["bash", "-lc", command] ["zsh", "-lc", command] ["pwsh.exe", "-NoProfile", "-Command", command] ``` Using `-l` for Bash and Zsh, but then specifying `-NoProfile` for PowerShell seemed inconsistent to me, so I changed this in the new implementation while also adding a `use_login_shell: bool` option to make this explicit. If we decide to add a `login: bool` to `ShellCommandToolCallParams` like we have for unified exec: `807e2c27f0/codex-rs/core/src/tools/handlers/unified_exec.rs (L33-L34)` Then this should make it straightforward to support.	2025-11-12 08:18:57 -08:00
pakrym-oai	807e2c27f0	Add unified exec escalation handling and tests (#6492 ) Similar implementation to the shell tool	2025-11-11 08:19:35 -08:00
jif-oai	ad279eacdc	nit: logs to trace (#6503 )	2025-11-11 13:37:06 +00:00
jif-oai	f01f2ec9ee	feat: add workdir to unified_exec (#6466 )	2025-11-10 19:53:36 +00:00
pakrym-oai	91b16b8682	Don't request approval for safe commands in unified exec (#6380 )	2025-11-07 16:36:04 -08:00
pakrym-oai	4c1a6f0ee0	Promote shell config tool to model family config (#6351 )	2025-11-07 10:11:11 -08:00
pakrym-oai	c368c6aeea	Remove shell tool when unified exec is enabled (#6345 ) Also drop streameable shell that's just an alias for unified exec.	2025-11-06 15:46:24 -08:00
pakrym-oai	b5349202e9	Freeform unified exec output formatting (#6233 )	2025-11-06 22:14:27 +00:00
Ahmed Ibrahim	1a89f70015	refactor Conversation history file into its own directory (#6229 ) This is just a refactor of `conversation_history` file by breaking it up into multiple smaller ones with helper. This refactor will help us move more functionality related to context management here. in a clean way.	2025-11-05 10:49:35 -08:00
Ahmed Ibrahim	6ee7fbcfff	feat: add the time after aborting (#5996 ) Tell the model how much time passed after the user aborted the call.	2025-11-03 11:44:06 -08:00
Eric Traut	d5853d9c47	Changes to sandbox command assessment feature based on initial experiment feedback (#6091 ) * Removed sandbox risk categories; feedback indicates that these are not that useful and "less is more" * Tweaked the assessment prompt to generate terser answers * Fixed bug in orchestrator that prevents this feature from being exposed in the extension	2025-11-01 14:52:23 -07:00
iceweasel-oai	87cce88f48	Windows Sandbox - Alpha version (#4905 ) - Added the new codex-windows-sandbox crate that builds both a library entry point (run_windows_sandbox_capture) and a CLI executable to launch commands inside a Windows restricted-token sandbox, including ACL management, capability SID provisioning, network lockdown, and output capture (windows-sandbox-rs/src/lib.rs:167, windows-sandbox-rs/src/main.rs:54). - Introduced the experimental WindowsSandbox feature flag and wiring so Windows builds can opt into the sandbox: SandboxType::WindowsRestrictedToken, the in-process execution path, and platform sandbox selection now honor the flag (core/src/features.rs:47, core/src/config.rs:1224, core/src/safety.rs:19, core/src/sandboxing/mod.rs:69, core/src/exec.rs:79, core/src/exec.rs:172). - Updated workspace metadata to include the new crate and its Windows-specific dependencies so the core crate can link against it (codex-rs/ Cargo.toml:91, core/Cargo.toml:86). - Added a PowerShell bootstrap script that installs the Windows toolchain, required CLI utilities, and builds the workspace to ease development on the platform (scripts/setup-windows.ps1:1). - Landed a Python smoke-test suite that exercises read-only/workspace-write policies, ACL behavior, and network denial for the Windows sandbox binary (windows-sandbox-rs/sandbox_smoketests.py:1).	2025-10-30 15:51:57 -07:00
jif-oai	3183935bd7	feat: add output even in sandbox denied (#5908 )	2025-10-29 18:21:18 +00:00
Abhishek Bhardwaj	89591e4246	feature: Add "!cmd" user shell execution (#2471 ) feature: Add "!cmd" user shell execution This change lets users run local shell commands directly from the TUI by prefixing their input with ! (e.g. !ls). Output is truncated to keep the exec cell usable, and Ctrl-C cleanly interrupts long-running commands (e.g. !sleep 10000). Summary of changes - Route Op::RunUserShellCommand through a dedicated UserShellCommandTask (core/src/tasks/user_shell.rs), keeping the task logic out of codex.rs. - Reuse the existing tool router: the task constructs a ToolCall for the local_shell tool and relies on ShellHandler, so no manual MCP tool lookup is required. - Emit exec lifecycle events (ExecCommandBegin/ExecCommandEnd) so the TUI can show command metadata, live output, and exit status. End-to-end flow TUI handling 1. ChatWidget::submit_user_message (TUI) intercepts messages starting with !. 2. Non-empty commands dispatch Op::RunUserShellCommand { command }; empty commands surface a help hint. 3. No UserInput items are created, so nothing is enqueued for the model. Core submission loop 4. The submission loop routes the op to handlers::run_user_shell_command (core/src/codex.rs). 5. A fresh TurnContext is created and Session::spawn_user_shell_command enqueues UserShellCommandTask. Task execution 6. UserShellCommandTask::run emits TaskStartedEvent, formats the command, and prepares a ToolCall targeting local_shell. 7. ToolCallRuntime::handle_tool_call dispatches to ShellHandler. Shell tool runtime 8. ShellHandler::run_exec_like launches the process via the unified exec runtime, honoring sandbox and shell policies, and emits ExecCommandBegin/End. 9. Stdout/stderr are captured for the UI, but the task does not turn the resulting ToolOutput into a model response. Completion 10. After ExecCommandEnd, the task finishes without an assistant message; the session marks it complete and the exec cell displays the final output. Conversation context - The command and its output never enter the conversation history or the model prompt; the flow is local-only. - Only exec/task events are emitted for UI rendering. Demo video https://github.com/user-attachments/assets/fcd114b0-4304-4448-a367-a04c43e0b996	2025-10-29 00:31:20 -07:00
Gabriel Peal	b0bdc04c30	[MCP] Render MCP tool call result images to the model (#5600 ) It's pretty amazing we have gotten here without the ability for the model to see image content from MCP tool calls. This PR builds off of 4391 and fixes #4819. I would like @KKcorps to get adequete credit here but I also want to get this fix in ASAP so I gave him a week to update it and haven't gotten a response so I'm going to take it across the finish line. This test highlights how absured the current situation is. I asked the model to read this image using the Chrome MCP <img width="2378" height="674" alt="image" src="https://github.com/user-attachments/assets/9ef52608-72a2-4423-9f5e-7ae36b2b56e0" /> After this change, it correctly outputs: > Captured the page: image dhows a dark terminal-style UI labeled `OpenAI Codex (v0.0.0)` with prompt `model: gpt-5-codex medium` and working directory `/codex/codex-rs` (and more) Before this change, it said: > Took the full-page screenshot you asked for. It shows a long, horizontally repeating pattern of stylized people in orange, light-blue, and mustard clothing, holding hands in alternating poses against a white background. No text or other graphics-just rows of flat illustration stretching off to the right. Without this change, the Figma, Playwright, Chrome, and other visual MCP servers are pretty much entirely useless. I tested this change with the openai respones api as well as a third party completions api	2025-10-27 17:55:57 -04:00
Ahmed Ibrahim	7226365397	Centralize truncation in conversation history (#5652 ) move the truncation logic to conversation history to use on any tool output. This will help us in avoiding edge cases while truncating the tool calls and mcp calls.	2025-10-27 14:05:35 -07:00
jif-oai	e92c4f6561	feat: async ghost commit (#5618 )	2025-10-27 10:09:10 +00:00
Eric Traut	f8af4f5c8d	Added model summary and risk assessment for commands that violate sandbox policy (#5536 ) This PR adds support for a model-based summary and risk assessment for commands that violate the sandbox policy and require user approval. This aids the user in evaluating whether the command should be approved. The feature works by taking a failed command and passing it back to the model and asking it to summarize the command, give it a risk level (low, medium, high) and a risk category (e.g. "data deletion" or "data exfiltration"). It uses a new conversation thread so the context in the existing thread doesn't influence the answer. If the call to the model fails or takes longer than 5 seconds, it falls back to the current behavior. For now, this is an experimental feature and is gated by a config key `experimental_sandbox_command_assessment`. Here is a screen shot of the approval prompt showing the risk assessment and summary. <img width="723" height="282" alt="image" src="https://github.com/user-attachments/assets/4597dd7c-d5a0-4e9f-9d13-414bd082fd6b" />	2025-10-24 15:23:44 -07:00
jif-oai	a6b9471548	feat: end events on unified exec (#5551 )	2025-10-23 18:51:34 +01:00
jif-oai	6745b12427	chore: testing on apply_path (#5557 )	2025-10-23 17:00:48 +01:00
Ahmed Ibrahim	f59978ed3d	Handle cancelling/aborting while processing a turn (#5543 ) Currently we collect all all turn items in a vector, then we add it to the history on success. This result in losing those items on errors including aborting `ctrl+c`. This PR: - Adds the ability for the tool call to handle cancellation - bubble the turn items up to where we are recording this info Admittedly, this logic is an ad-hoc logic that doesn't handle a lot of error edge cases. The right thing to do is recording to the history on the spot as `items`/`tool calls output` come. However, this isn't possible because of having different `task_kind` that has different `conversation_histories`. The `try_run_turn` has no idea what thread are we using. We cannot also pass an `arc` to the `conversation_histories` because it's a private element of `state`. That's said, `abort` is the most common case and we should cover it until we remove `task kind`	2025-10-23 08:47:10 -07:00
jif-oai	892eaff46d	fix: approval issue (#5525 )	2025-10-23 11:13:53 +01:00
jif-oai	8e291a1706	chore: clean `handle_container_exec_with_params` (#5516 ) Drop `handle_container_exec_with_params` to have simpler and more straight forward execution path	2025-10-23 09:24:01 +01:00

1 2

72 Commits