codex

mirror of https://github.com/openai/codex.git synced 2026-05-02 04:11:39 +03:00

Author	SHA1	Message	Date
Owen Lin	cecbd5b021	[app-server] feat: add v2 command execution approval flow (#6758 ) This PR adds the API V2 version of the command‑execution approval flow for the shell tool. This PR wires the new RPC (`item/commandExecution/requestApproval`, V2 only) and related events (`item/started`, `item/completed`, and `item/commandExecution/delta`, which are emitted in both V1 and V2) through the app-server protocol. The new approval RPC is only sent when the user initiates a turn with the new `turn/start` API so we don't break backwards compatibility with VSCE. The approach I took was to make as few changes to the Codex core as possible, leveraging existing `EventMsg` core events, and translating those in app-server. I did have to add additional fields to `EventMsg::ExecCommandEndEvent` to capture the command's input so that app-server can statelessly transform these events to a `ThreadItem::CommandExecution` item for the `item/completed` event. Once we stabilize the API and it's complete enough for our partners, we can work on migrating the core to be aware of command execution items as a first-class concept. Note: We'll need followup work to make sure these APIs work for the unified exec tool, but will wait til that's stable and landed before doing a pass on app-server. Example payloads below: ``` { "method": "item/started", "params": { "item": { "aggregatedOutput": null, "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'", "cwd": "/Users/owen/repos/codex/codex-rs", "durationMs": null, "exitCode": null, "id": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "status": "inProgress", "type": "commandExecution" } } } ``` ``` { "id": 0, "method": "item/commandExecution/requestApproval", "params": { "itemId": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "reason": "Need to create file in /tmp which is outside workspace sandbox", "risk": null, "threadId": "019a93e8-0a52-7fe3-9808-b6bc40c0989a", "turnId": "1" } } ``` ``` { "id": 0, "result": { "acceptSettings": { "forSession": false }, "decision": "accept" } } ``` ``` { "params": { "item": { "aggregatedOutput": null, "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'", "cwd": "/Users/owen/repos/codex/codex-rs", "durationMs": 224, "exitCode": 0, "id": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "status": "completed", "type": "commandExecution" } } } ```	2025-11-18 00:23:54 +00:00
Eric Traut	d5853d9c47	Changes to sandbox command assessment feature based on initial experiment feedback (#6091 ) * Removed sandbox risk categories; feedback indicates that these are not that useful and "less is more" * Tweaked the assessment prompt to generate terser answers * Fixed bug in orchestrator that prevents this feature from being exposed in the extension	2025-11-01 14:52:23 -07:00
Owen Lin	89c00611c2	[app-server] remove serde(skip_serializing_if = "Option::is_none") annotations (#5939 ) We had this annotation everywhere in app-server APIs which made it so that fields get serialized as `field?: T`, meaning if the field as `None` we would omit the field in the payload. Removing this annotation changes it so that we return `field: T \| null` instead, which makes codex app-server's API more aligned with the convention of public OpenAI APIs like Responses. Separately, remove the `#[ts(optional_fields = nullable)]` annotations that were recently added which made all the TS types become `field?: T \| null` which is not great since clients need to handle undefined and null. I think generally it'll be best to have optional types be either: - `field: T \| null` (preferred, aligned with public OpenAI APIs) - `field?: T` where we have to, such as types generated from the MCP schema: https://github.com/modelcontextprotocol/modelcontextprotocol/blob/main/schema/2025-06-18/schema.ts (see changes to `mcp-types/`) I updated @etraut-openai's unit test to check that all generated TS types are one or the other, not both (so will error if we have a type that has `field?: T \| null`). I don't think there's currently a good use case for that - but we can always revisit.	2025-10-30 18:18:53 +00:00
Eric Traut	069a38a06c	Add missing "nullable" macro to protocol structs that contain optional fields (#5901 ) This PR addresses a current hole in the TypeScript code generation for the API server protocol. Fields that are marked as "Optional<>" in the Rust code are serialized such that the value is omitted when it is deserialized — appearing as `undefined`, but the TS type indicates (incorrectly) that it is always defined but possibly `null`. This can lead to subtle errors that the TypeScript compiler doesn't catch. The fix is to include the `#[ts(optional_fields = nullable)]` macro for all protocol structs that contain one or more `Optional<>` fields. This PR also includes a new test that validates that all TS protocol code containing "\| null" in its type is marked optional ("?") to catch cases where `#[ts(optional_fields = nullable)]` is omitted.	2025-10-29 12:09:47 -07:00
Eric Traut	f8af4f5c8d	Added model summary and risk assessment for commands that violate sandbox policy (#5536 ) This PR adds support for a model-based summary and risk assessment for commands that violate the sandbox policy and require user approval. This aids the user in evaluating whether the command should be approved. The feature works by taking a failed command and passing it back to the model and asking it to summarize the command, give it a risk level (low, medium, high) and a risk category (e.g. "data deletion" or "data exfiltration"). It uses a new conversation thread so the context in the existing thread doesn't influence the answer. If the call to the model fails or takes longer than 5 seconds, it falls back to the current behavior. For now, this is an experimental feature and is gated by a config key `experimental_sandbox_command_assessment`. Here is a screen shot of the approval prompt showing the risk assessment and summary. <img width="723" height="282" alt="image" src="https://github.com/user-attachments/assets/4597dd7c-d5a0-4e9f-9d13-414bd082fd6b" />	2025-10-24 15:23:44 -07:00

5 Commits