codex

mirror of https://github.com/openai/codex.git synced 2026-05-04 21:32:21 +03:00

Author	SHA1	Message	Date
sayan-oai	014a59fb0b	check app auth in plugin/install (#13685 ) #### What on `plugin/install`, check if installed apps are already authed on chatgpt, and return list of all apps that are not. clients can use this list to trigger auth workflows as needed. checks are best effort based on `codex_apps` loading, much like `app/list`. #### Tests Added integration tests, tested locally.	2026-03-06 06:45:00 +00:00
Dylan Hurd	4c9b1c38f6	fix(tui) remove config check for trusted setting (#11874 ) ## Summary Simplify the trusted directory flow. This logic was originally designed several months ago, to determine if codex should start in read-only or workspace-write mode. However, that's no longer the purpose of directory trust - and therefore we should get rid of this logic. ## Testing - [x] Unit tests pass	2026-03-05 22:29:34 -08:00
iceweasel-oai	14de492985	copy current exe to CODEX_HOME/.sandbox-bin for apply_patch (#13669 ) We do this for codex-command-runner.exe as well for the same reason. Windows sandbox users cannot execute binaries in the WindowsApp/ installed directory for the Codex App. This causes apply-patch to fail because it tries to execute codex.exe as the sandbox user.	2026-03-05 22:15:10 -08:00
viyatb-oai	6a79ed5920	refactor: remove proxy admin endpoint (#13687 ) ## Summary - delete the network proxy admin server and its runtime listener/task plumbing - remove the admin endpoint config, runtime, requirement, protocol, schema, and debug-surface fields - update proxy docs to reflect the remaining HTTP and SOCKS listeners only	2026-03-05 22:03:16 -08:00
xl-openai	520ed724d2	support plugin/list. (#13540 ) Introduce a plugin/list endpoint that reads from the local marketplace.json. Also update the signature for plugin/install.	2026-03-05 21:58:50 -05:00
Ahmed Ibrahim	629cb15bc6	Replay thread rollback from rollout history (#13615 ) - Replay thread rollback from the persisted rollout history instead of truncating in-memory state. - Add rollback coverage, including rollback-behind-compaction snapshot coverage.	2026-03-05 16:40:09 -08:00
Ahmed Ibrahim	6cf0ed4e79	Refine realtime startup context formatting (#13560 ) ## Summary - group recent work by git repo when available, otherwise by directory - render recent work as bounded user asks with per-thread cwd context - exclude hidden files and directories from workspace trees	2026-03-05 16:31:20 -08:00
Owen Lin	c3736cff0a	feat(otel): safe tracing (#13626 ) ### Motivation Today config.toml has three different OTEL knobs under `[otel]`: - `exporter` controls where OTEL logs go - `trace_exporter` controls where OTEL traces go - `metrics_exporter` controls where metrics go Those often (pretty much always?) serve different purposes. For example, for OpenAI internal usage, the log exporter is already being used for IT/security telemetry, and that use case is intentionally content-rich: tool calls, arguments, outputs, MCP payloads, and in some cases user content are all useful there. `log_user_prompt` is a good example of that distinction. When it’s enabled, we include raw prompt text in OTEL logs, which is acceptable for the security use case. The trace exporter is a different story. The goal there is to give OpenAI engineers visibility into latency and request behavior when they run Codex locally, without sending sensitive prompt or tool data as trace event data. In other words, traces should help answer “what was slow?” or “where did time go?”, not “what did the user say?” or “what did the tool return?” The complication is that Rust’s `tracing` crate does not make a hard distinction between “logs” and “trace events.” It gives us one instrumentation API for logs and trace events (via `tracing::event!`), and subscribers decide what gets treated as logs, trace events, or both. Before this change, our OTEL trace layer was effectively attached to the general tracing stream, which meant turning on `trace_exporter` could pick up content-rich events that were originally written with logging (and the `log_exporter`) in mind. That made it too easy for sensitive data to end up in exported traces by accident. ### Concrete example In `otel_manager.rs`, this `tracing::event!` call would be exported in both logs AND traces (as a trace event). ``` pub fn user_prompt(&self, items: &[UserInput]) { let prompt = items .iter() .flat_map(|item| match item { UserInput::Text { text, .. 
} => Some(text.as_str()), _ => None, }) .collect::<String>(); let prompt_to_log = if self.metadata.log_user_prompts { prompt.as_str() } else { "[REDACTED]" }; tracing::event!( tracing::Level::INFO, event.name = "codex.user_prompt", event.timestamp = %timestamp(), // ... prompt = %prompt_to_log, ); } ``` Instead of `tracing::event!`, we should now be using `log_event!` and `trace_event!` instead to more clearly indicate which sink (logs vs. traces) that event should be exported to. ### What changed This PR makes the log and trace export distinct instead of treating them as two sinks for the same data. On the provider side, OTEL logs and traces now have separate routing/filtering policy. The log exporter keeps receiving the existing `codex_otel` events, while trace export is limited to spans and trace events. On the event side, `OtelManager` now emits two flavors of telemetry where needed: - a log-only event with the current rich payloads - a tracing-safe event with summaries only It also has a convenience `log_and_trace_event!` macro for emitting to both logs and traces when it's safe to do so, as well as log- and trace-specific fields. That means prompts, tool args, tool output, account email, MCP metadata, and similar content stay in the log lane, while traces get the pieces that are actually useful for performance work: durations, counts, sizes, status, token counts, tool origin, and normalized error classes. This preserves current IT/security logging behavior while making it safe to turn on trace export for employees. 
### Full list of things removed from trace export - raw user prompt text from `codex.user_prompt` - raw tool arguments and output from `codex.tool_result` - MCP server metadata from `codex.tool_result` (mcp_server, mcp_server_origin) - account identity fields like `user.email` and `user.account_id` from trace-safe OTEL events - `host.name` from trace resources - generic `codex.tool_decision` events from traces - generic `codex.sse_event` events from traces - the full ToolCall debug payload from the `handle_tool_call` span What traces now keep instead is mostly: - spans - trace-safe OTEL events - counts, lengths, durations, status, token counts, and tool origin summaries	2026-03-05 16:30:53 -08:00
Celia Chen	aaefee04cd	core/protocol: add structured macOS additional permissions and merge them into sandbox execution (#13499 ) ## Summary - Introduce strongly-typed macOS additional permissions across protocol/core/app-server boundaries. - Merge additional permissions into effective sandbox execution, including macOS seatbelt profile extensions. - Expand docs, schema/tool definitions, UI rendering, and tests for `network`, `file_system`, and `macos` additional permissions.	2026-03-05 16:21:45 -08:00
sayan-oai	4e77ea0ec7	add @plugin mentions (#13510 ) ## Note: added plugin mentions via @, but that conflicts with file mentions. Depends on and builds upon #13433. - introduces explicit `@plugin` mentions. this injects the plugin's mcp servers, app names, and skill name format into turn context as a dev message. - we do not yet have UI for these mentions, so we currently parse raw text (as opposed to skills and apps, which have UI chips, autocomplete, etc.). this depends on a `plugins/list` app-server endpoint we can feed the UI with, which is upcoming. - also annotate mcp and app tool descriptions with the plugin(s) they come from. this gives the model a first-class way of understanding which tools come from which plugins, which will help implicit invocation. ### Tests Added and updated tests, unit and integration. Also confirmed locally that a raw `@plugin` injects the dev message, and the model knows about its apps, mcps, and skills.	2026-03-06 00:03:39 +00:00
Curtis 'Fjord' Hawthorne	1ed542bf31	Clarify js_repl image emission and encoding guidance (#13639 ) ## Summary This updates the `js_repl` prompt and docs to make the image guidance less confusing. ## What changed - Clarified that `codex.emitImage(...)` adds one image per call and can be called multiple times to emit multiple images. - Reworded the image-encoding guidance to be general `js_repl` advice instead of `ImageDetailOriginal`-specific behavior. - Updated the guidance to recommend JPEG at about quality 85 when lossy compression is acceptable, and PNG when transparency or lossless detail matters. - Mirrored the same wording in the public `js_repl` docs.	2026-03-05 16:02:37 -08:00
viyatb-oai	9203f17b0e	Improve macOS Seatbelt network and unix socket handling (#12702 ) This improves macOS Seatbelt handling for sandboxed tool processes. ## Changes - Allow dual-stack local binding in proxy-managed sessions, while still keeping traffic limited to loopback and configured proxy endpoints. - Replace the old generic unix-socket path rule with explicit AF_UNIX permissions for socket creation, bind, and outbound connect. - Keep explicitly approved wrapper sockets connect-only. Local helper servers are less likely to fail when binding on macOS. Tools using local unix-socket IPC should work more reliably under the sandbox. Full-network sessions, proxy fail-closed behavior, and proxy lifecycle are unchanged.	2026-03-05 15:39:54 -08:00
Owen Lin	aa3fe8abf8	feat(core): persist trace_id for turns in RolloutItem::TurnContext (#13602 ) This PR adds a durable trace linkage for each turn by storing the active trace ID on the rollout TurnContext record stored in session rollout files. Before this change, we propagated trace context at runtime but didn’t persist a stable per-turn trace key in rollout history. That made after-the-fact debugging harder (for example, mapping a historical turn to the corresponding trace in datadog). This sets us up for much easier debugging in the future. ### What changed - Added an optional `trace_id` to TurnContextItem (rollout schema). - Added a small OTEL helper to read the current span trace ID. - Captured `trace_id` when creating `TurnContext` and included it in `to_turn_context_item()`. - Updated tests and fixtures that construct TurnContextItem so older/no-trace cases still work. ### Why this approach TurnContext is already the canonical durable per-turn metadata in rollout. This keeps ownership clean: trace linkage lives with other persisted turn metadata.	2026-03-05 13:26:48 -08:00
Curtis 'Fjord' Hawthorne	cfbbbb1dda	Harden js_repl emitImage to accept only data: URLs (#13507 ) ### Motivation - Prevent untrusted js_repl code from supplying arbitrary external URLs that the host would forward into model input and cause external fetches / data exfiltration. This change narrows the emitImage contract to safe, self-contained data URLs. ### Description - Kernel: added `normalizeEmitImageUrl` and enforce that string-valued `codex.emitImage(...)` inputs and `input_image`/content-item paths only accept non-empty `data:` URLs; byte-based paths still produce data URLs as before (`kernel.js`). - Host: added `validate_emitted_image_url` and check `EmitImage` requests before creating `FunctionCallOutputContentItem::InputImage`, returning an error to the kernel if the URL is not a `data:` URL (`mod.rs`). - Tests/docs: added a runtime test `js_repl_emit_image_rejects_non_data_url` to assert rejection of non-data URLs and updated user-facing docs/instruction text to state `data URL` support instead of generic direct image URLs (`mod.rs`, `docs/js_repl.md`, `project_doc.rs`). ### Testing - Ran `just fmt` in `codex-rs`; it completed successfully. - Added a runtime test (`cargo test -p codex-core js_repl_emit_image_rejects_non_data_url`) but executing the test in this environment failed due to a missing system dependency required by `codex-linux-sandbox` (the vendored `bubblewrap` build requires `libcap.pc` via `pkg-config`), so the test could not be run here. - Attempted a focused `cargo test` invocation with and without default features; both compile/test attempts were blocked by the same missing system `libcap` dependency in this environment. ------ [Codex Task](https://chatgpt.com/codex/tasks/task_i_69a7837bce98832d91db92d5f76d6cbe)	2026-03-05 12:12:32 -08:00
Celia Chen	a63624a61a	feat: merge skill permission profiles into the turn sandbox for zsh-fork execs (#13496 ) ## Summary This changes the Unix shell escalation path for skill-matched executables to apply a skill's `PermissionProfile` as additive permissions on top of the existing turn/request sandbox policy. Previously, skill-matched executables compiled the skill permission profile into a standalone sandbox policy and executed against that replacement policy. Now they go through the same `additional_permissions` merge path used elsewhere in shell sandbox preparation. ## What Changed - Changed `skill_escalation_execution()` to return `EscalationPermissions::PermissionProfile(...)` for non-empty skill permission profiles. - Kept empty or missing skill permission profiles on the `TurnDefault` path. - Added tests covering the new additive skill-permission behavior. - Added inline comments in `prepare_escalated_exec()` clarifying the difference between additive permission merging and fully specified replacement sandbox policies. - Removed the now-unused skill permission compiler module after switching this path away from standalone compiled skill sandbox policies. ## Testing - Ran `just fmt` in `codex-rs` - Ran `cargo test -p codex-core` `cargo test -p codex-core` still hits an unrelated existing failure: `shell_snapshot::tests::snapshot_shell_does_not_inherit_stdin` ## Follow-up This change intentionally does not merge skill-specific macOS seatbelt profile extensions through the `additional_permissions` path yet. Filesystem and network permissions now follow the additive merge path, but seatbelt extension permissions still need separate handling in a follow-up PR.	2026-03-05 20:05:35 +00:00
Curtis 'Fjord' Hawthorne	657841e7f5	Persist initialized js_repl bindings after failed cells (#13482 ) ## Summary - Change `js_repl` failed-cell persistence so later cells keep prior bindings plus only the current-cell bindings whose initialization definitely completed before the throw. - Preserve initialized lexical bindings across failed cells via module-namespace readability, including top-level destructuring that partially succeeds before a later throw. - Preserve hoisted `var` and `function` bindings only when execution clearly reached their declaration site, and preserve direct top-level pre-declaration `var` writes and updates through explicit write-site markers. - Preserve top-level `for...in` / `for...of` `var` bindings when the loop body executes at least once, using a first-iteration guard to avoid per-iteration bookkeeping overhead. - Keep prior module state intact across link-time failures and evaluation failures before the prelude runs, while still allowing failed cells that already recreated prior bindings to persist updates to those existing bindings. - Hide internal commit hooks from user `js_repl` code after the prelude aliases them, so snippets cannot spoof committed bindings by calling the raw `import.meta` hooks directly. - Add focused regression coverage for the supported failed-cell behaviors and the intentionally unsupported boundaries. - Update `js_repl` docs and generated instructions to describe the new, narrower failed-cell persistence model. ## Motivation We saw `js_repl` drop bindings that had already been initialized successfully when a later statement in the same cell threw, for example: const { context: liveContext, session } = await initializeGoogleSheetsLiveForTab(tab); // later statement throws That was surprising in practice because successful earlier work disappeared from the next cell. This change makes failed-cell persistence more useful without trying to model every possible partially executed JavaScript edge case. 
The resulting behavior is narrower and easier to reason about: - prior bindings are always preserved - lexical bindings persist when their initialization completed before the throw - hoisted `var` / `function` bindings persist only when execution clearly reached their declaration or a supported top-level `var` write site - failed cells that already recreated prior bindings can persist writes to those existing bindings even if they introduce no new bindings The detailed edge-case matrix stays in `docs/js_repl.md`. The model-facing `project_doc` guidance is intentionally shorter and focused on generation-relevant behavior. ## Supported Failed-Cell Behavior - Prior bindings remain available after a failed cell. - Initialized lexical bindings remain available after a failed cell. - Top-level destructuring like `const { a, b } = ...` preserves names whose initialization completed before a later throw. - Hoisted `function` bindings persist when execution reached the declaration statement before the throw. - Direct top-level pre-declaration `var` writes and updates persist, for example: - `x = 1` - `x += 1` - `x++` - short-circuiting logical assignments only persist when the write branch actually runs - Non-empty top-level `for...in` / `for...of` `var` loops persist their loop bindings. - Failed cells can persist updates to existing carried bindings after the prelude has run, even when the cell commits no new bindings. - Link failures and eval failures before the prelude do not poison `@prev`. 
## Intentionally Unsupported Failed-Cell Cases - Hoisted function reads before the declaration, such as `foo(); ...; function foo() {}` - Aliasing or inference-based recovery from reads before declaration - Nested writes inside already-instrumented assignment RHS expressions - Destructuring-assignment recovery for hoisted `var` - Partial `var` destructuring recovery - Pre-declaration `undefined` reads for hoisted `var` - Empty top-level `for...in` / `for...of` loop vars - Nested or scope-sensitive pre-declaration `var` writes outside direct top-level expression statements	2026-03-05 11:01:46 -08:00
Owen Lin	926b2f19e8	feat(app-server): support mcp elicitations in v2 api (#13425 ) This adds a first-class server request for MCP server elicitations: `mcpServer/elicitation/request`. Until now, MCP elicitation requests only showed up as a raw `codex/event/elicitation_request` event from core. That made it hard for v2 clients to handle elicitations using the same request/response flow as other server-driven interactions (like shell and `apply_patch` tools). This also updates the underlying MCP elicitation request handling in core to pass through the full MCP request (including URL and form data) so we can expose it properly in app-server. ### Why not `item/mcpToolCall/elicitationRequest`? This is because MCP elicitations are related to MCP servers first, and only optionally to a specific MCP tool call. In the MCP protocol, elicitation is a server-to-client capability: the server sends `elicitation/create`, and the client replies with an elicitation result. RMCP models it that way as well. In practice an elicitation is often triggered by an MCP tool call, but not always. ### What changed - add `mcpServer/elicitation/request` to the v2 app-server API - translate core `codex/event/elicitation_request` events into the new v2 server request - map client responses back into `Op::ResolveElicitation` so the MCP server can continue - update app-server docs and generated protocol schema - add an end-to-end app-server test that covers the full round trip through a real RMCP elicitation flow - The new test exercises a realistic case where an MCP tool call triggers an elicitation, the app-server emits mcpServer/elicitation/request, the client accepts it, and the tool call resumes and completes successfully. ### app-server API flow - Client starts a thread with `thread/start`. - Client starts a turn with `turn/start`. - App-server sends `item/started` for the `mcpToolCall`. - While that tool call is in progress, app-server sends `mcpServer/elicitation/request`. 
- Client responds to that request with `{ action: "accept" | "decline" | "cancel" }`. - App-server sends `serverRequest/resolved`. - App-server sends `item/completed` for the mcpToolCall. - App-server sends `turn/completed`. - If the turn is interrupted while the elicitation is pending, app-server still sends `serverRequest/resolved` before the turn finishes.	2026-03-05 07:20:20 -08:00
jif-oai	0cc6835416	feat: ultra polish package manager (#13573 ) See the readme	2026-03-05 13:02:30 +00:00
jif-oai	f304b2ef62	feat: bind package manager (#13571 )	2026-03-05 11:57:13 +00:00
Michael Bolin	b4cb989563	refactor: prepare unified exec for zsh-fork backend (#13392 ) ## Why `shell_zsh_fork` already provides stronger guarantees around which executables receive elevated permissions. To reuse that machinery from unified exec without pushing Unix-specific escalation details through generic runtime code, the escalation bootstrap and session lifetime handling need a cleaner boundary. That boundary also needs to be safe for long-lived sessions: when an intercepted shell session is closed or pruned, any in-flight approval workers and any already-approved escalated child they spawned must be torn down with the session, and the inherited escalation socket must not leak into unrelated subprocesses. ## What Changed - Extracted a reusable `EscalationSession` and `EscalateServer::start_session(...)` in `shell-escalation` so callers can get the wrapper/socket env overlay and keep the escalation server alive without immediately running a one-shot command. - Documented that `EscalationSession::env()` and `ShellCommandExecutor::run(...)` exchange only that env overlay, which callers must merge into their own base shell environment. - Clarified the prepared-exec helper boundary in `core` by naming the new helper APIs around `ExecRequest`, while keeping the legacy `execute_env(...)` entrypoints as thin compatibility wrappers for existing callers that still use the older naming. - Added a small post-spawn hook on the prepared execution path so the parent copy of the inheritable escalation socket is closed immediately after both the existing one-shot shell-command spawn and the unified-exec spawn. - Made session teardown explicit with session-scoped cancellation: dropping an `EscalationSession` or canceling its parent request now stops intercept workers, and the server-spawned escalated child uses `kill_on_drop(true)` so teardown cannot orphan an already-approved child. 
- Added `UnifiedExecBackendConfig` plumbing through `ToolsConfig`, a `shell::zsh_fork_backend` facade, and an opaque unified-exec spawn-lifecycle hook so unified exec can prepare a wrapped `zsh -c/-lc` request without storing `EscalationSession` directly in generic process/runtime code. - Kept the existing `shell_command` zsh-fork behavior intact on top of the new bootstrap path. Tool selection is unchanged in this PR: when `shell_zsh_fork` is enabled, `ShellCommand` still wins over `exec_command`. ## Verification - `cargo test -p codex-shell-escalation` - includes coverage for `start_session_exposes_wrapper_env_overlay` - includes coverage for `exec_closes_parent_socket_after_shell_spawn` - includes coverage for `dropping_session_aborts_intercept_workers_and_kills_spawned_child` - `cargo test -p codex-core shell_zsh_fork_prefers_shell_command_over_unified_exec` - `cargo test -p codex-core --test all shell_zsh_fork_prompts_for_skill_script_execution` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13392). * #13432 * __->__ #13392	2026-03-05 08:55:12 +00:00
sayan-oai	03d55f0e6f	chore: add web_search_tool_type for image support (#13538 ) add `web_search_tool_type` on model_info that can be populated from backend. will be used to filter which models can use `web_search` with images and which can't. added a small unit test.	2026-03-05 07:02:27 +00:00
Ahmed Ibrahim	8f828f8a43	Reduce realtime audio submission log noise (#13539 ) - lower `submission_dispatch` span logging to debug for realtime audio submissions only - keep other submission spans at info and add a targeted test for the level selection --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-04 22:44:14 -08:00
aaronl-openai	ff0341dc94	[js_repl] Support local ESM file imports (#13437 ) ## Summary - add `js_repl` support for dynamic imports of relative and absolute local ESM `.js` / `.mjs` files - keep bare package imports on the native Node path and resolved from REPL-global search roots (`CODEX_JS_REPL_NODE_MODULE_DIRS`, then `cwd`), even when they originate from imported local files - restrict static imports inside imported local files to other local relative/absolute `.js` / `.mjs` files, and surface a clear error for unsupported top-level static imports in the REPL cell - run imported local files inside the REPL VM context so they can access `codex.tmpDir`, `codex.tool`, captured `console`, and Node-like `import.meta` helpers - reload local files between execs so later `await import("./file.js")` calls pick up edits and fixed failures, while preserving package/builtin caching and persistent top-level REPL bindings - make `import.meta.resolve()` self-consistent by allowing the returned `file://...` URLs to round-trip through `await import(...)` - update both public and injected `js_repl` docs to clarify the narrowed contract, including global bare-import resolution behavior for local absolute files ## Testing - `cargo test -p codex-core js_repl_` - built codex binary and verified behavior --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-04 22:40:31 -08:00
pash-openai	394e538640	[core] Enable fast mode by default (#13450 ) Co-authored-by: Codex <noreply@openai.com>	2026-03-04 20:06:35 -08:00
sayan-oai	d44398905b	feat: track plugins mcps/apps and add plugin info to user_instructions (#13433 ) ### first half of changes, followed by #13510 Track plugin capabilities as derived summaries on `PluginLoadOutcome` for enabled plugins with at least one skill/app/mcp. Also add `Plugins` section to `user_instructions` injected on session start. These introduce the plugins concept and list enabled plugins, but do NOT currently include paths to enabled plugins or details on what apps/mcps the plugins contain (current plan is to inject this on @-mention). that can be adjusted in a follow up and based on evals. ### tests Added/updated tests, confirmed locally that new `Plugins` section + currently enabled plugins show up in `user_instructions`.	2026-03-04 19:46:13 -08:00
Won Park	229e6d0347	image-gen-event/client_processing (#13512 ) enable client-side processing of image-generation capabilities (setting app-server)	2026-03-04 16:54:38 -08:00
Ahmed Ibrahim	7b088901c2	Log non-audio realtime events (#13516 ) Improve observability of realtime conversation event handling by logging non-audio events with payload details in the event loop, while skipping audio-out events to reduce noise.	2026-03-04 16:30:18 -08:00
xl-openai	1e877ccdd2	plugin: support local-based marketplace.json + install endpoint. (#13422 ) Support marketplace.json that points to a local file, with ``` "source": { "source": "local", "path": "./plugin-1" }, ``` Add a new plugin/install endpoint that adds the plugin to the cache folder and enables it in config.toml.	2026-03-04 19:08:18 -05:00
Ahmed Ibrahim	294079b0b1	Prefix handoff messages with role (#13505 ) Format handoff context by prefixing each message with its role (for example "user:" and "assistant:") before forwarding to the agent.	2026-03-04 15:37:31 -08:00
alexsong-oai	ce139bb1af	add metrics for external config import (#13501 )	2026-03-04 13:59:50 -08:00
jif-oai	2322e49549	feat: external artifacts builder (#13485 ) This PR reverts the built-in artifact render while a decision is being reached. No impact expected on any features	2026-03-04 20:22:34 +00:00
Owen Lin	27724f6ead	feat(core, tracing): add a span representing a turn (#13424 ) This is PR 3 of the app-server tracing rollout. PRs https://github.com/openai/codex/pull/13285 and https://github.com/openai/codex/pull/13368 gave us inbound request spans in app-server and propagated trace context through Submission. This change finishes the next piece in core: when a request actually starts a turn, we now create a core-owned long-lived span that stays open for the real lifetime of the turn. What changed: - `Session::spawn_task` can now optionally create a long-lived turn span and run the spawned task inside it - `turn/start` uses that path, so normal turn execution stays under a single core-owned span after the async handoff - `review/start` uses the same pattern - added a unit test that verifies the spawned turn task inherits the submission dispatch trace ancestry Why The app-server request span is intentionally short-lived. Once work crosses into core, we still want one span that covers the actual execution window until completion or interruption. This keeps that ownership where it belongs: in the layer that owns the runtime lifecycle.	2026-03-04 11:09:17 -08:00
Alex Daley	8a59386273	add new scopes to login (#12383 ) Validated login + refresh flows. Removing scopes from the refresh request until we have upgrade flow in place. Confirmed that tokens refresh with existing scopes.	2026-03-04 16:41:54 +00:00
jif-oai	f72ab43fd1	feat: memories in workspace write (#13467 )	2026-03-04 13:00:26 +00:00
jif-oai	e07eaff0d3	feat: add metric for per-turn tool count and add tmp_mem flag (#13456 )	2026-03-04 11:25:58 +00:00
jif-oai	bda3c49dc4	feat: disable request input on sub agent (#13460 ) https://github.com/openai/codex/issues/13289	2026-03-04 11:25:49 +00:00
jif-oai	49634b7f9c	add metric for per-turn token usage (#13454 )	2026-03-04 10:17:25 +00:00
jif-oai	a4ad101125	feat: ordinal nick name (#13412 )	2026-03-04 09:41:29 +00:00
jif-oai	932ff28183	feat: better multi-agent prompt (#13404 )	2026-03-04 09:41:20 +00:00
Won Park	fa2306b303	image-gen-core (#13290 ) Core tool-calling for image-gen, handles requesting and receiving logic for images using response API	2026-03-03 23:11:28 -08:00
Val Kharitonov	4f6c4bb143	support 'flex' tier in app-server in addition to 'fast' (#13391 )	2026-03-03 22:46:05 -08:00
Michael Bolin	7134220f3c	core: box wrapper futures to reduce stack pressure (#13429 ) Follow-up to [#13388](https://github.com/openai/codex/pull/13388). This uses the same general fix pattern as [#12421](https://github.com/openai/codex/pull/12421), but in the `codex-core` compact/resume/fork path. ## Why `compact_resume_after_second_compaction_preserves_history` started overflowing the stack on Windows CI after `#13388`. The important part is that this was not a compaction-recursion bug. The test exercises a path with several thin `async fn` wrappers around much larger thread-spawn, resume, and fork futures. When one `async fn` awaits another inline, the outer future stores the callee future as part of its own state machine. In a long wrapper chain, that means a caller can accidentally inline a lot more state than the source code suggests. That is exactly what was happening here: - `ThreadManager` convenience methods such as `start_thread`, `resume_thread_from_rollout`, and `fork_thread` were inlining the larger spawn/resume futures beneath them. - `core_test_support::test_codex` added another wrapper layer on top of those same paths. - `compact_resume_fork` adds a few more helpers, and this particular test drives the resume/fork path multiple times. On Windows, that was enough to push both the libtest thread and Tokio worker threads over the edge. The previous 8 MiB test-thread workaround proved the failure was stack-related, but it did not address the underlying future size. ## How This Was Debugged The useful debugging pattern here was to turn the CI-only failure into a local low-stack repro. 1. First, remove the explicit large-stack harness so the test runs on the normal `#[tokio::test]` path. 2. Build the test binary normally. 3. Re-run the already-built `tests/all` binary directly with progressively smaller `RUST_MIN_STACK` values. 
Running the built binary directly matters: it keeps the reduced stack size focused on the test process instead of also applying it to `cargo` and `rustc`. That made it possible to answer two questions quickly: - Does the failure still reproduce without the workaround? Yes. - Does boxing the wrapper futures actually buy back stack headroom? Also yes. After this change, the built test binary passes with `RUST_MIN_STACK=917504` and still overflows at `786432`, which is enough evidence to justify removing the explicit 8 MiB override while keeping a deterministic low-stack repro for future debugging. If we hit a similar issue again, the first places to inspect are thin `async fn` wrappers that mostly forward into a much larger async implementation. ## `Box::pin()` Primer `async fn` compiles into a state machine. If a wrapper does this: ```rust async fn wrapper() { inner().await; } ``` then `wrapper()` stores the full `inner()` future inline as part of its own state. If the wrapper instead does this: ```rust async fn wrapper() { Box::pin(inner()).await; } ``` then the child future lives on the heap, and the outer future only stores a pinned pointer to it. That usually trades one allocation for a substantially smaller outer future, which is exactly the tradeoff we want when the problem is stack pressure rather than raw CPU time. Useful references: - [`Box::pin`](https://doc.rust-lang.org/std/boxed/struct.Box.html#method.pin) - [Async book: Pinning](https://rust-lang.github.io/async-book/04_pinning/01_chapter.html) ## What Changed - Boxed the wrapper futures in `core/src/thread_manager.rs` around `start_thread`, `resume_thread_from_rollout`, `fork_thread`, and the corresponding `ThreadManagerState` spawn helpers so callers no longer inline the full spawn/resume state machine through multiple layers. - Boxed the matching test-only wrapper futures in `core/tests/common/test_codex.rs` and `core/tests/suite/compact_resume_fork.rs`, which sit directly on top of the same path. 
- Restored `compact_resume_after_second_compaction_preserves_history` in `core/tests/suite/compact_resume_fork.rs` to a normal `#[tokio::test]` and removed the explicit `TEST_STACK_SIZE_BYTES` thread/runtime sizing. - Simplified a tiny helper in `compact_resume_fork` by making `fetch_conversation_path()` synchronous, which removes one more unnecessary future layer from the test path. ## Verification - `cargo test -p codex-core --test all suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history -- --exact --nocapture` - `cargo test -p codex-core --test all suite::compact_resume_fork -- --nocapture` - Re-ran the built `codex-core` `tests/all` binary directly with reduced stack sizes: - `RUST_MIN_STACK=917504` passes - `RUST_MIN_STACK=786432` still overflows - `cargo test -p codex-core` - Still fails locally in unrelated existing integration areas that expect the `codex` / `test_stdio_server` binaries or hit the existing `search_tool` wiremock mismatches.	2026-03-04 05:44:52 +00:00
Celia Chen	d622bff384	chore: Nest skill and protocol network permissions under `network.enabled` (#13427 ) ## Summary Changes the permission profile shape from a bare network boolean to a nested object. Before: ```yaml permissions: network: true ``` After: ```yaml permissions: network: enabled: true ``` This also updates the shared Rust and app-server protocol types so `PermissionProfile.network` is no longer `Option<bool>`, but `Option<NetworkPermissions>` with `enabled: Option<bool>`. ## What Changed - Updated `PermissionProfile` in `codex-rs/protocol/src/models.rs`: - `pub network: Option<bool>` -> `pub network: Option<NetworkPermissions>` - Added `NetworkPermissions` with: - `pub enabled: Option<bool>` - Changed emptiness semantics so `network` is only considered empty when `enabled` is `None` - Updated skill metadata parsing to accept `permissions.network.enabled` - Updated core permission consumers to read `network.enabled.unwrap_or(false)` where a concrete boolean is needed - Updated app-server v2 protocol types and regenerated schema/TypeScript outputs - Updated docs to mention `additionalPermissions.network.enabled`	2026-03-03 20:57:29 -08:00
gabec-openai	2e154a35bc	Add role-specific subagent nickname overrides (#13218 ) ## Summary - add `nickname_candidates` to agent role config - use role-specific nickname pools for spawned and resumed subagents - validate and schema-generate the new config surface ## Testing - `just fmt` - `just write-config-schema` - `just fix -p codex-core` - `cargo test -p codex-core` - `cargo test` --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-04 04:43:52 +00:00
Michael Bolin	bfff0c729f	config: enforce enterprise feature requirements (#13388 ) ## Why Enterprises can already constrain approvals, sandboxing, and web search through `requirements.toml` and MDM, but feature flags were still only configurable as managed defaults. That meant an enterprise could suggest feature values, but it could not actually pin them. This change closes that gap and makes enterprise feature requirements behave like the other constrained settings. The effective feature set now stays consistent with enterprise requirements during config load, when config writes are validated, and when runtime code mutates feature flags later in the session. It also tightens the runtime API for managed features. `ManagedFeatures` now follows the same constraint-oriented shape as `Constrained<T>` instead of exposing panic-prone mutation helpers, and production code can no longer construct it through an unconstrained `From<Features>` path. The PR also hardens the `compact_resume_fork` integration coverage on Windows. After the feature-management changes, `compact_resume_after_second_compaction_preserves_history` was overflowing the libtest/Tokio thread stacks on Windows, so the test now uses an explicit larger-stack harness as a pragmatic mitigation. That may not be the ideal root-cause fix, and it merits a parallel investigation into whether part of the async future chain should be boxed to reduce stack pressure instead. ## What Changed Enterprises can now pin feature values in `requirements.toml` with the requirements-side `features` table: ```toml [features] personality = true unified_exec = false ``` Only canonical feature keys are allowed in the requirements `features` table; omitted keys remain unconstrained. 
- Added a requirements-side pinned feature map to `ConfigRequirementsToml`, threaded it through source-preserving requirements merge and normalization in `codex-config`, and made the TOML surface use `[features]` (while still accepting legacy `[feature_requirements]` for compatibility). - Exposed `featureRequirements` from `configRequirements/read`, regenerated the JSON/TypeScript schema artifacts, and updated the app-server README. - Wrapped the effective feature set in `ManagedFeatures`, backed by `ConstrainedWithSource<Features>`, and changed its API to mirror `Constrained<T>`: `can_set(...)`, `set(...) -> ConstraintResult<()>`, and result-returning `enable` / `disable` / `set_enabled` helpers. - Removed the legacy-usage and bulk-map passthroughs from `ManagedFeatures`; callers that need those behaviors now mutate a plain `Features` value and reapply it through `set(...)`, so the constrained wrapper remains the enforcement boundary. - Removed the production loophole for constructing unconstrained `ManagedFeatures`. Non-test code now creates it through the configured feature-loading path, and `impl From<Features> for ManagedFeatures` is restricted to `#[cfg(test)]`. - Rejected legacy feature aliases in enterprise feature requirements, and return a load error when a pinned combination cannot survive dependency normalization. - Validated config writes against enterprise feature requirements before persisting changes, including explicit conflicting writes and profile-specific feature states that normalize into invalid combinations. - Updated runtime and TUI feature-toggle paths to use the constrained setter API and to persist or apply the effective post-constraint value rather than the requested value. - Updated the `core_test_support` Bazel target to include the bundled core model-catalog fixtures in its runtime data, so helper code that resolves `core/models.json` through runfiles works in remote Bazel test environments. 
- Renamed the core config test coverage to emphasize that effective feature values are normalized at runtime, while conflicting persisted config writes are rejected. - Ran `compact_resume_after_second_compaction_preserves_history` inside an explicit 8 MiB test thread and Tokio runtime worker stack, following the existing larger-stack integration-test pattern, to keep the Windows `compact_resume_fork` test slice from aborting while a parallel investigation continues into whether some of the underlying async futures should be boxed. ## Verification - `cargo test -p codex-config` - `cargo test -p codex-core feature_requirements_ -- --nocapture` - `cargo test -p codex-core load_requirements_toml_produces_expected_constraints -- --nocapture` - `cargo test -p codex-core compact_resume_after_second_compaction_preserves_history -- --nocapture` - `cargo test -p codex-core compact_resume_fork -- --nocapture` - Re-ran the built `codex-core` `tests/all` binary with `RUST_MIN_STACK=262144` for `compact_resume_after_second_compaction_preserves_history` to confirm the explicit-stack harness fixes the deterministic low-stack repro. - `cargo test -p codex-core` - This still fails locally in unrelated integration areas that expect the `codex` / `test_stdio_server` binaries or hit existing `search_tool` wiremock mismatches. ## Docs `developers.openai.com/codex` should document the requirements-side `[features]` table for enterprise and MDM-managed configuration, including that it only accepts canonical feature keys and that conflicting config writes are rejected.	2026-03-04 04:40:22 +00:00
Celia Chen	e6773f856c	Feat: Preserve network access on read-only sandbox policies (#13409 ) ## Summary `PermissionProfile.network` could not be preserved when additional or compiled permissions resolved to `SandboxPolicy::ReadOnly`, because `ReadOnly` had no `network_access` field. This change makes read-only + network enabled representable directly and threads that through the protocol, app-server v2 mirror, and permission-merging logic. ## What changed - Added `network_access: bool` to `SandboxPolicy::ReadOnly` in the core protocol and app-server v2 protocol. - Kept backward compatibility by defaulting the new field to `false`, so legacy read-only payloads still deserialize unchanged. - Updated `has_full_network_access()` and sandbox summaries to respect read-only network access. - Preserved `PermissionProfile.network` when: - compiling skill permission profiles into sandbox policies - normalizing additional permissions - merging additional permissions into existing sandbox policies - Updated the approval overlay to show network in the rendered permission rule when requested. - Regenerated app-server schema fixtures for the new v2 wire shape.	2026-03-04 02:41:57 +00:00
Owen Lin	52521a5e40	feat(app-server): propagate app-server trace context into core (#13368 ) ### Summary Propagate trace context originating at app-server RPC method handlers -> codex core submission loop (so this includes spans such as `run_turn`!). This implements PR 2 of the app-server tracing rollout. This also removes the old lower-level env-based reparenting in core so explicit request/submission ancestry wins instead of being overridden by ambient `TRACEPARENT` state. ### What changed - Added `trace: Option<W3cTraceContext>` to `codex_protocol::Submission` - Taught `Codex::submit()` / `submit_with_id()` to automatically capture the current span context when constructing or forwarding a submission - Wrapped the core submission loop in a `submission_dispatch` span parented from `Submission.trace` - Warn on invalid submission trace carriers and ignore them cleanly - Removed the old env-based downstream reparenting path in core task execution - Stopped OTEL provider init from implicitly attaching env trace context process-wide - Updated mcp-server `Submission` call sites for the new field. Added focused unit tests for: - capturing trace context into `Submission` - preferring `Submission.trace` when building the core dispatch span ### Why PR 1 gave us consistent inbound request spans in app-server, but that only covered the transport boundary. For long-running work like turns and reviews, the important missing piece was preserving ancestry after the request handler returns and core continues work on a different async path. This change makes that handoff explicit and keeps the parentage rules simple: - app-server request span sets the current context - `Submission.trace` snapshots that context - core restores it once, at the submission boundary - deeper core spans inherit naturally That also lets us stop relying on env-based reparenting for this path, which was too ambient and could override explicit ancestry.	2026-03-04 01:03:45 +00:00
sayan-oai	082682a628	feat: load plugin apps (#13401 ) load plugin-apps from `.app.json`. make apps runtime-mentionable iff `codex_apps` MCP actually exposes tools for that `connector_id`. if the app isn't available, it's filtered out of runtime connector set, so no tools are added and no app-mentions resolve. right now we don't have a clean cli-side error for an app not being installed. can look at this after. ### Tests Added tests, tested locally that using a plugin that bundles an app picks up the app.	2026-03-03 16:29:15 -08:00
Curtis 'Fjord' Hawthorne	c4cb594e73	Make js_repl image output controllable (#13331 ) ## Summary Instead of always adding inner function call outputs to the model context, let js code decide which ones to return. - Stop auto-hoisting nested tool outputs from `codex.tool(...)` into the outer `js_repl` function output. - Keep `codex.tool(...)` return values unchanged as structured JS objects. - Add `codex.emitImage(...)` as the explicit path for attaching an image to the outer `js_repl` function output. - Support emitting from a direct image URL, a single `input_image` item, an explicit `{ bytes, mimeType }` object, or a raw tool response object containing exactly one image. - Preserve existing `view_image` original-resolution behavior when JS emits the raw `view_image` tool result. - Suppress the special `ViewImageToolCall` event for `js_repl`-sourced `view_image` calls so nested inspection stays side-effect free until JS explicitly emits. - Update the `js_repl` docs and generated project instructions with both recommended patterns: - `await codex.emitImage(codex.tool("view_image", { path }))` - `await codex.emitImage({ bytes: await page.screenshot({ type: "jpeg", quality: 85 }), mimeType: "image/jpeg" })` #### [git stack](https://github.com/magus/git-stack-cli) - ✅ `1` https://github.com/openai/codex/pull/13050 - 👉 `2` https://github.com/openai/codex/pull/13331 - ⏳ `3` https://github.com/openai/codex/pull/13049	2026-03-03 16:25:59 -08:00
alexsong-oai	1afbbc11c3	Ensure the env values of imported `shell_environment_policy.set` are strings (#13402 )	2026-03-03 16:12:23 -08:00
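
The `Box::pin()` primer in commit 7134220f3c (#13429) can be checked without an async runtime by comparing future sizes directly. The following is a minimal sketch, not code from the repository: `inner`, `inline_wrapper`, `boxed_wrapper`, and `future_size` are hypothetical names, and the 4 KiB buffer is an assumed stand-in for the much larger spawn/resume state described in that commit. Exact sizes depend on the compiler, so only the relative comparison is asserted.

```rust
use std::future::Future;
use std::mem::size_of_val;

// Hypothetical stand-in for a large async implementation. The buffer is used
// after the await point, so it has to be stored in the future's state machine.
async fn inner() -> u8 {
    let buf = [1u8; 4096];
    std::future::ready(()).await;
    buf[0]
}

// Thin wrapper that awaits inline: its future embeds the whole `inner()` future.
async fn inline_wrapper() -> u8 {
    inner().await
}

// Thin wrapper that boxes the child: its future holds only a pinned heap pointer.
async fn boxed_wrapper() -> u8 {
    Box::pin(inner()).await
}

// Size in bytes of a future value. Futures are lazy, so nothing is polled here.
fn future_size<F: Future>(fut: &F) -> usize {
    size_of_val(fut)
}

fn main() {
    let inline = future_size(&inline_wrapper());
    let boxed = future_size(&boxed_wrapper());
    println!("inline wrapper future: {inline} bytes");
    println!("boxed wrapper future:  {boxed} bytes");

    // The inline wrapper carries the 4 KiB buffer; the boxed one does not.
    assert!(inline >= 4096);
    assert!(boxed < inline);
}
```

Each additional inline wrapper layer keeps at least the size of the future beneath it, which is why a chain of thin `async fn` wrappers can hold far more state than the source suggests. Boxing at one layer trades a heap allocation for a small outer future.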

1824 Commits