Commit Graph

1824 Commits

Author SHA1 Message Date
sayan-oai
014a59fb0b check app auth in plugin/install (#13685)
#### What
on `plugin/install`, check if installed apps are already authed on
chatgpt, and return list of all apps that are not. clients can use this
list to trigger auth workflows as needed.

checks are best effort based on `codex_apps` loading, much like
`app/list`.

#### Tests
Added integration tests, tested locally.
2026-03-06 06:45:00 +00:00
Dylan Hurd
4c9b1c38f6 fix(tui) remove config check for trusted setting (#11874)
## Summary
Simplify the trusted directory flow. This logic was originally designed
several months ago, to determine if codex should start in read-only or
workspace-write mode. However, that's no longer the purpose of directory
trust - and therefore we should get rid of this logic.

## Testing
- [x] Unit tests pass
2026-03-05 22:29:34 -08:00
iceweasel-oai
14de492985 copy current exe to CODEX_HOME/.sandbox-bin for apply_patch (#13669)
We do this for codex-command-runner.exe as well for the same reason.
Windows sandbox users cannot execute binaries in the WindowsApp/
installed directory for the Codex App. This causes apply-patch to fail
because it tries to execute codex.exe as the sandbox user.
2026-03-05 22:15:10 -08:00
viyatb-oai
6a79ed5920 refactor: remove proxy admin endpoint (#13687)
## Summary
- delete the network proxy admin server and its runtime listener/task
plumbing
- remove the admin endpoint config, runtime, requirement, protocol,
schema, and debug-surface fields
- update proxy docs to reflect the remaining HTTP and SOCKS listeners
only
2026-03-05 22:03:16 -08:00
xl-openai
520ed724d2 support plugin/list. (#13540)
Introduce a plugin/list which reads from local marketplace.json.
Also update the signature for plugin/install.
2026-03-05 21:58:50 -05:00
Ahmed Ibrahim
629cb15bc6 Replay thread rollback from rollout history (#13615)
- Replay thread rollback from the persisted rollout history instead of
truncating in-memory state.\n- Add rollback coverage, including
rollback-behind-compaction snapshot coverage.
2026-03-05 16:40:09 -08:00
Ahmed Ibrahim
6cf0ed4e79 Refine realtime startup context formatting (#13560)
## Summary
- group recent work by git repo when available, otherwise by directory
- render recent work as bounded user asks with per-thread cwd context
- exclude hidden files and directories from workspace trees
2026-03-05 16:31:20 -08:00
Owen Lin
c3736cff0a feat(otel): safe tracing (#13626)
### Motivation
Today config.toml has three different OTEL knobs under `[otel]`:
- `exporter` controls where OTEL logs go
- `trace_exporter` controls where OTEL traces go
- `metrics_exporter` controls where metrics go

Those often (pretty much always?) serve different purposes.

For example, for OpenAI internal usage, the **log exporter** is already
being used for IT/security telemetry, and that use case is intentionally
content-rich: tool calls, arguments, outputs, MCP payloads, and in some
cases user content are all useful there. `log_user_prompt` is a good
example of that distinction. When it’s enabled, we include raw prompt
text in OTEL logs, which is acceptable for the security use case.

The **trace exporter** is a different story. The goal there is to give
OpenAI engineers visibility into latency and request behavior when they
run Codex locally, without sending sensitive prompt or tool data as
trace event data. In other words, traces should help answer “what was
slow?” or “where did time go?”, not “what did the user say?” or “what
did the tool return?”

The complication is that Rust’s `tracing` crate does not make a hard
distinction between “logs” and “trace events.” It gives us one
instrumentation API for logs and trace events (via `tracing::event!`),
and subscribers decide what gets treated as logs, trace events, or both.

Before this change, our OTEL trace layer was effectively attached to the
general tracing stream, which meant turning on `trace_exporter` could
pick up content-rich events that were originally written with logging
(and the `log_exporter`) in mind. That made it too easy for sensitive
data to end up in exported traces by accident.

### Concrete example
In `otel_manager.rs`, this `tracing::event!` call would be exported in
both logs AND traces (as a trace event).
```
    pub fn user_prompt(&self, items: &[UserInput]) {
        let prompt = items
            .iter()
            .flat_map(|item| match item {
                UserInput::Text { text, .. } => Some(text.as_str()),
                _ => None,
            })
            .collect::<String>();

        let prompt_to_log = if self.metadata.log_user_prompts {
            prompt.as_str()
        } else {
            "[REDACTED]"
        };

        tracing::event!(
            tracing::Level::INFO,
            event.name = "codex.user_prompt",
            event.timestamp = %timestamp(),
            // ...
            prompt = %prompt_to_log,
        );
    }
```

Instead of `tracing::event!`, we should now be using `log_event!` and
`trace_event!` instead to more clearly indicate which sink (logs vs.
traces) that event should be exported to.

### What changed
This PR makes the log and trace export distinct instead of treating them
as two sinks for the same data.

On the provider side, OTEL logs and traces now have separate
routing/filtering policy. The log exporter keeps receiving the existing
`codex_otel` events, while trace export is limited to spans and trace
events.

On the event side, `OtelManager` now emits two flavors of telemetry
where needed:
- a log-only event with the current rich payloads
- a tracing-safe event with summaries only

It also has a convenience `log_and_trace_event!` macro for emitting to
both logs and traces when it's safe to do so, as well as log- and
trace-specific fields.

That means prompts, tool args, tool output, account email, MCP metadata,
and similar content stay in the log lane, while traces get the pieces
that are actually useful for performance work: durations, counts, sizes,
status, token counts, tool origin, and normalized error classes.

This preserves current IT/security logging behavior while making it safe
to turn on trace export for employees.

### Full list of things removed from trace export
- raw user prompt text from `codex.user_prompt`
- raw tool arguments and output from `codex.tool_result`
- MCP server metadata from `codex.tool_result` (mcp_server,
mcp_server_origin)
- account identity fields like `user.email` and `user.account_id` from
trace-safe OTEL events
- `host.name` from trace resources
- generic `codex.tool_decision` events from traces
- generic `codex.sse_event` events from traces
- the full ToolCall debug payload from the `handle_tool_call` span

What traces now keep instead is mostly:
- spans
- trace-safe OTEL events
- counts, lengths, durations, status, token counts, and tool origin
summaries
2026-03-05 16:30:53 -08:00
Celia Chen
aaefee04cd core/protocol: add structured macOS additional permissions and merge them into sandbox execution (#13499)
## Summary
- Introduce strongly-typed macOS additional permissions across
protocol/core/app-server boundaries.
- Merge additional permissions into effective sandbox execution,
including macOS seatbelt profile extensions.
- Expand docs, schema/tool definitions, UI rendering, and tests for
`network`, `file_system`, and `macos` additional permissions.
2026-03-05 16:21:45 -08:00
sayan-oai
4e77ea0ec7 add @plugin mentions (#13510)
## Note-- added plugin mentions via @, but that conflicts with file
mentions

depends and builds upon #13433.

- introduces explicit `@plugin` mentions. this injects the plugin's mcp
servers, app names, and skill name format into turn context as a dev
message.
- we do not yet have UI for these mentions, so we currently parse raw
text (as opposed to skills and apps which have UI chips, autocomplete,
etc.) this depends on a `plugins/list` app-server endpoint we can feed
the UI with, which is upcoming
- also annotate mcp and app tool descriptions with the plugin(s) they
come from. this gives the model a first class way of understanding what
tools come from which plugins, which will help implicit invocation.

### Tests
Added and updated tests, unit and integration. Also confirmed locally a
raw `@plugin` injects the dev message, and the model knows about its
apps, mcps, and skills.
2026-03-06 00:03:39 +00:00
Curtis 'Fjord' Hawthorne
1ed542bf31 Clarify js_repl image emission and encoding guidance (#13639)
## Summary

This updates the `js_repl` prompt and docs to make the image guidance
less confusing.

## What changed

- Clarified that `codex.emitImage(...)` adds one image per call and can
be called multiple times to emit multiple images.
- Reworded the image-encoding guidance to be general `js_repl` advice
instead of `ImageDetailOriginal`-specific behavior.
- Updated the guidance to recommend JPEG at about quality 85 when lossy
compression is acceptable, and PNG when transparency or lossless detail
matters.
- Mirrored the same wording in the public `js_repl` docs.
2026-03-05 16:02:37 -08:00
viyatb-oai
9203f17b0e Improve macOS Seatbelt network and unix socket handling (#12702)
This improves macOS Seatbelt handling for sandboxed tool processes.

## Changes
- Allow dual-stack local binding in proxy-managed sessions, while still
keeping traffic limited to loopback and configured proxy endpoints.
- Replace the old generic unix-socket path rule with explicit AF_UNIX
permissions for socket creation, bind, and outbound connect.
- Keep explicitly approved wrapper sockets connect-only.

Local helper servers are less likely to fail when binding on macOS.
Tools using local unix-socket IPC should work more reliably under the
sandbox.
Full-network sessions, proxy fail-closed behavior, and proxy lifecycle
are unchanged.
2026-03-05 15:39:54 -08:00
Owen Lin
aa3fe8abf8 feat(core): persist trace_id for turns in RolloutItem::TurnContext (#13602)
This PR adds a durable trace linkage for each turn by storing the active
trace ID on the rollout TurnContext record stored in session rollout
files.

Before this change, we propagated trace context at runtime but didn’t
persist a stable per-turn trace key in rollout history. That made
after-the-fact debugging harder (for example, mapping a historical turn
to the corresponding trace in datadog). This sets us up for much easier
debugging in the future.

### What changed
- Added an optional `trace_id` to TurnContextItem (rollout schema).
- Added a small OTEL helper to read the current span trace ID.
- Captured `trace_id` when creating `TurnContext` and included it in
`to_turn_context_item()`.
- Updated tests and fixtures that construct TurnContextItem so
older/no-trace cases still work.

### Why this approach
TurnContext is already the canonical durable per-turn metadata in
rollout. This keeps ownership clean: trace linkage lives with other
persisted turn metadata.
2026-03-05 13:26:48 -08:00
Curtis 'Fjord' Hawthorne
cfbbbb1dda Harden js_repl emitImage to accept only data: URLs (#13507)
### Motivation

- Prevent untrusted js_repl code from supplying arbitrary external URLs
that the host would forward into model input and cause external fetches
/ data exfiltration. This change narrows the emitImage contract to safe,
self-contained data URLs.

### Description

- Kernel: added `normalizeEmitImageUrl` and enforce that string-valued
`codex.emitImage(...)` inputs and `input_image`/content-item paths only
accept non-empty `data:` URLs; byte-based paths still produce data URLs
as before (`kernel.js`).
- Host: added `validate_emitted_image_url` and check `EmitImage`
requests before creating `FunctionCallOutputContentItem::InputImage`,
returning an error to the kernel if the URL is not a `data:` URL
(`mod.rs`).
- Tests/docs: added a runtime test
`js_repl_emit_image_rejects_non_data_url` to assert rejection of
non-data URLs and updated user-facing docs/instruction text to state
`data URL` support instead of generic direct image URLs (`mod.rs`,
`docs/js_repl.md`, `project_doc.rs`).

### Testing

- Ran `just fmt` in `codex-rs`; it completed successfully.
- Added a runtime test (`cargo test -p codex-core
js_repl_emit_image_rejects_non_data_url`) but executing the test in this
environment failed due to a missing system dependency required by
`codex-linux-sandbox` (the vendored `bubblewrap` build requires
`libcap.pc` via `pkg-config`), so the test could not be run here.
- Attempted a focused `cargo test` invocation with and without default
features; both compile/test attempts were blocked by the same missing
system `libcap` dependency in this environment.

------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_69a7837bce98832d91db92d5f76d6cbe)
2026-03-05 12:12:32 -08:00
Celia Chen
a63624a61a feat: merge skill permission profiles into the turn sandbox for zsh-fork execs (#13496)
## Summary

This changes the Unix shell escalation path for skill-matched
executables to apply a skill's `PermissionProfile` as additive
permissions on top of the existing turn/request sandbox policy.

Previously, skill-matched executables compiled the skill permission
profile into a standalone sandbox policy and executed against that
replacement policy. Now they go through the same
`additional_permissions` merge path used elsewhere in shell sandbox
preparation.

## What Changed

- Changed `skill_escalation_execution()` to return
`EscalationPermissions::PermissionProfile(...)` for non-empty skill
permission profiles.
- Kept empty or missing skill permission profiles on the `TurnDefault`
path.
- Added tests covering the new additive skill-permission behavior.
- Added inline comments in `prepare_escalated_exec()` clarifying the
difference between additive permission merging and fully specified
replacement sandbox policies.
- Removed the now-unused skill permission compiler module after
switching this path away from standalone compiled skill sandbox
policies.

## Testing

- Ran `just fmt` in `codex-rs`
- Ran `cargo test -p codex-core`

`cargo test -p codex-core` still hits an unrelated existing failure:
`shell_snapshot::tests::snapshot_shell_does_not_inherit_stdin`

## Follow-up

This change intentionally does not merge skill-specific macOS seatbelt
profile extensions through the `additional_permissions` path yet.
Filesystem and network permissions now follow the additive merge path,
but seatbelt extension permissions still need separate handling in a
follow-up PR.
2026-03-05 20:05:35 +00:00
Curtis 'Fjord' Hawthorne
657841e7f5 Persist initialized js_repl bindings after failed cells (#13482)
## Summary

- Change `js_repl` failed-cell persistence so later cells keep prior
bindings plus only the current-cell bindings whose initialization
definitely completed before the throw.
- Preserve initialized lexical bindings across failed cells via
module-namespace readability, including top-level destructuring that
partially succeeds before a later throw.
- Preserve hoisted `var` and `function` bindings only when execution
clearly reached their declaration site, and preserve direct top-level
pre-declaration `var` writes and updates through explicit write-site
markers.
- Preserve top-level `for...in` / `for...of` `var` bindings when the
loop body executes at least once, using a first-iteration guard to avoid
per-iteration bookkeeping overhead.
- Keep prior module state intact across link-time failures and
evaluation failures before the prelude runs, while still allowing failed
cells that already recreated prior bindings to persist updates to those
existing bindings.
- Hide internal commit hooks from user `js_repl` code after the prelude
aliases them, so snippets cannot spoof committed bindings by calling the
raw `import.meta` hooks directly.
- Add focused regression coverage for the supported failed-cell
behaviors and the intentionally unsupported boundaries.
- Update `js_repl` docs and generated instructions to describe the new,
narrower failed-cell persistence model.

## Motivation

We saw `js_repl` drop bindings that had already been initialized
successfully when a later statement in the same cell threw, for example:

    const { context: liveContext, session } =
      await initializeGoogleSheetsLiveForTab(tab);
    // later statement throws

That was surprising in practice because successful earlier work
disappeared from the next cell.

This change makes failed-cell persistence more useful without trying to
model every possible partially executed JavaScript edge case. The
resulting behavior is narrower and easier to reason about:

- prior bindings are always preserved
- lexical bindings persist when their initialization completed before
the throw
- hoisted `var` / `function` bindings persist only when execution
clearly reached their declaration or a supported top-level `var` write
site
- failed cells that already recreated prior bindings can persist writes
to those existing bindings even if they introduce no new bindings

The detailed edge-case matrix stays in `docs/js_repl.md`. The
model-facing `project_doc` guidance is intentionally shorter and focused
on generation-relevant behavior.

## Supported Failed-Cell Behavior

- Prior bindings remain available after a failed cell.
- Initialized lexical bindings remain available after a failed cell.
- Top-level destructuring like `const { a, b } = ...` preserves names
whose initialization completed before a later throw.
- Hoisted `function` bindings persist when execution reached the
declaration statement before the throw.
- Direct top-level pre-declaration `var` writes and updates persist, for
example:
  - `x = 1`
  - `x += 1`
  - `x++`
- short-circuiting logical assignments only persist when the write
branch actually runs
- Non-empty top-level `for...in` / `for...of` `var` loops persist their
loop bindings.
- Failed cells can persist updates to existing carried bindings after
the prelude has run, even when the cell commits no new bindings.
- Link failures and eval failures before the prelude do not poison
`@prev`.

## Intentionally Unsupported Failed-Cell Cases

- Hoisted function reads before the declaration, such as `foo(); ...;
function foo() {}`
- Aliasing or inference-based recovery from reads before declaration
- Nested writes inside already-instrumented assignment RHS expressions
- Destructuring-assignment recovery for hoisted `var`
- Partial `var` destructuring recovery
- Pre-declaration `undefined` reads for hoisted `var`
- Empty top-level `for...in` / `for...of` loop vars
- Nested or scope-sensitive pre-declaration `var` writes outside direct
top-level expression statements
2026-03-05 11:01:46 -08:00
Owen Lin
926b2f19e8 feat(app-server): support mcp elicitations in v2 api (#13425)
This adds a first-class server request for MCP server elicitations:
`mcpServer/elicitation/request`.

Until now, MCP elicitation requests only showed up as a raw
`codex/event/elicitation_request` event from core. That made it hard for
v2 clients to handle elicitations using the same request/response flow
as other server-driven interactions (like shell and `apply_patch`
tools).

This also updates the underlying MCP elicitation request handling in
core to pass through the full MCP request (including URL and form data)
so we can expose it properly in app-server.

### Why not `item/mcpToolCall/elicitationRequest`?
This is because MCP elicitations are related to MCP servers first, and
only optionally to a specific MCP tool call.

In the MCP protocol, elicitation is a server-to-client capability: the
server sends `elicitation/create`, and the client replies with an
elicitation result. RMCP models it that way as well.

In practice an elicitation is often triggered by an MCP tool call, but
not always.

### What changed
- add `mcpServer/elicitation/request` to the v2 app-server API
- translate core `codex/event/elicitation_request` events into the new
v2 server request
- map client responses back into `Op::ResolveElicitation` so the MCP
server can continue
- update app-server docs and generated protocol schema
- add an end-to-end app-server test that covers the full round trip
through a real RMCP elicitation flow
- The new test exercises a realistic case where an MCP tool call
triggers an elicitation, the app-server emits
mcpServer/elicitation/request, the client accepts it, and the tool call
resumes and completes successfully.

### app-server API flow
- Client starts a thread with `thread/start`.
- Client starts a turn with `turn/start`.
- App-server sends `item/started` for the `mcpToolCall`.
- While that tool call is in progress, app-server sends
`mcpServer/elicitation/request`.
- Client responds to that request with `{ action: "accept" | "decline" |
"cancel" }`.
- App-server sends `serverRequest/resolved`.
- App-server sends `item/completed` for the mcpToolCall.
- App-server sends `turn/completed`.
- If the turn is interrupted while the elicitation is pending,
app-server still sends `serverRequest/resolved` before the turn
finishes.
2026-03-05 07:20:20 -08:00
jif-oai
0cc6835416 feat: ultra polish package manager (#13573)
See the readme
2026-03-05 13:02:30 +00:00
jif-oai
f304b2ef62 feat: bind package manager (#13571) 2026-03-05 11:57:13 +00:00
Michael Bolin
b4cb989563 refactor: prepare unified exec for zsh-fork backend (#13392)
## Why

`shell_zsh_fork` already provides stronger guarantees around which
executables receive elevated permissions. To reuse that machinery from
unified exec without pushing Unix-specific escalation details through
generic runtime code, the escalation bootstrap and session lifetime
handling need a cleaner boundary.

That boundary also needs to be safe for long-lived sessions: when an
intercepted shell session is closed or pruned, any in-flight approval
workers and any already-approved escalated child they spawned must be
torn down with the session, and the inherited escalation socket must not
leak into unrelated subprocesses.

## What Changed

- Extracted a reusable `EscalationSession` and
`EscalateServer::start_session(...)` in `shell-escalation` so callers
can get the wrapper/socket env overlay and keep the escalation server
alive without immediately running a one-shot command.
- Documented that `EscalationSession::env()` and
`ShellCommandExecutor::run(...)` exchange only that env overlay, which
callers must merge into their own base shell environment.
- Clarified the prepared-exec helper boundary in `core` by naming the
new helper APIs around `ExecRequest`, while keeping the legacy
`execute_env(...)` entrypoints as thin compatibility wrappers for
existing callers that still use the older naming.
- Added a small post-spawn hook on the prepared execution path so the
parent copy of the inheritable escalation socket is closed immediately
after both the existing one-shot shell-command spawn and the
unified-exec spawn.
- Made session teardown explicit with session-scoped cancellation:
dropping an `EscalationSession` or canceling its parent request now
stops intercept workers, and the server-spawned escalated child uses
`kill_on_drop(true)` so teardown cannot orphan an already-approved
child.
- Added `UnifiedExecBackendConfig` plumbing through `ToolsConfig`, a
`shell::zsh_fork_backend` facade, and an opaque unified-exec
spawn-lifecycle hook so unified exec can prepare a wrapped `zsh -c/-lc`
request without storing `EscalationSession` directly in generic
process/runtime code.
- Kept the existing `shell_command` zsh-fork behavior intact on top of
the new bootstrap path. Tool selection is unchanged in this PR: when
`shell_zsh_fork` is enabled, `ShellCommand` still wins over
`exec_command`.

## Verification

- `cargo test -p codex-shell-escalation`
  - includes coverage for `start_session_exposes_wrapper_env_overlay`
  - includes coverage for `exec_closes_parent_socket_after_shell_spawn`
- includes coverage for
`dropping_session_aborts_intercept_workers_and_kills_spawned_child`
- `cargo test -p codex-core
shell_zsh_fork_prefers_shell_command_over_unified_exec`
- `cargo test -p codex-core --test all
shell_zsh_fork_prompts_for_skill_script_execution`


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13392).
* #13432
* __->__ #13392
2026-03-05 08:55:12 +00:00
sayan-oai
03d55f0e6f chore: add web_search_tool_type for image support (#13538)
add `web_search_tool_type` on model_info that can be populated from
backend. will be used to filter which models can use `web_search` with
images and which cant.

added small unit test.
2026-03-05 07:02:27 +00:00
Ahmed Ibrahim
8f828f8a43 Reduce realtime audio submission log noise (#13539)
- lower `submission_dispatch` span logging to debug for realtime audio
submissions only
- keep other submission spans at info and add a targeted test for the
level selection

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-04 22:44:14 -08:00
aaronl-openai
ff0341dc94 [js_repl] Support local ESM file imports (#13437)
## Summary
- add `js_repl` support for dynamic imports of relative and absolute
local ESM `.js` / `.mjs` files
- keep bare package imports on the native Node path and resolved from
REPL-global search roots (`CODEX_JS_REPL_NODE_MODULE_DIRS`, then `cwd`),
even when they originate from imported local files
- restrict static imports inside imported local files to other local
relative/absolute `.js` / `.mjs` files, and surface a clear error for
unsupported top-level static imports in the REPL cell
- run imported local files inside the REPL VM context so they can access
`codex.tmpDir`, `codex.tool`, captured `console`, and Node-like
`import.meta` helpers
- reload local files between execs so later `await import("./file.js")`
calls pick up edits and fixed failures, while preserving package/builtin
caching and persistent top-level REPL bindings
- make `import.meta.resolve()` self-consistent by allowing the returned
`file://...` URLs to round-trip through `await import(...)`
- update both public and injected `js_repl` docs to clarify the narrowed
contract, including global bare-import resolution behavior for local
absolute files

## Testing
- `cargo test -p codex-core js_repl_`
- built codex binary and verified behavior

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-04 22:40:31 -08:00
pash-openai
394e538640 [core] Enable fast mode by default (#13450)
Co-authored-by: Codex <noreply@openai.com>
2026-03-04 20:06:35 -08:00
sayan-oai
d44398905b feat: track plugins mcps/apps and add plugin info to user_instructions (#13433)
### first half of changes, followed by #13510

Track plugin capabilities as derived summaries on `PluginLoadOutcome`
for enabled plugins with at least one skill/app/mcp.

Also add `Plugins` section to `user_instructions` injected on session
start. These introduce the plugins concept and list enabled plugins, but
do NOT currently include paths to enabled plugins or details on what
apps/mcps the plugins contain (current plan is to inject this on
@-mention). that can be adjusted in a follow up and based on evals.

### tests
Added/updated tests, confirmed locally that new `Plugins` section +
currently enabled plugins show up in `user_instructions`.
2026-03-04 19:46:13 -08:00
Won Park
229e6d0347 image-gen-event/client_processing (#13512)
enabling client-side to process with image-generation capabilities
(setting app-server)
2026-03-04 16:54:38 -08:00
Ahmed Ibrahim
7b088901c2 Log non-audio realtime events (#13516)
Improve observability of realtime conversation event handling by logging
non-audio events with payload details in the event loop, while skipping
audio-out events to reduce noise.
2026-03-04 16:30:18 -08:00
xl-openai
1e877ccdd2 plugin: support local-based marketplace.json + install endpoint. (#13422)
Support marketplace.json that points to a local file, with
```
    "source":
    {
        "source": "local",
        "path": "./plugin-1"
    },
 ```
 
 Add a new plugin/install endpoint which add the plugin to the cache folder and enable it in config.toml.
2026-03-04 19:08:18 -05:00
Ahmed Ibrahim
294079b0b1 Prefix handoff messages with role (#13505)
Format handoff context by prefixing each message with its role (for
example "user:" and "assistant:") before forwarding to the agent.
2026-03-04 15:37:31 -08:00
alexsong-oai
ce139bb1af add metrics for external config import (#13501) 2026-03-04 13:59:50 -08:00
jif-oai
2322e49549 feat: external artifacts builder (#13485)
This PR reverts the built-in artifact render while a decision is being
reached. No impact expected on any features
2026-03-04 20:22:34 +00:00
Owen Lin
27724f6ead feat(core, tracing): add a span representing a turn (#13424)
This is PR 3 of the app-server tracing rollout.

PRs https://github.com/openai/codex/pull/13285 and
https://github.com/openai/codex/pull/13368 gave us inbound request spans
in app-server and propagated trace context through Submission. This
change finishes the next piece in core: when a request actually starts a
turn, we now create a core-owned long-lived span that stays open for the
real lifetime of the turn.

What changed:
- `Session::spawn_task` can now optionally create a long-lived turn span
and run the spawned task inside it
- `turn/start` uses that path, so normal turn execution stays under a
single core-owned span after the async handoff
- `review/start` uses the same pattern
- added a unit test that verifies the spawned turn task inherits the
submission dispatch trace ancestry

**Why**
The app-server request span is intentionally short-lived. Once work
crosses into core, we still want one span that covers the actual
execution window until completion or interruption. This keeps that
ownership where it belongs: in the layer that owns the runtime
lifecycle.
2026-03-04 11:09:17 -08:00
Alex Daley
8a59386273 add new scopes to login (#12383)
Validated login + refresh flows. Removing scopes from the refresh
request until we have upgrade flow in place. Confirmed that tokens
refresh with existing scopes.
2026-03-04 16:41:54 +00:00
jif-oai
f72ab43fd1 feat: memories in workspace write (#13467) 2026-03-04 13:00:26 +00:00
jif-oai
e07eaff0d3 feat: add metric for per-turn tool count and add tmp_mem flag (#13456) 2026-03-04 11:25:58 +00:00
jif-oai
bda3c49dc4 feat: disable request input on sub agent (#13460)
https://github.com/openai/codex/issues/13289
2026-03-04 11:25:49 +00:00
jif-oai
49634b7f9c add metric for per-turn token usage (#13454) 2026-03-04 10:17:25 +00:00
jif-oai
a4ad101125 feat: ordinal nick name (#13412) 2026-03-04 09:41:29 +00:00
jif-oai
932ff28183 feat: better multi-agent prompt (#13404) 2026-03-04 09:41:20 +00:00
Won Park
fa2306b303 image-gen-core (#13290)
Core tool-calling for image-gen, handles requesting and receiving logic
for images using response API
2026-03-03 23:11:28 -08:00
Val Kharitonov
4f6c4bb143 support 'flex' tier in app-server in addition to 'fast' (#13391) 2026-03-03 22:46:05 -08:00
Michael Bolin
7134220f3c core: box wrapper futures to reduce stack pressure (#13429)
Follow-up to [#13388](https://github.com/openai/codex/pull/13388). This
uses the same general fix pattern as
[#12421](https://github.com/openai/codex/pull/12421), but in the
`codex-core` compact/resume/fork path.

## Why

`compact_resume_after_second_compaction_preserves_history` started
overflowing the stack on Windows CI after `#13388`.

The important part is that this was not a compaction-recursion bug. The
test exercises a path with several thin `async fn` wrappers around much
larger thread-spawn, resume, and fork futures. When one `async fn`
awaits another inline, the outer future stores the callee future as part
of its own state machine. In a long wrapper chain, that means a caller
can accidentally inline a lot more state than the source code suggests.

That is exactly what was happening here:

- `ThreadManager` convenience methods such as `start_thread`,
`resume_thread_from_rollout`, and `fork_thread` were inlining the larger
spawn/resume futures beneath them.
- `core_test_support::test_codex` added another wrapper layer on top of
those same paths.
- `compact_resume_fork` adds a few more helpers, and this particular
test drives the resume/fork path multiple times.

On Windows, that was enough to push both the libtest thread and Tokio
worker threads over the edge. The previous 8 MiB test-thread workaround
proved the failure was stack-related, but it did not address the
underlying future size.

## How This Was Debugged

The useful debugging pattern here was to turn the CI-only failure into a
local low-stack repro.

1. First, remove the explicit large-stack harness so the test runs on
the normal `#[tokio::test]` path.
2. Build the test binary normally.
3. Re-run the already-built `tests/all` binary directly with
progressively smaller `RUST_MIN_STACK` values.

Running the built binary directly matters: it keeps the reduced stack
size focused on the test process instead of also applying it to `cargo`
and `rustc`.

That made it possible to answer two questions quickly:

- Does the failure still reproduce without the workaround? Yes.
- Does boxing the wrapper futures actually buy back stack headroom? Also
yes.

After this change, the built test binary passes with
`RUST_MIN_STACK=917504` and still overflows at `786432`, which is enough
evidence to justify removing the explicit 8 MiB override while keeping a
deterministic low-stack repro for future debugging.

If we hit a similar issue again, the first places to inspect are thin
`async fn` wrappers that mostly forward into a much larger async
implementation.

## `Box::pin()` Primer

`async fn` compiles into a state machine. If a wrapper does this:

```rust
async fn wrapper() {
    inner().await;
}
```

then `wrapper()` stores the full `inner()` future inline as part of its
own state.

If the wrapper instead does this:

```rust
async fn wrapper() {
    Box::pin(inner()).await;
}
```

then the child future lives on the heap, and the outer future only
stores a pinned pointer to it. That usually trades one allocation for a
substantially smaller outer future, which is exactly the tradeoff we
want when the problem is stack pressure rather than raw CPU time.

Useful references:

-
[`Box::pin`](https://doc.rust-lang.org/std/boxed/struct.Box.html#method.pin)
- [Async book:
Pinning](https://rust-lang.github.io/async-book/04_pinning/01_chapter.html)

## What Changed

- Boxed the wrapper futures in `core/src/thread_manager.rs` around
`start_thread`, `resume_thread_from_rollout`, `fork_thread`, and the
corresponding `ThreadManagerState` spawn helpers so callers no longer
inline the full spawn/resume state machine through multiple layers.
- Boxed the matching test-only wrapper futures in
`core/tests/common/test_codex.rs` and
`core/tests/suite/compact_resume_fork.rs`, which sit directly on top of
the same path.
- Restored `compact_resume_after_second_compaction_preserves_history` in
`core/tests/suite/compact_resume_fork.rs` to a normal `#[tokio::test]`
and removed the explicit `TEST_STACK_SIZE_BYTES` thread/runtime sizing.
- Simplified a tiny helper in `compact_resume_fork` by making
`fetch_conversation_path()` synchronous, which removes one more
unnecessary future layer from the test path.

## Verification

- `cargo test -p codex-core --test all
suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
-- --exact --nocapture`
- `cargo test -p codex-core --test all suite::compact_resume_fork --
--nocapture`
- Re-ran the built `codex-core` `tests/all` binary directly with reduced
stack sizes:
  - `RUST_MIN_STACK=917504` passes
  - `RUST_MIN_STACK=786432` still overflows
- `cargo test -p codex-core`
- Still fails locally in unrelated existing integration areas that
expect the `codex` / `test_stdio_server` binaries or hit the existing
`search_tool` wiremock mismatches.
2026-03-04 05:44:52 +00:00
Celia Chen
d622bff384 chore: Nest skill and protocol network permissions under network.enabled (#13427)
## Summary

Changes the permission profile shape from a bare network boolean to a
nested object.

Before:

```yaml
permissions:
  network: true
```

After:

```yaml
permissions:
  network:
    enabled: true
```

This also updates the shared Rust and app-server protocol types so
`PermissionProfile.network` is no longer `Option<bool>`, but
`Option<NetworkPermissions>` with `enabled: Option<bool>`.

## What Changed

- Updated `PermissionProfile` in `codex-rs/protocol/src/models.rs`:
- `pub network: Option<bool>` -> `pub network:
Option<NetworkPermissions>`
- Added `NetworkPermissions` with:
  - `pub enabled: Option<bool>`
- Changed emptiness semantics so `network` is only considered empty when
`enabled` is `None`
- Updated skill metadata parsing to accept `permissions.network.enabled`
- Updated core permission consumers to read
`network.enabled.unwrap_or(false)` where a concrete boolean is needed
- Updated app-server v2 protocol types and regenerated schema/TypeScript
outputs
- Updated docs to mention `additionalPermissions.network.enabled`
2026-03-03 20:57:29 -08:00
gabec-openai
2e154a35bc Add role-specific subagent nickname overrides (#13218)
## Summary
- add `nickname_candidates` to agent role config
- use role-specific nickname pools for spawned and resumed subagents
- validate and schema-generate the new config surface

## Testing
- `just fmt`
- `just write-config-schema`
- `just fix -p codex-core`
- `cargo test -p codex-core`
- `cargo test`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-04 04:43:52 +00:00
Michael Bolin
bfff0c729f config: enforce enterprise feature requirements (#13388)
## Why

Enterprises can already constrain approvals, sandboxing, and web search
through `requirements.toml` and MDM, but feature flags were still only
configurable as managed defaults. That meant an enterprise could suggest
feature values, but it could not actually pin them.

This change closes that gap and makes enterprise feature requirements
behave like the other constrained settings. The effective feature set
now stays consistent with enterprise requirements during config load,
when config writes are validated, and when runtime code mutates feature
flags later in the session.

It also tightens the runtime API for managed features. `ManagedFeatures`
now follows the same constraint-oriented shape as `Constrained<T>`
instead of exposing panic-prone mutation helpers, and production code
can no longer construct it through an unconstrained `From<Features>`
path.

The PR also hardens the `compact_resume_fork` integration coverage on
Windows. After the feature-management changes,
`compact_resume_after_second_compaction_preserves_history` was
overflowing the libtest/Tokio thread stacks on Windows, so the test now
uses an explicit larger-stack harness as a pragmatic mitigation. That
may not be the ideal root-cause fix, and it merits a parallel
investigation into whether part of the async future chain should be
boxed to reduce stack pressure instead.

## What Changed

Enterprises can now pin feature values in `requirements.toml` with the
requirements-side `features` table:

```toml
[features]
personality = true
unified_exec = false
```

Only canonical feature keys are allowed in the requirements `features`
table; omitted keys remain unconstrained.

- Added a requirements-side pinned feature map to
`ConfigRequirementsToml`, threaded it through source-preserving
requirements merge and normalization in `codex-config`, and made the
TOML surface use `[features]` (while still accepting legacy
`[feature_requirements]` for compatibility).
- Exposed `featureRequirements` from `configRequirements/read`,
regenerated the JSON/TypeScript schema artifacts, and updated the
app-server README.
- Wrapped the effective feature set in `ManagedFeatures`, backed by
`ConstrainedWithSource<Features>`, and changed its API to mirror
`Constrained<T>`: `can_set(...)`, `set(...) -> ConstraintResult<()>`,
and result-returning `enable` / `disable` / `set_enabled` helpers.
- Removed the legacy-usage and bulk-map passthroughs from
`ManagedFeatures`; callers that need those behaviors now mutate a plain
`Features` value and reapply it through `set(...)`, so the constrained
wrapper remains the enforcement boundary.
- Removed the production loophole for constructing unconstrained
`ManagedFeatures`. Non-test code now creates it through the configured
feature-loading path, and `impl From<Features> for ManagedFeatures` is
restricted to `#[cfg(test)]`.
- Rejected legacy feature aliases in enterprise feature requirements,
and return a load error when a pinned combination cannot survive
dependency normalization.
- Validated config writes against enterprise feature requirements before
persisting changes, including explicit conflicting writes and
profile-specific feature states that normalize into invalid
combinations.
- Updated runtime and TUI feature-toggle paths to use the constrained
setter API and to persist or apply the effective post-constraint value
rather than the requested value.
- Updated the `core_test_support` Bazel target to include the bundled
core model-catalog fixtures in its runtime data, so helper code that
resolves `core/models.json` through runfiles works in remote Bazel test
environments.
- Renamed the core config test coverage to emphasize that effective
feature values are normalized at runtime, while conflicting persisted
config writes are rejected.
- Ran `compact_resume_after_second_compaction_preserves_history` inside
an explicit 8 MiB test thread and Tokio runtime worker stack, following
the existing larger-stack integration-test pattern, to keep the Windows
`compact_resume_fork` test slice from aborting while a parallel
investigation continues into whether some of the underlying async
futures should be boxed.

## Verification

- `cargo test -p codex-config`
- `cargo test -p codex-core feature_requirements_ -- --nocapture`
- `cargo test -p codex-core
load_requirements_toml_produces_expected_constraints -- --nocapture`
- `cargo test -p codex-core
compact_resume_after_second_compaction_preserves_history -- --nocapture`
- `cargo test -p codex-core compact_resume_fork -- --nocapture`
- Re-ran the built `codex-core` `tests/all` binary with
`RUST_MIN_STACK=262144` for
`compact_resume_after_second_compaction_preserves_history` to confirm
the explicit-stack harness fixes the deterministic low-stack repro.
- `cargo test -p codex-core`
- This still fails locally in unrelated integration areas that expect
the `codex` / `test_stdio_server` binaries or hit existing `search_tool`
wiremock mismatches.

## Docs

`developers.openai.com/codex` should document the requirements-side
`[features]` table for enterprise and MDM-managed configuration,
including that it only accepts canonical feature keys and that
conflicting config writes are rejected.
2026-03-04 04:40:22 +00:00
Celia Chen
e6773f856c Feat: Preserve network access on read-only sandbox policies (#13409)
## Summary

`PermissionProfile.network` could not be preserved when additional or
compiled permissions resolved to
`SandboxPolicy::ReadOnly`, because `ReadOnly` had no network_access
field. This change makes read-only + network
enabled representable directly and threads that through the protocol,
app-server v2 mirror, and permission-
  merging logic.

## What changed

- Added `network_access: bool` to `SandboxPolicy::ReadOnly` in the core
protocol and app-server v2 protocol.
- Kept backward compatibility by defaulting the new field to false, so
legacy read-only payloads still
    deserialize unchanged.
- Updated `has_full_network_access()` and sandbox summaries to respect
read-only network access.
  - Preserved PermissionProfile.network when:
      - compiling skill permission profiles into sandbox policies
      - normalizing additional permissions
      - merging additional permissions into existing sandbox policies
- Updated the approval overlay to show network in the rendered
permission rule when requested.
  - Regenerated app-server schema fixtures for the new v2 wire shape.
2026-03-04 02:41:57 +00:00
Owen Lin
52521a5e40 feat(app-server): propagate app-server trace context into core (#13368)
### Summary
Propagate trace context originating at app-server RPC method handlers ->
codex core submission loop (so this includes spans such as `run_turn`!).
This implements PR 2 of the app-server tracing rollout.

This also removes the old lower-level env-based reparenting in core so
explicit request/submission ancestry wins instead of being overridden by
ambient `TRACEPARENT` state.

### What changed
- Added `trace: Option<W3cTraceContext>` to codex_protocol::Submission
- Taught `Codex::submit()` / `submit_with_id()` to automatically capture
the current span context when constructing or forwarding a submission
- Wrapped the core submission loop in a submission_dispatch span
parented from Submission.trace
- Warn on invalid submission trace carriers and ignore them cleanly
- Removed the old env-based downstream reparenting path in core task
execution
- Stopped OTEL provider init from implicitly attaching env trace context
process-wide
- Updated mcp-server Submission call sites for the new field

Added focused unit tests for:
- capturing trace context into Submission
- preferring `Submission.trace` when building the core dispatch span

### Why
PR 1 gave us consistent inbound request spans in app-server, but that
only covered the transport boundary. For long-running work like turns
and reviews, the important missing piece was preserving ancestry after
the request handler returns and core continues work on a different async
path.

This change makes that handoff explicit and keeps the parentage rules
simple:
- app-server request span sets the current context
- `Submission.trace` snapshots that context
- core restores it once, at the submission boundary
- deeper core spans inherit naturally

That also lets us stop relying on env-based reparenting for this path,
which was too ambient and could override explicit ancestry.
2026-03-04 01:03:45 +00:00
sayan-oai
082682a628 feat: load plugin apps (#13401)
load plugin-apps from `.app.json`.

make apps runtime-mentionable iff `codex_apps` MCP actually exposes
tools for that `connector_id`.

if the app isn't available, it's filtered out of runtime connector set,
so no tools are added and no app-mentions resolve.

right now we don't have a clean cli-side error for an app not being
installed. can look at this after.

### Tests
Added tests, tested locally that using a plugin that bundles an app
picks up the app.
2026-03-03 16:29:15 -08:00
Curtis 'Fjord' Hawthorne
c4cb594e73 Make js_repl image output controllable (#13331)
## Summary

Instead of always adding inner function call outputs to the model
context, let js code decide which ones to return.

- Stop auto-hoisting nested tool outputs from `codex.tool(...)` into the
outer `js_repl` function output.
- Keep `codex.tool(...)` return values unchanged as structured JS
objects.
- Add `codex.emitImage(...)` as the explicit path for attaching an image
to the outer `js_repl` function output.
- Support emitting from a direct image URL, a single `input_image` item,
an explicit `{ bytes, mimeType }` object, or a raw tool response object
containing exactly one image.
- Preserve existing `view_image` original-resolution behavior when JS
emits the raw `view_image` tool result.
- Suppress the special `ViewImageToolCall` event for `js_repl`-sourced
`view_image` calls so nested inspection stays side-effect free until JS
explicitly emits.
- Update the `js_repl` docs and generated project instructions with both
recommended patterns:
  - `await codex.emitImage(codex.tool("view_image", { path }))`
- `await codex.emitImage({ bytes: await page.screenshot({ type: "jpeg",
quality: 85 }), mimeType: "image/jpeg" })`

#### [git stack](https://github.com/magus/git-stack-cli)
-  `1` https://github.com/openai/codex/pull/13050
- 👉 `2` https://github.com/openai/codex/pull/13331
-  `3` https://github.com/openai/codex/pull/13049
2026-03-03 16:25:59 -08:00
alexsong-oai
1afbbc11c3 Ensure the env values of imported shell_environment_policy.set is string (#13402) 2026-03-03 16:12:23 -08:00