Commit Graph

4233 Commits

Author SHA1 Message Date
Xin Lin
8ea5996f33 support plugin/list. 2026-03-04 23:56:02 -05:00
xl-openai
1e877ccdd2 plugin: support local-based marketplace.json + install endpoint. (#13422)
Support marketplace.json that points to a local file, with
```
    "source":
    {
        "source": "local",
        "path": "./plugin-1"
    },
 ```
 
 Add a new plugin/install endpoint which add the plugin to the cache folder and enable it in config.toml.
2026-03-04 19:08:18 -05:00
Ahmed Ibrahim
294079b0b1 Prefix handoff messages with role (#13505)
Format handoff context by prefixing each message with its role (for
example "user:" and "assistant:") before forwarding to the agent.
2026-03-04 15:37:31 -08:00
Michael Bolin
4907096d13 [release] temporarily use thin LTO for releases (#13506) 2026-03-04 14:10:54 -08:00
Eric Traut
f80e5d979d Notify TUI about plan mode prompts and user input requests (#13495)
Addresses #13478

Summary
- Add two new scopes for `tui.notifications` config: `plan-mode-prompt`
and `user-input-requested`.
- Add Plan Mode prompt and user-input-requested notifications to the TUI
so these events surface consistently outside of plan mode
- Add helpers and tests to ensure the new notification types publish the
right titles, summaries, and type tags for filtering
- Add prioritization mechanism to fix an existing bug where one
notification event could arbitrarily overwrite others

Testing
- Manually tested plan mode to ensure that notification appeared
2026-03-04 15:08:57 -07:00
alexsong-oai
ce139bb1af add metrics for external config import (#13501) 2026-03-04 13:59:50 -08:00
Owen Lin
8dfd654196 feat(app-server-test-client): OTEL setup for tracing (#13493)
### Overview
This PR:
- Updates `app-server-test-client` to load OTEL settings from
`$CODEX_HOME/config.toml` and initializes its own OTEL provider.
- Add real client root spans to app-server test client traces.

This updates `codex-app-server-test-client` so its Datadog traces
reflect the full client-driven flow instead of a set of server spans
stitched together under a synthetic parent.

Before this change, the test client generated a fake `traceparent` once
and reused it for every JSON-RPC request. That kept the requests in one
trace, but there was no real client span at the top, so Datadog ended up
showing the sequence in a slightly misleading way, where all RPCs were
anchored under `initialize`.

Now the test client:
- loads OTEL settings from the normal Codex config path, including
`$CODEX_HOME/config.toml` and existing --config overrides
- initializes tracing the same way other Codex binaries do when trace
export is enabled
- creates a real client root span for each scripted command
- creates per-request client spans for JSON-RPC methods like
`initialize`, `thread/start`, and `turn/start`
- injects W3C trace context from the current client span into
request.trace instead of reusing a fabricated carrier

This gives us a cleaner trace shape in Datadog:
- one trace URL for the whole scripted flow
- a visible client root span
- proper client/server parent-child relationships for each app-server
request
2026-03-04 13:30:09 -08:00
jif-oai
2322e49549 feat: external artifacts builder (#13485)
This PR reverts the built-in artifact render while a decision is being
reached. No impact expected on any features
2026-03-04 20:22:34 +00:00
Felipe Coury
98923e53cc fix(tui): decode ANSI alpha-channel encoding in syntax themes (#13382)
## Problem

The `ansi`, `base16`, and `base16-256` syntax themes are designed to
emit ANSI palette colors so that highlighted code respects the user's
terminal color scheme. Syntect encodes this intent in the alpha channel
of its `Color` struct — a convention shared with `bat` — but
`convert_style` was ignoring it entirely, treating every foreground
color as raw RGB. This caused ANSI-family themes to produce hard-coded
RGB values (e.g. `Rgb(0x02, 0, 0)` instead of `Green`), defeating their
purpose and rendering them as near-invisible dark colors on most
terminals.

Reported in #12890.

## Mental model

Syntect themes use a compact encoding in their `Color` struct:

| `alpha` | Meaning of `r` | Mapped to |
|---------|----------------|-----------|
| `0x00` | ANSI palette index (0–255) | `RtColor::Black`…`Gray` for 0–7,
`Indexed(n)` for 8–255 |
| `0x01` | Unused (sentinel) | `None` — inherit terminal default fg/bg |
| `0xFF` | True RGB red channel | `RtColor::Rgb(r, g, b)` |
| other | Unexpected | `RtColor::Rgb(r, g, b)` (silent fallback) |

This encoding is a bat convention that three bundled themes rely on. The
new `convert_syntect_color` function decodes it; `ansi_palette_color`
maps indices 0–7 to ratatui's named ANSI variants.

| macOS - Dark | macOS - Light | Windows - ansi | Windows - base16 |
|---|---|---|---|
| <img width="1064" height="1205" alt="macos-dark"
src="https://github.com/user-attachments/assets/f03d92fb-b44b-4939-b2b9-503fde133811"
/> | <img width="1073" height="1227" alt="macos-light"
src="https://github.com/user-attachments/assets/2ecb2089-73b5-4676-bed8-e4e6794250b4"
/> |
![windows-ansi](https://github.com/user-attachments/assets/d41029e6-ffd3-454e-ab72-6751607e5d5c)
|
![windows-base16](https://github.com/user-attachments/assets/b48aafcc-0196-4977-8ee1-8f8eaddd1698)
|

## Non-goals

- Background color decoding — we intentionally skip backgrounds to
preserve the terminal's own background. The decoder supports it, but
`convert_style` does not apply it.
- Italic/underline changes — those remain suppressed as before.
- Custom `.tmTheme` support for ANSI encoding — only the bundled themes
use this convention.

## Tradeoffs

- The alpha-channel encoding is an undocumented bat/syntect convention,
not a formal spec. We match bat's behavior exactly, trading formality
for ecosystem compatibility.
- Indices 0–7 are mapped to ratatui's named variants (`Black`, `Red`, …,
`Gray`) rather than `Indexed(0)`…`Indexed(7)`. This lets terminals apply
bold/bright semantics to named colors, which is the expected behavior
for ANSI themes, but means the two representations are not perfectly
round-trippable.

## Architecture

All changes are in `codex-rs/tui/src/render/highlight.rs`, within the
style-conversion layer between syntect and ratatui:

```
syntect::highlighting::Color
  └─ convert_syntect_color(color)  [NEW — alpha-dispatch]
       ├─ a=0x00 → ansi_palette_color()  [NEW — index→named/indexed]
       ├─ a=0x01 → None (terminal default)
       ├─ a=0xFF → Rgb(r,g,b) (standard opaque path)
       └─ other  → Rgb(r,g,b) (silent fallback)
```

`convert_style` delegates foreground mapping to `convert_syntect_color`
instead of inlining the `Rgb(r,g,b)` conversion. The core highlighter is
refactored into `highlight_to_line_spans_with_theme` (accepts an
explicit theme reference) so tests can highlight against specific themes
without mutating process-global state.

### ANSI-family theme contract

The ANSI-family themes (`ansi`, `base16`, `base16-256`) rely on upstream
alpha-channel encoding from two_face/syntect. We intentionally do
**not** validate this contract at runtime — if the upstream format
changes, the `ansi_themes_use_only_ansi_palette_colors` test catches it
at build time, long before it reaches users. A runtime warning would be
unactionable noise.

### Warning copy cleanup

User-facing warning messages were rewritten for clarity:
- Removed internal jargon ("alpha-encoded ANSI color markers", "RGB
fallback semantics", "persisted override config")
- Dropped "syntax" prefix from "syntax theme" — users just think "theme"
- Downgraded developer-only diagnostics (duplicate override, resolve
fallback) from `warn` to `debug`

## Observability

- The `ansi_themes_use_only_ansi_palette_colors` test enforces the
ANSI-family contract at build time.
- The snapshot test provides a regression tripwire for palette color
output.
- User-facing warnings are limited to actionable issues: unknown theme
names and invalid custom `.tmTheme` files.

## Tests

- **Unit tests for each alpha branch:** `alpha=0x00` with low index
(named color), `alpha=0x00` with high index (`Indexed`), `alpha=0x01`
(terminal default), unexpected alpha (falls back to RGB), ANSI white →
Gray mapping.
- **Integration test:**
`ansi_family_themes_use_terminal_palette_colors_not_rgb` — highlights a
Rust snippet with each ANSI-family theme and asserts zero `Rgb`
foreground colors appear.
- **Snapshot test:** `ansi_family_foreground_palette_snapshot` — records
the exact set of unique foreground colors each ANSI-family theme
produces, guarding against regressions.
- **Warning validation tests:** verify user-facing warnings for missing
custom themes, invalid `.tmTheme` files, and bundled theme resolution.

## Test plan

- [ ] `cargo test -p codex-tui` passes all new and existing tests
- [ ] Select `ansi`, `base16`, or `base16-256` theme and verify code
blocks render with terminal palette colors (not near-black RGB)
- [ ] Select a standard RGB theme (e.g. `dracula`) and verify no
regression in color output
2026-03-04 12:03:34 -08:00
pash-openai
b200a5f45b [tui] Update Fast slash command description (#13458)
## Summary
- update the /fast slash command description to mention fastest
inference
- mention the 3X plan usage tradeoff in the help copy

## Testing
- cargo test -p codex-tui slash_command (currently blocked by an
unrelated latest-main codex-tui compile error in chatwidget.rs:
refresh_queued_user_messages missing)

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-04 19:30:51 +00:00
Val Kharitonov
26f4b8e2f1 remove serviceTier from app-server examples (#13489)
Documentation-only
2026-03-04 19:12:40 +00:00
Owen Lin
27724f6ead feat(core, tracing): add a span representing a turn (#13424)
This is PR 3 of the app-server tracing rollout.

PRs https://github.com/openai/codex/pull/13285 and
https://github.com/openai/codex/pull/13368 gave us inbound request spans
in app-server and propagated trace context through Submission. This
change finishes the next piece in core: when a request actually starts a
turn, we now create a core-owned long-lived span that stays open for the
real lifetime of the turn.

What changed:
- `Session::spawn_task` can now optionally create a long-lived turn span
and run the spawned task inside it
- `turn/start` uses that path, so normal turn execution stays under a
single core-owned span after the async handoff
- `review/start` uses the same pattern
- added a unit test that verifies the spawned turn task inherits the
submission dispatch trace ancestry

**Why**
The app-server request span is intentionally short-lived. Once work
crosses into core, we still want one span that covers the actual
execution window until completion or interruption. This keeps that
ownership where it belongs: in the layer that owns the runtime
lifecycle.
2026-03-04 11:09:17 -08:00
iceweasel-oai
54a1c81d73 allow apps to specify cwd for sandbox setup. (#13484)
The electron app doesn't start up the app-server in a particular
workspace directory.
So sandbox setup happens in the app-installed directory instead of the
project workspace.

This allows the app do specify the workspace cwd so that the sandbox
setup actually sets up the ACLs instead of exiting fast and then having
the first shell command be slow.
2026-03-04 10:54:30 -08:00
Alex Daley
8a59386273 add new scopes to login (#12383)
Validated login + refresh flows. Removing scopes from the refresh
request until we have upgrade flow in place. Confirmed that tokens
refresh with existing scopes.
2026-03-04 16:41:54 +00:00
jif-oai
f72ab43fd1 feat: memories in workspace write (#13467) artifact-runtime-v2.3.4 2026-03-04 13:00:26 +00:00
jif-oai
df619474f5 nit: citation prompt (#13468) 2026-03-04 13:00:11 +00:00
jif-oai
e07eaff0d3 feat: add metric for per-turn tool count and add tmp_mem flag (#13456) 2026-03-04 11:25:58 +00:00
jif-oai
bda3c49dc4 feat: disable request input on sub agent (#13460)
https://github.com/openai/codex/issues/13289
2026-03-04 11:25:49 +00:00
jif-oai
e6b2e3a9f7 fix: bad merge (#13461) 2026-03-04 11:00:48 +00:00
jif-oai
e4a202ea52 fix: pending messages in /agent (#13240) 2026-03-04 10:17:29 +00:00
jif-oai
49634b7f9c add metric for per-turn token usage (#13454) 2026-03-04 10:17:25 +00:00
jif-oai
a4ad101125 feat: ordinal nick name (#13412) 2026-03-04 09:41:29 +00:00
jif-oai
932ff28183 feat: better multi-agent prompt (#13404) 2026-03-04 09:41:20 +00:00
Won Park
fa2306b303 image-gen-core (#13290)
Core tool-calling for image-gen, handles requesting and receiving logic
for images using response API
2026-03-03 23:11:28 -08:00
Val Kharitonov
4f6c4bb143 support 'flex' tier in app-server in addition to 'fast' (#13391) 2026-03-03 22:46:05 -08:00
Michael Bolin
7134220f3c core: box wrapper futures to reduce stack pressure (#13429)
Follow-up to [#13388](https://github.com/openai/codex/pull/13388). This
uses the same general fix pattern as
[#12421](https://github.com/openai/codex/pull/12421), but in the
`codex-core` compact/resume/fork path.

## Why

`compact_resume_after_second_compaction_preserves_history` started
overflowing the stack on Windows CI after `#13388`.

The important part is that this was not a compaction-recursion bug. The
test exercises a path with several thin `async fn` wrappers around much
larger thread-spawn, resume, and fork futures. When one `async fn`
awaits another inline, the outer future stores the callee future as part
of its own state machine. In a long wrapper chain, that means a caller
can accidentally inline a lot more state than the source code suggests.

That is exactly what was happening here:

- `ThreadManager` convenience methods such as `start_thread`,
`resume_thread_from_rollout`, and `fork_thread` were inlining the larger
spawn/resume futures beneath them.
- `core_test_support::test_codex` added another wrapper layer on top of
those same paths.
- `compact_resume_fork` adds a few more helpers, and this particular
test drives the resume/fork path multiple times.

On Windows, that was enough to push both the libtest thread and Tokio
worker threads over the edge. The previous 8 MiB test-thread workaround
proved the failure was stack-related, but it did not address the
underlying future size.

## How This Was Debugged

The useful debugging pattern here was to turn the CI-only failure into a
local low-stack repro.

1. First, remove the explicit large-stack harness so the test runs on
the normal `#[tokio::test]` path.
2. Build the test binary normally.
3. Re-run the already-built `tests/all` binary directly with
progressively smaller `RUST_MIN_STACK` values.

Running the built binary directly matters: it keeps the reduced stack
size focused on the test process instead of also applying it to `cargo`
and `rustc`.

That made it possible to answer two questions quickly:

- Does the failure still reproduce without the workaround? Yes.
- Does boxing the wrapper futures actually buy back stack headroom? Also
yes.

After this change, the built test binary passes with
`RUST_MIN_STACK=917504` and still overflows at `786432`, which is enough
evidence to justify removing the explicit 8 MiB override while keeping a
deterministic low-stack repro for future debugging.

If we hit a similar issue again, the first places to inspect are thin
`async fn` wrappers that mostly forward into a much larger async
implementation.

## `Box::pin()` Primer

`async fn` compiles into a state machine. If a wrapper does this:

```rust
async fn wrapper() {
    inner().await;
}
```

then `wrapper()` stores the full `inner()` future inline as part of its
own state.

If the wrapper instead does this:

```rust
async fn wrapper() {
    Box::pin(inner()).await;
}
```

then the child future lives on the heap, and the outer future only
stores a pinned pointer to it. That usually trades one allocation for a
substantially smaller outer future, which is exactly the tradeoff we
want when the problem is stack pressure rather than raw CPU time.

Useful references:

-
[`Box::pin`](https://doc.rust-lang.org/std/boxed/struct.Box.html#method.pin)
- [Async book:
Pinning](https://rust-lang.github.io/async-book/04_pinning/01_chapter.html)

## What Changed

- Boxed the wrapper futures in `core/src/thread_manager.rs` around
`start_thread`, `resume_thread_from_rollout`, `fork_thread`, and the
corresponding `ThreadManagerState` spawn helpers so callers no longer
inline the full spawn/resume state machine through multiple layers.
- Boxed the matching test-only wrapper futures in
`core/tests/common/test_codex.rs` and
`core/tests/suite/compact_resume_fork.rs`, which sit directly on top of
the same path.
- Restored `compact_resume_after_second_compaction_preserves_history` in
`core/tests/suite/compact_resume_fork.rs` to a normal `#[tokio::test]`
and removed the explicit `TEST_STACK_SIZE_BYTES` thread/runtime sizing.
- Simplified a tiny helper in `compact_resume_fork` by making
`fetch_conversation_path()` synchronous, which removes one more
unnecessary future layer from the test path.

## Verification

- `cargo test -p codex-core --test all
suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
-- --exact --nocapture`
- `cargo test -p codex-core --test all suite::compact_resume_fork --
--nocapture`
- Re-ran the built `codex-core` `tests/all` binary directly with reduced
stack sizes:
  - `RUST_MIN_STACK=917504` passes
  - `RUST_MIN_STACK=786432` still overflows
- `cargo test -p codex-core`
- Still fails locally in unrelated existing integration areas that
expect the `codex` / `test_stdio_server` binaries or hit the existing
`search_tool` wiremock mismatches.
2026-03-04 05:44:52 +00:00
Celia Chen
d622bff384 chore: Nest skill and protocol network permissions under network.enabled (#13427)
## Summary

Changes the permission profile shape from a bare network boolean to a
nested object.

Before:

```yaml
permissions:
  network: true
```

After:

```yaml
permissions:
  network:
    enabled: true
```

This also updates the shared Rust and app-server protocol types so
`PermissionProfile.network` is no longer `Option<bool>`, but
`Option<NetworkPermissions>` with `enabled: Option<bool>`.

## What Changed

- Updated `PermissionProfile` in `codex-rs/protocol/src/models.rs`:
- `pub network: Option<bool>` -> `pub network:
Option<NetworkPermissions>`
- Added `NetworkPermissions` with:
  - `pub enabled: Option<bool>`
- Changed emptiness semantics so `network` is only considered empty when
`enabled` is `None`
- Updated skill metadata parsing to accept `permissions.network.enabled`
- Updated core permission consumers to read
`network.enabled.unwrap_or(false)` where a concrete boolean is needed
- Updated app-server v2 protocol types and regenerated schema/TypeScript
outputs
- Updated docs to mention `additionalPermissions.network.enabled`
2026-03-03 20:57:29 -08:00
gabec-openai
2e154a35bc Add role-specific subagent nickname overrides (#13218)
## Summary
- add `nickname_candidates` to agent role config
- use role-specific nickname pools for spawned and resumed subagents
- validate and schema-generate the new config surface

## Testing
- `just fmt`
- `just write-config-schema`
- `just fix -p codex-core`
- `cargo test -p codex-core`
- `cargo test`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-04 04:43:52 +00:00
Michael Bolin
bfff0c729f config: enforce enterprise feature requirements (#13388)
## Why

Enterprises can already constrain approvals, sandboxing, and web search
through `requirements.toml` and MDM, but feature flags were still only
configurable as managed defaults. That meant an enterprise could suggest
feature values, but it could not actually pin them.

This change closes that gap and makes enterprise feature requirements
behave like the other constrained settings. The effective feature set
now stays consistent with enterprise requirements during config load,
when config writes are validated, and when runtime code mutates feature
flags later in the session.

It also tightens the runtime API for managed features. `ManagedFeatures`
now follows the same constraint-oriented shape as `Constrained<T>`
instead of exposing panic-prone mutation helpers, and production code
can no longer construct it through an unconstrained `From<Features>`
path.

The PR also hardens the `compact_resume_fork` integration coverage on
Windows. After the feature-management changes,
`compact_resume_after_second_compaction_preserves_history` was
overflowing the libtest/Tokio thread stacks on Windows, so the test now
uses an explicit larger-stack harness as a pragmatic mitigation. That
may not be the ideal root-cause fix, and it merits a parallel
investigation into whether part of the async future chain should be
boxed to reduce stack pressure instead.

## What Changed

Enterprises can now pin feature values in `requirements.toml` with the
requirements-side `features` table:

```toml
[features]
personality = true
unified_exec = false
```

Only canonical feature keys are allowed in the requirements `features`
table; omitted keys remain unconstrained.

- Added a requirements-side pinned feature map to
`ConfigRequirementsToml`, threaded it through source-preserving
requirements merge and normalization in `codex-config`, and made the
TOML surface use `[features]` (while still accepting legacy
`[feature_requirements]` for compatibility).
- Exposed `featureRequirements` from `configRequirements/read`,
regenerated the JSON/TypeScript schema artifacts, and updated the
app-server README.
- Wrapped the effective feature set in `ManagedFeatures`, backed by
`ConstrainedWithSource<Features>`, and changed its API to mirror
`Constrained<T>`: `can_set(...)`, `set(...) -> ConstraintResult<()>`,
and result-returning `enable` / `disable` / `set_enabled` helpers.
- Removed the legacy-usage and bulk-map passthroughs from
`ManagedFeatures`; callers that need those behaviors now mutate a plain
`Features` value and reapply it through `set(...)`, so the constrained
wrapper remains the enforcement boundary.
- Removed the production loophole for constructing unconstrained
`ManagedFeatures`. Non-test code now creates it through the configured
feature-loading path, and `impl From<Features> for ManagedFeatures` is
restricted to `#[cfg(test)]`.
- Rejected legacy feature aliases in enterprise feature requirements,
and return a load error when a pinned combination cannot survive
dependency normalization.
- Validated config writes against enterprise feature requirements before
persisting changes, including explicit conflicting writes and
profile-specific feature states that normalize into invalid
combinations.
- Updated runtime and TUI feature-toggle paths to use the constrained
setter API and to persist or apply the effective post-constraint value
rather than the requested value.
- Updated the `core_test_support` Bazel target to include the bundled
core model-catalog fixtures in its runtime data, so helper code that
resolves `core/models.json` through runfiles works in remote Bazel test
environments.
- Renamed the core config test coverage to emphasize that effective
feature values are normalized at runtime, while conflicting persisted
config writes are rejected.
- Ran `compact_resume_after_second_compaction_preserves_history` inside
an explicit 8 MiB test thread and Tokio runtime worker stack, following
the existing larger-stack integration-test pattern, to keep the Windows
`compact_resume_fork` test slice from aborting while a parallel
investigation continues into whether some of the underlying async
futures should be boxed.

## Verification

- `cargo test -p codex-config`
- `cargo test -p codex-core feature_requirements_ -- --nocapture`
- `cargo test -p codex-core
load_requirements_toml_produces_expected_constraints -- --nocapture`
- `cargo test -p codex-core
compact_resume_after_second_compaction_preserves_history -- --nocapture`
- `cargo test -p codex-core compact_resume_fork -- --nocapture`
- Re-ran the built `codex-core` `tests/all` binary with
`RUST_MIN_STACK=262144` for
`compact_resume_after_second_compaction_preserves_history` to confirm
the explicit-stack harness fixes the deterministic low-stack repro.
- `cargo test -p codex-core`
- This still fails locally in unrelated integration areas that expect
the `codex` / `test_stdio_server` binaries or hit existing `search_tool`
wiremock mismatches.

## Docs

`developers.openai.com/codex` should document the requirements-side
`[features]` table for enterprise and MDM-managed configuration,
including that it only accepts canonical feature keys and that
conflicting config writes are rejected.
2026-03-04 04:40:22 +00:00
Celia Chen
e6773f856c Feat: Preserve network access on read-only sandbox policies (#13409)
## Summary

`PermissionProfile.network` could not be preserved when additional or
compiled permissions resolved to
`SandboxPolicy::ReadOnly`, because `ReadOnly` had no network_access
field. This change makes read-only + network
enabled representable directly and threads that through the protocol,
app-server v2 mirror, and permission-
  merging logic.

## What changed

- Added `network_access: bool` to `SandboxPolicy::ReadOnly` in the core
protocol and app-server v2 protocol.
- Kept backward compatibility by defaulting the new field to false, so
legacy read-only payloads still
    deserialize unchanged.
- Updated `has_full_network_access()` and sandbox summaries to respect
read-only network access.
  - Preserved PermissionProfile.network when:
      - compiling skill permission profiles into sandbox policies
      - normalizing additional permissions
      - merging additional permissions into existing sandbox policies
- Updated the approval overlay to show network in the rendered
permission rule when requested.
  - Regenerated app-server schema fixtures for the new v2 wire shape.
2026-03-04 02:41:57 +00:00
zbarsky-openai
2d8c1575b8 [bazel] Bump rules_rs and llvm (#13366)
# External (non-OpenAI) Pull Request Requirements

Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md

If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.

Include a link to a bug report or enhancement request.
2026-03-04 01:59:32 +00:00
iceweasel-oai
639a5f6c48 copy command-runner to CODEX_HOME so sandbox users can always execute it (#13413)
• Keep Windows sandbox runner launches working from packaged installs by
running the helper from a user-owned runtime location.

On some Windows installs, the packaged helper location is difficult to
use reliably for sandboxed runner launches even though the binaries are
present. This change works around that by copying codex-
command-runner.exe into CODEX_HOME/.sandbox-bin/, reusing that copy
across launches, and falling back to the existing packaged-path lookup
if anything goes wrong.

The runtime copy lives in a dedicated directory with tighter ACLs than
.sandbox: sandbox users can read and execute the runner there, but they
cannot modify it. This keeps the workaround focused on the
command runner, leaves the setup helper on its trusted packaged path,
and adds logging so it is clear which runner path was selected at
launch.
2026-03-04 01:31:37 +00:00
Owen Lin
52521a5e40 feat(app-server): propagate app-server trace context into core (#13368)
### Summary
Propagate trace context originating at app-server RPC method handlers ->
codex core submission loop (so this includes spans such as `run_turn`!).
This implements PR 2 of the app-server tracing rollout.

This also removes the old lower-level env-based reparenting in core so
explicit request/submission ancestry wins instead of being overridden by
ambient `TRACEPARENT` state.

### What changed
- Added `trace: Option<W3cTraceContext>` to codex_protocol::Submission
- Taught `Codex::submit()` / `submit_with_id()` to automatically capture
the current span context when constructing or forwarding a submission
- Wrapped the core submission loop in a submission_dispatch span
parented from Submission.trace
- Warn on invalid submission trace carriers and ignore them cleanly
- Removed the old env-based downstream reparenting path in core task
execution
- Stopped OTEL provider init from implicitly attaching env trace context
process-wide
- Updated mcp-server Submission call sites for the new field

Added focused unit tests for:
- capturing trace context into Submission
- preferring `Submission.trace` when building the core dispatch span

### Why
PR 1 gave us consistent inbound request spans in app-server, but that
only covered the transport boundary. For long-running work like turns
and reviews, the important missing piece was preserving ancestry after
the request handler returns and core continues work on a different async
path.

This change makes that handoff explicit and keeps the parentage rules
simple:
- app-server request span sets the current context
- `Submission.trace` snapshots that context
- core restores it once, at the submission boundary
- deeper core spans inherit naturally

That also lets us stop relying on env-based reparenting for this path,
which was too ambient and could override explicit ancestry.
2026-03-04 01:03:45 +00:00
Owen Lin
0fbd84081b feat(app-server): add a skills/changed v2 notification (#13414)
This adds a first-class app-server v2 `skills/changed` notification for
the existing skills live-reload signal.

Before this change, clients only had the legacy raw
`codex/event/skills_update_available` event. With this PR, v2 clients
can listen for a typed JSON-RPC notification instead of depending on the
legacy `codex/event/*` stream, which we want to remove soon.
2026-03-03 17:01:00 -08:00
rhan-oai
e951ef4374 [feedback] diagnostics (#13292)
- added header logic to display diagnostics on cli
- added logic for collecting env vars

<img width="606" height="327" alt="Screenshot 2026-03-03 at 3 49 31 PM"
src="https://github.com/user-attachments/assets/05e78c56-8cb3-47fa-abaf-3e57f1fdd8e2"
/>

<img width="690" height="353" alt="Screenshot 2026-03-02 at 6 47 54 PM"
src="https://github.com/user-attachments/assets/e470b559-13f4-44d9-897f-bc398943c6d1"
/>
2026-03-03 16:34:11 -08:00
sayan-oai
082682a628 feat: load plugin apps (#13401)
load plugin-apps from `.app.json`.

make apps runtime-mentionable iff `codex_apps` MCP actually exposes
tools for that `connector_id`.

if the app isn't available, it's filtered out of runtime connector set,
so no tools are added and no app-mentions resolve.

right now we don't have a clean cli-side error for an app not being
installed. can look at this after.

### Tests
Added tests, tested locally that using a plugin that bundles an app
picks up the app.
2026-03-03 16:29:15 -08:00
Curtis 'Fjord' Hawthorne
c4cb594e73 Make js_repl image output controllable (#13331)
## Summary

Instead of always adding inner function call outputs to the model
context, let js code decide which ones to return.

- Stop auto-hoisting nested tool outputs from `codex.tool(...)` into the
outer `js_repl` function output.
- Keep `codex.tool(...)` return values unchanged as structured JS
objects.
- Add `codex.emitImage(...)` as the explicit path for attaching an image
to the outer `js_repl` function output.
- Support emitting from a direct image URL, a single `input_image` item,
an explicit `{ bytes, mimeType }` object, or a raw tool response object
containing exactly one image.
- Preserve existing `view_image` original-resolution behavior when JS
emits the raw `view_image` tool result.
- Suppress the special `ViewImageToolCall` event for `js_repl`-sourced
`view_image` calls so nested inspection stays side-effect free until JS
explicitly emits.
- Update the `js_repl` docs and generated project instructions with both
recommended patterns:
  - `await codex.emitImage(codex.tool("view_image", { path }))`
- `await codex.emitImage({ bytes: await page.screenshot({ type: "jpeg",
quality: 85 }), mimeType: "image/jpeg" })`

#### [git stack](https://github.com/magus/git-stack-cli)
-  `1` https://github.com/openai/codex/pull/13050
- 👉 `2` https://github.com/openai/codex/pull/13331
-  `3` https://github.com/openai/codex/pull/13049
2026-03-03 16:25:59 -08:00
alexsong-oai
1afbbc11c3 Ensure the env values of imported shell_environment_policy.set is string (#13402) 2026-03-03 16:12:23 -08:00
Curtis 'Fjord' Hawthorne
b92146d48b Add under-development original-resolution view_image support (#13050)
## Summary

Add original-resolution support for `view_image` behind the
under-development `view_image_original_resolution` feature flag.

When the flag is enabled and the target model is `gpt-5.3-codex` or
newer, `view_image` now preserves original PNG/JPEG/WebP bytes and sends
`detail: "original"` to the Responses API instead of using the legacy
resize/compress path.

## What changed

- Added `view_image_original_resolution` as an under-development feature
flag.
- Added `ImageDetail` to the protocol models and support for serializing
`detail: "original"` on tool-returned images.
- Added `PromptImageMode::Original` to `codex-utils-image`.
  - Preserves original PNG/JPEG/WebP bytes.
  - Keeps legacy behavior for the resize path.
- Updated `view_image` to:
- use the shared `local_image_content_items_with_label_number(...)`
helper in both code paths
  - select original-resolution mode only when:
    - the feature flag is enabled, and
    - the model slug parses as `gpt-5.3-codex` or newer
- Kept local user image attachments on the existing resize path; this
change is specific to `view_image`.
- Updated history/image accounting so only `detail: "original"` images
use the docs-based GPT-5 image cost calculation; legacy images still use
the old fixed estimate.
- Added JS REPL guidance, gated on the same feature flag, to prefer JPEG
at 85% quality unless lossless is required, while still allowing other
formats when explicitly requested.
- Updated tests and helper code that construct
`FunctionCallOutputContentItem::InputImage` to carry the new `detail`
field.

## Behavior

### Feature off
- `view_image` keeps the existing resize/re-encode behavior.
- History estimation keeps the existing fixed-cost heuristic.

### Feature on + `gpt-5.3-codex+`
- `view_image` sends original-resolution images with `detail:
"original"`.
- PNG/JPEG/WebP source bytes are preserved when possible.
- History estimation uses the GPT-5 docs-based image-cost calculation
for those `detail: "original"` images.


#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/13050
-  `2` https://github.com/openai/codex/pull/13331
-  `3` https://github.com/openai/codex/pull/13049
2026-03-03 15:56:54 -08:00
joeytrasatti-openai
935754baa3 Add thread metadata update endpoint to app server (#13280)
## Summary
- add the v2 `thread/metadata/update` API, including
protocol/schema/TypeScript exports and app-server docs
- patch stored thread `gitInfo` in sqlite without resuming the thread,
with validation plus support for explicit `null` clears
- repair missing sqlite thread rows from rollout data before patching,
and make those repairs safe by inserting only when absent and updating
only git columns so newer metadata is not clobbered
- keep sqlite authoritative for mutable thread git metadata by
preserving existing sqlite git fields during reconcile/backfill and only
using rollout `SessionMeta` git fields to fill gaps
- add regression coverage for the endpoint, repair paths, concurrent
sqlite writes, clearing git fields, and rollout/backfill reconciliation
- fix the login server shutdown race so cancelling before the waiter
starts still terminates `block_until_done()` correctly

## Testing
- `cargo test -p codex-state
apply_rollout_items_preserves_existing_git_branch_and_fills_missing_git_fields`
- `cargo test -p codex-state
update_thread_git_info_preserves_newer_non_git_metadata`
- `cargo test -p codex-core
backfill_sessions_preserves_existing_git_branch_and_fills_missing_git_fields`
- `cargo test -p codex-app-server thread_metadata_update`
- `cargo test`
- currently fails in existing `codex-core` grep-files tests with
`unsupported call: grep_files`:
    - `suite::grep_files::grep_files_tool_collects_matches`
    - `suite::grep_files::grep_files_tool_reports_empty_results`
2026-03-03 15:56:11 -08:00
Charley Cunningham
299b8ac445 tui: align pending steers with core acceptance (#12868)
## Summary
- submit `Enter` steers immediately while a turn is already running
instead of routing them through `queued_user_messages`
- keep those submitted steers visible in the footer as `pending_steers`
until core records them as a user message or aborts the turn
- reconcile pending steers on `ItemCompleted(UserMessage)`, not
`RawResponseItem`
- emit user-message item lifecycle for leftover pending input at task
finish, then remove the TUI `TurnComplete` fallback
- keep `queued_user_messages` for actual queued drafts, rendered below
pending steers

## Problem
While the assistant was generating, pressing `Enter` could send the
input into `queued_user_messages`. That queue only drains after the turn
ends, so ordinary steers behaved like queued drafts instead of landing
at the next core sampling boundary.

The first version of this fix also used `RawResponseItem` to decide when
a steer had landed. Review feedback was that this is the wrong
abstraction for client behavior.

There was also a late edge case in core: if pending steer input was
accepted after the final sampling decision but before `TurnComplete`,
core would record that user message into history at task finish without
emitting `ItemStarted(UserMessage)` / `ItemCompleted(UserMessage)`. TUI
had a fallback to paper over that gap locally.

## Approach
- `Enter` during an active turn now submits a normal `Op::UserTurn`
immediately
- TUI keeps a local pending-steer preview instead of rendering that user
message into history immediately
- when core records the steer as `ItemCompleted(UserMessage)`, TUI
matches and removes the corresponding pending preview, then renders the
committed user message
- core now emits the same user-message lifecycle when
`on_task_finished(...)` drains leftover pending user input, before
`TurnComplete`
- with that lifecycle gap closed in core, TUI no longer needs to flush
pending steers into history on `TurnComplete`
- if the turn is interrupted, pending steers and queued drafts are both
restored into the composer, with pending steers first

## Notes
- `Tab` still uses the real queued-message path
- `queued_user_messages` and `pending_steers` are separate state with
separate semantics
- the pending-steer matching key is built directly from `UserInput`
- this removes the new TUI dependency on `RawResponseItem`

## Validation
- `just fmt`
- `cargo test -p codex-core
task_finish_emits_turn_item_lifecycle_for_leftover_pending_user_input --
--nocapture`
- `cargo test -p codex-tui`
2026-03-03 15:31:52 -08:00
viyatb-oai
24a2d0c696 fix(network-proxy): reject mismatched host headers (#13275)
## Summary
- reject plain HTTP absolute-form requests whose Host header does not
match the request target authority
- add host/port-aware Host header validation for non-default ports
- add regression coverage for mismatched Host forwarding and validator
edge cases
2026-03-03 15:12:06 -08:00
xl-openai
9b004e2db1 Refactor plugin config and cache path (#13333)
Update config.toml plugin entries to use
<plugin_name>@<marketplace_name> as the key.
Plugin now stays in
[plugins/cache/marketplace-name/plugin-name/$version/]
Clean up the plugin code structure.
Add plugin install functionality (not used yet).
2026-03-03 15:00:18 -08:00
Ahmed Ibrahim
041c896509 Revert "Revert "realtime prompt changes"" (#13398)
Reverts openai/codex#13385
2026-03-03 14:41:26 -08:00
Eric Traut
bab32afa93 Require deduplicator success before commenting (#13399)
Fixed recent regression in issue dedup action
2026-03-03 15:32:47 -07:00
Ahmed Ibrahim
6bee02a346 Build delegated realtime handoff text from all messages (#13395)
## Summary
- Route delegated realtime handoff turns from all handoff message texts,
preserving order
- Fallback to input_transcript only when no messages are present
- Add regression coverage for multi-message handoff requests
2026-03-03 14:07:51 -08:00
Owen Lin
d7eb195b62 chore(app-server): restore EventMsg TS types (#13397)
Realized EventMsg generated types were unintentionally removed as part
of this PR: https://github.com/openai/codex/pull/13375

Turns out our TypeScript export pipeline relied on transitively reaching
`EventMsg`. We should still export `EventMsg` explicitly since we're
still emitting `codex/event/*` events (for now, but getting dropped soon
as well).
2026-03-03 13:37:40 -08:00
Owen Lin
167158f93c chore(app-server): delete v1 RPC methods and notifications (#13375)
## Summary
This removes the old app-server v1 methods and notifications we no
longer need, while keeping the small set the main codex app client still
depends on for now.

The remaining legacy surface is:
- `initialize`
- `getConversationSummary`
- `getAuthStatus`
- `gitDiffToRemote`
- `fuzzyFileSearch`
- `fuzzyFileSearch/sessionStart`
- `fuzzyFileSearch/sessionUpdate`
- `fuzzyFileSearch/sessionStop`

And the raw `codex/event/*` notifications emitted from core. These
notifications will be removed in a followup PR.

## What changed
- removed deprecated v1 request variants from the protocol and
app-server dispatcher
- removed deprecated typed notifications: `authStatusChange`,
`loginChatGptComplete`, and `sessionConfigured`
- updated the app-server test client to use v2 flows instead of deleted
v1 flows
- deleted legacy-only app-server test suites and added focused coverage
for `getConversationSummary`
- regenerated app-server schema fixtures and updated the MCP interface
docs to match the remaining compatibility surface

## Testing
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server`
2026-03-03 13:18:25 -08:00
Ahmed Ibrahim
72d368e03a fix (#13389)
# External (non-OpenAI) Pull Request Requirements

Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md

If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.

Include a link to a bug report or enhancement request.
2026-03-03 12:48:16 -08:00
Ahmed Ibrahim
8afe2127dc Revert "realtime prompt changes" (#13385)
Reverts openai/codex#13376
2026-03-03 12:30:37 -08:00