**PR Summary**
This PR adds embedded-only OTEL policy audit logging for
`codex-network-proxy` and threads audit metadata from `codex-core` into
managed proxy startup.
### What changed
- Added structured audit event emission in `network_policy.rs` with
target `codex_otel.network_proxy`.
- Emitted:
- `codex.network_proxy.domain_policy_decision` once per domain-policy
evaluation.
- `codex.network_proxy.block_decision` for non-domain denies.
- Added required policy/network fields, RFC3339 UTC millisecond
`event.timestamp`, and fallback defaults (`http.request.method="none"`,
`client.address="unknown"`).
- Added non-domain deny audit emission in HTTP/SOCKS handlers for
mode-guard and proxy-state denies, including unix-socket deny paths.
- Added `REASON_UNIX_SOCKET_UNSUPPORTED` and used it for unsupported
unix-socket auditing.
- Added `NetworkProxyAuditMetadata` to runtime/state, re-exported from
`lib.rs` and `state.rs`.
- Added `start_proxy_with_audit_metadata(...)` in core config, with
`start_proxy()` delegating to default metadata.
- Wired metadata construction in `codex.rs` from session/auth context,
including originator sanitization for OTEL-safe tagging.
- Updated `network-proxy/README.md` with embedded-mode audit schema and
behavior notes.
- Refactored HTTP block-audit emission to a small local helper to reduce
duplication.
- Preserved existing unix-socket proxy-disabled host/path behavior for
responses and blocked history while using an audit-only endpoint
override (`server.address="unix-socket"`, `server.port=0`).
### Explicit exclusions
- No standalone proxy OTEL startup work.
- No `main.rs` binary wiring.
- No `standalone_otel.rs`.
- No standalone docs/tests.
### Tests
- Extended `network_policy.rs` tests for event mapping, metadata
propagation, fallbacks, timestamp format, and target prefix.
- Extended HTTP tests to assert unix-socket deny block audit events.
- Extended SOCKS tests to cover deny emission from handler deny
branches.
- Added/updated core tests to verify audit metadata threading into
managed proxy state.
### Validation run
- `just fmt`
- `cargo test -p codex-network-proxy` ✅
- `cargo test -p codex-core` ran with one unrelated flaky timeout
(`shell_snapshot::tests::snapshot_shell_does_not_inherit_stdin`), and
the test passed when rerun directly ✅
---------
Co-authored-by: viyatb-oai <viyatb@openai.com>
**PR Summary**
This PR adds the OpenTelemetry `host.name` resource attribute to Codex
OTEL exports so every OTEL log (and trace, via the shared resource)
carries the machine hostname.
**What changed**
- Added `host.name` to the shared OTEL `Resource` in
`/Users/michael.mcgrew/code/codex/codex-rs/otel/src/otel_provider.rs`
- This applies to both:
- OTEL logs (`SdkLoggerProvider`)
- OTEL traces (`SdkTracerProvider`)
- Hostname is now resolved via `gethostname::gethostname()`
(best-effort)
- Value is trimmed
- Empty values are omitted (non-fatal)
- Added focused unit tests for:
- including `host.name` when present
- omitting `host.name` when missing/empty
**Why**
- `host.name` is host/process metadata and belongs on the OTEL
`resource`, not per-event attributes.
- Attaching it in the shared resource is the smallest change that
guarantees coverage across all exported OTEL logs/traces.
**Scope / Non-goals**
- No public API changes
- No changes to metrics behavior (this PR only updates log/trace
resource metadata)
**Dependency updates**
- Added `gethostname` as a workspace dependency and `codex-otel`
dependency
- `Cargo.lock` updated accordingly
- `MODULE.bazel.lock` unchanged after refresh/check
**Validation**
- `just fmt`
- `cargo test -p codex-otel`
- `just bazel-lock-update`
- `just bazel-lock-check`
Add a stream parser to extract citations (and others) from a stream.
This support cases where markers are split in differen tokens.
Codex never manage to make this code work so everything was done
manually. Please review correctly and do not touch this part of the code
without a very clear understanding of it
This PR adds the macro `#[large_stack_test]`
This spawns the tests in a dedicated tokio runtime with a larger stack.
It is useful for tests that needs the full recursion on the harness
(which is now too deep for windows for example)
Summary
- propagate approval policy from parent to spawned agents and drop the
Never override so sub-agents respect the caller’s request
- refresh the pending-approval list whenever events arrive or the active
thread changes and surface the list above the composer for inactive
threads
- add widgets, helpers, and tests covering the new pending-thread
approval UI state
![Uploading Screenshot 2026-02-25 at 11.02.18.png…]()
## Why
`unix_escalation.rs` checks a session-scoped approval cache before
prompting again for an execve-intercepted skill script. Without also
recording `ReviewDecision::ApprovedForSession`, that cache never gets
populated, so the same skill script can still trigger repeated approval
prompts within one session.
## What Changed
- Add `execve_session_approvals` to `SessionServices` so the session can
track approved skill script paths.
- Record the script path when a skill-script prompt returns
`ReviewDecision::ApprovedForSession`, but only for the skill-script path
rather than broader prefix-rule approvals.
- Reuse the cached approval on later execve callbacks by treating an
already-approved skill script as `Decision::Allow`.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12756).
* #12758
* __->__ #12756
Migration Behavior
* Config
* Migrates settings.json into config.toml
* Only adds fields when config.toml is missing, or when those fields are
missing from the existing file
* Supported mappings:
env -> shell_environment_policy
sandbox.enabled = true -> sandbox_mode = "workspace-write"
* Skills
* Copies home and repo .claude/skills into .agents/skills
* Existing skill directories are not overwritten
* SKILL.md content is rewritten from Claude-related terms to Codex
* AgentsMd
* Repo only
* Migrates CLAUDE.md into AGENTS.md
* Detect/import only proceed when AGENTS.md is missing or present but
empty
* Content is rewritten from Claude-related terms to Codex
Add service name to the app-server so that the app can use it's own
service name
This is on thread level because later we might plan the app-server to
become a singleton on the computer
## Summary
- Preserve each skill’s raw permissions block as a permission_profile on
SkillMetadata during skill loading.
- Keep compiling that same metadata into the existing runtime
Permissions object, so current enforcement
behavior stays intact.
- When zsh-fork intercepts execution of a script that belongs to a
skill, include the skill’s
permission_profile in the exec approval request.
- This lets approval UIs show the extra filesystem access the skill
declared when prompting for approval.
## Why
In the `shell_zsh_fork` flow, `codex-shell-escalation` receives the
executable path exactly as the shell passed it to `execve()`. That path
is not guaranteed to be absolute.
For commands such as `./scripts/hello-mbolin.sh`, if the shell was
launched with a different `workdir`, resolving the intercepted `file`
against the server process working directory makes policy checks and
skill matching inspect the wrong executable. This change pushes that fix
a step further by keeping the normalized path typed as `AbsolutePathBuf`
throughout the rest of the escalation pipeline.
That makes the absolute-path invariant explicit, so later code cannot
accidentally treat the resolved executable path as an arbitrary
`PathBuf`.
## What Changed
- record the wrapper process working directory as an `AbsolutePathBuf`
- update the escalation protocol so `workdir` is explicitly absolute
while `file` remains the raw intercepted exec path
- resolve a relative intercepted `file` against the request `workdir` as
soon as the server receives the request
- thread `AbsolutePathBuf` through `EscalationPolicy`,
`CoreShellActionProvider`, and command normalization helpers so the
resolved executable path stays type-checked as absolute
- replace the `path-absolutize` dependency in `codex-shell-escalation`
with `codex-utils-absolute-path`
- add a regression test that covers a relative `file` with a distinct
`workdir`
## Verification
- `cargo test -p codex-shell-escalation`
Direct skill-script matches force `Decision::Prompt`, so skill-backed
scripts require explicit approval before they run. (Note "allow for
session" is not supported in this PR, but will be done in a follow-up.)
In the process of implementing this, I fixed an important bug:
`ShellZshFork` is supposed to keep ordinary allowed execs on the
client-side `Run` path so later `execve()` calls are still intercepted
and reviewed. After the shell-escalation port, `Decision::Allow` still
mapped to `Escalate`, which moved `zsh` to server-side execution too
early. That broke the intended flow for skill-backed scripts and made
the approval prompt depend on the wrong execution path.
## What changed
- In `codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs`,
`Decision::Allow` now returns `Run` unless escalation is actually
required.
- Removed the zsh-specific `argv[0]` fallback. With the `Allow -> Run`
fix in place, zsh's later `execve()` of the script is intercepted
normally, so the skill match happens on the script path itself.
- Kept the skill-path handling in `determine_action()` focused on the
direct `program` match path.
## Verification
- Updated `shell_zsh_fork_prompts_for_skill_script_execution` in
`codex-rs/core/tests/suite/skill_approval.rs` (gated behind `cfg(unix)`)
to:
- run under `SandboxPolicy::new_workspace_write_policy()` instead of
`DangerFullAccess`
- assert the approval command contains only the script path
- assert the approved run returns both stdout and stderr markers in the
shell output
- Ran `cargo test -p codex-core
shell_zsh_fork_prompts_for_skill_script_execution -- --nocapture`
## Manual Testing
Run the dev build:
```
just codex --config zsh_path=/Users/mbolin/code/codex2/codex-rs/app-server/tests/suite/zsh --enable shell_zsh_fork
```
I have created `/Users/mbolin/.agents/skills/mbolin-test-skill` with:
```
├── scripts
│ └── hello-mbolin.sh
└── SKILL.md
```
The skill:
```
---
name: mbolin-test-skill
description: Used to exercise various features of skills.
---
When this skill is invoked, run the `hello-mbolin.sh` script and report the output.
```
The script:
```
set -e
# Note this script will fail if run with network disabled.
curl --location openai.com
```
Use `$mbolin-test-skill` to invoke the skill manually and verify that I
get prompted to run `hello-mbolin.sh`.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12730).
* #12750
* __->__ #12730
## Summary
Remove js_repl/node test-skip paths and make Node setup explicit in CI
so js_repl tests always run instead of silently skipping.
## Why
We had multiple “expediency” skip paths that let js_repl tests pass
without actually exercising Node-backed behavior. This reduced CI signal
and hid runtime/environment regressions.
## What changed
### CI
- Added Node setup using `codex-rs/node-version.txt` in:
- `.github/workflows/rust-ci.yml`
- `.github/workflows/bazel.yml`
- Added a Unix PATH copy step in Bazel workflow to expose the setup-node
binary in common paths.
### js_repl test harness
- Added explicit js_repl sandbox test configuration helpers in:
- `codex-rs/core/src/tools/js_repl/mod.rs`
- `codex-rs/core/src/tools/handlers/js_repl.rs`
- Added Linux arg0 dispatch glue for js_repl tests so sandbox subprocess
entrypoint behavior is correct under Linux test execution.
### Removed skip behavior
- Deleted runtime guard function and early-return skips in js_repl tests
(`can_run_js_repl_runtime_tests` and related per-test short-circuits).
- Removed view_image integration test skip behavior:
- dropped `skip_if_no_network!(Ok(()))`
- removed “skip on Node missing/too old” branch after js_repl output
inspection.
## Impact
- js_repl/node tests now consistently execute and fail loudly when the
environment is not correctly provisioned.
- CI has stronger signal for js_repl regressions instead of false green
from conditional skips.
## Testing
- `cargo test -p codex-core` (locally) to validate js_repl
unit/integration behavior with skips removed.
- CI expected to surface any remaining environment/runtime gaps directly
(rather than masking them).
#### [git stack](https://github.com/magus/git-stack-cli)
- ✅ `1` https://github.com/openai/codex/pull/12300
- ✅ `2` https://github.com/openai/codex/pull/12275
- ✅ `3` https://github.com/openai/codex/pull/12205
- ✅ `4` https://github.com/openai/codex/pull/12407
- ✅ `5` https://github.com/openai/codex/pull/12372
- 👉 `6` https://github.com/openai/codex/pull/12185
- ⏳ `7` https://github.com/openai/codex/pull/10673
## Why
`ExecApprovalRequestEvent` can carry a distinct `approval_id` for
subcommand approvals, including the `execve`-intercepted zsh-fork path.
The session registers the pending approval callback under `approval_id`
when one is present, but `ChatWidget` was stashing `call_id` in the
approval modal state. When the user approved the command in the TUI, the
response was sent back with the wrong identifier, so the pending
approval could not be matched and the approval callback would not
resolve.
Note `approval_id` was introduced in
https://github.com/openai/codex/pull/12051.
## What changed
- In `tui/src/chatwidget.rs`, `ChatWidget` now uses
`ExecApprovalRequestEvent::effective_approval_id()` when constructing
`ApprovalRequest::Exec`.
- That preserves the existing behavior for normal shell and
`unified_exec` approvals, where `approval_id` is absent and the
effective id still falls back to `call_id`.
- For subcommand approvals that provide a distinct `approval_id`, the
TUI now sends back the same key that
`Session::request_command_approval()` registered.
## Verification
- Traced the approval flow end to end to confirm the same effective
approval id is now used on both sides of the round trip:
- `Session::request_command_approval()` registers the pending callback
under `approval_id.unwrap_or(call_id)`.
- `ChatWidget` now emits `Op::ExecApproval` with that same effective id.
## Summary
Stabilize `js_repl` runtime test setup in CI and move tool-facing
`js_repl` behavior coverage into integration tests.
This is a test/CI change only. No production `js_repl` behavior change
is intended.
## Why
- Bazel test sandboxes (especially on macOS) could resolve a different
`node` than the one installed by `actions/setup-node`, which caused
`js_repl` runtime/version failures.
- `js_repl` runtime tests depend on platform-specific
sandbox/test-harness behavior, so they need explicit gating in a
base-stability commit.
- Several tests in the `js_repl` unit test module were actually
black-box/tool-level behavior tests and fit better in the integration
suite.
## Changes
- Add `actions/setup-node` to the Bazel and Rust `Tests` workflows,
using the exact version pinned in the repo’s Node version file.
- In Bazel (non-Windows), pass `CODEX_JS_REPL_NODE_PATH=$(which node)`
into test env so `js_repl` uses the `actions/setup-node` runtime inside
Bazel tests.
- Add a new integration test suite for `js_repl` tool behavior and
register it in the core integration test suite module.
- Move black-box `js_repl` behavior tests into the integration suite
(persistence/TLA, builtin tool invocation, recursive self-call
rejection, `process` isolation, blocked builtin imports).
- Keep white-box manager/kernel tests in the `js_repl` unit test module.
- Gate `js_repl` runtime tests to run only on macOS and only when a
usable Node runtime is available (skip on other platforms / missing Node
in this commit).
## Impact
- Reduces `js_repl` CI failures caused by Node resolution drift in
Bazel.
- Improves test organization by separating tool-facing behavior tests
from white-box manager/kernel tests.
- Keeps the base commit stable while expanding `js_repl` runtime
coverage.
#### [git stack](https://github.com/magus/git-stack-cli)
- ✅ `1` https://github.com/openai/codex/pull/12372
- 👉 `2` https://github.com/openai/codex/pull/12407
- ⏳ `3` https://github.com/openai/codex/pull/12185
- ⏳ `4` https://github.com/openai/codex/pull/10673
This PR replaces the old `additional_permissions.fs_read/fs_write` shape
with a shared `PermissionProfile`
model and wires it through the command approval, sandboxing, protocol,
and TUI layers. The schema is adopted from the
`SkillManifestPermissions`, which is also refactored to use this unified
struct. This helps us easily expose permission profiles in app
server/core as a follow-up.
## Summary
- Fix `js_repl` so `await codex.tool("view_image", { path })` actually
attaches the image to the active turn when called from inside the JS
REPL.
- Restore the behavior expected by the existing `js_repl`
image-attachment test.
- This is a follow-up to
[#12553](https://github.com/openai/codex/pull/12553), which changed
`view_image` to return structured image content.
## Root Cause
- [#12553](https://github.com/openai/codex/pull/12553) changed
`view_image` from directly injecting a pending user image message to
returning structured `function_call_output` content items.
- The nested tool-call bridge inside `js_repl` serialized that tool
response back to the JS runtime, but it did not mirror returned image
content into the active turn.
- As a result, `view_image` appeared to succeed inside `js_repl`, but no
`input_image` was actually attached for the outer turn.
## What Changed
- Updated the nested tool-call path in `js_repl` to inspect function
tool responses for structured content items.
- When a nested tool response includes `input_image` content, `js_repl`
now injects a corresponding user `Message` into the active turn before
returning the raw tool result back to the JS runtime.
- Kept the normal JSON result flow intact, so `codex.tool(...)` still
returns the original tool output object to JavaScript.
## Why
- `js_repl` documentation and tests already assume that `view_image` can
be used from inside the REPL to attach generated images to the model.
- Without this fix, the nested call path silently dropped that
attachment behavior.
linux musl build steps in `rust-release.yml` are [currently
broken](https://github.com/openai/codex/actions/runs/22367312571)
because of linking issues due to ubsan-calling types (`jitterentropy`)
leaking into the build.
add `AWS_LC_SYS_NO_JITTER_ENTROPY=1` to the musl build step to avoid
linking those ubsan-calling types. this is a more temporary fix, we need
to clean up ubsan usage upstream so they dont leak into release-build
steps anyways.
codex's more thorough explanation below:
[pr 9859](https://github.com/openai/codex/pull/9859) added [MITM
init](https://github.com/openai/codex/pull/9859/changes#diff-db782967007060c5520651633e1ea21681d64be21f2b791d3d84519860245b97R62-R68)
in network-proxy, which wires in cert generation code (rcgen/rustls).
this didnt bump/change dep versions, but it changed symbol reachability
at link time.
for musl builds, that made aws-lc-sys’s jitterentropy objects get pulled
into the final link. those objects contain UBSan calls
(__ubsan_handle_*). musl release linking is static (*-linux-musl-gcc,
-nodefaultlibs) and does not link a musl UBSan runtime, so link fails
with undefined __ubsan_*.
before, our custom musl CI UBSan steps (install libubsan1, RUSTC_WRAPPER
+ LD_PRELOAD, partial flag scrubbing) masked some sanitizer issues.
after this pr, more aws-lc code became link-reachable, and that band-aid
wasn't enough.
## Why
`codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs` previously
located `codex-execve-wrapper` by scanning `PATH` and sibling
directories. That lookup is brittle and can select the wrong binary when
the runtime environment differs from startup assumptions.
We already pass `codex-linux-sandbox` from `codex-arg0`;
`codex-execve-wrapper` should use the same startup-driven path plumbing.
## What changed
- Introduced `Arg0DispatchPaths` in `codex-arg0` to carry both helper
executable paths:
- `codex_linux_sandbox_exe`
- `main_execve_wrapper_exe`
- Updated `arg0_dispatch_or_else()` to pass `Arg0DispatchPaths` to
top-level binaries and preserve helper paths created in
`prepend_path_entry_for_codex_aliases()`.
- Threaded `Arg0DispatchPaths` through entrypoints in `cli`, `exec`,
`tui`, `app-server`, and `mcp-server`.
- Added `main_execve_wrapper_exe` to core configuration plumbing
(`Config`, `ConfigOverrides`, and `SessionServices`).
- Updated zsh-fork shell escalation to consume the configured
`main_execve_wrapper_exe` and removed path-sniffing fallback logic.
- Updated app-server config reload paths so reloaded configs keep the
same startup-provided helper executable paths.
## References
- [`Arg0DispatchPaths`
definition](e355b43d5c/codex-rs/arg0/src/lib.rs (L20-L24))
- [`arg0_dispatch_or_else()` forwarding both
paths](e355b43d5c/codex-rs/arg0/src/lib.rs (L145-L176))
- [zsh-fork escalation using configured wrapper
path](e355b43d5c/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs (L109-L150))
## Testing
- `cargo check -p codex-arg0 -p codex-core -p codex-exec -p codex-tui -p
codex-mcp-server -p codex-app-server`
- `cargo test -p codex-arg0`
- `cargo test -p codex-core tools::runtimes:🐚:unix_escalation:: --
--nocapture`
Rename `SkillMetadata.path` to `SkillMetadata.path_to_skills_md` for
clarity.
Would ideally change the type to `AbsolutePathBuf`, but that can be done
later.
## Summary
Improve `js_repl` behavior when the Node kernel hits a process-level
failure (for example, an uncaught exception or unhandled Promise
rejection).
Instead of only surfacing a generic `js_repl kernel exited unexpectedly`
after stdout EOF, `js_repl` now returns a clearer exec error for the
active request, then resets the kernel cleanly.
## Why
Some sandbox-denied operations can trigger Node errors that become
process-level failures (for example, an unhandled EventEmitter `'error'`
event). In that case:
- the kernel process exits,
- the host sees stdout EOF,
- the user gets a generic kernel-exit error,
- and the next request can briefly race with stale kernel state.
This change improves that failure mode without monkeypatching Node APIs.
## Changes
### Kernel-side (`js_repl` Node process)
- Add process-level handlers for:
- `uncaughtException`
- `unhandledRejection`
- When one of these fires:
- best-effort emit a normal `exec_result` error for the active exec
- include actionable guidance to catch/handle async errors (including
Promise rejections and EventEmitter `'error'` events)
- exit intentionally so the host can reset/restart the kernel
### Host-side (`JsReplManager`)
- Clear dead kernel state as soon as the stdout reader observes
unexpected kernel exit/EOF.
- This lets the next `js_repl` exec start a fresh kernel instead of
hitting a stale broken-pipe path.
### Tests
- Add regression coverage for:
- uncaught async exception -> exec error + kernel recovery on next exec
- Update forced-kernel-exit test to validate recovery behavior (next
exec restarts cleanly)
## Impact
- Better user-facing error for kernel crashes caused by
uncaught/unhandled async failures.
- Cleaner recovery behavior after kernel exit.
## Validation
- `cargo test -p codex-core --lib
tools::js_repl::tests::js_repl_uncaught_exception_returns_exec_error_and_recovers
-- --exact`
- `cargo test -p codex-core --lib
tools::js_repl::tests::js_repl_forced_kernel_exit_recovers_on_next_exec
-- --exact`
- `just fmt`
## Summary
- add graceful websocket app-server restart on Ctrl-C by draining until
no assistant turns are running
- stop the websocket acceptor and disconnect existing connections once
the drain condition is met
- add a websocket integration test that verifies Ctrl-C waits for an
in-flight turn before exit
## Verification
- `cargo check -p codex-app-server --quiet`
- `cargo test -p codex-app-server --test all
suite::v2::connection_handling_websocket`
- I (maxj) tested remote and local Codex.app
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`codex-shell-escalation` exposed a `codex-core`-specific adapter layer
(`ShellActionProvider`, `ShellPolicyFactory`, and `run_escalate_server`)
that existed only to bridge `codex-core` to `EscalateServer`. That
indirection increased API surface and obscured crate ownership without
adding behavior.
This change moves orchestration into `codex-core` so boundaries are
clearer: `codex-shell-escalation` provides reusable escalation
primitives, and `codex-core` provides shell-tool policy decisions.
Admittedly, @pakrym rightfully requested this sort of cleanup as part of
https://github.com/openai/codex/pull/12649, though this avoids moving
all of `codex-shell-escalation` into `codex-core`.
## What changed
- Made `EscalateServer` public and exported it from `shell-escalation`.
- Removed the adapter layer from `shell-escalation`:
- deleted `shell-escalation/src/unix/core_shell_escalation.rs`
- removed exports for `ShellActionProvider`, `ShellPolicyFactory`,
`EscalationPolicyFactory`, and `run_escalate_server`
- Updated `core/src/tools/runtimes/shell/unix_escalation.rs` to:
- create `Stopwatch`/cancellation in `codex-core`
- instantiate `EscalateServer` directly
- implement `EscalationPolicy` directly on `CoreShellActionProvider`
Net effect: same escalation flow with fewer wrappers and a smaller
public API.
## Verification
- Manually reviewed the old vs. new escalation call flow to confirm
timeout/cancellation behavior and approval policy decisions are
preserved while removing wrapper types.
Increase `IMAGE_BYTES_ESTIMATE` from 340 bytes to 7,373 bytes so the
existing 4-bytes/token heuristic yields an image estimate of ~1,844
tokens instead of ~85. This makes auto-compaction more conservative for
image-heavy transcripts and avoids underestimating context usage, which
can otherwise cause compaction to fail when there is not enough free
context remaining. The new value was chosen because that's the image
resolution cap used for our latest models.
Follow-up to [#12419](https://github.com/openai/codex/pull/12419).
Refs [#11845](https://github.com/openai/codex/issues/11845).
Fixes#12128
The docs indicates that `project_root_markers` are used to discover the
project root for local config as well as `AGENTS.md`. It looks like it
was never wired up to support the latter.
Summary
- resolve project docs by walking to the configured
`project_root_markers` (or defaults) instead of assuming the Git root,
while honoring CLI overrides and handling malformed configs
- fall back to the project’s canonical path chain and add a test that
makes sure custom markers upstream of `.git` are respected
- Add a hidden `realtime_conversation` feature flag and `/realtime`
slash command for start/stop live voice sessions.
- Reuse transcription composer/footer UI for live metering, stream mic
audio, play assistant audio, render realtime user text events, and
force-close on feature disable.
---------
Co-authored-by: Codex <noreply@openai.com>
Summary
- detect skill-invoking shell commands based on the original command
string, request approvals when needed, and cache positive decisions per
session
- keep implicit skill invocation emitted after approval and keep skill
approval decline messaging centralized to the shell handler
- expand and adjust skill approval tests to cover shell-based skill
scripts while matching the new detection expectations
Testing
- Not run (not requested)
## Problem
Diff lines used only foreground colors (green/red) with no background
tinting, making them hard to scan. The gutter (line numbers) also had no
theme awareness — dimmed text was fine on dark terminals but unreadable
on light ones.
## Mental model
Each diff line now has four styled layers: **gutter** (line number),
**sign** (`+`/`-`), **content** (text), and **line background** (full
terminal width). A `DiffTheme` enum (`Dark` / `Light`) is selected once
per render by probing the terminal's queried background via
`default_bg()`. A companion `DiffColorLevel` enum (`TrueColor` /
`Ansi256` / `Ansi16`) is derived from `stdout_color_level()` and gates
which palette is used. All style helpers dispatch on `(theme,
DiffLineType, color_level)` to pick the right colors.
| Theme Picker Wide | Theme Picker Narrow |
|---|---|
| <img width="1552" height="1012" alt="image"
src="https://github.com/user-attachments/assets/231b21b7-32d4-4727-80ed-7d01924954be"
/> | <img width="795" height="1012" alt="image"
src="https://github.com/user-attachments/assets/549cacdf-daec-43c9-ad64-2a28d16d140e"
/> |
| Dark BG - 16 colors | Dark BG - 256 colors | Dark BG - True Colors |
|---|---|---|
| <img width="1552" height="1012" alt="dark-16colors"
src="https://github.com/user-attachments/assets/fba36de3-c101-47d4-9e63-88cdd00410d0"
/> | <img width="1552" height="1012" alt="dark-256colors"
src="https://github.com/user-attachments/assets/f39e4307-c6b0-49c4-b4fe-bd26d3d8e41c"
/> | <img width="1552" height="1012" alt="dark-truecolor"
src="https://github.com/user-attachments/assets/1af4ec57-04bf-4dfb-8a44-0ab5e5aaaf18"
/> |
| Light BG - 16 colors | Light BG - 256 colors | Light BG - True Colors
|
|---|---|---|
| <img width="1552" height="1012" alt="light-16colors"
src="https://github.com/user-attachments/assets/2b5423d1-74b4-4b1e-8123-7c2488ff436b"
/> | <img width="1552" height="1012" alt="light-256colors"
src="https://github.com/user-attachments/assets/c94cff9a-8d3e-42c9-bbe7-079da39953a8"
/> | <img width="1552" height="1012" alt="light-truecolor"
src="https://github.com/user-attachments/assets/f73da626-725f-4452-99ee-69ef706df2c6"
/> |
## Non-goals
- No runtime theme switching beyond what `default_bg()` already
provides.
- No change to syntax highlighting theme selection or the highlight
module.
## Tradeoffs
- Three fixed palettes (truecolor RGB, 256-color indexed, 16-color
named) are maintained rather than using `best_color` nearest-match. This
is deliberate: `supports_color::on_cached(Stream::Stdout)` can misreport
capabilities once crossterm enters the alternate screen, so hand-picked
palette entries give better visual results than automatic quantization.
- Delete lines in the syntax-highlighted path get `Modifier::DIM` to
visually recede compared to insert lines. This trades some readability
of deleted code for scan-ability of additions.
- The theme picker's diff preview sets `preserve_side_content_bg: true`
on `ListSelectionView` so diff background tints survive into the side
panel. Other popups keep the default (`false`) to preserve their
reset-background look.
## Architecture
- **Color constants** are module-level `const` items grouped by palette
tier: `DARK_TC_*` / `LIGHT_TC_*` (truecolor RGB tuples), `DARK_256_*` /
`LIGHT_256_*` (xterm indexed), with named `Color` variants used for the
16-color tier.
- **`DiffTheme`** is a private enum; `diff_theme()` probes the terminal
and `diff_theme_for_bg()` is the testable pure-function version.
- **`DiffColorLevel`** is a private enum derived from `StdoutColorLevel`
via `diff_color_level()`.
- **Palette helpers** (`add_line_bg`, `del_line_bg`, `light_gutter_fg`,
`light_add_num_bg`, `light_del_num_bg`) each take `(DiffTheme,
DiffColorLevel)` or just `DiffColorLevel` and return a `Color`.
- **Style helpers** (`style_line_bg_for`, `style_gutter_for`,
`style_sign_add`, `style_sign_del`, `style_add`, `style_del`) each take
`(DiffLineType, DiffTheme, DiffColorLevel)` or `(DiffTheme,
DiffColorLevel)` and return a `Style`.
- **`push_wrapped_diff_line_inner_with_theme_and_color_level`** is the
innermost renderer, accepting both theme and color level so tests can
exercise any combination without depending on the terminal.
- **Line-level background** is applied via
`RtLine::from(...).style(line_bg)` so the tint extends across the full
terminal width, not just the text content.
- **Theme picker integration**: `ListSelectionView` gained a
`preserve_side_content_bg` flag. When `true`, the side panel skips
`force_bg_to_terminal_bg`, letting diff preview backgrounds render
faithfully.
## Observability
No new logging. Theme selection is deterministic from `default_bg()`,
which is already queried and cached at TUI startup.
## Tests
1. **`DiffTheme` is determined per `render_change` call** — if
`default_bg()` changes mid-render (e.g. `requery_default_colors()`
fires), different file chunks could render with different themes. Low
risk in practice since re-query only happens on explicit user action.
2. **16-color tier uses named `Color` variants** (`Color::Green`,
`Color::Red`, etc.) which the terminal maps to its own palette. On
unusual terminal themes these could clash with the background.
Acceptable since 16-color terminals already have unpredictable color
rendering.
3. **Light-theme `style_add` / `style_del` set bg but no fg** — on light
terminals, non-syntax-highlighted content uses the terminal's default
foreground against a pastel background. If the terminal's default fg
happens to be very light, contrast could suffer. This is an edge case
since light-terminal users typically have dark default fg.
4. **`preserve_side_content_bg` is a general-purpose flag but only used
by the theme picker** — if other popups start using side content with
intentional backgrounds they'll need to opt in explicitly. Not a real
risk today, just a note for future callers.
#### What
Try matching `\w+`-namespaced model after `longest prefix` as heuristic
to match `ModelInfo` from list of candidates.
This shouldn't regress existing behavior:
- `gpt-5.2-codex` -> `gpt-5.2` if `gpt-5.2-codex` not present
- `gpt-5.3` -> `gpt-5` if `gpt-5.3` not present
- `gpt-9` still doesn't match anything
while being more forgiving for custom prefixes:
- `oai/gpt-5.3-codex` -> `gpt-5.3-codex`
#### Tests
Added unit test.
Fixes#12175
If a user types an npm package name with multiple `@` symbols like `npx
-y @foo/bar@latest`, the TUI currently treats this as though it's
attempting to invoke the file picker.
### What changed
- **Generalized `@` token parsing**
- `current_prefixed_token(...)` now treats `@` as a token start **only
at a whitespace boundary** (or start-of-line).
- If the cursor is on a nested `@` inside an existing
whitespace-delimited token (for example `@scope/pkg@latest`), it keeps
the surrounding token active instead of starting a new token at the
second `@`.
- It also avoids misclassifying mid-word usages like `foo@bar` as an `@`
file token.
- **Enter behavior with file popup**
- If the file-search popup is open but has **no selected match**,
pressing `Enter` now closes the popup and falls through to normal submit
behavior.
- This prevents pasted strings containing `@...` from blocking
submission just because file-search was active with no actionable
selection.
### Testing
I manually built and tested the scenarios involved with the bug report
and related use of `@` mentions to verify no regressions
## Why
This PR switches the `shell_command` zsh-fork path over to
`codex-shell-escalation` so the new shell tool can use the shared
exec-wrapper/escalation protocol instead of the `zsh_exec_bridge`
implementation that was introduced in
https://github.com/openai/codex/pull/12052. `zsh_exec_bridge` relied on
UNIX domain sockets, which is not as tamper-proof as the FD-based
approach in `codex-shell-escalation`.
## What Changed
- Added a Unix zsh-fork runtime adapter in `core`
(`core/src/tools/runtimes/shell/unix_escalation.rs`) that:
- runs zsh-fork commands through
`codex_shell_escalation::run_escalate_server`
- bridges exec-policy / approval decisions into `ShellActionProvider`
- executes escalated commands via a `ShellCommandExecutor` that calls
`process_exec_tool_call`
- Updated `ShellRuntime` / `ShellCommandHandler` / tool spec wiring to
select a `shell_command` backend (`classic` vs `zsh-fork`) while leaving
the generic `shell` tool path unchanged.
- Removed the `zsh_exec_bridge`-based session service and deleted
`core/src/zsh_exec_bridge/mod.rs`.
- Moved exec-wrapper entrypoint dispatch to `arg0` by handling the
`codex-execve-wrapper` arg0 alias there, and removed the old
`codex_core::maybe_run_zsh_exec_wrapper_mode()` hooks from `cli` and
`app-server` mains.
- Added the needed `codex-shell-escalation` dependencies for `core` and
`arg0`.
## Tests
- `cargo test -p codex-core
shell_zsh_fork_prefers_shell_command_over_unified_exec`
- `cargo test -p codex-app-server turn_start_shell_zsh_fork --
--nocapture`
- verifies zsh-fork command execution and approval flows through the new
backend
- includes subcommand approve/decline coverage using the shared zsh
DotSlash fixture in `app-server/tests/suite/zsh`
- To test manually, I added the following to `~/.codex/config.toml`:
```toml
zsh_path = "/Users/mbolin/code/codex3/codex-rs/app-server/tests/suite/zsh"
[features]
shell_zsh_fork = true
```
Then I ran `just c` to run the dev build of Codex with these changes and
sent it the message:
```
run `echo $0`
```
And it replied with:
```
echo $0 printed:
/Users/mbolin/code/codex3/codex-rs/app-server/tests/suite/zsh
In this tool context, $0 reflects the script path used to invoke the shell, not just zsh.
```
so the tool appears to be wired up correctly.
## Notes
- The zsh subcommand-decline integration test now uses `rm` under a
`WorkspaceWrite` sandbox. The previous `/usr/bin/true` scenario is
auto-allowed by the new `shell-escalation` policy path, which no longer
produces subcommand approval prompts.
## Summary
Introduces the initial implementation of Feature::RequestPermissions.
RequestPermissions allows the model to request that a command be run
inside the sandbox, with additional permissions, like writing to a
specific folder. Eventually this will include other rules as well, and
the ability to persist these permissions, but this PR is already quite
large - let's get the core flow working and go from there!
<img width="1279" height="541" alt="Screenshot 2026-02-15 at 2 26 22 PM"
src="https://github.com/user-attachments/assets/0ee3ec0f-02ec-4509-91a2-809ac80be368"
/>
## Testing
- [x] Added tests
- [x] Tested locally
- [x] Feature
Send a request with `generate: falls` but a full set of tools and
instructions to pre-warm inference.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- tighten the memory-use decision boundary so agents skip memory only
for clearly self-contained asks
- make the quick memory pass more explicit and bounded (including a
lightweight search budget)
- add structured `<memory_citation>` requirements and examples for final
replies
- clarify memory update guidance and end-state wording for memory lookup
## Why
The previous template was directionally correct, but still left room for
inconsistent memory lookup behavior and citation formatting. This change
makes the default behavior, quick-pass scope, and citation output
contract much more explicit.
## Testing
- not run (prompt/template text change only)
Co-authored-by: jif-oai <jif@openai.com>
rm `PRESETS` list harcoded in `model_presets` as we now have bundled
`models.json` with equivalent info.
update logic to rely on bundled models instead, update tests.
Fixes#12216
Fixes a panic in `AbsolutePathBuf::parent()` when the process hits file
descriptor exhaustion (`EMFILE` / "Too many open files").
### Root cause
`AbsolutePathBuf::parent()` was re-validating the parent path via
`from_absolute_path(...).expect(...)`.
`from_absolute_path()` calls `path_absolutize::absolutize()`, which can
depend on `std::env::current_dir()`. Under `EMFILE`, that can fail,
causing `parent()` to panic even though the parent of an absolute path
is already known.
### Change
- Stop re-absolutizing the result of `self.0.parent()`
- Construct `AbsolutePathBuf` directly from the known parent path
- Keep an invariant check with `debug_assert!(p.is_absolute())`
### Why this is safe
`self` is already an `AbsolutePathBuf`, so `self.0` is
absolute/normalized. The parent of an absolute path is expected to be
absolute, so re-running fallible normalization here is unnecessary and
can introduce unrelated panics.
- use `skills_for_cwd` lookup to scope allowed skills and build
invocation context for downstream processing
- add detection in `stream_events_utils` to classify tool calls as
implicit skill invocations per the proposal (script runners, extensions,
`scripts` dirs, and SKILL.md reads)
- deduplicate invocations per turn and emit analytics/OTEL events on the
same background queue as explicit invokes
## Summary
The test exec_resume_last_respects_cwd_filter_and_all_flag makes one
session “newest” by resuming it, but rollout updated_at is stored/sorted
at second precision. On fast CI (especially Windows), the touch could
land in the same second as initial session creation, making ordering
nondeterministic.
This change adds a short sleep before the recency-touch step so the
resumed session is guaranteed to have a later updated_at, preserving the
intended assertion without changing product behavior.
## Summary
Persist network approval allow/deny decisions as `network_rule(...)`
entries in execpolicy (not proxy config)
It adds `network_rule` parsing + append support in `codex-execpolicy`,
including `decision="prompt"` (parse-only; not compiled into proxy
allow/deny lists)
- compile execpolicy network rules into proxy allow/deny lists and
update the live proxy state on approval
- preserve requirements execpolicy `network_rule(...)` entries when
merging with file-based execpolicy
- reject broad wildcard hosts (for example `*`) for persisted
`network_rule(...)`
## Why
After removing `exec-server`, the next step is to wire a new shell tool
to `codex-rs/shell-escalation` directly.
That is blocked while `codex-shell-escalation` depends on `codex-core`,
because the new integration would require `codex-core` to depend on
`codex-shell-escalation` and create a dependency cycle.
This change ports the reusable pieces from the earlier prep work, but
drops the old compatibility shim because `exec-server`/MCP support is
already gone.
## What Changed
### Decouple `shell-escalation` from `codex-core`
- Introduce a crate-local `SandboxState` in `shell-escalation`
- Introduce a `ShellCommandExecutor` trait so callers provide process
execution/sandbox integration
- Update `EscalateServer::exec(...)` and `run_escalate_server(...)` to
use the injected executor
- Remove the direct `codex_core::exec::process_exec_tool_call(...)` call
from `shell-escalation`
- Remove the `codex-core` dependency from `codex-shell-escalation`
### Restore reusable policy adapter exports
- Re-enable `unix::core_shell_escalation`
- Export `ShellActionProvider` and `ShellPolicyFactory` from
`shell-escalation`
- Keep the crate root API simple (no `legacy_api` compatibility layer)
### Port socket fixes from the earlier prep commit
- Use `socket2::Socket::pair_raw(...)` for AF_UNIX socketpairs and
restore `CLOEXEC` explicitly on both endpoints
- Keep `CLOEXEC` cleared only on the single datagram client FD that is
intentionally passed across `exec`
- Clean up `tokio::AsyncFd::try_io(...)` error handling in the socket
helpers
## Verification
- `cargo shear`
- `cargo clippy -p codex-shell-escalation --tests`
- `cargo test -p codex-shell-escalation`
## Why
We already plan to remove the shell-tool MCP path, and doing that
cleanup first makes the follow-on `shell-escalation` work much simpler.
This change removes the last remaining reason to keep
`codex-rs/exec-server` around by moving the `codex-execve-wrapper`
binary and shared shell test fixtures to the crates/tests that now own
that functionality.
## What Changed
### Delete `codex-rs/exec-server`
- Remove the `exec-server` crate, including the MCP server binary,
MCP-specific modules, and its test support/test suite
- Remove `exec-server` from the `codex-rs` workspace and update
`Cargo.lock`
### Move `codex-execve-wrapper` into `codex-rs/shell-escalation`
- Move the wrapper implementation into `shell-escalation`
(`src/unix/execve_wrapper.rs`)
- Add the `codex-execve-wrapper` binary entrypoint under
`shell-escalation/src/bin/`
- Update `shell-escalation` exports/module layout so the wrapper
entrypoint is hosted there
- Move the wrapper README content from `exec-server` to
`shell-escalation/README.md`
### Move shared shell test fixtures to `app-server`
- Move the DotSlash `bash`/`zsh` test fixtures from
`exec-server/tests/suite/` to `app-server/tests/suite/`
- Update `app-server` zsh-fork tests to reference the new fixture paths
### Keep `shell-tool-mcp` as a shell-assets package
- Update `.github/workflows/shell-tool-mcp.yml` packaging so the npm
artifact contains only patched Bash/Zsh payloads (no Rust binaries)
- Update `shell-tool-mcp/package.json`, `shell-tool-mcp/src/index.ts`,
and docs to reflect the shell-assets-only package shape
- `shell-tool-mcp-ci.yml` does not need changes because it is already
JS-only
## Verification
- `cargo shear`
- `cargo clippy -p codex-shell-escalation --tests`
- `just clippy`
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
Bumps [owo-colors](https://github.com/owo-colors/owo-colors) from 4.2.3
to 4.3.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/owo-colors/owo-colors/releases">owo-colors's
releases</a>.</em></p>
<blockquote>
<h2>owo-colors 4.3.0</h2>
<h3>Fixed</h3>
<ul>
<li>Scripts in the <code>scripts/</code> directory are no longer
published in the crate package. Thanks <a
href="https://redirect.github.com/owo-colors/owo-colors/pull/152">weiznich</a>
for your first contribution!</li>
</ul>
<h3>Changed</h3>
<ul>
<li>
<p>Mark methods with
<code>#[rust_analyzer::completions(ignore_flyimport)]</code> and the
<code>OwoColorize</code> trait with
<code>#[rust_analyzer::completions(ignore_flyimport_methods)]</code>.
This prevents owo-colors methods from being completed with rust-analyzer
unless the <code>OwoColorize</code> trait is included.</p>
<p>Unfortunately, this also breaks explicit autocomplete commands such
as Ctrl-Space in many editors. (The language server protocol doesn't
appear to have a way to differentiate between implicit and explicit
autocomplete commands.) On balance we believe this is the right
approach, but please do provide feedback on [PR <a
href="https://redirect.github.com/owo-colors/owo-colors/issues/141">#141</a>](<a
href="https://redirect.github.com/owo-colors/owo-colors/pull/141">owo-colors/owo-colors#141</a>)
if it negatively affects you.</p>
</li>
<li>
<p>Updated MSRV to Rust 1.81.</p>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/owo-colors/owo-colors/blob/main/CHANGELOG.md">owo-colors's
changelog</a>.</em></p>
<blockquote>
<h2>[4.3.0] - 2026-02-22</h2>
<h3>Fixed</h3>
<ul>
<li>Scripts in the <code>scripts/</code> directory are no longer
published in the crate package. Thanks <a
href="https://redirect.github.com/owo-colors/owo-colors/pull/152">weiznich</a>
for your first contribution!</li>
</ul>
<h3>Changed</h3>
<ul>
<li>
<p>Mark methods with
<code>#[rust_analyzer::completions(ignore_flyimport)]</code> and the
<code>OwoColorize</code> trait with
<code>#[rust_analyzer::completions(ignore_flyimport_methods)]</code>.
This prevents owo-colors methods from being completed with rust-analyzer
unless the <code>OwoColorize</code> trait is included.</p>
<p>Unfortunately, this also breaks explicit autocomplete commands such
as Ctrl-Space in many editors. (The language server protocol doesn't
appear to have a way to differentiate between implicit and explicit
autocomplete commands.) On balance we believe this is the right
approach, but please do provide feedback on [PR <a
href="https://redirect.github.com/owo-colors/owo-colors/issues/141">#141</a>](<a
href="https://redirect.github.com/owo-colors/owo-colors/pull/141">owo-colors/owo-colors#141</a>)
if it negatively affects you.</p>
</li>
<li>
<p>Updated MSRV to Rust 1.81.</p>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="baf10f9a74"><code>baf10f9</code></a>
[owo-colors] version 4.3.0</li>
<li><a
href="6abe2026c5"><code>6abe202</code></a>
[meta] prepare changelog</li>
<li><a
href="ca81447041"><code>ca81447</code></a>
[RFC] add ignore_flyimport and ignore_flyimport_methods (<a
href="https://redirect.github.com/owo-colors/owo-colors/issues/141">#141</a>)</li>
<li><a
href="61de72e7f9"><code>61de72e</code></a>
Exclude development script from published package (<a
href="https://redirect.github.com/owo-colors/owo-colors/issues/152">#152</a>)</li>
<li><a
href="b2ad6bcd41"><code>b2ad6bc</code></a>
update MSRV to Rust 1.81 (<a
href="https://redirect.github.com/owo-colors/owo-colors/issues/156">#156</a>)</li>
<li>See full diff in <a
href="https://github.com/owo-colors/owo-colors/compare/v4.2.3...v4.3.0">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Summary
This fixes a TUI race (https://github.com/openai/codex/issues/11008)
where pressing Enter with Steer enabled while the assistant is still
streaming the final answer could put Codex into a non-recoverable
“running” state (no further prompts handled until exiting and resuming).
## Root Cause
In steer mode, `InputResult::Submitted` could submit immediately even
while a final-answer stream was active. That immediate submission races
with turn completion and can strand turn state.
## Fix
When handling `InputResult::Submitted`, we now queue instead of
immediate-submit if a final-answer stream is active
(`stream_controller.is_some()`).
This keeps behavior deterministic:
- Prompt is preserved in the queue.
- `on_task_complete()` drains queued input through
`maybe_send_next_queued_input()`.
- Follow-up prompts continue in FIFO order after completion.
## Why this resolves the “dead mode”
The problematic timing window is now converted into queueing, so prompts
entered during final streaming are not lost and are processed after the
current output ends. The model continues handling prompts normally
without requiring `/quit` + `resume`.
## Tests
Added regression coverage in `tui/src/chatwidget/tests.rs`:
- `steer_enter_queues_while_final_answer_stream_is_active`
- `steer_enter_during_final_stream_preserves_follow_up_prompts_in_order`
Both fail on old behavior and pass with this fix.
## Summary
- Replace the `panic!` in `map_owned_wrapped_line_to_range` with a
recoverable flow that skips synthetic leading characters, logs a warning
on mid-line mismatch, and returns the mapped prefix range instead of
crashing
- Fixes a crash when `textwrap` produces owned lines with synthetic
indent prefixes (e.g. non-space indents via
`initial_indent`/`subsequent_indent`)
## Test plan
- [x] Added unit test for direct mismatch recovery
(`map_owned_wrapped_line_to_range_recovers_on_non_prefix_mismatch`)
- [x] Added end-to-end `wrap_ranges` test with non-space indents that
forces owned wrapped lines and validates full source reconstruction
- [x] Verify no regressions in existing `wrapping.rs` tests (`cargo test
-p codex-tui`)
## Why
Tool handlers and runtimes needed to pass the same turn/session context
for shell and non-shell workflows without duplicative ownership churn.
Using shared pointers avoids temporary lifetimes and keeps existing
behavior unchanged while simplifying call sites.
## What changed
- Converted `ToolCtx` to store shared context handles (`Arc`-based),
including updates across shell, apply-patch, and unified-exec paths.
- Updated orchestrator/runtime call sites to consume the shared context
consistently and remove brittle move/borrow patterns.
- Kept behavior unchanged while preparing the type surface for the new
shell escalation integration in the next stack commit.
## Verification
- Validated this commit stack point with `just clippy` and confirmed
workspace compiles cleanly in this stack state.
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12583).
* #12584
* __->__ #12583
* #12556
Bumps [syn](https://github.com/dtolnay/syn) from 2.0.114 to 2.0.117.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/dtolnay/syn/releases">syn's
releases</a>.</em></p>
<blockquote>
<h2>2.0.117</h2>
<ul>
<li>Fix parsing of <code>self::</code> pattern in first function
argument (<a
href="https://redirect.github.com/dtolnay/syn/issues/1970">#1970</a>)</li>
</ul>
<h2>2.0.116</h2>
<ul>
<li>Optimize parse_fn_arg_or_variadic for less lookahead on erroneous
receiver (<a
href="https://redirect.github.com/dtolnay/syn/issues/1968">#1968</a>)</li>
</ul>
<h2>2.0.115</h2>
<ul>
<li>Enable GenericArgument::Constraint parsing in non-full mode (<a
href="https://redirect.github.com/dtolnay/syn/issues/1966">#1966</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="7bcb37cdb3"><code>7bcb37c</code></a>
Release 2.0.117</li>
<li><a
href="9c6e7d3b8d"><code>9c6e7d3</code></a>
Merge pull request <a
href="https://redirect.github.com/dtolnay/syn/issues/1970">#1970</a>
from dtolnay/receiver</li>
<li><a
href="019a84847e"><code>019a848</code></a>
Fix self:: pattern in first function argument</li>
<li><a
href="23f54f3cf6"><code>23f54f3</code></a>
Update test suite to nightly-2026-02-18</li>
<li><a
href="b99b9a627c"><code>b99b9a6</code></a>
Unpin CI miri toolchain</li>
<li><a
href="a62e54a48b"><code>a62e54a</code></a>
Release 2.0.116</li>
<li><a
href="5a8ed9f32e"><code>5a8ed9f</code></a>
Merge pull request <a
href="https://redirect.github.com/dtolnay/syn/issues/1968">#1968</a>
from dtolnay/receiver</li>
<li><a
href="813afcc773"><code>813afcc</code></a>
Optimize parse_fn_arg_or_variadic for less lookahead on erroneous
receiver</li>
<li><a
href="c172150113"><code>c172150</code></a>
Add regression test for issue 1718</li>
<li><a
href="0071ab367c"><code>0071ab3</code></a>
Ignore type_complexity clippy lint</li>
<li>Additional commits viewable in <a
href="https://github.com/dtolnay/syn/compare/2.0.114...2.0.117">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Why
Shell execution refactoring in `exec-server` had become split between
duplicated code paths, which blocked a clean introduction of the new
reusable shell escalation flow. This commit creates a dedicated
foundation crate so later shell tooling changes can share one
implementation.
## What changed
- Added the `codex-shell-escalation` crate and moved the core escalation
pieces (`mcp` protocol/socket/session flow, policy glue) that were
previously in `exec-server` into it.
- Normalized `exec-server` Unix structure under a dedicated `unix`
module layout and kept non-Unix builds narrow.
- Wired crate/build metadata so `shell-escalation` is a first-class
workspace dependency for follow-on integration work.
## Verification
- Built and linted the stack at this commit point with `just clippy`.
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12556).
* #12584
* #12583
* __->__ #12556
## Summary
Improve Bazel CI failure diagnostics by printing the tail of each failed
target’s test.log directly in the GitHub Actions output.
Today, when a large Bazel test target fails (for example tests of
`codex-core`), the workflow often only shows a target-level Exit 101
plus a path to Bazel’s test.log. That makes it hard to see the actual
failing Rust test and panic without digging into artifacts or
reproducing locally.
This change makes the workflow automatically surface that information
inline.
## What Changed
In .github/workflows/bazel.yml:
- Capture Bazel console output via tee
- Preserve the Bazel exit code when piping (PIPESTATUS[0])
- On failure:
- Parse failed Bazel test targets from FAIL: //... lines
- Resolve Bazel test log directory via bazel info bazel-testlogs
- Print tail -n 200 for each failed target’s test.log
- Group each target’s output in GitHub Actions logs (::group::)
## Bonus
Disable `experimental_remote_repo_contents_cache` to prevent "Permission
Denied"
Summary
- mark `output-last-message` as a global exec flag so it can follow
subcommands like `resume`
- add regression tests in both `cli` and `exec` crates verifying the
flag order works when invoking `resume`
Fixes#12538
## Why
The current escalate path in `codex-rs/exec-server` still had policy
creation coupled to MCP details, which makes it hard to reuse the shell
execution flow outside the MCP server. This change is part of a broader
goal to split MCP-specific behavior from shared escalation execution so
other handlers (for example a future `ShellCommandHandler`) can reuse it
without depending on MCP request context types.
## What changed
- Added a new `EscalationPolicyFactory` abstraction in `mcp.rs`:
- `crate`-relative path: `codex-rs/exec-server/src/posix/mcp.rs`
-
https://github.com/openai/codex/blob/main/codex-rs/exec-server/src/posix/mcp.rs#L87-L107
- Made `run_escalate_server` in `mcp.rs` accept a policy factory instead
of constructing `McpEscalationPolicy` directly.
-
https://github.com/openai/codex/blob/main/codex-rs/exec-server/src/posix/mcp.rs#L178-L201
- Introduced `McpEscalationPolicyFactory` that stores MCP-only state
(`RequestContext`, `preserve_program_paths`) and implements the new
trait.
-
https://github.com/openai/codex/blob/main/codex-rs/exec-server/src/posix/mcp.rs#L100-L117
- Updated `shell()` to pass a `McpEscalationPolicyFactory` instance into
`run_escalate_server`, so the server remains the MCP-specific wiring
layer.
-
https://github.com/openai/codex/blob/main/codex-rs/exec-server/src/posix/mcp.rs#L163-L170
## Verification
- Build and test execution was not re-run in this pass; changes are
limited to `mcp.rs` and preserve the existing escalation flow semantics
by only extracting policy construction behind a factory.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12555).
* #12556
* __->__ #12555
## Why
The zsh integration tests were still brittle in two ways:
- they relied on `CODEX_TEST_ZSH_PATH` / environment-specific setup, so
they often did not exercise the patched zsh fork that `shell-tool-mcp`
ships
- once the tests consistently used the vendored zsh fork, they exposed
real Linux-specific zsh-fork issues in CI
In particular, the Linux failures were not just test noise:
- the zsh-fork launch path was dropping `ExecRequest.arg0`, so Linux
`codex-linux-sandbox` arg0 dispatch did not run and zsh wrapper-mode
could receive malformed arguments
- the
`turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
test uses the zsh exec bridge (which talks to the parent over a Unix
socket), but Linux restricted sandbox seccomp denies `connect(2)`,
causing timeouts on `ubuntu-24.04` x86/arm
This PR makes the zsh tests consistently run against the intended
vendored zsh fork and fixes/hardens the zsh-fork path so the Linux CI
signal is meaningful.
## What Changed
- Added a single shared test-only DotSlash file for the patched zsh fork
at `codex-rs/exec-server/tests/suite/zsh` (analogous to the existing
`bash` test resource).
- Updated both app-server and exec-server zsh tests to use that shared
DotSlash zsh (no duplicate zsh DotSlash file, no `CODEX_TEST_ZSH_PATH`
dependency).
- Updated the app-server zsh-fork test helper to resolve the shared
DotSlash zsh and avoid silently falling back to host zsh.
- Kept the app-server zsh-fork tests configured via `config.toml`, using
a test wrapper path where needed to force `zsh -df` (and rewrite `-lc`
to `-c`) for the subcommand-decline test.
- Hardened the app-server subcommand-decline zsh-fork test for CI
variability:
- tolerate an extra `/responses` POST with a no-op mock response
- tolerate non-target approval ordering while remaining strict on the
two `/usr/bin/true` approvals and decline behavior
- use `DangerFullAccess` on Linux for this one test because it validates
zsh approval flow, not Linux sandbox socket restrictions
- Fixed zsh-fork process launching on Linux by preserving `req.arg0` in
`ZshExecBridge::execute_shell_request(...)` so `codex-linux-sandbox`
arg0 dispatch continues to work.
- Moved `maybe_run_zsh_exec_wrapper_mode()` under
`arg0_dispatch_or_else(...)` in `app-server` and `cli` so wrapper-mode
handling coexists correctly with arg0-dispatched helper modes.
- Consolidated duplicated `dotslash -- fetch` resolution logic into
shared test support (`core/tests/common/lib.rs`).
- Updated `codex-rs/exec-server/tests/suite/accept_elicitation.rs` to
use DotSlash zsh and hardened the zsh elicitation test for Bazel/zsh
differences by:
- resolving an absolute `git` path
- running `git init --quiet .`
- asserting success / `.git` creation instead of relying on banner text
## Verification
- `cargo test -p codex-app-server turn_start_zsh_fork -- --nocapture`
- `cargo test -p codex-exec-server accept_elicitation -- --nocapture`
- `bazel test //codex-rs/exec-server:exec-server-all-test
--test_output=streamed --test_arg=--nocapture
--test_arg=accept_elicitation_for_prompt_rule_with_zsh`
- CI (`rust-ci`) on the final cleaned commit: `Tests — ubuntu-24.04 -
x86_64-unknown-linux-gnu` and `Tests — ubuntu-24.04-arm -
aarch64-unknown-linux-gnu` passed in [run
22291424358](https://github.com/openai/codex/actions/runs/22291424358)
## PR Notes
This PR adds a project-scoped `babysit-pr` skill for ongoing PR
monitoring (CI, reviews, mergeability).
Simply invoke this skill after creating a PR, and codex will do its best
to get it to a mergeable state:
### What the skill does
* Fixes CI failures related to the PR
* Retries CI failures due to flaky tests
* Addresses code review comments if it agrees with them
* Addresses merge conflicts on main branch
### How the skill works
- Polls PR status on a loop (CI checks, workflow runs, review activity,
mergeability, and review decision).
- Detects new review feedback (including inline comments and automated
Codex review comments) and prompts/handles follow-up work.
- Distinguishes pending vs failed vs passed CI and identifies likely
flaky failures.
- Can retry failed checks/workflows when appropriate.
- Prioritizes actionable code review feedback over flaky CI retries (to
avoid rerunning CI on a SHA that is about to be replaced).
- Continues monitoring after fixes are applied and pushed, rather than
stopping after a progress update.
- Uses a slower backoff polling cadence once CI is green, while still
watching for new review feedback or state changes.
- Treats required review/approval as a blocking condition and keeps
watching until the PR is actually merge-ready (or merged/closed, or
human intervention is needed).
### Intended outcome
Keep the PR moving with minimal manual babysitting by continuously
watching for CI failures, reviewer feedback, and merge blockers, and
responding in the right order until the PR is ready to merge.
Summary
- map csharp/c-sharp aliases to the existing C# syntax in the highlight
matcher
- ensure the extension list and tests include .cs and the new aliases so
coverage stays accurate
Testing
<img width="543" height="266" alt="image"
src="https://github.com/user-attachments/assets/e6c8a42f-649c-4c30-b574-421b4287534c"
/>
## Summary
- order bundled and custom themes together by name while keeping entries
stable across platforms
- update the theme fixture names and tests to assert case-insensitive
ordering
Alt-d should delete the next word. It didn’t. Now it does. Added a small
test so it stays that way.
Details:
File updated:
[codex-rs/tui/src/bottom_pane/textarea.rs](./codex-rs/tui/src/bottom_pane/textarea.rs)
Test added: delete_forward_word_alt_d — verifies Alt-d deletes the next
word and keeps the cursor position correct.
Solves Issue #12453
Summary
- distinguish exec end handling targets (active tracking, active orphan
history, new cell) so unified exec responses don’t clobber unrelated
exploring cells
- ensure orphan ends flush existing exploring history when complete,
insert standalone history entries, and keep active cells correct
- add regression tests plus a snapshot covering the new behavior and
expose the ExecCell completion result for verification
Fix for https://github.com/openai/codex/issues/12278
---------
Co-authored-by: Josh McKinney <joshka@openai.com>
- keep the per-thread app-server listener alive when the last client
unsubscribes or disconnects
- preserve listener-side active turn history so running `thread/resume`
can merge an in-progress turn snapshot after reconnect
- add `ThreadStateManager` regressions for disconnect/unsubscribe
retention and explicit thread teardown cleanup
Added unit tests, and I manually tested to confirm the fix
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
Adds syntax highlighting to the TUI for fenced code blocks in markdown
responses and file diffs, plus a `/theme` command with live preview and
persistent theme selection. Uses syntect (~250 grammars, 32 bundled
themes, ~1 MB binary cost) — the same engine behind `bat`, `delta`, and
`xi-editor`. Includes guardrails for large inputs, graceful fallback to
plain text, and SSH-aware clipboard integration for the `/copy` command.
<img width="1554" height="1014" alt="image"
src="https://github.com/user-attachments/assets/38737a79-8717-4715-b857-94cf1ba59b85"
/>
<img width="2354" height="1374" alt="image"
src="https://github.com/user-attachments/assets/25d30a00-c487-4af8-9cb6-63b0695a4be7"
/>
## Problem
Code blocks in the TUI (markdown responses and file diffs) render
without syntax highlighting, making it hard to scan code at a glance.
Users also have no way to pick a color theme that matches their terminal
aesthetic.
## Mental model
The highlighting system has three layers:
1. **Syntax engine** (`render::highlight`) -- a thin wrapper around
syntect + two-face. It owns a process-global `SyntaxSet` (~250 grammars)
and a `RwLock<Theme>` that can be swapped at runtime. All public entry
points accept `(code, lang)` and return ratatui `Span`/`Line` vectors or
`None` when the language is unrecognized or the input exceeds safety
guardrails.
2. **Rendering consumers** -- `markdown_render` feeds fenced code blocks
through the engine; `diff_render` highlights Add/Delete content as a
whole file and Update hunks per-hunk (preserving parser state across
hunk lines). Both callers fall back to plain unstyled text when the
engine returns `None`.
3. **Theme lifecycle** -- at startup the config's `tui.theme` is
resolved to a syntect `Theme` via `set_theme_override`. At runtime the
`/theme` picker calls `set_syntax_theme` to swap themes live; on cancel
it restores the snapshot taken at open. On confirm it persists `[tui]
theme = "..."` to config.toml.
## Non-goals
- Inline diff highlighting (word-level change detection within a line).
- Semantic / LSP-backed highlighting.
- Theme authoring tooling; users supply standard `.tmTheme` files.
## Tradeoffs
| Decision | Upside | Downside |
| ------------------------------------------------ |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
-----------------------------------------------------------------------------------------------------------------------
|
| syntect over tree-sitter / arborium | ~1 MB binary increase for ~250
grammars + 32 themes; battle-tested crate powering widely-used tools
(`bat`, `delta`, `xi-editor`). tree-sitter would add ~12 MB for 20-30
languages or ~35 MB for full coverage. | Regex-based; less structurally
accurate than tree-sitter for some languages (e.g. language injections
like JS-in-HTML). |
| Global `RwLock<Theme>` | Enables live `/theme` preview without
threading Theme through every call site | Lock contention risk
(mitigated: reads vastly outnumber writes, single UI thread) |
| Skip background / italic / underline from themes | Terminal BG
preserved, avoids ugly rendering on some themes | Themes that rely on
these properties lose fidelity |
| Guardrails: 512 KB / 10k lines | Prevents pathological stalls on huge
diffs or pastes | Very large files render without color |
## Architecture
```
config.toml ─[tui.theme]─> set_theme_override() ─> THEME (RwLock)
│
┌───────────────────────────────────────────┘
│
markdown_render ─── highlight_code_to_lines(code, lang) ─> Vec<Line>
diff_render ─── highlight_code_to_styled_spans(code, lang) ─> Option<Vec<Vec<Span>>>
│
│ (None ⇒ plain text fallback)
│
/theme picker ─── set_syntax_theme(theme) // live preview swap
─── current_syntax_theme() // snapshot for cancel
─── resolve_theme_by_name(name) // lookup by kebab-case
```
Key files:
- `tui/src/render/highlight.rs` -- engine, theme management, guardrails
- `tui/src/diff_render.rs` -- syntax-aware diff line wrapping
- `tui/src/theme_picker.rs` -- `/theme` command builder
- `tui/src/bottom_pane/list_selection_view.rs` -- side content panel,
callbacks
- `core/src/config/types.rs` -- `Tui::theme` field
- `core/src/config/edit.rs` -- `syntax_theme_edit()` helper
## Observability
- `tracing::warn` when a configured theme name cannot be resolved.
- `Config::startup_warnings` surfaces the same message as a TUI banner.
- `tracing::error` when persisting theme selection fails.
## Tests
- Unit tests in `highlight.rs`: language coverage, fallback behavior,
CRLF stripping, style conversion, guardrail enforcement, theme name
mapping exhaustiveness.
- Unit tests in `diff_render.rs`: snapshot gallery at multiple terminal
sizes (80x24, 94x35, 120x40), syntax-highlighted wrapping, large-diff
guardrail, rename-to-different-extension highlighting, parser state
preservation across hunk lines.
- Unit tests in `theme_picker.rs`: preview rendering (wide + narrow),
dim overlay on deletions, subtitle truncation, cancel-restore, fallback
for unavailable configured theme.
- Unit tests in `list_selection_view.rs`: side layout geometry, stacked
fallback, buffer clearing, cancel/selection-changed callbacks.
- Integration test in `lib.rs`: theme warning uses the final
(post-resume) config.
## Cargo Deny: Unmaintained Dependency Exceptions
This PR adds two `cargo deny` advisory exceptions for transitive
dependencies pulled in by `syntect v5.3.0`:
| Advisory | Crate | Status |
|----------|-------|--------|
| RUSTSEC-2024-0320 | `yaml-rust` | Unmaintained (maintainer
unreachable) |
| RUSTSEC-2025-0141 | `bincode` | Unmaintained (development ceased;
v1.3.3 considered complete) |
**Why this is safe in our usage:**
- Neither advisory describes a known security vulnerability. Both are
"unmaintained" notices only.
- `bincode` is used by syntect to deserialize pre-compiled syntax sets.
Again, these are **static vendored artifacts** baked into the binary at
build time. No user-supplied bincode data is ever deserialized. - Attack
surface is zero for both crates; exploitation would require a
supply-chain compromise of our own build artifacts.
- These exceptions can be removed when syntect migrates to `yaml-rust2`
and drops `bincode`, or when alternative crates are available upstream.
robust to Nix shell paths (#12476)
## Summary
- Updated `codex-rs/core/src/shell.rs` tests for shell detection to stop
asserting hardcoded shell paths.
- `detects_bash` and `detects_sh` now assert executable basenames
(`bash`, `sh`) rather than `/bin/*`/`/usr/bin/*` absolute paths.
- This keeps behavior the same while avoiding failures in Nix
environments where shells are resolved from `/nix/store/.../bin`.
## Testing
- `nix develop .#default --command sh -lc 'export
PKG_CONFIG_PATH=/nix/store/6az1q591wwlgazzskngr6rl7gmhpyvnc-libcap-2.77-dev/lib/pkgconfig:/nix/store/fgm3pz8486ksh3f94629lpb7xjr2wjp7-openssl-3.6.0-dev/lib/pkgconfig:$PKG_CONFIG_PATH;
export PKG_CONFIG_PATH_FOR_TARGET=$PKG_CONFIG_PATH; cd
/home/alex/workspace/openai/codex/codex-rs && cargo test -p codex-core
--lib detects_bash && cargo test -p codex-core --lib detects_sh'`
## Why
The two failing tests previously hardcoded fixed paths and failed under
the Nix shell due to Nix-provided shell binary locations.
## Links
- Bug report / enhancement request: not publicly filed yet; this was
reproduced in the local Nix environment.
## Why
`codex-core::all` has a flaky test,
`suite::realtime_conversation::conversation_start_audio_text_close_round_trip`,
that assumes a fixed ordering between `conversation.item.create` and
`response.input_audio.delta` requests.
That ordering is not guaranteed: realtime text and audio input are
forwarded through separate queues and a background task, so either
request can be observed first while still being correct behavior.
## What Changed
- Updated the assertion in
`codex-rs/core/tests/suite/realtime_conversation.rs` to compare the two
observed request types order-independently.
- Kept the existing checks that `session.create` is sent first and that
exactly two follow-up requests are recorded.
## Verification
- Re-ran `cargo test -p codex-core --test all
conversation_start_audio_text_close_round_trip` 10 times locally.
## Problem
Long URLs containing `/` and `-` characters are split across multiple
terminal lines by `textwrap`'s default hyphenation rules. This breaks
terminal link detection: emulators can no longer identify the URL as
clickable, and copy-paste yields a truncated fragment. The issue affects
every view that renders user or agent text — exec output, history cells,
markdown, the app-link setup screen, and the VT100 scrollback path.
A secondary bug compounds the first: `desired_height()` calculations
count logical lines rather than viewport rows. When a URL overflows its
line and wraps visually, the height budget is too small, causing content
to clip or leave gaps.
Here is how the complete URL is interpreted by the terminal before
(first line only) and after (complete URL):
| Before | After |
|---|---|
| <img width="777" height="1002" alt="Screenshot 2026-02-17 at 7 59 11
PM"
src="https://github.com/user-attachments/assets/193a89a0-7e56-49c5-8b76-53499a76e7e3"
/> | <img width="777" height="1002" alt="Screenshot 2026-02-17 at 7 58
40 PM"
src="https://github.com/user-attachments/assets/0b9b4c14-aafb-439f-9ffe-f6bba556f95e"
/> |
## Mental model
The TUI now treats URL-like tokens as atomic units that must never be
split by the wrapping engine. Every call site that previously used
`word_wrap_*` has been migrated to `adaptive_wrap_*`, which inspects
each line for URL-like tokens and switches wrapping strategy
accordingly:
- **Non-URL lines** follow the existing `textwrap` path unchanged (word
boundaries, optional indentation, hyphenation).
- **URL-only lines** (with at most decorative markers like `│`, `-`,
`1.`) are emitted unwrapped so terminal link detection works; ratatui's
`Wrap { trim: false }` handles the final character wrap at render time.
- **Mixed lines** (URL + substantive non-URL prose) flow through
`adaptive_wrap_line` so prose wraps naturally at word boundaries while
URL tokens remain unsplit.
Height measurement everywhere now delegates to
`Paragraph::line_count(width)`, which accounts for the visual row cost
of overflowed lines. This single source of truth replaces ad-hoc line
counting in individual cells.
For terminal scrollback (the VT100 path that prints history when the TUI
exits), URL-only lines are emitted unwrapped so the terminal's own link
detector can find them. Mixed URL+prose lines use adaptive wrapping so
surrounding text wraps naturally. Continuation rows are pre-cleared to
avoid stale content artifacts.
## Non-goals
- Full RFC 3986 URL parsing. The detector is a conservative heuristic
that covers `scheme://host`, bare domains (`example.com/path`),
`localhost:port`, and IPv4 hosts. IPv6 (`[::1]:8080`) and exotic schemes
are intentionally excluded from v1.
- Changing wrapping behavior for non-URL content.
- Reflowing or reformatting existing terminal scrollback on resize.
## Tradeoffs
| Decision | Upside | Downside |
|----------|--------|----------|
| Heuristic URL detection vs. full parser | Fast, zero-alloc on the hot
path; conservative enough to reject file paths like `src/main.rs` |
False negatives on obscure URL formats (they get split as before) |
| Adaptive (three-path) wrapping | Non-URL lines are untouched — no
behavior change, no perf cost; mixed lines wrap prose naturally while
preserving URLs | Three wrapping strategies to reason about when
debugging layout |
| Row-based truncation with line-unit ellipsis | Accurate viewport
budget; stable "N lines omitted" count across terminal widths |
`truncate_lines_middle` is more complex (must compute per-line row cost)
|
| Unwrapped URL-only lines in scrollback | Terminal emulators detect
clickable links; copy-paste gets the full URL | TUI and scrollback
formatting diverge for URL-only lines |
| Default `desired_height` via `Paragraph::line_count` | DRY — most
cells inherit correct measurement | Cells with custom layout must
remember to override |
## Architecture
```mermaid
flowchart TD
A["adaptive_wrap_*()"] --> B{"line_contains_url_like?"}
B -- No URL tokens --> C["word_wrap_line<br/>(textwrap default)"]
B -- Has URL tokens --> D{"mixed URL + prose?"}
D -- "URL-only<br/>(+ decorative markers)" --> E["emit unwrapped<br/>(terminal char-wraps)"]
D -- "Mixed<br/>(URL + substantive text)" --> F["adaptive_wrap_line<br/>(AsciiSpace + custom WordSplitter)"]
C --> G["Paragraph::line_count(w)<br/>(single height truth)"]
E --> G
F --> G
```
**Changed files:**
| File | Role |
|------|------|
| `wrapping.rs` | URL detection heuristics, mixed-line detection,
`adaptive_wrap_*` functions, custom `WordSplitter` |
| `exec_cell/render.rs` | Row-aware `truncate_lines_middle`, adaptive
wrapping for command/output display |
| `history_cell.rs` | Migrate all cell types to `adaptive_wrap_*`;
default `desired_height` via `Paragraph::line_count` |
| `insert_history.rs` | Three-path scrollback wrapping (unwrapped
URL-only, adaptive mixed, word-wrapped text); continuation row clearing
|
| `app_link_view.rs` | Adaptive wrapping for setup URL; `desired_height`
via `Paragraph::line_count` |
| `markdown_render.rs` | Adaptive wrapping in `finish_paragraph` |
| `model_migration.rs` | Viewport-aware wrapping for narrow-pane
markdown |
| `pager_overlay.rs` | `Wrap { trim: false }` for transcript and
streaming chunks |
| `queued_user_messages.rs` | Migrate to `adaptive_wrap_lines` |
| `status/card.rs` | Migrate to `adaptive_wrap_lines` |
## Observability
- **Ellipsis message** in truncated exec output reports omitted count in
logical lines (stable across resize) rather than viewport rows
(fluctuates).
- URL detection is deterministic and stateless — no hidden caching or
memoization to go stale.
- Height mismatch bugs surface immediately as visual clipping or gaps;
the `Paragraph::line_count` path is the same code ratatui uses at render
time, so measurement and rendering cannot diverge.
## Tests
26 new unit tests across 7 files, covering:
- **URL integrity**: assert a URL-like token appears on exactly one
rendered line (not split across two).
- **Height accuracy**: compare `desired_height()` against
`Paragraph::line_count()` for URL-containing content.
- **Row-aware truncation**: verify ellipsis counts logical lines and
output fits within the row budget.
- **Scrollback rendering**: VT100 backend tests confirm prefix and URL
land on the same row; continuation rows are cleared; mixed URL+prose
lines wrap prose while preserving URL tokens.
- **Mixed URL+prose detection**: `line_has_mixed_url_and_non_url_tokens`
correctly distinguishes lines with substantive non-URL text from lines
with only decorative markers alongside a URL.
- **Heuristic correctness**: positive matches (`https://...`,
`example.com/path`, `localhost:3000/api`, `192.168.1.1:8080/health`) and
negative matches (`src/main.rs`, `foo/bar`, `hello-world`).
## Risks and open items
1. **URL-like tokens in code output** (e.g. `example.com/api` inside a
JSON blob) will trigger URL-preserving wrap on that line. This is
acceptable — the worst case is a slightly wider line, not broken output.
2. **Very long non-URL tokens on a URL line** can only break at
character boundaries (the custom splitter emits all char indices for
non-URL words). On extremely narrow terminals this could overflow, but
narrow terminals already degrade gracefully.
3. **No IPv6 support** — `[::1]:8080/path` will be treated as a non-URL
and may get split. Can be added later without API changes.
Fixes#5457
Fixes#11845.
Adjust context/token estimation for inline image `data:*;base64,...`
URLs so we
do not count the raw base64 payload as model-visible text.
What changed:
- keep the existing JSON-length estimator as the baseline
- detect only inline base64 `data:` image URLs in message and
function-call
output content items
- subtract only the base64 payload bytes (preserving data URL prefix +
JSON
overhead)
- add a fixed per-image estimate of 340 bytes (~85 tokens at the repo’s
4-bytes/token heuristic)
This avoids large overestimates from MCP image tool outputs while
leaving normal
image URLs (`https://`, `file://`, non-base64 `data:` URLs) unchanged.
Tests:
- message image data URL estimate regression
- function-call output image data URL estimate regression
- non-base64 image URLs unchanged
- non-base64 `data:` URLs unchanged
- `data:application/octet-stream;base64,...` adjusted
- multiple inline images apply multiple fixed costs
- text-only items unchanged
Fixes#11852
Resume replay was applying transient runtime events (`TurnStarted`,
`StreamError`) as if they were live, which could leave the TUI stuck in
a stale `Working` / `Reconnecting...` state after resuming an
interrupted reconnect.
This change makes replay transcript-oriented for these events by:
- skipping retry-status restoration for replayed non-stream events
- ignoring replayed `TurnStarted` for task-running state
- ignoring replayed `StreamError` for reconnect/status UI
Also adds TUI regression tests and snapshot coverage for the interrupted
reconnect replay case.
## Why
`codex-core` was carrying the embedded system-skill sample assets (and a
`build.rs` that walks those files to register rerun triggers). Those
assets change infrequently, but any change under `codex-core` still ties
them to `codex-core`'s build/cache lifecycle.
This change moves the embedded system-skills packaging into a dedicated
`codex-skills` crate so it can be cached independently. That reduces
unnecessary invalidation/rebuild pressure on `codex-core` when the
skills bundle is the only thing that changes.
## What Changed
- Added a new `codex-rs/skills` crate (`codex-skills`) with:
- `Cargo.toml`
- `BUILD.bazel`
- `build.rs` to track skill asset file changes for Cargo rebuilds
- `src/lib.rs` containing the embedded system-skills install/cache logic
previously in `codex-core`
- Moved the embedded sample skill assets from
`codex-rs/core/src/skills/assets/samples` to
`codex-rs/skills/src/assets/samples`.
- Updated `codex-rs/core/Cargo.toml` to depend on `codex-skills` and
removed `codex-core`'s direct `include_dir` dependency.
- Removed `codex-core`'s `build.rs`.
- Replaced `codex-rs/core/src/skills/system.rs` implementation with a
thin re-export wrapper to keep existing `codex-core` call sites
unchanged.
- Updated workspace manifests/lockfile (`codex-rs/Cargo.toml`,
`codex-rs/Cargo.lock`) for the new crate.
## Why
`codex-rs/arg0` only needed two things from `codex-core`:
- the `find_codex_home()` wrapper
- the special argv flag used for the internal `apply_patch`
self-invocation path
That made `codex-arg0` depend on `codex-core` for a very small surface
area. This change removes that dependency edge and moves the shared
`apply_patch` invocation flag to a more natural boundary
(`codex-apply-patch`) while keeping the contract explicitly documented.
## What Changed
- Moved the internal `apply_patch` argv[1] flag constant out of
`codex-core` and into `codex-apply-patch`.
- Renamed the constant to `CODEX_CORE_APPLY_PATCH_ARG1` and documented
that it is part of the Codex core process-invocation contract (even
though it now lives in `codex-apply-patch`).
- Updated `arg0`, the core apply-patch runtime, and the `codex-exec`
apply-patch test to import the constant from `codex-apply-patch`.
- Updated `codex-rs/arg0` to call
`codex_utils_home_dir::find_codex_home()` directly instead of
`codex_core::config::find_codex_home()`.
- Removed the `codex-core` dependency from `codex-rs/arg0` and added the
needed direct dependency on `codex-utils-home-dir`.
- Added `codex-apply-patch` as a dev-dependency for `codex-rs/exec`
tests (the apply-patch test now imports the moved constant directly).
## Verification
- `cargo test -p codex-apply-patch`
- `cargo test -p codex-arg0`
- `cargo test -p codex-core --lib apply_patch`
- `cargo test -p codex-exec
test_standalone_exec_cli_can_use_apply_patch`
- `cargo shear`
## Why
`codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
from `codex-protocol` and `codex-shell-command`. That made it easy for
workspace crates to import those APIs through `codex-core`, which in
turn hides dependency edges and makes it harder to reduce compile-time
coupling over time.
This change removes those public re-exports so call sites must import
from the source crates directly. Even when a crate still depends on
`codex-core` today, this makes dependency boundaries explicit and
unblocks future work to drop `codex-core` dependencies where possible.
## What Changed
- Removed public re-exports from `codex-rs/core/src/lib.rs` for:
- `codex_protocol::protocol` and related protocol/model types (including
`InitialHistory`)
- `codex_protocol::config_types` (`protocol_config_types`)
- `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
parse_command, powershell}`
- Migrated workspace Rust call sites to import directly from:
- `codex_protocol::protocol`
- `codex_protocol::config_types`
- `codex_protocol::models`
- `codex_shell_command`
- Added explicit `Cargo.toml` dependencies (`codex-protocol` /
`codex-shell-command`) in crates that now import those crates directly.
- Kept `codex-core` internal modules compiling by using `pub(crate)`
aliases in `core/src/lib.rs` (internal-only, not part of the public
API).
- Updated the two utility crates that can already drop a `codex-core`
dependency edge entirely:
- `codex-utils-approval-presets`
- `codex-utils-cli`
## Verification
- `cargo test -p codex-utils-approval-presets`
- `cargo test -p codex-utils-cli`
- `cargo check --workspace --all-targets`
- `just clippy`
## Why
Compiling `codex-rs/core` is a bottleneck for local iteration, so this
change continues the ongoing extraction of config-related functionality
out of `codex-core` and into `codex-config`.
The goal is not just to move code, but to reduce `codex-core` ownership
and indirection so more code depends on `codex-config` directly.
## What Changed
- Moved config diagnostics logic from
`core/src/config_loader/diagnostics.rs` into
`config/src/diagnostics.rs`.
- Updated `codex-core` to use `codex-config` diagnostics types/functions
directly where possible.
- Removed the `core/src/config_loader/diagnostics.rs` shim module
entirely; the remaining `ConfigToml`-specific calls are in
`core/src/config_loader/mod.rs`.
- Moved `CONFIG_TOML_FILE` into `codex-config` and updated existing
references to use `codex_config::CONFIG_TOML_FILE` directly.
- Added a direct `codex-config` dependency to `codex-cli` for its
`CONFIG_TOML_FILE` use.
## Summary
- move regular-turn context diff/full-context persistence into
`run_turn` so pre-turn compaction runs before incoming context updates
are recorded
- after successful pre-turn compaction, rely on a cleared
`reference_context_item` to trigger full context reinjection on the
follow-up regular turn (manual `/compact` keeps replacement history
summary-only and also clears the baseline)
- preserve `<model_switch>` when full context is reinjected, and inject
it *before* the rest of the full-context items
- scope `reference_context_item` and `previous_model` to regular user
turns only so standalone tasks (`/compact`, shell, review, undo) cannot
suppress future reinjection or `<model_switch>` behavior
- make context-diff persistence + `reference_context_item` updates
explicit in the regular-turn path, with clearer docs/comments around the
invariant
- stop persisting local `/compact` `RolloutItem::TurnContext` snapshots
(only regular turns persist `TurnContextItem` now)
- simplify resume/fork previous-model/reference-baseline hydration by
looking up the last surviving turn context from rollout lifecycle
events, including rollback and compaction-crossing handling
- remove the legacy fallback that guessed from bare `TurnContext`
rollouts without lifecycle events
- update compaction/remote-compaction/model-visible snapshots and
compact test assertions (including remote compaction mock response
shape)
## Why
We were persisting incoming context items before spawning the regular
turn task, which let pre-turn compaction requests accidentally include
incoming context diffs without the new user message. Fixing that exposed
follow-on baseline issues around `/compact`, resume/fork, and standalone
tasks that could cause duplicate context injection or suppress
`<model_switch>` instructions.
This PR re-centers the invariants around regular turns:
- regular turns persist model-visible context diffs/full reinjection and
update the `reference_context_item`
- standalone tasks do not advance those regular-turn baselines
- compaction clears the baseline when replacement history may have
stripped the referenced context diffs
## Follow-ups (TODOs left in code)
- `TODO(ccunningham)`: fix rollback/backtracking baseline handling more
comprehensively
- `TODO(ccunningham)`: include pending incoming context items in
pre-turn compaction threshold estimation
- `TODO(ccunningham)`: inject updated personality spec alongside
`<model_switch>` so some model-switch paths can avoid forced full
reinjection
- `TODO(ccunningham)`: review task turn lifecycle
(`TurnStarted`/`TurnComplete`) behavior and emit task-start context
diffs for task types that should have them (excluding `/compact`)
## Validation
- `just fmt`
- CI should cover the updated compaction/resume/model-visible snapshot
expectations and rollout-hydration behavior
- I did **not** rerun the full local test suite after the latest
resume-lookup / rollout-persistence simplifications
## Why
Developers are frequently running low on disk space, and routine use of
`--all-features` contributes to larger Cargo build caches in `target/`
by compiling additional feature combinations.
This change updates local workflow guidance to avoid `--all-features` by
default and reserve it for cases where full feature coverage is
specifically needed.
## What Changed
- Updated `AGENTS.md` guidance for `codex-rs` to recommend `cargo test`
/ `just test` for full-suite local runs, and to call out the disk-usage
cost of routine `--all-features` usage.
- Updated the root `justfile` so `just fix` and `just clippy` no longer
pass `--all-features` by default.
- Updated `docs/install.md` to explicitly describe `cargo test
--all-features` as an optional heavier-weight run (more build time and
`target/` disk usage).
## Verification
- Confirmed the `justfile` parses and the recipes list successfully with
`just --list`.
## Summary
`gpt-5.3-codex` really likes to write complicated shell scripts, and
suggest a partial prefix_rule that wouldn't actually approve the
command. We should only show the `prefix_rule` suggestion from the model
if it would actually fully approve the command the user is seeing.
This will technically cause more instances of overly-specific
suggestions when we fallback, but I think the UX is clearer,
particularly when the model doesn't necessarily understand the current
limitations of execpolicy parsing.
## Testing
- [x] Add unit tests
- [x] Add integration tests
## Why
The generated unnamespaced JSON envelope schemas (`ClientRequest` and
`ServerNotification`) still contained both v1 and v2 variants, which
pulled legacy v1/core types and v2 types into the same `definitions`
graph. That caused `schemars` to produce numeric suffix names (for
example `AskForApproval2`, `ByteRange2`, `MessagePhase2`).
This PR moves JSON codegen toward v2-only output while preserving the
unnamespaced envelope artifacts, and avoids reintroducing numeric-suffix
tolerance by removing the v1/internal-only variants that caused the
collisions in those envelope schemas.
## What Changed
- In `codex-rs/app-server-protocol/src/export.rs`, JSON generation now
excludes v1 schema artifacts (`v1/*`) while continuing to emit
unnamespaced/root JSON schemas and the JSON bundle.
- Added a narrow JSON v1 allowlist (`JSON_V1_ALLOWLIST`) so
`InitializeParams` and `InitializeResponse` are still emitted.
- Added JSON-only post-processing for the mixed envelope schemas before
collision checks run:
- `ClientRequest`: strips v1 request variants from the generated `oneOf`
using the temporary `V1_CLIENT_REQUEST_METHODS` list
- `ServerNotification`: strips v1 notifications plus the internal-only
`rawResponseItem/completed` notification using the temporary
`EXCLUDED_SERVER_NOTIFICATION_METHODS_FOR_JSON` list
- Added a temporary local-definition pruning pass for those envelope
schemas so now-unreferenced v1/core definitions are removed from
`definitions` after method filtering.
- Updated the variant-title naming heuristic for single-property literal
object variants to use the literal value (when available), avoiding
collisions like multiple `state`-only variants all deriving the same
title.
- Collision handling remains fail-fast (no numeric suffix fallback map
in this PR path).
## Verification
- `just write-app-server-schema`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12408).
* __->__ #12408
* #12406
## Summary
- switch a few app-server `turn_start` tests from
`codex/event/task_complete` waits to `turn/completed` waits
- avoid matching unrelated/background `task_complete` events
- keep this flaky test fix separate from the /title feature PR
## Why
On Windows ARM CI, these tests can return early after observing a
generic `codex/event/task_complete` notification from another task. That
can leave the mock Responses server with fewer calls than expected and
fail the test with a wiremock verification mismatch.
Using `turn/completed` matches the app-server turn lifecycle
notification the tests actually care about.
## Validation
- `cargo test -p codex-app-server
turn_start_updates_sandbox_and_cwd_between_turns_v2 -- --nocapture`
- `cargo test -p codex-app-server turn_start_exec_approval_ --
--nocapture`
- `just fmt`
https://github.com/openai/codex/pull/10455 introduced the `phase` field,
and then https://github.com/openai/codex/pull/12072 introduced a
`MessagePhase` type in `v2.rs` that paralleled the `MessagePhase` type
in `codex-rs/protocol/src/models.rs`.
The app server protocol prefers `camelCase` while the Responses API uses
`snake_case`, so this meant we had two versions of `MessagePhase` with
different serialization rules. When the app server protocol refers to
types from the Responses API, we use the wire format of the the
Responses API even though it is inconsistent with the app server API.
This PR deletes `MessagePhase` from `v2.rs` and consolidates on the
Responses API version to eliminate confusion.
## Why
`thread/resume` responses for already-running threads can be reported as
`Idle` even while a turn is still in progress. This is caused by a
timing window where the runtime watch state has not yet observed the
running-thread transition, so API clients can receive stale status
information at resume time.
Possibly related: https://github.com/openai/codex/pull/11786
## What
- Add a shared status normalization helper, `resolve_thread_status`, in
`codex-rs/app-server/src/thread_status.rs` that resolves
`Idle`/`NotLoaded` to `Active { active_flags: [] }` when an in-progress
turn is known.
- Reuse this helper across thread response paths in
`codex-rs/app-server/src/codex_message_processor.rs` (including
`thread/start`, `thread/unarchive`, `thread/read`, `thread/resume`,
`thread/fork`, and review/thread-started notification responses).
- In `handle_pending_thread_resume_request`, use both the in-memory
`active_turn_snapshot` and the resumed rollout turns to decide whether a
turn is in progress before resolving thread status for the response.
- Extend `thread_status` tests to validate the new status-resolution
behavior directly.
## Verification
- `cargo test -p codex-app-server
suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch`
- add top-level `experimental_realtime_ws_backend_prompt` config key
(experimental / do not use) and include it in config schema
- apply the override only to `Op::RealtimeConversation` websocket
`backend_prompt`, with config + realtime tests
Addresses https://github.com/openai/codex/issues/11013
## Summary
- add a Plan implementation path in the TUI that lets users choose
reasoning before switching to Default mode and implementing
- add Plan-mode reasoning scope handling (Plan-only override vs
all-modes default), including config/schema/docs plumbing for
`plan_mode_reasoning_effort`
- remove the hardcoded Plan preset medium default and make the reasoning
popup reflect the active Plan override as `(current)`
- split the collaboration-mode switch notification UI hint into #12307
to keep this diff focused
If I have `plan_mode_reasoning_effort = "medium"` set in my
`config.toml`:
<img width="699" height="127" alt="Screenshot 2026-02-20 at 6 59 37 PM"
src="https://github.com/user-attachments/assets/b33abf04-6b7a-49ed-b2e9-d24b99795369"
/>
If I don't have `plan_mode_reasoning_effort` set in my `config.toml`:
<img width="704" height="129" alt="Screenshot 2026-02-20 at 7 01 51 PM"
src="https://github.com/user-attachments/assets/88a086d4-d2f1-49c7-8be4-f6f0c0fa1b8d"
/>
## Codex author
`codex resume 019c78a2-726b-7fe3-adac-3fa4523dcc2a`
- add top-level `experimental_realtime_ws_base_url` config key
(experimental / do not use) and include it in config schema
- apply the override only to `Op::RealtimeConversation` websocket
transport, with config + realtime tests
commit 923f931121 introduced a dependency
on `libcap`. This PR fixes the nix build by including `libcap` in nix's
build inputs
issue number: #12102. @etraut-openai gave me permission to open pr
Testing:
running `nix run .#codex-rs` works on both macos (aarch64) and nixos
(x86-64)
- Introduce `RealtimeConversationManager` for realtime API management
- Add `op::conversation` to start conversation, insert audio, insert
text, and close conversation.
- emit conversation lifecycle and realtime events.
- Move shared realtime payload types into codex-protocol and add core
e2e websocket tests for start/replace/transport-close paths.
Things to consider:
- Should we use the same `op::` and `Events` channel to carry audio? I
think we should try this simple approach and later we can create
separate one if the channels got congested.
- Sending text updates to the client: we can start simple and later
restrict that.
- Provider auth isn't wired for now intentionally
Exposes through the app server updated names set for a thread. This
enables other surfaces to use the core as the source of truth for thread
naming. `threadName` is gathered using the helper functions used to
interact with `session_index.jsonl`, and is hydrated in:
- `thread/list`
- `thread/read`
- `thread/resume`
- `thread/unarchive`
- `thread/rollback`
We don't do this for `thread/start` and `thread/fork`.
## Why
JSON schema codegen was silently resolving naming collisions by
appending numeric suffixes (for example `...2`, `...3`). That makes the
generated schema names unstable: removing an earlier colliding type can
cause a later type to be renumbered, which is a breaking change for
consumers that referenced the old generated name.
This PR makes those collisions explicit and reviewable.
Though note that once we remove `v1` from the codegen, we will no longer
support naming collisions. Or rather, naming collisions will have to be
handled explicitly rather than the numeric suffix approach.
## What Changed
- In `codex-rs/app-server-protocol/src/export.rs`, replaced implicit
numeric suffix collision handling for generated variant titles with
explicit special-case maps.
- Added a panic when a collision occurs without an entry in the map, so
new collisions fail loudly instead of silently renaming generated schema
types.
- Added the currently required special cases so existing generated names
remain stable.
- Extended the same approach to numbered `definitions` / `$defs`
collisions (for example `MessagePhase2`-style names) so those are also
explicitly tracked.
## Verification
- Ran targeted generator-path test:
- `cargo test -p codex-app-server-protocol
generate_json_filters_experimental_fields_and_methods -- --nocapture`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12406).
* #12408
* __->__ #12406
## Problem
The TUI's "edit queued message" shortcut (Alt+Up) is either silently
swallowed or recognized as another key combination by Apple Terminal,
Warp, and VSCode's integrated terminal on macOS. Users in those
environments see the hint but pressing the keys does nothing.
## Mental model
When a model turn is in progress the user can still type follow-up
messages. These are queued and displayed below the composer with a hint
line showing how to pop the most recent one back into the editor. The
hint text and the actual key handler must agree on which shortcut is
used, and that shortcut must actually reach the TUI—i.e. it must not be
intercepted by the host terminal.
Three terminals are known to intercept Alt+Up: Apple Terminal (remaps it
to cursor movement), Warp (consumes it for its own command palette), and
VSCode (maps it to "move line up"). For these we use Shift+Left instead.
<p align="center">
<img width="283" height="182" alt="image"
src="https://github.com/user-attachments/assets/4a9c5d13-6e47-4157-bb41-28b4ce96a914"
/>
</p>
| macOS Native Terminal | Warp | VSCode Terminal |
|---|---|---|
| <img width="1557" height="1010" alt="SCR-20260219-kigi"
src="https://github.com/user-attachments/assets/f4ff52f8-119e-407b-a3f3-52f564c36d70"
/> | <img width="1479" height="1261" alt="SCR-20260219-krrf"
src="https://github.com/user-attachments/assets/5807d7c4-17ae-4a2b-aa27-238fd49d90fd"
/> | <img width="1612" height="1312" alt="SCR-20260219-ksbz"
src="https://github.com/user-attachments/assets/1cedb895-6966-4d63-ac5f-0eea0f7057e8"
/> |
## Non-goals
- Making the binding user-configurable at runtime (deferred to a broader
keybinding-config effort).
- Remapping any other shortcuts that might be terminal-specific.
## Tradeoffs
- **Exhaustive match instead of a wildcard default.** The
`queued_message_edit_binding_for_terminal` function explicitly lists
every `TerminalName` variant. This is intentional: adding a new terminal
to the enum will produce a compile error, forcing the author to decide
which binding that terminal should use.
- **Binding lives on `ChatWidget`, hint lives on `QueuedUserMessages`.**
The key event handler that actually acts on the press is in
`ChatWidget`, but the rendered hint text is inside `QueuedUserMessages`.
These are kept in sync by `ChatWidget` calling
`bottom_pane.set_queued_message_edit_binding(self.queued_message_edit_binding)`
during construction. A mismatch would show the wrong hint but would not
lose data.
## Architecture
```mermaid
graph TD
TI["terminal_info().name"] --> FN["queued_message_edit_binding_for_terminal(name)"]
FN --> KB["KeyBinding"]
KB --> CW["ChatWidget.queued_message_edit_binding<br/><i>key event matching</i>"]
KB --> BP["BottomPane.set_queued_message_edit_binding()"]
BP --> QUM["QueuedUserMessages.edit_binding<br/><i>rendered in hint line</i>"]
subgraph "Special terminals (Shift+Left)"
AT["Apple Terminal"]
WT["Warp"]
VS["VSCode"]
end
subgraph "Default (Alt+Up)"
GH["Ghostty"]
IT["iTerm2"]
OT["Others…"]
end
AT --> FN
WT --> FN
VS --> FN
GH --> FN
IT --> FN
OT --> FN
```
No new crates or public API surface. The only cross-crate dependency
added is `codex_core::terminal::{TerminalName, terminal_info}`, which
already existed for telemetry.
## Observability
No new logging. Terminal detection already emits a `tracing::debug!` log
line at startup with the detected terminal name, which is sufficient to
diagnose binding mismatches.
## Tests
- Existing `alt_up_edits_most_recent_queued_message` test is preserved
and explicitly sets the Alt+Up binding to isolate from the host
terminal.
- New parameterized async tests verify Shift+Left works for Apple
Terminal, Warp, and VSCode.
- A sync unit test asserts the mapping table covers the three special
terminals (Shift+Left) and that iTerm2 still gets Alt+Up.
Fixes#4490
## Summary
- show an info message when switching collaboration modes changes the
effective model or reasoning
- include the target mode in the message (for example `... for Plan
mode.`)
- add TUI tests for model-change and reasoning-only change notifications
on mode switch
<img width="715" height="184" alt="Screenshot 2026-02-20 at 2 01 40 PM"
src="https://github.com/user-attachments/assets/18d1beb3-ab87-4e1c-9ada-a10218520420"
/>
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
description: Babysit a GitHub pull request after creation by continuously polling CI checks/workflow runs, new review comments, and mergeability state until the PR is ready to merge (or merged/closed). Diagnose failures, retry likely flaky failures up to 3 times, auto-fix/push branch-related issues when appropriate, and stop only when user help is required (for example CI infrastructure issues, exhausted flaky retries, or ambiguous/blocking situations). Use when the user asks Codex to monitor a PR, watch CI, handle review comments, or keep an eye on failures and feedback on an open PR.
---
# PR Babysitter
## Objective
Babysit a PR persistently until one of these terminal outcomes occurs:
- The PR is merged or closed.
- CI is successful, there are no unaddressed review comments surfaced by the watcher, required review approval is not blocking merge, and there are no potential merge conflicts (PR is mergeable / not reporting conflict risk).
- A situation requires user help (for example CI infrastructure issues, repeated flaky failures after retry budget is exhausted, permission problems, or ambiguity that cannot be resolved safely).
Do not stop merely because a single snapshot returns `idle` while checks are still pending.
## Inputs
Accept any of the following:
- No PR argument: infer the PR from the current branch (`--pr auto`)
- PR number
- PR URL
## Core Workflow
1. When the user asks to "monitor"/"watch"/"babysit" a PR, start with the watcher's continuous mode (`--watch`) unless you are intentionally doing a one-shot diagnostic snapshot.
2. Run the watcher script to snapshot PR/CI/review state (or consume each streamed snapshot from `--watch`).
3. Inspect the `actions` list in the JSON response.
4. If `diagnose_ci_failure` is present, inspect failed run logs and classify the failure.
5. If the failure is likely caused by the current branch, patch code locally, commit, and push.
6. If `process_review_comment` is present, inspect surfaced review items and decide whether to address them.
7. If a review item is actionable and correct, patch code locally, commit, and push.
8. If the failure is likely flaky/unrelated and `retry_failed_checks` is present, rerun failed jobs with `--retry-failed-now`.
9. If both actionable review feedback and `retry_failed_checks` are present, prioritize review feedback first; a new commit will retrigger CI, so avoid rerunning flaky checks on the old SHA unless you intentionally defer the review change.
10. On every loop, verify mergeability / merge-conflict status (for example via `gh pr view`) in addition to CI and review state.
11. After any push or rerun action, immediately return to step 1 and continue polling on the updated SHA/state.
12. If you had been using `--watch` before pausing to patch/commit/push, relaunch `--watch` yourself in the same turn immediately after the push (do not wait for the user to re-invoke the skill).
13. Repeat polling until the PR is green + review-clean + mergeable, `stop_pr_closed` appears, or a user-help-required blocker is reached.
14. Maintain terminal/session ownership: while babysitting is active, keep consuming watcher output in the same turn; do not leave a detached `--watch` process running and then end the turn as if monitoring were complete.
## Commands
### One-shot snapshot
```bash
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --once
```
### Continuous watch (JSONL)
```bash
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --watch
```
### Trigger flaky retry cycle (only when watcher indicates)
```bash
python3 .codex/skills/babysit-pr/scripts/gh_pr_watch.py --pr auto --retry-failed-now
It intentionally surfaces Codex reviewer bot feedback (for example comments/reviews from `chatgpt-codex-connector[bot]`) in addition to human reviewer feedback. Most unrelated bot noise should still be ignored.
For safety, the watcher only auto-surfaces trusted human review authors (for example repo OWNER/MEMBER/COLLABORATOR, plus the authenticated operator) and approved review bots such as Codex.
On a fresh watcher state file, existing pending review feedback may be surfaced immediately (not only comments that arrive after monitoring starts). This is intentional so already-open review comments are not missed.
When you agree with a comment and it is actionable:
1. Patch code locally.
2. Commit with `codex: address PR review feedback (#<n>)`.
3. Push to the PR head branch.
4. Resume watching on the new SHA immediately (do not stop after reporting the push).
5. If monitoring was running in `--watch` mode, restart `--watch` immediately after the push in the same turn; do not wait for the user to ask again.
If you disagree or the comment is non-actionable/already addressed, record it as handled by continuing the watcher loop (the script de-duplicates surfaced items via state after surfacing them).
If a code review comment/thread is already marked as resolved in GitHub, treat it as non-actionable and safely ignore it unless new unresolved follow-up feedback appears.
## Git Safety Rules
- Work only on the PR head branch.
- Avoid destructive git commands.
- Do not switch branches unless necessary to recover context.
- Before editing, check for unrelated uncommitted changes. If present, stop and ask the user.
- After each successful fix, commit and `git push`, then re-run the watcher.
- If you interrupted a live `--watch` session to make the fix, restart `--watch` immediately after the push in the same turn.
- Do not run multiple concurrent `--watch` processes for the same PR/state file; keep one watcher session active and reuse it until it stops or you intentionally restart it.
- A push is not a terminal outcome; continue the monitoring loop unless a strict stop condition is met.
Commit message defaults:
-`codex: fix CI failure on PR #<n>`
-`codex: address PR review feedback (#<n>)`
## Monitoring Loop Pattern
Use this loop in a live Codex session:
1. Run `--once`.
2. Read `actions`.
3. First check whether the PR is now merged or otherwise closed; if so, report that terminal state and stop polling immediately.
4. Check CI summary, new review items, and mergeability/conflict status.
5. Diagnose CI failures and classify branch-related vs flaky/unrelated.
6. Process actionable review comments before flaky reruns when both are present; if a review fix requires a commit, push it and skip rerunning failed checks on the old SHA.
7. Retry failed checks only when `retry_failed_checks` is present and you are not about to replace the current SHA with a review/CI fix commit.
8. If you pushed a commit or triggered a rerun, report the action briefly and continue polling (do not stop).
9. After a review-fix push, proactively restart continuous monitoring (`--watch`) in the same turn unless a strict stop condition has already been reached.
10. If everything is passing, mergeable, not blocked on required review approval, and there are no unaddressed review items, report success and stop.
11. If blocked on a user-help-required issue (infra outage, exhausted flaky retries, unclear reviewer request, permissions), report the blocker and stop.
12. Otherwise sleep according to the polling cadence below and repeat.
When the user explicitly asks to monitor/watch/babysit a PR, prefer `--watch` so polling continues autonomously in one command. Use repeated `--once` snapshots only for debugging, local testing, or when the user explicitly asks for a one-shot check.
Do not stop to ask the user whether to continue polling; continue autonomously until a strict stop condition is met or the user explicitly interrupts.
Do not hand control back to the user after a review-fix push just because a new SHA was created; restarting the watcher and re-entering the poll loop is part of the same babysitting task.
If a `--watch` process is still running and no strict stop condition has been reached, the babysitting task is still in progress; keep streaming/consuming watcher output instead of ending the turn.
## Polling Cadence
Use adaptive polling and continue monitoring even after CI turns green:
- While CI is not green (pending/running/queued or failing): poll every 1 minute.
- After CI turns green: start at every 1 minute, then back off exponentially when there is no change (for example 1m, 2m, 4m, 8m, 16m, 32m), capping at every 1 hour.
- Reset the green-state polling interval back to 1 minute whenever anything changes (new commit/SHA, check status changes, new review comments, mergeability changes, review decision changes).
- If CI stops being green again (new commit, rerun, or regression): return to 1-minute polling.
- If any poll shows the PR is merged or otherwise closed: stop polling immediately and report the terminal state.
## Stop Conditions (Strict)
Stop only when one of the following is true:
- PR merged or closed (stop as soon as a poll/snapshot confirms this).
- PR is ready to merge: CI succeeded, no surfaced unaddressed review comments, not blocked on required review approval, and no merge conflict risk.
- User intervention is required and Codex cannot safely proceed alone.
Keep polling when:
-`actions` contains only `idle` but checks are still pending.
- CI is still running/queued.
- Review state is quiet but CI is not terminal.
- CI is green but mergeability is unknown/pending.
- CI is green and mergeable, but the PR is still open and you are waiting for possible new review comments or merge-conflict changes per the green-state cadence.
- The PR is green but blocked on review approval (`REVIEW_REQUIRED` / similar); continue polling on the green-state cadence and surface any new review comments without asking for confirmation to keep watching.
## Output Expectations
Provide concise progress updates while monitoring and a final summary that includes:
- During long unchanged monitoring periods, avoid emitting a full update on every poll; summarize only status changes plus occasional heartbeat updates.
- Treat push confirmations, intermediate CI snapshots, and review-action updates as progress updates only; do not emit the final summary or end the babysitting session unless a strict stop condition is met.
- A user request to "monitor" is not satisfied by a couple of sample polls; remain in the loop until a strict stop condition or an explicit user interruption.
- A review-fix commit + push is not a completion event; immediately resume live monitoring (`--watch`) in the same turn and continue reporting progress updates.
- When CI first transitions to all green for the current SHA, emit a one-time celebratory progress update (do not repeat it on every green poll). Preferred style: `🚀 CI is all green! 33/33 passed. Still on watch for review approval.`
- Do not send the final summary while a watcher terminal is still running unless the watcher has emitted/confirmed a strict stop condition; otherwise continue with progress updates.
- Final PR SHA
- CI status summary
- Mergeability / conflict status
- Fixes pushed
- Flaky retry cycles used
- Remaining unresolved failures or review comments
## References
- Heuristics and decision tree: `.codex/skills/babysit-pr/references/heuristics.md`
- GitHub CLI/API details used by the watcher: `.codex/skills/babysit-pr/references/github-api-notes.md`
short_description:"Watch PR CI, reviews, and merge conflicts"
default_prompt:"Babysit the current PR: monitor CI, reviewer comments, and merge-conflict status (prefer the watcher’s --watch mode for live monitoring); fix valid issues, push updates, and rerun flaky failures up to 3 times. Keep exactly one watcher session active for the PR (do not leave duplicate --watch terminals running). If you pause monitoring to patch review/CI feedback, restart --watch yourself immediately after the push in the same turn. If a watcher is still running and no strict stop condition has been reached, the task is still in progress: keep consuming watcher output and sending progress updates instead of ending the turn. Continue polling autonomously after any push/rerun until a strict terminal stop condition is reached or the user interrupts."
@@ -24,7 +24,7 @@ In the codex-rs folder where the rust code lives:
Run `just fmt` (in `codex-rs` directory) automatically after you have finished making Rust code changes; do not ask for approval to run it. Additionally, run the tests:
1. Run the test for the specific project that was changed. For example, if changes were made in `codex-rs/tui`, run `cargo test -p codex-tui`.
2. Once those pass, if any changes were made in common, core, or protocol, run the complete test suite with `cargo test --all-features`. project-specific or individual tests can be run without asking the user, but do ask the user before running the complete test suite.
2. Once those pass, if any changes were made in common, core, or protocol, run the complete test suite with `cargo test` (or `just test` if `cargo-nextest` is installed). Avoid `--all-features` for routine local runs because it expands the build matrix and can significantly increase `target/` disk usage; use it only when you specifically need full feature coverage. project-specific or individual tests can be run without asking the user, but do ask the user before running the complete test suite.
Before finalizing a large change to `codex-rs`, run `just fix -p <project>` (in `codex-rs` directory) to fix any linter issues in the code. Prefer scoping with `-p` to avoid slow workspace‑wide Clippy builds; only run `just fix` without `-p` if you changed shared crates. Do not re-run tests after running `fix` or `fmt`.
@@ -4,14 +4,21 @@ We provide Codex CLI as a standalone, native executable to ensure a zero-depende
## Installing Codex
Today, the easiest way to install Codex is via `npm`:
Install the latest Codex release directly:
```shell
npm i -g @openai/codex
# macOS, Linux, or WSL
curl -fsSL https://chatgpt.com/codex/install.sh | sh
codex
```
You can also install via Homebrew (`brew install --cask codex`) or download a platform-specific release directly from our [GitHub Releases](https://github.com/openai/codex/releases).
You can also install via npm (`npm i -g @openai/codex`), Homebrew (`brew install --cask codex`), or download a platform-specific release directly from our [GitHub Releases](https://github.com/openai/codex/releases).
"description":"Use to correlate this with [codex_protocol::protocol::PatchApplyBeginEvent] and [codex_protocol::protocol::PatchApplyEndEvent].",
"type":"string"
},
"conversationId":{
"$ref":"#/definitions/ThreadId"
},
"fileChanges":{
"additionalProperties":{
"$ref":"#/definitions/FileChange"
},
"type":"object"
},
"grantRoot":{
"description":"When set, the agent is asking the user to allow writes under this root for the remainder of the session (unclear if this is honored today).",
"type":[
"string",
"null"
]
},
"reason":{
"description":"Optional explanatory reason (e.g. request for extra write access).",
"description":"User has approved this command and wants to automatically approve any future identical instances (`command` and `cwd` match exactly) for the remainder of the session.",
"enum":[
"approved_for_session"
],
"type":"string"
},
{
"additionalProperties":false,
"description":"User chose to persist a network policy rule (allow/deny) for future requests to the same host.",
"properties":{
"network_policy_amendment":{
"properties":{
"network_policy_amendment":{
"$ref":"#/definitions/NetworkPolicyAmendment"
}
},
"required":[
"network_policy_amendment"
],
"type":"object"
}
},
"required":[
"network_policy_amendment"
],
"title":"NetworkPolicyAmendmentReviewDecision",
"type":"object"
},
{
"description":"User has denied this command and the agent should not execute it, but it should continue the session and try something else.",
"enum":[
"denied"
],
"type":"string"
},
{
"description":"User has denied this command and the agent should not do anything until the user's next command.",
"description":"Codex attempted a backend request and received `401 Unauthorized`.",
"enum":[
"unauthorized"
],
"type":"string"
}
]
}
},
"properties":{
"previousAccountId":{
"description":"Workspace/account identifier that Codex was previously using.\n\nClients that manage multiple accounts/workspaces can use this as a hint to refresh the token for the correct workspace.\n\nThis may be `null` when the prior auth state did not include a workspace identifier (`chatgpt_account_id`).",
"description":"Unique identifier for this specific approval callback.\n\nFor regular shell/unified_exec approvals, this is null.\n\nFor zsh-exec-bridge subcommand approvals, multiple callbacks can belong to one parent `itemId`, so `approvalId` is a distinct opaque callback id (a UUID) used to disambiguate routing.",
"type":[
"string",
"null"
]
},
"command":{
"description":"The command to be executed.",
"type":[
"string",
"null"
]
},
"commandActions":{
"description":"Best-effort parsed command actions for friendly display.",
"items":{
"$ref":"#/definitions/CommandAction"
},
"type":[
"array",
"null"
]
},
"cwd":{
"description":"The command's working directory.",
"type":[
"string",
"null"
]
},
"itemId":{
"type":"string"
},
"networkApprovalContext":{
"anyOf":[
{
"$ref":"#/definitions/NetworkApprovalContext"
},
{
"type":"null"
}
],
"description":"Optional context for a managed-network approval prompt."
},
"proposedExecpolicyAmendment":{
"description":"Optional proposed execpolicy amendment to allow similar commands without prompting.",
"items":{
"type":"string"
},
"type":[
"array",
"null"
]
},
"proposedNetworkPolicyAmendments":{
"description":"Optional proposed network policy amendments (allow/deny host) for future requests.",
"items":{
"$ref":"#/definitions/NetworkPolicyAmendment"
},
"type":[
"array",
"null"
]
},
"reason":{
"description":"Optional explanatory reason (e.g. request for network access).",
"description":"User approved the command and future identical commands should run without prompting.",
"enum":[
"acceptForSession"
],
"type":"string"
},
{
"additionalProperties":false,
"description":"User approved the command, and wants to apply the proposed execpolicy amendment so future matching commands can run without prompting.",
"description":"(Best effort) Path to the file being read by the command. When possible, this is an absolute path, though when relative, it should be resolved against the `cwd`` that will be used to run the command to derive the absolute path.",
"type":"string"
},
"type":{
"enum":[
"read"
],
"title":"ReadParsedCommandType",
"type":"string"
}
},
"required":[
"cmd",
"name",
"path",
"type"
],
"title":"ReadParsedCommand",
"type":"object"
},
{
"properties":{
"cmd":{
"type":"string"
},
"path":{
"type":[
"string",
"null"
]
},
"type":{
"enum":[
"list_files"
],
"title":"ListFilesParsedCommandType",
"type":"string"
}
},
"required":[
"cmd",
"type"
],
"title":"ListFilesParsedCommand",
"type":"object"
},
{
"properties":{
"cmd":{
"type":"string"
},
"path":{
"type":[
"string",
"null"
]
},
"query":{
"type":[
"string",
"null"
]
},
"type":{
"enum":[
"search"
],
"title":"SearchParsedCommandType",
"type":"string"
}
},
"required":[
"cmd",
"type"
],
"title":"SearchParsedCommand",
"type":"object"
},
{
"properties":{
"cmd":{
"type":"string"
},
"type":{
"enum":[
"unknown"
],
"title":"UnknownParsedCommandType",
"type":"string"
}
},
"required":[
"cmd",
"type"
],
"title":"UnknownParsedCommand",
"type":"object"
}
]
},
"ThreadId":{
"type":"string"
}
},
"properties":{
"approvalId":{
"description":"Identifier for this specific approval callback.",
"type":[
"string",
"null"
]
},
"callId":{
"description":"Use to correlate this with [codex_protocol::protocol::ExecCommandBeginEvent] and [codex_protocol::protocol::ExecCommandEndEvent].",
"description":"User has approved this command and wants to automatically approve any future identical instances (`command` and `cwd` match exactly) for the remainder of the session.",
"enum":[
"approved_for_session"
],
"type":"string"
},
{
"additionalProperties":false,
"description":"User chose to persist a network policy rule (allow/deny) for future requests to the same host.",
"properties":{
"network_policy_amendment":{
"properties":{
"network_policy_amendment":{
"$ref":"#/definitions/NetworkPolicyAmendment"
}
},
"required":[
"network_policy_amendment"
],
"type":"object"
}
},
"required":[
"network_policy_amendment"
],
"title":"NetworkPolicyAmendmentReviewDecision",
"type":"object"
},
{
"description":"User has denied this command and the agent should not execute it, but it should continue the session and try something else.",
"enum":[
"denied"
],
"type":"string"
},
{
"description":"User has denied this command and the agent should not do anything until the user's next command.",
"description":"[UNSTABLE] When set, the agent is asking the user to allow writes under this root for the remainder of the session (unclear if this is honored today).",
"type":[
"string",
"null"
]
},
"itemId":{
"type":"string"
},
"reason":{
"description":"Optional explanatory reason (e.g. request for extra write access).",
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"oneOf":[
{
"description":"Mid-turn assistant text (for example preamble/progress narration).\n\nAdditional tool calls or assistant output may follow before turn completion.",
"enum":[
"commentary"
],
"type":"string"
},
{
"description":"The assistant's terminal answer text for the current turn.",
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"oneOf":[
{
"description":"Mid-turn assistant text (for example preamble/progress narration).\n\nAdditional tool calls or assistant output may follow before turn completion.",
"enum":[
"commentary"
],
"type":"string"
},
{
"description":"The assistant's terminal answer text for the current turn.",
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"oneOf":[
{
"description":"Mid-turn assistant text (for example preamble/progress narration).\n\nAdditional tool calls or assistant output may follow before turn completion.",
"enum":[
"commentary"
],
"type":"string"
},
{
"description":"The assistant's terminal answer text for the current turn.",
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"oneOf":[
{
"description":"Mid-turn assistant text (for example preamble/progress narration).\n\nAdditional tool calls or assistant output may follow before turn completion.",
"enum":[
"commentary"
],
"type":"string"
},
{
"description":"The assistant's terminal answer text for the current turn.",
"enum":[
"final_answer"
],
"type":"string"
}
]
},
"NetworkAccess":{
"enum":[
@@ -836,6 +848,13 @@
"description":"Model provider used for this thread (for example, 'openai').",
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"oneOf":[
{
"description":"Mid-turn assistant text (for example preamble/progress narration).\n\nAdditional tool calls or assistant output may follow before turn completion.",
"enum":[
"commentary"
],
"type":"string"
},
{
"description":"The assistant's terminal answer text for the current turn.",
"enum":[
"final_answer"
],
"type":"string"
}
]
},
"PatchApplyStatus":{
"enum":[
@@ -609,6 +621,13 @@
"description":"Model provider used for this thread (for example, 'openai').",
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"oneOf":[
{
"description":"Mid-turn assistant text (for example preamble/progress narration).\n\nAdditional tool calls or assistant output may follow before turn completion.",
"enum":[
"commentary"
],
"type":"string"
},
{
"description":"The assistant's terminal answer text for the current turn.",
"enum":[
"final_answer"
],
"type":"string"
}
]
},
"PatchApplyStatus":{
"enum":[
@@ -609,6 +621,13 @@
"description":"Model provider used for this thread (for example, 'openai').",
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"oneOf":[
{
"description":"Mid-turn assistant text (for example preamble/progress narration).\n\nAdditional tool calls or assistant output may follow before turn completion.",
"enum":[
"commentary"
],
"type":"string"
},
{
"description":"The assistant's terminal answer text for the current turn.",
"enum":[
"final_answer"
],
"type":"string"
}
]
},
"NetworkAccess":{
"enum":[
@@ -836,6 +848,13 @@
"description":"Model provider used for this thread (for example, 'openai').",
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"oneOf":[
{
"description":"Mid-turn assistant text (for example preamble/progress narration).\n\nAdditional tool calls or assistant output may follow before turn completion.",
"enum":[
"commentary"
],
"type":"string"
},
{
"description":"The assistant's terminal answer text for the current turn.",
"enum":[
"final_answer"
],
"type":"string"
}
]
},
"PatchApplyStatus":{
"enum":[
@@ -609,6 +621,13 @@
"description":"Model provider used for this thread (for example, 'openai').",
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"oneOf":[
{
"description":"Mid-turn assistant text (for example preamble/progress narration).\n\nAdditional tool calls or assistant output may follow before turn completion.",
"enum":[
"commentary"
],
"type":"string"
},
{
"description":"The assistant's terminal answer text for the current turn.",
"enum":[
"final_answer"
],
"type":"string"
}
]
},
"NetworkAccess":{
"enum":[
@@ -836,6 +848,13 @@
"description":"Model provider used for this thread (for example, 'openai').",
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"oneOf":[
{
"description":"Mid-turn assistant text (for example preamble/progress narration).\n\nAdditional tool calls or assistant output may follow before turn completion.",
"enum":[
"commentary"
],
"type":"string"
},
{
"description":"The assistant's terminal answer text for the current turn.",
"enum":[
"final_answer"
],
"type":"string"
}
]
},
"PatchApplyStatus":{
"enum":[
@@ -609,6 +621,13 @@
"description":"Model provider used for this thread (for example, 'openai').",
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"oneOf":[
{
"description":"Mid-turn assistant text (for example preamble/progress narration).\n\nAdditional tool calls or assistant output may follow before turn completion.",
"enum":[
"commentary"
],
"type":"string"
},
{
"description":"The assistant's terminal answer text for the current turn.",
"enum":[
"final_answer"
],
"type":"string"
}
]
},
"PatchApplyStatus":{
"enum":[
@@ -609,6 +621,13 @@
"description":"Model provider used for this thread (for example, 'openai').",
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"oneOf":[
{
"description":"Mid-turn assistant text (for example preamble/progress narration).\n\nAdditional tool calls or assistant output may follow before turn completion.",
"enum":[
"commentary"
],
"type":"string"
},
{
"description":"The assistant's terminal answer text for the current turn.",
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"oneOf":[
{
"description":"Mid-turn assistant text (for example preamble/progress narration).\n\nAdditional tool calls or assistant output may follow before turn completion.",
"enum":[
"commentary"
],
"type":"string"
},
{
"description":"The assistant's terminal answer text for the current turn.",
"description":"Classifies an assistant message as interim commentary or final answer text.\n\nProviders do not emit this consistently, so callers must treat `None` as \"phase unknown\" and keep compatibility behavior for legacy models.",
"oneOf":[
{
"description":"Mid-turn assistant text (for example preamble/progress narration).\n\nAdditional tool calls or assistant output may follow before turn completion.",
"enum":[
"commentary"
],
"type":"string"
},
{
"description":"The assistant's terminal answer text for the current turn.",
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.