1892 Commits

Author SHA1 Message Date
Michael Bolin
448fb6ac22 fix: clarify the value of SkillMetadata.path (#12729)
Rename `SkillMetadata.path` to `SkillMetadata.path_to_skills_md` for
clarity.

Would ideally change the type to `AbsolutePathBuf`, but that can be done
later.
2026-02-24 17:15:54 -08:00
Curtis 'Fjord' Hawthorne
63c2ac96cd fix(js_repl): surface uncaught kernel errors and reset cleanly (#12636)
## Summary

Improve `js_repl` behavior when the Node kernel hits a process-level
failure (for example, an uncaught exception or unhandled Promise
rejection).

Instead of only surfacing a generic `js_repl kernel exited unexpectedly`
after stdout EOF, `js_repl` now returns a clearer exec error for the
active request, then resets the kernel cleanly.

## Why

Some sandbox-denied operations can trigger Node errors that become
process-level failures (for example, an unhandled EventEmitter `'error'`
event). In that case:

- the kernel process exits,
- the host sees stdout EOF,
- the user gets a generic kernel-exit error,
- and the next request can briefly race with stale kernel state.

This change improves that failure mode without monkeypatching Node APIs.

## Changes

### Kernel-side (`js_repl` Node process)
- Add process-level handlers for:
  - `uncaughtException`
  - `unhandledRejection`
- When one of these fires:
  - best-effort emit a normal `exec_result` error for the active exec
  - include actionable guidance to catch/handle async errors (including
    Promise rejections and EventEmitter `'error'` events)
  - exit intentionally so the host can reset/restart the kernel

### Host-side (`JsReplManager`)
- Clear dead kernel state as soon as the stdout reader observes
unexpected kernel exit/EOF.
- This lets the next `js_repl` exec start a fresh kernel instead of
hitting a stale broken-pipe path.
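
The host-side recovery idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `KernelHandle`, `ReplManager`, and their methods are hypothetical names, not the actual `JsReplManager` API, and the real reader task and process handling are omitted.

```rust
/// Hypothetical stand-in for a running kernel process.
struct KernelHandle {
    /// Counts how many kernels have been started so far.
    generation: u32,
}

/// Hypothetical manager; the real `JsReplManager` is more involved.
#[derive(Default)]
struct ReplManager {
    kernel: Option<KernelHandle>,
    spawned: u32,
}

impl ReplManager {
    /// Return the tracked kernel, starting a fresh one if none is tracked.
    fn ensure_kernel(&mut self) -> &KernelHandle {
        if self.kernel.is_none() {
            self.spawned += 1;
            self.kernel = Some(KernelHandle {
                generation: self.spawned,
            });
        }
        self.kernel.as_ref().expect("kernel was just ensured")
    }

    /// Called when the stdout reader observes unexpected exit/EOF:
    /// drop the dead state immediately so the next exec cannot reuse it.
    fn on_unexpected_eof(&mut self) {
        self.kernel = None;
    }
}
```

With this shape, an exec that follows `on_unexpected_eof()` goes through `ensure_kernel()` and gets a new generation instead of writing to a dead pipe.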

### Tests
- Add regression coverage for:
  - uncaught async exception -> exec error + kernel recovery on next exec
- Update forced-kernel-exit test to validate recovery behavior (next
  exec restarts cleanly)

## Impact

- Better user-facing error for kernel crashes caused by
uncaught/unhandled async failures.
- Cleaner recovery behavior after kernel exit.

## Validation

- `cargo test -p codex-core --lib
tools::js_repl::tests::js_repl_uncaught_exception_returns_exec_error_and_recovers
-- --exact`
- `cargo test -p codex-core --lib
tools::js_repl::tests::js_repl_forced_kernel_exit_recovers_on_next_exec
-- --exact`
- `just fmt`
2026-02-24 17:12:02 -08:00
Michael Bolin
3d356723c4 fix: make EscalateServer public and remove shell escalation wrappers (#12724)
## Why

`codex-shell-escalation` exposed a `codex-core`-specific adapter layer
(`ShellActionProvider`, `ShellPolicyFactory`, and `run_escalate_server`)
that existed only to bridge `codex-core` to `EscalateServer`. That
indirection increased API surface and obscured crate ownership without
adding behavior.

This change moves orchestration into `codex-core` so boundaries are
clearer: `codex-shell-escalation` provides reusable escalation
primitives, and `codex-core` provides shell-tool policy decisions.

Admittedly, @pakrym rightfully requested this sort of cleanup as part of
https://github.com/openai/codex/pull/12649, though this avoids moving
all of `codex-shell-escalation` into `codex-core`.

## What changed

- Made `EscalateServer` public and exported it from `shell-escalation`.
- Removed the adapter layer from `shell-escalation`:
  - deleted `shell-escalation/src/unix/core_shell_escalation.rs`
  - removed exports for `ShellActionProvider`, `ShellPolicyFactory`,
    `EscalationPolicyFactory`, and `run_escalate_server`
- Updated `core/src/tools/runtimes/shell/unix_escalation.rs` to:
  - create `Stopwatch`/cancellation in `codex-core`
  - instantiate `EscalateServer` directly
  - implement `EscalationPolicy` directly on `CoreShellActionProvider`

Net effect: same escalation flow with fewer wrappers and a smaller
public API.

## Verification

- Manually reviewed the old vs. new escalation call flow to confirm
timeout/cancellation behavior and approval policy decisions are
preserved while removing wrapper types.
2026-02-24 16:20:08 -08:00
Eric Traut
8da40c9251 Raise image byte estimate for compaction token accounting (#12717)
Increase `IMAGE_BYTES_ESTIMATE` from 340 bytes to 7,373 bytes so the
existing 4-bytes/token heuristic yields an image estimate of ~1,844
tokens instead of ~85. This makes auto-compaction more conservative for
image-heavy transcripts and avoids underestimating context usage, which
can otherwise cause compaction to fail when there is not enough free
context remaining. The new value was chosen because that's the image
resolution cap used for our latest models.
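
The arithmetic behind those two estimates can be checked with a small sketch. The function name and the choice to round up are assumptions made for illustration; only the 4-bytes/token heuristic and the byte values come from the message above.

```rust
/// Bytes-per-token heuristic mentioned in the commit message.
const BYTES_PER_TOKEN: usize = 4;

/// Estimate the token count for `bytes` of content, rounding up
/// (the rounding mode is an assumption of this sketch).
fn estimate_tokens(bytes: usize) -> usize {
    (bytes + BYTES_PER_TOKEN - 1) / BYTES_PER_TOKEN
}
```

Under these assumptions, `estimate_tokens(340)` gives 85 and `estimate_tokens(7_373)` gives 1,844, matching the figures quoted in the message.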

Follow-up to [#12419](https://github.com/openai/codex/pull/12419).
Refs [#11845](https://github.com/openai/codex/issues/11845).
2026-02-24 16:11:38 -08:00
zuxin-oai
61cd3a9700 fix: temp remove citation (#12711)
- **temp remove citation**
2026-02-24 22:07:30 +00:00
daveaitel-openai
dcab40123f Agent jobs (spawn_agents_on_csv) + progress UI (#10935)
## Summary
- Add agent job support: spawn a batch of sub-agents from CSV, auto-run,
auto-export, and store results in SQLite.
- Simplify workflow: remove run/resume/get-status/export tools; spawn is
deterministic and completes in one call.
- Improve exec UX: stable, single-line progress bar with ETA; suppress
sub-agent chatter in exec.

## Why
Enables map-reduce style workflows over arbitrarily large repos using
the existing Codex orchestrator. This addresses review feedback about
overly complex job controls and non-deterministic monitoring.

## Demo (progress bar)
```
./codex-rs/target/debug/codex exec \
  --enable collab \
  --enable sqlite \
  --full-auto \
  --progress-cursor \
  -c agents.max_threads=16 \
  -C /Users/daveaitel/code/codex \
  - <<'PROMPT'
Create /tmp/agent_job_progress_demo.csv with columns: path,area and 30 rows:
path = item-01..item-30, area = test.

Then call spawn_agents_on_csv with:
- csv_path: /tmp/agent_job_progress_demo.csv
- instruction: "Run `python - <<'PY'` to sleep a random 0.3–1.2s, then output JSON with keys: path, score (int). Set score = 1."
- output_csv_path: /tmp/agent_job_progress_demo_out.csv
PROMPT
```

## Review feedback addressed
- Auto-start jobs on spawn; removed run/resume/status/export tools.
- Auto-export on success.
- More descriptive tool spec + clearer prompts.
- Avoid deadlocks on spawn failure; pending/running handled safely.
- Progress bar no longer scrolls; stable single-line redraw.

## Tests
- `cd codex-rs && cargo test -p codex-exec`
- `cd codex-rs && cargo build -p codex-cli`
2026-02-24 21:00:19 +00:00
Eric Traut
bd192b54cd Honor project_root_markers when discovering AGENTS.md (#12639)
Fixes #12128

The docs indicate that `project_root_markers` are used to discover the
project root for local config as well as `AGENTS.md`. It looks like it
was never wired up to support the latter.

Summary
- resolve project docs by walking to the configured
`project_root_markers` (or defaults) instead of assuming the Git root,
while honoring CLI overrides and handling malformed configs
- fall back to the project’s canonical path chain and add a test that
makes sure custom markers upstream of `.git` are respected
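
A marker-based root walk of the kind described above might look like the sketch below. It assumes the nearest ancestor containing any marker wins; `find_project_root` is a hypothetical helper, and CLI overrides and malformed-config handling are left out.

```rust
use std::path::{Path, PathBuf};

/// Walk from `start` toward the filesystem root and return the first
/// ancestor directory that contains any of the given marker entries.
/// "Nearest match wins" is an assumption of this sketch.
fn find_project_root(start: &Path, markers: &[&str]) -> Option<PathBuf> {
    start
        .ancestors()
        .find(|dir| markers.iter().any(|marker| dir.join(marker).exists()))
        .map(Path::to_path_buf)
}
```

With a custom marker configured, a directory above the one holding `.git` can be selected as the root, which is the situation the new test is described as covering.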
2026-02-24 12:55:48 -08:00
Ahmed Ibrahim
b6ab2214e3 Add TUI realtime conversation mode (#12687)
- Add a hidden `realtime_conversation` feature flag and `/realtime`
slash command to start/stop live voice sessions.
- Reuse transcription composer/footer UI for live metering, stream mic
audio, play assistant audio, render realtime user text events, and
force-close on feature disable.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-24 12:54:30 -08:00
Michael Bolin
3b5fc7547e refactor: remove unused seatbelt unix socket arg (#12707)
https://github.com/openai/codex/pull/12052 introduced an
`allowed_unix_socket_paths` parameter to
`create_seatbelt_command_args()`, but
https://github.com/openai/codex/pull/12649 removed the abstraction that
#12052 introduced, so this parameter is no longer necessary as it is
always an empty slice.
2026-02-24 12:30:26 -08:00
pakrym-oai
daf0f03ac8 Ensure shell command skills trigger approval (#12697)
Summary
- detect skill-invoking shell commands based on the original command
string, request approvals when needed, and cache positive decisions per
session
- keep implicit skill invocation emitted after approval and keep skill
approval decline messaging centralized to the shell handler
- expand and adjust skill approval tests to cover shell-based skill
scripts while matching the new detection expectations

Testing
- Not run (not requested)
2026-02-24 12:13:20 -08:00
Yaroslav Volovich
67d9261e2c feat(sleep-inhibitor): add Linux and Windows idle-sleep prevention (#11766)
## Background
- follow-up to previous macOS-only PR:
https://github.com/openai/codex/pull/11711
- follow-up macOS refactor PR (current structural approach used here):
https://github.com/openai/codex/pull/12340

## Summary
- extend `codex-utils-sleep-inhibitor` with Linux and Windows backends
while preserving existing macOS behavior
- Linux backend:
  - use `systemd-inhibit` (`--what=idle --mode=block`) when available
  - fall back to `gnome-session-inhibit` (`--inhibit idle`) when available
  - keep no-op behavior if neither backend exists on host
- Windows backend:
  - use Win32 power request handles (`PowerCreateRequest` +
    `PowerSetRequest` / `PowerClearRequest`) with
    `PowerRequestSystemRequired`
- make `prevent_idle_sleep` Experimental on macOS/Linux/Windows; keep
under development on other targets
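
The Linux fallback order in the summary can be sketched as a pure function that builds the command line. The flags are the ones quoted above; `Available`, `inhibitor_argv`, and the held command passed by the caller are hypothetical and are not taken from `codex-utils-sleep-inhibitor`.

```rust
/// Which helper binaries were found on the host. Detection itself is
/// out of scope for this sketch.
struct Available {
    systemd_inhibit: bool,
    gnome_session_inhibit: bool,
}

/// Build the argv for the preferred inhibitor, or `None` for the no-op case.
/// `held_command` is a hypothetical long-running child whose lifetime
/// keeps the inhibitor active.
fn inhibitor_argv(available: &Available, held_command: &[&str]) -> Option<Vec<String>> {
    let prefix: &[&str] = if available.systemd_inhibit {
        &["systemd-inhibit", "--what=idle", "--mode=block"]
    } else if available.gnome_session_inhibit {
        &["gnome-session-inhibit", "--inhibit", "idle"]
    } else {
        return None;
    };
    Some(
        prefix
            .iter()
            .chain(held_command)
            .map(|part| (*part).to_owned())
            .collect(),
    )
}
```

Keeping the selection separate from process spawning makes the fallback order easy to test without either tool installed.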

## Testing
- `just fmt`
- `cargo test -p codex-utils-sleep-inhibitor`
- `cargo test -p codex-core features::tests::`
- `cargo test -p codex-tui chatwidget::tests::`
- `just fix -p codex-utils-sleep-inhibitor`
- `just fix -p codex-core`

## Semantics and API references
- Goal remains: prevent idle system sleep while a turn is running.
- Linux:
  - `systemd-inhibit` / login1 inhibitor model:
    - https://www.freedesktop.org/software/systemd/man/latest/systemd-inhibit.html
    - https://www.freedesktop.org/software/systemd/man/org.freedesktop.login1.html
    - https://systemd.io/INHIBITOR_LOCKS/
  - xdg-desktop-portal Inhibit (relevant for sandboxed apps):
    - https://flatpak.github.io/xdg-desktop-portal/docs/doc-org.freedesktop.portal.Inhibit.html
- Windows:
  - `PowerCreateRequest`:
    - https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-powercreaterequest
  - `PowerSetRequest`:
    - https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-powersetrequest
  - `PowerClearRequest`:
    - https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-powerclearrequest
  - `SetThreadExecutionState` (alternative baseline API):
    - https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-setthreadexecutionstate

## Chromium vs this PR
- Chromium Linux backend:
  - https://github.com/chromium/chromium/blob/main/services/device/wake_lock/power_save_blocker/power_save_blocker_linux.cc
- Chromium Windows backend:
  - https://github.com/chromium/chromium/blob/main/services/device/wake_lock/power_save_blocker/power_save_blocker_win.cc
- Electron powerSaveBlocker entry point:
  - https://github.com/electron/electron/blob/main/shell/browser/api/electron_api_power_save_blocker.cc

## Why we differ from Chromium
- Linux implementation mechanism:
  - Chromium uses in-process D-Bus APIs plus UI-integrated screen-saver
    suspension.
  - This PR uses command-based inhibitor backends (`systemd-inhibit`,
    `gnome-session-inhibit`) instead of linking a Linux D-Bus client in this
    crate.
  - Reason: keep `codex-utils-sleep-inhibitor` dependency-light and avoid
    Linux CI/toolchain fragility from new native D-Bus linkage, while
    preserving the same runtime intent (hold an inhibitor while a turn
    runs).
- Linux UI integration scope:
  - Chromium also uses `display::Screen::SuspendScreenSaver()` in its UI
    stack.
  - Codex `codex-rs` does not have that display abstraction in this crate,
    so this PR scopes Linux behavior to process-level sleep inhibition only.
- Windows wake-lock type breadth:
  - Chromium supports both display/system wake-lock types and extra
    display-specific handling for some pre-Win11 scenarios.
  - Codex’s feature is scoped to turn execution continuity (not forcing
    display on), so this PR uses `PowerRequestSystemRequired` only.
2026-02-24 11:51:44 -08:00
sayan-oai
0b6c2e5652 fix: also try matching namespaced prefix for modelinfo candidate (#12658)
#### What
Try matching `\w+`-namespaced model after `longest prefix` as heuristic
to match `ModelInfo` from list of candidates.

This shouldn't regress existing behavior:
- `gpt-5.2-codex` -> `gpt-5.2` if `gpt-5.2-codex` not present
- `gpt-5.3` -> `gpt-5` if `gpt-5.3` not present
- `gpt-9` still doesn't match anything

while being more forgiving for custom prefixes:
- `oai/gpt-5.3-codex` -> `gpt-5.3-codex`
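
A rough sketch of that two-step heuristic, checked against the examples listed above. The function names and the candidate list are hypothetical; the actual `ModelInfo` lookup may differ in detail, for example in how it treats the namespace separator.

```rust
/// Longest candidate slug that is a prefix of `model`, if any.
fn longest_prefix<'a>(candidates: &[&'a str], model: &str) -> Option<&'a str> {
    candidates
        .iter()
        .copied()
        .filter(|candidate| model.starts_with(*candidate))
        .max_by_key(|candidate| candidate.len())
}

/// Try the plain longest-prefix match first; if that fails, strip a
/// `\w+/` namespace and retry (assumed reading of the heuristic).
fn find_model<'a>(candidates: &[&'a str], model: &str) -> Option<&'a str> {
    longest_prefix(candidates, model).or_else(|| {
        let (namespace, rest) = model.split_once('/')?;
        let is_word = !namespace.is_empty()
            && namespace
                .chars()
                .all(|ch| ch.is_ascii_alphanumeric() || ch == '_');
        if is_word {
            longest_prefix(candidates, rest)
        } else {
            None
        }
    })
}
```

Because the namespaced retry only runs after the plain match fails, slugs that already matched keep their existing result.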

#### Tests
Added unit test.
2026-02-24 10:57:26 -08:00
Michael Bolin
3ca0e7673b feat: run zsh fork shell tool via shell-escalation (#12649)
## Why

This PR switches the `shell_command` zsh-fork path over to
`codex-shell-escalation` so the new shell tool can use the shared
exec-wrapper/escalation protocol instead of the `zsh_exec_bridge`
implementation that was introduced in
https://github.com/openai/codex/pull/12052. `zsh_exec_bridge` relied on
UNIX domain sockets, which are not as tamper-proof as the FD-based
approach in `codex-shell-escalation`.

## What Changed

- Added a Unix zsh-fork runtime adapter in `core`
(`core/src/tools/runtimes/shell/unix_escalation.rs`) that:
  - runs zsh-fork commands through
    `codex_shell_escalation::run_escalate_server`
  - bridges exec-policy / approval decisions into `ShellActionProvider`
  - executes escalated commands via a `ShellCommandExecutor` that calls
    `process_exec_tool_call`
- Updated `ShellRuntime` / `ShellCommandHandler` / tool spec wiring to
select a `shell_command` backend (`classic` vs `zsh-fork`) while leaving
the generic `shell` tool path unchanged.
- Removed the `zsh_exec_bridge`-based session service and deleted
`core/src/zsh_exec_bridge/mod.rs`.
- Moved exec-wrapper entrypoint dispatch to `arg0` by handling the
`codex-execve-wrapper` arg0 alias there, and removed the old
`codex_core::maybe_run_zsh_exec_wrapper_mode()` hooks from `cli` and
`app-server` mains.
- Added the needed `codex-shell-escalation` dependencies for `core` and
`arg0`.
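
The arg0-alias idea can be illustrated with a small sketch: the same binary picks an entry point from the name it was invoked as. Only the `codex-execve-wrapper` alias comes from the text above; `EntryPoint` and `entry_point_for` are hypothetical and do not mirror the real `arg0` crate.

```rust
use std::path::Path;

/// Hypothetical set of entry points selected by the invoked name.
#[derive(Debug, PartialEq)]
enum EntryPoint {
    ExecveWrapper,
    Main,
}

/// Pick an entry point from `argv[0]`, looking only at the file name so
/// that absolute and relative invocations behave the same.
fn entry_point_for(arg0: &str) -> EntryPoint {
    let name = Path::new(arg0).file_name().and_then(|n| n.to_str());
    match name {
        Some("codex-execve-wrapper") => EntryPoint::ExecveWrapper,
        _ => EntryPoint::Main,
    }
}
```

A caller would typically pass the first element of `std::env::args()` and branch on the result before any normal CLI parsing happens.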

## Tests

- `cargo test -p codex-core
shell_zsh_fork_prefers_shell_command_over_unified_exec`
- `cargo test -p codex-app-server turn_start_shell_zsh_fork --
--nocapture`
  - verifies zsh-fork command execution and approval flows through the new
    backend
  - includes subcommand approve/decline coverage using the shared zsh
    DotSlash fixture in `app-server/tests/suite/zsh`
- To test manually, I added the following to `~/.codex/config.toml`:

```toml
zsh_path = "/Users/mbolin/code/codex3/codex-rs/app-server/tests/suite/zsh"

[features]
shell_zsh_fork = true
```

Then I ran `just c` to run the dev build of Codex with these changes and
sent it the message:

```
run `echo $0`
```

And it replied with:

```
  echo $0 printed:

  /Users/mbolin/code/codex3/codex-rs/app-server/tests/suite/zsh

  In this tool context, $0 reflects the script path used to invoke the shell, not just zsh.
```

so the tool appears to be wired up correctly.

## Notes

- The zsh subcommand-decline integration test now uses `rm` under a
`WorkspaceWrite` sandbox. The previous `/usr/bin/true` scenario is
auto-allowed by the new `shell-escalation` policy path, which no longer
produces subcommand approval prompts.
2026-02-24 10:31:08 -08:00
Dylan Hurd
f6053fdfb3 feat(core) Introduce Feature::RequestPermissions (#11871)
## Summary
Introduces the initial implementation of Feature::RequestPermissions.
RequestPermissions allows the model to request that a command be run
inside the sandbox, with additional permissions, like writing to a
specific folder. Eventually this will include other rules as well, and
the ability to persist these permissions, but this PR is already quite
large - let's get the core flow working and go from there!

<img width="1279" height="541" alt="Screenshot 2026-02-15 at 2 26 22 PM"
src="https://github.com/user-attachments/assets/0ee3ec0f-02ec-4509-91a2-809ac80be368"
/>

## Testing
- [x] Added tests
- [x] Tested locally
- [x] Feature
2026-02-24 09:48:57 -08:00
pakrym-oai
97d0068658 Send warmup request (#11258)
Send a request with `generate: false` but a full set of tools and
instructions to pre-warm inference.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-24 08:15:47 -08:00
zuxin-oai
3fe365ad8a memories: tighten memory lookup guidance and citation requirements (#12635)
## Summary
- tighten the memory-use decision boundary so agents skip memory only
for clearly self-contained asks
- make the quick memory pass more explicit and bounded (including a
lightweight search budget)
- add structured `<memory_citation>` requirements and examples for final
replies
- clarify memory update guidance and end-state wording for memory lookup

## Why
The previous template was directionally correct, but still left room for
inconsistent memory lookup behavior and citation formatting. This change
makes the default behavior, quick-pass scope, and citation output
contract much more explicit.

## Testing
- not run (prompt/template text change only)

Co-authored-by: jif-oai <jif@openai.com>
2026-02-24 11:46:28 +00:00
jif-oai
8758db5d5b feat: mutli agents persist config overrides (#12667)
Fix propagation of runtime config changes and `--yolo`
2026-02-24 11:33:00 +00:00
zuxin-oai
15f6cfb047 memories: tighten consolidation prompt schema and indexing guidance (#12653)
## Summary
- tighten the Phase 2 consolidation prompt for task-oriented `MEMORY.md`
generation
- address Phase 2 under-coverage / "laziness" with stronger workflow +
final-pass checks
- improve recency/ordering behavior for `MEMORY.md` and
`memory_summary.md`
- rewrite `## What's in Memory` as a clearer routing index with explicit
recent-3-day structure

## Key Changes
- `MEMORY.md` schema cleanup:
  - align on `## Task <n>` task sections (remove stale `task:`
    rule/example references)
  - include `thread_id` in rollout provenance examples
  - compact comma-separated `### keywords` format
- Phase 2 completeness guardrails:
  - chunked INIT coverage pass over `raw_memories.md`
  - incremental net-new indexing / routing steps
  - stronger final checks (day ordering, topic coverage, keyword
    searchability, accidental duplication)
- Recency / ordering rules:
  - clearer scan-order guidance for raw memories (newest-first bias in
    incremental mode)
  - utility+recency ordering guidance for `MEMORY.md` task groups and
    summary topics
  - rebuild recent active window from current `updated_at` coverage
- `## What's in Memory` rewrite:
  - index/routing-layer framing (not a mini-handbook)
  - explicit recent 3 distinct memory-day layout
  - richer recent-topic entries + compact lower-priority routing entries
  - clearer `desc` / `learnings` expectations and separation from
    `## General Tips`
- Explicitly allow rollout-summary reuse across multiple tasks/blocks
when it supports distinct task angles (with distinct task-local value)

## Notes
- Prompt-template only:
`codex-rs/core/templates/memories/consolidation.md`
- No runtime/code changes

## Validation
- Manual diff review only
2026-02-24 09:41:20 +00:00
pakrym-oai
68a7d98363 Simplify skill tracking (#12652)
Remove a few layers of structs and store SkillMetadata.

---------

Co-authored-by: alexsong-oai <alexsong@openai.com>
2026-02-23 22:47:39 -08:00
sayan-oai
7e46e5b9c2 chore: rm hardcoded PRESETS list (#12650)
rm `PRESETS` list hardcoded in `model_presets` as we now have bundled
`models.json` with equivalent info.

update logic to rely on bundled models instead, update tests.
2026-02-23 22:35:51 -08:00
pakrym-oai
58763afa0f Add skill approval event/response (#12633)
Set the stage for skill-level permission approval in addition to
command-level.

Behind a feature flag.
2026-02-23 22:28:58 -08:00
alexsong-oai
09a82f364f Support implicit skill invocation analytics events (#12049)
- use `skills_for_cwd` lookup to scope allowed skills and build
invocation context for downstream processing
- add detection in `stream_events_utils` to classify tool calls as
implicit skill invocations per the proposal (script runners, extensions,
`scripts` dirs, and SKILL.md reads)
- deduplicate invocations per turn and emit analytics/OTEL events on the
same background queue as explicit invokes
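
The per-turn deduplication step could be sketched as below. `TurnInvocations` and `record` are hypothetical names; the real detection and analytics plumbing are not shown.

```rust
use std::collections::HashSet;

/// Hypothetical per-turn tracker of implicit skill invocations.
#[derive(Default)]
struct TurnInvocations {
    seen: HashSet<String>,
}

impl TurnInvocations {
    /// Returns `true` only the first time `skill_name` is seen in this
    /// turn, so the caller emits at most one event per skill per turn.
    fn record(&mut self, skill_name: &str) -> bool {
        self.seen.insert(skill_name.to_owned())
    }
}
```

Creating a fresh `TurnInvocations` at the start of each turn resets the deduplication scope.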
2026-02-23 21:55:49 -08:00
viyatb-oai
c3048ff90a feat(core): persist network approvals in execpolicy (#12357)
## Summary
Persist network approval allow/deny decisions as `network_rule(...)`
entries in execpolicy (not proxy config)

- add `network_rule` parsing + append support in `codex-execpolicy`,
  including `decision="prompt"` (parse-only; not compiled into proxy
  allow/deny lists)
- compile execpolicy network rules into proxy allow/deny lists and
update the live proxy state on approval
- preserve requirements execpolicy `network_rule(...)` entries when
merging with file-based execpolicy
- reject broad wildcard hosts (for example `*`) for persisted
`network_rule(...)`
2026-02-23 21:37:46 -08:00
github-actions[bot]
d580995957 Update models.json (#11408)
Automated update of models.json.

---------

Co-authored-by: sayan-oai <244841968+sayan-oai@users.noreply.github.com>
Co-authored-by: sayan-oai <sayan@openai.com>
2026-02-23 18:37:31 -08:00
Ahmed Ibrahim
10a3adad8e Handle realtime spawn_transcript delegation (#12619) 2026-02-23 14:39:07 -08:00
Jeremy Rose
855e275591 voice transcription (#3381)
Adds voice transcription on press-and-hold of spacebar.


https://github.com/user-attachments/assets/85039314-26f3-46d1-a83b-8c4a4a1ecc21

---------

Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
Co-authored-by: David Zbarsky <zbarsky@openai.com>
2026-02-23 22:15:18 +00:00
Michael Bolin
7f75e74201 Use Arc-based ToolCtx in tool runtimes (#12583)
## Why
Tool handlers and runtimes needed to pass the same turn/session context
for shell and non-shell workflows without duplicative ownership churn.
Using shared pointers avoids temporary lifetimes and keeps existing
behavior unchanged while simplifying call sites.

## What changed
- Converted `ToolCtx` to store shared context handles (`Arc`-based),
including updates across shell, apply-patch, and unified-exec paths.
- Updated orchestrator/runtime call sites to consume the shared context
consistently and remove brittle move/borrow patterns.
- Kept behavior unchanged while preparing the type surface for the new
shell escalation integration in the next stack commit.
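
As a rough illustration of the `Arc`-based shape described above: cloning the context only bumps reference counts, so shell and non-shell paths can each hold their own handle to the same data. The field names and the `Session`/`Turn` types here are hypothetical placeholders, not the real `ToolCtx` definition.

```rust
use std::sync::Arc;

/// Hypothetical session-level state.
struct Session {
    id: String,
}

/// Hypothetical per-turn state.
struct Turn {
    cwd: String,
}

/// Sketch of a context that shares its parts via `Arc`.
#[derive(Clone)]
struct ToolCtx {
    session: Arc<Session>,
    turn: Arc<Turn>,
}
```

Cloning a `ToolCtx` copies two pointers rather than the underlying session or turn data, which is what lets call sites avoid move/borrow workarounds.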

## Verification
- Validated this commit stack point with `just clippy` and confirmed
workspace compiles cleanly in this stack state.

[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12583).
* #12584
* __->__ #12583
* #12556
2026-02-23 18:29:26 +00:00
Ahmed Ibrahim
6e60f724bc remove feature flag collaboration modes (#12028)
All code paths should now assume that steer is enabled.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-23 09:06:08 -08:00
jif-oai
eace7c6610 feat: land sqlite (#12141) 2026-02-23 16:12:23 +00:00
jif-oai
2119532a81 feat: role metrics multi-agent (#12579)
add metrics for agent role
2026-02-23 15:55:48 +00:00
jif-oai
e8709bc11a chore: rename memory feature flag (#12580)
`memory_tool` -> `memories`
2026-02-23 15:37:12 +00:00
jif-oai
cf0210bf22 feat: agent nick names to model (#12575) 2026-02-23 13:44:37 +00:00
jif-oai
2b9d0c385f chore: add doc to memories (#12565)
]
2026-02-23 10:52:58 +00:00
jif-oai
cfcbff4c48 chore: awaiter (#12562) 2026-02-23 10:28:24 +00:00
jif-oai
8e9312958d chore: nit name (#12559) 2026-02-23 08:49:41 +00:00
pakrym-oai
335a4e1cbc Return image content from view_image (#12553)
Responses API supports image content
2026-02-22 23:00:08 -08:00
Michael Bolin
e8949f4507 test: vendor zsh fork via DotSlash and stabilize zsh-fork tests (#12518)
## Why

The zsh integration tests were still brittle in two ways:

- they relied on `CODEX_TEST_ZSH_PATH` / environment-specific setup, so
they often did not exercise the patched zsh fork that `shell-tool-mcp`
ships
- once the tests consistently used the vendored zsh fork, they exposed
real Linux-specific zsh-fork issues in CI

In particular, the Linux failures were not just test noise:

- the zsh-fork launch path was dropping `ExecRequest.arg0`, so Linux
`codex-linux-sandbox` arg0 dispatch did not run and zsh wrapper-mode
could receive malformed arguments
- the
`turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
test uses the zsh exec bridge (which talks to the parent over a Unix
socket), but Linux restricted sandbox seccomp denies `connect(2)`,
causing timeouts on `ubuntu-24.04` x86/arm

This PR makes the zsh tests consistently run against the intended
vendored zsh fork and fixes/hardens the zsh-fork path so the Linux CI
signal is meaningful.

## What Changed

- Added a single shared test-only DotSlash file for the patched zsh fork
at `codex-rs/exec-server/tests/suite/zsh` (analogous to the existing
`bash` test resource).
- Updated both app-server and exec-server zsh tests to use that shared
DotSlash zsh (no duplicate zsh DotSlash file, no `CODEX_TEST_ZSH_PATH`
dependency).
- Updated the app-server zsh-fork test helper to resolve the shared
DotSlash zsh and avoid silently falling back to host zsh.
- Kept the app-server zsh-fork tests configured via `config.toml`, using
a test wrapper path where needed to force `zsh -df` (and rewrite `-lc`
to `-c`) for the subcommand-decline test.
- Hardened the app-server subcommand-decline zsh-fork test for CI
variability:
  - tolerate an extra `/responses` POST with a no-op mock response
  - tolerate non-target approval ordering while remaining strict on the
    two `/usr/bin/true` approvals and decline behavior
  - use `DangerFullAccess` on Linux for this one test because it validates
    zsh approval flow, not Linux sandbox socket restrictions
- Fixed zsh-fork process launching on Linux by preserving `req.arg0` in
`ZshExecBridge::execute_shell_request(...)` so `codex-linux-sandbox`
arg0 dispatch continues to work.
- Moved `maybe_run_zsh_exec_wrapper_mode()` under
`arg0_dispatch_or_else(...)` in `app-server` and `cli` so wrapper-mode
handling coexists correctly with arg0-dispatched helper modes.
- Consolidated duplicated `dotslash -- fetch` resolution logic into
shared test support (`core/tests/common/lib.rs`).
- Updated `codex-rs/exec-server/tests/suite/accept_elicitation.rs` to
use DotSlash zsh and hardened the zsh elicitation test for Bazel/zsh
differences by:
  - resolving an absolute `git` path
  - running `git init --quiet .`
  - asserting success / `.git` creation instead of relying on banner text

## Verification

- `cargo test -p codex-app-server turn_start_zsh_fork -- --nocapture`
- `cargo test -p codex-exec-server accept_elicitation -- --nocapture`
- `bazel test //codex-rs/exec-server:exec-server-all-test
--test_output=streamed --test_arg=--nocapture
--test_arg=accept_elicitation_for_prompt_rule_with_zsh`
- CI (`rust-ci`) on the final cleaned commit: `Tests — ubuntu-24.04 -
x86_64-unknown-linux-gnu` and `Tests — ubuntu-24.04-arm -
aarch64-unknown-linux-gnu` passed in [run
22291424358](https://github.com/openai/codex/actions/runs/22291424358)
2026-02-22 19:39:56 -08:00
Ahmed Ibrahim
e00fa19328 Revert "Revert "Route inbound realtime text into turn start or steer"" (#12480)
With working tests this time

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-22 11:54:16 -08:00
jif-oai
4666a6e631 feat: monitor role (#12364) 2026-02-22 14:13:56 +00:00
Ahmed Ibrahim
55fc075723 Send events to realtime api (#12423)
- Send assistant messages, ExecCommandBegin, and
PatchApplyBegin/PatchApplyEnd
2026-02-21 23:24:51 -08:00
Felipe Coury
c4f1af7a86 feat(tui): syntax highlighting via syntect with theme picker (#11447)
## Summary

Adds syntax highlighting to the TUI for fenced code blocks in markdown
responses and file diffs, plus a `/theme` command with live preview and
persistent theme selection. Uses syntect (~250 grammars, 32 bundled
themes, ~1 MB binary cost) — the same engine behind `bat`, `delta`, and
`xi-editor`. Includes guardrails for large inputs, graceful fallback to
plain text, and SSH-aware clipboard integration for the `/copy` command.

<img width="1554" height="1014" alt="image"
src="https://github.com/user-attachments/assets/38737a79-8717-4715-b857-94cf1ba59b85"
/>

<img width="2354" height="1374" alt="image"
src="https://github.com/user-attachments/assets/25d30a00-c487-4af8-9cb6-63b0695a4be7"
/>

## Problem

Code blocks in the TUI (markdown responses and file diffs) render
without syntax highlighting, making it hard to scan code at a glance.
Users also have no way to pick a color theme that matches their terminal
aesthetic.

## Mental model

The highlighting system has three layers:

1. **Syntax engine** (`render::highlight`) -- a thin wrapper around
syntect + two-face. It owns a process-global `SyntaxSet` (~250 grammars)
and a `RwLock<Theme>` that can be swapped at runtime. All public entry
points accept `(code, lang)` and return ratatui `Span`/`Line` vectors or
`None` when the language is unrecognized or the input exceeds safety
guardrails.

2. **Rendering consumers** -- `markdown_render` feeds fenced code blocks
through the engine; `diff_render` highlights Add/Delete content as a
whole file and Update hunks per-hunk (preserving parser state across
hunk lines). Both callers fall back to plain unstyled text when the
engine returns `None`.

3. **Theme lifecycle** -- at startup the config's `tui.theme` is
resolved to a syntect `Theme` via `set_theme_override`. At runtime the
`/theme` picker calls `set_syntax_theme` to swap themes live; on cancel
it restores the snapshot taken at open. On confirm it persists `[tui]
theme = "..."` to config.toml.
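
The snapshot-and-restore part of the theme lifecycle can be sketched with a process-global `RwLock`. This is a simplification under stated assumptions: the theme is a plain `String` rather than a syntect `Theme`, `preview_theme` is a hypothetical helper, and the two function names borrowed from the description have simplified signatures here.

```rust
use std::sync::RwLock;

/// Process-global theme slot. A `String` stands in for the real theme type.
static THEME: RwLock<String> = RwLock::new(String::new());

/// Read a copy of the active theme (used as the snapshot on picker open).
fn current_syntax_theme() -> String {
    THEME.read().unwrap().clone()
}

/// Swap the active theme for live preview.
fn set_syntax_theme(name: &str) {
    *THEME.write().unwrap() = name.to_owned();
}

/// Preview `candidate`; keep it when `confirmed`, otherwise restore the
/// snapshot taken before the preview. Returns the theme left active.
fn preview_theme(candidate: &str, confirmed: bool) -> String {
    let snapshot = current_syntax_theme();
    set_syntax_theme(candidate);
    if !confirmed {
        set_syntax_theme(&snapshot);
    }
    current_syntax_theme()
}
```

Persisting the confirmed choice to `config.toml` is deliberately left out of the sketch.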

## Non-goals

- Inline diff highlighting (word-level change detection within a line).
- Semantic / LSP-backed highlighting.
- Theme authoring tooling; users supply standard `.tmTheme` files.

## Tradeoffs

| Decision | Upside | Downside |
| --- | --- | --- |
| syntect over tree-sitter / arborium | ~1 MB binary increase for ~250 grammars + 32 themes; battle-tested crate powering widely-used tools (`bat`, `delta`, `xi-editor`). tree-sitter would add ~12 MB for 20-30 languages or ~35 MB for full coverage. | Regex-based; less structurally accurate than tree-sitter for some languages (e.g. language injections like JS-in-HTML). |
| Global `RwLock<Theme>` | Enables live `/theme` preview without threading Theme through every call site | Lock contention risk (mitigated: reads vastly outnumber writes, single UI thread) |
| Skip background / italic / underline from themes | Terminal BG preserved, avoids ugly rendering on some themes | Themes that rely on these properties lose fidelity |
| Guardrails: 512 KB / 10k lines | Prevents pathological stalls on huge diffs or pastes | Very large files render without color |
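
The guardrail row can be read as a simple size check performed before any highlighting work. The sketch below uses the limits from the table; the constant and function names are hypothetical, and whether the real bounds are inclusive is an assumption.

```rust
/// Limits taken from the tradeoffs table (512 KB / 10k lines).
const MAX_HIGHLIGHT_BYTES: usize = 512 * 1024;
const MAX_HIGHLIGHT_LINES: usize = 10_000;

/// Returns `true` when `code` is small enough to highlight. Callers are
/// expected to fall back to plain text otherwise.
fn within_guardrails(code: &str) -> bool {
    code.len() <= MAX_HIGHLIGHT_BYTES && code.lines().count() <= MAX_HIGHLIGHT_LINES
}
```

Checking the byte length first keeps the common oversized-input case cheap, since the line count is only computed for inputs under the byte limit.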

## Architecture

```
config.toml  ─[tui.theme]─>  set_theme_override()  ─>  THEME (RwLock)
                                                              │
                  ┌───────────────────────────────────────────┘
                  │
  markdown_render ─── highlight_code_to_lines(code, lang) ─> Vec<Line>
  diff_render     ─── highlight_code_to_styled_spans(code, lang) ─> Option<Vec<Vec<Span>>>
                  │
                  │   (None ⇒ plain text fallback)
                  │
  /theme picker   ─── set_syntax_theme(theme)    // live preview swap
                  ─── current_syntax_theme()      // snapshot for cancel
                  ─── resolve_theme_by_name(name) // lookup by kebab-case
```
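
The snapshot/swap/restore flow in the diagram can be sketched with a process-wide `RwLock`. The function names `set_syntax_theme` and `current_syntax_theme` appear in the diagram, but their signatures here are assumptions, and `Theme` is a hypothetical stand-in for the real theme type.

```rust
use std::sync::{OnceLock, RwLock};

// Hypothetical stand-in for the real theme type.
#[derive(Clone, Debug, PartialEq)]
struct Theme {
    name: String,
}

// Lazily initialized global; the "default" name is illustrative only.
fn theme_lock() -> &'static RwLock<Theme> {
    static THEME: OnceLock<RwLock<Theme>> = OnceLock::new();
    THEME.get_or_init(|| {
        RwLock::new(Theme {
            name: "default".to_string(),
        })
    })
}

/// Swap the active theme (used for live preview and for cancel-restore).
fn set_syntax_theme(theme: Theme) {
    *theme_lock().write().unwrap() = theme;
}

/// Clone the active theme so a picker can restore it on cancel.
fn current_syntax_theme() -> Theme {
    theme_lock().read().unwrap().clone()
}
```

A picker built on this would call `current_syntax_theme()` when it opens, `set_syntax_theme(..)` on each selection change, and `set_syntax_theme(snapshot)` on cancel.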

Key files:

- `tui/src/render/highlight.rs` -- engine, theme management, guardrails
- `tui/src/diff_render.rs` -- syntax-aware diff line wrapping
- `tui/src/theme_picker.rs` -- `/theme` command builder
- `tui/src/bottom_pane/list_selection_view.rs` -- side content panel,
callbacks
- `core/src/config/types.rs` -- `Tui::theme` field
- `core/src/config/edit.rs` -- `syntax_theme_edit()` helper

## Observability

- `tracing::warn` when a configured theme name cannot be resolved.
- `Config::startup_warnings` surfaces the same message as a TUI banner.
- `tracing::error` when persisting theme selection fails.

## Tests

- Unit tests in `highlight.rs`: language coverage, fallback behavior,
CRLF stripping, style conversion, guardrail enforcement, theme name
mapping exhaustiveness.
- Unit tests in `diff_render.rs`: snapshot gallery at multiple terminal
sizes (80x24, 94x35, 120x40), syntax-highlighted wrapping, large-diff
guardrail, rename-to-different-extension highlighting, parser state
preservation across hunk lines.
- Unit tests in `theme_picker.rs`: preview rendering (wide + narrow),
dim overlay on deletions, subtitle truncation, cancel-restore, fallback
for unavailable configured theme.
- Unit tests in `list_selection_view.rs`: side layout geometry, stacked
fallback, buffer clearing, cancel/selection-changed callbacks.
- Integration test in `lib.rs`: theme warning uses the final
(post-resume) config.

## Cargo Deny: Unmaintained Dependency Exceptions

This PR adds two `cargo deny` advisory exceptions for transitive
dependencies pulled in by `syntect v5.3.0`:

| Advisory | Crate | Status |
|----------|-------|--------|
| RUSTSEC-2024-0320 | `yaml-rust` | Unmaintained (maintainer unreachable) |
| RUSTSEC-2025-0141 | `bincode` | Unmaintained (development ceased; v1.3.3 considered complete) |

**Why this is safe in our usage:**

- Neither advisory describes a known security vulnerability. Both are
"unmaintained" notices only.
- `bincode` is used by syntect to deserialize pre-compiled syntax sets.
Again, these are **static vendored artifacts** baked into the binary at
build time. No user-supplied bincode data is ever deserialized.
- Attack surface is zero for both crates; exploitation would require a
supply-chain compromise of our own build artifacts.
- These exceptions can be removed when syntect migrates to `yaml-rust2`
and drops `bincode`, or when alternative crates are available upstream.
2026-02-21 20:26:58 -08:00
Alex Kwiatkowski
1dad0a7f4a Make shell detection tests robust to Nix shell paths (#12476)

## Summary
- Updated `codex-rs/core/src/shell.rs` tests for shell detection to stop
asserting hardcoded shell paths.
- `detects_bash` and `detects_sh` now assert executable basenames
(`bash`, `sh`) rather than `/bin/*`/`/usr/bin/*` absolute paths.
- This keeps behavior the same while avoiding failures in Nix
environments where shells are resolved from `/nix/store/.../bin`.
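
A basename-based assertion could look like the sketch below. The helper `shell_basename` is hypothetical (the actual tests in `shell.rs` may be structured differently), and the Nix store path in the usage note is made up for illustration.

```rust
use std::path::Path;

/// Returns the final path component, e.g. `bash` for `/bin/bash`.
/// Hypothetical helper; not taken from `shell.rs`.
fn shell_basename(shell_path: &Path) -> Option<&str> {
    shell_path.file_name().and_then(|name| name.to_str())
}
```

With this, `shell_basename(Path::new("/bin/bash"))` and `shell_basename(Path::new("/nix/store/example-bash/bin/bash"))` both yield `Some("bash")`, so the same assertion holds regardless of where the shell is installed.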

## Testing
- `nix develop .#default --command sh -lc 'export
PKG_CONFIG_PATH=/nix/store/6az1q591wwlgazzskngr6rl7gmhpyvnc-libcap-2.77-dev/lib/pkgconfig:/nix/store/fgm3pz8486ksh3f94629lpb7xjr2wjp7-openssl-3.6.0-dev/lib/pkgconfig:$PKG_CONFIG_PATH;
export PKG_CONFIG_PATH_FOR_TARGET=$PKG_CONFIG_PATH; cd
/home/alex/workspace/openai/codex/codex-rs && cargo test -p codex-core
--lib detects_bash && cargo test -p codex-core --lib detects_sh'`

## Why
The two failing tests previously hardcoded fixed paths and failed under
the Nix shell due to Nix-provided shell binary locations.

## Links
- Bug report / enhancement request: not publicly filed yet; this was
reproduced in the local Nix environment.
2026-02-21 20:08:02 -08:00
Michael Bolin
b73c4b50a2 fix: make realtime conversation flake test order-insensitive (#12475)
## Why

`codex-core::all` has a flaky test,
`suite::realtime_conversation::conversation_start_audio_text_close_round_trip`,
that assumes a fixed ordering between `conversation.item.create` and
`response.input_audio.delta` requests.

That ordering is not guaranteed: realtime text and audio input are
forwarded through separate queues and a background task, so either
request can be observed first while still being correct behavior.

## What Changed

- Updated the assertion in
`codex-rs/core/tests/suite/realtime_conversation.rs` to compare the two
observed request types order-independently.
- Kept the existing checks that `session.create` is sent first and that
exactly two follow-up requests are recorded.
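
The shape of the relaxed check can be sketched as follows. This is an illustration under assumptions, not the test's actual code: `requests_match` is a hypothetical helper, and request types are modeled as plain strings.

```rust
/// Returns true when `session.create` comes first and is followed by exactly
/// the two expected request types, in either order.
/// Hypothetical helper; the real test may inspect recorded requests differently.
fn requests_match(observed: &[&str]) -> bool {
    let Some((first, rest)) = observed.split_first() else {
        return false;
    };
    if *first != "session.create" || rest.len() != 2 {
        return false;
    }
    // Sort the follow-ups so the comparison ignores arrival order.
    let mut follow_ups = rest.to_vec();
    follow_ups.sort_unstable();
    follow_ups == ["conversation.item.create", "response.input_audio.delta"]
}
```

Sorting before comparing keeps the strict checks (first request, count) while dropping only the ordering assumption between the two follow-ups.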

## Verification

- Re-ran `cargo test -p codex-core --test all
conversation_start_audio_text_close_round_trip` 10 times locally.
2026-02-21 17:06:35 -08:00
Ahmed Ibrahim
5e505ff877 Revert "Route inbound realtime text into turn start or steer" (#12479)
Reverts openai/codex#12469
2026-02-21 15:46:03 -08:00
Ahmed Ibrahim
031d701705 Route inbound realtime text into turn start or steer (#12469)
- Route inbound realtime websocket text into normal user input handling
so it steers an active turn or starts a new one
2026-02-21 15:45:27 -08:00
Michael Bolin
66d5d34e6e core: preserve constrained approval/sandbox policies in TurnContext (#12473)
2026-02-21 14:40:24 -08:00
Michael Bolin
f33ac830aa fix: make skills loader tests hermetic with ~/.agents skills (#12474)
2026-02-21 14:40:13 -08:00
Eric Traut
3586fcb802 Improve token usage estimate for images (#12419)
Fixes #11845.

Adjust context/token estimation for inline image `data:*;base64,...` URLs so we
do not count the raw base64 payload as model-visible text.

What changed:
- keep the existing JSON-length estimator as the baseline
- detect only inline base64 `data:` image URLs in message and function-call
  output content items
- subtract only the base64 payload bytes (preserving data URL prefix + JSON
  overhead)
- add a fixed per-image estimate of 340 bytes (~85 tokens at the repo’s
  4-bytes/token heuristic)
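
The adjustment can be sketched as below. This is a simplified illustration, not the estimator in the repository: the function and constant names are hypothetical, URL detection is case-sensitive for brevity, and the JSON-length baseline is passed in as a number rather than computed.

```rust
// Fixed per-image cost from the commit message (340 bytes, about 85 tokens
// at 4 bytes per token). The constant name is hypothetical.
const IMAGE_ESTIMATE_BYTES: usize = 340;

/// If `url` is an inline base64 `data:` URL, returns the payload length in bytes.
fn base64_payload_len(url: &str) -> Option<usize> {
    let rest = url.strip_prefix("data:")?;
    let (header, payload) = rest.split_once(',')?;
    header.ends_with(";base64").then_some(payload.len())
}

/// Starts from the JSON-length baseline, then for each inline base64 image
/// subtracts the payload bytes and adds the fixed per-image estimate.
fn adjusted_estimate_bytes(json_len: usize, image_urls: &[&str]) -> usize {
    image_urls
        .iter()
        .filter_map(|url| base64_payload_len(url))
        .fold(json_len, |total, payload| {
            total.saturating_sub(payload) + IMAGE_ESTIMATE_BYTES
        })
}
```

Only the payload after the comma is subtracted, so the `data:...;base64,` prefix and the surrounding JSON overhead stay in the estimate, matching the list above.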

This avoids large overestimates from MCP image tool outputs while leaving normal
image URLs (`https://`, `file://`, non-base64 `data:` URLs) unchanged.

Tests:
- message image data URL estimate regression
- function-call output image data URL estimate regression
- non-base64 image URLs unchanged
- non-base64 `data:` URLs unchanged
- `data:application/octet-stream;base64,...` adjusted
- multiple inline images apply multiple fixed costs
- text-only items unchanged
2026-02-21 14:25:36 -08:00
pakrym-oai
b17148f13a Prefer v2 websockets if available (#12428)
Also clean up the settings flow to avoid reading many separate flags.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-21 20:08:04 +00:00
sayan-oai
5a635f3427 profile-level model_catalog_json override (#12410)
enable the `model_catalog_json` config value on `ConfigProfile` as well
2026-02-21 19:39:02 +00:00