Commit Graph

1866 Commits

Author SHA1 Message Date
Michael Bolin
86bfa68e42 Use Arc-based ToolCtx in tool runtimes 2026-02-23 09:52:41 -08:00
Ahmed Ibrahim
6e60f724bc remove feature flag collaboration modes (#12028)
All code should go in the direction that steer is enabled

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-23 09:06:08 -08:00
jif-oai
eace7c6610 feat: land sqlite (#12141) 2026-02-23 16:12:23 +00:00
jif-oai
2119532a81 feat: role metrics multi-agent (#12579)
add metrics for agent role
2026-02-23 15:55:48 +00:00
jif-oai
e8709bc11a chore: rename memory feature flag (#12580)
`memory_tool` -> `memories`
2026-02-23 15:37:12 +00:00
jif-oai
cf0210bf22 feat: agent nick names to model (#12575) 2026-02-23 13:44:37 +00:00
jif-oai
2b9d0c385f chore: add doc to memories (#12565)
]
2026-02-23 10:52:58 +00:00
jif-oai
cfcbff4c48 chore: awaiter (#12562) 2026-02-23 10:28:24 +00:00
jif-oai
8e9312958d chore: nit name (#12559) 2026-02-23 08:49:41 +00:00
pakrym-oai
335a4e1cbc Return image content from view_image (#12553)
Responses API supports image content
2026-02-22 23:00:08 -08:00
Michael Bolin
e8949f4507 test: vendor zsh fork via DotSlash and stabilize zsh-fork tests (#12518)
## Why

The zsh integration tests were still brittle in two ways:

- they relied on `CODEX_TEST_ZSH_PATH` / environment-specific setup, so
they often did not exercise the patched zsh fork that `shell-tool-mcp`
ships
- once the tests consistently used the vendored zsh fork, they exposed
real Linux-specific zsh-fork issues in CI

In particular, the Linux failures were not just test noise:

- the zsh-fork launch path was dropping `ExecRequest.arg0`, so Linux
`codex-linux-sandbox` arg0 dispatch did not run and zsh wrapper-mode
could receive malformed arguments
- the
`turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
test uses the zsh exec bridge (which talks to the parent over a Unix
socket), but Linux restricted sandbox seccomp denies `connect(2)`,
causing timeouts on `ubuntu-24.04` x86/arm

This PR makes the zsh tests consistently run against the intended
vendored zsh fork and fixes/hardens the zsh-fork path so the Linux CI
signal is meaningful.

## What Changed

- Added a single shared test-only DotSlash file for the patched zsh fork
at `codex-rs/exec-server/tests/suite/zsh` (analogous to the existing
`bash` test resource).
- Updated both app-server and exec-server zsh tests to use that shared
DotSlash zsh (no duplicate zsh DotSlash file, no `CODEX_TEST_ZSH_PATH`
dependency).
- Updated the app-server zsh-fork test helper to resolve the shared
DotSlash zsh and avoid silently falling back to host zsh.
- Kept the app-server zsh-fork tests configured via `config.toml`, using
a test wrapper path where needed to force `zsh -df` (and rewrite `-lc`
to `-c`) for the subcommand-decline test.
- Hardened the app-server subcommand-decline zsh-fork test for CI
variability:
  - tolerate an extra `/responses` POST with a no-op mock response
- tolerate non-target approval ordering while remaining strict on the
two `/usr/bin/true` approvals and decline behavior
- use `DangerFullAccess` on Linux for this one test because it validates
zsh approval flow, not Linux sandbox socket restrictions
- Fixed zsh-fork process launching on Linux by preserving `req.arg0` in
`ZshExecBridge::execute_shell_request(...)` so `codex-linux-sandbox`
arg0 dispatch continues to work.
- Moved `maybe_run_zsh_exec_wrapper_mode()` under
`arg0_dispatch_or_else(...)` in `app-server` and `cli` so wrapper-mode
handling coexists correctly with arg0-dispatched helper modes.
- Consolidated duplicated `dotslash -- fetch` resolution logic into
shared test support (`core/tests/common/lib.rs`).
- Updated `codex-rs/exec-server/tests/suite/accept_elicitation.rs` to
use DotSlash zsh and hardened the zsh elicitation test for Bazel/zsh
differences by:
  - resolving an absolute `git` path
  - running `git init --quiet .`
- asserting success / `.git` creation instead of relying on banner text

## Verification

- `cargo test -p codex-app-server turn_start_zsh_fork -- --nocapture`
- `cargo test -p codex-exec-server accept_elicitation -- --nocapture`
- `bazel test //codex-rs/exec-server:exec-server-all-test
--test_output=streamed --test_arg=--nocapture
--test_arg=accept_elicitation_for_prompt_rule_with_zsh`
- CI (`rust-ci`) on the final cleaned commit: `Tests — ubuntu-24.04 -
x86_64-unknown-linux-gnu` and `Tests — ubuntu-24.04-arm -
aarch64-unknown-linux-gnu` passed in [run
22291424358](https://github.com/openai/codex/actions/runs/22291424358)
2026-02-22 19:39:56 -08:00
Ahmed Ibrahim
e00fa19328 Revert "Revert "Route inbound realtime text into turn start or steer"" (#12480)
With working tests this time

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-22 11:54:16 -08:00
jif-oai
4666a6e631 feat: monitor role (#12364) 2026-02-22 14:13:56 +00:00
Ahmed Ibrahim
55fc075723 Send events to realtime api (#12423)
- Send assistant messages, ExecCommandBegin, and
PatchApplyBegin/PatchApplyEnd
2026-02-21 23:24:51 -08:00
Felipe Coury
c4f1af7a86 feat(tui): syntax highlighting via syntect with theme picker (#11447)
## Summary

Adds syntax highlighting to the TUI for fenced code blocks in markdown
responses and file diffs, plus a `/theme` command with live preview and
persistent theme selection. Uses syntect (~250 grammars, 32 bundled
themes, ~1 MB binary cost) — the same engine behind `bat`, `delta`, and
`xi-editor`. Includes guardrails for large inputs, graceful fallback to
plain text, and SSH-aware clipboard integration for the `/copy` command.

<img width="1554" height="1014" alt="image"
src="https://github.com/user-attachments/assets/38737a79-8717-4715-b857-94cf1ba59b85"
/>

<img width="2354" height="1374" alt="image"
src="https://github.com/user-attachments/assets/25d30a00-c487-4af8-9cb6-63b0695a4be7"
/>

## Problem

Code blocks in the TUI (markdown responses and file diffs) render
without syntax highlighting, making it hard to scan code at a glance.
Users also have no way to pick a color theme that matches their terminal
aesthetic.

## Mental model

The highlighting system has three layers:

1. **Syntax engine** (`render::highlight`) -- a thin wrapper around
syntect + two-face. It owns a process-global `SyntaxSet` (~250 grammars)
and a `RwLock<Theme>` that can be swapped at runtime. All public entry
points accept `(code, lang)` and return ratatui `Span`/`Line` vectors or
`None` when the language is unrecognized or the input exceeds safety
guardrails.

2. **Rendering consumers** -- `markdown_render` feeds fenced code blocks
through the engine; `diff_render` highlights Add/Delete content as a
whole file and Update hunks per-hunk (preserving parser state across
hunk lines). Both callers fall back to plain unstyled text when the
engine returns `None`.

3. **Theme lifecycle** -- at startup the config's `tui.theme` is
resolved to a syntect `Theme` via `set_theme_override`. At runtime the
`/theme` picker calls `set_syntax_theme` to swap themes live; on cancel
it restores the snapshot taken at open. On confirm it persists `[tui]
theme = "..."` to config.toml.

## Non-goals

- Inline diff highlighting (word-level change detection within a line).
- Semantic / LSP-backed highlighting.
- Theme authoring tooling; users supply standard `.tmTheme` files.

## Tradeoffs

| Decision | Upside | Downside |
| ------------------------------------------------ |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
-----------------------------------------------------------------------------------------------------------------------
|
| syntect over tree-sitter / arborium | ~1 MB binary increase for ~250
grammars + 32 themes; battle-tested crate powering widely-used tools
(`bat`, `delta`, `xi-editor`). tree-sitter would add ~12 MB for 20-30
languages or ~35 MB for full coverage. | Regex-based; less structurally
accurate than tree-sitter for some languages (e.g. language injections
like JS-in-HTML). |
| Global `RwLock<Theme>` | Enables live `/theme` preview without
threading Theme through every call site | Lock contention risk
(mitigated: reads vastly outnumber writes, single UI thread) |
| Skip background / italic / underline from themes | Terminal BG
preserved, avoids ugly rendering on some themes | Themes that rely on
these properties lose fidelity |
| Guardrails: 512 KB / 10k lines | Prevents pathological stalls on huge
diffs or pastes | Very large files render without color |

## Architecture

```
config.toml  ─[tui.theme]─>  set_theme_override()  ─>  THEME (RwLock)
                                                              │
                  ┌───────────────────────────────────────────┘
                  │
  markdown_render ─── highlight_code_to_lines(code, lang) ─> Vec<Line>
  diff_render     ─── highlight_code_to_styled_spans(code, lang) ─> Option<Vec<Vec<Span>>>
                  │
                  │   (None ⇒ plain text fallback)
                  │
  /theme picker   ─── set_syntax_theme(theme)    // live preview swap
                  ─── current_syntax_theme()      // snapshot for cancel
                  ─── resolve_theme_by_name(name) // lookup by kebab-case
```

Key files:

- `tui/src/render/highlight.rs` -- engine, theme management, guardrails
- `tui/src/diff_render.rs` -- syntax-aware diff line wrapping
- `tui/src/theme_picker.rs` -- `/theme` command builder
- `tui/src/bottom_pane/list_selection_view.rs` -- side content panel,
callbacks
- `core/src/config/types.rs` -- `Tui::theme` field
- `core/src/config/edit.rs` -- `syntax_theme_edit()` helper

## Observability

- `tracing::warn` when a configured theme name cannot be resolved.
- `Config::startup_warnings` surfaces the same message as a TUI banner.
- `tracing::error` when persisting theme selection fails.

## Tests

- Unit tests in `highlight.rs`: language coverage, fallback behavior,
CRLF stripping, style conversion, guardrail enforcement, theme name
mapping exhaustiveness.
- Unit tests in `diff_render.rs`: snapshot gallery at multiple terminal
sizes (80x24, 94x35, 120x40), syntax-highlighted wrapping, large-diff
guardrail, rename-to-different-extension highlighting, parser state
preservation across hunk lines.
- Unit tests in `theme_picker.rs`: preview rendering (wide + narrow),
dim overlay on deletions, subtitle truncation, cancel-restore, fallback
for unavailable configured theme.
- Unit tests in `list_selection_view.rs`: side layout geometry, stacked
fallback, buffer clearing, cancel/selection-changed callbacks.
- Integration test in `lib.rs`: theme warning uses the final
(post-resume) config.

## Cargo Deny: Unmaintained Dependency Exceptions

This PR adds two `cargo deny` advisory exceptions for transitive
dependencies pulled in by `syntect v5.3.0`:

| Advisory | Crate | Status |
|----------|-------|--------|
| RUSTSEC-2024-0320 | `yaml-rust` | Unmaintained (maintainer
unreachable) |
| RUSTSEC-2025-0141 | `bincode` | Unmaintained (development ceased;
v1.3.3 considered complete) |

**Why this is safe in our usage:**

- Neither advisory describes a known security vulnerability. Both are
"unmaintained" notices only.
- `bincode` is used by syntect to deserialize pre-compiled syntax sets.
Again, these are **static vendored artifacts** baked into the binary at
build time. No user-supplied bincode data is ever deserialized. - Attack
surface is zero for both crates; exploitation would require a
supply-chain compromise of our own build artifacts.
- These exceptions can be removed when syntect migrates to `yaml-rust2`
and drops `bincode`, or when alternative crates are available upstream.
2026-02-21 20:26:58 -08:00
Alex Kwiatkowski
1dad0a7f4a Make shell detection tests
robust to Nix shell paths (#12476)

## Summary
- Updated `codex-rs/core/src/shell.rs` tests for shell detection to stop
asserting hardcoded shell paths.
- `detects_bash` and `detects_sh` now assert executable basenames
(`bash`, `sh`) rather than `/bin/*`/`/usr/bin/*` absolute paths.
- This keeps behavior the same while avoiding failures in Nix
environments where shells are resolved from `/nix/store/.../bin`.

## Testing
- `nix develop .#default --command sh -lc 'export
PKG_CONFIG_PATH=/nix/store/6az1q591wwlgazzskngr6rl7gmhpyvnc-libcap-2.77-dev/lib/pkgconfig:/nix/store/fgm3pz8486ksh3f94629lpb7xjr2wjp7-openssl-3.6.0-dev/lib/pkgconfig:$PKG_CONFIG_PATH;
export PKG_CONFIG_PATH_FOR_TARGET=$PKG_CONFIG_PATH; cd
/home/alex/workspace/openai/codex/codex-rs && cargo test -p codex-core
--lib detects_bash && cargo test -p codex-core --lib detects_sh'`

## Why
The two failing tests previously hardcoded fixed paths and failed under
the Nix shell due to Nix-provided shell binary locations.

## Links
- Bug report / enhancement request: not publicly filed yet; this was
reproduced in the local Nix environment.
2026-02-21 20:08:02 -08:00
Michael Bolin
b73c4b50a2 fix: make realtime conversation flake test order-insensitive (#12475)
## Why

`codex-core::all` has a flaky test,
`suite::realtime_conversation::conversation_start_audio_text_close_round_trip`,
that assumes a fixed ordering between `conversation.item.create` and
`response.input_audio.delta` requests.

That ordering is not guaranteed: realtime text and audio input are
forwarded through separate queues and a background task, so either
request can be observed first while still being correct behavior.

## What Changed

- Updated the assertion in
`codex-rs/core/tests/suite/realtime_conversation.rs` to compare the two
observed request types order-independently.
- Kept the existing checks that `session.create` is sent first and that
exactly two follow-up requests are recorded.

## Verification

- Re-ran `cargo test -p codex-core --test all
conversation_start_audio_text_close_round_trip` 10 times locally.
2026-02-21 17:06:35 -08:00
Ahmed Ibrahim
5e505ff877 Revert "Route inbound realtime text into turn start or steer" (#12479)
Reverts openai/codex#12469
2026-02-21 15:46:03 -08:00
Ahmed Ibrahim
031d701705 Route inbound realtime text into turn start or steer (#12469)
- Route inbound realtime websocket text into normal user input handling
so it steers an active turn or starts a new one
2026-02-21 15:45:27 -08:00
Michael Bolin
66d5d34e6e core: preserve constrained approval/sandbox policies in TurnContext (#12473) 2026-02-21 14:40:24 -08:00
Michael Bolin
f33ac830aa fix: make skills loader tests hermetic with ~/.agents skills (#12474) 2026-02-21 14:40:13 -08:00
Eric Traut
3586fcb802 Improve token usage estimate for images (#12419)
Fixes #11845.

Adjust context/token estimation for inline image `data:*;base64,...`
URLs so we
do not count the raw base64 payload as model-visible text.

What changed:
- keep the existing JSON-length estimator as the baseline
- detect only inline base64 `data:` image URLs in message and
function-call
  output content items
- subtract only the base64 payload bytes (preserving data URL prefix +
JSON
  overhead)
- add a fixed per-image estimate of 340 bytes (~85 tokens at the repo’s
  4-bytes/token heuristic)

This avoids large overestimates from MCP image tool outputs while
leaving normal
image URLs (`https://`, `file://`, non-base64 `data:` URLs) unchanged.

Tests:
- message image data URL estimate regression
- function-call output image data URL estimate regression
- non-base64 image URLs unchanged
- non-base64 `data:` URLs unchanged
- `data:application/octet-stream;base64,...` adjusted
- multiple inline images apply multiple fixed costs
- text-only items unchanged
2026-02-21 14:25:36 -08:00
pakrym-oai
b17148f13a Prefer v2 websockets if available (#12428)
And also cleanup settings flow to avoid reading many separate flags.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-21 20:08:04 +00:00
sayan-oai
5a635f3427 profile-level model_catalog_json overrie (#12410)
enable `model-catalog_json` config value on `ConfigProfile` as well
2026-02-21 19:39:02 +00:00
Michael Bolin
85ce91a5b3 refactor(core): move embedded system skills into codex-skills crate (#12435)
## Why

`codex-core` was carrying the embedded system-skill sample assets (and a
`build.rs` that walks those files to register rerun triggers). Those
assets change infrequently, but any change under `codex-core` still ties
them to `codex-core`'s build/cache lifecycle.

This change moves the embedded system-skills packaging into a dedicated
`codex-skills` crate so it can be cached independently. That reduces
unnecessary invalidation/rebuild pressure on `codex-core` when the
skills bundle is the only thing that changes.

## What Changed

- Added a new `codex-rs/skills` crate (`codex-skills`) with:
  - `Cargo.toml`
  - `BUILD.bazel`
  - `build.rs` to track skill asset file changes for Cargo rebuilds
- `src/lib.rs` containing the embedded system-skills install/cache logic
previously in `codex-core`
- Moved the embedded sample skill assets from
`codex-rs/core/src/skills/assets/samples` to
`codex-rs/skills/src/assets/samples`.
- Updated `codex-rs/core/Cargo.toml` to depend on `codex-skills` and
removed `codex-core`'s direct `include_dir` dependency.
- Removed `codex-core`'s `build.rs`.
- Replaced `codex-rs/core/src/skills/system.rs` implementation with a
thin re-export wrapper to keep existing `codex-core` call sites
unchanged.
- Updated workspace manifests/lockfile (`codex-rs/Cargo.toml`,
`codex-rs/Cargo.lock`) for the new crate.
2026-02-21 08:34:08 +00:00
Michael Bolin
2fe4be1aa9 fix: codex-arg0 no longer depends on codex-core (#12434)
## Why

`codex-rs/arg0` only needed two things from `codex-core`:

- the `find_codex_home()` wrapper
- the special argv flag used for the internal `apply_patch`
self-invocation path

That made `codex-arg0` depend on `codex-core` for a very small surface
area. This change removes that dependency edge and moves the shared
`apply_patch` invocation flag to a more natural boundary
(`codex-apply-patch`) while keeping the contract explicitly documented.

## What Changed

- Moved the internal `apply_patch` argv[1] flag constant out of
`codex-core` and into `codex-apply-patch`.
- Renamed the constant to `CODEX_CORE_APPLY_PATCH_ARG1` and documented
that it is part of the Codex core process-invocation contract (even
though it now lives in `codex-apply-patch`).
- Updated `arg0`, the core apply-patch runtime, and the `codex-exec`
apply-patch test to import the constant from `codex-apply-patch`.
- Updated `codex-rs/arg0` to call
`codex_utils_home_dir::find_codex_home()` directly instead of
`codex_core::config::find_codex_home()`.
- Removed the `codex-core` dependency from `codex-rs/arg0` and added the
needed direct dependency on `codex-utils-home-dir`.
- Added `codex-apply-patch` as a dev-dependency for `codex-rs/exec`
tests (the apply-patch test now imports the moved constant directly).

## Verification

- `cargo test -p codex-apply-patch`
- `cargo test -p codex-arg0`
- `cargo test -p codex-core --lib apply_patch`
- `cargo test -p codex-exec
test_standalone_exec_cli_can_use_apply_patch`
- `cargo shear`
2026-02-21 00:20:42 -08:00
Michael Bolin
1af2a37ada chore: remove codex-core public protocol/shell re-exports (#12432)
## Why

`codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
from `codex-protocol` and `codex-shell-command`. That made it easy for
workspace crates to import those APIs through `codex-core`, which in
turn hides dependency edges and makes it harder to reduce compile-time
coupling over time.

This change removes those public re-exports so call sites must import
from the source crates directly. Even when a crate still depends on
`codex-core` today, this makes dependency boundaries explicit and
unblocks future work to drop `codex-core` dependencies where possible.

## What Changed

- Removed public re-exports from `codex-rs/core/src/lib.rs` for:
- `codex_protocol::protocol` and related protocol/model types (including
`InitialHistory`)
  - `codex_protocol::config_types` (`protocol_config_types`)
- `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
parse_command, powershell}`
- Migrated workspace Rust call sites to import directly from:
  - `codex_protocol::protocol`
  - `codex_protocol::config_types`
  - `codex_protocol::models`
  - `codex_shell_command`
- Added explicit `Cargo.toml` dependencies (`codex-protocol` /
`codex-shell-command`) in crates that now import those crates directly.
- Kept `codex-core` internal modules compiling by using `pub(crate)`
aliases in `core/src/lib.rs` (internal-only, not part of the public
API).
- Updated the two utility crates that can already drop a `codex-core`
dependency edge entirely:
  - `codex-utils-approval-presets`
  - `codex-utils-cli`

## Verification

- `cargo test -p codex-utils-approval-presets`
- `cargo test -p codex-utils-cli`
- `cargo check --workspace --all-targets`
- `just clippy`
2026-02-20 23:45:35 -08:00
Michael Bolin
1a220ad77d chore: move config diagnostics out of codex-core (#12427)
## Why

Compiling `codex-rs/core` is a bottleneck for local iteration, so this
change continues the ongoing extraction of config-related functionality
out of `codex-core` and into `codex-config`.

The goal is not just to move code, but to reduce `codex-core` ownership
and indirection so more code depends on `codex-config` directly.

## What Changed

- Moved config diagnostics logic from
`core/src/config_loader/diagnostics.rs` into
`config/src/diagnostics.rs`.
- Updated `codex-core` to use `codex-config` diagnostics types/functions
directly where possible.
- Removed the `core/src/config_loader/diagnostics.rs` shim module
entirely; the remaining `ConfigToml`-specific calls are in
`core/src/config_loader/mod.rs`.
- Moved `CONFIG_TOML_FILE` into `codex-config` and updated existing
references to use `codex_config::CONFIG_TOML_FILE` directly.
- Added a direct `codex-config` dependency to `codex-cli` for its
`CONFIG_TOML_FILE` use.
2026-02-20 23:19:29 -08:00
Charley Cunningham
bb0ac5be70 Fix compaction context reinjection and model baselines (#12252)
## Summary
- move regular-turn context diff/full-context persistence into
`run_turn` so pre-turn compaction runs before incoming context updates
are recorded
- after successful pre-turn compaction, rely on a cleared
`reference_context_item` to trigger full context reinjection on the
follow-up regular turn (manual `/compact` keeps replacement history
summary-only and also clears the baseline)
- preserve `<model_switch>` when full context is reinjected, and inject
it *before* the rest of the full-context items
- scope `reference_context_item` and `previous_model` to regular user
turns only so standalone tasks (`/compact`, shell, review, undo) cannot
suppress future reinjection or `<model_switch>` behavior
- make context-diff persistence + `reference_context_item` updates
explicit in the regular-turn path, with clearer docs/comments around the
invariant
- stop persisting local `/compact` `RolloutItem::TurnContext` snapshots
(only regular turns persist `TurnContextItem` now)
- simplify resume/fork previous-model/reference-baseline hydration by
looking up the last surviving turn context from rollout lifecycle
events, including rollback and compaction-crossing handling
- remove the legacy fallback that guessed from bare `TurnContext`
rollouts without lifecycle events
- update compaction/remote-compaction/model-visible snapshots and
compact test assertions (including remote compaction mock response
shape)

## Why
We were persisting incoming context items before spawning the regular
turn task, which let pre-turn compaction requests accidentally include
incoming context diffs without the new user message. Fixing that exposed
follow-on baseline issues around `/compact`, resume/fork, and standalone
tasks that could cause duplicate context injection or suppress
`<model_switch>` instructions.

This PR re-centers the invariants around regular turns:
- regular turns persist model-visible context diffs/full reinjection and
update the `reference_context_item`
- standalone tasks do not advance those regular-turn baselines
- compaction clears the baseline when replacement history may have
stripped the referenced context diffs

## Follow-ups (TODOs left in code)
- `TODO(ccunningham)`: fix rollback/backtracking baseline handling more
comprehensively
- `TODO(ccunningham)`: include pending incoming context items in
pre-turn compaction threshold estimation
- `TODO(ccunningham)`: inject updated personality spec alongside
`<model_switch>` so some model-switch paths can avoid forced full
reinjection
- `TODO(ccunningham)`: review task turn lifecycle
(`TurnStarted`/`TurnComplete`) behavior and emit task-start context
diffs for task types that should have them (excluding `/compact`)

## Validation
- `just fmt`
- CI should cover the updated compaction/resume/model-visible snapshot
expectations and rollout-hydration behavior
- I did **not** rerun the full local test suite after the latest
resume-lookup / rollout-persistence simplifications
2026-02-20 23:13:08 -08:00
Dylan Hurd
a8b4b569fb fix(core) Filter non-matching prefix rules (#12314)
## Summary
`gpt-5.3-codex` really likes to write complicated shell scripts, and
suggest a partial prefix_rule that wouldn't actually approve the
command. We should only show the `prefix_rule` suggestion from the model
if it would actually fully approve the command the user is seeing.

This will technically cause more instances of overly-specific
suggestions when we fallback, but I think the UX is clearer,
particularly when the model doesn't necessarily understand the current
limitations of execpolicy parsing.

## Testing
 - [x] Add unit tests
 - [x] Add integration tests
2026-02-20 22:02:35 -08:00
Ahmed Ibrahim
b237f7cbb1 Add experimental realtime websocket backend prompt override (#12418)
- add top-level `experimental_realtime_ws_backend_prompt` config key
(experimental / do not use) and include it in config schema
- apply the override only to `Op::RealtimeConversation` websocket
`backend_prompt`, with config + realtime tests
2026-02-20 20:10:51 -08:00
Charley Cunningham
4c1744afb2 Improve Plan mode reasoning selection flow (#12303)
Addresses https://github.com/openai/codex/issues/11013

## Summary
- add a Plan implementation path in the TUI that lets users choose
reasoning before switching to Default mode and implementing
- add Plan-mode reasoning scope handling (Plan-only override vs
all-modes default), including config/schema/docs plumbing for
`plan_mode_reasoning_effort`
- remove the hardcoded Plan preset medium default and make the reasoning
popup reflect the active Plan override as `(current)`
- split the collaboration-mode switch notification UI hint into #12307
to keep this diff focused

If I have `plan_mode_reasoning_effort = "medium"` set in my
`config.toml`:
<img width="699" height="127" alt="Screenshot 2026-02-20 at 6 59 37 PM"
src="https://github.com/user-attachments/assets/b33abf04-6b7a-49ed-b2e9-d24b99795369"
/>

If I don't have `plan_mode_reasoning_effort` set in my `config.toml`:
<img width="704" height="129" alt="Screenshot 2026-02-20 at 7 01 51 PM"
src="https://github.com/user-attachments/assets/88a086d4-d2f1-49c7-8be4-f6f0c0fa1b8d"
/>

## Codex author
`codex resume 019c78a2-726b-7fe3-adac-3fa4523dcc2a`
2026-02-20 20:08:56 -08:00
Ahmed Ibrahim
7ae5d88016 Add experimental realtime websocket URL override (#12416)
- add top-level `experimental_realtime_ws_base_url` config key
(experimental / do not use) and include it in config schema
- apply the override only to `Op::RealtimeConversation` websocket
transport, with config + realtime tests
2026-02-20 19:51:20 -08:00
Ahmed Ibrahim
6817f0be8a Wire realtime api to core (#12268)
- Introduce `RealtimeConversationManager` for realtime API management 
- Add `op::conversation` to start conversation, insert audio, insert
text, and close conversation.
- emit conversation lifecycle and realtime events.
- Move shared realtime payload types into codex-protocol and add core
e2e websocket tests for start/replace/transport-close paths.

Things to consider:
- Should we use the same `op::` and `Events` channel to carry audio? I
think we should try this simple approach and later we can create
separate one if the channels got congested.
- Sending text updates to the client: we can start simple and later
restrict that.
- Provider auth isn't wired for now intentionally
2026-02-20 19:06:35 -08:00
Matthew Zeng
36a2a9fdbb [apps] Bump MCP tool call timeout. (#12405)
- [x] Bump MCP tool call timeout.
2026-02-20 17:35:07 -08:00
Matthew Zeng
4ebdddaa34 [apps] Fix gateway url. (#12403)
- [x] Fix connectors gateway url.
2026-02-21 00:47:15 +00:00
sayan-oai
65b9fe8f30 clarify model_catalog_json only applied on startup (#12379)
# External (non-OpenAI) Pull Request Requirements

Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md

If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.

Include a link to a bug report or enhancement request.
2026-02-20 15:04:36 -08:00
viyatb-oai
64f3827d10 Move sanitizer into codex-secrets (#12306)
## Summary
- move the sanitizer implementation into `codex-secrets`
(`secrets/src/sanitizer.rs`) and re-export `redact_secrets`
- switch `codex-core` to depend on/import `codex-secrets` for sanitizer
usage
- remove the old `utils/sanitizer` crate wiring and refresh lockfiles

## Testing
- `just fmt`
- `cargo test -p codex-secrets`
- `cargo test -p codex-core --no-run`
- `cargo clippy -p codex-secrets -p codex-core --all-targets
--all-features -- -D warnings`
- `just bazel-lock-update`
- `just bazel-lock-check`

## Notes
- not run: `cargo test --all-features` (full workspace suite)
2026-02-20 22:47:54 +00:00
viyatb-oai
60c2b7beca core tests: use hermetic mock server in review suite (#12291)
## Summary
- switch the review test SSE mock helper to use the shared hermetic mock
server setup
- ensure review tests always have a default `/v1/models` stub during
Codex session bootstrap
- remove the race that caused intermittent `/v1/models` connection
failures and flaky ETag refresh assertions

## Testing
- `just fmt`
- `cargo test -p codex-core --test all
refresh_models_on_models_etag_mismatch_and_avoid_duplicate_models_fetch`
- `cargo test -p codex-core --test all
review_uses_custom_review_model_from_config`
- repeated both targeted tests 5x in a loop
- `cargo clippy -p codex-core --tests -- -D warnings`
2026-02-20 12:50:12 -08:00
colby-oai
d3cf8bd0fa fix(core): require approval for destructive MCP tool calls (#12353)
Summary
- ensure destructive tool annotations short-circuit to require approval
- simplify approval logic to only require read/write + open-world when
destructive is false
- update the unit test to cover the new destructive behavior

Testing
- Not run (not requested)
2026-02-20 12:12:16 -08:00
Matthew Zeng
aa121a115e [apps] Implement apps configs. (#12086)
- [x] Implement apps configs.
2026-02-20 12:05:21 -08:00
jif-oai
5034d4bd89 feat: add config allow_login_shell (#12312) 2026-02-20 20:02:24 +00:00
Curtis 'Fjord' Hawthorne
097620218d js_repl: remove codex.state helper references (#12275)
## Summary

This PR removes `codex.state` from the `js_repl` helper surface and
removes all corresponding documentation/instruction references.

## Motivation

Top-level bindings in `js_repl` now persist across cells, so the extra
`codex.state` helper is redundant and adds unnecessary API/docs surface.

## Changes

- Removed the long-lived `state` object from the Node kernel helper
wiring.
- Stopped exposing `codex.state` (and `context.state`) during `js_repl`
execution.
- Updated user-facing `js_repl` docs to remove `codex.state`.
- Updated generated instruction text and related test expectations to
list only:
  - `codex.tmpDir`
  - `codex.tool(name, args?)`


#### [git stack](https://github.com/magus/git-stack-cli)
-  `1` https://github.com/openai/codex/pull/12300
- 👉 `2` https://github.com/openai/codex/pull/12275
-  `3` https://github.com/openai/codex/pull/12205
-  `4` https://github.com/openai/codex/pull/12185
-  `5` https://github.com/openai/codex/pull/10673
2026-02-20 11:20:45 -08:00
viyatb-oai
28c0089060 fix(network-proxy): add unix socket allow-all and update seatbelt rules (#11368)
## Summary
Adds support for a Unix socket escape hatch so we can bypass socket
allowlisting when explicitly enabled.

## Description
* added a new flag, `network.dangerously_allow_all_unix_sockets` as an
explicit escape hatch
* In codex-network-proxy, enabling that flag now allows any absolute
Unix socket path from x-unix-socket instead of requiring each path to be
explicitly allowlisted. Relative paths are still rejected.
* updated the macOS seatbelt path in core so it enforces the same Unix
socket behavior:
  * allowlisted sockets generate explicit network* subpath rules
  * allow-all generates a broad network* (subpath "/") rule

---------

Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
2026-02-20 10:56:57 -08:00
Curtis 'Fjord' Hawthorne
73fd939296 js_repl: block wrapped payload prefixes in grammar (#12300)
## Summary

Tighten the `js_repl` freeform Lark grammar to block the most common
malformed payload wrappers before they reach runtime validation.

## What Changed

- Replaced the overly permissive `js_repl` freeform grammar (`start:
/[\s\S]*/`) with a structured grammar that still supports:
  - plain JS source
  - optional first-line `// codex-js-repl:` pragma followed by JS source
- Added grammar-level filtering for common bad payload shapes by
rejecting inputs whose first significant token starts with:
  - `{` (JSON object wrapper like `{"code":"..."}`)
  - `"` (quoted code string)
  - `` ``` `` (markdown code fences)
- Implemented the grammar without regex lookahead/lookbehind because the
API-side Lark regex engine does not support look-around.
- Added a unit test to validate the grammar shape and guard against
reintroducing unsupported lookaround.

## Why

`js_repl` is a freeform tool, but the model sometimes emits wrapped
payloads (JSON, quoted strings, markdown fences) instead of raw
JavaScript. We already reject those at runtime, but this change moves
the constraint into the tool grammar so the model is less likely to
generate invalid tool-call payloads in the first place.

## Testing

- `cargo test -p codex-core
js_repl_freeform_grammar_blocks_common_non_js_prefixes`
- `cargo test -p codex-core parse_freeform_args_rejects_`

## Notes

- This intentionally over-blocks a few uncommon valid JS starts (for
example top-level `{ ... }` blocks or top-level quoted directives like
`"use strict";`) in exchange for preventing the common wrapped-payload
mistakes.



#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/12300
-  `2` https://github.com/openai/codex/pull/12275
-  `3` https://github.com/openai/codex/pull/12205
-  `4` https://github.com/openai/codex/pull/12185
-  `5` https://github.com/openai/codex/pull/10673
2026-02-20 10:47:07 -08:00
viyatb-oai
e8afaed502 Refactor network approvals to host/protocol/port scope (#12140)
## Summary
Simplify network approvals by removing per-attempt proxy correlation and
moving to session-level approval dedupe keyed by (host, protocol, port).
Instead of encoding attempt IDs into proxy credentials/URLs, we now
treat approvals as a destination policy decision.

- Concurrent calls to the same destination share one approval prompt.
- Different destinations (or same host on different ports) get separate
prompts.
- Allow once approves the current queued request group only.
- Allow for session caches that (host, protocol, port) and auto-allows
future matching requests.
- Never policy continues to deny without prompting.

Example:
- 3 calls: 
  - a.com (line 443)
  - b.com (line 443)
  - a.com (line 443)
=> 2 prompts total (a, b), second a waits on the first decision.
- a.com:80 is treated separately from a.com line 443

## Testing
- `just fmt` (in `codex-rs`)
- `cargo test -p codex-core tools::network_approval::tests`
- `cargo test -p codex-core` (unit tests pass; existing
integration-suite failures remain in this environment)
2026-02-20 10:39:55 -08:00
pakrym-oai
86803ca9bf Reuse connection between turns (#12294)
Add a pool of one to the model client to reuse connections across turns.
2026-02-20 10:09:46 -08:00
jif-oai
4d60c803ba feat: cleaner TUI for sub-agents (#12327)
<img width="760" height="496" alt="Screenshot 2026-02-20 at 14 31 25"
src="https://github.com/user-attachments/assets/1983b825-bb47-417e-9925-6f727af56765"
/>
2026-02-20 15:26:33 +00:00
colby-oai
2036a5f5e0 Add MCP server context to otel tool_result logs (#12267)
Summary
- capture the origin for each configured MCP server and expose it via
the connection manager
- plumb MCP server name/origin into tool logging and emit
codex.tool_result events with those fields
- add unit coverage for origin parsing and extend OTEL tests to assert
empty MCP fields for non-MCP tools
- currently not logging full urls or url paths to prevent logging
potentially sensitive data

Testing
- Not run (not requested)
2026-02-20 10:26:19 -05:00
jif-oai
ede561b5d1 disable collab for phase 2 (#12326) 2026-02-20 14:51:17 +00:00