Added an agent control plane that lets sessions spawn or message other
conversations via `AgentControl`.
`AgentBus` (core/src/agent/bus.rs) keeps track of the last known status
of a conversation.
ConversationManager now holds shared state behind an Arc so AgentControl
keeps only a weak back-reference, the goal is just to avoid explicit
cycle reference.
Follow-ups:
* Build a small tool in the TUI to be able to see every agent and send
manual message to each of them
* Handle approval requests in this TUI
* Add tools to spawn/communicate between agents (see related design)
* Define agent types
Adds an optional `justification` parameter to the `prefix_rule()`
execpolicy DSL so policy authors can attach human-readable rationale to
a rule. That justification is propagated through parsing/matching and
can be surfaced to the model (or approval UI) when a command is blocked
or requires approval.
When a command is rejected (or gated behind approval) due to policy, a
generic message makes it hard for the model/user to understand what went
wrong and what to do instead. Allowing policy authors to supply a short
justification improves debuggability and helps guide the model toward
compliant alternatives.
Example:
```python
prefix_rule(
pattern = ["git", "push"],
decision = "forbidden",
justification = "pushing is blocked in this repo",
)
```
If Codex tried to run `git push origin main`, now the failure would
include:
```
`git push origin main` rejected: pushing is blocked in this repo
```
whereas previously, all it was told was:
```
execpolicy forbids this command
```
This change improves the skills render section
- Separate the skills list from usage rules with clear subheadings
- Define skill more clearly upfront
- Remove confusing trigger/discovery wording and make reference-following guidance more actionable
What changed
- Added `outputSchema` support to the app-server APIs, mirroring `codex
exec --output-schema` behavior.
- V1 `sendUserTurn` now accepts `outputSchema` and constrains the final
assistant message for that turn.
- V2 `turn/start` now accepts `outputSchema` and constrains the final
assistant message for that turn (explicitly per-turn only).
Core behavior
- `Op::UserTurn` already supported `final_output_json_schema`; now V1
`sendUserTurn` forwards `outputSchema` into that field.
- `Op::UserInput` now carries `final_output_json_schema` for per-turn
settings updates; core maps it into
`SessionSettingsUpdate.final_output_json_schema` so it applies to the
created turn context.
- V2 `turn/start` does NOT persist the schema via `OverrideTurnContext`
(it’s applied only for the current turn). Other overrides
(cwd/model/etc) keep their existing persistent behavior.
API / docs
- `codex-rs/app-server-protocol/src/protocol/v1.rs`: add `output_schema:
Option<serde_json::Value>` to `SendUserTurnParams` (serialized as
`outputSchema`).
- `codex-rs/app-server-protocol/src/protocol/v2.rs`: add `output_schema:
Option<JsonValue>` to `TurnStartParams` (serialized as `outputSchema`).
- `codex-rs/app-server/README.md`: document `outputSchema` for
`turn/start` and clarify it applies only to the current turn.
- `codex-rs/docs/codex_mcp_interface.md`: document `outputSchema` for v1
`sendUserTurn` and v2 `turn/start`.
Tests added/updated
- New app-server integration tests asserting `outputSchema` is forwarded
into outbound `/responses` requests as `text.format`:
- `codex-rs/app-server/tests/suite/output_schema.rs`
- `codex-rs/app-server/tests/suite/v2/output_schema.rs`
- Added per-turn semantics tests (schema does not leak to the next
turn):
- `send_user_turn_output_schema_is_per_turn_v1`
- `turn_start_output_schema_is_per_turn_v2`
- Added protocol wire-compat tests for the merged op:
- serialize omits `final_output_json_schema` when `None`
- deserialize works when field is missing
- serialize includes `final_output_json_schema` when `Some(schema)`
Call site updates (high level)
- Updated all `Op::UserInput { .. }` constructions to include
`final_output_json_schema`:
- `codex-rs/app-server/src/codex_message_processor.rs`
- `codex-rs/core/src/codex_delegate.rs`
- `codex-rs/mcp-server/src/codex_tool_runner.rs`
- `codex-rs/tui/src/chatwidget.rs`
- `codex-rs/tui2/src/chatwidget.rs`
- plus impacted core tests.
Validation
- `just fmt`
- `cargo test -p codex-core`
- `cargo test -p codex-app-server`
- `cargo test -p codex-mcp-server`
- `cargo test -p codex-tui`
- `cargo test -p codex-tui2`
- `cargo test -p codex-protocol`
- `cargo clippy --all-features --tests --profile dev --fix -- -D
warnings`
Load managed requirements from MDM key `requirements_toml_base64`.
Tested on my Mac (using `defaults` to set the preference, though this
would be set by MDM in production):
```
➜ codex git:(gt/mdm-requirements) defaults read com.openai.codex requirements_toml_base64 | base64 -d
allowed_approval_policies = ["on-request"]
➜ codex git:(gt/mdm-requirements) just c --yolo
cargo run --bin codex -- "$@"
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.26s
Running `target/debug/codex --yolo`
Error loading configuration: value `Never` is not in the allowed set [OnRequest]
error: Recipe `codex` failed on line 11 with exit code 1
➜ codex git:(gt/mdm-requirements) defaults delete com.openai.codex requirements_toml_base64
➜ codex git:(gt/mdm-requirements) just c --yolo
cargo run --bin codex -- "$@"
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.24s
Running `target/debug/codex --yolo`
╭──────────────────────────────────────────────────────────╮
│ >_ OpenAI Codex (v0.0.0) │
│ │
│ model: codex-auto-balanced medium /model to change │
│ directory: ~/code/codex/codex-rs │
╰──────────────────────────────────────────────────────────╯
Tip: Start a fresh idea with /new; the previous session stays in history.
```
The Responses API requires that all tool names conform to
'^[a-zA-Z0-9_-]+$'. This PR replaces all non-conforming characters with
`_` to ensure that they can be used.
Fixes#8174
Fixes /review base-branch prompt resolution to use the session/turn cwd
(respecting runtime cwd overrides) so merge-base/diff guidance is
computed from the intended repo; adds a regression test for cwd
overrides; tested with cargo test -p codex-core --test all
review_uses_overridden_cwd_for_base_branch_merge_base.
last token count in context manager is initialized to 0. Gets populated
only on events from server.
This PR populates it on resume so we can decide if we need to compact or
not.
### What
Builds on #8293.
Add `additional_details`, which contains the upstream error message, to
relevant structures used to pass along retryable `StreamError`s.
Uses the new TUI status indicator's `details` field (shows under the
status header) to display the `additional_details` error to the user on
retryable `Reconnecting...` errors. This adds clarity for users for
retryable errors.
Will make corresponding change to VSCode extension to show
`additional_details` as expandable from the `Reconnecting...` cell.
Examples:
<img width="1012" height="326" alt="image"
src="https://github.com/user-attachments/assets/f35e7e6a-8f5e-4a2f-a764-358101776996"
/>
<img width="1526" height="358" alt="image"
src="https://github.com/user-attachments/assets/0029cbc0-f062-4233-8650-cc216c7808f0"
/>
This PR introduces a `codex-utils-cargo-bin` utility crate that
wraps/replaces our use of `assert_cmd::Command` and
`escargot::CargoBuild`.
As you can infer from the introduction of `buck_project_root()` in this
PR, I am attempting to make it possible to build Codex under
[Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
achieve faster incremental local builds (largely due to Buck2's
[dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
build strategy, as well as benefits from its local build daemon) as well
as faster CI builds if we invest in remote execution and caching.
See
https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
for more details about the performance advantages of Buck2.
Buck2 enforces stronger requirements in terms of build and test
isolation. It discourages assumptions about absolute paths (which is key
to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
variables that Cargo provides are absolute paths (which
`assert_cmd::Command` reads), this is a problem for Buck2, which is why
we need this `codex-utils-cargo-bin` utility.
My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
passed to a `rust_test()` build rule as relative paths.
`codex-utils-cargo-bin` will resolve these values to absolute paths,
when necessary.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
* #8498
* __->__ #8496
This isn't very useful parameter.
logic:
```
if model puts `**` in their reasoning, trim it and visualize the header.
if couldn't trim: don't render
if model doesn't support: don't render
```
We can simplify to:
```
if could trim, visualize header.
if not, don't render
```
### Motivation
- Persist richer per-turn configuration in rollouts so resumed/forked
sessions and tooling can reason about the exact instruction inputs and
output constraints used for a turn.
### Description
- Extend `TurnContextItem` to include optional `base_instructions`,
`user_instructions`, and `developer_instructions`.
- Record the optional `final_output_json_schema` associated with a turn.
- Add an optional `truncation_policy` to `TurnContextItem` and populate
it when writing turn-context rollout items.
- Introduce a protocol-level `TruncationPolicy` representation and
convert from core truncation policy when recording.
### Testing
- `cargo test -p codex-protocol` (pass)
This adds logic to load `/etc/codex/config.toml` and associate it with
`ConfigLayerSource::System` on UNIX. I refactored the code so it shares
logic with the creation of the `ConfigLayerSource::User` layer.
https://github.com/openai/codex/pull/8354 added support for in-repo
`.config/` files, so this PR updates the logic for loading `*.rules`
files to load `*.rules` files from all relevant layers. The main change
to the business logic is `load_exec_policy()` in
`codex-rs/core/src/exec_policy.rs`.
Note this adds a `config_folder()` method to `ConfigLayerSource` that
returns `Option<AbsolutePathBuf>` so that it is straightforward to
iterate over the sources and get the associated config folder, if any.
This is necessary so that `$CODEX_HOME/skills` and `$CODEX_HOME/rules`
still get loaded even if `$CODEX_HOME/config.toml` does not exist. See
#8453.
For now, it is possible to omit this layer when creating a dummy
`ConfigLayerStack` in a test. We can revisit that later, if it turns out
to be the right thing to do.
The bash command parser in exec_policy was failing to parse commands
with concatenated flag-value patterns like `-g"*.py"` (no space between
flag and quoted value). This caused policy rules like
`prefix_rule(pattern=["rg"])` to not match commands such as `rg -n "foo"
-g"*.py"`.
When tree-sitter-bash parses `-g"*.py"`, it creates a "concatenation"
node containing a word (`-g`) and a string (`"*.py"`). The parser
previously rejected any node type not in the ALLOWED_KINDS list, causing
the entire command parsing to fail and fall back to matching against the
wrapped `bash -lc` command instead of the inner command.
This change:
- Adds "concatenation" to ALLOWED_KINDS in
try_parse_word_only_commands_sequence
- Adds handling for concatenation nodes in parse_plain_command_from_node
that recursively extracts and joins word/string/raw_string children
- Adds test cases for concatenated flag patterns with double and single
quotes
Fixes#8394
- allow configuring `project_root_markers` in `config.toml`
(user/system/MDM) to control project discovery beyond `.git`
- honor the markers after merging pre-project layers; default to
`[".git"]` when unset and skip ancestor walk when set to an empty array
- document the option and add coverage for alternate markers in config
loader tests
- We now support `.codex/config.toml` in repo (from `cwd` up to the
first `.git` found, if any) as layers in `ConfigLayerStack`. A new
`ConfigLayerSource::Project` variant was added to support this.
- In doing this work, I realized that we were resolving relative paths
in `config.toml` after merging everything into one `toml::Value`, which
is wrong: paths should be relativized with respect to the folder
containing the `config.toml` that was deserialized. This PR introduces a
deserialize/re-serialize strategy to account for this in
`resolve_config_paths()`. (This is why `Serialize` is added to so many
types as part of this PR.)
- Added tests to verify this new behavior.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8354).
* #8359
* __->__ #8354
## Summary
Adds a FeatureFlag to enforce UTF8 encoding in powershell, particularly
Windows Powershell v5. This should help address issues like #7290.
Notably, this PR does not include the ability to parse `apply_patch`
invocations within UTF8 shell commands (calls to the freeform tool
should not be impacted). I am leaving this out of scope for now. We
should address before this feature becomes Stable, but those cases are
not the default behavior at this time so we're okay for experimentation
phase. We should continue cleaning up the `apply_patch::invocation`
logic and then can handle it more cleanly.
## Testing
- [x] Adds additional testing
### Summary
With codesigning on Mac, Windows and Linux, we should be able to safely
remove `features.rmcp_client` and `use_experimental_use_rmcp_client`
check from the codebase now.
## TUI2: Normalize Mouse Scroll Input Across Terminals (Wheel +
Trackpad)
This changes TUI2 scrolling to a stream-based model that normalizes
terminal scroll event density into consistent wheel behavior (default:
~3 transcript lines per physical wheel notch) while keeping trackpad
input higher fidelity via fractional accumulation.
Primary code: `codex-rs/tui2/src/tui/scrolling/mouse.rs`
Doc of record (model + probe-derived data):
`codex-rs/tui2/docs/scroll_input_model.md`
### Why
Terminals encode both mouse wheels and trackpads as discrete scroll
up/down events with direction but no magnitude, and they vary widely in
how many raw events they emit per physical wheel notch (commonly 1, 3,
or 9+). Timing alone doesn’t reliably distinguish wheel vs trackpad, so
cadence-based heuristics are unstable across terminals/hardware.
This PR treats scroll input as short *streams* separated by silence or
direction flips, normalizes raw event density into tick-equivalents,
coalesces redraws for dense streams, and exposes explicit config
overrides.
### What Changed
#### Scroll Model (TUI2)
- Stream detection
- Start a stream on the first scroll event.
- End a stream on an idle gap (`STREAM_GAP_MS`) or a direction flip.
- Normalization
- Convert raw events into tick-equivalents using per-terminal
`tui.scroll_events_per_tick`.
- Wheel-like vs trackpad-like behavior
- Wheel-like: fixed “classic” lines per wheel notch; flush immediately
for responsiveness.
- Trackpad-like: fractional accumulation + carry across stream
boundaries; coalesce flushes to ~60Hz to avoid floods and reduce “stop
lag / overshoot”.
- Trackpad divisor is intentionally capped: `min(scroll_events_per_tick,
3)` so terminals with dense wheel ticks (e.g. 9 events per notch) don’t
make trackpads feel artificially slow.
- Auto mode (default)
- Start conservatively as trackpad-like (avoid overshoot).
- Promote to wheel-like if the first tick-worth of events arrives
quickly.
- Fallback for 1-event-per-tick terminals (no tick-completion timing
signal).
#### Trackpad Acceleration
Some terminals produce relatively low vertical event density for
trackpad gestures, which makes large/faster swipes feel sluggish even
when small motions feel correct. To address that, trackpad-like streams
apply a bounded multiplier based on event count:
- `multiplier = clamp(1 + abs(events) / scroll_trackpad_accel_events,
1..scroll_trackpad_accel_max)`
The multiplier is applied to the trackpad stream’s computed line delta
(including carried fractional remainder). Defaults are conservative and
bounded.
#### Config Knobs (TUI2)
All keys live under `[tui]`:
- `scroll_wheel_lines`: lines per physical wheel notch (default: 3).
- `scroll_events_per_tick`: raw vertical scroll events per physical
wheel notch (terminal-specific default; fallback: 3).
- Wheel-like per-event contribution: `scroll_wheel_lines /
scroll_events_per_tick`.
- `scroll_trackpad_lines`: baseline trackpad sensitivity (default: 1).
- Trackpad-like per-event contribution: `scroll_trackpad_lines /
min(scroll_events_per_tick, 3)`.
- `scroll_trackpad_accel_events` / `scroll_trackpad_accel_max`: bounded
trackpad acceleration (defaults: 30 / 3).
- `scroll_mode = auto|wheel|trackpad`: force behavior or use the
heuristic (default: `auto`).
- `scroll_wheel_tick_detect_max_ms`: auto-mode promotion threshold (ms).
- `scroll_wheel_like_max_duration_ms`: auto-mode fallback for
1-event-per-tick terminals (ms).
- `scroll_invert`: invert scroll direction (applies to wheel +
trackpad).
Config docs: `docs/config.md` and field docs in
`codex-rs/core/src/config/types.rs`.
#### App Integration
- The app schedules follow-up ticks to close idle streams (via
`ScrollUpdate::next_tick_in` and `schedule_frame_in`) and finalizes
streams on draw ticks.
- `codex-rs/tui2/src/app.rs`
#### Docs
- Single doc of record describing the model + preserved probe
findings/spec:
- `codex-rs/tui2/docs/scroll_input_model.md`
#### Other (jj-only friendliness)
- `codex-rs/tui2/src/diff_render.rs`: prefer stable cwd-relative paths
when the file is under the cwd even if there’s no `.git`.
### Terminal Defaults
Per-terminal defaults are derived from scroll-probe logs (see doc).
Notable:
- Ghostty currently defaults to `scroll_events_per_tick = 3` even though
logs measured ~9 in one setup. This is a deliberate stopgap; if your
Ghostty build emits ~9 events per wheel notch, set:
```toml
[tui]
scroll_events_per_tick = 9
```
### Testing
- `just fmt`
- `just fix -p codex-core --allow-no-vcs`
- `cargo test -p codex-core --lib` (pass)
- `cargo test -p codex-tui2` (scroll tests pass; remaining failures are
known flaky VT100 color tests in `insert_history`)
### Review Focus
- Stream finalization + frame scheduling in `codex-rs/tui2/src/app.rs`.
- Auto-mode promotion thresholds and the 1-event-per-tick fallback
behavior.
- Trackpad divisor cap (`min(events_per_tick, 3)`) and acceleration
defaults.
- Ghostty default tradeoff (3 vs ~9) and whether we should change it.
`load_config_layers_state()` should load config from a
`.codex/config.toml` in any folder between the `cwd` for a thread and
the project root. Though in order to do that,
`load_config_layers_state()` needs to know what the `cwd` is, so this PR
does the work to thread the `cwd` through for existing callsites.
A notable exception is the `/config` endpoint in app server for which a
`cwd` is not guaranteed to be associated with the query, so the `cwd`
param is `Option<AbsolutePathBuf>` to account for this case.
The logic to make use of the `cwd` will be done in a follow-up PR.
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
This adds support for `allowed_sandbox_modes` in `requirements.toml` and
provides legacy support for constraining sandbox modes in
`managed_config.toml`. This is converted to `Constrained<SandboxPolicy>`
in `ConfigRequirements` and applied to `Config` such that constraints
are enforced throughout the harness.
Note that, because `managed_config.toml` is deprecated, we do not add
support for the new `external-sandbox` variant recently introduced in
https://github.com/openai/codex/pull/8290. As noted, that variant is not
supported in `config.toml` today, but can be configured programmatically
via app server.
This will make it easier to test for expected errors in unit tests since
we can compare based on the field values rather than the message (which
might change over time). See https://github.com/openai/codex/pull/8298
for an example.
It also ensures more consistency in the way a `ConstraintError` is
constructed.
Fixes#8214 by removing the '--staged' flag from the undo git restore
command. This ensures that while the working tree is reverted to the
snapshot state, the user's staged changes (index) are preserved,
preventing data loss. Also adds a regression test.
We were assembling the skill roots in two different places, and the
admin root was missing in one of them. This change centralizes root
selection into a helper so both paths stay in sync.