**Why We Did This**
- The goal is to reduce MCP tool context pollution by not exposing the
full MCP tool list up front
- It forces an explicit discovery step (`search_tool_bm25`) so the model
narrows tool scope before making MCP calls, which helps relevance and
lowers prompt/tool clutter.
**What It Changed**
- Added a new experimental feature flag `search_tool` in
`core/src/features.rs:90` and `core/src/features.rs:430`.
- Added config/schema support for that flag in
`core/config.schema.json:214` and `core/config.schema.json:1235`.
- Added BM25 dependency (`bm25`) in `Cargo.toml:129` and
`core/Cargo.toml:23`.
- Added new tool handler `search_tool_bm25` in
`core/src/tools/handlers/search_tool_bm25.rs:18`.
- Registered the handler and tool spec in
`core/src/tools/handlers/mod.rs:11` and `core/src/tools/spec.rs:780` and
`core/src/tools/spec.rs:1344`.
- Extended `ToolsConfig` to carry `search_tool` enablement in
`core/src/tools/spec.rs:32` and `core/src/tools/spec.rs:56`.
- Injected dedicated developer instructions for tool-discovery workflow
in `core/src/codex.rs:483` and `core/src/codex.rs:1976`, using
`core/templates/search_tool/developer_instructions.md:1`.
- Added session state to store one-shot selected MCP tools in
`core/src/state/session.rs:27` and `core/src/state/session.rs:131`.
- Added filtering so when feature is enabled, only selected MCP tools
are exposed on the next request (then consumed) in
`core/src/codex.rs:3800` and `core/src/codex.rs:3843`.
- Added E2E suite coverage for
enablement/instructions/hide-until-search/one-turn-selection in
`core/tests/suite/search_tool.rs:72`,
`core/tests/suite/search_tool.rs:109`,
`core/tests/suite/search_tool.rs:147`, and
`core/tests/suite/search_tool.rs:218`.
- Refactored test helper utilities to support config-driven tool
collection in `core/tests/suite/tools.rs:281`.
**Net Behavioral Effect**
- With `search_tool` **off**: existing MCP behavior (tools exposed
normally).
- With `search_tool` **on**: MCP tools start hidden, model must call
`search_tool_bm25`, and only returned `selected_tools` are available for
the next model call.
With this PR we do not close the unified exec processes (i.e. background
terminals) at the end of a turn unless:
* The user interrupt the turn
* The user decide to clean the processes through `app-server` or
`/clean`
I made sure that `codex exec` correctly kill all the processes
Summary
- refactor user shell command execution into a shared helper and add
modes for standalone vs active-turn execution
- run user shell commands asynchronously when a turn is already active
so they don’t replace or abort the current turn
- extend the tests to cover the new behavior and add the generated Codex
environment manifest
Testing
- Not run (not requested)
### What
add wiring for `phase` field on `ResponseItem::Message` to lay
groundwork for differentiating model preambles and final messages.
currently optional.
follows pattern in #9698.
updated schemas with `just write-app-server-schema` so we can see type
changes.
### Tests
Updated existing tests for SSE parsing and hydrating from history
### Overview
Currently calling `thread/resume` will always bump the thread's
`updated_at` timestamp. This PR makes it the `updated_at` timestamp
changes only if a turn is triggered.
### Additonal context
What we typically do on resuming a thread is **always** writing “initial
context” to the rollout file immediately. This initial context includes:
- Developer instructions derived from sandbox/approval policy + cwd
- Optional developer instructions (if provided)
- Optional collaboration-mode instructions
- Optional user instructions (if provided)
- Environment context (cwd, shell, etc.)
This PR defers writing the “initial context” to the rollout file until
the first `turn/start`, so we don't inadvertently bump the thread's
`updated_at` timestamp until a turn is actually triggered.
This works even though both `thread/resume` and `turn/start` accept
overrides (such as `model`, `cwd`, etc.) because the initial context is
seeded from the effective `TurnContext` in memory, computed at
`turn/start` time, after both sets of overrides have been applied.
**NOTE**: This is a very short-lived solution until we introduce sqlite.
Then we can remove this.
## What
Record a model-visible `<turn_aborted>` marker in history when a turn is
interrupted, and treat it as a session prefix.
## Why
When a turn is interrupted, Codex emits `TurnAborted` but previously did
not persist anything model-visible in the conversation history. On the
next user turn, the model can’t tell the previous work was aborted and
may resume/repeat earlier actions (including duplicated side effects
like re-opening PRs).
Fixes: https://github.com/openai/codex/issues/9042
## How
On `TurnAbortReason::Interrupted`, append a hidden user message
containing a `<turn_aborted>…</turn_aborted>` marker and flush.
Treat `<turn_aborted>` like `<environment_context>` for session-prefix
filtering.
Add a regression test to ensure follow-up turns don’t repeat side
effects from an aborted turn.
## Testing
`just fmt`
`just fix -p codex-core`
`cargo test -p codex-core -- --test-threads=1`
`cargo test --all-features -- --test-threads=1`
---------
Co-authored-by: Skylar Graika <sgraika127@gmail.com>
Co-authored-by: jif-oai <jif@openai.com>
Co-authored-by: Eric Traut <etraut@openai.com>
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
This PR moves `ModelsFamily` to `openai_models`. It also propagates
`ModelsManager` to session services and use it to drive model family. We
also make `derive_default_model_family` private because it's a step
towards what we want: one place that gives model configuration.
This is a second step at having one source of truth for models
information and config: `ModelsManager`.
Next steps would be to remove `ModelsFamily` from config. That's massive
because it's being used in 41 occasions mostly pre launching `codex`.
Also, we need to make `find_family_for_model` private. It's also big
because it's being used in 21 occasions ~ all tests.
In this PR, I am exploring migrating task kind to an invocation of
Codex. The main reason would be getting rid off multiple
`ConversationHistory` state and streamlining our context/history
management.
This approach depends on opening a channel between the sub-codex and
codex. This channel is responsible for forwarding `interactive`
(`approvals`) and `non-interactive` events. The `task` is responsible
for handling those events.
This opens the door for implementing `codex as a tool`, replacing
`compact` and `review`, and potentially subagents.
One consideration is this code is very similar to `app-server` specially
in the approval part. If in the future we wanted an interactive
`sub-codex` we should consider using `codex-mcp`
feature: Add "!cmd" user shell execution
This change lets users run local shell commands directly from the TUI by
prefixing their input with ! (e.g. !ls). Output is truncated to keep the
exec cell usable, and Ctrl-C cleanly
interrupts long-running commands (e.g. !sleep 10000).
**Summary of changes**
- Route Op::RunUserShellCommand through a dedicated UserShellCommandTask
(core/src/tasks/user_shell.rs), keeping the task logic out of codex.rs.
- Reuse the existing tool router: the task constructs a ToolCall for the
local_shell tool and relies on ShellHandler, so no manual MCP tool
lookup is required.
- Emit exec lifecycle events (ExecCommandBegin/ExecCommandEnd) so the
TUI can show command metadata, live output, and exit status.
**End-to-end flow**
**TUI handling**
1. ChatWidget::submit_user_message (TUI) intercepts messages starting
with !.
2. Non-empty commands dispatch Op::RunUserShellCommand { command };
empty commands surface a help hint.
3. No UserInput items are created, so nothing is enqueued for the model.
**Core submission loop**
4. The submission loop routes the op to handlers::run_user_shell_command
(core/src/codex.rs).
5. A fresh TurnContext is created and Session::spawn_user_shell_command
enqueues UserShellCommandTask.
**Task execution**
6. UserShellCommandTask::run emits TaskStartedEvent, formats the
command, and prepares a ToolCall targeting local_shell.
7. ToolCallRuntime::handle_tool_call dispatches to ShellHandler.
**Shell tool runtime**
8. ShellHandler::run_exec_like launches the process via the unified exec
runtime, honoring sandbox and shell policies, and emits
ExecCommandBegin/End.
9. Stdout/stderr are captured for the UI, but the task does not turn the
resulting ToolOutput into a model response.
**Completion**
10. After ExecCommandEnd, the task finishes without an assistant
message; the session marks it complete and the exec cell displays the
final output.
**Conversation context**
- The command and its output never enter the conversation history or the
model prompt; the flow is local-only.
- Only exec/task events are emitted for UI rendering.
**Demo video**
https://github.com/user-attachments/assets/fcd114b0-4304-4448-a367-a04c43e0b996
Today `sub_id` is an ID of a single incoming Codex Op submition. We then
associate all events triggered by this operation using the same
`sub_id`.
At the same time we are also creating a TurnContext per submission and
we'd like to start associating some events (item added/item completed)
with an entire turn instead of just the operation that started it.
Using turn context when sending events give us flexibility to change
notification scheme.
Adds a new ItemStarted event and delivers UserMessage as the first item
type (more to come).
Renames `InputItem` to `UserInput` considering we're using the `Item`
suffix for actual items.
Extracting tasks in a module and start abstraction behind a Trait (more
to come on this but each task will be tackled in a dedicated PR)
The goal was to drop the ActiveTask and to have a (potentially) set of
tasks during each turn