Continuation of breaking up this PR
https://github.com/openai/codex/pull/9116
## Summary
- Thread user text element ranges through TUI/TUI2 input, submission,
queueing, and history so placeholders survive resume/edit flows.
- Preserve local image attachments alongside text elements and rehydrate
placeholders when restoring drafts.
- Keep model-facing content shapes clean by attaching UI metadata only
to user input/events (no API content changes).
## Key Changes
- TUI/TUI2 composer now captures text element ranges, trims them with
text edits, and restores them when submission is suppressed.
- User history cells render styled spans for text elements and keep
local image paths for future rehydration.
- Initial chat widget bootstraps accept empty `initial_text_elements` to
keep initialization uniform.
- Protocol/core helpers updated to tolerate the new InputText field
shape without changing payloads sent to the API.
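As a rough illustration of "trims them with text edits", the sketch below shifts or drops element ranges after a deletion in the composer text. `ElementRange`, `adjust_for_delete`, and the policy of dropping overlapped elements are assumptions for illustration, not the composer's actual code.

```rust
/// Hypothetical half-open byte range `[start, end)` marking a placeholder
/// inside the composer text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ElementRange {
    pub start: usize,
    pub end: usize,
}

/// Adjust element ranges after deleting the bytes `[del_start, del_end)`.
/// Elements entirely before the deletion are unchanged, elements entirely
/// after it shift left, and elements overlapping it are dropped
/// (assumed policy).
pub fn adjust_for_delete(
    elements: &[ElementRange],
    del_start: usize,
    del_end: usize,
) -> Vec<ElementRange> {
    let removed = del_end.saturating_sub(del_start);
    elements
        .iter()
        .filter_map(|e| {
            if e.end <= del_start {
                Some(*e)
            } else if e.start >= del_end {
                Some(ElementRange {
                    start: e.start - removed,
                    end: e.end - removed,
                })
            } else {
                None
            }
        })
        .collect()
}
```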
## Summary
This PR consolidates `base_instructions` onto SessionMeta /
SessionConfiguration, so that `base_instructions` is set once per
session and stays (mostly) immutable, unless:
- overridden by config on resume / fork
- sub-agent tasks, like review or collab
In a future PR, we should convert all references to `base_instructions`
to consistently use the typed struct, so it's less likely that we put
other strings there. See #9423. However, this PR is already quite
complex, so I'm deferring that to a follow-up.
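A minimal sketch of the intended precedence, assuming a config override wins over the value persisted with the session, which in turn wins over a per-model default. The function name and parameters are hypothetical and only illustrate the "set once, overridable on resume / fork" idea.

```rust
/// Hypothetical resolution order for a session's base instructions:
/// 1. an explicit config override (e.g. on resume / fork),
/// 2. the value persisted with the session,
/// 3. the default for the current model.
pub fn resolve_base_instructions(
    config_override: Option<&str>,
    persisted: Option<&str>,
    model_default: &str,
) -> String {
    config_override
        .or(persisted)
        .unwrap_or(model_default)
        .to_string()
}
```

Under this sketch, resuming with a different model keeps the persisted instructions, which is the behavior the new resume test asserts.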
## Testing
- [x] Added a resume test to assert that instructions are preserved. In
particular, `resume_switches_models_preserves_base_instructions` fails
against main.
Existing test coverage that asserts base instructions are preserved
across multiple requests in a session:
- Manual compact keeps baseline instructions:
core/tests/suite/compact.rs:199
- Auto-compact keeps baseline instructions:
core/tests/suite/compact.rs:1142
- Prompt caching reuses the same instructions across two requests:
core/tests/suite/prompt_caching.rs:150 and
core/tests/suite/prompt_caching.rs:157
- Prompt caching with explicit expected string across two requests:
core/tests/suite/prompt_caching.rs:213 and
core/tests/suite/prompt_caching.rs:222
- Resume with model switch keeps original instructions:
core/tests/suite/resume.rs:136
- Compact/resume/fork uses request 0 instructions for later expected
payloads: core/tests/suite/compact_resume_fork.rs:215
### Description
- Remove the now-unused `instructions` field from the session metadata
to simplify SessionMeta and stop propagating transient instruction text
through the rollout recorder API. This was only saving
user_instructions, and was never being read.
- Stop passing user instructions into the rollout writer at session
creation so the rollout header only contains canonical session metadata.
### Testing
- Ran `just fmt` which completed successfully.
- Ran `just fix -p codex-protocol`, `just fix -p codex-core`, `just fix
-p codex-app-server`, `just fix -p codex-tui`, and `just fix -p
codex-tui2` which completed (Clippy fixes applied) as part of
verification.
- Ran `cargo test -p codex-protocol` which passed (28 tests).
- Ran `cargo test -p codex-core` which showed failures in a small set of
tests (not caused by the protocol type change directly):
`default_client::tests::test_create_client_sets_default_headers`,
several `models_manager::manager::tests::refresh_available_models_*`,
and `shell_snapshot::tests::linux_sh_snapshot_includes_sections` (these
tests failed in this CI run).
- Ran `cargo test -p codex-app-server` which reported several failing
integration tests (including
`suite::codex_message_processor_flow::test_codex_jsonrpc_conversation_flow`,
`suite::output_schema::send_user_turn_*`, and
`suite::user_agent::get_user_agent_returns_current_codex_user_agent`).
- `cargo test -p codex-tui` and `cargo test -p codex-tui2` were
attempted but aborted due to disk space exhaustion (`No space left on
device`).
------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_696bd8ce632483228d298cf07c7eb41c)
## Summary
We have a variety of things we refer to as instructions in the code
base: our current canonical terms are:
- base instructions (raw string)
- developer instructions (has a type in protocol)
- user instructions
We also have `instructions` floating around in various places. We should
standardize on the above, and start using types to prevent them from
ending up in the wrong place. There will be additional PRs, but I'm
going to keep these small so we can easily follow them!
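To show what "using types" could look like, here is a small sketch with newtype wrappers. The wrapper names, `SessionPrompt`, and `build_session_prompt` are hypothetical and not the types in the codebase.

```rust
/// Hypothetical newtype wrappers; the real types may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BaseInstructions(pub String);

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UserInstructions(pub String);

/// Hypothetical container that keeps each kind of instruction in its own slot.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SessionPrompt {
    pub base: BaseInstructions,
    pub user: Option<UserInstructions>,
}

/// Passing a `UserInstructions` where `BaseInstructions` is expected is a
/// compile error, which is the point of wrapping the raw strings.
pub fn build_session_prompt(
    base: BaseInstructions,
    user: Option<UserInstructions>,
) -> SessionPrompt {
    SessionPrompt { base, user }
}
```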
## Testing
- [x] Tests pass, this is purely a file move
- Merge `model` and `reasoning_effort` under collaboration modes.
- Add additional instructions for custom collaboration mode
- Default to Custom to not change behavior
Summary:
- Add forked_from to SessionMeta/SessionConfiguredEvent and persist it
for forked sessions.
- Surface forked_from in /status for tui + tui2 and add snapshots.
The second part of breaking up PR
https://github.com/openai/codex/pull/9116
Summary:
- Add `TextElement` / `ByteRange` to protocol user inputs and user
message events with defaults.
- Thread `text_elements` through app-server v1/v2 request handling and
history rebuild.
- Preserve UI metadata only in user input/events (not `ContentItem`)
while keeping local image attachments in user events for rehydration.
Details:
- Protocol: `UserInput::Text` carries `text_elements`;
`UserMessageEvent` carries `text_elements` + `local_images`.
Serialization includes empty vectors for backward compatibility.
- app-server-protocol: v1 defines `V1TextElement` / `V1ByteRange` in
camelCase with conversions; v2 uses its own camelCase wrapper.
- app-server: v1/v2 input mapping includes `text_elements`; thread
history rebuilds include them.
- Core: user event emission preserves UI metadata while model history
stays clean; history replay round-trips the metadata.
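For orientation, a simplified sketch of how a text element could point into the message text. The type names come from this PR, but the field names and the `element_text` helper are assumptions, and the serde derives are omitted to keep the example dependency-free.

```rust
/// Simplified stand-in for the protocol type; field names are assumed.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct ByteRange {
    pub start: usize,
    pub end: usize,
}

/// Simplified stand-in for the protocol type; field names are assumed.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct TextElement {
    pub byte_range: ByteRange,
}

/// Hypothetical helper: return the slice of `text` covered by `element`,
/// or `None` if the range is out of bounds or splits a UTF-8 character.
pub fn element_text<'a>(text: &'a str, element: &TextElement) -> Option<&'a str> {
    text.get(element.byte_range.start..element.byte_range.end)
}
```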
## Summary
- When a user accepts an MCP elicitation request, send `content:
Some(json!({}))` instead of `None`
- MCP servers that use elicitation expect content to be present when
action is Accept
- This matches the expected behavior shown in tests at
`exec-server/tests/common/lib.rs:171`
## Root Cause
In `codex-rs/core/src/codex.rs`, the `resolve_elicitation` function
always sent `content: None`:
```rust
let response = ElicitationResponse {
action,
content: None, // Always None, even for Accept
};
```
## Fix
Send an empty object when accepting:
```rust
let content = match action {
ElicitationAction::Accept => Some(serde_json::json!({})),
ElicitationAction::Decline | ElicitationAction::Cancel => None,
};
```
## Test plan
- [x] Code compiles with `cargo check -p codex-core`
- [x] Formatted with `just fmt`
- [ ] Integration test `accept_elicitation_for_prompt_rule` (requires
MCP server binary)
Fixes #9053
We are moving the `web_search` rollout server-side, so we need a way to
explicitly disable search and signal eligibility from the client.
- Add `x-oai-web-search-eligible` header that signifies whether the
request can have web search.
- Only attach the `web_search` tool when the resolved `WebSearchMode` is
`Live` or `Cached`.
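The tool gating described above can be sketched as follows. The enum variants mirror the modes named in these PRs, while `should_attach_web_search` is a hypothetical helper. The header value is not modeled here.

```rust
/// Mirrors the modes named in the PR text; the real definition may carry
/// extra derives or attributes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WebSearchMode {
    Disabled,
    Cached,
    Live,
}

/// Hypothetical helper: attach the `web_search` tool only for `Live` or
/// `Cached`, never for `Disabled`.
pub fn should_attach_web_search(mode: WebSearchMode) -> bool {
    matches!(mode, WebSearchMode::Live | WebSearchMode::Cached)
}
```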
We’re introducing a new SKILL.toml to hold skill metadata so Codex can
deliver a richer Skills experience.
Initial focus is the interface block:
```toml
[interface]
display_name = "Optional user-facing name"
short_description = "Optional user-facing description"
icon_small = "./assets/small-400px.png"
icon_large = "./assets/large-logo.svg"
brand_color = "#3B82F6"
default_prompt = "Optional surrounding prompt to use the skill with"
```
All fields are exposed via the app server API.
display_name and short_description are consumed by the TUI.
### What
Add `WebSearchMode` enum (disabled, cached, live; defaults to cached) to
config + V2 protocol. This enum takes precedence over legacy flags:
`web_search_cached`, `web_search_request`, and `tools.web_search`.
Keep `--search` as live.
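A sketch of the precedence rule, assuming the legacy flags have already been collapsed into an optional mode (that mapping is not shown here and is left to the caller). `resolve_web_search_mode` is a hypothetical name.

```rust
/// Mirrors the modes named in the PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WebSearchMode {
    Disabled,
    Cached,
    Live,
}

/// Hypothetical resolution: an explicit `WebSearchMode` wins over whatever
/// the legacy flags imply, and the fallback is `Cached`.
/// `from_legacy_flags` is assumed to be computed elsewhere.
pub fn resolve_web_search_mode(
    explicit: Option<WebSearchMode>,
    from_legacy_flags: Option<WebSearchMode>,
) -> WebSearchMode {
    explicit
        .or(from_legacy_flags)
        .unwrap_or(WebSearchMode::Cached)
}
```

Under this sketch, `--search` could be modeled as passing `Some(WebSearchMode::Live)` as the explicit mode, which is again an assumption.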
### Tests
Added tests
Adding a prompt for collab tools. This is only for internal use and the
prompt won't be gated for now as it is not stable yet.
The goal of this PR is to provide the tool required to iterate on the
prompt
Emit the following events around the collab tools. On the `app-server`
this will be under `item/started` and `item/completed`
```rust
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabAgentSpawnBeginEvent {
/// Identifier for the collab tool call.
pub call_id: String,
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the
/// beginning.
pub prompt: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabAgentSpawnEndEvent {
/// Identifier for the collab tool call.
pub call_id: String,
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the newly spawned agent, if it was created.
pub new_thread_id: Option<ThreadId>,
/// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the
/// beginning.
pub prompt: String,
/// Last known status of the new agent reported to the sender agent.
pub status: AgentStatus,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabAgentInteractionBeginEvent {
/// Identifier for the collab tool call.
pub call_id: String,
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the receiver.
pub receiver_thread_id: ThreadId,
/// Prompt sent from the sender to the receiver. Can be empty to prevent CoT
/// leaking at the beginning.
pub prompt: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabAgentInteractionEndEvent {
/// Identifier for the collab tool call.
pub call_id: String,
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the receiver.
pub receiver_thread_id: ThreadId,
/// Prompt sent from the sender to the receiver. Can be empty to prevent CoT
/// leaking at the beginning.
pub prompt: String,
/// Last known status of the receiver agent reported to the sender agent.
pub status: AgentStatus,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabWaitingBeginEvent {
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the receiver.
pub receiver_thread_id: ThreadId,
/// ID of the waiting call.
pub call_id: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabWaitingEndEvent {
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the receiver.
pub receiver_thread_id: ThreadId,
/// ID of the waiting call.
pub call_id: String,
/// Last known status of the receiver agent reported to the sender agent.
pub status: AgentStatus,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabCloseBeginEvent {
/// Identifier for the collab tool call.
pub call_id: String,
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the receiver.
pub receiver_thread_id: ThreadId,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabCloseEndEvent {
/// Identifier for the collab tool call.
pub call_id: String,
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the receiver.
pub receiver_thread_id: ThreadId,
/// Last known status of the receiver agent reported to the sender agent before
/// the close.
pub status: AgentStatus,
}
```
Instead of having a hard-coded default review model, use the current
model for running `/review` unless one is specified in the config.
Also inherit current reasoning effort
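In sketch form, assuming string-typed model names and effort levels (the real types are likely richer); `ReviewSettings` and `review_settings` are hypothetical names.

```rust
/// Hypothetical settings used when running `/review`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ReviewSettings {
    pub model: String,
    pub reasoning_effort: Option<String>,
}

/// Use the configured review model if present, otherwise fall back to the
/// current model; the current reasoning effort is inherited as-is.
pub fn review_settings(
    config_review_model: Option<&str>,
    current_model: &str,
    current_effort: Option<&str>,
) -> ReviewSettings {
    ReviewSettings {
        model: config_review_model.unwrap_or(current_model).to_string(),
        reasoning_effort: current_effort.map(str::to_string),
    }
}
```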
## Before
When we detect an `InvalidImageRequest`, we replace the image with a
placeholder and keep going.
## Now
On an `InvalidImageRequest`, we check whether the offending image came
from a user message or a tool call output. For a tool call output, we
still replace it with a placeholder to avoid breaking the agentic loop,
but if it came from a user message, we send an error to the user.
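The new branching can be summarized with a small sketch; the enum and function names are hypothetical and only capture the decision, not the actual error plumbing.

```rust
/// Hypothetical: where the rejected image came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ImageSource {
    UserMessage,
    ToolCallOutput,
}

/// Hypothetical: what to do once the image has been rejected.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InvalidImageAction {
    ReplaceWithPlaceholder,
    ReportErrorToUser,
}

/// Tool outputs keep the agentic loop alive via a placeholder, while
/// user-provided images surface an error instead.
pub fn on_invalid_image(source: ImageSource) -> InvalidImageAction {
    match source {
        ImageSource::ToolCallOutput => InvalidImageAction::ReplaceWithPlaceholder,
        ImageSource::UserMessage => InvalidImageAction::ReportErrorToUser,
    }
}
```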
Clean up all shell snapshot files belonging to sessions that have not
been updated in 7 days.
These files should never leak. The only known cases where they can leak
are a non-graceful interruption of the process (`kill -9`, `panic`, OS
crash, ...).
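A minimal sketch of the staleness check, assuming the session's last-update time is available as a `SystemTime`. The constant and function names are hypothetical, and the file walking and deletion are left out.

```rust
use std::time::{Duration, SystemTime};

/// Retention window from the PR description: 7 days.
pub const SNAPSHOT_RETENTION: Duration = Duration::from_secs(7 * 24 * 60 * 60);

/// Hypothetical helper: a snapshot is stale when its session was last
/// updated more than `SNAPSHOT_RETENTION` before `now`. Timestamps in the
/// future (clock skew) are treated as fresh.
pub fn is_stale(last_updated: SystemTime, now: SystemTime) -> bool {
    match now.duration_since(last_updated) {
        Ok(age) => age > SNAPSHOT_RETENTION,
        Err(_) => false,
    }
}
```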
When an invalid config.toml key or value is detected, the CLI currently
just quits. This leaves the VSCE in a dead state.
This PR changes the behavior to not quit and bubble up the config error
to users to make it actionable. It also surfaces errors related to
"rules" parsing.
This allows us to surface these errors to users in the VSCE, like this:
<img width="342" height="129" alt="Screenshot 2026-01-13 at 4 29 22 PM"
src="https://github.com/user-attachments/assets/a79ffbe7-7604-400c-a304-c5165b6eebc4"
/>
<img width="346" height="244" alt="Screenshot 2026-01-13 at 4 45 06 PM"
src="https://github.com/user-attachments/assets/de874f7c-16a2-4a95-8c6d-15f10482e67b"
/>
Keep only the following methods:
- `list_models`: get the currently available models
- `try_list_models`: sync version with no refresh, for TUI use
- `get_default_model`: get the default model (should be tightened to
core and received on session configuration)
- `get_model_info`: get `ModelInfo` for a specific model (should be
tightened to core but used in tests)
- `refresh_if_new_etag`: trigger refresh on different etags
Also move the cache to its own struct
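As an illustration of the etag-driven refresh with the cache in its own struct, here is a sketch. `ModelsCache` and the closure-based signature are assumptions; only the `refresh_if_new_etag` name comes from the list above.

```rust
/// Hypothetical cache struct holding the last seen etag and model list.
#[derive(Debug, Default)]
pub struct ModelsCache {
    etag: Option<String>,
    models: Vec<String>,
}

impl ModelsCache {
    /// Refresh only when `new_etag` differs from the cached one.
    /// Returns `true` if `fetch` was called and the cache was replaced.
    pub fn refresh_if_new_etag(
        &mut self,
        new_etag: &str,
        fetch: impl FnOnce() -> Vec<String>,
    ) -> bool {
        if self.etag.as_deref() == Some(new_etag) {
            return false;
        }
        self.models = fetch();
        self.etag = Some(new_etag.to_string());
        true
    }

    /// Currently cached models, without triggering a refresh.
    pub fn models(&self) -> &[String] {
        &self.models
    }
}
```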
Enterprises want to restrict the MCP servers their users can use.
Admins can now specify an allowlist of MCPs in `requirements.toml`. The
MCP servers are matched on both Name and Transport (local path or HTTP
URL) -- both must match to allow the MCP server. This prevents
circumventing the allowlist by renaming MCP servers in user config. (It
is still possible to replace the local path e.g. rewrite say
`/usr/local/github-mcp` with a nefarious MCP. We could allow hash
pinning in the future, but that would break updates. I also think this
represents a broader, out-of-scope problem.)
We introduce a new field to Constrained: "normalizer". In general, it is
a fn(T) -> T and applies when `Constrained<T>.set()` is called. In this
particular case, it disables MCP servers which do not match the
allowlist. An alternative solution would remove this and instead throw a
ConstraintError. That would stop Codex launching if any MCP server was
configured which didn't match. I think this is bad.
We currently reuse the enabled flag on MCP servers to disable them, but
don't propagate any information about why they are disabled. I'd like to
add that in a follow up PR, possibly by switching out enabled with an
enum.
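The normalizer behavior can be sketched like this, assuming simplified stand-in types. `Transport`, `McpServer`, `AllowlistEntry`, and `disable_unlisted` are hypothetical names, and the real normalizer is attached to `Constrained<T>` rather than called directly.

```rust
/// Hypothetical transport identity: local command path or HTTP URL.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Transport {
    Command(String),
    Url(String),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct McpServer {
    pub name: String,
    pub transport: Transport,
    pub enabled: bool,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AllowlistEntry {
    pub name: String,
    pub transport: Transport,
}

/// Disable every server that does not match an allowlist entry on BOTH
/// name and transport. With no allowlist configured, nothing changes.
pub fn disable_unlisted(
    servers: Vec<McpServer>,
    allowlist: Option<&[AllowlistEntry]>,
) -> Vec<McpServer> {
    let allowlist = match allowlist {
        Some(entries) => entries,
        None => return servers,
    };
    servers
        .into_iter()
        .map(|mut server| {
            let allowed = allowlist
                .iter()
                .any(|e| e.name == server.name && e.transport == server.transport);
            if !allowed {
                server.enabled = false;
            }
            server
        })
        .collect()
}
```

Disabling rather than erroring keeps Codex launchable when a user config contains a server the admin has not allowlisted.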
In action:
```
# MCP server config has two MCPs. We are going to allowlist one of them.
➜ codex git:(gt/restrict-mcps) ✗ cat ~/.codex/config.toml | grep mcp_servers -A1
[mcp_servers.hello_world]
command = "hello-world-mcp"
--
[mcp_servers.docs]
command = "docs-mcp"
# Restrict the MCPs to the hello_world MCP.
➜ codex git:(gt/restrict-mcps) ✗ defaults read com.openai.codex requirements_toml_base64 | base64 -d
[mcp_server_allowlist.hello_world]
command = "hello-world-mcp"
# List the MCPs, observe hello_world is enabled and docs is disabled.
➜ codex git:(gt/restrict-mcps) ✗ just codex mcp list
cargo run --bin codex -- "$@"
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.25s
Running `target/debug/codex mcp list`
Name Command Args Env Cwd Status Auth
docs docs-mcp - - - disabled Unsupported
hello_world hello-world-mcp - - - enabled Unsupported
# Remove the restrictions.
➜ codex git:(gt/restrict-mcps) ✗ defaults delete com.openai.codex requirements_toml_base64
# Observe both MCPs are enabled.
➜ codex git:(gt/restrict-mcps) ✗ just codex mcp list
cargo run --bin codex -- "$@"
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.25s
Running `target/debug/codex mcp list`
Name Command Args Env Cwd Status Auth
docs docs-mcp - - - enabled Unsupported
hello_world hello-world-mcp - - - enabled Unsupported
# A new requirements that updates the command to one that does not match.
➜ codex git:(gt/restrict-mcps) ✗ cat ~/requirements.toml
[mcp_server_allowlist.hello_world]
command = "hello-world-mcp-v2"
# Use those requirements.
➜ codex git:(gt/restrict-mcps) ✗ defaults write com.openai.codex requirements_toml_base64 "$(base64 -i /Users/gt/requirements.toml)"
# Observe both MCPs are disabled.
➜ codex git:(gt/restrict-mcps) ✗ just codex mcp list
cargo run --bin codex -- "$@"
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.75s
Running `target/debug/codex mcp list`
Name Command Args Env Cwd Status Auth
docs docs-mcp - - - disabled Unsupported
hello_world hello-world-mcp - - - disabled Unsupported
```
- Add a single builder for developer permissions messaging that accepts
SandboxPolicy and approval policy. This builder now drives the developer
“permissions” message that’s injected at session start and any time
sandbox/approval settings change.
- Trim EnvironmentContext to only include cwd, writable roots, and
shell; removed sandbox/approval/network duplication and adjusted XML
serialization and tests accordingly.
Follow-up: adding a config value to replace the developer permissions
message for custom sandboxes.
### Summary
* Added `mcpServer/refresh` command to inform app servers and active
threads to refresh mcpServer on next turn event.
* Added `pending_mcp_server_refresh_config` to codex core so that if the
value is populated, we reinitialize the mcp server manager on the thread
level.
* The config is updated on the `mcpServer/refresh` command: we iterate
through the threads and provide each with the latest config value after
the last write.
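A sketch of the pending-refresh pattern, assuming a simplified config type. Only the `pending_mcp_server_refresh_config` field name comes from this PR; `McpConfig`, `ThreadState`, and the method names are hypothetical.

```rust
/// Hypothetical stand-in for the MCP server configuration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct McpConfig {
    pub version: u64,
}

/// Hypothetical per-thread state.
pub struct ThreadState {
    pub active_config: McpConfig,
    pub pending_mcp_server_refresh_config: Option<McpConfig>,
}

impl ThreadState {
    /// Called when `mcpServer/refresh` is received: remember the latest config.
    pub fn request_refresh(&mut self, latest: McpConfig) {
        self.pending_mcp_server_refresh_config = Some(latest);
    }

    /// Called at the start of the next turn: if a refresh is pending, swap in
    /// the new config (standing in for reinitializing the manager) and
    /// report that a reinitialization happened.
    pub fn on_turn_start(&mut self) -> bool {
        match self.pending_mcp_server_refresh_config.take() {
            Some(config) => {
                self.active_config = config;
                true
            }
            None => false,
        }
    }
}
```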
Add implementation for the `wait` tool.
For this, we consider every status other than `PendingInit` and
`Running` as terminal. The `wait` tool call returns either after a
given timeout or when the agent reaches a terminal status.
A few points to note:
* The usage of a channel is preferred to prevent some races (just
looping on `get_status()` could "miss" a terminal status)
* The order of operations is very important, we need to first subscribe
and then check the last known status to prevent race conditions
* If the channel gets dropped, we return an error on purpose
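The points above can be sketched with a standard-library channel. This is an illustration only: the actual implementation presumably uses an async channel, and `Completed`, `Errored`, `WaitOutcome`, `WaitError`, and `wait_for_terminal` are hypothetical names. The caller is assumed to have subscribed (obtained `updates`) before reading `last_known`.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AgentStatus {
    PendingInit,
    Running,
    Completed,       // hypothetical terminal variant
    Errored(String), // hypothetical terminal variant
}

/// Every status other than `PendingInit` and `Running` is terminal.
pub fn is_terminal(status: &AgentStatus) -> bool {
    !matches!(status, AgentStatus::PendingInit | AgentStatus::Running)
}

#[derive(Debug, PartialEq, Eq)]
pub enum WaitOutcome {
    Terminal(AgentStatus),
    TimedOut(AgentStatus),
}

#[derive(Debug, PartialEq, Eq)]
pub enum WaitError {
    ChannelClosed,
}

/// Check the last known status first, then block on updates until a
/// terminal status arrives, the timeout elapses, or the channel is dropped.
pub fn wait_for_terminal(
    updates: &Receiver<AgentStatus>,
    last_known: AgentStatus,
    timeout: Duration,
) -> Result<WaitOutcome, WaitError> {
    let deadline = Instant::now() + timeout;
    let mut current = last_known;
    loop {
        if is_terminal(&current) {
            return Ok(WaitOutcome::Terminal(current));
        }
        let remaining = deadline.saturating_duration_since(Instant::now());
        match updates.recv_timeout(remaining) {
            Ok(status) => current = status,
            Err(RecvTimeoutError::Timeout) => return Ok(WaitOutcome::TimedOut(current)),
            Err(RecvTimeoutError::Disconnected) => return Err(WaitError::ChannelClosed),
        }
    }
}
```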
Agent wouldn't "see" attached images and would instead try to use the
view_file tool:
<img width="1516" height="504" alt="image"
src="https://github.com/user-attachments/assets/68a705bb-f962-4fc1-9087-e932a6859b12"
/>
In this PR, we wrap image content items in XML tags with the name of
each image (now just a numbered name like `[Image #1]`), so that the
model can understand inline image references (based on name). We also
put the image content items above the user message which the model seems
to prefer (maybe it's more used to definitions being before references).
We also tweak the view_file tool description which seemed to help a bit
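To make the new ordering concrete, here is a sketch of how the content items could be laid out. The `<image name="...">` tag format, `SketchItem`, and `build_user_content` are assumptions for illustration; only the `[Image #N]` naming and the images-before-text ordering come from the description above.

```rust
/// Hypothetical stand-in for a model-facing content item.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SketchItem {
    Text(String),
    Image(String),
}

/// Wrap each image in named tags and place all images before the user text,
/// so inline references like `[Image #1]` point at earlier definitions.
pub fn build_user_content(text: &str, image_urls: &[&str]) -> Vec<SketchItem> {
    let mut items = Vec::new();
    for (index, url) in image_urls.iter().enumerate() {
        let name = format!("[Image #{}]", index + 1);
        // Assumed tag shape; the real wrapper may differ.
        items.push(SketchItem::Text(format!("<image name=\"{}\">", name)));
        items.push(SketchItem::Image((*url).to_string()));
        items.push(SketchItem::Text("</image>".to_string()));
    }
    items.push(SketchItem::Text(text.to_string()));
    items
}
```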
Results on a simple eval set of images:
Before
<img width="980" height="310" alt="image"
src="https://github.com/user-attachments/assets/ba838651-2565-4684-a12e-81a36641bf86"
/>
After
<img width="918" height="322" alt="image"
src="https://github.com/user-attachments/assets/10a81951-7ee6-415e-a27e-e7a3fd0aee6f"
/>
```json
[
{
"id": "single_describe",
"prompt": "Describe the attached image in one sentence.",
"images": ["image_a.png"]
},
{
"id": "single_color",
"prompt": "What is the dominant color in the image? Answer with a single color word.",
"images": ["image_b.png"]
},
{
"id": "orientation_check",
"prompt": "Is the image portrait or landscape? Answer in one sentence.",
"images": ["image_c.png"]
},
{
"id": "detail_request",
"prompt": "Look closely at the image and call out any small details you notice.",
"images": ["image_d.png"]
},
{
"id": "two_images_compare",
"prompt": "I attached two images. Are they the same or different? Briefly explain.",
"images": ["image_a.png", "image_b.png"]
},
{
"id": "two_images_captions",
"prompt": "Provide a short caption for each image (Image 1, Image 2).",
"images": ["image_c.png", "image_d.png"]
},
{
"id": "multi_image_rank",
"prompt": "Rank the attached images from most colorful to least colorful.",
"images": ["image_a.png", "image_b.png", "image_c.png"]
},
{
"id": "multi_image_choice",
"prompt": "Which image looks more vibrant? Answer with 'Image 1' or 'Image 2'.",
"images": ["image_b.png", "image_d.png"]
}
]
```
Historically we started with a CodexAuth that knew how to refresh its
own tokens and then added AuthManager that did a different kind of
refresh (re-reading from disk).
I don't think it makes sense for both `CodexAuth` and `AuthManager` to
be mutable and contain behaviors.
Move all refresh logic into `AuthManager` and keep `CodexAuth` as a data
object.
Add metrics capabilities to Codex. The `README.md` is up to date.
This will not be merged with the metrics before this PR of course:
https://github.com/openai/codex/pull/8350