rm `remote_models` feature flag.
We see issues like #11527 when a user has `remote_models` disabled, as
we always use the default fallback `ModelInfo`. This causes issues with
model performance.
Builds on #11690, which helps by warning the user when they are using
the default fallback. This PR will make that happen much less frequently
as an accidental consequence of disabling `remote_models`.
This PR adds an experimental `persist_extended_history` bool flag to
app-server thread APIs so rollout logs can retain a richer set of
EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e.
on `thread/resume`).
### Motivation
Today, our rollout recorder only persists a small subset (e.g. user
message, reasoning, assistant message) of `EventMsg` types, dropping a
good number (like command exec, file change, etc.) that are important
for reconstructing full item history for `thread/resume`, `thread/read`,
and `thread/fork`.
Some clients want to be able to resume a thread without lossiness. This
lossiness is primarily a UI thing, since what the model sees are
`ResponseItem` and not `EventMsg`.
### Approach
This change introduces an opt-in `persist_full_history` flag to preserve
those events when you start/resume/fork a thread (defaults to `false`).
This is done by adding an `EventPersistenceMode` to the rollout
recorder:
- `Limited` (existing behavior, default)
- `Extended` (new opt-in behavior)
In `Extended` mode, persist additional `EventMsg` variants needed for
non-lossy app-server `ThreadItem` reconstruction. We now store the
following ThreadItems that we didn't before:
- web search
- command execution
- patch/file changes
- MCP tool calls
- image view calls
- collab tool outcomes
- context compaction
- review mode enter/exit
For **command executions** in particular, we truncate the output using
the existing `truncate_text` from core to store an upper bound of 10,000
bytes, which is also the default value for truncating tool outputs shown
to the model. This keeps the size of the rollout file and command
execution items returned over the wire reasonable.
And we also persist `EventMsg::Error` which we can now map back to the
Turn's status and populates the Turn's error metadata.
#### Updates to EventMsgs
To truly make `thread/resume` non-lossy, we also needed to persist the
`status` on `EventMsg::CommandExecutionEndEvent` and
`EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a
command failed or was declined (similar for apply_patch). These
EventMsgs were never persisted before so I made it a required field.
## Why
`codex-core` was being built in multiple feature-resolved permutations
because test-only behavior was modeled as crate features. For a large
crate, those permutations increase compile cost and reduce cache reuse.
## Net Change
- Removed the `test-support` crate feature and related feature wiring so
`codex-core` no longer needs separate feature shapes for test consumers.
- Standardized cross-crate test-only access behind
`codex_core::test_support`.
- External test code now imports helpers from
`codex_core::test_support`.
- Underlying implementation hooks are kept internal (`pub(crate)`)
instead of broadly public.
## Outcome
- Fewer `codex-core` build permutations.
- Better incremental cache reuse across test targets.
- No intended production behavior change.
Summary
- add the new resume_agent collab tool path through core, protocol, and
the app server API, including the resume events
- update the schema/TypeScript definitions plus docs so resume_agent
appears in generated artifacts and README
- note that resumed agents rehydrate rollout history without overwriting
their base instructions
Testing
- Not run (not requested)
Add a centralized FileWatcher in codex-core (using notify) that watches
skill roots from the config layer stack (recursive)
Send `SkillsChanged` events when relevant file system changes are
detected
On `SkillsChanged`:
* Invalidate the skills cache immediately in ThreadManager
* Emit EventMsg::SkillsUpdateAvailable to active sessions
~~* Broadcast a new app-server notification:
SkillsListUpdatedNotification~~
This change does not inject new items into the event stream. That means
the agent will not know about new skills, so it won't be able to
implicitly invoke new skills. It also won't know about changes to
existing skills, so if it has already read the contents of a modified
skill, it will not honor the new behavior.
This change also does not detect modifications to AGENTS.md.
I plan to address these limitations in a follow-on PR modeled after
#9985. Injection of new skills and AGENTS was deemed to risky, hence the
need to split the feature into two stages. The changes in this PR were
designed to easily accommodate the second stage once we have some other
foundational changes in place.
Testing: In addition to automated tests, I did manual testing to confirm
that newly-created skills, deleted skills, and renamed skills are
reflected in the TUI skill picker menu. Also confirmed that
modifications to behaviors for explicitly-invoked skills are honored.
---------
Co-authored-by: Xin Lin <xl@openai.com>
### What
add wiring for `phase` field on `ResponseItem::Message` to lay
groundwork for differentiating model preambles and final messages.
currently optional.
follows pattern in #9698.
updated schemas with `just write-app-server-schema` so we can see type
changes.
### Tests
Updated existing tests for SSE parsing and hydrating from history
## Summary
Add dynamic tool injection to thread startup in API v2, wire dynamic
tool calls through the app server to clients, and plumb responses back
into the model tool pipeline.
### Flow (high level)
- Thread start injects `dynamic_tools` into the model tool list for that
thread (validation is done here).
- When the model emits a tool call for one of those names, core raises a
`DynamicToolCallRequest` event.
- The app server forwards it to the client as `item/tool/call`, waits
for the client’s response, then submits a `DynamicToolResponse` back to
core.
- Core turns that into a `function_call_output` in the next model
request so the model can continue.
### What changed
- Added dynamic tool specs to v2 thread start params and protocol types;
introduced `item/tool/call` (request/response) for dynamic tool
execution.
- Core now registers dynamic tool specs at request time and routes those
calls via a new dynamic tool handler.
- App server validates tool names/schemas, forwards dynamic tool call
requests to clients, and publishes tool outputs back into the session.
- Integration tests
Keep an unmasked base collaboration mode and apply the active mask on
demand. Simplify the TUI mask helpers and update tests/docs to match the
mask contract.
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
This PR is in the scope of multi-agent work.
An agent (=thread) can now spawn other agents. Those other agents are
not attached to any clients. We need a way to make sure that the clients
are aware of the new threads to look at (for approval for example). This
PR adds a channel to the `ThreadManager` that pushes the ID of those
newly created agents such that the client (here the app-server) can also
subscribe to those ones.
Have only the following Methods:
- `list_models`: getting current available models
- `try_list_models`: sync version no refresh for tui use
- `get_default_model`: get the default model (should be tightened to
core and received on session configuration)
- `get_model_info`: get `ModelInfo` for a specific model (should be
tightened to core but used in tests)
- `refresh_if_new_etag`: trigger refresh on different etags
Also move the cache to its own struct
- Add a single builder for developer permissions messaging that accepts
SandboxPolicy and approval policy. This builder now drives the developer
“permissions” message that’s injected at session start and any time
sandbox/approval settings change.
- Trim EnvironmentContext to only include cwd, writable roots, and
shell; removed sandbox/approval/network duplication and adjusted XML
serialization and tests accordingly.
Follow-up: adding a config value to replace the developer permissions
message for custom sandboxes.
### Summary
* Added `mcpServer/refresh` command to inform app servers and active
threads to refresh mcpServer on next turn event.
* Added `pending_mcp_server_refresh_config` to codex core so that if the
value is populated, we reinitialize the mcp server manager on the thread
level.
* The config is updated on `mcpServer/refresh` command which we iterate
through threads and provide with the latest config value after last
write.