## Why
Enterprises can already constrain approvals, sandboxing, and web search
through `requirements.toml` and MDM, but feature flags were still only
configurable as managed defaults. That meant an enterprise could suggest
feature values, but it could not actually pin them.
This change closes that gap and makes enterprise feature requirements
behave like the other constrained settings. The effective feature set
now stays consistent with enterprise requirements during config load,
when config writes are validated, and when runtime code mutates feature
flags later in the session.
It also tightens the runtime API for managed features. `ManagedFeatures`
now follows the same constraint-oriented shape as `Constrained<T>`
instead of exposing panic-prone mutation helpers, and production code
can no longer construct it through an unconstrained `From<Features>`
path.
The PR also hardens the `compact_resume_fork` integration coverage on
Windows. After the feature-management changes,
`compact_resume_after_second_compaction_preserves_history` was
overflowing the libtest/Tokio thread stacks on Windows, so the test now
uses an explicit larger-stack harness as a pragmatic mitigation. That
may not be the ideal root-cause fix, and it merits a parallel
investigation into whether part of the async future chain should be
boxed to reduce stack pressure instead.
## What Changed
Enterprises can now pin feature values in `requirements.toml` with the
requirements-side `features` table:
```toml
[features]
personality = true
unified_exec = false
```
Only canonical feature keys are allowed in the requirements `features`
table; omitted keys remain unconstrained.
- Added a requirements-side pinned feature map to
`ConfigRequirementsToml`, threaded it through source-preserving
requirements merge and normalization in `codex-config`, and made the
TOML surface use `[features]` (while still accepting legacy
`[feature_requirements]` for compatibility).
- Exposed `featureRequirements` from `configRequirements/read`,
regenerated the JSON/TypeScript schema artifacts, and updated the
app-server README.
- Wrapped the effective feature set in `ManagedFeatures`, backed by
`ConstrainedWithSource<Features>`, and changed its API to mirror
`Constrained<T>`: `can_set(...)`, `set(...) -> ConstraintResult<()>`,
and result-returning `enable` / `disable` / `set_enabled` helpers.
- Removed the legacy-usage and bulk-map passthroughs from
`ManagedFeatures`; callers that need those behaviors now mutate a plain
`Features` value and reapply it through `set(...)`, so the constrained
wrapper remains the enforcement boundary.
- Removed the production loophole for constructing unconstrained
`ManagedFeatures`. Non-test code now creates it through the configured
feature-loading path, and `impl From<Features> for ManagedFeatures` is
restricted to `#[cfg(test)]`.
- Rejected legacy feature aliases in enterprise feature requirements,
and return a load error when a pinned combination cannot survive
dependency normalization.
- Validated config writes against enterprise feature requirements before
persisting changes, including explicit conflicting writes and
profile-specific feature states that normalize into invalid
combinations.
- Updated runtime and TUI feature-toggle paths to use the constrained
setter API and to persist or apply the effective post-constraint value
rather than the requested value.
- Updated the `core_test_support` Bazel target to include the bundled
core model-catalog fixtures in its runtime data, so helper code that
resolves `core/models.json` through runfiles works in remote Bazel test
environments.
- Renamed the core config test coverage to emphasize that effective
feature values are normalized at runtime, while conflicting persisted
config writes are rejected.
- Ran `compact_resume_after_second_compaction_preserves_history` inside
an explicit 8 MiB test thread and Tokio runtime worker stack, following
the existing larger-stack integration-test pattern, to keep the Windows
`compact_resume_fork` test slice from aborting while a parallel
investigation continues into whether some of the underlying async
futures should be boxed.
## Verification
- `cargo test -p codex-config`
- `cargo test -p codex-core feature_requirements_ -- --nocapture`
- `cargo test -p codex-core
load_requirements_toml_produces_expected_constraints -- --nocapture`
- `cargo test -p codex-core
compact_resume_after_second_compaction_preserves_history -- --nocapture`
- `cargo test -p codex-core compact_resume_fork -- --nocapture`
- Re-ran the built `codex-core` `tests/all` binary with
`RUST_MIN_STACK=262144` for
`compact_resume_after_second_compaction_preserves_history` to confirm
the explicit-stack harness fixes the deterministic low-stack repro.
- `cargo test -p codex-core`
- This still fails locally in unrelated integration areas that expect
the `codex` / `test_stdio_server` binaries or hit existing `search_tool`
wiremock mismatches.
## Docs
`developers.openai.com/codex` should document the requirements-side
`[features]` table for enterprise and MDM-managed configuration,
including that it only accepts canonical feature keys and that
conflicting config writes are rejected.
## Summary
`PermissionProfile.network` could not be preserved when additional or
compiled permissions resolved to
`SandboxPolicy::ReadOnly`, because `ReadOnly` had no network_access
field. This change makes read-only + network
enabled representable directly and threads that through the protocol,
app-server v2 mirror, and permission-
merging logic.
## What changed
- Added `network_access: bool` to `SandboxPolicy::ReadOnly` in the core
protocol and app-server v2 protocol.
- Kept backward compatibility by defaulting the new field to false, so
legacy read-only payloads still
deserialize unchanged.
- Updated `has_full_network_access()` and sandbox summaries to respect
read-only network access.
- Preserved PermissionProfile.network when:
- compiling skill permission profiles into sandbox policies
- normalizing additional permissions
- merging additional permissions into existing sandbox policies
- Updated the approval overlay to show network in the rendered
permission rule when requested.
- Regenerated app-server schema fixtures for the new v2 wire shape.
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
### Summary
Propagate trace context originating at app-server RPC method handlers ->
codex core submission loop (so this includes spans such as `run_turn`!).
This implements PR 2 of the app-server tracing rollout.
This also removes the old lower-level env-based reparenting in core so
explicit request/submission ancestry wins instead of being overridden by
ambient `TRACEPARENT` state.
### What changed
- Added `trace: Option<W3cTraceContext>` to codex_protocol::Submission
- Taught `Codex::submit()` / `submit_with_id()` to automatically capture
the current span context when constructing or forwarding a submission
- Wrapped the core submission loop in a submission_dispatch span
parented from Submission.trace
- Warn on invalid submission trace carriers and ignore them cleanly
- Removed the old env-based downstream reparenting path in core task
execution
- Stopped OTEL provider init from implicitly attaching env trace context
process-wide
- Updated mcp-server Submission call sites for the new field
Added focused unit tests for:
- capturing trace context into Submission
- preferring `Submission.trace` when building the core dispatch span
### Why
PR 1 gave us consistent inbound request spans in app-server, but that
only covered the transport boundary. For long-running work like turns
and reviews, the important missing piece was preserving ancestry after
the request handler returns and core continues work on a different async
path.
This change makes that handoff explicit and keeps the parentage rules
simple:
- app-server request span sets the current context
- `Submission.trace` snapshots that context
- core restores it once, at the submission boundary
- deeper core spans inherit naturally
That also lets us stop relying on env-based reparenting for this path,
which was too ambient and could override explicit ancestry.
load plugin-apps from `.app.json`.
make apps runtime-mentionable iff `codex_apps` MCP actually exposes
tools for that `connector_id`.
if the app isn't available, it's filtered out of runtime connector set,
so no tools are added and no app-mentions resolve.
right now we don't have a clean cli-side error for an app not being
installed. can look at this after.
### Tests
Added tests, tested locally that using a plugin that bundles an app
picks up the app.
## Summary
Instead of always adding inner function call outputs to the model
context, let js code decide which ones to return.
- Stop auto-hoisting nested tool outputs from `codex.tool(...)` into the
outer `js_repl` function output.
- Keep `codex.tool(...)` return values unchanged as structured JS
objects.
- Add `codex.emitImage(...)` as the explicit path for attaching an image
to the outer `js_repl` function output.
- Support emitting from a direct image URL, a single `input_image` item,
an explicit `{ bytes, mimeType }` object, or a raw tool response object
containing exactly one image.
- Preserve existing `view_image` original-resolution behavior when JS
emits the raw `view_image` tool result.
- Suppress the special `ViewImageToolCall` event for `js_repl`-sourced
`view_image` calls so nested inspection stays side-effect free until JS
explicitly emits.
- Update the `js_repl` docs and generated project instructions with both
recommended patterns:
- `await codex.emitImage(codex.tool("view_image", { path }))`
- `await codex.emitImage({ bytes: await page.screenshot({ type: "jpeg",
quality: 85 }), mimeType: "image/jpeg" })`
#### [git stack](https://github.com/magus/git-stack-cli)
- ✅ `1` https://github.com/openai/codex/pull/13050
- 👉 `2` https://github.com/openai/codex/pull/13331
- ⏳ `3` https://github.com/openai/codex/pull/13049
## Summary
Add original-resolution support for `view_image` behind the
under-development `view_image_original_resolution` feature flag.
When the flag is enabled and the target model is `gpt-5.3-codex` or
newer, `view_image` now preserves original PNG/JPEG/WebP bytes and sends
`detail: "original"` to the Responses API instead of using the legacy
resize/compress path.
## What changed
- Added `view_image_original_resolution` as an under-development feature
flag.
- Added `ImageDetail` to the protocol models and support for serializing
`detail: "original"` on tool-returned images.
- Added `PromptImageMode::Original` to `codex-utils-image`.
- Preserves original PNG/JPEG/WebP bytes.
- Keeps legacy behavior for the resize path.
- Updated `view_image` to:
- use the shared `local_image_content_items_with_label_number(...)`
helper in both code paths
- select original-resolution mode only when:
- the feature flag is enabled, and
- the model slug parses as `gpt-5.3-codex` or newer
- Kept local user image attachments on the existing resize path; this
change is specific to `view_image`.
- Updated history/image accounting so only `detail: "original"` images
use the docs-based GPT-5 image cost calculation; legacy images still use
the old fixed estimate.
- Added JS REPL guidance, gated on the same feature flag, to prefer JPEG
at 85% quality unless lossless is required, while still allowing other
formats when explicitly requested.
- Updated tests and helper code that construct
`FunctionCallOutputContentItem::InputImage` to carry the new `detail`
field.
## Behavior
### Feature off
- `view_image` keeps the existing resize/re-encode behavior.
- History estimation keeps the existing fixed-cost heuristic.
### Feature on + `gpt-5.3-codex+`
- `view_image` sends original-resolution images with `detail:
"original"`.
- PNG/JPEG/WebP source bytes are preserved when possible.
- History estimation uses the GPT-5 docs-based image-cost calculation
for those `detail: "original"` images.
#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/13050
- ⏳ `2` https://github.com/openai/codex/pull/13331
- ⏳ `3` https://github.com/openai/codex/pull/13049
## Summary
- add the v2 `thread/metadata/update` API, including
protocol/schema/TypeScript exports and app-server docs
- patch stored thread `gitInfo` in sqlite without resuming the thread,
with validation plus support for explicit `null` clears
- repair missing sqlite thread rows from rollout data before patching,
and make those repairs safe by inserting only when absent and updating
only git columns so newer metadata is not clobbered
- keep sqlite authoritative for mutable thread git metadata by
preserving existing sqlite git fields during reconcile/backfill and only
using rollout `SessionMeta` git fields to fill gaps
- add regression coverage for the endpoint, repair paths, concurrent
sqlite writes, clearing git fields, and rollout/backfill reconciliation
- fix the login server shutdown race so cancelling before the waiter
starts still terminates `block_until_done()` correctly
## Testing
- `cargo test -p codex-state
apply_rollout_items_preserves_existing_git_branch_and_fills_missing_git_fields`
- `cargo test -p codex-state
update_thread_git_info_preserves_newer_non_git_metadata`
- `cargo test -p codex-core
backfill_sessions_preserves_existing_git_branch_and_fills_missing_git_fields`
- `cargo test -p codex-app-server thread_metadata_update`
- `cargo test`
- currently fails in existing `codex-core` grep-files tests with
`unsupported call: grep_files`:
- `suite::grep_files::grep_files_tool_collects_matches`
- `suite::grep_files::grep_files_tool_reports_empty_results`
## Summary
- submit `Enter` steers immediately while a turn is already running
instead of routing them through `queued_user_messages`
- keep those submitted steers visible in the footer as `pending_steers`
until core records them as a user message or aborts the turn
- reconcile pending steers on `ItemCompleted(UserMessage)`, not
`RawResponseItem`
- emit user-message item lifecycle for leftover pending input at task
finish, then remove the TUI `TurnComplete` fallback
- keep `queued_user_messages` for actual queued drafts, rendered below
pending steers
## Problem
While the assistant was generating, pressing `Enter` could send the
input into `queued_user_messages`. That queue only drains after the turn
ends, so ordinary steers behaved like queued drafts instead of landing
at the next core sampling boundary.
The first version of this fix also used `RawResponseItem` to decide when
a steer had landed. Review feedback was that this is the wrong
abstraction for client behavior.
There was also a late edge case in core: if pending steer input was
accepted after the final sampling decision but before `TurnComplete`,
core would record that user message into history at task finish without
emitting `ItemStarted(UserMessage)` / `ItemCompleted(UserMessage)`. TUI
had a fallback to paper over that gap locally.
## Approach
- `Enter` during an active turn now submits a normal `Op::UserTurn`
immediately
- TUI keeps a local pending-steer preview instead of rendering that user
message into history immediately
- when core records the steer as `ItemCompleted(UserMessage)`, TUI
matches and removes the corresponding pending preview, then renders the
committed user message
- core now emits the same user-message lifecycle when
`on_task_finished(...)` drains leftover pending user input, before
`TurnComplete`
- with that lifecycle gap closed in core, TUI no longer needs to flush
pending steers into history on `TurnComplete`
- if the turn is interrupted, pending steers and queued drafts are both
restored into the composer, with pending steers first
## Notes
- `Tab` still uses the real queued-message path
- `queued_user_messages` and `pending_steers` are separate state with
separate semantics
- the pending-steer matching key is built directly from `UserInput`
- this removes the new TUI dependency on `RawResponseItem`
## Validation
- `just fmt`
- `cargo test -p codex-core
task_finish_emits_turn_item_lifecycle_for_leftover_pending_user_input --
--nocapture`
- `cargo test -p codex-tui`
Update config.toml plugin entries to use
<plugin_name>@<marketplace_name> as the key.
Plugin now stays in
[plugins/cache/marketplace-name/plugin-name/$version/]
Clean up the plugin code structure.
Add plugin install functionality (not used yet).
## Summary
- Route delegated realtime handoff turns from all handoff message texts,
preserving order
- Fallback to input_transcript only when no messages are present
- Add regression coverage for multi-message handoff requests
followup to https://github.com/openai/codex/pull/13212 to expose fast
tier controls to app server
(majority of this PR is generated schema jsons - actual code is +69 /
-35 and +24 tests )
- add service tier fields to the app-server protocol surfaces used by
thread lifecycle, turn start, config, and session configured events
- thread service tier through the app-server message processor and core
thread config snapshots
- allow runtime config overrides to carry service tier for app-server
callers
cleanup:
- Removing useless "legacy" code supporting "standard" - we moved to
None | "fast", so "standard" is not needed.
- add a local Fast mode setting in codex-core (similar to how model id
is currently stored on disk locally)
- send `service_tier=priority` on requests when Fast is enabled
- add `/fast` in the TUI and persist it locally
- feature flag
## Summary
This change removes the compiled permissions field from skill metadata
and keeps permission_profile as the single source of truth.
Skill loading no longer compiles skill permissions eagerly. Instead, the
zsh-fork skill escalation path compiles `skill.permission_profile` when
it needs to determine the sandbox to apply for a skill script.
## Behavior change
For skills that declare:
```
permissions: {}
```
we now treat that the same as having no skill permissions override,
instead of creating and using a default readonly sandbox. This change
makes the behavior more intuitive:
- only non-empty skill permission profiles affect sandboxing
- omitting permissions and writing permissions: {} now mean the same
thing
- skill metadata keeps a single permissions representation instead of
storing derived state too
Overall, this makes skill sandbox behavior easier to understand and more
predictable.
- migrate the realtime websocket transport to the new session and
handoff flow
- make the realtime model configurable in config.toml and use API-key
auth for the websocket
---------
Co-authored-by: Codex <noreply@openai.com>
Currently `thread/name/set` does only work for loaded threads.
Expand the scope to also support persisted but not-yet-loaded ones for a
more predictable API surface.
This will make it possible to rename threads discovered via
`thread/list` and similar operations.
## Summary
Codex discovered this one - shell_snapshot tests were breaking on my
machine because I had a multiline env var. We should handle these!
## Testing
- [x] existing tests pass
- [x] Updated unit tests
Fixes#13076
This PR fixes a bug that causes command-line config overrides for MCP
subtables to not be merged correctly.
Summary
- make project trust loading go through the dedicated struct so CLI
overrides can update trusted project-local MCP transports
---------
Co-authored-by: jif-oai <jif@openai.com>
## Summary
- reuse the parent shell snapshot when spawning/forking/resuming
`SessionSource::SubAgent(SubAgentSource::ThreadSpawn { .. })` sessions
- plumb inherited snapshot through `AgentControl -> ThreadManager ->
Codex::spawn -> SessionConfiguration`
- skip shell snapshot refresh on cwd updates for thread-spawn subagents
so inherited snapshots are not replaced
## Why
- avoids per-subagent shell snapshot creation and cleanup work
- keeps thread-spawn subagents on the parent snapshot path, matching the
intended parent/child snapshot model
## Validation
- `just fmt` (in `codex-rs`)
- `cargo test -p codex-core --no-run`
- `cargo test -p codex-core spawn_agent -- --nocapture`
- `cargo test -p codex-core --test all
suite::agent_jobs::spawn_agents_on_csv_runs_and_exports`
## Notes
- full `cargo test -p codex-core --test all` was left running separately
for broader verification
Co-authored-by: Codex <noreply@openai.com>
## Summary
- record a realtime close developer message when a new realtime session
replaces an active one
- assert the replacement marker through the mocked responses request
path
---------
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Charles Cunningham <ccunningham@openai.com>
…with launchservicesd
Add mach lookup for `launchservicesd` when extending the sandbox for
`MacOSAutomationPermission::BundleIDs`. This is necessary so that the
target application can be launched for automation.
This omission was due to a spec error in a document, which has been
fixed.
Support loading plugins.
Plugins can now be enabled via [plugins.<name>] in config.toml. They are
loaded as first-class entities through PluginsManager, and their default
skills/ and .mcp.json contributions are integrated into the existing
skills and MCP flows.
## Why
[#12964](https://github.com/openai/codex/pull/12964) added
`host_executable()` support to `codex-execpolicy`, and
[#13046](https://github.com/openai/codex/pull/13046) adopted it in the
zsh-fork interception path.
The remaining gap was the preflight execpolicy check in
`core/src/exec_policy.rs`. That path derives approval requirements
before execution for `shell`, `shell_command`, and `unified_exec`, but
it was still using the default exact-token matcher.
As a result, a command that already included an absolute executable
path, such as `/usr/bin/git status`, could still miss a basename rule
like `prefix_rule(pattern = ["git"], ...)` during preflight even when
the policy also defined a matching `host_executable(name = "git", ...)`
entry.
This PR brings the same opt-in `host_executable()` resolution to the
preflight approval path when an absolute program path is already present
in the parsed command.
## What Changed
- updated
`ExecPolicyManager::create_exec_approval_requirement_for_command()` in
`core/src/exec_policy.rs` to use `check_multiple_with_options(...)` with
`MatchOptions { resolve_host_executables: true }`
- kept the existing shell parsing flow for approval derivation, but now
allow basename rules to match absolute executable paths during preflight
when `host_executable()` permits it
- updated requested-prefix amendment evaluation to use the same
host-executable-aware matching mode, so suggested `prefix_rule()`
amendments are checked consistently for absolute-path commands
- added preflight coverage for:
- absolute-path commands that should match basename rules through
`host_executable()`
- absolute-path commands whose paths are not in the allowed
`host_executable()` mapping
- requested prefix-rule amendments for absolute-path commands
## Verification
- `just fix -p codex-core`
- `cargo test -p codex-core --lib exec_policy::tests::`
## Summary
- skip online model refresh for subagent sessions
- avoid rollout flushes during subagent startup
- keep /models refresh for non-subagent sessions
## Testing
- cargo test -p codex-core --test all
suite::models_etag_responses::refresh_models_on_models_etag_mismatch_and_avoid_duplicate_models_fetch
- cargo test -p codex-core --test all
suite::remote_models::remote_models_long_model_slug_is_sent_with_high_reasoning
- cargo test -p codex-core --test all
suite::model_switching::model_switch_to_smaller_model_updates_token_context_window
- cargo test -p codex-core --test all
suite::compact::pre_sampling_compact_runs_on_switch_to_smaller_context_model
- cargo test -p codex-core --test all
suite::compact::pre_sampling_compact_runs_after_resume_and_switch_to_smaller_model
- cargo test -p codex-core --test all
suite::personality::remote_model_friendly_personality_instructions_with_feature
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
- tighten Codex memory-read behavior around stale facts and conflicting
memory
- encode the risk-of-drift vs verification-effort decision rule directly
in the read-path prompt
- make partial stale-detail updates explicit so correcting only the
answer is not treated as sufficient
## What changed
- update `codex-rs/core/templates/memories/read_path.md`
- add guidance for when to verify cheap local facts vs when to answer
from older memory with visible provenance
- strengthen same-turn `MEMORY.md` updates when stored concrete details
are stale
## Notes
- this is based on some staleness eval work
## Why
[#12964](https://github.com/openai/codex/pull/12964) added
`host_executable()` support to `codex-execpolicy`, but the zsh-fork
interception path in `unix_escalation.rs` was still evaluating commands
with the default exact-token matcher.
That meant an intercepted absolute executable such as `/usr/bin/git
status` could still miss basename rules like `prefix_rule(pattern =
["git", "status"])`, even when the policy also defined a matching
`host_executable(name = "git", ...)` entry.
This PR adopts the new matching behavior in the zsh-fork runtime only.
That keeps the rollout intentionally narrow: zsh-fork already requires
explicit user opt-in, so it is a safer first caller to exercise the new
`host_executable()` scheme before expanding it to other execpolicy call
sites.
It also brings zsh-fork back in line with the current `prefix_rule()`
execution model. Until prefix rules can carry their own permission
profiles, a matched `prefix_rule()` is expected to rerun the intercepted
command unsandboxed on `allow`, or after the user accepts `prompt`,
instead of merely continuing inside the inherited shell sandbox.
## What Changed
- added `evaluate_intercepted_exec_policy()` in
`core/src/tools/runtimes/shell/unix_escalation.rs` to centralize
execpolicy evaluation for intercepted commands
- switched intercepted direct execs in the zsh-fork path to
`check_multiple_with_options(...)` with `MatchOptions {
resolve_host_executables: true }`
- added `commands_for_intercepted_exec_policy()` so zsh-fork policy
evaluation works from intercepted `(program, argv)` data instead of
reconstructing a synthetic command before matching
- left shell-wrapper parsing intentionally disabled by default behind
`ENABLE_INTERCEPTED_EXEC_POLICY_SHELL_WRAPPER_PARSING`, so
path-sensitive matching relies on later direct exec interception rather
than shell-script parsing
- made matched `prefix_rule()` decisions rerun intercepted commands with
`EscalationExecution::Unsandboxed`, while unmatched-command fallback
keeps the existing sandbox-preserving behavior
- extracted the zsh-fork test harness into
`core/tests/common/zsh_fork.rs` so both the skill-focused and
approval-focused integration suites can exercise the same runtime setup
- limited this change to the intercepted zsh-fork path rather than
changing every execpolicy caller at once
- added runtime coverage in
`core/src/tools/runtimes/shell/unix_escalation_tests.rs` for allowed and
disallowed `host_executable()` mappings and the wrapper-parsing modes
- added integration coverage in `core/tests/suite/approvals.rs` to
verify a saved `prefix_rule(pattern=["touch"], decision="allow")` reruns
under zsh-fork outside a restrictive `WorkspaceWrite` sandbox
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13046).
* #13065
* __->__ #13046
- override startup tooltips with model availability NUX and persist
per-model show counts in config
- stop showing each model after four exposures and fall back to normal
tooltips