# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
## Summary
- switch the review test SSE mock helper to use the shared hermetic mock
server setup
- ensure review tests always have a default `/v1/models` stub during
Codex session bootstrap
- remove the race that caused intermittent `/v1/models` connection
failures and flaky ETag refresh assertions
## Testing
- `just fmt`
- `cargo test -p codex-core --test all
refresh_models_on_models_etag_mismatch_and_avoid_duplicate_models_fetch`
- `cargo test -p codex-core --test all
review_uses_custom_review_model_from_config`
- repeated both targeted tests 5x in a loop
- `cargo clippy -p codex-core --tests -- -D warnings`
Summary
- ensure destructive tool annotations short-circuit to require approval
- simplify approval logic to only require read/write + open-world when
destructive is false
- update the unit test to cover the new destructive behavior
Testing
- Not run (not requested)
## Summary
Adds support for a Unix socket escape hatch so we can bypass socket
allowlisting when explicitly enabled.
## Description
* added a new flag, `network.dangerously_allow_all_unix_sockets` as an
explicit escape hatch
* In codex-network-proxy, enabling that flag now allows any absolute
Unix socket path from x-unix-socket instead of requiring each path to be
explicitly allowlisted. Relative paths are still rejected.
* updated the macOS seatbelt path in core so it enforces the same Unix
socket behavior:
* allowlisted sockets generate explicit network* subpath rules
* allow-all generates a broad network* (subpath "/") rule
---------
Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
## Summary
Tighten the `js_repl` freeform Lark grammar to block the most common
malformed payload wrappers before they reach runtime validation.
## What Changed
- Replaced the overly permissive `js_repl` freeform grammar (`start:
/[\s\S]*/`) with a structured grammar that still supports:
- plain JS source
- optional first-line `// codex-js-repl:` pragma followed by JS source
- Added grammar-level filtering for common bad payload shapes by
rejecting inputs whose first significant token starts with:
- `{` (JSON object wrapper like `{"code":"..."}`)
- `"` (quoted code string)
- `` ``` `` (markdown code fences)
- Implemented the grammar without regex lookahead/lookbehind because the
API-side Lark regex engine does not support look-around.
- Added a unit test to validate the grammar shape and guard against
reintroducing unsupported lookaround.
## Why
`js_repl` is a freeform tool, but the model sometimes emits wrapped
payloads (JSON, quoted strings, markdown fences) instead of raw
JavaScript. We already reject those at runtime, but this change moves
the constraint into the tool grammar so the model is less likely to
generate invalid tool-call payloads in the first place.
## Testing
- `cargo test -p codex-core
js_repl_freeform_grammar_blocks_common_non_js_prefixes`
- `cargo test -p codex-core parse_freeform_args_rejects_`
## Notes
- This intentionally over-blocks a few uncommon valid JS starts (for
example top-level `{ ... }` blocks or top-level quoted directives like
`"use strict";`) in exchange for preventing the common wrapped-payload
mistakes.
#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/12300
- ⏳ `2` https://github.com/openai/codex/pull/12275
- ⏳ `3` https://github.com/openai/codex/pull/12205
- ⏳ `4` https://github.com/openai/codex/pull/12185
- ⏳ `5` https://github.com/openai/codex/pull/10673
## Summary
Simplify network approvals by removing per-attempt proxy correlation and
moving to session-level approval dedupe keyed by (host, protocol, port).
Instead of encoding attempt IDs into proxy credentials/URLs, we now
treat approvals as a destination policy decision.
- Concurrent calls to the same destination share one approval prompt.
- Different destinations (or same host on different ports) get separate
prompts.
- Allow once approves the current queued request group only.
- Allow for session caches that (host, protocol, port) and auto-allows
future matching requests.
- Never policy continues to deny without prompting.
Example:
- 3 calls:
- a.com (line 443)
- b.com (line 443)
- a.com (line 443)
=> 2 prompts total (a, b), second a waits on the first decision.
- a.com:80 is treated separately from a.com line 443
## Testing
- `just fmt` (in `codex-rs`)
- `cargo test -p codex-core tools::network_approval::tests`
- `cargo test -p codex-core` (unit tests pass; existing
integration-suite failures remain in this environment)
Summary
- capture the origin for each configured MCP server and expose it via
the connection manager
- plumb MCP server name/origin into tool logging and emit
codex.tool_result events with those fields
- add unit coverage for origin parsing and extend OTEL tests to assert
empty MCP fields for non-MCP tools
- currently not logging full urls or url paths to prevent logging
potentially sensitive data
Testing
- Not run (not requested)
## Summary
- add reasoning effort constants for the memories phase one and phase
two agents
- wire the constants into phase1 request creation and phase2 agent
configuration so the default efforts are always applied
## Testing
- Not run (not requested)
## Summary
- Add `rollout_summary_file: <generated>.md` to each thread header in
`raw_memories.md` so Phase 2 can reliably reference the canonical
rollout summary filename.
- Update the memory prompts/templates (`stage_one_system`,
`consolidation`, `read_path`) for the new task-oriented raw-memory /
MEMORY.md schema and stronger consolidation guidance.
## Details
- `codex-rs/core/src/memories/storage.rs`
- Writes the generated `rollout_summary_file` path into the per-thread
metadata header when rebuilding `raw_memories.md`.
- `codex-rs/core/src/memories/tests.rs`
- Verifies the canonical `rollout_summary_file` header is present and
ordered after `updated_at`/`cwd` in `raw_memories.md`.
- Verifies task-structured raw-memory content is preserved while the
canonical header is added.
- `codex-rs/core/templates/memories/*.md`
- Updates the stage-1 raw-memory format to task-grouped sections
(`task`, `task_group`, `task_outcome`).
- Updates Phase 2 consolidation guidance around recency (`updated_at`),
task-oriented `MEMORY.md` blocks, and richer evidence-backed
consolidation.
- Tweaks the quick memory pass wording to emphasize topics/workflows in
addition to keywords.
## Testing
- `cargo test -p codex-core memories`
We now write MCP tools from installed apps to disk cache so that they
can be picked up instantly at startup. We still do a fresh fetch from
remote MCP server but it's non blocking unless there's a cache miss.
- [x] Store apps tool cache in disk to reduce startup time.
## Summary
Implements the `ConfigToml.permissions.network` and uses it to populate
`NetworkProxyConfig`. We now parse a new nested permissions/network
config shape which is converted into the proxy’s runtime config.
When managed requirements exist, we still apply those constraints on top
of user settings (so managed policy still wins).
* Cleaned up the old constructor path so it now accepts both user config
+ managed constraints directly.
* Updated the reload path so live proxy config reloads respect
[permissions.network] too, while still supporting the existing top-level
[network] format.
### Behavior
- User-defined `[permissions.network]` values are now honored.
- Managed constraints still take effect and are validated against the
resulting policy.
## Summary
Implements a configurable MCP OAuth callback URL override for `codex mcp
login` and app-server OAuth login flows, including support for non-local
callback endpoints (for example, devbox ingress URLs).
## What changed
- Added new config key: `mcp_oauth_callback_url` in
`~/.codex/config.toml`.
- OAuth authorization now uses `mcp_oauth_callback_url` as
`redirect_uri` when set.
- Callback handling validates the callback path against the configured
redirect URI path.
- Listener bind behavior is now host-aware:
- local callback URL hosts (`localhost`, `127.0.0.1`, `::1`) bind to
`127.0.0.1`
- non-local callback URL hosts bind to `0.0.0.0`
- `mcp_oauth_callback_port` remains supported and is used for the
listener port.
- Wired through:
- CLI MCP login flow
- App-server MCP OAuth login flow
- Skill dependency OAuth login flow
- Updated config schema and config tests.
## Why
Some environments need OAuth callbacks to land on a specific reachable
URL (for example ingress in remote devboxes), not loopback. This change
allows that while preserving local defaults for existing users.
## Backward compatibility
- No behavior change when `mcp_oauth_callback_url` is unset.
- Existing `mcp_oauth_callback_port` behavior remains intact.
- Local callback flows continue binding to loopback by default.
## Testing
- `cargo test -p codex-rmcp-client callback -- --nocapture`
- `cargo test -p codex-core --lib mcp_oauth_callback -- --nocapture`
- `cargo check -p codex-cli -p codex-app-server -p codex-rmcp-client`
## Example config
```toml
mcp_oauth_callback_port = 5555
mcp_oauth_callback_url = "https://<devbox>-<namespace>.gateway.<cluster>.internal.api.openai.org/callback"
## Summary
- The experimental Bazel CI builds fail on all platforms because askama
resolves template paths relative to `CARGO_MANIFEST_DIR`, which points
outside the Bazel sandbox. This produces errors like:
```
error: couldn't read
`codex-rs/core/src/memories/../../../../../../../../../../../work/codex/codex/codex-rs/core/templates/memories/consolidation.md`:
No such file or directory
```
- Replaced `#[derive(Template)]` + `#[template(path = "...")]` with
`include_str!` + `str::replace()` for the three affected templates
(`consolidation.md`, `stage_one_input.md`, `read_path.md`).
`include_str!` resolves paths relative to the source file, which works
correctly in both Cargo and Bazel builds.
- The templates only use simple `{{ variable }}` substitution with no
control flow or filters, so no askama functionality is lost.
- Removes the `askama` dependency from `codex-core` since it was the
only crate using it. The workspace-level dependency definition is left
in place.
- This matches the existing pattern used throughout the codebase — e.g.
`codex-rs/core/src/memories/mod.rs` already uses
`include_str!("../../templates/memories/stage_one_system.md")` for the
fourth template file.
## Test plan
- [ ] Verify Bazel (experimental) CI passes on all platforms
- [ ] Verify rust-ci (Cargo) builds and tests continue to pass
- [ ] Verify `cargo test -p codex-core` passes locally
## Summary
- Require revised `<proposed_plan>` blocks in the same planning session
to be complete replacements, not partial/delta plans.
- Scope that cumulative replacement rule to the current planning session
only.
- Clarify that after leaving Plan mode (for example switching to Default
mode to implement) or when explicitly asked for a new plan, the model
should produce a new self-contained plan without inheriting prior plan
blocks unless requested.
## Testing
- Not run (prompt/template text-only change).
Summary
- avoid emitting metrics for features marked as `Stage::Removed`
- keep feature metrics aligned with active and planned states only
Testing
- Not run (not requested)
## Why
`McpConnectionManager` used a two-phase setup (`new()` followed by
`initialize()`), which forced call sites to construct placeholder state
and then mutate it asynchronously. That made MCP startup/refresh flows
harder to follow and easier to misuse, especially around cancellation
token ownership.
## What changed
- Replaced the two-phase initialization flow with a single async
constructor: `McpConnectionManager::new(...) -> (Self,
CancellationToken)`.
- Added `McpConnectionManager::new_uninitialized()` for places that need
an empty manager before async startup begins.
- Added `McpConnectionManager::new_mcp_connection_manager_for_tests()`
for test-only construction.
- Updated MCP startup and refresh call sites in
`codex-rs/core/src/codex.rs` to build a fresh manager via `new(...)`,
swap it in, and update the startup cancellation token consistently.
- Updated MCP snapshot/connector call sites in
`codex-rs/core/src/mcp/mod.rs` and `codex-rs/core/src/connectors.rs` to
use the consolidated constructor.
- Removed the now-obsolete `reset_mcp_startup_cancellation_token()`
helper in favor of explicit token replacement at the call sites.
## Testing
- Not run (refactor-only change; no new behavior was intended).
Summary
- expose `agents.max_depth` in config schema and toml parsing, with
defaults and validation
- thread-spawn depth guards and multi-agent handler now respect the
configured limit instead of a hardcoded value
- ensure documentation and helpers account for agent depth limits
TL;DR
Add top-level `model_catalog_json` config support so users can supply a
local model catalog override from a JSON file path (including adding new
models) without backend changes.
### Problem
Codex previously had no clean client-side way to replace/overlay model
catalog data for local testing of model metadata and new model entries.
### Fix
- Add top-level `model_catalog_json` config field (JSON file path).
- Apply catalog entries when resolving `ModelInfo`:
1. Base resolved model metadata (remote/fallback)
2. Catalog overlay from `model_catalog_json`
3. Existing global top-level overrides (`model_context_window`,
`model_supports_reasoning_summaries`, etc.)
### Note
Will revisit per-field overrides in a follow-up
### Tests
Added tests
## Summary
- add `previous_context_item: Option<TurnContextItem>` to
`ContextManager`
- expose session/state accessors for reading and updating the stored
previous context item
- switch settings diffing to use `TurnContextItem` instead of
`TurnContext`
- remove submission-loop local `previous_context` and persist the
previous context item in history
## Testing
- `just fmt`
- `just fix -p codex-core`
- `cargo test -p codex-core --test all model_switching::`
- `cargo test -p codex-core --test all collaboration_instructions::`
- `cargo test -p codex-core --test all personality::`
- `cargo test -p codex-core --test all
permissions_messages::permissions_message_not_added_when_no_change`
Summary
This PR expands MCP client-side approval behavior beyond codex_apps and
tightens elicitation capability signaling.
- Removed the codex_apps-only gate in MCP tool approval checks, so
local/custom MCP servers are now eligible for the same client-side
approval prompt flow when tool annotations indicate side effects.
- Updated approval memory keying to support tools without a connector ID
(connector_id: Option<String>), allowing “Approve this Session” to be
remembered even when connector metadata is missing.
- Updated prompt text for non-codex_apps tools to identify origin as The
<server> MCP server instead of This app.
- Added MCP initialization capability policy so only codex_apps
advertises MCP elicitation capability; other servers advertise no
elicitation support.
- Added regression tests for:
server-specific prompt copy behavior
codex-apps-only elicitation capability advertisement
Testing
- Not run (not requested)
- Summary
- raise `DEFAULT_MEMORIES_MAX_ROLLOUTS_PER_STARTUP` to 16 so more
rollouts are allowed per startup
- lower `DEFAULT_MEMORIES_MIN_ROLLOUT_IDLE_HOURS` to 6 to make rollouts
eligible sooner
- Testing
- Not run (not requested)
Summary
- replace the stale `docs/config.md#feature-flags` reference in the
legacy feature notice with the canonical published URL
- align the deprecation notice test to expect the new link
This addresses #12123
app-server support for initiating Windows sandbox setup.
server responds quickly to setup request and makes a future RPC call
back to client when the setup finishes.
The TUI implementation is unaffected but in a future PR I'll update the
TUI to use the shared setup helper
(`windows_sandbox.run_windows_sandbox_setup`)
## Summary
Fix `js_repl` package-resolution boundary checks for macOS temp
directory path aliasing (`/var` vs `/private/var`).
## Problem
`js_repl` verifies that resolved bare-package imports stay inside a
configured `node_modules` root.
On macOS, temp directories are commonly exposed as `/var/...` but
canonicalize to `/private/var/...`.
Because the boundary check compared raw paths with `path.relative(...)`,
valid resolutions under temp dirs could be misclassified as escaping the
allowed base, causing false `Module not found` errors.
## Changes
- Add `fs` import in the JS kernel.
- Add `canonicalizePath()` using `fs.realpathSync.native(...)` (with
safe fallback).
- Canonicalize both `base` and `resolvedPath` before running the
`node_modules` containment check.
## Impact
- Fixes false-negative boundary checks for valid package resolutions in
macOS temp-dir scenarios.
- Keeps the existing security boundary behavior intact.
- Scope is limited to `js_repl` kernel module path validation logic.
#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/12177
- ⏳ `2` https://github.com/openai/codex/pull/10673
## Summary
Increase the rollout summary filename slug cap from 20 to 60 characters
in memory storage.
## What changed
- Updated `ROLLOUT_SLUG_MAX_LEN` from `20` to `60` in:
- `codex-rs/core/src/memories/storage.rs`
- Updated slug truncation test to verify 60-char behavior.
## Why
This preserves more semantic context in rollout summary filenames while
keeping existing normalization behavior unchanged.
## Testing
- `just fmt`
- `cargo test -p codex-core
memories::storage::tests::rollout_summary_file_stem_sanitizes_and_truncates_slug
-- --exact`
The issue was that the file_watcher never unsubscribe a file watch. All
of them leave in the owning of the ThreadManager. As a result, for each
newly created thread we create a new file watcher but this one never get
deleted even if we close the thread. On Unix system, a file watcher uses
an `inotify` and after some time we end up having consumed all of them.
This PR adds a mechanism to unsubscribe a file watcher when a thread is
dropped