Addresses #15527
Problem: Nested `codex exec` commands could source a shell snapshot that
re-exported the parent `CODEX_THREAD_ID`, so commands inside the nested
session were attributed to the wrong thread.
Solution: Reapply the live command env's `CODEX_THREAD_ID` after
sourcing the snapshot.
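
A minimal sketch of the ordering fix described above, assuming the command is wrapped in a POSIX shell script. The names `shell_quote` and `wrap_with_snapshot` are hypothetical and are not taken from the codex source; the point is only that the live `CODEX_THREAD_ID` is exported after the snapshot is sourced, so it wins over any value the snapshot re-exports.

```rust
/// Quote a value for a POSIX shell using single quotes.
/// (Hypothetical helper for this sketch.)
fn shell_quote(value: &str) -> String {
    // Close the quote, emit an escaped quote, then reopen the quote.
    format!("'{}'", value.replace('\'', "'\\''"))
}

/// Build a script that sources the snapshot first and only then reapplies
/// the thread id from the live command environment.
fn wrap_with_snapshot(snapshot_path: &str, live_thread_id: &str, command: &str) -> String {
    format!(
        ". {}; export CODEX_THREAD_ID={}; {}",
        shell_quote(snapshot_path),
        shell_quote(live_thread_id),
        command
    )
}
```

Calling `wrap_with_snapshot("/tmp/snap.sh", "child-thread", "echo hi")` yields a script in which the export follows the `.` (source) step, so the nested session sees its own thread id.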
Addresses #15532
Problem: Nested read-only `apply_patch` rejections report in-project
files as outside the project.
Solution: Choose the rejection message based on sandbox mode so
read-only sessions report a read-only-specific reason, and add focused
safety coverage.
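
As a rough illustration of choosing the message by sandbox mode, the sketch below uses a hypothetical `SandboxMode` enum and made-up message strings; the real wording and types in codex may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SandboxMode {
    ReadOnly,
    WorkspaceWrite,
    DangerFullAccess,
}

/// Return a rejection reason for a patch target, or `None` if the write is allowed.
/// The strings are placeholders for this sketch.
fn patch_rejection_reason(mode: SandboxMode, path_in_project: bool) -> Option<&'static str> {
    match (mode, path_in_project) {
        // Read-only sessions get a read-only-specific reason, even for in-project files.
        (SandboxMode::ReadOnly, _) => Some("writing is blocked by the read-only sandbox"),
        // Only writable sandboxes can meaningfully say "outside the project".
        (SandboxMode::WorkspaceWrite, false) => Some("writing outside of the project"),
        (SandboxMode::WorkspaceWrite, true) | (SandboxMode::DangerFullAccess, _) => None,
    }
}
```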
Problem: The multi-agent followup interrupt test polled history before
interrupt cleanup and mailbox wakeup were guaranteed to settle, which
made it flaky under CI scheduling variance.
Solution: Wait for the child turn's `TurnAborted(Interrupted)` event
before asserting that the redirected assistant envelope is recorded and
no plain user message is left behind.
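
The "wait for the event, then assert" pattern can be sketched with a standard-library channel. This is only an illustration: the `Event` and `AbortReason` types and the `wait_for_event` helper are hypothetical stand-ins, and the real test presumably uses the project's async test utilities.

```rust
use std::sync::mpsc::Receiver;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
enum AbortReason {
    Interrupted,
}

#[derive(Debug, PartialEq, Eq)]
enum Event {
    TurnAborted(AbortReason),
    Other(&'static str),
}

/// Block until an event matching `is_match` arrives or the timeout elapses.
/// Non-matching events are skipped rather than treated as failures.
fn wait_for_event(
    rx: &Receiver<Event>,
    timeout: Duration,
    mut is_match: impl FnMut(&Event) -> bool,
) -> Option<Event> {
    let deadline = Instant::now() + timeout;
    loop {
        // `?` returns None once the deadline has passed.
        let remaining = deadline.checked_duration_since(Instant::now())?;
        match rx.recv_timeout(remaining) {
            Ok(event) if is_match(&event) => return Some(event),
            Ok(_) => continue,
            Err(_) => return None, // timed out or sender dropped
        }
    }
}
```

Assertions about recorded history would run only after `wait_for_event` returns the `TurnAborted(Interrupted)` event, instead of polling history on a timer.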
## Summary
- reduce public module visibility across Rust crates, preferring private
or crate-private modules with explicit crate-root public exports
- update external call sites and tests to use the intended public crate
APIs instead of reaching through module trees
- add the module visibility guideline to AGENTS.md
## Validation
- `cargo check --workspace --all-targets --message-format=short` passed
before the final fix/format pass
- `just fix` completed successfully
- `just fmt` completed successfully
- `git diff --check` passed
## Summary
- make AGENTS.md discovery and loading fully FS-aware and remove the
non-FS discover helper
- migrate remote-aware codex-core tests to use TestEnv workspace setup
instead of syncing a local workspace copy
- add AGENTS.md corner-case coverage, including directory fallbacks and
remote-aware integration coverage
## Testing
- cargo test -p codex-core project_doc -- --nocapture
- cargo test -p codex-core hierarchical_agents -- --nocapture
- cargo test -p codex-core agents_md -- --nocapture
- cargo test -p codex-tui status -- --nocapture
- cargo test -p codex-tui-app-server status -- --nocapture
- just fix
- just fmt
- just bazel-lock-update
- just bazel-lock-check
- just argument-comment-lint
- remote Linux executor tests in progress via scripts/test-remote-env.sh
## Summary
This adds `experimental_network.danger_full_access_denylist_only` for
orgs that want yolo / danger-full-access sessions to keep full network
access while still enforcing centrally managed deny rules.
When the flag is true and the session sandbox is `danger-full-access`,
the network proxy starts with:
- domain allowlist set to `*`
- managed domain `deny` entries enforced
- upstream proxy use allowed
- all Unix sockets allowed
- local/private binding allowed
Caveat: the denylist is best effort only. In yolo / danger-full-access
mode, Codex or the model can use an allowed socket or other
local/private network path to bypass the proxy denylist, so this should
not be treated as a hard security boundary.
The flag is intentionally scoped to `SandboxPolicy::DangerFullAccess`.
Read-only and workspace-write modes keep the existing managed/user
allowlist, denylist, Unix socket, and local-binding behavior. This does
not enable the non-loopback proxy listener setting; that still requires
its own explicit config.
This also threads the new field through config requirements parsing,
app-server protocol/schema output, config API mapping, and the TUI debug
config output.
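
The gating described in this summary can be sketched as follows. `ProxyPolicy`, its field names, and `effective_policy` are hypothetical names for this illustration; only `SandboxPolicy::DangerFullAccess` and the list of relaxations come from the description above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SandboxPolicy {
    ReadOnly,
    WorkspaceWrite,
    DangerFullAccess,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct ProxyPolicy {
    allowed_domains: Vec<String>,
    denied_domains: Vec<String>,
    allow_upstream_proxy: bool,
    allow_all_unix_sockets: bool,
    allow_local_binding: bool,
}

/// Apply the denylist-only relaxations only when the flag is set AND the
/// session sandbox is danger-full-access; otherwise keep the base policy.
fn effective_policy(base: ProxyPolicy, sandbox: SandboxPolicy, denylist_only: bool) -> ProxyPolicy {
    if denylist_only && sandbox == SandboxPolicy::DangerFullAccess {
        ProxyPolicy {
            allowed_domains: vec!["*".to_string()],
            // Managed deny entries are still enforced (best effort).
            denied_domains: base.denied_domains,
            allow_upstream_proxy: true,
            allow_all_unix_sockets: true,
            allow_local_binding: true,
        }
    } else {
        base
    }
}
```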
## How to use
Add the flag under `[experimental_network]` in the network policy config
that is delivered to Codex. The setting is not under `[permissions]`.
```toml
[experimental_network]
enabled = true
danger_full_access_denylist_only = true
[experimental_network.domains]
"blocked.example.com" = "deny"
"*.blocked.example.com" = "deny"
```
With that configuration, yolo / danger-full-access sessions get broad
network access except for the managed denied domains above. The denylist
remains a best-effort proxy policy because the session may still use
allowed sockets to bypass it. Other sandbox modes do not get the
wildcard domain allowlist or the socket/local-binding relaxations from
this flag.
## Verification
- `cargo test -p codex-config network_requirements`
- `cargo test -p codex-core network_proxy_spec`
- `cargo test -p codex-app-server map_requirements_toml_to_api`
- `cargo test -p codex-tui debug_config_output`
- `cargo test -p codex-app-server-protocol`
- `just write-app-server-schema`
- `just fmt`
- `just fix -p codex-config -p codex-core -p codex-app-server-protocol
-p codex-app-server -p codex-tui`
- `just fix -p codex-core -p codex-config`
- `git diff --check`
- `cargo clean`
## Summary
- add a Linux startup warning when system `bwrap` is present but cannot
create user namespaces
- keep the Linux-specific probe, sandbox-policy gate, and stderr
matching in `codex-sandboxing`
- polish the missing-`bwrap` warning to point users at the sandbox
prerequisites and OS package-manager install path
## Details
- probes system `bwrap` with `--unshare-user`, `--unshare-net`, and a
minimal bind before command execution
- detects known bubblewrap setup failures for `RTM_NEWADDR`,
`RTM_NEWLINK`, uid-map permission denial, and `No permissions to create
a new namespace`
- preserves the existing suppression for sandbox-bypassed policies such
as `danger-full-access` and `external-sandbox`
- updates the Linux sandbox docs to call out the user-namespace
requirement
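
The stderr matching listed above might look roughly like the sketch below. The function name is hypothetical, and the exact substring used for the uid-map case (`uid map` together with `Permission denied`) is an assumption about bubblewrap's wording rather than something stated in this PR.

```rust
/// Return true if the probe's stderr looks like one of the known
/// bubblewrap setup failures caused by unavailable user namespaces.
fn is_known_bwrap_setup_failure(stderr: &str) -> bool {
    stderr.lines().any(|line| {
        line.contains("RTM_NEWADDR")
            || line.contains("RTM_NEWLINK")
            || line.contains("No permissions to create a new namespace")
            // Assumed wording for the uid-map permission denial.
            || (line.contains("uid map") && line.contains("Permission denied"))
    })
}
```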
---------
Co-authored-by: Codex <noreply@openai.com>
Addresses #16443
This was a regression introduced when we moved exec on top of the app
server APIs.
Problem: `codex exec` resolved prompt/stdin and output schema after
starting the in-process app-server, so early `process::exit(1)` paths
could bypass session shutdown.
Solution: Resolve prompt/stdin and output schema before app-server
startup so validation failures happen before any exec session is
created.
## Summary
- make `CODEX_EXEC_SERVER_URL=none` map to an explicit disabled
environment mode instead of inferring from a missing URL
- expose environment capabilities (`exec_enabled`, `filesystem_enabled`)
so tool building can gate behavior explicitly and future
multi-environment work has a clearer seam
- suppress env-backed tools when the relevant capability is unavailable,
including exec tools, `js_repl`, `apply_patch`, `list_dir`, and
`view_image`
- keep handler/runtime backstops so disabled environments still reject
execution if a tool path somehow bypasses registration
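
A sketch of the explicit mode and capability gating, under several assumptions: the type and function names are hypothetical, an unset variable is assumed to mean a local environment, and the mapping of each tool to `exec_enabled` or `filesystem_enabled` (as well as the `exec_command` tool name) is a guess for illustration only.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum EnvironmentMode {
    Local,
    Remote { url: String },
    Disabled,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Capabilities {
    exec_enabled: bool,
    filesystem_enabled: bool,
}

/// Map the raw `CODEX_EXEC_SERVER_URL` value to an explicit mode.
fn mode_from_env(value: Option<&str>) -> EnvironmentMode {
    match value {
        Some("none") => EnvironmentMode::Disabled,
        Some(url) if !url.is_empty() => EnvironmentMode::Remote { url: url.to_string() },
        // Assumption: unset or empty falls back to the local environment.
        _ => EnvironmentMode::Local,
    }
}

fn capabilities(mode: &EnvironmentMode) -> Capabilities {
    let enabled = *mode != EnvironmentMode::Disabled;
    Capabilities { exec_enabled: enabled, filesystem_enabled: enabled }
}

/// Register only the tools whose backing capability is available.
fn registered_tools(caps: Capabilities) -> Vec<&'static str> {
    let mut tools = Vec::new();
    if caps.exec_enabled {
        tools.extend(["exec_command", "js_repl"]);
    }
    if caps.filesystem_enabled {
        tools.extend(["apply_patch", "list_dir", "view_image"]);
    }
    tools
}
```

Keeping the two capabilities as separate booleans lets exec and filesystem tools be gated independently, which is what the second test name in the Testing section below suggests.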
## Testing
- `just fmt`
- `cargo test -p codex-exec-server`
- `cargo test -p codex-tools
disabled_environment_omits_environment_backed_tools`
- `cargo test -p codex-tools
environment_capabilities_gate_exec_and_filesystem_tools_independently`
- remote devbox Bazel build via `codex-applied-devbox`:
`//codex-rs/cli:cli`
Addresses #16244
This was a performance regression introduced when we moved the TUI on
top of the app server API.
Problem: `/mcp` rebuilt a full MCP inventory through
`mcpServerStatus/list`, including resources and resource templates that
made the TUI wait on slow inventory probes.
Solution: add a lightweight `detail` mode to `mcpServerStatus/list`,
have `/mcp` request tools-and-auth only, and cover the fast path with
app-server and TUI tests.
Testing: Confirmed slow (multi-second) response prior to change and
immediate response after change.
I considered two options:
1. Change the existing `mcpServerStatus/list` API to accept an optional
"details" parameter so callers can request only a subset of the
information.
2. Add a separate `mcpServer/list` API that returns only the servers,
tools, and auth but omits the resources.
I chose option 1, but option 2 is also a reasonable approach.
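
Option 1 can be illustrated with a small sketch in which the expensive resource probe runs only when full detail is requested. The `StatusDetail` variants, the `ServerStatus` fields, and `build_status` are hypothetical and do not reflect the actual app-server protocol types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StatusDetail {
    Full,
    ToolsAndAuthOnly,
}

#[derive(Debug, PartialEq, Eq)]
struct ServerStatus {
    name: String,
    tools: Vec<String>,
    authenticated: bool,
    /// `None` when the caller did not ask for resources.
    resources: Option<Vec<String>>,
}

/// Build a status entry, invoking the (potentially slow) resource probe
/// only for `StatusDetail::Full`.
fn build_status(
    name: &str,
    detail: StatusDetail,
    tools: Vec<String>,
    authenticated: bool,
    probe_resources: impl FnOnce() -> Vec<String>,
) -> ServerStatus {
    let resources = match detail {
        StatusDetail::Full => Some(probe_resources()),
        StatusDetail::ToolsAndAuthOnly => None,
    };
    ServerStatus { name: name.to_string(), tools, authenticated, resources }
}
```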
Addresses #7646
Also enables device code auth for remote TUI sessions.
Problem: TUI onboarding handled device-code login directly rather than
using the recently-added app server support for device auth. Also, auth
screens kept animating while users needed to copy login details.
Solution: Route device-code onboarding through app-server login APIs and
make the auth screens static while those copy-oriented flows are
visible.
Addresses #16951
Problem: `codex mcp-server` did not apply the configured residency
requirement, so requests from non-US regions could miss the `residency`
header and fail with a 401.
Solution: Set the default client residency requirement after loading
config in the MCP server startup path, matching the existing exec and
TUI behavior.
Extract a shared helper that builds AuthManager from Config and applies
the forced ChatGPT workspace override in one place.
Create the shared AuthManager at MessageProcessor call sites so that the
upcoming transport's initialization can reuse the same handle, and
keep only the external auth refresher wiring inside `MessageProcessor`.
Remove the now-unused `AuthManager::shared_with_external_auth` helper.
## Description
This PR makes the SQLite state runtime tolerate databases that have
already been migrated by a newer Codex binary.
Today, if an older CLI sees migration versions in `_sqlx_migrations`
that it doesn't know about, startup fails. This change relaxes that
check for the runtime migrators we use in `codex-state` so older
binaries can keep opening the DB in that case.
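
The relaxed check can be pictured with the standalone sketch below. It does not use sqlx; `validate_applied` and its `ignore_unknown` flag are hypothetical names that only model the behavior described above, where applied versions unknown to the running binary are either a hard error or tolerated.

```rust
/// Validate the versions recorded in the migrations table against the
/// versions this binary knows about.
fn validate_applied(known: &[i64], applied: &[i64], ignore_unknown: bool) -> Result<(), String> {
    for version in applied {
        if !known.contains(version) && !ignore_unknown {
            return Err(format!("applied migration {version} is unknown to this binary"));
        }
    }
    Ok(())
}
```

With `ignore_unknown` set, an older binary that knows versions 1 and 2 can still open a database that already has version 3 applied.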
## Why
We can end up with mixed-version CLIs running against the same local
state DB. In that setup, treating "the database is ahead of me" as a
hard error is unnecessarily strict and breaks the older client even when
the migration history is otherwise fine.
## Follow-up
We still clean up versioned `state_*.sqlite` and `logs_*.sqlite` files
during init, so older binaries can treat newer DB files as legacy. That
should probably be tightened separately if we want mixed-version local
usage to be fully safe.
Guardian events were emitted out of order for CommandExecution items,
which made it hard for the frontend to render a guardian auto-review.
The review notification has this payload:
```rust
pub struct ItemGuardianApprovalReviewStartedNotification {
    pub thread_id: String,
    pub turn_id: String,
    pub target_item_id: String,
    pub review: GuardianApprovalReview,
    // FYI this is no longer a json blob
    pub action: Option<JsonValue>,
}
```
The payload's `target_item_id` names the item that the auto-approval
review refers to, but that item had not been emitted yet when the review
events arrived.
Before this PR:
- `item/autoApprovalReview/started`
- `item/autoApprovalReview/completed`, and if approved...
- `item/started`
- `item/completed`
After this PR:
- `item/started`
- `item/autoApprovalReview/started`
- `item/autoApprovalReview/completed`
- `item/completed`
This lines up with existing patterns (for example, human review in
`Default mode`, where app-server sends a server request to prompt for
user approval after `item/started`) and makes it easier for clients to
render what guardian is actually reviewing.
This follows the same pattern as `FileChange` (aka apply patch) items,
where we create a FileChange item and emit `item/started` when we see
the apply patch approval request, before the actual apply patch call
runs.
## Description
Add requirements.toml support for `allowed_approvals_reviewers =
["user", "guardian_subagent"]`, so admins can now restrict the use of
guardian mode.
Note: If a user sets a reviewer that isn’t allowed by requirements.toml,
config loading falls back to the first allowed reviewer and emits a
startup warning.
The table below describes the possible admin controls.
| Admin intent | `requirements.toml` | User `config.toml` | End result |
|---|---|---|---|
| Leave Guardian optional | omit `allowed_approvals_reviewers` or set `["user", "guardian_subagent"]` | user chooses `approvals_reviewer = "user"` or `"guardian_subagent"` | Guardian off for `user`, on for `guardian_subagent` + `approval_policy = "on-request"` |
| Force Guardian off | `allowed_approvals_reviewers = ["user"]` | any user value | Effective reviewer is `user`; Guardian off |
| Force Guardian on | `allowed_approvals_reviewers = ["guardian_subagent"]` and usually `allowed_approval_policies = ["on-request"]` | any user reviewer value; user should also have `approval_policy = "on-request"` unless policy is forced | Effective reviewer is `guardian_subagent`; Guardian on when effective approval policy is `on-request` |
| Allow both, but default to manual if user does nothing | `allowed_approvals_reviewers = ["user", "guardian_subagent"]` | omit `approvals_reviewer` | Effective reviewer is `user`; Guardian off |
| Allow both, and user explicitly opts into Guardian | `allowed_approvals_reviewers = ["user", "guardian_subagent"]` | `approvals_reviewer = "guardian_subagent"` and `approval_policy = "on-request"` | Guardian on |
| Invalid admin config | `allowed_approvals_reviewers = []` | anything | Config load error |
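
The reviewer resolution in the table above can be sketched as a small function. The `Reviewer`, `Resolved`, and `resolve_reviewer` names are hypothetical, and two details are assumptions: an omitted `approvals_reviewer` defaults to `user`, and the warning is produced only when the user explicitly set a disallowed value.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Reviewer {
    User,
    GuardianSubagent,
}

#[derive(Debug, PartialEq, Eq)]
struct Resolved {
    reviewer: Reviewer,
    /// Startup warning, if the requested reviewer had to be replaced.
    warning: Option<String>,
}

/// `allowed` is `None` when requirements.toml omits the setting.
fn resolve_reviewer(
    allowed: Option<&[Reviewer]>,
    requested: Option<Reviewer>,
) -> Result<Resolved, String> {
    let wanted = requested.unwrap_or(Reviewer::User);
    let Some(allowed) = allowed else {
        return Ok(Resolved { reviewer: wanted, warning: None });
    };
    // An empty allow-list is an invalid admin config.
    let Some(&first) = allowed.first() else {
        return Err("allowed_approvals_reviewers must not be empty".to_string());
    };
    if allowed.contains(&wanted) {
        return Ok(Resolved { reviewer: wanted, warning: None });
    }
    // Fall back to the first allowed reviewer.
    let warning = requested
        .map(|r| format!("approvals_reviewer {r:?} is not allowed; using {first:?}"));
    Ok(Resolved { reviewer: first, warning })
}
```

Whether Guardian actually runs still depends on the effective approval policy being `on-request`, which this sketch does not model.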
### Summary
Fix `thread/metadata/update` so it can still patch stored thread
metadata when the list/backfill-gated `get_state_db(...)` path is
unavailable.
What was happening:
- The app logs showed `thread/metadata/update` failing with `sqlite
state db unavailable for thread ...`.
- This was not isolated to one bad thread. Once the failure started for
a user, branch metadata updates failed 100% of the time for that user.
- Reports were staggered across users, which points at local app-server
/ local SQLite state rather than one global server-side failure.
- Turns could still start immediately after the metadata update failed,
which suggests the thread itself was valid and the failure was in the
metadata endpoint DB-handle path.
The fix:
- Keep using the loaded thread state DB and the normal
`get_state_db(...)` fallback first.
- If that still returns `None`, open `StateRuntime::init(...)` directly
for this targeted metadata update path.
- Log the direct state runtime init error if that final fallback also
fails, so future reports have the real DB-open cause instead of only the
generic unavailable error.
- Add a regression test where the DB exists but backfill is not
complete, and verify `thread/metadata/update` can still repair the
stored rollout thread and patch `gitInfo`.
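
The fallback order in the fix can be sketched generically. `resolve_state_db` and its closure parameters are hypothetical stand-ins for the loaded thread state DB, the `get_state_db(...)` path, and the direct `StateRuntime::init(...)` call; the real signatures are async and differ from this illustration.

```rust
/// Try each source in order and return the first database handle found.
/// The direct-init error is logged so the real cause is visible.
fn resolve_state_db<Db>(
    loaded: Option<Db>,
    get_state_db: impl FnOnce() -> Option<Db>,
    init_direct: impl FnOnce() -> Result<Db, String>,
) -> Option<Db> {
    loaded.or_else(get_state_db).or_else(|| match init_direct() {
        Ok(db) => Some(db),
        Err(err) => {
            eprintln!("direct state runtime init failed: {err}");
            None
        }
    })
}
```

Because each step is lazy, the direct init runs only when both earlier sources return `None`.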
Relevant context / suspect PRs:
- #16434 changed state DB startup to run auto-vacuum / incremental
vacuum. This is the most suspicious timing match for per-user, staggered
local SQLite availability failures.
- #16433 dropped the old log table from the state DB, also near the
timing window.
- #13280 introduced this endpoint and made it rely on SQLite for git
metadata without resuming the thread.
- #14859 and #14888 added/consumed persisted model + reasoning effort
metadata. I checked these because of the new thread metadata fields, but
this failure happens before the endpoint reaches thread-row update/load
logic, so they seem less likely as the direct cause.
### Testing
- `cargo fmt -- --config imports_granularity=Item` completed; local
stable rustfmt emitted warnings that `imports_granularity` is unstable
- `cargo test -p codex-app-server thread_metadata_update`
- `git diff --check`
Addresses #16622
Problem: bare local file links in TUI markdown render percent-encoded
path bytes literally, unlike file:// links.
Solution: decode bare path targets before local-path expansion and add
regression coverage for spaces and Unicode.
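
For illustration, percent-decoding a bare path target can be done with the standard library alone. This is a sketch, not the TUI's implementation: `percent_decode_path` is a hypothetical name, and leaving malformed escapes or invalid UTF-8 untouched is an assumption about the desired fallback.

```rust
fn hex_value(byte: u8) -> Option<u8> {
    match byte {
        b'0'..=b'9' => Some(byte - b'0'),
        b'a'..=b'f' => Some(byte - b'a' + 10),
        b'A'..=b'F' => Some(byte - b'A' + 10),
        _ => None,
    }
}

/// Decode `%XX` escapes in a path target. Malformed escapes are kept
/// literally; if the decoded bytes are not valid UTF-8, the original
/// target is returned unchanged.
fn percent_decode_path(target: &str) -> String {
    let bytes = target.as_bytes();
    let mut decoded = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_value(bytes[i + 1]), hex_value(bytes[i + 2])) {
                decoded.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        decoded.push(bytes[i]);
        i += 1;
    }
    String::from_utf8(decoded).unwrap_or_else(|_| target.to_string())
}
```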
Problem: The resume picker used awkward "Created at" and "Updated at"
headers, and its relative timestamps changed while navigating because
they were recomputed on each redraw.
Solution: Rename the headers to "Created" and "Updated", and anchor
relative timestamp formatting to the picker load time so the displayed
ages stay stable while browsing.
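
The anchoring idea can be sketched as follows: the picker captures one timestamp when it loads and formats every age relative to it, rather than calling the clock on each redraw. The `Picker` type and the age format are hypothetical.

```rust
use std::time::{Duration, SystemTime};

struct Picker {
    /// Captured once when the picker loads; never updated on redraw.
    loaded_at: SystemTime,
}

impl Picker {
    fn relative_age(&self, timestamp: SystemTime) -> String {
        // Timestamps after the anchor are clamped to a zero age.
        let age = self
            .loaded_at
            .duration_since(timestamp)
            .unwrap_or(Duration::ZERO);
        format_age(age)
    }
}

fn format_age(age: Duration) -> String {
    let secs = age.as_secs();
    match secs {
        0..=59 => format!("{secs}s ago"),
        60..=3599 => format!("{}m ago", secs / 60),
        3600..=86_399 => format!("{}h ago", secs / 3600),
        _ => format!("{}d ago", secs / 86_400),
    }
}
```

Since `loaded_at` is fixed, repeated calls with the same timestamp always return the same string, no matter how long the user browses.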
Addresses #16781
Problem: `codex exec --ephemeral` backfilled empty `turn/completed`
items with `thread/read(includeTurns=true)`, which app-server rejects
for ephemeral threads.
This is a regression introduced in the recent conversion of "exec" to
use app server rather than call the core directly.
Solution: Skip turn-item backfill for ephemeral exec threads while
preserving the existing recovery path for non-ephemeral sessions.
Addresses #16832
Problem: After `/fast on`, the TUI omitted an explicit service-tier
clear on later turns, so `/fast off` left app-server sessions stuck on
`priority` until restart.
Solution: Always submit the current service tier with user turns,
including an explicit clear when Fast mode is off, and add a regression
test for the `/fast on` -> `/fast off` flow.
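
One way to picture the difference between omitting the tier and clearing it explicitly is a nested `Option`. This is an assumption about the shape of the update for illustration; the names below are hypothetical and the actual protocol field may be modeled differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ServiceTier {
    Priority,
}

/// Outer `None` = field omitted (session keeps its tier),
/// `Some(None)` = explicit clear, `Some(Some(t))` = set to `t`.
type TierUpdate = Option<Option<ServiceTier>>;

/// After the fix: every user turn carries the current tier, including an
/// explicit clear when Fast mode is off.
fn service_tier_for_turn(fast_mode: bool) -> TierUpdate {
    Some(if fast_mode { Some(ServiceTier::Priority) } else { None })
}

fn apply_turn(session_tier: &mut Option<ServiceTier>, update: TierUpdate) {
    if let Some(tier) = update {
        *session_tier = tier;
    }
}
```

With an omitted update, a session that was switched to `priority` stays there, which matches the stuck behavior described in the problem statement.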
Addresses #16584
Problem: TUI word-wise cursor movement treated entire CJK runs as a
single word, so Option/Alt+Left and Right skipped too far when editing
East Asian text.
Solution: Use Unicode word-boundary segments within each non-whitespace
run so CJK text advances one segment at a time while preserving
separator and delete-word behavior, and add regression coverage for CJK
and mixed-script navigation.
Testing: Manually pasted text that includes CJK characters into the
composer and confirmed that keyboard navigation works correctly with the
change and did not work correctly before it.
This adds end-to-end coverage for `responses-api-proxy` request dumps
when Codex spawns a subagent and validates that the `x-codex-window-id`
and `x-openai-subagent` headers are set correctly.
Addresses #13614
Problem: `codex exec --help` implied that `--full-auto` also changed
exec approval mode, even though non-interactive exec stays headless and
does not support interactive approval prompts.
Solution: clarify the `--full-auto` help text so it only describes the
sandbox behavior it actually enables for `codex exec`.
Addresses #15535
Problem: `codex exec --help` advertised a second positional `[COMMAND]`
even though `exec` only accepts a prompt or a subcommand.
Solution: Override the `exec` usage string so the help output shows the
two supported invocation forms instead of the phantom positional.