Commit Graph

4188 Commits

Author SHA1 Message Date
David Wiesen
d79ff53953 fix: avoid remote ports on any-protocol firewall rule 2026-04-14 11:05:25 -07:00
Michael Bolin
15fbf9d4f5 fix: fix Windows CI regression introduced in #15999 (#16027)
#15999 introduced a Windows-only `\r\n` mismatch in review-exit template
handling. This PR normalizes those template newlines and separates that
fix from [#16014](https://github.com/openai/codex/pull/16014) so it can
be reviewed independently.
2026-03-27 12:06:07 -07:00
Michael Bolin
caee620a53 codex-tools: introduce named tool definitions (#15953)
## Why

This continues the `codex-tools` migration by moving one more piece of
generic tool-definition bookkeeping out of `codex-core`.

The earlier extraction steps moved shared schema parsing into
`codex-tools`, but `core/src/tools/spec.rs` still had to supply tool
names separately and perform ad hoc rewrites for deferred MCP aliases.
That meant the crate boundary was still awkward: the parsed shape coming
back from `codex-tools` was missing part of the definition that
`codex-core` ultimately needs to assemble a `ResponsesApiTool`.

This change introduces a named `ToolDefinition` in `codex-tools` so both
MCP tools and dynamic tools cross the crate boundary in the same
reusable model. `codex-core` still owns the final `ResponsesApiTool`
assembly, but less of the generic tool-definition shaping logic stays
behind in `core`.

## What changed

- replaced `ParsedToolDefinition` with a named `ToolDefinition` in
`codex-rs/tools/src/tool_definition.rs`
- added `codex-rs/tools/src/tool_definition_tests.rs` for `renamed()`
and `into_deferred()`
- updated `parse_dynamic_tool()` and `parse_mcp_tool()` to return
`ToolDefinition`
- simplified `codex-rs/core/src/tools/spec.rs` so it adapts
`ToolDefinition` into `ResponsesApiTool` instead of rewriting names and
deferred fields inline
- updated parser tests and `codex-rs/tools/README.md` to reflect the
named tool-definition model

## Test plan

- `cargo test -p codex-tools`
- `cargo test -p codex-core --lib tools::spec::`
2026-03-27 12:02:55 -07:00
Michael Bolin
2616c7cf12 ci: add Bazel clippy workflow for codex-rs (#15955)
## Why
`bazel.yml` already builds and tests the Bazel graph, but `rust-ci.yml`
still runs `cargo clippy` separately. This PR starts the transition to a
Bazel-backed lint lane for `codex-rs` so we can eventually replace the
duplicate Rust build, test, and lint work with Bazel while explicitly
keeping the V8 Bazel path out of scope for now.

To make that lane practical, the workflow also needs to look like the
Bazel job we already trust. That means sharing the common Bazel setup
and invocation logic instead of hand-copying it, and covering the arm64
macOS path in addition to Linux.

Landing the workflow green also required fixing the first lint findings
that Bazel surfaced and adding the matching local entrypoint.

## What changed
- add a reusable `build:clippy` config to `.bazelrc` and export
`codex-rs/clippy.toml` from `codex-rs/BUILD.bazel` so Bazel can run the
repository's existing Clippy policy
- add `just bazel-clippy` so the local developer entrypoint matches the
new CI lane
- extend `.github/workflows/bazel.yml` with a dedicated Bazel clippy job
for `codex-rs`, scoped to `//codex-rs/... -//codex-rs/v8-poc:all`
- run that clippy job on Linux x64 and arm64 macOS
- factor the shared Bazel workflow setup into
`.github/actions/setup-bazel-ci/action.yml` and the shared Bazel
invocation logic into `.github/scripts/run-bazel-ci.sh` so the clippy
and build/test jobs stay aligned
- fix the first Bazel-clippy findings needed to keep the lane green,
including the cross-target `cmsghdr::cmsg_len` normalization in
`codex-rs/shell-escalation/src/unix/socket.rs` and the no-`voice-input`
dead-code warnings in `codex-rs/tui` and `codex-rs/tui_app_server`

## Verification
- `just bazel-clippy`
- `RUNNER_OS=macOS ./.github/scripts/run-bazel-ci.sh -- build
--config=clippy --build_metadata=COMMIT_SHA=local-check
--build_metadata=TAG_job=clippy -- //codex-rs/...
-//codex-rs/v8-poc:all`
- `bazel build --config=clippy
//codex-rs/shell-escalation:shell-escalation`
- `CARGO_TARGET_DIR=/tmp/codex4-shell-escalation-test cargo test -p
codex-shell-escalation`
- `ruby -e 'require "yaml";
YAML.load_file(".github/workflows/bazel.yml");
YAML.load_file(".github/actions/setup-bazel-ci/action.yml")'`

## Notes
- `CARGO_TARGET_DIR=/tmp/codex4-tui-app-server-test cargo test -p
codex-tui-app-server` still hits existing guardian-approvals test and
snapshot failures unrelated to this PR's Bazel-clippy changes.

Related: #15954
2026-03-27 12:02:41 -07:00
Michael Bolin
617475e54b codex-tools: extract dynamic tool adapters (#15944)
## Why

`codex-tools` already owned the shared JSON schema parser and the MCP
tool schema adapter, but `core/src/tools/spec.rs` still parsed dynamic
tools directly.

That left the tool-schema boundary split in two different ways:

- MCP tools flowed through `codex-tools`, while dynamic tools were still
parsed in `codex-core`
- the extracted dynamic-tool path initially introduced a
dynamic-specific parsed shape even though `codex-tools` already had very
similar MCP adapter output

This change finishes that extraction boundary in one step. `codex-core`
still owns `ResponsesApiTool` assembly, but both MCP tools and dynamic
tools now enter that layer through `codex-tools` using the same parsed
tool-definition shape.

## What changed

- added `tools/src/dynamic_tool.rs` and sibling
`tools/src/dynamic_tool_tests.rs`
- introduced `parse_dynamic_tool()` in `codex-tools` and switched
`core/src/tools/spec.rs` to use it for dynamic tools
- added `tools/src/parsed_tool_definition.rs` so both MCP and dynamic
adapters return the same `ParsedToolDefinition`
- updated `core/src/tools/spec.rs` to build `ResponsesApiTool` through a
shared local adapter helper instead of separate MCP and dynamic assembly
paths
- expanded `core/src/tools/spec_tests.rs` so the dynamic-tool adapter
test asserts the full converted `ResponsesApiTool`, including
`defer_loading`
- updated `codex-rs/tools/README.md` to reflect the shared parsed
tool-definition boundary

## Test plan

- `cargo test -p codex-tools`
- `cargo test -p codex-core --lib tools::spec::`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/15944).
* #15953
* __->__ #15944
2026-03-27 09:12:36 -07:00
viyatb-oai
ec089fd22a fix(sandbox): fix bwrap lookup for multi-entry PATH (#15973)
## Summary
- split the joined `PATH` before running system `bwrap` lookup
- keep the existing workspace-local `bwrap` skip behavior intact
- add regression tests that exercise real multi-entry search paths

## Why
The PATH-based lookup added in #15791 still wrapped the raw `PATH`
environment value as a single `PathBuf` before passing it through
`join_paths()`. On Unix, a normal multi-entry `PATH` contains `:`, so
that wrapper path is invalid as one path element and the lookup returns
`None`.

That made Codex behave as if no system `bwrap` was installed even when
`bwrap` was available on `PATH`, which is what users in #15340 were
still hitting on `0.117.0-alpha.25`.

## Impact
System `bwrap` discovery now works with normal multi-entry `PATH` values
instead of silently falling back to the vendored binary.

Fixes #15340.

## Validation
- `just fmt`
- `cargo test -p codex-sandboxing`
- `cargo test -p codex-linux-sandbox`
- `just fix -p codex-sandboxing`
- `just argument-comment-lint`
2026-03-27 08:41:06 -07:00
jif-oai
426f28ca99 feat: spawn v2 as inter agent communication (#15985)
Co-authored-by: Codex <noreply@openai.com>
2026-03-27 15:45:19 +01:00
jif-oai
2b71717ccf Use codex-utils-template for review exit XML (#15999) 2026-03-27 15:30:28 +01:00
jif-oai
f044ca64df Use codex-utils-template for search tool descriptions (#15996) 2026-03-27 15:08:24 +01:00
jif-oai
37b057f003 Use codex-utils-template for collaboration mode presets (#15995) 2026-03-27 14:51:07 +01:00
jif-oai
2c85ca6842 Use codex-utils-template for sandbox mode prompts (#15998) 2026-03-27 14:50:36 +01:00
jif-oai
7d5d9f041b Use codex-utils-template for review prompts (#16001) 2026-03-27 14:50:01 +01:00
jif-oai
270b7655cd Use codex-utils-template for login error page (#16000) 2026-03-27 14:49:45 +01:00
jif-oai
6a0c4709ca feat: spawn v2 make task name as mandatory (#15986) 2026-03-27 11:30:22 +01:00
Michael Bolin
2ef91b7140 chore: move pty and windows sandbox to Rust 2024 (#15954)
## Why

`codex-utils-pty` and `codex-windows-sandbox` were the remaining crates
in `codex-rs` that still overrode the workspace's Rust 2024 edition.
Moving them forward in a separate PR keeps the baseline edition update
isolated from the follow-on Bazel clippy workflow in #15955, while
making linting and formatting behavior consistent with the rest of the
workspace.

This PR also needs Cargo and Bazel to agree on the edition for
`codex-windows-sandbox`. Without the Bazel-side sync, the experimental
Bazel app-server builds fail once they compile `windows-sandbox-rs`.

## What changed

- switch `codex-rs/utils/pty` and `codex-rs/windows-sandbox-rs` to
`edition = "2024"`
- update `codex-utils-pty` callsites and tests to use the collapsed `if
let` form that Clippy expects under the new edition
- fix the Rust 2024 fallout in `windows-sandbox-rs`, including the
reserved `gen` identifier, `unsafe extern` requirements, and new Clippy
findings that surfaced under the edition bump
- keep the edition bump separate from a larger unsafe cleanup by
temporarily allowing `unsafe_op_in_unsafe_fn` in the Windows entrypoint
modules that now report it under Rust 2024
- update `codex-rs/windows-sandbox-rs/BUILD.bazel` to `crate_edition =
"2024"` so Bazel compiles the crate with the same edition as Cargo





---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/15954).
* #15976
* #15955
* __->__ #15954
2026-03-27 02:31:08 -07:00
jif-oai
2e849703cd chore: drop useless stuff (#15876) 2026-03-27 09:41:47 +01:00
daniel-oai
47a9e2e084 Add ChatGPT device-code login to app server (#15525)
## Problem

App-server clients could only initiate ChatGPT login through the browser
callback flow, even though the shared login crate already supports
device-code auth. That left VS Code, Codex App, and other app-server
clients without a first-class way to use the existing device-code
backend when browser redirects are brittle or when the client UX wants
to own the login ceremony.

## Mental model

This change adds a second ChatGPT login start path to app-server:
clients can now call `account/login/start` with `type:
"chatgptDeviceCode"`. App-server immediately returns a `loginId` plus
the device-code UX payload (`verificationUrl` and `userCode`), then
completes the login asynchronously in the background using the existing
`codex_login` polling flow. Successful device-code login still resolves
to ordinary `chatgpt` auth, and completion continues to flow through the
existing `account/login/completed` and `account/updated` notifications.

## Non-goals

This does not introduce a new auth mode, a new account shape, or a
device-code eligibility discovery API. It also does not add automatic
fallback to browser login in core; clients remain responsible for
choosing when to request device code and whether to retry with a
different UX if the backend/admin policy rejects it.

## Tradeoffs

We intentionally keep `login_chatgpt_common` as a local validation
helper instead of turning it into a capability probe. Device-code
eligibility is checked by actually calling `request_device_code`, which
means policy-disabled cases surface as an immediate request error rather
than an async completion event. We also keep the active-login state
machine minimal: browser and device-code logins share the same public
cancel contract, but device-code cancellation is implemented with a
local cancel token rather than a larger cross-crate refactor.

## Architecture

The protocol grows a new `chatgptDeviceCode` request/response variant in
app-server v2. On the server side, the new handler reuses the existing
ChatGPT login precondition checks, calls `request_device_code`, returns
the device-code payload, and then spawns a background task that waits on
either cancellation or `complete_device_code_login`. On success, it
reuses the existing auth reload and cloud-requirements refresh path
before emitting `account/login/completed` success and `account/updated`.
On failure or cancellation, it emits only `account/login/completed`
failure. The existing `account/login/cancel { loginId }` contract
remains unchanged and now works for both browser and device-code
attempts.


## Tests

Added protocol serialization coverage for the new request/response
variant, plus app-server tests for device-code success, failure, cancel,
and start-time rejection behavior. Existing browser ChatGPT login
coverage remains in place to show that the callback-based flow is
unchanged.
2026-03-27 00:27:15 -07:00
Celia Chen
dd30c8eedd chore: refactor network permissions to use explicit domain and unix socket rule maps (#15120)
## Summary

This PR replaces the legacy network allow/deny list model with explicit
rule maps for domains and unix sockets across managed requirements,
permissions profiles, the network proxy config, and the app server
protocol.

Concretely, it:

- introduces typed domain (`allow` / `deny`) and unix socket permission
(`allow` / `none`) entries instead of separate `allowed_domains`,
`denied_domains`, and `allow_unix_sockets` lists
- updates config loading, managed requirements merging, and exec-policy
overlays to read and upsert rule entries consistently
- exposes the new shape through protocol/schema outputs, debug surfaces,
and app-server config APIs
- rejects the legacy list-based keys and updates docs/tests to reflect
the new config format

## Why

The previous representation split related network policy across multiple
parallel lists, which made merging and overriding rules harder to reason
about. Moving to explicit keyed permission maps gives us a single source
of truth per host/socket entry, makes allow/deny precedence clearer, and
gives protocol consumers access to the full rule state instead of
derived projections only.

## Backward Compatibility

### Backward compatible

- Managed requirements still accept the legacy
`experimental_network.allowed_domains`,
`experimental_network.denied_domains`, and
`experimental_network.allow_unix_sockets` fields. They are normalized
into the new canonical `domains` and `unix_sockets` maps internally.
- App-server v2 still deserializes legacy `allowedDomains`,
`deniedDomains`, and `allowUnixSockets` payloads, so older clients can
continue reading managed network requirements.
- App-server v2 responses still populate `allowedDomains`,
`deniedDomains`, and `allowUnixSockets` as legacy compatibility views
derived from the canonical maps.
- `managed_allowed_domains_only` keeps the same behavior after
normalization. Legacy managed allowlists still participate in the same
enforcement path as canonical `domains` entries.

### Not backward compatible

- Permissions profiles under `[permissions.<profile>.network]` no longer
accept the legacy list-based keys. Those configs must use the canonical
`[domains]` and `[unix_sockets]` tables instead of `allowed_domains`,
`denied_domains`, or `allow_unix_sockets`.
- Managed `experimental_network` config cannot mix canonical and legacy
forms in the same block. For example, `domains` cannot be combined with
`allowed_domains` or `denied_domains`, and `unix_sockets` cannot be
combined with `allow_unix_sockets`.
- The canonical format can express explicit `"none"` entries for unix
sockets, but those entries do not round-trip through the legacy
compatibility fields because the legacy fields only represent allow/deny
lists.
## Testing
`/target/debug/codex sandbox macos --log-denials /bin/zsh -c 'curl
https://www.example.com' ` gives 200 with config
```
[permissions.workspace.network.domains]
"www.example.com" = "allow"
```
and fails when set to deny: `curl: (56) CONNECT tunnel failed, response
403`.

Also tested backward compatibility path by verifying that adding the
following to `/etc/codex/requirements.toml` works:
```
[experimental_network]
allowed_domains = ["www.example.com"]
```
2026-03-27 06:17:59 +00:00
rhan-oai
21a03f1671 [app-server-protocol] introduce generic ClientResponse for app-server-protocol (#15921)
- introduces `ClientResponse` as the symmetrical typed response union to
`ClientRequest` for app-server-protocol
- enables scalable event stream ingestion for use cases such as
analytics
- no runtime behavior changes, protocol/schema plumbing only
2026-03-26 21:33:25 -07:00
Michael Bolin
be5afc65d3 codex-tools: extract MCP schema adapters (#15928)
## Why

`codex-tools` already owns the shared tool input schema model and parser
from the first extraction step, but `core/src/tools/spec.rs` still owned
the MCP-specific adapter that normalizes `rmcp::model::Tool` schemas and
wraps `structuredContent` into the call result output schema.

Keeping that adapter in `codex-core` means the reusable MCP schema path
is still split across crates, and the unit tests for that logic stay
anchored in `codex-core` even though the runtime orchestration does not
need to move yet.

This change takes the next small step by moving the reusable MCP schema
adapter into `codex-tools` while leaving `ResponsesApiTool` assembly in
`codex-core`.

## What changed

- added `tools/src/mcp_tool.rs` and sibling
`tools/src/mcp_tool_tests.rs`
- introduced `ParsedMcpTool`, `parse_mcp_tool()`, and
`mcp_call_tool_result_output_schema()` in `codex-tools`
- updated `core/src/tools/spec.rs` to consume parsed MCP tool parts from
`codex-tools`
- removed the now-redundant MCP schema unit tests from
`core/src/tools/spec_tests.rs`
- expanded `codex-rs/tools/README.md` to describe this second migration
step

## Test plan

- `cargo test -p codex-tools`
- `cargo test -p codex-core --lib tools::spec::`
2026-03-26 19:57:26 -07:00
Michael Bolin
d76124d656 fix: make MACOS_DEFAULT_PREFERENCES_POLICY part of MACOS_SEATBELT_BASE_POLICY (#15931) 2026-03-26 18:23:14 -07:00
viyatb-oai
81fa04783a feat(windows-sandbox): add network proxy support (#12220)
## Summary

This PR makes Windows sandbox proxying enforceable by routing proxy-only
runs through the existing `offline` sandbox user and reserving direct
network access for the existing `online` sandbox user.

In brief:

- if a Windows sandbox run should be proxy-enforced, we run it as the
`offline` user
- the `offline` user gets firewall rules that block direct outbound
traffic and only permit the configured localhost proxy path
- if a Windows sandbox run should have true direct network access, we
run it as the `online` user
- no new sandbox identity is introduced

This brings Windows in line with the intended model: proxy use is not
just env-based, it is backed by OS-level egress controls. Windows
already has two sandbox identities:

- `offline`: intended to have no direct network egress
- `online`: intended to have full network access

This PR makes proxy-enforced runs use that model directly.

### Proxy-enforced runs

When proxy enforcement is active:

- the run is assigned to the `offline` identity
- setup extracts the loopback proxy ports from the sandbox env
- Windows setup programs firewall rules for the `offline` user that:
  - block all non-loopback outbound traffic
  - block loopback UDP
  - block loopback TCP except for the configured proxy ports
- optionally allow broader localhost access when `allow_local_binding=1`

So the sandboxed process can only talk to the local proxy. It cannot
open direct outbound sockets or do local UDP-based DNS on its own.The
proxy then performs the real outbound network access outside that
restricted sandbox identity.

### Direct-network runs

When proxy enforcement is not active and full network access is allowed:

- the run is assigned to the `online` identity
- no proxy-only firewall restrictions are applied
- the process gets normal direct network access

### Unelevated vs elevated

The restricted-token / unelevated path cannot enforce per-identity
firewall policy by itself.

So for Windows proxy-enforced runs, we transparently use the logon-user
sandbox path under the hood, even if the caller started from the
unelevated mode. That keeps enforcement real instead of best-effort.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-26 17:27:38 -07:00
Michael Bolin
e6e2999209 permissions: remove macOS seatbelt extension profiles (#15918)
## Why

`PermissionProfile` should only describe the per-command permissions we
still want to grant dynamically. Keeping
`MacOsSeatbeltProfileExtensions` in that surface forced extra macOS-only
approval, protocol, schema, and TUI branches for a capability we no
longer want to expose.

## What changed

- Removed the macOS-specific permission-profile types from
`codex-protocol`, the app-server v2 API, and the generated
schema/TypeScript artifacts.
- Deleted the core and sandboxing plumbing that threaded
`MacOsSeatbeltProfileExtensions` through execution requests and seatbelt
construction.
- Simplified macOS seatbelt generation so it always includes the fixed
read-only preferences allowlist instead of carrying a configurable
profile extension.
- Removed the macOS additional-permissions UI/docs/test coverage and
deleted the obsolete macOS permission modules.
- Tightened `request_permissions` intersection handling so explicitly
empty requested read lists are preserved only when that field was
actually granted, avoiding zero-grant responses being stored as active
permissions.
2026-03-26 17:12:45 -07:00
Michael Bolin
44d28f500f codex-tools: extract shared tool schema parsing (#15923)
## Why

`parse_tool_input_schema` and the supporting `JsonSchema` model were
living in `core/src/tools/spec.rs`, but they already serve callers
outside `codex-core`.

Keeping that shared schema parsing logic inside `codex-core` makes the
crate boundary harder to reason about and works against the guidance in
`AGENTS.md` to avoid growing `codex-core` when reusable code can live
elsewhere.

This change takes the first extraction step by moving the schema parsing
primitive into its own crate while keeping the rest of the tool-spec
assembly in `codex-core`.

## What changed

- added a new `codex-tools` crate under `codex-rs/tools`
- moved the shared tool input schema model and sanitizer/parser into
`tools/src/json_schema.rs`
- kept `tools/src/lib.rs` exports-only, with the module-level unit tests
split into `json_schema_tests.rs`
- updated `codex-core` to use `codex-tools::JsonSchema` and re-export
`parse_tool_input_schema`
- updated `codex-app-server` dynamic tool validation to depend on
`codex-tools` directly instead of reaching through `codex-core`
- wired the new crate into the Cargo workspace and Bazel build graph
2026-03-27 00:03:35 +00:00
viyatb-oai
aea82c63ea fix(network-proxy): fail closed on network-proxy DNS lookup errors (#15909)
## Summary

Fail closed when the network proxy's local/private IP pre-check hits a
DNS lookup error or timeout, instead of treating the hostname as public
and allowing the request.

## Root cause

`host_resolves_to_non_public_ip()` returned `false` on resolver failure,
which created a fail-open path in the `allow_local_binding = false`
boundary. The eventual connect path performs its own DNS resolution
later, so a transient pre-check failure is not evidence that the
destination is public.

## Changes

- Treat DNS lookup errors/timeouts as local/private for blocking
purposes
- Add a regression test for an allowlisted hostname that fails DNS
resolution

## Validation

- `cargo test -p codex-network-proxy`
- `cargo clippy -p codex-network-proxy --all-targets -- -D warnings`
- `just fmt`
- `just argument-comment-lint`
2026-03-26 23:18:04 +00:00
Michael Bolin
5906c6a658 chore: remove skill metadata from command approval payloads (#15906)
## Why

This is effectively a follow-up to
[#15812](https://github.com/openai/codex/pull/15812). That change
removed the special skill-script exec path, but `skill_metadata` was
still being threaded through command-approval payloads even though the
approval flow no longer uses it to render prompts or resolve decisions.

Keeping it around added extra protocol, schema, and client surface area
without changing behavior.

Removing it keeps the command-approval contract smaller and avoids
carrying a dead field through app-server, TUI, and MCP boundaries.

## What changed

- removed `ExecApprovalRequestSkillMetadata` and the corresponding
`skillMetadata` field from core approval events and the v2 app-server
protocol
- removed the generated JSON and TypeScript schema output for that field
- updated app-server, MCP server, TUI, and TUI app-server approval
plumbing to stop forwarding the field
- cleaned up tests that previously constructed or asserted
`skillMetadata`

## Testing

- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-protocol`
- `cargo test -p codex-app-server-test-client`
- `cargo test -p codex-mcp-server`
- `just argument-comment-lint`
2026-03-26 15:32:03 -07:00
viyatb-oai
b52abff279 chore: move bwrap config helpers into dedicated module (#15898)
## Summary
- move the bwrap PATH lookup and warning helpers out of config/mod.rs
- move the related tests into a dedicated bwrap_tests.rs file

## Validation
- git diff --check
- skipped heavier local tests per request

Follow-up to #15791.
2026-03-26 15:15:59 -07:00
Michael Bolin
dfb36573cd sandboxing: use OsString for SandboxCommand.program (#15897)
## Why

`SandboxCommand.program` represents an executable path, but keeping it
as `String` forced path-backed callers to run `to_string_lossy()` before
the sandbox layer ever touched the command. That loses fidelity earlier
than necessary and adds avoidable conversions in runtimes that already
have a `PathBuf`.

## What changed

- Changed `SandboxCommand.program` to `OsString`.
- Updated `SandboxManager::transform` to keep the program and argv in
`OsString` form until the `SandboxExecRequest` conversion boundary.
- Switched the path-backed `apply_patch` and `js_repl` runtimes to pass
`into_os_string()` instead of `to_string_lossy()`.
- Updated the remaining string-backed builders and tests to match the
new type while preserving the existing Linux helper `arg0` behavior.

## Verification

- `cargo test -p codex-sandboxing`
- `just argument-comment-lint -p codex-core -p codex-sandboxing`
- `cargo test -p codex-core` currently fails in unrelated existing
config tests: `config::tests::approvals_reviewer_*` and
`config::tests::smart_approvals_alias_*`
2026-03-26 20:38:33 +00:00
Michael Bolin
b23789b770 [codex] import token_data from codex-login directly (#15903)
## Why
`token_data` is owned by `codex-login`, but `codex-core` was still
re-exporting it. That let callers pull auth token types through
`codex-core`, which keeps otherwise unrelated crates coupled to
`codex-core` and makes `codex-core` more of a build-graph bottleneck.

## What changed
- remove the `codex-core` re-export of `codex_login::token_data`
- update the remaining `codex-core` internals that used
`crate::token_data` to import `codex_login::token_data` directly
- update downstream callers in `codex-rs/chatgpt`,
`codex-rs/tui_app_server`, `codex-rs/app-server/tests/common`, and
`codex-rs/core/tests` to import `codex_login::token_data` directly
- add explicit `codex-login` workspace dependencies and refresh lock
metadata for crates that now depend on it directly

## Validation
- `cargo test -p codex-chatgpt --locked`
- `just argument-comment-lint`
- `just bazel-lock-update`
- `just bazel-lock-check`

## Notes
- attempted `cargo test -p codex-core --locked` and `cargo test -p
codex-core auth_refresh --locked`, but both ran out of disk while
linking `codex-core` test binaries in the local environment
2026-03-26 13:34:02 -07:00
rreichel3-oai
86764af684 Protect first-time project .codex creation across Linux and macOS sandboxes (#15067)
## Problem

Codex already treated an existing top-level project `./.codex` directory
as protected, but there was a gap on first creation.

If `./.codex` did not exist yet, a turn could create files under it,
such as `./.codex/config.toml`, without going through the same approval
path as later modifications. That meant the initial write could bypass
the intended protection for project-local Codex state.

## What this changes

This PR closes that first-creation gap in the Unix enforcement layers:

- `codex-protocol`
- treat the top-level project `./.codex` path as a protected carveout
even when it does not exist yet
- avoid injecting the default carveout when the user already has an
explicit rule for that exact path
- macOS Seatbelt
- deny writes to both the exact protected path and anything beneath it,
so creating `./.codex` itself is blocked in addition to writes inside it
- Linux bubblewrap
- preserve the same protected-path behavior for first-time creation
under `./.codex`
- tests
- add protocol regressions for missing `./.codex` and explicit-rule
collisions
- add Unix sandbox coverage for blocking first-time `./.codex` creation
  - tighten Seatbelt policy assertions around excluded subpaths

## Scope

This change is intentionally scoped to protecting the top-level project
`.codex` subtree from agent writes.

It does not make `.codex` unreadable, and it does not change the product
behavior around loading project skills from `.codex` when project config
is untrusted.

## Why this shape

The fix is pointed rather than broad:
- it preserves the current model of “project `.codex` is protected from
writes”
- it closes the security-relevant first-write hole
- it avoids folding a larger permissions-model redesign into this PR

## Validation

- `cargo test -p codex-protocol`
- `cargo test -p codex-sandboxing seatbelt`
- `cargo test -p codex-exec --test all
sandbox_blocks_first_time_dot_codex_creation -- --nocapture`

---------

Co-authored-by: Michael Bolin <mbolin@openai.com>
2026-03-26 16:06:53 -04:00
Ruslan Nigmatullin
9736fa5e3d app-server: Split transport module (#15811)
`transport.rs` is getting pretty big, split individual transport
implementations into separate files.
2026-03-26 13:01:35 -07:00
Michael Bolin
b3e069e8cb skills: remove unused skill permission metadata (#15900)
## Why

Skill metadata accepted a `permissions` block and stored the result on
`SkillMetadata`, but that data was never consumed by runtime behavior.
Leaving the dead parsing path in place makes it look like skills can
widen or otherwise influence execution permissions when, in practice,
declared skill permissions are ignored.

This change removes that misleading surface area so the skill metadata
model matches what the system actually uses.

## What changed

- removed `permission_profile` and `managed_network_override` from
`core-skills::SkillMetadata`
- stopped parsing `permissions` from skill metadata in
`core-skills/src/loader.rs`
- deleted the loader tests that only exercised the removed permissions
parsing path
- cleaned up dependent `SkillMetadata` constructors in tests and TUI
code that were only carrying `None` for those fields

## Testing

- `cargo test -p codex-core-skills`
- `cargo test -p codex-tui
submission_prefers_selected_duplicate_skill_path`
- `just argument-comment-lint`
2026-03-26 19:33:23 +00:00
viyatb-oai
b6050b42ae fix: resolve bwrap from trusted PATH entry (#15791)
## Summary
- resolve system bwrap from PATH instead of hardcoding /usr/bin/bwrap
- skip PATH entries that resolve inside the current workspace before
launching the sandbox helper
- keep the vendored bubblewrap fallback when no trusted system bwrap is
found

## Validation
- cargo test -p codex-core bwrap --lib
- cargo test -p codex-linux-sandbox
- just fix -p codex-core
- just fix -p codex-linux-sandbox
- just fmt
- just argument-comment-lint
- cargo clean
2026-03-26 12:13:51 -07:00
Matthew Zeng
3360f128f4 [plugins] Polish tool suggest prompts. (#15891)
- [x] Polish tool suggest prompts to distinguish between missing
connectors and discoverable plugins, and be very precise about the
triggering conditions.
2026-03-26 18:52:59 +00:00
Matthew Zeng
25134b592c [mcp] Fix legacy_tools (#15885)
- [x] Fix legacy_tools
2026-03-26 11:08:49 -07:00
Felipe Coury
2c54d4b160 feat(tui): add terminal title support to tui app server (#15860)
## TR;DR

Replicates the `/title` command from `tui` to `tui_app_server`.

## Problem

The classic `tui` crate supports customizing the terminal window/tab
title via `/title`, but the `tui_app_server` crate does not. Users on
the app-server path have no way to configure what their terminal title
shows (project name, status, spinner, thread, etc.), making it harder to
identify Codex sessions across tabs or windows.

## Mental model

The terminal title is a *status surface* -- conceptually parallel to the
footer status line. Both surfaces are configurable lists of items, both
share expensive inputs (git branch lookup, project root discovery), and
both must be refreshed at the same lifecycle points. This change ports
the classic `tui`'s design verbatim:

1. **`terminal_title.rs`** owns the low-level OSC write path and input
sanitization. It strips control characters and bidi/invisible codepoints
before placing untrusted text (model output, thread names, project
paths) inside an escape sequence.

2. **`title_setup.rs`** defines `TerminalTitleItem` (the 8 configurable
items) and `TerminalTitleSetupView` (the interactive picker that wraps
`MultiSelectPicker`).

3. **`status_surfaces.rs`** is the shared refresh pipeline. It parses
both surface configs once per refresh, warns about invalid items once
per session, synchronizes the git-branch cache, then renders each
surface from the same `StatusSurfaceSelections` snapshot.

4. **`chatwidget.rs`** sets `TerminalTitleStatusKind` at each state
transition (Working, Thinking, Undoing, WaitingForBackgroundTerminal)
and calls `refresh_terminal_title()` whenever relevant state changes.

5. **`app.rs`** handles the three setup events (confirm/preview/cancel),
persists config via `ConfigEditsBuilder`, and clears the managed title
on `Drop`.

## Non-goals

- **Restoring the previous terminal title on exit.** There is no
portable way to read the terminal's current title, so `Drop` clears the
managed title rather than restoring it.
- **Sharing code between `tui` and `tui_app_server`.** The
implementation is a parallel copy, matching the existing pattern for the
status-line feature. Extracting a shared crate is future work.

## Tradeoffs

- **Duplicate code across crates.** The three core files
(`terminal_title.rs`, `title_setup.rs`, `status_surfaces.rs`) are
byte-for-byte copies from the classic `tui`. This was chosen for
consistency with the existing status-line port and to avoid coupling the
two crates at the dependency level. Future changes must be applied in
both places.

- **`status_surfaces.rs` is large (~660 lines).** It absorbs logic that
previously lived inline in `chatwidget.rs` (status-line refresh, git
branch management, project root discovery) plus all new terminal-title
logic. This consolidation trades file size for a single place where both
surfaces are coordinated.

- **Spinner scheduling on every refresh.** The terminal title spinner
(when active) schedules a frame every 100ms. This is the same pattern
the status-indicator spinner already uses; the overhead is a timer
registration, not a redraw.

## Architecture

```
/title command
  -> SlashCommand::Title
  -> open_terminal_title_setup()
  -> TerminalTitleSetupView (MultiSelectPicker)
  -> on_change:  AppEvent::TerminalTitleSetupPreview  -> preview_terminal_title()
  -> on_confirm: AppEvent::TerminalTitleSetup         -> ConfigEditsBuilder + setup_terminal_title()
  -> on_cancel:  AppEvent::TerminalTitleSetupCancelled -> cancel_terminal_title_setup()

Runtime title refresh:
  state change (turn start, reasoning, undo, plan update, thread rename, ...)
  -> set terminal_title_status_kind
  -> refresh_terminal_title()
  -> status_surface_selections()  (parse configs, collect invalids)
  -> refresh_terminal_title_from_selections()
     -> terminal_title_value_for_item() for each configured item
     -> assemble title string with separators
     -> skip if identical to last_terminal_title (dedup OSC writes)
     -> set_terminal_title() (sanitize + OSC 0 write)
     -> schedule spinner frame if animating

Widget replacement:
  replace_chat_widget_with_app_server_thread()
  -> transfer last_terminal_title from old widget to new
  -> avoids redundant OSC clear+rewrite on session switch
```

## Observability

- Invalid terminal-title item IDs in config emit a one-per-session
warning via `on_warning()` (gated by
`terminal_title_invalid_items_warned` `AtomicBool`).
- OSC write failures are logged at `tracing::debug` level.
- Config persistence failures are logged at `tracing::error` and
surfaced to the user via `add_error_message()`.

## Tests

- `terminal_title.rs`: 4 unit tests covering sanitization (control
chars, bidi codepoints, truncation) and OSC output format.
- `title_setup.rs`: 3 tests covering setup view snapshot rendering,
parse order preservation, and invalid-ID rejection.
- `chatwidget/tests.rs`: Updated test helpers with new fields; existing
tests continue to pass.

---------

Co-authored-by: Eric Traut <etraut@openai.com>
2026-03-26 11:59:12 -06:00
jif-oai
970386e8b2 fix: root as std agent (#15881) 2026-03-26 18:57:34 +01:00
evawong-oai
0bd34c28c7 Add wildcard in the middle test coverage (#15813)
## Summary
Add a focused codex network proxy unit test for the denylist pattern
with wildcard in the middle `region*.some.malicious.tunnel.com`. This
does not change how existing code works, just ensure that behavior stays
the same and we got CI guards to guard existin behavior.

## Why
The managed Codex denylist update relies on this mid label glob form,
and the existing tests only covered exact hosts, `*.` subdomains, and
`**.` apex plus subdomains.

## Validation
`cargo test -p codex-network-proxy
compile_globset_supports_mid_label_wildcards`
`cargo test -p codex-network-proxy`
`./tools/argument-comment-lint/run-prebuilt-linter.sh -p
codex-network-proxy`
2026-03-26 17:53:31 +00:00
Adrian
af04273778 [codex] Block unsafe git global options from safe allowlist (#15796)
## Summary
- block git global options that can redirect config, repository, or
helper lookup from being auto-approved as safe
- share the unsafe global-option predicate across the Unix and Windows
git safety checks
- add regression coverage for inline and split forms, including `bash
-lc` and PowerShell wrappers

## Root cause
The Unix safe-command gate only rejected `-c` and `--config-env`, even
though the shared git parser already knew how to skip additional
pre-subcommand globals such as `--git-dir`, `--work-tree`,
`--exec-path`, `--namespace`, and `--super-prefix`. That let those
arguments slip through safe-command classification on otherwise
read-only git invocations and bypass approval. The Windows-specific
safe-command path had the same trust-boundary gap for git global
options.
2026-03-26 10:46:04 -07:00
Michael Bolin
e36ebaa3da fix: box apply_patch test harness futures (#15835)
## Why

`#[large_stack_test]` made the `apply_patch_cli` tests pass by giving
them more stack, but it did not address why those tests needed the extra
stack in the first place.

The real problem is the async state built by the `apply_patch_cli`
harness path. Those tests await three helper boundaries directly:
harness construction, turn submission, and apply-patch output
collection. If those helpers inline their full child futures, the test
future grows to include the whole harness startup and request/response
path.

This change replaces the workaround from #12768 with the same basic
approach used in #13429, but keeps the fix narrower: only the helper
boundaries awaited directly by `apply_patch_cli` stay boxed.

## What Changed

- removed `#[large_stack_test]` from
`core/tests/suite/apply_patch_cli.rs`
- restored ordinary `#[tokio::test(flavor = "multi_thread",
worker_threads = 2)]` annotations in that suite
- deleted the now-unused `codex-test-macros` crate and removed its
workspace wiring
- boxed only the three helper boundaries that the suite awaits directly:
  - `apply_patch_harness_with(...)`
  - `TestCodexHarness::submit(...)`
  - `TestCodexHarness::apply_patch_output(...)`
- added comments at those boxed boundaries explaining why they remain
boxed

## Testing

- `cargo test -p codex-core --test all suite::apply_patch_cli --
--nocapture`

## References

- #12768
- #13429
2026-03-26 17:32:04 +00:00
Eric Traut
e7139e14a2 Enable tui_app_server feature by default (#15661) 2026-03-26 11:28:25 -06:00
nicholasclark-openai
8d479f741c Add MCP connector metrics (#15805)
## Summary
- enrich `codex.mcp.call` with `tool`, `connector_id`, and sanitized
`connector_name` for actual MCP executions
- record `codex.mcp.call.duration_ms` for actual MCP executions so
connector-level latency is visible in metrics
- keep skipped, blocked, declined, and cancelled paths on the plain
status-only `codex.mcp.call` counter

## Included Changes
- `codex-rs/core/src/mcp_tool_call.rs`: add connector-sliced MCP count
and duration metrics only for executed tool calls, while leaving
non-executed outcomes as status-only counts
- `codex-rs/core/src/mcp_tool_call_tests.rs`: cover metric tag shaping,
connector-name sanitization, and the new duration metric tags

## Testing
- `cargo test -p codex-core`
- `just fix -p codex-core`
- `just fmt`

## Notes
- `cargo test -p codex-core` still hits existing unrelated failures in
approvals-reviewer config tests and the sandboxed JS REPL `mktemp` test
- full workspace `cargo test` was not run

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-26 17:08:02 +00:00
Eric Traut
0d44bd708e Fix duplicate /review messages in app-server TUI (#15839)
## Symptoms
When `/review` ran through `tui_app_server`, the TUI could show
duplicate review content:
- the `>> Code review started: ... <<` banner appeared twice
- the final review body could also appear twice

## Problem
`tui_app_server` was treating review lifecycle items as renderable
content on more than one delivery path.

Specifically:
- `EnteredReviewMode` was rendered both when the item started and again
when it completed
- `ExitedReviewMode` rendered the review text itself, even though the
same review text was also delivered later as the assistant message item

That meant the same logical review event was committed into history
multiple times.

## Solution
Make review lifecycle items control state transitions only once, and
keep the final review body sourced from the assistant message item:
- render the review-start banner from the live `ItemStarted` path, while
still allowing replay to restore it once
- treat `ExitedReviewMode` as a mode-exit/finish-banner event instead of
rendering the review body from it
- preserve the existing assistant-message rendering path as the single
source of final review text
2026-03-26 10:55:18 -06:00
jif-oai
352f37db03 fix: max depth agent still has v2 tools (#15880) 2026-03-26 17:36:12 +01:00
Matthew Zeng
c9214192c5 [plugins] Update the suggestable plugins list. (#15829)
- [x] Update the suggestable plugins list to be featured plugins.
2026-03-26 15:53:22 +00:00
jif-oai
6d2f4aaafc feat: use ProcessId in exec-server (#15866)
Use a full struct for the ProcessId to increase readability and make it
easier in the future to make it evolve if needed
2026-03-26 16:45:36 +01:00
jif-oai
26c66f3ee1 fix: flaky (#15869) 2026-03-26 16:07:32 +01:00
Michael Bolin
01fa4f0212 core: remove special execve handling for skill scripts (#15812) 2026-03-26 07:46:04 -07:00
jif-oai
6dcac41d53 chore: drop artifacts lib (#15864) 2026-03-26 15:28:59 +01:00
jif-oai
7dac332c93 feat: exec-server prep for unified exec (#15691)
This PR partially rebase `unified_exec` on the `exec-server` and adapt
the `exec-server` accordingly.

## What changed in `exec-server`

1. Replaced the old "broadcast-driven; process-global" event model with
process-scoped session events. The goal is to be able to have dedicated
handler for each process.
2. Add to protocol contract to support explicit lifecycle status and
stream ordering:
- `WriteResponse` now returns `WriteStatus` (Accepted, UnknownProcess,
StdinClosed, Starting) instead of a bool.
  - Added seq fields to output/exited notifications.
  - Added terminal process/closed notification.
3. Demultiplexed remote notifications into per-process channels. Same as
for the event sys
4. Local and remote backends now both implement ExecBackend.
5. Local backend wraps internal process ID/operations into per-process
ExecProcess objects.
6. Remote backend registers a session channel before launch and
unregisters on failed launch.

## What changed in `unified_exec`

1. Added unified process-state model and backend-neutral process
wrapper. This will probably disappear in the future, but it makes it
easier to keep the work flowing on both side.
- `UnifiedExecProcess` now handles both local PTY sessions and remote
exec-server processes through a shared `ProcessHandle`.
- Added `ProcessState` to track has_exited, exit_code, and terminal
failure message consistently across backends.
2. Routed write and lifecycle handling through process-level methods.

## Some rationals

1. The change centralizes execution transport in exec-server while
preserving policy and orchestration ownership in core, avoiding
duplicated launch approval logic. This comes from internal discussion.
2. Session-scoped events remove coupling/cross-talk between processes
and make stream ordering and terminal state explicit (seq, closed,
failed).
3. The failure-path surfacing (remote launch failures, write failures,
transport disconnects) makes command tool output and cleanup behavior
deterministic

## Follow-ups:
* Unify the concept of thread ID behind an obfuscated struct
* FD handling
* Full zsh-fork compatibility
* Full network sandboxing compatibility
* Handle ws disconnection
2026-03-26 15:22:34 +01:00