## Summary
- bundle contextual prompt injection into at most one developer message
plus one contextual user message in both:
  - per-turn settings updates
  - initial context insertion
- preserve `<model_switch>` across compaction by rebuilding it through
canonical initial-context injection, instead of relying on
strip/reattach hacks
- centralize contextual user fragment detection in one shared definition
table and reuse it for parsing/compaction logic
- keep `AGENTS.md` in its natural serialized format:
  - `# AGENTS.md instructions for {dirname}`
  - `<INSTRUCTIONS>...</INSTRUCTIONS>`
- simplify related tests/helpers and accept the expected snapshot/layout
updates from bundled multi-part messages
## Why
The goal is to converge toward a simpler, more intentional prompt shape
where contextual updates are consistently represented as one developer
envelope plus one contextual user envelope, while keeping parsing and
compaction behavior aligned with that representation.
## Notable details
- the temporary `SettingsUpdateEnvelope` wrapper was removed; these
paths now return `Vec<ResponseItem>` directly
- local/remote compaction no longer rely on model-switch strip/restore
helpers
- contextual user detection is now driven by shared fragment definitions
instead of ad hoc matcher assembly
- AGENTS/user instructions are still the same logical context; only the
synthetic `<user_instructions>` wrapper was replaced by the natural
AGENTS text format
## Testing
- `just fmt`
- `cargo test -p codex-app-server
codex_message_processor::tests::extract_conversation_summary_prefers_plain_user_messages
-- --exact`
- `cargo test -p codex-core
compact::tests::collect_user_messages_filters_session_prefix_entries
--lib -- --exact`
- `cargo test -p codex-core --test all
'suite::compact::snapshot_request_shape_pre_turn_compaction_strips_incoming_model_switch'
-- --exact`
- `cargo test -p codex-core --test all
'suite::compact_remote::snapshot_request_shape_remote_pre_turn_compaction_strips_incoming_model_switch'
-- --exact`
- `cargo test -p codex-core --test all
'suite::client::includes_apps_guidance_as_developer_message_when_enabled'
-- --exact`
- `cargo test -p codex-core --test all
'suite::client::includes_developer_instructions_message_in_request' --
--exact`
- `cargo test -p codex-core --test all
'suite::client::includes_user_instructions_message_in_request' --
--exact`
- `cargo test -p codex-core --test all
'suite::client::resume_includes_initial_messages_and_sends_prior_items'
-- --exact`
- `cargo test -p codex-core --test all
'suite::review::review_input_isolated_from_parent_history' -- --exact`
- `cargo test -p codex-exec --test all
'suite::resume::exec_resume_last_respects_cwd_filter_and_all_flag' --
--exact`
- `cargo test -p core_test_support
context_snapshot::tests::full_text_mode_preserves_unredacted_text --
--exact`
## Notes
- I also ran several targeted `compact`, `compact_remote`,
`prompt_caching`, `model_visible_layout`, and `event_mapping` tests
while iterating on prompt-shape changes.
- I have not claimed a clean full-workspace `cargo test` from this
environment because local sandbox/resource conditions have previously
produced unrelated failures in large workspace runs.
Currently there is no bound on the length of a user message submitted in
the TUI or through the app server interface. That means users can paste
many megabytes of text, which can lead to bad performance, hangs, and
crashes. In extreme cases, it can lead to a [kernel
panic](https://github.com/openai/codex/issues/12323).
This PR limits the length of a user input to 2**20 (about 1M)
characters. This value was chosen because it fills the entire context
window on the latest models, so accepting longer inputs wouldn't make
sense anyway.
Summary
- add a shared `MAX_USER_INPUT_TEXT_CHARS` constant in codex-protocol
and surface it in TUI and app server code
- block oversized submissions in the TUI submit flow and emit error
history cells when validation fails
- reject oversized app-server requests with JSON-RPC `-32602` and
structured `input_too_large` data, plus document the behavior
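
The length check described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: only the constant name `MAX_USER_INPUT_TEXT_CHARS` and the 2**20 value come from the description; the `InputTooLarge` struct, its fields, and the choice to count Unicode scalar values are assumptions.

```rust
/// Limit from the description above: 2^20 characters.
pub const MAX_USER_INPUT_TEXT_CHARS: usize = 1 << 20;

/// Hypothetical error payload; the real `input_too_large` data may differ.
#[derive(Debug, PartialEq, Eq)]
pub struct InputTooLarge {
    pub max_chars: usize,
    pub actual_chars: usize,
}

/// Accepts `text` only when it is within the limit.
/// Assumption: length is counted in Unicode scalar values (`char`s).
pub fn validate_user_input(text: &str) -> Result<(), InputTooLarge> {
    let actual_chars = text.chars().count();
    if actual_chars > MAX_USER_INPUT_TEXT_CHARS {
        Err(InputTooLarge {
            max_chars: MAX_USER_INPUT_TEXT_CHARS,
            actual_chars,
        })
    } else {
        Ok(())
    }
}
```

A caller such as a submit handler would run this before forwarding the text and surface the error to the user instead of sending the message.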
Testing
- ran the IDE extension with this change and verified that when I
attempt to paste a user message that's several MB long, it correctly
reports an error instead of crashing or making my computer hot.
## Summary
- hide appended destinations for local path-style markdown links in the
TUI renderer
- keep web links rendering with their visible destination and style link
labels consistently
- add markdown renderer tests and a snapshot for the new file-link
output
## Testing
- just fmt
- cargo test -p codex-tui
<img width="1120" height="968" alt="image"
src="https://github.com/user-attachments/assets/490e8eda-ae47-4231-89fa-b254a1f83eed"
/>
## Summary
Lower the `js_repl` minimum Node version from `24.13.1` to `22.22.0`.
This updates the enforced minimum in `codex-rs/node-version.txt` and the
corresponding user-facing `/experimental` description for the JavaScript
REPL feature.
## Rationale
The previous `24.13.1` floor was stricter than necessary for `js_repl`.
I validated that the REPL kernel still works under Node `22.22.0`.
## Why `22.22.0`
`22.22.0` is a current, widely packaged Node 22 release across common
developer environments and distros, including Homebrew `node@22`, Fedora
`nodejs22`, Arch `nodejs-lts-jod`, and Debian testing. That makes it a
better exact floor than guessing at an older `22.x` patch we have not
validated.
`22.x` is also a maintenance branch that will be supported through April
2027, whereas the previous maintenance branch, `20.x`, is only supported
through April of this year.
## Changes
- Update `codex-rs/node-version.txt` from `24.13.1` to `22.22.0`
- Update the `/experimental` JavaScript REPL description to say
`Requires Node >= v22.22.0 installed.`
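
For illustration, a minimum-version comparison of this kind could look like the sketch below. The function names are hypothetical and the parsing is deliberately strict (exactly `major.minor.patch`, optional leading `v`); the actual check in the repo may accept other forms.

```rust
/// Parses "v22.22.0" or "22.22.0" into (major, minor, patch).
/// Returns None for anything that is not exactly three numeric parts.
fn parse_node_version(raw: &str) -> Option<(u64, u64, u64)> {
    let trimmed = raw.trim();
    let trimmed = trimmed.strip_prefix('v').unwrap_or(trimmed);
    let mut parts = trimmed.split('.');
    let major = parts.next()?.parse::<u64>().ok()?;
    let minor = parts.next()?.parse::<u64>().ok()?;
    let patch = parts.next()?.parse::<u64>().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// True when `found` parses and is at least `minimum`.
/// Tuples compare lexicographically, which matches version ordering here.
fn meets_minimum(found: &str, minimum: &str) -> bool {
    match (parse_node_version(found), parse_node_version(minimum)) {
        (Some(found), Some(minimum)) => found >= minimum,
        _ => false,
    }
}
```

With a floor of `22.22.0`, this accepts `v22.22.0` and `v24.13.1` and rejects `v22.21.9` or unparseable output.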
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
## Summary
- validate `js_repl` Node compatibility during session startup when the
experiment is enabled
- if Node is missing or too old, disable `js_repl` and
`js_repl_tools_only` for the session before tools and instructions are
built
- surface that startup disablement to users through the existing startup
warning flow instead of only logging it
- reuse the same compatibility check in js_repl kernel startup so
startup gating and runtime behavior stay aligned
- add a regression test that verifies the warning is emitted and that
the first advertised tool list omits `js_repl` and `js_repl_reset` when
Node is incompatible
## Why
Today `js_repl` can be advertised based only on the feature flag, then
fail later when the kernel starts. That makes the available tool list
inaccurate at the start of a conversation, and users do not get a clear
explanation for why the tool is unavailable.
This change makes tool availability reflect real startup checks, keeps
the advertised tool set stable for the lifetime of the session, and
gives users a visible warning when `js_repl` is disabled.
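
The gating idea can be sketched as a pure function that takes the candidate tool list and the result of the startup check. The `NodeCheck` enum, the function name, and the warning strings are assumptions made for this example; only the tool names `js_repl` and `js_repl_reset` come from the description.

```rust
/// Hypothetical outcome of the startup Node compatibility check.
#[derive(Debug, PartialEq, Eq)]
enum NodeCheck {
    Compatible,
    Missing,
    TooOld { found: String },
}

/// Returns the tools to advertise plus an optional user-facing warning.
/// When Node is unusable, the REPL tools are dropped before anything is advertised.
fn gate_js_repl(
    mut tools: Vec<&'static str>,
    check: &NodeCheck,
) -> (Vec<&'static str>, Option<String>) {
    let warning = match check {
        NodeCheck::Compatible => return (tools, None),
        NodeCheck::Missing => "js_repl disabled: Node was not found".to_string(),
        NodeCheck::TooOld { found } => format!("js_repl disabled: Node {found} is too old"),
    };
    tools.retain(|name| *name != "js_repl" && *name != "js_repl_reset");
    (tools, Some(warning))
}
```

Because the decision is made once at startup, the advertised list does not change later in the session.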
## Testing
- `just fmt`
- `cargo test -p codex-core --test all
js_repl_is_not_advertised_when_startup_node_is_incompatible`
Command-approval clients currently infer which choices to show from
side-channel fields like `networkApprovalContext`,
`proposedExecpolicyAmendment`, and `additionalPermissions`. That makes
the request shape harder to evolve, and it forces each client to
replicate the server's heuristics instead of receiving the exact
decision list for the prompt.
This PR introduces a mapping between `CommandExecutionApprovalDecision`
and `codex_protocol::protocol::ReviewDecision`:
```rust
impl From<CoreReviewDecision> for CommandExecutionApprovalDecision {
    fn from(value: CoreReviewDecision) -> Self {
        match value {
            CoreReviewDecision::Approved => Self::Accept,
            CoreReviewDecision::ApprovedExecpolicyAmendment {
                proposed_execpolicy_amendment,
            } => Self::AcceptWithExecpolicyAmendment {
                execpolicy_amendment: proposed_execpolicy_amendment.into(),
            },
            CoreReviewDecision::ApprovedForSession => Self::AcceptForSession,
            CoreReviewDecision::NetworkPolicyAmendment {
                network_policy_amendment,
            } => Self::ApplyNetworkPolicyAmendment {
                network_policy_amendment: network_policy_amendment.into(),
            },
            CoreReviewDecision::Abort => Self::Cancel,
            CoreReviewDecision::Denied => Self::Decline,
        }
    }
}
```
And updates `CommandExecutionRequestApprovalParams` to have a new field:
```rust
available_decisions: Option<Vec<CommandExecutionApprovalDecision>>
```
which, if specified, should make it easier for clients to display an
appropriate list of options in the UI.
This makes it possible for `CoreShellActionProvider::prompt()` in
`unix_escalation.rs` to specify the `Vec<ReviewDecision>` directly,
adding support for `ApprovedForSession` when approving a skill script,
which was previously missing in the TUI.
Note this results in a significant change to `exec_options()` in
`approval_overlay.rs`, as the displayed options are now derived from
`available_decisions: &[ReviewDecision]`.
## What Changed
- Add `available_decisions` to
[`ExecApprovalRequestEvent`](de00e932dd/codex-rs/protocol/src/approvals.rs (L111-L175)),
including helpers to derive the legacy default choices when older
senders omit the field.
- Map `codex_protocol::protocol::ReviewDecision` to app-server
`CommandExecutionApprovalDecision` and expose the ordered list as
experimental `availableDecisions` in
[`CommandExecutionRequestApprovalParams`](de00e932dd/codex-rs/app-server-protocol/src/protocol/v2.rs (L3798-L3807)).
- Thread optional `available_decisions` through the core approval path
so Unix shell escalation can explicitly request `ApprovedForSession` for
session-scoped approvals instead of relying on client heuristics.
[`unix_escalation.rs`](de00e932dd/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs (L194-L214))
- Update the TUI approval overlay to build its buttons from the ordered
decision list, while preserving the legacy fallback when
`available_decisions` is missing.
- Update the app-server README, test client output, and generated schema
artifacts to document and surface the new field.
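
The legacy fallback mentioned above can be illustrated with a small sketch. The enum here is a simplified stand-in for `ReviewDecision`, and the contents of the legacy default list are an assumption; the real helper may derive different choices from other request fields.

```rust
/// Simplified stand-in for `ReviewDecision` (payload-carrying variants omitted).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Decision {
    Approved,
    ApprovedForSession,
    Denied,
    Abort,
}

/// Hypothetical legacy default used when an older sender omits the field.
fn legacy_default_decisions() -> Vec<Decision> {
    vec![Decision::Approved, Decision::Denied, Decision::Abort]
}

/// Prefers the explicit, ordered list; falls back to the legacy default otherwise.
fn effective_decisions(available: Option<Vec<Decision>>) -> Vec<Decision> {
    available.unwrap_or_else(legacy_default_decisions)
}
```

A UI would then render one button per entry in the returned order, without inspecting any side-channel fields.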
## Testing
- Add `approval_overlay.rs` coverage for explicit decision lists,
including the generic `ApprovedForSession` path and network approval
options.
- Update `chatwidget/tests.rs` and app-server protocol tests to populate
the new optional field and keep older event shapes working.
## Developers Docs
- If we document `item/commandExecution/requestApproval` on
[developers.openai.com/codex](https://developers.openai.com/codex), add
experimental `availableDecisions` as the preferred source of approval
choices and note that older servers may omit it.
This reverts commit https://github.com/openai/codex/pull/12633. We no
longer need that PR because we now favor sending a normal exec command
approval server request with the skill permissions in
`additional_permissions` instead.
## Summary
- add a direct install script for macOS and Linux at
`scripts/install/install.sh`
- stage `install.sh` into `dist/` during release so it is published as a
GitHub release asset
- reuse the existing platform npm payload so the installer includes both
`codex` and `rg`
## Testing
- `bash -n scripts/install/install.sh`
- local macOS `curl | sh` smoke test against a locally served copy of
the script
Prior to this change, `determine_action()` would
1. check if `program` is associated with a skill
2. if so, check if `program` is in `execve_session_approvals` to see
whether the user needs to be prompted
This PR flips the order of these checks to try to set us up so that
"session approvals" are always consulted first (which should soon extend
to include session approvals derived from `prefix_rule()`s, as well).
Though to make the new ordering work, we need to record any relevant
metadata to associate with the approval, which in the case of a
skill-based approval is the `SkillMetadata` so that we can derive the
`PermissionProfile` to include with the escalation. (Though as noted by
the `TODO`, this `PermissionProfile` is not honored yet.)
The new `ExecveSessionApproval` struct is used to retain the necessary
metadata.
## What Changed
- Replace the `execve_session_approvals` `HashSet` with a map that
stores an `ExecveSessionApproval` alongside each approved `program`.
- When a user chooses `ApprovedForSession` for a skill script, capture
the matched `SkillMetadata` in the session approval entry.
- Consult that cache before re-running `find_skill()`, and reuse the
originally approved skill metadata and permission profile when allowing
later execve callbacks in the same session.
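
The new lookup order can be sketched as follows. The struct and function names mirror the description, but their fields, the `Action` enum, and the closure-based `find_skill` parameter are simplifications assumed for this example.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Simplified stand-in; the real `SkillMetadata` carries more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SkillMetadata {
    name: String,
}

/// Metadata retained alongside a session approval.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ExecveSessionApproval {
    skill: Option<SkillMetadata>,
}

/// Hypothetical outcome type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    Allow(ExecveSessionApproval),
    Prompt(SkillMetadata),
    NotASkill,
}

/// Consults the session approvals first, and only then runs the skill lookup.
fn determine_action(
    approvals: &HashMap<PathBuf, ExecveSessionApproval>,
    program: &Path,
    find_skill: impl Fn(&Path) -> Option<SkillMetadata>,
) -> Action {
    if let Some(approval) = approvals.get(program) {
        // Reuse the metadata captured when the user approved for the session.
        return Action::Allow(approval.clone());
    }
    match find_skill(program) {
        Some(skill) => Action::Prompt(skill),
        None => Action::NotASkill,
    }
}
```

Storing the metadata in the map means a later callback does not need to re-run the skill lookup to recover the permission profile.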
## Summary
- allow `request_user_input` in Default collaboration mode as well as
Plan
- update the Default-mode instructions to prefer assumptions first and
use `request_user_input` only when a question is unavoidable
- update request_user_input and app-server tests to match the new
Default-mode behavior
- refactor collaboration-mode availability plumbing into
`CollaborationModesConfig` for future mode-related flags
## Codex author
`codex resume 019c9124-ed28-7c13-96c6-b916b1c97d49`
This reverts commit daf0f03ac8.
---------
Co-authored-by: Codex <noreply@openai.com>
Adds a new v2 app-server API for a client to be able to unsubscribe from
a thread:
- New RPC method: `thread/unsubscribe`
- New server notification: `thread/closed`
Today clients can start/resume/archive threads, but there wasn’t a way
to explicitly unload a live thread from memory without archiving it.
With `thread/unsubscribe`, a client can indicate it is no longer
actively working with a live Thread. If this is the only client
subscribed to that given thread, the thread will be automatically closed
by app-server, at which point the server will send `thread/closed` and
`thread/status/changed` with `status: notLoaded` notifications.
This gives clients a way to prevent long-running app-server processes
from accumulating too many thread (and related) objects in memory.
Closed threads will also be removed from `thread/loaded/list`.
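
The "close when the last subscriber leaves" behavior can be sketched with a small in-memory registry. Everything here (type names, `u64` client IDs, the outcome enum) is hypothetical and only models the bookkeeping, not the app-server's actual implementation or its notifications.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical result of an unsubscribe call.
#[derive(Debug, PartialEq, Eq)]
enum UnsubscribeOutcome {
    /// Other clients are still subscribed; the thread stays loaded.
    StillLoaded,
    /// This was the last subscriber; the server would now send `thread/closed`.
    Closed,
    /// The client was not subscribed to this thread.
    NotSubscribed,
}

#[derive(Default)]
struct ThreadRegistry {
    subscribers: HashMap<String, HashSet<u64>>,
}

impl ThreadRegistry {
    fn subscribe(&mut self, thread_id: &str, client_id: u64) {
        self.subscribers
            .entry(thread_id.to_string())
            .or_default()
            .insert(client_id);
    }

    fn unsubscribe(&mut self, thread_id: &str, client_id: u64) -> UnsubscribeOutcome {
        let Some(clients) = self.subscribers.get_mut(thread_id) else {
            return UnsubscribeOutcome::NotSubscribed;
        };
        if !clients.remove(&client_id) {
            return UnsubscribeOutcome::NotSubscribed;
        }
        if clients.is_empty() {
            // Last subscriber gone: unload the thread from memory.
            self.subscribers.remove(thread_id);
            UnsubscribeOutcome::Closed
        } else {
            UnsubscribeOutcome::StillLoaded
        }
    }

    /// Analogue of `thread/loaded/list`: closed threads no longer appear.
    fn loaded_threads(&self) -> Vec<&str> {
        let mut ids: Vec<&str> = self.subscribers.keys().map(String::as_str).collect();
        ids.sort();
        ids
    }
}
```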
## Why
The prior
`turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
assertion was brittle under Bazel: command approval payloads in the test
could include environment-dependent wrapper/command formatting
differences, which makes exact command-string matching flaky even when
behavior is correct.
(This regression was knowingly introduced in
https://github.com/openai/codex/pull/12800, but it was urgent to land
that PR.)
## What changed
- Hardened
`turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
in
[`turn_start_zsh_fork.rs`](https://github.com/openai/codex/blob/main/codex-rs/app-server/tests/suite/v2/turn_start_zsh_fork.rs):
  - Replaced strict `approval_command.starts_with("/bin/rm")` checks with
    intent-based subcommand matching.
  - Subcommand approvals are now recognized by file-target semantics
    (`first.txt` or `second.txt`) plus `rm` intent.
  - Parent approval recognition is now more tolerant of command-format
    differences while still requiring a definitive parent command context.
  - Uses a defensive loop that waits for all target subcommand decisions
    and the parent approval request.
- Preserved the existing regression and unit test fixes from earlier
commits in `unix_escalation.rs` and `skill_approval.rs`.
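
As a rough illustration of intent-based matching, the sketch below recognizes an `rm` approval by its tokens rather than by an exact prefix. The function name and tokenization are assumptions; it also makes no attempt to tell the parent shell command apart from a subcommand, which the real test handles separately.

```rust
/// Returns true when `command` looks like an `rm` of one of the target files,
/// regardless of wrapper prefixes or the exact path to `rm`.
fn is_rm_subcommand_approval(command: &str) -> bool {
    let tokens: Vec<&str> = command.split_whitespace().collect();
    // `rm` intent: a bare `rm` token or any path ending in `/rm`.
    let has_rm = tokens.iter().any(|t| *t == "rm" || t.ends_with("/rm"));
    // File-target semantics: some token (quotes stripped) names a target file.
    let has_target = tokens.iter().any(|t| {
        let t = t.trim_matches(|c: char| c == '\'' || c == '"');
        t.ends_with("first.txt") || t.ends_with("second.txt")
    });
    has_rm && has_target
}
```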
## Verification
- Ran the zsh fork subcommand decline regression under this change:
  - `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
- Confirmed the test is now robust against approval-command-string
variation instead of hardcoding one expected command shape.
Update realtime debug logs to include the actual text payloads in both
input and output paths.
- In `core/src/realtime_conversation.rs`:
  - `handle_start`: add extracted assistant text output to the
    `[realtime-text]` debug log.
  - `handle_text`: add incoming text input (`params.text`) to the
    `[realtime-text]` debug log.
No tests were run (per request).
Previously, clients would call `thread/start` with `dynamic_tools` set,
and when a model invoked a dynamic tool, the server would just make the
server->client `item/tool/call` request and wait for the client's
response to complete the tool call. This works, but it emits no
`item/started` or `item/completed` event.
Now we are doing this:
- [new] emit `item/started` with `DynamicToolCall` populated with the
call arguments
- send an `item/tool/call` server request
- [new] once the client responds, emit `item/completed` with
`DynamicToolCall` populated with the response.
Also, with `persistExtendedHistory: true`, dynamic tool calls are now
reconstructable in `thread/read` and `thread/resume` as
`ThreadItem::DynamicToolCall`.
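
The new ordering can be sketched as a function that records the events it would emit. The `Event` enum and the closure standing in for the client are hypothetical; the real server sends these over JSON-RPC rather than collecting them in a `Vec`.

```rust
/// Hypothetical, simplified view of what the server emits for one dynamic tool call.
#[derive(Debug, PartialEq, Eq)]
enum Event {
    ItemStarted { arguments: String },
    ToolCallRequest { arguments: String },
    ItemCompleted { response: String },
}

/// Emits `item/started`, performs the `item/tool/call` round trip via `client`,
/// then emits `item/completed` with the client's response.
fn run_dynamic_tool_call(arguments: &str, client: impl FnOnce(&str) -> String) -> Vec<Event> {
    let mut events = vec![
        Event::ItemStarted { arguments: arguments.to_string() },
        Event::ToolCallRequest { arguments: arguments.to_string() },
    ];
    let response = client(arguments);
    events.push(Event::ItemCompleted { response });
    events
}
```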
We propagate the session ID when sending requests for inference but we
don't do the same for compaction requests. This makes it hard to link
compaction requests to their session for debugging purposes.
## Why
Zsh fork execution was still able to bypass the `WorkspaceWrite` model
in edge cases because the fork path reconstructed command execution
without preserving sandbox wrappers, and command extraction only
accepted shell invocations in a narrow positional shape. This can allow
commands to run with broader filesystem access than expected, which
breaks the sandbox safety model.
## What changed
- Preserved the sandboxed `ExecRequest` produced by
`attempt.env_for(...)` when entering the zsh fork path in
[`unix_escalation.rs`](https://github.com/openai/codex/blob/main/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs).
- Updated `CoreShellCommandExecutor` to execute the sandboxed command
and working directory captured from `attempt.env_for(...)`, instead of
re-running a freshly reconstructed shell command.
- Made zsh-fork script extraction robust to wrapped invocations by
scanning command arguments for `-c`/`-lc` rather than only matching the
first positional form.
- Added unit tests in `unix_escalation.rs` to lock in wrapper-tolerant
parsing behavior and keep unsupported shell forms rejected.
- Tightened the regression in
[`skill_approval.rs`](https://github.com/openai/codex/blob/main/codex-rs/core/tests/suite/skill_approval.rs):
  - `shell_zsh_fork_still_enforces_workspace_write_sandbox` now uses an
    explicit `WorkspaceWrite` policy with `exclude_tmpdir_env_var: true` and
    `exclude_slash_tmp: true`.
  - The test attempts to write to `/tmp/...`, which is only reliably
    outside writable roots with those explicit exclusions set.
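
The wrapper-tolerant extraction can be sketched as a scan over adjacent argument pairs. This is an illustrative simplification that reuses the name `extract_shell_script` from the Verification notes; the real function presumably also validates the shell itself and rejects unsupported forms, which this sketch does not.

```rust
/// Finds the script passed via `-c` or `-lc` anywhere in `command`,
/// so sandbox wrappers placed before the shell do not defeat extraction.
fn extract_shell_script(command: &[String]) -> Option<&str> {
    command.windows(2).find_map(|pair| {
        let flag = pair[0].as_str();
        if flag == "-c" || flag == "-lc" {
            Some(pair[1].as_str())
        } else {
            None
        }
    })
}
```

For example, `["/usr/bin/sandbox-exec", "-p", "policy", "/bin/zsh", "-lc", "echo hi"]` yields `echo hi`, while a command with no `-c`/`-lc` flag yields `None`.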
## Verification
- Added and passed the new unit tests around `extract_shell_script`
parsing behavior with wrapped command shapes.
  - `extract_shell_script_supports_wrapped_command_prefixes`
  - `extract_shell_script_rejects_unsupported_shell_invocation`
- Verified the regression with the focused integration test:
`shell_zsh_fork_still_enforces_workspace_write_sandbox`.
## Manual Testing
Prior to this change, if I ran Codex via:
```
just codex --config zsh_path=/Users/mbolin/code/codex2/codex-rs/app-server/tests/suite/zsh --enable shell_zsh_fork
```
and asked:
```
what is the output of /bin/ps
```
it would run it, even though the default sandbox should prevent the
agent from running `/bin/ps` because it is setuid on macOS.
But with this change, I now see the expected failure because it is
blocked by the sandbox:
```
/bin/ps exited with status 1 and produced no output in this environment.
```
Add experimental `thread/realtime/*` v2 requests and notifications, then
route app-server realtime events through that thread-scoped surface with
integration coverage.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- Promote `js_repl` to an experimental feature that users can enable
from `/experimental`.
- Add `js_repl` experimental metadata, including the Node prerequisite
and activation guidance.
- Add regression coverage for the feature metadata and the
`/experimental` popup.
## What Changed
- Changed `Feature::JsRepl` from `Stage::UnderDevelopment` to
`Stage::Experimental`.
- Added experimental metadata for `js_repl` in `core/src/features.rs`:
  - name: `JavaScript REPL`
  - description: calls out interactive website debugging, inline
    JavaScript execution, and the required Node version (`>= v24.13.1`)
  - announcement: tells users to enable it, then start a new chat or
    restart Codex
- Added a core unit test that verifies:
  - `js_repl` is experimental
  - `js_repl` is disabled by default
  - the hardcoded Node version in the description matches
    `node-version.txt`
- Added a TUI test that opens the `/experimental` popup and verifies the
rendered `js_repl` entry includes the Node requirement text.
## Testing
- `just fmt`
- `cargo test -p codex-tui`
- `cargo test -p codex-core` (unit-test phase passed; stopped during the
long `tests/all.rs` integration suite)
**PR Summary**
This PR adds embedded-only OTEL policy audit logging for
`codex-network-proxy` and threads audit metadata from `codex-core` into
managed proxy startup.
### What changed
- Added structured audit event emission in `network_policy.rs` with
target `codex_otel.network_proxy`.
- Emitted:
  - `codex.network_proxy.domain_policy_decision` once per domain-policy
    evaluation.
  - `codex.network_proxy.block_decision` for non-domain denies.
- Added required policy/network fields, RFC3339 UTC millisecond
`event.timestamp`, and fallback defaults (`http.request.method="none"`,
`client.address="unknown"`).
- Added non-domain deny audit emission in HTTP/SOCKS handlers for
mode-guard and proxy-state denies, including unix-socket deny paths.
- Added `REASON_UNIX_SOCKET_UNSUPPORTED` and used it for unsupported
unix-socket auditing.
- Added `NetworkProxyAuditMetadata` to runtime/state, re-exported from
`lib.rs` and `state.rs`.
- Added `start_proxy_with_audit_metadata(...)` in core config, with
`start_proxy()` delegating to default metadata.
- Wired metadata construction in `codex.rs` from session/auth context,
including originator sanitization for OTEL-safe tagging.
- Updated `network-proxy/README.md` with embedded-mode audit schema and
behavior notes.
- Refactored HTTP block-audit emission to a small local helper to reduce
duplication.
- Preserved existing unix-socket proxy-disabled host/path behavior for
responses and blocked history while using an audit-only endpoint
override (`server.address="unix-socket"`, `server.port=0`).
### Explicit exclusions
- No standalone proxy OTEL startup work.
- No `main.rs` binary wiring.
- No `standalone_otel.rs`.
- No standalone docs/tests.
### Tests
- Extended `network_policy.rs` tests for event mapping, metadata
propagation, fallbacks, timestamp format, and target prefix.
- Extended HTTP tests to assert unix-socket deny block audit events.
- Extended SOCKS tests to cover deny emission from handler deny
branches.
- Added/updated core tests to verify audit metadata threading into
managed proxy state.
### Validation run
- `just fmt`
- `cargo test -p codex-network-proxy` ✅
- `cargo test -p codex-core` ran with one unrelated flaky timeout
(`shell_snapshot::tests::snapshot_shell_does_not_inherit_stdin`), and
the test passed when rerun directly ✅
---------
Co-authored-by: viyatb-oai <viyatb@openai.com>
**PR Summary**
This PR adds the OpenTelemetry `host.name` resource attribute to Codex
OTEL exports so every OTEL log (and trace, via the shared resource)
carries the machine hostname.
**What changed**
- Added `host.name` to the shared OTEL `Resource` in
`codex-rs/otel/src/otel_provider.rs`
  - This applies to both:
    - OTEL logs (`SdkLoggerProvider`)
    - OTEL traces (`SdkTracerProvider`)
- Hostname is now resolved via `gethostname::gethostname()`
(best-effort)
  - Value is trimmed
  - Empty values are omitted (non-fatal)
- Added focused unit tests for:
  - including `host.name` when present
  - omitting `host.name` when missing/empty
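
The trim-and-omit rule can be sketched as a small helper. The function name is hypothetical, and it takes a `&str` for simplicity, whereas `gethostname::gethostname()` returns an `OsString` that would first need a lossy conversion.

```rust
/// Returns the trimmed host name, or None when nothing usable remains.
/// A None result means the `host.name` attribute is simply not attached.
fn normalize_host_name(raw: &str) -> Option<String> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}
```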
**Why**
- `host.name` is host/process metadata and belongs on the OTEL
`resource`, not per-event attributes.
- Attaching it in the shared resource is the smallest change that
guarantees coverage across all exported OTEL logs/traces.
**Scope / Non-goals**
- No public API changes
- No changes to metrics behavior (this PR only updates log/trace
resource metadata)
**Dependency updates**
- Added `gethostname` as a workspace dependency and `codex-otel`
dependency
- `Cargo.lock` updated accordingly
- `MODULE.bazel.lock` unchanged after refresh/check
**Validation**
- `just fmt`
- `cargo test -p codex-otel`
- `just bazel-lock-update`
- `just bazel-lock-check`
Add a stream parser to extract citations (and others) from a stream.
This supports cases where markers are split across different tokens.
Codex never managed to make this code work, so everything was done
manually. Please review carefully, and do not touch this part of the code
without a very clear understanding of it.
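
To illustrate why split markers need care, here is a minimal sketch of the general technique: buffer the tail of the input whenever it could be the start of a marker. It is not the parser added by this PR. The `<cite>`/`</cite>` delimiters, the type name, and the end-of-stream policy are all assumptions chosen for the example.

```rust
// Hypothetical ASCII markers; the real marker syntax is not shown above.
const OPEN: &str = "<cite>";
const CLOSE: &str = "</cite>";

#[derive(Default)]
struct CitationStream {
    pending: String,        // unprocessed tail that may hold a partial marker
    inside: bool,           // true while between OPEN and CLOSE
    current: String,        // citation body being accumulated
    citations: Vec<String>, // completed citation bodies
}

/// Length of the longest proper prefix of `marker` that `text` ends with.
fn partial_suffix_len(text: &str, marker: &str) -> usize {
    (1..marker.len())
        .rev()
        .find(|&n| text.ends_with(&marker[..n]))
        .unwrap_or(0)
}

impl CitationStream {
    /// Feeds one chunk and returns the visible text that is safe to emit now.
    fn push(&mut self, chunk: &str) -> String {
        self.pending.push_str(chunk);
        let mut visible = String::new();
        loop {
            let marker = if self.inside { CLOSE } else { OPEN };
            if let Some(idx) = self.pending.find(marker) {
                let before = self.pending[..idx].to_string();
                if self.inside {
                    self.current.push_str(&before);
                    self.citations.push(std::mem::take(&mut self.current));
                } else {
                    visible.push_str(&before);
                }
                self.pending.replace_range(..idx + marker.len(), "");
                self.inside = !self.inside;
                continue;
            }
            // No complete marker: hold back a suffix that might begin one.
            let keep = partial_suffix_len(&self.pending, marker);
            let cut = self.pending.len() - keep;
            let ready: String = self.pending.drain(..cut).collect();
            if self.inside {
                self.current.push_str(&ready);
            } else {
                visible.push_str(&ready);
            }
            return visible;
        }
    }

    /// Flushes at end of stream. Arbitrary policy for this sketch: a dangling
    /// partial marker outside a citation is emitted as plain text.
    fn finish(&mut self) -> String {
        let rest = std::mem::take(&mut self.pending);
        if self.inside { String::new() } else { rest }
    }
}
```

The key point is the held-back suffix: without it, a chunk ending in `<ci` would be emitted as visible text before the rest of the marker arrives.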
This PR adds the macro `#[large_stack_test]`.
This spawns the tests in a dedicated tokio runtime with a larger stack.
It is useful for tests that need the full recursion on the harness
(which is now too deep for Windows, for example).
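
The underlying mechanism can be sketched with the standard library alone. This example omits the Tokio runtime and the attribute macro itself; the helper name and the 16 MiB stack size are assumptions, since the macro's actual value is not stated here.

```rust
use std::thread;

/// Hypothetical stack size for this sketch.
const LARGE_STACK_BYTES: usize = 16 * 1024 * 1024;

/// Runs `body` on a dedicated thread with a larger stack and returns its result.
/// Panics in `body` are propagated as a panic in the caller.
fn run_with_large_stack<T, F>(body: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(LARGE_STACK_BYTES)
        .spawn(body)
        .expect("failed to spawn large-stack thread")
        .join()
        .expect("large-stack thread panicked")
}

/// Deliberately recursive helper used to exercise stack depth.
fn depth(n: u64) -> u64 {
    if n == 0 { 0 } else { 1 + depth(n - 1) }
}
```

A macro like the one described would presumably wrap the test body in a call of this shape and build the async runtime inside the spawned thread.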
Summary
- propagate approval policy from parent to spawned agents and drop the
Never override so sub-agents respect the caller’s request
- refresh the pending-approval list whenever events arrive or the active
thread changes and surface the list above the composer for inactive
threads
- add widgets, helpers, and tests covering the new pending-thread
approval UI state
## Why
`unix_escalation.rs` checks a session-scoped approval cache before
prompting again for an execve-intercepted skill script. Without also
recording `ReviewDecision::ApprovedForSession`, that cache never gets
populated, so the same skill script can still trigger repeated approval
prompts within one session.
## What Changed
- Add `execve_session_approvals` to `SessionServices` so the session can
track approved skill script paths.
- Record the script path when a skill-script prompt returns
`ReviewDecision::ApprovedForSession`, but only for the skill-script path
rather than broader prefix-rule approvals.
- Reuse the cached approval on later execve callbacks by treating an
already-approved skill script as `Decision::Allow`.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12756).
* #12758
* __->__ #12756
Migration Behavior
* Config
  * Migrates settings.json into config.toml
  * Only adds fields when config.toml is missing, or when those fields are
    missing from the existing file
  * Supported mappings:
    * `env` -> `shell_environment_policy`
    * `sandbox.enabled = true` -> `sandbox_mode = "workspace-write"`
* Skills
  * Copies home and repo .claude/skills into .agents/skills
  * Existing skill directories are not overwritten
  * SKILL.md content is rewritten from Claude-related terms to Codex
* AgentsMd
  * Repo only
  * Migrates CLAUDE.md into AGENTS.md
  * Detect/import only proceed when AGENTS.md is missing or present but
    empty
  * Content is rewritten from Claude-related terms to Codex
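
The "only add missing fields" rule for the config mapping can be sketched as below. The flat `BTreeMap<String, String>` stands in for the parsed config.toml and the function name is hypothetical; only the `sandbox.enabled = true` to `sandbox_mode = "workspace-write"` mapping is modeled, and the `env` mapping is omitted.

```rust
use std::collections::BTreeMap;

/// Applies the sandbox mapping to `existing` without overwriting user settings.
/// Returns the keys that were added.
fn migrate_settings(
    existing: &mut BTreeMap<String, String>,
    sandbox_enabled: Option<bool>,
) -> Vec<String> {
    let mut added = Vec::new();
    if sandbox_enabled == Some(true) && !existing.contains_key("sandbox_mode") {
        existing.insert("sandbox_mode".to_string(), "workspace-write".to_string());
        added.push("sandbox_mode".to_string());
    }
    added
}
```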
Add service name to the app-server so that the app can use its own
service name.
This is on thread level because later we might plan the app-server to
become a singleton on the computer.
## Summary
- Preserve each skill’s raw permissions block as a permission_profile on
SkillMetadata during skill loading.
- Keep compiling that same metadata into the existing runtime
Permissions object, so current enforcement
behavior stays intact.
- When zsh-fork intercepts execution of a script that belongs to a
skill, include the skill’s
permission_profile in the exec approval request.
- This lets approval UIs show the extra filesystem access the skill
declared when prompting for approval.
## Why
In the `shell_zsh_fork` flow, `codex-shell-escalation` receives the
executable path exactly as the shell passed it to `execve()`. That path
is not guaranteed to be absolute.
For commands such as `./scripts/hello-mbolin.sh`, if the shell was
launched with a different `workdir`, resolving the intercepted `file`
against the server process working directory makes policy checks and
skill matching inspect the wrong executable. This change pushes that fix
a step further by keeping the normalized path typed as `AbsolutePathBuf`
throughout the rest of the escalation pipeline.
That makes the absolute-path invariant explicit, so later code cannot
accidentally treat the resolved executable path as an arbitrary
`PathBuf`.
## What Changed
- record the wrapper process working directory as an `AbsolutePathBuf`
- update the escalation protocol so `workdir` is explicitly absolute
while `file` remains the raw intercepted exec path
- resolve a relative intercepted `file` against the request `workdir` as
soon as the server receives the request
- thread `AbsolutePathBuf` through `EscalationPolicy`,
`CoreShellActionProvider`, and command normalization helpers so the
resolved executable path stays type-checked as absolute
- replace the `path-absolutize` dependency in `codex-shell-escalation`
with `codex-utils-absolute-path`
- add a regression test that covers a relative `file` with a distinct
`workdir`
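
The resolution step can be sketched with plain `std::path` types. The function name is hypothetical, and the sketch returns a `PathBuf` rather than the `AbsolutePathBuf` wrapper from `codex-utils-absolute-path`, whose API is not shown here.

```rust
use std::path::{Path, PathBuf};

/// Resolves the intercepted `file` against the request's `workdir`,
/// not against the server process's own working directory.
/// Assumption: `workdir` is already absolute.
fn resolve_intercepted_file(workdir: &Path, file: &Path) -> PathBuf {
    if file.is_absolute() {
        file.to_path_buf()
    } else {
        workdir.join(file)
    }
}
```

Resolving once, as soon as the request arrives, is what lets later policy checks and skill matching operate on a path that is known to be absolute.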
## Verification
- `cargo test -p codex-shell-escalation`
Direct skill-script matches force `Decision::Prompt`, so skill-backed
scripts require explicit approval before they run. (Note "allow for
session" is not supported in this PR, but will be done in a follow-up.)
In the process of implementing this, I fixed an important bug:
`ShellZshFork` is supposed to keep ordinary allowed execs on the
client-side `Run` path so later `execve()` calls are still intercepted
and reviewed. After the shell-escalation port, `Decision::Allow` still
mapped to `Escalate`, which moved `zsh` to server-side execution too
early. That broke the intended flow for skill-backed scripts and made
the approval prompt depend on the wrong execution path.
## What changed
- In `codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs`,
`Decision::Allow` now returns `Run` unless escalation is actually
required.
- Removed the zsh-specific `argv[0]` fallback. With the `Allow -> Run`
fix in place, zsh's later `execve()` of the script is intercepted
normally, so the skill match happens on the script path itself.
- Kept the skill-path handling in `determine_action()` focused on the
direct `program` match path.
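
The corrected mapping for an allowed exec can be sketched as a two-way choice. The enum and the `escalation_required` flag are simplified, hypothetical stand-ins for the types in `unix_escalation.rs`.

```rust
/// Simplified stand-in for the action returned to the shell wrapper.
#[derive(Debug, PartialEq, Eq)]
enum EscalateAction {
    /// Keep executing on the client side so later `execve()` calls stay intercepted.
    Run,
    /// Move execution to the server.
    Escalate,
}

/// An allowed exec stays on the `Run` path unless escalation is actually needed.
fn action_for_allowed_exec(escalation_required: bool) -> EscalateAction {
    if escalation_required {
        EscalateAction::Escalate
    } else {
        EscalateAction::Run
    }
}
```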
## Verification
- Updated `shell_zsh_fork_prompts_for_skill_script_execution` in
`codex-rs/core/tests/suite/skill_approval.rs` (gated behind `cfg(unix)`)
to:
  - run under `SandboxPolicy::new_workspace_write_policy()` instead of
    `DangerFullAccess`
  - assert the approval command contains only the script path
  - assert the approved run returns both stdout and stderr markers in the
    shell output
- Ran `cargo test -p codex-core
shell_zsh_fork_prompts_for_skill_script_execution -- --nocapture`
## Manual Testing
Run the dev build:
```
just codex --config zsh_path=/Users/mbolin/code/codex2/codex-rs/app-server/tests/suite/zsh --enable shell_zsh_fork
```
I have created `/Users/mbolin/.agents/skills/mbolin-test-skill` with:
```
├── scripts
│ └── hello-mbolin.sh
└── SKILL.md
```
The skill:
```
---
name: mbolin-test-skill
description: Used to exercise various features of skills.
---
When this skill is invoked, run the `hello-mbolin.sh` script and report the output.
```
The script:
```
set -e
# Note this script will fail if run with network disabled.
curl --location openai.com
```
Use `$mbolin-test-skill` to invoke the skill manually and verify that I
get prompted to run `hello-mbolin.sh`.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12730).
* #12750
* __->__ #12730