Commit Graph

5075 Commits

Author SHA1 Message Date
Michael Bolin
64679ec8a5 permissions: store thread sessions as profiles 2026-04-27 14:39:07 -07:00
Michael Bolin
7aed116fa5 permissions: derive snapshot sandbox projections 2026-04-27 14:39:07 -07:00
Michael Bolin
68fb758131 permissions: make SessionConfigured profile-only 2026-04-27 14:39:07 -07:00
Michael Bolin
df27892772 permissions: require profiles in TUI thread state 2026-04-27 14:39:07 -07:00
Michael Bolin
87b6e80217 permissions: derive config defaults as profiles 2026-04-27 14:39:07 -07:00
Michael Bolin
4b55979755 permissions: remove cwd special path (#19841)
## Why

The experimental `PermissionProfile` API had both `:cwd` and
`:project_roots` special filesystem paths, which made the permission
root ambiguous. This PR removes the unstable `current_working_directory`
special path before the permissions API is stabilized, so callers use
`:project_roots` for symbolic project-root access.

## What changed

- Removes `FileSystemSpecialPath::CurrentWorkingDirectory` from protocol
and app-server protocol models, plus regenerated app-server
JSON/TypeScript schemas.
- Replaces internal `:cwd` permission entries with `:project_roots`
entries.
- Keeps the existing cwd-update behavior for legacy-shaped
workspace-write profiles, while removing the deleted
`CurrentWorkingDirectory` case from that compatibility path.
- Keeps `PermissionProfile::workspace_write()` as the reusable symbolic
workspace-write helper, with docs noting that `:project_roots` entries
resolve at enforcement time.
- Updates app-server docs/examples and approval UI labeling to stop
advertising `:cwd` as a permission token.

## Compatibility

Persisted rollout items may contain the old
`{"kind":"current_working_directory"}` tag from earlier experimental
`permissionProfile` snapshots. This PR keeps that tag as a
deserialize-only alias for `ProjectRoots { subpath: None }`, while
continuing to serialize only the new `project_roots` tag.
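
A minimal serde-style sketch of what such a deserialize-only alias can look like (the enum here is an illustrative stand-in; the real `FileSystemSpecialPath` lives in the protocol crates and may implement the alias differently):

```rust
use serde::{Deserialize, Serialize};

// Illustrative stand-in for the protocol enum; the real type has more variants.
#[derive(Debug, PartialEq, Serialize, Deserialize)]
#[serde(tag = "kind", rename_all = "snake_case")]
enum FileSystemSpecialPath {
    // Canonical tag: the only form ever written when serializing.
    ProjectRoots { subpath: Option<String> },
}

// Deserialize-only compatibility: accept the retired tag from old rollout
// snapshots and map it onto ProjectRoots { subpath: None }.
fn parse_special_path(raw: &str) -> serde_json::Result<FileSystemSpecialPath> {
    let value: serde_json::Value = serde_json::from_str(raw)?;
    if value.get("kind").and_then(|k| k.as_str()) == Some("current_working_directory") {
        return Ok(FileSystemSpecialPath::ProjectRoots { subpath: None });
    }
    serde_json::from_value(value)
}

fn main() -> serde_json::Result<()> {
    let legacy = parse_special_path(r#"{"kind":"current_working_directory"}"#)?;
    assert_eq!(legacy, FileSystemSpecialPath::ProjectRoots { subpath: None });
    // Serialization only ever emits the new tag.
    assert_eq!(
        serde_json::to_string(&legacy)?,
        r#"{"kind":"project_roots","subpath":null}"#
    );
    Ok(())
}
```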

## Follow-up

This PR intentionally does not introduce an explicit project-root set on
`SessionConfiguration` or runtime sandbox resolution. Today, the
resolver still uses the active cwd as the single implicit project root.
A follow-up should model project roots separately from tool cwd so
`:project_roots` entries can resolve against the configured project
roots, and resolve to no entries when there are no project roots.

## Verification

- `cargo test -p codex-protocol permissions:: --lib`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-sandboxing -p codex-exec-server --lib`
- `cargo test -p codex-core session_configuration_apply_ --lib`
- `cargo test -p codex-app-server
command_exec_permission_profile_project_roots_use_command_cwd --test
all`
- `cargo test -p codex-tui
thread_read_session_state_does_not_reuse_primary_permission_profile
--lib`
- `cargo test -p codex-tui
preset_matching_accepts_workspace_write_with_extra_roots --lib`
- `cargo test -p codex-config --lib`
2026-04-27 13:41:27 -07:00
Eric Traut
52c06b8759 Preserve TUI markdown list spacing after code blocks (#19706)
## Why

Fixes #19702.

The TUI markdown renderer could visually attach the next list marker to
a fenced code block inside the previous list item, even when the source
markdown included a blank line before the next item. That made
block-heavy loose lists harder to read, while the desired behavior is
still to keep simple lists compact.

## What changed

- Track whether the current rendered list item contains a code block.
- Preserve one blank separator before the following list marker only
when the previous item contained a code block.
- Add regression coverage for both paths: code-block list items keep the
separator, and simple loose list items stay compact.

## Verification

- `cargo test -p codex-tui markdown_render`

I also manually verified that the bug exists before this change and is fixed after it.

## Before
<img width="437" height="240" alt="Screenshot 2026-04-26 at 1 19 01 PM"
src="https://github.com/user-attachments/assets/3bc9d64d-2dba-40d9-9d6b-a1d0b3c0f728"
/>

## After
<img width="410" height="269" alt="Screenshot 2026-04-26 at 1 18 54 PM"
src="https://github.com/user-attachments/assets/19c15bee-da32-455e-a7cb-e05eb85f4ea0"
/>
2026-04-27 13:40:46 -07:00
Eric Traut
0bd25ab374 Delay approval prompts while typing (#19513)
## Why

Fixes #7744. Approval modals can currently appear while the user is
typing ahead in the TUI composer, which lets plain letters like `y` or
`a` get consumed as approval shortcuts instead of staying in the draft
input.

## What changed

- Track recent composer typing activity in `bottom_pane/mod.rs`.
- Delay new approval overlays for 1 second while the composer is active,
keeping delayed requests queued until the user is idle (idle gate
sketched after this list).
- Preserve the existing active-overlay behavior so approvals that arrive
while an approval modal is already open are still queued into that
overlay.
- Prune delayed approvals when app-server resolution says the request
has already been handled.
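
A minimal sketch of the idle gate, with illustrative names (the real tracking state lives in `bottom_pane/mod.rs` and is wired into the overlay queue):

```rust
use std::time::{Duration, Instant};

/// Illustrative composer-activity tracker; not the actual bottom_pane types.
struct ComposerActivity {
    last_keystroke: Option<Instant>,
    delay: Duration,
}

impl ComposerActivity {
    fn new() -> Self {
        Self {
            last_keystroke: None,
            delay: Duration::from_secs(1),
        }
    }

    /// Record a keystroke typed into the composer.
    fn on_keystroke(&mut self, now: Instant) {
        self.last_keystroke = Some(now);
    }

    /// A new approval overlay may open only once the composer has been idle
    /// for the full delay window; otherwise the request stays queued.
    fn can_show_approval(&self, now: Instant) -> bool {
        match self.last_keystroke {
            None => true,
            Some(last) => now.duration_since(last) >= self.delay,
        }
    }
}
```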

## Verification

Added unit coverage for immediate approvals, delayed approvals, idle
deadline reset, typed shortcut letters staying in the composer, shortcut
handling after the delay, and resolved delayed-request pruning.

Focused `codex-tui` test groups pass locally. The full `cargo test -p
codex-tui` run currently aborts in
`app::tests::attach_live_thread_for_selection_rejects_unmaterialized_fallback_threads`;
that same test also fails when run alone with the same stack overflow.

Manual reviewer check:

1. Start the TUI from the repo root:

   ```bash
   RUST_LOG=trace just codex \
     -c log_dir=<temp-log-dir> \
     --ask-for-approval untrusted \
     --sandbox workspace-write
   ```

2. Submit this prompt:

   ```text
   create a file text.txt on my desktop
   ```

3. While the agent is preparing the approval request, immediately type
text such as `ya this should stay in the composer`.
4. Confirm the typed-ahead `y`/`a` remains in the composer instead of
approving the request.
5. Stop typing for about 1 second; the approval modal should then
appear.
6. Once the modal is visible, press `y` and confirm the approval
shortcut works normally.
2026-04-27 13:20:55 -07:00
Eric Traut
850f035b8c Fix filtered thread-list resume regression in TUI (#19591)
## Why

`codex resume` regressed after
[#18502](https://github.com/openai/codex/pull/18502) changed the default
`thread/list` scan-and-repair path for metadata-filtered listings. The
TUI resume picker uses `thread/list` with source/provider/cwd filters
and `useStateDbOnly: false`, which is the intended
correctness-preserving mode: it should still consult the filesystem so
healthy, missing, or stale SQLite state can be repaired.

The regression was that #18502 made that filtered, filesystem-backed
path call `reconcile_rollout` for every filesystem hit, and then call it
again for each SQLite hit. When `reconcile_rollout` does not already
have extracted rollout items, it falls back to loading the full JSONL
rollout. That changed the resume picker’s first page from a cheap
rollout-head scan plus SQLite read-repair into full-file reads for large
sessions, so a few long threads could dominate TUI startup/resume
latency.

This change addresses the regression by keeping `useStateDbOnly: false`
on the correctness-preserving path while avoiding unnecessary full JSONL
reads for rows the filesystem scan has already validated.
Source/provider/cwd filters can be decided from rollout-head metadata,
so non-search resume listings only need the lightweight read-repair path
for filesystem hits. Full reconciliation is still used for DB-only
filtered rows because those can be stale false positives, and for search
listings because search can depend on title metadata that may require
scanning the full rollout.

This fixes #19483.

## What changed

- For non-search filtered listings, repair filesystem hits with the
lightweight `read_repair_rollout_path` path instead of full
`reconcile_rollout`.
- Track thread IDs proven by the filesystem scan and only fully
reconcile SQLite-filtered hits that the filesystem scan did not return,
preserving stale-DB false-positive cleanup without full-reading every
healthy rollout (decision logic sketched after this list).
- Leave search listings on full reconciliation, since search depends on
full title metadata rather than only source/provider/cwd metadata from
the rollout head.
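
A minimal sketch of the repair-path decision, assuming the listing code already knows whether the filesystem scan returned a row; `read_repair_rollout_path` and `reconcile_rollout` are the helpers named above, while the dispatch function itself is illustrative:

```rust
/// Which repair path a thread-list row should take, per the rules above.
enum RepairPath {
    /// Cheap rollout-head read-repair: the filesystem scan already validated
    /// this row, and source/provider/cwd filters were decided from its head.
    Lightweight,
    /// Full JSONL reconciliation: the row may be a stale DB false positive,
    /// or search needs title metadata that can require the full rollout.
    FullReconcile,
}

fn choose_repair_path(is_search_listing: bool, proven_by_filesystem_scan: bool) -> RepairPath {
    if is_search_listing || !proven_by_filesystem_scan {
        RepairPath::FullReconcile
    } else {
        RepairPath::Lightweight
    }
}
```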

## Verification

- `cargo test -p codex-rollout list_threads`
- `cargo test -p codex-app-server thread_list`
2026-04-27 13:02:39 -07:00
Curtis 'Fjord' Hawthorne
277186ec85 Cap original-detail image token estimates (#19865)
Clamp original-detail image patch estimates to the current 10k patch
budget so large images cannot inflate local context accounting without
bound. Add regression coverage for an over-budget image.
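
A minimal sketch of the clamp, using the 10k patch budget quoted above; the constant and function names are illustrative:

```rust
/// Patch budget used to bound original-detail image estimates, per the
/// description above.
const ORIGINAL_DETAIL_PATCH_BUDGET: u64 = 10_000;

/// Clamp a raw per-image patch estimate so one large image cannot inflate
/// local context accounting without bound.
fn clamp_image_patch_estimate(raw_patch_estimate: u64) -> u64 {
    raw_patch_estimate.min(ORIGINAL_DETAIL_PATCH_BUDGET)
}
```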

Fixes openai/codex#19806.
2026-04-27 12:39:24 -07:00
rhan-oai
215d5a8f7c [codex-analytics] remove ga flag (#19863) 2026-04-27 19:29:19 +00:00
sayan-oai
85c1500569 fix: filter dynamic deferred tools from model_visible_specs (#19771)
fixes #19486

### Problem
Right now dynamic deferred tools are filtered at normal-turn prompt
building time, rather than upstream while building the `ToolRouter`
itself. This causes issues because dynamic deferred tools are then
wrongly included in the router's `model_visible_specs`, which is what
the compaction request-building flow relies on.

### Fix
Move the dynamic deferred tool filtering to `ToolRouter` creation time
to solve this problem for every request that relies on `ToolRouter` for
`model_visible_specs`, which solves the issue generically.
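
A minimal sketch of construction-time filtering, with stand-in types (the real `ToolRouter` and its spec type are richer):

```rust
/// Illustrative stand-in for a tool spec; only the field the filter needs.
#[derive(Clone)]
struct ToolSpec {
    name: String,
    is_dynamic_deferred: bool,
}

struct ToolRouter {
    /// Filtered once at construction, so every consumer of
    /// model_visible_specs (normal turns and compaction alike) sees the
    /// same view.
    model_visible_specs: Vec<ToolSpec>,
}

impl ToolRouter {
    fn new(all_specs: Vec<ToolSpec>) -> Self {
        let model_visible_specs = all_specs
            .into_iter()
            .filter(|spec| !spec.is_dynamic_deferred)
            .collect();
        Self { model_visible_specs }
    }
}
```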

### Tests
Added unit + integration tests to ensure dynamic deferred tools are
omitted from `model_visible_specs` and compaction request respectively.

Tested against live `/compact` endpoint; raw deferred dynamic tools
without `tool_search` returned `400` (current bug), while the filtered
payload (this fix) returns `200`.
2026-04-27 19:09:02 +00:00
pakrym-oai
e5709db6dc Streamline account and command handlers (#19491)
## Why

Account login/logout and command exec handlers were doing local error
sends in the middle of each handler. That made these request flows
branch heavily even though most of the logic is validate, perform the
operation, and return the response.

## What Changed

- Converted ChatGPT/API-key login, login cancel, logout, rate-limit, and
add-credit handlers in
`codex-rs/app-server/src/codex_message_processor.rs` to compute `Result`
values and send them once at the request boundary.
- Applied the same shape to command exec start/write/resize/terminate
handlers.
- Kept side-effect notifications in the same places after successful
request handling.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::account --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::command_exec --
--test-threads=1`
2026-04-27 12:03:49 -07:00
efrazer-oai
2009f6e894 refactor: make auth loading async (#19762)
## Summary

Auth loading used to expose synchronous construction helpers in several
places even though some auth sources now need async work. This PR makes
the auth-loading surface async and updates the callers to await it.

This is intentionally only plumbing. It does not change how
AgentIdentity tokens are decoded, how task runtime ids are allocated, or
how JWT signatures are verified.

## Stack

1. **This PR:** [refactor: make auth loading
async](https://github.com/openai/codex/pull/19762)
2. [refactor: load AgentIdentity runtime
eagerly](https://github.com/openai/codex/pull/19763)
3. [feat: verify AgentIdentity JWTs with
JWKS](https://github.com/openai/codex/pull/19764)

## Important call sites

| Area | Change |
| --- | --- |
| `codex-login` auth loading | `CodexAuth` and `AuthManager`
construction paths now await auth loading. |
| app-server startup | Auth manager construction is awaited during
initialization. |
| CLI/TUI/exec/MCP/chatgpt callers | Existing auth-loading calls now
await the same behavior. |
| cloud requirements storage loader | The loader becomes async so it can
share the same auth construction path. |
| auth tests | Tests that load auth now run in async contexts. |

## Testing

Tests: targeted Rust auth test compilation, formatter, scoped Clippy
fix, and Bazel lock check.
2026-04-27 11:00:27 -07:00
pakrym-oai
4ed22fc7d2 Streamline plugin, apps, and skills handlers (#19490)
## Why

The plugin, app, and skills handlers had a lot of repeated
`send_error`/`return` branches that made the success path hard to scan.
This slice keeps behavior the same while moving fallible steps into
local response-producing helpers, so the request boundary can send one
result.

## What Changed

- Converted plugin list/install/uninstall handlers in
`codex-rs/app-server/src/codex_message_processor/plugins.rs` to return
`Result<*Response, JSONRPCErrorError>` from helper methods and call
`send_result` once.
- Added local error-mapping helpers for plugin install/uninstall and
marketplace failures.
- Applied the same mechanical shape to app list, skills list/config, and
marketplace add/remove/upgrade handlers in
`codex-rs/app-server/src/codex_message_processor.rs`.

## Verification

- `cargo check -p codex-app-server`
- `cargo test -p codex-app-server --test all v2::app_list --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::plugin_ --
--test-threads=1`
- `cargo test -p codex-app-server --test all v2::skills_list --
--test-threads=1`
2026-04-27 10:18:25 -07:00
Eric Traut
48dd7b58f0 Render delegated patch approval details (#19709)
## Why

Fixes #19632.

When a delegated agent requests approval for an in-progress file change,
the parent TUI handles that request from an inactive thread. The app
server already sent the `FileChange` item with the proposed diff, but
the inactive-thread approval path was not recovering and rendering it
the same way as the active-thread path.

The result was an inconsistent approval prompt: main-thread edits show a
normal patch preview history item before the approval modal, while
delegated edits did not show that preview in the transcript flow.

## What Changed

- Recover buffered or historical `FileChange` item changes when building
inactive-thread file-change approval requests.
- Reuse the app-server file-change conversion helper for both live
transcript rendering and inactive-thread approvals.
- Render recovered delegated patches as a normal patch preview history
cell before the approval modal.
- Keep apply-patch approval modals focused on the decision prompt and
optional metadata; they do not render a synthetic command line or embed
the diff body.

## Manual Repro And Verification

I manually reproduced the issue using a file under `~/Desktop` so the
write would require approval.

Before the fix:

1. Ask the main thread: `Use apply_patch, not shell redirection or
Python, to create ~/Desktop/bug1.txt with three short lines.`
2. Observe the expected TUI shape: the transcript shows a normal patch
preview such as `• Added ~/Desktop/bug1.txt (+N -0)` above the approval
modal, and the modal contains only the approval prompt/options without a
synthetic command line.
3. Ask for the delegated path: `Spawn a worker. Have it use apply_patch,
not shell redirection or Python, to create ~/Desktop/bug1.txt with four
short lines.`
4. Observe the delegated approval is inconsistent: the parent view does
not render the proposed patch as the normal transcript preview before
the modal, so the diff context is missing from the stream or appears
inside the modal instead of in the history flow.

After the fix:

1. Repeat the delegated worker prompt with `apply_patch`.
2. Confirm the parent view renders the same normal patch preview history
cell (`• Added ~/Desktop/bug1.txt (+N -0)` plus the diff) immediately
before the approval modal.
3. Confirm the approval modal remains focused on the decision prompt.
For delegated approvals it may show the worker thread label, but it
should not show a `$ apply_patch` command line or embed the diff body in
the modal.
2026-04-27 10:07:15 -07:00
Eric Traut
0e2300c02c Persist shell mode commands in prompt history (#19618)
## Why

`!` shell commands are currently surfaced as "Bash mode", which is
misleading for users running shells such as PowerShell or zsh. Those
commands also bypass the persistent prompt history path, so they cannot
be recalled after starting a new session.

Fixes #19613.

## What changed

- Rename the TUI footer label and related test wording from "Bash mode"
to "Shell mode".
- Persist accepted `!` shell commands to prompt history with the leading
`!`, so recall restores the composer into shell mode across sessions.
- Add coverage for immediate and queued shell-command submissions
emitting the prompt-history update.

## Verification

- `cargo test -p codex-tui bang_shell`
- `cargo test -p codex-tui shell_command_uses_shell_accent_style`
- `cargo test -p codex-tui footer_mode_snapshots`
- `cargo insta pending-snapshots --manifest-path tui/Cargo.toml`

Manually verified the fix after confirming the bug was present beforehand.
2026-04-27 09:54:25 -07:00
Eric Traut
6c51bf0c7c Hide rewind preview when no user message exists (#19510)
## Why

Fixes #19508.

In a fresh TUI session, pressing `Esc` twice entered the rewind
transcript overlay even though there was no user message to rewind to.
That produced an empty header-only transcript view and exposed a rewind
flow that could not select a valid target.

## What changed

The backtrack flow now checks whether a user-message rewind target
exists before opening the transcript preview. If no target exists, Codex
stays in the main TUI and shows `No previous message to edit.` instead
of opening an empty overlay.

The same guard applies when starting rewind preview from the transcript
overlay, and the first `Esc` no longer advertises the “edit previous
message” hint when there is no previous message available.

Snapshot coverage was added for the unavailable rewind info message,
along with a small target-detection test.
2026-04-27 09:51:12 -07:00
jif-oai
bb83eec825 chore: split memories part 1 (#19818)
Extract memories into 2 different crates
2026-04-27 16:01:05 +02:00
jif-oai
f431ec12c9 nit: one more fix (#19813)
Fix this:
https://github.com/openai/codex/pull/19812#discussion_r3147529230
2026-04-27 15:32:31 +02:00
jif-oai
79b4f691a6 Avoid rewriting Phase 2 selection on clean workspace (#19812)
## Why

Phase 2 can now claim the global consolidation lock on startup even when
the git-backed memory workspace is already clean. The clean-workspace
path still finalized through the normal Phase 2 success path, which
clears and re-marks `selected_for_phase2` rows. That made no-op startups
perform avoidable writes to `stage1_outputs`, creating unnecessary DB
I/O and contention when no memory files changed.

## What Changed

- Added a preserving-selection Phase 2 finalizer in `codex-state` that
only marks the global job row as succeeded.
- Kept the existing `mark_global_phase2_job_succeeded` behavior for real
consolidation runs, where the selected Phase 2 snapshot must be
rewritten.
- Switched the `succeeded_no_workspace_changes` branch in
`core/src/memories/phase2.rs` to use the preserving-selection finalizer
(see the sketch after this list).
- Added a regression test that installs a SQLite trigger on
`stage1_outputs` and verifies the clean finalizer performs zero updates
there.
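
A minimal sketch of that split; `mark_global_phase2_job_succeeded` is named above, while the preserving-selection finalizer's name and the store trait are illustrative:

```rust
/// Illustrative finalizer surface; the real one lives in codex-state.
trait Phase2Store {
    /// Real consolidation runs: clears and re-marks selected_for_phase2 rows.
    fn mark_global_phase2_job_succeeded(&mut self);
    /// Clean-workspace no-ops: only flips the global job row to succeeded.
    fn mark_global_phase2_job_succeeded_preserving_selection(&mut self);
}

fn finalize_phase2(store: &mut dyn Phase2Store, workspace_had_changes: bool) {
    if workspace_had_changes {
        store.mark_global_phase2_job_succeeded();
    } else {
        // No memory files changed, so avoid rewriting stage1_outputs rows.
        store.mark_global_phase2_job_succeeded_preserving_selection();
    }
}
```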

## Testing

- `cargo test -p codex-state`
- `cargo test -p codex-core memories::tests::phase2`
2026-04-27 15:14:16 +02:00
jif-oai
5d314f324c Allow Phase 2 memory claims after retry exhaustion (#19809)
## Why

The Phase 2 memories job row is only the global lock for the git-backed
memory workspace. Manual memory edits do not enqueue new Stage 1 work,
so a Phase 2 row with `retry_remaining = 0` could be skipped before the
worker ever claimed the lock and generated `phase2_workspace_diff.md`.

That left workspace-only changes unconsolidated after repeated failures,
even when retry backoff had elapsed and the filesystem had real diffable
work.

## What Changed

- Allow `try_claim_global_phase2_job` to claim the Phase 2 lock after
the retry budget is exhausted, while still respecting active `retry_at`
backoff and fresh running leases (claim rule sketched after this list).
- Treat `SkippedRetryUnavailable` for Phase 2 as backoff-only, and
update the outcome docs to match.
- Clamp Phase 2 retry bookkeeping at zero when failed attempts are
recorded.
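
A minimal sketch of the claim rule, with illustrative row fields:

```rust
use std::time::Instant;

/// Illustrative Phase 2 job row; field names mirror the description above.
struct Phase2JobRow {
    retry_remaining: u32,
    retry_at: Option<Instant>,
    lease_fresh: bool,
}

fn can_claim_phase2_lock(row: &Phase2JobRow, now: Instant) -> bool {
    if row.lease_fresh {
        return false; // another worker still holds a live running lease
    }
    if let Some(retry_at) = row.retry_at {
        if now < retry_at {
            return false; // retry backoff has not elapsed yet
        }
    }
    // An exhausted retry budget (retry_remaining == 0) is deliberately no
    // longer a blocker, so workspace-only changes can still be consolidated.
    let _ = row.retry_remaining;
    true
}
```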

## Verification

- Added
`phase2_global_lock_can_be_claimed_after_retry_budget_is_exhausted` to
cover the exhausted-budget lock claim path.
- Ran `cargo test -p codex-state`.
2026-04-27 14:58:11 +02:00
jif-oai
01ab25dbb5 feat: use git-backed workspace diffs for memory consolidation (#18982)
## Why

This PR makes the `morpheus` agent (memory phase 2) use a git diff to
start its consolidation. The workflow is the following:
1. The agent acquires a lock
2. If `.codex/memories` does not exist or is not a git root, initialize
everything (and make a first empty commit)
3. Update `raw_memories.md` and `rollout_summaries/` as before.
Basically we select at most N phase 1 memories based on a given policy
4. We use git (`gix`) to get a diff between the current state of
`.codex/memories` and the last commit.
5. Dump the diff into `phase2_workspace_diff.md`
6. Spawn `morpheus` and point it at `phase2_workspace_diff.md`
7. Wait for `morpheus` to be done
8. Re-create a fresh `.git` and make a single commit on it. We do this
because we don't want to preserve history through `.git`, and this is
cheap anyway
9. Release the lock
On top of this, we keep the existing retry policies.

The goals of this new workflow are:
* Better support for memory extensions such as `chronicle`
* Allow the user to manually edit memories, and have those edits
considered by the phase 2 agent

As a follow-up we will need to add support for user edits while
`morpheus` is running.

## What Changed

- Added memory workspace helpers that prepare the git baseline, compute
the diff, write `phase2_workspace_diff.md`, and reset the baseline after
successful consolidation.
- Updated Phase 2 to sync current inputs into `raw_memories.md` and
`rollout_summaries/`, prune old extension resources, skip clean
workspaces, and run the consolidation subagent only when the workspace
has changes.
- Tightened Phase 2 job ownership around long-running consolidation with
heartbeats and an ownership check before resetting the baseline.
- Simplified the prompt and state APIs so DB watermarks are bookkeeping,
while workspace dirtiness decides whether consolidation work exists.
- Updated the memory pipeline README and tests for workspace diffs,
extension-resource cleanup, pollution-driven forgetting, selection
ranking, and baseline persistence.

## Verification

- Added/updated coverage in `core/src/memories/tests.rs`,
`core/src/memories/workspace_tests.rs`, `state/src/runtime/memories.rs`,
and `core/tests/suite/memories.rs`.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-27 14:32:44 +02:00
jif-oai
f8c527e529 multi_agent_v2: move thread cap into feature config (#19792)
## Why

`features.multi_agent_v2.max_concurrent_threads_per_session` is meant to
be the MultiAgentV2-specific session thread cap: it counts the root
thread and all open subagent threads. The previous implementation kept
this surface tied to `agents.max_threads`, which made it a global
subagent-only cap and allowed the legacy setting to coexist with
MultiAgentV2.

## What Changed

- Added `max_concurrent_threads_per_session` to
`[features.multi_agent_v2]` with default `4`.
- Removed the `[agents] max_concurrent_threads_per_session` alias to
`agents.max_threads`.
- When MultiAgentV2 is enabled, reject `agents.max_threads` and derive
the existing internal subagent slot limit as
`max_concurrent_threads_per_session - 1`.
- Regenerated `core/config.schema.json` and added coverage for the new
config semantics.

## Result
```
➜  codex git:(jif/clean-multi-agent-v2-config) codex -c features.multi_agent_v2.enabled=true -c features.multi_agent_v2.max_concurrent_threads_per_session=3
╭────────────────────────────────────────────────────╮
│ >_ OpenAI Codex (v0.0.0)                           │
│                                                    │
│ model:     gpt-5.5 xhigh   fast   /model to change │
│ directory: ~/code/codex                            │
╰────────────────────────────────────────────────────╯

  Tip: Update Required - This version will no longer be supported starting May 8th. Please upgrade to the latest version (https://github.com/openai/codex/releases/latest) using your preferred package manager.

› Can you try to spawn 4 agents


• I’ll try to start four lightweight agents at once and report exactly what the runtime accepts.

• Spawned Russell [no-apps] (gpt-5.5 xhigh)
  └ Spawn probe 1: reply briefly that you started, then wait for further instructions. Do not do any repo work.

• Spawned Descartes [no-apps] (gpt-5.5 xhigh)
  └ Spawn probe 2: reply briefly that you started, then wait for further instructions. Do not do any repo work.

• Agent spawn failed
  └ Spawn probe 3: reply briefly that you started, then wait for further instructions. Do not do any repo work.

• Agent spawn failed
  └ Spawn probe 4: reply briefly that you started, then wait for further instructions. Do not do any repo work.

• The runtime accepted the first two and rejected the next two with agent thread limit reached. I’m checking whether the two accepted probes have returned cleanly, then I’ll close them if needed.
```

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-27 13:31:56 +02:00
Michael Bolin
a6ca39c630 permissions: derive legacy exec policies at boundaries (#19737)
## Why

After config and requirements store canonical profiles, exec requests
should not cache a derived `SandboxPolicy`. The cached legacy value can
drift from the richer profile state, and most execution paths already
have the filesystem and network runtime policies they need.

## What Changed

- Removes `sandbox_policy` from `codex_sandboxing::SandboxExecRequest`
and `codex_core::sandboxing::ExecRequest`.
- Adds an on-demand `ExecRequest::compatibility_sandbox_policy()` helper
for the Windows and legacy call sites that still need a `SandboxPolicy`
projection.
- Updates Windows filesystem override setup and unified exec policy
serialization to derive that compatibility policy at the boundary.
- Updates Unix escalation reruns and direct shell requests to
reconstruct exec requests from `PermissionProfile` plus runtime
filesystem/network policy, without carrying a cached legacy policy.
- Adjusts sandboxing manager tests to assert the effective profile
rather than the removed legacy field.

## Verification

- `cargo check -p codex-config -p codex-core -p codex-sandboxing -p
codex-app-server -p codex-cli -p codex-tui`
- `cargo test -p codex-sandboxing manager`
- `cargo test -p codex-core
exec_server_params_use_env_policy_overlay_contract`
- `cargo test -p codex-core unix_escalation`
- `cargo test -p codex-core exec::tests`
- `cargo test -p codex-core sandboxing::tests`
2026-04-26 22:11:49 -07:00
Michael Bolin
523e4aa8e3 permissions: constrain requirements as profiles (#19736) 2026-04-26 21:49:30 -07:00
Michael Bolin
0ccd659b4b permissions: store only constrained permission profiles (#19735) 2026-04-26 20:59:58 -07:00
Won Park
8033b6a449 Add /auto-review-denials retry approval flow (#19058)
## Why

Auto-review can deny an action that the user later decides they want to
retry. Today there is no TUI surface for selecting a recent denial and
sending explicit approval context back into the session, so users have
to restate intent manually and the retry can be reviewed without the
original denied action context.

This adds a narrow TUI-driven path for approving a recent denied action
while still keeping the retry inside the normal auto-review flow.

## What Changed

- Added `/auto-review-denials` to open a picker of recent denied
auto-review actions.
- Added a small in-memory TUI store for the 10 most recent denied
auto-review events.
- Selecting a denial sends the structured denied event back through the
existing core/app-server op path.
- Core now injects a developer message containing the approved action
JSON rather than the full assessment event.
- Auto-review transcript collection now preserves this specific approval
developer message so follow-up review sessions can see the user approval
context.
- Added TUI snapshot/unit coverage for the picker and approval dispatch
path.
- Added core coverage for retaining the approval developer message in
the auto-review transcript.

## Verification

- `cargo test -p codex-core
collect_guardian_transcript_entries_keeps_manual_approval_developer_message`
- `cargo test -p codex-tui auto_review_denials`
- `cargo test -p codex-tui
approving_recent_denial_emits_structured_core_op_once`

## Notes

This intentionally keeps retries going through auto-review. The approval
signal is context for the exact previously denied action, not a blanket
bypass for similar future actions.
2026-04-27 03:43:53 +00:00
Michael Bolin
0d8cdc0510 permissions: centralize legacy sandbox projection (#19734)
## Why

The remaining migration work still needs `SandboxPolicy` at a few
compatibility boundaries, but those projections should come from one
canonical path. Keeping ad hoc legacy projections scattered through
app-server, CLI, and config code makes it easy for behavior to drift as
`PermissionProfile` gains fidelity that the legacy enum cannot
represent.

## What Changed

- Adds `Permissions::legacy_sandbox_policy(cwd)` and
`Config::legacy_sandbox_policy()` as the compatibility projection from
the canonical `PermissionProfile`.
- Adds `Permissions::can_set_legacy_sandbox_policy()` so legacy inputs
are checked after they are converted into profile semantics.
- Updates app-server command handling, Windows sandbox setup, session
configuration, and sandbox summaries to use the centralized projection
helper.
- Leaves `SandboxPolicy` in place only for boundary inputs/outputs that
still speak the legacy abstraction.

## Verification

- `cargo check -p codex-config -p codex-core -p codex-sandboxing -p
codex-app-server -p codex-cli -p codex-tui`
- `cargo test -p codex-tui
permissions_selection_history_snapshot_full_access_to_default --
--nocapture`
- `cargo test -p codex-tui
permissions_selection_sends_approvals_reviewer_in_override_turn_context
-- --nocapture`
- `bazel test //codex-rs/tui:tui-unit-tests-bin
--test_arg=permissions_selection_history_snapshot_full_access_to_default
--test_output=errors`
- `bazel test //codex-rs/tui:tui-unit-tests-bin
--test_arg=permissions_selection_sends_approvals_reviewer_in_override_turn_context
--test_output=errors`


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19734).
* #19737
* #19736
* #19735
* __->__ #19734
2026-04-26 20:31:23 -07:00
Abhinav
c3e60849e5 inline hostname resolution for remote sandbox config (#19739)
# Why

Requirements support host-specific
`remote_sandbox_config.hostname_patterns`, but config loading previously
resolved and passed the system hostname through every config-loading
path even when no requirements layer used `remote_sandbox_config`. On
machines where hostname lookup is slow, startup and app-server config
reads paid for a feature that was not active.

We only need the hostname when a requirements layer actually declares
`remote_sandbox_config`, so this moves hostname resolution to the single
requirements merge point and keeps all other config callers unaware of
hostname matching.
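
A minimal sketch of the deferred lookup; `merge_requirements_with_remote_sandbox_config` and `hostname_patterns` are named below, while the signature and the lookup stand-in are illustrative:

```rust
/// Illustrative shape of a requirements layer's remote sandbox section.
struct RemoteSandboxConfig {
    hostname_patterns: Vec<String>,
}

fn merge_requirements_with_remote_sandbox_config(
    remote_sandbox_config: Option<&RemoteSandboxConfig>,
) {
    // Only pay for the (potentially slow) hostname lookup when a layer
    // actually declares remote_sandbox_config.
    let host_name: Option<String> = remote_sandbox_config.map(|_| resolve_hostname());
    // ...match host_name against hostname_patterns and merge the layer...
    let _ = host_name;
}

fn resolve_hostname() -> String {
    // Stand-in for the real system hostname lookup.
    std::env::var("HOSTNAME").unwrap_or_else(|_| "unknown-host".to_string())
}
```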

# What

- Removed the eager `host_name` plumbing from
`load_config_layers_state`, `load_requirements_toml`, `ConfigBuilder`,
app-server `ConfigManager`, network proxy loading, and related call
sites.
- Resolve the hostname inside
`merge_requirements_with_remote_sandbox_config` only when the incoming
requirements contain `remote_sandbox_config`.
2026-04-27 03:18:57 +00:00
Michael Bolin
ad57a3fee2 permissions: finish profile-backed app surfaces (#19395) 2026-04-26 19:42:39 -07:00
Andrey Mishchenko
1f304dd1f2 Allow agents.max_threads to work with multi_agent_v2 (#19733) 2026-04-26 17:56:05 -07:00
Michael Bolin
2cb8746457 permissions: remove core legacy policy round trips (#19394)
## Why

Several execution paths still converted profile-backed permissions into
`SandboxPolicy` and then rebuilt runtime permissions from that legacy
shape. Those round trips are unnecessary after the preceding PRs and can
lose split filesystem semantics. Core approval and escalation should
carry the resolved profile directly.

## What Changed

- Removes `sandbox_policy` from `ResolvedPermissionProfile`; the
resolved permission object now carries the canonical `PermissionProfile`
directly.
- Updates exec-policy fallback, shell/unified-exec interception,
escalation reruns, and related tests to pass profiles instead of legacy
policies.
- Removes legacy additional-permission merge helpers that built an
effective `SandboxPolicy` before rebuilding runtime permissions.
- Keeps legacy projections only at compatibility boundaries that still
require `SandboxPolicy`, not in core permission computation.

## Verification

- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19394).
* #19737
* #19736
* #19735
* #19734
* #19395
* __->__ #19394
2026-04-26 17:43:32 -07:00
Andrey Mishchenko
35bc6e3d01 Delete unused ResponseItem::Message.end_turn (#19605)
This field is unused. Delete it.
2026-04-26 17:18:09 -07:00
Ahmed Ibrahim
0bda8161a2 Split MCP connection modules (#19725)
## Why

The MCP connection manager module had grown to mix orchestration, RMCP
client startup, elicitation handling, Codex Apps cache and naming
behavior, tool qualification and filtering, and runtime data. The
previous stacked PRs split these responsibilities incrementally; this PR
collapses that work into one self-contained refactor on latest main.

## What changed

- Move McpConnectionManager into connection_manager.rs.
- Move RMCP client lifecycle, startup, and uncached tool listing into
rmcp_client.rs.
- Move elicitation request tracking and policy handling into
elicitation.rs.
- Move Codex Apps cache, key, filtering, and naming helpers into
codex_apps.rs.
- Rename the tool-name helper module to tools.rs and move ToolInfo, tool
filtering, schema masking, and qualification there.
- Move runtime and sandbox shared types into runtime.rs.
- Preserve latest main PermissionProfile-based MCP elicitation
auto-approval behavior.

## Verification

- just fmt
- cargo check -p codex-mcp
- cargo check -p codex-mcp --tests
- cargo check -p codex-core

---------

Co-authored-by: Codex <noreply@openai.com>
2026-04-26 23:23:34 +00:00
Michael Bolin
4c58e64f08 test: increase core-all-test shard count to 16 (#19727)
## Summary

Increase `core-all-test`'s Bazel shard count from `8` to `16`.

## Why

[#19609](https://github.com/openai/codex/pull/19609) restored
`bazel.yml` to a 30-minute timeout and increased `app-server-all-test`'s
shard count because the bigger timeout risk was not just a cold Windows
build. The more common problem was a long `rust_test()` shard failing
and getting retried multiple times.

Recent `main` runs show that `//codex-rs/core:core-all-test` still has
the same shape of problem on Windows:

- [Run
24943931330](https://github.com/openai/codex/actions/runs/24943931330)
reported `//codex-rs/core:core-all-test` as flaky after first-attempt
failures in shard `5/8` and shard `8/8`.
- Those retries were driven by
`suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_config_override`
and
`suite::pending_input::steered_user_input_waits_when_tool_output_triggers_compact_before_next_request`.
- The failed shard attempts in that run took `272.61s` and `259.27s`
before retrying, which is exactly the sort of wall-clock cost that burns
through the 30-minute budget.
- [Run
24966332583](https://github.com/openai/codex/actions/runs/24966332583)
also retried `//codex-rs/tui:tui-unit-tests` after
`app::tests::update_memory_settings_updates_current_thread_memory_mode`
failed once on Windows.
- [Run
24965527138](https://github.com/openai/codex/actions/runs/24965527138)
and its linked [BuildBuddy
invocation](https://app.buildbuddy.io/invocation/ac1a8265-06fa-4da5-9552-4715b7965bce)
show the other half of the problem: when Windows cache reuse is weak,
the `bazel test //...` step can already consume `24m11s` on its own,
leaving very little headroom for flaky retries.

Increasing `core-all-test` to `16` shards does not fix the flaky tests,
but it does reduce the wall-clock cost when a single shard has to be
retried. That matches the mitigation we already applied to
`app-server-all-test` in `#19609`.

## What Changed

- Update `codex-rs/core/BUILD.bazel` so `core-all-test` uses `16` shards
instead of `8`.
- Leave `core-unit-tests` unchanged.

## Follow-up Work

This change is meant to buy back CI headroom while we fix the flaky
tests themselves in subsequent commits. The recent Windows retries that
look worth addressing directly include:

-
`suite::cli_stream::responses_mode_stream_cli_supports_openai_base_url_config_override`
-
`suite::pending_input::steered_user_input_waits_when_tool_output_triggers_compact_before_next_request`
-
`app::tests::update_memory_settings_updates_current_thread_memory_mode`

## Verification

- Compared `core-all-test`'s current sharding against the
`app-server-all-test` precedent in
[#19609](https://github.com/openai/codex/pull/19609).
- Inspected recent `main` Bazel workflow logs and the linked BuildBuddy
invocation to confirm that Windows retries on long shards are still
consuming a meaningful fraction of the 30-minute timeout budget.
- Did not run local tests for this change because it only adjusts Bazel
sharding metadata.
2026-04-26 23:10:26 +00:00
pakrym-oai
ba159cbc79 Fix codex-core config test type paths (#19726)
Summary:
- Update config tests to reference config requirement types from
codex_config after the loader split.

Tests:
- just fmt
- cargo build -p codex-core --tests
- cargo clippy -p codex-core --tests -- -D warnings
2026-04-26 15:58:17 -07:00
Michael Bolin
dda8199b73 permissions: migrate approval and sandbox consumers to profiles (#19393)
## Why

Runtime decisions should not infer permissions from the lossy legacy
sandbox projection once `PermissionProfile` is available. In particular,
`Disabled` and `External` need to remain distinct, and managed profiles
with split filesystem or deny-read rules should not be collapsed before
approval, network, safety, or analytics code makes decisions.

## What Changed

- Changes managed network proxy setup and network approval logic to use
`PermissionProfile` when deciding whether a managed sandbox is active.
- Migrates patch safety, Guardian/user-shell approval paths, Landlock
helper setup, analytics sandbox classification, and selected
turn/session code to profile-backed permissions.
- Validates command-level profile overrides against the constrained
`PermissionProfile` rather than a strict `SandboxPolicy` round trip.
- Preserves configured deny-read restrictions when command profiles are
narrowed.
- Adds coverage for profile-backed trust, network proxy/approval
behavior, patch safety, analytics classification, and command-profile
narrowing.

## Verification

- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19393).
* #19395
* #19394
* __->__ #19393
2026-04-26 15:30:40 -07:00
pakrym-oai
9c3abcd46c [codex] Move config loading into codex-config (#19487)
## Why

Config loading had become split across crates: `codex-config` owned the
config types and merge logic, while `codex-core` still owned the loader
that assembled the layer stack. This change consolidates that
responsibility in `codex-config`, so the crate that defines config
behavior also owns how configs are discovered and loaded.

To make that move possible without reintroducing the old dependency
cycle, the shell-environment policy types and helpers that
`codex-exec-server` needs now live in `codex-protocol` instead of
flowing through `codex-config`.

This also makes the migrated loader tests more deterministic on machines
that already have managed or system Codex config installed by letting
tests override the system config and requirements paths instead of
reading the host's `/etc/codex`.

## What Changed

- moved the config loader implementation from `codex-core` into
`codex-config::loader` and deleted the old `core::config_loader` module
instead of leaving a compatibility shim
- moved shell-environment policy types and helpers into
`codex-protocol`, then updated `codex-exec-server` and other downstream
crates to import them from their new home
- updated downstream callers to use loader/config APIs from
`codex-config`
- added test-only loader overrides for system config and requirements
paths so loader-focused tests do not depend on host-managed config state
- cleaned up now-unused dependency entries and platform-specific cfgs
that were surfaced by post-push CI

## Testing

- `cargo test -p codex-config`
- `cargo test -p codex-core config_loader_tests::`
- `cargo test -p codex-protocol -p codex-exec-server -p
codex-cloud-requirements -p codex-rmcp-client --lib`
- `cargo test --lib -p codex-app-server-client -p codex-exec`
- `cargo test --no-run --lib -p codex-app-server`
- `cargo test -p codex-linux-sandbox --lib`
- `cargo shear`
- `just bazel-lock-check`

## Notes

- I did not chase unrelated full-suite failures outside the migrated
loader surface.
- `cargo test -p codex-core --lib` still hits unrelated proxy-sensitive
failures on this machine, and Windows CI still shows unrelated
long-running or timing-out test noise outside the loader migration itself.
2026-04-26 15:10:53 -07:00
pakrym-oai
2a020f1a0a Lift app-server JSON-RPC error handling to request boundary (#19484)
## Why

App-server request handling had a lot of repeated JSON-RPC error
construction and one-off `send_error`/`return` branches. This made small
handlers noisy and pushed error response details into leaf code that
otherwise only needed to validate input or call the underlying API.

## What Changed

- Added shared JSON-RPC error constructors in
`codex-rs/app-server/src/error_code.rs`.
- Lifted straightforward request result emission into
`codex-rs/app-server/src/message_processor.rs` so response/error
dispatch happens at the request boundary.
- Reused the result helpers across command exec, config, filesystem,
device-key, external-agent config, fs-watch, and outgoing-message paths.
- Removed leaf wrapper handlers where the method body was only
forwarding to a response helper.
- Returned request validation errors upward in the simple cases instead
of sending an error locally and immediately returning.

## Verification

- `cargo test -p codex-app-server --lib command_exec::tests`
- `cargo test -p codex-app-server --lib outgoing_message::tests`
- `cargo test -p codex-app-server --lib in_process::tests`
- `cargo test -p codex-app-server --test all v2::fs`
- `cargo test -p codex-app-server --test all v2::config_rpc`
- `cargo test -p codex-app-server --test all v2::external_agent_config`
- `cargo test -p codex-app-server --test all v2::initialize`
- `just fix -p codex-app-server`
- `git diff --check`

Note: full `cargo test -p codex-app-server` was attempted and stopped in
`message_processor::tracing_tests::turn_start_jsonrpc_span_parents_core_turn_spans`
with a stack overflow after unrelated tests had already passed.
2026-04-26 15:10:35 -07:00
Michael Bolin
deaa307fb2 permissions: derive compatibility policies from profiles (#19392)
## Why

After #19391, `PermissionProfile` and the split filesystem/network
policies could still be stored in parallel. That creates drift risk: a
profile can preserve deny globs, external enforcement, or split
filesystem entries while a cached projection silently loses those
details. This PR makes the profile the runtime source and derives
compatibility views from it.

## What Changed

- Removes stored filesystem/network sandbox projections from
`Permissions` and `SessionConfiguration`; their accessors now derive
from the canonical `PermissionProfile`.
- Derives legacy `SandboxPolicy` snapshots from profiles only where an
older API still needs that field.
- Updates MCP connection and elicitation state to track
`PermissionProfile` instead of `SandboxPolicy` for auto-approval
decisions.
- Adds semantic filesystem-policy comparison so cwd changes can preserve
richer profiles while still recognizing equivalent legacy projections
independent of entry ordering.
- Updates config/session tests to assert profile-derived projections
instead of parallel stored fields.

## Verification

- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19392).
* #19395
* #19394
* #19393
* __->__ #19392
2026-04-26 15:06:42 -07:00
Michael Bolin
4d7ce3447d permissions: make runtime config profile-backed (#19606)
## Why

This supersedes #19391. During stack repair, GitHub marked #19391 as
merged into a temporary stack branch rather than into `main`, so the
runtime-config change needed a fresh PR.

`PermissionProfile` is now the canonical permissions shape after #19231
because it can distinguish `Managed`, `Disabled`, and `External`
enforcement while also carrying filesystem rules that legacy
`SandboxPolicy` cannot represent cleanly. Core config and session state
still needed to accept profile-backed permissions without forcing every
profile through the strict legacy bridge, which rejected valid runtime
profiles such as direct write roots.

The unrelated CI/test hardening that previously rode along with this PR
has been split into #19683 so this PR stays focused on the permissions
model migration.

## What Changed

- Adds `Permissions.permission_profile` and
`SessionConfiguration.permission_profile` as constrained runtime state,
while keeping `sandbox_policy` as a legacy compatibility projection.
- Introduces profile setters that keep `PermissionProfile`, split
filesystem/network policies, and legacy `SandboxPolicy` projections
synchronized.
- Uses a compatibility projection for requirement checks and legacy
consumers instead of rejecting profiles that cannot round-trip through
`SandboxPolicy` exactly.
- Updates config loading, config overrides, session updates, turn
context plumbing, prompt permission text, sandbox tags, and exec request
construction to carry profile-backed runtime permissions.
- Preserves configured deny-read entries and `glob_scan_max_depth` when
command/session profiles are narrowed.
- Adds `PermissionProfile::read_only()` and
`PermissionProfile::workspace_write()` presets that match legacy
defaults.

## Verification

- `cargo test -p codex-core direct_write_roots`
- `cargo test -p codex-core runtime_roots_to_legacy_projection`
- `cargo test -p codex-app-server
requested_permissions_trust_project_uses_permission_profile_intent`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19606).
* #19395
* #19394
* #19393
* #19392
* __->__ #19606
2026-04-26 13:29:54 -07:00
efrazer-oai
fed0a8f4fa feat: load AgentIdentity from JWT login/env (#18904)
## Summary

This PR lets programmatic AgentIdentity users provide one token through
either stdin login or environment auth.

`codex login --with-agent-identity` reads an Agent Identity JWT from
stdin, validates that it has the required claims, and stores that token
as the `agent_identity` value in `auth.json`. The file format is
token-only; the decoded account and key fields are runtime state, not
hand-authored auth.json fields.

The Agent Identity JWT claim shape and decoder live in
`codex-agent-identity`; `codex-login` only owns env/storage precedence
and conversion into `CodexAuth::AgentIdentity`.

When env auth is enabled, `CODEX_AGENT_IDENTITY` can provide the same
JWT without writing auth state to disk. `CODEX_API_KEY` still wins if
both env vars are set.
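
A minimal sketch of that precedence, with an illustrative stand-in for the auth enum:

```rust
/// Illustrative stand-in for the env-provided auth variants described above.
enum EnvAuth {
    ApiKey(String),
    AgentIdentity(String),
}

/// CODEX_API_KEY wins over CODEX_AGENT_IDENTITY when both are set.
fn env_auth() -> Option<EnvAuth> {
    if let Ok(key) = std::env::var("CODEX_API_KEY") {
        if !key.is_empty() {
            return Some(EnvAuth::ApiKey(key));
        }
    }
    std::env::var("CODEX_AGENT_IDENTITY")
        .ok()
        .filter(|jwt| !jwt.is_empty())
        .map(EnvAuth::AgentIdentity)
}
```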

Reference old stack: https://github.com/openai/codex/pull/17387/changes
Reference JWT/env stack: https://github.com/openai/codex/pull/18176

## Stack

1. https://github.com/openai/codex/pull/18757: full revert
2. https://github.com/openai/codex/pull/18871: isolated Agent Identity
crate
3. https://github.com/openai/codex/pull/18785: explicit AgentIdentity
auth mode and startup task allocation
4. https://github.com/openai/codex/pull/18811: migrate Codex backend
auth callsites through AuthProvider
5. This PR: accept AgentIdentity JWTs through login/env

## Testing

Tests: targeted login and Agent Identity crate tests, CLI checks, scoped
formatter/linter cleanup, and CI.

---------

Co-authored-by: Shijie Rao <shijie.rao@openai.com>
2026-04-26 19:49:54 +00:00
Michael Bolin
ac2bffa443 test: harden app-server integration tests (#19683)
## Why

Windows Bazel runs in the permissions stack exposed that app-server
integration tests were launching normal plugin startup warmups in every
subprocess. Those warmups can call
`https://chatgpt.com/backend-api/plugins/featured` when a test is not
specifically exercising plugin startup, which adds slow background work,
noisy stderr, and dependence on external network state. The relevant
startup/featured-plugin behavior was introduced across #15042 and
#15264.

A few app-server tests also had long optional waits or unbounded cleanup
paths, making failures expensive to diagnose and contributing to slow
Windows shards. One external-agent config test from #18246 used a
GitHub-style marketplace source, which was enough to exercise the
pending remote-import path but also meant the background completion task
could attempt a real clone.

## What Changed

- Adds explicit `AppServerRuntimeOptions` / `PluginStartupTasks`
plumbing and a hidden debug-only
`--disable-plugin-startup-tasks-for-tests` app-server flag, so
integration tests can suppress startup plugin warmups without adding a
production env-var gate.
- Has the app-server test harness pass that hidden flag by default,
while opting plugin-startup coverage back in for tests that
intentionally exercise startup sync and featured-plugin warmup behavior.
- Lowers normal app-server subprocess logging from `info`/`debug` to
`warn` to avoid multi-megabyte stderr output in Bazel logs.
- Prevents the external-agent config test from attempting a real
marketplace clone by using an invalid non-local source while still
exercising the pending-import completion path.
- Bounds optional filesystem/realtime waits and fake WebSocket
test-server shutdown so failures produce targeted timeouts instead of
hanging a shard.
- Fixes the Unix script-resolution test in `rmcp-client` to exercise
PATH resolution directly and include the actual spawn error in failures.

## Verification

- `cargo check -p codex-app-server`
- `cargo clippy -p codex-app-server --tests -- -D warnings`
- `cargo test -p codex-rmcp-client
program_resolver::tests::test_unix_executes_script_without_extension`
- `cargo test -p codex-app-server --test all
external_agent_config_import_sends_completion_notification_after_pending_plugins_finish
-- --nocapture`
- `cargo test -p codex-app-server --test all
plugin_list_uses_warmed_featured_plugin_ids_cache_on_first_request --
--nocapture`
- Windows Local Bazel passed with this test-hardening bundle before it
was extracted from #19606.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19683).
* #19395
* #19394
* #19393
* #19392
* #19606
* __->__ #19683
2026-04-26 12:43:16 -07:00
Thibault Sottiaux
87bc72408c [codex] remove responses command (#19640)
This removes the hidden `codex responses` CLI subcommand after
confirming no downstream callers rely on it, deleting the raw Responses
passthrough implementation, unregistering the subcommand, and dropping
the now-unused CLI dependencies on `codex-api` and
`codex-model-provider`.
2026-04-25 23:10:38 -07:00
Andrey Mishchenko
355c40ad7e Support end_turn in response.completed (#19610)
Some providers of the Responses API forward a model-defined `end_turn`
boolean that explicitly indicates whether the model wants to end the
turn or be sampled again. In this PR, we update the sampling loop to
honor this field when it is set. If the field is not set by the
provider, we fall back to the existing sampling logic.
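
A minimal sketch of the fallback, with the existing rule reduced to a plain bool for illustration:

```rust
/// Honor a provider-supplied end_turn flag when present; otherwise defer to
/// whatever the existing sampling logic decided.
fn should_end_turn(end_turn: Option<bool>, existing_logic_says_end: bool) -> bool {
    end_turn.unwrap_or(existing_logic_says_end)
}
```
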
2026-04-25 21:57:42 -07:00
Felipe Coury
5591912f0b fix(tui): reflow scrollback on terminal resize (#18575)
Fixes multiple scrollback and terminal resize issues: #5538, #5576,
#8352, #12223, #16165, and #15380.

## Why

Codex writes finalized transcript output into terminal scrollback after
wrapping it for the current viewport width. A later terminal resize
could leave that scrollback shaped for the old width, so wider windows
kept narrow output and narrower windows could show stale wrapping
artifacts until enough new output replaced the visible area.

This is also the foundation PR for responsive markdown tables. Table
rendering needs finalized transcript content to be width-sensitive after
insertion, not only while content is first streaming. Markdown table
rendering itself stays in #18576.

## Stack

- PR1: resize backlog reflow and interrupt cleanup
- #18576: markdown table support

## What Changed

- Rebuild source-backed transcript history when the terminal width
changes. `terminal_resize_reflow` is introduced through the experimental
feature system, but is enabled by default for this rollout so we can
validate behavior across real terminals.
- Preserve assistant and plan stream source so finalized streaming
output can participate in resize reflow after consolidation.
- Debounce resize work, but force a final source-backed reflow when a
resize happened during active or unconsolidated streaming output.
- Clear stale pending history lines on resize so old-width wrapped
output is not emitted just before rebuilt scrollback.
- Bound replay work with `[tui.terminal_resize_reflow].max_rows`:
omitted uses terminal-specific defaults, `0` keeps all rendered rows,
and a positive value sets an explicit cap. The cap applies both while
initially replaying a resumed transcript into scrollback and when
rebuilding scrollback after terminal resize.
- Consolidate interrupted assistant streams before cleanup, then clear
pending stream output and active-tail state consistently.
- Move resize reflow and thread event buffering helpers out of `app.rs`
into dedicated TUI modules.
- Add focused coverage for resize reflow, feature-gated behavior,
streaming source preservation, interrupted output cleanup,
unicode-neutral text, terminal-specific row caps, and composer/layout
stability.

## Runtime Bounds

Resize reflow keeps only the most recent rendered rows when a row cap is
active. The default is `auto`, which maps to the detected terminal's
default scrollback size where Codex can identify it: VS Code `1000`,
Windows Terminal `9001`, WezTerm `3500`, and Alacritty `10000`.
Terminals without a dedicated mapping use the conservative fallback of
`1000` rows. Users can override this with `[tui.terminal_resize_reflow]
max_rows = N`, or set `max_rows = 0` to disable row limiting.
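
A minimal sketch of how the row cap resolves under these rules; the terminal identifiers are illustrative, and the defaults are the ones listed above:

```rust
/// Resolve the resize-reflow row cap from [tui.terminal_resize_reflow].
/// Returns None when row limiting is disabled.
fn resolve_row_cap(max_rows: Option<u64>, detected_terminal: &str) -> Option<u64> {
    match max_rows {
        // max_rows = 0 disables row limiting entirely.
        Some(0) => None,
        // Any other explicit value is the cap.
        Some(n) => Some(n),
        // Omitted ("auto"): terminal-specific defaults from the text above.
        None => Some(match detected_terminal {
            "vscode" => 1_000,
            "windows-terminal" => 9_001,
            "wezterm" => 3_500,
            "alacritty" => 10_000,
            _ => 1_000, // conservative fallback
        }),
    }
}
```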

## Validation

- `just fmt`
- `git diff --check`
- `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui reflow`
- `cargo test --manifest-path codex-rs/Cargo.toml -p codex-tui
transcript_reflow`
- `just fix -p codex-tui`
- PR CI in progress on the squashed branch
2026-04-25 22:00:32 -03:00
Shijie Rao
4e30281a13 Guard npm update readiness (#19389)
## Why
For npm/Bun-managed installs, the update prompt was treating the latest
GitHub release as ready to install. During the `0.124.0` release, GitHub
and npm visibility were not atomic: the root npm wrapper could become
visible before the npm registry marked that version as the package
`latest`. That left a window where users could be prompted to upgrade
before npm was ready for the release.

## What changed
- Keep GitHub Releases as the candidate latest-version source for
npm/Bun installs, but only write the existing `version.json` cache after
npm registry metadata proves that same root version is ready.
- Add `codex-rs/tui/src/npm_registry.rs` to validate npm readiness by
checking `dist-tags.latest` and root package `dist` metadata for the
GitHub candidate version.
- Move version parsing helpers into
`codex-rs/tui/src/update_versions.rs` so that logic can be tested
without compiling the release-only `updates.rs` module under tests.
- Update `.github/workflows/rust-release.yml` so the six known platform
tarballs publish before the root `@openai/codex` wrapper. Other npm
tarballs publish before the root wrapper, and the SDK publishes after
the root package it depends on.
2026-04-25 17:09:29 -07:00
Michael Bolin
9881dc7306 fix: restore 30-minute timeout for Bazel builds (#19609)
I think raising it to 45 minutes in
https://github.com/openai/codex/pull/19578 was a mistake, for the
reasons explained in the comments in the code. Instead, we defend
against timeouts by increasing the number of shards in
`app-server-all-test`, so that a "true failure" that gets run 3x does
not take as much wall-clock time.
2026-04-25 16:34:06 -07:00
Michael Bolin
d54493ba1c test: stabilize app-server path assertions on Windows (#19604)
## Why

Windows can represent the same canonical local path with either a normal
drive path or a verbatim device path prefix. The failure pattern that
motivated this PR was an assertion diff like `C:\...` versus
`\\?\C:\...`: different spellings, same file.

That became visible while validating the permissions stack above this
PR. The stack increasingly routes paths through `AbsolutePathBuf`, which
normalizes supported Windows device prefixes, while several existing
tests still built expected values directly with
`std::fs::canonicalize()` or compared `AbsolutePathBuf::as_path()` to a
raw `PathBuf`. On Windows, that can make tests fail because the two
sides choose different textual forms for an otherwise equivalent
canonical path.

This PR is intentionally split out as the bottom PR below #19606. The
runtime permissions migration should not carry unrelated Windows test
stabilization, and reviewers should be able to verify this as a
test-only change before looking at the larger permissions changes.

## Failure Modes Covered

- `conversation_summary` expected rollout paths were built from raw
canonicalized `PathBuf`s, while app-server responses could carry
`AbsolutePathBuf`-normalized paths.
- `thread_resume` compared returned thread paths directly to previously
stored or fixture paths, so a verbatim-prefix spelling could fail an
otherwise correct resume.
- `marketplace_add` compared plugin install roots through `as_path()`
against raw canonicalized paths, reproducing the same `C:\...` versus
`\\?\C:\...` mismatch in both app-server and core-plugin coverage.

## What Changed

- In `app-server/tests/suite/conversation_summary.rs`, normalize both
expected rollout paths and received `ConversationSummary.path` values
through `AbsolutePathBuf` before comparing the full summary object.
- In `app-server/tests/suite/v2/thread_resume.rs`, normalize both sides
of thread path comparisons before asserting equality. This keeps the
tests focused on whether resume returned the same existing path, not
whether Windows used the same string spelling.
- In `app-server/tests/suite/v2/marketplace_add.rs` and
`core-plugins/src/marketplace_add.rs`, compare install roots as
`AbsolutePathBuf` values instead of comparing an absolute-path wrapper
to a raw canonicalized `PathBuf`.

## Behavior

This PR does not change production app-server or marketplace behavior.
It only changes tests to assert semantic path identity across Windows
path spelling variants. It also leaves API response values untouched;
the normalization happens inside assertions only.
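
A minimal sketch of comparing paths by semantic identity rather than string spelling; the real tests normalize through `AbsolutePathBuf`, so the prefix-stripping helper here is only an illustration of the idea:

```rust
use std::path::Path;

/// Reduce a path to a comparable form that ignores the Windows verbatim
/// device prefix, so `C:\...` and `\\?\C:\...` compare equal.
fn comparable(path: &Path) -> String {
    let s = path.display().to_string();
    if let Some(stripped) = s.strip_prefix(r"\\?\") {
        return stripped.to_string();
    }
    s
}

fn assert_same_path(expected: &Path, actual: &Path) {
    assert_eq!(comparable(expected), comparable(actual));
}
```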

## Verification

Targeted local checks run while extracting this fix:

- `cargo test -p codex-app-server
get_conversation_summary_by_thread_id_reads_rollout`
- `cargo test -p codex-app-server
get_conversation_summary_by_relative_rollout_path_resolves_from_codex_home`
- `cargo test -p codex-app-server
thread_resume_prefers_path_over_thread_id`

Windows-specific confidence comes from the Bazel Windows CI job for this
PR, since the failure is platform-specific.

## Docs

No docs update is needed because this is test-only infrastructure
stabilization.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/19604).
* #19395
* #19394
* #19393
* #19392
* #19606
* __->__ #19604
2026-04-25 16:25:28 -07:00