- reduce the server-side compaction test matrix to the highest-signal cases
- add comments around the deferred checkpoint rewrite and inline/preflight split
Co-authored-by: Codex <noreply@openai.com>
Handle repeated inline compactions on turns that started from empty history by stripping leading compaction items after prefix calculation, and add regression coverage for the fresh-session case.
Co-authored-by: Codex <noreply@openai.com>
Ignore compact_prompt for OpenAI inline auto-compaction, remove the legacy compat downgrade path, and keep /compact on the point-in-time endpoint. Also skip previous-model preflight remote compaction when inline server-side compaction is available.\n\nCo-authored-by: Codex <noreply@openai.com>
Keep current-turn inputs in local inline compaction checkpoints and remember known backend incompatibilities after a compat downgrade so later turns skip the failed inline request path.
Co-authored-by: Codex <noreply@openai.com>
Keep same-turn ghost snapshots when pre-turn inline compaction downgrades to the legacy client-side path so undo state survives compatibility fallback.
Co-authored-by: Codex <noreply@openai.com>
Move normal auto-compaction onto inline Responses API compaction behind a feature flag, keep the legacy path for manual and compatibility cases, and add observability plus integration coverage.
Co-authored-by: Codex <noreply@openai.com>
## What changed
- Drop failed websocket connections immediately after a terminal stream
error instead of awaiting a graceful close handshake before forwarding
the error to the caller.
- Keep the success path and the closed-connection guard behavior
unchanged.
## Why this fixes the flake
- The failing integration test waits for the second websocket stream to
surface the model error before issuing a follow-up request.
- On slower runners, the old error path awaited
`ws_stream.close().await` before sending the error downstream. If that
close handshake stalled, the test kept waiting for an error that had
already happened server-side and nextest timed it out.
- Dropping the failed websocket immediately makes the terminal error
observable right away and marks the session closed so the next request
reconnects cleanly instead of depending on a best-effort close
handshake.
## Code or test?
- This is a production logic fix in `codex-api`. The existing websocket
integration test already exercises the regression path.
- include the requested sub-agent model and reasoning effort in the
spawn begin event\n- render that metadata next to the spawned agent name
and role in the TUI transcript
---------
Co-authored-by: Codex <noreply@openai.com>
Summary
- update the code-mode handler, runner, instructions, and error text to
refer to the `exec` tool name everywhere that used to say `code_mode`
- ensure generated documentation strings and tool specs describe `exec`
and rely on the shared `PUBLIC_TOOL_NAME`
- refresh the suite tests so they invoke `exec` instead of the old name
Testing
- Not run (not requested)
## Summary
- update the guardian prompting
- clarify the guardian rejection message so an action may still proceed
if the user explicitly approves it after being informed of the risk
## Testing
- cargo run on selected examples
## Summary
- add `skill_approval` to `RejectConfig` and the app-server v2
`AskForApproval::Reject` payload so skill-script prompts can be
configured independently from sandbox and rule-based prompts
- update Unix shell escalation to reject prompts based on the actual
decision source, keeping prefix rules tied to `rules`, unmatched command
fallbacks tied to `sandbox_approval`, and skill scripts tied to
`skill_approval`
- regenerate the affected protocol/config schemas and expand
unit/integration coverage for the new flag and skill approval behavior
Pass more params to /compact. This should give us parity with the
/responses endpoint to improve caching.
I'm torn about the MCP await. Blocking will give us parity but it seems
like we explicitly don't block on MCPs. Happy either way
Summary
- document how code-mode can import `output_text`/`output_image` and
ensure `add_content` stays compatible
- add a synthetic `@openai/code_mode` module that appends content items
and validates inputs
- cover the new behavior with integration tests for structured text and
image outputs
Testing
- Not run (not requested)
- clarify the `close_agent` tool description so it nudges models to
close agents they no longer need
- keep the change scoped to the tool spec text only
Co-authored-by: Codex <noreply@openai.com>
Summary
- document that `@openai/code_mode` exposes
`set_max_output_tokens_per_exec_call` and that `code_mode` truncates the
final Rust-side output when the budget is exceeded
- enforce the configured budget in the Rust tool runner, reusing
truncation helpers so text-only outputs follow the unified-exec wrapper
and mixed outputs still fit within the limit
- ensure the new behavior is covered by a code-mode integration test and
string spec update
Testing
- Not run (not requested)
Summary
- drop `McpToolOutput` in favor of `CallToolResult`, moving its helpers
to keep MCP tooling focused on the final result shape
- wire the new schema definitions through code mode, context, handlers,
and spec modules so MCP tools serialize the exact output shape expected
by the model
- extend code mode tests to cover multiple MCP call scenarios and ensure
the serialized data matches the new schema
- refresh JS runner helpers and protocol models alongside the schema
changes
Testing
- Not run (not requested)
- add `model` and `reasoning_effort` to the `spawn_agent` schema so the
values pass through
- validate requested models against `model.model` and only check that
the selected model supports the requested reasoning effort
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- add ARC monitor support for MCP tool calls by serializing MCP approval
requests into the ARC action shape and sending the relevant
conversation/policy context to the `/api/codex/safety/arc` endpoint
- route ARC outcomes back into MCP approval flow so `ask-user` falls
back to a user prompt and `steer-model` blocks the tool call, with
guardian/ARC tests covering the new request shape
- update the TUI approval copy from “Approve Once” to “Allow” / “Allow
for this session” and refresh the related
snapshots
---------
Co-authored-by: Fouad Matin <fouad@openai.com>
Co-authored-by: Fouad Matin <169186268+fouad-openai@users.noreply.github.com>
Summary
- document output types for the various tool handlers and registry so
the API exposes richer descriptions
- update unified execution helpers and client tests to align with the
new output metadata
- clean up unused helpers across tool dispatch paths
Testing
- Not run (not requested)
There are some bug investigations that currently require us to ask users
for their user ID even though they've already uploaded logs and session
details via `/feedback`. This frustrates users and increases the time
for diagnosis.
This PR includes the ChatGPT user ID in the metadata uploaded for
`/feedback` (both the TUI and app-server).
Replace the Unix shell lookup path in `codex-rs/core/src/shell.rs` to
use
`libc::getpwuid_r()` instead of `libc::getpwuid()` when resolving the
current
user's shell.
Why:
- `getpwuid()` can return pointers into libc-managed shared storage
- on the musl static Linux build, concurrent callers can race on that
storage
- this matches the crash pattern reported in tmux/Linux sessions with
parallel
shell activity
Refs:
- Fixes#13842
Addresses #13586
This doesn't affect our CI scripts. It was user-reported.
Summary
- add `wiremock::ResponseTemplate` and `body_string_contains` imports
behind `#[cfg(not(debug_assertions))]` in
`codex-rs/core/tests/suite/view_image.rs` so release builds only pull
the helpers they actually use
## Summary
- update the unified exec test to use truncated_output() instead of the
removed output field
- fix the compile failure on latest main after ExecCommandToolOutput
changed shape
## What changed
- The retry test now uses the same streaming SSE test server used by
production-style tests instead of a wiremock sequence.
- The fixture is resolved via `find_resource!`, and the test asserts
that exactly two outbound requests were sent.
## Why this fixes the flake
- The old wiremock sequence approximated early-close behavior, but it
did not reproduce the same streaming semantics the real client sees.
- That meant the retry path depended on mock implementation details
instead of on the actual transport behavior we care about.
- Switching to the streaming SSE helper makes the test exercise the real
early-close/retry contract, and counting requests directly verifies that
we retried exactly once rather than merely hoping the sequence aligned.
## Scope
- Test-only change.
- collect input/output transcript deltas into active handoff transcript
state
- attach and clear that transcript on each handoff, and regenerate
schema/tests
### Purpose
While trying to build out CLI-Tools for the agent to use under skills we
have found that those tools sometimes need to invoke a user elicitation.
These elicitations are handled out of band of the codex app-server but
need to indicate to the exec manager that the command running is not
going to progress on the usual timeout horizon.
### Example
Model calls universal exec:
`$ download-credit-card-history --start-date 2026-01-19 --end-date
2026-02-19 > credit_history.jsonl`
download-cred-card-history might hit a hosted/preauthenticated service
to fetch data. That service might decide that the request requires an
end user approval the access to the personal data. It should be able to
signal to the running thread that the command in question is blocked on
user elicitation. In that case we want the exec to continue, but the
timeout to not expire on the tool call, essentially freezing time until
the user approves or rejects the command at which point the tool would
signal the app-server to decrement the outstanding elicitation count.
Now timeouts would proceed as normal.
### What's Added
- New v2 RPC methods:
- thread/increment_elicitation
- thread/decrement_elicitation
- Protocol updates in:
- codex-rs/app-server-protocol/src/protocol/common.rs
- codex-rs/app-server-protocol/src/protocol/v2.rs
- App-server handlers wired in:
- codex-rs/app-server/src/codex_message_processor.rs
### Behavior
- Counter starts at 0 per thread.
- increment atomically increases the counter.
- decrement atomically decreases the counter; decrement at 0 returns
invalid request.
- Transition rules:
- 0 -> 1: broadcast pause state, pausing all active stopwatches
immediately.
- \>0 -> >0: remain paused.
- 1 -> 0: broadcast unpause state, resuming stopwatches.
- Core thread/session logic:
- codex-rs/core/src/codex_thread.rs
- codex-rs/core/src/codex.rs
- codex-rs/core/src/mcp_connection_manager.rs
### Exec-server stopwatch integration
- Added centralized stopwatch tracking/controller:
- codex-rs/exec-server/src/posix/stopwatch_controller.rs
- Hooked pause/unpause broadcast handling + stopwatch registration:
- codex-rs/exec-server/src/posix/mcp.rs
- codex-rs/exec-server/src/posix/stopwatch.rs
- codex-rs/exec-server/src/posix.rs