Commit Graph

748 Commits

Author SHA1 Message Date
Ahmed Ibrahim
c7e847aaeb Add diagnostics for read_only_unless_trusted timeout flake (#14518)
## Summary
- add targeted diagnostic logging for the
read_only_unless_trusted_requires_approval scenarios in
approval_matrix_covers_all_modes
- add a scoped timeout buffer only for ro_unless_trusted write-file
scenarios: 1000ms -> 2000ms
- keep all other write-file scenarios at 1000ms

## Why
The last two main failures were both in codex-core::all
suite::approvals::approval_matrix_covers_all_modes with exit_code=124 in
the same scenario. This points to execution-time jitter in CI rather
than a semantic approval-policy mismatch.

## Notes
- This does not introduce any >5s timeout and does not
disable/quarantine tests.
- The timeout increase is tightly scoped to the single flaky path and
keeps the matrix deterministic under CI scheduling variance.
2026-03-12 23:51:03 -07:00
Jack Mousseau
7c7e267501 Simplify permissions available in request permissions tool (#14529) 2026-03-12 21:13:17 -07:00
Channing Conger
0daffe667a code_mode: Move exec params from runtime declarations to @pragma (#14511)
This change moves code_mode exec session settings out of the runtime API
and into an optional first-line pragma, so instead of calling runtime
helpers like set_yield_time() or set_max_output_tokens_per_exec_call(),
the model can write // @exec: {"yield_time_ms": ...,
"max_output_tokens": ...} at the top of the freeform exec source. Rust
now parses that pragma before building the source, validates it, and
passes the values directly in the exec start message to the code-mode
broker, which applies them at session start without any worker-runtime
mutation path. The @openai/code_mode module no longer exposes those
setter functions, the docs and grammar were updated to describe the
pragma form, and the existing code_mode tests were converted to use
pragma-based configuration instead.
2026-03-13 03:27:42 +00:00
alexsong-oai
1a363d5fcf Add plugin usage telemetry (#14531)
adding metrics including: 
* plugin used
* plugin installed/uninstalled
* plugin enabled/disabled
2026-03-12 19:22:30 -07:00
Jack Mousseau
b7dba72dbd Rename reject approval policy to granular (#14516) 2026-03-12 16:38:04 -07:00
Eric Traut
d32820ab07 Fix codex exec --profile handling (#14524)
PR #14005 introduced a regression whereby `codex exec --profile`
overrides were dropped when starting or resuming a thread. That causes
the thread to miss profile-scoped settings like
`model_instructions_file`.

This PR preserve the active profile in the thread start/resume config
overrides so the
app-server rebuild sees the same profile that exec resolved. 

Fixes #14515
2026-03-12 17:34:25 -06:00
Rasmus Rygaard
53d5972226 Reapply "Pass more params to compaction" (#14298) (#14521)
This reverts commit 8af97ce4b0.

Confirmed that this runs locally without the previous issues with tool
use
2026-03-12 23:27:21 +00:00
Anton Panasenko
651717323c feat(search_tool): gate search_tool on model supports_search_tool field (#14502) 2026-03-12 16:03:50 -07:00
pakrym-oai
a2546d5dff Expose code-mode tools through globals (#14517)
Summary
- make all code-mode tools accessible as globals so callers only need
`tools.<name>`
- rename text/image helpers and key globals (store, load, ALL_TOOLS,
etc.) to reflect the new shared namespace
- update the JS bridge, runners, descriptions, router, and tests to
follow the new API

Testing
- Not run (not requested)
2026-03-12 15:43:59 -07:00
Jack Mousseau
a314c7d3ae Decouple request permissions feature and tool (#14426) 2026-03-12 14:47:08 -07:00
pakrym-oai
04e14bdf23 Rename exec session IDs to cell IDs (#14510)
- Update the code-mode executor, wait handler, and protocol plumbing to
use cell IDs instead of session IDs for node communication
- Switch tool metadata, wait description, and suite tests to refer to
cell IDs so user-visible messages match the new terminology

**Testing**
- Not run (not requested)
2026-03-12 14:05:30 -07:00
pakrym-oai
dadffd27d4 Fix MCP tool calling (#14491)
Properly escape mcp tool names and make tools only available via
imports.
2026-03-12 13:38:52 -07:00
pakrym-oai
a5a4899d0c Skip nested tool call parallel test on Windows (#14505)
**Summary**
- disable the `code_mode_nested_tool_calls_can_run_in_parallel` test on
Windows where `exec_command` is unavailable

**Testing**
- Not run (not requested)
2026-03-12 13:32:11 -07:00
pakrym-oai
25e301ed98 Add parallel tool call test (#14494)
Summary
- pin tests to `test-gpt-5.1-codex` so code-mode suites exercise that
model explicitly
- add a regression test that ensures nested tool calls can execute in
parallel and assert on timing
- refresh `codex-rs/Cargo.lock` for the updated dependency tree (add
`codex-utils-pty`, drop `codex-otel`)

Testing
- Not run (not requested)
2026-03-12 12:10:14 -07:00
pakrym-oai
d1b03f0d7f Add default code-mode yield timeout (#14484)
Summary
- expose the default yield timeout through code mode runtime so the
handler, wait tool, and protocol share the same 10s value that matches
unified exec
- document the timeout change in the tool descriptions and propagate the
value all the way into the runner metadata
- adjust Cargo.lock to keep the dependency tree in sync with the added
code mode tool dependency

Testing
- Not run (not requested)
2026-03-12 12:06:23 -07:00
pakrym-oai
cfe3f6821a Cleanup code_mode tool descriptions (#14480)
Move to separate files and clarify a bit.
2026-03-12 11:13:35 -07:00
pakrym-oai
2f03b1a322 Dispatch tools when code mode is not awaited directly (#14437)
## Summary
- start a code mode worker once per turn and let it pump nested tool
calls through a dedicated queue
- simplify code mode request/response dispatch around request ids and
generic runner-unavailable errors
- clean up the code mode process API and runner protocol plumbing

## Testing
- not run yet
2026-03-12 09:00:20 -07:00
viyatb-oai
e99e8e4a6b fix: follow up on linux sandbox review nits (#14440)
## Summary
- address the follow-up review nits from #13996 in a separate PR
- make the approvals test command a raw string and keep the
managed-network path using env proxy routing
- inline `--apply-seccomp-then-exec` in the Linux sandbox inner command
builder
- remove the bubblewrap-specific sandbox metric tag path and drop the
`use_legacy_landlock` shim from `sandbox_tag`/`TurnMetadataState::new`
- restore the `Feature` import that `origin/main` currently still needs
in `connectors.rs`

## Testing
- `cargo test -p codex-linux-sandbox`
- focused `codex-core` tests were rerun/started, but the final
verification pass was interrupted when I pushed at request
2026-03-11 23:59:50 -07:00
viyatb-oai
04892b4ceb refactor: make bubblewrap the default Linux sandbox (#13996)
## Summary
- make bubblewrap the default Linux sandbox and keep
`use_legacy_landlock` as the only override
- remove `use_linux_sandbox_bwrap` from feature, config, schema, and
docs surfaces
- update Linux sandbox selection, CLI/config plumbing, and related
tests/docs to match the new default
- fold in the follow-up CI fixes for request-permissions responses and
Linux read-only sandbox error text
2026-03-11 23:31:18 -07:00
pakrym-oai
f6c6128fc7 Support waiting for code_mode sessions (#14295)
## Summary
- persist the code mode runner process in the session-scoped code mode
store
- switch the runner protocol from `init` to `start` with explicit
session ids
- handle runner-side session processing without the init waiter queue

## Validation
- just fmt
- cargo check -p codex-core
- node --check codex-rs/core/src/tools/code_mode_runner.cjs
2026-03-11 23:13:54 -07:00
Ahmed Ibrahim
367a8a2210 Clarify spawn agent authorization (#14432)
- Clarify that spawn_agent requires explicit user permission for
delegation or parallel agent work.
- Add a regression test covering the new description text.
2026-03-11 23:03:07 -07:00
Owen Lin
5bc82c5b93 feat(app-server): propagate traces across tasks and core ops (#14387)
## Summary

This PR keeps app-server RPC request trace context alive for the full
lifetime of the work that request kicks off (e.g. for `thread/start`,
this is `app-server rpc handler -> tokio background task -> core op
submissions`). Previously we lose trace lineage once the request handler
returns or hands work off to background tasks.

This approach is especially relevant for `thread/start` and other RPC
handlers that run in a non-blocking way. In the near future we'll most
likely want to make all app-server handlers run in a non-blocking way by
default, and only queue operations that must operate in order (e.g.
thread RPCs per thread?), so we want to make sure tracing in app-server
just generally works.

Depends on https://github.com/openai/codex/pull/14300

**Before**
<img width="155" height="207" alt="image"
src="https://github.com/user-attachments/assets/c9487459-36f1-436c-beb7-fafeb40737af"
/>


**After**
<img width="299" height="337" alt="image"
src="https://github.com/user-attachments/assets/727392b2-d072-4427-9dc4-0502d8652dea"
/>

## What changed

- Keep request-scoped trace context around until we send the final
response or error, or the connection closes.
- Thread that trace context through detached `thread/start` work so
background startup stays attached to the originating request.
- Pass request trace context through to downstream core operations,
including:
  - thread creation
  - resume/fork flows
  - turn submission
  - review
  - interrupt
  - realtime conversation operations
- Add tracing tests that verify:
  - remote W3C trace context is preserved for `thread/start`
  - remote W3C trace context is preserved for `turn/start`
  - downstream core spans stay under the originating request span
  - request-scoped tracing state is cleaned up correctly
- Clean up shutdown behavior so detached background tasks and spawned
threads are drained before process exit.
2026-03-11 20:18:31 -07:00
Anton Panasenko
77b0c75267 feat: search_tool migrate to bring you own tool of Responses API (#14274)
## Why

to support a new bring your own search tool in Responses
API(https://developers.openai.com/api/docs/guides/tools-tool-search#client-executed-tool-search)
we migrating our bm25 search tool to use official way to execute search
on client and communicate additional tools to the model.

## What
- replace the legacy `search_tool_bm25` flow with client-executed
`tool_search`
- add protocol, SSE, history, and normalization support for
`tool_search_call` and `tool_search_output`
- return namespaced Codex Apps search results and wire namespaced
follow-up tool calls back into MCP dispatch
2026-03-11 17:51:51 -07:00
Curtis 'Fjord' Hawthorne
8791f0ab9a Let models opt into original image detail (#14175)
## Summary

This PR narrows original image detail handling to a single opt-in
feature:

- `image_detail_original` lets the model request `detail: "original"` on
supported models
- Omitting `detail` preserves the default resized behavior

The model only sees `detail: "original"` guidance when the active model
supports it:

- JS REPL instructions include the guidance and examples only on
supported models
- `view_image` only exposes a `detail` parameter when the feature and
model can use it

The image detail API is intentionally narrow and consistent across both
paths:

- `view_image.detail` supports only `"original"`; otherwise omit the
field
- `codex.emitImage(..., detail)` supports only `"original"`; otherwise
omit the field
- Unsupported explicit values fail clearly at the API boundary instead
of being silently reinterpreted
- Unsupported explicit `detail: "original"` requests fall back to normal
behavior when the feature is disabled or the model does not support
original detail
2026-03-11 15:25:07 -07:00
Curtis 'Fjord' Hawthorne
5a89660ae4 Add js_repl cwd and homeDir helpers (#14385)
## Summary

This PR adds two read-only path helpers to `js_repl`:

- `codex.cwd`
- `codex.homeDir`

They are exposed alongside the existing `codex.tmpDir` helper so the
REPL can reference basic host path context without reopening direct
`process` access.

## Implementation

- expose `codex.cwd` and `codex.homeDir` from the js_repl kernel
- make `codex.homeDir` come from the kernel process environment
- pass session dependency env through js_repl kernel startup so
`codex.homeDir` matches the env a shell-launched process would see
- keep existing shell `HOME` population behavior unchanged
- update js_repl prompt/docs and add runtime/integration coverage for
the new helpers
2026-03-11 14:44:44 -07:00
Charley Cunningham
f5bb338fdb Defer initial context insertion until the first turn (#14313)
## Summary
- defer fresh-session `build_initial_context()` until the first real
turn instead of seeding model-visible context during startup
- rely on the existing `reference_context_item == None` turn-start path
to inject full initial context on that first real turn (and again after
baseline resets such as compaction)
- add a regression test for `InitialHistory::New` and update affected
deterministic tests / snapshots around developer-message layout,
collaboration instructions, personality updates, and compact request
shapes

## Notes
- this PR does not add any special empty-thread `/compact` behavior
- most of the snapshot churn is the direct result of moving the initial
model-visible context from startup to the first real turn, so first-turn
request layouts no longer contain a pre-user startup copy of permissions
/ environment / other developer-visible context
- remote manual `/compact` with no prior user still skips the remote
compact request; local first-turn `/compact` still issues a compact
request, but that request now reflects the lack of startup-seeded
context

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-11 12:33:10 -07:00
Ahmed Ibrahim
c32c445f1c Clarify locked role settings in spawn prompt (#14283)
- tell agents when a role pins model or reasoning effort so they know
those settings are not changeable
- add prompt-builder coverage for the locked-setting notes
2026-03-11 12:33:10 -07:00
Ahmed Ibrahim
8f8a0f55ce spawn prompt (#14362)
# External (non-OpenAI) Pull Request Requirements

Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md

If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.

Include a link to a bug report or enhancement request.
2026-03-11 12:33:10 -07:00
pakrym-oai
65b325159d Add ALL_TOOLS export to code mode (#14294)
So code mode can search for tools.
2026-03-11 12:33:10 -07:00
Rasmus Rygaard
7f22329389 Revert "Pass more params to compaction" (#14298) 2026-03-11 12:33:10 -07:00
Channing Conger
fd4a673525 Responses: set x-client-request-id as convesration_id when talking to responses (#14312)
Right now we're sending the header session_id to responses which is
ignored/dropped. This sets a useful x-client-request-id to the
conversation_id.
2026-03-11 12:33:10 -07:00
Ahmed Ibrahim
a4d884c767 Split spawn_csv from multi_agent (#14282)
- make `spawn_csv` a standalone feature for CSV agent jobs
- keep `spawn_csv -> multi_agent` one-way and preserve restricted
subagent disable paths
2026-03-11 12:33:09 -07:00
Ahmed Ibrahim
39c1bc1c68 Add realtime start instructions config override (#14270)
- add `realtime_start_instructions` config support
- thread it into realtime context updates, schema, docs, and tests
2026-03-11 12:33:09 -07:00
pakrym-oai
01792a4c61 Prefix code mode output with success or failure message and include error stack (#14272) 2026-03-11 12:33:09 -07:00
Ahmed Ibrahim
c8446d7cf3 Stabilize websocket response.failed error delivery (#14017)
## What changed
- Drop failed websocket connections immediately after a terminal stream
error instead of awaiting a graceful close handshake before forwarding
the error to the caller.
- Keep the success path and the closed-connection guard behavior
unchanged.

## Why this fixes the flake
- The failing integration test waits for the second websocket stream to
surface the model error before issuing a follow-up request.
- On slower runners, the old error path awaited
`ws_stream.close().await` before sending the error downstream. If that
close handshake stalled, the test kept waiting for an error that had
already happened server-side and nextest timed it out.
- Dropping the failed websocket immediately makes the terminal error
observable right away and marks the session closed so the next request
reconnects cleanly instead of depending on a best-effort close
handshake.

## Code or test?
- This is a production logic fix in `codex-api`. The existing websocket
integration test already exercises the regression path.
2026-03-11 12:33:09 -07:00
pakrym-oai
8a099b3dfb Rename code mode tool to exec (#14254)
Summary
- update the code-mode handler, runner, instructions, and error text to
refer to the `exec` tool name everywhere that used to say `code_mode`
- ensure generated documentation strings and tool specs describe `exec`
and rely on the shared `PUBLIC_TOOL_NAME`
- refresh the suite tests so they invoke `exec` instead of the old name

Testing
- Not run (not requested)
2026-03-11 12:33:09 -07:00
Celia Chen
c1a424691f chore: add a separate reject-policy flag for skill approvals (#14271)
## Summary
- add `skill_approval` to `RejectConfig` and the app-server v2
`AskForApproval::Reject` payload so skill-script prompts can be
configured independently from sandbox and rule-based prompts
- update Unix shell escalation to reject prompts based on the actual
decision source, keeping prefix rules tied to `rules`, unmatched command
fallbacks tied to `sandbox_approval`, and skill scripts tied to
`skill_approval`
- regenerate the affected protocol/config schemas and expand
unit/integration coverage for the new flag and skill approval behavior
2026-03-11 12:33:09 -07:00
pakrym-oai
83b22bb612 Add store/load support for code mode (#14259)
adds support for transferring state across code mode invocations.
2026-03-11 12:33:09 -07:00
Rasmus Rygaard
2621ba17e3 Pass more params to compaction (#14247)
Pass more params to /compact. This should give us parity with the
/responses endpoint to improve caching.

I'm torn about the MCP await. Blocking will give us parity but it seems
like we explicitly don't block on MCPs. Happy either way
2026-03-11 12:33:09 -07:00
pakrym-oai
07c22d20f6 Add code_mode output helpers for text and images (#14244)
Summary
- document how code-mode can import `output_text`/`output_image` and
ensure `add_content` stays compatible
- add a synthetic `@openai/code_mode` module that appends content items
and validates inputs
- cover the new behavior with integration tests for structured text and
image outputs

Testing
- Not run (not requested)
2026-03-11 12:33:08 -07:00
pakrym-oai
3d41ff0b77 Add model-controlled truncation for code mode results (#14258)
Summary
- document that `@openai/code_mode` exposes
`set_max_output_tokens_per_exec_call` and that `code_mode` truncates the
final Rust-side output when the budget is exceeded
- enforce the configured budget in the Rust tool runner, reusing
truncation helpers so text-only outputs follow the unified-exec wrapper
and mixed outputs still fit within the limit
- ensure the new behavior is covered by a code-mode integration test and
string spec update

Testing
- Not run (not requested)
2026-03-11 12:33:08 -07:00
pakrym-oai
ee8f84153e Add output schema to MCP tools and expose MCP tool results in code mode (#14236)
Summary
- drop `McpToolOutput` in favor of `CallToolResult`, moving its helpers
to keep MCP tooling focused on the final result shape
- wire the new schema definitions through code mode, context, handlers,
and spec modules so MCP tools serialize the exact output shape expected
by the model
- extend code mode tests to cover multiple MCP call scenarios and ensure
the serialized data matches the new schema
- refresh JS runner helpers and protocol models alongside the schema
changes

Testing
- Not run (not requested)
2026-03-11 12:33:08 -07:00
Won Park
722e8f08e1 unifying all image saves to /tmp to bug-proof (#14149)
image-gen feature will have the model saving to /tmp by default + at all
times
2026-03-11 12:33:08 -07:00
Ahmed Ibrahim
91ca20c7c3 Add spawn_agent model overrides (#14160)
- add `model` and `reasoning_effort` to the `spawn_agent` schema so the
values pass through
- validate requested models against `model.model` and only check that
the selected model supports the requested reasoning effort

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-11 12:33:08 -07:00
pakrym-oai
00ea8aa7ee Expose strongly-typed result for exec_command (#14183)
Summary
- document output types for the various tool handlers and registry so
the API exposes richer descriptions
- update unified execution helpers and client tests to align with the
new output metadata
- clean up unused helpers across tool dispatch paths

Testing
- Not run (not requested)
2026-03-11 12:33:07 -07:00
Eric Traut
7144f84c69 Fix release-mode integration test compiler failure (#13603)
Addresses #13586

This doesn't affect our CI scripts. It was user-reported.

Summary
- add `wiremock::ResponseTemplate` and `body_string_contains` imports
behind `#[cfg(not(debug_assertions))]` in
`codex-rs/core/tests/suite/view_image.rs` so release builds only pull
the helpers they actually use
2026-03-11 12:33:07 -07:00
Ahmed Ibrahim
aa6a57dfa2 Stabilize incomplete SSE retry test (#13879)
## What changed
- The retry test now uses the same streaming SSE test server used by
production-style tests instead of a wiremock sequence.
- The fixture is resolved via `find_resource!`, and the test asserts
that exactly two outbound requests were sent.

## Why this fixes the flake
- The old wiremock sequence approximated early-close behavior, but it
did not reproduce the same streaming semantics the real client sees.
- That meant the retry path depended on mock implementation details
instead of on the actual transport behavior we care about.
- Switching to the streaming SSE helper makes the test exercise the real
early-close/retry contract, and counting requests directly verifies that
we retried exactly once rather than merely hoping the sequence aligned.

## Scope
- Test-only change.
2026-03-09 22:34:44 -07:00
Ahmed Ibrahim
2e24be2134 Use realtime transcript for handoff context (#14132)
- collect input/output transcript deltas into active handoff transcript
state
- attach and clear that transcript on each handoff, and regenerate
schema/tests
2026-03-09 22:30:03 -07:00
Matthew Zeng
566e4cee4b [apps] Fix apps enablement condition. (#14011)
- [x] Fix apps enablement condition to check both the feature flag and
that the user is not an API key user.
2026-03-09 22:25:43 -07:00
xl-openai
0c33af7746 feat: support disabling bundled system skills (#13792)
Support disable bundled system skills with a config:

[skills.bundled]
enabled = false
2026-03-09 22:02:53 -07:00