Commit Graph

871 Commits

Author SHA1 Message Date
Thibault Sottiaux
a8d5ad37b8 feat: experimental support for skills.md (#7412)
This change prototypes support for Skills with the CLI. This is an
**experimental** feature for internal testing.

---------

Co-authored-by: Gav Verma <gverma@openai.com>
2025-12-01 20:22:35 -08:00
Steve Mostovoy
f443555728 fix(core): enable history lookup on windows (#7457)
- Add portable history log id helper to support inode-like tracking on
Unix and creation time on Windows
- Refactor history metadata and lookup to share code paths and allow
nonzero log ids across platforms
- Add coverage for lookup stability after appends
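A minimal sketch of the portable log id idea follows. It assumes a hypothetical `history_log_id` helper and is not the PR's actual code. On Unix it reads the inode number, and on Windows it reads the file creation time. Neither value changes when lines are appended, so lookups stay stable.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: an id that survives appends but changes if the
/// history file is deleted and recreated.
#[cfg(unix)]
fn history_log_id(path: &Path) -> io::Result<u64> {
    use std::os::unix::fs::MetadataExt;
    // Inode number: unchanged by appends.
    Ok(fs::metadata(path)?.ino())
}

#[cfg(windows)]
fn history_log_id(path: &Path) -> io::Result<u64> {
    use std::time::UNIX_EPOCH;
    // No inode on Windows, so use the creation time as the identity.
    let created = fs::metadata(path)?.created()?;
    let nanos = created
        .duration_since(UNIX_EPOCH)
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?
        .as_nanos();
    Ok(nanos as u64)
}

fn main() -> io::Result<()> {
    use std::io::Write;
    let path = std::env::temp_dir().join("history_log_id_demo.jsonl");
    fs::write(&path, "{\"text\":\"first\"}\n")?;
    let before = history_log_id(&path)?;
    // Appending a new history entry must not change the id.
    let mut f = fs::OpenOptions::new().append(true).open(&path)?;
    writeln!(f, "{{\"text\":\"second\"}}")?;
    let after = history_log_id(&path)?;
    assert_eq!(before, after);
    fs::remove_file(&path)
}
```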
2025-12-01 16:29:01 -08:00
Dylan Hurd
5b25915d7e fix(apply_patch) tests for shell_command (#7307)
## Summary
Adds test coverage for invocations of apply_patch via shell_command with
heredoc, to validate behavior.

## Testing
- [x] These are tests
2025-12-01 15:09:22 -08:00
Ali Towaiji
0cc3b50228 Fix recent_commits(limit=0) returning 1 commit instead of 0 (#7334)
Fixes #7333

This is a small bug fix.

This PR fixes an inconsistency in `recent_commits` where `limit == 0`
still returns 1 commit due to the use of `limit.max(1)` when
constructing the `git log -n` argument.

Expected behavior: requesting 0 commits should return an empty list.

This PR:
- returns an empty `Vec` when `limit == 0`
- adds a test for `recent_commits(limit == 0)` that fails before the
change and passes afterwards
- maintains existing behavior for `limit > 0`

This aligns behavior with API expectations and avoids downstream
consumers misinterpreting the repository as having commit history when
`limit == 0` is used to explicitly request none.
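The sketch below contrasts the old `limit.max(1)` argument with a short-circuit for `limit == 0`. The helper `count_arg` and the in-memory `history` slice are hypothetical stand-ins for the real `git log -n` call. They exist only so the logic runs without a repository.

```rust
/// Old behaviour as described above: `limit.max(1)` turns 0 into 1.
fn old_count_arg(limit: usize) -> String {
    format!("-n{}", limit.max(1))
}

/// Hypothetical fixed helper: `None` means "do not run git at all".
fn count_arg(limit: usize) -> Option<String> {
    if limit == 0 {
        None
    } else {
        Some(format!("-n{limit}"))
    }
}

/// Stand-in for `recent_commits`; `history` plays the role of `git log`
/// output, newest first.
fn recent_commits(history: &[&str], limit: usize) -> Vec<String> {
    match count_arg(limit) {
        // limit == 0: return an empty Vec without invoking git.
        None => Vec::new(),
        Some(_arg) => history.iter().take(limit).map(|s| s.to_string()).collect(),
    }
}

fn main() {
    let history = ["c3", "c2", "c1"];
    assert_eq!(old_count_arg(0), "-n1"); // the bug: git is asked for one commit
    assert!(recent_commits(&history, 0).is_empty());
    assert_eq!(recent_commits(&history, 2), vec!["c3", "c2"]);
}
```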

Happy to adjust if the current behavior is intentional.
2025-12-01 10:14:36 -08:00
jif-oai
a421eba31f fix: disable review rollout filtering (#7371) 2025-12-01 09:04:13 +00:00
jif-oai
457c9fdb87 chore: better session recycling (#7368) 2025-11-30 12:42:26 -08:00
jif-oai
aaec8abf58 feat: detached review (#7292) 2025-11-28 11:34:57 +00:00
Job Chong
cbd7d0d543 chore: improve rollout session init errors (#7336)
Title: Improve rollout session initialization error messages

Issue: https://github.com/openai/codex/issues/7283

What: add targeted mapping for rollout/session initialization errors so
users get actionable messages when Codex cannot access session files.

Why: session creation previously returned a generic internal error,
hiding permissions/FS issues and making support harder.

How:
- Added rollout::error::map_session_init_error to translate the more
common io::Error kinds into user-facing hints (permission, missing dir,
file blocking, corruption). Others are passed through directly with
`CodexErr::Fatal`.
- Reused the mapper in Codex session creation to preserve root causes
instead of returning InternalAgentDied.
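The mapping step can be sketched roughly as follows. The `CodexErr` enum here is a hypothetical one-variant stand-in, and the hint wording is illustrative rather than the PR's text. The "file blocking" case from the list above is left out because the exact `io::ErrorKind` it corresponds to is not stated.

```rust
use std::io;
use std::path::Path;

/// Hypothetical stand-in for `CodexErr`; only the fatal variant is modeled.
#[derive(Debug)]
enum CodexErr {
    Fatal(String),
}

/// Translate common io::Error kinds into actionable messages.
fn map_session_init_error(err: &io::Error, sessions_dir: &Path) -> CodexErr {
    let dir = sessions_dir.display();
    let hint = match err.kind() {
        io::ErrorKind::PermissionDenied => format!(
            "Codex cannot access session files in {dir} (permission denied); check ownership and permissions"
        ),
        io::ErrorKind::NotFound => format!(
            "Session directory {dir} is missing; create it or point Codex at a different home"
        ),
        io::ErrorKind::InvalidData => format!(
            "Session data in {dir} appears to be corrupted: {err}"
        ),
        // Other kinds are passed through with the original error text.
        _ => format!("Failed to initialize session in {dir}: {err}"),
    };
    CodexErr::Fatal(hint)
}

fn main() {
    let err = io::Error::new(io::ErrorKind::PermissionDenied, "denied");
    let CodexErr::Fatal(msg) = map_session_init_error(&err, Path::new("/tmp/codex/sessions"));
    assert!(msg.contains("permission denied"));
}
```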
2025-11-27 00:20:33 -08:00
Eric Traut
e953092949 Fixed regression in experimental "sandbox command assessment" feature (#7308)
Recent model updates caused the experimental "sandbox command assessment"
to time out most of the time, leaving the user without any risk
assessment or tool summary. This change explicitly sets the reasoning
effort to medium and bumps the timeout.

This change has no effect if the user hasn't enabled the
`experimental_sandbox_command_assessment` feature flag.
2025-11-25 16:15:13 -08:00
jif-oai
28ff364c3a feat: update process ID for event handling (#7261) 2025-11-25 14:21:05 -08:00
jif-oai
4502b1b263 chore: proper client extraction (#6996) 2025-11-25 18:06:12 +00:00
jif-oai
2845e2c006 fix: drop conversation when /new (#7297) 2025-11-25 17:20:25 +00:00
jif-oai
9ba27cfa0a feat: add compaction event (#7289) 2025-11-25 16:12:14 +00:00
jif-oai
37d83e075e feat: add custom env for unified exec process (#7286) 2025-11-25 10:35:35 +00:00
jif-oai
523b40a129 feat[app-serve]: config management (#7241) 2025-11-25 09:29:38 +00:00
Clifford Ressel
3308dc5e48 fix: Correct the stream error message (#7266)
Fixes a copy-paste bug in the error handling of `try_run_turn`.

I have read the CLA Document and I hereby sign the CLA
2025-11-24 20:16:29 -08:00
jif-oai
fc2ff624ac fix: don't store early exit sessions (#7263) 2025-11-24 21:14:24 +00:00
Josh McKinney
ec49b56874 chore: add cargo-deny configuration (#7119)
- add GitHub workflow running cargo-deny on push/PR
- document cargo-deny allowlist with workspace-dep notes and advisory
ignores
- align workspace crates to inherit version/edition/license for
consistent checks
2025-11-24 12:22:18 -08:00
Gabriel Peal
3741f387e9 Allow enterprises to skip upgrade checks and messages (#7213)
This is a feature primarily for enterprises who centrally manage Codex
updates.
2025-11-24 15:04:49 -05:00
Dylan Hurd
1e832b1438 fix(windows) support apply_patch parsing in powershell (#7221)
## Summary
Support PowerShell parsing of apply_patch

## Testing
- [x] Enable apply_patch unit tests

---------

Co-authored-by: jif-oai <jif@openai.com>
2025-11-24 19:32:47 +00:00
Matthew Zeng
c31663d745 [feedback] Add source info into feedback metadata. (#7140)
Verified that the source info is correctly attached based on whether the
client is the CLI or VS Code.
2025-11-24 19:05:37 +00:00
jif-oai
35d89e820f fix: flaky test (#7257) 2025-11-24 18:45:41 +00:00
jif-oai
b2cddec3d7 feat: unified exec basic pruning strategy (#7239)
Prunes exited sessions first, then least recently used (LRU) sessions.
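One way to express "exited sessions first, then LRU" is a single sort key, sketched below. The `Session` fields and the `prune` limit are hypothetical and do not come from the unified exec code.

```rust
/// Hypothetical session record; field names are illustrative.
#[derive(Debug)]
struct Session {
    id: u32,
    exited: bool,
    last_used: u64, // monotonically increasing "last touched" tick
}

/// Pick the eviction victim. The key `(!exited, last_used)` orders exited
/// sessions (false) before live ones (true), then oldest first.
fn pick_victim(sessions: &[Session]) -> Option<u32> {
    sessions
        .iter()
        .min_by_key(|s| (!s.exited, s.last_used))
        .map(|s| s.id)
}

/// Evict until at most `max_sessions` remain.
fn prune(sessions: &mut Vec<Session>, max_sessions: usize) {
    while sessions.len() > max_sessions {
        let Some(victim) = pick_victim(sessions.as_slice()) else { break };
        sessions.retain(|s| s.id != victim);
    }
}

fn main() {
    let mut sessions = vec![
        Session { id: 1, exited: false, last_used: 10 },
        Session { id: 2, exited: true, last_used: 50 },
        Session { id: 3, exited: false, last_used: 30 },
    ];
    prune(&mut sessions, 1);
    // Session 2 (exited) goes first, then session 1 (older live session).
    assert_eq!(sessions.iter().map(|s| s.id).collect::<Vec<_>>(), vec![3]);
}
```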
2025-11-24 17:22:32 +00:00
jif-oai
920239f272 fix: codex delegate cancellation (#7092) 2025-11-24 16:59:09 +00:00
jif-oai
99bcb90353 chore: use proxy for encrypted summary (#7252) 2025-11-24 16:51:47 +00:00
Ahmed Ibrahim
b519267d05 Account for encrypted reasoning for auto compaction (#7113)
- The total token usage returned from the API doesn't account for the
reasoning items that precede the assistant message
- Account for those in auto compaction
- Add the encrypted reasoning effort to the common test utils
- Add a test to make sure it works as expected
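The adjustment amounts to adding the unreported reasoning tokens before the compaction check. This is a minimal sketch: the function names are hypothetical, and the per-item estimates are taken as given because the source does not say how they are computed.

```rust
/// Hypothetical: add estimated tokens for reasoning items that the API's
/// reported total leaves out.
fn effective_tokens(api_total: u64, pending_reasoning_estimates: &[u64]) -> u64 {
    api_total + pending_reasoning_estimates.iter().sum::<u64>()
}

/// Trigger auto compaction on the adjusted total, not the raw API number.
fn should_auto_compact(api_total: u64, pending_reasoning_estimates: &[u64], limit: u64) -> bool {
    effective_tokens(api_total, pending_reasoning_estimates) >= limit
}

fn main() {
    // 9_000 reported tokens look safe against a 10_000 limit...
    assert!(!should_auto_compact(9_000, &[], 10_000));
    // ...but not once 1_500 tokens of encrypted reasoning are counted.
    assert!(should_auto_compact(9_000, &[1_500], 10_000));
}
```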
2025-11-22 03:06:45 +00:00
Michael Bolin
c6f68c9df8 feat: declare server capability in shell-tool-mcp (#7112)
This introduces a new feature to Codex when it operates as an MCP
_client_ where if an MCP _server_ replies that it has an entry named
`"codex/sandbox-state"` in its _server capabilities_, then Codex will
send it an MCP notification with the following structure:

```json
{
  "method": "codex/sandbox-state/update",
  "params": {
    "sandboxPolicy": {
      "type": "workspace-write",
      "network-access": false,
      "exclude-tmpdir-env-var": false,
      "exclude-slash-tmp": false
    },
    "codexLinuxSandboxExe": null,
    "sandboxCwd": "/Users/mbolin/code/codex2"
  }
}
```

or with whatever values are appropriate for the initial `sandboxPolicy`.

**NOTE:** Codex _should_ continue to send the MCP server notifications
of the same format if these things change over the lifetime of the
thread, but that isn't wired up yet.

The result is that `shell-tool-mcp` can consume these values so that
when it calls `codex_core::exec::process_exec_tool_call()` in
`codex-rs/exec-server/src/posix/escalate_server.rs`, it is now sure to
call it with the correct values (whereas previously we relied on
hardcoded values).
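The opt-in gate can be sketched as follows. Server capabilities are modeled as a plain set of names, which is a simplification and not the `rmcp` crate's types. The notification is sent only to servers that advertised `codex/sandbox-state`.

```rust
use std::collections::BTreeSet;

const SANDBOX_STATE_CAPABILITY: &str = "codex/sandbox-state";
const SANDBOX_STATE_UPDATE: &str = "codex/sandbox-state/update";

/// Hypothetical, simplified view of the capabilities a server advertised.
struct ServerCapabilities {
    names: BTreeSet<String>,
}

/// Return the notification method to send, or None if the server did not
/// opt in by advertising the sandbox-state capability.
fn sandbox_state_method(caps: &ServerCapabilities) -> Option<&'static str> {
    caps.names
        .contains(SANDBOX_STATE_CAPABILITY)
        .then_some(SANDBOX_STATE_UPDATE)
}

fn main() {
    let opted_in = ServerCapabilities {
        names: BTreeSet::from(["codex/sandbox-state".to_string()]),
    };
    let plain = ServerCapabilities { names: BTreeSet::new() };
    assert_eq!(sandbox_state_method(&opted_in), Some("codex/sandbox-state/update"));
    assert_eq!(sandbox_state_method(&plain), None);
}
```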

While I would argue this is a supported use case within the MCP
protocol, the `rmcp` crate that we are using today does not support
custom notifications. As such, I had to patch it and I submitted it for
review, so hopefully it will be accepted in some form:

https://github.com/modelcontextprotocol/rust-sdk/pull/556

To test out this change from end-to-end:

- I ran `cargo build` in `~/code/codex2/codex-rs/exec-server`
- I built the fork of Bash in `~/code/bash/bash`
- I added the following to my `~/.codex/config.toml`:

```toml
# Use with `codex --disable shell_tool`.
[mcp_servers.execshell]
args = ["--bash", "/Users/mbolin/code/bash/bash"]
command = "/Users/mbolin/code/codex2/codex-rs/target/debug/codex-exec-mcp-server"
```

- From `~/code/codex2/codex-rs`, I ran `just codex --disable shell_tool`
- When the TUI started up, I verified that the sandbox mode is
`workspace-write`
- I ran `/mcp` to verify that the shell tool from the MCP is there:

<img width="1387" height="1400" alt="image"
src="https://github.com/user-attachments/assets/1a8addcc-5005-4e16-b59f-95cfd06fd4ab"
/>

- Then I asked it:

> what is the output of `gh issue list`

because this should be auto-approved with our existing dummy policy:


af63e6eccc/codex-rs/exec-server/src/posix.rs (L157-L164)

And it worked:

<img width="1387" height="1400" alt="image"
src="https://github.com/user-attachments/assets/7568d2f7-80da-4d68-86d0-c265a6f5e6c1"
/>
2025-11-21 16:11:01 -08:00
zhao-oai
87b211709e bypass sandbox for policy approved commands (#7110)
Allow commands approved by execpolicy to bypass the sandbox, plus a minor
refactor in preparation for execpolicy rules with specific sandbox
requirements.
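The core rule is small enough to sketch. `PolicyDecision` and `should_bypass_sandbox` are hypothetical names, and execpolicy's real types may differ. Only commands an execpolicy rule explicitly allows skip the sandbox.

```rust
/// Hypothetical execpolicy outcomes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PolicyDecision {
    Allow,     // greenlit by an execpolicy rule
    Prompt,    // needs user approval
    Forbidden, // rejected by a rule
}

/// Only explicitly allowed commands bypass the sandbox; everything else
/// follows the normal sandboxed (or rejected) path.
fn should_bypass_sandbox(decision: PolicyDecision) -> bool {
    matches!(decision, PolicyDecision::Allow)
}

fn main() {
    assert!(should_bypass_sandbox(PolicyDecision::Allow));
    assert!(!should_bypass_sandbox(PolicyDecision::Prompt));
    assert!(!should_bypass_sandbox(PolicyDecision::Forbidden));
}
```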
2025-11-21 18:03:23 -05:00
Michael Bolin
67975ed33a refactor: inline sandbox type lookup in process_exec_tool_call (#7122)
`process_exec_tool_call()` was taking `SandboxType` as a param, but in
practice, the only place it was constructed was in
`codex_message_processor.rs` where it was derived from the other
`sandbox_policy` param, so this PR inlines the logic that decides the
`SandboxType` into `process_exec_tool_call()`.
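The refactor computes the sandbox type from the policy rather than accepting both as parameters. In the sketch below, the enums are simplified, hypothetical versions, and the per-OS mapping is an assumption for illustration.

```rust
/// Simplified, hypothetical versions of the two types involved.
#[derive(Debug)]
enum SandboxPolicy {
    DangerFullAccess,
    ReadOnly,
    WorkspaceWrite,
}

#[derive(Debug, PartialEq)]
enum SandboxType {
    None,
    MacosSeatbelt,
    LinuxSeccomp,
}

/// Callers pass only the policy; the sandbox type is derived from it
/// (per-OS mapping assumed for illustration).
fn sandbox_type_for(policy: &SandboxPolicy) -> SandboxType {
    match policy {
        SandboxPolicy::DangerFullAccess => SandboxType::None,
        _ if cfg!(target_os = "macos") => SandboxType::MacosSeatbelt,
        _ if cfg!(target_os = "linux") => SandboxType::LinuxSeccomp,
        _ => SandboxType::None,
    }
}

fn main() {
    assert_eq!(sandbox_type_for(&SandboxPolicy::DangerFullAccess), SandboxType::None);
    let t = sandbox_type_for(&SandboxPolicy::WorkspaceWrite);
    if cfg!(target_os = "linux") {
        assert_eq!(t, SandboxType::LinuxSeccomp);
    }
    let _ = sandbox_type_for(&SandboxPolicy::ReadOnly);
}
```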

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/7122).
* #7112
* __->__ #7122
2025-11-21 22:53:05 +00:00
Jeremy Rose
7561a6aaf0 support MCP elicitations (#6947)
No support for request schema yet, but we'll at least show the message
and allow accept/decline.

<img width="823" height="551" alt="Screenshot 2025-11-21 at 2 44 05 PM"
src="https://github.com/user-attachments/assets/6fbb892d-ca12-4765-921e-9ac4b217534d"
/>
2025-11-21 14:44:53 -08:00
pakrym-oai
e52cc38dfd Use use_model (#7121) 2025-11-21 22:10:52 +00:00
iceweasel-oai
3bdcbc7292 Windows: flag some invocations that launch browsers/URLs as dangerous (#7111)
Prevent certain PowerShell/cmd invocations from reaching the sandbox
when they try to launch a browser or run a command with a URL, and
similar cases.
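The heuristic below is one hypothetical way to flag such invocations. The launcher names and the URL check are illustrative assumptions, not the list the PR uses.

```rust
/// Hypothetical check: flag argv that contains a URL or a known
/// browser/URL launcher. The launcher list is illustrative only.
fn launches_browser_or_url(argv: &[&str]) -> bool {
    let lowered: Vec<String> = argv.iter().map(|a| a.to_ascii_lowercase()).collect();
    let has_url = lowered
        .iter()
        .any(|a| a.contains("http://") || a.contains("https://"));
    let has_launcher = lowered.iter().any(|a| {
        matches!(a.as_str(), "start" | "start-process" | "explorer" | "explorer.exe")
    });
    has_url || has_launcher
}

fn main() {
    assert!(launches_browser_or_url(&["powershell.exe", "-Command", "Start-Process https://example.com"]));
    assert!(launches_browser_or_url(&["cmd", "/c", "start", "msedge"]));
    assert!(!launches_browser_or_url(&["cmd", "/c", "dir"]));
}
```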
2025-11-21 13:36:17 -08:00
Ahmed Ibrahim
d5f661c91d enable unified exec for experiments (#7118) 2025-11-21 13:10:01 -08:00
Ahmed Ibrahim
8ecaad948b feat: Add exp model to experiment with the tools (#7115) 2025-11-21 12:44:47 -08:00
jif-oai
af65666561 chore: drop model_max_output_tokens (#7100) 2025-11-21 17:42:54 +00:00
jif-oai
bce030ddb5 Revert "fix: read max_output_tokens param from config" (#7088)
Reverts openai/codex#4139
2025-11-21 11:40:02 +01:00
Yorling
c9e149fd5c fix: read max_output_tokens param from config (#4139)
The request param `max_output_tokens` is documented in
`https://github.com/openai/codex/blob/main/docs/config.md`,
but nothing reads it from the config. This commit reads it from the config
for GPT Responses API requests.

see https://github.com/openai/codex/issues/4138 for issue report.

Signed-off-by: Yorling <shallowcloud@yeah.net>
2025-11-20 22:46:34 -08:00
Eric Traut
bacdc004be Fixed two tests that can fail in some environments that have global git rewrite rules (#7068)
This fixes https://github.com/openai/codex/issues/7044
2025-11-20 22:45:40 -08:00
pakrym-oai
ab5972d447 Support all types of search actions (#7061)
Fixes the following error:

```json
{
  "error": {
    "message": "Invalid value: 'other'. Supported values are: 'search', 'open_page', and 'find_in_page'.",
    "type": "invalid_request_error",
    "param": "input[150].action.type",
    "code": "invalid_value"
  }
}
```

The real fix here is supporting an absent `query` parameter.
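In the sketch below, each supported action type gets its own variant and `query` is optional. An action with no query therefore no longer needs to fall back to `'other'`. The enum and `parse_action` are hypothetical and not the actual protocol types.

```rust
/// Hypothetical model of web-search actions; `query` may be absent.
#[derive(Debug, PartialEq)]
enum SearchAction {
    Search { query: Option<String> },
    OpenPage,
    FindInPage,
}

impl SearchAction {
    /// The `type` value sent back to the API: only supported values.
    fn type_str(&self) -> &'static str {
        match self {
            SearchAction::Search { .. } => "search",
            SearchAction::OpenPage => "open_page",
            SearchAction::FindInPage => "find_in_page",
        }
    }
}

/// Parse an action type; a missing query is accepted, not an error.
fn parse_action(action_type: &str, query: Option<&str>) -> Option<SearchAction> {
    match action_type {
        "search" => Some(SearchAction::Search { query: query.map(str::to_owned) }),
        "open_page" => Some(SearchAction::OpenPage),
        "find_in_page" => Some(SearchAction::FindInPage),
        _ => None,
    }
}

fn main() {
    let action = parse_action("search", None).expect("search without query is valid");
    assert_eq!(action.type_str(), "search");
}
```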
2025-11-20 20:45:28 -08:00
pakrym-oai
767b66f407 Migrate coverage to shell_command (#7042) 2025-11-21 03:44:00 +00:00
pakrym-oai
830ab4ce20 Support full powershell paths in is_safe_command (#7055)
New shell implementation always uses full paths.
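Matching on the executable's file name makes full paths work. The sketch splits on both `\` and `/` so it behaves the same on any host. The helper name and the set of accepted names are assumptions.

```rust
/// Hypothetical helper: strip the directory part (either separator) and
/// compare the executable name case-insensitively.
fn is_powershell_exe(exe: &str) -> bool {
    let name = exe.rsplit(|c: char| c == '\\' || c == '/').next().unwrap_or(exe);
    let name = name.to_ascii_lowercase();
    matches!(name.as_str(), "powershell" | "powershell.exe" | "pwsh" | "pwsh.exe")
}

fn main() {
    assert!(is_powershell_exe(r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"));
    assert!(is_powershell_exe("pwsh"));
    assert!(!is_powershell_exe(r"C:\Windows\System32\cmd.exe"));
}
```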
2025-11-20 19:29:15 -08:00
Celia Chen
7e2165f394 [app-server] update doc with codex error info (#6941)
Document new codex error info. Also fixed the name from
`codex_error_code` to `codex_error_info`.
2025-11-21 01:02:37 +00:00
Michael Bolin
8e5f38c0f0 feat: waiting for an elicitation should not count against a shell tool timeout (#6973)
Previously, if we ran the `shell` tool call with a timeout of 10s and it
fired an elicitation asking for user approval, the time the user took to
respond to the elicitation was counted against the 10s timeout, so the
`shell` tool call would fail with a timeout error unless the user was
very fast!

This PR addresses this issue by introducing a "stopwatch" abstraction
that is used to manage the timeout. The idea is:

- `Stopwatch::new()` is called with the _real_ timeout of the `shell`
tool call.
- `process_exec_tool_call()` is called with the `Cancellation` variant
of `ExecExpiration` because it should not manage its own timeout in this
case
- the `Stopwatch` expiration is wired up to the `cancel_rx` passed to
`process_exec_tool_call()`
- when an elicitation for the `shell` tool call is received, the
`Stopwatch` pauses
- because it is possible for multiple elicitations to arrive
concurrently, it keeps track of the number of "active pauses" and does
not resume until that counter goes down to zero
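The pause counting in the list above can be sketched with `std` only. This is a simplified, synchronous model, and wiring the expiration to `cancel_rx` is not shown. The field and method names are assumptions.

```rust
use std::time::{Duration, Instant};

/// Pausable timeout; nested pauses are reference-counted.
struct Stopwatch {
    limit: Duration,
    elapsed_before_pause: Duration,
    running_since: Option<Instant>, // None while paused
    active_pauses: u32,
}

impl Stopwatch {
    fn new(limit: Duration) -> Self {
        Self { limit, elapsed_before_pause: Duration::ZERO, running_since: Some(Instant::now()), active_pauses: 0 }
    }

    fn elapsed(&self) -> Duration {
        match self.running_since {
            Some(start) => self.elapsed_before_pause + start.elapsed(),
            None => self.elapsed_before_pause,
        }
    }

    /// An elicitation started: stop the clock on the first active pause.
    fn pause(&mut self) {
        if self.active_pauses == 0 {
            if let Some(start) = self.running_since.take() {
                self.elapsed_before_pause += start.elapsed();
            }
        }
        self.active_pauses += 1;
    }

    /// An elicitation finished: restart only when no pauses remain.
    fn resume(&mut self) {
        if self.active_pauses == 0 {
            return;
        }
        self.active_pauses -= 1;
        if self.active_pauses == 0 {
            self.running_since = Some(Instant::now());
        }
    }

    fn is_paused(&self) -> bool {
        self.running_since.is_none()
    }

    fn expired(&self) -> bool {
        self.elapsed() >= self.limit
    }
}

fn main() {
    let mut sw = Stopwatch::new(Duration::from_millis(500));
    sw.pause(); // first elicitation
    sw.pause(); // second, concurrent elicitation
    let frozen = sw.elapsed();
    std::thread::sleep(Duration::from_millis(20));
    assert_eq!(sw.elapsed(), frozen); // no time accrues while paused
    sw.resume();
    assert!(sw.is_paused()); // one pause still active
    sw.resume();
    assert!(!sw.is_paused());
    assert!(!sw.expired());
}
```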

I verified this with the MCP server using
`@modelcontextprotocol/inspector`: I specified `git status` as the
`command` with a timeout of 500ms, the elicitation popped up, and I
had all the time in the world to respond. Before this PR, that would
not have been possible.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6973).
* #7005
* __->__ #6973
* #6972
2025-11-20 16:45:38 -08:00
Ahmed Ibrahim
1388e99674 fix flaky tool_call_output_exceeds_limit_truncated_chars_limit (#7043)
I suspect this test is flaky because the wall time can come out as 0,
0.1, or 1.
2025-11-20 16:36:29 -08:00
Michael Bolin
f56d1dc8fc feat: update process_exec_tool_call() to take a cancellation token (#6972)
This updates `ExecParams` so that instead of taking `timeout_ms:
Option<u64>`, it now takes a more general cancellation mechanism,
`ExecExpiration`, which is an enum that includes a
`Cancellation(tokio_util::sync::CancellationToken)` variant.

If the cancellation token is fired, then `process_exec_tool_call()`
returns in the same way as if a timeout was exceeded.

This is necessary so that in #6973, we can manage the timeout logic
external to the `process_exec_tool_call()` because we want to "suspend"
the timeout when an elicitation from a human user is pending.
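The sketch below models the change with `std` only. `Arc<AtomicBool>` stands in for `tokio_util::sync::CancellationToken`, and a polling loop stands in for the real async wait. Firing the token yields the same outcome as an exceeded timeout, as described above. `run` and `Outcome` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Std-only stand-in for `ExecExpiration`.
enum ExecExpiration {
    Timeout(Duration),
    Cancellation(Arc<AtomicBool>), // models CancellationToken
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Completed,
    TimedOut,
}

impl ExecExpiration {
    fn is_expired(&self, started: Instant) -> bool {
        match self {
            ExecExpiration::Timeout(limit) => started.elapsed() >= *limit,
            ExecExpiration::Cancellation(token) => token.load(Ordering::SeqCst),
        }
    }
}

/// Poll until the work completes or the expiration fires; cancellation
/// is reported exactly like a timeout.
fn run(expiration: &ExecExpiration, mut done: impl FnMut() -> bool) -> Outcome {
    let started = Instant::now();
    loop {
        if done() {
            return Outcome::Completed;
        }
        if expiration.is_expired(started) {
            return Outcome::TimedOut;
        }
        thread::sleep(Duration::from_millis(1));
    }
}

fn main() {
    let token = Arc::new(AtomicBool::new(false));
    let canceller = Arc::clone(&token);
    // An external owner of the timeout (e.g. a stopwatch) fires the token.
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(10));
        canceller.store(true, Ordering::SeqCst);
    });
    assert_eq!(run(&ExecExpiration::Cancellation(token), || false), Outcome::TimedOut);
    assert_eq!(run(&ExecExpiration::Timeout(Duration::from_secs(1)), || true), Outcome::Completed);
}
```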

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6972).
* #7005
* #6973
* __->__ #6972
2025-11-20 16:29:57 -08:00
Ahmed Ibrahim
9be310041b migrate collect_tool_identifiers_for_model to test_codex (#7041)
This may fix the flakiness.
2025-11-20 16:02:50 -08:00
Xiao-Yong Jin
0fbcdd77c8 core: make shell behavior portable on FreeBSD (#7039)
- Use /bin/sh instead of /bin/bash on FreeBSD/OpenBSD in the process
group timeout test to avoid command-not-found failures.

- Accept /usr/local/bin/bash as a valid SHELL path to match common
FreeBSD installations.

- Switch the shell serialization duration test to /bin/sh for improved
portability across Unix platforms.

With this change, `cargo test -p codex-core --lib` runs and passes on
FreeBSD.
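The first two bullets can be sketched as two small helpers. Both names are hypothetical, and the list of accepted bash paths is an assumption beyond the `/usr/local/bin/bash` entry the commit mentions.

```rust
/// Shell for the process-group timeout test: BSDs may lack /bin/bash,
/// but /bin/sh is always present.
fn test_shell() -> &'static str {
    if cfg!(any(target_os = "freebsd", target_os = "openbsd")) {
        "/bin/sh"
    } else {
        "/bin/bash"
    }
}

/// Accept the FreeBSD ports/pkg location of bash alongside common paths.
fn is_known_bash_path(shell: &str) -> bool {
    matches!(shell, "/bin/bash" | "/usr/bin/bash" | "/usr/local/bin/bash")
}

fn main() {
    assert!(is_known_bash_path("/usr/local/bin/bash"));
    assert!(!is_known_bash_path("/bin/sh"));
    assert!(test_shell().ends_with("sh"));
}
```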
2025-11-20 16:01:35 -08:00
Celia Chen
9bce050385 [app-server & core] introduce new codex error code and v2 app-server error events (#6938)
This PR does two things:
1. Populates a new `codex_error_code` field in error events sent from
core to the client.
2. The old v1 core events `codex/event/stream_error` and
`codex/event/error` now both become `error`. We also show the codex
error code when `turn/completed` reports an error status.

New events in the app-server test:
```
< {
<   "method": "codex/event/stream_error",
<   "params": {
<     "conversationId": "019aa34c-0c14-70e0-9706-98520a760d67",
<     "id": "0",
<     "msg": {
<       "codex_error_code": {
<         "response_stream_disconnected": {
<           "http_status_code": 401
<         }
<       },
<       "message": "Reconnecting... 2/5",
<       "type": "stream_error"
<     }
<   }
< }

< {
<   "method": "error",
<   "params": {
<     "error": {
<       "codexErrorCode": {
<         "responseStreamDisconnected": {
<           "httpStatusCode": 401
<         }
<       },
<       "message": "Reconnecting... 2/5"
<     }
<   }
< }

< {
<   "method": "turn/completed",
<   "params": {
<     "turn": {
<       "error": {
<         "codexErrorCode": {
<           "responseTooManyFailedAttempts": {
<             "httpStatusCode": 401
<           }
<         },
<         "message": "exceeded retry limit, last status: 401 Unauthorized, request id: 9a1b495a1a97ed3e-SJC"
<       },
<       "id": "0",
<       "items": [],
<       "status": "failed"
<     }
<   }
< }
```
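The renaming visible in the events above can be sketched in two parts. The first maps both v1 methods onto `error`. The second converts snake_case v1 keys to the camelCase v2 keys, for example `http_status_code` to `httpStatusCode`. Both helpers are hypothetical. A real implementation would more likely rely on serde's `#[serde(rename_all = "camelCase")]` than on a hand-written converter.

```rust
/// Hypothetical mapping of v1 event methods to the v2 method.
fn v2_method(v1_method: &str) -> Option<&'static str> {
    match v1_method {
        "codex/event/stream_error" | "codex/event/error" => Some("error"),
        _ => None,
    }
}

/// snake_case (v1 payload keys) to camelCase (v2 payload keys).
fn to_camel_case(snake: &str) -> String {
    let mut out = String::with_capacity(snake.len());
    let mut upper_next = false;
    for c in snake.chars() {
        if c == '_' {
            upper_next = true;
        } else if upper_next {
            out.extend(c.to_uppercase());
            upper_next = false;
        } else {
            out.push(c);
        }
    }
    out
}

fn main() {
    assert_eq!(v2_method("codex/event/stream_error"), Some("error"));
    assert_eq!(to_camel_case("response_stream_disconnected"), "responseStreamDisconnected");
}
```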
2025-11-20 23:06:55 +00:00
Ahmed Ibrahim
54ee302a06 Attempt to fix unified_exec_formats_large_output_summary flakiness (#7029)
Second attempt to fix this test after
https://github.com/openai/codex/pull/6884. I think the flakiness
happens because `yield_time` is too small for a 10,000-step loop in
Python.
2025-11-20 14:38:04 -08:00
pakrym-oai
856f97f449 Delete shell_command feature (#7024) 2025-11-20 14:14:56 -08:00