codex

mirror of https://github.com/openai/codex.git synced 2026-05-01 20:02:05 +03:00

Author	SHA1	Message	Date
Dylan Hurd	37c36024c7	chore(core): test apply_patch_cli on Windows (#7554 ) ## Summary These tests pass on windows, let's enable them. ## Testing - [x] These are more tests	2025-12-04 10:39:45 -08:00
jif-oai	291b54a762	chore: review in read-only (#7593 )	2025-12-04 10:01:12 -08:00
jif-oai	2b5d0b2935	feat: update sandbox policy to allow TTY (#7580 ) Change: Seatbelt now allows file-ioctl on /dev/ttys[0-9]+ even without the sandbox extension so pre-created PTYs remain interactive (Python REPL, shells). Risk: A seatbelted process that already holds a PTY fd (including one it shouldn’t) could issue tty ioctls like TIOCSTI or termios changes on that fd. This doesn’t allow opening new PTYs or reading/writing them; it only broadens ioctl capability on existing fds. Why acceptable: We already hand the child its PTY for interactive use; restoring ioctls is required for isatty() and prompts to work. The attack requires being given or inheriting a sensitive PTY fd; by design we don’t hand untrusted processes other users’ PTYs (we don't hand them any PTYs actually), so the practical exposure is limited to the PTY intentionally allocated for the session. Validation: Running ``` start a python interpreter and keep it running ``` Followed by: * `calculate 1+1 using it` -> works as expected * `Use this Python session to run the command just fix in /Users/jif/code/codex/codex-rs` -> does not work as expected	2025-12-04 17:58:58 +00:00
zhao-oai	3d35cb4619	Refactor execpolicy fallback evaluation (#7544 ) ## Refactor of the `execpolicy` crate To illustrate why we need this refactor, consider an agent attempting to run `apple | rm -rf ./`. Suppose `apple` is allowed by `execpolicy`. Before this PR, `execpolicy` would consider `apple` and `rm -rf ./` and only render one rule match: `Allow`. We would skip any heuristics checks on `rm -rf ./` and immediately approve `apple | rm -rf ./` to run. To fix this, we now thread a `fallback` evaluation function into `execpolicy` that runs when no `execpolicy` rules match a given command. In our example, we would run `fallback` on `rm -rf ./` and prevent `apple | rm -rf ./` from being run without approval.	2025-12-03 23:39:48 -08:00
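The fallback idea in the commit above can be sketched roughly as follows. This is a minimal illustration, not the `execpolicy` crate's API: `Decision`, `match_rule`, `heuristic_fallback`, and `evaluate` are hypothetical names, and the rule table is hard-coded for the `apple | rm -rf ./` example.

```rust
/// Hypothetical decision type, ordered from least to most restrictive.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Decision {
    Allow,
    Prompt,
    Forbidden,
}

/// Hypothetical rule table: returns a decision only when a rule matches.
fn match_rule(cmd: &[&str]) -> Option<Decision> {
    match cmd.first() {
        Some(&"apple") => Some(Decision::Allow),
        _ => None,
    }
}

/// Hypothetical heuristic, used only when no rule matches a command.
fn heuristic_fallback(cmd: &[&str]) -> Decision {
    if cmd.first() == Some(&"rm") {
        Decision::Prompt
    } else {
        Decision::Allow
    }
}

/// Evaluate every command in a pipeline and keep the most restrictive result.
/// Commands without a matching rule go through `fallback` instead of being skipped.
fn evaluate<F>(commands: &[Vec<&str>], fallback: F) -> Decision
where
    F: Fn(&[&str]) -> Decision,
{
    commands
        .iter()
        .map(|cmd| match_rule(cmd.as_slice()).unwrap_or_else(|| fallback(cmd.as_slice())))
        .max()
        // An empty pipeline is treated conservatively (an assumption of this sketch).
        .unwrap_or(Decision::Prompt)
}
```

With this shape, a pipeline of `apple` and `rm -rf ./` evaluates to `Prompt` rather than `Allow`, because the unmatched `rm` command is no longer ignored.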
zhao-oai	e925a380dc	whitelist command prefix integration in core and tui (#7033 ) This PR enables the TUI to approve commands and add their prefixes to an allowlist. Note: we only show the option to whitelist the command when 1) the command is not multi-part (e.g. `git add -A && git commit -m 'hello world'`) 2) the command is not already matched by an existing rule	2025-12-03 23:17:02 -08:00
Ahmed Ibrahim	67e67e054f	Migrate codex max (#7566 ) - make codex max the default - fix: we were doing some async work in sync function which caused tui to panic	2025-12-03 20:54:48 -08:00
Ahmed Ibrahim	cee37a32b2	Migrate model family to models manager (#7565 ) This PR moves `ModelsFamily` to `openai_models`. It also propagates `ModelsManager` to session services and uses it to drive the model family. We also make `derive_default_model_family` private because it's a step towards what we want: one place that gives model configuration. This is a second step towards having one source of truth for models information and config: `ModelsManager`. Next steps would be to remove `ModelsFamily` from config. That's a massive change because it's used in 41 places, mostly before launching `codex`. Also, we need to make `find_family_for_model` private. That's also a big change because it's used in 21 places, almost all of them tests.	2025-12-03 18:49:47 -08:00
Ahmed Ibrahim	00cc00ead8	Introduce `ModelsManager` and migrate `app-server` to use it. (#7552 )	2025-12-03 17:17:56 -08:00
Ahmed Ibrahim	71504325d3	Migrate model preset (#7542 ) - Introduce `openai_models` in `/core` - Move `PRESETS` under it - Move `ModelPreset`, `ModelUpgrade`, `ReasoningEffortPreset`, `ReasoningEffortPreset`, and `ReasoningEffortPreset` to `protocol` - Introduce `Op::ListModels` and `EventMsg::AvailableModels` Next steps: - migrate `app-server` and `tui` to use the introduced Operation	2025-12-03 20:30:43 +00:00
Jeremy Rose	9b3251f28f	seatbelt: allow openpty() (#7507 ) This allows `openpty(3)` to run in the default sandbox. Also permit reading `kern.argmax`, which is the maximum number of arguments to exec().	2025-12-03 09:15:38 -08:00
jif-oai	51307eaf07	feat: retroactive image placeholder to prevent poisoning (#6774 ) If an image can't be read by the API, it will poison the entire history, preventing any new turn on the conversation. This detects such cases and replaces the image with a placeholder.	2025-12-03 11:35:56 +00:00
Robby He	f3989f6092	fix(unified_exec): use platform default shell when unified_exec shell… (#7486 ) # Unified Exec Shell Selection on Windows ## Problem reference issue #7466 The `unified_exec` handler currently deserializes model-provided tool calls into the `ExecCommandArgs` struct: ```rust #[derive(Debug, Deserialize)] struct ExecCommandArgs { cmd: String, #[serde(default)] workdir: Option<String>, #[serde(default = "default_shell")] shell: String, #[serde(default = "default_login")] login: bool, #[serde(default = "default_exec_yield_time_ms")] yield_time_ms: u64, #[serde(default)] max_output_tokens: Option<usize>, #[serde(default)] with_escalated_permissions: Option<bool>, #[serde(default)] justification: Option<String>, } ``` The `shell` field uses a hard-coded default: ```rust fn default_shell() -> String { "/bin/bash".to_string() } ``` When the model returns a tool call JSON that only contains `cmd` (which is the common case), Serde fills in `shell` with this default value. Later, `get_command` uses that value as if it were a model-provided shell path: ```rust fn get_command(args: &ExecCommandArgs) -> Vec<String> { let shell = get_shell_by_model_provided_path(&PathBuf::from(args.shell.clone())); shell.derive_exec_args(&args.cmd, args.login) } ``` On Unix, this usually resolves to `/bin/bash` and works as expected. However, on Windows this behavior is problematic: - The hard-coded `"/bin/bash"` is not a valid Windows path. - `get_shell_by_model_provided_path` treats this as a model-specified shell, and tries to resolve it (e.g. via `which::which("bash")`), which may or may not exist and may not behave as intended. - In practice, this leads to commands being executed under a non-default or non-existent shell on Windows (for example, WSL bash), instead of the expected Windows PowerShell or `cmd.exe`. 
The core of the issue is that "model did not specify `shell`" is currently interpreted as "the model explicitly requested `/bin/bash`", which is both Unix-specific and wrong on Windows. ## Proposed Solution Instead of hard-coding `"/bin/bash"` into `ExecCommandArgs`, we should distinguish between: 1. The model explicitly specifying a shell, e.g.: ```json { "cmd": "echo hello", "shell": "pwsh" } ``` In this case, we do want to respect the model’s choice and use `get_shell_by_model_provided_path`. 2. The model omitting the `shell` field entirely, e.g.: ```json { "cmd": "echo hello" } ``` In this case, we should not assume `/bin/bash`. Instead, we should use `default_user_shell()` and let the platform decide. To express this distinction, we can: 1. Change `shell` to be optional in `ExecCommandArgs`: ```rust #[derive(Debug, Deserialize)] struct ExecCommandArgs { cmd: String, #[serde(default)] workdir: Option<String>, #[serde(default)] shell: Option<String>, #[serde(default = "default_login")] login: bool, #[serde(default = "default_exec_yield_time_ms")] yield_time_ms: u64, #[serde(default)] max_output_tokens: Option<usize>, #[serde(default)] with_escalated_permissions: Option<bool>, #[serde(default)] justification: Option<String>, } ``` Here, the absence of `shell` in the JSON is represented as `shell: None`, rather than a hard-coded string value.	2025-12-02 21:49:25 -08:00
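The distinction the commit above draws between an explicit shell and an omitted one can be sketched as below. This is an illustration only: `resolve_shell` is a hypothetical helper, and the body of `default_user_shell` here is a stand-in (the commit names the function but does not show how it picks the platform shell).

```rust
/// Stand-in for the platform default lookup; the real `default_user_shell()`
/// is not shown in the commit message, so these return values are assumptions.
fn default_user_shell() -> String {
    if cfg!(windows) {
        "powershell.exe".to_string()
    } else {
        "/bin/sh".to_string()
    }
}

/// Hypothetical helper: `Some(path)` means the model explicitly chose a shell,
/// `None` means the field was absent and the platform default should be used.
fn resolve_shell(model_shell: Option<&str>) -> String {
    match model_shell {
        Some(path) => path.to_string(),
        None => default_user_shell(),
    }
}
```

The point of the `Option` is that "not specified" stays distinguishable from "specified as `/bin/bash`" all the way to the place where the shell is chosen.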
jif-oai	72b95db12f	feat: intercept apply_patch for unified_exec (#7446 )	2025-12-02 17:54:02 +00:00
jif-oai	4b78e2ab09	chore: review everywhere (#7444 )	2025-12-02 11:26:27 +00:00
Thibault Sottiaux	a8d5ad37b8	feat: experimental support for skills.md (#7412 ) This change prototypes support for Skills with the CLI. This is an experimental feature for internal testing. --------- Co-authored-by: Gav Verma <gverma@openai.com>	2025-12-01 20:22:35 -08:00
Dylan Hurd	5b25915d7e	fix(apply_patch) tests for shell_command (#7307 ) ## Summary Adds test coverage for invocations of apply_patch via shell_command with heredoc, to validate behavior. ## Testing - [x] These are tests	2025-12-01 15:09:22 -08:00
jif-oai	a421eba31f	fix: disable review rollout filtering (#7371 )	2025-12-01 09:04:13 +00:00
jif-oai	aaec8abf58	feat: detached review (#7292 )	2025-11-28 11:34:57 +00:00
jif-oai	28ff364c3a	feat: update process ID for event handling (#7261 )	2025-11-25 14:21:05 -08:00
jif-oai	9ba27cfa0a	feat: add compaction event (#7289 )	2025-11-25 16:12:14 +00:00
jif-oai	fc2ff624ac	fix: don't store early exit sessions (#7263 )	2025-11-24 21:14:24 +00:00
Josh McKinney	ec49b56874	chore: add cargo-deny configuration (#7119 ) - add GitHub workflow running cargo-deny on push/PR - document cargo-deny allowlist with workspace-dep notes and advisory ignores - align workspace crates to inherit version/edition/license for consistent checks	2025-11-24 12:22:18 -08:00
Dylan Hurd	1e832b1438	fix(windows) support apply_patch parsing in powershell (#7221 ) ## Summary Support powershell parsing of apply_patch ## Testing - [x] Enable apply_patch unit tests --------- Co-authored-by: jif-oai <jif@openai.com>	2025-11-24 19:32:47 +00:00
jif-oai	35d89e820f	fix: flaky test (#7257 )	2025-11-24 18:45:41 +00:00
jif-oai	b2cddec3d7	feat: unified exec basic pruning strategy (#7239 ) LRU + exited sessions first	2025-11-24 17:22:32 +00:00
Ahmed Ibrahim	b519267d05	Account for encrypted reasoning for auto compaction (#7113 ) - The total token used returned from the api doesn't account for the reasoning items before the assistant message - Account for those for auto compaction - Add the encrypted reasoning effort in the common tests utils - Add a test to make sure it works as expected	2025-11-22 03:06:45 +00:00
Michael Bolin	67975ed33a	refactor: inline sandbox type lookup in process_exec_tool_call (#7122 ) `process_exec_tool_call()` was taking `SandboxType` as a param, but in practice, the only place it was constructed was in `codex_message_processor.rs` where it was derived from the other `sandbox_policy` param, so this PR inlines the logic that decides the `SandboxType` into `process_exec_tool_call()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/7122). * #7112 * __->__ #7122	2025-11-21 22:53:05 +00:00
pakrym-oai	e52cc38dfd	Use use_model (#7121 )	2025-11-21 22:10:52 +00:00
Ahmed Ibrahim	d5f661c91d	enable unified exec for experiments (#7118 )	2025-11-21 13:10:01 -08:00
jif-oai	bce030ddb5	Revert "fix: read `max_output_tokens` param from config" (#7088 ) Reverts openai/codex#4139	2025-11-21 11:40:02 +01:00
Yorling	c9e149fd5c	fix: read `max_output_tokens` param from config (#4139 ) The request param `max_output_tokens` is documented in `https://github.com/openai/codex/blob/main/docs/config.md`, but nothing reads the item from config. This commit reads it from config for the GPT Responses API. See https://github.com/openai/codex/issues/4138 for the issue report. Signed-off-by: Yorling <shallowcloud@yeah.net>	2025-11-20 22:46:34 -08:00
Eric Traut	bacdc004be	Fixed two tests that can fail in some environments that have global git rewrite rules (#7068 ) This fixes https://github.com/openai/codex/issues/7044	2025-11-20 22:45:40 -08:00
pakrym-oai	ab5972d447	Support all types of search actions (#7061 ) Fixes the ``` { "error": { "message": "Invalid value: 'other'. Supported values are: 'search', 'open_page', and 'find_in_page'.", "type": "invalid_request_error", "param": "input[150].action.type", "code": "invalid_value" } ``` error. The actual-actual fix here is supporting absent `query` parameter.	2025-11-20 20:45:28 -08:00
pakrym-oai	767b66f407	Migrate coverage to shell_command (#7042 )	2025-11-21 03:44:00 +00:00
Ahmed Ibrahim	1388e99674	fix flaky `tool_call_output_exceeds_limit_truncated_chars_limit` (#7043 ) I suspect this is flaky because the wall time can become 0, 0.1, or 1.	2025-11-20 16:36:29 -08:00
Michael Bolin	f56d1dc8fc	feat: update process_exec_tool_call() to take a cancellation token (#6972 ) This updates `ExecParams` so that instead of taking `timeout_ms: Option<u64>`, it now takes a more general cancellation mechanism, `ExecExpiration`, which is an enum that includes a `Cancellation(tokio_util::sync::CancellationToken)` variant. If the cancellation token is fired, then `process_exec_tool_call()` returns in the same way as if a timeout was exceeded. This is necessary so that in #6973, we can manage the timeout logic external to the `process_exec_tool_call()` because we want to "suspend" the timeout when an elicitation from a human user is pending. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6972). * #7005 * #6973 * __->__ #6972	2025-11-20 16:29:57 -08:00
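A rough, std-only sketch of the expiration enum described in the commit above. The commit says the real `ExecExpiration` carries a `tokio_util::sync::CancellationToken`; here an `Arc<AtomicBool>` stands in for the token so the example has no external dependencies, and the `Timeout` variant and `is_expired` method are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::time::{Duration, Instant};

/// Sketch of a general expiration mechanism. The `Cancellation` payload is a
/// stand-in for a cancellation token, not the type used in the codebase.
enum ExecExpiration {
    Timeout(Duration),
    Cancellation(Arc<AtomicBool>),
}

impl ExecExpiration {
    /// Both variants report expiry the same way, so the caller can treat a fired
    /// cancellation exactly like an exceeded timeout.
    fn is_expired(&self, started: Instant) -> bool {
        match self {
            ExecExpiration::Timeout(limit) => started.elapsed() >= *limit,
            ExecExpiration::Cancellation(flag) => flag.load(Ordering::SeqCst),
        }
    }
}
```

Because the cancellation variant is driven from outside, the owner of the flag can hold off firing it while a user prompt is pending, which is the "suspend the timeout" use case the commit mentions.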
Ahmed Ibrahim	9be310041b	migrate `collect_tool_identifiers_for_model` to `test_codex` (#7041 ) Maybe it solved flakiness	2025-11-20 16:02:50 -08:00
Xiao-Yong Jin	0fbcdd77c8	core: make shell behavior portable on FreeBSD (#7039 ) - Use /bin/sh instead of /bin/bash on FreeBSD/OpenBSD in the process group timeout test to avoid command-not-found failures. - Accept /usr/local/bin/bash as a valid SHELL path to match common FreeBSD installations. - Switch the shell serialization duration test to /bin/sh for improved portability across Unix platforms. With this change, `cargo test -p codex-core --lib` runs and passes on FreeBSD.	2025-11-20 16:01:35 -08:00
Ahmed Ibrahim	54ee302a06	Attempt to fix `unified_exec_formats_large_output_summary` flakiness (#7029 ) second attempt to fix this test after https://github.com/openai/codex/pull/6884. I think this flakiness is happening because yield_time is too small for a 10,000 step loop in python.	2025-11-20 14:38:04 -08:00
pakrym-oai	856f97f449	Delete shell_command feature (#7024 )	2025-11-20 14:14:56 -08:00
pakrym-oai	52d0ec4cd8	Delete tiktoken-rs (#7018 )	2025-11-20 11:15:04 -08:00
LIHUA	397279d46e	Fix: Improve text encoding for shell output in VSCode preview (#6178 ) (#6182 ) ## 🐛 Problem Users running commands with non-ASCII characters (like Russian text "пример") in Windows/WSL environments experience garbled text in VSCode's shell preview window, with Unicode replacement characters (�) appearing instead of the actual text. Issue: https://github.com/openai/codex/issues/6178 ## 🔧 Root Cause The issue was in `StreamOutput<Vec<u8>>::from_utf8_lossy()` method in `codex-rs/core/src/exec.rs`, which used `String::from_utf8_lossy()` to convert shell output bytes to strings. This function immediately replaces any invalid UTF-8 byte sequences with replacement characters, without attempting to decode using other common encodings. In Windows/WSL environments, shell output often uses encodings like: - Windows-1252 (common Windows encoding) - Latin-1/ISO-8859-1 (extended ASCII) ## 🛠️ Solution Replaced the simple `String::from_utf8_lossy()` call with intelligent encoding detection via a new `bytes_to_string_smart()` function that tries multiple encoding strategies: 1. UTF-8 (fast path for valid UTF-8) 2. Windows-1252 (handles Windows-specific characters in 0x80-0x9F range) 3. Latin-1 (fallback for extended ASCII) 4. Lossy UTF-8 (final fallback, same as before) ## 📁 Changes ### New Files - `codex-rs/core/src/text_encoding.rs` - Smart encoding detection module - `codex-rs/core/tests/suite/text_encoding_fix.rs` - Integration tests ### Modified Files - `codex-rs/core/src/lib.rs` - Added text_encoding module - `codex-rs/core/src/exec.rs` - Updated StreamOutput::from_utf8_lossy() - `codex-rs/core/tests/suite/mod.rs` - Registered new test module ## ✅ Testing - 5 unit tests covering UTF-8, Windows-1252, Latin-1, and fallback scenarios - 2 integration tests simulating the exact Issue #6178 scenario - Demonstrates improvement over the previous `String::from_utf8_lossy()` approach All tests pass: ```bash cargo test -p codex-core text_encoding cargo test -p codex-core test_shell_output_encoding_issue_6178 ``` ## 🎯 Impact - ✅ Eliminates garbled text in VSCode shell preview for non-ASCII content - ✅ Supports Windows/WSL environments with proper encoding detection - ✅ Zero performance impact for UTF-8 text (fast path) - ✅ Backward compatible - UTF-8 content works exactly as before - ✅ Handles edge cases with robust fallback mechanism ## 🧪 Test Scenarios The fix has been tested with: - Russian text ("пример") - Windows-1252 quotation marks (""test") - Latin-1 accented characters ("café") - Mixed encoding content - Invalid byte sequences (graceful fallback) ## 📋 Checklist - [X] Addresses the reported issue - [X] Includes comprehensive tests - [X] Maintains backward compatibility - [X] Follows project coding conventions - [X] No breaking changes --------- Co-authored-by: Josh McKinney <joshka@openai.com>	2025-11-20 11:04:11 -08:00
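The decoding order listed in the commit above can be sketched with the standard library alone. This is not the `bytes_to_string_smart()` implementation (its detection logic is not shown in the commit message); `decode_bytes` and `CP1252_HIGH` are hypothetical names, and the sketch simply tries UTF-8 first and then decodes byte by byte as Windows-1252 with a Latin-1 fallback.

```rust
/// Unicode code points for Windows-1252 bytes 0x80..=0x9F.
/// A value of 0 marks a byte that Windows-1252 leaves unassigned.
const CP1252_HIGH: [u32; 32] = [
    0x20AC, 0, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0, 0x017D, 0,
    0, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0, 0x017E, 0x0178,
];

/// Hypothetical decoder: UTF-8 fast path, then Windows-1252, then Latin-1.
fn decode_bytes(bytes: &[u8]) -> String {
    // Fast path: valid UTF-8 is returned unchanged.
    if let Ok(text) = std::str::from_utf8(bytes) {
        return text.to_string();
    }
    bytes
        .iter()
        .map(|&b| match b {
            0x80..=0x9F => {
                let cp = CP1252_HIGH[(b - 0x80) as usize];
                if cp == 0 {
                    // Unassigned in Windows-1252: fall back to the Latin-1 mapping.
                    char::from(b)
                } else {
                    char::from_u32(cp).unwrap_or(char::from(b))
                }
            }
            // ASCII and 0xA0..=0xFF map to the same code point in Latin-1.
            _ => char::from(b),
        })
        .collect()
}
```

Since Latin-1 assigns a character to every byte, this sketch never reaches a lossy step; a detector that can reject a candidate encoding would still need the final lossy UTF-8 fallback the commit lists.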
pakrym-oai	30ca89424c	Always fallback to real shell (#6953 ) Either cmd.exe or `/bin/sh`.	2025-11-20 10:58:46 -08:00
hanson-openai	b5dd189067	Allow unified_exec to early exit (if the process terminates before yield_time_ms) (#6867 ) Thread an `exit_notify` tokio `Notify` through to the `UnifiedExecSession` so that we can return early if the command terminates before `yield_time_ms`. As Codex review correctly pointed out below 🙌 we also need an `exit_signaled` flag so that commands which finish before we start waiting can also exit early. Since the default `yield_time_ms` is now 10s, this means that we don't have to wait 10s for trivial commands like ls, sed, etc. (which are the majority of agent commands 😅) --------- Co-authored-by: jif-oai <jif@openai.com>	2025-11-20 13:34:41 +01:00
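Why the commit above needs both a notification and a flag can be shown with a small std-only sketch. It uses `Mutex` plus `Condvar` in place of tokio's `Notify`, and `ExitSignal`, `signal_exit`, and `wait_for_exit` are hypothetical names, not the `UnifiedExecSession` API.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Sketch of an exit signal: the boolean plays the role of `exit_signaled`,
/// the condition variable plays the role of `exit_notify`.
#[derive(Clone, Default)]
struct ExitSignal {
    inner: Arc<(Mutex<bool>, Condvar)>,
}

impl ExitSignal {
    /// Called when the process terminates.
    fn signal_exit(&self) {
        let (flag, cv) = &*self.inner;
        *flag.lock().unwrap() = true;
        cv.notify_all();
    }

    /// Waits up to `yield_time`; returns true if the process exited in time.
    /// The flag is checked before sleeping, so an exit that happened before
    /// the wait started is not missed.
    fn wait_for_exit(&self, yield_time: Duration) -> bool {
        let (flag, cv) = &*self.inner;
        let guard = flag.lock().unwrap();
        let (guard, _timed_out) = cv
            .wait_timeout_while(guard, yield_time, |exited| !*exited)
            .unwrap();
        *guard
    }
}
```

With only a wake-up and no stored flag, a command that finishes before the waiter starts waiting would leave the waiter blocked for the full yield time.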
zhao-oai	65c13f1ae7	execpolicy2 core integration (#6641 ) This PR threads execpolicy2 into codex-core. - activated via feature flag: exec_policy (on by default) - reads and parses all .codexpolicy files in `codex_home/codex` - refactored tool runtime API to integrate execpolicy logic --------- Co-authored-by: Michael Bolin <mbolin@openai.com>	2025-11-19 16:50:43 -08:00
zhao-oai	72af589398	storing credits (#6858 ) Expand the rate-limit cache/TUI: store credit snapshots alongside primary and secondary windows, render “Credits” when the backend reports they exist (unlimited vs rounded integer balances)	2025-11-19 10:49:35 -08:00
Ahmed Ibrahim	d62cab9a06	fix: don't truncate at new lines (#6907 )	2025-11-19 17:05:48 +00:00
Ahmed Ibrahim	d5dfba2509	feat: arcticfox in the wild (#6906 ) --------- Co-authored-by: jif-oai <jif@openai.com>	2025-11-19 16:31:06 +00:00
Dylan Hurd	15b5eb30ed	fix(core) Support changing /approvals before conversation (#6836 ) ## Summary Setting `/approvals` before the start of a conversation was not updating the environment_context for a conversation. Not sure exactly when this problem was introduced, but this should reduce model confusion dramatically. ## Testing - [x] Added unit test to reproduce bug, confirmed fix with update - [x] Tested locally	2025-11-19 11:32:48 +00:00
Ahmed Ibrahim	efebc62fb7	Move shell to use `truncate_text` (#6842 ) Move shell to use the configurable `truncate_text` --------- Co-authored-by: pakrym-oai <pakrym@openai.com>	2025-11-19 01:56:08 -08:00


650 Commits