codex

mirror of https://github.com/openai/codex.git synced 2026-05-02 20:32:04 +03:00

Author	SHA1	Message	Date
iceweasel-oai	5c3ca73914	add a slash command to grant sandbox read access to inaccessible directories (#11512 ) There is an edge case where a directory is not readable by the sandbox. In practice, we've seen very little of it, but it can happen so this slash command unlocks users when it does. Future idea is to make this a tool that the agent knows about so it can be more integrated.	2026-02-12 12:48:36 -08:00
Owen Lin	efc8d45750	feat(app-server): experimental flag to persist extended history (#11227 ) This PR adds an experimental `persist_extended_history` bool flag to app-server thread APIs so rollout logs can retain a richer set of EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e. on `thread/resume`). ### Motivation Today, our rollout recorder only persists a small subset (e.g. user message, reasoning, assistant message) of `EventMsg` types, dropping a good number (like command exec, file change, etc.) that are important for reconstructing full item history for `thread/resume`, `thread/read`, and `thread/fork`. Some clients want to be able to resume a thread without lossiness. This lossiness is primarily a UI thing, since what the model sees are `ResponseItem` and not `EventMsg`. ### Approach This change introduces an opt-in `persist_extended_history` flag to preserve those events when you start/resume/fork a thread (defaults to `false`). This is done by adding an `EventPersistenceMode` to the rollout recorder: - `Limited` (existing behavior, default) - `Extended` (new opt-in behavior) In `Extended` mode, persist additional `EventMsg` variants needed for non-lossy app-server `ThreadItem` reconstruction. We now store the following ThreadItems that we didn't before: - web search - command execution - patch/file changes - MCP tool calls - image view calls - collab tool outcomes - context compaction - review mode enter/exit For command executions in particular, we truncate the output using the existing `truncate_text` from core to store an upper bound of 10,000 bytes, which is also the default value for truncating tool outputs shown to the model. This keeps the size of the rollout file and command execution items returned over the wire reasonable. We also persist `EventMsg::Error`, which we can now map back to the Turn's status and use to populate the Turn's error metadata.
#### Updates to EventMsgs To truly make `thread/resume` non-lossy, we also needed to persist the `status` on `EventMsg::CommandExecutionEndEvent` and `EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a command failed or was declined (similar for apply_patch). These EventMsgs were never persisted before so I made it a required field.	2026-02-12 19:34:22 +00:00
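A minimal sketch of the `Limited` / `Extended` split described in #11227, assuming a simplified `EventKind` enum in place of the real `EventMsg` type. The mode names and the 10,000-byte bound come from the commit message; `EventKind`, `should_persist`, and `truncate_to_bytes` are hypothetical names for illustration, and `truncate_to_bytes` is not the `truncate_text` helper from core.

```rust
/// Hypothetical, simplified stand-in for the kinds of `EventMsg` the recorder sees.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventKind {
    UserMessage,
    AgentMessage,
    Reasoning,
    CommandExecutionEnd,
    PatchApplyEnd,
    WebSearch,
    Error,
}

/// Mode names as given in the commit message; `Limited` is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum EventPersistenceMode {
    #[default]
    Limited,
    Extended,
}

/// Upper bound for persisted command output, per the commit message.
pub const MAX_PERSISTED_OUTPUT_BYTES: usize = 10_000;

/// Decide whether an event is written to the rollout file.
/// Assumption: the small "always persisted" subset is the three kinds below.
pub fn should_persist(mode: EventPersistenceMode, kind: EventKind) -> bool {
    match kind {
        EventKind::UserMessage | EventKind::AgentMessage | EventKind::Reasoning => true,
        _ => mode == EventPersistenceMode::Extended,
    }
}

/// Cut `text` to at most `max_bytes` bytes without splitting a UTF-8 character.
pub fn truncate_to_bytes(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    let mut end = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}
```

A recorder along these lines would call `should_persist` once per event and apply `truncate_to_bytes(output, MAX_PERSISTED_OUTPUT_BYTES)` to command output before writing it.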
Michael Bolin	476c1a7160	Remove `test-support` feature from `codex-core` and replace it with explicit test toggles (#11405 ) ## Why `codex-core` was being built in multiple feature-resolved permutations because test-only behavior was modeled as crate features. For a large crate, those permutations increase compile cost and reduce cache reuse. ## Net Change - Removed the `test-support` crate feature and related feature wiring so `codex-core` no longer needs separate feature shapes for test consumers. - Standardized cross-crate test-only access behind `codex_core::test_support`. - External test code now imports helpers from `codex_core::test_support`. - Underlying implementation hooks are kept internal (`pub(crate)`) instead of broadly public. ## Outcome - Fewer `codex-core` build permutations. - Better incremental cache reuse across test targets. - No intended production behavior change.	2026-02-10 22:44:02 -08:00
Michael Bolin	b68a84ee8e	Remove `deterministic_process_ids` feature to avoid duplicate `codex-core` builds (#11393 ) ## Why `codex-core` enabled `deterministic_process_ids` through a self dev-dependency. That forced a second feature-resolved build of the same crate, which increased compile time and test latency. ## What Changed - Removed the `deterministic_process_ids` feature from `codex-rs/core/Cargo.toml`. - Removed the self dev-dependency on `codex-core` that enabled that feature. - Removed the Bazel `deterministic_process_ids` crate feature for `codex-core`. - Added a test-only `AtomicBool` override in unified exec process-id allocation. - Added a test-support setter for that override and re-exported it from `codex-core`. - Enabled deterministic process IDs in integration tests via `core_test_support` ctor. ## Behavior - Production behavior remains random process IDs. - Unit tests remain deterministic via `cfg(test)`. - Integration tests remain deterministic via explicit test-support initialization. ## Validation - `just fmt` - `cargo test -p codex-core unified_exec::` - `cargo test -p codex-core --test all unified_exec -- --test-threads=1` - `cargo tree -p codex-core -e features` (verified the removed feature path)	2026-02-10 19:07:01 -08:00
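The `AtomicBool` override mentioned in #11393 could look roughly like the sketch below. This is an illustration only: the function names, the counter start value, and the injected `random_source` closure are assumptions, not the code in `codex-core`.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

// Runtime toggle that replaces the removed crate feature (hypothetical names).
static DETERMINISTIC_IDS: AtomicBool = AtomicBool::new(false);
// Counter used only while the toggle is on; the start value is arbitrary.
static NEXT_ID: AtomicU64 = AtomicU64::new(1000);

/// Test-support setter: integration tests would call this once at startup.
pub fn set_deterministic_process_ids(enabled: bool) {
    DETERMINISTIC_IDS.store(enabled, Ordering::SeqCst);
}

/// Allocate a process id. Production keeps using the random source;
/// tests that flipped the toggle get a predictable increasing sequence.
pub fn allocate_process_id(random_source: impl FnOnce() -> u64) -> u64 {
    if DETERMINISTIC_IDS.load(Ordering::SeqCst) {
        NEXT_ID.fetch_add(1, Ordering::SeqCst)
    } else {
        random_source()
    }
}
```

Because the switch is a runtime value and not a Cargo feature, the crate is compiled once and both test and production consumers can share the same build artifact.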
Michael Bolin	d44f4205fb	chore: rename codex-command to codex-shell-command (#11378 ) This addresses some post-merge feedback on https://github.com/openai/codex/pull/11361: - crate rename - reuse `detect_shell_type()` utility	2026-02-10 17:03:46 -08:00
Michael Bolin	d8f9bb65e2	# Split command parsing/safety out of `codex-core` into new `codex-command` (#11361 ) `codex-core` had accumulated command parsing and command safety logic (`bash`, `powershell`, `parse_command`, and `command_safety`) that is logically cohesive but orthogonal to most core session/runtime logic. Keeping this code in `codex-core` made the crate increasingly monolithic and raised iteration cost for unrelated core changes. This change extracts that surface into a dedicated crate, `codex-command`, while preserving existing `codex_core::...` call sites via re-exports. ## Why this refactor During analysis, command parsing/safety stood out as a good first split because it has: - a clear domain boundary (shell parsing + safety classification) - relatively self-contained dependencies (notably `tree-sitter` / `tree-sitter-bash`) - a meaningful standalone test surface (`134` tests moved with the crate) - many downstream uses that benefit from independent compilation and caching The practical problem was build latency from a large `codex-core` compile/test graph. Clean-build timings before and after this split showed measurable wins: - `cargo check -p codex-core`: `57.08s` -> `53.54s` (~`6.2%` faster) - `cargo test -p codex-core --no-run`: `2m39.9s` -> `2m20s` (~`12.4%` faster) - `codex-core lib` compile unit: `57.18s` -> `49.67s` (~`13.1%` faster) - `codex-core lib(test)` compile unit: `60.87s` -> `53.21s` (~`12.6%` faster) This gives a concrete reduction in core build overhead without changing behavior. ## What changed ### New crate - Added `codex-rs/command` as workspace crate `codex-command`. 
- Added: - `command/src/lib.rs` - `command/src/bash.rs` - `command/src/powershell.rs` - `command/src/parse_command.rs` - `command/src/command_safety/` - `command/src/shell_detect.rs` - `command/BUILD.bazel` ### Code moved out of `codex-core` - Moved modules from `core/src` into `command/src`: - `bash.rs` - `powershell.rs` - `parse_command.rs` - `command_safety/` ### Dependency graph updates - Added workspace member/dependency entries for `codex-command` in `codex-rs/Cargo.toml`. - Added `codex-command` dependency to `codex-rs/core/Cargo.toml`. - Removed `tree-sitter` and `tree-sitter-bash` from `codex-core` direct deps (now owned by `codex-command`). ### API compatibility for callers To avoid immediate downstream churn, `codex-core` now re-exports the moved modules/functions: - `codex_command::bash` - `codex_command::powershell` - `codex_command::parse_command` - `codex_command::is_safe_command` - `codex_command::is_dangerous_command` This keeps existing `codex_core::...` paths working while enabling gradual migration to direct `codex-command` usage. ### Internal decoupling detail - Added `command::shell_detect` so moved `bash`/`powershell` logic no longer depends on core shell internals. - Adjusted PowerShell helper visibility in `codex-command` for existing core test usage (`UTF8` prefix helper + executable discovery functions). ## Validation - `just fmt` - `just fix -p codex-command -p codex-core` - `cargo test -p codex-command` (`134` passed) - `cargo test -p codex-core --no-run` - `cargo test -p codex-core shell_command_handler` ## Notes / follow-up This commit intentionally prioritizes boundary extraction and compatibility. A follow-up can migrate downstream crates to depend directly on `codex-command` (instead of through `codex-core` re-exports) to realize additional incremental build wins.	2026-02-10 14:43:16 -08:00
iceweasel-oai	82f93a13b2	include sandbox (seatbelt, elevated, etc.) in turn metadata header (#10946 ) This will help us understand retention/usage for folks who use the Windows (or any other) sandboxes	2026-02-10 19:50:07 +00:00
viyatb-oai	62d0f302fd	fix(core): canonicalize wrapper approvals and support heredoc prefix … (#10941 ) ## Summary - Reduced repeated approvals for equivalent wrapper commands and fixed execpolicy matching for heredoc-style shell invocations, with minimal behavior change and fail-closed defaults. ## Fixes 1. Canonicalized approval matching for wrappers so equivalent commands map to the same approval intent. 2. Added heredoc-aware prefix extraction for execpolicy so commands like `python3 <<'PY' ... PY` match rules such as `prefix_rule(["python3"], ...)`. 3. Kept fallback behavior conservative: if parsing is ambiguous, existing prompt behavior is preserved. ## Edge Cases Covered - Wrapper path/name differences: `/bin/bash` vs `bash`, `/bin/zsh` vs `zsh`. - Shell modes: `-c` and `-lc`. - Heredoc forms: quoted delimiter (`<<'PY'`) and unquoted delimiter (`<< PY`). - Multi-command heredoc scripts are rejected by the fallback - Non-heredoc redirections (`>`, etc.) are not treated as heredoc prefix matches. - Complex scripts still fall back to prior behavior rather than expanding permissions. --------- Co-authored-by: Dylan Hurd <dylan.hurd@openai.com>	2026-02-10 11:46:40 -08:00
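To illustrate the wrapper canonicalization in #10941, here is a small sketch that maps `/bin/bash` and `bash` (and likewise `zsh`) to the same approval key. The `approval_key` function and its shape are hypothetical; the sketch only normalizes the shell path and leaves the `-c` / `-lc` flag untouched, which is an assumption about what counts as "equivalent". Anything it does not recognize is returned unchanged, in the fail-closed spirit of the commit.

```rust
use std::path::Path;

/// Map a shell program path to a canonical name, or `None` if it is not a known wrapper.
fn canonical_shell(program: &str) -> Option<&'static str> {
    match Path::new(program).file_name()?.to_str()? {
        "bash" => Some("bash"),
        "zsh" => Some("zsh"),
        _ => None,
    }
}

/// Build a key under which an approval is remembered (hypothetical helper).
/// `["/bin/bash", "-lc", "ls"]` and `["bash", "-lc", "ls"]` yield the same key.
pub fn approval_key(argv: &[&str]) -> Vec<String> {
    if let [program, flag, script] = argv {
        if let Some(shell) = canonical_shell(program) {
            if *flag == "-c" || *flag == "-lc" {
                return vec![shell.to_string(), flag.to_string(), script.to_string()];
            }
        }
    }
    // Unrecognized shapes keep their exact argv, so nothing is widened by accident.
    argv.iter().map(|arg| arg.to_string()).collect()
}
```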
jif-oai	d735df1f50	Extract hooks into dedicated crate (#11311 ) Summary - move `core/src/hooks` implementation into a new `codex-hooks` crate with its own manifest - update `codex-rs` workspace and `codex-core` crate to depend on the extracted `hooks` crate and wire up the shared APIs - ensure references, modules, and lockfile reflect the new crate layout Testing - Not run (not requested)	2026-02-10 13:42:17 +00:00
jif-oai	6049ff02a0	memories: add extraction and prompt module foundation (#11200 ) ## Summary - add the new `core/src/memories` module (phase-one parsing, rollout filtering, storage, selection, prompts) - add Askama-backed memory templates for stage-one input/system and consolidation prompts - add module tests for parsing, filtering, path bucketing, and summary maintenance ## Testing - just fmt - cargo test -p codex-core --lib memories::	2026-02-10 10:10:24 +00:00
Matthew Zeng	d90df4761b	[apps] Add gated instructions for Apps. (#10924 ) - [x] Add gated instructions for Apps.	2026-02-09 14:48:09 -08:00
Michael Bolin	ff74aaae21	chore: reverse the codex-network-proxy -> codex-core dependency (#11121 )	2026-02-08 17:03:24 -08:00
Owen Lin	0d8b2b74c4	feat(app-server): turn/steer API (#10821 ) This PR adds a dedicated `turn/steer` API for appending user input to an in-flight turn. ## Motivation Currently, steering in the app is implemented by just calling `turn/start` while a turn is running. This has some really weird quirks: - Client gets back a new `turn.id`, even though streamed events/approvals remained tied to the original active turn ID. - All the various turn-level override params on `turn/start` do not apply to the "steer", and would only apply to the next real turn. - There can also be a race condition where the client thinks the turn is active but the server has already completed it, so there might be bugs if the client has baked in some client-specific behavior thinking it's a steer when in fact the server kicked off a new turn. This is particularly possible when running a client against a remote app-server. Having a dedicated `turn/steer` API eliminates all those quirks. `turn/steer` behavior: - Requires an active turn on threadId. Returns a JSON-RPC error if there is no active turn. - If expectedTurnId is provided, it must match the active turn (more useful when connecting to a remote app-server). - Does not emit `turn/started`. - Does not accept turn overrides (`cwd`, `model`, `sandbox`, etc.) or `outputSchema` to accurately reflect that these are not applied when steering.	2026-02-06 00:35:04 +00:00
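The `turn/steer` preconditions listed in #10821 can be sketched as a small validation function. The error type and function name are hypothetical; only the two rules (an active turn is required, and `expectedTurnId` must match when provided) come from the commit message.

```rust
/// Hypothetical error type; the real server returns a JSON-RPC error.
#[derive(Debug, PartialEq, Eq)]
pub enum SteerError {
    NoActiveTurn,
    TurnIdMismatch { expected: String, active: String },
}

/// Return the id of the turn that should receive the steering input.
pub fn validate_steer<'a>(
    active_turn_id: Option<&'a str>,
    expected_turn_id: Option<&str>,
) -> Result<&'a str, SteerError> {
    // Rule 1: steering needs an in-flight turn on the thread.
    let active = active_turn_id.ok_or(SteerError::NoActiveTurn)?;
    // Rule 2: if the client named a turn, it must be the active one.
    match expected_turn_id {
        Some(expected) if expected != active => Err(SteerError::TurnIdMismatch {
            expected: expected.to_string(),
            active: active.to_string(),
        }),
        _ => Ok(active),
    }
}
```

The mismatch case is what protects a remote client from the race described above, where the server has already completed the turn the client believes it is steering.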
sayan-oai	5fdf6f5efa	chore: rm web-search-eligible header (#10660 ) default-enablement of web_search is now client-side, no need to send eligibility headers to backend. Tested locally, headers no longer sent. will wait for corresponding backend change to deploy before merging	2026-02-05 11:48:34 -08:00
gt-oai	3b54fd7336	Add hooks implementation and wire up to `notify` (#9691 ) This introduces a `Hooks` service. It registers hooks from config and dispatches hook events at runtime. N.B. The hook config is not wired up to this yet. But for legacy reasons, we wire up `notify` from config and power it using hooks now. Nothing about the `notify` interface has changed. I'd start by reviewing `hooks/types.rs` Some things to note: - hook names subject to change - no hook result yet - stopping semantics yet to be introduced - additional hooks yet to be introduced	2026-02-05 16:49:35 +00:00
pap-openai	b2424cb635	adding fork information (UI) when forking (#10246 ) - shows `/fork` command that ran in prev session - shows `session forked from name (uuid) \|\| uuid (if name is not set)` as an event in new session	2026-02-05 13:24:55 +00:00
pakrym-oai	0e8d359da9	Session-level model client (#10664 ) Make ModelClient a session-scoped object. Move state that is session level onto the client, and make state that is per-turn explicit on corresponding methods. Stop taking a huge Config object, instead only pass in values that are actually needed. --------- Co-authored-by: Josh McKinney <joshka@openai.com>	2026-02-04 16:58:48 -08:00
Eric Traut	7bcc552325	Added support for live updates to skills (#10478 ) Add a centralized FileWatcher in codex-core (using notify) that watches skill roots from the config layer stack (recursive) Send `SkillsChanged` events when relevant file system changes are detected On `SkillsChanged`: * Invalidate the skills cache immediately in ThreadManager * Emit EventMsg::SkillsUpdateAvailable to active sessions ~~* Broadcast a new app-server notification: SkillsListUpdatedNotification~~ This change does not inject new items into the event stream. That means the agent will not know about new skills, so it won't be able to implicitly invoke new skills. It also won't know about changes to existing skills, so if it has already read the contents of a modified skill, it will not honor the new behavior. This change also does not detect modifications to AGENTS.md. I plan to address these limitations in a follow-on PR modeled after #9985. Injection of new skills and AGENTS was deemed too risky, hence the need to split the feature into two stages. The changes in this PR were designed to easily accommodate the second stage once we have some other foundational changes in place. Testing: In addition to automated tests, I did manual testing to confirm that newly-created skills, deleted skills, and renamed skills are reflected in the TUI skill picker menu. Also confirmed that modifications to behaviors for explicitly-invoked skills are honored. --------- Co-authored-by: Xin Lin <xl@openai.com>	2026-02-04 15:25:03 -08:00
jif-oai	e9335374b9	feat: add phase 1 mem client (#10629 ) Adding a client on top of https://github.com/openai/openai/pull/672176	2026-02-04 17:59:36 +00:00
pakrym-oai	56ebfff1a8	Move metadata calculation out of client (#10589 ) Model client shouldn't be responsible for this.	2026-02-03 21:59:13 -08:00
Anton Panasenko	fcaed4cb88	feat: log websocket timing into runtime metrics (#10577 )	2026-02-03 18:04:07 -08:00
jif-oai	d2394a2494	chore: nuke chat/completions API (#10157 )	2026-02-03 11:31:57 +00:00
pash-openai	019d89ff86	make codex better at git (#10145 ) adds basic git context to the session prefix so the model can anchor git actions and be a bit more version-aware. structured it in a multiroot-friendly shape even though we only have one root today	2026-02-02 16:57:29 -08:00
pap-openai	1644cbfc6d	Session picker shows thread_name if set (#10340 ) - shows thread names in the ResumePicker used by `/resume` and `codex resume` when set, defaulting to the preview (previous behaviour) otherwise - adds a `find_thread_names_by_ids` that looks up names for IDs in `codex-rs/core/src/rollout/session_index.rs`. It reads the index mapping file sequentially in normal order (instead of the reverse order used by `codex resume <name>`). This function is called for a list of sessions (default page is 25; the number of pages loaded depends on the terminal height); most such lists will contain at least one unnamed session and therefore require the whole file to be read. This could be better, and the sqlite integration will improve it: those reads won't be needed when leveraging sqlite Open questions: - We could rename the TUI "Conversation" column to "Name" or "Thread", which would feel more accurate. Could be a fast-follow if we implement auto-naming, as it'll always be a name then?	2026-02-02 08:13:17 +00:00
alexsong-oai	b164ac6d1e	feat: fire tracking events for skill invocation (#10120 )	2026-01-31 18:06:26 -08:00
Dylan Hurd	0f9858394b	feat(core,tui,app-server) personality migration (#10307 ) ## Summary Keep existing users on Pragmatic, to preserve behavior while new users default to Friendly ## Testing - [x] Tested locally - [x] add integration tests	2026-01-31 17:25:14 -07:00
Charley Cunningham	ec4a2d07e4	Plan mode: stream proposed plans, emit plan items, and render in TUI (#9786 ) ## Summary - Stream proposed plans in Plan Mode using `<proposed_plan>` tags parsed in core, emitting plan deltas plus a plan `ThreadItem`, while stripping tags from normal assistant output. - Persist plan items and rebuild them on resume so proposed plans show in thread history. - Wire plan items/deltas through app-server protocol v2 and render a dedicated proposed-plan view in the TUI, including the “Implement this plan?” prompt only when a plan item is present. ## Changes ### Core (`codex-rs/core`) - Added a generic, line-based tag parser that buffers each line until it can disprove a tag prefix; implements auto-close on `finish()` for unterminated tags. `codex-rs/core/src/tagged_block_parser.rs` - Refactored proposed plan parsing to wrap the generic parser. `codex-rs/core/src/proposed_plan_parser.rs` - In plan mode, stream assistant deltas as: - Normal text → `AgentMessageContentDelta` - Plan text → `PlanDelta` + `TurnItem::Plan` start/completion (`codex-rs/core/src/codex.rs`) - Final plan item content is derived from the completed assistant message (authoritative), not necessarily the concatenated deltas. - Strips `<proposed_plan>` blocks from assistant text in plan mode so tags don’t appear in normal messages. (`codex-rs/core/src/stream_events_utils.rs`) - Persist `ItemCompleted` events only for plan items for rollout replay. (`codex-rs/core/src/rollout/policy.rs`) - Guard `update_plan` tool in Plan Mode with a clear error message. (`codex-rs/core/src/tools/handlers/plan.rs`) - Updated Plan Mode prompt to: - keep `<proposed_plan>` out of non-final reasoning/preambles - require exact tag formatting - allow only one `<proposed_plan>` block per turn (`codex-rs/core/templates/collaboration_mode/plan.md`) ### Protocol / App-server protocol - Added `TurnItem::Plan` and `PlanDeltaEvent` to core protocol items. 
(`codex-rs/protocol/src/items.rs`, `codex-rs/protocol/src/protocol.rs`) - Added v2 `ThreadItem::Plan` and `PlanDeltaNotification` with EXPERIMENTAL markers and note that deltas may not match the final plan item. (`codex-rs/app-server-protocol/src/protocol/v2.rs`) - Added plan delta route in app-server protocol common mapping. (`codex-rs/app-server-protocol/src/protocol/common.rs`) - Rebuild plan items from persisted `ItemCompleted` events on resume. (`codex-rs/app-server-protocol/src/protocol/thread_history.rs`) ### App-server - Forward plan deltas to v2 clients and map core plan items to v2 plan items. (`codex-rs/app-server/src/bespoke_event_handling.rs`, `codex-rs/app-server/src/codex_message_processor.rs`) - Added v2 plan item tests. (`codex-rs/app-server/tests/suite/v2/plan_item.rs`) ### TUI - Added a dedicated proposed plan history cell with special background and padding, and moved “• Proposed Plan” outside the highlighted block. (`codex-rs/tui/src/history_cell.rs`, `codex-rs/tui/src/style.rs`) - Only show “Implement this plan?” when a plan item exists. (`codex-rs/tui/src/chatwidget.rs`, `codex-rs/tui/src/chatwidget/tests.rs`) <img width="831" height="847" alt="Screenshot 2026-01-29 at 7 06 24 PM" src="https://github.com/user-attachments/assets/69794c8c-f96b-4d36-92ef-c1f5c3a8f286" /> ### Docs / Misc - Updated protocol docs to mention plan deltas. (`codex-rs/docs/protocol_v1.md`) - Minor plumbing updates in exec/debug clients to tolerate plan deltas. (`codex-rs/debug-client/src/reader.rs`, `codex-rs/exec/...`) ## Tests - Added core integration tests: - Plan mode strips plan from agent messages. - Missing `</proposed_plan>` closes at end-of-message. (`codex-rs/core/tests/suite/items.rs`) - Added unit tests for generic tag parser (prefix buffering, non-tag lines, auto-close). (`codex-rs/core/src/tagged_block_parser.rs`) - Existing app-server plan item tests in v2. 
(`codex-rs/app-server/tests/suite/v2/plan_item.rs`) ## Notes / Behavior - Plan output no longer appears in standard assistant text in Plan Mode; it streams via `PlanDelta` and completes as a `TurnItem::Plan`. - The final plan item content is authoritative and may diverge from streamed deltas (documented as experimental). - Reasoning summaries are not filtered; prompt instructs the model not to include `<proposed_plan>` outside the final plan message. ## Codex Author `codex fork 019bec2d-b09d-7450-b292-d7bcdddcdbfb`	2026-01-30 18:59:30 +00:00
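As a rough illustration of the `<proposed_plan>` handling in #9786, the sketch below splits a complete message into normal and plan segments and auto-closes an unterminated block at end of input. It is a simplification under stated assumptions: it works on a whole message, not a stream of deltas, it only recognizes tags that sit alone on a line, and `Segment` and `split_plan_blocks` are hypothetical names, not the API of `tagged_block_parser.rs`.

```rust
/// Hypothetical output type: text outside or inside a `<proposed_plan>` block.
#[derive(Debug, PartialEq, Eq)]
pub enum Segment {
    Normal(String),
    Plan(String),
}

/// Split `message` line by line, stripping the tag lines themselves.
pub fn split_plan_blocks(message: &str) -> Vec<Segment> {
    const OPEN: &str = "<proposed_plan>";
    const CLOSE: &str = "</proposed_plan>";
    let mut segments = Vec::new();
    let mut buffer = String::new();
    let mut in_plan = false;
    for line in message.lines() {
        let trimmed = line.trim();
        if !in_plan && trimmed == OPEN {
            flush(&mut segments, &mut buffer, false);
            in_plan = true;
        } else if in_plan && trimmed == CLOSE {
            flush(&mut segments, &mut buffer, true);
            in_plan = false;
        } else {
            buffer.push_str(line);
            buffer.push('\n');
        }
    }
    // Auto-close: a missing closing tag ends the plan at end of message.
    flush(&mut segments, &mut buffer, in_plan);
    segments
}

/// Move the buffered text into a new segment, skipping empty buffers.
fn flush(segments: &mut Vec<Segment>, buffer: &mut String, is_plan: bool) {
    if buffer.is_empty() {
        return;
    }
    let text = std::mem::take(buffer);
    segments.push(if is_plan { Segment::Plan(text) } else { Segment::Normal(text) });
}
```

In this sketch the `Normal` segments correspond to what would be shown as assistant text and the `Plan` segments to what would be emitted as plan content.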
pap-openai	1ef5455eb6	Conversation naming (#8991 ) Session renaming: - `/rename my_session` - `/rename` without an argument, passing the name in `customViewPrompt` - AppExitInfo shows the resume hint using the session name if set, defaulting to the uuid otherwise - Names are stored in `CODEX_HOME/sessions.jsonl` Session resuming: - `codex resume <name>` looks up the first entry in `CODEX_HOME/sessions.jsonl` matching the name and resumes that session --------- Co-authored-by: jif-oai <jif@openai.com>	2026-01-30 10:40:09 +00:00
pakrym-oai	3b1cddf001	Fall back to http when websockets fail (#10139 ) I expect not all proxies work with websockets, fall back to http if websockets fail.	2026-01-29 10:36:21 -08:00
Matthew Zeng	b9cd089d1f	[connectors] Support connectors part 2 - slash command and tui (#9728 ) - [x] Support `/apps` slash command to browse the apps in tui. - [x] Support inserting apps to prompt using `$`. - [x] Lots of simplification/renaming from connectors to apps.	2026-01-28 19:51:58 -08:00
jif-oai	3878c3dc7c	feat: sqlite 1 (#10004 ) Add a `.sqlite` database to be used to store rollout metadata (and later logs) This PR is phase 1: * Add the database and the required infrastructure * Add a backfill of the database * Persist the newly created rollout both in files and in the DB * When we need to get metadata or a rollout, consider the `JSONL` as the source of truth but compare the results with the DB and show any errors	2026-01-28 15:29:14 +01:00
iceweasel-oai	c40ad65bd8	remove sandbox globals. (#9797 ) Threads sandbox updates through OverrideTurnContext for active turn Passes computed sandbox type into safety/exec	2026-01-27 11:04:23 -08:00
sayan-oai	86adf53235	fix: handle all web_search actions and in progress invocations (#9960 ) ### Summary - Parse all `web_search` tool actions (`search`, `find_in_page`, `open_page`). - Previously we only parsed + displayed `search`, which made the TUI appear to pause when the other actions were being used. - Show in progress `web_search` calls as `Searching the web` - Previously we only showed completed tool calls ### Tests Added + updated tests, tested locally ### Follow ups Update VSCode extension to display these as well	2026-01-27 03:33:48 +00:00
Charley Cunningham	62266b13f8	Add thread/unarchive to restore archived rollouts (#9843 ) ## Summary - Adds a new `thread/unarchive` RPC to move archived thread rollouts back into the active `sessions/` tree. ## What changed - Protocol - Adds `thread/unarchive` request/response types and wiring. - Server - Implements `thread_unarchive` in the app server. - Validates the archived rollout path and thread ID. - Restores the rollout to `sessions/YYYY/MM/DD/...` based on the rollout filename timestamp. - Core - Adds `find_archived_thread_path_by_id_str` helper for archived rollouts. - Docs - Documents the new RPC and usage example. - Tests - Adds an end-to-end server test that: 1) starts a thread, 2) archives it, 3) unarchives it, 4) asserts the file is restored to `sessions/`. ## How to use ```json { "method": "thread/unarchive", "id": 24, "params": { "threadId": "<thread-id>" } } ``` ## Author Codex Session `codex resume 019bf158-54b6-7960-a696-9d85df7e1bc1` (soon I'll make this kind of session UUID forkable by anyone with the right `session_object_storage_url` line in their config, but for now just pasting it here for my reference)	2026-01-26 11:24:36 -08:00
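The restore step in #9843 places a rollout under `sessions/YYYY/MM/DD/...` based on the timestamp in its filename. A minimal sketch of that path computation follows, assuming (this is not confirmed by the commit message) that filenames look like `rollout-YYYY-MM-DDThh-mm-ss-<id>.jsonl`; `restored_path` is a hypothetical helper.

```rust
use std::path::{Path, PathBuf};

/// Compute where an archived rollout would be restored to.
/// Assumption: the file name starts with `rollout-YYYY-MM-DD`.
pub fn restored_path(sessions_root: &Path, file_name: &str) -> Option<PathBuf> {
    let rest = file_name.strip_prefix("rollout-")?;
    // First ten bytes are expected to be the date, e.g. "2026-01-26".
    let date = rest.get(..10)?;
    let mut parts = date.split('-');
    let (year, month, day) = (parts.next()?, parts.next()?, parts.next()?);
    let all_digits =
        |s: &str, len: usize| s.len() == len && s.bytes().all(|b| b.is_ascii_digit());
    if !(all_digits(year, 4) && all_digits(month, 2) && all_digits(day, 2)) {
        return None;
    }
    Some(sessions_root.join(year).join(month).join(day).join(file_name))
}
```

Returning `None` for anything that does not parse mirrors the validation step in the commit: a file with an unexpected name is rejected instead of being moved to a guessed location.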
jif-oai	d594693d1a	feat: dynamic tools injection (#9539 ) ## Summary Add dynamic tool injection to thread startup in API v2, wire dynamic tool calls through the app server to clients, and plumb responses back into the model tool pipeline. ### Flow (high level) - Thread start injects `dynamic_tools` into the model tool list for that thread (validation is done here). - When the model emits a tool call for one of those names, core raises a `DynamicToolCallRequest` event. - The app server forwards it to the client as `item/tool/call`, waits for the client’s response, then submits a `DynamicToolResponse` back to core. - Core turns that into a `function_call_output` in the next model request so the model can continue. ### What changed - Added dynamic tool specs to v2 thread start params and protocol types; introduced `item/tool/call` (request/response) for dynamic tool execution. - Core now registers dynamic tool specs at request time and routes those calls via a new dynamic tool handler. - App server validates tool names/schemas, forwards dynamic tool call requests to clients, and publishes tool outputs back into the session. - Integration tests	2026-01-26 10:06:44 +00:00
jif-oai	83775f4df1	feat: ephemeral threads (#9765 ) Add ephemeral threads capabilities. Only exposed through the `app-server` v2 The idea is to disable the rollout recorder for those threads.	2026-01-24 14:57:40 +00:00
Matthew Zeng	a2c829a808	[connectors] Support connectors part 1 - App server & MCP (#9667 ) In order to make Codex work with connectors, we add a built-in gateway MCP that acts as a transparent proxy between the client and the connectors. The gateway MCP collects the actions that are accessible to the user and sends them down to the user. When a connector action is chosen to be called, the client invokes the action through the gateway MCP as well. - [x] Add the system built-in gateway MCP to list and run connectors. - [x] Add the app server methods and protocol	2026-01-22 16:48:43 -08:00
Skylar Graika	b236f1c95d	fix: prevent repeating interrupted turns (#9043 ) ## What Record a model-visible `<turn_aborted>` marker in history when a turn is interrupted, and treat it as a session prefix. ## Why When a turn is interrupted, Codex emits `TurnAborted` but previously did not persist anything model-visible in the conversation history. On the next user turn, the model can’t tell the previous work was aborted and may resume/repeat earlier actions (including duplicated side effects like re-opening PRs). Fixes: https://github.com/openai/codex/issues/9042 ## How On `TurnAbortReason::Interrupted`, append a hidden user message containing a `<turn_aborted>…</turn_aborted>` marker and flush. Treat `<turn_aborted>` like `<environment_context>` for session-prefix filtering. Add a regression test to ensure follow-up turns don’t repeat side effects from an aborted turn. ## Testing `just fmt` `just fix -p codex-core` `cargo test -p codex-core -- --test-threads=1` `cargo test --all-features -- --test-threads=1` --------- Co-authored-by: Skylar Graika <sgraika127@gmail.com> Co-authored-by: jif-oai <jif@openai.com> Co-authored-by: Eric Traut <etraut@openai.com>	2026-01-20 13:07:28 -08:00
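The two pieces of #9043 (writing a `<turn_aborted>` marker and treating it as a session prefix) can be sketched as below. The function names are hypothetical, and the marker body is left as a caller-supplied parameter because the commit message elides it.

```rust
/// Wrap caller-supplied text in the marker tags (hypothetical helper;
/// the actual marker body is not given in the commit message).
pub fn turn_aborted_marker(body: &str) -> String {
    format!("<turn_aborted>{}</turn_aborted>", body)
}

/// Treat `<turn_aborted>` like `<environment_context>` when filtering
/// session-prefix messages out of user-visible history.
pub fn is_session_prefix(message_text: &str) -> bool {
    let text = message_text.trim_start();
    text.starts_with("<environment_context>") || text.starts_with("<turn_aborted>")
}
```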
Eric Traut	79c5bf9835	Fixed config merging issue with profiles (#9509 ) This PR fixes a small issue with chained (layered) config.toml file merging. The old logic didn't properly handle profiles. In particular, if a lower-layer config overrides a profile defined in a higher-layer config, the override did not take effect. This prevents users from having project-specific profile overrides and contradicts the (soon-to-be) documented behavior of config merging. The change adds a unit test for this case. It also exposes a function from the config crate that is needed by the app server code paths to implement support for layered configs.	2026-01-20 12:18:00 -08:00
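To make the profile-merging fix in #9509 concrete, here is a sketch of a recursive table merge in which the overriding layer wins key by key, so overriding one field of a profile keeps the profile's other fields. It uses a tiny hand-rolled `Value` type in place of a TOML library, and which layer counts as the overriding one is an assumption of this example, not a statement about the config crate.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a TOML value: a string or a nested table.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Table(BTreeMap<String, Value>),
}

/// Convenience constructor for tables (hypothetical helper).
pub fn table(entries: Vec<(&str, Value)>) -> Value {
    Value::Table(entries.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Merge `overlay` into `base`. Tables merge recursively; any other
/// combination replaces the base value with the overlay value.
pub fn merge(base: &mut Value, overlay: Value) {
    match (base, overlay) {
        (Value::Table(base_table), Value::Table(overlay_table)) => {
            for (key, value) in overlay_table {
                match base_table.get_mut(&key) {
                    Some(existing) => merge(existing, value),
                    None => {
                        base_table.insert(key, value);
                    }
                }
            }
        }
        (base_slot, overlay_value) => *base_slot = overlay_value,
    }
}
```

With this shape, `profiles.dev.model` in the overriding layer replaces only that key, while `profiles.dev.sandbox` from the other layer survives. A merge that replaced the whole `profiles` table, or skipped it, would show the kind of bug the commit describes.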
Dylan Hurd	bffe9b33e9	chore(core) Create instructions module (#9422 ) ## Summary We have a variety of things we refer to as instructions in the code base: our current canonical terms are: - base instructions (raw string) - developer instructions (has a type in protocol) - user instructions We also have `instructions` floating around in various places. We should standardize on the above, and start using types to prevent them from ending up in the wrong place. There will be additional PRs, but I'm going to keep these small so we can easily follow them! ## Testing - [x] Tests pass, this is purely a file move	2026-01-17 16:01:26 -08:00
Owen Lin	f1653dd4d3	feat(app-server, core): return threads by created_at or updated_at (#9247 ) Add support for returning threads by either `created_at` OR `updated_at` descending. Previously core always returned threads ordered by `created_at`. This PR: - updates core to be able to list threads by `updated_at` OR `created_at` descending based on what the caller wants - also updates `thread/list` in app-server to expose this (default to `created_at` if not specified) All existing codepaths (app-server, TUI) still default to `created_at`, so no behavior change is expected with this PR. Implementation: Sorting by `updated_at` is a bit nontrivial (whereas `created_at` is easy due to the way we structure the folders and filenames on disk, which are all based on `created_at`). The most naive way to do this without introducing a cache file or sqlite DB (which we would have to implement/maintain) is to scan files in reverse `created_at` order on disk, and look at each file's mtime (last modified timestamp according to the filesystem) until we reach `MAX_SCAN_FILES` (currently set to 10,000). Then, we can return the most recent N threads. Based on some quick and dirty benchmarking on my machine with ~1000 rollout files, calling `thread/list` with limit 50, the `updated_at` path is slower as expected due to all the I/O: - updated-at: average 103.10 ms - created-at: average 41.10 ms Those absolute numbers aren't a big deal IMO, but we can certainly optimize this in a followup if needed by introducing more state stored on disk. Caveat: There's also a limitation in that any files beyond the newest `MAX_SCAN_FILES` (by `created_at`) will be excluded, which means if a user continues a REALLY old thread, that thread may not be included. In practice that should not be too big of an issue. If a user makes... - 1000 rollouts/day → threads older than 10 days won't show up - 100 rollouts/day → ~100 days If this becomes a problem for some reason, even more motivation to implement an updated_at cache.	2026-01-16 20:58:55 +00:00
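The scan described in the commit above can be sketched roughly as follows. This is a minimal Python illustration, not the Rust code in core: the function names, the `*.jsonl` glob, and the idea that reverse path order equals reverse `created_at` order are assumptions made for the example. Only `MAX_SCAN_FILES` and its value of 10,000 come from the commit message.

```python
from pathlib import Path

MAX_SCAN_FILES = 10_000  # cap named in the commit message


def list_threads_by_updated_at(root, limit, max_scan=MAX_SCAN_FILES):
    """Return up to `limit` rollout paths, most recently modified first."""
    # Assumption: paths sort lexicographically by created_at, so a reverse
    # sort visits the newest-created rollouts first.
    candidates = sorted(Path(root).rglob("*.jsonl"), reverse=True)[:max_scan]
    # One stat() per scanned file: this is the extra I/O that makes the
    # updated_at path slower than the created_at path.
    by_mtime = sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)
    return by_mtime[:limit]


def days_of_history_covered(rollouts_per_day, max_scan=MAX_SCAN_FILES):
    """How many days of rollouts fit inside the scan window."""
    return max_scan / rollouts_per_day
```

With a scan cap smaller than the number of files, a thread that was created long ago but updated recently falls outside the window, which is the caveat the commit message describes. `days_of_history_covered(1000)` gives 10 and `days_of_history_covered(100)` gives 100, matching the figures quoted above.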
sayan-oai	169201b1b5	[search] allow explicitly disabling web search (#9249 ) moving `web_search` rollout serverside, so need a way to explicitly disable search + signal eligibility from the client. - Add `x-oai-web-search-eligible` header that signifies whether the request can have web search. - Only attach the `web_search` tool when the resolved `WebSearchMode` is `Live` or `Cached`.	2026-01-15 11:28:57 -08:00
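The tool-gating rule in the commit above could look roughly like this. It is a hedged Python sketch: `WebSearchMode`, `Live`, and `Cached` are named in the commit message, while the `DISABLED` variant, the `tools_for_request` helper, and the representation of tools as plain strings are assumptions for illustration.

```python
from enum import Enum


class WebSearchMode(Enum):
    DISABLED = "disabled"  # assumed variant for the "explicitly disabled" case
    CACHED = "cached"
    LIVE = "live"


def tools_for_request(mode, base_tools):
    """Attach `web_search` only when the resolved mode is Live or Cached."""
    tools = list(base_tools)  # copy so the caller's list is not mutated
    if mode in (WebSearchMode.LIVE, WebSearchMode.CACHED):
        tools.append("web_search")
    return tools
```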
Eric Traut	31d9b6f4d2	Improve handling of config and rules errors for app server clients (#9182 ) When an invalid config.toml key or value is detected, the CLI currently just quits. This leaves the VSCE in a dead state. This PR changes the behavior to not quit and bubble up the config error to users to make it actionable. It also surfaces errors related to "rules" parsing. This allows us to surface these errors to users in the VSCE.	2026-01-13 17:57:09 -08:00
Devon Rifkin	fe03320791	ollama: default to Responses API for built-ins (#8798 ) This is an alternate PR to solving the same problem as <https://github.com/openai/codex/pull/8227>. In this PR, when Ollama is used via `--oss` (or via `model_provider = "ollama"`), we default it to use the Responses format. At runtime, we do an Ollama version check, and if the version is older than when Responses support was added to Ollama, we print out a warning. Because there's no way of configuring the wire api for a built-in provider, we temporarily add a new `oss_provider`/`model_provider` called `"ollama-chat"` that will force the chat format. Once the `"chat"` format is fully removed (see <https://github.com/openai/codex/discussions/7782>), `ollama-chat` can be removed as well --------- Co-authored-by: Eric Traut <etraut@openai.com> Co-authored-by: Michael Bolin <mbolin@openai.com>	2026-01-13 09:51:41 -08:00
pakrym-oai	490c1c1fdd	Add model client sessions (#9102 ) Maintain a long-running session.	2026-01-13 01:15:56 +00:00
iceweasel-oai	6372ba9d5f	Elevated sandbox NUX (#8789 ) Elevated Sandbox NUX: * prompt for elevated sandbox setup when agent mode is selected (via /approvals or at startup) * prompt for degraded sandbox if elevated setup is declined or fails * introduce /elevate-sandbox command to upgrade from degraded experience.	2026-01-08 16:23:06 -08:00
jif-oai	116059c3a0	chore: unify conversation with thread name (#8830 ) Done and verified by Codex + refactor feature of RustRover	2026-01-07 17:04:53 +00:00
jif-oai	1dd1355df3	feat: agent controller (#8783 ) Added an agent control plane that lets sessions spawn or message other conversations via `AgentControl`. `AgentBus` (core/src/agent/bus.rs) keeps track of the last known status of a conversation. ConversationManager now holds shared state behind an Arc so AgentControl keeps only a weak back-reference, the goal is just to avoid explicit cycle reference. Follow-ups: * Build a small tool in the TUI to be able to see every agent and send manual message to each of them * Handle approval requests in this TUI * Add tools to spawn/communicate between agents (see related design) * Define agent types	2026-01-06 19:08:02 +00:00
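The ownership shape described in the commit above (the manager owns the shared state, the control plane holds only a weak back-reference) can be illustrated with a Python analogue of Rust's `Arc`/`Weak`. The class bodies below are assumptions for the example; only the names `ConversationManager` and `AgentControl` come from the commit message.

```python
import weakref


class AgentControl:
    def __init__(self, manager):
        # Weak back-reference: does not keep the manager alive, so the
        # manager <-> control pair never forms a strong reference cycle.
        self._manager = weakref.ref(manager)

    def manager(self):
        """Return the manager, or None if it has already been dropped."""
        return self._manager()


class ConversationManager:
    def __init__(self):
        self.conversations = {}  # stand-in for the shared state
        self.agent_control = AgentControl(self)  # strong ref, one direction only
```

Callers of `manager()` have to handle the `None` case, which mirrors having to upgrade a `Weak` before use in Rust.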
Ahmed Ibrahim	f0dc6fd3c7	Rename OpenAI models to models manager (#8346 )	2025-12-19 16:20:05 -08:00
Michael Bolin	46baedd7cb	fix: change codex/sandbox-state/update from a notification to a request (#8142 ) Historically, `accept_elicitation_for_prompt_rule()` was flaky because we were using a notification to update the sandbox followed by a `shell` tool request that we expected to be subject to the new sandbox config, but because [rmcp](https://crates.io/crates/rmcp) MCP servers delegate each incoming message to a new Tokio task, messages are not guaranteed to be processed in order, so sometimes the `shell` tool call would run before the notification was processed. Prior to this PR, we relied on a generous `sleep()` between the notification and the request to reduce the chance of the test flaking out. This PR implements a proper fix, which is to use a _request_ instead of a notification for the sandbox update so that we can wait for the response to the sandbox request before sending the request to the `shell` tool call. Previously, `rmcp` did not support custom requests, but I fixed that in https://github.com/modelcontextprotocol/rust-sdk/pull/590, which made it into the `0.12.0` release (see #8288). This PR updates `shell-tool-mcp` to expect `"codex/sandbox-state/update"` as a _request_ instead of a notification and sends the appropriate ack. Note this behavior is tied to our custom `codex/sandbox-state` capability, which Codex honors as an MCP client, which is why `core/src/mcp_connection_manager.rs` had to be updated as part of this PR, as well. This PR also updates the docs at `shell-tool-mcp/README.md`.	2025-12-18 15:32:01 -08:00
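The ordering problem in the commit above can be reproduced in miniature. The following is a toy Python `asyncio` sketch, not the `rmcp` or Tokio code: the `Server` class, its methods, and the simulated delay are all hypothetical. It shows why a fire-and-forget notification can lose the race against a later call, while awaiting a request's ack cannot.

```python
import asyncio


class Server:
    """Toy server whose handlers may run concurrently, one task per message."""

    def __init__(self):
        self.sandbox = "old"

    async def update_sandbox(self, state, delay):
        await asyncio.sleep(delay)  # simulated processing latency
        self.sandbox = state
        return "ack"

    async def shell(self):
        # Reports which sandbox config the tool call actually ran under.
        return self.sandbox


async def with_notification():
    server = Server()
    # Fire-and-forget: the client does not wait for the update to land.
    task = asyncio.create_task(server.update_sandbox("new", 0.01))
    seen = await server.shell()  # races ahead of the pending update
    await task
    return seen


async def with_request():
    server = Server()
    # Request/response: the ack proves the update was applied first.
    await server.update_sandbox("new", 0.01)
    return await server.shell()
```

Under these assumptions, `asyncio.run(with_notification())` observes the old sandbox state and `asyncio.run(with_request())` observes the new one, without any `sleep()` on the client side.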


163 Commits