codex

mirror of https://github.com/openai/codex.git synced 2026-05-02 12:21:26 +03:00

Author	SHA1	Message	Date
jif-oai	938c6dd388	fix: db windows path (#13336 )	2026-03-03 09:50:52 +00:00
Ruslan Nigmatullin	14fcb6645c	app-server: Update `thread/name/set` to support not-loaded threads (#13282 ) Currently `thread/name/set` only works for loaded threads. Expand the scope to also support persisted but not-yet-loaded ones for a more predictable API surface. This will make it possible to rename threads discovered via `thread/list` and similar operations.	2026-03-02 15:13:18 -08:00
jif-oai	b649953845	feat: polluted memories (#13008 ) Add a feature flag to disable memory creation for "polluted"	2026-03-02 11:57:32 +00:00
Ahmed Ibrahim	0aeb55bf08	Record realtime close marker on replacement (#13058 ) ## Summary - record a realtime close developer message when a new realtime session replaces an active one - assert the replacement marker through the mocked responses request path --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Charles Cunningham <ccunningham@openai.com>	2026-03-01 13:54:12 -08:00
jif-oai	bbd237348d	feat: gen memories config (#12999 )	2026-02-27 12:38:47 +01:00
Celia Chen	90cc4e79a2	feat: add local date/timezone to turn environment context (#12947 ) ## Summary This PR includes the session's local date and timezone in the model-visible environment context and persists that data in `TurnContextItem`. ## What changed - captures the current local date and IANA timezone when building a turn context, with a UTC fallback if the timezone lookup fails - includes current_date and timezone in the serialized <environment_context> payload - stores those fields on TurnContextItem so they survive rollout/history handling, subagent review threads, and resume flows - treats date/timezone changes as environment updates, so prompt caching and context refresh logic do not silently reuse stale time context - updates tests to validate the new environment fields without depending on a single hardcoded environment-context string ## test built a local build and saw it in the rollout file: ``` {"timestamp":"2026-02-26T21:39:50.737Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"<environment_context>\n <shell>zsh</shell>\n <current_date>2026-02-26</current_date>\n <timezone>America/Los_Angeles</timezone>\n</environment_context>"}]}} ```	2026-02-26 23:17:35 +00:00
jif-oai	739d4b52de	fix: do not apply turn cwd to metadata (#12887 ) Details here: https://openai.slack.com/archives/C09NZ54M4KY/p1772056758227339	2026-02-26 17:05:58 +00:00
Charley Cunningham	07aefffb1f	core: bundle settings diff updates into one dev/user envelope (#12417 ) ## Summary - bundle contextual prompt injection into at most one developer message plus one contextual user message in both: - per-turn settings updates - initial context insertion - preserve `<model_switch>` across compaction by rebuilding it through canonical initial-context injection, instead of relying on strip/reattach hacks - centralize contextual user fragment detection in one shared definition table and reuse it for parsing/compaction logic - keep `AGENTS.md` in its natural serialized format: - `# AGENTS.md instructions for {dirname}` - `<INSTRUCTIONS>...</INSTRUCTIONS>` - simplify related tests/helpers and accept the expected snapshot/layout updates from bundled multi-part messages ## Why The goal is to converge toward a simpler, more intentional prompt shape where contextual updates are consistently represented as one developer envelope plus one contextual user envelope, while keeping parsing and compaction behavior aligned with that representation. 
## Notable details - the temporary `SettingsUpdateEnvelope` wrapper was removed; these paths now return `Vec<ResponseItem>` directly - local/remote compaction no longer rely on model-switch strip/restore helpers - contextual user detection is now driven by shared fragment definitions instead of ad hoc matcher assembly - AGENTS/user instructions are still the same logical context; only the synthetic `<user_instructions>` wrapper was replaced by the natural AGENTS text format ## Testing - `just fmt` - `cargo test -p codex-app-server codex_message_processor::tests::extract_conversation_summary_prefers_plain_user_messages -- --exact` - `cargo test -p codex-core compact::tests::collect_user_messages_filters_session_prefix_entries --lib -- --exact` - `cargo test -p codex-core --test all 'suite::compact::snapshot_request_shape_pre_turn_compaction_strips_incoming_model_switch' -- --exact` - `cargo test -p codex-core --test all 'suite::compact_remote::snapshot_request_shape_remote_pre_turn_compaction_strips_incoming_model_switch' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_apps_guidance_as_developer_message_when_enabled' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_developer_instructions_message_in_request' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_user_instructions_message_in_request' -- --exact` - `cargo test -p codex-core --test all 'suite::client::resume_includes_initial_messages_and_sends_prior_items' -- --exact` - `cargo test -p codex-core --test all 'suite::review::review_input_isolated_from_parent_history' -- --exact` - `cargo test -p codex-exec --test all 'suite::resume::exec_resume_last_respects_cwd_filter_and_all_flag' -- --exact` - `cargo test -p core_test_support context_snapshot::tests::full_text_mode_preserves_unredacted_text -- --exact` ## Notes - I also ran several targeted `compact`, `compact_remote`, `prompt_caching`, `model_visible_layout`, and 
`event_mapping` tests while iterating on prompt-shape changes. - I have not claimed a clean full-workspace `cargo test` from this environment because local sandbox/resource conditions have previously produced unrelated failures in large workspace runs.	2026-02-26 00:12:08 -08:00
Celia Chen	4f45668106	Revert "Add skill approval event/response (#12633 )" (#12811 ) This reverts commit https://github.com/openai/codex/pull/12633. We no longer need this PR, because we favor sending normal exec command approval server request with `additional_permissions` of skill permissions instead	2026-02-26 01:02:42 +00:00
Owen Lin	a0fd94bde6	feat(app-server): add ThreadItem::DynamicToolCall (#12732 ) Previously, clients would call `thread/start` with dynamic_tools set, and when a model invokes a dynamic tool, it would just make the server->client `item/tool/call` request and wait for the client's response to complete the tool call. This works, but it doesn't have an `item/started` or `item/completed` event. Now we are doing this: - [new] emit `item/started` with `DynamicToolCall` populated with the call arguments - send an `item/tool/call` server request - [new] once the client responds, emit `item/completed` with `DynamicToolCall` populated with the response. Also, with `persistExtendedHistory: true`, dynamic tool calls are now reconstructable in `thread/read` and `thread/resume` as `ThreadItem::DynamicToolCall`.	2026-02-25 12:00:10 -08:00
jif-oai	f46b767b7e	feat: add search term to thread list (#12578 ) Add `searchTerm` to `thread/list` that will search for a match in the titles (the condition being `searchTerm` ∈ `title`, i.e. the title contains the search term)	2026-02-25 09:59:41 +00:00
pakrym-oai	58763afa0f	Add skill approval event/response (#12633 ) Set the stage for skill-level permission approval in addition to command-level. Behind a feature flag.	2026-02-23 22:28:58 -08:00
Ahmed Ibrahim	6817f0be8a	Wire realtime api to core (#12268 ) - Introduce `RealtimeConversationManager` for realtime API management - Add `op::conversation` to start conversation, insert audio, insert text, and close conversation. - emit conversation lifecycle and realtime events. - Move shared realtime payload types into codex-protocol and add core e2e websocket tests for start/replace/transport-close paths. Things to consider: - Should we use the same `op::` and `Events` channel to carry audio? I think we should try this simple approach and later we can create a separate one if the channels get congested. - Sending text updates to the client: we can start simple and later restrict that. - Provider auth is intentionally not wired for now	2026-02-20 19:06:35 -08:00
jif-oai	0f9eed3a6f	feat: add nick name to sub-agents (#12320 ) Add a random nickname to sub-agents, used for UX. At the same time, also store and wire the role of the sub-agent.	2026-02-20 14:39:49 +00:00
Jack Mousseau	3a951f8096	Restore phase when loading from history (#12244 )	2026-02-19 09:56:56 -08:00
Shijie Rao	48018e9eac	Feat: add model reroute notification (#12001 ) ### Summary Building off `5c75aa7b89 (diff-058ae8f109a8b84b4b79bbfa45f522c2233b9d9e139696044ae374d50b6196e0)`, we have created a `model/rerouted` notification that captures the event so that consumers can render as expected. Keep the `EventMsg::Warning` path in core so that this does not affect TUI rendering. `model/rerouted` is meant to be generic to account for future usage including capacity planning etc.	2026-02-17 11:02:23 -08:00
jif-oai	feae389942	Lower missing rollout log level (#11722 ) Fix this: https://github.com/openai/codex/issues/11634	2026-02-13 12:59:17 +00:00
Owen Lin	efc8d45750	feat(app-server): experimental flag to persist extended history (#11227 ) This PR adds an experimental `persist_extended_history` bool flag to app-server thread APIs so rollout logs can retain a richer set of EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e. on `thread/resume`). ### Motivation Today, our rollout recorder only persists a small subset (e.g. user message, reasoning, assistant message) of `EventMsg` types, dropping a good number (like command exec, file change, etc.) that are important for reconstructing full item history for `thread/resume`, `thread/read`, and `thread/fork`. Some clients want to be able to resume a thread without lossiness. This lossiness is primarily a UI thing, since what the model sees are `ResponseItem` and not `EventMsg`. ### Approach This change introduces an opt-in `persist_full_history` flag to preserve those events when you start/resume/fork a thread (defaults to `false`). This is done by adding an `EventPersistenceMode` to the rollout recorder: - `Limited` (existing behavior, default) - `Extended` (new opt-in behavior) In `Extended` mode, persist additional `EventMsg` variants needed for non-lossy app-server `ThreadItem` reconstruction. We now store the following ThreadItems that we didn't before: - web search - command execution - patch/file changes - MCP tool calls - image view calls - collab tool outcomes - context compaction - review mode enter/exit For command executions in particular, we truncate the output using the existing `truncate_text` from core to store an upper bound of 10,000 bytes, which is also the default value for truncating tool outputs shown to the model. This keeps the size of the rollout file and command execution items returned over the wire reasonable. And we also persist `EventMsg::Error` which we can now map back to the Turn's status and populates the Turn's error metadata. 
#### Updates to EventMsgs To truly make `thread/resume` non-lossy, we also needed to persist the `status` on `EventMsg::CommandExecutionEndEvent` and `EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a command failed or was declined (similar for apply_patch). These EventMsgs were never persisted before so I made it a required field.	2026-02-12 19:34:22 +00:00
Wendy Jiao	82acd815e4	exclude developer messages from phase-1 memory input (#11608 ) Co-authored-by: jif-oai <jif@openai.com>	2026-02-12 17:43:38 +00:00
jif-oai	adad23f743	Ensure list_threads drops stale rollout files (#11572 ) Summary - trim `state_db::list_threads_db` results to entries whose rollout files still exist, logging and recording a discrepancy for dropped rows - delete stale metadata rows from the SQLite store so future calls don’t surface invalid paths - add regression coverage in `recorder.rs` to verify stale DB paths are dropped when the file is missing	2026-02-12 12:49:31 +00:00
jif-oai	befe4fbb02	feat: mem drop cot (#11571 ) Drop CoT and compaction for memory building	2026-02-12 11:41:04 +00:00
Michael Bolin	abbd74e2be	feat: make sandbox read access configurable with `ReadOnlyAccess` (#11387 ) `SandboxPolicy::ReadOnly` previously implied broad read access and could not express a narrower read surface. This change introduces an explicit read-access model so we can support user-configurable read restrictions in follow-up work, while preserving current behavior today. It also ensures unsupported backends fail closed for restricted-read policies instead of silently granting broader access than intended. ## What - Added `ReadOnlyAccess` in protocol with: - `Restricted { include_platform_defaults, readable_roots }` - `FullAccess` - Updated `SandboxPolicy` to carry read-access configuration: - `ReadOnly { access: ReadOnlyAccess }` - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }` - Preserved existing behavior by defaulting current construction paths to `ReadOnlyAccess::FullAccess`. - Threaded the new fields through sandbox policy consumers and call sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and related tests. - Updated Seatbelt policy generation to honor restricted read roots by emitting scoped read rules when full read access is not granted. - Added fail-closed behavior on Linux and Windows backends when restricted read access is requested but not yet implemented there (`UnsupportedOperation`). - Regenerated app-server protocol schema and TypeScript artifacts, including `ReadOnlyAccess`. ## Compatibility / rollout - Runtime behavior remains unchanged by default (`FullAccess`). - API/schema changes are in place so future config wiring can enable restricted read access without another policy-shape migration.	2026-02-11 18:31:14 -08:00
jif-oai	3d0ead8db8	feat: improve thread listing (#11429 ) Improve listing by doing: 1. List using the rollout file system 2. Upsert the result in the DB (if present) 3. Return the result of a DB listing 4. Fallback on the result of 1 + some metrics on top of this	2026-02-11 11:22:05 +00:00
Celia Chen	641d5268fa	chore: persist turn_id in rollout session and make turn_id uuid based (#11246 ) Problem: 1. turn id is constructed in-memory; 2. on resuming threads, turn_id might not be unique; 3. client cannot know the boundary of a turn from rollout files easily. This PR does three things: 1. persist `task_started` and `task_complete` events; 2. persist `turn_id` in rollout turn events; 3. generate turn_id as unique uuids instead of incrementing it in memory. This helps us resolve the issue of clients wanting to have unique turn ids for resuming a thread, and knowing the boundary of each turn in rollout files. example debug logs ``` 2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None } 2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None } 2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None } ``` backward compatibility: if you try to resume an old session without task_started and task_complete event populated, the following happens: - If you resume and do nothing: those reconstructed historical IDs can differ next time you resume. - If you resume and send a new turn: the new turn gets a fresh UUID from live submission flow and is persisted, so that new turn’s ID is stable on later resumes. I think this behavior is fine, because we only care about deterministic turn id once a turn is triggered.	2026-02-11 03:56:01 +00:00
jif-oai	87bbfc50a1	feat: prevent double backfill (#11377 ) ## Summary Add a DB-backed lease to prevent duplicate `.sqlite` backfill workers from running concurrently. ### What changed - Added StateRuntime::try_claim_backfill(lease_seconds) that atomically claims backfill only when: - backfill is not complete, and - no fresh running worker currently owns it. - Updated backfill_sessions to use the claim API and exit early when another worker already holds the lease. - Added runtime tests covering: - singleton claim behavior, - stale lease takeover, - claim blocked after complete. - Set backfill lease to 900s in production and 1s in tests. ### Why This avoids duplicate backfill work and reduces backfill status churn under concurrent startup, while preserving current best-effort fallback behavior.	2026-02-11 00:24:20 +00:00
jif-oai	847a6092e6	fix: reduce usage of `open_if_present` (#11344 )	2026-02-10 19:25:07 +00:00
guinness-oai	099ed802b2	Treat first rollout session_meta as canonical thread identity (#11241 ) During thread/fork, the new rollout includes the fork’s own session_meta plus copied history that can contain older session_meta entries from the source thread. thread/list was overwriting metadata on later session_meta lines, so a fork could be reported with the source thread’s thread_id. This fix only uses the first session_meta, so the fork keeps its own ID.	2026-02-10 10:32:11 -08:00
Eric Traut	b3de6c7f2b	Defer persistence of rollout file (#11028 ) - Defer rollout persistence for fresh threads (`InitialHistory::New`): keep rollout events in memory and only materialize rollout file + state DB row on first `EventMsg::UserMessage`. - Keep precomputed rollout path available before materialization. - Change `thread/start` to build thread response from live config snapshot and optional precomputed path. - Improve pre-materialization behavior in app-server/TUI: clearer invalid-request errors for file-backed ops and a friendlier `/fork` “not ready yet” UX. - Update tests to match deferred semantics across start/read/archive/unarchive/fork/resume/review flows. - Improved resilience of user_shell test, which should be unrelated to this change but must be affected by timing changes For Reviewers: * The primary change is in recorder.rs * Most of the other changes were to fix up broken assumptions in existing tests Testing: * Manually tested CLI * Exercised app server paths by manually running IDE Extension with rebuilt CLI binary * Only user-visible change is that `/fork` in TUI generates visible error if used prior to first turn	2026-02-07 23:05:03 -08:00
jif-oai	62605fa471	Add resume_agent collab tool (#10903 ) Summary - add the new resume_agent collab tool path through core, protocol, and the app server API, including the resume events - update the schema/TypeScript definitions plus docs so resume_agent appears in generated artifacts and README - note that resumed agents rehydrate rollout history without overwriting their base instructions Testing - Not run (not requested)	2026-02-07 17:31:45 +01:00
jif-oai	428a9f6035	feat: wait for backfill to be ready (#10790 )	2026-02-05 20:45:16 +00:00
jif-oai	9ee746afd6	Leverage state DB metadata for thread summaries (#10621 ) Summary: - read conversation summaries and cwd info from the state DB when possible so we no longer rely on rollout files for metadata and avoid extra I/O - persist CLI version in thread metadata, surface it through summary builders, and add the necessary DB migration hooks - simplify thread listing by using enriched state DB data directly rather than reading rollout heads Testing: - Not run (not requested)	2026-02-05 16:39:11 +00:00
jif-oai	901215e310	feat: repair DB in case of missing lines (#10751 )	2026-02-05 16:21:49 +00:00
jif-oai	4033f905c6	feat: resumable backfill (#10745 ) ## Summary This PR makes SQLite rollout backfill resumable and repeatable instead of one-shot-on-db-create. ## What changed - Added a persisted backfill state table: - state/migrations/0008_backfill_state.sql - Tracks status (pending|running|complete), last_watermark, and last_success_at. - Added backfill state model/types in codex-state: - BackfillState, BackfillStatus (state/src/model/backfill_state.rs) - Added runtime APIs to manage backfill lifecycle/progress: - get_backfill_state - mark_backfill_running - checkpoint_backfill - mark_backfill_complete - Updated core startup behavior: - Backfill now runs whenever state is not Complete (not only when DB file is newly created). - Reworked backfill execution: - Collect rollout files, derive deterministic watermark per path, sort, resume from last_watermark. - Process in batches (BACKFILL_BATCH_SIZE = 200), checkpoint after each batch. - Mark complete with last_success_at at the end. ## Why Previous behavior could leave users permanently partially backfilled if the process exited during initial async backfill. This change allows safe continuation across restarts and avoids restarting from scratch.	2026-02-05 14:34:34 +00:00
pap-openai	b2424cb635	adding fork information (UI) when forking (#10246 ) - shows `/fork` command that ran in prev session - shows `session forked from name (uuid) || uuid (if name is not set)` as an event in new session	2026-02-05 13:24:55 +00:00
jif-oai	aa46b5cf99	nit: backfill stronger (#10738 )	2026-02-05 12:30:16 +00:00
jif-oai	61aecdde66	fix: make sure file exist in `find_thread_path_by_id_str_in_subdir` (#10618 )	2026-02-04 13:01:17 +00:00
jif-oai	38f6c6b114	chore: simplify user message detection (#10611 ) We no longer check response items with the `user` role, as they may be instructions, etc.	2026-02-04 11:14:53 +00:00
jif-oai	100eb6e6f0	Prefer state DB thread listings before filesystem (#10544 ) Summary - add Cursor/ThreadsPage conversions so state DB listings can be mapped back into the rollout list model - make recorder list helpers query the state DB first (archived flag included) and only fall back to file traversal if needed, along with populating head bytes lazily - add extensive tests to ensure the DB path is honored for active and archived threads and that the fallback works Testing - Not run (not requested)	2026-02-04 09:27:24 +00:00
xl-openai	f38d181795	feat: add APIs to list and download public remote skills (#10448 ) Add API to list / download from remote public skills	2026-02-03 14:09:37 -08:00
jif-oai	c38a5958d7	feat: `find_thread_path_by_id_str_in_subdir` from DB (#10532 )	2026-02-03 19:09:04 +00:00
sayan-oai	fc05374344	chore: add phase to message responseitem (#10455 ) ### What add wiring for `phase` field on `ResponseItem::Message` to lay groundwork for differentiating model preambles and final messages. currently optional. follows pattern in #9698. updated schemas with `just write-app-server-schema` so we can see type changes. ### Tests Updated existing tests for SSE parsing and hydrating from history	2026-02-03 02:52:26 +00:00
Celia Chen	fb2df99cf1	[feat] persist thread_dynamic_tools in db (#10252 ) Persist thread_dynamic_tools in sqlite and read first from it. Fall back to rollout files if it's not found. Persist dynamic tools to both sqlite and rollout files. Saw that new sessions get populated to db correctly & old sessions get backfilled correctly at startup: ``` celia@com-92114 codex-rs % sqlite3 ~/.codex/state.sqlite \ "select thread_id, position,name,description,input_schema from thread_dynamic_tools;" 019c0cad-ec0d-74b2-a787-e8b33a349117|0|geo_lookup|lookup a city|{"properties":{"city":{"type":"string"}},"required":["city"],"type":"object"} .... 019c10ca-aa4b-7620-ae40-c0919fbd7ea7|0|geo_lookup|lookup a city|{"properties":{"city":{"type":"string"}},"required":["city"],"type":"object"} ```	2026-02-03 00:06:44 +00:00
jif-oai	0b460eda32	chore: ignore synthetic messages (#10394 ) This will be fixed once this is settled: https://www.notion.so/openai/Artificial-context-management-2fb8e50b62b080db8b8ed93b3b19d1a2#2fb8e50b62b080d2bffce2dd1e60972b	2026-02-02 18:13:48 +00:00
jif-oai	4f1cfaf892	fix: Rfc3339 casting (#10386 )	2026-02-02 13:33:28 +00:00
jif-oai	e9a774e7ae	fix: thread listing (#10383 )	2026-02-02 12:52:49 +00:00
pap-openai	1644cbfc6d	Session picker shows thread_name if set (#10340 ) - shows names of threads in the ResumePicker used by `/resume` and `codex resume` if set, defaulting to the preview (previous behaviour) if none - adds a `find_thread_names_by_ids` that maps names to IDs in `codex-rs/core/src/rollout/session_index.rs`. It reads the index mapping file sequentially in normal order (rather than in reverse order, as `codex resume <name>` does). This function is called for a list of sessions (the default page size is 25; the number of pages loaded depends on the terminal height), most of which will include at least one unnamed session and therefore require the whole file to be read. Could be better and sqlite integration will make this better - those reads won't be needed when leveraging sqlite Open questions: - We could rename the TUI "Conversation" column to "Name" or "Thread" that would feel more accurate. Could be a fast-follow if we implement auto-naming as it'll always be a name instead?	2026-02-02 08:13:17 +00:00
Jeremy Rose	d59685f6d4	file-search: multi-root walk (#10240 ) Instead of a separate walker for each root in a multi-root walk, use a single walker.	2026-01-30 22:20:23 +00:00
Charley Cunningham	ec4a2d07e4	Plan mode: stream proposed plans, emit plan items, and render in TUI (#9786 ) ## Summary - Stream proposed plans in Plan Mode using `<proposed_plan>` tags parsed in core, emitting plan deltas plus a plan `ThreadItem`, while stripping tags from normal assistant output. - Persist plan items and rebuild them on resume so proposed plans show in thread history. - Wire plan items/deltas through app-server protocol v2 and render a dedicated proposed-plan view in the TUI, including the “Implement this plan?” prompt only when a plan item is present. ## Changes ### Core (`codex-rs/core`) - Added a generic, line-based tag parser that buffers each line until it can disprove a tag prefix; implements auto-close on `finish()` for unterminated tags. `codex-rs/core/src/tagged_block_parser.rs` - Refactored proposed plan parsing to wrap the generic parser. `codex-rs/core/src/proposed_plan_parser.rs` - In plan mode, stream assistant deltas as: - Normal text → `AgentMessageContentDelta` - Plan text → `PlanDelta` + `TurnItem::Plan` start/completion (`codex-rs/core/src/codex.rs`) - Final plan item content is derived from the completed assistant message (authoritative), not necessarily the concatenated deltas. - Strips `<proposed_plan>` blocks from assistant text in plan mode so tags don’t appear in normal messages. (`codex-rs/core/src/stream_events_utils.rs`) - Persist `ItemCompleted` events only for plan items for rollout replay. (`codex-rs/core/src/rollout/policy.rs`) - Guard `update_plan` tool in Plan Mode with a clear error message. (`codex-rs/core/src/tools/handlers/plan.rs`) - Updated Plan Mode prompt to: - keep `<proposed_plan>` out of non-final reasoning/preambles - require exact tag formatting - allow only one `<proposed_plan>` block per turn (`codex-rs/core/templates/collaboration_mode/plan.md`) ### Protocol / App-server protocol - Added `TurnItem::Plan` and `PlanDeltaEvent` to core protocol items. 
(`codex-rs/protocol/src/items.rs`, `codex-rs/protocol/src/protocol.rs`) - Added v2 `ThreadItem::Plan` and `PlanDeltaNotification` with EXPERIMENTAL markers and note that deltas may not match the final plan item. (`codex-rs/app-server-protocol/src/protocol/v2.rs`) - Added plan delta route in app-server protocol common mapping. (`codex-rs/app-server-protocol/src/protocol/common.rs`) - Rebuild plan items from persisted `ItemCompleted` events on resume. (`codex-rs/app-server-protocol/src/protocol/thread_history.rs`) ### App-server - Forward plan deltas to v2 clients and map core plan items to v2 plan items. (`codex-rs/app-server/src/bespoke_event_handling.rs`, `codex-rs/app-server/src/codex_message_processor.rs`) - Added v2 plan item tests. (`codex-rs/app-server/tests/suite/v2/plan_item.rs`) ### TUI - Added a dedicated proposed plan history cell with special background and padding, and moved “• Proposed Plan” outside the highlighted block. (`codex-rs/tui/src/history_cell.rs`, `codex-rs/tui/src/style.rs`) - Only show “Implement this plan?” when a plan item exists. (`codex-rs/tui/src/chatwidget.rs`, `codex-rs/tui/src/chatwidget/tests.rs`) ### Docs / Misc - Updated protocol docs to mention plan deltas. (`codex-rs/docs/protocol_v1.md`) - Minor plumbing updates in exec/debug clients to tolerate plan deltas. (`codex-rs/debug-client/src/reader.rs`, `codex-rs/exec/...`) ## Tests - Added core integration tests: - Plan mode strips plan from agent messages. - Missing `</proposed_plan>` closes at end-of-message. (`codex-rs/core/tests/suite/items.rs`) - Added unit tests for generic tag parser (prefix buffering, non-tag lines, auto-close). (`codex-rs/core/src/tagged_block_parser.rs`) - Existing app-server plan item tests in v2. 
(`codex-rs/app-server/tests/suite/v2/plan_item.rs`) ## Notes / Behavior - Plan output no longer appears in standard assistant text in Plan Mode; it streams via `PlanDelta` and completes as a `TurnItem::Plan`. - The final plan item content is authoritative and may diverge from streamed deltas (documented as experimental). - Reasoning summaries are not filtered; prompt instructs the model not to include `<proposed_plan>` outside the final plan message. ## Codex Author `codex fork 019bec2d-b09d-7450-b292-d7bcdddcdbfb`	2026-01-30 18:59:30 +00:00
jif-oai	0212f4010e	nit: fix db with multiple metadata lines (#10237 )	2026-01-30 17:32:10 +00:00
jif-oai	887bec0dee	chore: do not clean the DB anymore (#10232 )	2026-01-30 18:23:00 +01:00
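
The lease rule described in 87bbfc50a1 (#11377) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the actual change is DB-backed and its code is not shown in this log. The `BackfillLease` struct, its fields, the explicit `now` argument, and `mark_complete` are hypothetical names introduced for illustration; only `try_claim_backfill` and `lease_seconds` come from the commit message.

```rust
// Hypothetical in-memory model of the lease rule from #11377.
// The real implementation stores this state in SQLite; the struct,
// field names, and the explicit `now` parameter are assumptions.

#[derive(Debug)]
struct BackfillLease {
    /// True once backfill has finished; no further claims are allowed.
    complete: bool,
    /// Time (in seconds) at which the current owner claimed the lease.
    claimed_at: Option<u64>,
}

impl BackfillLease {
    fn new() -> Self {
        BackfillLease { complete: false, claimed_at: None }
    }

    /// Claims the backfill only when it is not complete and no fresh
    /// running worker owns it. Returns true if the caller is now the owner.
    fn try_claim_backfill(&mut self, now: u64, lease_seconds: u64) -> bool {
        if self.complete {
            return false;
        }
        // A lease is "fresh" while less than `lease_seconds` have elapsed.
        let owned_by_fresh_worker = self
            .claimed_at
            .map_or(false, |t| now.saturating_sub(t) < lease_seconds);
        if owned_by_fresh_worker {
            return false;
        }
        // Either unclaimed or stale: take (or take over) the lease.
        self.claimed_at = Some(now);
        true
    }

    /// Marks the backfill as finished and releases the lease.
    fn mark_complete(&mut self) {
        self.complete = true;
        self.claimed_at = None;
    }
}
```

With a 900-second lease (the production value named in the commit), a second worker starting 100 seconds after the first is turned away, while one starting after the lease has expired takes over. Passing `now` explicitly is a design choice of this sketch that keeps the rule deterministic and easy to test.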

136 Commits
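
The resumable backfill described in 4033f905c6 (#10745) sorts rollout files by a deterministic watermark, resumes after `last_watermark`, and checkpoints after each batch. The sketch below illustrates that loop under stated assumptions: it uses the path string itself as the watermark, keeps state in memory instead of the SQLite table, and adds a hypothetical `max_batches` parameter to simulate the process exiting mid-run. `run_backfill` and the `processed` sink are illustrative names, not the repository's API.

```rust
// Hypothetical sketch of the batch/checkpoint loop from #10745.
// Assumptions: the watermark is the path string, state lives in memory,
// and `max_batches` simulates an interruption.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackfillStatus {
    Pending,
    Running,
    Complete,
}

#[derive(Debug)]
struct BackfillState {
    status: BackfillStatus,
    last_watermark: Option<String>,
}

impl BackfillState {
    fn new() -> Self {
        BackfillState { status: BackfillStatus::Pending, last_watermark: None }
    }
}

/// Processes rollout paths in sorted order, skipping everything at or
/// below the stored watermark and checkpointing after each batch.
fn run_backfill(
    state: &mut BackfillState,
    rollout_paths: &[&str],
    batch_size: usize,
    max_batches: usize,
    processed: &mut Vec<String>,
) {
    assert!(batch_size > 0, "batch_size must be positive");
    if state.status == BackfillStatus::Complete {
        return;
    }
    state.status = BackfillStatus::Running;

    // Keep only paths strictly after the last checkpoint, then sort.
    let mut pending: Vec<&str> = rollout_paths
        .iter()
        .copied()
        .filter(|p| state.last_watermark.as_deref().map_or(true, |w| *p > w))
        .collect();
    pending.sort_unstable();

    for (i, batch) in pending.chunks(batch_size).enumerate() {
        if i == max_batches {
            return; // simulated exit: status stays Running, watermark is kept
        }
        for path in batch {
            processed.push(path.to_string());
        }
        // Checkpoint: remember the highest watermark handled so far.
        state.last_watermark = batch.last().map(|p| p.to_string());
    }
    state.status = BackfillStatus::Complete;
}
```

Calling `run_backfill` a second time after an interrupted run picks up from the stored watermark rather than from the beginning, which is the behavior the commit message describes. The commit names `BACKFILL_BATCH_SIZE = 200` for the real batch size; the sketch takes it as a parameter so that small inputs can exercise the checkpoint path.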