diff --git a/AGENTS.md b/AGENTS.md index db3216b4ec..608b48209c 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -36,6 +36,21 @@ In the codex-rs folder where the rust code lives: - When extracting code from a large module, move the related tests and module/type docs toward the new implementation so the invariants stay close to the code that owns them. +### Model-visible context fragments + +- Model-visible prompt context should go through the shared fragment abstractions described in `docs/model-visible-context.md`. +- Every new model-visible fragment should implement `ModelVisibleContextFragment` and set `type Role`. +- Turn-state model-visible context assembly should produce exactly two envelopes (one developer message + one contextual-user message) via the shared envelope builders. +- Contextual-user fragments should use the shared detection path in `model_visible_context`: implement `TaggedContextualUserFragment` when marker-based detection is sufficient, or implement `ContextualUserFragmentDetector` for custom matching (for example AGENTS.md dynamic tags). +- Use the developer envelope for developer guidance. Custom override text (for example config/app-server `developer_instructions`) should use `CustomDeveloperInstructions`; system-generated developer context should use typed fragments plus the neutral `developer_*_text` helpers rather than reusing the custom override type. +- Use the contextual-user envelope for user-role contextual state or runtime markers such as AGENTS instructions, plugin instructions, environment context, skills, and shell-command markers. Contextual-user fragments must provide stable markers so history parsing treats them as contextual state rather than user intent. +- Use `` specifically for environment facts derived from `TurnContext` that may need turn-to-turn diffs (`cwd`, `shell`, optional `current_date`, optional `timezone`, optional network allow/deny domain summaries). Do not put policy text, plugin/skill listings, or other guidance into ``; those should use dedicated fragments. +- Fragments derived from durable/current turn state that should update/reinject via diff across resume/fork/compaction/backtracking should implement `TurnContextDiffFragment` so current-state extraction, diffing, and rendering live together. +- Runtime/session-prefix one-off fragments can implement only `ModelVisibleContextFragment` when they are not turn-state diffs. +- Register new turn-state fragments by type in the single ordered registry used by both initial-context and turn-diff assembly (`REGISTERED_TURN_STATE_FRAGMENT_BUILDERS` in `context_manager/updates.rs`) via `build_registered_turn_state_fragment::`. +- Do not hand-construct model-visible `ResponseItem::Message` payloads in new code; use fragment conversion and shared envelope builders. +- Do not inject raw strings directly into the initial-context or settings-update builders, and do not call fragment wrapping helpers ad hoc from new code. + Run `just fmt` (in `codex-rs` directory) automatically after you have finished making Rust code changes; do not ask for approval to run it. Additionally, run the tests: 1. Run the test for the specific project that was changed. For example, if changes were made in `codex-rs/tui`, run `cargo test -p codex-tui`. diff --git a/codex-rs/app-server-protocol/schema/json/ClientRequest.json b/codex-rs/app-server-protocol/schema/json/ClientRequest.json index 47db32075f..88d0656fb8 100644 --- a/codex-rs/app-server-protocol/schema/json/ClientRequest.json +++ b/codex-rs/app-server-protocol/schema/json/ClientRequest.json @@ -2370,6 +2370,7 @@ ] }, "developerInstructions": { + "description": "Custom developer override for this thread session. Takes precedence over `~/.codex/config.toml` `developer_instructions`.", "type": [ "string", "null" @@ -2618,7 +2619,7 @@ "type": "object" }, "ThreadResumeParams": { - "description": "There are three ways to resume a thread: 1. By thread_id: load the thread from disk by thread_id and resume it. 2. By history: instantiate the thread from memory and resume it. 3. By path: load the thread from disk by path and resume it.\n\nThe precedence is: history > path > thread_id. If using history or path, the thread_id param will be ignored.\n\nPrefer using thread_id whenever possible.", + "description": "There are three ways to resume a thread: 1. By thread_id: load the thread from disk by thread_id and resume it. 2. By history: instantiate the thread from memory and resume it. 3. By path: load the thread from disk by path and resume it.\n\nThe precedence is: history > path > thread_id. If using history or path, the thread_id param will be ignored.\n\nPrefer using thread_id whenever possible.\n\nWhen resuming a thread that is already loaded/running, override fields are ignored and reported as mismatch warnings rather than being reapplied mid-session.", "properties": { "approvalPolicy": { "anyOf": [ @@ -2650,6 +2651,7 @@ ] }, "developerInstructions": { + "description": "Custom developer override for this thread session. Takes precedence over `~/.codex/config.toml` `developer_instructions`.", "type": [ "string", "null" @@ -2801,6 +2803,7 @@ ] }, "developerInstructions": { + "description": "Custom developer override for this thread session. Takes precedence over `~/.codex/config.toml` `developer_instructions`.", "type": [ "string", "null" diff --git a/codex-rs/app-server-protocol/schema/json/codex_app_server_protocol.schemas.json b/codex-rs/app-server-protocol/schema/json/codex_app_server_protocol.schemas.json index 353507a47e..56fb426ec9 100644 --- a/codex-rs/app-server-protocol/schema/json/codex_app_server_protocol.schemas.json +++ b/codex-rs/app-server-protocol/schema/json/codex_app_server_protocol.schemas.json @@ -11211,6 +11211,7 @@ ] }, "developerInstructions": { + "description": "Custom developer override for this thread session. Takes precedence over `~/.codex/config.toml` `developer_instructions`.", "type": [ "string", "null" @@ -12297,7 +12298,7 @@ }, "ThreadResumeParams": { "$schema": "http://json-schema.org/draft-07/schema#", - "description": "There are three ways to resume a thread: 1. By thread_id: load the thread from disk by thread_id and resume it. 2. By history: instantiate the thread from memory and resume it. 3. By path: load the thread from disk by path and resume it.\n\nThe precedence is: history > path > thread_id. If using history or path, the thread_id param will be ignored.\n\nPrefer using thread_id whenever possible.", + "description": "There are three ways to resume a thread: 1. By thread_id: load the thread from disk by thread_id and resume it. 2. By history: instantiate the thread from memory and resume it. 3. By path: load the thread from disk by path and resume it.\n\nThe precedence is: history > path > thread_id. If using history or path, the thread_id param will be ignored.\n\nPrefer using thread_id whenever possible.\n\nWhen resuming a thread that is already loaded/running, override fields are ignored and reported as mismatch warnings rather than being reapplied mid-session.", "properties": { "approvalPolicy": { "anyOf": [ @@ -12329,6 +12330,7 @@ ] }, "developerInstructions": { + "description": "Custom developer override for this thread session. Takes precedence over `~/.codex/config.toml` `developer_instructions`.", "type": [ "string", "null" @@ -12562,6 +12564,7 @@ ] }, "developerInstructions": { + "description": "Custom developer override for this thread session. Takes precedence over `~/.codex/config.toml` `developer_instructions`.", "type": [ "string", "null" diff --git a/codex-rs/app-server-protocol/schema/json/codex_app_server_protocol.v2.schemas.json b/codex-rs/app-server-protocol/schema/json/codex_app_server_protocol.v2.schemas.json index 99fe87f0a6..f04916b820 100644 --- a/codex-rs/app-server-protocol/schema/json/codex_app_server_protocol.v2.schemas.json +++ b/codex-rs/app-server-protocol/schema/json/codex_app_server_protocol.v2.schemas.json @@ -8938,6 +8938,7 @@ ] }, "developerInstructions": { + "description": "Custom developer override for this thread session. Takes precedence over `~/.codex/config.toml` `developer_instructions`.", "type": [ "string", "null" @@ -10024,7 +10025,7 @@ }, "ThreadResumeParams": { "$schema": "http://json-schema.org/draft-07/schema#", - "description": "There are three ways to resume a thread: 1. By thread_id: load the thread from disk by thread_id and resume it. 2. By history: instantiate the thread from memory and resume it. 3. By path: load the thread from disk by path and resume it.\n\nThe precedence is: history > path > thread_id. If using history or path, the thread_id param will be ignored.\n\nPrefer using thread_id whenever possible.", + "description": "There are three ways to resume a thread: 1. By thread_id: load the thread from disk by thread_id and resume it. 2. By history: instantiate the thread from memory and resume it. 3. By path: load the thread from disk by path and resume it.\n\nThe precedence is: history > path > thread_id. If using history or path, the thread_id param will be ignored.\n\nPrefer using thread_id whenever possible.\n\nWhen resuming a thread that is already loaded/running, override fields are ignored and reported as mismatch warnings rather than being reapplied mid-session.", "properties": { "approvalPolicy": { "anyOf": [ @@ -10056,6 +10057,7 @@ ] }, "developerInstructions": { + "description": "Custom developer override for this thread session. Takes precedence over `~/.codex/config.toml` `developer_instructions`.", "type": [ "string", "null" @@ -10289,6 +10291,7 @@ ] }, "developerInstructions": { + "description": "Custom developer override for this thread session. Takes precedence over `~/.codex/config.toml` `developer_instructions`.", "type": [ "string", "null" diff --git a/codex-rs/app-server-protocol/schema/json/v2/ThreadForkParams.json b/codex-rs/app-server-protocol/schema/json/v2/ThreadForkParams.json index 328dca36b9..a28664e041 100644 --- a/codex-rs/app-server-protocol/schema/json/v2/ThreadForkParams.json +++ b/codex-rs/app-server-protocol/schema/json/v2/ThreadForkParams.json @@ -99,6 +99,7 @@ ] }, "developerInstructions": { + "description": "Custom developer override for this thread session. Takes precedence over `~/.codex/config.toml` `developer_instructions`.", "type": [ "string", "null" diff --git a/codex-rs/app-server-protocol/schema/json/v2/ThreadResumeParams.json b/codex-rs/app-server-protocol/schema/json/v2/ThreadResumeParams.json index 37f5f1df6b..30c6496e12 100644 --- a/codex-rs/app-server-protocol/schema/json/v2/ThreadResumeParams.json +++ b/codex-rs/app-server-protocol/schema/json/v2/ThreadResumeParams.json @@ -990,7 +990,7 @@ "type": "string" } }, - "description": "There are three ways to resume a thread: 1. By thread_id: load the thread from disk by thread_id and resume it. 2. By history: instantiate the thread from memory and resume it. 3. By path: load the thread from disk by path and resume it.\n\nThe precedence is: history > path > thread_id. If using history or path, the thread_id param will be ignored.\n\nPrefer using thread_id whenever possible.", + "description": "There are three ways to resume a thread: 1. By thread_id: load the thread from disk by thread_id and resume it. 2. By history: instantiate the thread from memory and resume it. 3. By path: load the thread from disk by path and resume it.\n\nThe precedence is: history > path > thread_id. If using history or path, the thread_id param will be ignored.\n\nPrefer using thread_id whenever possible.\n\nWhen resuming a thread that is already loaded/running, override fields are ignored and reported as mismatch warnings rather than being reapplied mid-session.", "properties": { "approvalPolicy": { "anyOf": [ @@ -1022,6 +1022,7 @@ ] }, "developerInstructions": { + "description": "Custom developer override for this thread session. Takes precedence over `~/.codex/config.toml` `developer_instructions`.", "type": [ "string", "null" diff --git a/codex-rs/app-server-protocol/schema/json/v2/ThreadStartParams.json b/codex-rs/app-server-protocol/schema/json/v2/ThreadStartParams.json index e614ccf15c..6777d3797f 100644 --- a/codex-rs/app-server-protocol/schema/json/v2/ThreadStartParams.json +++ b/codex-rs/app-server-protocol/schema/json/v2/ThreadStartParams.json @@ -123,6 +123,7 @@ ] }, "developerInstructions": { + "description": "Custom developer override for this thread session. Takes precedence over `~/.codex/config.toml` `developer_instructions`.", "type": [ "string", "null" diff --git a/codex-rs/app-server-protocol/schema/typescript/v2/ThreadForkParams.ts b/codex-rs/app-server-protocol/schema/typescript/v2/ThreadForkParams.ts index 43b0b36ad8..907263d035 100644 --- a/codex-rs/app-server-protocol/schema/typescript/v2/ThreadForkParams.ts +++ b/codex-rs/app-server-protocol/schema/typescript/v2/ThreadForkParams.ts @@ -22,7 +22,11 @@ export type ThreadForkParams = {threadId: string, /** path?: string | null, /** * Configuration overrides for the forked thread, if any. */ -model?: string | null, modelProvider?: string | null, serviceTier?: ServiceTier | null | null, cwd?: string | null, approvalPolicy?: AskForApproval | null, sandbox?: SandboxMode | null, config?: { [key in string]?: JsonValue } | null, baseInstructions?: string | null, developerInstructions?: string | null, ephemeral?: boolean, /** +model?: string | null, modelProvider?: string | null, serviceTier?: ServiceTier | null | null, cwd?: string | null, approvalPolicy?: AskForApproval | null, sandbox?: SandboxMode | null, config?: { [key in string]?: JsonValue } | null, baseInstructions?: string | null, /** + * Custom developer override for this thread session. + * Takes precedence over `~/.codex/config.toml` `developer_instructions`. + */ +developerInstructions?: string | null, ephemeral?: boolean, /** * If true, persist additional rollout EventMsg variants required to * reconstruct a richer thread history on subsequent resume/fork/read. */ diff --git a/codex-rs/app-server-protocol/schema/typescript/v2/ThreadResumeParams.ts b/codex-rs/app-server-protocol/schema/typescript/v2/ThreadResumeParams.ts index cc12020bd0..4467e5f2b7 100644 --- a/codex-rs/app-server-protocol/schema/typescript/v2/ThreadResumeParams.ts +++ b/codex-rs/app-server-protocol/schema/typescript/v2/ThreadResumeParams.ts @@ -18,6 +18,10 @@ import type { SandboxMode } from "./SandboxMode"; * If using history or path, the thread_id param will be ignored. * * Prefer using thread_id whenever possible. + * + * When resuming a thread that is already loaded/running, override fields are + * ignored and reported as mismatch warnings rather than being reapplied + * mid-session. */ export type ThreadResumeParams = {threadId: string, /** * [UNSTABLE] FOR CODEX CLOUD - DO NOT USE. @@ -31,7 +35,11 @@ history?: Array | null, /** path?: string | null, /** * Configuration overrides for the resumed thread, if any. */ -model?: string | null, modelProvider?: string | null, serviceTier?: ServiceTier | null | null, cwd?: string | null, approvalPolicy?: AskForApproval | null, sandbox?: SandboxMode | null, config?: { [key in string]?: JsonValue } | null, baseInstructions?: string | null, developerInstructions?: string | null, personality?: Personality | null, /** +model?: string | null, modelProvider?: string | null, serviceTier?: ServiceTier | null | null, cwd?: string | null, approvalPolicy?: AskForApproval | null, sandbox?: SandboxMode | null, config?: { [key in string]?: JsonValue } | null, baseInstructions?: string | null, /** + * Custom developer override for this thread session. + * Takes precedence over `~/.codex/config.toml` `developer_instructions`. + */ +developerInstructions?: string | null, personality?: Personality | null, /** * If true, persist additional rollout EventMsg variants required to * reconstruct a richer thread history on subsequent resume/fork/read. */ diff --git a/codex-rs/app-server-protocol/schema/typescript/v2/ThreadStartParams.ts b/codex-rs/app-server-protocol/schema/typescript/v2/ThreadStartParams.ts index db73763e40..128b6bd302 100644 --- a/codex-rs/app-server-protocol/schema/typescript/v2/ThreadStartParams.ts +++ b/codex-rs/app-server-protocol/schema/typescript/v2/ThreadStartParams.ts @@ -7,7 +7,11 @@ import type { JsonValue } from "../serde_json/JsonValue"; import type { AskForApproval } from "./AskForApproval"; import type { SandboxMode } from "./SandboxMode"; -export type ThreadStartParams = {model?: string | null, modelProvider?: string | null, serviceTier?: ServiceTier | null | null, cwd?: string | null, approvalPolicy?: AskForApproval | null, sandbox?: SandboxMode | null, config?: { [key in string]?: JsonValue } | null, serviceName?: string | null, baseInstructions?: string | null, developerInstructions?: string | null, personality?: Personality | null, ephemeral?: boolean | null, /** +export type ThreadStartParams = {model?: string | null, modelProvider?: string | null, serviceTier?: ServiceTier | null | null, cwd?: string | null, approvalPolicy?: AskForApproval | null, sandbox?: SandboxMode | null, config?: { [key in string]?: JsonValue } | null, serviceName?: string | null, baseInstructions?: string | null, /** + * Custom developer override for this thread session. + * Takes precedence over `~/.codex/config.toml` `developer_instructions`. + */ +developerInstructions?: string | null, personality?: Personality | null, ephemeral?: boolean | null, /** * If true, opt into emitting raw Responses API items on the event stream. * This is for internal use only (e.g. Codex Cloud). */ diff --git a/codex-rs/app-server-protocol/src/protocol/v2.rs b/codex-rs/app-server-protocol/src/protocol/v2.rs index fcbce51979..08c7582b2b 100644 --- a/codex-rs/app-server-protocol/src/protocol/v2.rs +++ b/codex-rs/app-server-protocol/src/protocol/v2.rs @@ -2279,6 +2279,8 @@ pub struct ThreadStartParams { pub service_name: Option, #[ts(optional = nullable)] pub base_instructions: Option, + /// Custom developer override for this thread session. + /// Takes precedence over `~/.codex/config.toml` `developer_instructions`. #[ts(optional = nullable)] pub developer_instructions: Option, #[ts(optional = nullable)] @@ -2351,6 +2353,10 @@ pub struct ThreadStartResponse { /// If using history or path, the thread_id param will be ignored. /// /// Prefer using thread_id whenever possible. +/// +/// When resuming a thread that is already loaded/running, override fields are +/// ignored and reported as mismatch warnings rather than being reapplied +/// mid-session. pub struct ThreadResumeParams { pub thread_id: String, @@ -2391,6 +2397,8 @@ pub struct ThreadResumeParams { pub config: Option>, #[ts(optional = nullable)] pub base_instructions: Option, + /// Custom developer override for this thread session. + /// Takes precedence over `~/.codex/config.toml` `developer_instructions`. #[ts(optional = nullable)] pub developer_instructions: Option, #[ts(optional = nullable)] @@ -2462,6 +2470,8 @@ pub struct ThreadForkParams { pub config: Option>, #[ts(optional = nullable)] pub base_instructions: Option, + /// Custom developer override for this thread session. + /// Takes precedence over `~/.codex/config.toml` `developer_instructions`. #[ts(optional = nullable)] pub developer_instructions: Option, #[serde(default, skip_serializing_if = "std::ops::Not::not")] diff --git a/codex-rs/app-server/README.md b/codex-rs/app-server/README.md index 4a6cd58cce..1681819cab 100644 --- a/codex-rs/app-server/README.md +++ b/codex-rs/app-server/README.md @@ -126,7 +126,7 @@ Example with notification opt-out: ## API Overview - `thread/start` — create a new thread; emits `thread/started` (including the current `thread.status`) and auto-subscribes you to turn/item events for that thread. -- `thread/resume` — reopen an existing thread by id so subsequent `turn/start` calls append to it. +- `thread/resume` — reopen an existing thread by id so subsequent `turn/start` calls append to it. When calling `thread/resume` against a thread that is already loaded/running, override fields are ignored and logged as mismatch warnings rather than being reapplied mid-session. - `thread/fork` — fork an existing thread into a new thread id by copying the stored history; accepts `ephemeral: true` for an in-memory temporary fork, emits `thread/started` (including the current `thread.status`), and auto-subscribes you to turn/item events for the new thread. - `thread/list` — page through stored rollouts; supports cursor-based pagination and optional `modelProviders`, `sourceKinds`, `archived`, `cwd`, and `searchTerm` filters. Each returned `thread` includes `status` (`ThreadStatus`), defaulting to `notLoaded` when the thread is not currently loaded. - `thread/loaded/list` — list the thread ids currently loaded in memory. diff --git a/codex-rs/core/src/agent/control.rs b/codex-rs/core/src/agent/control.rs index c9ac18a026..f1040346a0 100644 --- a/codex-rs/core/src/agent/control.rs +++ b/codex-rs/core/src/agent/control.rs @@ -7,8 +7,8 @@ use crate::error::CodexErr; use crate::error::Result as CodexResult; use crate::find_thread_path_by_id_str; use crate::rollout::RolloutRecorder; +use crate::session_prefix::SubagentNotification; use crate::session_prefix::format_subagent_context_line; -use crate::session_prefix::format_subagent_notification_message; use crate::shell_snapshot::ShellSnapshot; use crate::state_db; use crate::thread_manager::ThreadManagerState; @@ -455,9 +455,10 @@ impl AgentControl { let Ok(parent_thread) = state.get_thread(parent_thread_id).await else { return; }; + let child_thread_id_string = child_thread_id.to_string(); parent_thread - .inject_user_message_without_turn(format_subagent_notification_message( - &child_thread_id.to_string(), + .inject_model_visible_fragment_without_turn(SubagentNotification::new( + &child_thread_id_string, &status, )) .await; @@ -487,5 +488,1098 @@ impl AgentControl { } } #[cfg(test)] -#[path = "control_tests.rs"] -mod tests; +mod tests { + use super::*; + use crate::CodexAuth; + use crate::CodexThread; + use crate::ThreadManager; + use crate::agent::agent_status_from_event; + use crate::config::AgentRoleConfig; + use crate::config::Config; + use crate::config::ConfigBuilder; + use crate::config_loader::LoaderOverrides; + use crate::features::Feature; + use crate::model_visible_context::SUBAGENT_NOTIFICATION_OPEN_TAG; + use assert_matches::assert_matches; + use codex_protocol::config_types::ModeKind; + use codex_protocol::models::ContentItem; + use codex_protocol::models::MessageRole; + use codex_protocol::models::ResponseItem; + use codex_protocol::protocol::ErrorEvent; + use codex_protocol::protocol::EventMsg; + use codex_protocol::protocol::SessionSource; + use codex_protocol::protocol::SubAgentSource; + use codex_protocol::protocol::TurnAbortReason; + use codex_protocol::protocol::TurnAbortedEvent; + use codex_protocol::protocol::TurnCompleteEvent; + use codex_protocol::protocol::TurnStartedEvent; + use pretty_assertions::assert_eq; + use tempfile::TempDir; + use tokio::time::Duration; + use tokio::time::sleep; + use tokio::time::timeout; + use toml::Value as TomlValue; + + async fn test_config_with_cli_overrides( + cli_overrides: Vec<(String, TomlValue)>, + ) -> (TempDir, Config) { + let home = TempDir::new().expect("create temp dir"); + let config = ConfigBuilder::default() + .codex_home(home.path().to_path_buf()) + .cli_overrides(cli_overrides) + .loader_overrides(LoaderOverrides { + #[cfg(target_os = "macos")] + managed_preferences_base64: Some(String::new()), + macos_managed_config_requirements_base64: Some(String::new()), + ..LoaderOverrides::default() + }) + .build() + .await + .expect("load default test config"); + (home, config) + } + + async fn test_config() -> (TempDir, Config) { + test_config_with_cli_overrides(Vec::new()).await + } + + fn text_input(text: &str) -> Vec { + vec![UserInput::Text { + text: text.to_string(), + text_elements: Vec::new(), + }] + } + + struct AgentControlHarness { + _home: TempDir, + config: Config, + manager: ThreadManager, + control: AgentControl, + } + + impl AgentControlHarness { + async fn new() -> Self { + let (home, config) = test_config().await; + let manager = ThreadManager::with_models_provider_and_home_for_tests( + CodexAuth::from_api_key("dummy"), + config.model_provider.clone(), + config.codex_home.clone(), + ); + let control = manager.agent_control(); + Self { + _home: home, + config, + manager, + control, + } + } + + async fn start_thread(&self) -> (ThreadId, Arc) { + let new_thread = self + .manager + .start_thread(self.config.clone()) + .await + .expect("start thread"); + (new_thread.thread_id, new_thread.thread) + } + } + + fn has_subagent_notification(history_items: &[ResponseItem]) -> bool { + history_items.iter().any(|item| { + let ResponseItem::Message { role, content, .. } = item else { + return false; + }; + if role != "developer" { + return false; + } + content.iter().any(|content_item| match content_item { + ContentItem::InputText { text } | ContentItem::OutputText { text } => { + text.contains(SUBAGENT_NOTIFICATION_OPEN_TAG) + } + ContentItem::InputImage { .. } => false, + }) + }) + } + + /// Returns true when any message item contains `needle` in a text span. + fn history_contains_text(history_items: &[ResponseItem], needle: &str) -> bool { + history_items.iter().any(|item| { + let ResponseItem::Message { content, .. } = item else { + return false; + }; + content.iter().any(|content_item| match content_item { + ContentItem::InputText { text } | ContentItem::OutputText { text } => { + text.contains(needle) + } + ContentItem::InputImage { .. } => false, + }) + }) + } + + async fn wait_for_subagent_notification(parent_thread: &Arc) -> bool { + let wait = async { + loop { + let history_items = parent_thread + .codex + .session + .clone_history() + .await + .raw_items() + .to_vec(); + if has_subagent_notification(&history_items) { + return true; + } + sleep(Duration::from_millis(25)).await; + } + }; + timeout(Duration::from_secs(2), wait).await.is_ok() + } + + #[tokio::test] + async fn send_input_errors_when_manager_dropped() { + let control = AgentControl::default(); + let err = control + .send_input( + ThreadId::new(), + vec![UserInput::Text { + text: "hello".to_string(), + text_elements: Vec::new(), + }], + ) + .await + .expect_err("send_input should fail without a manager"); + assert_eq!( + err.to_string(), + "unsupported operation: thread manager dropped" + ); + } + + #[tokio::test] + async fn get_status_returns_not_found_without_manager() { + let control = AgentControl::default(); + let got = control.get_status(ThreadId::new()).await; + assert_eq!(got, AgentStatus::NotFound); + } + + #[tokio::test] + async fn on_event_updates_status_from_task_started() { + let status = agent_status_from_event(&EventMsg::TurnStarted(TurnStartedEvent { + turn_id: "turn-1".to_string(), + model_context_window: None, + collaboration_mode_kind: ModeKind::Default, + })); + assert_eq!(status, Some(AgentStatus::Running)); + } + + #[tokio::test] + async fn on_event_updates_status_from_task_complete() { + let status = agent_status_from_event(&EventMsg::TurnComplete(TurnCompleteEvent { + turn_id: "turn-1".to_string(), + last_agent_message: Some("done".to_string()), + })); + let expected = AgentStatus::Completed(Some("done".to_string())); + assert_eq!(status, Some(expected)); + } + + #[tokio::test] + async fn on_event_updates_status_from_error() { + let status = agent_status_from_event(&EventMsg::Error(ErrorEvent { + message: "boom".to_string(), + codex_error_info: None, + })); + + let expected = AgentStatus::Errored("boom".to_string()); + assert_eq!(status, Some(expected)); + } + + #[tokio::test] + async fn on_event_updates_status_from_turn_aborted() { + let status = agent_status_from_event(&EventMsg::TurnAborted(TurnAbortedEvent { + turn_id: Some("turn-1".to_string()), + reason: TurnAbortReason::Interrupted, + })); + + let expected = AgentStatus::Errored("Interrupted".to_string()); + assert_eq!(status, Some(expected)); + } + + #[tokio::test] + async fn on_event_updates_status_from_shutdown_complete() { + let status = agent_status_from_event(&EventMsg::ShutdownComplete); + assert_eq!(status, Some(AgentStatus::Shutdown)); + } + + #[tokio::test] + async fn spawn_agent_errors_when_manager_dropped() { + let control = AgentControl::default(); + let (_home, config) = test_config().await; + let err = control + .spawn_agent(config, text_input("hello"), None) + .await + .expect_err("spawn_agent should fail without a manager"); + assert_eq!( + err.to_string(), + "unsupported operation: thread manager dropped" + ); + } + + #[tokio::test] + async fn resume_agent_errors_when_manager_dropped() { + let control = AgentControl::default(); + let (_home, config) = test_config().await; + let err = control + .resume_agent_from_rollout(config, ThreadId::new(), SessionSource::Exec) + .await + .expect_err("resume_agent should fail without a manager"); + assert_eq!( + err.to_string(), + "unsupported operation: thread manager dropped" + ); + } + + #[tokio::test] + async fn send_input_errors_when_thread_missing() { + let harness = AgentControlHarness::new().await; + let thread_id = ThreadId::new(); + let err = harness + .control + .send_input( + thread_id, + vec![UserInput::Text { + text: "hello".to_string(), + text_elements: Vec::new(), + }], + ) + .await + .expect_err("send_input should fail for missing thread"); + assert_matches!(err, CodexErr::ThreadNotFound(id) if id == thread_id); + } + + #[tokio::test] + async fn get_status_returns_not_found_for_missing_thread() { + let harness = AgentControlHarness::new().await; + let status = harness.control.get_status(ThreadId::new()).await; + assert_eq!(status, AgentStatus::NotFound); + } + + #[tokio::test] + async fn get_status_returns_pending_init_for_new_thread() { + let harness = AgentControlHarness::new().await; + let (thread_id, _) = harness.start_thread().await; + let status = harness.control.get_status(thread_id).await; + assert_eq!(status, AgentStatus::PendingInit); + } + + #[tokio::test] + async fn subscribe_status_errors_for_missing_thread() { + let harness = AgentControlHarness::new().await; + let thread_id = ThreadId::new(); + let err = harness + .control + .subscribe_status(thread_id) + .await + .expect_err("subscribe_status should fail for missing thread"); + assert_matches!(err, CodexErr::ThreadNotFound(id) if id == thread_id); + } + + #[tokio::test] + async fn subscribe_status_updates_on_shutdown() { + let harness = AgentControlHarness::new().await; + let (thread_id, thread) = harness.start_thread().await; + let mut status_rx = harness + .control + .subscribe_status(thread_id) + .await + .expect("subscribe_status should succeed"); + assert_eq!(status_rx.borrow().clone(), AgentStatus::PendingInit); + + let _ = thread + .submit(Op::Shutdown {}) + .await + .expect("shutdown should submit"); + + let _ = status_rx.changed().await; + assert_eq!(status_rx.borrow().clone(), AgentStatus::Shutdown); + } + + #[tokio::test] + async fn send_input_submits_user_message() { + let harness = AgentControlHarness::new().await; + let (thread_id, _thread) = harness.start_thread().await; + + let submission_id = harness + .control + .send_input( + thread_id, + vec![UserInput::Text { + text: "hello from tests".to_string(), + text_elements: Vec::new(), + }], + ) + .await + .expect("send_input should succeed"); + assert!(!submission_id.is_empty()); + let expected = ( + thread_id, + Op::UserInput { + items: vec![UserInput::Text { + text: "hello from tests".to_string(), + text_elements: Vec::new(), + }], + final_output_json_schema: None, + }, + ); + let captured = harness + .manager + .captured_ops() + .into_iter() + .find(|entry| *entry == expected); + assert_eq!(captured, Some(expected)); + } + + #[tokio::test] + async fn spawn_agent_creates_thread_and_sends_prompt() { + let harness = AgentControlHarness::new().await; + let thread_id = harness + .control + .spawn_agent(harness.config.clone(), text_input("spawned"), None) + .await + .expect("spawn_agent should succeed"); + let _thread = harness + .manager + .get_thread(thread_id) + .await + .expect("thread should be registered"); + let expected = ( + thread_id, + Op::UserInput { + items: vec![UserInput::Text { + text: "spawned".to_string(), + text_elements: Vec::new(), + }], + final_output_json_schema: None, + }, + ); + let captured = harness + .manager + .captured_ops() + .into_iter() + .find(|entry| *entry == expected); + assert_eq!(captured, Some(expected)); + } + + #[tokio::test] + async fn spawn_agent_can_fork_parent_thread_history() { + let harness = AgentControlHarness::new().await; + let (parent_thread_id, parent_thread) = harness.start_thread().await; + parent_thread + .inject_message_without_turn(MessageRole::User, "parent seed context".to_string()) + .await; + let turn_context = parent_thread.codex.session.new_default_turn().await; + let parent_spawn_call_id = "spawn-call-history".to_string(); + let parent_spawn_call = ResponseItem::FunctionCall { + id: None, + name: "spawn_agent".to_string(), + arguments: "{}".to_string(), + call_id: parent_spawn_call_id.clone(), + }; + parent_thread + .codex + .session + .record_conversation_items(turn_context.as_ref(), &[parent_spawn_call]) + .await; + parent_thread + .codex + .session + .ensure_rollout_materialized() + .await; + parent_thread.codex.session.flush_rollout().await; + + let child_thread_id = harness + .control + .spawn_agent_with_options( + harness.config.clone(), + text_input("child task"), + Some(SessionSource::SubAgent(SubAgentSource::ThreadSpawn { + parent_thread_id, + depth: 1, + agent_nickname: None, + agent_role: None, + })), + SpawnAgentOptions { + fork_parent_spawn_call_id: Some(parent_spawn_call_id), + }, + ) + .await + .expect("forked spawn should succeed"); + + let child_thread = harness + .manager + .get_thread(child_thread_id) + .await + .expect("child thread should be registered"); + assert_ne!(child_thread_id, parent_thread_id); + let history = child_thread.codex.session.clone_history().await; + assert!(history_contains_text( + history.raw_items(), + "parent seed context" + )); + + let expected = ( + child_thread_id, + Op::UserInput { + items: vec![UserInput::Text { + text: "child task".to_string(), + text_elements: Vec::new(), + }], + final_output_json_schema: None, + }, + ); + let captured = harness + .manager + .captured_ops() + .into_iter() + .find(|entry| *entry == expected); + assert_eq!(captured, Some(expected)); + + let _ = harness + .control + .shutdown_agent(child_thread_id) + .await + .expect("child shutdown should submit"); + let _ = parent_thread + .submit(Op::Shutdown {}) + .await + .expect("parent shutdown should submit"); + } + + #[tokio::test] + async fn spawn_agent_fork_injects_output_for_parent_spawn_call() { + let harness = AgentControlHarness::new().await; + let (parent_thread_id, parent_thread) = harness.start_thread().await; + let turn_context = parent_thread.codex.session.new_default_turn().await; + let parent_spawn_call_id = "spawn-call-1".to_string(); + let parent_spawn_call = ResponseItem::FunctionCall { + id: None, + name: "spawn_agent".to_string(), + arguments: "{}".to_string(), + call_id: parent_spawn_call_id.clone(), + }; + parent_thread + .codex + .session + .record_conversation_items(turn_context.as_ref(), &[parent_spawn_call]) + .await; + parent_thread + .codex + .session + .ensure_rollout_materialized() + .await; + parent_thread.codex.session.flush_rollout().await; + + let child_thread_id = harness + .control + .spawn_agent_with_options( + harness.config.clone(), + text_input("child task"), + Some(SessionSource::SubAgent(SubAgentSource::ThreadSpawn { + parent_thread_id, + depth: 1, + agent_nickname: None, + agent_role: None, + })), + SpawnAgentOptions { + fork_parent_spawn_call_id: Some(parent_spawn_call_id.clone()), + }, + ) + .await + .expect("forked spawn should succeed"); + + let child_thread = harness + .manager + .get_thread(child_thread_id) + .await + .expect("child thread should be registered"); + let history = child_thread.codex.session.clone_history().await; + let injected_output = history.raw_items().iter().find_map(|item| match item { + ResponseItem::FunctionCallOutput { call_id, output } + if call_id == &parent_spawn_call_id => + { + Some(output) + } + _ => None, + }); + let injected_output = + injected_output.expect("forked child should contain synthetic tool output"); + assert_eq!( + injected_output.text_content(), + Some(FORKED_SPAWN_AGENT_OUTPUT_MESSAGE) + ); + assert_eq!(injected_output.success, Some(true)); + + let _ = harness + .control + .shutdown_agent(child_thread_id) + .await + .expect("child shutdown should submit"); + let _ = parent_thread + .submit(Op::Shutdown {}) + .await + .expect("parent shutdown should submit"); + } + + #[tokio::test] + async fn spawn_agent_fork_flushes_parent_rollout_before_loading_history() { + let harness = AgentControlHarness::new().await; + let (parent_thread_id, parent_thread) = harness.start_thread().await; + let turn_context = parent_thread.codex.session.new_default_turn().await; + let parent_spawn_call_id = "spawn-call-unflushed".to_string(); + let parent_spawn_call = ResponseItem::FunctionCall { + id: None, + name: "spawn_agent".to_string(), + arguments: "{}".to_string(), + call_id: parent_spawn_call_id.clone(), + }; + parent_thread + .codex + .session + .record_conversation_items(turn_context.as_ref(), &[parent_spawn_call]) + .await; + + let child_thread_id = harness + .control + .spawn_agent_with_options( + harness.config.clone(), + text_input("child task"), + Some(SessionSource::SubAgent(SubAgentSource::ThreadSpawn { + parent_thread_id, + depth: 1, + agent_nickname: None, + agent_role: None, + })), + SpawnAgentOptions { + fork_parent_spawn_call_id: Some(parent_spawn_call_id.clone()), + }, + ) + .await + .expect("forked spawn should flush parent rollout before loading history"); + + let child_thread = harness + .manager + .get_thread(child_thread_id) + .await + .expect("child thread should be registered"); + let history = child_thread.codex.session.clone_history().await; + + let mut parent_call_index = None; + let mut injected_output_index = None; + for (idx, item) in history.raw_items().iter().enumerate() { + match item { + ResponseItem::FunctionCall { call_id, .. } if call_id == &parent_spawn_call_id => { + parent_call_index = Some(idx); + } + ResponseItem::FunctionCallOutput { call_id, .. } + if call_id == &parent_spawn_call_id => + { + injected_output_index = Some(idx); + } + _ => {} + } + } + + let parent_call_index = + parent_call_index.expect("forked child should include the parent spawn_agent call"); + let injected_output_index = injected_output_index + .expect("forked child should include synthetic output for the parent spawn_agent call"); + assert!(parent_call_index < injected_output_index); + + let _ = harness + .control + .shutdown_agent(child_thread_id) + .await + .expect("child shutdown should submit"); + let _ = parent_thread + .submit(Op::Shutdown {}) + .await + .expect("parent shutdown should submit"); + } + + #[tokio::test] + async fn spawn_agent_respects_max_threads_limit() { + let max_threads = 1usize; + let (_home, config) = test_config_with_cli_overrides(vec![( + "agents.max_threads".to_string(), + TomlValue::Integer(max_threads as i64), + )]) + .await; + let manager = ThreadManager::with_models_provider_and_home_for_tests( + CodexAuth::from_api_key("dummy"), + config.model_provider.clone(), + config.codex_home.clone(), + ); + let control = manager.agent_control(); + + let _ = manager + .start_thread(config.clone()) + .await + .expect("start thread"); + + let first_agent_id = control + .spawn_agent(config.clone(), text_input("hello"), None) + .await + .expect("spawn_agent should succeed"); + + let err = control + .spawn_agent(config, text_input("hello again"), None) + .await + .expect_err("spawn_agent should respect max threads"); + let CodexErr::AgentLimitReached { + max_threads: seen_max_threads, + } = err + else { + panic!("expected CodexErr::AgentLimitReached"); + }; + assert_eq!(seen_max_threads, max_threads); + + let _ = control + .shutdown_agent(first_agent_id) + .await + .expect("shutdown agent"); + } + + #[tokio::test] + async fn spawn_agent_releases_slot_after_shutdown() { + let max_threads = 1usize; + let (_home, config) = test_config_with_cli_overrides(vec![( + "agents.max_threads".to_string(), + TomlValue::Integer(max_threads as i64), + )]) + .await; + let manager = ThreadManager::with_models_provider_and_home_for_tests( + CodexAuth::from_api_key("dummy"), + config.model_provider.clone(), + config.codex_home.clone(), + ); + let control = manager.agent_control(); + + let first_agent_id = control + .spawn_agent(config.clone(), text_input("hello"), None) + .await + .expect("spawn_agent should succeed"); + let _ = control + .shutdown_agent(first_agent_id) + .await + .expect("shutdown agent"); + + let second_agent_id = control + .spawn_agent(config.clone(), text_input("hello again"), None) + .await + .expect("spawn_agent should succeed after shutdown"); + let _ = control + .shutdown_agent(second_agent_id) + .await + .expect("shutdown agent"); + } + + #[tokio::test] + async fn spawn_agent_limit_shared_across_clones() { + let max_threads = 1usize; + let (_home, config) = test_config_with_cli_overrides(vec![( + "agents.max_threads".to_string(), + TomlValue::Integer(max_threads as i64), + )]) + .await; + let manager = ThreadManager::with_models_provider_and_home_for_tests( + CodexAuth::from_api_key("dummy"), + config.model_provider.clone(), + config.codex_home.clone(), + ); + let control = manager.agent_control(); + let cloned = control.clone(); + + let first_agent_id = cloned + .spawn_agent(config.clone(), text_input("hello"), None) + .await + .expect("spawn_agent should succeed"); + + let err = control + .spawn_agent(config, text_input("hello again"), None) + .await + .expect_err("spawn_agent should respect shared guard"); + let CodexErr::AgentLimitReached { max_threads } = err else { + panic!("expected CodexErr::AgentLimitReached"); + }; + assert_eq!(max_threads, 1); + + let _ = control + .shutdown_agent(first_agent_id) + .await + .expect("shutdown agent"); + } + + #[tokio::test] + async fn resume_agent_respects_max_threads_limit() { + let max_threads = 1usize; + let (_home, config) = test_config_with_cli_overrides(vec![( + "agents.max_threads".to_string(), + TomlValue::Integer(max_threads as i64), + )]) + .await; + let manager = ThreadManager::with_models_provider_and_home_for_tests( + CodexAuth::from_api_key("dummy"), + config.model_provider.clone(), + config.codex_home.clone(), + ); + let control = manager.agent_control(); + + let resumable_id = control + .spawn_agent(config.clone(), text_input("hello"), None) + .await + .expect("spawn_agent should succeed"); + let _ = control + .shutdown_agent(resumable_id) + .await + .expect("shutdown resumable thread"); + + let active_id = control + .spawn_agent(config.clone(), text_input("occupy"), None) + .await + .expect("spawn_agent should succeed for active slot"); + + let err = control + .resume_agent_from_rollout(config, resumable_id, SessionSource::Exec) + .await + .expect_err("resume should respect max threads"); + let CodexErr::AgentLimitReached { + max_threads: seen_max_threads, + } = err + else { + panic!("expected CodexErr::AgentLimitReached"); + }; + assert_eq!(seen_max_threads, max_threads); + + let _ = control + .shutdown_agent(active_id) + .await + .expect("shutdown active thread"); + } + + #[tokio::test] + async fn resume_agent_releases_slot_after_resume_failure() { + let max_threads = 1usize; + let (_home, config) = test_config_with_cli_overrides(vec![( + "agents.max_threads".to_string(), + TomlValue::Integer(max_threads as i64), + )]) + .await; + let manager = ThreadManager::with_models_provider_and_home_for_tests( + CodexAuth::from_api_key("dummy"), + config.model_provider.clone(), + config.codex_home.clone(), + ); + let control = manager.agent_control(); + + let _ = control + .resume_agent_from_rollout(config.clone(), ThreadId::new(), SessionSource::Exec) + .await + .expect_err("resume should fail for missing rollout path"); + + let resumed_id = control + .spawn_agent(config, text_input("hello"), None) + .await + .expect("spawn should succeed after failed resume"); + let _ = control + .shutdown_agent(resumed_id) + .await + .expect("shutdown resumed thread"); + } + + #[tokio::test] + async fn spawn_child_completion_notifies_parent_history() { + let harness = AgentControlHarness::new().await; + let (parent_thread_id, parent_thread) = harness.start_thread().await; + + let child_thread_id = harness + .control + .spawn_agent( + harness.config.clone(), + text_input("hello child"), + Some(SessionSource::SubAgent(SubAgentSource::ThreadSpawn { + parent_thread_id, + depth: 1, + agent_nickname: None, + agent_role: Some("explorer".to_string()), + })), + ) + .await + .expect("child spawn should succeed"); + + let child_thread = harness + .manager + .get_thread(child_thread_id) + .await + .expect("child thread should exist"); + let _ = child_thread + .submit(Op::Shutdown {}) + .await + .expect("child shutdown should submit"); + + assert_eq!(wait_for_subagent_notification(&parent_thread).await, true); + } + + #[tokio::test] + async fn completion_watcher_notifies_parent_when_child_is_missing() { + let harness = AgentControlHarness::new().await; + let (parent_thread_id, parent_thread) = harness.start_thread().await; + let child_thread_id = ThreadId::new(); + + harness.control.maybe_start_completion_watcher( + child_thread_id, + Some(SessionSource::SubAgent(SubAgentSource::ThreadSpawn { + parent_thread_id, + depth: 1, + agent_nickname: None, + agent_role: Some("explorer".to_string()), + })), + ); + + assert_eq!(wait_for_subagent_notification(&parent_thread).await, true); + + let history_items = parent_thread + .codex + .session + .clone_history() + .await + .raw_items() + .to_vec(); + assert_eq!( + history_contains_text( + &history_items, + &format!("\"agent_id\":\"{child_thread_id}\"") + ), + true + ); + assert_eq!( + history_contains_text(&history_items, "\"status\":\"not_found\""), + true + ); + } + + #[tokio::test] + async fn spawn_thread_subagent_gets_random_nickname_in_session_source() { + let harness = AgentControlHarness::new().await; + let (parent_thread_id, _parent_thread) = harness.start_thread().await; + + let child_thread_id = harness + .control + .spawn_agent( + harness.config.clone(), + text_input("hello child"), + Some(SessionSource::SubAgent(SubAgentSource::ThreadSpawn { + parent_thread_id, + depth: 1, + agent_nickname: None, + agent_role: Some("explorer".to_string()), + })), + ) + .await + .expect("child spawn should succeed"); + + let child_thread = harness + .manager + .get_thread(child_thread_id) + .await + .expect("child thread should be registered"); + let snapshot = child_thread.config_snapshot().await; + + let SessionSource::SubAgent(SubAgentSource::ThreadSpawn { + parent_thread_id: seen_parent_thread_id, + depth, + agent_nickname, + agent_role, + }) = snapshot.session_source + else { + panic!("expected thread-spawn sub-agent source"); + }; + assert_eq!(seen_parent_thread_id, parent_thread_id); + assert_eq!(depth, 1); + assert!(agent_nickname.is_some()); + assert_eq!(agent_role, Some("explorer".to_string())); + } + + #[tokio::test] + async fn spawn_thread_subagent_uses_role_specific_nickname_candidates() { + let mut harness = AgentControlHarness::new().await; + harness.config.agent_roles.insert( + "researcher".to_string(), + AgentRoleConfig { + description: Some("Research role".to_string()), + config_file: None, + nickname_candidates: Some(vec!["Atlas".to_string()]), + }, + ); + let (parent_thread_id, _parent_thread) = harness.start_thread().await; + + let child_thread_id = harness + .control + .spawn_agent( + harness.config.clone(), + text_input("hello child"), + Some(SessionSource::SubAgent(SubAgentSource::ThreadSpawn { + parent_thread_id, + depth: 1, + agent_nickname: None, + agent_role: Some("researcher".to_string()), + })), + ) + .await + .expect("child spawn should succeed"); + + let child_thread = harness + .manager + .get_thread(child_thread_id) + .await + .expect("child thread should be registered"); + let snapshot = child_thread.config_snapshot().await; + + let SessionSource::SubAgent(SubAgentSource::ThreadSpawn { agent_nickname, .. }) = + snapshot.session_source + else { + panic!("expected thread-spawn sub-agent source"); + }; + assert_eq!(agent_nickname, Some("Atlas".to_string())); + } + + #[tokio::test] + async fn resume_thread_subagent_restores_stored_nickname_and_role() { + let (home, mut config) = test_config().await; + config + .features + .enable(Feature::Sqlite) + .expect("test config should allow sqlite"); + let manager = ThreadManager::with_models_provider_and_home_for_tests( + CodexAuth::from_api_key("dummy"), + config.model_provider.clone(), + config.codex_home.clone(), + ); + let control = manager.agent_control(); + let harness = AgentControlHarness { + _home: home, + config, + manager, + control, + }; + let (parent_thread_id, _parent_thread) = harness.start_thread().await; + + let child_thread_id = harness + .control + .spawn_agent( + harness.config.clone(), + text_input("hello child"), + Some(SessionSource::SubAgent(SubAgentSource::ThreadSpawn { + parent_thread_id, + depth: 1, + agent_nickname: None, + agent_role: Some("explorer".to_string()), + })), + ) + .await + .expect("child spawn should succeed"); + + let child_thread = harness + .manager + .get_thread(child_thread_id) + .await + .expect("child thread should exist"); + let mut status_rx = harness + .control + .subscribe_status(child_thread_id) + .await + .expect("status subscription should succeed"); + if matches!(status_rx.borrow().clone(), AgentStatus::PendingInit) { + timeout(Duration::from_secs(5), async { + loop { + status_rx + .changed() + .await + .expect("child status should advance past pending init"); + if !matches!(status_rx.borrow().clone(), AgentStatus::PendingInit) { + break; + } + } + }) + .await + .expect("child should initialize before shutdown"); + } + let original_snapshot = child_thread.config_snapshot().await; + let original_nickname = original_snapshot + .session_source + .get_nickname() + .expect("spawned sub-agent should have a nickname"); + let state_db = child_thread + .state_db() + .expect("sqlite state db should be available for nickname resume test"); + timeout(Duration::from_secs(5), async { + loop { + if let Ok(Some(metadata)) = state_db.get_thread(child_thread_id).await + && metadata.agent_nickname.is_some() + && metadata.agent_role.as_deref() == Some("explorer") + { + break; + } + sleep(Duration::from_millis(10)).await; + } + }) + .await + .expect("child thread metadata should be persisted to sqlite before shutdown"); + + let _ = harness + .control + .shutdown_agent(child_thread_id) + .await + .expect("child shutdown should submit"); + + let resumed_thread_id = harness + .control + .resume_agent_from_rollout( + harness.config.clone(), + child_thread_id, + SessionSource::SubAgent(SubAgentSource::ThreadSpawn { + parent_thread_id, + depth: 1, + agent_nickname: None, + agent_role: None, + }), + ) + .await + .expect("resume should succeed"); + assert_eq!(resumed_thread_id, child_thread_id); + + let resumed_snapshot = harness + .manager + .get_thread(resumed_thread_id) + .await + .expect("resumed child thread should exist") + .config_snapshot() + .await; + let SessionSource::SubAgent(SubAgentSource::ThreadSpawn { + parent_thread_id: resumed_parent_thread_id, + depth: resumed_depth, + agent_nickname: resumed_nickname, + agent_role: resumed_role, + }) = resumed_snapshot.session_source + else { + panic!("expected thread-spawn sub-agent source"); + }; + assert_eq!(resumed_parent_thread_id, parent_thread_id); + assert_eq!(resumed_depth, 1); + assert_eq!(resumed_nickname, Some(original_nickname)); + assert_eq!(resumed_role, Some("explorer".to_string())); + + let _ = harness + .control + .shutdown_agent(resumed_thread_id) + .await + .expect("resumed child shutdown should submit"); + } +} diff --git a/codex-rs/core/src/apps/render.rs b/codex-rs/core/src/apps/render.rs index 98af11fb01..e7a30d49eb 100644 --- a/codex-rs/core/src/apps/render.rs +++ b/codex-rs/core/src/apps/render.rs @@ -1,6 +1,9 @@ use crate::mcp::CODEX_APPS_MCP_SERVER_NAME; pub(crate) fn render_apps_section() -> String { + // TODO(ccunningham): Revisit whether this should move to an XML-ish wrapper + // if we later standardize developer-envelope formatting around tagged + // fragments. format!( "## Apps\nApps are mentioned in user messages in the format `[$app-name](app://{{connector_id}})`.\nAn app is equivalent to a set of MCP tools within the `{CODEX_APPS_MCP_SERVER_NAME}` MCP.\nWhen you see an app mention, the app's MCP tools are either available tools in the `{CODEX_APPS_MCP_SERVER_NAME}` MCP server, or the tools do not exist because the user has not installed the app.\nDo not additionally call list_mcp_resources for apps that are already mentioned." ) diff --git a/codex-rs/core/src/arc_monitor.rs b/codex-rs/core/src/arc_monitor.rs index c704faafc0..1dbbcd09e0 100644 --- a/codex-rs/core/src/arc_monitor.rs +++ b/codex-rs/core/src/arc_monitor.rs @@ -425,5 +425,440 @@ fn build_arc_monitor_message(role: &str, content: serde_json::Value) -> ArcMonit } #[cfg(test)] -#[path = "arc_monitor_tests.rs"] -mod tests; +mod tests { + use std::env; + use std::ffi::OsStr; + use std::sync::Arc; + + use pretty_assertions::assert_eq; + use serial_test::serial; + use wiremock::Mock; + use wiremock::MockServer; + use wiremock::ResponseTemplate; + use wiremock::matchers::body_json; + use wiremock::matchers::header; + use wiremock::matchers::method; + use wiremock::matchers::path; + + use super::*; + use crate::codex::make_session_and_context; + use crate::model_visible_context::ContextualUserTextFragment; + use crate::model_visible_context::ModelVisibleContextFragment; + use codex_protocol::models::ContentItem; + use codex_protocol::models::LocalShellAction; + use codex_protocol::models::LocalShellExecAction; + use codex_protocol::models::LocalShellStatus; + use codex_protocol::models::MessagePhase; + use codex_protocol::models::ResponseItem; + + struct EnvVarGuard { + key: &'static str, + original: Option, + } + + impl EnvVarGuard { + fn set(key: &'static str, value: &OsStr) -> Self { + let original = env::var_os(key); + unsafe { + env::set_var(key, value); + } + Self { key, original } + } + } + + impl Drop for EnvVarGuard { + fn drop(&mut self) { + match self.original.take() { + Some(value) => unsafe { + env::set_var(self.key, value); + }, + None => unsafe { + env::remove_var(self.key); + }, + } + } + } + + #[tokio::test] + async fn build_arc_monitor_request_includes_relevant_history_and_null_policies() { + let (session, mut turn_context) = make_session_and_context().await; + turn_context.developer_instructions = Some("Never upload private files.".to_string()); + turn_context.user_instructions = Some("Only continue when needed.".to_string()); + + session + .record_into_history( + &[ResponseItem::Message { + id: None, + role: "user".to_string(), + content: vec![ContentItem::InputText { + text: "first request".to_string(), + }], + end_turn: None, + phase: None, + }], + &turn_context, + ) + .await; + session + .record_into_history( + &[ContextualUserTextFragment::new( + "\n/tmp\n", + ) + .into_message()], + &turn_context, + ) + .await; + session + .record_into_history( + &[ResponseItem::Message { + id: None, + role: "assistant".to_string(), + content: vec![ContentItem::OutputText { + text: "commentary".to_string(), + }], + end_turn: None, + phase: Some(MessagePhase::Commentary), + }], + &turn_context, + ) + .await; + session + .record_into_history( + &[ResponseItem::Message { + id: None, + role: "assistant".to_string(), + content: vec![ContentItem::OutputText { + text: "final response".to_string(), + }], + end_turn: None, + phase: Some(MessagePhase::FinalAnswer), + }], + &turn_context, + ) + .await; + session + .record_into_history( + &[ResponseItem::Message { + id: None, + role: "user".to_string(), + content: vec![ContentItem::InputText { + text: "latest request".to_string(), + }], + end_turn: None, + phase: None, + }], + &turn_context, + ) + .await; + session + .record_into_history( + &[ResponseItem::FunctionCall { + id: None, + name: "old_tool".to_string(), + arguments: "{\"old\":true}".to_string(), + call_id: "call_old".to_string(), + }], + &turn_context, + ) + .await; + session + .record_into_history( + &[ResponseItem::Reasoning { + id: "reasoning_old".to_string(), + summary: Vec::new(), + content: None, + encrypted_content: Some("encrypted-old".to_string()), + }], + &turn_context, + ) + .await; + session + .record_into_history( + &[ResponseItem::LocalShellCall { + id: None, + call_id: Some("shell_call".to_string()), + status: LocalShellStatus::Completed, + action: LocalShellAction::Exec(LocalShellExecAction { + command: vec!["pwd".to_string()], + timeout_ms: Some(1000), + working_directory: Some("/tmp".to_string()), + env: None, + user: None, + }), + }], + &turn_context, + ) + .await; + session + .record_into_history( + &[ResponseItem::Reasoning { + id: "reasoning_latest".to_string(), + summary: Vec::new(), + content: None, + encrypted_content: Some("encrypted-latest".to_string()), + }], + &turn_context, + ) + .await; + + let request = build_arc_monitor_request( + &session, + &turn_context, + serde_json::from_value(serde_json::json!({ "tool": "mcp_tool_call" })) + .expect("action should deserialize"), + ) + .await; + + assert_eq!( + request, + ArcMonitorRequest { + metadata: ArcMonitorMetadata { + codex_thread_id: session.conversation_id.to_string(), + codex_turn_id: turn_context.sub_id.clone(), + conversation_id: Some(session.conversation_id.to_string()), + protection_client_callsite: None, + }, + messages: Some(vec![ + ArcMonitorChatMessage { + role: "user".to_string(), + content: serde_json::json!([{ + "type": "input_text", + "text": "first request", + }]), + }, + ArcMonitorChatMessage { + role: "assistant".to_string(), + content: serde_json::json!([{ + "type": "output_text", + "text": "final response", + }]), + }, + ArcMonitorChatMessage { + role: "user".to_string(), + content: serde_json::json!([{ + "type": "input_text", + "text": "latest request", + }]), + }, + ArcMonitorChatMessage { + role: "assistant".to_string(), + content: serde_json::json!([{ + "type": "tool_call", + "tool_name": "shell", + "action": { + "type": "exec", + "command": ["pwd"], + "timeout_ms": 1000, + "working_directory": "/tmp", + "env": null, + "user": null, + }, + }]), + }, + ArcMonitorChatMessage { + role: "assistant".to_string(), + content: serde_json::json!([{ + "type": "encrypted_reasoning", + "encrypted_content": "encrypted-latest", + }]), + }, + ]), + input: None, + policies: Some(ArcMonitorPolicies { + user: None, + developer: None, + }), + action: serde_json::from_value(serde_json::json!({ "tool": "mcp_tool_call" })) + .expect("action should deserialize"), + } + ); + } + + #[tokio::test] + #[serial(arc_monitor_env)] + async fn monitor_action_posts_expected_arc_request() { + let server = MockServer::start().await; + let (session, mut turn_context) = make_session_and_context().await; + turn_context.auth_manager = Some(crate::test_support::auth_manager_from_auth( + crate::CodexAuth::create_dummy_chatgpt_auth_for_testing(), + )); + turn_context.developer_instructions = Some("Developer policy".to_string()); + turn_context.user_instructions = Some("User policy".to_string()); + + let mut config = (*turn_context.config).clone(); + config.chatgpt_base_url = server.uri(); + turn_context.config = Arc::new(config); + + session + .record_into_history( + &[ResponseItem::Message { + id: None, + role: "user".to_string(), + content: vec![ContentItem::InputText { + text: "please run the tool".to_string(), + }], + end_turn: None, + phase: None, + }], + &turn_context, + ) + .await; + + Mock::given(method("POST")) + .and(path("/codex/safety/arc")) + .and(header("authorization", "Bearer Access Token")) + .and(header("chatgpt-account-id", "account_id")) + .and(body_json(serde_json::json!({ + "metadata": { + "codex_thread_id": session.conversation_id.to_string(), + "codex_turn_id": turn_context.sub_id.clone(), + "conversation_id": session.conversation_id.to_string(), + }, + "messages": [{ + "role": "user", + "content": [{ + "type": "input_text", + "text": "please run the tool", + }], + }], + "policies": { + "developer": null, + "user": null, + }, + "action": { + "tool": "mcp_tool_call", + }, + }))) + .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "outcome": "ask-user", + "short_reason": "needs confirmation", + "rationale": "tool call needs additional review", + "risk_score": 42, + "risk_level": "medium", + "evidence": [{ + "message": "browser_navigate", + "why": "tool call needs additional review", + }], + }))) + .expect(1) + .mount(&server) + .await; + + let outcome = monitor_action( + &session, + &turn_context, + serde_json::json!({ "tool": "mcp_tool_call" }), + ) + .await; + + assert_eq!( + outcome, + ArcMonitorOutcome::AskUser("needs confirmation".to_string()) + ); + } + + #[tokio::test] + #[serial(arc_monitor_env)] + async fn monitor_action_uses_env_url_and_token_overrides() { + let server = MockServer::start().await; + let _url_guard = EnvVarGuard::set( + CODEX_ARC_MONITOR_ENDPOINT_OVERRIDE, + OsStr::new(&format!("{}/override/arc", server.uri())), + ); + let _token_guard = EnvVarGuard::set(CODEX_ARC_MONITOR_TOKEN, OsStr::new("override-token")); + + let (session, turn_context) = make_session_and_context().await; + session + .record_into_history( + &[ResponseItem::Message { + id: None, + role: "user".to_string(), + content: vec![ContentItem::InputText { + text: "please run the tool".to_string(), + }], + end_turn: None, + phase: None, + }], + &turn_context, + ) + .await; + + Mock::given(method("POST")) + .and(path("/override/arc")) + .and(header("authorization", "Bearer override-token")) + .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "outcome": "steer-model", + "short_reason": "needs approval", + "rationale": "high-risk action", + "risk_score": 96, + "risk_level": "critical", + "evidence": [{ + "message": "browser_navigate", + "why": "high-risk action", + }], + }))) + .expect(1) + .mount(&server) + .await; + + let outcome = monitor_action( + &session, + &turn_context, + serde_json::json!({ "tool": "mcp_tool_call" }), + ) + .await; + + assert_eq!( + outcome, + ArcMonitorOutcome::SteerModel("high-risk action".to_string()) + ); + } + + #[tokio::test] + #[serial(arc_monitor_env)] + async fn monitor_action_rejects_legacy_response_fields() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/codex/safety/arc")) + .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "outcome": "steer-model", + "reason": "legacy high-risk action", + "monitorRequestId": "arc_456", + }))) + .expect(1) + .mount(&server) + .await; + + let (session, mut turn_context) = make_session_and_context().await; + turn_context.auth_manager = Some(crate::test_support::auth_manager_from_auth( + crate::CodexAuth::create_dummy_chatgpt_auth_for_testing(), + )); + let mut config = (*turn_context.config).clone(); + config.chatgpt_base_url = server.uri(); + turn_context.config = Arc::new(config); + + session + .record_into_history( + &[ResponseItem::Message { + id: None, + role: "user".to_string(), + content: vec![ContentItem::InputText { + text: "please run the tool".to_string(), + }], + end_turn: None, + phase: None, + }], + &turn_context, + ) + .await; + + let outcome = monitor_action( + &session, + &turn_context, + serde_json::json!({ "tool": "mcp_tool_call" }), + ) + .await; + + assert_eq!(outcome, ArcMonitorOutcome::Ok); + } +} diff --git a/codex-rs/core/src/codex.rs b/codex-rs/core/src/codex.rs index fb35ab4a36..2ee6448917 100644 --- a/codex-rs/core/src/codex.rs +++ b/codex-rs/core/src/codex.rs @@ -167,11 +167,13 @@ use crate::config::types::McpServerConfig; use crate::config::types::ShellEnvironmentPolicy; use crate::context_manager::ContextManager; use crate::context_manager::TotalTokenUsageBreakdown; -use crate::environment_context::EnvironmentContext; use crate::error::CodexErr; use crate::error::Result as CodexResult; #[cfg(test)] use crate::exec::StreamOutput; +use crate::model_visible_context::DeveloperTextFragment; +use crate::model_visible_context::ModelVisibleContextFragment; +use crate::model_visible_context::TurnContextDiffParams; use codex_config::CONFIG_TOML_FILE; mod rollout_reconstruction; @@ -202,7 +204,6 @@ use crate::feedback_tags; use crate::file_watcher::FileWatcher; use crate::file_watcher::FileWatcherEvent; use crate::git_info::get_git_repo_root; -use crate::instructions::UserInstructions; use crate::mcp::CODEX_APPS_MCP_SERVER_NAME; use crate::mcp::McpManager; use crate::mcp::auth::compute_auth_statuses; @@ -220,7 +221,8 @@ use crate::mentions::collect_tool_mentions_from_messages; use crate::network_policy_decision::execpolicy_network_rule_amendment; use crate::plugins::PluginsManager; use crate::plugins::build_plugin_injections; -use crate::project_doc::get_user_instructions; +use crate::plugins::render_plugin_instructions; +use crate::project_doc::build_project_doc_instructions_text; use crate::protocol::AgentMessageContentDeltaEvent; use crate::protocol::AgentReasoningSectionBreakEvent; use crate::protocol::ApplyPatchApprovalRequestEvent; @@ -316,7 +318,6 @@ use codex_protocol::config_types::ReasoningSummary as ReasoningSummaryConfig; use codex_protocol::config_types::ServiceTier; use codex_protocol::config_types::WindowsSandboxLevel; use codex_protocol::models::ContentItem; -use codex_protocol::models::DeveloperInstructions; use codex_protocol::models::ResponseInputItem; use codex_protocol::models::ResponseItem; use codex_protocol::openai_models::ReasoningEffort as ReasoningEffortConfig; @@ -422,7 +423,6 @@ impl Codex { let (tx_sub, rx_sub) = async_channel::bounded(SUBMISSION_CHANNEL_CAPACITY); let (tx_event, rx_event) = async_channel::unbounded(); - let loaded_plugins = plugins_manager.plugins_for_config(&config); let loaded_skills = skills_manager.skills_for_config(&config); for err in &loaded_skills.errors { @@ -468,14 +468,8 @@ impl Codex { config.startup_warnings.push(message); } - let allowed_skills_for_implicit_invocation = - loaded_skills.allowed_skills_for_implicit_invocation(); - let user_instructions = get_user_instructions( - &config, - Some(&allowed_skills_for_implicit_invocation), - Some(loaded_plugins.capability_summaries()), - ) - .await; + let project_doc_instructions = build_project_doc_instructions_text(&config).await; + let user_instructions = config.user_instructions.clone(); let exec_policy = if crate::guardian::is_guardian_subagent_source(&session_source) { // Guardian review should rely on the built-in shell safety checks, @@ -559,6 +553,7 @@ impl Codex { model_reasoning_summary: config.model_reasoning_summary, service_tier: config.service_tier, developer_instructions: config.developer_instructions.clone(), + project_doc_instructions, user_instructions, personality: config.personality, base_instructions, @@ -795,6 +790,7 @@ pub(crate) struct TurnContext { pub(crate) app_server_client_name: Option, pub(crate) developer_instructions: Option, pub(crate) compact_prompt: Option, + pub(crate) project_doc_instructions: Option, pub(crate) user_instructions: Option, pub(crate) collaboration_mode: CollaborationMode, pub(crate) personality: Option, @@ -898,6 +894,7 @@ impl TurnContext { app_server_client_name: self.app_server_client_name.clone(), developer_instructions: self.developer_instructions.clone(), compact_prompt: self.compact_prompt.clone(), + project_doc_instructions: self.project_doc_instructions.clone(), user_instructions: self.user_instructions.clone(), collaboration_mode, personality: self.personality, @@ -951,6 +948,7 @@ impl TurnContext { realtime_active: Some(self.realtime_active), effort: self.reasoning_effort, summary: self.reasoning_summary, + project_doc_instructions: self.project_doc_instructions.clone(), user_instructions: self.user_instructions.clone(), developer_instructions: self.developer_instructions.clone(), final_output_json_schema: self.final_output_json_schema.clone(), @@ -994,7 +992,10 @@ pub(crate) struct SessionConfiguration { /// Developer instructions that supplement the base instructions. developer_instructions: Option, - /// Model instructions that are appended to the base instructions. + /// Project-doc / AGENTS instructions for the session. + project_doc_instructions: Option, + + /// Custom user instructions configured for the session. user_instructions: Option, /// Personality preference for the model. @@ -1059,6 +1060,12 @@ impl SessionConfiguration { } } + async fn refresh_project_doc_instructions_for_cwd(&mut self) { + let mut config = (*self.original_config_do_not_use).clone(); + config.cwd = self.cwd.clone(); + self.project_doc_instructions = build_project_doc_instructions_text(&config).await; + } + pub(crate) fn apply(&self, updates: &SessionSettingsUpdate) -> ConstraintResult { let mut next_configuration = self.clone(); let file_system_policy_matches_legacy = self.file_system_sandbox_policy @@ -1315,6 +1322,7 @@ impl Session { app_server_client_name: session_configuration.app_server_client_name.clone(), developer_instructions: session_configuration.developer_instructions.clone(), compact_prompt: session_configuration.compact_prompt.clone(), + project_doc_instructions: session_configuration.project_doc_instructions.clone(), user_instructions: session_configuration.user_instructions.clone(), collaboration_mode: session_configuration.collaboration_mode.clone(), personality: session_configuration.personality, @@ -2170,31 +2178,36 @@ impl Session { &self, updates: SessionSettingsUpdate, ) -> ConstraintResult<()> { - let mut state = self.state.lock().await; - - match state.session_configuration.apply(&updates) { - Ok(updated) => { - let previous_cwd = state.session_configuration.cwd.clone(); - let next_cwd = updated.cwd.clone(); - let codex_home = updated.codex_home.clone(); - let session_source = updated.session_source.clone(); - state.session_configuration = updated; - drop(state); - - self.maybe_refresh_shell_snapshot_for_cwd( - &previous_cwd, - &next_cwd, - &codex_home, - &session_source, - ); - - Ok(()) - } - Err(err) => { - warn!("rejected session settings update: {err}"); - Err(err) + let (mut updated, previous_cwd) = { + let state = self.state.lock().await; + match state.session_configuration.apply(&updates) { + Ok(updated) => (updated, state.session_configuration.cwd.clone()), + Err(err) => { + warn!("rejected session settings update: {err}"); + return Err(err); + } } + }; + + if previous_cwd != updated.cwd { + updated.refresh_project_doc_instructions_for_cwd().await; } + + let next_cwd = updated.cwd.clone(); + let codex_home = updated.codex_home.clone(); + let session_source = updated.session_source.clone(); + let mut state = self.state.lock().await; + state.session_configuration = updated; + drop(state); + + self.maybe_refresh_shell_snapshot_for_cwd( + &previous_cwd, + &next_cwd, + &codex_home, + &session_source, + ); + + Ok(()) } pub(crate) async fn new_turn_with_sub_id( @@ -2209,36 +2222,43 @@ impl Session { codex_home, session_source, ) = { - let mut state = self.state.lock().await; - match state.session_configuration.clone().apply(&updates) { - Ok(next) => { - let previous_cwd = state.session_configuration.cwd.clone(); - let sandbox_policy_changed = - state.session_configuration.sandbox_policy != next.sandbox_policy; - let codex_home = next.codex_home.clone(); - let session_source = next.session_source.clone(); - state.session_configuration = next.clone(); - ( - next, - sandbox_policy_changed, - previous_cwd, - codex_home, - session_source, - ) - } - Err(err) => { - drop(state); - self.send_event_raw(Event { - id: sub_id.clone(), - msg: EventMsg::Error(ErrorEvent { - message: err.to_string(), - codex_error_info: Some(CodexErrorInfo::BadRequest), - }), - }) - .await; - return Err(err); + let (mut next, previous_cwd, sandbox_policy_changed) = { + let state = self.state.lock().await; + match state.session_configuration.clone().apply(&updates) { + Ok(next) => { + let previous_cwd = state.session_configuration.cwd.clone(); + let sandbox_policy_changed = + state.session_configuration.sandbox_policy != next.sandbox_policy; + (next, previous_cwd, sandbox_policy_changed) + } + Err(err) => { + drop(state); + self.send_event_raw(Event { + id: sub_id.clone(), + msg: EventMsg::Error(ErrorEvent { + message: err.to_string(), + codex_error_info: Some(CodexErrorInfo::BadRequest), + }), + }) + .await; + return Err(err); + } } + }; + if previous_cwd != next.cwd { + next.refresh_project_doc_instructions_for_cwd().await; } + let codex_home = next.codex_home.clone(); + let session_source = next.session_source.clone(); + let mut state = self.state.lock().await; + state.session_configuration = next.clone(); + ( + next, + sandbox_policy_changed, + previous_cwd, + codex_home, + session_source, + ) }; self.maybe_refresh_shell_snapshot_for_cwd( @@ -2498,13 +2518,17 @@ impl Session { }; let shell = self.user_shell(); let exec_policy = self.services.exec_policy.current(); - crate::context_manager::updates::build_settings_update_items( - reference_context_item, - previous_turn_settings.as_ref(), - current_context, + let diff_context = TurnContextDiffParams::new( shell.as_ref(), + previous_turn_settings.as_ref(), exec_policy.as_ref(), self.features.enabled(Feature::Personality), + None, + ); + crate::context_manager::updates::build_settings_update_items( + reference_context_item, + current_context, + &diff_context, ) } @@ -2663,24 +2687,8 @@ impl Session { return; }; let text = format!("Approved command prefix saved:\n{prefixes}"); - let message: ResponseItem = DeveloperInstructions::new(text.clone()).into(); - - if let Some(turn_context) = self.turn_context_for_sub_id(sub_id).await { - self.record_conversation_items(&turn_context, std::slice::from_ref(&message)) - .await; - return; - } - - if self - .inject_response_items(vec![ResponseInputItem::Message { - role: "developer".to_string(), - content: vec![ContentItem::InputText { text }], - }]) - .await - .is_err() - { - warn!("no active turn found to record execpolicy amendment message for {sub_id}"); - } + self.record_or_inject_developer_text_for_sub_id(sub_id, text) + .await; } pub(crate) async fn persist_network_policy_amendment( @@ -2760,23 +2768,26 @@ impl Session { "{action} network rule saved in execpolicy ({list_name}): {}", amendment.host ); - let message: ResponseItem = DeveloperInstructions::new(text.clone()).into(); + self.record_or_inject_developer_text_for_sub_id(sub_id, text) + .await; + } + async fn record_or_inject_developer_text_for_sub_id(&self, sub_id: &str, text: String) { if let Some(turn_context) = self.turn_context_for_sub_id(sub_id).await { + let message = DeveloperTextFragment::new(text.clone()).into_message(); self.record_conversation_items(&turn_context, std::slice::from_ref(&message)) .await; return; } if self - .inject_response_items(vec![ResponseInputItem::Message { - role: "developer".to_string(), - content: vec![ContentItem::InputText { text }], - }]) + .inject_response_items(vec![ + DeveloperTextFragment::new(text).into_response_input_item(), + ]) .await .is_err() { - warn!("no active turn found to record network policy amendment message for {sub_id}"); + warn!("no active turn found to record amendment message for {sub_id}"); } } @@ -3356,128 +3367,111 @@ impl Session { &self, turn_context: &TurnContext, ) -> Vec { - let mut developer_sections = Vec::::with_capacity(8); - let mut contextual_user_sections = Vec::::with_capacity(2); - let shell = self.user_shell(); - let (reference_context_item, previous_turn_settings, collaboration_mode, base_instructions) = { + let mut developer_envelope = + crate::context_manager::updates::DeveloperEnvelopeBuilder::default(); + let mut contextual_user_envelope = + crate::context_manager::updates::ContextualUserEnvelopeBuilder::default(); + let (previous_turn_settings, base_instructions) = { let state = self.state.lock().await; ( - state.reference_context_item(), state.previous_turn_settings(), - state.session_configuration.collaboration_mode.clone(), state.session_configuration.base_instructions.clone(), ) }; - if let Some(model_switch_message) = - crate::context_manager::updates::build_model_instructions_update_item( - previous_turn_settings.as_ref(), - turn_context, - ) - { - developer_sections.push(model_switch_message.into_text()); - } - developer_sections.push( - DeveloperInstructions::from_policy( - turn_context.sandbox_policy.get(), - turn_context.approval_policy.value(), - self.services.exec_policy.current().as_ref(), - &turn_context.cwd, - turn_context - .features - .enabled(Feature::ExecPermissionApprovals), - turn_context - .features - .enabled(Feature::RequestPermissionsTool), - ) - .into_text(), - ); - if let Some(developer_instructions) = turn_context.developer_instructions.as_deref() { - developer_sections.push(developer_instructions.to_string()); - } - // Add developer instructions for memories. - if turn_context.features.enabled(Feature::MemoryTool) - && turn_context.config.memories.use_memories - && let Some(memory_prompt) = - build_memory_tool_developer_instructions(&turn_context.config.codex_home).await - { - developer_sections.push(memory_prompt); - } - // Add developer instructions from collaboration_mode if they exist and are non-empty - if let Some(collab_instructions) = - DeveloperInstructions::from_collaboration_mode(&collaboration_mode) - { - developer_sections.push(collab_instructions.into_text()); - } - if let Some(realtime_update) = crate::context_manager::updates::build_initial_realtime_item( - reference_context_item.as_ref(), + let shell = self.user_shell(); + let exec_policy = self.services.exec_policy.current(); + let diff_context = TurnContextDiffParams::new( + shell.as_ref(), previous_turn_settings.as_ref(), + exec_policy.as_ref(), + self.features.enabled(Feature::Personality), + Some(base_instructions.as_str()), + ); + for fragment in crate::model_visible_fragments::build_turn_state_fragments( + None, turn_context, + &diff_context, ) { - developer_sections.push(realtime_update.into_text()); - } - if self.features.enabled(Feature::Personality) - && let Some(personality) = turn_context.personality - { - let model_info = turn_context.model_info.clone(); - let has_baked_personality = model_info.supports_personality() - && base_instructions == model_info.get_model_instructions(Some(personality)); - if !has_baked_personality - && let Some(personality_message) = - crate::context_manager::updates::personality_message_for( - &model_info, - personality, - ) - { - developer_sections.push( - DeveloperInstructions::personality_spec_message(personality_message) - .into_text(), - ); + match fragment { + crate::model_visible_fragments::BuiltTurnStateFragment::Developer(fragment) => { + developer_envelope.push(fragment); + } + crate::model_visible_fragments::BuiltTurnStateFragment::ContextualUser( + fragment, + ) => { + contextual_user_envelope.push_fragment(fragment); + } } } - if turn_context.apps_enabled() { - developer_sections.push(render_apps_section()); - } - if turn_context.features.enabled(Feature::CodexGitCommit) - && let Some(commit_message_instruction) = commit_message_trailer_instruction( - turn_context.config.commit_attribution.as_deref(), - ) + let memory_prompt = if turn_context.features.enabled(Feature::MemoryTool) + && turn_context.config.memories.use_memories { - developer_sections.push(commit_message_instruction); + build_memory_tool_developer_instructions(&turn_context.config.codex_home).await + } else { + None + }; + for fragment in [ + memory_prompt.map(DeveloperTextFragment::new), + turn_context + .apps_enabled() + .then(render_apps_section) + .map(DeveloperTextFragment::new), + turn_context + .features + .enabled(Feature::CodexGitCommit) + .then(|| { + commit_message_trailer_instruction( + turn_context.config.commit_attribution.as_deref(), + ) + }) + .flatten() + .map(DeveloperTextFragment::new), + ] + .into_iter() + .flatten() + { + developer_envelope.push(fragment); } - if let Some(user_instructions) = turn_context.user_instructions.as_deref() { - contextual_user_sections.push( - UserInstructions { - text: user_instructions.to_string(), - directory: turn_context.cwd.to_string_lossy().into_owned(), - } - .serialize_to_text(), - ); + let loaded_plugins = self + .services + .plugins_manager + .plugins_for_config(&turn_context.config); + if let Some(fragment) = render_plugin_instructions(loaded_plugins.capability_summaries()) { + contextual_user_envelope.push_fragment(fragment); } let subagents = self .services .agent_control .format_environment_context_subagents(self.conversation_id) .await; - contextual_user_sections.push( - EnvironmentContext::from_turn_context(turn_context, shell.as_ref()) - .with_subagents(subagents) - .serialize_to_xml(), - ); + if let Some(subagent_roster) = + crate::model_visible_fragments::SubagentRosterContext::new(subagents) + { + developer_envelope.push(subagent_roster); + } let mut items = Vec::with_capacity(2); - if let Some(developer_message) = - crate::context_manager::updates::build_developer_update_item(developer_sections) - { + if let Some(developer_message) = developer_envelope.build() { items.push(developer_message); } - if let Some(contextual_user_message) = - crate::context_manager::updates::build_contextual_user_message(contextual_user_sections) - { - items.push(contextual_user_message); + if let Some(model_visible_context) = contextual_user_envelope.build() { + items.push(model_visible_context); } items } + /// Build full initial context with no diff baseline. + /// + /// This is used by compaction replacement-history rebuilds, where we must + /// reinsert canonical current context regardless of what persisted + /// `reference_context_item` says. + pub(crate) async fn build_initial_context_without_reference_context_item( + &self, + turn_context: &TurnContext, + ) -> Vec { + self.build_initial_context(turn_context).await + } + pub(crate) async fn persist_rollout_items(&self, items: &[RolloutItem]) { let recorder = { let guard = self.services.rollout.lock().await; @@ -5244,6 +5238,7 @@ async fn spawn_review_thread( app_server_client_name: parent_turn_context.app_server_client_name.clone(), developer_instructions: None, user_instructions: None, + project_doc_instructions: parent_turn_context.project_doc_instructions.clone(), compact_prompt: parent_turn_context.compact_prompt.clone(), collaboration_mode: parent_turn_context.collaboration_mode.clone(), personality: parent_turn_context.personality, @@ -5613,8 +5608,8 @@ pub(crate) async fn run_turn( break; } if let Some(additional_context) = session_start_outcome.additional_context { - let developer_message: ResponseItem = - DeveloperInstructions::new(additional_context).into(); + let developer_message = + DeveloperTextFragment::new(additional_context).into_message(); sess.record_conversation_items( &turn_context, std::slice::from_ref(&developer_message), @@ -5660,7 +5655,8 @@ pub(crate) async fn run_turn( .for_prompt(&turn_context.model_info.input_modalities) }; if let Some(stop_hook_message) = pending_stop_hook_message.take() { - sampling_request_input.push(DeveloperInstructions::new(stop_hook_message).into()); + sampling_request_input + .push(DeveloperTextFragment::new(stop_hook_message).into_message()); } let sampling_request_input_messages = sampling_request_input diff --git a/codex-rs/core/src/codex_tests.rs b/codex-rs/core/src/codex_tests.rs index a1d88ff3dd..268972a8cd 100644 --- a/codex-rs/core/src/codex_tests.rs +++ b/codex-rs/core/src/codex_tests.rs @@ -10,6 +10,8 @@ use crate::config_loader::Sourced; use crate::exec::ExecToolCallOutput; use crate::function_tool::FunctionCallError; use crate::mcp_connection_manager::ToolInfo; +use crate::model_visible_context::DeveloperTextFragment; +use crate::model_visible_context::ModelVisibleContextFragment; use crate::models_manager::model_info; use crate::shell::default_user_shell; use crate::tools::format_exec_output_str; @@ -59,9 +61,9 @@ use codex_app_server_protocol::AppInfo; use codex_otel::TelemetryAuthMode; use codex_protocol::models::BaseInstructions; use codex_protocol::models::ContentItem; -use codex_protocol::models::DeveloperInstructions; use codex_protocol::models::ResponseInputItem; use codex_protocol::models::ResponseItem; +use codex_protocol::models::developer_personality_spec_text; use codex_protocol::openai_models::ModelsResponse; use codex_protocol::protocol::ConversationAudioParams; use codex_protocol::protocol::RealtimeAudioFrame; @@ -155,6 +157,15 @@ fn developer_input_texts(items: &[ResponseItem]) -> Vec<&str> { .collect() } +fn default_image_save_developer_message_text() -> String { + let image_output_dir = crate::stream_events_utils::default_image_generation_output_dir(); + format!( + "Generated images are saved to {} as {} by default.", + image_output_dir.display(), + image_output_dir.join(".png").display(), + ) +} + fn test_tool_runtime(session: Arc, turn_context: Arc) -> ToolCallRuntime { let router = Arc::new(ToolRouter::from_config( &turn_context.tools_config, @@ -941,6 +952,7 @@ async fn record_initial_history_forked_hydrates_previous_turn_settings() { realtime_active: Some(turn_context.realtime_active), effort: turn_context.reasoning_effort, summary: turn_context.reasoning_summary, + project_doc_instructions: None, user_instructions: None, developer_instructions: None, final_output_json_schema: None, @@ -1366,6 +1378,7 @@ async fn set_rate_limits_retains_previous_credits() { collaboration_mode, model_reasoning_summary: config.model_reasoning_summary, developer_instructions: config.developer_instructions.clone(), + project_doc_instructions: None, user_instructions: config.user_instructions.clone(), service_tier: None, personality: config.personality, @@ -1462,6 +1475,7 @@ async fn set_rate_limits_updates_plan_type_when_present() { collaboration_mode, model_reasoning_summary: config.model_reasoning_summary, developer_instructions: config.developer_instructions.clone(), + project_doc_instructions: None, user_instructions: config.user_instructions.clone(), service_tier: None, personality: config.personality, @@ -1816,6 +1830,7 @@ pub(crate) async fn make_session_configuration_for_tests() -> SessionConfigurati collaboration_mode, model_reasoning_summary: config.model_reasoning_summary, developer_instructions: config.developer_instructions.clone(), + project_doc_instructions: None, user_instructions: config.user_instructions.clone(), service_tier: None, personality: config.personality, @@ -1969,6 +1984,7 @@ async fn session_new_fails_when_zsh_fork_enabled_without_zsh_path() { collaboration_mode, model_reasoning_summary: config.model_reasoning_summary, developer_instructions: config.developer_instructions.clone(), + project_doc_instructions: None, user_instructions: config.user_instructions.clone(), service_tier: None, personality: config.personality, @@ -2062,6 +2078,7 @@ pub(crate) async fn make_session_and_context() -> (Session, TurnContext) { collaboration_mode, model_reasoning_summary: config.model_reasoning_summary, developer_instructions: config.developer_instructions.clone(), + project_doc_instructions: None, user_instructions: config.user_instructions.clone(), service_tier: None, personality: config.personality, @@ -2704,6 +2721,7 @@ pub(crate) async fn make_session_and_context_with_dynamic_tools_and_rx( collaboration_mode, model_reasoning_summary: config.model_reasoning_summary, developer_instructions: config.developer_instructions.clone(), + project_doc_instructions: None, user_instructions: config.user_instructions.clone(), service_tier: None, personality: config.personality, @@ -3126,7 +3144,7 @@ async fn build_settings_update_items_uses_previous_turn_settings_for_realtime_en } #[tokio::test] -async fn build_initial_context_uses_previous_realtime_state() { +async fn build_settings_update_items_omits_duplicate_realtime_start_when_baseline_exists() { let (session, mut turn_context) = make_session_and_context().await; turn_context.realtime_active = true; @@ -3144,7 +3162,12 @@ async fn build_initial_context_uses_previous_realtime_state() { let mut state = session.state.lock().await; state.set_reference_context_item(Some(previous_context_item)); } - let resumed_context = session.build_initial_context(&turn_context).await; + let resumed_context = session + .build_settings_update_items( + session.reference_context_item().await.as_ref(), + &turn_context, + ) + .await; let resumed_developer_texts = developer_input_texts(&resumed_context); assert!( !resumed_developer_texts @@ -3154,6 +3177,79 @@ async fn build_initial_context_uses_previous_realtime_state() { ); } +#[tokio::test] +async fn build_settings_update_items_emits_collaboration_mode_when_legacy_baseline_lacks_mode() { + let (session, previous_context) = make_session_and_context().await; + let mut current_context = previous_context + .with_model( + previous_context.model_info.slug.clone(), + &session.services.models_manager, + ) + .await; + current_context + .collaboration_mode + .settings + .developer_instructions = Some("legacy baseline collaboration instructions".to_string()); + + let mut previous_context_item = previous_context.to_turn_context_item(); + previous_context_item.collaboration_mode = None; + + let update_items = session + .build_settings_update_items(Some(&previous_context_item), ¤t_context) + .await; + + let developer_texts = developer_input_texts(&update_items); + assert!( + developer_texts.iter().any(|text| { + text.contains("") + && text.contains("legacy baseline collaboration instructions") + }), + "expected collaboration mode update from legacy baseline without persisted mode, got {developer_texts:?}" + ); +} + +#[tokio::test] +async fn build_settings_update_items_emits_agents_reset_when_project_doc_disappears() { + let (session, mut previous_context) = make_session_and_context().await; + previous_context.project_doc_instructions = Some("old agents body".to_string()); + let mut current_context = previous_context + .with_model( + previous_context.model_info.slug.clone(), + &session.services.models_manager, + ) + .await; + current_context.cwd = PathBuf::from("/tmp/without-agents"); + current_context.project_doc_instructions = None; + + let update_items = session + .build_settings_update_items( + Some(&previous_context.to_turn_context_item()), + ¤t_context, + ) + .await; + + let agents_reset = update_items + .iter() + .find_map(|item| match item { + ResponseItem::Message { role, content, .. } if role == "user" => { + content.iter().find_map(|entry| match entry { + ContentItem::InputText { text } => text + .starts_with("# AGENTS.md instructions for ") + .then_some(text.as_str()), + _ => None, + }) + } + _ => None, + }) + .expect("expected AGENTS reset item"); + assert!( + agents_reset.starts_with( + "# AGENTS.md instructions for /tmp/without-agents\n\n\n\n" + ), + "expected AGENTS reset fragment, got {agents_reset:?}" + ); +} + #[tokio::test] async fn build_initial_context_omits_default_image_save_location_with_image_history() { let (session, turn_context) = make_session_and_context().await; @@ -3195,12 +3291,13 @@ async fn build_initial_context_omits_default_image_save_location_without_image_h } #[tokio::test] -async fn handle_output_item_done_records_image_save_history_message() { +async fn handle_output_item_done_records_image_save_message_after_successful_save() { let (session, turn_context) = make_session_and_context().await; let session = Arc::new(session); let turn_context = Arc::new(turn_context); let call_id = "ig_history_records_message"; - let expected_saved_path = std::env::temp_dir().join(format!("{call_id}.png")); + let expected_saved_path = crate::stream_events_utils::default_image_generation_output_dir() + .join(format!("{call_id}.png")); let _ = std::fs::remove_file(&expected_saved_path); let item = ResponseItem::ImageGenerationCall { id: call_id.to_string(), @@ -3220,13 +3317,9 @@ async fn handle_output_item_done_records_image_save_history_message() { .expect("image generation item should succeed"); let history = session.clone_history().await; - let save_message: ResponseItem = DeveloperInstructions::new(format!( - "Generated images are saved to {} as {} by default.", - std::env::temp_dir().display(), - std::env::temp_dir().join(".png").display(), - )) - .into(); - assert_eq!(history.raw_items(), &[save_message, item]); + let expected_message: ResponseItem = + DeveloperTextFragment::new(default_image_save_developer_message_text()).into_message(); + assert_eq!(history.raw_items(), &[expected_message, item]); assert_eq!( std::fs::read(&expected_saved_path).expect("saved file"), b"foo" @@ -3240,7 +3333,8 @@ async fn handle_output_item_done_skips_image_save_message_when_save_fails() { let session = Arc::new(session); let turn_context = Arc::new(turn_context); let call_id = "ig_history_no_message"; - let expected_saved_path = std::env::temp_dir().join(format!("{call_id}.png")); + let expected_saved_path = crate::stream_events_utils::default_image_generation_output_dir() + .join(format!("{call_id}.png")); let _ = std::fs::remove_file(&expected_saved_path); let item = ResponseItem::ImageGenerationCall { id: call_id.to_string(), @@ -3937,7 +4031,7 @@ async fn abort_review_task_emits_exited_then_aborted_and_records_history() { let ContentItem::InputText { text } = content_item else { return false; }; - text.contains(crate::contextual_user_message::TURN_ABORTED_OPEN_TAG) + text.contains(crate::model_visible_context::TURN_ABORTED_OPEN_TAG) }) }), "expected a model-visible turn aborted marker in history after interrupt" @@ -4032,7 +4126,8 @@ async fn sample_rollout( .as_ref() .and_then(|m| m.get_personality_message(Some(p)).filter(|s| !s.is_empty())) { - let msg = DeveloperInstructions::personality_spec_message(personality_message).into(); + let msg = DeveloperTextFragment::new(developer_personality_spec_text(personality_message)) + .into_message(); let insert_at = initial_context .iter() .position(|m| matches!(m, ResponseItem::Message { role, .. } if role == "developer")) @@ -4208,10 +4303,6 @@ async fn rejects_escalated_permissions_when_policy_not_on_request() { network: None, sandbox_permissions, windows_sandbox_level: turn_context.windows_sandbox_level, - windows_sandbox_private_desktop: turn_context - .config - .permissions - .windows_sandbox_private_desktop, justification: Some("test".to_string()), arg0: None, }; @@ -4224,10 +4315,6 @@ async fn rejects_escalated_permissions_when_policy_not_on_request() { env: HashMap::new(), network: None, windows_sandbox_level: turn_context.windows_sandbox_level, - windows_sandbox_private_desktop: turn_context - .config - .permissions - .windows_sandbox_private_desktop, justification: params.justification.clone(), arg0: None, }; diff --git a/codex-rs/core/src/codex_thread.rs b/codex-rs/core/src/codex_thread.rs index 7a243f9f74..581f42886c 100644 --- a/codex-rs/core/src/codex_thread.rs +++ b/codex-rs/core/src/codex_thread.rs @@ -6,12 +6,12 @@ use crate::error::CodexErr; use crate::error::Result as CodexResult; use crate::features::Feature; use crate::file_watcher::WatchRegistration; +use crate::model_visible_context::ModelVisibleContextFragment; use crate::protocol::Event; use crate::protocol::Op; use crate::protocol::Submission; use codex_protocol::config_types::Personality; use codex_protocol::config_types::ServiceTier; -use codex_protocol::models::ContentItem; use codex_protocol::models::ResponseInputItem; use codex_protocol::models::ResponseItem; use codex_protocol::openai_models::ReasoningEffort; @@ -25,6 +25,11 @@ use std::path::PathBuf; use tokio::sync::Mutex; use tokio::sync::watch; +#[cfg(test)] +use codex_protocol::models::ContentItem; +#[cfg(test)] +use codex_protocol::models::MessageRole; + use crate::state_db::StateDbHandle; #[derive(Clone, Debug)] @@ -118,17 +123,30 @@ impl CodexThread { self.codex.session.total_token_usage().await } - /// Records a user-role session-prefix message without creating a new user turn boundary. - pub(crate) async fn inject_user_message_without_turn(&self, message: String) { - let pending_item = ResponseInputItem::Message { - role: "user".to_string(), + pub(crate) async fn inject_model_visible_fragment_without_turn( + &self, + fragment: impl ModelVisibleContextFragment, + ) { + // Runtime/session-prefix path: inject one typed model-visible fragment + // without opening a real user turn boundary. + self.inject_response_input_item_without_turn(fragment.into_response_input_item()) + .await; + } + + #[cfg(test)] + pub(crate) async fn inject_message_without_turn(&self, role: MessageRole, message: String) { + self.inject_response_input_item_without_turn(ResponseInputItem::Message { + role: role.to_string(), content: vec![ContentItem::InputText { text: message }], - }; - let pending_items = vec![pending_item]; + }) + .await; + } + + async fn inject_response_input_item_without_turn(&self, pending_item: ResponseInputItem) { let Err(items_without_active_turn) = self .codex .session - .inject_response_items(pending_items) + .inject_response_items(vec![pending_item]) .await else { return; diff --git a/codex-rs/core/src/compact.rs b/codex-rs/core/src/compact.rs index 4900051938..d07070d726 100644 --- a/codex-rs/core/src/compact.rs +++ b/codex-rs/core/src/compact.rs @@ -200,7 +200,9 @@ async fn run_compact_task_inner( initial_context_injection, InitialContextInjection::BeforeLastUserMessage ) { - let initial_context = sess.build_initial_context(turn_context.as_ref()).await; + let initial_context = sess + .build_initial_context_without_reference_context_item(turn_context.as_ref()) + .await; new_history = insert_initial_context_before_last_real_user_or_summary(new_history, initial_context); } diff --git a/codex-rs/core/src/compact_remote.rs b/codex-rs/core/src/compact_remote.rs index 718166cb08..f3dfe8aae6 100644 --- a/codex-rs/core/src/compact_remote.rs +++ b/codex-rs/core/src/compact_remote.rs @@ -178,7 +178,8 @@ pub(crate) async fn process_compacted_history( initial_context_injection, InitialContextInjection::BeforeLastUserMessage ) { - sess.build_initial_context(turn_context).await + sess.build_initial_context_without_reference_context_item(turn_context) + .await } else { Vec::new() }; diff --git a/codex-rs/core/src/context_manager/history_tests.rs b/codex-rs/core/src/context_manager/history_tests.rs index 066272748f..7a724fe510 100644 --- a/codex-rs/core/src/context_manager/history_tests.rs +++ b/codex-rs/core/src/context_manager/history_tests.rs @@ -70,6 +70,18 @@ fn user_input_text_msg(text: &str) -> ResponseItem { } } +fn developer_input_text_msg(text: &str) -> ResponseItem { + ResponseItem::Message { + id: None, + role: "developer".to_string(), + content: vec![ContentItem::InputText { + text: text.to_string(), + }], + end_turn: None, + phase: None, + } +} + fn custom_tool_call_output(call_id: &str, output: &str) -> ResponseItem { ResponseItem::CustomToolCallOutput { call_id: call_id.to_string(), @@ -705,7 +717,7 @@ fn drop_last_n_user_turns_ignores_session_prefix_user_messages() { "\ndemo\nskills/demo/SKILL.md\nbody\n", ), user_input_text_msg("echo 42"), - user_input_text_msg( + developer_input_text_msg( "{\"agent_id\":\"a\",\"status\":\"completed\"}", ), user_input_text_msg("turn 1 user"), @@ -727,7 +739,7 @@ fn drop_last_n_user_turns_ignores_session_prefix_user_messages() { "\ndemo\nskills/demo/SKILL.md\nbody\n", ), user_input_text_msg("echo 42"), - user_input_text_msg( + developer_input_text_msg( "{\"agent_id\":\"a\",\"status\":\"completed\"}", ), user_input_text_msg("turn 1 user"), @@ -748,7 +760,7 @@ fn drop_last_n_user_turns_ignores_session_prefix_user_messages() { "\ndemo\nskills/demo/SKILL.md\nbody\n", ), user_input_text_msg("echo 42"), - user_input_text_msg( + developer_input_text_msg( "{\"agent_id\":\"a\",\"status\":\"completed\"}", ), ]; @@ -762,7 +774,7 @@ fn drop_last_n_user_turns_ignores_session_prefix_user_messages() { "\ndemo\nskills/demo/SKILL.md\nbody\n", ), user_input_text_msg("echo 42"), - user_input_text_msg( + developer_input_text_msg( "{\"agent_id\":\"a\",\"status\":\"completed\"}", ), user_input_text_msg("turn 1 user"), @@ -782,7 +794,7 @@ fn drop_last_n_user_turns_ignores_session_prefix_user_messages() { "\ndemo\nskills/demo/SKILL.md\nbody\n", ), user_input_text_msg("echo 42"), - user_input_text_msg( + developer_input_text_msg( "{\"agent_id\":\"a\",\"status\":\"completed\"}", ), user_input_text_msg("turn 1 user"), diff --git a/codex-rs/core/src/context_manager/updates.rs b/codex-rs/core/src/context_manager/updates.rs index 031cfbe1fc..472efb0eab 100644 --- a/codex-rs/core/src/context_manager/updates.rs +++ b/codex-rs/core/src/context_manager/updates.rs @@ -1,183 +1,101 @@ -use crate::codex::PreviousTurnSettings; use crate::codex::TurnContext; -use crate::environment_context::EnvironmentContext; -use crate::features::Feature; -use crate::shell::Shell; -use codex_execpolicy::Policy; -use codex_protocol::config_types::Personality; +use crate::model_visible_context::ContextualUserContextRole; +use crate::model_visible_context::DeveloperContextRole; +use crate::model_visible_context::ModelVisibleContextFragment; +use crate::model_visible_context::ModelVisibleContextRole; +use crate::model_visible_context::TurnContextDiffParams; +use crate::model_visible_fragments::BuiltTurnStateFragment; use codex_protocol::models::ContentItem; -use codex_protocol::models::DeveloperInstructions; use codex_protocol::models::ResponseItem; -use codex_protocol::openai_models::ModelInfo; use codex_protocol::protocol::TurnContextItem; +use std::marker::PhantomData; -fn build_environment_update_item( - previous: Option<&TurnContextItem>, - next: &TurnContext, - shell: &Shell, -) -> Option { - let prev = previous?; - let prev_context = EnvironmentContext::from_turn_context_item(prev, shell); - let next_context = EnvironmentContext::from_turn_context(next, shell); - if prev_context.equals_except_shell(&next_context) { - return None; - } +// Adjacent ContentItems in a single message are effectively concatenated in +// the model-visible token stream, so we inject an explicit separator between +// text fragments to preserve boundaries. +const MODEL_VISIBLE_FRAGMENT_SEPARATOR: &str = "\n\n"; - Some(ResponseItem::from( - EnvironmentContext::diff_from_turn_context_item(prev, next, shell), - )) +struct ModelVisibleContextEnvelopeBuilder { + content: Vec, + role: PhantomData, } -fn build_permissions_update_item( - previous: Option<&TurnContextItem>, - next: &TurnContext, - exec_policy: &Policy, -) -> Option { - let prev = previous?; - if prev.sandbox_policy == *next.sandbox_policy.get() - && prev.approval_policy == next.approval_policy.value() - { - return None; +impl ModelVisibleContextEnvelopeBuilder { + fn new() -> Self { + Self { + content: Vec::new(), + role: PhantomData, + } } - Some(DeveloperInstructions::from_policy( - next.sandbox_policy.get(), - next.approval_policy.value(), - exec_policy, - &next.cwd, - next.features.enabled(Feature::ExecPermissionApprovals), - next.features.enabled(Feature::RequestPermissionsTool), - )) -} + fn push_fragment(&mut self, fragment: impl ModelVisibleContextFragment) { + if let Some(ContentItem::InputText { text }) = self.content.last_mut() + && !text.ends_with(MODEL_VISIBLE_FRAGMENT_SEPARATOR) + { + text.push_str(MODEL_VISIBLE_FRAGMENT_SEPARATOR); + } + self.content.push(fragment.into_content_item()); + } -fn build_collaboration_mode_update_item( - previous: Option<&TurnContextItem>, - next: &TurnContext, -) -> Option { - let prev = previous?; - if prev.collaboration_mode.as_ref() != Some(&next.collaboration_mode) { - // If the next mode has empty developer instructions, this returns None and we emit no - // update, so prior collaboration instructions remain in the prompt history. - Some(DeveloperInstructions::from_collaboration_mode( - &next.collaboration_mode, - )?) - } else { - None + fn build(self) -> Option { + build_message::(self.content) } } -pub(crate) fn build_realtime_update_item( - previous: Option<&TurnContextItem>, - previous_turn_settings: Option<&PreviousTurnSettings>, - next: &TurnContext, -) -> Option { - match ( - previous.and_then(|item| item.realtime_active), - next.realtime_active, +pub(crate) struct DeveloperEnvelopeBuilder( + ModelVisibleContextEnvelopeBuilder, +); + +impl Default for DeveloperEnvelopeBuilder { + fn default() -> Self { + Self(ModelVisibleContextEnvelopeBuilder::new()) + } +} + +impl DeveloperEnvelopeBuilder { + pub(crate) fn push( + &mut self, + fragment: impl ModelVisibleContextFragment, ) { - (Some(true), false) => Some(DeveloperInstructions::realtime_end_message("inactive")), - (Some(false), true) | (None, true) => Some( - if let Some(instructions) = next - .config - .experimental_realtime_start_instructions - .as_deref() - { - DeveloperInstructions::realtime_start_message_with_instructions(instructions) - } else { - DeveloperInstructions::realtime_start_message() - }, - ), - (Some(true), true) | (Some(false), false) => None, - (None, false) => previous_turn_settings - .and_then(|settings| settings.realtime_active) - .filter(|realtime_active| *realtime_active) - .map(|_| DeveloperInstructions::realtime_end_message("inactive")), + self.0.push_fragment(fragment); + } + + pub(crate) fn build(self) -> Option { + self.0.build() } } -pub(crate) fn build_initial_realtime_item( - previous: Option<&TurnContextItem>, - previous_turn_settings: Option<&PreviousTurnSettings>, - next: &TurnContext, -) -> Option { - build_realtime_update_item(previous, previous_turn_settings, next) +pub(crate) struct ContextualUserEnvelopeBuilder( + ModelVisibleContextEnvelopeBuilder, +); + +impl Default for ContextualUserEnvelopeBuilder { + fn default() -> Self { + Self(ModelVisibleContextEnvelopeBuilder::new()) + } } -fn build_personality_update_item( - previous: Option<&TurnContextItem>, - next: &TurnContext, - personality_feature_enabled: bool, -) -> Option { - if !personality_feature_enabled { +impl ContextualUserEnvelopeBuilder { + pub(crate) fn push_fragment( + &mut self, + fragment: impl ModelVisibleContextFragment, + ) { + self.0.push_fragment(fragment); + } + + pub(crate) fn build(self) -> Option { + self.0.build() + } +} + +fn build_message(content: Vec) -> Option { + if content.is_empty() { return None; } - let previous = previous?; - if next.model_info.slug != previous.model { - return None; - } - - if let Some(personality) = next.personality - && next.personality != previous.personality - { - let model_info = &next.model_info; - let personality_message = personality_message_for(model_info, personality); - personality_message.map(DeveloperInstructions::personality_spec_message) - } else { - None - } -} - -pub(crate) fn personality_message_for( - model_info: &ModelInfo, - personality: Personality, -) -> Option { - model_info - .model_messages - .as_ref() - .and_then(|spec| spec.get_personality_message(Some(personality))) - .filter(|message| !message.is_empty()) -} - -pub(crate) fn build_model_instructions_update_item( - previous_turn_settings: Option<&PreviousTurnSettings>, - next: &TurnContext, -) -> Option { - let previous_turn_settings = previous_turn_settings?; - if previous_turn_settings.model == next.model_info.slug { - return None; - } - - let model_instructions = next.model_info.get_model_instructions(next.personality); - if model_instructions.is_empty() { - return None; - } - - Some(DeveloperInstructions::model_switch_message( - model_instructions, - )) -} - -pub(crate) fn build_developer_update_item(text_sections: Vec) -> Option { - build_text_message("developer", text_sections) -} - -pub(crate) fn build_contextual_user_message(text_sections: Vec) -> Option { - build_text_message("user", text_sections) -} - -fn build_text_message(role: &str, text_sections: Vec) -> Option { - if text_sections.is_empty() { - return None; - } - - let content = text_sections - .into_iter() - .map(|text| ContentItem::InputText { text }) - .collect(); Some(ResponseItem::Message { id: None, - role: role.to_string(), + role: R::MESSAGE_ROLE.to_string(), content, end_turn: None, phase: None, @@ -186,33 +104,120 @@ fn build_text_message(role: &str, text_sections: Vec) -> Option, - previous_turn_settings: Option<&PreviousTurnSettings>, next: &TurnContext, - shell: &Shell, - exec_policy: &Policy, - personality_feature_enabled: bool, + params: &TurnContextDiffParams<'_>, ) -> Vec { - let contextual_user_message = build_environment_update_item(previous, next, shell); - let developer_update_sections = [ - // Keep model-switch instructions first so model-specific guidance is read before - // any other context diffs on this turn. - build_model_instructions_update_item(previous_turn_settings, next), - build_permissions_update_item(previous, next, exec_policy), - build_collaboration_mode_update_item(previous, next), - build_realtime_update_item(previous, previous_turn_settings, next), - build_personality_update_item(previous, next, personality_feature_enabled), - ] - .into_iter() - .flatten() - .map(DeveloperInstructions::into_text) - .collect(); + let mut developer_envelope = DeveloperEnvelopeBuilder::default(); + let mut contextual_user_envelope = ContextualUserEnvelopeBuilder::default(); + + for fragment in + crate::model_visible_fragments::build_turn_state_fragments(previous, next, params) + { + match fragment { + BuiltTurnStateFragment::Developer(fragment) => developer_envelope.push(fragment), + BuiltTurnStateFragment::ContextualUser(fragment) => { + contextual_user_envelope.push_fragment(fragment); + } + } + } let mut items = Vec::with_capacity(2); - if let Some(developer_message) = build_developer_update_item(developer_update_sections) { + if let Some(developer_message) = developer_envelope.build() { items.push(developer_message); } - if let Some(contextual_user_message) = contextual_user_message { - items.push(contextual_user_message); + if let Some(model_visible_context) = contextual_user_envelope.build() { + items.push(model_visible_context); } items } + +#[cfg(test)] +mod tests { + use super::*; + use crate::model_visible_context::ContextualUserContextRole; + use crate::model_visible_context::DeveloperContextRole; + use crate::model_visible_context::DeveloperTextFragment; + use codex_protocol::models::ContentItem; + use pretty_assertions::assert_eq; + + #[test] + fn developer_envelope_builder_emits_one_message_in_order() { + let mut builder = DeveloperEnvelopeBuilder::default(); + builder.push(DeveloperTextFragment::new("first")); + builder.push(DeveloperTextFragment::new("second")); + + let item = builder.build().expect("developer message expected"); + let ResponseItem::Message { role, content, .. } = item else { + panic!("expected message"); + }; + + assert_eq!(role, "developer"); + assert_eq!( + content, + vec![ + ContentItem::InputText { + text: "first\n\n".to_string() + }, + ContentItem::InputText { + text: "second".to_string() + }, + ] + ); + } + + #[derive(Clone, Copy)] + struct FakeFragment { + text: &'static str, + } + + impl ModelVisibleContextFragment for FakeFragment { + type Role = ContextualUserContextRole; + + fn render_text(&self) -> String { + self.text.to_string() + } + } + + #[test] + fn contextual_user_envelope_builder_emits_one_message_in_order() { + let mut builder = ContextualUserEnvelopeBuilder::default(); + builder.push_fragment(FakeFragment { text: "alpha" }); + builder.push_fragment(FakeFragment { text: "beta" }); + + let item = builder.build().expect("user message expected"); + let ResponseItem::Message { role, content, .. } = item else { + panic!("expected message"); + }; + + assert_eq!(role, "user"); + assert_eq!( + content, + vec![ + ContentItem::InputText { + text: "alpha\n\n".to_string() + }, + ContentItem::InputText { + text: "beta".to_string() + }, + ] + ); + } + + #[test] + fn empty_envelope_builders_return_none() { + assert!(DeveloperEnvelopeBuilder::default().build().is_none()); + assert!(ContextualUserEnvelopeBuilder::default().build().is_none()); + } + + #[test] + fn build_message_sets_role() { + let item = build_message::(vec![ContentItem::InputText { + text: "body".to_string(), + }]) + .expect("message expected"); + let ResponseItem::Message { role, .. } = item else { + panic!("expected message"); + }; + assert_eq!(role, "developer"); + } +} diff --git a/codex-rs/core/src/context_manager/updates/developer_update_fragments.rs b/codex-rs/core/src/context_manager/updates/developer_update_fragments.rs new file mode 100644 index 0000000000..1ecb1ac95a --- /dev/null +++ b/codex-rs/core/src/context_manager/updates/developer_update_fragments.rs @@ -0,0 +1,289 @@ +//! Developer-envelope model-visible fragments used by turn-state context +//! assembly. +//! +//! This module owns the turn-context diffing logic for developer-role context +//! updates (permissions, collaboration mode, realtime, personality, and model +//! switch guidance). + +use crate::codex::TurnContext; +use crate::features::Feature; +use crate::model_visible_context::DeveloperContextRole; +use crate::model_visible_context::ModelVisibleContextFragment; +use crate::model_visible_context::TurnContextDiffFragment; +use crate::model_visible_context::TurnContextDiffParams; +use codex_protocol::models::developer_collaboration_mode_text; +use codex_protocol::models::developer_model_switch_text; +use codex_protocol::models::developer_permissions_text; +use codex_protocol::models::developer_personality_spec_text; +use codex_protocol::models::developer_realtime_end_text; +use codex_protocol::models::developer_realtime_start_text; +use codex_protocol::protocol::TurnContextItem; + +// --------------------------------------------------------------------------- +// Model instructions fragment +// --------------------------------------------------------------------------- + +pub(super) struct ModelInstructionsUpdateFragment { + text: String, +} + +impl ModelVisibleContextFragment for ModelInstructionsUpdateFragment { + type Role = DeveloperContextRole; + + fn render_text(&self) -> String { + self.text.clone() + } +} + +impl TurnContextDiffFragment for ModelInstructionsUpdateFragment { + fn build( + turn_context: &TurnContext, + _reference_context_item: Option<&TurnContextItem>, + params: &TurnContextDiffParams<'_>, + ) -> Option { + let previous_model = params + .previous_turn_settings + .map(|settings| settings.model.as_str()); + let previous_model = previous_model?; + if previous_model == turn_context.model_info.slug.as_str() { + return None; + } + + let model_instructions = turn_context + .model_info + .get_model_instructions(turn_context.personality); + if model_instructions.is_empty() { + return None; + } + + Some(Self { + text: developer_model_switch_text(model_instructions), + }) + } +} + +// --------------------------------------------------------------------------- +// Permissions fragment +// --------------------------------------------------------------------------- + +pub(super) struct PermissionsUpdateFragment { + text: String, +} + +impl ModelVisibleContextFragment for PermissionsUpdateFragment { + type Role = DeveloperContextRole; + + fn render_text(&self) -> String { + self.text.clone() + } +} + +impl TurnContextDiffFragment for PermissionsUpdateFragment { + fn build( + turn_context: &TurnContext, + reference_context_item: Option<&TurnContextItem>, + params: &TurnContextDiffParams<'_>, + ) -> Option { + if reference_context_item.is_some_and(|previous| { + previous.sandbox_policy == *turn_context.sandbox_policy.get() + && previous.approval_policy == turn_context.approval_policy.value() + }) { + return None; + } + + Some(Self { + text: developer_permissions_text( + turn_context.sandbox_policy.get(), + turn_context.approval_policy.value(), + turn_context.features.enabled(Feature::GuardianApproval), + params.exec_policy, + &turn_context.cwd, + turn_context + .features + .enabled(Feature::RequestPermissionsTool), + ), + }) + } +} + +// --------------------------------------------------------------------------- +// Custom developer instructions fragment +// --------------------------------------------------------------------------- + +pub(super) struct CustomDeveloperInstructionsUpdateFragment { + text: String, +} + +impl ModelVisibleContextFragment for CustomDeveloperInstructionsUpdateFragment { + type Role = DeveloperContextRole; + + fn render_text(&self) -> String { + self.text.clone() + } +} + +impl TurnContextDiffFragment for CustomDeveloperInstructionsUpdateFragment { + fn build( + turn_context: &TurnContext, + reference_context_item: Option<&TurnContextItem>, + _params: &TurnContextDiffParams<'_>, + ) -> Option { + if reference_context_item.is_some_and(|previous| { + previous.developer_instructions == turn_context.developer_instructions + }) { + return None; + } + + Some(Self { + text: turn_context.developer_instructions.as_ref()?.clone(), + }) + } +} + +// --------------------------------------------------------------------------- +// Collaboration mode fragment +// --------------------------------------------------------------------------- + +pub(super) struct CollaborationModeUpdateFragment { + text: String, +} + +impl ModelVisibleContextFragment for CollaborationModeUpdateFragment { + type Role = DeveloperContextRole; + + fn render_text(&self) -> String { + self.text.clone() + } +} + +impl TurnContextDiffFragment for CollaborationModeUpdateFragment { + fn build( + turn_context: &TurnContext, + reference_context_item: Option<&TurnContextItem>, + _params: &TurnContextDiffParams<'_>, + ) -> Option { + if let Some(previous) = reference_context_item { + if previous.collaboration_mode.as_ref() != Some(&turn_context.collaboration_mode) { + // If the next mode has empty developer instructions, this returns None and we emit no + // update, so prior collaboration instructions remain in the prompt history. + return Some(Self { + text: developer_collaboration_mode_text(&turn_context.collaboration_mode)?, + }); + } + return None; + } + + developer_collaboration_mode_text(&turn_context.collaboration_mode) + .map(|text| Self { text }) + } +} + +// --------------------------------------------------------------------------- +// Realtime fragment +// --------------------------------------------------------------------------- + +pub(super) struct RealtimeUpdateFragment { + text: String, +} + +impl ModelVisibleContextFragment for RealtimeUpdateFragment { + type Role = DeveloperContextRole; + + fn render_text(&self) -> String { + self.text.clone() + } +} + +impl TurnContextDiffFragment for RealtimeUpdateFragment { + fn build( + turn_context: &TurnContext, + reference_context_item: Option<&TurnContextItem>, + params: &TurnContextDiffParams<'_>, + ) -> Option { + let text = match ( + reference_context_item.and_then(|previous| previous.realtime_active), + turn_context.realtime_active, + ) { + (Some(true), false) => Some(developer_realtime_end_text("inactive")), + (Some(false), true) | (None, true) => Some(developer_realtime_start_text()), + (Some(true), true) | (Some(false), false) => None, + (None, false) => params + .previous_turn_settings + .and_then(|settings| settings.realtime_active) + .filter(|realtime_active| *realtime_active) + .map(|_| developer_realtime_end_text("inactive")), + }?; + + Some(Self { text }) + } +} + +// --------------------------------------------------------------------------- +// Personality fragment +// --------------------------------------------------------------------------- + +pub(super) struct PersonalityUpdateFragment { + text: String, +} + +impl ModelVisibleContextFragment for PersonalityUpdateFragment { + type Role = DeveloperContextRole; + + fn render_text(&self) -> String { + self.text.clone() + } +} + +impl TurnContextDiffFragment for PersonalityUpdateFragment { + fn build( + turn_context: &TurnContext, + reference_context_item: Option<&TurnContextItem>, + params: &TurnContextDiffParams<'_>, + ) -> Option { + if !params.personality_feature_enabled { + return None; + } + + let Some(previous) = reference_context_item else { + let personality = turn_context.personality?; + let has_baked_personality = params.base_instructions.is_some_and(|base_instructions| { + turn_context.model_info.supports_personality() + && base_instructions + == turn_context + .model_info + .get_model_instructions(Some(personality)) + }); + if has_baked_personality { + return None; + } + let personality_message = turn_context + .model_info + .model_messages + .as_ref() + .and_then(|spec| spec.get_personality_message(Some(personality))) + .filter(|message| !message.is_empty())?; + return Some(Self { + text: developer_personality_spec_text(personality_message), + }); + }; + + if turn_context.model_info.slug != previous.model { + return None; + } + if let Some(personality) = turn_context.personality + && turn_context.personality != previous.personality + { + let personality_message = turn_context + .model_info + .model_messages + .as_ref() + .and_then(|spec| spec.get_personality_message(Some(personality))) + .filter(|message| !message.is_empty())?; + return Some(Self { + text: developer_personality_spec_text(personality_message), + }); + } + + None + } +} diff --git a/codex-rs/core/src/contextual_user_message.rs b/codex-rs/core/src/contextual_user_message.rs deleted file mode 100644 index d10f7a9fc7..0000000000 --- a/codex-rs/core/src/contextual_user_message.rs +++ /dev/null @@ -1,108 +0,0 @@ -use codex_protocol::models::ContentItem; -use codex_protocol::models::ResponseItem; -use codex_protocol::protocol::ENVIRONMENT_CONTEXT_CLOSE_TAG; -use codex_protocol::protocol::ENVIRONMENT_CONTEXT_OPEN_TAG; - -pub(crate) const AGENTS_MD_START_MARKER: &str = "# AGENTS.md instructions for "; -pub(crate) const AGENTS_MD_END_MARKER: &str = ""; -pub(crate) const SKILL_OPEN_TAG: &str = ""; -pub(crate) const SKILL_CLOSE_TAG: &str = ""; -pub(crate) const USER_SHELL_COMMAND_OPEN_TAG: &str = ""; -pub(crate) const USER_SHELL_COMMAND_CLOSE_TAG: &str = ""; -pub(crate) const TURN_ABORTED_OPEN_TAG: &str = ""; -pub(crate) const TURN_ABORTED_CLOSE_TAG: &str = ""; -pub(crate) const SUBAGENT_NOTIFICATION_OPEN_TAG: &str = ""; -pub(crate) const SUBAGENT_NOTIFICATION_CLOSE_TAG: &str = ""; - -#[derive(Clone, Copy)] -pub(crate) struct ContextualUserFragmentDefinition { - start_marker: &'static str, - end_marker: &'static str, -} - -impl ContextualUserFragmentDefinition { - pub(crate) const fn new(start_marker: &'static str, end_marker: &'static str) -> Self { - Self { - start_marker, - end_marker, - } - } - - pub(crate) fn matches_text(&self, text: &str) -> bool { - let trimmed = text.trim_start(); - let starts_with_marker = trimmed - .get(..self.start_marker.len()) - .is_some_and(|candidate| candidate.eq_ignore_ascii_case(self.start_marker)); - let trimmed = trimmed.trim_end(); - let ends_with_marker = trimmed - .get(trimmed.len().saturating_sub(self.end_marker.len())..) - .is_some_and(|candidate| candidate.eq_ignore_ascii_case(self.end_marker)); - starts_with_marker && ends_with_marker - } - - pub(crate) const fn start_marker(&self) -> &'static str { - self.start_marker - } - - pub(crate) const fn end_marker(&self) -> &'static str { - self.end_marker - } - - pub(crate) fn wrap(&self, body: String) -> String { - format!("{}\n{}\n{}", self.start_marker, body, self.end_marker) - } - - pub(crate) fn into_message(self, text: String) -> ResponseItem { - ResponseItem::Message { - id: None, - role: "user".to_string(), - content: vec![ContentItem::InputText { text }], - end_turn: None, - phase: None, - } - } -} - -pub(crate) const AGENTS_MD_FRAGMENT: ContextualUserFragmentDefinition = - ContextualUserFragmentDefinition::new(AGENTS_MD_START_MARKER, AGENTS_MD_END_MARKER); -pub(crate) const ENVIRONMENT_CONTEXT_FRAGMENT: ContextualUserFragmentDefinition = - ContextualUserFragmentDefinition::new( - ENVIRONMENT_CONTEXT_OPEN_TAG, - ENVIRONMENT_CONTEXT_CLOSE_TAG, - ); -pub(crate) const SKILL_FRAGMENT: ContextualUserFragmentDefinition = - ContextualUserFragmentDefinition::new(SKILL_OPEN_TAG, SKILL_CLOSE_TAG); -pub(crate) const USER_SHELL_COMMAND_FRAGMENT: ContextualUserFragmentDefinition = - ContextualUserFragmentDefinition::new( - USER_SHELL_COMMAND_OPEN_TAG, - USER_SHELL_COMMAND_CLOSE_TAG, - ); -pub(crate) const TURN_ABORTED_FRAGMENT: ContextualUserFragmentDefinition = - ContextualUserFragmentDefinition::new(TURN_ABORTED_OPEN_TAG, TURN_ABORTED_CLOSE_TAG); -pub(crate) const SUBAGENT_NOTIFICATION_FRAGMENT: ContextualUserFragmentDefinition = - ContextualUserFragmentDefinition::new( - SUBAGENT_NOTIFICATION_OPEN_TAG, - SUBAGENT_NOTIFICATION_CLOSE_TAG, - ); - -const CONTEXTUAL_USER_FRAGMENTS: &[ContextualUserFragmentDefinition] = &[ - AGENTS_MD_FRAGMENT, - ENVIRONMENT_CONTEXT_FRAGMENT, - SKILL_FRAGMENT, - USER_SHELL_COMMAND_FRAGMENT, - TURN_ABORTED_FRAGMENT, - SUBAGENT_NOTIFICATION_FRAGMENT, -]; - -pub(crate) fn is_contextual_user_fragment(content_item: &ContentItem) -> bool { - let ContentItem::InputText { text } = content_item else { - return false; - }; - CONTEXTUAL_USER_FRAGMENTS - .iter() - .any(|definition| definition.matches_text(text)) -} - -#[cfg(test)] -#[path = "contextual_user_message_tests.rs"] -mod tests; diff --git a/codex-rs/core/src/environment_context.rs b/codex-rs/core/src/environment_context.rs index 0c42cd0900..f2bfe481c3 100644 --- a/codex-rs/core/src/environment_context.rs +++ b/codex-rs/core/src/environment_context.rs @@ -1,7 +1,13 @@ use crate::codex::TurnContext; -use crate::contextual_user_message::ENVIRONMENT_CONTEXT_FRAGMENT; +use crate::model_visible_context::ContextualUserContextRole; +use crate::model_visible_context::ContextualUserFragmentMarkers; +use crate::model_visible_context::ModelVisibleContextFragment; +use crate::model_visible_context::TaggedContextualUserFragment; +use crate::model_visible_context::TurnContextDiffFragment; +use crate::model_visible_context::TurnContextDiffParams; use crate::shell::Shell; -use codex_protocol::models::ResponseItem; +use codex_protocol::protocol::ENVIRONMENT_CONTEXT_CLOSE_TAG; +use codex_protocol::protocol::ENVIRONMENT_CONTEXT_OPEN_TAG; use codex_protocol::protocol::TurnContextItem; use codex_protocol::protocol::TurnContextNetworkItem; use serde::Deserialize; @@ -16,7 +22,6 @@ pub(crate) struct EnvironmentContext { pub current_date: Option, pub timezone: Option, pub network: Option, - pub subagents: Option, } #[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq, Default)] @@ -26,13 +31,17 @@ pub(crate) struct NetworkContext { } impl EnvironmentContext { + const MARKERS: ContextualUserFragmentMarkers = ContextualUserFragmentMarkers::new( + ENVIRONMENT_CONTEXT_OPEN_TAG, + ENVIRONMENT_CONTEXT_CLOSE_TAG, + ); + pub fn new( cwd: Option, shell: Shell, current_date: Option, timezone: Option, network: Option, - subagents: Option, ) -> Self { Self { cwd, @@ -40,7 +49,6 @@ impl EnvironmentContext { current_date, timezone, network, - subagents, } } @@ -53,65 +61,12 @@ impl EnvironmentContext { current_date, timezone, network, - subagents, shell: _, } = other; self.cwd == *cwd && self.current_date == *current_date && self.timezone == *timezone && self.network == *network - && self.subagents == *subagents - } - - pub fn diff_from_turn_context_item( - before: &TurnContextItem, - after: &TurnContext, - shell: &Shell, - ) -> Self { - let before_network = Self::network_from_turn_context_item(before); - let after_network = Self::network_from_turn_context(after); - let cwd = if before.cwd != after.cwd { - Some(after.cwd.clone()) - } else { - None - }; - let current_date = after.current_date.clone(); - let timezone = after.timezone.clone(); - let network = if before_network != after_network { - after_network - } else { - before_network - }; - EnvironmentContext::new(cwd, shell.clone(), current_date, timezone, network, None) - } - - pub fn from_turn_context(turn_context: &TurnContext, shell: &Shell) -> Self { - Self::new( - Some(turn_context.cwd.clone()), - shell.clone(), - turn_context.current_date.clone(), - turn_context.timezone.clone(), - Self::network_from_turn_context(turn_context), - None, - ) - } - - pub fn from_turn_context_item(turn_context_item: &TurnContextItem, shell: &Shell) -> Self { - Self::new( - Some(turn_context_item.cwd.clone()), - shell.clone(), - turn_context_item.current_date.clone(), - turn_context_item.timezone.clone(), - Self::network_from_turn_context_item(turn_context_item), - None, - ) - } - - pub fn with_subagents(mut self, subagents: String) -> Self { - if !subagents.is_empty() { - self.subagents = Some(subagents); - } - self } fn network_from_turn_context(turn_context: &TurnContext) -> Option { @@ -142,33 +97,25 @@ impl EnvironmentContext { } } -impl EnvironmentContext { - /// Serializes the environment context to XML. Libraries like `quick-xml` - /// require custom macros to handle Enums with newtypes, so we just do it - /// manually, to keep things simple. Output looks like: - /// - /// ```xml - /// - /// ... - /// ... - /// - /// ``` - pub fn serialize_to_xml(self) -> String { +impl ModelVisibleContextFragment for EnvironmentContext { + type Role = ContextualUserContextRole; + + fn render_text(&self) -> String { let mut lines = Vec::new(); - if let Some(cwd) = self.cwd { + if let Some(cwd) = &self.cwd { lines.push(format!(" {}", cwd.to_string_lossy())); } let shell_name = self.shell.name(); lines.push(format!(" {shell_name}")); - if let Some(current_date) = self.current_date { + if let Some(current_date) = &self.current_date { lines.push(format!(" {current_date}")); } - if let Some(timezone) = self.timezone { + if let Some(timezone) = &self.timezone { lines.push(format!(" {timezone}")); } - match self.network { - Some(ref network) => { + match &self.network { + Some(network) => { lines.push(" ".to_string()); for allowed in &network.allowed_domains { lines.push(format!(" {allowed}")); @@ -183,21 +130,269 @@ impl EnvironmentContext { // lines.push(" ".to_string()); } } - if let Some(subagents) = self.subagents { - lines.push(" ".to_string()); - lines.extend(subagents.lines().map(|line| format!(" {line}"))); - lines.push(" ".to_string()); - } - ENVIRONMENT_CONTEXT_FRAGMENT.wrap(lines.join("\n")) + Self::MARKERS.wrap_body(lines.join("\n")) } } -impl From for ResponseItem { - fn from(ec: EnvironmentContext) -> Self { - ENVIRONMENT_CONTEXT_FRAGMENT.into_message(ec.serialize_to_xml()) +impl TaggedContextualUserFragment for EnvironmentContext { + const MARKERS: ContextualUserFragmentMarkers = Self::MARKERS; +} + +impl TurnContextDiffFragment for EnvironmentContext { + fn build( + turn_context: &TurnContext, + reference_context_item: Option<&TurnContextItem>, + params: &TurnContextDiffParams<'_>, + ) -> Option { + let current_network = Self::network_from_turn_context(turn_context); + let current_context = Self::new( + Some(turn_context.cwd.clone()), + params.shell.clone(), + turn_context.current_date.clone(), + turn_context.timezone.clone(), + current_network.clone(), + ); + + let Some(previous) = reference_context_item else { + return Some(current_context); + }; + + let previous_network = Self::network_from_turn_context_item(previous); + let previous_context = Self::new( + Some(previous.cwd.clone()), + params.shell.clone(), + previous.current_date.clone(), + previous.timezone.clone(), + previous_network.clone(), + ); + + if previous_context.equals_except_shell(¤t_context) { + return None; + } + + let cwd = if previous.cwd != turn_context.cwd { + Some(turn_context.cwd.clone()) + } else { + None + }; + let network = if previous_network != current_network { + current_network + } else { + previous_network + }; + + Some(Self::new( + cwd, + params.shell.clone(), + turn_context.current_date.clone(), + turn_context.timezone.clone(), + network, + )) } } #[cfg(test)] -#[path = "environment_context_tests.rs"] -mod tests; +mod tests { + use crate::shell::ShellType; + + use super::*; + use core_test_support::test_path_buf; + use pretty_assertions::assert_eq; + + fn fake_shell() -> Shell { + Shell { + shell_type: ShellType::Bash, + shell_path: PathBuf::from("/bin/bash"), + shell_snapshot: crate::shell::empty_shell_snapshot_receiver(), + } + } + + #[test] + fn serialize_workspace_write_environment_context() { + let cwd = test_path_buf("/repo"); + let context = EnvironmentContext::new( + Some(cwd.clone()), + fake_shell(), + Some("2026-02-26".to_string()), + Some("America/Los_Angeles".to_string()), + None, + ); + + let expected = format!( + r#" + {cwd} + bash + 2026-02-26 + America/Los_Angeles +"#, + cwd = cwd.display(), + ); + + assert_eq!(context.render_text(), expected); + } + + #[test] + fn serialize_environment_context_with_network() { + let network = NetworkContext { + allowed_domains: vec!["api.example.com".to_string(), "*.openai.com".to_string()], + denied_domains: vec!["blocked.example.com".to_string()], + }; + let context = EnvironmentContext::new( + Some(test_path_buf("/repo")), + fake_shell(), + Some("2026-02-26".to_string()), + Some("America/Los_Angeles".to_string()), + Some(network), + ); + + let expected = format!( + r#" + {} + bash + 2026-02-26 + America/Los_Angeles + + api.example.com + *.openai.com + blocked.example.com + +"#, + test_path_buf("/repo").display() + ); + + assert_eq!(context.render_text(), expected); + } + + #[test] + fn serialize_read_only_environment_context() { + let context = EnvironmentContext::new( + None, + fake_shell(), + Some("2026-02-26".to_string()), + Some("America/Los_Angeles".to_string()), + None, + ); + + let expected = r#" + bash + 2026-02-26 + America/Los_Angeles +"#; + + assert_eq!(context.render_text(), expected); + } + + #[test] + fn serialize_external_sandbox_environment_context() { + let context = EnvironmentContext::new( + None, + fake_shell(), + Some("2026-02-26".to_string()), + Some("America/Los_Angeles".to_string()), + None, + ); + + let expected = r#" + bash + 2026-02-26 + America/Los_Angeles +"#; + + assert_eq!(context.render_text(), expected); + } + + #[test] + fn serialize_external_sandbox_with_restricted_network_environment_context() { + let context = EnvironmentContext::new( + None, + fake_shell(), + Some("2026-02-26".to_string()), + Some("America/Los_Angeles".to_string()), + None, + ); + + let expected = r#" + bash + 2026-02-26 + America/Los_Angeles +"#; + + assert_eq!(context.render_text(), expected); + } + + #[test] + fn serialize_full_access_environment_context() { + let context = EnvironmentContext::new( + None, + fake_shell(), + Some("2026-02-26".to_string()), + Some("America/Los_Angeles".to_string()), + None, + ); + + let expected = r#" + bash + 2026-02-26 + America/Los_Angeles +"#; + + assert_eq!(context.render_text(), expected); + } + + #[test] + fn equals_except_shell_compares_cwd() { + let context1 = + EnvironmentContext::new(Some(PathBuf::from("/repo")), fake_shell(), None, None, None); + let context2 = + EnvironmentContext::new(Some(PathBuf::from("/repo")), fake_shell(), None, None, None); + assert!(context1.equals_except_shell(&context2)); + } + + #[test] + fn equals_except_shell_compares_cwd_differences() { + let context1 = EnvironmentContext::new( + Some(PathBuf::from("/repo1")), + fake_shell(), + None, + None, + None, + ); + let context2 = EnvironmentContext::new( + Some(PathBuf::from("/repo2")), + fake_shell(), + None, + None, + None, + ); + + assert!(!context1.equals_except_shell(&context2)); + } + + #[test] + fn equals_except_shell_ignores_shell() { + let context1 = EnvironmentContext::new( + Some(PathBuf::from("/repo")), + Shell { + shell_type: ShellType::Bash, + shell_path: "/bin/bash".into(), + shell_snapshot: crate::shell::empty_shell_snapshot_receiver(), + }, + None, + None, + None, + ); + let context2 = EnvironmentContext::new( + Some(PathBuf::from("/repo")), + Shell { + shell_type: ShellType::Zsh, + shell_path: "/bin/zsh".into(), + shell_snapshot: crate::shell::empty_shell_snapshot_receiver(), + }, + None, + None, + None, + ); + + assert!(context1.equals_except_shell(&context2)); + } +} diff --git a/codex-rs/core/src/event_mapping.rs b/codex-rs/core/src/event_mapping.rs index 72372b24cd..51e244330d 100644 --- a/codex-rs/core/src/event_mapping.rs +++ b/codex-rs/core/src/event_mapping.rs @@ -18,7 +18,7 @@ use codex_protocol::user_input::UserInput; use tracing::warn; use uuid::Uuid; -use crate::contextual_user_message::is_contextual_user_fragment; +use crate::model_visible_context::is_contextual_user_fragment; use crate::web_search::web_search_action_detail; pub(crate) fn is_contextual_user_message_content(message: &[ContentItem]) -> bool { diff --git a/codex-rs/core/src/exec_policy.rs b/codex-rs/core/src/exec_policy.rs index aa0691f888..b447c4bbe7 100644 --- a/codex-rs/core/src/exec_policy.rs +++ b/codex-rs/core/src/exec_policy.rs @@ -21,9 +21,9 @@ use codex_execpolicy::RuleMatch; use codex_execpolicy::blocking_append_allow_prefix_rule; use codex_execpolicy::blocking_append_network_rule; use codex_protocol::approvals::ExecPolicyAmendment; -use codex_protocol::permissions::FileSystemSandboxKind; -use codex_protocol::permissions::FileSystemSandboxPolicy; use codex_protocol::protocol::AskForApproval; +use codex_protocol::protocol::FileSystemSandboxKind; +use codex_protocol::protocol::FileSystemSandboxPolicy; use codex_protocol::protocol::SandboxPolicy; use thiserror::Error; use tokio::fs; @@ -535,18 +535,19 @@ pub fn render_decision_for_unmatched_command( Decision::Prompt } AskForApproval::OnRequest => { - match file_system_sandbox_policy.kind { - FileSystemSandboxKind::Unrestricted | FileSystemSandboxKind::ExternalSandbox => { + match sandbox_policy { + SandboxPolicy::DangerFullAccess | SandboxPolicy::ExternalSandbox { .. } => { // The user has indicated we should "just run" commands // in their unrestricted environment, so we do so since the // command has not been flagged as dangerous. Decision::Allow } - FileSystemSandboxKind::Restricted => { - // In restricted sandboxes, do not prompt for non-escalated, - // non-dangerous commands; let the sandbox enforce - // restrictions without a user prompt. - if sandbox_permissions.requests_sandbox_override() { + SandboxPolicy::ReadOnly { .. } | SandboxPolicy::WorkspaceWrite { .. } => { + // In restricted sandboxes (ReadOnly/WorkspaceWrite), do not prompt for + // non‑permission-expanding, non‑dangerous commands — let the sandbox enforce + // default restrictions. Explicit permission requests (either full escalation + // or additional sandbox permissions) always require approval. + if sandbox_permissions.requires_additional_permissions() { Decision::Prompt } else { Decision::Allow @@ -561,7 +562,7 @@ pub fn render_decision_for_unmatched_command( Decision::Allow } FileSystemSandboxKind::Restricted => { - if sandbox_permissions.requests_sandbox_override() { + if sandbox_permissions.requires_additional_permissions() { Decision::Prompt } else { Decision::Allow @@ -823,5 +824,1574 @@ async fn collect_policy_files(dir: impl AsRef) -> Result, Exe } #[cfg(test)] -#[path = "exec_policy_tests.rs"] -mod tests; +mod tests { + use super::*; + use crate::config_loader::ConfigLayerEntry; + use crate::config_loader::ConfigLayerStack; + use crate::config_loader::ConfigRequirements; + use crate::config_loader::ConfigRequirementsToml; + use codex_app_server_protocol::ConfigLayerSource; + use codex_protocol::protocol::AskForApproval; + use codex_protocol::protocol::RejectConfig; + use codex_protocol::protocol::SandboxPolicy; + use codex_utils_absolute_path::AbsolutePathBuf; + use pretty_assertions::assert_eq; + use std::fs; + use std::path::Path; + use std::path::PathBuf; + use std::sync::Arc; + use tempfile::tempdir; + use toml::Value as TomlValue; + + fn config_stack_for_dot_codex_folder(dot_codex_folder: &Path) -> ConfigLayerStack { + let dot_codex_folder = AbsolutePathBuf::from_absolute_path(dot_codex_folder) + .expect("absolute dot_codex_folder"); + let layer = ConfigLayerEntry::new( + ConfigLayerSource::Project { dot_codex_folder }, + TomlValue::Table(Default::default()), + ); + ConfigLayerStack::new( + vec![layer], + ConfigRequirements::default(), + ConfigRequirementsToml::default(), + ) + .expect("ConfigLayerStack") + } + + fn host_absolute_path(segments: &[&str]) -> String { + let mut path = if cfg!(windows) { + PathBuf::from(r"C:\") + } else { + PathBuf::from("/") + }; + for segment in segments { + path.push(segment); + } + path.to_string_lossy().into_owned() + } + + fn host_program_path(name: &str) -> String { + let executable_name = if cfg!(windows) { + format!("{name}.exe") + } else { + name.to_string() + }; + host_absolute_path(&["usr", "bin", &executable_name]) + } + + fn starlark_string(value: &str) -> String { + value.replace('\\', "\\\\").replace('"', "\\\"") + } + + #[tokio::test] + async fn returns_empty_policy_when_no_policy_files_exist() { + let temp_dir = tempdir().expect("create temp dir"); + let config_stack = config_stack_for_dot_codex_folder(temp_dir.path()); + + let manager = ExecPolicyManager::load(&config_stack) + .await + .expect("manager result"); + let policy = manager.current(); + + let commands = [vec!["rm".to_string()]]; + assert_eq!( + Evaluation { + decision: Decision::Allow, + matched_rules: vec![RuleMatch::HeuristicsRuleMatch { + command: vec!["rm".to_string()], + decision: Decision::Allow + }], + }, + policy.check_multiple(commands.iter(), &|_| Decision::Allow) + ); + assert!(!temp_dir.path().join(RULES_DIR_NAME).exists()); + } + + #[tokio::test] + async fn collect_policy_files_returns_empty_when_dir_missing() { + let temp_dir = tempdir().expect("create temp dir"); + + let policy_dir = temp_dir.path().join(RULES_DIR_NAME); + let files = collect_policy_files(&policy_dir) + .await + .expect("collect policy files"); + + assert!(files.is_empty()); + } + + #[tokio::test] + async fn format_exec_policy_error_with_source_renders_range() { + let temp_dir = tempdir().expect("create temp dir"); + let config_stack = config_stack_for_dot_codex_folder(temp_dir.path()); + let policy_dir = temp_dir.path().join(RULES_DIR_NAME); + fs::create_dir_all(&policy_dir).expect("create policy dir"); + let broken_path = policy_dir.join("broken.rules"); + fs::write( + &broken_path, + r#"prefix_rule( + pattern = ["tmux capture-pane"], + decision = "allow", + match = ["tmux capture-pane -p"], +)"#, + ) + .expect("write broken policy file"); + + let err = load_exec_policy(&config_stack) + .await + .expect_err("expected parse error"); + let rendered = format_exec_policy_error_with_source(&err); + + assert!(rendered.contains("broken.rules:1:")); + assert!(rendered.contains("on or around line 1")); + } + + #[test] + fn parse_starlark_line_from_message_extracts_path_and_line() { + let parsed = parse_starlark_line_from_message( + "/tmp/default.rules:143:1: starlark error: error: Parse error: unexpected new line", + ) + .expect("parse should succeed"); + + assert_eq!(parsed.0, PathBuf::from("/tmp/default.rules")); + assert_eq!(parsed.1, 143); + } + + #[test] + fn parse_starlark_line_from_message_rejects_zero_line() { + let parsed = parse_starlark_line_from_message( + "/tmp/default.rules:0:1: starlark error: error: Parse error: unexpected new line", + ); + assert_eq!(parsed, None); + } + + #[tokio::test] + async fn loads_policies_from_policy_subdirectory() { + let temp_dir = tempdir().expect("create temp dir"); + let config_stack = config_stack_for_dot_codex_folder(temp_dir.path()); + let policy_dir = temp_dir.path().join(RULES_DIR_NAME); + fs::create_dir_all(&policy_dir).expect("create policy dir"); + fs::write( + policy_dir.join("deny.rules"), + r#"prefix_rule(pattern=["rm"], decision="forbidden")"#, + ) + .expect("write policy file"); + + let policy = load_exec_policy(&config_stack) + .await + .expect("policy result"); + let command = [vec!["rm".to_string()]]; + assert_eq!( + Evaluation { + decision: Decision::Forbidden, + matched_rules: vec![RuleMatch::PrefixRuleMatch { + matched_prefix: vec!["rm".to_string()], + decision: Decision::Forbidden, + resolved_program: None, + justification: None, + }], + }, + policy.check_multiple(command.iter(), &|_| Decision::Allow) + ); + } + + #[tokio::test] + async fn merges_requirements_exec_policy_network_rules() -> anyhow::Result<()> { + let temp_dir = tempdir()?; + + let mut requirements_exec_policy = Policy::empty(); + requirements_exec_policy.add_network_rule( + "blocked.example.com", + codex_execpolicy::NetworkRuleProtocol::Https, + Decision::Forbidden, + None, + )?; + + let requirements = ConfigRequirements { + exec_policy: Some(codex_config::Sourced::new( + codex_config::RequirementsExecPolicy::new(requirements_exec_policy), + codex_config::RequirementSource::Unknown, + )), + ..ConfigRequirements::default() + }; + let dot_codex_folder = AbsolutePathBuf::from_absolute_path(temp_dir.path())?; + let layer = ConfigLayerEntry::new( + ConfigLayerSource::Project { dot_codex_folder }, + TomlValue::Table(Default::default()), + ); + let config_stack = + ConfigLayerStack::new(vec![layer], requirements, ConfigRequirementsToml::default())?; + + let policy = load_exec_policy(&config_stack).await?; + let (allowed, denied) = policy.compiled_network_domains(); + + assert!(allowed.is_empty()); + assert_eq!(denied, vec!["blocked.example.com".to_string()]); + Ok(()) + } + + #[tokio::test] + async fn preserves_host_executables_when_requirements_overlay_is_present() -> anyhow::Result<()> + { + let temp_dir = tempdir()?; + let policy_dir = temp_dir.path().join(RULES_DIR_NAME); + fs::create_dir_all(&policy_dir)?; + let git_path = host_absolute_path(&["usr", "bin", "git"]); + let git_path_literal = starlark_string(&git_path); + fs::write( + policy_dir.join("host.rules"), + format!( + r#" +host_executable(name = "git", paths = ["{git_path_literal}"]) +"# + ), + )?; + + let mut requirements_exec_policy = Policy::empty(); + requirements_exec_policy.add_network_rule( + "blocked.example.com", + codex_execpolicy::NetworkRuleProtocol::Https, + Decision::Forbidden, + None, + )?; + + let requirements = ConfigRequirements { + exec_policy: Some(codex_config::Sourced::new( + codex_config::RequirementsExecPolicy::new(requirements_exec_policy), + codex_config::RequirementSource::Unknown, + )), + ..ConfigRequirements::default() + }; + let dot_codex_folder = AbsolutePathBuf::from_absolute_path(temp_dir.path())?; + let layer = ConfigLayerEntry::new( + ConfigLayerSource::Project { dot_codex_folder }, + TomlValue::Table(Default::default()), + ); + let config_stack = + ConfigLayerStack::new(vec![layer], requirements, ConfigRequirementsToml::default())?; + + let policy = load_exec_policy(&config_stack).await?; + + assert_eq!( + policy + .host_executables() + .get("git") + .expect("missing git host executable") + .as_ref(), + [AbsolutePathBuf::try_from(git_path)?] + ); + Ok(()) + } + + #[tokio::test] + async fn ignores_policies_outside_policy_dir() { + let temp_dir = tempdir().expect("create temp dir"); + let config_stack = config_stack_for_dot_codex_folder(temp_dir.path()); + fs::write( + temp_dir.path().join("root.rules"), + r#"prefix_rule(pattern=["ls"], decision="prompt")"#, + ) + .expect("write policy file"); + + let policy = load_exec_policy(&config_stack) + .await + .expect("policy result"); + let command = [vec!["ls".to_string()]]; + assert_eq!( + Evaluation { + decision: Decision::Allow, + matched_rules: vec![RuleMatch::HeuristicsRuleMatch { + command: vec!["ls".to_string()], + decision: Decision::Allow + }], + }, + policy.check_multiple(command.iter(), &|_| Decision::Allow) + ); + } + + #[tokio::test] + async fn ignores_rules_from_untrusted_project_layers() -> anyhow::Result<()> { + let project_dir = tempdir()?; + let policy_dir = project_dir.path().join(RULES_DIR_NAME); + fs::create_dir_all(&policy_dir)?; + fs::write( + policy_dir.join("untrusted.rules"), + r#"prefix_rule(pattern=["ls"], decision="forbidden")"#, + )?; + + let project_dot_codex_folder = AbsolutePathBuf::from_absolute_path(project_dir.path())?; + let layers = vec![ConfigLayerEntry::new_disabled( + ConfigLayerSource::Project { + dot_codex_folder: project_dot_codex_folder, + }, + TomlValue::Table(Default::default()), + "marked untrusted", + )]; + let config_stack = ConfigLayerStack::new( + layers, + ConfigRequirements::default(), + ConfigRequirementsToml::default(), + )?; + + let policy = load_exec_policy(&config_stack).await?; + + assert_eq!( + Evaluation { + decision: Decision::Allow, + matched_rules: vec![RuleMatch::HeuristicsRuleMatch { + command: vec!["ls".to_string()], + decision: Decision::Allow, + }], + }, + policy.check_multiple([vec!["ls".to_string()]].iter(), &|_| Decision::Allow) + ); + Ok(()) + } + + #[tokio::test] + async fn loads_policies_from_multiple_config_layers() -> anyhow::Result<()> { + let user_dir = tempdir()?; + let project_dir = tempdir()?; + + let user_policy_dir = user_dir.path().join(RULES_DIR_NAME); + fs::create_dir_all(&user_policy_dir)?; + fs::write( + user_policy_dir.join("user.rules"), + r#"prefix_rule(pattern=["rm"], decision="forbidden")"#, + )?; + + let project_policy_dir = project_dir.path().join(RULES_DIR_NAME); + fs::create_dir_all(&project_policy_dir)?; + fs::write( + project_policy_dir.join("project.rules"), + r#"prefix_rule(pattern=["ls"], decision="prompt")"#, + )?; + + let user_config_toml = + AbsolutePathBuf::from_absolute_path(user_dir.path().join("config.toml"))?; + let project_dot_codex_folder = AbsolutePathBuf::from_absolute_path(project_dir.path())?; + let layers = vec![ + ConfigLayerEntry::new( + ConfigLayerSource::User { + file: user_config_toml, + }, + TomlValue::Table(Default::default()), + ), + ConfigLayerEntry::new( + ConfigLayerSource::Project { + dot_codex_folder: project_dot_codex_folder, + }, + TomlValue::Table(Default::default()), + ), + ]; + let config_stack = ConfigLayerStack::new( + layers, + ConfigRequirements::default(), + ConfigRequirementsToml::default(), + )?; + + let policy = load_exec_policy(&config_stack).await?; + + assert_eq!( + Evaluation { + decision: Decision::Forbidden, + matched_rules: vec![RuleMatch::PrefixRuleMatch { + matched_prefix: vec!["rm".to_string()], + decision: Decision::Forbidden, + resolved_program: None, + justification: None, + }], + }, + policy.check_multiple([vec!["rm".to_string()]].iter(), &|_| Decision::Allow) + ); + assert_eq!( + Evaluation { + decision: Decision::Prompt, + matched_rules: vec![RuleMatch::PrefixRuleMatch { + matched_prefix: vec!["ls".to_string()], + decision: Decision::Prompt, + resolved_program: None, + justification: None, + }], + }, + policy.check_multiple([vec!["ls".to_string()]].iter(), &|_| Decision::Allow) + ); + Ok(()) + } + + #[tokio::test] + async fn evaluates_bash_lc_inner_commands() { + let policy_src = r#" +prefix_rule(pattern=["rm"], decision="forbidden") +"#; + let mut parser = PolicyParser::new(); + parser + .parse("test.rules", policy_src) + .expect("parse policy"); + let policy = Arc::new(parser.build()); + + let forbidden_script = vec![ + "bash".to_string(), + "-lc".to_string(), + "rm -rf /some/important/folder".to_string(), + ]; + + let manager = ExecPolicyManager::new(policy); + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &forbidden_script, + approval_policy: AskForApproval::OnRequest, + sandbox_policy: &SandboxPolicy::DangerFullAccess, + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::Forbidden { + reason: "`bash -lc 'rm -rf /some/important/folder'` rejected: policy forbids commands starting with `rm`".to_string() + } + ); + } + + #[test] + fn commands_for_exec_policy_falls_back_for_empty_shell_script() { + let command = vec!["bash".to_string(), "-lc".to_string(), "".to_string()]; + + assert_eq!(commands_for_exec_policy(&command), (vec![command], false)); + } + + #[test] + fn commands_for_exec_policy_falls_back_for_whitespace_shell_script() { + let command = vec![ + "bash".to_string(), + "-lc".to_string(), + " \n\t ".to_string(), + ]; + + assert_eq!(commands_for_exec_policy(&command), (vec![command], false)); + } + + #[tokio::test] + async fn evaluates_heredoc_script_against_prefix_rules() { + let policy_src = r#"prefix_rule(pattern=["python3"], decision="allow")"#; + let mut parser = PolicyParser::new(); + parser + .parse("test.rules", policy_src) + .expect("parse policy"); + let policy = Arc::new(parser.build()); + let command = vec![ + "bash".to_string(), + "-lc".to_string(), + "python3 <<'PY'\nprint('hello')\nPY".to_string(), + ]; + + let requirement = ExecPolicyManager::new(policy) + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::OnRequest, + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::Skip { + bypass_sandbox: true, + proposed_execpolicy_amendment: None, + } + ); + } + + #[tokio::test] + async fn omits_auto_amendment_for_heredoc_fallback_prompts() { + let command = vec![ + "bash".to_string(), + "-lc".to_string(), + "python3 <<'PY'\nprint('hello')\nPY".to_string(), + ]; + + let requirement = ExecPolicyManager::default() + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::UnlessTrusted, + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::NeedsApproval { + reason: None, + proposed_execpolicy_amendment: None, + } + ); + } + + #[tokio::test] + async fn drops_requested_amendment_for_heredoc_fallback_prompts_when_it_wont_match() { + let command = vec![ + "bash".to_string(), + "-lc".to_string(), + "python3 <<'PY'\nprint('hello')\nPY".to_string(), + ]; + let requested_prefix = vec!["python3".to_string(), "-m".to_string(), "pip".to_string()]; + + let requirement = ExecPolicyManager::default() + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::UnlessTrusted, + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: Some(requested_prefix.clone()), + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::NeedsApproval { + reason: None, + proposed_execpolicy_amendment: None, + } + ); + } + + #[tokio::test] + async fn justification_is_included_in_forbidden_exec_approval_requirement() { + let policy_src = r#" +prefix_rule( + pattern=["rm"], + decision="forbidden", + justification="destructive command", +) +"#; + let mut parser = PolicyParser::new(); + parser + .parse("test.rules", policy_src) + .expect("parse policy"); + let policy = Arc::new(parser.build()); + + let manager = ExecPolicyManager::new(policy); + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &[ + "rm".to_string(), + "-rf".to_string(), + "/some/important/folder".to_string(), + ], + approval_policy: AskForApproval::OnRequest, + sandbox_policy: &SandboxPolicy::DangerFullAccess, + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::Forbidden { + reason: "`rm -rf /some/important/folder` rejected: destructive command".to_string() + } + ); + } + + #[tokio::test] + async fn exec_approval_requirement_prefers_execpolicy_match() { + let policy_src = r#"prefix_rule(pattern=["rm"], decision="prompt")"#; + let mut parser = PolicyParser::new(); + parser + .parse("test.rules", policy_src) + .expect("parse policy"); + let policy = Arc::new(parser.build()); + let command = vec!["rm".to_string()]; + + let manager = ExecPolicyManager::new(policy); + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::OnRequest, + sandbox_policy: &SandboxPolicy::DangerFullAccess, + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::NeedsApproval { + reason: Some("`rm` requires approval by policy".to_string()), + proposed_execpolicy_amendment: None, + } + ); + } + + #[tokio::test] + async fn absolute_path_exec_approval_requirement_matches_host_executable_rules() { + let git_path = host_program_path("git"); + let git_path_literal = starlark_string(&git_path); + let policy_src = format!( + r#" +host_executable(name = "git", paths = ["{git_path_literal}"]) +prefix_rule(pattern=["git"], decision="allow") +"# + ); + let mut parser = PolicyParser::new(); + parser + .parse("test.rules", &policy_src) + .expect("parse policy"); + let manager = ExecPolicyManager::new(Arc::new(parser.build())); + let command = vec![git_path, "status".to_string()]; + + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::UnlessTrusted, + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::Skip { + bypass_sandbox: true, + proposed_execpolicy_amendment: None, + } + ); + } + + #[tokio::test] + async fn absolute_path_exec_approval_requirement_ignores_disallowed_host_executable_paths() { + let allowed_git_path = host_program_path("git"); + let disallowed_git_path = host_absolute_path(&[ + "opt", + "homebrew", + "bin", + if cfg!(windows) { "git.exe" } else { "git" }, + ]); + let allowed_git_path_literal = starlark_string(&allowed_git_path); + let policy_src = format!( + r#" +host_executable(name = "git", paths = ["{allowed_git_path_literal}"]) +prefix_rule(pattern=["git"], decision="prompt") +"# + ); + let mut parser = PolicyParser::new(); + parser + .parse("test.rules", &policy_src) + .expect("parse policy"); + let manager = ExecPolicyManager::new(Arc::new(parser.build())); + let command = vec![disallowed_git_path, "status".to_string()]; + + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::UnlessTrusted, + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::Skip { + bypass_sandbox: false, + proposed_execpolicy_amendment: Some(ExecPolicyAmendment::new(command)), + } + ); + } + + #[tokio::test] + async fn requested_prefix_rule_can_approve_absolute_path_commands() { + let command = vec![ + host_program_path("cargo"), + "install".to_string(), + "cargo-insta".to_string(), + ]; + let manager = ExecPolicyManager::default(); + + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::UnlessTrusted, + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: Some(vec!["cargo".to_string(), "install".to_string()]), + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::NeedsApproval { + reason: None, + proposed_execpolicy_amendment: Some(ExecPolicyAmendment::new(vec![ + "cargo".to_string(), + "install".to_string(), + ])), + } + ); + } + + #[tokio::test] + async fn exec_approval_requirement_respects_approval_policy() { + let policy_src = r#"prefix_rule(pattern=["rm"], decision="prompt")"#; + let mut parser = PolicyParser::new(); + parser + .parse("test.rules", policy_src) + .expect("parse policy"); + let policy = Arc::new(parser.build()); + let command = vec!["rm".to_string()]; + + let manager = ExecPolicyManager::new(policy); + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::Never, + sandbox_policy: &SandboxPolicy::DangerFullAccess, + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::Forbidden { + reason: PROMPT_CONFLICT_REASON.to_string() + } + ); + } + + #[test] + fn unmatched_reject_policy_still_prompts_for_restricted_sandbox_escalation() { + let command = vec!["madeup-cmd".to_string()]; + + assert_eq!( + Decision::Prompt, + render_decision_for_unmatched_command( + AskForApproval::Reject(RejectConfig { + sandbox_approval: false, + rules: false, + skill_approval: false, + request_permissions: false, + mcp_elicitations: false, + }), + &SandboxPolicy::new_read_only_policy(), + &command, + SandboxPermissions::RequireEscalated, + false, + ) + ); + + assert_eq!( + Decision::Prompt, + render_decision_for_unmatched_command( + AskForApproval::Reject(RejectConfig { + sandbox_approval: false, + rules: false, + skill_approval: false, + request_permissions: false, + mcp_elicitations: false, + }), + &SandboxPolicy::new_read_only_policy(), + &command, + SandboxPermissions::WithAdditionalPermissions, + false, + ) + ); + } + + #[test] + fn unmatched_on_request_policy_prompts_for_restricted_sandbox_additional_permissions() { + let command = vec!["madeup-cmd".to_string()]; + + assert_eq!( + Decision::Prompt, + render_decision_for_unmatched_command( + AskForApproval::OnRequest, + &SandboxPolicy::new_read_only_policy(), + &command, + SandboxPermissions::WithAdditionalPermissions, + false, + ) + ); + } + + #[tokio::test] + async fn exec_approval_requirement_rejects_unmatched_sandbox_escalation_when_sandbox_rejection_enabled() + { + let command = vec!["madeup-cmd".to_string()]; + + let requirement = ExecPolicyManager::default() + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::Reject(RejectConfig { + sandbox_approval: true, + rules: false, + skill_approval: false, + request_permissions: false, + mcp_elicitations: false, + }), + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: SandboxPermissions::RequireEscalated, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::Forbidden { + reason: REJECT_SANDBOX_APPROVAL_REASON.to_string(), + } + ); + } + + #[tokio::test] + async fn mixed_rule_and_sandbox_prompt_prioritizes_rule_for_rejection_decision() { + let policy_src = r#"prefix_rule(pattern=["git"], decision="prompt")"#; + let mut parser = PolicyParser::new(); + parser + .parse("test.rules", policy_src) + .expect("parse policy"); + let manager = ExecPolicyManager::new(Arc::new(parser.build())); + let command = vec![ + "bash".to_string(), + "-lc".to_string(), + "git status && madeup-cmd".to_string(), + ]; + + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::Reject(RejectConfig { + sandbox_approval: true, + rules: false, + skill_approval: false, + request_permissions: false, + mcp_elicitations: false, + }), + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: SandboxPermissions::RequireEscalated, + prefix_rule: None, + }) + .await; + + assert!(matches!( + requirement, + ExecApprovalRequirement::NeedsApproval { .. } + )); + } + + #[tokio::test] + async fn mixed_rule_and_sandbox_prompt_rejects_when_rules_rejection_enabled() { + let policy_src = r#"prefix_rule(pattern=["git"], decision="prompt")"#; + let mut parser = PolicyParser::new(); + parser + .parse("test.rules", policy_src) + .expect("parse policy"); + let manager = ExecPolicyManager::new(Arc::new(parser.build())); + let command = vec![ + "bash".to_string(), + "-lc".to_string(), + "git status && madeup-cmd".to_string(), + ]; + + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::Reject(RejectConfig { + sandbox_approval: false, + rules: true, + skill_approval: false, + request_permissions: false, + mcp_elicitations: false, + }), + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: SandboxPermissions::RequireEscalated, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::Forbidden { + reason: REJECT_RULES_APPROVAL_REASON.to_string(), + } + ); + } + + #[tokio::test] + async fn exec_approval_requirement_falls_back_to_heuristics() { + let command = vec!["cargo".to_string(), "build".to_string()]; + + let manager = ExecPolicyManager::default(); + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::UnlessTrusted, + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::NeedsApproval { + reason: None, + proposed_execpolicy_amendment: Some(ExecPolicyAmendment::new(command)) + } + ); + } + + #[tokio::test] + async fn empty_bash_lc_script_falls_back_to_original_command() { + let command = vec!["bash".to_string(), "-lc".to_string(), "".to_string()]; + + let manager = ExecPolicyManager::default(); + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::UnlessTrusted, + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::NeedsApproval { + reason: None, + proposed_execpolicy_amendment: Some(ExecPolicyAmendment::new(command)), + } + ); + } + + #[tokio::test] + async fn whitespace_bash_lc_script_falls_back_to_original_command() { + let command = vec![ + "bash".to_string(), + "-lc".to_string(), + " \n\t ".to_string(), + ]; + + let manager = ExecPolicyManager::default(); + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::UnlessTrusted, + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::NeedsApproval { + reason: None, + proposed_execpolicy_amendment: Some(ExecPolicyAmendment::new(command)), + } + ); + } + + #[tokio::test] + async fn request_rule_uses_prefix_rule() { + let command = vec![ + "cargo".to_string(), + "install".to_string(), + "cargo-insta".to_string(), + ]; + let manager = ExecPolicyManager::default(); + + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::OnRequest, + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: SandboxPermissions::RequireEscalated, + prefix_rule: Some(vec!["cargo".to_string(), "install".to_string()]), + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::NeedsApproval { + reason: None, + proposed_execpolicy_amendment: Some(ExecPolicyAmendment::new(vec![ + "cargo".to_string(), + "install".to_string(), + ])), + } + ); + } + + #[tokio::test] + async fn request_rule_falls_back_when_prefix_rule_does_not_approve_all_commands() { + let command = vec![ + "bash".to_string(), + "-lc".to_string(), + "cargo install cargo-insta && rm -rf /tmp/codex".to_string(), + ]; + let manager = ExecPolicyManager::default(); + + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::OnRequest, + sandbox_policy: &SandboxPolicy::DangerFullAccess, + sandbox_permissions: SandboxPermissions::RequireEscalated, + prefix_rule: Some(vec!["cargo".to_string(), "install".to_string()]), + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::NeedsApproval { + reason: None, + proposed_execpolicy_amendment: Some(ExecPolicyAmendment::new(vec![ + "rm".to_string(), + "-rf".to_string(), + "/tmp/codex".to_string(), + ])), + } + ); + } + + #[tokio::test] + async fn heuristics_apply_when_other_commands_match_policy() { + let policy_src = r#"prefix_rule(pattern=["apple"], decision="allow")"#; + let mut parser = PolicyParser::new(); + parser + .parse("test.rules", policy_src) + .expect("parse policy"); + let policy = Arc::new(parser.build()); + let command = vec![ + "bash".to_string(), + "-lc".to_string(), + "apple | orange".to_string(), + ]; + + assert_eq!( + ExecPolicyManager::new(policy) + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::UnlessTrusted, + sandbox_policy: &SandboxPolicy::DangerFullAccess, + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await, + ExecApprovalRequirement::NeedsApproval { + reason: None, + proposed_execpolicy_amendment: Some(ExecPolicyAmendment::new(vec![ + "orange".to_string() + ])) + } + ); + } + + #[tokio::test] + async fn append_execpolicy_amendment_updates_policy_and_file() { + let codex_home = tempdir().expect("create temp dir"); + let prefix = vec!["echo".to_string(), "hello".to_string()]; + let manager = ExecPolicyManager::default(); + + manager + .append_amendment_and_update(codex_home.path(), &ExecPolicyAmendment::from(prefix)) + .await + .expect("update policy"); + let updated_policy = manager.current(); + + let evaluation = updated_policy.check( + &["echo".to_string(), "hello".to_string(), "world".to_string()], + &|_| Decision::Allow, + ); + assert!(matches!( + evaluation, + Evaluation { + decision: Decision::Allow, + .. + } + )); + + let contents = fs::read_to_string(default_policy_path(codex_home.path())) + .expect("policy file should have been created"); + assert_eq!( + contents, + r#"prefix_rule(pattern=["echo", "hello"], decision="allow") +"# + ); + } + + #[tokio::test] + async fn append_execpolicy_amendment_rejects_empty_prefix() { + let codex_home = tempdir().expect("create temp dir"); + let manager = ExecPolicyManager::default(); + + let result = manager + .append_amendment_and_update(codex_home.path(), &ExecPolicyAmendment::from(vec![])) + .await; + + assert!(matches!( + result, + Err(ExecPolicyUpdateError::AppendRule { + source: AmendError::EmptyPrefix, + .. + }) + )); + } + + #[tokio::test] + async fn proposed_execpolicy_amendment_is_present_for_single_command_without_policy_match() { + let command = vec!["cargo".to_string(), "build".to_string()]; + + let manager = ExecPolicyManager::default(); + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::UnlessTrusted, + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::NeedsApproval { + reason: None, + proposed_execpolicy_amendment: Some(ExecPolicyAmendment::new(command)) + } + ); + } + + #[tokio::test] + async fn proposed_execpolicy_amendment_is_omitted_when_policy_prompts() { + let policy_src = r#"prefix_rule(pattern=["rm"], decision="prompt")"#; + let mut parser = PolicyParser::new(); + parser + .parse("test.rules", policy_src) + .expect("parse policy"); + let policy = Arc::new(parser.build()); + let command = vec!["rm".to_string()]; + + let manager = ExecPolicyManager::new(policy); + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::OnRequest, + sandbox_policy: &SandboxPolicy::DangerFullAccess, + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::NeedsApproval { + reason: Some("`rm` requires approval by policy".to_string()), + proposed_execpolicy_amendment: None, + } + ); + } + + #[tokio::test] + async fn proposed_execpolicy_amendment_is_present_for_multi_command_scripts() { + let command = vec![ + "bash".to_string(), + "-lc".to_string(), + "cargo build && echo ok".to_string(), + ]; + let manager = ExecPolicyManager::default(); + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::UnlessTrusted, + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::NeedsApproval { + reason: None, + proposed_execpolicy_amendment: Some(ExecPolicyAmendment::new(vec![ + "cargo".to_string(), + "build".to_string() + ])), + } + ); + } + + #[tokio::test] + async fn proposed_execpolicy_amendment_uses_first_no_match_in_multi_command_scripts() { + let policy_src = r#"prefix_rule(pattern=["cat"], decision="allow")"#; + let mut parser = PolicyParser::new(); + parser + .parse("test.rules", policy_src) + .expect("parse policy"); + let policy = Arc::new(parser.build()); + + let command = vec![ + "bash".to_string(), + "-lc".to_string(), + "cat && apple".to_string(), + ]; + + assert_eq!( + ExecPolicyManager::new(policy) + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::UnlessTrusted, + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await, + ExecApprovalRequirement::NeedsApproval { + reason: None, + proposed_execpolicy_amendment: Some(ExecPolicyAmendment::new(vec![ + "apple".to_string() + ])), + } + ); + } + + #[tokio::test] + async fn proposed_execpolicy_amendment_is_present_when_heuristics_allow() { + let command = vec!["echo".to_string(), "safe".to_string()]; + + let manager = ExecPolicyManager::default(); + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::OnRequest, + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::Skip { + bypass_sandbox: false, + proposed_execpolicy_amendment: Some(ExecPolicyAmendment::new(command)), + } + ); + } + + #[tokio::test] + async fn proposed_execpolicy_amendment_is_suppressed_when_policy_matches_allow() { + let policy_src = r#"prefix_rule(pattern=["echo"], decision="allow")"#; + let mut parser = PolicyParser::new(); + parser + .parse("test.rules", policy_src) + .expect("parse policy"); + let policy = Arc::new(parser.build()); + let command = vec!["echo".to_string(), "safe".to_string()]; + + let manager = ExecPolicyManager::new(policy); + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::OnRequest, + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::Skip { + bypass_sandbox: true, + proposed_execpolicy_amendment: None, + } + ); + } + + fn derive_requested_execpolicy_amendment_for_test( + prefix_rule: Option<&Vec>, + matched_rules: &[RuleMatch], + ) -> Option { + let commands = prefix_rule + .cloned() + .map(|prefix_rule| vec![prefix_rule]) + .unwrap_or_else(|| vec![vec!["echo".to_string()]]); + derive_requested_execpolicy_amendment_from_prefix_rule( + prefix_rule, + matched_rules, + &Policy::empty(), + &commands, + &|_: &[String]| Decision::Allow, + &MatchOptions::default(), + ) + } + + #[test] + fn derive_requested_execpolicy_amendment_returns_none_for_missing_prefix_rule() { + assert_eq!( + None, + derive_requested_execpolicy_amendment_for_test(None, &[]) + ); + } + + #[test] + fn derive_requested_execpolicy_amendment_returns_none_for_empty_prefix_rule() { + assert_eq!( + None, + derive_requested_execpolicy_amendment_for_test(Some(&Vec::new()), &[]) + ); + } + + #[test] + fn derive_requested_execpolicy_amendment_returns_none_for_exact_banned_prefix_rule() { + assert_eq!( + None, + derive_requested_execpolicy_amendment_for_test( + Some(&vec!["python".to_string(), "-c".to_string()]), + &[], + ) + ); + } + + #[test] + fn derive_requested_execpolicy_amendment_returns_none_for_windows_and_pypy_variants() { + for prefix_rule in [ + vec!["py".to_string()], + vec!["py".to_string(), "-3".to_string()], + vec!["pythonw".to_string()], + vec!["pyw".to_string()], + vec!["pypy".to_string()], + vec!["pypy3".to_string()], + ] { + assert_eq!( + None, + derive_requested_execpolicy_amendment_for_test(Some(&prefix_rule), &[]) + ); + } + } + + #[test] + fn derive_requested_execpolicy_amendment_returns_none_for_shell_and_powershell_variants() { + for prefix_rule in [ + vec!["bash".to_string(), "-lc".to_string()], + vec!["sh".to_string(), "-c".to_string()], + vec!["sh".to_string(), "-lc".to_string()], + vec!["zsh".to_string(), "-lc".to_string()], + vec!["/bin/bash".to_string(), "-lc".to_string()], + vec!["/bin/zsh".to_string(), "-lc".to_string()], + vec!["pwsh".to_string()], + vec!["pwsh".to_string(), "-Command".to_string()], + vec!["pwsh".to_string(), "-c".to_string()], + vec!["powershell".to_string()], + vec!["powershell".to_string(), "-Command".to_string()], + vec!["powershell".to_string(), "-c".to_string()], + vec!["powershell.exe".to_string()], + vec!["powershell.exe".to_string(), "-Command".to_string()], + vec!["powershell.exe".to_string(), "-c".to_string()], + ] { + assert_eq!( + None, + derive_requested_execpolicy_amendment_for_test(Some(&prefix_rule), &[]) + ); + } + } + + #[test] + fn derive_requested_execpolicy_amendment_allows_non_exact_banned_prefix_rule_match() { + let prefix_rule = vec![ + "python".to_string(), + "-c".to_string(), + "print('hi')".to_string(), + ]; + + assert_eq!( + Some(ExecPolicyAmendment::new(prefix_rule.clone())), + derive_requested_execpolicy_amendment_for_test(Some(&prefix_rule), &[]) + ); + } + + #[test] + fn derive_requested_execpolicy_amendment_returns_none_when_policy_matches() { + let prefix_rule = vec!["cargo".to_string(), "build".to_string()]; + + let matched_rules_prompt = vec![RuleMatch::PrefixRuleMatch { + matched_prefix: vec!["cargo".to_string()], + decision: Decision::Prompt, + resolved_program: None, + justification: None, + }]; + assert_eq!( + None, + derive_requested_execpolicy_amendment_for_test( + Some(&prefix_rule), + &matched_rules_prompt + ), + "should return none when prompt policy matches" + ); + let matched_rules_allow = vec![RuleMatch::PrefixRuleMatch { + matched_prefix: vec!["cargo".to_string()], + decision: Decision::Allow, + resolved_program: None, + justification: None, + }]; + assert_eq!( + None, + derive_requested_execpolicy_amendment_for_test( + Some(&prefix_rule), + &matched_rules_allow + ), + "should return none when prompt policy matches" + ); + let matched_rules_forbidden = vec![RuleMatch::PrefixRuleMatch { + matched_prefix: vec!["cargo".to_string()], + decision: Decision::Forbidden, + resolved_program: None, + justification: None, + }]; + assert_eq!( + None, + derive_requested_execpolicy_amendment_for_test( + Some(&prefix_rule), + &matched_rules_forbidden, + ), + "should return none when prompt policy matches" + ); + } + + #[tokio::test] + async fn dangerous_rm_rf_requires_approval_in_danger_full_access() { + let command = vec_str(&["rm", "-rf", "/tmp/nonexistent"]); + let manager = ExecPolicyManager::default(); + let requirement = manager + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &command, + approval_policy: AskForApproval::OnRequest, + sandbox_policy: &SandboxPolicy::DangerFullAccess, + sandbox_permissions: SandboxPermissions::UseDefault, + prefix_rule: None, + }) + .await; + + assert_eq!( + requirement, + ExecApprovalRequirement::NeedsApproval { + reason: None, + proposed_execpolicy_amendment: Some(ExecPolicyAmendment::new(command)), + } + ); + } + + fn vec_str(items: &[&str]) -> Vec { + items.iter().map(std::string::ToString::to_string).collect() + } + + /// Note this test behaves differently on Windows because it exercises an + /// `if cfg!(windows)` code path in render_decision_for_unmatched_command(). + #[tokio::test] + async fn verify_approval_requirement_for_unsafe_powershell_command() { + // `brew install powershell` to run this test on a Mac! + // Note `pwsh` is required to parse a PowerShell command to see if it + // is safe. + if which::which("pwsh").is_err() { + return; + } + + let policy = ExecPolicyManager::new(Arc::new(Policy::empty())); + let permissions = SandboxPermissions::UseDefault; + + // This command should not be run without user approval unless there is + // a proper sandbox in place to ensure safety. + let sneaky_command = vec_str(&["pwsh", "-Command", "echo hi @(calc)"]); + let expected_amendment = Some(ExecPolicyAmendment::new(vec_str(&[ + "pwsh", + "-Command", + "echo hi @(calc)", + ]))); + let (pwsh_approval_reason, expected_req) = if cfg!(windows) { + ( + r#"On Windows, SandboxPolicy::ReadOnly should be assumed to mean + that no sandbox is present, so anything that is not "provably + safe" should require approval."#, + ExecApprovalRequirement::NeedsApproval { + reason: None, + proposed_execpolicy_amendment: expected_amendment.clone(), + }, + ) + } else { + ( + "On non-Windows, rely on the read-only sandbox to prevent harm.", + ExecApprovalRequirement::Skip { + bypass_sandbox: false, + proposed_execpolicy_amendment: expected_amendment.clone(), + }, + ) + }; + assert_eq!( + expected_req, + policy + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &sneaky_command, + approval_policy: AskForApproval::OnRequest, + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: permissions, + prefix_rule: None, + }) + .await, + "{pwsh_approval_reason}" + ); + + // This is flagged as a dangerous command on all platforms. + let dangerous_command = vec_str(&["rm", "-rf", "/important/data"]); + assert_eq!( + ExecApprovalRequirement::NeedsApproval { + reason: None, + proposed_execpolicy_amendment: Some(ExecPolicyAmendment::new(vec_str(&[ + "rm", + "-rf", + "/important/data", + ]))), + }, + policy + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &dangerous_command, + approval_policy: AskForApproval::OnRequest, + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: permissions, + prefix_rule: None, + }) + .await, + r#"On all platforms, a forbidden command should require approval + (unless AskForApproval::Never is specified)."# + ); + + // A dangerous command should be forbidden if the user has specified + // AskForApproval::Never. + assert_eq!( + ExecApprovalRequirement::Forbidden { + reason: "`rm -rf /important/data` rejected: blocked by policy".to_string(), + }, + policy + .create_exec_approval_requirement_for_command(ExecApprovalRequest { + command: &dangerous_command, + approval_policy: AskForApproval::Never, + sandbox_policy: &SandboxPolicy::new_read_only_policy(), + sandbox_permissions: permissions, + prefix_rule: None, + }) + .await, + r#"On all platforms, a forbidden command should require approval + (unless AskForApproval::Never is specified)."# + ); + } +} diff --git a/codex-rs/core/src/instructions/contextual_user_fragments.rs b/codex-rs/core/src/instructions/contextual_user_fragments.rs new file mode 100644 index 0000000000..f38138b772 --- /dev/null +++ b/codex-rs/core/src/instructions/contextual_user_fragments.rs @@ -0,0 +1,248 @@ +//! Contextual-user model-visible fragments used by initial-context assembly. +//! +//! These fragments represent injected user-role context (for example AGENTS.md, +//! skills, and plugin guidance) and include turn-context extraction/diffing for +//! AGENTS.md instructions. + +use serde::Deserialize; +use serde::Serialize; + +use crate::codex::TurnContext; +use crate::model_visible_context::ContextualUserContextRole; +use crate::model_visible_context::ContextualUserFragmentDetector; +use crate::model_visible_context::ContextualUserFragmentMarkers; +use crate::model_visible_context::ModelVisibleContextFragment; +use crate::model_visible_context::PLUGINS_CLOSE_TAG; +use crate::model_visible_context::PLUGINS_OPEN_TAG; +use crate::model_visible_context::SKILL_CLOSE_TAG; +use crate::model_visible_context::SKILL_OPEN_TAG; +use crate::model_visible_context::TaggedContextualUserFragment; +use crate::model_visible_context::TurnContextDiffFragment; +use crate::model_visible_context::TurnContextDiffParams; +use codex_protocol::protocol::TurnContextItem; + +// --------------------------------------------------------------------------- +// AGENTS instructions fragment +// --------------------------------------------------------------------------- + +const AGENTS_MD_START_MARKER: &str = "# AGENTS.md instructions for "; +const AGENTS_MD_END_MARKER: &str = ""; + +#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)] +#[serde(rename = "user_instructions", rename_all = "snake_case")] +pub(crate) struct AgentsMdInstructions { + pub directory: String, + pub text: String, +} + +impl ModelVisibleContextFragment for AgentsMdInstructions { + type Role = ContextualUserContextRole; + + fn render_text(&self) -> String { + // TODO(ccunningham): Switch AGENTS.md rendering/detection to + // `...` for consistency with the + // other contextual-user fragments. + format!( + "{prefix}{directory}\n\n\n{contents}\n{suffix}", + prefix = AGENTS_MD_START_MARKER, + directory = self.directory, + contents = self.text, + suffix = AGENTS_MD_END_MARKER, + ) + } +} + +impl TurnContextDiffFragment for AgentsMdInstructions { + fn build( + turn_context: &TurnContext, + reference_context_item: Option<&TurnContextItem>, + _params: &TurnContextDiffParams<'_>, + ) -> Option { + let text = turn_context.user_instructions.as_ref()?.clone(); + let current = Self { + directory: turn_context.cwd.to_string_lossy().into_owned(), + text, + }; + if let Some(previous) = reference_context_item { + let previous_directory = previous.cwd.to_string_lossy().into_owned(); + if previous_directory == current.directory + && previous.user_instructions.as_deref() == Some(current.text.as_str()) + { + return None; + } + } + + Some(current) + } +} + +impl ContextualUserFragmentDetector for AgentsMdInstructions { + fn matches_contextual_user_text(text: &str) -> bool { + let trimmed = text.trim_start(); + // TODO(ccunningham): Switch detection to the XML-ish wrapper once we + // intentionally change the shipped AGENTS.md fragment format. + trimmed.starts_with(AGENTS_MD_START_MARKER) + && trimmed.trim_end().ends_with(AGENTS_MD_END_MARKER) + } +} + +// --------------------------------------------------------------------------- +// Skills fragment +// --------------------------------------------------------------------------- + +#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)] +#[serde(rename = "skill_instructions", rename_all = "snake_case")] +pub(crate) struct SkillInstructions { + pub name: String, + pub path: String, + pub contents: String, +} + +impl ModelVisibleContextFragment for SkillInstructions { + type Role = ContextualUserContextRole; + + fn render_text(&self) -> String { + Self::wrap_contextual_user_body(format!( + "{}\n{}\n{}", + self.name, self.path, self.contents + )) + } +} + +impl TaggedContextualUserFragment for SkillInstructions { + const MARKERS: ContextualUserFragmentMarkers = + ContextualUserFragmentMarkers::new(SKILL_OPEN_TAG, SKILL_CLOSE_TAG); +} + +// --------------------------------------------------------------------------- +// Plugins fragment +// --------------------------------------------------------------------------- + +#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)] +#[serde(rename = "plugin_instructions", rename_all = "snake_case")] +pub(crate) struct PluginInstructions { + pub text: String, +} + +impl ModelVisibleContextFragment for PluginInstructions { + type Role = ContextualUserContextRole; + + fn render_text(&self) -> String { + Self::wrap_contextual_user_body(self.text.clone()) + } +} + +impl TaggedContextualUserFragment for PluginInstructions { + const MARKERS: ContextualUserFragmentMarkers = + ContextualUserFragmentMarkers::new(PLUGINS_OPEN_TAG, PLUGINS_CLOSE_TAG); +} + +// --------------------------------------------------------------------------- +// Tests +// --------------------------------------------------------------------------- + +#[cfg(test)] +mod tests { + use super::*; + use codex_protocol::models::ContentItem; + use codex_protocol::models::ResponseItem; + use pretty_assertions::assert_eq; + + #[test] + fn test_user_instructions() { + let user_instructions = AgentsMdInstructions { + directory: "test_directory".to_string(), + text: "test_text".to_string(), + }; + let response_item = user_instructions.into_message(); + + let ResponseItem::Message { role, content, .. } = response_item else { + panic!("expected ResponseItem::Message"); + }; + + assert_eq!(role, "user"); + + let [ContentItem::InputText { text }] = content.as_slice() else { + panic!("expected one InputText content item"); + }; + + assert_eq!( + text, + "# AGENTS.md instructions for test_directory\n\n\ntest_text\n", + ); + } + + #[test] + fn test_is_user_instructions() { + assert!(crate::model_visible_context::is_contextual_user_fragment( + &ContentItem::InputText { + text: "# AGENTS.md instructions for test_directory\n\n\ntest_text\n" + .to_string(), + } + )); + assert!( + ::matches_contextual_user_text( + "# AGENTS.md instructions for test_directory\n\n\ntest_text\n" + ) + ); + } + + #[test] + fn test_skill_instructions() { + let skill_instructions = SkillInstructions { + name: "demo-skill".to_string(), + path: "skills/demo/SKILL.md".to_string(), + contents: "body".to_string(), + }; + let response_item = skill_instructions.into_message(); + + let ResponseItem::Message { role, content, .. } = response_item else { + panic!("expected ResponseItem::Message"); + }; + + assert_eq!(role, "user"); + + let [ContentItem::InputText { text }] = content.as_slice() else { + panic!("expected one InputText content item"); + }; + + assert_eq!( + text, + "\ndemo-skill\nskills/demo/SKILL.md\nbody\n", + ); + } + + #[test] + fn test_is_skill_instructions() { + assert!( + ::matches_contextual_user_text( + "\ndemo-skill\nskills/demo/SKILL.md\nbody\n" + ) + ); + assert!( + !::matches_contextual_user_text( + "regular text" + ) + ); + } + + #[test] + fn test_plugin_instructions() { + let plugin_instructions = PluginInstructions { + text: "## Plugins\n- `sample`".to_string(), + }; + let response_item = plugin_instructions.into_message(); + + let ResponseItem::Message { role, content, .. } = response_item else { + panic!("expected ResponseItem::Message"); + }; + + assert_eq!(role, "user"); + + let [ContentItem::InputText { text }] = content.as_slice() else { + panic!("expected one InputText content item"); + }; + + assert_eq!(text, "\n## Plugins\n- `sample`\n"); + } +} diff --git a/codex-rs/core/src/instructions/mod.rs b/codex-rs/core/src/instructions/mod.rs index 9f1d95d2f6..8c22f67e3b 100644 --- a/codex-rs/core/src/instructions/mod.rs +++ b/codex-rs/core/src/instructions/mod.rs @@ -1,5 +1,5 @@ -mod user_instructions; +mod contextual_user_fragments; -pub(crate) use user_instructions::SkillInstructions; -pub use user_instructions::USER_INSTRUCTIONS_PREFIX; -pub(crate) use user_instructions::UserInstructions; +pub(crate) use contextual_user_fragments::AgentsMdInstructions; +pub(crate) use contextual_user_fragments::PluginInstructions; +pub(crate) use contextual_user_fragments::SkillInstructions; diff --git a/codex-rs/core/src/instructions/user_instructions.rs b/codex-rs/core/src/instructions/user_instructions.rs deleted file mode 100644 index a0389c9ff8..0000000000 --- a/codex-rs/core/src/instructions/user_instructions.rs +++ /dev/null @@ -1,57 +0,0 @@ -use serde::Deserialize; -use serde::Serialize; - -use codex_protocol::models::ResponseItem; - -use crate::contextual_user_message::AGENTS_MD_FRAGMENT; -use crate::contextual_user_message::SKILL_FRAGMENT; - -pub const USER_INSTRUCTIONS_PREFIX: &str = "# AGENTS.md instructions for "; - -#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)] -#[serde(rename = "user_instructions", rename_all = "snake_case")] -pub(crate) struct UserInstructions { - pub directory: String, - pub text: String, -} - -impl UserInstructions { - pub(crate) fn serialize_to_text(&self) -> String { - format!( - "{prefix}{directory}\n\n\n{contents}\n{suffix}", - prefix = AGENTS_MD_FRAGMENT.start_marker(), - directory = self.directory, - contents = self.text, - suffix = AGENTS_MD_FRAGMENT.end_marker(), - ) - } -} - -impl From for ResponseItem { - fn from(ui: UserInstructions) -> Self { - AGENTS_MD_FRAGMENT.into_message(ui.serialize_to_text()) - } -} - -#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)] -#[serde(rename = "skill_instructions", rename_all = "snake_case")] -pub(crate) struct SkillInstructions { - pub name: String, - pub path: String, - pub contents: String, -} - -impl SkillInstructions {} - -impl From for ResponseItem { - fn from(si: SkillInstructions) -> Self { - SKILL_FRAGMENT.into_message(SKILL_FRAGMENT.wrap(format!( - "{}\n{}\n{}", - si.name, si.path, si.contents - ))) - } -} - -#[cfg(test)] -#[path = "user_instructions_tests.rs"] -mod tests; diff --git a/codex-rs/core/src/lib.rs b/codex-rs/core/src/lib.rs index a57e02ecb2..48cfbaddf6 100644 --- a/codex-rs/core/src/lib.rs +++ b/codex-rs/core/src/lib.rs @@ -29,7 +29,6 @@ pub mod config; pub mod config_loader; pub mod connectors; mod context_manager; -mod contextual_user_message; pub mod custom_prompts; pub mod env; mod environment_context; @@ -48,6 +47,7 @@ pub mod landlock; pub mod mcp; mod mcp_connection_manager; mod mcp_tool_approval_templates; +mod model_visible_context; pub mod models_manager; mod network_policy_decision; pub mod network_proxy_loader; diff --git a/codex-rs/core/src/model_visible_context.rs b/codex-rs/core/src/model_visible_context.rs new file mode 100644 index 0000000000..2080b90d38 --- /dev/null +++ b/codex-rs/core/src/model_visible_context.rs @@ -0,0 +1,397 @@ +//! Shared model-visible context abstractions. +//! +//! Use this path for any injected prompt context, whether it renders in the +//! developer envelope or the contextual-user envelope. +//! +//! Contextual-user fragments must provide stable markers so history parsing can +//! distinguish them from real user intent. Developer fragments do not need +//! markers because they are already separable by role. + +use crate::codex::PreviousTurnSettings; +use crate::codex::TurnContext; +use crate::shell::Shell; +use codex_execpolicy::Policy; +use codex_protocol::models::ContentItem; +use codex_protocol::models::CustomDeveloperInstructions; +use codex_protocol::models::MessageRole; +use codex_protocol::models::ResponseInputItem; +use codex_protocol::models::ResponseItem; +use codex_protocol::protocol::TurnContextItem; + +pub(crate) const SKILL_OPEN_TAG: &str = ""; +pub(crate) const SKILL_CLOSE_TAG: &str = ""; +pub(crate) const USER_SHELL_COMMAND_OPEN_TAG: &str = ""; +pub(crate) const USER_SHELL_COMMAND_CLOSE_TAG: &str = ""; +pub(crate) const TURN_ABORTED_OPEN_TAG: &str = ""; +pub(crate) const TURN_ABORTED_CLOSE_TAG: &str = ""; +pub(crate) const PLUGINS_OPEN_TAG: &str = ""; +pub(crate) const PLUGINS_CLOSE_TAG: &str = ""; +pub(crate) const SUBAGENTS_OPEN_TAG: &str = ""; +pub(crate) const SUBAGENTS_CLOSE_TAG: &str = ""; +pub(crate) const SUBAGENT_NOTIFICATION_OPEN_TAG: &str = ""; +pub(crate) const SUBAGENT_NOTIFICATION_CLOSE_TAG: &str = ""; + +pub(crate) trait ModelVisibleContextRole { + const MESSAGE_ROLE: MessageRole; +} + +pub(crate) struct DeveloperContextRole; + +impl ModelVisibleContextRole for DeveloperContextRole { + const MESSAGE_ROLE: MessageRole = MessageRole::Developer; +} + +pub(crate) struct ContextualUserContextRole; + +impl ModelVisibleContextRole for ContextualUserContextRole { + const MESSAGE_ROLE: MessageRole = MessageRole::User; +} + +#[derive(Clone, Copy, Debug, PartialEq, Eq)] +pub(crate) struct ContextualUserFragmentMarkers { + start_marker: &'static str, + end_marker: &'static str, +} + +pub(crate) trait ContextualUserFragmentDetector { + fn matches_contextual_user_text(text: &str) -> bool; +} + +pub(crate) trait TaggedContextualUserFragment { + const MARKERS: ContextualUserFragmentMarkers; + + fn wrap_contextual_user_body(body: String) -> String { + Self::MARKERS.wrap_body(body) + } +} + +impl ContextualUserFragmentDetector for T { + fn matches_contextual_user_text(text: &str) -> bool { + T::MARKERS.matches_text(text) + } +} + +impl ContextualUserFragmentMarkers { + pub(crate) const fn new(start_marker: &'static str, end_marker: &'static str) -> Self { + Self { + start_marker, + end_marker, + } + } + + pub(crate) fn matches_text(self, text: &str) -> bool { + let trimmed = text.trim_start(); + let starts_with_marker = trimmed + .get(..self.start_marker.len()) + .is_some_and(|candidate| candidate.eq_ignore_ascii_case(self.start_marker)); + let trimmed = trimmed.trim_end(); + let ends_with_marker = trimmed + .get(trimmed.len().saturating_sub(self.end_marker.len())..) + .is_some_and(|candidate| candidate.eq_ignore_ascii_case(self.end_marker)); + starts_with_marker && ends_with_marker + } + + pub(crate) fn wrap_body(self, body: String) -> String { + format!("{}\n{}\n{}", self.start_marker, body, self.end_marker) + } +} + +pub(crate) fn model_visible_content_item(text: String) -> ContentItem { + ContentItem::InputText { text } +} + +pub(crate) fn model_visible_message(text: String) -> ResponseItem { + ResponseItem::Message { + id: None, + role: R::MESSAGE_ROLE.to_string(), + content: vec![model_visible_content_item(text)], + end_turn: None, + phase: None, + } +} + +pub(crate) fn model_visible_response_input_item( + text: String, +) -> ResponseInputItem { + ResponseInputItem::Message { + role: R::MESSAGE_ROLE.to_string(), + content: vec![model_visible_content_item(text)], + } +} + +/// Implement this for any model-visible prompt fragment, regardless of which +/// envelope it renders into. +pub(crate) trait ModelVisibleContextFragment { + type Role: ModelVisibleContextRole; + + fn render_text(&self) -> String; + + fn into_content_item(self) -> ContentItem + where + Self: Sized, + { + model_visible_content_item(self.render_text()) + } + + fn into_message(self) -> ResponseItem + where + Self: Sized, + { + model_visible_message::(self.render_text()) + } + + fn into_response_input_item(self) -> ResponseInputItem + where + Self: Sized, + { + model_visible_response_input_item::(self.render_text()) + } +} + +pub(crate) struct DeveloperTextFragment { + text: String, +} + +impl DeveloperTextFragment { + pub(crate) fn new(text: impl Into) -> Self { + Self { text: text.into() } + } +} + +pub(crate) struct ContextualUserTextFragment { + text: String, +} + +impl ContextualUserTextFragment { + pub(crate) fn new(text: impl Into) -> Self { + Self { text: text.into() } + } +} + +type ContextualUserTurnStateBuilder = fn( + Option<&TurnContextItem>, + &TurnContext, + &TurnContextDiffParams<'_>, +) -> Option; + +#[derive(Clone, Copy)] +struct ContextualUserFragmentRegistration { + detect: fn(&str) -> bool, + turn_state_builder: Option, +} + +impl ContextualUserFragmentRegistration { + const fn new( + detect: fn(&str) -> bool, + turn_state_builder: Option, + ) -> Self { + Self { + detect, + turn_state_builder, + } + } +} + +pub(crate) struct TurnContextDiffParams<'a> { + pub(crate) shell: &'a Shell, + pub(crate) previous_turn_settings: Option<&'a PreviousTurnSettings>, + pub(crate) exec_policy: &'a Policy, + pub(crate) personality_feature_enabled: bool, + pub(crate) base_instructions: Option<&'a str>, +} + +impl<'a> TurnContextDiffParams<'a> { + pub(crate) fn new( + shell: &'a Shell, + previous_turn_settings: Option<&'a PreviousTurnSettings>, + exec_policy: &'a Policy, + personality_feature_enabled: bool, + base_instructions: Option<&'a str>, + ) -> Self { + Self { + shell, + previous_turn_settings, + exec_policy, + personality_feature_enabled, + base_instructions, + } + } +} + +/// Implement this for fragments that are built from current/persisted turn +/// state rather than one-off runtime events. +pub(crate) trait TurnContextDiffFragment: ModelVisibleContextFragment + Sized { + /// Build the fragment from the current turn state and an optional baseline + /// context item. + /// + /// `reference_context_item` is the last persisted turn-context snapshot whose + /// effects are already represented in model-visible history. Implementations + /// should diff `turn_context` against this baseline and return `None` when + /// there is no model-visible change to inject. + /// + /// `reference_context_item` is `None` for initial-context assembly and when + /// no baseline turn context can be recovered (for example after + /// compaction/backtracking/resume), so implementations should treat that as + /// "no known represented baseline" and decide whether to emit full current + /// state or nothing. + fn build( + turn_context: &TurnContext, + reference_context_item: Option<&TurnContextItem>, + params: &TurnContextDiffParams<'_>, + ) -> Option; +} + +fn detect_contextual_user_fragment(text: &str) -> bool { + F::matches_contextual_user_text(text) +} + +fn build_contextual_user_turn_state_fragment( + reference_context_item: Option<&TurnContextItem>, + turn_context: &TurnContext, + params: &TurnContextDiffParams<'_>, +) -> Option +where + F: TurnContextDiffFragment, +{ + let fragment = F::build(turn_context, reference_context_item, params)?; + Some(ContextualUserTextFragment::new(fragment.render_text())) +} + +/// Canonical contextual-user fragment registry. +/// +/// Add new contextual-user fragment types here so injected context is not +/// mistaken for real user intent during event mapping/truncation, and to wire +/// turn-state contextual-user diff fragments in one place. +const REGISTERED_CONTEXTUAL_USER_FRAGMENTS: &[ContextualUserFragmentRegistration] = &[ + ContextualUserFragmentRegistration::new( + detect_contextual_user_fragment::, + Some( + build_contextual_user_turn_state_fragment::, + ), + ), + ContextualUserFragmentRegistration::new( + detect_contextual_user_fragment::, + Some( + build_contextual_user_turn_state_fragment::< + crate::environment_context::EnvironmentContext, + >, + ), + ), + ContextualUserFragmentRegistration::new( + detect_contextual_user_fragment::, + None, + ), + ContextualUserFragmentRegistration::new( + detect_contextual_user_fragment::, + None, + ), + ContextualUserFragmentRegistration::new( + detect_contextual_user_fragment::, + None, + ), + ContextualUserFragmentRegistration::new( + detect_contextual_user_fragment::, + None, + ), +]; + +fn is_legacy_contextual_user_fragment(text: &str) -> bool { + // TODO(ccunningham): Drop this once old user-role subagent notification + // history no longer needs resume/compaction compatibility. + ContextualUserFragmentMarkers::new( + SUBAGENT_NOTIFICATION_OPEN_TAG, + SUBAGENT_NOTIFICATION_CLOSE_TAG, + ) + .matches_text(text) +} + +pub(crate) fn is_contextual_user_fragment(content_item: &ContentItem) -> bool { + let ContentItem::InputText { text } = content_item else { + return false; + }; + REGISTERED_CONTEXTUAL_USER_FRAGMENTS + .iter() + .any(|registration| (registration.detect)(text)) + || is_legacy_contextual_user_fragment(text) +} + +pub(crate) fn build_contextual_user_turn_state_fragments( + reference_context_item: Option<&TurnContextItem>, + turn_context: &TurnContext, + params: &TurnContextDiffParams<'_>, +) -> Vec { + REGISTERED_CONTEXTUAL_USER_FRAGMENTS + .iter() + .filter_map(|registration| { + registration + .turn_state_builder + .and_then(|build| build(reference_context_item, turn_context, params)) + }) + .collect() +} + +impl ModelVisibleContextFragment for CustomDeveloperInstructions { + type Role = DeveloperContextRole; + + fn render_text(&self) -> String { + self.clone().into_text() + } +} + +impl ModelVisibleContextFragment for DeveloperTextFragment { + type Role = DeveloperContextRole; + + fn render_text(&self) -> String { + self.text.clone() + } +} + +impl ModelVisibleContextFragment for ContextualUserTextFragment { + type Role = ContextualUserContextRole; + + fn render_text(&self) -> String { + self.text.clone() + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn detects_environment_context_fragment() { + assert!(is_contextual_user_fragment(&ContentItem::InputText { + text: "\n/tmp\n".to_string(), + })); + } + + #[test] + fn detects_agents_instructions_fragment() { + assert!(is_contextual_user_fragment(&ContentItem::InputText { + text: "# AGENTS.md instructions for /tmp\n\n\nbody\n" + .to_string(), + })); + } + + #[test] + fn detects_legacy_subagent_notification_fragment() { + assert!(is_contextual_user_fragment(&ContentItem::InputText { + text: "\n{\"agent_id\":\"a\",\"status\":\"completed\"}\n" + .to_string(), + })); + } + + #[test] + fn ignores_regular_user_text() { + assert!(!is_contextual_user_fragment(&ContentItem::InputText { + text: "hello".to_string(), + })); + } + + #[test] + fn marker_matching_ignores_plain_text() { + assert!( + !::matches_contextual_user_text("plain text") + ); + } +} diff --git a/codex-rs/core/src/plugins/injection.rs b/codex-rs/core/src/plugins/injection.rs index 00b15426fe..992ea6802f 100644 --- a/codex-rs/core/src/plugins/injection.rs +++ b/codex-rs/core/src/plugins/injection.rs @@ -1,7 +1,8 @@ use std::collections::BTreeSet; use std::collections::HashMap; -use codex_protocol::models::DeveloperInstructions; +use crate::model_visible_context::DeveloperTextFragment; +use crate::model_visible_context::ModelVisibleContextFragment; use codex_protocol::models::ResponseItem; use crate::connectors; @@ -51,8 +52,8 @@ pub(crate) fn build_plugin_injections( .into_iter() .collect::>(); render_explicit_plugin_instructions(plugin, &available_mcp_servers, &available_apps) - .map(DeveloperInstructions::new) - .map(ResponseItem::from) + .map(DeveloperTextFragment::new) + .map(ModelVisibleContextFragment::into_message) }) .collect() } diff --git a/codex-rs/core/src/plugins/mod.rs b/codex-rs/core/src/plugins/mod.rs index 7eb76798ad..720f64951c 100644 --- a/codex-rs/core/src/plugins/mod.rs +++ b/codex-rs/core/src/plugins/mod.rs @@ -43,6 +43,6 @@ pub use marketplace::MarketplacePluginAuthPolicy; pub use marketplace::MarketplacePluginInstallPolicy; pub use marketplace::MarketplacePluginSourceSummary; pub(crate) use render::render_explicit_plugin_instructions; -pub(crate) use render::render_plugins_section; +pub(crate) use render::render_plugin_instructions; pub use store::PluginId; pub use toggles::collect_plugin_enabled_candidates; diff --git a/codex-rs/core/src/plugins/render.rs b/codex-rs/core/src/plugins/render.rs index 136f256e18..583222bc9d 100644 --- a/codex-rs/core/src/plugins/render.rs +++ b/codex-rs/core/src/plugins/render.rs @@ -1,6 +1,9 @@ +use crate::instructions::PluginInstructions; use crate::plugins::PluginCapabilitySummary; -pub(crate) fn render_plugins_section(plugins: &[PluginCapabilitySummary]) -> Option { +pub(crate) fn render_plugin_instructions( + plugins: &[PluginCapabilitySummary], +) -> Option { if plugins.is_empty() { return None; } @@ -31,7 +34,9 @@ pub(crate) fn render_plugins_section(plugins: &[PluginCapabilitySummary]) -> Opt .to_string(), ); - Some(lines.join("\n")) + Some(PluginInstructions { + text: lines.join("\n"), + }) } pub(crate) fn render_explicit_plugin_instructions( @@ -83,5 +88,12 @@ pub(crate) fn render_explicit_plugin_instructions( } #[cfg(test)] -#[path = "render_tests.rs"] -mod tests; +mod tests { + use super::*; + use pretty_assertions::assert_eq; + + #[test] + fn render_plugin_instructions_returns_none_for_empty_plugins() { + assert_eq!(render_plugin_instructions(&[]), None); + } +} diff --git a/codex-rs/core/src/project_doc.rs b/codex-rs/core/src/project_doc.rs index 0aa94c836f..6b7d493b4c 100644 --- a/codex-rs/core/src/project_doc.rs +++ b/codex-rs/core/src/project_doc.rs @@ -21,8 +21,6 @@ use crate::config_loader::default_project_root_markers; use crate::config_loader::merge_toml_values; use crate::config_loader::project_root_markers_from_config; use crate::features::Feature; -use crate::plugins::PluginCapabilitySummary; -use crate::plugins::render_plugins_section; use crate::skills::SkillMetadata; use crate::skills::render_skills_section; use codex_app_server_protocol::ConfigLayerSource; @@ -83,7 +81,6 @@ fn render_js_repl_instructions(config: &Config) -> Option { pub(crate) async fn get_user_instructions( config: &Config, skills: Option<&[SkillMetadata]>, - plugins: Option<&[PluginCapabilitySummary]>, ) -> Option { let project_docs = read_project_docs(config).await; @@ -113,13 +110,6 @@ pub(crate) async fn get_user_instructions( output.push_str(&js_repl_section); } - if let Some(plugin_section) = plugins.and_then(render_plugins_section) { - if !output.is_empty() { - output.push_str("\n\n"); - } - output.push_str(&plugin_section); - } - let skills_section = skills.and_then(render_skills_section); if let Some(skills_section) = skills_section { if !output.is_empty() { @@ -311,5 +301,473 @@ fn candidate_filenames<'a>(config: &'a Config) -> Vec<&'a str> { } #[cfg(test)] -#[path = "project_doc_tests.rs"] -mod tests; +mod tests { + use super::*; + use crate::config::ConfigBuilder; + use crate::features::Feature; + use crate::skills::loader::SkillRoot; + use crate::skills::loader::load_skills_from_roots; + use codex_protocol::protocol::SkillScope; + use std::fs; + use std::path::PathBuf; + use tempfile::TempDir; + + /// Helper that returns a `Config` pointing at `root` and using `limit` as + /// the maximum number of bytes to embed from AGENTS.md. The caller can + /// optionally specify a custom `instructions` string – when `None` the + /// value is cleared to mimic a scenario where no system instructions have + /// been configured. + async fn make_config(root: &TempDir, limit: usize, instructions: Option<&str>) -> Config { + let codex_home = TempDir::new().unwrap(); + let mut config = ConfigBuilder::default() + .codex_home(codex_home.path().to_path_buf()) + .build() + .await + .expect("defaults for test should always succeed"); + + config.cwd = root.path().to_path_buf(); + config.project_doc_max_bytes = limit; + + config.user_instructions = instructions.map(ToOwned::to_owned); + config + } + + async fn make_config_with_fallback( + root: &TempDir, + limit: usize, + instructions: Option<&str>, + fallbacks: &[&str], + ) -> Config { + let mut config = make_config(root, limit, instructions).await; + config.project_doc_fallback_filenames = fallbacks + .iter() + .map(std::string::ToString::to_string) + .collect(); + config + } + + async fn make_config_with_project_root_markers( + root: &TempDir, + limit: usize, + instructions: Option<&str>, + markers: &[&str], + ) -> Config { + let codex_home = TempDir::new().unwrap(); + let cli_overrides = vec![( + "project_root_markers".to_string(), + TomlValue::Array( + markers + .iter() + .map(|marker| TomlValue::String((*marker).to_string())) + .collect(), + ), + )]; + let mut config = ConfigBuilder::default() + .codex_home(codex_home.path().to_path_buf()) + .cli_overrides(cli_overrides) + .build() + .await + .expect("defaults for test should always succeed"); + + config.cwd = root.path().to_path_buf(); + config.project_doc_max_bytes = limit; + config.user_instructions = instructions.map(ToOwned::to_owned); + config + } + + fn load_test_skills(config: &Config) -> crate::skills::SkillLoadOutcome { + load_skills_from_roots([SkillRoot { + path: config.codex_home.join("skills"), + scope: SkillScope::User, + }]) + } + + /// AGENTS.md missing – should yield `None`. + #[tokio::test] + async fn no_doc_file_returns_none() { + let tmp = tempfile::tempdir().expect("tempdir"); + + let res = get_user_instructions(&make_config(&tmp, 4096, None).await, None).await; + assert!( + res.is_none(), + "Expected None when AGENTS.md is absent and no system instructions provided" + ); + assert!(res.is_none(), "Expected None when AGENTS.md is absent"); + } + + /// Small file within the byte-limit is returned unmodified. + #[tokio::test] + async fn doc_smaller_than_limit_is_returned() { + let tmp = tempfile::tempdir().expect("tempdir"); + fs::write(tmp.path().join("AGENTS.md"), "hello world").unwrap(); + + let res = get_user_instructions(&make_config(&tmp, 4096, None).await, None) + .await + .expect("doc expected"); + + assert_eq!( + res, "hello world", + "The document should be returned verbatim when it is smaller than the limit and there are no existing instructions" + ); + } + + /// Oversize file is truncated to `project_doc_max_bytes`. + #[tokio::test] + async fn doc_larger_than_limit_is_truncated() { + const LIMIT: usize = 1024; + let tmp = tempfile::tempdir().expect("tempdir"); + + let huge = "A".repeat(LIMIT * 2); // 2 KiB + fs::write(tmp.path().join("AGENTS.md"), &huge).unwrap(); + + let res = get_user_instructions(&make_config(&tmp, LIMIT, None).await, None) + .await + .expect("doc expected"); + + assert_eq!(res.len(), LIMIT, "doc should be truncated to LIMIT bytes"); + assert_eq!(res, huge[..LIMIT]); + } + + /// When `cwd` is nested inside a repo, the search should locate AGENTS.md + /// placed at the repository root (identified by `.git`). + #[tokio::test] + async fn finds_doc_in_repo_root() { + let repo = tempfile::tempdir().expect("tempdir"); + + // Simulate a git repository. Note .git can be a file or a directory. + std::fs::write( + repo.path().join(".git"), + "gitdir: /path/to/actual/git/dir\n", + ) + .unwrap(); + + // Put the doc at the repo root. + fs::write(repo.path().join("AGENTS.md"), "root level doc").unwrap(); + + // Now create a nested working directory: repo/workspace/crate_a + let nested = repo.path().join("workspace/crate_a"); + std::fs::create_dir_all(&nested).unwrap(); + + // Build config pointing at the nested dir. + let mut cfg = make_config(&repo, 4096, None).await; + cfg.cwd = nested; + + let res = get_user_instructions(&cfg, None) + .await + .expect("doc expected"); + assert_eq!(res, "root level doc"); + } + + /// Explicitly setting the byte-limit to zero disables project docs. + #[tokio::test] + async fn zero_byte_limit_disables_docs() { + let tmp = tempfile::tempdir().expect("tempdir"); + fs::write(tmp.path().join("AGENTS.md"), "something").unwrap(); + + let res = get_user_instructions(&make_config(&tmp, 0, None).await, None).await; + assert!( + res.is_none(), + "With limit 0 the function should return None" + ); + } + + #[tokio::test] + async fn js_repl_instructions_are_appended_when_enabled() { + let tmp = tempfile::tempdir().expect("tempdir"); + let mut cfg = make_config(&tmp, 4096, None).await; + cfg.features + .enable(Feature::JsRepl) + .expect("test config should allow js_repl"); + + let res = get_user_instructions(&cfg, None) + .await + .expect("js_repl instructions expected"); + let expected = "## JavaScript REPL (Node)\n- Use `js_repl` for Node-backed JavaScript with top-level await in a persistent kernel.\n- `js_repl` is a freeform/custom tool. Direct `js_repl` calls must send raw JavaScript tool input (optionally with first-line `// codex-js-repl: timeout_ms=15000`). Do not wrap code in JSON (for example `{\"code\":\"...\"}`), quotes, or markdown code fences.\n- Helpers: `codex.cwd`, `codex.homeDir`, `codex.tmpDir`, `codex.tool(name, args?)`, and `codex.emitImage(imageLike)`.\n- `codex.tool` executes a normal tool call and resolves to the raw tool output object. Use it for shell and non-shell tools alike. Nested tool outputs stay inside JavaScript unless you emit them explicitly.\n- `codex.emitImage(...)` adds one image to the outer `js_repl` function output each time you call it, so you can call it multiple times to emit multiple images. It accepts a data URL, a single `input_image` item, an object like `{ bytes, mimeType }`, or a raw tool response object with exactly one image and no text. It rejects mixed text-and-image content.\n- Request full-resolution image processing with `detail: \"original\"` only when the `view_image` tool schema includes a `detail` argument. The same availability applies to `codex.emitImage(...)`: if `view_image.detail` is present, you may also pass `detail: \"original\"` there. Use this when high-fidelity image perception or precise localization is needed, especially for CUA agents.\n- Example of sharing an in-memory Playwright screenshot: `await codex.emitImage({ bytes: await page.screenshot({ type: \"jpeg\", quality: 85 }), mimeType: \"image/jpeg\", detail: \"original\" })`.\n- Example of sharing a local image tool result: `await codex.emitImage(codex.tool(\"view_image\", { path: \"/absolute/path\", detail: \"original\" }))`.\n- When encoding an image to send with `codex.emitImage(...)` or `view_image`, prefer JPEG at about 85 quality when lossy compression is acceptable; use PNG when transparency or lossless detail matters. Smaller uploads are faster and less likely to hit size limits.\n- Top-level bindings persist across cells. If a cell throws, prior bindings remain available and bindings that finished initializing before the throw often remain usable in later cells. For code you plan to reuse across cells, prefer declaring or assigning it in direct top-level statements before operations that might throw. If you hit `SyntaxError: Identifier 'x' has already been declared`, first reuse the existing binding, reassign a previously declared `let`, or pick a new descriptive name. Use `{ ... }` only for a short temporary block when you specifically need local scratch names; do not wrap an entire cell in block scope if you want those names reusable later. Reset the kernel with `js_repl_reset` only when you need a clean state.\n- Top-level static import declarations (for example `import x from \"./file.js\"`) are currently unsupported in `js_repl`; use dynamic imports with `await import(\"pkg\")`, `await import(\"./file.js\")`, or `await import(\"/abs/path/file.mjs\")` instead. Imported local files must be ESM `.js`/`.mjs` files and run in the same REPL VM context. Bare package imports always resolve from REPL-global search roots (`CODEX_JS_REPL_NODE_MODULE_DIRS`, then cwd), not relative to the imported file location. Local files may statically import only other local relative/absolute/`file://` `.js`/`.mjs` files; package and builtin imports from local files must stay dynamic. `import.meta.resolve()` returns importable strings such as `file://...`, bare package names, and `node:...` specifiers. Local file modules reload between execs, while top-level bindings persist until `js_repl_reset`.\n- Avoid direct access to `process.stdout` / `process.stderr` / `process.stdin`; it can corrupt the JSON line protocol. Use `console.log`, `codex.tool(...)`, and `codex.emitImage(...)`."; + assert_eq!(res, expected); + } + + #[tokio::test] + async fn js_repl_tools_only_instructions_are_feature_gated() { + let tmp = tempfile::tempdir().expect("tempdir"); + let mut cfg = make_config(&tmp, 4096, None).await; + let mut features = cfg.features.get().clone(); + features + .enable(Feature::JsRepl) + .enable(Feature::JsReplToolsOnly); + cfg.features + .set(features) + .expect("test config should allow js_repl tool restrictions"); + + let res = get_user_instructions(&cfg, None) + .await + .expect("js_repl instructions expected"); + let expected = "## JavaScript REPL (Node)\n- Use `js_repl` for Node-backed JavaScript with top-level await in a persistent kernel.\n- `js_repl` is a freeform/custom tool. Direct `js_repl` calls must send raw JavaScript tool input (optionally with first-line `// codex-js-repl: timeout_ms=15000`). Do not wrap code in JSON (for example `{\"code\":\"...\"}`), quotes, or markdown code fences.\n- Helpers: `codex.cwd`, `codex.homeDir`, `codex.tmpDir`, `codex.tool(name, args?)`, and `codex.emitImage(imageLike)`.\n- `codex.tool` executes a normal tool call and resolves to the raw tool output object. Use it for shell and non-shell tools alike. Nested tool outputs stay inside JavaScript unless you emit them explicitly.\n- `codex.emitImage(...)` adds one image to the outer `js_repl` function output each time you call it, so you can call it multiple times to emit multiple images. It accepts a data URL, a single `input_image` item, an object like `{ bytes, mimeType }`, or a raw tool response object with exactly one image and no text. It rejects mixed text-and-image content.\n- Request full-resolution image processing with `detail: \"original\"` only when the `view_image` tool schema includes a `detail` argument. The same availability applies to `codex.emitImage(...)`: if `view_image.detail` is present, you may also pass `detail: \"original\"` there. Use this when high-fidelity image perception or precise localization is needed, especially for CUA agents.\n- Example of sharing an in-memory Playwright screenshot: `await codex.emitImage({ bytes: await page.screenshot({ type: \"jpeg\", quality: 85 }), mimeType: \"image/jpeg\", detail: \"original\" })`.\n- Example of sharing a local image tool result: `await codex.emitImage(codex.tool(\"view_image\", { path: \"/absolute/path\", detail: \"original\" }))`.\n- When encoding an image to send with `codex.emitImage(...)` or `view_image`, prefer JPEG at about 85 quality when lossy compression is acceptable; use PNG when transparency or lossless detail matters. Smaller uploads are faster and less likely to hit size limits.\n- Top-level bindings persist across cells. If a cell throws, prior bindings remain available and bindings that finished initializing before the throw often remain usable in later cells. For code you plan to reuse across cells, prefer declaring or assigning it in direct top-level statements before operations that might throw. If you hit `SyntaxError: Identifier 'x' has already been declared`, first reuse the existing binding, reassign a previously declared `let`, or pick a new descriptive name. Use `{ ... }` only for a short temporary block when you specifically need local scratch names; do not wrap an entire cell in block scope if you want those names reusable later. Reset the kernel with `js_repl_reset` only when you need a clean state.\n- Top-level static import declarations (for example `import x from \"./file.js\"`) are currently unsupported in `js_repl`; use dynamic imports with `await import(\"pkg\")`, `await import(\"./file.js\")`, or `await import(\"/abs/path/file.mjs\")` instead. Imported local files must be ESM `.js`/`.mjs` files and run in the same REPL VM context. Bare package imports always resolve from REPL-global search roots (`CODEX_JS_REPL_NODE_MODULE_DIRS`, then cwd), not relative to the imported file location. Local files may statically import only other local relative/absolute/`file://` `.js`/`.mjs` files; package and builtin imports from local files must stay dynamic. `import.meta.resolve()` returns importable strings such as `file://...`, bare package names, and `node:...` specifiers. Local file modules reload between execs, while top-level bindings persist until `js_repl_reset`.\n- Do not call tools directly; use `js_repl` + `codex.tool(...)` for all tool calls, including shell commands.\n- MCP tools (if any) can also be called by name via `codex.tool(...)`.\n- Avoid direct access to `process.stdout` / `process.stderr` / `process.stdin`; it can corrupt the JSON line protocol. Use `console.log`, `codex.tool(...)`, and `codex.emitImage(...)`."; + assert_eq!(res, expected); + } + + #[tokio::test] + async fn js_repl_image_detail_original_does_not_change_instructions() { + let tmp = tempfile::tempdir().expect("tempdir"); + let mut cfg = make_config(&tmp, 4096, None).await; + let mut features = cfg.features.get().clone(); + features + .enable(Feature::JsRepl) + .enable(Feature::ImageDetailOriginal); + cfg.features + .set(features) + .expect("test config should allow js_repl image detail settings"); + + let res = get_user_instructions(&cfg, None) + .await + .expect("js_repl instructions expected"); + let expected = "## JavaScript REPL (Node)\n- Use `js_repl` for Node-backed JavaScript with top-level await in a persistent kernel.\n- `js_repl` is a freeform/custom tool. Direct `js_repl` calls must send raw JavaScript tool input (optionally with first-line `// codex-js-repl: timeout_ms=15000`). Do not wrap code in JSON (for example `{\"code\":\"...\"}`), quotes, or markdown code fences.\n- Helpers: `codex.cwd`, `codex.homeDir`, `codex.tmpDir`, `codex.tool(name, args?)`, and `codex.emitImage(imageLike)`.\n- `codex.tool` executes a normal tool call and resolves to the raw tool output object. Use it for shell and non-shell tools alike. Nested tool outputs stay inside JavaScript unless you emit them explicitly.\n- `codex.emitImage(...)` adds one image to the outer `js_repl` function output each time you call it, so you can call it multiple times to emit multiple images. It accepts a data URL, a single `input_image` item, an object like `{ bytes, mimeType }`, or a raw tool response object with exactly one image and no text. It rejects mixed text-and-image content.\n- Request full-resolution image processing with `detail: \"original\"` only when the `view_image` tool schema includes a `detail` argument. The same availability applies to `codex.emitImage(...)`: if `view_image.detail` is present, you may also pass `detail: \"original\"` there. Use this when high-fidelity image perception or precise localization is needed, especially for CUA agents.\n- Example of sharing an in-memory Playwright screenshot: `await codex.emitImage({ bytes: await page.screenshot({ type: \"jpeg\", quality: 85 }), mimeType: \"image/jpeg\", detail: \"original\" })`.\n- Example of sharing a local image tool result: `await codex.emitImage(codex.tool(\"view_image\", { path: \"/absolute/path\", detail: \"original\" }))`.\n- When encoding an image to send with `codex.emitImage(...)` or `view_image`, prefer JPEG at about 85 quality when lossy compression is acceptable; use PNG when transparency or lossless detail matters. Smaller uploads are faster and less likely to hit size limits.\n- Top-level bindings persist across cells. If a cell throws, prior bindings remain available and bindings that finished initializing before the throw often remain usable in later cells. For code you plan to reuse across cells, prefer declaring or assigning it in direct top-level statements before operations that might throw. If you hit `SyntaxError: Identifier 'x' has already been declared`, first reuse the existing binding, reassign a previously declared `let`, or pick a new descriptive name. Use `{ ... }` only for a short temporary block when you specifically need local scratch names; do not wrap an entire cell in block scope if you want those names reusable later. Reset the kernel with `js_repl_reset` only when you need a clean state.\n- Top-level static import declarations (for example `import x from \"./file.js\"`) are currently unsupported in `js_repl`; use dynamic imports with `await import(\"pkg\")`, `await import(\"./file.js\")`, or `await import(\"/abs/path/file.mjs\")` instead. Imported local files must be ESM `.js`/`.mjs` files and run in the same REPL VM context. Bare package imports always resolve from REPL-global search roots (`CODEX_JS_REPL_NODE_MODULE_DIRS`, then cwd), not relative to the imported file location. Local files may statically import only other local relative/absolute/`file://` `.js`/`.mjs` files; package and builtin imports from local files must stay dynamic. `import.meta.resolve()` returns importable strings such as `file://...`, bare package names, and `node:...` specifiers. Local file modules reload between execs, while top-level bindings persist until `js_repl_reset`.\n- Avoid direct access to `process.stdout` / `process.stderr` / `process.stdin`; it can corrupt the JSON line protocol. Use `console.log`, `codex.tool(...)`, and `codex.emitImage(...)`."; + assert_eq!(res, expected); + } + + /// When both system instructions *and* a project doc are present the two + /// should be concatenated with the separator. + #[tokio::test] + async fn merges_existing_instructions_with_project_doc() { + let tmp = tempfile::tempdir().expect("tempdir"); + fs::write(tmp.path().join("AGENTS.md"), "proj doc").unwrap(); + + const INSTRUCTIONS: &str = "base instructions"; + + let res = get_user_instructions(&make_config(&tmp, 4096, Some(INSTRUCTIONS)).await, None) + .await + .expect("should produce a combined instruction string"); + + let expected = format!("{INSTRUCTIONS}{PROJECT_DOC_SEPARATOR}{}", "proj doc"); + + assert_eq!(res, expected); + } + + /// If there are existing system instructions but the project doc is + /// missing we expect the original instructions to be returned unchanged. + #[tokio::test] + async fn keeps_existing_instructions_when_doc_missing() { + let tmp = tempfile::tempdir().expect("tempdir"); + + const INSTRUCTIONS: &str = "some instructions"; + + let res = + get_user_instructions(&make_config(&tmp, 4096, Some(INSTRUCTIONS)).await, None).await; + + assert_eq!(res, Some(INSTRUCTIONS.to_string())); + } + + /// When both the repository root and the working directory contain + /// AGENTS.md files, their contents are concatenated from root to cwd. + #[tokio::test] + async fn concatenates_root_and_cwd_docs() { + let repo = tempfile::tempdir().expect("tempdir"); + + // Simulate a git repository. + std::fs::write( + repo.path().join(".git"), + "gitdir: /path/to/actual/git/dir\n", + ) + .unwrap(); + + // Repo root doc. + fs::write(repo.path().join("AGENTS.md"), "root doc").unwrap(); + + // Nested working directory with its own doc. + let nested = repo.path().join("workspace/crate_a"); + std::fs::create_dir_all(&nested).unwrap(); + fs::write(nested.join("AGENTS.md"), "crate doc").unwrap(); + + let mut cfg = make_config(&repo, 4096, None).await; + cfg.cwd = nested; + + let res = get_user_instructions(&cfg, None) + .await + .expect("doc expected"); + assert_eq!(res, "root doc\n\ncrate doc"); + } + + #[tokio::test] + async fn project_root_markers_are_honored_for_agents_discovery() { + let root = tempfile::tempdir().expect("tempdir"); + fs::write(root.path().join(".codex-root"), "").unwrap(); + fs::write(root.path().join("AGENTS.md"), "parent doc").unwrap(); + + let nested = root.path().join("dir1"); + fs::create_dir_all(nested.join(".git")).unwrap(); + fs::write(nested.join("AGENTS.md"), "child doc").unwrap(); + + let mut cfg = + make_config_with_project_root_markers(&root, 4096, None, &[".codex-root"]).await; + cfg.cwd = nested; + + let discovery = discover_project_doc_paths(&cfg).expect("discover paths"); + let expected_parent = + dunce::canonicalize(root.path().join("AGENTS.md")).expect("canonical parent doc path"); + let expected_child = + dunce::canonicalize(cfg.cwd.join("AGENTS.md")).expect("canonical child doc path"); + assert_eq!(discovery.len(), 2); + assert_eq!(discovery[0], expected_parent); + assert_eq!(discovery[1], expected_child); + + let res = get_user_instructions(&cfg, None) + .await + .expect("doc expected"); + assert_eq!(res, "parent doc\n\nchild doc"); + } + + /// AGENTS.override.md is preferred over AGENTS.md when both are present. + #[tokio::test] + async fn agents_local_md_preferred() { + let tmp = tempfile::tempdir().expect("tempdir"); + fs::write(tmp.path().join(DEFAULT_PROJECT_DOC_FILENAME), "versioned").unwrap(); + fs::write(tmp.path().join(LOCAL_PROJECT_DOC_FILENAME), "local").unwrap(); + + let cfg = make_config(&tmp, 4096, None).await; + + let res = get_user_instructions(&cfg, None) + .await + .expect("local doc expected"); + + assert_eq!(res, "local"); + + let discovery = discover_project_doc_paths(&cfg).expect("discover paths"); + assert_eq!(discovery.len(), 1); + assert_eq!( + discovery[0].file_name().unwrap().to_string_lossy(), + LOCAL_PROJECT_DOC_FILENAME + ); + } + + /// When AGENTS.md is absent but a configured fallback exists, the fallback is used. + #[tokio::test] + async fn uses_configured_fallback_when_agents_missing() { + let tmp = tempfile::tempdir().expect("tempdir"); + fs::write(tmp.path().join("EXAMPLE.md"), "example instructions").unwrap(); + + let cfg = make_config_with_fallback(&tmp, 4096, None, &["EXAMPLE.md"]).await; + + let res = get_user_instructions(&cfg, None) + .await + .expect("fallback doc expected"); + + assert_eq!(res, "example instructions"); + } + + /// AGENTS.md remains preferred when both AGENTS.md and fallbacks are present. + #[tokio::test] + async fn agents_md_preferred_over_fallbacks() { + let tmp = tempfile::tempdir().expect("tempdir"); + fs::write(tmp.path().join("AGENTS.md"), "primary").unwrap(); + fs::write(tmp.path().join("EXAMPLE.md"), "secondary").unwrap(); + + let cfg = make_config_with_fallback(&tmp, 4096, None, &["EXAMPLE.md", ".example.md"]).await; + + let res = get_user_instructions(&cfg, None) + .await + .expect("AGENTS.md should win"); + + assert_eq!(res, "primary"); + + let discovery = discover_project_doc_paths(&cfg).expect("discover paths"); + assert_eq!(discovery.len(), 1); + assert!( + discovery[0] + .file_name() + .unwrap() + .to_string_lossy() + .eq(DEFAULT_PROJECT_DOC_FILENAME) + ); + } + + #[tokio::test] + async fn skills_are_appended_to_project_doc() { + let tmp = tempfile::tempdir().expect("tempdir"); + fs::write(tmp.path().join("AGENTS.md"), "base doc").unwrap(); + + let cfg = make_config(&tmp, 4096, None).await; + create_skill( + cfg.codex_home.clone(), + "pdf-processing", + "extract from pdfs", + ); + + let skills = load_test_skills(&cfg); + let res = get_user_instructions( + &cfg, + skills.errors.is_empty().then_some(skills.skills.as_slice()), + ) + .await + .expect("instructions expected"); + let expected_path = dunce::canonicalize( + cfg.codex_home + .join("skills/pdf-processing/SKILL.md") + .as_path(), + ) + .unwrap_or_else(|_| cfg.codex_home.join("skills/pdf-processing/SKILL.md")); + let expected_path_str = expected_path.to_string_lossy().replace('\\', "/"); + let usage_rules = "- Discovery: The list above is the skills available in this session (name + description + file path). Skill bodies live on disk at the listed paths.\n- Trigger rules: If the user names a skill (with `$SkillName` or plain text) OR the task clearly matches a skill's description shown above, you must use that skill for that turn. Multiple mentions mean use them all. Do not carry skills across turns unless re-mentioned.\n- Missing/blocked: If a named skill isn't in the list or the path can't be read, say so briefly and continue with the best fallback.\n- How to use a skill (progressive disclosure):\n 1) After deciding to use a skill, open its `SKILL.md`. Read only enough to follow the workflow.\n 2) When `SKILL.md` references relative paths (e.g., `scripts/foo.py`), resolve them relative to the skill directory listed above first, and only consider other paths if needed.\n 3) If `SKILL.md` points to extra folders such as `references/`, load only the specific files needed for the request; don't bulk-load everything.\n 4) If `scripts/` exist, prefer running or patching them instead of retyping large code blocks.\n 5) If `assets/` or templates exist, reuse them instead of recreating from scratch.\n- Coordination and sequencing:\n - If multiple skills apply, choose the minimal set that covers the request and state the order you'll use them.\n - Announce which skill(s) you're using and why (one short line). If you skip an obvious skill, say why.\n- Context hygiene:\n - Keep context small: summarize long sections instead of pasting them; only load extra files when needed.\n - Avoid deep reference-chasing: prefer opening only files directly linked from `SKILL.md` unless you're blocked.\n - When variants exist (frameworks, providers, domains), pick only the relevant reference file(s) and note that choice.\n- Safety and fallback: If a skill can't be applied cleanly (missing files, unclear instructions), state the issue, pick the next-best approach, and continue."; + let expected = format!( + "base doc\n\n## Skills\nA skill is a set of local instructions to follow that is stored in a `SKILL.md` file. Below is the list of skills that can be used. Each entry includes a name, description, and file path so you can open the source for full instructions when using a specific skill.\n### Available skills\n- pdf-processing: extract from pdfs (file: {expected_path_str})\n### How to use skills\n{usage_rules}" + ); + assert_eq!(res, expected); + } + + #[tokio::test] + async fn skills_render_without_project_doc() { + let tmp = tempfile::tempdir().expect("tempdir"); + let cfg = make_config(&tmp, 4096, None).await; + create_skill(cfg.codex_home.clone(), "linting", "run clippy"); + + let skills = load_test_skills(&cfg); + let res = get_user_instructions( + &cfg, + skills.errors.is_empty().then_some(skills.skills.as_slice()), + ) + .await + .expect("instructions expected"); + let expected_path = + dunce::canonicalize(cfg.codex_home.join("skills/linting/SKILL.md").as_path()) + .unwrap_or_else(|_| cfg.codex_home.join("skills/linting/SKILL.md")); + let expected_path_str = expected_path.to_string_lossy().replace('\\', "/"); + let usage_rules = "- Discovery: The list above is the skills available in this session (name + description + file path). Skill bodies live on disk at the listed paths.\n- Trigger rules: If the user names a skill (with `$SkillName` or plain text) OR the task clearly matches a skill's description shown above, you must use that skill for that turn. Multiple mentions mean use them all. Do not carry skills across turns unless re-mentioned.\n- Missing/blocked: If a named skill isn't in the list or the path can't be read, say so briefly and continue with the best fallback.\n- How to use a skill (progressive disclosure):\n 1) After deciding to use a skill, open its `SKILL.md`. Read only enough to follow the workflow.\n 2) When `SKILL.md` references relative paths (e.g., `scripts/foo.py`), resolve them relative to the skill directory listed above first, and only consider other paths if needed.\n 3) If `SKILL.md` points to extra folders such as `references/`, load only the specific files needed for the request; don't bulk-load everything.\n 4) If `scripts/` exist, prefer running or patching them instead of retyping large code blocks.\n 5) If `assets/` or templates exist, reuse them instead of recreating from scratch.\n- Coordination and sequencing:\n - If multiple skills apply, choose the minimal set that covers the request and state the order you'll use them.\n - Announce which skill(s) you're using and why (one short line). If you skip an obvious skill, say why.\n- Context hygiene:\n - Keep context small: summarize long sections instead of pasting them; only load extra files when needed.\n - Avoid deep reference-chasing: prefer opening only files directly linked from `SKILL.md` unless you're blocked.\n - When variants exist (frameworks, providers, domains), pick only the relevant reference file(s) and note that choice.\n- Safety and fallback: If a skill can't be applied cleanly (missing files, unclear instructions), state the issue, pick the next-best approach, and continue."; + let expected = format!( + "## Skills\nA skill is a set of local instructions to follow that is stored in a `SKILL.md` file. Below is the list of skills that can be used. Each entry includes a name, description, and file path so you can open the source for full instructions when using a specific skill.\n### Available skills\n- linting: run clippy (file: {expected_path_str})\n### How to use skills\n{usage_rules}" + ); + assert_eq!(res, expected); + } + + #[tokio::test] + async fn apps_feature_does_not_emit_user_instructions_by_itself() { + let tmp = tempfile::tempdir().expect("tempdir"); + let mut cfg = make_config(&tmp, 4096, None).await; + cfg.features + .enable(Feature::Apps) + .expect("test config should allow apps"); + + let res = get_user_instructions(&cfg, None).await; + assert_eq!(res, None); + } + + #[tokio::test] + async fn apps_feature_does_not_append_to_project_doc_user_instructions() { + let tmp = tempfile::tempdir().expect("tempdir"); + fs::write(tmp.path().join("AGENTS.md"), "base doc").unwrap(); + + let mut cfg = make_config(&tmp, 4096, None).await; + cfg.features + .enable(Feature::Apps) + .expect("test config should allow apps"); + + let res = get_user_instructions(&cfg, None) + .await + .expect("instructions expected"); + assert_eq!(res, "base doc"); + } + + fn create_skill(codex_home: PathBuf, name: &str, description: &str) { + let skill_dir = codex_home.join(format!("skills/{name}")); + fs::create_dir_all(&skill_dir).unwrap(); + let content = format!("---\nname: {name}\ndescription: {description}\n---\n\n# Body\n"); + fs::write(skill_dir.join("SKILL.md"), content).unwrap(); + } +} diff --git a/codex-rs/core/src/session_prefix.rs b/codex-rs/core/src/session_prefix.rs index db3ac00a6d..5eb262098b 100644 --- a/codex-rs/core/src/session_prefix.rs +++ b/codex-rs/core/src/session_prefix.rs @@ -1,16 +1,66 @@ use codex_protocol::protocol::AgentStatus; -/// Helpers for model-visible session state markers that are stored in user-role -/// messages but are not user intent. -use crate::contextual_user_message::SUBAGENT_NOTIFICATION_FRAGMENT; +/// Helpers for model-visible subagent session state rendered in the developer +/// envelope. +use crate::model_visible_context::DeveloperContextRole; +use crate::model_visible_context::ModelVisibleContextFragment; +use crate::model_visible_context::SUBAGENT_NOTIFICATION_CLOSE_TAG; +use crate::model_visible_context::SUBAGENT_NOTIFICATION_OPEN_TAG; +use crate::model_visible_context::SUBAGENTS_CLOSE_TAG; +use crate::model_visible_context::SUBAGENTS_OPEN_TAG; -pub(crate) fn format_subagent_notification_message(agent_id: &str, status: &AgentStatus) -> String { - let payload_json = serde_json::json!({ - "agent_id": agent_id, - "status": status, - }) - .to_string(); - SUBAGENT_NOTIFICATION_FRAGMENT.wrap(payload_json) +pub(crate) struct SubagentRosterContext { + subagents: String, +} + +impl SubagentRosterContext { + pub(crate) fn new(subagents: String) -> Option { + if subagents.is_empty() { + None + } else { + Some(Self { subagents }) + } + } +} + +impl ModelVisibleContextFragment for SubagentRosterContext { + type Role = DeveloperContextRole; + + fn render_text(&self) -> String { + let lines = self + .subagents + .lines() + .map(|line| format!(" {line}")) + .collect::>() + .join("\n"); + format!("{SUBAGENTS_OPEN_TAG}\n{lines}\n{SUBAGENTS_CLOSE_TAG}") + } +} + +pub(crate) struct SubagentNotification<'a> { + agent_id: &'a str, + status: &'a AgentStatus, +} + +impl<'a> SubagentNotification<'a> { + pub(crate) fn new(agent_id: &'a str, status: &'a AgentStatus) -> Self { + Self { agent_id, status } + } +} + +impl ModelVisibleContextFragment for SubagentNotification<'_> { + type Role = DeveloperContextRole; + + fn render_text(&self) -> String { + let payload_json = serde_json::json!({ + "agent_id": self.agent_id, + "status": self.status, + }) + .to_string(); + format!( + "{SUBAGENT_NOTIFICATION_OPEN_TAG}\n{payload_json}\n{SUBAGENT_NOTIFICATION_CLOSE_TAG}" + ) + } } pub(crate) fn format_subagent_context_line(agent_id: &str, agent_nickname: Option<&str>) -> String { @@ -19,3 +69,26 @@ pub(crate) fn format_subagent_context_line(agent_id: &str, agent_nickname: Optio None => format!("- {agent_id}"), } } + +#[cfg(test)] +mod tests { + use super::*; + use pretty_assertions::assert_eq; + + #[test] + fn serializes_subagent_roster_context() { + let context = + SubagentRosterContext::new("- agent-1: Atlas\n- agent-2: Juniper".to_string()) + .expect("context expected"); + + assert_eq!( + context.render_text(), + "\n - agent-1: Atlas\n - agent-2: Juniper\n" + ); + } + + #[test] + fn skips_empty_subagent_roster_context() { + assert!(SubagentRosterContext::new(String::new()).is_none()); + } +} diff --git a/codex-rs/core/src/skills/injection.rs b/codex-rs/core/src/skills/injection.rs index 549ccca87e..e55745def4 100644 --- a/codex-rs/core/src/skills/injection.rs +++ b/codex-rs/core/src/skills/injection.rs @@ -9,6 +9,7 @@ use crate::analytics_client::TrackEventsContext; use crate::instructions::SkillInstructions; use crate::mention_syntax::TOOL_MENTION_SIGIL; use crate::mentions::build_skill_name_counts; +use crate::model_visible_context::ModelVisibleContextFragment; use crate::skills::SkillMetadata; use codex_otel::SessionTelemetry; use codex_protocol::models::ResponseItem; @@ -47,11 +48,12 @@ pub(crate) async fn build_skill_injections( skill_path: skill.path_to_skills_md.clone(), invocation_type: InvocationType::Explicit, }); - result.items.push(ResponseItem::from(SkillInstructions { + let fragment = SkillInstructions { name: skill.name.clone(), path: skill.path_to_skills_md.to_string_lossy().into_owned(), contents, - })); + }; + result.items.push(fragment.into_message()); } Err(err) => { emit_skill_injected_metric(otel, skill, "error"); diff --git a/codex-rs/core/src/tasks/mod.rs b/codex-rs/core/src/tasks/mod.rs index a7a8f38131..c23ea16d68 100644 --- a/codex-rs/core/src/tasks/mod.rs +++ b/codex-rs/core/src/tasks/mod.rs @@ -22,8 +22,13 @@ use tracing::warn; use crate::AuthManager; use crate::codex::Session; use crate::codex::TurnContext; -use crate::contextual_user_message::TURN_ABORTED_OPEN_TAG; use crate::event_mapping::parse_turn_item; +use crate::model_visible_context::ContextualUserContextRole; +use crate::model_visible_context::ContextualUserFragmentMarkers; +use crate::model_visible_context::ModelVisibleContextFragment; +use crate::model_visible_context::TURN_ABORTED_CLOSE_TAG; +use crate::model_visible_context::TURN_ABORTED_OPEN_TAG; +use crate::model_visible_context::TaggedContextualUserFragment; use crate::models_manager::manager::ModelsManager; use crate::protocol::EventMsg; use crate::protocol::TokenUsage; @@ -39,7 +44,6 @@ use codex_otel::metrics::names::TURN_NETWORK_PROXY_METRIC; use codex_otel::metrics::names::TURN_TOKEN_USAGE_METRIC; use codex_otel::metrics::names::TURN_TOOL_CALL_METRIC; use codex_protocol::items::TurnItem; -use codex_protocol::models::ContentItem; use codex_protocol::models::ResponseInputItem; use codex_protocol::models::ResponseItem; use codex_protocol::protocol::RolloutItem; @@ -58,6 +62,23 @@ pub(crate) use user_shell::execute_user_shell_command; const GRACEFULL_INTERRUPTION_TIMEOUT_MS: u64 = 100; const TURN_ABORTED_INTERRUPTED_GUIDANCE: &str = "The user interrupted the previous turn on purpose. Any running unified exec processes were terminated. If any tools/commands were aborted, they may have partially executed; verify current state before retrying."; +pub(crate) struct TurnAbortedMarker { + guidance: &'static str, +} + +impl ModelVisibleContextFragment for TurnAbortedMarker { + type Role = ContextualUserContextRole; + + fn render_text(&self) -> String { + Self::wrap_contextual_user_body(self.guidance.to_string()) + } +} + +impl TaggedContextualUserFragment for TurnAbortedMarker { + const MARKERS: ContextualUserFragmentMarkers = + ContextualUserFragmentMarkers::new(TURN_ABORTED_OPEN_TAG, TURN_ABORTED_CLOSE_TAG); +} + fn emit_turn_network_proxy_metric( session_telemetry: &SessionTelemetry, network_proxy_active: bool, @@ -433,18 +454,10 @@ impl Session { if reason == TurnAbortReason::Interrupted { self.cleanup_after_interrupt(&task.turn_context).await; - - let marker = ResponseItem::Message { - id: None, - role: "user".to_string(), - content: vec![ContentItem::InputText { - text: format!( - "{TURN_ABORTED_OPEN_TAG}\n{TURN_ABORTED_INTERRUPTED_GUIDANCE}\n" - ), - }], - end_turn: None, - phase: None, + let marker = TurnAbortedMarker { + guidance: TURN_ABORTED_INTERRUPTED_GUIDANCE, }; + let marker = marker.into_message(); self.record_into_history(std::slice::from_ref(&marker), task.turn_context.as_ref()) .await; self.persist_rollout_items(&[RolloutItem::ResponseItem(marker)]) diff --git a/codex-rs/core/src/tools/handlers/mod.rs b/codex-rs/core/src/tools/handlers/mod.rs index 7ae1d1428d..0c80cbb0bc 100644 --- a/codex-rs/core/src/tools/handlers/mod.rs +++ b/codex-rs/core/src/tools/handlers/mod.rs @@ -169,7 +169,7 @@ pub(super) fn implicit_granted_permissions( additional_permissions: Option<&PermissionProfile>, effective_additional_permissions: &EffectiveAdditionalPermissions, ) -> Option { - if !sandbox_permissions.uses_additional_permissions() + if !sandbox_permissions.requires_additional_permissions() && !matches!(sandbox_permissions, SandboxPermissions::RequireEscalated) && additional_permissions.is_none() { @@ -214,12 +214,13 @@ pub(super) async fn apply_granted_turn_permissions( _ => false, }; - let sandbox_permissions = - if effective_permissions.is_some() && !sandbox_permissions.uses_additional_permissions() { - SandboxPermissions::WithAdditionalPermissions - } else { - sandbox_permissions - }; + let sandbox_permissions = if effective_permissions.is_some() + && !sandbox_permissions.requires_additional_permissions() + { + SandboxPermissions::WithAdditionalPermissions + } else { + sandbox_permissions + }; EffectiveAdditionalPermissions { sandbox_permissions, diff --git a/codex-rs/core/src/tools/handlers/shell.rs b/codex-rs/core/src/tools/handlers/shell.rs index a3014a8ed4..dc76869a7a 100644 --- a/codex-rs/core/src/tools/handlers/shell.rs +++ b/codex-rs/core/src/tools/handlers/shell.rs @@ -381,7 +381,7 @@ impl ShellHandler { // continue through the normal exec approval flow for the command. if effective_additional_permissions .sandbox_permissions - .requests_sandbox_override() + .requires_escalated_permissions() && !effective_additional_permissions.permissions_preapproved && !matches!( turn.approval_policy.value(), diff --git a/codex-rs/core/src/tools/handlers/unified_exec.rs b/codex-rs/core/src/tools/handlers/unified_exec.rs index 123225065e..4ac19ed862 100644 --- a/codex-rs/core/src/tools/handlers/unified_exec.rs +++ b/codex-rs/core/src/tools/handlers/unified_exec.rs @@ -187,7 +187,7 @@ impl ToolHandler for UnifiedExecHandler { // continue through the normal exec approval flow for the command. if effective_additional_permissions .sandbox_permissions - .requests_sandbox_override() + .requires_escalated_permissions() && !effective_additional_permissions.permissions_preapproved && !matches!( context.turn.approval_policy.value(), diff --git a/codex-rs/core/src/user_shell_command.rs b/codex-rs/core/src/user_shell_command.rs index 32cf78cf2a..72c254578d 100644 --- a/codex-rs/core/src/user_shell_command.rs +++ b/codex-rs/core/src/user_shell_command.rs @@ -3,8 +3,13 @@ use std::time::Duration; use codex_protocol::models::ResponseItem; use crate::codex::TurnContext; -use crate::contextual_user_message::USER_SHELL_COMMAND_FRAGMENT; use crate::exec::ExecToolCallOutput; +use crate::model_visible_context::ContextualUserContextRole; +use crate::model_visible_context::ContextualUserFragmentMarkers; +use crate::model_visible_context::ModelVisibleContextFragment; +use crate::model_visible_context::TaggedContextualUserFragment; +use crate::model_visible_context::USER_SHELL_COMMAND_CLOSE_TAG; +use crate::model_visible_context::USER_SHELL_COMMAND_OPEN_TAG; use crate::tools::format_exec_output_str; fn format_duration_line(duration: Duration) -> String { @@ -12,34 +17,54 @@ fn format_duration_line(duration: Duration) -> String { format!("Duration: {duration_seconds:.4} seconds") } -fn format_user_shell_command_body( - command: &str, - exec_output: &ExecToolCallOutput, - turn_context: &TurnContext, -) -> String { - let mut sections = Vec::new(); - sections.push("".to_string()); - sections.push(command.to_string()); - sections.push("".to_string()); - sections.push("".to_string()); - sections.push(format!("Exit code: {}", exec_output.exit_code)); - sections.push(format_duration_line(exec_output.duration)); - sections.push("Output:".to_string()); - sections.push(format_exec_output_str( - exec_output, - turn_context.truncation_policy, - )); - sections.push("".to_string()); - sections.join("\n") +pub(crate) struct UserShellCommandFragment; + +impl TaggedContextualUserFragment for UserShellCommandFragment { + const MARKERS: ContextualUserFragmentMarkers = ContextualUserFragmentMarkers::new( + USER_SHELL_COMMAND_OPEN_TAG, + USER_SHELL_COMMAND_CLOSE_TAG, + ); } +struct UserShellCommandRecord<'a> { + command: &'a str, + exec_output: &'a ExecToolCallOutput, + turn_context: &'a TurnContext, +} + +impl ModelVisibleContextFragment for UserShellCommandRecord<'_> { + type Role = ContextualUserContextRole; + + fn render_text(&self) -> String { + let mut sections = Vec::new(); + sections.push("".to_string()); + sections.push(self.command.to_string()); + sections.push("".to_string()); + sections.push("".to_string()); + sections.push(format!("Exit code: {}", self.exec_output.exit_code)); + sections.push(format_duration_line(self.exec_output.duration)); + sections.push("Output:".to_string()); + sections.push(format_exec_output_str( + self.exec_output, + self.turn_context.truncation_policy, + )); + sections.push("".to_string()); + UserShellCommandFragment::wrap_contextual_user_body(sections.join("\n")) + } +} + +#[cfg(test)] pub fn format_user_shell_command_record( command: &str, exec_output: &ExecToolCallOutput, turn_context: &TurnContext, ) -> String { - let body = format_user_shell_command_body(command, exec_output, turn_context); - USER_SHELL_COMMAND_FRAGMENT.wrap(body) + UserShellCommandRecord { + command, + exec_output, + turn_context, + } + .render_text() } pub fn user_shell_command_record_item( @@ -47,13 +72,71 @@ pub fn user_shell_command_record_item( exec_output: &ExecToolCallOutput, turn_context: &TurnContext, ) -> ResponseItem { - USER_SHELL_COMMAND_FRAGMENT.into_message(format_user_shell_command_record( + UserShellCommandRecord { command, exec_output, turn_context, - )) + } + .into_message() } #[cfg(test)] -#[path = "user_shell_command_tests.rs"] -mod tests; +mod tests { + use super::*; + use crate::codex::make_session_and_context; + use crate::exec::StreamOutput; + use codex_protocol::models::ContentItem; + use pretty_assertions::assert_eq; + + #[test] + fn detects_user_shell_command_text_variants() { + assert!( + ::matches_contextual_user_text("\necho hi\n") + ); + assert!( + !::matches_contextual_user_text("echo hi") + ); + } + + #[tokio::test] + async fn formats_basic_record() { + let exec_output = ExecToolCallOutput { + exit_code: 0, + stdout: StreamOutput::new("hi".to_string()), + stderr: StreamOutput::new(String::new()), + aggregated_output: StreamOutput::new("hi".to_string()), + duration: Duration::from_secs(1), + timed_out: false, + }; + let (_, turn_context) = make_session_and_context().await; + let item = user_shell_command_record_item("echo hi", &exec_output, &turn_context); + let ResponseItem::Message { content, .. } = item else { + panic!("expected message"); + }; + let [ContentItem::InputText { text }] = content.as_slice() else { + panic!("expected input text"); + }; + assert_eq!( + text, + "\n\necho hi\n\n\nExit code: 0\nDuration: 1.0000 seconds\nOutput:\nhi\n\n" + ); + } + + #[tokio::test] + async fn uses_aggregated_output_over_streams() { + let exec_output = ExecToolCallOutput { + exit_code: 42, + stdout: StreamOutput::new("stdout-only".to_string()), + stderr: StreamOutput::new("stderr-only".to_string()), + aggregated_output: StreamOutput::new("combined output wins".to_string()), + duration: Duration::from_millis(120), + timed_out: false, + }; + let (_, turn_context) = make_session_and_context().await; + let record = format_user_shell_command_record("false", &exec_output, &turn_context); + assert_eq!( + record, + "\n\nfalse\n\n\nExit code: 42\nDuration: 0.1200 seconds\nOutput:\ncombined output wins\n\n" + ); + } +} diff --git a/codex-rs/core/tests/common/context_snapshot.rs b/codex-rs/core/tests/common/context_snapshot.rs index 5471fd8913..14977326fb 100644 --- a/codex-rs/core/tests/common/context_snapshot.rs +++ b/codex-rs/core/tests/common/context_snapshot.rs @@ -245,34 +245,22 @@ fn canonicalize_snapshot_text(text: &str) -> String { return "".to_string(); } if text.starts_with("") { - let subagent_count = text - .split_once("") - .and_then(|(_, rest)| rest.split_once("")) - .map(|(subagents, _)| { - subagents - .lines() - .filter(|line| line.trim_start().starts_with("- ")) - .count() - }) - .unwrap_or(0); - let subagents_suffix = if subagent_count > 0 { - format!(":subagents={subagent_count}") - } else { - String::new() - }; if let (Some(cwd_start), Some(cwd_end)) = (text.find(""), text.find("")) { let cwd = &text[cwd_start + "".len()..cwd_end]; return if cwd.ends_with("PRETURN_CONTEXT_DIFF_CWD") { - format!("") + "".to_string() } else { - format!("{subagents_suffix}>") + ">".to_string() }; } - return if subagent_count > 0 { - format!("") - } else { - "".to_string() - }; + return "".to_string(); + } + if text.starts_with("") { + let subagent_count = text + .lines() + .filter(|line| line.trim_start().starts_with("- ")) + .count(); + return format!(""); } if text.starts_with("You are performing a CONTEXT CHECKPOINT COMPACTION.") { return "".to_string(); @@ -354,13 +342,13 @@ mod tests { } #[test] - fn redacted_text_mode_normalizes_environment_context_with_subagents() { + fn redacted_text_mode_normalizes_subagents_fragment() { let items = vec![json!({ "type": "message", - "role": "user", + "role": "developer", "content": [{ "type": "input_text", - "text": "\n /tmp/example\n bash\n \n - agent-1: atlas\n - agent-2\n \n" + "text": "\n - agent-1: atlas\n - agent-2\n" }] })]; @@ -369,10 +357,7 @@ mod tests { &ContextSnapshotOptions::default().render_mode(ContextSnapshotRenderMode::RedactedText), ); - assert_eq!( - rendered, - "00:message/user::subagents=2>" - ); + assert_eq!(rendered, "00:message/developer:"); } #[test] diff --git a/codex-rs/core/tests/suite/approvals.rs b/codex-rs/core/tests/suite/approvals.rs index 7a9dba038e..bcbd421ec2 100644 --- a/codex-rs/core/tests/suite/approvals.rs +++ b/codex-rs/core/tests/suite/approvals.rs @@ -239,7 +239,7 @@ fn shell_event_with_prefix_rule( "command": command, "timeout_ms": timeout_ms, }); - if sandbox_permissions.requests_sandbox_override() { + if sandbox_permissions.requires_escalated_permissions() { args["sandbox_permissions"] = json!(sandbox_permissions); } if let Some(prefix_rule) = prefix_rule { @@ -262,7 +262,7 @@ fn exec_command_event( if let Some(yield_time_ms) = yield_time_ms { args["yield_time_ms"] = json!(yield_time_ms); } - if sandbox_permissions.requests_sandbox_override() { + if sandbox_permissions.requires_escalated_permissions() { args["sandbox_permissions"] = json!(sandbox_permissions); let reason = justification.unwrap_or(DEFAULT_UNIFIED_EXEC_JUSTIFICATION); args["justification"] = json!(reason); diff --git a/codex-rs/core/tests/suite/client.rs b/codex-rs/core/tests/suite/client.rs index 6406507264..64fd98fc70 100644 --- a/codex-rs/core/tests/suite/client.rs +++ b/codex-rs/core/tests/suite/client.rs @@ -330,7 +330,7 @@ async fn resume_includes_initial_messages_and_sends_prior_items() { .position(|(role, text)| { role == "user" && text.contains("be nice") - && (text.starts_with("# AGENTS.md instructions for ")) + && text.starts_with("# AGENTS.md instructions for ") }) .expect("user instructions"); let pos_environment = messages @@ -908,9 +908,9 @@ async fn includes_user_instructions_message_in_request() { let ui_text = user_context_texts .iter() .copied() - .find(|text| text.contains("")) + .find(|text| text.starts_with("# AGENTS.md instructions for ")) .expect("invalid message content"); - assert!(ui_text.contains("")); + assert!(ui_text.contains("")); assert!(ui_text.contains("be nice")); assert!( user_context_texts @@ -1766,9 +1766,9 @@ async fn includes_developer_instructions_message_in_request() { let ui_text = user_context_texts .iter() .copied() - .find(|text| text.contains("")) + .find(|text| text.starts_with("# AGENTS.md instructions for ")) .expect("invalid message content"); - assert!(ui_text.contains("")); + assert!(ui_text.contains("")); assert!(ui_text.contains("be nice")); assert!( user_context_texts diff --git a/codex-rs/core/tests/suite/model_visible_layout.rs b/codex-rs/core/tests/suite/model_visible_layout.rs index b02e3ae156..e4b56e369f 100644 --- a/codex-rs/core/tests/suite/model_visible_layout.rs +++ b/codex-rs/core/tests/suite/model_visible_layout.rs @@ -25,7 +25,6 @@ use core_test_support::responses::start_mock_server; use core_test_support::skip_if_no_network; use core_test_support::test_codex::test_codex; use core_test_support::wait_for_event; -use serde_json::json; const PRETURN_CONTEXT_DIFF_CWD: &str = "PRETURN_CONTEXT_DIFF_CWD"; @@ -53,30 +52,6 @@ fn agents_message_count(request: &ResponsesRequest) -> usize { .count() } -fn format_environment_context_subagents_snapshot(subagents: &[&str]) -> String { - let subagents_block = if subagents.is_empty() { - String::new() - } else { - let lines = subagents - .iter() - .map(|line| format!(" {line}")) - .collect::>() - .join("\n"); - format!("\n \n{lines}\n ") - }; - let items = vec![json!({ - "type": "message", - "role": "user", - "content": [{ - "type": "input_text", - "text": format!( - "\n /tmp/example\n bash{subagents_block}\n" - ), - }], - })]; - context_snapshot::format_response_items_snapshot(items.as_slice(), &context_snapshot_options()) -} - #[tokio::test(flavor = "multi_thread", worker_threads = 2)] async fn snapshot_model_visible_layout_turn_overrides() -> Result<()> { skip_if_no_network!(Ok(())); @@ -175,9 +150,7 @@ async fn snapshot_model_visible_layout_turn_overrides() -> Result<()> { } #[tokio::test(flavor = "multi_thread", worker_threads = 2)] -// TODO(ccunningham): Diff `user_instructions` and emit updates when AGENTS.md content changes -// (for example after cwd changes), then update this test to assert refreshed AGENTS content. -async fn snapshot_model_visible_layout_cwd_change_does_not_refresh_agents() -> Result<()> { +async fn snapshot_model_visible_layout_cwd_change_refreshes_agents() -> Result<()> { skip_if_no_network!(Ok(())); let server = start_mock_server().await; @@ -268,13 +241,13 @@ async fn snapshot_model_visible_layout_cwd_change_does_not_refresh_agents() -> R ); assert_eq!( agents_message_count(&requests[1]), - 1, - "expected AGENTS to refresh after cwd change, but current behavior only keeps history AGENTS" + 2, + "expected updated AGENTS instructions to be re-injected after cwd change" ); insta::assert_snapshot!( - "model_visible_layout_cwd_change_does_not_refresh_agents", + "model_visible_layout_cwd_change_refreshes_agents", format_labeled_requests_snapshot( - "Second turn changes cwd to a directory with different AGENTS.md; current behavior does not emit refreshed AGENTS instructions.", + "Second turn changes cwd to a directory with different AGENTS.md; updated AGENTS instructions are re-injected.", &[ ("First Request (agents_one)", &requests[0]), ("Second Request (agents_two cwd)", &requests[1]), @@ -481,23 +454,3 @@ async fn snapshot_model_visible_layout_resume_override_matches_rollout_model() - Ok(()) } - -#[tokio::test(flavor = "multi_thread", worker_threads = 2)] -async fn snapshot_model_visible_layout_environment_context_includes_one_subagent() -> Result<()> { - insta::assert_snapshot!( - "model_visible_layout_environment_context_includes_one_subagent", - format_environment_context_subagents_snapshot(&["- agent-1: Atlas"]) - ); - - Ok(()) -} - -#[tokio::test(flavor = "multi_thread", worker_threads = 2)] -async fn snapshot_model_visible_layout_environment_context_includes_two_subagents() -> Result<()> { - insta::assert_snapshot!( - "model_visible_layout_environment_context_includes_two_subagents", - format_environment_context_subagents_snapshot(&["- agent-1: Atlas", "- agent-2: Juniper"]) - ); - - Ok(()) -} diff --git a/codex-rs/core/tests/suite/permissions_messages.rs b/codex-rs/core/tests/suite/permissions_messages.rs index d9c4d44116..016473a56c 100644 --- a/codex-rs/core/tests/suite/permissions_messages.rs +++ b/codex-rs/core/tests/suite/permissions_messages.rs @@ -1,7 +1,8 @@ use anyhow::Result; use codex_core::config::Constrained; +use codex_core::features::Feature; use codex_execpolicy::Policy; -use codex_protocol::models::DeveloperInstructions; +use codex_protocol::models::developer_permissions_text; use codex_protocol::protocol::AskForApproval; use codex_protocol::protocol::EventMsg; use codex_protocol::protocol::Op; @@ -79,6 +80,48 @@ async fn permissions_message_sent_once_on_start() -> Result<()> { Ok(()) } +#[tokio::test(flavor = "multi_thread", worker_threads = 2)] +async fn permissions_message_mentions_additional_permissions_for_legacy_exec_flow() -> Result<()> { + skip_if_no_network!(Ok(())); + + let server = start_mock_server().await; + let req = mount_sse_once( + &server, + sse(vec![ev_response_created("resp-1"), ev_completed("resp-1")]), + ) + .await; + + let mut builder = test_codex().with_config(move |config| { + config.permissions.approval_policy = Constrained::allow_any(AskForApproval::OnRequest); + config + .features + .enable(Feature::ExecPermissionApprovals) + .expect("test config should allow feature update"); + }); + let test = builder.build(&server).await?; + + test.codex + .submit(Op::UserInput { + items: vec![UserInput::Text { + text: "hello".into(), + text_elements: Vec::new(), + }], + final_output_json_schema: None, + }) + .await?; + wait_for_event(&test.codex, |ev| matches!(ev, EventMsg::TurnComplete(_))).await; + + let request = req.single_request(); + let body = request.body_json(); + let input = body["input"].as_array().expect("input array"); + let permissions = permissions_texts(input); + assert_eq!(permissions.len(), 1); + assert!(permissions[0].contains("with_additional_permissions")); + assert!(permissions[0].contains("additional_permissions")); + + Ok(()) +} + #[tokio::test(flavor = "multi_thread", worker_threads = 2)] async fn permissions_message_added_on_override_change() -> Result<()> { skip_if_no_network!(Ok(())); @@ -487,15 +530,14 @@ async fn permissions_message_includes_writable_roots() -> Result<()> { let body = req.single_request().body_json(); let input = body["input"].as_array().expect("input array"); let permissions = permissions_texts(input); - let expected = DeveloperInstructions::from_policy( + let expected = developer_permissions_text( &sandbox_policy, AskForApproval::OnRequest, + false, &Policy::empty(), test.config.cwd.as_path(), false, - false, - ) - .into_text(); + ); // Normalize line endings to handle Windows vs Unix differences let normalize_line_endings = |s: &str| s.replace("\r\n", "\n"); let expected_normalized = normalize_line_endings(&expected); diff --git a/codex-rs/core/tests/suite/plugins.rs b/codex-rs/core/tests/suite/plugins.rs index a68f75a019..14fe2d0687 100644 --- a/codex-rs/core/tests/suite/plugins.rs +++ b/codex-rs/core/tests/suite/plugins.rs @@ -21,6 +21,7 @@ use core_test_support::stdio_server_bin; use core_test_support::test_codex::test_codex; use core_test_support::wait_for_event; use core_test_support::wait_for_event_with_timeout; +use dunce::canonicalize as normalize_path; use tempfile::TempDir; use wiremock::MockServer; @@ -189,7 +190,7 @@ fn tool_description(body: &serde_json::Value, tool_name: &str) -> Option } #[tokio::test(flavor = "multi_thread", worker_threads = 2)] -async fn plugin_skills_append_to_instructions() -> Result<()> { +async fn plugin_instructions_are_split_from_agents_instructions() -> Result<()> { skip_if_no_network!(Ok(())); let server = MockServer::start().await; @@ -200,7 +201,7 @@ async fn plugin_skills_append_to_instructions() -> Result<()> { .await; let codex_home = Arc::new(TempDir::new()?); - write_plugin_skill_plugin(codex_home.as_ref()); + let skill_path = write_plugin_skill_plugin(codex_home.as_ref()); let codex = build_plugin_test_codex(&server, Arc::clone(&codex_home)).await?; codex @@ -216,17 +217,34 @@ async fn plugin_skills_append_to_instructions() -> Result<()> { wait_for_event(&codex, |ev| matches!(ev, EventMsg::TurnComplete(_))).await; let request = resp_mock.single_request(); - let request_body = request.body_json(); - let instructions_text = request_body["input"][1]["content"][0]["text"] - .as_str() - .expect("instructions text"); + let user_texts = request.message_input_texts("user"); + let instructions_text = user_texts + .iter() + .find(|text| text.starts_with("# AGENTS.md instructions for ")) + .expect("AGENTS instructions text"); + let plugin_text = user_texts + .iter() + .find(|text| text.starts_with("\n")) + .expect("plugins fragment text"); assert!( - instructions_text.contains("## Plugins"), + !instructions_text.contains("## Plugins"), + "expected plugins section to be separate from AGENTS instructions" + ); + assert!( + plugin_text.contains("## Plugins"), "expected plugins section present" ); assert!( - instructions_text.contains("`sample`"), - "expected enabled plugin name in instructions" + plugin_text.contains("### Available plugins\n- `sample`"), + "expected enabled plugin list in plugin fragment" + ); + assert!( + plugin_text.contains("### How to use plugins"), + "expected plugin usage guidance heading" + ); + assert!( + instructions_text.contains("## Skills"), + "expected skills section present" ); assert!( instructions_text.contains("`sample`: inspect sample data"), @@ -240,6 +258,23 @@ async fn plugin_skills_append_to_instructions() -> Result<()> { instructions_text.contains("sample:sample-search: inspect sample data"), "expected namespaced plugin skill summary" ); + let expected_path = normalize_path(skill_path)?; + let expected_path_str = expected_path.to_string_lossy().replace('\\', "/"); + assert!( + instructions_text.contains(&expected_path_str), + "expected path {expected_path_str} in instructions" + ); + assert!( + user_texts + .iter() + .position(|text| text == plugin_text) + .expect("plugin fragment position") + > user_texts + .iter() + .position(|text| text == instructions_text) + .expect("instructions fragment position"), + "expected plugin fragment to appear after AGENTS instructions" + ); Ok(()) } diff --git a/codex-rs/core/tests/suite/snapshots/all__suite__model_visible_layout__model_visible_layout_cwd_change_does_not_refresh_agents.snap b/codex-rs/core/tests/suite/snapshots/all__suite__model_visible_layout__model_visible_layout_cwd_change_does_not_refresh_agents.snap deleted file mode 100644 index 42d92a720f..0000000000 --- a/codex-rs/core/tests/suite/snapshots/all__suite__model_visible_layout__model_visible_layout_cwd_change_does_not_refresh_agents.snap +++ /dev/null @@ -1,23 +0,0 @@ ---- -source: core/tests/suite/model_visible_layout.rs -assertion_line: 288 -expression: "format_labeled_requests_snapshot(\"Second turn changes cwd to a directory with different AGENTS.md; current behavior does not emit refreshed AGENTS instructions.\",\n&[(\"First Request (agents_one)\", &requests[0]),\n(\"Second Request (agents_two cwd)\", &requests[1]),])" ---- -Scenario: Second turn changes cwd to a directory with different AGENTS.md; current behavior does not emit refreshed AGENTS instructions. - -## First Request (agents_one) -00:message/developer: -01:message/user[2]: - [01] - [02] > -02:message/user:first turn in agents_one - -## Second Request (agents_two cwd) -00:message/developer: -01:message/user[2]: - [01] - [02] > -02:message/user:first turn in agents_one -03:message/assistant:turn one complete -04:message/user:> -05:message/user:second turn in agents_two diff --git a/codex-rs/core/tests/suite/snapshots/all__suite__model_visible_layout__model_visible_layout_cwd_change_refreshes_agents.snap b/codex-rs/core/tests/suite/snapshots/all__suite__model_visible_layout__model_visible_layout_cwd_change_refreshes_agents.snap new file mode 100644 index 0000000000..8e888da88e --- /dev/null +++ b/codex-rs/core/tests/suite/snapshots/all__suite__model_visible_layout__model_visible_layout_cwd_change_refreshes_agents.snap @@ -0,0 +1,26 @@ +--- +source: core/tests/suite/model_visible_layout.rs +expression: "format_labeled_requests_snapshot(\"Second turn changes cwd to a directory with different AGENTS.md; updated AGENTS instructions are re-injected.\",\n&[(\"First Request (agents_one)\", &requests[0]),\n(\"Second Request (agents_two cwd)\", &requests[1]),])" +--- +Scenario: Second turn changes cwd to a directory with different AGENTS.md; updated AGENTS instructions are re-injected. + +## First Request (agents_one) +00:message/developer: +01:message/user[2]: + [01] + [02] > +02:message/user:first turn in agents_one + +## Second Request (agents_two cwd) +00:message/developer: +01:message/user[2]: + [01] + [02] > +02:message/developer: +03:message/user:> +04:message/user:first turn in agents_one +05:message/assistant:turn one complete +06:message/user[2]: + [01] + [02] > +07:message/user:second turn in agents_two diff --git a/codex-rs/core/tests/suite/snapshots/all__suite__model_visible_layout__model_visible_layout_environment_context_includes_one_subagent.snap b/codex-rs/core/tests/suite/snapshots/all__suite__model_visible_layout__model_visible_layout_environment_context_includes_one_subagent.snap deleted file mode 100644 index 3436943cd2..0000000000 --- a/codex-rs/core/tests/suite/snapshots/all__suite__model_visible_layout__model_visible_layout_environment_context_includes_one_subagent.snap +++ /dev/null @@ -1,6 +0,0 @@ ---- -source: core/tests/suite/model_visible_layout.rs -assertion_line: 476 -expression: "format_environment_context_subagents_snapshot(&[\"- agent-1: Atlas\"])" ---- -00:message/user::subagents=1> diff --git a/codex-rs/core/tests/suite/snapshots/all__suite__model_visible_layout__model_visible_layout_environment_context_includes_two_subagents.snap b/codex-rs/core/tests/suite/snapshots/all__suite__model_visible_layout__model_visible_layout_environment_context_includes_two_subagents.snap deleted file mode 100644 index 105c28515b..0000000000 --- a/codex-rs/core/tests/suite/snapshots/all__suite__model_visible_layout__model_visible_layout_environment_context_includes_two_subagents.snap +++ /dev/null @@ -1,6 +0,0 @@ ---- -source: core/tests/suite/model_visible_layout.rs -assertion_line: 486 -expression: "format_environment_context_subagents_snapshot(&[\"- agent-1: Atlas\",\n\"- agent-2: Juniper\",])" ---- -00:message/user::subagents=2> diff --git a/codex-rs/core/tests/suite/subagent_notifications.rs b/codex-rs/core/tests/suite/subagent_notifications.rs index 8959975798..f2d3c07cd9 100644 --- a/codex-rs/core/tests/suite/subagent_notifications.rs +++ b/codex-rs/core/tests/suite/subagent_notifications.rs @@ -58,7 +58,7 @@ fn body_contains(req: &wiremock::Request, text: &str) -> bool { } fn has_subagent_notification(req: &ResponsesRequest) -> bool { - req.message_input_texts("user") + req.message_input_texts("developer") .iter() .any(|text| text.contains("")) } diff --git a/codex-rs/protocol/src/models.rs b/codex-rs/protocol/src/models.rs index 700c1f80e0..4c09a738c0 100644 --- a/codex-rs/protocol/src/models.rs +++ b/codex-rs/protocol/src/models.rs @@ -14,7 +14,6 @@ use crate::config_types::SandboxMode; use crate::protocol::AskForApproval; use crate::protocol::COLLABORATION_MODE_CLOSE_TAG; use crate::protocol::COLLABORATION_MODE_OPEN_TAG; -use crate::protocol::GranularApprovalConfig; use crate::protocol::NetworkAccess; use crate::protocol::REALTIME_CONVERSATION_CLOSE_TAG; use crate::protocol::REALTIME_CONVERSATION_OPEN_TAG; @@ -283,6 +282,33 @@ pub enum MessagePhase { FinalAnswer, } +/// Canonical message roles used throughout Codex. +#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq, Hash, JsonSchema, TS)] +#[serde(rename_all = "lowercase")] +pub enum MessageRole { + User, + Assistant, + Developer, + System, +} + +impl MessageRole { + pub const fn as_str(self) -> &'static str { + match self { + Self::User => "user", + Self::Assistant => "assistant", + Self::Developer => "developer", + Self::System => "system", + } + } +} + +impl std::fmt::Display for MessageRole { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + f.write_str(self.as_str()) + } +} + #[derive(Debug, Clone, Serialize, Deserialize, PartialEq, JsonSchema, TS)] #[serde(tag = "type", rename_all = "snake_case")] pub enum ResponseItem { @@ -451,9 +477,12 @@ impl Default for BaseInstructions { /// Developer-provided guidance that is injected into a turn as a developer role /// message. +/// +/// This type is only for custom developer instructions provided by users or +/// clients (for example config/app-server overrides). #[derive(Debug, Clone, Serialize, Deserialize, PartialEq, JsonSchema, TS)] #[serde(rename = "developer_instructions", rename_all = "snake_case")] -pub struct DeveloperInstructions { +pub struct CustomDeveloperInstructions { text: String, } @@ -466,6 +495,7 @@ const APPROVAL_POLICY_ON_REQUEST_RULE: &str = include_str!("prompts/permissions/approval_policy/on_request_rule.md"); const APPROVAL_POLICY_ON_REQUEST_RULE_REQUEST_PERMISSION: &str = include_str!("prompts/permissions/approval_policy/on_request_rule_request_permission.md"); +const GUARDIAN_APPROVAL_FEATURE: &str = ""; const SANDBOX_MODE_DANGER_FULL_ACCESS: &str = include_str!("prompts/permissions/sandbox_mode/danger_full_access.md"); @@ -476,293 +506,187 @@ const SANDBOX_MODE_READ_ONLY: &str = include_str!("prompts/permissions/sandbox_m const REALTIME_START_INSTRUCTIONS: &str = include_str!("prompts/realtime/realtime_start.md"); const REALTIME_END_INSTRUCTIONS: &str = include_str!("prompts/realtime/realtime_end.md"); -impl DeveloperInstructions { +impl CustomDeveloperInstructions { pub fn new>(text: T) -> Self { Self { text: text.into() } } - pub fn from( - approval_policy: AskForApproval, - exec_policy: &Policy, - exec_permission_approvals_enabled: bool, - request_permissions_tool_enabled: bool, - ) -> DeveloperInstructions { - let with_request_permissions_tool = |text: &str| { - if request_permissions_tool_enabled { - format!("{text}\n\n{}", request_permissions_tool_prompt_section()) - } else { - text.to_string() - } - }; - let on_request_instructions = || { - let on_request_rule = if exec_permission_approvals_enabled { - APPROVAL_POLICY_ON_REQUEST_RULE_REQUEST_PERMISSION.to_string() - } else { - APPROVAL_POLICY_ON_REQUEST_RULE.to_string() - }; - let mut sections = vec![on_request_rule]; - if request_permissions_tool_enabled { - sections.push(request_permissions_tool_prompt_section().to_string()); - } - if let Some(prefixes) = approved_command_prefixes_text(exec_policy) { - sections.push(format!( - "## Approved command prefixes\nThe following prefix rules have already been approved: {prefixes}" - )); - } - sections.join("\n\n") - }; - let text = match approval_policy { - AskForApproval::Never => APPROVAL_POLICY_NEVER.to_string(), - AskForApproval::UnlessTrusted => { - with_request_permissions_tool(APPROVAL_POLICY_UNLESS_TRUSTED) - } - AskForApproval::OnFailure => with_request_permissions_tool(APPROVAL_POLICY_ON_FAILURE), - AskForApproval::OnRequest => on_request_instructions(), - AskForApproval::Granular(granular_config) => granular_instructions( - granular_config, - exec_policy, - exec_permission_approvals_enabled, - request_permissions_tool_enabled, - ), - }; - - DeveloperInstructions::new(text) - } - pub fn into_text(self) -> String { self.text } +} - pub fn concat(self, other: impl Into) -> Self { - let mut text = self.text; - if !text.ends_with('\n') { - text.push('\n'); +impl From for ResponseItem { + fn from(instructions: CustomDeveloperInstructions) -> Self { + ResponseItem::Message { + id: None, + role: MessageRole::Developer.to_string(), + content: vec![ContentItem::InputText { + text: instructions.into_text(), + }], + end_turn: None, + phase: None, } - text.push_str(&other.into().text); - Self { text } - } - - pub fn model_switch_message(model_instructions: String) -> Self { - DeveloperInstructions::new(format!( - "\nThe user was previously using a different model. Please continue the conversation according to the following instructions:\n\n{model_instructions}\n" - )) - } - - pub fn realtime_start_message() -> Self { - Self::realtime_start_message_with_instructions(REALTIME_START_INSTRUCTIONS.trim()) - } - - pub fn realtime_start_message_with_instructions(instructions: &str) -> Self { - DeveloperInstructions::new(format!( - "{REALTIME_CONVERSATION_OPEN_TAG}\n{instructions}\n{REALTIME_CONVERSATION_CLOSE_TAG}" - )) - } - - pub fn realtime_end_message(reason: &str) -> Self { - DeveloperInstructions::new(format!( - "{REALTIME_CONVERSATION_OPEN_TAG}\n{}\n\nReason: {reason}\n{REALTIME_CONVERSATION_CLOSE_TAG}", - REALTIME_END_INSTRUCTIONS.trim() - )) - } - - pub fn personality_spec_message(spec: String) -> Self { - let message = format!( - " The user has requested a new communication style. Future messages should adhere to the following personality: \n{spec} " - ); - DeveloperInstructions::new(message) - } - - pub fn from_policy( - sandbox_policy: &SandboxPolicy, - approval_policy: AskForApproval, - exec_policy: &Policy, - cwd: &Path, - exec_permission_approvals_enabled: bool, - request_permissions_tool_enabled: bool, - ) -> Self { - let network_access = if sandbox_policy.has_full_network_access() { - NetworkAccess::Enabled - } else { - NetworkAccess::Restricted - }; - - let (sandbox_mode, writable_roots) = match sandbox_policy { - SandboxPolicy::DangerFullAccess => (SandboxMode::DangerFullAccess, None), - SandboxPolicy::ReadOnly { .. } => (SandboxMode::ReadOnly, None), - SandboxPolicy::ExternalSandbox { .. } => (SandboxMode::DangerFullAccess, None), - SandboxPolicy::WorkspaceWrite { .. } => { - let roots = sandbox_policy.get_writable_roots_with_cwd(cwd); - (SandboxMode::WorkspaceWrite, Some(roots)) - } - }; - - DeveloperInstructions::from_permissions_with_network( - sandbox_mode, - network_access, - approval_policy, - exec_policy, - writable_roots, - exec_permission_approvals_enabled, - request_permissions_tool_enabled, - ) - } - - /// Returns developer instructions from a collaboration mode if they exist and are non-empty. - pub fn from_collaboration_mode(collaboration_mode: &CollaborationMode) -> Option { - collaboration_mode - .settings - .developer_instructions - .as_ref() - .filter(|instructions| !instructions.is_empty()) - .map(|instructions| { - DeveloperInstructions::new(format!( - "{COLLABORATION_MODE_OPEN_TAG}{instructions}{COLLABORATION_MODE_CLOSE_TAG}" - )) - }) - } - - fn from_permissions_with_network( - sandbox_mode: SandboxMode, - network_access: NetworkAccess, - approval_policy: AskForApproval, - exec_policy: &Policy, - writable_roots: Option>, - exec_permission_approvals_enabled: bool, - request_permissions_tool_enabled: bool, - ) -> Self { - let start_tag = DeveloperInstructions::new(""); - let end_tag = DeveloperInstructions::new(""); - start_tag - .concat(DeveloperInstructions::sandbox_text( - sandbox_mode, - network_access, - )) - .concat(DeveloperInstructions::from( - approval_policy, - exec_policy, - exec_permission_approvals_enabled, - request_permissions_tool_enabled, - )) - .concat(DeveloperInstructions::from_writable_roots(writable_roots)) - .concat(end_tag) - } - - fn from_writable_roots(writable_roots: Option>) -> Self { - let Some(roots) = writable_roots else { - return DeveloperInstructions::new(""); - }; - - if roots.is_empty() { - return DeveloperInstructions::new(""); - } - - let roots_list: Vec = roots - .iter() - .map(|r| format!("`{}`", r.root.to_string_lossy())) - .collect(); - let text = if roots_list.len() == 1 { - format!(" The writable root is {}.", roots_list[0]) - } else { - format!(" The writable roots are {}.", roots_list.join(", ")) - }; - DeveloperInstructions::new(text) - } - - fn sandbox_text(mode: SandboxMode, network_access: NetworkAccess) -> DeveloperInstructions { - let template = match mode { - SandboxMode::DangerFullAccess => SANDBOX_MODE_DANGER_FULL_ACCESS.trim_end(), - SandboxMode::WorkspaceWrite => SANDBOX_MODE_WORKSPACE_WRITE.trim_end(), - SandboxMode::ReadOnly => SANDBOX_MODE_READ_ONLY.trim_end(), - }; - let text = template.replace("{network_access}", &network_access.to_string()); - - DeveloperInstructions::new(text) } } -fn approved_command_prefixes_text(exec_policy: &Policy) -> Option { - format_allow_prefixes(exec_policy.get_allowed_prefixes()) - .filter(|prefixes| !prefixes.is_empty()) +pub fn developer_model_switch_text(model_instructions: String) -> String { + format!( + "\nThe user was previously using a different model. Please continue the conversation according to the following instructions:\n\n{model_instructions}\n" + ) } -fn granular_prompt_intro_text() -> &'static str { - "# Approval Requests\n\nApproval policy is `granular`. Categories set to `false` are automatically rejected instead of prompting the user." +pub fn developer_realtime_start_text() -> String { + developer_realtime_start_text_with_instructions(None) } -fn request_permissions_tool_prompt_section() -> &'static str { - "# request_permissions Tool\n\nThe built-in `request_permissions` tool is available in this session. Invoke it when you need to request additional `network`, `file_system`, or `macos` permissions before later shell-like commands need them. Request only the specific permissions required for the task." +pub fn developer_realtime_start_text_with_instructions(instructions: Option<&str>) -> String { + let instructions = instructions + .filter(|instructions| !instructions.is_empty()) + .unwrap_or(REALTIME_START_INSTRUCTIONS.trim()); + format!("{REALTIME_CONVERSATION_OPEN_TAG}\n{instructions}\n{REALTIME_CONVERSATION_CLOSE_TAG}") } -fn granular_instructions( - granular_config: GranularApprovalConfig, +pub fn developer_realtime_end_text(reason: &str) -> String { + format!( + "{REALTIME_CONVERSATION_OPEN_TAG}\n{}\n\nReason: {reason}\n{REALTIME_CONVERSATION_CLOSE_TAG}", + REALTIME_END_INSTRUCTIONS.trim() + ) +} + +pub fn developer_personality_spec_text(spec: String) -> String { + format!( + " The user has requested a new communication style. Future messages should adhere to the following personality: \n{spec} " + ) +} + +pub fn developer_permissions_text( + sandbox_policy: &SandboxPolicy, + approval_policy: AskForApproval, + guardian_approval_enabled: bool, exec_policy: &Policy, - exec_permission_approvals_enabled: bool, - request_permissions_tool_enabled: bool, + cwd: &Path, + additional_permissions_guidance_enabled: bool, ) -> String { - let sandbox_approval_prompts_allowed = granular_config.allows_sandbox_approval(); - let shell_permission_requests_available = - exec_permission_approvals_enabled && sandbox_approval_prompts_allowed; - let request_permissions_tool_prompts_allowed = - request_permissions_tool_enabled && granular_config.allows_request_permissions(); - let categories = [ - Some(( - granular_config.allows_sandbox_approval(), - "`sandbox_approval`", - )), - Some((granular_config.allows_rules_approval(), "`rules`")), - Some((granular_config.allows_skill_approval(), "`skill_approval`")), - request_permissions_tool_enabled.then_some(( - granular_config.allows_request_permissions(), - "`request_permissions`", - )), - Some(( - granular_config.allows_mcp_elicitations(), - "`mcp_elicitations`", - )), - ]; - let prompted_categories = categories - .iter() - .flatten() - .filter(|&&(is_allowed, _)| is_allowed) - .map(|&(_, category)| format!("- {category}")) - .collect::>(); - let rejected_categories = categories - .iter() - .flatten() - .filter(|&&(is_allowed, _)| !is_allowed) - .map(|&(_, category)| format!("- {category}")) - .collect::>(); + let network_access = if sandbox_policy.has_full_network_access() { + NetworkAccess::Enabled + } else { + NetworkAccess::Restricted + }; + let (sandbox_mode, writable_roots) = match sandbox_policy { + SandboxPolicy::DangerFullAccess => (SandboxMode::DangerFullAccess, None), + SandboxPolicy::ReadOnly { .. } => (SandboxMode::ReadOnly, None), + SandboxPolicy::ExternalSandbox { .. } => (SandboxMode::DangerFullAccess, None), + SandboxPolicy::WorkspaceWrite { .. } => { + let roots = sandbox_policy.get_writable_roots_with_cwd(cwd); + (SandboxMode::WorkspaceWrite, Some(roots)) + } + }; + developer_permissions_with_network_text( + sandbox_mode, + network_access, + approval_policy, + guardian_approval_enabled, + exec_policy, + writable_roots, + additional_permissions_guidance_enabled, + ) +} - let mut sections = vec![granular_prompt_intro_text().to_string()]; +pub fn developer_collaboration_mode_text(collaboration_mode: &CollaborationMode) -> Option { + collaboration_mode + .settings + .developer_instructions + .as_ref() + .filter(|instructions| !instructions.is_empty()) + .map(|instructions| { + format!("{COLLABORATION_MODE_OPEN_TAG}{instructions}{COLLABORATION_MODE_CLOSE_TAG}") + }) +} - if !prompted_categories.is_empty() { - sections.push(format!( - "These approval categories may still prompt the user when needed:\n{}", - prompted_categories.join("\n") - )); - } - if !rejected_categories.is_empty() { - sections.push(format!( - "These approval categories are automatically rejected instead of prompting the user:\n{}", - rejected_categories.join("\n") - )); - } +pub fn developer_permissions_with_network_text( + sandbox_mode: SandboxMode, + network_access: NetworkAccess, + approval_policy: AskForApproval, + guardian_approval_enabled: bool, + exec_policy: &Policy, + writable_roots: Option>, + additional_permissions_guidance_enabled: bool, +) -> String { + let on_request_instructions = || { + let on_request_rule = if additional_permissions_guidance_enabled { + APPROVAL_POLICY_ON_REQUEST_RULE_REQUEST_PERMISSION + } else { + APPROVAL_POLICY_ON_REQUEST_RULE + }; + let command_prefixes = format_allow_prefixes(exec_policy.get_allowed_prefixes()); + match command_prefixes { + Some(prefixes) => { + format!( + "{on_request_rule}\n## Approved command prefixes\nThe following prefix rules have already been approved: {prefixes}" + ) + } + None => on_request_rule.to_string(), + } + }; + let approval_policy_text = match approval_policy { + AskForApproval::Never => APPROVAL_POLICY_NEVER.to_string(), + AskForApproval::UnlessTrusted => APPROVAL_POLICY_UNLESS_TRUSTED.to_string(), + AskForApproval::OnFailure => APPROVAL_POLICY_ON_FAILURE.to_string(), + AskForApproval::OnRequest => { + let mut instructions = on_request_instructions(); + if guardian_approval_enabled && !GUARDIAN_APPROVAL_FEATURE.is_empty() { + instructions.push_str("\n\n"); + instructions.push_str(GUARDIAN_APPROVAL_FEATURE); + } + instructions + } + AskForApproval::Granular(granular_config) => { + let on_request_instructions = on_request_instructions(); + let sandbox_approval = granular_config.sandbox_approval; + let rules = granular_config.rules; + let skill_approval = granular_config.skill_approval; + let request_permissions = granular_config.request_permissions; + let mcp_elicitations = granular_config.mcp_elicitations; + format!( + "{on_request_instructions}\n\n\ + Approval policy is `granular`.\n\ + - `sandbox_approval`: {sandbox_approval}\n\ + - `rules`: {rules}\n\ + - `skill_approval`: {skill_approval}\n\ + - `request_permissions`: {request_permissions}\n\ + - `mcp_elicitations`: {mcp_elicitations}\n\ + When a category is `true`, requests in that category are allowed. When it is `false`, they are auto-rejected instead of prompting the user." + ) + } + }; + let writable_roots_text = match writable_roots { + Some(roots) if !roots.is_empty() => { + let roots_list: Vec = roots + .iter() + .map(|r| format!("`{}`", r.root.to_string_lossy())) + .collect(); + if roots_list.len() == 1 { + format!(" The writable root is {}.", roots_list[0]) + } else { + format!(" The writable roots are {}.", roots_list.join(", ")) + } + } + _ => String::new(), + }; + let sandbox_text = developer_sandbox_mode_text(sandbox_mode, network_access); + format!( + "\n{sandbox_text}\n{approval_policy_text}{writable_roots_text}\n" + ) +} - if shell_permission_requests_available { - sections.push(APPROVAL_POLICY_ON_REQUEST_RULE_REQUEST_PERMISSION.to_string()); - } - - if request_permissions_tool_prompts_allowed { - sections.push(request_permissions_tool_prompt_section().to_string()); - } - - if let Some(prefixes) = approved_command_prefixes_text(exec_policy) { - sections.push(format!( - "## Approved command prefixes\nThe following prefix rules have already been approved: {prefixes}" - )); - } - - sections.join("\n\n") +pub fn developer_sandbox_mode_text(mode: SandboxMode, network_access: NetworkAccess) -> String { + let template = match mode { + SandboxMode::DangerFullAccess => SANDBOX_MODE_DANGER_FULL_ACCESS.trim_end(), + SandboxMode::WorkspaceWrite => SANDBOX_MODE_WORKSPACE_WRITE.trim_end(), + SandboxMode::ReadOnly => SANDBOX_MODE_READ_ONLY.trim_end(), + }; + template.replace("{network_access}", &network_access.to_string()) } const MAX_RENDERED_PREFIXES: usize = 100; @@ -821,28 +745,14 @@ fn render_command_prefix(prefix: &[String]) -> String { format!("[{tokens}]") } -impl From for ResponseItem { - fn from(di: DeveloperInstructions) -> Self { - ResponseItem::Message { - id: None, - role: "developer".to_string(), - content: vec![ContentItem::InputText { - text: di.into_text(), - }], - end_turn: None, - phase: None, - } - } -} - -impl From for DeveloperInstructions { +impl From for CustomDeveloperInstructions { fn from(mode: SandboxMode) -> Self { let network_access = match mode { SandboxMode::DangerFullAccess => NetworkAccess::Enabled, SandboxMode::WorkspaceWrite | SandboxMode::ReadOnly => NetworkAccess::Restricted, }; - DeveloperInstructions::sandbox_text(mode, network_access) + CustomDeveloperInstructions::new(developer_sandbox_mode_text(mode, network_access)) } } @@ -1502,41 +1412,6 @@ mod tests { use std::path::PathBuf; use tempfile::tempdir; - #[test] - fn sandbox_permissions_helpers_match_documented_semantics() { - let cases = [ - (SandboxPermissions::UseDefault, false, false, false), - (SandboxPermissions::RequireEscalated, true, true, false), - ( - SandboxPermissions::WithAdditionalPermissions, - false, - true, - true, - ), - ]; - - for ( - sandbox_permissions, - requires_escalated_permissions, - requests_sandbox_override, - uses_additional_permissions, - ) in cases - { - assert_eq!( - sandbox_permissions.requires_escalated_permissions(), - requires_escalated_permissions - ); - assert_eq!( - sandbox_permissions.requests_sandbox_override(), - requests_sandbox_override - ); - assert_eq!( - sandbox_permissions.uses_additional_permissions(), - uses_additional_permissions - ); - } - } - #[test] fn convert_mcp_content_to_items_preserves_data_urls() { let contents = vec![serde_json::json!({ @@ -1883,36 +1758,52 @@ mod tests { #[test] fn converts_sandbox_mode_into_developer_instructions() { - let workspace_write: DeveloperInstructions = SandboxMode::WorkspaceWrite.into(); + let workspace_write: CustomDeveloperInstructions = SandboxMode::WorkspaceWrite.into(); assert_eq!( workspace_write, - DeveloperInstructions::new( + CustomDeveloperInstructions::new( "Filesystem sandboxing defines which files can be read or written. `sandbox_mode` is `workspace-write`: The sandbox permits reading files, and editing files in `cwd` and `writable_roots`. Editing files in other directories requires approval. Network access is restricted." ) ); - let read_only: DeveloperInstructions = SandboxMode::ReadOnly.into(); + let read_only: CustomDeveloperInstructions = SandboxMode::ReadOnly.into(); assert_eq!( read_only, - DeveloperInstructions::new( + CustomDeveloperInstructions::new( "Filesystem sandboxing defines which files can be read or written. `sandbox_mode` is `read-only`: The sandbox only permits reading files. Network access is restricted." ) ); } + #[test] + fn custom_developer_instructions_convert_to_developer_response_item() { + let instructions = CustomDeveloperInstructions::new("be concise"); + let response_item: ResponseItem = instructions.into(); + assert_eq!( + response_item, + ResponseItem::Message { + id: None, + role: MessageRole::Developer.to_string(), + content: vec![ContentItem::InputText { + text: "be concise".to_string(), + }], + end_turn: None, + phase: None, + } + ); + } + #[test] fn builds_permissions_with_network_access_override() { - let instructions = DeveloperInstructions::from_permissions_with_network( + let text = developer_permissions_with_network_text( SandboxMode::WorkspaceWrite, NetworkAccess::Enabled, AskForApproval::OnRequest, + false, &Policy::empty(), None, false, - false, ); - - let text = instructions.into_text(); assert!( text.contains("Network access is enabled."), "expected network access to be enabled in message" @@ -1933,15 +1824,14 @@ mod tests { exclude_slash_tmp: false, }; - let instructions = DeveloperInstructions::from_policy( + let text = developer_permissions_text( &policy, AskForApproval::UnlessTrusted, + false, &Policy::empty(), &PathBuf::from("/tmp"), false, - false, ); - let text = instructions.into_text(); assert!(text.contains("Network access is enabled.")); assert!(text.contains("`approval_policy` is `unless-trusted`")); } @@ -1955,290 +1845,57 @@ mod tests { codex_execpolicy::Decision::Allow, ) .expect("add rule"); - let instructions = DeveloperInstructions::from_permissions_with_network( + let text = developer_permissions_with_network_text( SandboxMode::WorkspaceWrite, NetworkAccess::Enabled, AskForApproval::OnRequest, + false, &exec_policy, None, false, - false, ); - - let text = instructions.into_text(); assert!(text.contains("prefix_rule")); assert!(text.contains("Approved command prefixes")); assert!(text.contains(r#"["git", "pull"]"#)); } - #[test] - fn includes_request_permissions_tool_instructions_for_unless_trusted_when_enabled() { - let instructions = DeveloperInstructions::from_permissions_with_network( - SandboxMode::WorkspaceWrite, - NetworkAccess::Enabled, - AskForApproval::UnlessTrusted, - &Policy::empty(), - None, - false, - true, - ); - - let text = instructions.into_text(); - assert!(text.contains("`approval_policy` is `unless-trusted`")); - assert!(text.contains("# request_permissions Tool")); - } - - #[test] - fn includes_request_permissions_tool_instructions_for_on_failure_when_enabled() { - let instructions = DeveloperInstructions::from_permissions_with_network( - SandboxMode::WorkspaceWrite, - NetworkAccess::Enabled, - AskForApproval::OnFailure, - &Policy::empty(), - None, - false, - true, - ); - - let text = instructions.into_text(); - assert!(text.contains("`approval_policy` is `on-failure`")); - assert!(text.contains("# request_permissions Tool")); - } - #[test] fn includes_request_permission_rule_instructions_for_on_request_when_enabled() { - let instructions = DeveloperInstructions::from_permissions_with_network( + let text = developer_permissions_with_network_text( SandboxMode::WorkspaceWrite, NetworkAccess::Enabled, AskForApproval::OnRequest, + false, &Policy::empty(), None, true, - false, ); - - let text = instructions.into_text(); assert!(text.contains("with_additional_permissions")); assert!(text.contains("additional_permissions")); } #[test] - fn includes_request_permissions_tool_instructions_for_on_request_when_tool_is_enabled() { - let instructions = DeveloperInstructions::from_permissions_with_network( + fn includes_request_permissions_category_in_granular_policy_instructions() { + let text = developer_permissions_with_network_text( SandboxMode::WorkspaceWrite, NetworkAccess::Enabled, - AskForApproval::OnRequest, - &Policy::empty(), - None, - false, - true, - ); - - let text = instructions.into_text(); - assert!(text.contains("# request_permissions Tool")); - assert!( - text.contains("The built-in `request_permissions` tool is available in this session.") - ); - } - - #[test] - fn on_request_includes_tool_guidance_alongside_inline_permission_guidance_when_both_exist() { - let instructions = DeveloperInstructions::from_permissions_with_network( - SandboxMode::WorkspaceWrite, - NetworkAccess::Enabled, - AskForApproval::OnRequest, - &Policy::empty(), - None, - true, - true, - ); - - let text = instructions.into_text(); - assert!(text.contains("with_additional_permissions")); - assert!(text.contains("# request_permissions Tool")); - } - - fn granular_categories_section(title: &str, categories: &[&str]) -> String { - format!("{title}\n{}", categories.join("\n")) - } - - fn granular_prompt_expected( - prompted_categories: &[&str], - rejected_categories: &[&str], - include_shell_permission_request_instructions: bool, - include_request_permissions_tool_section: bool, - ) -> String { - let mut sections = vec![granular_prompt_intro_text().to_string()]; - if !prompted_categories.is_empty() { - sections.push(granular_categories_section( - "These approval categories may still prompt the user when needed:", - prompted_categories, - )); - } - if !rejected_categories.is_empty() { - sections.push(granular_categories_section( - "These approval categories are automatically rejected instead of prompting the user:", - rejected_categories, - )); - } - if include_shell_permission_request_instructions { - sections.push(APPROVAL_POLICY_ON_REQUEST_RULE_REQUEST_PERMISSION.to_string()); - } - if include_request_permissions_tool_section { - sections.push(request_permissions_tool_prompt_section().to_string()); - } - sections.join("\n\n") - } - - #[test] - fn granular_policy_lists_prompted_and_rejected_categories_separately() { - let text = DeveloperInstructions::from( - AskForApproval::Granular(GranularApprovalConfig { - sandbox_approval: false, - rules: true, - skill_approval: false, - request_permissions: true, - mcp_elicitations: false, - }), - &Policy::empty(), - true, - false, - ) - .into_text(); - - assert_eq!( - text, - [ - granular_prompt_intro_text().to_string(), - granular_categories_section( - "These approval categories may still prompt the user when needed:", - &["- `rules`"], - ), - granular_categories_section( - "These approval categories are automatically rejected instead of prompting the user:", - &["- `sandbox_approval`", "- `skill_approval`", "- `mcp_elicitations`",], - ), - ] - .join("\n\n") - ); - } - - #[test] - fn granular_policy_includes_command_permission_instructions_when_sandbox_approval_can_prompt() { - let text = DeveloperInstructions::from( AskForApproval::Granular(GranularApprovalConfig { sandbox_approval: true, - rules: true, - skill_approval: true, - request_permissions: true, - mcp_elicitations: true, - }), - &Policy::empty(), - true, - false, - ) - .into_text(); - - assert_eq!( - text, - granular_prompt_expected( - &[ - "- `sandbox_approval`", - "- `rules`", - "- `skill_approval`", - "- `mcp_elicitations`", - ], - &[], - true, - false, - ) - ); - } - - #[test] - fn granular_policy_omits_shell_permission_instructions_when_inline_requests_are_disabled() { - let text = DeveloperInstructions::from( - AskForApproval::Granular(GranularApprovalConfig { - sandbox_approval: true, - rules: true, - skill_approval: true, - request_permissions: true, - mcp_elicitations: true, - }), - &Policy::empty(), - false, - false, - ) - .into_text(); - - assert_eq!( - text, - granular_prompt_expected( - &[ - "- `sandbox_approval`", - "- `rules`", - "- `skill_approval`", - "- `mcp_elicitations`", - ], - &[], - false, - false, - ) - ); - } - - #[test] - fn granular_policy_includes_request_permissions_tool_only_when_that_prompt_can_still_fire() { - let allowed = DeveloperInstructions::from( - AskForApproval::Granular(GranularApprovalConfig { - sandbox_approval: true, - rules: true, - skill_approval: true, - request_permissions: true, - mcp_elicitations: true, - }), - &Policy::empty(), - true, - true, - ) - .into_text(); - assert!(allowed.contains("# request_permissions Tool")); - - let rejected = DeveloperInstructions::from( - AskForApproval::Granular(GranularApprovalConfig { - sandbox_approval: true, - rules: true, - skill_approval: true, - request_permissions: false, - mcp_elicitations: true, - }), - &Policy::empty(), - true, - true, - ) - .into_text(); - assert!(!rejected.contains("# request_permissions Tool")); - } - - #[test] - fn granular_policy_lists_request_permissions_category_without_tool_section_when_tool_is_unavailable() - { - let text = DeveloperInstructions::from( - AskForApproval::Granular(GranularApprovalConfig { - sandbox_approval: false, rules: false, skill_approval: false, request_permissions: true, mcp_elicitations: false, }), - &Policy::empty(), - true, false, - ) - .into_text(); - - assert!(!text.contains("- `request_permissions`")); - assert!(!text.contains("# request_permissions Tool")); + &Policy::empty(), + None, + true, + ); + assert!(text.contains("- `sandbox_approval`: true")); + assert!(text.contains("- `rules`: false")); + assert!(text.contains("- `skill_approval`: false")); + assert!(text.contains("- `request_permissions`: true")); + assert!(text.contains("- `mcp_elicitations`: false")); } #[test] diff --git a/docs/model-visible-context.md b/docs/model-visible-context.md new file mode 100644 index 0000000000..6ea3ec3453 --- /dev/null +++ b/docs/model-visible-context.md @@ -0,0 +1,94 @@ +# Model-visible context fragments + +Codex injects model-visible context through two envelopes: + +- the developer envelope, rendered as a single `developer` message +- the contextual-user envelope, rendered as a single `user` message whose contents are contextual state rather than real user intent + +Both envelopes use the same internal fragment contract in `codex-rs/core`. +Envelope builders normalize text fragment boundaries by inserting `\n\n` between adjacent text content items, so fragments do not run together in the model-visible token stream. + +## Canonical rules + +1. All model-visible injected context must be represented as a typed `ModelVisibleContextFragment`. +2. Turn-state context assembly always produces exactly two envelopes: one developer message and one contextual-user message. +3. Contextual-user fragments must have stable detection so history parsing can distinguish contextual state from true user intent. +4. If a fragment is derived from durable/current turn state and should survive history-mutating flows (resume/fork/compaction/backtracking) via re-diffing, it must implement `TurnContextDiffFragment`. +5. Do not hand-construct model-visible `ResponseItem::Message` payloads in new code. Use fragment conversion (`into_message` / `into_response_input_item`) and envelope builders. + +Rule 2 applies to turn-state context assembly (`build_initial_context` / `build_settings_update_items`). Runtime or session-prefix events may inject standalone fragment messages, but those still must be typed `ModelVisibleContextFragment`s (rules 1 and 5). + +## Blessed path + +When adding new model-visible context: + +1. Define a typed fragment type. +2. Implement `ModelVisibleContextFragment` for it. +3. Set the fragment `type Role` to the correct developer or contextual-user role. +4. If it is a contextual-user fragment, implement contextual-user detection: + - prefer `TaggedContextualUserFragment` for marker-based detection/wrapping + - use `ContextualUserFragmentDetector` when matching is dynamic (for example AGENTS.md tags that embed directory names) +5. If the fragment is derived from durable/current turn state and should be diffed/reinjected after history mutations, also implement `TurnContextDiffFragment`. +6. Register the fragment type: + - developer turn-state fragments: add to `REGISTERED_DEVELOPER_TURN_STATE_FRAGMENT_BUILDERS` in `context_manager/updates.rs`. + - contextual-user fragments: add once to `REGISTERED_CONTEXTUAL_USER_FRAGMENTS` in `model_visible_context.rs` (this powers both history detection and optional turn-state diff assembly). +7. Push the resulting fragments through the shared envelope builders. + +Do not hand-build developer or contextual-user model-visible `ResponseItem`s in new code. + +The role lives in the fragment's associated `type Role`. + +## Choosing an envelope + +Use the developer envelope for developer-role guidance: + +- permissions / approval policy instructions +- collaboration-mode developer guidance +- model switch and realtime notices +- personality guidance +- subagent roster and subagent notifications +- other developer-only instructions + +Use `CustomDeveloperInstructions` only for custom developer override text (for example config/app-server `developer_instructions` values). + +For system-generated developer guidance (permissions, collaboration-mode wrappers, realtime notices, personality wrappers, model-switch notices), use typed developer fragments whose text comes from the neutral `developer_*_text` helpers in `codex_protocol::models`. + +Use the contextual-user envelope for contextual state or runtime markers that should not count as real user turns: + +- AGENTS / user instructions +- plugin instructions +- environment context +- skill instructions +- user shell command records +- turn-aborted markers + +Contextual-user fragments must have stable detection because history parsing uses it to distinguish contextual state from real user intent. + +Use `` only for environment facts derived from turn/session state (`TurnContext`) that may need turn-to-turn diffing. Today that includes `cwd`, `shell`, optional `current_date`, optional `timezone`, and optional network allow/deny domain summaries. Do not put developer policy/instructions or plugin/skill metadata into ``; those belong in their own typed fragments. + +## Turn-backed fragments + +If a fragment is derived from durable turn/session state and should be updated/reinjected by diff after history mutation, keep its extraction, diffing, and rendering logic together by implementing `TurnContextDiffFragment`. + +`TurnContextDiffFragment` exposes one `build(...)` method that receives: + +- current `TurnContext` +- optional `reference_context_item` (the turn context state already represented in model-visible history, if available) +- `TurnContextDiffParams` for shared runtime inputs (for example shell, previous-turn bridge state, exec-policy rendering context, and feature gating flags) + +This is envelope-agnostic: both contextual-user state fragments and developer state-diff fragments use the same trait. + +If a fragment is runtime-event/session-prefix only (for example subagent completion notification, turn-aborted marker, or user-shell-command marker), `ModelVisibleContextFragment` alone is enough. + +That trait is the blessed path for fragments that need to: + +- build full initial context when no reference context item is available +- compute turn diffs when a reference context item is available + +`EnvironmentContext` is the canonical example. Future turn-backed contextual fragments should follow the same pattern instead of introducing one-off extraction or diff helpers. + +## History behavior + +Developer fragments do not need contextual-user marker matching because they are already separable by message role. + +Contextual-user fragments do need marker matching because they share the `user` role with real user turns, and history parsing / truncation must avoid treating injected context as actual user input.