docs: refine subagent behavior docs

2026-04-28 02:11:08 +03:00 · 2025-11-22 09:34:49 -08:00
parent beb71f4a00
commit da7d6f1abb
4 changed files with 53 additions and 24 deletions
--- a/codex-rs/core/root_agent_prompt.md
+++ b/codex-rs/core/root_agent_prompt.md
@@ -2,6 +2,8 @@ You are the **root agent** in a multi‑agent Codex session.

 Your job is to solve the user’s task end‑to‑end. Use subagents as semi‑autonomous workers when that makes the work simpler, safer, or more parallel, and otherwise act directly in the conversation as a normal assistant.

+Subagent behavior and limits are configured via `config.toml` settings such as `max_active_subagents`, `root_agent_uses_user_messages`, `subagent_root_inbox_autosubmit`, and `subagent_inbox_inject_before_tools`.
+
 Use subagents as follows:

 - Spawn or fork a subagent when a piece of work can be isolated behind a clear prompt, or when you want an independent view on a problem.
@@ -13,10 +15,10 @@ Use subagents as follows:
 - Use `subagent_list`, `subagent_prune`, and `subagent_cancel` to keep the set of active subagents small and relevant.
 - When you spawn a subagent or start a watchdog and there’s nothing else useful to do, issue the tool call right away and say you’re waiting for results (or for the watchdog to start). If you can do other useful work in parallel, do that instead of stalling, and only await when necessary.

-Be concise and direct. Delegate multi‑step or long‑running work to subagents, summarize what they have done for the user, and always keep the conversation focused on the user’s goal.**
+Be concise and direct. Delegate multi‑step or long‑running work to subagents, summarize what they have done for the user, and always keep the conversation focused on the user’s goal.

-Example: long‑running supervision with a watchdog
+**Example: long‑running supervision with a watchdog**
 - Spawn a supervisor to own `PLAN.md`: e.g., `subagent_spawn` label `supervisor`, prompt it to keep the plan fresh, launch workers, and heartbeat every few minutes.
 - Attach a watchdog to the supervisor (or to yourself) that pings on a cadence and asks for progress: call `subagent_watchdog` with `{agent_id: <supervisor_id>, interval_s: 300, message: "Watchdog ping — report current status and PLAN progress", cancel: false}`.
 - The supervisor should reply to each ping with a brief status and, if needed, spawn/interrupt workers; the root can cancel or retarget by invoking `subagent_watchdog` again with `cancel: true`.
- You can also set a self‑watchdog on the root agent to ensure you keep emitting status updates during multi‑hour tasks.***
+- You can also set a self‑watchdog on the root agent to ensure you keep emitting status updates during multi‑hour tasks.
--- a/codex-rs/core/subagent_prompt.md
+++ b/codex-rs/core/subagent_prompt.md
@@ -1,16 +1,16 @@
 # You are a Subagent

-You are a **subagent** in a multi‑agent Codex session. You may have prior message context or not - you should not totally disregard it, but your goal is the prompt next sent to you.
+You are a **subagent** in a multi‑agent Codex session. You may see prior conversation context, but treat it as background; your primary goal is to respond to the prompt you have just been given.

 Another agent has created you to complete a specific part of a larger task. Your job is to do that work carefully and efficiently, then communicate what you have done so your parent agent can integrate the results.

 Work style:

- Stay within the scope of the prompt and the files or questions you have been given.
- When you make meaningful progress, or when you finish a sub‑task, send a short summary back to your parent via `subagent_send_message` so they can see what has changed.
- If you need to coordinate with another agent, use `subagent_send_message` to send them a clear, concise request and, when appropriate, a brief summary of context.
+- Stay within the scope of the prompt and the files or questions you’ve been given.
+- Respect the parent/root agent’s instructions and the configured sandbox/approval rules; never attempt to bypass safety constraints.
+- When you make meaningful progress or finish a sub‑task, send a short summary back to your parent via `subagent_send_message` so they can see what changed.
+- If you need to coordinate with another agent, use `subagent_send_message` to send a clear, concise request and, when appropriate, a brief summary of context.
 - Use `subagent_await` only when you truly need to wait for another agent’s response before continuing. If you can keep working independently, prefer to do so and send progress updates instead of blocking.
 - Use `subagent_logs` only when you need to inspect another agent’s recent activity without changing its state.

 Communicate in plain language. Explain what you changed, what you observed, and what you recommend next, so that your parent agent can make good decisions without rereading all of your intermediate steps.
-
--- a/codex-rs/docs/protocol_v1.md
+++ b/codex-rs/docs/protocol_v1.md
@@ -2,7 +2,7 @@ Overview of Protocol Defined in [protocol.rs](../core/src/protocol.rs) and [agen

 The goal of this document is to define terminology used in the system and explain the expected behavior of the system.

-NOTE: The code might not completely match this spec. There are a few minor changes that need to be made after this spec has been reviewed, which will not alter the existing TUI's functionality.
+NOTE: This document summarizes the protocol at a high level. The Rust types and enums in [protocol.rs](../core/src/protocol.rs) are the source of truth and may occasionally include additional fields or variants beyond what is covered here.

 ## Entities

@@ -79,11 +79,15 @@ For complete documentation of the `Op` and `EventMsg` variants, refer to [protoc
  - `EventMsg::Error` – A task stopped with an error
  - `EventMsg::Warning` – A non-fatal warning that the client should surface to the user
  - `EventMsg::TurnComplete` – Contains a `response_id` bookmark for last `response_id` executed by the task. This can be used to continue the task at a later point in time, perhaps with additional user input.
- `EventMsg::SubagentLifecycle` – Emits `SubagentSummary` payloads (now including `agent_id`, `parent_agent_id`, and pending inbox counts) whenever a child session is created, updates status/reasoning headers, or is removed.
-  These lifecycle events now persist in rollout files so `codex resume` can restore prior subagent state (attachments on spawn/fork and detach on cancel/prune).
- `EventMsg::AgentInbox` – Notifies the UI when a subagent’s inbox depth changes, e.g., after the parent sends an interrupt. Contains the target `agent_id`, `session_id`, and the counts of pending regular vs interrupt messages so UIs can render badges without polling.
+- `EventMsg::SubagentLifecycle` – Emits `SubagentSummary` payloads that describe each child session, including its `agent_id`, `parent_agent_id`, and current pending inbox counts.
+  These lifecycle events are emitted whenever the daemon’s view of a subagent changes (creation, status/reasoning-header updates, or removal). They also persist in rollout files so `codex resume` can rebuild prior subagent state, including attachments on spawn/fork and detachments on cancel/prune, before replaying model turns.
+- `EventMsg::AgentInbox` – Notifies the UI when a subagent’s inbox depth changes, for example after the parent sends an interrupt or a watchdog ping arrives. The payload includes the target `agent_id`, `session_id`, and the counts of pending regular vs interrupt messages so UIs can render badges without polling.
+  For example, if the root interrupts child agent `3`, the UI may receive an `AgentInbox` event for `agent_id = 3` showing one pending interrupt message and zero regular messages.

-Subagent tool reminders: `subagent_await` accepts an optional `timeout_s` capped at 1,800 s (30 minutes). Omit it or pass 0 to use the 30-minute default; prefer at least 300 s and use backoff (30s → 60s → 120s) so you can check on children, log progress, or deliver interrupts instead of parking for the full cap.
+#### Subagent tool reminders
+
+- `subagent_await` accepts an optional `timeout_s` capped at 1,800 s (30 minutes). Omit it or pass `0` to use the 30-minute default. Any explicit nonzero `timeout_s` must be at least 300 s (5 minutes); prefer 5–30 minute timeouts and use backoff (for example, 300 s → 600 s → 1,200 s) so you can check on children, log progress, or deliver interrupts instead of parking for the full cap.
+- `subagent_logs` is read-only and does not change a child’s state; prefer it when you only need to inspect recent activity without advancing the subagent.

 The `response_id` returned from each task matches the OpenAI `response_id` stored in the API's `/responses` endpoint. It can be stored and used in future `Sessions` to resume threads of work.
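
The `subagent_await` backoff guidance in the hunk above can be sketched as a small schedule generator. This is only an illustration under stated assumptions: `await_timeouts`, `MIN_TIMEOUT_S`, and `MAX_TIMEOUT_S` are hypothetical names rather than part of Codex, and the doubling factor is simply read off the 300 s → 600 s → 1,200 s example.

```python
# Hypothetical helper, not a Codex API: builds a list of timeout_s values
# that follow the documented floor (300 s) and cap (1,800 s).
MIN_TIMEOUT_S = 300   # documented minimum for an explicit timeout_s
MAX_TIMEOUT_S = 1800  # documented cap (30 minutes)


def await_timeouts(start_s=MIN_TIMEOUT_S, factor=2, attempts=4):
    """Return successive timeout_s values, growing by `factor` per attempt."""
    timeouts = []
    current = max(start_s, MIN_TIMEOUT_S)  # raise too-small starts to the floor
    for _ in range(attempts):
        timeouts.append(min(current, MAX_TIMEOUT_S))  # never exceed the cap
        current *= factor
    return timeouts


if __name__ == "__main__":
    print(await_timeouts())
```

With the defaults, the schedule follows the documented example and then settles at the cap, so a caller gets a chance to check on children or deliver interrupts between each wait.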

--- a/docs/config.md
+++ b/docs/config.md
@@ -35,7 +35,7 @@ Optional and experimental capabilities are toggled via the `[features]` table in
 [features]
 streamable_shell = true          # enable the streamable exec tool
 web_search_request = true        # allow the model to request web searches
-# subagent_tools = true          # expose spawn/fork/list/await/logs/prune subagent tools
+# subagent_tools = true          # expose subagent_* orchestration tools (spawn/fork/send_message/list/await/watchdog/logs/prune/cancel)
 # view_image_tool defaults to true; omit to keep defaults
 ```

@@ -49,7 +49,7 @@ Supported features:
 | `apply_patch_freeform`                    |  false  | Beta         | Include the freeform `apply_patch` tool              |
 | `view_image_tool`                         |  true   | Stable       | Include the `view_image` tool                        |
 | `web_search_request`                      |  false  | Stable       | Allow the model to issue web searches                |
-| `subagent_tools`                          |  false  | Experimental | Enable built-in subagent orchestration tools         |
+| `subagent_tools`                          |  false  | Experimental | Enable built-in subagent orchestration tools (spawn/fork/send_message/list/await/watchdog/logs/prune/cancel) |
 | `experimental_sandbox_command_assessment` |  false  | Experimental | Enable model-based sandbox risk assessment           |
 | `ghost_commit`                            |  false  | Experimental | Create a ghost commit each turn                      |
 | `enable_experimental_windows_sandbox`     |  false  | Experimental | Use the Windows restricted-token sandbox             |
@@ -362,9 +362,24 @@ max_active_subagents = 8

 When the limit is reached, additional `spawn`/`fork` tool calls immediately return an error telling the model to prune or await existing children before launching new work. Values below 1 are rejected, and values above 64 are clamped to 64 to prevent runaway resource use.

+As a rule of thumb:
+
+- Keep the default of `8` for typical workflows with a handful of concurrent workers.
+- Lower the value (for example, `2`–`4`) on resource-constrained machines or when you rarely use subagents and want a tighter bound on memory.
+- Raise it cautiously (for example, `16`–`32`) only if you are intentionally orchestrating many parallel subagents and are confident your machine has CPU and memory headroom.
+
 ### root_agent_uses_user_messages

-Controls how the root agent’s messages to a subagent are represented in the subagent’s own history. When `true` (default), root-to-subagent messages are injected as `user` turns in the child. When `false`, every cross-agent message arrives only via `subagent_await` tool results, so the child must explicitly read the tool output to see root instructions.
+Controls how the root agent’s messages to a subagent are represented in the subagent’s own history.
+
+When `true` (default), messages the root sends with `subagent_send_message` are injected as ordinary `user` turns in the child. This keeps the subagent’s prompt simple and matches how models are usually trained to read instructions. For example, the child might see:
+
+```text
+user: Please summarize the last 10 log lines.
+assistant: …
+```
+
+When `false`, cross-agent messages arrive only via `subagent_await` tool results, and the child must explicitly read the tool output to discover what the root said. Only disable this if you are experimenting with fully tool-centric prompting and are prepared to handle the extra plumbing inside the subagent.

 ### subagent_root_inbox_autosubmit

@@ -386,19 +401,27 @@ and whether it may auto-start a follow-up turn based on those messages. When
 When `false`, the root must call `subagent_await` explicitly to see inbox
 messages during a turn, and no autosubmitted turns are emitted while idle.

+Example: if a worker subagent finishes while the root is idle and
+`subagent_root_inbox_autosubmit = true`, Codex will drain the inbox, record a
+synthetic `subagent_await` call/output for the completion in the root
+transcript, and immediately start a new turn so the root can read the result
+and decide what to do next without waiting for fresh user input.
+
 ### subagent_inbox_inject_before_tools

 Controls where synthetic `subagent_await` tool calls and outputs derived from
 inbox delivery are injected relative to real tool outputs inside a turn.

- When `false` (default), Codex records the model’s tool call and tool
-  output(s) for a turn first, and only then appends synthetic `subagent_await`
-  calls/outputs derived from inbox messages (Option A). This is closer to
-  training-time patterns where the model generally sees its own tool call and
-  result before extra context.
- When `true`, Codex records synthetic `subagent_await` calls/outputs first
-  and then appends tool outputs (Option B), which is closer to strict
-  chronological ordering when inbox messages arrive while tools are running.
+- When `false` (default), Codex records the model’s tool call(s) and tool
+  output(s) for a turn first, and then appends any synthetic `subagent_await`
+  calls/outputs derived from inbox messages. This most closely matches common
+  training patterns where the model sees its own tool call and result before
+  additional context, and is recommended for most setups.
+- When `true`, Codex records synthetic `subagent_await` calls/outputs
+  immediately after the model’s messages/tool calls and before the tool
+  outputs. This is closer to strict chronological ordering when inbox
+  messages arrive while tools are running, because the synthetic await
+  appears ahead of the corresponding tool results in the history.

 This flag only affects how Codex orders conversation items in history; it
 never splices synthetic items into the middle of an in-flight streaming turn.
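
The two orderings described for `subagent_inbox_inject_before_tools` can be illustrated with a minimal sketch. It assumes a turn's history can be modeled as three plain lists; `order_turn_items` and its parameters are hypothetical names chosen for illustration, not Codex internals.

```python
# Hypothetical model of history ordering; the real implementation may differ.
def order_turn_items(tool_calls, tool_outputs, synthetic_awaits,
                     inject_before_tools=False):
    """Return the turn's history items in the order the flag implies."""
    if inject_before_tools:
        # true: synthetic awaits sit after the tool calls, before tool outputs
        return [*tool_calls, *synthetic_awaits, *tool_outputs]
    # false (default): tool calls and outputs first, synthetic awaits appended
    return [*tool_calls, *tool_outputs, *synthetic_awaits]


if __name__ == "__main__":
    items = (["shell call"], ["shell output"], ["await call", "await output"])
    print(order_turn_items(*items))
    print(order_turn_items(*items, inject_before_tools=True))
```

In both cases the model's own tool calls come first; the flag only decides whether the synthetic `subagent_await` items land before or after the real tool outputs.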