Add guardian approval MVP (#13692)

## Summary
- add the guardian reviewer flow for `on-request` approvals in command,
patch, sandbox-retry, and managed-network approval paths
- keep guardian behind `features.guardian_approval` instead of exposing
a public `approval_policy = guardian` mode
- route ordinary `OnRequest` approvals to the guardian subagent when the
feature is enabled, without changing the public approval-mode surface

## Public model
- public approval modes stay unchanged
- guardian is enabled via `features.guardian_approval`
- when that feature is on, `approval_policy = on-request` keeps the same
approval boundaries but sends those approval requests to the guardian
reviewer instead of the user
- `/experimental` only persists the feature flag; it does not rewrite
`approval_policy`
- CLI and app-server no longer expose a separate `guardian` approval
mode in this PR

## Guardian reviewer
- the reviewer runs as a normal subagent and reuses the existing
subagent/thread machinery
- it is locked to a read-only sandbox and `approval_policy = never`
- it does not inherit user/project exec-policy rules
- it prefers `gpt-5.4` when the current provider exposes it, otherwise
falls back to the parent turn's active model
- it fail-closes on timeout, startup failure, malformed output, or any
other review error
- it currently auto-approves only when `risk_score < 80`

## Review context and policy
- guardian mirrors `OnRequest` approval semantics rather than
introducing a separate approval policy
- explicit `require_escalated` requests follow the same approval surface
as `OnRequest`; the difference is only who reviews them
- managed-network allowlist misses that enter the approval flow are also
reviewed by guardian
- the review prompt includes bounded recent transcript history plus
recent tool call/result evidence
- transcript entries and planned-action strings are truncated with
explicit `<guardian_truncated ... />` markers so large payloads stay
bounded
- apply-patch reviews include the full patch content (without
duplicating the structured `changes` payload)
- the guardian request layout is snapshot-tested using the same
model-visible Responses request formatter used elsewhere in core

## Guardian network behavior
- the guardian subagent inherits the parent session's managed-network
allowlist when one exists, so it can use the same approved network
surface while reviewing
- exact session-scoped network approvals are copied into the guardian
session with protocol/port scope preserved
- those copied approvals are now seeded before the guardian's first turn
is submitted, so inherited approvals are available during any immediate
review-time checks

## Out of scope / follow-ups
- the sandbox-permission validation split was pulled into a separate PR
and is not part of this diff
- a future follow-up can enable `serde_json` preserve-order in
`codex-core` and then simplify the guardian action rendering further

---------

Co-authored-by: Codex <noreply@openai.com>
This commit is contained in:
Charley Cunningham
2026-03-07 05:40:10 -08:00
committed by GitHub
parent cf143bf71e
commit e84ee33cc0
34 changed files with 2477 additions and 139 deletions

View File

@@ -502,16 +502,18 @@ pub fn render_decision_for_unmatched_command(
cfg!(windows) && matches!(sandbox_policy, SandboxPolicy::ReadOnly { .. });
// If the command is flagged as dangerous or we have no sandbox protection,
// we should never allow it to run without user approval.
// we should never allow it to run without approval.
//
// We prefer to prompt the user rather than outright forbid the command,
// but if the user has explicitly disabled prompts, we must
// forbid the command.
if command_might_be_dangerous(command) || runtime_sandbox_provides_safety {
return if matches!(approval_policy, AskForApproval::Never) {
Decision::Forbidden
} else {
Decision::Prompt
return match approval_policy {
AskForApproval::Never => Decision::Forbidden,
AskForApproval::OnFailure
| AskForApproval::OnRequest
| AskForApproval::UnlessTrusted
| AskForApproval::Reject(_) => Decision::Prompt,
};
}
@@ -1578,7 +1580,8 @@ prefix_rule(pattern=["git"], decision="prompt")
}
#[tokio::test]
async fn exec_approval_requirement_rejects_unmatched_prompt_when_sandbox_rejection_enabled() {
async fn exec_approval_requirement_rejects_unmatched_sandbox_escalation_when_sandbox_rejection_enabled()
{
let command = vec!["madeup-cmd".to_string()];
let requirement = ExecPolicyManager::default()