Motivation:
User-shell command tests can observe shell startup output from the local environment before or around the command output being asserted. Exact output equality makes these tests depend on machine-specific shell profile behavior.
Summary:
Update thread shell command tests to wait for output deltas containing the expected command output and to assert aggregated output contains the expected text instead of requiring exact equality.
Testing:
- cargo test -p codex-app-server --test all thread_shell_command_runs_as_standalone_turn_and_persists_history
- cargo test -p codex-app-server --test all thread_shell_command_uses_existing_active_turn
- cargo test -p codex-app-server