feat: mem v2 - PR1 (#11364)

# Memories migration plan (simplified global workflow)

## Target behavior

- One shared memory root only: `~/.codex/memories/`.
- No per-cwd memory buckets, no cwd hash handling.
- Phase-1 candidate rules (see the sketch after this list):
  - Not currently being processed unless the job lease is stale.
  - Rollout updated within the max-age window (currently 30 days).
  - Rollout idle for at least 12 hours (new constant).
  - Global cap: at most 64 stage-1 jobs in `running` state at any time
    (new invariant).
- Stage-1 model output shape (new):
  - `rollout_slug` (accepted but ignored for now).
  - `rollout_summary`.
  - `raw_memory`.
- Phase-1 artifacts written under the shared root:
  - `rollout_summaries/<thread_id>.md` for each rollout summary.
  - `raw_memories.md` containing appended/merged raw memory paragraphs.
- Phase 2 runs one consolidation agent for the shared `memories/`
  directory.
- Phase-2 lock is DB-backed with a 1-hour lease and heartbeat/expiry.
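
Concretely, the time-window part of these rules reduces to a small
predicate. The sketch below is illustrative only, assuming chrono
timestamps: `is_phase1_candidate` and `PHASE_ONE_MAX_ROLLOUT_AGE_DAYS` are
hypothetical names (only `PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS` is introduced
by this plan), and the real check is intended to run inside the DB claim
query (see PR 1).

```rust
use chrono::{DateTime, Duration, Utc};

// Named in PR 1 below; the max-age constant name is an assumption.
const PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS: i64 = 12;
const PHASE_ONE_MAX_ROLLOUT_AGE_DAYS: i64 = 30;

/// Time-window part of the phase-1 candidate rules. Lease freshness and the
/// global cap of 64 running jobs are enforced at claim time in the jobs
/// table, not here.
fn is_phase1_candidate(rollout_updated_at: DateTime<Utc>, now: DateTime<Utc>) -> bool {
    let within_max_age =
        rollout_updated_at >= now - Duration::days(PHASE_ONE_MAX_ROLLOUT_AGE_DAYS);
    let idle_long_enough =
        rollout_updated_at <= now - Duration::hours(PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS);
    within_max_age && idle_long_enough
}

fn main() {
    let now = Utc::now();
    assert!(is_phase1_candidate(now - Duration::hours(13), now));
    assert!(!is_phase1_candidate(now - Duration::hours(1), now)); // too fresh
    assert!(!is_phase1_candidate(now - Duration::days(31), now)); // too old
}
```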

## Current code map

- Core startup pipeline: `core/src/memories/startup/mod.rs`.
- Stage-1 request+parse: `core/src/memories/startup/extract.rs`,
`core/src/memories/stage_one.rs`, templates in
`core/templates/memories/`.
- File materialization: `core/src/memories/storage.rs`,
`core/src/memories/layout.rs`.
- Scope routing (cwd/user): `core/src/memories/scope.rs`,
`core/src/memories/startup/mod.rs`.
- DB job lifecycle and scope queueing: `state/src/runtime/memory.rs`.

## PR plan

## PR 1: Correct phase-1 selection invariants (no behavior-breaking layout changes yet)

- Add `PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS: i64 = 12` in
`core/src/memories/mod.rs`.
- Thread this into `state::claim_stage1_jobs_for_startup(...)`.
- Enforce idle-time filter in DB selection logic (not only in-memory
filtering after `scan_limit`) so eligible threads are not starved by
very recent threads.
- Enforce the global running cap of 64 at claim time in DB logic (see the
  sketch after the acceptance criteria):
  - Count fresh `memory_stage1` running jobs.
  - Only allow new claims while count < cap.
- Keep stale-lease takeover behavior intact.
- Add/adjust tests in `state/src/runtime.rs`:
  - Idle-filter inclusion/exclusion around the 12h boundary.
  - Global running-cap guarantee.
  - Existing stale/fresh ownership behavior still passes.

Acceptance criteria:
- Startup never creates more than 64 fresh `memory_stage1` running jobs.
- Threads updated <12h ago are skipped.
- Threads older than 30d are skipped.
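
A rough sketch of the claim-side shape, assuming the sqlx/SQLite stack the
`state` crate already uses. The params struct mirrors the one added in this
commit; `PHASE_ONE_MAX_RUNNING_JOBS` and `stage1_claim_headroom` are
illustrative names, and the counting query follows the new test in this
commit rather than the production selection SQL.

```rust
use sqlx::SqlitePool;

/// Parameters threaded from core into the state-layer claim call; this
/// mirrors the struct added in this commit.
#[derive(Debug, Clone, Copy)]
pub struct Stage1StartupClaimParams<'a> {
    pub scan_limit: usize,
    pub max_claimed: usize,
    pub max_age_days: i64,
    pub min_rollout_idle_hours: i64,
    pub allowed_sources: &'a [String],
    pub lease_seconds: i64,
}

// Hypothetical constant name for the new invariant.
pub const PHASE_ONE_MAX_RUNNING_JOBS: i64 = 64;

/// How many new stage-1 claims the global cap still allows: count fresh
/// (unexpired-lease) running jobs and subtract from the cap.
async fn stage1_claim_headroom(pool: &SqlitePool, now_ts: i64) -> sqlx::Result<i64> {
    let running: i64 = sqlx::query_scalar(
        "SELECT COUNT(*) FROM jobs \
         WHERE kind = 'memory_stage1' AND status = 'running' \
           AND lease_until IS NOT NULL AND lease_until > ?",
    )
    .bind(now_ts)
    .fetch_one(pool)
    .await?;
    Ok((PHASE_ONE_MAX_RUNNING_JOBS - running).max(0))
}
```

Counting and claiming as separate statements leaves a small race between
concurrent workers on the same DB; the real implementation should do both
inside one transaction (or one statement) so the cap holds exactly.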

## PR 2: Stage-1 output contract + storage artifacts (forward-compatible)

- Update parser/types to accept the new structured output while keeping
  backward compatibility (see the sketch after the acceptance criteria):
  - Add `rollout_slug` (optional for now).
  - Add `rollout_summary`.
  - Keep alias support for legacy `summary` and `rawMemory` until the prompt
    swap completes.
- Update the stage-1 schema generator in `core/src/memories/stage_one.rs` to
  include the new keys.
- Update prompt templates:
  - `core/templates/memories/stage_one_system.md`.
  - `core/templates/memories/stage_one_input.md`.
- Replace the storage model in `core/src/memories/storage.rs`:
  - Introduce a `rollout_summaries/` directory writer (`<thread_id>.md`
    files).
  - Introduce a `raw_memories.md` aggregator writer fed from DB rows.
  - Keep deterministic rebuild behavior from DB outputs so files can
    always be regenerated.
- Update the consolidation prompt template to reference the
  `rollout_summaries/` and `raw_memories.md` inputs.

Acceptance criteria:
- Stage-1 accepts both old and new output keys during migration.
- Phase-1 artifacts are generated in new format from DB state.
- No dependence on per-thread files in `raw_memories/`.
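
A minimal serde sketch of the compatibility window described above,
assuming serde/serde_json; `StageOneOutput` is a hypothetical type name,
and only the key names and aliases are taken from this plan.

```rust
use serde::Deserialize;

/// New keys are preferred; legacy `summary` / `rawMemory` still deserialize
/// during the migration window.
#[derive(Debug, Deserialize)]
pub struct StageOneOutput {
    /// Accepted but ignored for now.
    #[serde(default)]
    pub rollout_slug: Option<String>,
    #[serde(alias = "summary")]
    pub rollout_summary: String,
    #[serde(alias = "rawMemory")]
    pub raw_memory: String,
}

fn main() -> Result<(), serde_json::Error> {
    let new_shape: StageOneOutput = serde_json::from_str(
        r#"{"rollout_slug":"fix-login-flow","rollout_summary":"...","raw_memory":"..."}"#,
    )?;
    let legacy_shape: StageOneOutput =
        serde_json::from_str(r#"{"summary":"...","rawMemory":"..."}"#)?;
    assert!(new_shape.rollout_slug.is_some());
    assert!(legacy_shape.rollout_slug.is_none());
    Ok(())
}
```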

## PR 3: Remove per-cwd memories and move to one global memory root

- Simplify the layout in `core/src/memories/layout.rs`:
  - Single root: `codex_home/memories`.
  - Remove cwd-hash bucket helpers and normalization logic used only for
    memory pathing.
- Remove scope branching from the startup phase-2 dispatch path:
  - No cwd/user mapping in `core/src/memories/startup/mod.rs`.
  - One target root for consolidation.
- In `state/src/runtime/memory.rs`, stop enqueueing and handling the cwd
  consolidation scope.
- Keep one logical consolidation scope/job key (global/user) to avoid a
  risky schema rewrite in the same PR.
- Add a one-time migration helper (core side) to preserve the current shared
  memory output (see the sketch after the acceptance criteria):
  - If `~/.codex/memories/user/memory` exists and the new root is empty,
    move/copy its contents into `~/.codex/memories`.
  - Leave old hashed cwd buckets untouched for now (a safe, non-destructive
    migration).

Acceptance criteria:
- New runs only read/write `~/.codex/memories`.
- No new cwd-scoped consolidation jobs are enqueued.
- Existing user-shared memory content is preserved.
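
A minimal sketch of the one-time migration helper, assuming the legacy
`user/memory` bucket holds a flat set of files; `migrate_legacy_user_memory`
is a hypothetical name, and the real helper lives in core and may move
rather than copy.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copy legacy shared memory (`<root>/user/memory/*`) into the new global
/// root when the root holds nothing but the legacy `user/` bucket. Old
/// hashed cwd buckets are intentionally left alone.
fn migrate_legacy_user_memory(memories_root: &Path) -> io::Result<()> {
    let legacy = memories_root.join("user").join("memory");
    if !legacy.is_dir() {
        return Ok(());
    }
    // "New root is empty" here means: nothing except the legacy `user/` bucket.
    let new_root_is_empty = fs::read_dir(memories_root)?
        .filter_map(Result::ok)
        .all(|entry| entry.file_name().to_string_lossy() == "user");
    if !new_root_is_empty {
        return Ok(());
    }
    for entry in fs::read_dir(&legacy)? {
        let entry = entry?;
        // Assumes flat files (e.g. raw_memories.md); nested directories
        // would need a recursive copy.
        if entry.path().is_file() {
            fs::copy(entry.path(), memories_root.join(entry.file_name()))?;
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Example invocation against an illustrative codex home path.
    migrate_legacy_user_memory(Path::new("/tmp/example-codex-home/memories"))
}
```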

## PR 4: Phase-2 global lock simplification and cleanup

- Replace multi-scope dispatch with a single global consolidation claim
  path (see the sketch after the acceptance criteria):
  - Either reuse the jobs table with one fixed key or add a tiny dedicated
    lock helper; keep the 1-hour lease.
  - Ensure at most one consolidation agent can run at once.
- Keep heartbeat + stale-lock recovery semantics in
  `core/src/memories/startup/watch.rs`.
- Remove dead scope code and legacy constants that are no longer used.
- Update tests:
  - One-agent-at-a-time behavior.
  - Lock expiry allows takeover after a stale lease.

Acceptance criteria:
- Exactly one phase-2 consolidation agent can be active at a time across
  all workers sharing the local DB.
- Stale lock recovers automatically.
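
An illustrative single-key claim built on the existing jobs table, assuming
a pre-seeded `memory_consolidation` row and the sqlx/SQLite stack; the fixed
key, helper name, and exact columns are assumptions, since the plan leaves
open whether to reuse the jobs table or add a dedicated lock helper.

```rust
use chrono::Utc;
use sqlx::SqlitePool;

// Hypothetical fixed key and lease length for the single global phase-2 job.
const CONSOLIDATION_JOB_KEY: &str = "global";
const CONSOLIDATION_LEASE_SECONDS: i64 = 3600;

/// Returns true if this worker now holds the global consolidation lease.
/// Assumes a `memory_consolidation` row was seeded at init; the claim only
/// succeeds when no other worker holds a fresh (unexpired) lease.
async fn try_claim_global_consolidation(
    pool: &SqlitePool,
    worker_id: &str,
) -> sqlx::Result<bool> {
    let now = Utc::now().timestamp();
    let lease_until = now + CONSOLIDATION_LEASE_SECONDS;
    let claimed = sqlx::query(
        "UPDATE jobs \
         SET status = 'running', worker_id = ?, started_at = ?, lease_until = ? \
         WHERE kind = 'memory_consolidation' AND job_key = ? \
           AND (status != 'running' OR lease_until IS NULL OR lease_until <= ?)",
    )
    .bind(worker_id)
    .bind(now)
    .bind(lease_until)
    .bind(CONSOLIDATION_JOB_KEY)
    .bind(now)
    .execute(pool)
    .await?
    .rows_affected();
    Ok(claimed == 1)
}
```

Because the claim is one guarded UPDATE, it is atomic per local SQLite DB: a
fresh lease blocks other workers, and an expired `lease_until` allows
stale-lock takeover; the heartbeat only needs to extend `lease_until` while
the agent is alive.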

## PR 5: Final cleanup and docs

- Remove legacy artifacts and references:
  - `raw_memories/` and `memory_summary.md` assumptions from
    prompts/comments/tests.
  - Scope constants for cwd memory pathing in core/state if fully unused.
- Update docs under `docs/` for memory workflow and directory layout.
- Add a brief operator note for rollout: compatibility window for old
stage-1 JSON keys and when to remove aliases.

Acceptance criteria:
- Code and docs reflect only the simplified global workflow.
- No stale references to per-cwd memory buckets.

## Notes on sequencing

- PR 1 is safest first because it improves correctness without changing
external artifact layout.
- PR 2 keeps parser compatibility so prompt deployment can happen
independently.
- PR 3 and PR 4 split filesystem/scope simplification from locking
simplification to reduce blast radius.
- PR 5 is intentionally cleanup-only.
Commit 07da740c8a (parent a6e9469fa4), authored by jif-oai on
2026-02-10 21:29:06 +00:00 and committed via GitHub.
5 changed files with 322 additions and 74 deletions.

Diff excerpt from one of the changed files:

@@ -77,6 +77,16 @@ pub struct Stage1JobClaim {
pub ownership_token: String,
}
+#[derive(Debug, Clone, Copy)]
+pub struct Stage1StartupClaimParams<'a> {
+pub scan_limit: usize,
+pub max_claimed: usize,
+pub max_age_days: i64,
+pub min_rollout_idle_hours: i64,
+pub allowed_sources: &'a [String],
+pub lease_seconds: i64,
+}
/// Scope row used to queue phase-2 consolidation work.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PendingScopeConsolidation {
@@ -922,6 +932,7 @@ mod tests {
use super::STATE_DB_FILENAME;
use super::STATE_DB_VERSION;
use super::Stage1JobClaimOutcome;
+use super::Stage1StartupClaimParams;
use super::StateRuntime;
use super::ThreadMetadata;
use super::state_db_filename;
@@ -1095,7 +1106,7 @@ mod tests {
let owner_b = ThreadId::from_string(&Uuid::new_v4().to_string()).expect("owner id");
let claim = runtime
-.try_claim_stage1_job(thread_id, owner_a, 100, 3600)
+.try_claim_stage1_job(thread_id, owner_a, 100, 3600, 64)
.await
.expect("claim stage1 job");
let ownership_token = match claim {
@@ -1112,13 +1123,13 @@ mod tests {
);
let up_to_date = runtime
-.try_claim_stage1_job(thread_id, owner_b, 100, 3600)
+.try_claim_stage1_job(thread_id, owner_b, 100, 3600, 64)
.await
.expect("claim stage1 up-to-date");
assert_eq!(up_to_date, Stage1JobClaimOutcome::SkippedUpToDate);
let needs_rerun = runtime
-.try_claim_stage1_job(thread_id, owner_b, 101, 3600)
+.try_claim_stage1_job(thread_id, owner_b, 101, 3600, 64)
.await
.expect("claim stage1 newer source");
assert!(
@@ -1146,13 +1157,13 @@ mod tests {
.expect("upsert thread");
let claim_a = runtime
-.try_claim_stage1_job(thread_id, owner_a, 100, 3600)
+.try_claim_stage1_job(thread_id, owner_a, 100, 3600, 64)
.await
.expect("claim a");
assert!(matches!(claim_a, Stage1JobClaimOutcome::Claimed { .. }));
let claim_b_fresh = runtime
-.try_claim_stage1_job(thread_id, owner_b, 100, 3600)
+.try_claim_stage1_job(thread_id, owner_b, 100, 3600, 64)
.await
.expect("claim b fresh");
assert_eq!(claim_b_fresh, Stage1JobClaimOutcome::SkippedRunning);
@@ -1164,7 +1175,7 @@ mod tests {
.expect("force stale lease");
let claim_b_stale = runtime
-.try_claim_stage1_job(thread_id, owner_b, 100, 3600)
+.try_claim_stage1_job(thread_id, owner_b, 100, 3600, 64)
.await
.expect("claim b stale");
assert!(matches!(
@@ -1176,20 +1187,26 @@ mod tests {
}
#[tokio::test]
-async fn claim_stage1_jobs_filters_by_age_and_current_thread() {
+async fn claim_stage1_jobs_filters_by_age_idle_and_current_thread() {
let codex_home = unique_temp_dir();
let runtime = StateRuntime::init(codex_home.clone(), "test-provider".to_string(), None)
.await
.expect("initialize runtime");
let now = Utc::now();
-let recent_at = now - Duration::seconds(10);
+let fresh_at = now - Duration::hours(1);
+let just_under_idle_at = now - Duration::hours(12) + Duration::minutes(1);
+let eligible_idle_at = now - Duration::hours(12) - Duration::minutes(1);
let old_at = now - Duration::days(31);
let current_thread_id =
ThreadId::from_string(&Uuid::new_v4().to_string()).expect("current thread id");
-let recent_thread_id =
-ThreadId::from_string(&Uuid::new_v4().to_string()).expect("recent thread id");
+let fresh_thread_id =
+ThreadId::from_string(&Uuid::new_v4().to_string()).expect("fresh thread id");
+let just_under_idle_thread_id =
+ThreadId::from_string(&Uuid::new_v4().to_string()).expect("just under idle thread id");
+let eligible_idle_thread_id =
+ThreadId::from_string(&Uuid::new_v4().to_string()).expect("eligible idle thread id");
let old_thread_id =
ThreadId::from_string(&Uuid::new_v4().to_string()).expect("old thread id");
@@ -1202,11 +1219,35 @@ mod tests {
.await
.expect("upsert current");
-let mut recent =
-test_thread_metadata(&codex_home, recent_thread_id, codex_home.join("recent"));
-recent.created_at = recent_at;
-recent.updated_at = recent_at;
-runtime.upsert_thread(&recent).await.expect("upsert recent");
+let mut fresh =
+test_thread_metadata(&codex_home, fresh_thread_id, codex_home.join("fresh"));
+fresh.created_at = fresh_at;
+fresh.updated_at = fresh_at;
+runtime.upsert_thread(&fresh).await.expect("upsert fresh");
+let mut just_under_idle = test_thread_metadata(
+&codex_home,
+just_under_idle_thread_id,
+codex_home.join("just-under-idle"),
+);
+just_under_idle.created_at = just_under_idle_at;
+just_under_idle.updated_at = just_under_idle_at;
+runtime
+.upsert_thread(&just_under_idle)
+.await
+.expect("upsert just-under-idle");
+let mut eligible_idle = test_thread_metadata(
+&codex_home,
+eligible_idle_thread_id,
+codex_home.join("eligible-idle"),
+);
+eligible_idle.created_at = eligible_idle_at;
+eligible_idle.updated_at = eligible_idle_at;
+runtime
+.upsert_thread(&eligible_idle)
+.await
+.expect("upsert eligible-idle");
let mut old = test_thread_metadata(&codex_home, old_thread_id, codex_home.join("old"));
old.created_at = old_at;
@@ -1217,17 +1258,147 @@ mod tests {
let claims = runtime
.claim_stage1_jobs_for_startup(
current_thread_id,
-10,
-5,
-30,
-allowed_sources.as_slice(),
-3600,
+Stage1StartupClaimParams {
+scan_limit: 1,
+max_claimed: 5,
+max_age_days: 30,
+min_rollout_idle_hours: 12,
+allowed_sources: allowed_sources.as_slice(),
+lease_seconds: 3600,
+},
)
.await
.expect("claim stage1 jobs");
assert_eq!(claims.len(), 1);
-assert_eq!(claims[0].thread.id, recent_thread_id);
+assert_eq!(claims[0].thread.id, eligible_idle_thread_id);
let _ = tokio::fs::remove_dir_all(codex_home).await;
}
#[tokio::test]
async fn claim_stage1_jobs_enforces_global_running_cap() {
let codex_home = unique_temp_dir();
let runtime = StateRuntime::init(codex_home.clone(), "test-provider".to_string(), None)
.await
.expect("initialize runtime");
let current_thread_id =
ThreadId::from_string(&Uuid::new_v4().to_string()).expect("current thread id");
runtime
.upsert_thread(&test_thread_metadata(
&codex_home,
current_thread_id,
codex_home.join("current"),
))
.await
.expect("upsert current");
let now = Utc::now();
let started_at = now.timestamp();
let lease_until = started_at + 3600;
let eligible_at = now - Duration::hours(13);
let existing_running = 10usize;
let total_candidates = 80usize;
for idx in 0..total_candidates {
let thread_id = ThreadId::from_string(&Uuid::new_v4().to_string()).expect("thread id");
let mut metadata = test_thread_metadata(
&codex_home,
thread_id,
codex_home.join(format!("thread-{idx}")),
);
metadata.created_at = eligible_at - Duration::seconds(idx as i64);
metadata.updated_at = eligible_at - Duration::seconds(idx as i64);
runtime
.upsert_thread(&metadata)
.await
.expect("upsert thread");
if idx < existing_running {
sqlx::query(
r#"
INSERT INTO jobs (
kind,
job_key,
status,
worker_id,
ownership_token,
started_at,
finished_at,
lease_until,
retry_at,
retry_remaining,
last_error,
input_watermark,
last_success_watermark
) VALUES (?, ?, 'running', ?, ?, ?, NULL, ?, NULL, ?, NULL, ?, NULL)
"#,
)
.bind("memory_stage1")
.bind(thread_id.to_string())
.bind(current_thread_id.to_string())
.bind(Uuid::new_v4().to_string())
.bind(started_at)
.bind(lease_until)
.bind(3)
.bind(metadata.updated_at.timestamp())
.execute(runtime.pool.as_ref())
.await
.expect("seed running stage1 job");
}
}
let allowed_sources = vec!["cli".to_string()];
let claims = runtime
.claim_stage1_jobs_for_startup(
current_thread_id,
Stage1StartupClaimParams {
scan_limit: 200,
max_claimed: 64,
max_age_days: 30,
min_rollout_idle_hours: 12,
allowed_sources: allowed_sources.as_slice(),
lease_seconds: 3600,
},
)
.await
.expect("claim stage1 jobs");
assert_eq!(claims.len(), 54);
let running_count = sqlx::query(
r#"
SELECT COUNT(*) AS count
FROM jobs
WHERE kind = 'memory_stage1'
AND status = 'running'
AND lease_until IS NOT NULL
AND lease_until > ?
"#,
)
.bind(Utc::now().timestamp())
.fetch_one(runtime.pool.as_ref())
.await
.expect("count running stage1 jobs")
.try_get::<i64, _>("count")
.expect("running count value");
assert_eq!(running_count, 64);
let more_claims = runtime
.claim_stage1_jobs_for_startup(
current_thread_id,
Stage1StartupClaimParams {
scan_limit: 200,
max_claimed: 64,
max_age_days: 30,
min_rollout_idle_hours: 12,
allowed_sources: allowed_sources.as_slice(),
lease_seconds: 3600,
},
)
.await
.expect("claim stage1 jobs with cap reached");
assert_eq!(more_claims.len(), 0);
let _ = tokio::fs::remove_dir_all(codex_home).await;
}
@@ -1248,7 +1419,7 @@ mod tests {
.expect("upsert thread");
let claim = runtime
-.try_claim_stage1_job(thread_id, owner, 100, 3600)
+.try_claim_stage1_job(thread_id, owner, 100, 3600, 64)
.await
.expect("claim stage1");
let ownership_token = match claim {
@@ -1395,7 +1566,7 @@ mod tests {
.expect("upsert thread");
let claim = runtime
-.try_claim_stage1_job(thread_id, owner, 100, 3600)
+.try_claim_stage1_job(thread_id, owner, 100, 3600, 64)
.await
.expect("claim stage1");
let ownership_token = match claim {
@@ -1459,7 +1630,7 @@ mod tests {
.expect("upsert thread b");
let claim_a = runtime
-.try_claim_stage1_job(thread_a, owner, 100, 3600)
+.try_claim_stage1_job(thread_a, owner, 100, 3600, 64)
.await
.expect("claim stage1 a");
let token_a = match claim_a {
@@ -1475,7 +1646,7 @@ mod tests {
);
let claim_b = runtime
-.try_claim_stage1_job(thread_b, owner, 101, 3600)
+.try_claim_stage1_job(thread_b, owner, 101, 3600, 64)
.await
.expect("claim stage1 b");
let token_b = match claim_b {