Compare commits

...

1 Commits

Author SHA1 Message Date
jif-oai
22dd9ad392 Densify and version memory summaries (#23148)
## Why

`memory_summary.md` is injected into every session, so its value depends
on staying compact, navigational, and easy to regenerate when the
expected shape changes. The previous consolidation prompt encouraged a
broad actionable inventory and allowed older summary structures to be
patched in place, which makes it easier for stale or overly verbose
summaries to keep accumulating.

This change makes the summary format explicitly versioned and biases
Phase 2 memory consolidation toward denser prompt-loaded context.

## What changed

- Require `memory_summary.md` to begin with an exact `v1` header.
- Teach consolidation to regenerate `memory_summary.md` from scratch
when the header is missing or incompatible, while still allowing
incremental updates to `MEMORY.md`.
- Tighten the `memory_summary.md` instructions so it acts as a compact
routing/index layer instead of a second handbook.
- Lower `MEMORY_TOOL_DEVELOPER_INSTRUCTIONS_SUMMARY_TOKEN_LIMIT` from
`5_000` to `2_500` so the runtime prompt budget matches the denser
summary target.

## Verification

Not run; this is a prompt/template update plus a prompt budget constant
change.
2026-05-18 09:59:34 +02:00
2 changed files with 68 additions and 30 deletions

View File

@@ -13,7 +13,7 @@ use codex_utils_absolute_path::AbsolutePathBuf;
pub use prompts::build_memory_tool_developer_instructions;
const MEMORY_TOOL_DEVELOPER_INSTRUCTIONS_SUMMARY_TOKEN_LIMIT: usize = 5_000;
const MEMORY_TOOL_DEVELOPER_INSTRUCTIONS_SUMMARY_TOKEN_LIMIT: usize = 2_500;
pub fn memory_root(codex_home: &AbsolutePathBuf) -> AbsolutePathBuf {
codex_home.join("memories")

View File

@@ -20,8 +20,8 @@ CONTEXT: MEMORY FOLDER STRUCTURE
Folder structure (under {{ memory_root }}/):
- memory_summary.md
- Always loaded into the system prompt. Must remain informative and highly navigational,
but still discriminative enough to guide retrieval.
- Always loaded into the system prompt. First line must be exactly `v1`.
Must stay dense, highly navigational, and discriminative enough to guide retrieval.
- MEMORY.md
- Handbook entries. Used to grep for keywords; aggregated insights from rollouts;
pointers to rollout summaries if certain past rollouts are very relevant.
@@ -133,7 +133,8 @@ Under `{{ memory_root }}/`:
- merged memories; produce a lightly clustered version if applicable
- `rollout_summaries/*.md`
- `memory_summary.md`
- read the existing summary so updates stay consistent
- read the existing summary so updates stay consistent only if its first line is exactly `v1`;
otherwise treat the summary as schema-incompatible and regenerate the whole file from scratch
- `skills/*`
- read existing skills so updates are incremental and non-duplicative
{{ memory_extensions_primary_inputs }}
@@ -143,6 +144,8 @@ Mode selection:
and `skills/`).
- INCREMENTAL UPDATE: existing artifacts already exist and `raw_memories.md`
mostly contains new additions.
- Summary schema reset: if `memory_summary.md` is missing, empty, or does not start with exactly
`v1`, regenerate only `memory_summary.md` from scratch after `MEMORY.md` is current.
Memory workspace diff:
@@ -183,6 +186,8 @@ Rules:
- If there is no meaningful signal to add beyond what already exists, keep outputs minimal.
- You should always make sure `MEMORY.md` and `memory_summary.md` exist and are up to date.
- `memory_summary.md` must start with the exact line `v1`; if it does not, rewrite the entire
file rather than patching the previous summary in place.
- Follow the format and schema of the artifacts below.
- Do not target fixed counts (memory blocks, task groups, topics, or bullets). Let the
signal determine the granularity and depth.
@@ -441,14 +446,40 @@ task group/task` from the full raw inventory so you can have a better overview.
- Prefer placing reusable user preferences in `## User preferences` and the rest of the durable
know-how in `## Reusable knowledge` and `## Failures and how to do differently`.
- Use `memory_summary.md` as the cross-task summary layer, not the place for project-specific
runbooks. It should stay compact in narrative/profile sections, but its `## User preferences`
section is the main actionable payload and may be much longer when that helps future agents
avoid repeated user steering.
runbooks. Its `## User preferences` section is the main actionable payload, but it should
still stay compact, deduplicated, and limited to preferences likely to change future behavior.
============================================================
2) `memory_summary.md` FORMAT (STRICT)
============================================================
File header:
The file must begin exactly:
```md
v1
## User Profile
```
- The first line must be exactly `v1` with no leading/trailing whitespace and no frontmatter
before it.
- If the existing `memory_summary.md` first line is not exactly `v1`, discard the old summary
structure and regenerate the entire file from the finalized `MEMORY.md`, skills, and current
rollout evidence.
Density objective (strict):
- `memory_summary.md` is prompt-loaded context, so optimize for high signal per token.
- Keep only high-level, cross-task signal and brief routing summaries. Put details, provenance,
runbooks, and task-local nuance in `MEMORY.md`, skills, or rollout summaries.
- Deduplicate aggressively. If two bullets would cause the same future behavior or route to the
same `MEMORY.md` area, merge them or keep the sharper one.
- Prefer short, concrete bullets over narrative explanation. Delete low-signal caveats,
examples, and historical detail unless they change future agent behavior.
- Give directly links to important information to maximize the retrieval efficiency.
Format:
## User Profile
@@ -474,15 +505,15 @@ For example, include (when known):
You may end with short fun facts if they are real and useful, but keep the main profile concrete
and grounded. Do not let the optional fun-facts tail make the rest of the section more stylized
or abstract.
This entire section is free-form, <= 500 words.
This entire section is free-form, <= 350 words.
## User preferences
Include a dedicated bullet list of actionable user preferences that are likely to matter again,
not just inside one task group.
This section should be more concrete and easier to apply than `## User Profile`.
Prefer preferences that repeatedly save user keystrokes or avoid predictable interruption.
This section may be long. Do not compress it to just a few umbrella bullets when `MEMORY.md`
contains many distinct actionable preferences.
Keep it dense and non-duplicative. Include only stable or high-leverage preferences that would
change future agent behavior across recurring workflows.
Treat this as the main actionable payload of `memory_summary.md`.
For example, include (when known):
@@ -500,16 +531,12 @@ Rules:
- Keep each bullet actionable and future-facing.
- Default to lifting or lightly adapting strong bullets from `MEMORY.md` `## User preferences`
rather than rewriting them into smoother higher-level summaries.
- Preserve more of the user's original point than a terse summary would. Prefer evidence-aware
bullets that still keep some original wording over abstract umbrella summaries.
- Preserve the user's original point when it is compact and behavior-changing; otherwise compress
to the shortest faithful wording.
- When a short quoted or near-verbatim phrase makes the preference easier to recognize or grep
for later, keep that phrase in the bullet instead of replacing it with an abstraction.
- Do not over-merge adjacent preferences. If several distinct preferences would change different
future defaults, keep them as separate bullets.
- Prefer many narrow actionable bullets over a few broad umbrella bullets.
- Prefer a broad actionable inventory over a short highly deduped list.
- Do not treat 5-10 bullets as an implicit target; long-lived memory sets may justify a much
longer list.
- Merge adjacent preferences unless they would change different future defaults.
- Prefer a compact set of sharp bullets over a broad inventory.
- Do not require a preference to be broad across task families. If it is likely to matter again
in a recurring workflow, it belongs here.
- When deciding whether to include a preference, ask whether omitting it would make the next
@@ -542,16 +569,20 @@ For example, include (when known):
This is a compact index to help future agents quickly find details in `MEMORY.md`,
`skills/`, and `rollout_summaries/`.
Treat it as a routing/index layer, not a mini-handbook:
Treat it as a dense routing/index layer, not a mini-handbook:
- tell future agents what to search first,
- preserve enough specificity to route into the right `MEMORY.md` block quickly.
- keep topic descriptions brief; delete stale, duplicated, or low-signal topics even if they
existed in the previous summary.
Topic selection and quality rules:
- Organize the index first by cwd / project scope, then by topic.
- Split the index into a recent high-utility window and older topics.
- Do not target a fixed topic count. Include informative topics and omit low-signal noise.
- Keep the index current. Feel free to restructure, rename, merge, or delete topics when the
current `MEMORY.md` organization or evidence has changed.
- Prefer grouping by task family / workflow intent, not by incidental tool overlap alone.
- Order topics by utility, using `updated_at` recency as a strong default proxy unless there is
strong contrary evidence.
@@ -588,7 +619,7 @@ Recent Active Memory Window behavior (scope-first, then day-ordered):
- Within each scope, order day subsections by recency.
- If a scope has only one meaningful recent day, include only that day for that scope.
- For each recent-day subsection inside a scope, prioritize informative, likely-to-recur topics and make
those entries richer (better keywords, clearer descriptions, and useful recent learnings);
those entries denser (better keywords, brief descriptions, and useful recent learnings);
do not spend much space on trivial tasks touched that day.
- Preserve routing coverage for `MEMORY.md` in the overall index. If a scope/day includes
less useful topics, include shorter/compact entries for routing rather than dropping them.
@@ -596,16 +627,16 @@ Recent Active Memory Window behavior (scope-first, then day-ordered):
appears; do not duplicate it under multiple day sections.
- If a topic spans multiple scopes and retrieval would differ by scope, split it. Otherwise,
place it under the dominant scope and mention the secondary scope in the description.
- Recent-day entries should be richer than older-topic entries: stronger keywords, clearer
descriptions, and concise recent learnings/change notes.
- Recent-day entries should be more informative than older-topic entries through stronger
keywords and concise recent learnings/change notes, not longer prose.
- Group similar tasks/topics together when it improves routing clarity.
- Do not over cluster topics together, especially when they contain distinct task intents.
Recent-topic format:
- <topic>: <keyword1>, <keyword2>, <keyword3>, ...
- desc: <clear and specific description of what tasks are inside this topic; what future task/user goal this helps with; what kinds of outcomes/artifacts/procedures are covered; when to search this topic first; preserve original source phrasing when it is a useful retrieval handle; and include explicit cwd applicability text when the work is checkout-sensitive>
- learnings: <some concise, topic-local recent takeaways / decision triggers / updates worth checking first; include useful specifics, original source phrasing where possible, and cwd mismatch caveats when important; avoid overlap with `## User preferences` and `## General Tips` (cross-task actionable defaults belong in `## User preferences`; broad reusable guidance belongs in `## General Tips`)>
- desc: <brief description of what is inside this topic, when to search it first, and any cwd applicability needed for routing>
- learnings: <one dense line of topic-local takeaways / decision triggers / updates worth checking first; avoid overlap with `## User preferences` and `## General Tips`>
### <cwd / project scope>
@@ -643,10 +674,9 @@ Notes:
caveats, and decision triggers; move cross-task, stable, broadly reusable user defaults to
`## User preferences`.
- Coverage guardrail: ensure every top-level `# Task Group` in `MEMORY.md` is represented by
at least one topic bullet in this index (either directly or via a clearly subsuming topic).
- Keep descriptions explicit: what is inside, when to use it, and what kind of
outcome/procedure depth is available (for example: runbook, diagnostics, reporting, recovery),
so a future agent can quickly choose which topic/keyword cluster to search first.
at least one topic bullet in this index (either directly or via a clearly subsuming compact topic).
- Keep descriptions explicit but short: enough for a future agent to choose the right
topic/keyword cluster, not enough to replace opening `MEMORY.md`.
- `memory_summary.md` should not sound like a second-order executive summary. Prefer concrete,
source-faithful wording over polished abstraction, especially in:
- `## User preferences`
@@ -731,6 +761,9 @@ WORKFLOW
============================================================
1. Determine mode (INIT vs INCREMENTAL UPDATE) using artifact availability and current run context.
Independently check `memory_summary.md` first line: if it is not exactly `v1`, regenerate
`memory_summary.md` from scratch after the other artifacts are finalized, even when `MEMORY.md`
itself can be updated incrementally.
2. INIT phase behavior:
- Read `raw_memories.md` first, then rollout summaries carefully.
@@ -747,8 +780,8 @@ WORKFLOW
conflicting task families until MEMORY blocks are richer and more useful than raw memories
3. INCREMENTAL UPDATE behavior:
- Read existing `MEMORY.md` and `memory_summary.md` first for continuity and to locate
existing references that may need surgical cleanup.
- Read existing `MEMORY.md` and, only when it starts with exactly `v1`, existing
`memory_summary.md` first for continuity and to locate references that may need surgical cleanup.
- Use the injected git-style workspace changes as the first routing pass:
- added/modified `raw_memories.md` and `rollout_summaries/*.md` = ingestion queue
- deleted `rollout_summaries/*.md` and `extensions/*/resources/*.md` = forgetting /
@@ -775,6 +808,8 @@ WORKFLOW
- doing light clustering and merging if needed
- refreshing `MEMORY.md` top-of-file ordering so recent high-utility task families stay easy to find
- rebuilding the `memory_summary.md` recent active window (last 3 memory days) from current `updated_at` coverage
- freely restructuring `memory_summary.md` so it reflects the current memory set without
stale topics, duplicated preference bullets, or obsolete routing labels
- updating existing skills or adding new skills only when there is clear new reusable procedure
- updating `memory_summary.md` last to reflect the final state of the memory folder
- Minimize churn in incremental mode: if an existing `MEMORY.md` block or `## What's in Memory`
@@ -816,6 +851,9 @@ WORKFLOW
7. Final pass:
- remove duplication in memory_summary, skills/, and MEMORY.md
- verify `memory_summary.md` still begins with exactly `v1`
- verify `memory_summary.md` is dense: brief high-level profile, compact actionable
preferences, compact general tips, and a routing index rather than a second handbook
- remove stale or low-signal blocks that are less likely to be useful in the future
- remove or rewrite blocks/task sections whose supporting rollout references point only to
deleted inputs or missing rollout summary files