The Graph as Agent Memory
The graph as agent memory rejects the notebook metaphor. A notebook remembers what you wrote, but not when you believed it, nor when the fact itself was true. Flat vector stores and long-context transformers collapse time into a single present, and an agent that cannot distinguish "I knew this yesterday" from "this is still true today" is not reasoning — it is repeating. A bi-temporal knowledge graph — one that records both valid_at (when the fact held in the world) and recorded_at (when the agent ingested it) — turns memory from a static log into a navigable, revision-conscious archive where nothing is deleted and facts are superseded by stamping invalid_at.
This is article #4 in the Autonomous Knowledge Graphs series. The AI-engineer curriculum concept graph from #1 doubles as the agent's long-term, revision-conscious memory of the curriculum as it evolves across months of sessions. The engineering constraints are unchanged. The control plane is built on LlamaIndex, with DeepSeek as the LLM client and PropertyGraphIndex for retrieval, while the autonomous loop itself is written in plain Python rather than run by a workflow or graph-orchestration engine. The data plane is a Cloudflare D1 concept graph (concepts, concept_edges, lesson_concepts), and a thin TypeScript layer applies every write. Model egress is DeepSeek-only, through one Cloudflare AI Gateway. Every write carries a grounding-first record, {confidence, reason, source, evidence}, with bi-temporal valid_at/recorded_at stamps, and every irreversible step follows invalidate-not-delete.
The Gap: Why Flat Memory Fails Agents
Agent memory today falls into two camps. The flat log gives perfect recall of when something was said but no structure for reasoning across entities. The vector store excels at semantic similarity but has no notion of sequence or validity, so it cannot answer "what changed between last week and today?" Long-context models are offered as a third option, but attention dilutes across irrelevant history as context grows, making multi-hop reasoning unreliable — packing more tokens adds noise, not structure. Agents need structured, temporal, and traceable memory, and a graph is the representation that natively supports all three. The 2026 landscape reflects this: two large surveys reframe the graph as the agent-memory substrate (Yang et al., 2026, arXiv:2602.05665; Huang et al., 2026, arXiv:2602.06052).
Bi-Temporal Graphs: The Why and the How
The anchor is Engram, which proposes a bi-temporal knowledge graph for agent memory with salience decay and asynchronous consolidation (Wang, 2026, arXiv:2606.09900). Each fact — an edge between entity nodes — carries two dates: valid_at records the real-world time the relationship held; recorded_at records ingestion time. A third, invalid_at, is set when the fact is superseded; nothing is physically deleted.
Why two timestamps? Consider the agent maintaining the curriculum graph. A lesson is rewritten on the 1st so that fine-tuning now builds_on parameter-efficient methods, but the agent ingests the rewrite on the 5th. Without bi-temporal stamps the system cannot answer "what did the agent think the prerequisite was on the 3rd?" The answer should be the old relationship, because the new fact had not been learned yet. A graph that stores only valid_at would return the new edge for the 3rd, since the rewrite was already true in the world by then; a graph that stores only recorded_at could not report that the new relationship has held since the 1st. Each edge is therefore a concept-to-concept relationship carrying its full provenance record: (source_id, edge_type, target_id, valid_at, recorded_at, confidence, reason, source, evidence). The confidence must clear the write-time grounding gate, and reason/source/evidence link the edge back to the lesson span (source: "lesson:fine-tuning-fundamentals") or paper that produced it.
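The two questions in this example can be separated in a few lines of Python. This is a minimal sketch rather than the engine's actual code: the `Edge` class, the helper names `believed_on` and `true_on`, the old target `transformer-basics`, and every date are assumptions made for illustration. `true_on` only handles a single (source, edge_type) slot.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Edge:
    source_id: str
    edge_type: str
    target_id: str
    valid_at: date                     # world-time: when the relationship began to hold
    recorded_at: date                  # system-time: when the agent ingested it
    invalid_at: Optional[date] = None  # system-time: when the agent stopped believing it


def believed_on(edges, as_of):
    """Edges the agent held as current beliefs on `as_of` (system-time filter)."""
    return [
        e for e in edges
        if e.recorded_at <= as_of and (e.invalid_at is None or e.invalid_at > as_of)
    ]


def true_on(edges, as_of):
    """Best current knowledge of the world on `as_of`: the latest valid_at not after it."""
    known = [e for e in edges if e.valid_at <= as_of]
    return max(known, key=lambda e: e.valid_at, default=None)


# Hypothetical data: the old target and all dates are invented for illustration.
old = Edge("fine-tuning", "builds_on", "transformer-basics",
           valid_at=date(2026, 1, 10), recorded_at=date(2026, 1, 12),
           invalid_at=date(2026, 3, 5))
new = Edge("fine-tuning", "builds_on", "parameter-efficient-methods",
           valid_at=date(2026, 3, 1), recorded_at=date(2026, 3, 5))
edges = [old, new]

on_the_3rd = date(2026, 3, 3)
belief = believed_on(edges, on_the_3rd)   # what the agent thought on the 3rd
world = true_on(edges, on_the_3rd)        # what was actually the case on the 3rd
```

Here `belief` contains only the old edge, while `world` is the new edge: the rewrite was already true on the 3rd, but the agent had not yet learned it.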
Reference Architecture: Bi-Temporal Edges on Cloudflare D1
The pure-Python engine manages the memory lifecycle — extraction, storage, retrieval, evolution — proposing mutations that a thin TypeScript layer writes to Cloudflare D1. There is no schema migration: the bi-temporal stamps ride in the existing concept_edges.metadata JSON as a provenance record, so D1 stores the bi-temporal concept graph as-is:
```sql
-- Cloudflare D1. Bi-temporal stamps live in metadata JSON — no migration.
CREATE TABLE concept_edges (
  id TEXT PRIMARY KEY,
  source_id TEXT NOT NULL,  -- concept node
  target_id TEXT NOT NULL,  -- concept node
  edge_type TEXT NOT NULL,  -- prerequisite | builds_on | contrasts_with | part_of | related | applies_to
  weight REAL DEFAULT 1.0,
  metadata TEXT NOT NULL,   -- JSON: { provenance: { confidence, reason, source, evidence,
                            --         valid_at, recorded_at, invalid_at, salience, status } }
  UNIQUE(source_id, target_id, edge_type)
);
-- point-in-time over system-time, reading the JSON stamps:
CREATE INDEX idx_edges_systime ON concept_edges (
  json_extract(metadata, '$.provenance.recorded_at'),
  json_extract(metadata, '$.provenance.invalid_at')
);
```
This is bi-temporal: valid_at is world-time (when the curriculum relationship held), while recorded_at/invalid_at form the system-time interval (when the agent believed it). A point-in-time query answers "what did the agent believe on date D?" by filtering system-time — recorded_at ≤ as_of AND (invalid_at IS NULL OR invalid_at > as_of). Forgetting and supersession both set invalid_at (never a physical delete); they are told apart by status and reason — a superseded edge reads status: "invalidated" from belief revision, while a cold edge reads reason: "consolidation: forgotten (low salience)".
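The system-time filter can be tried locally against the same table shape. The sketch below uses Python's built-in `sqlite3` module as a stand-in for D1 and assumes three things not stated in the article: the stamps are stored as ISO-8601 date strings (so string comparison matches date order), the local SQLite build includes the JSON functions, and `committed` is the status of a live edge. The `add_edge` and `believed_ids` helpers and the sample rows are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # local stand-in for the D1 database
conn.execute("""
    CREATE TABLE concept_edges (
        id TEXT PRIMARY KEY,
        source_id TEXT NOT NULL,
        target_id TEXT NOT NULL,
        edge_type TEXT NOT NULL,
        weight REAL DEFAULT 1.0,
        metadata TEXT NOT NULL,
        UNIQUE(source_id, target_id, edge_type)
    )
""")


def add_edge(edge_id, source_id, target_id, edge_type, **provenance):
    """Insert one edge, packing the stamps into metadata.provenance as JSON."""
    conn.execute(
        "INSERT INTO concept_edges (id, source_id, target_id, edge_type, metadata) "
        "VALUES (?, ?, ?, ?, ?)",
        (edge_id, source_id, target_id, edge_type,
         json.dumps({"provenance": provenance})),
    )


# Hypothetical rows: the old edge was invalidated when the new one was recorded.
add_edge("e1", "fine-tuning", "transformer-basics", "builds_on",
         valid_at="2026-01-10", recorded_at="2026-01-12",
         invalid_at="2026-03-05", status="invalidated")
add_edge("e2", "fine-tuning", "parameter-efficient-methods", "builds_on",
         valid_at="2026-03-01", recorded_at="2026-03-05",
         invalid_at=None, status="committed")

# recorded_at <= as_of AND (invalid_at IS NULL OR invalid_at > as_of)
BELIEF_AS_OF = """
    SELECT id FROM concept_edges
    WHERE json_extract(metadata, '$.provenance.recorded_at') <= :as_of
      AND (json_extract(metadata, '$.provenance.invalid_at') IS NULL
           OR json_extract(metadata, '$.provenance.invalid_at') > :as_of)
    ORDER BY id
"""


def believed_ids(as_of):
    """Ids of the edges the agent believed on the ISO date `as_of`."""
    return [row[0] for row in conn.execute(BELIEF_AS_OF, {"as_of": as_of})]
```

With these rows, `believed_ids("2026-03-03")` returns only `e1` and `believed_ids("2026-03-05")` returns only `e2`, because the old edge's interval closes on the day the new one opens.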
The Four-Phase Memory Lifecycle
GAM defines a hierarchical graph memory that decouples encoding from consolidation via an event-progression graph plus a topic associative network (Wu et al., 2026, arXiv:2604.12285). This design adapts that idea into four phases:
- Extraction. Lesson text goes to a DeepSeek model that emits candidate concept edges with a confidence score and an evidence span; only edges clearing the 0.6 grounding gate survive, trading some recall for less noise.
- Storage. The edge is inserted with its provenance record. If one with the same source, edge_type, target and valid_at exists, the new recorded_at becomes a separate row, with no destructive update, preserving the full history of belief changes.
- Retrieval. The system fixes the relevant time window (now, or an explicit date for historical queries) and returns edges valid in that window. Following Mnemis (Tang et al., 2026, arXiv:2602.15313), retrieval is dual-route: a fast similarity-first System-1 path narrows candidates, then a slower System-2 traversal handles structured multi-hop reasoning.
- Evolution. Consolidation runs asynchronously: redundant edges (repeated recorded_at with unchanged valid_at) are merged, and edges weakened by downstream verification are quarantined rather than deleted. The 2026 memory survey frames exactly this extract → store → retrieve → evolve lifecycle, naming evolution as the stage most systems under-build (Yang et al., 2026).
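The first three phases can be sketched end to end with plain Python structures. This is an illustrative toy, not the engine and not Mnemis's algorithm. The word-overlap seed selection stands in for a real similarity search, the breadth-first walk stands in for System-2 traversal, and the time window is applied as the system-time belief filter. The function names, the `max_hops` parameter, the reading of "clearing the gate" as `>=`, and the sample edges are all assumptions.

```python
from collections import deque

GROUNDING_GATE = 0.6  # the write-time threshold named in the text


def extract_gate(candidates, threshold=GROUNDING_GATE):
    """Extraction: keep only candidates whose confidence clears the gate."""
    return [c for c in candidates if c["confidence"] >= threshold]


def store(log, candidate, recorded_at):
    """Storage: append a new row; earlier rows are never overwritten."""
    row = {**candidate, "recorded_at": recorded_at, "invalid_at": None}
    log.append(row)
    return row


def believed(log, as_of):
    """Rows the agent held as current on `as_of` (ISO date strings)."""
    return [r for r in log
            if r["recorded_at"] <= as_of
            and (r["invalid_at"] is None or r["invalid_at"] > as_of)]


def retrieve(log, query, as_of, max_hops=2):
    """Retrieval: a cheap seed lookup, then a bounded breadth-first traversal."""
    live = believed(log, as_of)
    # System-1 stand-in: seed on concepts sharing a word with the query.
    words = set(query.lower().replace("-", " ").split())
    seeds = sorted({r["source_id"] for r in live
                    if words & set(r["source_id"].split("-"))})
    # System-2 stand-in: follow outgoing edges, at most max_hops per path.
    paths, queue = [], deque((seed, []) for seed in seeds)
    while queue:
        node, path = queue.popleft()
        for r in live:
            if r["source_id"] == node and len(path) < max_hops:
                step = path + [(r["source_id"], r["edge_type"], r["target_id"])]
                paths.append(step)
                queue.append((r["target_id"], step))
    return paths


# Hypothetical candidates, as an extractor might emit them.
candidates = [
    {"source_id": "fine-tuning", "edge_type": "builds_on",
     "target_id": "parameter-efficient-methods", "valid_at": "2026-03-01",
     "confidence": 0.9, "evidence": "hypothetical lesson span"},
    {"source_id": "parameter-efficient-methods", "edge_type": "part_of",
     "target_id": "adaptation", "valid_at": "2026-03-01",
     "confidence": 0.7, "evidence": "hypothetical lesson span"},
    {"source_id": "fine-tuning", "edge_type": "related",
     "target_id": "prompting", "valid_at": "2026-03-01",
     "confidence": 0.4, "evidence": "hypothetical lesson span"},
]

log = []
for candidate in extract_gate(candidates):
    store(log, candidate, recorded_at="2026-03-05")

paths = retrieve(log, "what does fine-tuning build on", as_of="2026-03-06")
```

The 0.4-confidence edge never reaches the log, and `paths` holds a one-hop path plus the two-hop path that continues through parameter-efficient-methods. Querying with an `as_of` before the 5th returns no paths, since nothing had been recorded yet.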
Governance: Consolidation and Forgetting
Memory without forgetting grows without bound. The bi-temporal design makes forgetting safe — a fact is never lost, only marked superseded or lowered in access priority. Consolidation has two levers:
- Salience decay. consolidate(decay=0.1, cold_threshold=0.2) lowers each proposed edge's salience by 0.1 per cycle; once salience drops below 0.2 the edge is invalidated with reason: "consolidation: forgotten (low salience)", a soft-archive. Only proposed (not yet committed) edges decay, so a committed, still-true relationship is never dropped from belief queries just for being cold. Engram uses a configurable decay scheme of this kind.
- Subsumption. When a new edge with the same source, edge_type, target but a later valid_at arrives, the old edge is stamped invalid_at at the new edge's recorded_at and its status becomes invalidated. This is the belief-revision mechanism, and it never performs a destructive delete.
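Both levers reduce to stamping fields on existing records. The sketch below follows the `consolidate(decay=0.1, cold_threshold=0.2)` signature quoted above, but the dictionary layout, the `now` argument, the `supersede` helper, and the sample edges are assumptions; it is not Engram's decay scheme.

```python
FORGOTTEN = "consolidation: forgotten (low salience)"


def consolidate(edges, now, decay=0.1, cold_threshold=0.2):
    """One cycle of salience decay. Only live, proposed edges are touched."""
    for e in edges:
        if e["status"] != "proposed" or e["invalid_at"] is not None:
            continue  # committed or already-invalidated edges never decay
        e["salience"] -= decay
        if e["salience"] < cold_threshold:
            e["invalid_at"] = now      # soft-archive: stamped, not deleted
            e["reason"] = FORGOTTEN
    return edges


def supersede(edges, new_edge):
    """Subsumption: close the older edge at the new edge's recorded_at."""
    key = ("source_id", "edge_type", "target_id")
    for e in edges:
        same_triple = all(e[k] == new_edge[k] for k in key)
        if same_triple and e["invalid_at"] is None and e["valid_at"] < new_edge["valid_at"]:
            e["invalid_at"] = new_edge["recorded_at"]
            e["status"] = "invalidated"
    edges.append(new_edge)
    return edges


def make_edge(source_id, edge_type, target_id, valid_at, recorded_at, status, salience):
    """Hypothetical record layout used only for this sketch."""
    return {"source_id": source_id, "edge_type": edge_type, "target_id": target_id,
            "valid_at": valid_at, "recorded_at": recorded_at, "invalid_at": None,
            "status": status, "salience": salience, "reason": "extracted"}


cold = make_edge("rag", "related", "vector-search",
                 "2026-01-01", "2026-01-02", "proposed", 0.35)
kept = make_edge("fine-tuning", "builds_on", "parameter-efficient-methods",
                 "2026-01-01", "2026-01-02", "committed", 0.35)
edges = [cold, kept]

consolidate(edges, now="2026-02-01")  # cold is still above the threshold
consolidate(edges, now="2026-03-01")  # cold falls below 0.2 and is stamped

newer = make_edge("fine-tuning", "builds_on", "parameter-efficient-methods",
                  "2026-03-01", "2026-03-05", "committed", 1.0)
supersede(edges, newer)               # kept is closed at 2026-03-05
```

After two cycles the proposed edge is stamped invalid with the forgotten reason, while the committed edge keeps its salience untouched and is closed only when the newer edge supersedes it. All three records remain in the list.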
The cost is real: every update appends rather than overwrites, so bi-temporal write amplification accumulates and is mitigated by periodic cold-compaction that rewrites the active set without invalidated edges.
Failure Modes: Semantic Drift and Contamination
Semantic drift. Over many cycles, a concept can accumulate conflicting edges. A relationship changing over time is correct, but the same pair asserted twice without a clear valid_at produces contradictory data over overlapping windows. The fix is to require every edge to carry a valid_at — explicit from the lesson, or inferred from the interval between successive recorded_at values — so the timeline stays unambiguous.
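A write-time check can surface this condition before it reaches the graph. The following is a minimal sketch under assumed names: `ambiguous_pairs` and the record layout are hypothetical, and it only flags the problem rather than inferring the missing valid_at.

```python
from collections import defaultdict


def ambiguous_pairs(edges):
    """Concept pairs asserted more than once where some assertion lacks valid_at."""
    by_pair = defaultdict(list)
    for e in edges:
        by_pair[(e["source_id"], e["target_id"])].append(e)
    return sorted(
        pair for pair, rows in by_pair.items()
        if len(rows) > 1 and any(r.get("valid_at") is None for r in rows)
    )


# Hypothetical assertions: the first pair is asserted twice, once unstamped.
assertions = [
    {"source_id": "rag", "target_id": "fine-tuning",
     "edge_type": "prerequisite", "valid_at": None},
    {"source_id": "rag", "target_id": "fine-tuning",
     "edge_type": "contrasts_with", "valid_at": "2026-02-01"},
    {"source_id": "embeddings", "target_id": "rag",
     "edge_type": "related", "valid_at": None},
]

flagged = ambiguous_pairs(assertions)
```

Only the rag and fine-tuning pair is flagged. A single unstamped assertion is incomplete but not yet contradictory, so a stricter gate that rejects every edge without valid_at would be a separate rule.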
Contamination. A single bad extraction can propagate through multi-hop retrieval and poison downstream decisions. The grounding gate is the first defense; the second is a separate DeepSeek pass that re-scores the top retrieved paths and rejects the result set (the agent answers "I don't know") when path confidence is too low. MAGMA argues for separating semantic, temporal, causal, and entity graphs so contamination in one does not spread to the others (Jiang et al., 2026, arXiv:2601.03236) — a direction worth evaluating.
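The second defense amounts to an abstain-or-answer decision over re-scored paths. In the sketch below, the `rescore` callable is where the separate DeepSeek pass would plug in; the default `weakest_link` scorer, the 0.5 cutoff, and the function names are assumptions chosen only so the example runs without a model call.

```python
def weakest_link(path):
    """Stand-in scorer: a path is only as trustworthy as its weakest edge."""
    return min(edge["confidence"] for edge in path)


def answer_or_abstain(paths, rescore=weakest_link, min_path_confidence=0.5):
    """Return the paths that survive re-scoring, or None to signal "I don't know"."""
    scored = sorted(((rescore(p), p) for p in paths),
                    key=lambda pair: pair[0], reverse=True)
    if not scored or scored[0][0] < min_path_confidence:
        return None  # best path is too weak: reject the whole result set
    return [path for score, path in scored if score >= min_path_confidence]


# Hypothetical retrieved paths, each a list of edges with stored confidences.
grounded = [{"confidence": 0.9}, {"confidence": 0.8}]
poisoned = [{"confidence": 0.9}, {"confidence": 0.3}]

result = answer_or_abstain([grounded, poisoned])
abstained = answer_or_abstain([poisoned])
```

`result` keeps only the grounded path, and `abstained` is `None`, which the caller would turn into an "I don't know" answer. Taking the minimum rather than the mean is a deliberate choice here: one bad extraction should be enough to sink a multi-hop path.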
Decision Table: Graph vs Vector vs Long Context
| Requirement | Graph memory (this design) | Vector memory | Long context |
|---|---|---|---|
| Multi-hop reasoning across entities | Native | Weak (needs chunk linking) | Degrades as context grows |
| Temporal "what did the agent believe on date X?" | Bi-temporal stamps | No time metadata | Must re-read history |
| Audit trail | Full history retained | Overwrites | Only via external logging |
| Very high write throughput | Consolidation is a bottleneck | Fast insert | No writes (static) |
| Frequent destructive updates | Append-only (stamped) | Direct removal | Regenerate context |
For agents reasoning over an evolving curriculum, a body of lessons, or multi-session learning histories, graph memory wins on recall quality; the cost is higher infrastructure complexity and slower writes.
Numbered Limitations
1. Consolidation scales with edge count. Full consolidation grows with the active graph; sharding by subject helps but makes cross-shard joins expensive.
2. Entity resolution is unsolved at scale. The graph assumes clean concept IDs, but "Retrieval-Augmented Generation" and "RAG" are the same concept; a lightweight deduper helps, and none of the 2026 papers cited here solves this at scale for agent memory.
3. The confidence threshold is static. A production memory should vary the write gate by context (higher for critical or financial facts, lower for exploratory preferences) rather than hard-coding one value; this design hard-codes it. (MAGMA's multi-graph separation of semantic, temporal, causal, and entity views is a complementary way to make retrieval context-aware.)
4. Write amplification. Every update writes at least two rows; periodic cold-compaction reclaims space but needs an exclusive lock window.
5. No standard benchmark. The 2026 surveys (Yang et al.; Huang et al.) call for one, but none exists, so any latency or accuracy figures are implementation-specific and none are claimed here.
6. Inter-agent merging is open. In a multi-agent fleet, merging two belief graphs with overlapping entities can create contradiction cascades, an active problem with no production-ready solution.
Conclusion
The graph as agent memory is a statement about what an agent is. Bi-temporal graphs let an agent introspect its own history — "when did I learn X?", "was I ever wrong about Y?" — which is a prerequisite for self-correction and for trust: when a user asks "why did you do that?", the answer should be a traceable path through a graph, not a vector-similarity black box. The architecture here — a pure-Python engine, D1 storage, DeepSeek extraction behind a confidence gate, and bi-temporal stamps on every edge — is pragmatic enough to deploy; the hard problems (entity resolution, inter-agent merging) remain open. The message is simple: give the agent a graph, stamp everything with two dates, and never delete.
Frequently Asked Questions
What is a bi-temporal knowledge graph for agent memory? It is an agent's long-term memory stored as a graph where every edge carries valid_at (when the fact held) and recorded_at (when the agent learned it). A superseded fact is stamped invalid_at rather than deleted, so memory is a revision-conscious archive, not a flat log.
Why do flat logs and vector stores fail as agent memory? A flat log records when something was said but offers no structure across entities; a vector store finds similar memories but has no notion of validity, so it cannot answer "what changed since last week." Long context dilutes attention. Graphs provide structured, temporal, traceable memory.
Why store two timestamps instead of one? If a lesson reframes a concept on the 1st but the agent learns it on the 5th, only a bi-temporal graph can answer "what did the agent believe on the 3rd?" — the old relationship. valid_at captures world time, recorded_at captures ingestion time, and the gap between them is where belief revision lives.
How does the memory avoid unbounded growth without deleting? Consolidation runs asynchronously with two levers: salience decay lowers the salience of proposed edges each cycle and invalidates them once they fall below the cold threshold, and subsumption stamps invalid_at when a later valid_at supersedes a fact. Invalidated edges drop out of the active set but stay queryable.
When should an agent use graph memory over vector memory? When it must reason multi-hop across concepts, answer temporal point-in-time questions, and keep an audit trail — for example over an AI-engineering curriculum revised across many sessions. Vector memory suits pure similarity at very high write rates; long context suits a single static read.
Autonomous Knowledge Graphs — the series
- Autonomous Knowledge Graph Construction: Graphs That Build Themselves (autonomy: high)
- Reasoning Over the Graph: From GraphRAG to Planning Agents (autonomy: high)
- Self-Healing Knowledge Graphs: Graphs That Fix Themselves (guardrail)
- The Graph as Agent Memory (this article — autonomy: medium)
- Closing the Loop: Evaluation, Debate, and Discovery (guardrail)
A companion thread to The Autonomous Sales Fleet. Next: #5 Closing the Loop.
References
- Liuyin Wang. Less Context, More Accuracy: A Bi-Temporal Memory Engine for LLM Agents (Engram). 2026. arXiv:2606.09900. https://arxiv.org/abs/2606.09900
- Zhaofen Wu et al. GAM: Hierarchical Graph-based Agentic Memory for LLM Agents. 2026. arXiv:2604.12285. https://arxiv.org/abs/2604.12285
- Zihao Tang et al. Mnemis: Dual-Route Retrieval on Hierarchical Graphs for Long-Term LLM Memory. 2026. arXiv:2602.15313. https://arxiv.org/abs/2602.15313
- Dongming Jiang et al. MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents. 2026. arXiv:2601.03236. https://arxiv.org/abs/2601.03236
- Chang Yang et al. Graph-based Agent Memory: Taxonomy, Techniques, and Applications. 2026. arXiv:2602.05665. https://arxiv.org/abs/2602.05665
- Wei-Chieh Huang et al. Rethinking Memory Mechanisms of Foundation Agents in the Second Half: A Survey. 2026. arXiv:2602.06052. https://arxiv.org/abs/2602.06052
