The preprint ExpGraph proposes a self-evolving graph memory for LLM agents

A preprint posted on arXiv on May 29, 2026, named ExpGraph, argues that an agent built on a large language model can accumulate reusable experience without any change to the executor model's parameters: the executor stays frozen and interchangeable. The thesis reframes an AI budgeting question: invest in a more powerful model, or in a portable external memory layer that travels from one executor to another? The framework is authored by eleven researchers affiliated with the University of Illinois at Urbana-Champaign, Nanyang Technological University, and Meta Monetization AI. The paper, filed under Computation and Language (cs.CL on arXiv), had not been peer-reviewed as of its posting date; the results are self-reported by the authors.

Graph diffusion and RL copilot: the ExpGraph mechanism

ExpGraph distills an agent's past trajectories into reusable skills and lessons from failures, and organizes them as nodes in a self-evolving experience graph. Retrieval combines graph diffusion with utility ranking; a lightweight copilot trained via reinforcement learning (RL) selects which experiences to inject, and its reward signal is the gap in the executor's performance with and without the retrieved experience. The presence among the authors of Jiaxuan You, a recognized expert in graph neural networks (GraphSAGE, Open Graph Benchmark), lends technical credibility to the framework's graph diffusion component. Empirically, the authors report gains of 12.2% and 4.7% on static tasks, depending on executor size, and of 21.4% and 12.7% in agentic environments, including ALFWorld, a standard simulated household environment for evaluating agents (details of the other environments are not available from the abstract). However, the evaluation relies on ExpSuite, a benchmark designed by the authors themselves, and the baseline is not named in the abstract: two points that only a full reading of the paper will clarify.
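To make the retrieval idea concrete, here is a minimal sketch of how graph diffusion and utility ranking could be combined, and how a performance-gap reward could be computed. It is an illustration under stated assumptions, not the authors' implementation: the random-walk-with-restart diffusion, the product of relevance and utility as ranking score, and every name (Experience, diffuse, retrieve, copilot_reward, the toy node ids and utility values) are hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    """One graph node: a reusable skill or a lesson drawn from a failure."""
    text: str
    utility: float  # hypothetical estimate of how much this node has helped so far
    neighbors: list[str] = field(default_factory=list)  # ids of linked nodes


def diffuse(graph: dict[str, Experience], seeds: dict[str, float],
            alpha: float = 0.5, steps: int = 10) -> dict[str, float]:
    """Spread relevance from seed nodes along edges (random walk with restart)."""
    score = {node_id: 0.0 for node_id in graph}
    for _ in range(steps):
        # Restart mass returns to the seeds; the rest flows along outgoing edges.
        nxt = {node_id: (1 - alpha) * seeds.get(node_id, 0.0) for node_id in graph}
        for node_id, node in graph.items():
            if node.neighbors:
                share = alpha * score[node_id] / len(node.neighbors)
                for neighbor_id in node.neighbors:
                    nxt[neighbor_id] += share
        score = nxt
    return score


def retrieve(graph: dict[str, Experience], seeds: dict[str, float],
             k: int = 2) -> list[str]:
    """Rank nodes by diffused relevance times utility and keep the top k."""
    relevance = diffuse(graph, seeds)
    ranked = sorted(graph, key=lambda n: relevance[n] * graph[n].utility,
                    reverse=True)
    return ranked[:k]


def copilot_reward(score_with: float, score_without: float) -> float:
    """Executor performance gap with versus without the injected experience."""
    return score_with - score_without


# Toy graph: two linked skills and one isolated lesson (all values invented).
graph = {
    "open_drawer": Experience("Open drawers before shelves.", 0.9, ["find_key"]),
    "find_key": Experience("Keys are often in drawers.", 0.6, ["open_drawer"]),
    "avoid_loop": Experience("Do not repeat a failed action.", 0.8),
}
seeds = {"open_drawer": 1.0}  # node matched directly by the current task
top = retrieve(graph, seeds, k=2)
print(top)
```

In this toy setup the isolated node receives no relevance from diffusion, so a high utility alone does not get it retrieved; the linked neighbor of the seed does. A positive reward means the injected experience helped the frozen executor, a negative one that it hurt.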

Custom benchmark, unnamed baseline
The performance gains claimed by ExpGraph are measured on ExpSuite, an evaluation protocol designed by the paper's authors. The comparison baseline is not named in the abstract. These results have not yet undergone peer review and should be handled with caution before generalizing.

An active academic lineage, already peer-reviewed

ExpGraph belongs to a line of work on experiential learning for LLM agents, some of which has already passed peer review. Two papers accepted at ICLR 2026, one of the three major international machine learning conferences, are particularly comparable. NAVER LABS Europe, based in Meylan, published Retrieval-Augmented LLM Agents: Learning to Learn from Experience, which posits that “achieving robust generalization to unknown tasks remains a major challenge” (freely translated) for generalist agents. The same conference accepted From Experience to Strategy, which proposes “an agent-centered, trainable multi-layer graph memory framework” (freely translated) coupled with reward-guided weight optimization. The move from a flat list of experiences, a paradigm associated with earlier frameworks such as ExpeL (AAAI 2024), to a graph structure is therefore not new; it has already been implemented and validated through peer review. The space is not uncharted: ExpGraph arrives as a variant, not a breakthrough, and remains so far the only one of the three without external validation.

Three contemporary papers on agentic memory

Paper	Institution	Status	Memory Approach
Retrieval-Augmented LLM Agents	NAVER LABS Europe	ICLR 2026 - peer-reviewed	Experiential RAG
From Experience to Strategy	Not specified	ICLR 2026 - peer-reviewed	Trainable graph memory (RL)
ExpGraph	UIUC + NTU + Meta Monetization AI	arXiv preprint - not peer-reviewed	Structured experience graph

External memory or more capable model: two bets that say different things

The ExpGraph proposal, from the UIUC, NTU, and Meta Monetization AI team, carries an explicit architectural thesis: fine-tuning on collected experiences does improve reuse, but becomes inflexible as soon as a more powerful or better-suited executor emerges. The consequence, as argued in the preprint, is that accumulated knowledge must live outside the model to remain portable when the model changes. Anthropic's trajectory illustrates the opposite bet: strengthening the model so that agentic gains travel with it, from Claude agents optimized for programming to Claude Opus 4.8, announced on May 28, 2026. That bet has an acknowledged limitation: the gains do not survive the model's replacement. A third, more marginal variant shifts the focus further, towards self-improving model architectures, still at the exploratory stage in industrial labs. No published empirical work settles the dispute today: the three approaches coexist, and the question of which architecture will displace the others remains, for now, an argument between competing papers rather than the result of an independent test bench.
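The portability argument can be pictured with a short sketch: the memory is an object separate from the executor, so swapping the model leaves the accumulated lessons intact. This is only an illustration of the architectural idea; the Executor protocol, ExternalMemory, SmallModel, LargeModel, and solve are hypothetical names, and the two "models" are stand-ins that merely count prompt lines.

```python
from typing import Protocol


class Executor(Protocol):
    """Any frozen model that turns a prompt into an answer."""
    def run(self, prompt: str) -> str: ...


class ExternalMemory:
    """Experience store that lives outside the model's weights."""
    def __init__(self) -> None:
        self.lessons: list[str] = []

    def add(self, lesson: str) -> None:
        self.lessons.append(lesson)

    def build_prompt(self, task: str) -> str:
        # Prepend stored lessons so any executor sees them, whatever its weights.
        return "\n".join([*self.lessons, task])


class SmallModel:
    """Stand-in for a small frozen executor (hypothetical)."""
    def run(self, prompt: str) -> str:
        return f"small model read {len(prompt.splitlines())} lines"


class LargeModel:
    """Stand-in for a larger replacement executor (hypothetical)."""
    def run(self, prompt: str) -> str:
        return f"large model read {len(prompt.splitlines())} lines"


def solve(executor: Executor, memory: ExternalMemory, task: str) -> str:
    """Run a task on any executor, with the shared memory injected."""
    return executor.run(memory.build_prompt(task))


memory = ExternalMemory()
memory.add("Lesson: check the drawer before the shelf.")
first = solve(SmallModel(), memory, "Task: find the key.")
second = solve(LargeModel(), memory, "Task: find the key.")  # same memory, new model
print(first)
print(second)
```

Both executors receive the same lesson without retraining. With fine-tuning, by contrast, the lesson would be baked into one model's weights and would have to be relearned after a swap.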

A production stack already exists, but on different principles

Alongside the academic trajectory, the production ecosystem for agent memory had already solidified by 2025. Mem0, which raised $24 million in October from Y Combinator, Peak XV, and Basis Set, claims, according to its funding-round announcement, to have surpassed 41,000 GitHub stars and thirteen million downloads of its Python package, and it sits alongside Letta and Supermemory among the frameworks adopted by developers. Mem0's API calls grew from about 35 million to 186 million over the first three quarters of 2025, according to figures highlighted by the company. These stacks share with ExpGraph the philosophy of an external memory for unmodified models, but do not rely on a self-evolving graph driven by diffusion and reinforcement learning. The distinction is not trivial: today the production stack seeks portability and persistence between sessions, while the academic stack aims for generalization to unknown tasks. Two signals to watch by the end of 2026: whether ExpGraph's gains reproduce outside ExpSuite, on a third-party agentic benchmark, and the trajectory of Mem0's API calls, which will indicate whether or not the production stack has integrated the self-evolving graph primitive.

35 million → 186 million API calls in three quarters
Mem0's growth over the first nine months of 2025 illustrates real industrial demand for external memory layers, independent of academic debates on the optimal architecture.
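For scale, the growth implied by those two figures can be worked out in a few lines. The starting value is given only as "about 35 million", so the result is an approximation; the variable names are chosen for this example.

```python
# Figures as reported by Mem0 for the first three quarters of 2025.
start_calls = 35_000_000   # approximate starting volume
end_calls = 186_000_000    # volume at the end of the period

growth_factor = end_calls / start_calls
percent_increase = (growth_factor - 1) * 100

print(f"{growth_factor:.1f}x, +{percent_increase:.0f}%")  # prints "5.3x, +431%"
```

That is, API call volume multiplied by roughly 5.3 over the period.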

Stephane Nachez

ActuIA editorial team — news, data and analysis on artificial intelligence for decision-makers.