venturebeat
MemRL outperforms RAG on complex agent benchmarks without fine-tuning

A new technique developed by researchers at Shanghai Jiao Tong University and other institutions enables large language model agents to learn new skills without the need for expensive fine-tuning.

The researchers propose MemRL, a framework that gives agents episodic memory: the capacity to retrieve past experiences and apply them to unseen tasks. MemRL allows agents to use environmental feedback to continuously refine their problem-solving strategies.

MemRL is part of a broader push in the research community to develop continual learning capabilities for AI applications. In experiments on key industry benchmarks, the framework outperformed baselines such as RAG and other memory organization techniques, particularly in complex environments that require exploration and experimentation. This suggests MemRL could become a critical component for building AI applications that must operate in dynamic real-world settings where requirements and tasks constantly shift.

The stability-plasticity dilemma

One of the central challenges in deploying agentic applications is adapting the underlying model to new knowledge and tasks after the initial training phase. Current approaches generally fall into two categories: parametric approaches, such as fine-tuning, and non-parametric approaches, such as RAG. Both come with significant trade-offs.

Fine-tuning, while effective for baking in new information, is computationally expensive and slow. More critically, it often leads to catastrophic forgetting, a phenomenon in which newly acquired knowledge overwrites previously learned information, degrading the model's general performance.

Conversely, non-parametric methods like RAG are fundamentally passive: they retrieve information based solely on semantic similarity, such as vector embeddings, without evaluating the actual utility of the information to the input query. This approach assumes that "similar implies useful," which often fails in complex reasoning tasks.
The researchers argue that human intelligence solves this problem by maintaining "the delicate balance between the stability of cognitive reasoning and the plasticity of episodic memory." In the human brain, stable reasoning (associated with the cortex) is decoupled from dynamic episodic memory. This allows humans to adapt to new tasks without "rewiring neural circuitry" (the rough equivalent of model fine-tuning).

Inside the MemRL framework

Inspired by humans' use of episodic memory and cognitive reasoning, MemRL is designed to let an agent continuously improve its performance after deployment without compromising the stability of its backbone LLM. Instead of changing the model's parameters, the framework shifts the adaptation mechanism to an external, self-evolving memory structure.

In this architecture, the LLM's parameters remain completely frozen. The model effectively acts as the "cortex," responsible for general reasoning, logic and code generation, but it does not store the specific successes or failures encountered after deployment. This structure ensures stable cognitive reasoning and prevents catastrophic forgetting.

To handle adaptation, MemRL maintains a dynamic episodic memory component. Instead of storing plain text documents and static embedding values, as is common in RAG, MemRL organizes memory into "intent-experience-utility" triplets. These contain the user's query (the intent), the specific solution trajectory or action taken (the experience) and a score, known as the Q-value, that represents how successful this specific experience has been in the past (the utility).

Crucially for enterprise architects, this data structure doesn't require ripping out existing infrastructure. "MemRL is designed to be a 'drop-in' replacement for the retrieval layer in existing technology stacks and is compatible with various vector databases," Muning Wen, a co-author of the paper and PhD candidate at Shanghai Jiao Tong University, told VentureBeat. 
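As a rough illustration of the "intent-experience-utility" triplet described above, the record might look like the following sketch. The field names, initial Q-value and example contents are assumptions for illustration, not the paper's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an "intent-experience-utility" triplet as the
# article describes it; field names and defaults are assumptions.
@dataclass
class MemoryTriplet:
    intent: str           # the user's query that triggered the episode
    experience: str       # the solution trajectory or action taken
    q_value: float = 0.0  # utility: how successful this experience has been
    embedding: list[float] = field(default_factory=list)  # vector for semantic search

# Example entry that might be stored after one episode
triplet = MemoryTriplet(
    intent="list all files modified in the last 24 hours",
    experience="ran `find . -mtime -1 -type f` and returned the output",
)
```

Because the triplet is just a record plus a vector, it maps naturally onto the metadata fields of common vector databases, which is consistent with Wen's point that the Q-value is independent of the storage format.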
"The existence and updating of 'Q-Value' is solely for better evaluation and management of dynamic data... and is independent of the storage format."

This utility score is the key differentiator from classic RAG systems. At inference time, MemRL agents employ a "two-phase retrieval" mechanism. First, the system identifies memories that are semantically close to the query to ensure relevance. It then re-ranks these candidates by their Q-values, effectively prioritizing proven strategies.

The framework incorporates reinforcement learning directly into the memory retrieval process. When an agent attempts a solution and receives environmental feedback (i.e., success or failure), it updates the Q-value of the retrieved memory. This creates a closed feedback loop: over time, the agent learns to ignore distractor memories and prioritize high-value strategies without ever retraining the underlying LLM.

While adding a reinforcement learning step might sound like it adds significant latency, Wen noted that the computational overhead is minimal. "Our Q-value calculation is performed entirely on the CPU," he said.

MemRL also supports runtime continual learning. When the agent encounters a new scenario, the system uses the frozen LLM to summarize the new trajectory and adds it to the memory bank as a new triplet. This allows the agent to expand its knowledge base dynamically as it interacts with the world.

It is worth noting that automating value assignment comes with a risk: if the system mistakenly validates a bad interaction, the agent could learn the wrong lesson. Wen acknowledges this "poisoned memory" risk but notes that, unlike black-box neural networks, MemRL remains transparent and auditable. "If a bad interaction is mistakenly classified as a positive example... it may spread more widely," Wen said. 
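The loop described above — two-phase retrieval, feedback-driven Q-value updates and appending new triplets — can be sketched roughly as follows. The similarity measure, learning rate and reward scheme here are illustrative assumptions, not the paper's exact formulation.

```python
import math

def cosine(a, b):
    # Plain cosine similarity, standing in for a vector database lookup.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(memory, query_emb, k=5):
    # Phase 1: shortlist the k memories most semantically similar to the query.
    candidates = sorted(memory, key=lambda m: cosine(m["embedding"], query_emb),
                        reverse=True)[:k]
    # Phase 2: re-rank the shortlist by Q-value, so proven strategies win.
    return sorted(candidates, key=lambda m: m["q_value"], reverse=True)

def update_q(memory_item, reward, lr=0.1):
    # Nudge the utility score toward the environmental feedback
    # (reward = 1.0 for success, 0.0 for failure).
    memory_item["q_value"] += lr * (reward - memory_item["q_value"])

def add_experience(memory, intent_emb, trajectory_summary):
    # Runtime continual learning: store an LLM-summarized trajectory
    # as a fresh triplet with a neutral starting utility.
    memory.append({"embedding": intent_emb,
                   "experience": trajectory_summary,
                   "q_value": 0.0})

memory = [
    {"embedding": [1.0, 0.0], "experience": "strategy A", "q_value": 0.2},
    {"embedding": [0.9, 0.1], "experience": "strategy B", "q_value": 0.8},
]
best = retrieve(memory, [1.0, 0.05], k=2)[0]  # strategy B wins on Q-value
update_q(best, reward=1.0)                    # success reinforces it
```

Note that both strategies are semantically close to the query; the Q-value re-ranking, not similarity, is what breaks the tie — which is the behavior the "similar implies useful" critique of RAG is aimed at.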
"However … we can easily fix it by removing the contaminated data from the memory bank or resetting their Q-values."

MemRL in action

The researchers evaluated MemRL against several baselines on four diverse industry benchmarks: BigCodeBench (code generation), ALFWorld (embodied navigation), Lifelong Agent Bench (OS and database interaction) and Humanity's Last Exam (complex multidisciplinary reasoning). The results showed that MemRL consistently outperformed baselines in both runtime learning (improving during the session) and transfer learning (generalizing to unseen tasks).

The advantages of this value-aware retrieval mechanism were most pronounced in exploration-heavy environments like ALFWorld. In this benchmark, which requires agents to navigate and interact with a simulated household environment, MemRL achieved a relative improvement of approximately 56% over MemP, another agentic memory framework. The researchers found that the reinforcement learning component effectively encouraged the agent to explore and discover solutions for complex tasks that similarity-based retrieval methods often failed to solve.

When the memory bank was frozen and tested on held-out sets to measure generalization, MemRL achieved the highest accuracy across benchmarks. For example, on Lifelong Agent Bench, it improved significantly upon the standard RAG baseline on OS tasks. This indicates that the system does not merely memorize training data but filters out low-value memories to retain high-utility experiences that generalize to new situations.

The broader picture for self-evolving agents

MemRL fits within a growing body of research focused on memory-based Markov decision processes (M-MDP), a formulation that frames memory retrieval as an active decision-making step rather than a passive search function. 
By treating retrieval as an action that can be optimized via reinforcement learning, frameworks like MemRL and similar approaches such as Memento are paving the way for more autonomous systems.

For enterprise AI, this shift is significant. It suggests a future where agents can be deployed with a general-purpose LLM and then rapidly adapt to specific company workflows, proprietary databases and unique problem sets through interaction alone. The key shift is that these frameworks treat applications as dynamic environments the agent can learn from.

These emerging capabilities would allow organizations to maintain consistent, high-performance agents that evolve alongside their business needs, addressing the problem of stale models without incurring the prohibitive costs of constant retraining.

The approach also marks a transition in how we value data. "In a future where static data is about to be exhausted, the interaction experience generated by each intelligent agent during its lifespan will become the new fuel," Wen said.


We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat
New framework simplifies the complex landscape of agentic AI

With the ecosystem of agentic tools and frameworks exploding in size, navigating the many options for building AI systems is becoming increasingly difficult, leaving developers confused and p [...]


venturebeat
Databricks' Instructed Retriever beats traditional RAG data retrieval

A core element of any data retrieval operation is the use of a component known as a retriever. Its job is to retrieve the relevant content for a given query. In the AI era, [...]


venturebeat
With 91% accuracy, open source Hindsight agentic memory provides 20/20 visi

It has become increasingly clear in 2025 that retrieval augmented generation (RAG) isn't enough to meet the growing data requirements for agentic AI. RAG emerged in [...]


venturebeat
Six data shifts that will shape enterprise AI in 2026

For decades the data landscape was relatively static. Relational databases (hello, Oracle!) were the default and dominated, organizing information into familiar columns and rows. [...]


venturebeat
We keep talking about AI agents, but do we ever know what they are?

Imagine you do two things on a Monday morning. First, you ask a chatbot to summarize your new emails. Next, you ask an AI tool to figure out why your top competitor grew so [...]


venturebeat
Why Google’s File Search could displace DIY RAG stacks in the enterprise

By now, enterprises understand that retrieval augmented generation (RAG) allows applications and agents to find the best, most grounded information for queries. However, typical RAG setups co [...]


venturebeat
Baseten takes on hyperscalers with new AI training platform that lets you o

Baseten (https://www.baseten.co/), the AI infrastructure company recently valued at $2.15 billion, is making its most significant product [...]


venturebeat
Phi-4 proves that a 'data-first' SFT methodology is the new diffe

AI engineers often chase performance by scaling up LLM parameters and data, but the trend toward smaller, more efficient, and better-focused models has accelerated. The [...]


venturebeat
Snowflake builds new intelligence that goes beyond RAG to query and aggrega

Enterprise AI has a data problem. Despite billions in investment and increasingly capable language models, most organizations still can't answer basic analytical questions about thei [...]
