VentureBeat
EAGLET boosts AI agent performance on longer-horizon tasks by generating custom plans

2025 was supposed to be the year of "AI agents," according to Nvidia CEO Jensen Huang and other AI industry figures. And it has been, in many ways: leading AI model providers such as OpenAI and Google, along with Chinese competitors like Alibaba, have released fine-tuned models and applications designed to focus on narrow sets of tasks, such as web search and report writing. But one big hurdle to a future of highly performant, reliable AI agents remains: getting them to stay on task when that task extends over many steps. Third-party benchmarks show that even the most powerful models fail more often the more steps a task requires and the longer they spend on it, especially once a task stretches into multiple hours.

A new academic framework called EAGLET proposes a practical and efficient way to improve long-horizon task performance in LLM-based agents, without manual data labeling or retraining. Developed by researchers from Tsinghua University, Peking University, DeepLang AI, and the University of Illinois Urbana-Champaign, EAGLET offers a "global planner" that can be integrated into existing agent workflows to reduce hallucinations and improve task efficiency.

EAGLET is a fine-tuned language model that interprets task instructions, typically provided as prompts by the user or the agent's operating environment, and generates a high-level plan for the agent (which is powered by its own LLM). It does not intervene during execution, but its up-front guidance helps reduce planning errors and improve task completion rates.

Addressing the Planning Problem in Long-Horizon Agents

Many LLM-based agents struggle with long-horizon tasks because they rely on reactive, step-by-step reasoning. This approach often leads to trial-and-error behavior, planning hallucinations, and inefficient trajectories. EAGLET tackles the limitation by introducing a global planning module that works alongside the executor agent. Instead of blending planning and action generation in a single model, EAGLET separates them, enabling more coherent, task-level strategies.

A Two-Stage Training Pipeline with No Human Annotations

EAGLET's planner is trained in two stages, neither of which requires human-written plans or annotations. In the first stage, high-capability LLMs such as GPT-5 and DeepSeek-V3.1-Think generate synthetic plans. These plans are then filtered with a strategy the authors call homologous consensus filtering, which retains only the plans that improve task performance for both expert and novice executor agents (a sketch of this filtering logic appears at the end of this section). In the second stage, a rule-based reinforcement learning process further refines the planner, using a custom-designed reward function to assess how much each plan helps multiple agents succeed.

Introducing the Executor Capability Gain Reward (ECGR)

One of EAGLET's key innovations is the Executor Capability Gain Reward (ECGR). This reward scores a generated plan by checking whether it helps both high- and low-capability agents complete tasks more successfully and in fewer steps, and it includes a decay factor that favors shorter, more efficient trajectories. The design avoids over-rewarding plans that are only useful to already-competent agents and promotes more generalizable planning guidance; a toy version of the reward is sketched below.
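The article does not give the exact criteria behind homologous consensus filtering, but the core idea can be sketched in a few lines: keep a synthetic plan only if it improves outcomes for both a strong and a weak executor on the same task. Everything below, including the `run_episode` callable and its signature, is an illustrative assumption, not the paper's implementation.

```python
# Hypothetical sketch of homologous consensus filtering: keep a synthetic
# plan only if it helps BOTH a strong and a weak executor beat their
# no-plan baselines on the same task. All names are illustrative; the
# paper's exact criteria may differ.

def consensus_filter(tasks, plans, expert_executor, novice_executor, run_episode):
    """run_episode(executor, task, plan) -> task score (e.g., success rate)."""
    kept = []
    for task, plan in zip(tasks, plans):
        improves_all = True
        for executor in (expert_executor, novice_executor):
            baseline = run_episode(executor, task, plan=None)
            with_plan = run_episode(executor, task, plan=plan)
            if with_plan <= baseline:  # the plan must strictly help this executor
                improves_all = False
                break
        if improves_all:
            kept.append((task, plan))
    return kept
```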
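ECGR is described here only qualitatively: success gains across executors of different capability, plus a decay factor that favors shorter trajectories. One plausible way to write that down, assuming episode scores of the form success × gamma^steps (an assumption for illustration, not the paper's published formula), is:

```python
# Illustrative take on the Executor Capability Gain Reward (ECGR).
# Assumption: score each episode as success * gamma**steps, so shorter
# successful trajectories score higher, then reward a plan by its average
# improvement over the no-plan baseline across executors of mixed ability.
# Not the paper's exact formula.

def ecgr(plan, task, executors, run_episode, gamma=0.99):
    """run_episode(executor, task, plan) -> (success: bool, steps: int)."""
    gains = []
    for executor in executors:                 # mix of strong and weak models
        base_ok, base_steps = run_episode(executor, task, plan=None)
        plan_ok, plan_steps = run_episode(executor, task, plan=plan)
        base_score = float(base_ok) * gamma ** base_steps
        plan_score = float(plan_ok) * gamma ** plan_steps
        gains.append(plan_score - base_score)  # capability gain for this executor
    return sum(gains) / len(gains)             # average gain across executors
```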
Compatible with Existing Agents and Models

The EAGLET planner is designed to be modular and "plug-and-play," meaning it can be inserted into existing agent pipelines without retraining the executor. In evaluations, the planner boosted performance across a variety of foundation models, including GPT-4.1, GPT-5, Llama-3.1, and Qwen2.5. It also proved effective regardless of prompting strategy, working well with standard ReAct-style prompts as well as approaches like Reflexion.
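To make "plug-and-play" concrete, here is a minimal plan-then-execute loop in the spirit the article describes: one up-front planner call, then an unmodified ReAct-style executor loop with the plan prepended to the context. The prompt text and every name below are hypothetical, not taken from the paper.

```python
# Minimal plan-then-execute sketch. A planner model writes one global plan
# up front; the executor then runs its usual ReAct-style loop with that plan
# prepended to its context. `planner_llm` and `executor_llm` stand in for any
# chat-completion client (str -> str); all names are illustrative.

PLANNER_PROMPT = (
    "You are a global planner. Read the task and write a short, high-level "
    "plan as numbered steps. Do not execute anything.\n\nTask: {task}"
)

def solve(task, planner_llm, executor_llm, env, max_steps=30):
    plan = planner_llm(PLANNER_PROMPT.format(task=task))  # one up-front call
    history = f"Task: {task}\nGlobal plan:\n{plan}\n"
    for _ in range(max_steps):                            # standard ReAct loop
        step = executor_llm(history + "Thought + Action:")
        observation, done = env.step(step)                # environment feedback
        history += f"{step}\nObservation: {observation}\n"
        if done:
            break
    return history
```

In this shape, swapping the planner in or out is a one-line change to the executor's context, which is what makes retraining the executor unnecessary.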
State-of-the-Art Performance Across Benchmarks

EAGLET was tested on three widely used benchmarks for long-horizon agent tasks: ScienceWorld, which simulates scientific experiments in a text-based lab environment; ALFWorld, which tasks agents with completing household activities through natural language in a simulated home; and WebShop, which evaluates goal-driven behavior in a realistic online shopping interface.

Across all three, executor agents equipped with EAGLET outperformed their non-planning counterparts and other planning baselines, including MPO and KnowAgent. With the open-source Llama-3.1-8B-Instruct model, EAGLET lifted average performance from 39.5 to 59.4, a gain of 19.9 points across tasks. On unseen ScienceWorld scenarios, it raised performance from 42.2 to 61.6. On seen ALFWorld scenarios, EAGLET improved outcomes from 22.9 to 54.3, a more than 2.3× increase.

Stronger models saw gains, too. GPT-4.1 improved from an average score of 75.5 to 82.2 with EAGLET, and GPT-5 rose from 84.5 to 88.1, despite both already being strong performers. In some settings the gain reached 11.8 points, such as when EAGLET was combined with the ETO executor method on unseen ALFWorld tasks.

Compared with other planning baselines such as MPO, EAGLET consistently delivered higher task completion rates. On unseen ALFWorld tasks with GPT-4.1, for example, MPO scored 79.1 while EAGLET scored 83.6, a 4.5-point advantage.

The paper also reports that agents using EAGLET complete tasks in fewer steps on average. With GPT-4.1 as executor, the average step count dropped from 13.0 (no planner) to 11.1 (EAGLET); with GPT-5, it dropped from 11.4 to 9.4, supporting the claim of improved execution efficiency.

Efficiency Gains in Training and Execution

Compared to RL-based methods like GiGPO, which can require hundreds of training iterations, EAGLET achieved better or comparable results with roughly one-eighth the training effort. The efficiency carries over into execution: agents using EAGLET typically needed fewer steps to complete tasks, which translates into reduced inference time and compute cost in production.

No Public Code Yet

As of the version submitted to arXiv, the authors have not released an open-source implementation of EAGLET. It is unclear if or when the code will be released, under what license, or how it will be maintained, which may limit the framework's near-term utility for enterprise deployment. VentureBeat has reached out to the authors to clarify these points and will update this piece when we hear back.

Enterprise Deployment Questions Remain

While the planner is described as plug-and-play, it remains unclear whether EAGLET can be easily integrated into popular enterprise agent frameworks such as LangChain or AutoGen, or whether it requires a custom stack to support plan-execute separation. Similarly, the training setup leverages multiple executor agents, which may be difficult to replicate in enterprise environments with limited model access. VentureBeat has asked the researchers whether the homologous consensus filtering method can be adapted for teams with access to only one executor model or limited compute.

EAGLET's authors report success across model types and sizes, but the minimal viable model scale for practical deployment is not yet known. Can enterprise teams, for example, use the planner effectively with sub-10B-parameter open models in latency-sensitive environments? The framework may also offer industry-specific value in domains like customer support or IT automation, but it remains to be seen how easily the planner can be fine-tuned or customized for such verticals.

Real-Time vs. Pre-Generated Planning

Another open question is how EAGLET is best deployed in practice. Should the planner operate in real time alongside executors within a loop, or is it better used offline to pre-generate global plans for known task types? (One way to structure the latter is sketched at the end of this article.) Each approach has implications for latency, cost, and operational complexity. VentureBeat has posed this question to the authors and will report any insights that emerge.

Strategic Tradeoffs for Enterprise Teams

For technical leaders at medium-to-large enterprises, EAGLET is a compelling proof of concept for improving the reliability and efficiency of LLM agents. But without public tooling or implementation guidelines, it still presents a build-versus-wait decision: enterprises must weigh the potential gains in task performance and efficiency against the cost of reproducing or approximating the training process in-house.

Potential Use Cases in Enterprise Settings

For enterprises developing agentic AI systems, especially in environments requiring stepwise planning such as IT automation, customer support, or online interactions, EAGLET offers a template for incorporating planning without retraining. Its ability to guide both open- and closed-source models, along with its efficient training method, may make it an appealing starting point for teams seeking to improve agent performance with minimal overhead.
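As a sketch of the pre-generation option discussed above, a plan store could cache one global plan per known task type offline and fall back to a live planner call only for novel tasks. This deployment pattern is an assumption for illustration; the paper does not prescribe one.

```python
# One way to operationalize "pre-generated" planning: cache global plans per
# task type offline, and only pay planner latency for unseen task types.
# Illustrative only; not a deployment mode described in the paper.

class PlanStore:
    def __init__(self, planner_llm):
        self.planner_llm = planner_llm  # any str -> str planning model
        self.cache = {}                 # task_type -> plan text

    def pregenerate(self, task_templates):
        """Offline pass: plan once per known task type."""
        for task_type, template in task_templates.items():
            self.cache[task_type] = self.planner_llm(template)

    def get_plan(self, task_type, task_text):
        if task_type in self.cache:              # cheap path: no planner call
            return self.cache[task_type]
        return self.planner_llm(task_text)       # live fallback for novel tasks
```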

