VentureBeat
EAGLET boosts AI agent performance on longer-horizon tasks by generating custom plans

2025 was supposed to be the year of "AI agents," according to Nvidia CEO Jensen Huang and other AI industry figures. And it has been, in many ways: leading AI model providers such as OpenAI and Google, along with Chinese competitors like Alibaba, have released fine-tuned models and applications designed to focus on narrow sets of tasks, such as web search and report writing. But one big hurdle to a future of highly performant, reliable AI agents remains: getting them to stay on task when that task extends over many steps. Third-party benchmarks show that even the most powerful models fail more often the more steps a task requires and the longer they spend on it, especially once a task stretches into multiple hours.

A new academic framework called EAGLET proposes a practical and efficient way to improve long-horizon task performance in LLM-based agents, without manual data labeling or retraining. Developed by researchers from Tsinghua University, Peking University, DeepLang AI, and the University of Illinois Urbana-Champaign, EAGLET offers a "global planner" that can be integrated into existing agent workflows to reduce hallucinations and improve task efficiency.

EAGLET is a fine-tuned language model that interprets task instructions, typically provided as prompts by the user or the agent's operating environment, and generates a high-level plan for the agent (which is powered by its own LLM). It does not intervene during execution, but its up-front guidance helps reduce planning errors and improve task completion rates.

Addressing the Planning Problem in Long-Horizon Agents

Many LLM-based agents struggle with long-horizon tasks because they rely on reactive, step-by-step reasoning. This approach often leads to trial-and-error behavior, planning hallucinations, and inefficient trajectories. EAGLET tackles the limitation by introducing a global planning module that works alongside the executor agent. Instead of blending planning and action generation in a single model, EAGLET separates them, enabling more coherent, task-level strategies.

A Two-Stage Training Pipeline with No Human Annotations

EAGLET's planner is trained in two stages, neither of which requires human-written plans or annotations. In the first stage, high-capability LLMs such as GPT-5 and DeepSeek-V3.1-Think generate synthetic plans. These plans are then filtered with a strategy the authors call homologous consensus filtering, which retains only the plans that improve task performance for both expert and novice executor agents (a sketch of this filtering logic appears at the end of this section). In the second stage, a rule-based reinforcement learning process further refines the planner, using a custom-designed reward function to assess how much each plan helps multiple agents succeed.

Introducing the Executor Capability Gain Reward (ECGR)

One of EAGLET's key innovations is the Executor Capability Gain Reward (ECGR). This reward scores a generated plan by checking whether it helps both high- and low-capability agents complete tasks more successfully and in fewer steps, and it includes a decay factor that favors shorter, more efficient trajectories. The design avoids over-rewarding plans that are only useful to already-competent agents and promotes more generalizable planning guidance; a toy version of the reward is sketched below.
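The article does not give the exact criteria behind homologous consensus filtering, but the core idea can be sketched in a few lines: keep a synthetic plan only if it improves outcomes for both a strong and a weak executor on the same task. Everything below, including the `run_episode` callable and its signature, is an illustrative assumption, not the paper's implementation.

```python
# Hypothetical sketch of homologous consensus filtering: keep a synthetic
# plan only if it helps BOTH a strong and a weak executor beat their
# no-plan baselines on the same task. All names are illustrative; the
# paper's exact criteria may differ.

def consensus_filter(tasks, plans, expert_executor, novice_executor, run_episode):
    """run_episode(executor, task, plan) -> task score (e.g., success rate)."""
    kept = []
    for task, plan in zip(tasks, plans):
        improves_all = True
        for executor in (expert_executor, novice_executor):
            baseline = run_episode(executor, task, plan=None)
            with_plan = run_episode(executor, task, plan=plan)
            if with_plan <= baseline:  # the plan must strictly help this executor
                improves_all = False
                break
        if improves_all:
            kept.append((task, plan))
    return kept
```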
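ECGR is described here only qualitatively: success gains across executors of different capability, plus a decay factor that favors shorter trajectories. One plausible way to write that down, assuming episode scores of the form success × gamma^steps (an assumption for illustration, not the paper's published formula), is:

```python
# Illustrative take on the Executor Capability Gain Reward (ECGR).
# Assumption: score each episode as success * gamma**steps, so shorter
# successful trajectories score higher, then reward a plan by its average
# improvement over the no-plan baseline across executors of mixed ability.
# Not the paper's exact formula.

def ecgr(plan, task, executors, run_episode, gamma=0.99):
    """run_episode(executor, task, plan) -> (success: bool, steps: int)."""
    gains = []
    for executor in executors:                 # mix of strong and weak models
        base_ok, base_steps = run_episode(executor, task, plan=None)
        plan_ok, plan_steps = run_episode(executor, task, plan=plan)
        base_score = float(base_ok) * gamma ** base_steps
        plan_score = float(plan_ok) * gamma ** plan_steps
        gains.append(plan_score - base_score)  # capability gain for this executor
    return sum(gains) / len(gains)             # average gain across executors
```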
Compatible with Existing Agents and Models

The EAGLET planner is designed to be modular and "plug-and-play," meaning it can be inserted into existing agent pipelines without retraining the executor. In evaluations, the planner boosted performance across a variety of foundation models, including GPT-4.1, GPT-5, Llama-3.1, and Qwen2.5. It also proved effective regardless of prompting strategy, working well with standard ReAct-style prompts as well as approaches like Reflexion.
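To make "plug-and-play" concrete, here is a minimal plan-then-execute loop in the spirit the article describes: one up-front planner call, then an unmodified ReAct-style executor loop with the plan prepended to the context. The prompt text and every name below are hypothetical, not taken from the paper.

```python
# Minimal plan-then-execute sketch. A planner model writes one global plan
# up front; the executor then runs its usual ReAct-style loop with that plan
# prepended to its context. `planner_llm` and `executor_llm` stand in for any
# chat-completion client (str -> str); all names are illustrative.

PLANNER_PROMPT = (
    "You are a global planner. Read the task and write a short, high-level "
    "plan as numbered steps. Do not execute anything.\n\nTask: {task}"
)

def solve(task, planner_llm, executor_llm, env, max_steps=30):
    plan = planner_llm(PLANNER_PROMPT.format(task=task))  # one up-front call
    history = f"Task: {task}\nGlobal plan:\n{plan}\n"
    for _ in range(max_steps):                            # standard ReAct loop
        step = executor_llm(history + "Thought + Action:")
        observation, done = env.step(step)                # environment feedback
        history += f"{step}\nObservation: {observation}\n"
        if done:
            break
    return history
```

In this shape, swapping the planner in or out is a one-line change to the executor's context, which is what makes retraining the executor unnecessary.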
State-of-the-Art Performance Across Benchmarks

EAGLET was tested on three widely used benchmarks for long-horizon agent tasks: ScienceWorld, which simulates scientific experiments in a text-based lab environment; ALFWorld, which tasks agents with completing household activities through natural language in a simulated home; and WebShop, which evaluates goal-driven behavior in a realistic online shopping interface.

Across all three, executor agents equipped with EAGLET outperformed their non-planning counterparts and other planning baselines, including MPO and KnowAgent. With the open-source Llama-3.1-8B-Instruct model, EAGLET lifted average performance from 39.5 to 59.4, a gain of 19.9 points across tasks. On unseen ScienceWorld scenarios, it raised performance from 42.2 to 61.6. On seen ALFWorld scenarios, EAGLET improved outcomes from 22.9 to 54.3, a more than 2.3× increase.

Stronger models saw gains, too. GPT-4.1 improved from an average score of 75.5 to 82.2 with EAGLET, and GPT-5 rose from 84.5 to 88.1, despite both already being strong performers. In some settings the gain reached 11.8 points, such as when EAGLET was combined with the ETO executor method on unseen ALFWorld tasks.

Compared with other planning baselines such as MPO, EAGLET consistently delivered higher task completion rates. On unseen ALFWorld tasks with GPT-4.1, for example, MPO scored 79.1 while EAGLET scored 83.6, a 4.5-point advantage.

The paper also reports that agents using EAGLET complete tasks in fewer steps on average. With GPT-4.1 as executor, the average step count dropped from 13.0 (no planner) to 11.1 (EAGLET); with GPT-5, it dropped from 11.4 to 9.4, supporting the claim of improved execution efficiency.

Efficiency Gains in Training and Execution

Compared to RL-based methods like GiGPO, which can require hundreds of training iterations, EAGLET achieved better or comparable results with roughly one-eighth the training effort. The efficiency carries over into execution: agents using EAGLET typically needed fewer steps to complete tasks, which translates into reduced inference time and compute cost in production.

No Public Code Yet

As of the version submitted to arXiv, the authors have not released an open-source implementation of EAGLET. It is unclear if or when the code will be released, under what license, or how it will be maintained, which may limit the framework's near-term utility for enterprise deployment. VentureBeat has reached out to the authors to clarify these points and will update this piece when we hear back.

Enterprise Deployment Questions Remain

While the planner is described as plug-and-play, it remains unclear whether EAGLET can be easily integrated into popular enterprise agent frameworks such as LangChain or AutoGen, or whether it requires a custom stack to support plan-execute separation. Similarly, the training setup leverages multiple executor agents, which may be difficult to replicate in enterprise environments with limited model access. VentureBeat has asked the researchers whether the homologous consensus filtering method can be adapted for teams with access to only one executor model or limited compute.

EAGLET's authors report success across model types and sizes, but the minimal viable model scale for practical deployment is not yet known. Can enterprise teams, for example, use the planner effectively with sub-10B-parameter open models in latency-sensitive environments? The framework may also offer industry-specific value in domains like customer support or IT automation, but it remains to be seen how easily the planner can be fine-tuned or customized for such verticals.

Real-Time vs. Pre-Generated Planning

Another open question is how EAGLET is best deployed in practice. Should the planner operate in real time alongside executors within a loop, or is it better used offline to pre-generate global plans for known task types? (One way to structure the latter is sketched at the end of this article.) Each approach has implications for latency, cost, and operational complexity. VentureBeat has posed this question to the authors and will report any insights that emerge.

Strategic Tradeoffs for Enterprise Teams

For technical leaders at medium-to-large enterprises, EAGLET is a compelling proof of concept for improving the reliability and efficiency of LLM agents. But without public tooling or implementation guidelines, it still presents a build-versus-wait decision: enterprises must weigh the potential gains in task performance and efficiency against the cost of reproducing or approximating the training process in-house.

Potential Use Cases in Enterprise Settings

For enterprises developing agentic AI systems, especially in environments requiring stepwise planning such as IT automation, customer support, or online interactions, EAGLET offers a template for incorporating planning without retraining. Its ability to guide both open- and closed-source models, along with its efficient training method, may make it an appealing starting point for teams seeking to improve agent performance with minimal overhead.
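As a sketch of the pre-generation option discussed above, a plan store could cache one global plan per known task type offline and fall back to a live planner call only for novel tasks. This deployment pattern is an assumption for illustration; the paper does not prescribe one.

```python
# One way to operationalize "pre-generated" planning: cache global plans per
# task type offline, and only pay planner latency for unseen task types.
# Illustrative only; not a deployment mode described in the paper.

class PlanStore:
    def __init__(self, planner_llm):
        self.planner_llm = planner_llm  # any str -> str planning model
        self.cache = {}                 # task_type -> plan text

    def pregenerate(self, task_templates):
        """Offline pass: plan once per known task type."""
        for task_type, template in task_templates.items():
            self.cache[task_type] = self.planner_llm(template)

    def get_plan(self, task_type, task_text):
        if task_type in self.cache:              # cheap path: no planner call
            return self.cache[task_type]
        return self.planner_llm(task_text)       # live fallback for novel tasks
```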

