VentureBeat
Researchers find that retraining only small parts of AI models can cut costs and prevent forgetting

Enterprises often find that fine-tuning, an otherwise effective approach to making a large language model (LLM) fit for purpose and grounded in their data, comes at a cost: after fine-tuning, some models "forget" how to perform tasks they had already learned. Research from the University of Illinois Urbana-Champaign proposes a new method for retraining models that avoids this "catastrophic forgetting," in which a model loses some of its prior knowledge. The paper focuses on two vision-language models that generate responses from images: LLaVA and Qwen 2.5-VL.

The approach encourages enterprises to retrain only narrow parts of an LLM, avoiding a full retraining and the significant increase in compute costs that comes with it. The team claims that catastrophic forgetting isn't true memory loss, but rather a side effect of bias drift.

"Training a new LMM can cost millions of dollars, weeks of time, and emit hundreds of tons of CO2, so finding ways to more efficiently and effectively update existing models is a pressing concern," the team wrote in the paper. "Guided by this result, we explore tuning recipes that preserve learning while limiting output shift."

The researchers focused on a multi-layer perceptron (MLP), the model's internal decision-making component.
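To make the narrow-retraining idea concrete, here is a minimal PyTorch/Hugging Face sketch of selective fine-tuning: freeze every weight, then re-enable gradients only for parameters whose names match a chosen pattern. The checkpoint name and the pattern below are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of narrow retraining: update only a chosen slice of the model.
# The checkpoint name and the name patterns below are illustrative placeholders,
# not the exact recipe used in the paper.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")

def select_trainable(model, patterns):
    """Freeze all parameters except those whose names contain a pattern."""
    for name, param in model.named_parameters():
        param.requires_grad = any(p in name for p in patterns)

# Example: retrain only the MLP blocks of each transformer layer.
select_trainable(model, patterns=["mlp."])

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({trainable / total:.1%})")
```

Any standard fine-tuning loop (for example, transformers.Trainer) can then run as usual; the optimizer only updates the unfrozen slice.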
Catastrophic forgetting

The researchers first wanted to verify the existence, and pin down the cause, of catastrophic forgetting. To do this, they created a set of target tasks for the models to complete, then fine-tuned the models and evaluated them to determine whether the tuning led to substantial forgetting. But as the process went on, the researchers found that the models recovered some of their abilities.

"We also noticed a surprising result: that while the model performance would drop significantly in held out benchmarks after training on the counting task, it would mostly recover on PathVQA, another specialized task that is not well represented in the benchmarks," they said. "Meanwhile, while performing the forgetting mitigation experiments, we also tried separately tuning only the self-attention projection (SA Proj) or MLP layers, motivated by the finding that tuning only the LLM was generally better than tuning the full model. This led to another very surprising result: that tuning only self-attention projection layers led to very good learning of the target tasks with no drop in performance in held out tasks, even after training all five target tasks in a sequence."

The researchers said they believe that "what looks like forgetting or interference after fine-tuning on a narrow target task is actually bias in the output distribution due to the task distribution shift."

Narrow retraining

That finding turned out to be the key to the experiment. The researchers noted that tuning the MLP increases the likelihood of "outputting numeric tokens and a highly correlated drop in held out task accuracy." In other words, what looks like a model forgetting some of its knowledge is temporary, not a long-term loss. "To avoid biasing the output distribution, we tune the MLP up/gating projections while keeping the down projection frozen, and find that it achieves similar learning to full MLP tuning with little forgetting," the researchers said.

This makes for a simpler and more reproducible way to fine-tune a model. By focusing on a narrow segment of the model rather than a wholesale retraining, enterprises can cut compute costs and better control output drift. The research covers only two models, however, both of them vision-language, and the researchers noted that limited resources kept them from trying the experiment on others. Their findings may nonetheless extend to other LLMs, including those handling different modalities.
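The quoted recipe is straightforward to reproduce in code. Here is a hedged sketch, assuming a LLaMA-/Qwen-style decoder whose Hugging Face parameter names include mlp.gate_proj, mlp.up_proj, and mlp.down_proj (an assumption about module naming; this illustrates the idea and is not the authors' released code):

```python
# Sketch of the mitigation described above: tune the MLP up/gating projections
# while keeping the down projection (and everything else) frozen.
# Module names assume a LLaMA-/Qwen-style decoder in Hugging Face Transformers;
# this is an illustration of the idea, not the authors' released code.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")  # placeholder backbone

for name, param in model.named_parameters():
    # Only the up/gating projections learn; mlp.down_proj stays frozen,
    # which the paper reports limits shift in the output distribution.
    param.requires_grad = ("mlp.up_proj" in name) or ("mlp.gate_proj" in name)
```

Per the team's other finding, unfreezing only the self-attention projection layers instead (parameter names containing "self_attn") is an alternative recipe they report as learning the target tasks well with no drop on held out tasks.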
