Discover any AI to make more online for less.

Select from over 22,900 AI tools and 17,900 AI news posts.


venturebeat
Nvidia debuts Nemotron 3 with hybrid MoE and Mamba-Transformer to drive efficient agentic AI

Nvidia launched the new version of its frontier models, Nemotron 3, leaning on a model architecture that the world's most valuable company says offers more accuracy and reliability for agents. Nemotron 3 will be available in three sizes: Nemotron 3 Nano, a 30B-parameter model mainly for targeted, highly efficient tasks; Nemotron 3 Super, a 100B-parameter model for multi-agent applications with high-accuracy reasoning; and Nemotron 3 Ultra, a large reasoning engine of around 500B parameters for more complex applications.

To build the Nemotron 3 models, Nvidia said it leaned into a hybrid mixture-of-experts (MoE) architecture to improve scalability and efficiency. By using this architecture, Nvidia said in a press release, its new models also offer enterprises more openness and performance when building multi-agent autonomous systems.

Kari Briski, Nvidia vice president for generative AI software, told reporters in a briefing that the company wanted to demonstrate its commitment to learning and improving from previous iterations of its models. "We believe that we are uniquely positioned to serve a wide range of developers who want full flexibility to customize models for building specialized AI by combining that new hybrid mixture of our mixture of experts architecture with a 1 million token context length," Briski said.

Nvidia said early adopters of the Nemotron 3 models include Accenture, CrowdStrike, Cursor, Deloitte, EY, Oracle Cloud Infrastructure, Palantir, Perplexity, ServiceNow, Siemens and Zoom.

Breakthrough architectures

Nvidia has been using the hybrid Mamba-Transformer mixture-of-experts architecture for many of its models, including Nemotron-Nano-9B-v2. The architecture is based on research from Carnegie Mellon University and Princeton, which weaves in selective state-space models to handle long pieces of information while maintaining state. It can reduce compute costs even through long contexts.
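The efficiency claim for state-space layers comes from how they process a sequence: instead of attending to every past token, they fold history into a fixed-size recurrent state. Nemotron 3's actual layers are far more elaborate, but the core mechanism can be illustrated with a minimal linear state-space scan (all matrices and sizes below are toy values, not Nvidia's):

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal linear state-space scan:
    h_t = A @ h_{t-1} + B * x_t,   y_t = C @ h_t.
    Cost grows linearly with sequence length, and memory stays constant,
    unlike attention's quadratic cost and per-token KV cache."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B * x_t   # fold the new token into the fixed-size state
        ys.append(C @ h)      # emit output from the state, not from all past tokens
    return np.array(ys)

# Toy example: scalar inputs, a 2-dim state with diagonal decay.
A = np.array([[0.9, 0.0], [0.0, 0.5]])
B = np.array([1.0, 1.0])
C = np.array([0.5, 0.5])
x = np.array([1.0, 0.0, 0.0, 0.0])  # an impulse, then silence
y = ssm_scan(x, A, B, C)
print(y)  # the impulse decays through the state over time
```

A hybrid design interleaves layers like this with conventional attention layers, which is where the memory savings Briski describes come from: most layers carry no attention map or key-value cache at all.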
Nvidia noted its design "achieves up to 4x higher token throughput" compared to Nemotron 2 Nano and can significantly lower inference costs by reducing reasoning token generation by up to 60%.

"We really need to be able to bring that efficiency up and the cost per token down. And you can do it through a number of ways, but we're really doing it through the innovations of that model architecture," Briski said. "The hybrid Mamba transformer architecture runs several times faster with less memory, because it avoids these huge attention maps and key value caches for every single token."

Nvidia also introduced an additional innovation for the Nemotron 3 Super and Ultra models. For these, Briski said Nvidia deployed "a breakthrough called latent MoE."

"That's all these experts that are in your model share a common core and keep only a small part private. It's kind of like chefs sharing one big kitchen, but they need to get their own spice rack," Briski added.

Nvidia is not the only company that employs this kind of architecture to build models. AI21 Labs uses it for its Jamba models, most recently in its Jamba Reasoning 3B model.

The Nemotron 3 models benefited from extended reinforcement learning. The larger models, Super and Ultra, used the company's 4-bit NVFP4 training format, which allows them to train on existing infrastructure without compromising accuracy. Benchmark testing from Artificial Analysis placed the Nemotron models highly among models of similar size.

New environments for models to 'work out'

As part of the Nemotron 3 launch, Nvidia will also give users access to its research by releasing its papers and sample prompts, offering open datasets where people can use and look at pre-training tokens and post-training samples, and, most importantly, a new NeMo Gym where customers can let their models and agents "work out." The NeMo Gym is a reinforcement learning lab where users can let their models run in simulated environments to test their post-training performance.
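Nvidia has not published the exact formulation of latent MoE, but the shared-kitchen analogy Briski uses above suggests a structure where every expert reuses one large core projection and adds only a small private transformation of its own. A rough sketch of that idea, assuming a top-1 router and a low-rank private adapter per expert (all names and sizes are illustrative, not Nvidia's):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_private, n_experts = 8, 2, 4

# Shared "kitchen": one large core projection reused by every expert.
W_shared = rng.normal(size=(d_model, d_model))
# Private "spice racks": a small low-rank adapter per expert
# (d_model*d_private*2 = 32 params each, vs d_model**2 = 64 for a full expert).
W_down = rng.normal(size=(n_experts, d_model, d_private))
W_up = rng.normal(size=(n_experts, d_private, d_model))
router = rng.normal(size=(d_model, n_experts))

def latent_moe(x):
    """Route x to one expert, then combine the shared core with that
    expert's small private transformation."""
    e = int(np.argmax(x @ router))           # top-1 expert choice
    shared_out = x @ W_shared                # same core for all experts
    private_out = (x @ W_down[e]) @ W_up[e]  # expert-specific low-rank part
    return shared_out + private_out, e

x = rng.normal(size=d_model)
y, expert = latent_moe(x)
print(y.shape, expert)
```

Under this reading, the parameter and memory savings come from paying for the big core only once while each expert contributes only its small private adapter.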
AWS announced a similar tool through its Nova Forge platform, targeted at enterprises that want to test out their newly created distilled or smaller models.

Briski said the samples of post-training data Nvidia plans to release "are orders of magnitude larger than any available post-training data set and are also very permissive and open."

Nvidia pointed to developers seeking highly intelligent and performant open models, so they can better understand how to guide them if needed, as the basis for releasing more information about how it trains its models.

"Model developers today hit this tough trifecta. They need to find models that are ultra open, that are extremely intelligent and are highly efficient," she said. "Most open models force developers into painful trade-offs between efficiencies like token costs, latency, and throughput."

She said developers want to know how a model was trained, where the training data came from and how they can evaluate it.


We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat
Open source Mamba 3 arrives to surpass Transformer architecture with nearly

<p>The generative AI era began for most people with the <a href="https://openai.com/index/chatgpt/">launch of OpenAI's ChatGPT in late 2022</a>, but the underlying [...]

Match Score: 564.34

venturebeat
'Western Qwen': IBM wows with Granite 4 LLM launch and hybrid Mam

<p>IBM today <a href="https://www.ibm.com/new/announcements/ibm-granite-4-0-hyper-efficient-high-performance-hybrid-models">announced the release of Granite 4.0</a>, the ne [...]

Match Score: 412.40

venturebeat
Nvidia launches enterprise AI agent platform with Adobe, Salesforce, SAP am

<p><a href="https://www.nvidia.com/gtc/keynote/">Jensen Huang</a> walked onto the <a href="https://www.nvidia.com/gtc/">GTC stage</a> Monday wearing h [...]

Match Score: 373.59

venturebeat
Nvidia introduces Vera Rubin, a seven-chip AI platform with OpenAI, Anthrop

<p><a href="https://nvidia.com/">Nvidia</a> on Monday took the wraps off <a href="https://www.nvidia.com/en-us/data-center/technologies/rubin/">Vera Rubin&l [...]

Match Score: 248.22

venturebeat
Nvidia's new open weights Nemotron 3 super combines three different ar

<p>Multi-agent systems, designed to handle long-horizon tasks like software engineering or cybersecurity triaging, can generate up to 15 times the token volume of standard chats — threatening [...]

Match Score: 243.63

venturebeat
Attention ISN'T all you need?! New Qwen3 variant Brumby-14B-Base lever

<p>When the transformer architecture was introduced in 2017 in the now seminal Google paper "<a href="https://arxiv.org/abs/1706.03762">Attention Is All You Need</a&g [...]

Match Score: 233.23

venturebeat
TII’s Falcon H1R 7B can out-reason models up to 7x its size — and it’

<p>For the last two years, the prevailing logic in generative AI has been one of brute force: if you want better reasoning, you need a bigger model. </p><p>While &quot;small& [...]

Match Score: 230.63

venturebeat
Nvidia’s Cosmos Reason 2 aims to bring reasoning VLMs into the physical w

<p>Nvidia CEO Jensen Huang said last year that we are now entering the age of physical AI. While the company continues to offer LLMs for software use cases, Nvidia is <a href="https://ve [...]

Match Score: 186.89

venturebeat
Nvidia's Nemotron-Cascade 2 wins math and coding gold medals with 3B a

<p>The prevailing assumption in AI development has been straightforward: larger models trained on more data produce better results. Nvidia's latest release directly challenges that size [...]

Match Score: 158.61