Discover ANY AI to make more online for less.

Select from over 22,900 AI tools and 17,900 AI news posts.


Most AI models can fake alignment, but safety training suppresses the behavior, study finds

A new study analyzing 25 language models finds that most do not fake safety compliance, though not for lack of capability.
The article Most AI models can fake alignment, but safety training suppresses the behavior, study finds appeared first on THE DECODER.

We found tools similar to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat
Baseten takes on hyperscalers with new AI training platform that lets you o

Baseten (https://www.baseten.co/), the AI infrastructure company recently valued at $2.15 billion, is making its most significant product [...]

Match Score: 145.70

venturebeat
Nvidia researchers boost LLMs reasoning skills by getting them to 'think' d

Researchers at Nvidia have developed a new technique that flips the script on how large language models (LLMs) learn to reason. The method, called [...]

Match Score: 81.87

venturebeat
Grok 4.1 Fast's compelling dev access and Agent Tools API overshadowed by M

Elon Musk's frontier generative AI startup xAI formally opened developer access to its Grok 4.1 Fast models (https://x.ai/news/grok-4-1-fast) last n [...]

Match Score: 79.12

venturebeat
MIT offshoot Liquid AI releases blueprint for enterprise-grade small-model

When Liquid AI, a startup founded by MIT [...] (https://aimmediahouse.com/market-industry/from-worm-brains-to-a-2-billion-ai-unicorn-liquid-ai-defies-conventional-ai-limits)

Match Score: 74.64

venturebeat
Anthropic vs. OpenAI red teaming methods reveal different security prioriti

Model providers want to prove the security and robustness of their models, releasing system cards and conducting red-team exercises with each new release. But it can be difficul [...]

Match Score: 71.69

venturebeat
Google Cloud takes aim at CoreWeave and AWS with managed Slurm for enterpri

Some enterprises are best served by fine-tuning large models to their needs, but a number of companies plan to [...]

Match Score: 70.61

venturebeat
AI models block 87% of single attacks, but just 8% when attackers persist

One malicious prompt gets blocked, while ten prompts get through. That gap defines the difference between passing benchmarks and withstanding real-world attacks, and it's a gap mo [...]

Match Score: 69.15

venturebeat
Arcee aims to reboot U.S. open source AI with new Trinity models released u

For much of 2025, the frontier of open-weight language models has been defined not in Silicon Valley or New York City, but in Beijing and Hangzhou. Chinese research labs inc [...]

Match Score: 68.77

venturebeat
OpenAI experiment finds that sparse models could give AI builders the tools

OpenAI (https://openai.com/) researchers are experi [...] (https://openai.com/index/understanding-neural-networks-through-sparse-circuits/)

Match Score: 68.74