AnyAi.fyi - Discover ANY AI to make more online for less.

Gemini 3 Pro and GPT-5 still fail at complex physics tasks designed for real scientific research

A new physics benchmark called "CritPt" puts leading AI models to the test at the level of early-stage PhD research. The results show that even top systems like Gemini 3 Pro and GPT-5 still fall far short of acting as autonomous scientists.
The article "Gemini 3 Pro and GPT-5 still fail at complex physics tasks designed for real scientific research" appeared first on THE DECODER.

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat

Artificial Analysis overhauls its AI Intelligence Index, replacing popular

The arms race to build smarter AI models has a measurement problem: the tests used to rank them are becoming obsolete almost as quickly as the models improve. On Monday, [...]

Match Score: 138.75

venturebeat

Gemini 3 Flash arrives with reduced costs and latency — a powerful combo

Enterprises can now harness the power of a large language model that's near that of the state-of-the-art [...]

Match Score: 136.28

venturebeat

OpenAI is ending API access to fan-favorite GPT-4o model in February 2026

OpenAI has sent out emails notifying API customers that its chatgpt-4o-latest model will be retired from the developer platform in mid-February 2026. Access to the model i [...]

Match Score: 136.02

venturebeat

Google unveils Gemini 3 claiming the lead in math, science, multimodal and

After more than a month of rumors and feverish speculation, including Polymarket wagering on the release date [...]

Match Score: 135.27

venturebeat

Upwork study shows AI agents excel with human partners but fail independent

Artificial intelligence agents powered by the world's most advanced language models routinely fail to complete even straightforward professional tasks on their own, according to [...]

Match Score: 120.85

venturebeat

OpenAI's GPT-5.2 is here: what enterprises need to know

The rumors were true, and the "Code Red [...]

Match Score: 117.66

Google's Michel Devoret is one of the 2025 winners of the Nobel Prize

The Royal Swedish Academy of Sciences [...]

Match Score: 85.67

venturebeat

OpenAI reboots ChatGPT experience with GPT-5.1 after mixed reviews of GPT-5

ChatGPT is about to become faster and more conversational as OpenAI upgrades its [...]

Match Score: 84.30

venturebeat

What enterprises should know about The White House's new AI 'Manh

President Donald Trump’s new “Genesis Mission” unveiled Monday [...]

Match Score: 82.18