Large Language Models Benchmarks

18h

Simbian Publishes World’s First Cyber Defense Benchmark; Finds Frontier LLMs Alone Do Poor Job at Attack Discovery

Simbian Cyber Defense Benchmark reveals LLMs find and exploit vulnerabilities but fail at defense out-of-the-box without a sophisticated harness.

58m

Xiaomi Just Made Powerful AI Open, Cheaper, and 60% More Efficient Than Others

Xiaomi has introduced two new open-source large language models, Xiaomi MiMo V2.5 and Xiaomi MiMo V2.5 Pro. Both models are released under the MIT ...

Unite.AI

Simbian Launches Cyber Defense Benchmark, Reveals Major Gap in AI Security Capabilities

A new benchmark released by Simbian is challenging one of the most widely held assumptions in artificial intelligence: that the same models capable of finding vulnerabilities can also defend against ...

AWS brings OpenAI’s AI models and Codex programming assistant to its cloud

OpenAI Group PBC’s large language models available on its cloud platform. The algorithms are accessible through Amazon ...

Hosted on MSN

Simbian benchmark shows AI fails real-world cyber defense

Simbian’s new Cyber Defense Benchmark found that no leading large language model (LLM) could pass realistic enterprise cyber defense tests, despite their offensive capabilities. The study highlights a ...

How to build custom reasoning agents with a fraction of the compute

The technique, called Reinforcement Learning with Verifiable Rewards with Self-Distillation (RLSD), combines the reliable ...

15h Opinion

China’s DeepSeek V4 And Qwen Reshape The Open-Source AI Race

China's DeepSeek Cuts AI Prices Again With New V4 Model A Year After Rattling Global AI Markets. The AI Price War Reignites A ...

American AI startup Poolside launches free, high-performing open model Laguna XS.2 for local agentic coding

By putting the weights of a highly capable, 33B-parameter agentic model in the hands of researchers and startups, Poolside is ...

ZME Science

Turns out, you can fool the world’s smartest AIs by using weird, monk-like language

The study suggests that some of the world’s most advanced language models still struggle to recognize malicious intent when ...

23h

Why Developers Are Switching to DeepSeek V4 Flash for Open-Source AI

Learn how the open-source DeepSeek V4 compares to ChatGPT in speed, pricing, and performance for developers building complex ...

AlphaGalileo

Scientific Reports | Insilico Medicine advances AI-driven target discovery with validated TargetPro–TargetBench framework

Insilico Medicine, a clinical-stage biotechnology company powered by generative artificial intelligence (AI), announced ...

11h

Bloomberg, the OG of financial data firms, has a potent new AI agent. How it built it holds lessons for other companies

In this edition…China blocks Meta’s purchase of Manus…OpenAI falls short of its revenue and growth targets…Anthropic shows AI ...
