This illustrates a widespread problem affecting large language models (LLMs): even when an English-language version passes a safety test, it can still hallucinate dangerous misinformation in other ...
A team led by Professor Daniel Abrams and PhD graduate Emma Zajdela (PhD ’23) created—and mined—the most comprehensive ...
Fashion followers know that trends tend to reappear on a 20-year cycle, and a new analysis of more than 150 years’ worth of ...
New benchmark study results show leading AI models, including ChatGPT, Claude, and Gemini, still lag humans in visual math reasoning.
Apple researchers have created an AI model that reconstructs a 3D object from a single image, while keeping light effects consistent across viewing angles.
The odds of a perfect bracket are 1 in 9.2 quintillion — but one math professor thinks he's cracked the code for March ...
VUB's Data Analytics Lab has published new results showing that it is possible to develop original mathematical proofs using commercial language models. In a paper posted to the arXiv preprint server, ...
The central limit theorem started as a bar trick for 18th-century gamblers. Now scientists rely on it every day.
In this post, we share the motivations, design choices, experiments, and learnings that informed its development, as well as an evaluation of the model’s performance and guidance on how to use it. Our ...
The GSMM Camp is a weeklong workshop that builds interdisciplinary problem-solving skills for graduate and advanced undergraduate students. Participants work in teams on mathematically rich problems ...
xVerify is an evaluation tool fine-tuned from a pre-trained large language model, designed specifically for objective questions with a single correct answer. It accurately extracts the final answer ...
A few months before the 2025 International Mathematical Olympiad (IMO) in July, a three-person team at OpenAI made a long bet that they could use the competition’s brutally tough problems to train an ...