BullshitBench tests whether AI models can detect nonsensical questions, or whether they will confidently answer them anyway. The ...
I tried GPT-5.4, and most answers were really good, but a few had me concerned ...