GPT-5.5 dominates $1,500 LLM hacking test while Gemini refuses to even try

04.06.2026 at 09:55
Hard news

A security researcher spent $1,500 running 13+ AI models against a deliberately vulnerable app. GPT-5.5 led with a 70% solve rate, DeepSeek V4 Pro solved it for $0.62 per attempt, and Gemini refused to engage almost entirely. ...

Author: notebookcheck.net
Source: https://www.notebookcheck.net/GPT-5-5-dominates-1-500-LLM-hacking-test-while-Gemini-refuses-to-even-try.1315097.0.html