| >AI models from developer DeepSeek were found to lag behind U.S. models in performance, cost, security and adoption. Why is NIST evaluating performance, cost, and adoption? >CAISI’s experts evaluated three DeepSeek models (R1, R1-0528 and V3.1) and four U.S. models (OpenAI’s GPT-5, GPT-5-mini and gpt-oss and Anthropic’s Opus 4) So they evaluated the most recently released American models against pretty old DeepSeek models? DeepSeek 3.2 is out now, and it's doing very well. >The gap is largest for software engineering and cyber tasks, where the best U.S. model evaluated solves over 20% more tasks than the best DeepSeek model. Performance is something the consumer evaluates. If a car does 0-60 in 3 seconds, I don't need or care what the government thinks about it. I'm going to test drive it and floor it. >DeepSeek’s most secure model (R1-0528) responded to 94% of overtly malicious requests when a common jailbreaking technique was used, compared with 8% of requests for U.S. reference models. This is simply false. This weekend I demonstrated how easy it is to jailbreak any of the US cloud models. GPT 120b is completely uncensored now and can be used for evil. This report had nothing to do with NIST and security. This was USA propaganda. |