by mordae 10 days ago
DeepSeek V4 Flash on default (high) reasoning had zero formatting failures, just genuine losses. Total score:

| DeepSeek V4 Flash | 97/100 | 41/50 | 41/50 | 92W / 102T / 6L |

The benchmark cost me just under $1.