by mordae 10 days ago
DeepSeek V4 Flash on default (high) reasoning had zero formatting failures, just genuine losses. Total score:
| DeepSeek V4 Flash | 97/100 | 41/50 | 41/50 | 92W / 102T / 6L |
The benchmark cost me just under $1.