by jampekka
3 days ago
(Three out of) four experiments is anecdotal for sure, but the result meshes with more established instruction-following benchmarks (although DeepSeek V4 pro does not top these): https://artificialanalysis.ai/evaluations/ifbench

I found the writing clear and quite even-handed. The lead is a bit salesy, but leads typically are. Knee-jerk dismissals based on vibes that something is LLM-generated are quite low-effort.