by sfaist
335 days ago
We thought this would be interesting to share here because LLM benchmarks seem increasingly disconnected from reality. I don't care if an LLM can solve a PhD-level math question or make scientific discoveries; I care if it can solve our problems, which in our case means automating API integrations. Turns out it mostly can't, which tracks well with our experience using Cursor.