by kkukshtel
96 days ago
"I make AI output lots of stuff" is not an intrinsically valuable thing. I can run the same thing on Claude in research mode and get a report with cited sources in a more digestible format on my phone. What's the eval here for whether any of this is good? Is it even possible to test (i.e., you can't really A/B test startup ideas)?
On the eval side, we ran Spine Swarm against GAIA Level 3 and Google DeepMind's DeepSearchQA and hit #1 on both. Full writeup: https://blog.getspine.ai/spine-swarm-hits-1-on-gaia-level-3-...