by crazylogger
342 days ago
I think next year's AI benchmarks are going to look like this project:
https://www.anthropic.com/research/project-vend-1

Give the AI tools and let it do real stuff in the world. "FounderBench": ask the AI to build a successful business, whatever that business may be (the AI decides). Maybe have it try to get funded by YC; hiring a human presenter for Demo Day is allowed. Entrants would be graded on profit/loss and valuation.

Testing a plain LLM on whiteboard-style questions is meaningless now. Going forward, it will all be multi-agent systems with computer use, long-term memory and goals, and delegation.