by kmacdough
136 days ago
What are we testing here? It feels like a very odd test, because this is such an unreasonable way to answer the question with an LLM. Nothing about the task requires more than a very localized understanding. It's not like a codebase or corporate documentation, where there's a lot of interconnectedness and context that matters. It also doesn't seem to poke at the gap between human and AI intelligence. Why are people excited? What am I missing?