by andrewla 301 days ago
> Applying an LLM to a novel task is where the model breaks down

I mean, don't most people break down in this case too? I think this needs to be more precise. What is the specific task that you think can reliably distinguish an LLM's capability in this sense from what a human can typically manage?

That is, in the sense of [1], what result are we looking to use to differentiate the two?

[1] https://news.ycombinator.com/item?id=44913498