by AdamCraven
998 days ago
Well, they buried the lede with this one. Using LLMs was better for some tasks and actually made performance worse for others.

The first task was a generalist task ("inside the frontier" as they refer to it), which I'm not surprised saw improved performance, as it was purposely made to fall into an LLM's areas of strength: research into well-defined areas where you might not have strong domain knowledge. This is also the mainstay of early consultants' work, in which they are generalists in their early careers (usually as business analysts or similar) until they become more valuable and specialise later on.

LLMs are strong in this area of general research because they have generalised a lot of information. But this generalisation is also their weakness. A good way to think about an LLM is as a journalist of research. If you've ever read a newspaper, you often think you're getting a lot of insight. However, as soon as you read an article on an area of your specialisation, you realise the analysis has many flaws; the journalists don't understand your subject anywhere near the level you do.

The second task ("outside the frontier") required analysis of a spreadsheet, interviews and a more deeply analytical take with evidence to back it up. These are all tasks that LLMs aren't strong at currently. Unsurprisingly, the non-LLM group scored 84.5%, while LLM users scored between 60% and 70.6%.

The takeaway should be that LLMs are great for generalised research but less good for specialist analytical tasks.
|
When I ask a programming question, ChatGPT hallucinates something about 20% of the time, and I can only tell because I’m skilled enough to see it. For all the other domains I ask it questions in, I should assume at least as much hallucination and incorrect information.