Hacker News
by sdesol 515 days ago
> It's only folks who have absolutely no idea how LLMs work that find the strawberry thing meaningful.

I think it is meaningful in that it highlights how we need to approach things a bit differently. For example, instead of asking "How many r's in strawberry?", we say "How many r's in strawberry? Show each character in an ordered list before counting. When counting, list the position in the ordered list." If we do this, every model that I asked got it right.
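The prompt above asks the model to spell the word out as a numbered list and cite positions while counting. A minimal Python sketch of that same procedure, done deterministically, might look like this. The function name `count_with_positions` is hypothetical, and the sketch only shows what a correct step-by-step answer looks like, not how any model works internally.

```python
def count_with_positions(word: str, target: str) -> tuple[list[str], list[int]]:
    """Spell `word` as a 1-based ordered list and return the positions of `target`."""
    # One line per character, mirroring the "ordered list" the prompt asks for.
    listing = [f"{i}. {ch}" for i, ch in enumerate(word, start=1)]
    # 1-based positions where the target character appears.
    positions = [i for i, ch in enumerate(word, start=1) if ch == target]
    return listing, positions


listing, positions = count_with_positions("strawberry", "r")
print("\n".join(listing))
print(f"'r' appears at positions {positions}, count = {len(positions)}")
```

For "strawberry" this prints ten numbered lines and reports positions 3, 8 and 9, which is the kind of answer the longer prompt steers the model toward.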

https://beta.gitsense.com/?chat=167c0a09-3821-40c3-8b0b-8422...

There are quirks we need to understand better, and I would say the strawberry thing is one of them.

Edit: I should add that getting LLMs to count things might not be the best way to go about it. Having them generate code to count things would probably make more sense.


Yes, you should say: "Could you please write and execute a program to count the number of 'r' characters in the string 'strawberry'?"
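The program that prompt asks for can be very small. A sketch in Python follows (the language is an assumption; the comment does not name one):

```python
from collections import Counter

word = "strawberry"

# str.count works on the string's characters directly, so the model's
# tokenization of the word plays no part in the result.
r_count = word.count("r")

# Counter tallies every character at once, which is handy for follow-up
# questions about other letters.
letter_counts = Counter(word)

print(f'"{word}" contains {r_count} "r" characters')
print(dict(letter_counts))
```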
I was impressed with Claude Sonnet the other day - gave it a photo of my credit card bill (3 photos actually - long bill) and asked it to break it down by recurring categories, counting anything non-recurring as "other". It realized without being asked that a program was needed, and wrote/ran it to give me what I asked for.
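The comment doesn't show the program Claude wrote. The following is only a hypothetical sketch of the kind of categorization it describes: the transactions and the keyword-to-category map are made up, and anything that matches no category falls into "other".

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical bill lines (description, amount); real data would come from the photos.
transactions = [
    ("NETFLIX.COM", Decimal("15.49")),
    ("SAFEWAY #1234", Decimal("82.10")),
    ("SHELL OIL 5566", Decimal("40.00")),
    ("SAFEWAY #1234", Decimal("23.75")),
    ("AIRLINE TICKET", Decimal("312.00")),
]

# Hypothetical keyword -> category map for the recurring categories.
CATEGORY_KEYWORDS = {
    "NETFLIX": "subscriptions",
    "SAFEWAY": "groceries",
    "SHELL": "fuel",
}


def categorize(description: str) -> str:
    """Return the first matching category, or "other" for non-recurring items."""
    upper = description.upper()
    for keyword, category in CATEGORY_KEYWORDS.items():
        if keyword in upper:
            return category
    return "other"


# Decimal avoids floating-point rounding errors when summing money.
totals: dict[str, Decimal] = defaultdict(Decimal)
for description, amount in transactions:
    totals[categorize(description)] += amount

for category, total in sorted(totals.items()):
    print(f"{category:<14} {total:>8}")
```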
I think this will be the future. LLMs will know enough to know that they should hand things off to something else.
It's the present. ChatGPT, for example, is an application. It uses models, but it does all kinds of stuff at the application level too.
Are the instructions to pass something off built into the model, or is it clever prompting, or a bit of both?
In general it's "tool use" where the model's system prompt tells it to use certain tools for certain tasks, and having been trained to follow instructions, it does so!
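A minimal sketch of the application side of tool use: the model replies with a structured request instead of an answer, and the application runs the tool. Everything here is hypothetical. `fake_model` stands in for a real model call, and the JSON shape, `TOOLS` and `count_char` are made up for illustration; real providers each define their own tool-calling format.

```python
import json


# Hypothetical tool: the application, not the model, executes it.
def count_char(text: str, char: str) -> int:
    """Deterministic helper the model can delegate counting to."""
    return text.count(char)


TOOLS = {"count_char": count_char}


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical).

    A model whose system prompt lists the available tools might reply
    with a JSON tool request like this instead of answering directly.
    """
    return json.dumps({"tool": "count_char",
                       "arguments": {"text": "strawberry", "char": "r"}})


def run(prompt: str) -> str:
    reply = fake_model(prompt)
    try:
        request = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # the model answered in plain text; nothing to dispatch
    if not isinstance(request, dict) or request.get("tool") not in TOOLS:
        return reply  # not a recognised tool request
    result = TOOLS[request["tool"]](**request["arguments"])
    # A real application would usually send `result` back to the model
    # so it can phrase the final answer.
    return f"{request['tool']} returned {result}"


print(run('How many "r" characters are in "strawberry"?'))
```

The design choice is that the model only decides *which* tool to call and with what arguments; the counting itself happens in ordinary code.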
It's not that hard of a problem to solve at the application level. It's just hard to get a single model to do all the things.
> It's not that hard of a problem to solve at the application level.

I think it will be easy if you are focused on one or two models from the same family, but the complexity comes when you try to get a lot of models to act in the same way.