by lgbr 1184 days ago
It's absolutely fantastic that we have so many runtimes, so quickly, to the point where we have an awesome list.

However, given that the usefulness of a chatbot depends more on the model being used than on the runtime, what I would find a lot more useful is a ranking of the various models that are available. Currently I'm having to rely on comments on the internet to find out if Alpaca 7B or LLaMA 65B is genuinely productive to use. As new models come out, I'd love to know how well each one tells jokes, answers complicated questions, or generates code.


We have a whole team of folks just watching for these to come out and then go evaluate them.

Short answer: none of them do as well as the OG Davinci-003. Not even close. Even the 3.5 Turbo models from OpenAI don’t do as well.

We throw some sophisticated prompts at them to attempt chain-of-thought reasoning.

That's quite a confusing comment. `davinci-003` is from OpenAI, whereas ChatGPT is some sort of variant that is more "optimized" for chatting. Put differently, GPT-3 or GPT-3.5 is a customized version of `davinci-003`, made for chatting. Please don't ask me for the details, I don't know, but `davinci-003` is not an alternative to ChatGPT.

>but `davinci-003` is not an alternative to ChatGPT

What makes you believe that? In my testing, davinci does better than gpt-3.5-turbo for most tasks.

I think people, and this article, are talking about alternatives (competitors) to ChatGPT. `davinci` is obviously not an alternative; ChatGPT is `davinci` made for chatting. As to whether davinci produces better responses than ChatGPT... maybe? But that's a different question.
It is an alternative. It’s just more expensive.
Do you have a citation for that?
Would be interested in that as well.
What kind of things have you seen davinci-003 do better than 3.5 turbo?
We need open benchmarks, clearly. Know any projects in that space?
Could you expand on this a bit more? What types of prompts? What are your evaluation criteria?

This actually sounds fascinating. Not unlike birdwatching! ))

That’s interesting. What about GPT-4?