by dmix
1129 days ago
They released the product to the public… we might not have formal academic studies, but millions of people trying it and judging its utility against the competition is as good a test as any. If pushing the context window turns out not to be the right approach, it's not like there won't be 10 other companies chomping at the bit to prove them wrong with their own hypotheses. And it's entirely possible there are multiple correct answers for different use cases.
Disagree. We aren't polling these people. How do I even get a distilled view of what their thoughts are?
It's a far cry from the level of evaluation that existed before. The lack of benchmarks (until the last week or so - thank you Hugging Face and lm-sys!) has been very noticeable.
You will get people claiming that LLaMA outperforms ChatGPT, etc. We have no sense of how performance degrades over longer sequence lengths... or even what sort of sparse attention technique they are using for longer sequences (most of which have known problems). It's absurd.