by dmix 1129 days ago
They released the product to the public… we might not have formal academic studies, but millions of people trying it and determining its utility vs the competition is as good a test as any.

If pushing the context window turns out to not be the right approach, it’s not like there won’t be 10 other companies chomping at the bit to prove them wrong with their own hypotheses. And it’s entirely possible there are multiple correct answers for different use cases.

4 comments

> millions of people trying it and determining its utility vs the competition is as good a test as any.

Disagree. We aren't polling these people. How do I even get a distilled view of what their thoughts are?

It's a far cry from the level of evaluation that existed before. The lack of benchmarks (until the last week or so - thank you huggingface and lm-sys!) has been very noticeable.

You will get people claiming that LLaMA outperforms ChatGPT, etc. We have no sense of how performance degrades over longer sequence lengths... or even what sort of sparse attention technique they are using for longer sequences (most of which have known problems). It's absurd.

Biological evolution doesn’t do any special testing except reward whatever survives. And it works fine. Marketplaces implement the same algorithm faster and effectively.

There are many ways to find truth besides math and science.

Obviously, those two are the gold standard for difficult questions.

But when time is short (competitors at your heels), rewards are fast (lots of hype fueling prospective customers), and the tech isn’t even that hard (deep learning isn’t rocket science, lots of good ideas are panning out), then any organization that needs to acquire its own resources to survive should operate on a try-evaluate-ship loop as fast as they can.

Occasional missteps won’t be nearly as fatal as being slow and irrelevant.

No silver platter! You can even apply the same arguments to the Linux kernel. Where's the double-blind peer review for Linux 6.3.2?

Yeah, it's a weird comment to call it not "public, peer reviewed" when this article is about how it went public, giving people the opportunity to review it.

If I started selling a previously unknown cancer treatment over-the-counter in CVS, people would be justified in calling it not peer-reviewed, untested, etc. even if it is available to the public (giving people the opportunity to try it).

It could also end up like the transition to digital cameras and megapixels, with companies adding more and more context just because consumers' minds are already imprinted with the idea that more is better. So in a few years we might have models with a window of 30 megatokens and it'll mean absolutely nothing.

What public? I've been waiting for weeks to try...