HN Mirror



	by dmix 1129 days ago
	They released the product to the public… we might not have formal academic studies, but millions of people trying it and determining its utility vs. the competition is as good a test as any. If pushing the context window turns out not to be the right approach, it's not like there won't be 10 other companies champing at the bit to prove them wrong with their own hypotheses. And it's entirely possible there are multiple correct answers for different use cases.

4 comments

whimsicalism 1129 days ago

> millions of people trying it and determining its utility vs. the competition is as good a test as any.

Disagree. We aren't polling these people. How do I even get a distilled view of what their thoughts are?

It's a far cry from the level of evaluation that existed before. The lack of benchmarks (until the last week or so - thank you Hugging Face and lm-sys!) has been very noticeable.

You will get people claiming that LLaMA outperforms ChatGPT, etc. We have no sense of how performance degrades over longer sequence lengths... or even what sort of sparse attention technique they are using for longer sequences (most of which have known problems). It's absurd.


Nevermark 1128 days ago

Biological evolution doesn’t do any special testing except reward whatever survives. And it works fine. Marketplaces implement the same algorithm faster and effectively.

There are many ways to find truth besides math and science.

Obviously, those two are the gold standard for difficult questions.

But when time is short (competitors at your heels), rewards are fast (lots of hype fueling prospective customers), and the tech isn’t even that hard (deep learning isn’t rocket science, lots of good ideas are panning out), then any organization that needs to acquire its own resources to survive should run a try-evaluate-ship loop as fast as it can.

Occasional missteps won’t be nearly as fatal as being slow and irrelevant.


hnfong 1129 days ago

No silver platter! You can even apply the same arguments to the Linux kernel. Where's the double-blind peer review for Linux 6.3.2?


idopmstuff 1129 days ago

Yeah, it's a weird comment to call it not "public, peer reviewed" when this article is about how it went public, giving people the opportunity to review it.


whimsicalism 1129 days ago

If I started selling a previously unknown cancer treatment over-the-counter in CVS, people would be justified in calling it not peer-reviewed, untested, etc. even if it is available to the public (giving people the opportunity to try it).


dandellion 1129 days ago

It could also end up like the transition to digital cameras and megapixels, with companies adding more and more context just because consumers' minds are already imprinted with the idea that more is better. So in a few years we might have models with a window of 30 megatokens and it'll mean absolutely nothing.


aatd86 1129 days ago

What public? I've been waiting for weeks to try...
