

by whimsicalism 1132 days ago

> millions of people trying it and determining its utility vs the competition is as good a test as any.

Disagree. We aren't polling these people. How do I even get a distilled view of what their thoughts are?

It's a far cry from the level of evaluation that existed before. The lack of benchmarks (until the last week or so - thank you huggingface and lm-sys!) has been very noticeable.

You will get people claiming that LLaMA outperforms ChatGPT, etc. We have no sense of how performance degrades over longer sequence lengths... or even what sort of sparse attention technique they are using for longer sequences (most of which have known problems). It's absurd.

2 comments

Nevermark 1131 days ago

Biological evolution doesn’t do any special testing except reward whatever survives. And it works fine. Marketplaces implement the same algorithm faster and effectively.

There are many ways to find truth besides math and science.

Obviously, those two are the gold standard for difficult questions.

But when time is short (competitors at your heels), rewards are fast (lots of hype fueling prospective customers), and the tech isn’t even that hard (deep learning isn’t rocket science, lots of good ideas are panning out), then any organization that needs to acquire its own resources to survive should operate on a try-evaluate-ship loop as fast as it can.

Occasional missteps won’t be nearly as fatal as being slow and irrelevant.

hnfong 1132 days ago

No silver platter! You can even apply the same argument to the Linux kernel. Where's the double-blind peer review for Linux 6.3.2?