
	by riku_iki 1196 days ago
	Probably not, if the gap is that large: 7B vs. 175B. Also, all of those benchmarks are trash, because they can't track data leaks in the training data. For example, they trained LLaMA on GitHub, where the GSM8K eval data is hosted, so of course the model will perform well on GSM8K: it memorized the answers.
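
	For what it's worth, the kind of leak described here can be roughly checked with an n-gram overlap test between eval items and training documents. The following is only a minimal sketch: the function names, the whitespace tokenization, and the toy strings are all assumptions for illustration, not anything from the LLaMA paper or from GSM8K itself.

```python
def ngrams(text, n=8):
    """Return the set of lowercase, whitespace-tokenized n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(eval_items, corpus_docs, n=8):
    """Fraction of eval items sharing at least one n-gram with the corpus.

    Hypothetical helper: a real check would stream a huge corpus and
    normalize text far more carefully than this.
    """
    corpus_grams = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    if not eval_items:
        return 0.0
    flagged = [item for item in eval_items if ngrams(item, n) & corpus_grams]
    return len(flagged) / len(eval_items)


# Toy data (made up): the first eval item appears verbatim in a "scraped" doc.
toy_eval = [
    "a farmer has twelve cows and buys seven more cows",
    "how fast must the train go to arrive by noon",
]
toy_corpus = [
    "readme: a farmer has twelve cows and buys seven more cows answer: 19",
]
toy_rate = contamination_rate(toy_eval, toy_corpus, n=5)
```

	A nonzero rate only flags verbatim overlap; paraphrased or translated leaks would slip past a check like this.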

2 comments

ShamelessC 1196 days ago

You don’t have to guess, the information is in the provided synopsis.

I’m sure there are issues similar to your description. Nevertheless, you seem to be a staunch defender of GPT-3, which to me indicates some kind of bias? Like, who cares if LLaMA is better - in fact, isn’t that a good indicator of progress?


riku_iki 1195 days ago

> You don’t have to guess, the information is in the provided synopsis.

Yes, I checked the benchmarks in the paper, and there are many where GPT-3 beat the 7B LLaMA. Also, it is not a clean experiment, because the models were trained on different datasets.

> I’m sure there are issues similar to your description. Nevertheless, you seem to be a staunch defender of GPT-3, which to me indicates some kind of bias? Like, who cares if LLaMA is better - in fact, isn’t that a good indicator of progress?

Personal rants have been ignored.


ShamelessC 1195 days ago

:shrugging:

Okay, then.


int_19h 1196 days ago

You can run llama-30b right now on high-end consumer hardware (RTX 3090+) using int4 quantization. With two GPUs, llama-65b is within reach. And even 30b is surprisingly good, although it's clearly not as well trained as ChatGPT for dialog-style tasks specifically.
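
The arithmetic behind that claim can be sketched roughly as follows. This assumes the nominal parameter counts (30B and 65B), 24 GB of VRAM per RTX 3090, and counts weights only; the KV cache, activations, and quantization scale overhead are ignored, so real headroom is smaller. The function names are made up for the example.

```python
RTX_3090_VRAM_GB = 24  # VRAM of a single RTX 3090


def weight_memory_gb(n_params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * bits_per_weight / 8 / 1e9


def fits(n_params, bits_per_weight, n_gpus=1, vram_gb=RTX_3090_VRAM_GB):
    """True if the weights alone fit in the combined VRAM of n_gpus cards."""
    return weight_memory_gb(n_params, bits_per_weight) <= n_gpus * vram_gb


LLAMA_30B = 30_000_000_000  # nominal count (assumption)
LLAMA_65B = 65_000_000_000  # nominal count (assumption)

print(weight_memory_gb(LLAMA_30B, 16))  # fp16 weights
print(weight_memory_gb(LLAMA_30B, 4))   # int4 weights
print(fits(LLAMA_30B, 4, n_gpus=1))
print(fits(LLAMA_65B, 4, n_gpus=1))
print(fits(LLAMA_65B, 4, n_gpus=2))
```

Under these assumptions the int4 weights of the 30B model take about 15 GB, which fits on one 24 GB card, while the 65B model needs about 32.5 GB and so only fits once a second card is added.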
