HN Mirror



	by zavertnik 936 days ago
	> Building on top of any of these platforms provided by trillion dollar companies is a sucker's game.

	Until local models reach the fidelity and speed that these megacorps offer, what choice does anyone actually have with respect to AI? I was under the impression that even if you get over the initial cost of hardware to achieve speed, the fidelity of your outputs would still be of a lower overall quality relative to GPT/Claude/Bard (maybe?). I could be 100% wrong though.

idonotknowwhy 936 days ago

The gap is closing. I'm finding goliath-120b does better than ChatGPT 3.5.

Nothing comes close to GPT-4, though.

zavertnik 936 days ago

For me, the gap between 3.5 and 4 is massive. If I'm stuck between using 3.5 and doing the work myself, more often than not, I'm choosing to do it myself. Not to imply 3.5 is unusable; it's just that my bar for minimum fidelity is closer to 4 than 3.5 with respect to tasks that I'm comfortable offloading onto an AI.

What are you running goliath-120b on? Is it costly to run all day every day? How long does it take to complete an output? I've thought about building a multi-GPU node for local LLMs, but I always decide against it on the premise that the tech is so new I figure in the next 3-4 years we'll see specialized hardware combined with efficiency improvements that would make my node obsolete.

idonotknowwhy 933 days ago

I run it on 2x RTX 3090. I bought them used (probably ex-miners).

> I always decide against it on the premise that the tech is so new I figure in the next 3-4 years we'll see specialized hardware combined with efficiency improvements that would make my node obsolete.

You're probably right; this happened back in the day with Bitcoin mining.
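
For a rough sense of why a 120B-parameter model on two RTX 3090 cards implies heavy quantization, here is a back-of-the-envelope sketch. It counts weight storage only and ignores the KV cache, activations, and runtime overhead. The parameter count is an assumption read off the model name, and the comment above does not say which quantization was actually used.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# Assumption: ~120e9 parameters, read off the model name, not measured.
N_PARAMS = 120e9
# Two RTX 3090 cards with 24 GB of VRAM each.
TOTAL_VRAM_GIB = 2 * 24

for bits in (16, 8, 4, 3):
    need = weight_memory_gib(N_PARAMS, bits)
    verdict = "fits" if need <= TOTAL_VRAM_GIB else "does not fit"
    print(f"{bits:>2} bits/weight: {need:6.1f} GiB of weights, {verdict} in {TOTAL_VRAM_GIB} GiB")
```

Under these assumptions, only the roughly 3-bit case fits in 48 GiB, and it leaves little room for context, so real setups may also offload part of the model to system RAM.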

kristianp 936 days ago

How does Goliath-120b improve on llama2-70b by just combining two of them?

https://huggingface.co/alpindale/goliath-120b?text=Hi.

> An auto-regressive causal LM created by combining 2x finetuned Llama-2 70B into one.

idonotknowwhy 933 days ago

I... don't know. Even the creator of the model doesn't know why it worked out so well.

It really is better (at reasoning) than the 70B models when I use it, though some people have reported that it makes spelling mistakes.

P.S. This doesn't always work out well: people have tried swapping different layers randomly, and it makes the models incoherent.
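
To make "combining two models into one" a bit more concrete, here is a toy sketch of one way such merges are commonly described: taking overlapping slices of layers alternately from two fine-tunes of the same base model and stacking them in order. It only manipulates layer labels, not real weights. The function name, the model names, and the slice sizes are all hypothetical, and the thread does not state the actual recipe used for goliath-120b.

```python
from typing import List, Tuple

Layer = Tuple[str, int]  # (source model name, layer index in that model)


def interleave_slices(
    n_layers: int,
    slice_len: int,
    overlap: int,
    names: Tuple[str, str] = ("model_a", "model_b"),  # hypothetical names
) -> List[Layer]:
    """Stack overlapping layer slices, alternating between two source models."""
    if not 0 <= overlap < slice_len:
        raise ValueError("overlap must satisfy 0 <= overlap < slice_len")
    stacked: List[Layer] = []
    start, turn = 0, 0
    while True:
        end = min(start + slice_len, n_layers)
        source = names[turn % 2]
        stacked.extend((source, i) for i in range(start, end))
        if end == n_layers:
            break
        # The next slice begins `overlap` layers before this one ended.
        start += slice_len - overlap
        turn += 1
    return stacked


# Llama-2 70B has 80 transformer layers; the slice sizes here are made up.
stack = interleave_slices(n_layers=80, slice_len=16, overlap=8)
print(len(stack), stack[:2], stack[16:18])
```

Because the slices overlap, the stacked model ends up with more layers than either source, which is how two 70B models can yield something well above 70B parameters. The layer order within each slice is preserved, in contrast to the random swaps mentioned above.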
