| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by itake 745 days ago

I’m a bit confused. Your reasoning doesn’t align with the data you shared.

The startup costs for just messing around at home are huge: purchasing a server and gpus, paying for electricity, time spent configuring the api.

If you want to just mess around, $100 to call the world’s best api is much cheaper than spending $2-7k Mac Studio.

Even at production level traffic, the ROI on uptime, devops, utilities, etc would take years to recapture the upfront and on-going costs of self-hosting.

Self hosting will have higher latency and lower throughput.

4 comments

zeta0134 745 days ago

You are vastly overestimating the startup cost. For me this week it was literally these commands:

pacman -S ollama

ollama serve

ollama run llama3

My basic laptop with about 16 GB of RAM can run the model just fine. It's not fast, but it's reasonably usable for messing around with the tech. That's the "startup" cost. Everything else is a matter of pushing scale and performance, and yes that can be expensive, but a novice who doesn't know what they need yet doesn't have to spend tons of money to find out. Almost any PC with a reasonable amount of RAM gets the job done.

monkmartinez 745 days ago

llama3 at 8billion params is weak sauce for anything serious, it just isn't in the same galaxy as Sonnet 3.5 or GPT-4o. The smaller and faster models like Phi are even worse. Once you progress past asking trivial questions to a point where you need to trust the output a bit more, its not worth effort in time, money and/or sweat effort to run a local model to do it.

A novice isn't going to know what they need because they don't know what they don't know. Try asking a question to LLaMA 3 at 8 billion and the same question to LLaMA 3 at 70 billion. There is a night and day difference. Sonnet, Opus and GPT-4o run circles around LLaMA 3 70b. To run LLaMA at 70 billion you need serious horse power as well, likely thousands of dollars in hardware investment. I say it again... the calculus in time, money, and effort isn't favorable to running open models on your own hardware once you pass the novice stage.

I am not ungrateful that the LLaMA's are available for many different reasons, but there is no comparison between quality of output, time, money and effort. The API's are a bargain when you really break down what it takes to run a serious model.

jononor 744 days ago

Using an LLM as a general purpose knowledge base is only one particular application of an LLM. And on which is probably best served by ChatGPT etc.

A lot of other things are possible with LLMs using the context window and completion, thanks to their "zero shot" learning capabilities. Which is also what RAG builds upon.

Aurornis 745 days ago

I’m familiar with local models. They’re fine for chatting on unimportant things.

They do not compare to the giant models like Claude Sonnet and GPT4 when it comes to trying to use them for complex things.

I continue to use both local models and the commercial cloud offerings, but I think anyone who suggests that the small local models are on par with the big closed hosted models right now is wishful thinking.

sudohackthenews 745 days ago

People have gotten manageable results on all sorts of hardware. People have even squeezed a few tokens/second out of Raspberry PIs. The small models are pretty performant- they get good results on consumer gaming hardware. My 2021 laptop with a 3070m (only 8gb vram) runs 8b models faster than I can read, and even the original M1 chips can run the models fine.

monkmartinez 745 days ago

You are right of course.... IF your metric for manageable/useable is measured only tokens per second (tok/s).

If your metric is quality of output, time, money and tok/s, there is no comparison; Local models just aren't there yet.

LorenDB 745 days ago

And why would you buy a Mac Studio? You could build a reasonable GPU-accelerated Linux box for well under $1500. For example: https://pcpartpicker.com/guide/BCWG3C/excellent-amd-gamingst...

J_Shelby_J 745 days ago

Devs that refuse to move off Apple are severely disadvantaged in the LLM era.

jondwillis 745 days ago

lol tell that to the 3 year old laptop with 64 GB of RAM that I use exclusively for local LLMs while dev’ing on my work laptop with 96 GB of RAM…

wokwokwok 745 days ago

> The startup costs for just messing around at home are huge

No, they are zero.

Most people have extra hardware lying around at home they're not using. It costs nothing but time to install python.

$100 is not free.

If you can't be bothered, sure thing, slap down that credit card and spend your $100.

...but, maybe not so for some people?

Consider students with no credit card, etc; there are a lot of people with a lot of free time and not a lot of money. Even if you don't want to use it do you do seriously think this project is totally valueless for everyone?

Maybe, it's not for you. Not everything has to be for everyone.

You are, maybe, just not the target audience here?

Aurornis 745 days ago

> You are, maybe, just not the target audience here?

The difference between an open model running on a $100 computer and the output from GPT4 or Claude Sonnet is huge.

I use local and cloud models. The difference in productivity and accuracy between what I can run locally and what I can get for under $100 of API calls per month is huge once you get past basic playing around with chat. It’s not even close right now.

So I think actually you are not the target audience for what the parent comments are taking about. If you don’t need cutting edge performance then it’s fun to play with local, open, small models. If the goal is to actually use LLMs for productivity in one way or another, spending money on the cloud providers is a far better investment.

Exceptions of course for anything that is privacy-sensitive, but you’re still sacrificing quality by using local models. It’s not really up for debate that the large hosted models are better than what you’d get from running a 7B open model locally.

lynx23 745 days ago

And its not entitled to cliam that "Most people have extra hardware lying around at home". Your story doesn't sound plausible at all.

bryanrasmussen 745 days ago

Most people who would want to be running machine learning models probably have some hardware at home that can handle a slow task for playing around and determining if it is worthwhile to pay out for something more performant.

This is undoubtedly entitled, but thinking to yourself huh, I think it's time to try out some of this machine learning stuff is a pretty inherently entitled thing to do.

wokwokwok 745 days ago

This project is literally aiming to run on devices like old phones.

I don't think having an old phone is particularly entitled.

I think casually slapping down $100 on whim to play with an API... probably, yeah.

/shrug

itake 745 days ago

According to this tweet, Llama 3 costs about $0.20 per Million tokens using an M2.

https://x.com/awnihannun/status/1786069640948719956

In comparison, GPT3.5-turbo costs $0.50 per million tokens.

Do you think an old iPhone will less than 2x efficient?

nightski 745 days ago

FWIW depends on cost of power. Where I live cost of power is less than half the stated average.