by sschueller 21 hours ago
Amazon wasn't competing against free, open models that are starting to run well enough on existing laptops.

OpenAI and Anthropic's moat is filling with cement faster than they can dig.


OpenAI and Anthropic aren't competing against them either.

If you could go out on the street in any town and find one person using an open model, I'd eat my GPU.

I have 128 GB of unified memory (M4 Max), and the user experience with local inference is still pretty bad. I'm so glad llama.cpp exists so I don't have to wrangle Python (which I hate). But OpenCode is entirely disrespectful of the KV cache, so I had to switch to Pi (which is actually going relatively well).
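A rough illustration of why a KV-cache-unfriendly client hurts so much locally: a local server can typically reuse cached KV entries only for the leading tokens the new prompt shares with the previous one, so if a client rewrites something near the start of the prompt between turns (one plausible way to be "disrespectful of the KV cache"), everything after that point has to be prefilled again. This is a simplified model of prefix matching with hypothetical helper names and token IDs, not llama.cpp's or OpenCode's actual code.

```python
def common_prefix_len(cached: list[int], prompt: list[int]) -> int:
    """Number of leading tokens shared by the cached sequence and the new prompt."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n


def tokens_to_prefill(cached: list[int], prompt: list[int]) -> int:
    """Tokens that must be re-evaluated when only the shared prefix's KV entries are reused."""
    return len(prompt) - common_prefix_len(cached, prompt)


# Hypothetical token IDs: a long system prompt plus history, then a new user turn.
history = list(range(10_000))
append_only = history + [1, 2, 3]           # client only appends the new turn
rewritten = [42] + history[1:] + [1, 2, 3]  # client changes one token at the very start

print(tokens_to_prefill(history, append_only))
print(tokens_to_prefill(history, rewritten))
```

With append-only prompts only the 3 new tokens are processed; changing a single early token forces all 10,003 through prefill again, which on a laptop can mean a long wait before the first output token.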

Even so, I can't run at hundreds of tokens per second, which is practically table stakes for my work. And even if I managed to run that fast, the model would probably be completely braindead and stomp all over the task.

Wish I could afford an M5 Max but I've been between jobs for months without even a single interview. Sucks to be a developer these days.

Try Kilocode with DeepSeek V4 (calling the DeepSeek API directly, which is much cheaper than going through Kilo).

I have had very good results, and compared to the alternatives it costs pennies.

I use a setup similar to https://github.com/ScotterMonk/AgentAutoFlow and switch between DeepSeek V4 and Flash depending on the task.

DeepSeek V4 Flash actually runs on 128 GB systems (about 14 tok/s). Antirez created a fabulous 2-bit quant and a highly tuned LLM server:

https://github.com/antirez/ds4
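Back-of-the-envelope arithmetic for why a 2-bit quant matters on a 128 GB machine: weight storage is roughly parameters × bits per weight / 8 bytes. The 300B parameter count below is purely hypothetical (it is not DeepSeek V4 Flash's actual size), and the estimate ignores quantization overhead (scales, layers kept at higher precision) as well as the memory needed for the KV cache.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (10**9 bytes): params * bits / 8.

    Ignores quantization overhead and KV-cache/activation memory.
    """
    return params * bits_per_weight / 8 / 1e9


# Hypothetical 300B-parameter model; NOT DeepSeek V4 Flash's real parameter count.
params = 300e9
for bits in (16, 8, 4, 2):
    gb = weight_gb(params, bits)
    fits = "fits" if gb < 128 else "does not fit"
    print(f"{bits:>2}-bit: ~{gb:.0f} GB ({fits} in 128 GB)")
```

Under these assumptions only the 2-bit version leaves headroom in 128 GB, which is the general reason aggressive quants are what make models of this class runnable on a single laptop at all.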

I do use DeepSeek; it's exceptionally cheap! Inference is slow, though, and it's not particularly intelligent, but the experience is better than local inference.