| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by benreesman 851 days ago

The current HYPERN // MODERN // AI builds are using flox 1.0.2 to install llama.cpp. The default local model is dolphin-8x7b at ‘Q4_KM’ quantization (it lacks defaults for Linux/NXIDIA, that’s coming soon and it works, you just have to configure it manually and Mac gets more love because that’s what myself and the other main contributors have).

flox will also install properly accelerated torch/transformers/sentence-transfomers/diffusers/etc: they were kind enough to give me a preview of their soon-to-be-released SDXL environment suite (please don’t hold them to my “soon”, I just know it looks close to me). So you can do all the modern image stuff pretty much up to whatever is on HuggingFace.

I don’t have the time I need to be emphasizing this, but the last piece before I’m going to open source this is I’ve got a halfway decent sketch of a binary replacement/conplement for the OpenAI-compatible JSON/HTTP one everyone is using now.

I have incomplete bindings to whisper.cpp and llama.cpp for those modalities, and when it’s good enough I hope the bud.build people will accept it as a donation to the community managed ConnectRPC project suite.

We’re really close to a plausible shot at open standards on this before NVIDIA or someone totally locks down the protocol via the RT stuff.

edit: I almost forgot to mention. We have decent support for multi-vendor, mostly in practice courtesy of the excellent ‘gptel’, though both nvim and VSCode are planned for out-of-the-box support too.

The gap is opening up a bit again between the best closed and best open models.

This is speculation but I strongly believe the current opus API-accessible build is more than a point release, it’s a fundamental capability increase (though it has a weird BPE truncation issue that could just be a beta bug, but it could hint at something deeper.

It can produce verbatim artifacts from 100s of thousands of tokens ago and restart from any branch in the context, takes dramatically longer when it needs to go deep, and claims it’s accessing a sophisticated memory hierarchy. Personally I’ve never been slackjawed with amazement on anything in AI except my first night with SD and this thing.