Hacker News
by xaxis 440 days ago
I've started work on such a "Distributed Client-Side LLM" project, but I'm skeptical of the practical use cases, largely because of how little inference can be accomplished expediently on most client devices. I'm working on a SIMD (WASM-compiled), WebGL, and WebGPU-based inference engine for a couple of baseline models (llama-7b, etc.).

So basically it's like WebLLM, but with support for requests between user nodes: entirely web-based, with a p2p orchestration layer that routes inference requests across the connected nodes of the network and serves them from nodes that are online but otherwise idle.
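
To make the routing idea concrete, here is a minimal sketch of how an orchestration layer might choose a node for a request. The comment doesn't specify a protocol, so everything here (the `PeerNode` shape, the `busy` flag, `pickIdleNode`) is a hypothetical illustration, not the project's actual design.

```typescript
// Hypothetical node record; field names are assumptions for illustration.
interface PeerNode {
  id: string;
  connected: boolean; // node is currently online in the p2p network
  busy: boolean;      // node is already running inference
  models: string[];   // models this node can serve, e.g. "llama-7b"
}

// Return the first connected, idle node that supports the requested model,
// or null if no such node exists.
function pickIdleNode(nodes: PeerNode[], model: string): PeerNode | null {
  for (const n of nodes) {
    if (n.connected && !n.busy && n.models.indexOf(model) !== -1) {
      return n;
    }
  }
  return null;
}
```

A real scheduler would presumably also weigh latency and each node's compute, but "connected, idle, and has the model" is the core filter described above.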

Anyhow, writing a cross-model, highly compatible inference engine that can fall back to the minimum available compute option (SIMD at worst) is proving challenging enough, so I have lots of time to overthink whether such a system would even prove useful.
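
The fallback behavior can be sketched roughly as below. This assumes a preference order of WebGPU, then WebGL, then WASM SIMD, which is my reading of "SIMD at worst" rather than something stated outright. The `Capabilities` type and `pickBackend` are hypothetical names, and the actual feature detection (probing the browser for each backend) is deliberately left out.

```typescript
// Hypothetical names; not taken from the project described above.
type Backend = "webgpu" | "webgl" | "wasm-simd";

// Result of feature detection, performed elsewhere.
interface Capabilities {
  webgpu: boolean;
  webgl: boolean;
  wasmSimd: boolean;
}

// Assumed preference order: most capable first, SIMD as the last resort.
const PREFERENCE: Backend[] = ["webgpu", "webgl", "wasm-simd"];

// Return the most preferred backend the device supports, or null if none.
function pickBackend(caps: Capabilities): Backend | null {
  const available: Record<Backend, boolean> = {
    "webgpu": caps.webgpu,
    "webgl": caps.webgl,
    "wasm-simd": caps.wasmSimd,
  };
  for (const backend of PREFERENCE) {
    if (available[backend]) return backend;
  }
  return null;
}
```

Keeping the selection separate from detection means the ordering can be tested without a browser.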

A long while back (2011 - I'm old) I wrote a barely functional "Web CDN" on top of WebRTC (the spec was very new at the time) that distributed requests for a given site's content across its active clients. This distributed LLM project is basically the same idea, except that instead of content (text, images, video) we're dealing with inference requests to supported models.