HN Mirror



	by shahahmed 728 days ago
	Arguably you can reduce latency even further by keeping the model on-device as well, but that would mean revealing the weights of the fine-tuned model. If the user preferred reduced latency and had the RAM, is that an option?

4 comments

daemonologist 728 days ago

This is true, but only if you have a GPU (or other accelerator) comparable in performance to the one backing the service, or at least comparable after accounting for the network round trip you save by running locally. This is an expensive proposition because it will be sitting idle between completions and when you're not coding.
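To put rough numbers on that trade-off, here is a minimal sketch. It assumes a deliberately simple model in which a hosted completion costs one network round trip plus inference time, and a local completion costs inference time only. The function names and the millisecond figures are hypothetical, chosen only for illustration, and are not measurements of any real service.

```python
def local_is_faster(network_rtt_ms: float, remote_infer_ms: float, local_infer_ms: float) -> bool:
    """Return True if local inference beats the hosted service end to end.

    Assumed model: hosted latency = round trip + remote inference,
    local latency = local inference only.
    """
    return local_infer_ms < network_rtt_ms + remote_infer_ms


def max_local_slowdown(network_rtt_ms: float, remote_infer_ms: float) -> float:
    """How many times slower the local hardware can be before it loses."""
    return (network_rtt_ms + remote_infer_ms) / remote_infer_ms


# Hypothetical numbers: 100 ms round trip, 200 ms inference on the hosted GPU.
print(local_is_faster(100, 200, 250))   # local at 250 ms vs hosted at 300 ms
print(max_local_slowdown(100, 200))     # break-even slowdown factor
```

Under these assumed numbers the local GPU can be up to 1.5x slower than the hosted one and still break even, which is the "comparable after accounting for the round trip" condition. The idle-hardware cost is a separate question that this sketch does not model.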


sqs 728 days ago

Not for this fine-tuned model yet, but Cody supports local models: https://sourcegraph.com/docs/cody/clients/install-vscode#sup....

I just used Cody with Ollama for local inference on a flight where the wifi was broken, and it never fails to blow my mind: https://x.com/sqs/status/1803269013310759236.


rdedev 728 days ago

Looking at their GitHub page, it seems like they are using existing LLM services. It should be possible to modify Cody to make it work with a local LLM.


ado__dev 728 days ago

You don’t have to modify anything. We support local LLMs for chat and completions with Ollama.

https://sourcegraph.com/blog/local-code-completion-with-olla...
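For anyone curious what talking to a local Ollama server looks like underneath, here is a minimal sketch against Ollama's HTTP API. It is not Cody's own integration code. It assumes Ollama is running on its default port (11434), and the model name `codellama` is a placeholder for whichever model you have pulled. The helper names `build_generate_request` and `complete_locally` are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "codellama") -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint.

    The model name is a placeholder; use any model pulled locally.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete_locally(prompt: str, model: str = "codellama") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `complete_locally("def fibonacci(n):")` returns the model's continuation as a string. Nothing leaves the machine, which is why this kind of setup keeps working without a network connection.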


s1mplicissimus 728 days ago

The model is probably most of the "secret sauce" of Cody, so if they gave that away people could just copy it around like mp3s. That's my guess.


morgante 728 days ago

Completely incorrect, as Sourcegraph has not historically trained models and Cody swaps between many open source and 3rd party models.
