

by lbhdc 311 days ago

I almost exclusively self-host the models I use.

Currently I am using llama.cpp for an interactive REPL chat. I was previously using Alpaca (a GTK GUI), but was annoyed by how slow it was and by some random crashes. I am transitioning some of this to self-hosted setups in the cloud for things that can't run on my laptop.

I am looking to get away from my current interface and write my own, mostly for the experience of deeply integrating agents into a program. If anyone knows a good library for interacting with a local model that doesn't involve standing up a web server, I am interested :)

My daily driver is gemma3n. It's been a nice balance between speed and performance without spinning up my laptop fans.

I am super interested in local models, partially because there is no friction from managed services, but also because I think as small models become more viable we will see an explosion of apps incorporating them.

1 comment

briansun 310 days ago

Gemma3n as a daily driver sounds nice—4b or 8b? and rough tokens/sec on your laptop? And have you A/B‑tested code generation quality across local models (e.g., Gemma3n vs others)?


lbhdc 310 days ago

I am using the smaller one, specifically the e2b-it flavor.

I get ~20-30 tok/sec. It's fast enough that it's not frustrating, but if it were faster you could more easily skim as it generates.

I haven't done any serious testing. My process is typically learning about new models on HN or elsewhere, and trying to give them a real shake. I have some go-to code generation prompts that I try on all of them. None succeed, but they are getting close. I also do a lot of just feeling it out. The more I can use solutions unedited, the better it feels.
