Hacker News
by filleokus 883 days ago
An off-topic question: Is there such a thing as a "small-ish language model"? A model that you could simply give instructions / "capabilities" to, which a user can then interact with. Almost like Siri-level intelligence.

Imagine you have an API endpoint where you can set the level of some lights. You give the chat a system prompt explaining how to build the JSON body of the request, and the user can prompt it with things like "Turn off all the lights" or "Make it bright in the bedroom", etc.
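A minimal sketch of that setup, assuming a hypothetical request body of the form {"room": ..., "level": 0-100} (the room names, field names, and prompt wording are all made up for illustration). The model call itself is left out; the point is that whatever the model replies gets parsed and checked before it ever reaches the lights API.

```python
import json

# Hypothetical schema for the lights endpoint: {"room": <room or "all">, "level": <int 0-100>}
ROOMS = {"bedroom", "kitchen", "living_room"}

# Hypothetical system prompt telling the model how to build the JSON body.
SYSTEM_PROMPT = (
    "You control the lights. Reply ONLY with a JSON object of the form "
    '{"room": "<bedroom|kitchen|living_room|all>", "level": <0-100>}. '
    "level 0 means off, 100 means fully bright."
)


def parse_light_command(model_reply: str) -> dict:
    """Parse and validate the model's reply before sending it to the API."""
    body = json.loads(model_reply)  # json.JSONDecodeError (a ValueError) on malformed output
    if not isinstance(body, dict) or set(body) != {"room", "level"}:
        raise ValueError(f"unexpected body: {body!r}")
    room, level = body["room"], body["level"]
    if room != "all" and room not in ROOMS:
        raise ValueError(f"unknown room: {room!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(level, int) or isinstance(level, bool) or not 0 <= level <= 100:
        raise ValueError(f"level out of range: {level!r}")
    return body


if __name__ == "__main__":
    # What a model might return for "Make it bright in the bedroom":
    print(parse_light_command('{"room": "bedroom", "level": 100}'))
```

Rejecting bad output and re-prompting (or falling back to "sorry, I didn't get that") is cheap, which matters more the smaller the model gets.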

How low could the memory consumption of such a model be? We don't need it to store who the first Kaiser of Germany was, "just" enough to kinda map human speech onto the available APIs.

4 comments

There are "smaller" models, for example TinyLlama 1.1B ("tiny" seems like an exaggeration). Phi-2 has 2.7B parameters. I can't name a 500M-parameter model, but there is probably one.
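For a rough answer to the memory question, the weights alone take about parameters × bits per weight / 8 bytes. A back-of-the-envelope sketch using the parameter counts above; it ignores the KV cache, activations, and runtime overhead, and real 4-bit formats also store per-block scales, so actual usage is somewhat higher.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Parameter counts as mentioned in the comment above.
MODELS = {"TinyLlama 1.1B": 1.1e9, "Phi-2 2.7B": 2.7e9}

if __name__ == "__main__":
    for name, n in MODELS.items():
        sizes = ", ".join(
            f"{bits}-bit: {weight_memory_gb(n, bits):.2f} GB" for bits in (16, 8, 4)
        )
        print(f"{name}: {sizes}")
```

That works out to roughly 2.2 GB (16-bit) down to about 0.55 GB (4-bit) for TinyLlama, and 5.4 GB down to about 1.35 GB for Phi-2.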

The problem is they are all still broadly trained, so they end up being a jack of all trades, master of none. You'd have to fine-tune them if you want them to be good at some narrow task, and other than code completion I don't know that anyone has done that.

If you want to generate JSON or other structured output, there is Outlines https://github.com/outlines-dev/outlines, which constrains the output to match a regex. That guarantees, e.g., that the model will generate a syntactically valid API call, although the call could still be nonsense if the model doesn't understand the request; it will just match the regex. There are other similar tools around. I believe llama.cpp also has something built in that will constrain the output to a grammar.
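The core idea behind this kind of constrained generation is masking: at every step, any token that would make the output impossible to complete is thrown out before the model picks one. A toy sketch of that idea, not Outlines' or llama.cpp's actual API. Instead of compiling a regex or grammar into a state machine, it uses a finite list of allowed outputs, and `toy_score` is a hypothetical stand-in for the model's logits.

```python
from typing import Callable, Sequence

# Hypothetical: the only two outputs we allow for this toy lights API.
ALLOWED = [
    '{"room": "bedroom", "level": 100}',
    '{"room": "all", "level": 0}',
]
# Hypothetical token vocabulary, including a chatty token the "model" prefers.
VOCAB = ['Sure! ', '{"room": "', 'bedroom', 'all', '", "level": ', '100', '0', '}']


def toy_score(prefix: str, token: str) -> float:
    """Stand-in for model logits: left alone, it would rather chat than emit JSON."""
    return {"Sure! ": 5.0, "bedroom": 2.0, "all": 1.0}.get(token, 0.0)


def constrained_greedy_decode(
    allowed: Sequence[str],
    vocab: Sequence[str],
    score: Callable[[str, str], float],
) -> str:
    out = ""
    while out not in allowed:
        # Mask: keep only tokens that leave out + token a prefix of some allowed output.
        legal = [t for t in vocab if t and any(a.startswith(out + t) for a in allowed)]
        if not legal:
            raise RuntimeError("no token can extend the output toward an allowed string")
        # Greedy step: highest-scoring token among the legal ones.
        out += max(legal, key=lambda t: score(out, t))
    return out


if __name__ == "__main__":
    print(constrained_greedy_decode(ALLOWED, VOCAB, toy_score))
```

Even though the scorer ranks "Sure! " highest, it never gets emitted; the scores only decide between legal continuations (here "bedroom" over "all"). That's also why the output is guaranteed valid but not guaranteed sensible.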

https://pypi.org/project/languagemodels/ can load some small models, but forming JSON reliably seems to require a larger-ish model (or fine-tuning).

Aside: I expect Apple will do exactly what you're proposing, and that's why they're exposing more APIs for system apps.

Not really. You can use small models for tasks like text classification (traditional NLP), and those run on pretty much anything. We're talking about BERT-like models, DistilBERT for example.

Now, models that have "reasoning" as an emergent property... I haven't seen anything under 3B that's capable of doing anything useful. The smallest I've seen is LiteLlama, and while it's not 100% useless, it's really just an experiment.

Also, everything requires new and/or expensive hardware. For a GPU you're looking at about 1k€ minimum for something decent for running models. CPU inference is way slower, and forget about anything without AVX (preferably AVX2).
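If you're not sure whether an old machine has AVX/AVX2, a minimal sketch for x86 Linux is to read the "flags" lines from /proc/cpuinfo. This assumes Linux on x86 (ARM kernels list "Features" instead, and other OSes expose CPU features differently).

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the feature flags listed on the 'flags' lines of /proc/cpuinfo text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        flags = cpu_flags(path.read_text())
        for isa in ("avx", "avx2"):
            print(f"{isa}: {'yes' if isa in flags else 'no'}")
    else:
        print("/proc/cpuinfo not found (not Linux?)")
```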

I try models on my old ThinkPad X260 with 8 GB of RAM, which is perfectly capable for developing stuff and for running the small task-oriented models I mentioned. But even though I've tried everything under the sun, with quantization etc., it's safe to say that right now you can only run decent LLMs at a decent inference speed on expensive hardware.
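For anyone wondering what quantization actually does to the weights: a simplified, illustrative sketch of block-wise "absmax" 4-bit quantization. Each block of weights shares one float scale, and each weight is stored as a small signed integer. This is a generic version of the idea, not the exact layout of llama.cpp's formats or any other library's.

```python
def quantize_block(weights: list[float]) -> tuple[float, list[int]]:
    """Map a block of floats to one float scale plus signed 4-bit ints in [-7, 7]."""
    absmax = max((abs(w) for w in weights), default=0.0)
    scale = absmax / 7 if absmax > 0 else 1.0
    # Clamp guards against float rounding pushing a value just past +/-7.
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return scale, q


def dequantize_block(scale: float, q: list[int]) -> list[float]:
    """Recover approximate weights; rounding error is about scale / 2 at most."""
    return [scale * v for v in q]


def quantize(weights: list[float], block_size: int = 32) -> list[tuple[float, list[int]]]:
    """Split the weights into blocks and quantize each block independently."""
    return [
        quantize_block(weights[i : i + block_size])
        for i in range(0, len(weights), block_size)
    ]


if __name__ == "__main__":
    w = [0.7, -0.35, 0.02, 0.1, -0.6]
    scale, q = quantize_block(w)
    print(scale, q, dequantize_block(scale, q))
```

Storage drops from 16 bits per weight to 4 bits plus one shared scale per block, at the cost of precision. That's how a model squeezes into 8 GB, but it doesn't make CPU inference fast.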

Now, if you want tasks like language detection, classifying text into categories, or very basic question answering, go to Hugging Face and try for yourself; you'll be able to run most models on modest hardware.

In fact, I have a website (https://github.com/iagovar/cometocoruna/tree/main) where I'm using a small Flask server in my data pipeline to extract event information from text blobs I get by scraping sites. That runs every day on an old Atom laptop with 4 GB of RAM that I use as a server.

Experts in the field say that might change (somewhat) with Mamba models, but I can't really say more.

I've been toying with the idea of spending some money on hardware, but I'm 36, unemployed, and only got into coding about 1.5 years ago, so until I secure some income I don't want to hit my savings hard. This is not the US, where I could land a job easily (I'm a junior looking for a job, in case someone here needs one).

Speaking of which, I imagine Alexa, Siri, etc. should now be replaced by LLMs? Or were they already implemented using LLMs?

Exactly this. I have yet to run into a "small" model of good enough (GPT-3) quality.