by andy99 878 days ago
There are "smaller" models, for example TinyLlama 1.1B ("tiny" seems like an exaggeration). Phi-2 is 2.7B parameters. I can't name a 500M-parameter model, but there is probably one.

The problem is that they are all still broadly trained, so they end up being a jack of all trades, master of none. You'd have to fine-tune them if you want them to be good at some narrow task, and other than code completion I don't know that anyone has done that.

If you want to generate JSON or other structured output, there is Outlines (https://github.com/outlines-dev/outlines), which constrains the output to match a regex. That guarantees, e.g., that the model will generate a syntactically valid API call. It could still be nonsense if the model doesn't understand the task; the output will just match the regex. There are other similar tools around. I believe llama.cpp also has something built in that will constrain the output to some grammar.
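
To make the regex-constraint idea concrete, here is a rough toy sketch of the general technique. It is not Outlines' API or llama.cpp's grammar feature. Everything in it is made up for illustration: the `get(<digits>)` call format, the hand-built DFA standing in for a compiled regex, and the "model", which is just a fixed preference order over characters. At each step, the tokens the pattern doesn't allow are masked out, and the best-scoring remaining one is taken.

```python
import re

# Hypothetical target format: an API call such as get(42).
PATTERN = r"get\([0-9]+\)"
DIGITS = "0123456789"

# Hand-built DFA equivalent to PATTERN: state -> {allowed char: next state}.
# A real tool would compile this from the regex; here it is written by hand.
DFA = {
    0: {"g": 1},
    1: {"e": 2},
    2: {"t": 3},
    3: {"(": 4},
    4: {d: 5 for d in DIGITS},
    5: {**{d: 5 for d in DIGITS}, ")": 6},
    6: {},  # accepting state, no further characters allowed
}
ACCEPTING = {6}

# Toy stand-in for a model: a fixed preference order over a character vocabulary.
# It knows nothing about the format; unconstrained, it would emit "a" forever.
PREFERENCE = 'abc {}")7301245689(teg'


def toy_scores(prefix):
    """Score every vocabulary character; earlier in PREFERENCE scores higher."""
    return {ch: len(PREFERENCE) - i for i, ch in enumerate(PREFERENCE)}


def constrained_generate(max_len=32):
    state, out = 0, ""
    while state not in ACCEPTING and len(out) < max_len:
        scores = toy_scores(out)
        allowed = DFA[state]
        # Mask: consider only characters the DFA permits, then pick the best one.
        best = max(allowed, key=lambda ch: scores[ch])
        out += best
        state = allowed[best]
    return out


result = constrained_generate()
print(result)                               # get(7)
print(bool(re.fullmatch(PATTERN, result)))  # True
```

The output is always well-formed, but the `7` is arbitrary: the constraint fixes the shape, not the meaning, which is the "could still be nonsense" caveat above.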

1 comment

https://pypi.org/project/languagemodels/ can load some small models, but forming JSON reliably seems to require a larger-ish model (or fine-tuning).

Aside: I expect Apple will do exactly what you're proposing and that's why they're exposing more APIs for system apps