|
Well, this is a 270M model, which is like 1/3 of 1B parameters. In the grand scheme of things, it's basically a few matrix multiplications and barely anything more. I don't think it's meant to have a lot of knowledge, grammar, or even coherence. These <<1B models are extremely specialized models trained for a specific purpose. They are optimized for tasks like this one (among others):

input:
```
Customer Review says: ai bought your prod-duct and I wanna return becaus it no good. Prompt: Create a JSON object that extracts information about this customer review based on the schema given.
```

output:
```
{
  "type": "review",
  "class": "complaint",
  "sentiment": -0.853,
  "request": "return"
}
```

So essentially it's just "making sense of" natural language so that it can be used in a programmatic context (among other applications, of course). To get good results, you probably need to fine-tune this model very aggressively on the data you expect. The idea is: if a 270MB model can do the job with fine-tuning, why ship a 32GB generalist model? |
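For what it's worth, "used in a programmatic context" usually means the calling code checks the model's reply before trusting it. Here is a minimal Python sketch of that check for the example output above. The function name `parse_review_extraction`, the allowed values in `ALLOWED_CLASSES` and `ALLOWED_REQUESTS`, and the `[-1, 1]` sentiment range are all assumptions made for illustration; the comment above does not specify the actual schema.

```python
import json

# Hypothetical schema, guessed from the example output; not an official spec.
ALLOWED_CLASSES = ("complaint", "praise", "question")
ALLOWED_REQUESTS = ("return", "refund", "exchange", None)


def parse_review_extraction(raw: str) -> dict:
    """Parse the model's reply and reject anything that does not fit the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if data.get("type") != "review":
        raise ValueError("unexpected 'type' field")
    if data.get("class") not in ALLOWED_CLASSES:
        raise ValueError("unexpected 'class' field")

    sentiment = data.get("sentiment")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(sentiment, bool) or not isinstance(sentiment, (int, float)):
        raise ValueError("'sentiment' must be a number")
    if not -1.0 <= sentiment <= 1.0:  # assumed range
        raise ValueError("'sentiment' out of range")

    if data.get("request") not in ALLOWED_REQUESTS:
        raise ValueError("unexpected 'request' field")
    return data
```

A small model will sometimes emit malformed or off-schema output, so a caller would typically catch the `ValueError` and retry or fall back, rather than pass the raw text downstream.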
Did you ask Gemma-3-270M whether 27 is closer to a quarter or a third of 100?