by occamrazor 820 days ago
Note that the model is based on RoBERTa and has only 125M parameters. It is not competing against any of the new popular models, not even small ones like Phi or Gemma.
2 comments

It’s also not meant to be a generative model, only an encoder model (they list retrieval as a potential use case).
Given the current state of LLMs, I am not even sure this qualifies to be called an LLM.
Second opinion: the BERT family is transformer-based, and that is a big threshold right there. Secondly, I am not sure that two one-minute comments can capture what exactly went on with fine-tuning, graph-based constraint methods, or whatnot, with respect to whether the production tools are fit for their intended purposes.