Hacker News
by miven 900 days ago
What do they consider to be an "LLM of this size"?

While this technique of scaling up an existing pre-trained model via fine-tuning is really impressive, it feels a bit unfair to compare what's essentially now an 8.3B model to mostly standard 7B ones, especially considering how important scale is in predicting LLM performance.