by miven
900 days ago
What do they consider to be an "LLM of this size"? While this technique of scaling up an existing pre-trained model via fine-tuning is really impressive, it feels a bit unfair to compare what's essentially now an 8.3B model to mostly standard 7B ones, especially considering how important scale is in predicting LLM performance.