by luke-stanley 336 days ago
When I read "from scratch", I assume they are doing pre-training, not just fine-tuning. Do you have a different take? Do you mean they're using the standard Llama architecture? I'm curious about the benchmarks!