by gac3 207 days ago
Was this trained on the same data as Dia 1?
1 comment

Would be interesting to know which improvements come from the architecture, which from the data, and which from the different tokenizer.