by wongarsu 1140 days ago
Sure, a 13B model can be fine-tuned to be pretty decent, which is quite remarkable compared to GPT-3's 175B parameters. But a 3B model has roughly a quarter as many parameters as Vicuna-13B, or about twice as many as GPT-2. Can you really fine-tune that to do anything useful that wouldn't be better handled by a more specialized open-source model?