by turmeric_root 1199 days ago
Since making that comment, I managed to get 65B running on a single A100 80GB using 8-bit quantization, though I did need ~130GB of regular RAM on top of that.
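For anyone wondering why 8-bit makes this fit, here is a rough back-of-the-envelope sketch. It counts the weights only (no activations, KV cache, or framework overhead), and the 65e9 parameter count is a nominal assumption, as is the helper name `weight_memory_gb`. The fp16 figure lands near the RAM number above, but whether that is actually what the RAM was used for is a guess.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Nominal parameter count (assumption); the real checkpoint may differ slightly.
N_PARAMS_65B = 65e9

fp16_gb = weight_memory_gb(N_PARAMS_65B, 16)  # 2 bytes per weight
int8_gb = weight_memory_gb(N_PARAMS_65B, 8)   # 1 byte per weight

print(f"fp16 weights: ~{fp16_gb:.0f} GB")
print(f"int8 weights: ~{int8_gb:.0f} GB")
print("int8 fits in 80 GB:", int8_gb < 80)
```

So at 8 bits the weights alone come in under the 80GB on the card, while at 16 bits they would not.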

So is the model any good?
It seems to be about as good as gpt3-davinci. I've had it generate React components and write crappy poetry about arbitrary topics. As expected, though, it's not very good at instruction-style prompts, since it hasn't been instruction-tuned.

People are also working on adding extra samplers to FB's inference code. I think a repetition penalty sampler will significantly improve quality.
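To make "repetition penalty" concrete, here is a minimal sketch of one common formulation: scale down the logits of tokens that have already been generated, then sample as usual. This is not FB's inference code; the function names, the toy logits, and the penalty value are all made up for illustration.

```python
import math
import random


def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a copy of `logits` with already-generated tokens made less likely."""
    out = list(logits)
    for tok in set(generated_ids):
        # Dividing a positive logit or multiplying a negative one
        # both push the score down when penalty > 1.
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out


def sample(logits, temperature=1.0, rng=random):
    """Sample a token id from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Toy 4-token vocabulary (hypothetical values).
logits = [2.0, 1.0, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 2, 0], penalty=2.0)
next_id = sample(penalized)
```

With a penalty above 1, tokens 0 and 2 become less likely to be picked again, which is the effect that should cut down on the model looping on the same phrase.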

The 7B model is also fun to play with. I've had it generate YouTube transcriptions for fictional videos, and it generally stays on-topic.