by throwthrowuknow 643 days ago
Not going to work for training from scratch, which is what the author is doing.

192 GB of RAM is not enough to train a 405B model. Reflection 70B requires 140 GB of RAM in fp16 for the weights alone (2 bytes per parameter), so a 405B model would need ~810 GB.
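Back-of-the-envelope version of that math, as a rough sketch. It counts weights only (parameters times bytes per parameter) and ignores gradients, optimizer state and activations, so actual training would need a good deal more. The function name and the dtype table are made up here for illustration.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(params_billions, dtype="fp16"):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    The 1e9 from "billions of parameters" cancels the 1e9 bytes per GB,
    so the result is simply params_billions * bytes_per_param.
    """
    return params_billions * BYTES_PER_PARAM[dtype]

for size in (70, 405):
    print(f"{size}B in fp16: ~{weight_memory_gb(size)} GB")
```

Both fp16 figures quoted above fall out of this, and the 405B one is well over 192 GB before any training overhead is counted.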
Pretty sure he said he’s running inference on Llama 3 405B and training his own custom model from scratch. He didn’t say how big his custom model will be.