Hacker News new | ask | show | jobs
by krasin 1166 days ago
Thank you for open sourcing these models!

I mentioned that the sizes of the models are relatively small (13B max). Is it an inherent limitation, or training a bigger model is possible, just has not been done in this exercise?

1 comments

Someone else can answer this better than I, so I'll probably end up deleting this in an hour or two. But I think the purpose of this research was not to create an excellent GPT model. I believe it was to explore the scaling effects on Cerebras hardware and determine a helpful framework for compute-optimal training regimes so that customers who might use Cerebras hardware can be confident that:

1) Standard AI/ML scaling assumptions still apply on this hardware.

2) They have a starting point for hyper-parameter estimation and can get better results sooner.

> But I think the purpose of this research was not to create an excellent GPT model.

Yes, understood. I feel that this phrase is a response to the other commenter that suggested that Cerebras should release a ChatGPT-competitive model. I don't think it's easy and I don't think it's a focus for a hardware maker, such as Cerebras.

> I believe it was to explore the scaling effects on Cerebras hardware and determine a helpful framework for compute-optimal training regimes so that customers who might use Cerebras hardware can be confident that:

> 1) Standard AI/ML scaling assumptions still apply on this hardware.

This is my point. Is it possible to train a 100B model on Cerebras hardware? 500B? In this respect, the quality is secondary to the capability for the purpose of demonstration of capabilities.