by runnerup
1166 days ago
Someone else can answer this better than I can, so I'll probably end up deleting this in an hour or two. But I think the purpose of this research was not to create an excellent GPT model. I believe it was to explore scaling effects on Cerebras hardware and to establish a useful framework for compute-optimal training regimes, so that customers who might use Cerebras hardware can be confident that: 1) Standard AI/ML scaling assumptions still apply on this hardware. 2) They have a starting point for hyperparameter estimation and can get better results sooner.
Yes, understood. I read that phrase as a response to the other commenter who suggested that Cerebras should release a ChatGPT-competitive model. I don't think that's easy, and I don't think it's a focus for a hardware maker such as Cerebras.
> I believe it was to explore the scaling effects on Cerebras hardware and determine a helpful framework for compute-optimal training regimes so that customers who might use Cerebras hardware can be confident that:
> 1) Standard AI/ML scaling assumptions still apply on this hardware.
This is my point. Is it possible to train a 100B-parameter model on Cerebras hardware? A 500B one? When the goal is to demonstrate capabilities, the quality of the resulting model is secondary to showing that it can be trained at all.