Hacker News
by cs-fan-101 1176 days ago
Someone reposted this from the Cerebras Discord earlier, but I'm sharing it again for visibility:

"We chose to train these models to 20 tokens per param to fit a scaling law to the Pile data set. These models are optimal for a fixed compute budget, not necessarily "best for use". If you had a fixed parameter budget (e.g., because you wanted to fit models on certain hardware) you would train on more tokens. We do that for our customers that seek that performance and want to get LLaMA-like quality with a commercial license"

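To make the trade-off in the quote concrete, here is a rough sketch of the arithmetic. The 20 tokens per parameter figure comes from the quote. Everything else is an assumption for illustration: the common rule of thumb that training costs roughly 6 FLOPs per parameter per token, the 1B-parameter model size, the 5x token multiplier, and the function names.

```python
# Rough sketch, not Cerebras's actual method.
TOKENS_PER_PARAM = 20  # ratio stated in the quote


def compute_optimal_tokens(n_params, tokens_per_param=TOKENS_PER_PARAM):
    """Training tokens for a model at the quoted tokens-per-parameter ratio."""
    return n_params * tokens_per_param


def training_flops(n_params, n_tokens):
    """Approximate training compute.

    Assumption: ~6 FLOPs per parameter per token (a widely used
    rule of thumb, not something stated in the quote).
    """
    return 6 * n_params * n_tokens


# Hypothetical 1B-parameter model.
n_params = 1_000_000_000

# Fixed compute budget: stop at 20 tokens per parameter.
opt_tokens = compute_optimal_tokens(n_params)
opt_flops = training_flops(n_params, opt_tokens)

# Fixed parameter budget: same model, more tokens (5x is an arbitrary example).
over_tokens = 5 * opt_tokens
over_flops = training_flops(n_params, over_tokens)

print(f"compute-optimal: {opt_tokens:.2e} tokens, {opt_flops:.2e} FLOPs")
print(f"trained longer:  {over_tokens:.2e} tokens, {over_flops:.2e} FLOPs")
```

Under these assumptions the model size stays the same while compute grows linearly with the extra tokens, which is the "fixed parameter budget" case the quote describes.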
2 comments

Sounds like we should crowdfund the cost to train and open-source one of these models with LLaMA-like quality.

I'd chip in!

TBH that seems like a good job for Cerebras.

There are plenty of such efforts, but the organizer needs some kind of standing to attract a critical mass, and an AI ASIC designer seems like a good candidate.

Then again, maybe they prefer a bunch of privately trained models over an open one since that sells more ASIC time?

> Cerebras Discord

This is really weird to hear out loud.

I still think of Discord as a niche gaming chatroom, even though I know that (for instance) a wafer-scale IC design company is hosting a Discord now.