| HN Mirror



	by viraptor 318 days ago
	Do we really have the data on this? I mean, it does happen on a smaller scale, but where's the 300B version of RWKV? Where's hybrid symbolic/LLM? Where are the other experiments? I only see larger companies making relatively small tweaks to standard transformers, where memory use still explodes with context size. They're not even addressing that part.

1 comment

hodgehog11 317 days ago

True, we can't say for certain. But there is a lot of theoretical evidence too, as the leading theoretical models for neural scaling laws suggest finer properties of the architecture class play a very limited role in the exponent.

We know that transformers have the smallest constant in the neural scaling laws, so it seems irresponsible to scale another architecture class to extreme parameter sizes without a very good reason.
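
A rough sketch of what the two claims above amount to, assuming the usual power-law form for loss versus parameter count. The symbols C, \alpha, and L_\infty are illustrative assumptions, not taken from any specific paper:

```latex
% Assumed form: loss as a function of parameter count N
L(N) \approx C\, N^{-\alpha} + L_\infty

% Two architecture classes with the same exponent \alpha but constants C_1 < C_2
% reach the same reducible loss when
C_1 N_1^{-\alpha} = C_2 N_2^{-\alpha}
\quad\Longrightarrow\quad
\frac{N_2}{N_1} = \left(\frac{C_2}{C_1}\right)^{1/\alpha}
```

Under that assumption, the class with the larger constant needs the same multiplicative factor more parameters at every scale, so it does not catch up just by being scaled further.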
