by lhl 1149 days ago
While I agree we probably aren't getting exponentially increasing parameter counts (GPT-4 is by all accounts 1T parameters and, of course, it is significantly better than GPT-3), we are still seeing lots of improvements: GPT-3.5 is much better than GPT-3, based "just" on InstructGPT/RLHF training. Models are getting more parameter-efficient as well: LLaMA 30B matches or beats GPT-3 (175B) on raw eval benchmarks at about 1/6 the parameter count.

We're also seeing lots of optimizations in new models (RoPE/RoPER embeddings, Swish/GeLU activations, Flash Attention, etc.), but I think some of the most interesting gains we'll be seeing soon will come from inference-optimized training (-70% parameters for +100% compute) [1] combined with sparsity pruning (-50% size w/ almost no loss in accuracy) [2] and quantization [3], which together should lead to significantly smaller models that still perform well.
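To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the three reductions stack multiplicatively, that pruned weights cost nothing to store (no sparse-index overhead), a 16-bit baseline, and 4-bit quantization. None of those assumptions come from the linked posts (the bit widths in particular are my own picks), and the function name is made up for illustration.

```python
def estimate_footprint_gb(params_billions, param_cut=0.70, sparsity=0.50,
                          quant_bits=4, baseline_bits=16):
    """Rough weight-storage estimate in GB.

    Assumes (hypothetically) that the parameter cut, pruning, and
    quantization compose multiplicatively with no storage overhead.
    """
    # 1e9 params * (bits / 8) bytes per param = that many GB
    baseline_gb = params_billions * baseline_bits / 8
    # Inference-optimized training: fewer parameters, more training compute
    remaining_params = params_billions * (1 - param_cut)
    # Sparsity pruning: only the nonzero weights are counted
    nonzero_params = remaining_params * (1 - sparsity)
    # Quantization: fewer bits per remaining weight (4-bit is an assumption)
    compressed_gb = nonzero_params * quant_bits / 8
    return baseline_gb, compressed_gb


baseline, compressed = estimate_footprint_gb(30)
print(f"{baseline:.1f} GB -> {compressed:.2f} GB "
      f"({baseline / compressed:.0f}x smaller)")
```

Under those assumptions, a 30B-parameter model's weights would go from about 60 GB to about 2.25 GB. Real savings would be lower once sparse-format overhead and any accuracy trade-offs are accounted for.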

[1] https://www.harmdevries.com/post/model-size-vs-compute-overh...

[2] https://arxiv.org/abs/2301.00774

[3] https://openreview.net/forum?id=tcbBPnfwxS