Hacker News
by ZunarJ5 1064 days ago
https://the-decoder.com/gpt-4-architecture-datasets-costs-an...

Not OP, but this is where a cheeky Google got me.

1 comment

"The idea is nearly 30 years old and has been used for large language models before, such as Google's Switch Transformer."

Innovation! :)