| LLMs are made up of just three elements: data, compute, and algorithms. All three are just scratching the surface of what is possible. Data: What has been scraped off the internet is just <0.001% of human knowledge, since most platforms cannot be scraped easily and much of that knowledge is in non-text formats like video and audio, or sits on plain old paper that has never been digitized. There are also probably techniques to increase data through synthetic means, which is purportedly OpenAI's secret sauce behind GPT-4's quality. Compute: While 3nm processes are approaching an atomic limit (0.21nm for Si), there is still room to explore more densely packed transistors, other materials like gallium nitride, or optical computing. Beyond that, there is a lot of room in hardware architecture for more parallelism and 3-D stacked transistors. Algorithms: The Transformer and other attention mechanisms have several sub-optimal components, such as how arbitrary many of the Transformer's design decisions are and the quadratic time complexity of attention in sequence length. There also seems to be a large space of LLM augmentations, like RLHF for instruction following, improvements in factuality, and other mechanisms. And these ideas are just from my own limited experience. So I think it's fair to say that LLMs have plenty of room to improve. |
> Data
> Compute
> Algorithms
Not to be facetious, but so is all other software. LLMs appear to scale with the first two, but it's not clear what that relationship is, and that's the basis of the question being asked.