| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by eternalban 1184 days ago

You are working under an assumption that this tech is an O(n) or better computational regime.

Ask ChatGPT: “Assume the perspective of an expert in CS and Deep Learning. What are the scaling characteristic (use LLMs and Transformer models if you need to be specific) of deep learning ? Expect answer in terms of Big O notation. Tabulate results in two rows, respectively “training” and “inference”. For columns, provide scaling characteristic for CPU, IO, Network, Disk Space, and time. ”

This should get you big Os for n being the size of input (i.e. context size). You can then ask for follow up with n being the model size.

Spoiler, the best scaling number in that entire estimate set is quadratic. Be “scared” when a breakthrough in model architecture and pipeline gets near linear.