Hacker News
by andy99 1035 days ago
I have a developing idea that AI can be thought of as "banking" (or sometimes laundering) human labelling. Neural networks don't work at all on things they haven't seen before (out-of-distribution) but can nicely interpolate within what they have seen. Concretely, when ChatGPT gives a seemingly clever answer, a data labeller overseas somewhere has already manually given a similar answer to a similar question (similar in the eyes of the model). Depending on the expected distribution of your data, this can work really well for automation. With long-tailed data, it either becomes a kind of whack-a-mole or (as with modern LLMs) you just go really big. I haven't seen anything suggesting we're actually able to escape the long tail, though: in the margins NNs will always fail, and the only way to improve them is to launder more manpower.

There's an old (2020) a16z article that's still relevant, "Taming the Tail": https://a16z.com/2020/08/12/taming-the-tail-adventures-in-im...

2 comments

The thing I find most interesting about LLMs is the emergent skills that show up almost instantly (depending on model size) and that the model wasn't explicitly trained for.

Picking movie names from emojis seems like something other than interpolation.

https://www.assemblyai.com/blog/emergent-abilities-of-large-....

Not really, IMO. At scale the model learns that the rainbow emoji is literally a synonym for “rainbow”. Computers are great at memorizing giant lists of facts such as this. It feels way more amazing for an LLM to guess The Care Bears Movie from the rainbow emoji and the bear emoji, but it’s literally the same as asking the LLM, in words, to guess a movie that involves rainbows and bears.

It’s just an embedding space buried within the bowels of the LLM.
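
To make the "emoji is a synonym" point concrete, here's a toy sketch. Everything in it is an assumption for illustration: the 3-d vectors are hand-written (not taken from any real model), and `token_vecs`, `movies`, and `guess_movie` are hypothetical names. It only shows that if the emoji and the word sit at nearly the same point in embedding space, the nearest-neighbour guess comes out the same either way.

```python
import numpy as np

# Hypothetical hand-written embeddings, NOT from any real model.
# Premise being illustrated: an emoji lands almost on top of its word.
token_vecs = {
    "rainbow": np.array([1.00, 0.00, 0.00]),
    "🌈": np.array([0.98, 0.05, 0.00]),
    "bear": np.array([0.00, 1.00, 0.00]),
    "🐻": np.array([0.03, 0.99, 0.00]),
    "shark": np.array([0.00, 0.00, 1.00]),
}

# Hypothetical movie vectors: the mean of the words that describe each movie.
movies = {
    "The Care Bears Movie": (token_vecs["rainbow"] + token_vecs["bear"]) / 2,
    "Jaws": token_vecs["shark"],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def guess_movie(tokens):
    """Average the token vectors and return the most similar movie title."""
    query = np.mean([token_vecs[t] for t in tokens], axis=0)
    return max(movies, key=lambda title: cosine(query, movies[title]))

print(guess_movie(["🌈", "🐻"]))
print(guess_movie(["rainbow", "bear"]))
```

Both calls pick the same title, because the two queries are nearly the same vector. Whether real LLMs actually handle emoji this way is the claim under debate, not something this toy demonstrates.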

It’s inherent to their architecture. Feedforward deep neural networks (at least with ReLU-style activations) are effectively massive piecewise linear functions, and past the outermost kink such a function is just a straight line. Unless I’m missing something, it’s literally impossible for any such NN, no matter how large it is or how much data it’s trained on, to give accurate predictions outside the bounds of its training data for even a ridiculously simple function like x^2.
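
A minimal sketch of the piecewise-linear argument, under some loud assumptions: the network is a one-hidden-layer ReLU net whose weights are set by hand (not trained) so that it exactly interpolates x^2 at 41 knots on [-2, 2], which stands in for a hypothetical "training range". The names `knots`, `w_hidden`, `w_out`, and `net` are made up for this example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical "training range": 41 evenly spaced knots on [-2, 2].
knots = np.linspace(-2.0, 2.0, 41)
targets = knots ** 2
slopes = np.diff(targets) / np.diff(knots)  # slope of each linear piece

# Hidden layer: one unit relu(k0 - x), then units relu(x - k_i) for i = 0..n-1.
w_hidden = np.concatenate(([-1.0], np.ones(len(slopes))))
b_hidden = np.concatenate(([knots[0]], -knots[:-1]))

# Output layer: the first two units together give the line s0 * (x - k0);
# each later unit adds the change in slope at its knot.
w_out = np.concatenate(([-slopes[0]], [slopes[0]], np.diff(slopes)))
b_out = targets[0]

def net(x):
    """Hand-built ReLU network that linearly interpolates x^2 between the knots."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    hidden = relu(np.outer(x, w_hidden) + b_hidden)
    return hidden @ w_out + b_out

inside = np.linspace(-2.0, 2.0, 1001)
print("max error inside [-2, 2]:", np.max(np.abs(net(inside) - inside ** 2)))
print("net(10) =", net(10.0)[0], "vs true value", 10.0 ** 2)
```

Inside the range the fit is close (error at most about 0.0025 with this knot spacing). At x = 10 every hidden unit on the right is active, so the output just continues along the last segment's slope of 3.9 and lands at about 35.2 instead of 100. A trained network would have different weights, but with ReLU activations it has the same straight-line behaviour beyond its last kink.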