by Retric 1134 days ago
LLMs do use human “insight” into language, in that they require tokenized inputs and outputs.

It’s one of those insights that seems obvious after the fact but really wasn’t.


That could count, I suppose, but I don't think that's really the kind of insight Sutton is alluding to in his original writing. Insight in this case would be more like shoehorning in one of the processes humans would use to solve the problem. There are no innate grammar rules the architecture consults before each attempt, no tree or word search. Things like that.

Polishing the input in that way is neat, but it's not like you can't go character or word level for a transformer. The current way is just far more compute efficient, but the Transformer will figure out the sequence-to-sequence mapping all the same.
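A rough sketch of the compute point, assuming plain whitespace splitting as a stand-in for a real tokenizer (actual LLM tokenizers use learned subword vocabularies) and assuming self-attention cost grows roughly with the square of sequence length. The function names are made up for illustration.

```python
def char_tokens(text):
    # Character-level: every character is its own token.
    return list(text)

def word_tokens(text):
    # Word-level stand-in: split on whitespace (an assumption, not a real LLM tokenizer).
    return text.split()

def attention_pairs(seq_len):
    # Full self-attention compares every position with every position.
    return seq_len * seq_len

text = "the transformer will figure out the mapping all the same"
chars = char_tokens(text)
words = word_tokens(text)

print("char-level length:", len(chars), "attention pairs:", attention_pairs(len(chars)))
print("word-level length:", len(words), "attention pairs:", attention_pairs(len(words)))
```

Either sequence carries the same text. The character-level one is just several times longer, so each layer does far more work for the same sentence.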

It doesn’t just polish the input. Tokenizing the output also significantly reduces the risk of gibberish, especially if you do a grammar pass to ensure tense matches, etc. It means a model with a much worse understanding of the language can perform better than something operating on raw characters.

Fair, I didn't mean to dismiss the impact of tokenization as such.

But tokenization is still a process that's figured out by another DL model. Human "insight" doesn't produce the tokenization itself. Another model trained on [insert language(s)] text figures out how best to break sentences into token parts.
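To illustrate token boundaries being learned from text rather than written by hand, here is a toy byte-pair-encoding sketch. BPE is one common subword scheme, and it is a frequency-based merge procedure rather than a deep model. The corpus, the function names, and the merge count are all made up for illustration, and this is not any particular library's implementation.

```python
from collections import Counter

def merge_pair(symbols, pair):
    # Replace each adjacent occurrence of `pair` with one merged symbol.
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe_merges(corpus, num_merges):
    # Each word starts as a tuple of single characters, weighted by frequency.
    vocab = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Greedily merge the most frequent adjacent pair.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges

def tokenize(word, merges):
    # Apply the learned merges in the order they were learned.
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

corpus = "low low low lower lowest new newer newest"  # toy corpus (assumption)
merges = learn_bpe_merges(corpus, num_merges=4)
print(merges)
print(tokenize("lowest", merges))
```

Nobody tells it that "low" is a unit. That falls out of the pair counts, which is the sense in which the split is learned from the text.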

That said, these things are a spectrum. I don't think "no tips from biology whatsoever" or "no constraints at all" is really what Sutton had in mind. The less of it the better is the general idea.

Good point. I find it really reminiscent of how AlphaZero ignored essentially all human knowledge about chess play, but still depended on insights into chess AI / search algorithms.

I think of deep neural networks as replicating long-term memory/reflex rather than thought. I don’t know if that’s quite it, but they excel at a lot of very difficult AI problems when paired with just a tiny bit of handholding. Some of that might go away with even more compute, but I think approaching AGI is going to take more than just more compute.