Hacker News
by jawon 746 days ago
Can anyone point me to models that look like they might actually be useful in moving towards AGI? I feel like I have a basic understanding of the transformer architecture, and multiplying X tokens in a sliding window across a set of static matrices to produce 1 new token does not look like a path to AGI.
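
To make that mental model concrete, here is a minimal toy sketch of the loop as described: a fixed window of tokens, weights that never change during generation, and one new token per step. Everything in it is an assumption for illustration (the sizes, the random weights, and especially the averaging step, which is only a crude stand-in for attention); it is not how any real model is implemented.

```python
import random

VOCAB = 8    # toy vocabulary size (assumption)
DIM = 4      # toy embedding width (assumption)
WINDOW = 3   # context window length (assumption)

rng = random.Random(0)
# Static weights: fixed up front and never updated while generating.
EMBED = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]
UNEMBED = [[rng.uniform(-1, 1) for _ in range(VOCAB)] for _ in range(DIM)]


def next_token(context):
    """Map the last WINDOW tokens to one new token id using only static weights."""
    window = context[-WINDOW:]
    # Crude stand-in for attention: average the embeddings in the window.
    pooled = [sum(EMBED[t][d] for t in window) / len(window) for d in range(DIM)]
    # Project back to vocabulary scores and pick the highest-scoring token.
    logits = [sum(pooled[d] * UNEMBED[d][v] for d in range(DIM)) for v in range(VOCAB)]
    return max(range(VOCAB), key=lambda v: logits[v])


def generate(prompt, steps):
    """Autoregressive loop: the only state carried between steps is the token list."""
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(next_token(tokens))
    return tokens
```

Calling generate([1, 2, 3], 5) returns the prompt plus five new token ids. Note that anything that has slid out of the window has no influence on the next token, which is the limitation I'm asking about.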

Yes, the complex feature extraction is impressive. But are there any models that, I don't know, are more dynamic? Have a working memory? Have a less limited execution path?

6 comments

The answer is simple: AI systems aren’t just one technique, agent, or even agency; any somewhat anthropomorphic ones will be ensemblematic on an extensive and fundamentally recursive level. LLMs are a groundbreaking technique that solves the “Frame Problem” by emulating human subconscious generative networks.

To paraphrase an old comment on here: the problem isn’t a chatbot gaining sapience inside a browser window, the problem is when billions of dollars are allocated to a self-administering ensemble of 10,000 GPT agents, each specialized for some task (aka functions). That, plus Wikipedia, Cyc, WolframAlpha, YouTube, and Google Books at its fingertips.

“General” doesn’t even begin to cover what we’re already capable of, IMO.

See: Marvin Minsky, 1991; https://ojs.aaai.org/aimagazine/index.php/aimagazine/article...

I look at all the advice on prompts and I don't feel like the "Frame Problem" has been solved. It feels like it has shifted into the "Frame Invocation Problem".

And it is this very problem which led me to ask my question about different architectures.

Well said. I’d say the new “problem” is using the word differently though, namely to denote “optimization environment” rather than the original’s sense of “unsolved paradox”.

Current models can already be argued to have something like working memory, storing information in little-used parts of the tokens. If they are handed placeholder tokens that they can use as working memory, performance improves.

https://openreview.net/forum?id=2dnO3LLiJ1

https://news.ycombinator.com/item?id=40329675
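
A rough way to picture the placeholder-token idea, as a sketch only: extra slots are appended to the input so the model has somewhere to park intermediate information, and whatever comes out at those slots is thrown away afterwards. The token name "<reg>" and both helper functions are hypothetical and are not taken from the linked paper.

```python
REGISTER = "<reg>"  # hypothetical placeholder token name


def with_registers(tokens, n_registers=4):
    """Append n_registers placeholder slots the model may use as scratch space."""
    return list(tokens) + [REGISTER] * n_registers


def strip_registers(outputs, n_registers=4):
    """Drop the outputs at the placeholder positions; only the real ones are kept."""
    return list(outputs[: len(outputs) - n_registers])
```

The model would run on with_registers(tokens), and strip_registers would be applied to its per-position outputs before anything downstream looks at them.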

Look at JEPA and Modulo-LLM.

Also, AGI is a poor term to use because we as humans have no notion of what general intelligence is. Does GI have morals and ethics? Does it make decisions like we do, based on executive functioning, or does it work more like ants do?

The answer might just be scale for all we know.

There are sporadic attempts at making things more dynamic, like the Neural Turing Machine. It doesn’t seem to buy much actual power.

xLSTM has a working memory and seems to outperform transformer architectures: https://arxiv.org/abs/2405.04517

Thanks for that. It looks like the kind of thing I'm looking for. I'll give it a read.