| HN Mirror


	by jawon 793 days ago
	Can anyone point me to models that look like they might actually be useful in moving towards AGI? I feel like I have a basic understanding of the transformer architecture, and multiplying X tokens in a sliding window across a set of static matrices to produce 1 new token does not look like a path to AGI. Yes, the complex feature extraction is impressive. But are there any models that, I don't know, are more dynamic? Have a working memory? Have a less limited execution path?
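
For readers unfamiliar with the loop described above, here is a minimal sketch of it. It is a toy, not a real transformer: the sizes, the random matrices, and the averaging step that stands in for attention and MLP layers are all assumptions made for illustration. What it does show is the shape of the computation in question: fixed weights, a bounded window of recent tokens, and one new token per step.

```python
import random

VOCAB_SIZE = 16   # hypothetical toy vocabulary size
WINDOW = 4        # hypothetical context window length
DIM = 8           # hypothetical embedding width

rng = random.Random(0)
# Static parameters: fixed up front and never updated during generation.
EMBED = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB_SIZE)]
UNEMBED = [[rng.uniform(-1, 1) for _ in range(VOCAB_SIZE)] for _ in range(DIM)]


def next_token(context):
    """Map the last WINDOW tokens to one new token using only the static matrices."""
    window = context[-WINDOW:]
    # Stand-in for attention and MLP layers: average the window's embeddings.
    pooled = [sum(EMBED[t][d] for t in window) / len(window) for d in range(DIM)]
    logits = [sum(pooled[d] * UNEMBED[d][v] for d in range(DIM))
              for v in range(VOCAB_SIZE)]
    # Greedy decoding: pick the highest-scoring token.
    return max(range(VOCAB_SIZE), key=lambda v: logits[v])


def generate(prompt, steps):
    """Autoregressive loop: the token list is the only state carried forward."""
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(next_token(tokens))
    return tokens


print(generate([1, 2, 3], 5))
```

Anything older than `WINDOW` tokens has no effect on the next prediction in this sketch, which is the limitation the question is getting at.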

6 comments

bbor 793 days ago

The answer is simple: AI systems aren’t just one technique, agent, or even agency; any somewhat anthropomorphic ones will be ensemblematic on an extensive and fundamentally recursive level. LLMs are a groundbreaking technique that solves the “Frame Problem” by emulating human subconscious generative networks.

To paraphrase an old comment on here: the problem isn’t a chatbot gaining sapience inside a browser window, the problem is when billions of dollars are allocated to a self-administering ensemble of 10,000 GPT agents, each specialized for some task (aka functions). That, plus Wikipedia, Cyc, WolframAlpha, YouTube, and Google Books at its fingertips.

“General” doesn’t even begin to cover what we’re already capable of, IMO.

See: Marvin Minsky, 1991; https://ojs.aaai.org/aimagazine/index.php/aimagazine/article...

jawon 792 days ago

I look at all the advice on prompts and I don't feel like the "Frame Problem" has been solved. It feels like it has shifted into the "Frame Invocation Problem".

And it is this very problem which led me to ask my question about different architectures.

bbor 792 days ago

Well said. I’d say the new “problem” uses the word differently though, namely to denote an “optimization environment” rather than the original’s sense of an “unsolved paradox”.

samus 793 days ago

Current models can already be argued to have something like working memory, since they store information in little-used parts of the tokens. If they are handed placeholder tokens that they can use as working memory, performance improves.

https://openreview.net/forum?id=2dnO3LLiJ1

https://news.ycombinator.com/item?id=40329675
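
A rough sketch of the placeholder-token idea, assuming the extra tokens are simply appended to the input and their outputs are thrown away afterwards. The `toy_mixing_layer` function is a hypothetical stand-in for an attention layer, not code from the linked paper, and zero vectors are used for the placeholders only to keep the example small (in practice they would presumably be learned).

```python
def toy_mixing_layer(vectors):
    """Hypothetical stand-in for attention: blend each vector with the sequence mean."""
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return [[0.5 * v[d] + 0.5 * mean[d] for d in range(dim)] for v in vectors]


def run_with_placeholders(model, inputs, placeholders):
    """Append placeholder vectors, run the model, then drop the placeholder outputs."""
    extended = list(inputs) + list(placeholders)
    outputs = model(extended)
    # Only the positions belonging to the real inputs are returned.
    return outputs[:len(inputs)]


inputs = [[1.0, 0.0], [0.0, 1.0]]
placeholders = [[0.0, 0.0], [0.0, 0.0]]  # scratch slots the model is free to use
print(run_with_placeholders(toy_mixing_layer, inputs, placeholders))
```

The caller sees the same number of outputs as before, but the model had extra positions to read from and write to while computing them.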

itissid 793 days ago

Look at JEPA and LLM-Modulo.

Also, AGI is a poor term to use because we as humans have no notion of what general intelligence is. Does GI have morals and ethics? Does it make decisions like we do, based on executive functioning, or does it work more like ants do?

idiotsecant 793 days ago

The answer might just be scale for all we know.

canjobear 793 days ago

There are sporadic attempts at making things more dynamic, like the Neural Turing Machine. It doesn’t seem to buy much actual power.

daavidhauser 793 days ago

xLSTM has a working memory and seems to outperform transformer architectures: https://arxiv.org/abs/2405.04517

jawon 792 days ago

Thanks for that. It looks like the kind of thing I'm looking for. I'll give it a read.
