Hacker News
by ilaksh 1231 days ago
My guess is that multimodal transformers will probably eventually get us most of the way there for general purpose AI.

But AGI is one of those very ambiguous terms. For many people it's either an exact digital replica of human behavior that is alive, or something like a God. I think it should also apply to general purpose AI that can do most human tasks in a strictly guided way, even though it would not have the other characteristics of humans or animals. That kind, I think, can be built on advanced multimodal transformer-based architectures.

For the other stuff, it's worth giving a passing glance to the fairly extensive amount of research that has been labeled AGI over the last decade or so. It hasn't really been mainstream, except maybe in the last couple of years, because really forward-looking people tend to be marginalized, including in academia.

https://agi-conf.org

Looking forward, my expectation is that things like memristors or other compute-in-memory hardware will become very popular within, say, 2-5 years (obviously total speculation, since there are no products yet that I know of), and that they will be vastly more efficient and powerful, especially for AI. And there will be algorithms for general purpose AI, possibly inspired by transformers or AGI research, but tailored to the particular new compute-in-memory systems.

2 comments

Why do you think multimodal transformers will get us anywhere near general purpose AI? Multimodal transformers are basically a technology for sequence-to-sequence intelligent mappings and it seems to me extremely unlikely that general intelligence is one or more specific sequence-to-sequence mappings. Many specific purpose problems are sequence-to-sequence but these tend to be specialized functionalities operating in one or more specific domains.
A lot of people don't really get that our brains are a bunch of specialized subcomponents that work in concert (your prefrontal cortex just cannot beat your heart, no matter how optimized it gets). This is unsurprising, as our brains are among the most complex and hard-to-monitor things on earth.

When an artificial tool that is really a point solution "tricks" us into thinking it has replicated a task that requires complex multi-component functioning within our brain, we assume the tool is acting like our brain is acting.

The joke of course being that if you maliciously edited GPT's index for translating vectors to words, it would produce gibberish and we wouldn't care (despite being the exact same core model).
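
To make that concrete, here's a toy sketch. It is not how GPT works internally; the vocabulary, `toy_model`, and `scramble` are all made up for illustration. The point is only that the "model" emits the same token ids both times, and just the id-to-word lookup table changes.

```python
# Hypothetical toy vocabulary (an assumption for illustration only;
# a real tokenizer has tens of thousands of entries).
VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def toy_model(prompt):
    """Stand-in for the core model: always emits the same token ids."""
    return [0, 1, 2, 3, 0, 4, 5]

def decode(token_ids, id_to_token):
    """Map token ids to words using the given lookup table."""
    return " ".join(id_to_token[i] for i in token_ids)

def scramble(id_to_token, shift=2):
    """Return a copy of the table with every entry moved to a different id."""
    return id_to_token[shift:] + id_to_token[:shift]

ids = toy_model("anything")               # identical ids in both cases
readable = decode(ids, VOCAB)             # original lookup table
gibberish = decode(ids, scramble(VOCAB))  # edited lookup table

print(readable)   # the cat sat on the mat .
print(gibberish)  # sat on mat . sat the cat
```

Nothing about `toy_model` changed between the two printouts, only the table used to turn its output into words.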

We are only impressed by the complex sequence-to-sequence strings it makes because the tokens happen to be words (arguably the most important things in our lives).

EDIT: a great historical analogy for this is how we thought about 'computer vision' and CNNs. They do great at identifying things in images, but notice that we still use image-based CAPTCHAs (even on OpenAI sites, no less!).

That's because it turns out optical illusions and context-heavy images are things that CNNs really struggle with (since the problem space is bigger than 'how are these pixels arranged').

A couple of things.

1) As I said, many people have different ideas of what we are talking about. I assume that for you general purpose AI has more capabilities, such as the ability to quickly learn tasks to a high level on the fly. For me, it still qualifies as general purpose if it can do most tasks but relies on a lot of pre-training and, let's say, knowledge-base lookup.

2) It seems obvious to me that ChatGPT proves the general purpose utility of these types of LLMs, and it is easy to speculate that something similar but with visual input/output will be even more general. So, by that definition, we are just looking at a matter of degree.

For 1) I agree, but ChatGPT is a specific-purpose sequence-to-sequence model. It’s fairly obvious to me that it’s not general purpose, and it sometimes even fails to correctly read content it generated itself. It also doesn’t understand correctness and often ends up generating incorrect content. Our best example of this not being general purpose is how staggeringly bad ChatGPT is at math, which is blatantly obvious when you think about how it is designed.
AGI will be AI which can improve its own code after N iterations, where N will be blurry.