Hacker News
by Kronopath 728 days ago
Anything that allows AI to scale to superintelligence quicker is going to run into AI alignment issues, since we don’t really know a foolproof way of controlling AI. With the AI of today, this isn’t too bad (the worst you get is stuff like AI confidently making up fake facts), but with a superintelligence this could be disastrous.

It’s very irresponsible for this article to advocate and provide a pathway to immediate superintelligence (regardless of whether or not it actually works) without even discussing the question of how you figure out what you’re searching for, and how you’ll prevent that superintelligence from being evil.

3 comments

I don't think your response is appropriate. Narrow domain "superintelligence" is around us everywhere-- every PID controller can drive a process to its target far beyond any human capability.

The obvious way to incorporate good search is to use extremely fast models in the inner loop of the search. Such models would be inherently less general, and likely trained on the specific problem or at least domain-- just for performance's sake. The lesson in this article was that a tiny superspecialized model inside a powerful traditional search framework significantly outperformed a much larger, more general model.

Use of explicit external search should make the optimization system's behavior and objective more transparent and tractable than just sampling the output of an auto-regressive model alone. If nothing else you can at least look at the branches it did and didn't explore. It's also a design that makes it easier to bolt in various kinds of regularizers: code to steer it away from parts of the search space you don't want it operating in.
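
To make the point about inspectable branches and bolt-on regularizers concrete, here is a minimal sketch. It is a plain uniform-cost search on a toy 5x5 grid, not the search framework from the article; the names `search`, `grid_neighbors`, `penalty`, and `FORBIDDEN` are all made up for illustration. The penalty function plays the role of the regularizer, and the search returns a log of what it expanded and what it refused to enter.

```python
import heapq
import math

def search(start, goal, neighbors, penalty):
    """Uniform-cost search with a penalty added to the cost of each step.

    Returns (path, expanded, rejected): the path found (or None), the states
    expanded in order, and the states refused because their penalty was infinite.
    """
    frontier = [(0.0, start, [start])]
    best = {start: 0.0}
    expanded, rejected = [], []
    while frontier:
        cost, state, path = heapq.heappop(frontier)
        if cost > best.get(state, math.inf):
            continue  # stale queue entry, a cheaper route was already found
        expanded.append(state)
        if state == goal:
            return path, expanded, rejected
        for nxt in neighbors(state):
            p = penalty(nxt)
            if math.isinf(p):
                rejected.append(nxt)  # audit trail of branches we steered away from
                continue
            new_cost = cost + 1.0 + p
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, path + [nxt]))
    return None, expanded, rejected

def grid_neighbors(state, size=5):
    """Four-connected moves on a size x size grid (hypothetical toy domain)."""
    x, y = state
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            yield (nx, ny)

# Hypothetical "keep out" region of the search space.
FORBIDDEN = {(1, 1), (2, 2), (3, 3)}

def penalty(state):
    """Infinite penalty forbids a state; a finite value would merely discourage it."""
    return math.inf if state in FORBIDDEN else 0.0

path, expanded, rejected = search((0, 0), (4, 4), grid_neighbors, penalty)
print("path:", path)
print("expanded:", len(expanded), "states; rejected:", sorted(set(rejected)))
```

The `expanded` and `rejected` lists are the part you don't get from sampling a model directly: a record of which branches were considered and which were ruled out by the constraint.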

The irony of all the AI scaremongering is that if there is ever some evil AI with an LLM as an important part of its reasoning process, it may well be evil because being evil is a big part of the narrative it was trained on. :D

Of course "superintelligence" is just a mythical creature at the moment, with no known path to get there, or even a specific proof of what it even means - usually it's some hand waving about capabilities that sound magical, when IQ might very well be subject to diminishing returns.
Do you mean no way to get there within realistic computation bounds? Because if we allow for arbitrarily high (but still finite) amounts of compute, then some computable approximation of AIXI should work fine.
>Do you mean no way to get there within realistic computation bounds?

I mean there's no well defined "there" either.

It's a hand-waved notion that by adding more intelligence (itself not very well defined, but let's use IQ) you get to something called "hyperintelligence", say IQ 1000 or IQ 10000, that has what can be described as magical powers: it can convince any person to do anything, invent things at will, achieve huge business success, predict markets, and so on.

Whether intelligence is cumulative like that, or whether having it gets you those powers, is far from established (aside from the successful high-IQ people, we know many people with IQ 145+ who are not inventing stuff left and right, or convincing people with some greater charisma than the average IQ 100 or 120 politician, but are just sad MENSA losers whose greatest achievement is their test scores).

>Because if we allow for arbitrarily high (but still finite) amounts of compute, then some computable approximation of AIXI should work fine.

I doubt that too. The limit for LLMs for example is more human produced training data (a hard limit) than compute.

> itself not very well defined, but let's use IQ

IQ has an issue that is inessential to the task at hand, which is how it is based on a population distribution. It doesn’t make sense for large values (unless there is a really large population satisfying properties that aren’t satisfied).
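
A quick arithmetic sketch of why large values stop making sense. It assumes the usual convention that IQ is scaled to a normal distribution with mean 100 and standard deviation 15 (some tests use a different SD), and the function name `log10_upper_tail` is made up here. It returns the base-10 logarithm of the fraction of such a population that would score above a given IQ.

```python
import math

def log10_upper_tail(iq, mean=100.0, sd=15.0):
    """log10 of the fraction of a normal(mean, sd) population scoring above `iq`.

    mean=100 and sd=15 are the assumed scaling convention, not a law of nature.
    """
    z = (iq - mean) / sd
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    if tail > 0.0:
        return math.log10(tail)
    # erfc underflows to 0.0 for very large z, so fall back to the standard
    # asymptotic approximation P(Z > z) ~ exp(-z^2 / 2) / (z * sqrt(2 * pi)).
    return -(z * z / 2.0) / math.log(10.0) - math.log10(z * math.sqrt(2.0 * math.pi))

for iq in (145, 200, 1000):
    print(iq, "-> log10(fraction above) =", round(log10_upper_tail(iq), 1))
```

Compare the result for IQ 1000 with log10 of any population size you like: for the score to be meaningful as a rank, the population would need roughly one over that fraction members.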

> I doubt that too. The limit for LLMs for example is more human produced training data (a hard limit) than compute.

Are you familiar with what AIXI is?

When I said “arbitrarily large”, it wasn’t for laziness reasons that I didn’t give an amount that is plausibly achievable. AIXI is kind of goofy. The full version of AIXI is uncomputable (it uses a halting oracle), which is why I referred to the computable approximations to it.

AIXI doesn’t exactly need you to give it a training set, just put it in an environment where you give it a way to select actions, and give it a sensory input signal, and a reward signal.

Then, assuming that the environment it is in is computable (which, recall, AIXI itself is not), its long-run behavior will maximize the expected (time discounted) future reward signal.
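
For concreteness, the quantity being maximized can be written as below. The discount factor \gamma, the policy symbol \pi, and the reward symbols r_k are notation assumed here for illustration, not something specified in this thread.

```latex
V^{\pi}_{t} \;=\; \mathbb{E}\!\left[\, \sum_{k=t}^{\infty} \gamma^{\,k-t}\, r_{k} \;\middle|\; \pi \right],
\qquad 0 < \gamma < 1
```

Here r_k is the reward received at step k, and the expectation is taken over the agent's current beliefs about which environment it is in.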

There’s a sense in which it is asymptotically optimal across computable environments (... though some have argued that this sense relies on a distribution over environments based on the enumeration of computable functions, and that this might make this property kinda trivial. Still, I’m fairly confident that it would be quite effective. I think this triviality issue is mostly a difficulty of having the right definition.)

(Though, if it was possible to implement practically, you would want to make darn sure that the most effective way for it to make its reward signal high would be for it to do good things and not either bad things or to crack open whatever system is setting the reward signal in order for it to set it itself.)

(How it works: AIXI basically enumerates through all possible computable environments, assigning initial probability to each according to the length of the program, and updating the probabilities based on the probability of that environment providing it with the sequence of perceptions and reward signals it has received so far when the agent takes the sequence of actions it has taken so far. It evaluates the expected values of discounted future reward of different combinations of future actions based on its current assigned probability of each of the environments under consideration, and selects its next action to maximize this. I think the maximum length of programs that it considers as possible environments increases over time or something, so that it doesn’t have to consider infinitely many at any particular step.)
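
A toy sketch of the loop described above, under heavy simplifying assumptions: instead of enumerating all programs it uses three hand-written deterministic hypotheses with made-up "description lengths", it only tracks rewards (no separate observations), and it scores fixed action sequences over a short horizon rather than doing a full expectimax. All names (`HYPOTHESES`, `posterior`, `choose_action`, `run`) are hypothetical, and this is nowhere near real AIXI; it only shows the prior-by-length, update-by-consistency, pick-the-best-action shape.

```python
import itertools

ACTIONS = (0, 1)

# Each hypothesis: (name, assumed description length in bits, reward function).
# The reward function maps the full action history to the most recent reward.
HYPOTHESES = [
    ("left_pays", 2, lambda acts: 1.0 if acts[-1] == 0 else 0.0),
    ("right_pays", 3, lambda acts: 1.0 if acts[-1] == 1 else 0.0),
    ("switching_pays", 5,
     lambda acts: 1.0 if len(acts) > 1 and acts[-1] != acts[-2] else 0.0),
]

def posterior(actions, rewards):
    """Prior weight 2^-length, kept only for hypotheses matching every reward seen."""
    weights = {}
    for name, length, env in HYPOTHESES:
        consistent = all(
            env(actions[: t + 1]) == rewards[t] for t in range(len(actions))
        )
        if consistent:
            weights[name] = 2.0 ** -length
    total = sum(weights.values())  # assumes at least one hypothesis survives
    return {name: w / total for name, w in weights.items()}

def choose_action(actions, rewards, horizon=3, gamma=0.9):
    """First action of the plan with the highest expected discounted reward."""
    post = posterior(actions, rewards)
    envs = {name: env for name, _, env in HYPOTHESES}
    best_plan, best_value = None, float("-inf")
    for plan in itertools.product(ACTIONS, repeat=horizon):
        value = 0.0
        for name, prob in post.items():
            hist = tuple(actions)
            for k, a in enumerate(plan):
                hist += (a,)
                value += prob * gamma ** k * envs[name](hist)
        if value > best_value:
            best_plan, best_value = plan, value
    return best_plan[0]

def run(true_env, steps=6):
    """Let the agent act in `true_env`, updating its beliefs after every step."""
    actions, rewards = (), ()
    for _ in range(steps):
        a = choose_action(actions, rewards)
        actions += (a,)
        rewards += (true_env(actions),)
    return actions, rewards

# The real environment is "right_pays", which the prior considers less likely.
actions, rewards = run(HYPOTHESES[1][2])
print("actions:", actions)
print("rewards:", rewards)
print("beliefs:", posterior(actions, rewards))
```

In this run the agent starts by trying the action favored by the shortest hypothesis, gets no reward, drops that hypothesis, and settles on the other action once the remaining alternatives are contradicted.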

>AIXI doesn’t exactly need you to give it a training set, just put it in an environment where you give it a way to select actions, and give it a sensory input signal, and a reward signal.

That's still a training set, just by another name.

And with the environment being the world we live in, it would be constrained by the local environment's possible states, the actions it can perform to get feedback on, and the rate of the environment's response (the rate of feedback).

Add the quick state-space inflation in what it is considering, and it's an even tougher deal than getting more training data for an LLM.

When I said it didn’t require a training set, I meant you wouldn’t need to design one.

I don’t understand what you mean by the comment about state-space inflation. Do you mean that the world is big, or that the number of hypotheses it considers is big, or something else?

If the world is computable, then after enough steps it should include the true hypothesis describing the world among the hypotheses it considers. And, the probability it assigns to hypotheses which make contrary predictions should go down as soon as it sees observations that contradict those other hypotheses. (Of course, “the actual world” including its place in it, probably has a rather long specification (if it even has a finite one), so that could take a while, but similar things should apply for good approximations to the actual world.)

As for “its possible actions”, “moving around a robot with a camera” and “sending packets on the internet” seem like they would constitute a pretty wide range of possible actions.

Though, even if you strip out the “taking actions” part, and just consider the Solomonoff induction part (with input being, maybe, a feed of pairs of “something specifying a source for some information, like a web-address or a physical location and time, along with a type of measurement, such as video” and “encoding of that data”), it should get very good at predicting what will happen, if not “how to accomplish things”. Though I suppose this would involve some “choosing a dataset”.

AIXI would update its distribution over environments based on its observations even when its reward signal isn’t changing.

Hey! Essay author here.

>The cool thing about using modern LLMs as an eval/policy model is that their RLHF propagates throughout the search.

>Moreover, if search techniques work on the token level (likely), their thoughts are perfectly interpretable.

I suspect a search world is substantially more alignment-friendly than a large model world. Let me know your thoughts!

Your webpage is broken for me. The page appears briefly, then there's a French error message telling me that an error occurred and I can retry.

Mobile Safari, phone set to French.

I'm in the same situation (mobile Safari, French phone) but if you use Chrome it works
It fixed itself (?)