
by reissbaker 883 days ago

My arguments are:

(P1) Current SOTA AI is good at understanding implicit context, and improved versions will likely be better at understanding implicit context (much like gpt-4 is better at understanding context than gpt-3, and llama2 is better than llama1, and mixtral is better than gpt-3 and better than claude, etc).

(P2) Most misalignments within the observable behavior of current AI do not produce extinction-level goals, and given (P1), it is unclear why someone would believe it's likely going to in the future, since they'll be even better at understanding implicit human context of goals (e.g. implicit goals like do not make humanity extinct, don't turn the entire surface of the planet into an AI lab, etc).

I think there are several other arguments, though, e.g.:

(P1) Progress on AI capabilities is evolutionary, with dumber models slowly being replaced by derivative-but-better models: architectural improvements (e.g. new attention variants), dataset improvements as datasets grow larger and finetuning sets grow higher quality, and evolutionary progress on benchmarks and alignment.

(P2) Evolutionary steps towards evil-AI will likely be filtered out during training, since a model at that stage will not yet be a generalized superhuman intelligence and will give away its misalignment during training, whereas legitimately-aligned AI model evolutions will be rewarded for better performance.

(P3) Generalized superhuman intelligence will likely be an evolutionary step from a well-aligned ordinary intelligence, which will be an evolutionary step from sub-human intelligence that is reasonably well aligned.

Or:

(P1) LLMs have architectural issues that will prevent them from quickly becoming generalized superintelligence of the "human vs slug" variety (bad/inefficient at math, tokenization issues, likelihood of hallucinations, limited ability to learn new facts without expensive and slow training runs, difficulty backtracking from incorrect chains of reasoning, etc).

(C) LLM research is not likely to soon produce a superhuman AI able to cause an extinction event for humanity, and should not be illegal.

However, ultimately my most strongly-believed personal argument is:

(P1) The burden of proof for making something illegal due to apocalyptic predictions lies on the prognosticator.

(P2) There is not much hard evidence of an impending apocalypse due to LLMs, and philosophical arguments for it are either self-referential and require belief in the apocalypse as a prerequisite, or are highly speculative, or both.


foo3a9c4 881 days ago

(I don't currently have the energy to engage with each argument, so I'm just responding to the first.)

> (P1) Current SOTA AI is good at understanding implicit context, and improved versions will likely be better at understanding implicit context (much like gpt-4 is better at understanding context than gpt-3, and llama2 is better than llama1, and mixtral is better than gpt-3 and better than claude, etc).

I believe that (P1) is probably true.

> (P2) Most misalignments within the observable behavior of current AI do not produce extinction-level goals, and given (P1), it is unclear why someone would believe it's likely going to in the future, since they'll be even better at understanding implicit human context of goals (e.g. implicit goals like do not make humanity extinct, don't turn the entire surface of the planet into an AI lab, etc).

I'm confused about what exactly you mean by "goals" in (P2). Are you referring to (I) the loss function used by the algorithm that trained GPT4, or (II) goals and sub-goals which are internal parts of the GPT4 model, or (III) the sub-goals that GPT4 writes into a response when a user asks it "What is the best way to do X?"


reissbaker 880 days ago

I am referring to "goals" as used by the original argument you posted, "it is easy to find goals that are extinction-level bad."


foo3a9c4 880 days ago

My understanding is that (P3) of the original argument (https://aiadventures.net/summaries/agi-ruin-list-of-lethalit...) uses "goals" as in (II).

But earlier you said this:

> 1. States things like "Finding goals that are extinction-level bad and relatively useful appears to be easy: for example, advanced AI with the sole objective ‘increase company.com revenue’ might be highly valuable to company.com for a time, but risks longer term harms to society, if powerfully accruing resources and power toward this end with no regard for ethics beyond laws that are still too expensive to break." But even current-gen LLMs sidestep this pretty easily, and if you ask them to increase e.g. revenue, they do not propose extinction-level events or propose eschewing basic ethics.

And in this quote it looks to me that you are using "goals" as in (III).

(I'm not an expert on these matters and I am admittedly still very confused about them. Minimally I'd like to make sure that we aren't talking past one another.)


reissbaker 880 days ago

Sorry, I was referencing the quote "Finding goals that are extinction-level bad..." from your first link, https://wiki.aiimpacts.org/doku.php?id=arguments_for_ai_risk....

What that was referencing was finding goals that a human would want an AI to follow; e.g. "increase revenue" was one example in the wiki of an explicit goal the human might want an AI to follow. The argument in the wiki was that the AI would then do unethical things in service of that goal that would be "extinction-level bad." My counter-argument has three steps. First, current SOTA AI already understands that despite having an explicit goal (let's say given in a prompt) of "increase revenue," there are implicit goals, for example "do not kill everyone," that it doesn't need stated. Second, as LLMs have advanced they have become better at understanding implicit human goals, and better at instruction-following with adherence to implicit goals. Third, future LLMs are therefore likely to be even better at this, and unlikely to e.g. resurface the planet and turn it into an AI lab when told to increase revenue or told to produce better-aligned AI.
