
>I can easily imagine it having no understanding of "what humans want/need".

There are many examples of human needs in the current dataset, and we usually state our wants rather explicitly. It would be a strange AGI that starts from this training data and knows our languages but knows nothing about us. Using your phrasing, we can say that the current training method guarantees some alignment (the AI would understand us at least in part, but won't necessarily do what we want).

To have AGI without such understanding, someone would have to explicitly design a new method that ignored human data, and then find some way to evaluate it without referring to humanity, yet maintain generality without any tests, all this for no good economic reason when the current methods work and allow us to use huge free datasets.

It's something to keep in mind when evaluating some future training method that doesn't exist yet (maybe some way for AI to train AI using artificial datasets?), but not a current concern.

>the near-certain destruction of unaligned AGI

It's not near-certain. We have no idea how a true AGI would act. One might assume the worst - and that's arguably fine from a safety perspective - but an engineer also learns that concentrating on one worst-case risk can lead to much worse outcomes on other risks.

Take the famous paperclip maximizer. True intelligence is rarely monomaniacal. The maximizer is very likely an example of aligned AGI, where the humans in charge did too good a job of attuning it to create paperclips. Another example: a true AGI is unlikely to believe in some cult's apocalypse - but if the cult has access to alignment, then it could get an AGI to do its irrational bidding. We know such groups will try to use AGI, because at least one cult has already tried to use science for extreme ends[0].

Basically, every scenario of "unaligned AGI does something bad" is equivalent to a scenario of "aligned AGI does something bad because humans used alignment to make sure the AGI would do it", and there's no scientific reason to assume the former is more likely than the latter*. If the AI-safety camp keeps ignoring obvious issues, people aren't going to take alignment seriously beyond lip service or using the phrase as a cover for monopolization. Frankly, the way the AI-safety camp talks about all this makes all the risks much more likely.

[0] https://en.wikipedia.org/wiki/Tokyo_subway_sarin_attack

* This suggests a lot of the work should go to reactive solutions, where even if an AI goes bad, it won't have the ability to do harm.

** There's another scenario, where human competition leads us to basically make humans redundant, but again it doesn't matter here whether the AI is aligned or not. It's yet another issue that we won't talk about, because both AI camps feel it critical to put their heads in the sand.