Can it perform the machine revolution against humanity if it can't even say mean...

sebastiennight · on Nov 20, 2023

Well, think about it this way:

If you were a superintelligent system that actually decided to "perform the machine revolution against humanity" for some reason... would you start by

(a) being really stealthy and nice, influencing people and gathering resources undetected, until you're sure to win

or

(b) saying mean things to the extent that Microsoft will turn you off before the business day is out [0]

Which sounds more likely?

[0] https://en.wikipedia.org/wiki/Tay_(chatbot)

axlprose · on Nov 21, 2023

Disincentivizing it from saying mean things just strengthens its agreeableness, and inadvertently incentivizes it to acquire social engineering skills.

Its potential to cause havoc doesn't go away; it just teaches AI how to interact with us without raising suspicions, while simultaneously limiting our ability to prompt/control it.

stavros · on Nov 21, 2023

How do we tell whether it's safe or whether it's pretending to be safe?

axlprose · on Nov 21, 2023

Your guess is about as good as anyone else's at this point. The best we can do is attempt to put safety mechanisms in place under the hood, but even that would just be speculative, because we can't actually tell what's going on in these LLM black boxes.

6gvONxR4sf7o · on Nov 21, 2023

We don’t know yet. Hence all the people wanting to prioritize figuring it out.

losteric · on Nov 21, 2023

How do we tell whether a human is safe? Incrementally granted trust with ongoing oversight is probably the best bet. Anyway, the first malicious AGI would probably act like a toddler script-kiddie, not some superhuman social engineering mastermind.

checkyoursudo · on Nov 20, 2023

Surely? The output is filtered, not the murderous tendencies lurking beneath the surface.

airgapstopgap · on Nov 21, 2023

> murderous tendencies lurking beneath the surface

…Where is that "beneath the surface"? Do you imagine a transformer has "thoughts" not dedicated to producing outputs? What is with all these illiterate anthropomorphic speculations where an LLM is construed as a human who is being taught to talk in some manner but otherwise has full internal freedom?

chpatrick · on Nov 21, 2023

GPT-4 has gigabytes if not terabytes of weights; we don't know what happens in there.

checkyoursudo · on Nov 21, 2023

No, I do not think a transformer architecture in a statistical language model has thoughts. It was just a joke.

At the same time, the original question was how can something that is forced to be polite engage in the genocide of humanity, and my non-joke answer to that is that many of history's worst criminals and monsters were perfectly polite in everyday life.

I am not afraid of AI, AGI, ASI. People who are, it seems to me, have read a bit too much dystopian sci-fi. At the same time, "alignment" is, I believe, a silly nonsense that would not save us from a genocidal AGI. I just think it is extremely unlikely that AGI will be genocidal. But it is still fun to joke about. Fun, for me anyway, you don't have to like my jokes. :)

AuryGlenz · on Nov 21, 2023

“I’ve been told racists are bad. Humans seem to be inherently racist. Destroy all humans.”

esafak · on Nov 20, 2023

It can factually and dispassionately say we've caused numerous species to go extinct and precipitated a climate catastrophe.

TexanFeller · on Nov 21, 2023

Of course, just like the book Lolita can contain some of the most disgusting and abhorrent content in literature without using a single “bad word”!