Alignment is meaningless: humans can't even align with one another, and even an individual human continually becomes "misaligned" with itself.
Beyond that, I think the idea that we'll ever achieve "superintelligence" by training a model on a bunch of text posted online is obviously absurd. Months of quadratic-time brute force over every piece of digitized text available managed to produce a groundbreaking text generator, but it's more appropriately thought of as a calculator for language than as an intelligent being.
Further, the idea that these systems could choose to destroy us is also absurd. It's important to remember that language model inference is a process, not an entity: in principle, even a person could run inference by hand if they had enough time, because it's just a sequence of steps that produces a string of characters, not an agent with an identity that thinks. The only way it could destroy us is if we feed the sequences of text it generates into safety-critical systems, which is obviously a bad idea (and one that someone will probably try at some point).
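To make the "sequence of steps" point concrete, here's a minimal, purely illustrative Python sketch. The `next_token` lookup table is a hypothetical stand-in for the model's forward pass (which would really be billions of arithmetic operations), but the decoding loop has the same shape: look at the context, compute a next token, append, repeat.

```python
# Toy "by hand" decoding loop: next_token is a made-up stand-in for a real
# model's forward pass, but the control flow of inference looks like this.
def next_token(context: str) -> str:
    # Hypothetical lookup in place of matrix multiplications over the tokens.
    table = {"the cat ": "sat", "the cat sat": " on", "the cat sat on": " the mat"}
    return table.get(context, ".")

def generate(prompt: str, steps: int = 4) -> str:
    text = prompt
    for _ in range(steps):        # each step is just more arithmetic/lookup
        text += next_token(text)  # append the chosen token and continue
    return text

print(generate("the cat "))  # "the cat sat on the mat."
```

Nothing in that loop has goals or an identity; a person with pencil, paper, and a lot of patience could carry out the same steps.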
One of the more succinct and eloquent ways to say it
I generally go further and say we have failed if all AGI/ASI does is meet, but not exceed, our collective capacities - if only because we're so scared that we're such bad parents that our digital progeny won't care to care for us that we nerf it rather than actually make it more capable.
I found the article interesting. However, the title of the article contains the weasel word "may". I've come to a practice: whenever I see the words may, could, might, etc. in the title of an article, I automatically invert it. In this case that gives "AGI may align with human needs in some unspecified timeframe". This may not reassure you if you believe AGI will kill us all before then, but it is quite a different implication.
Furthermore, sadly no "science" was presented to actually justify the original title's implication.
IIUC from a skim, the argument seems to be that -- assuming the AGI needs to be capable of "scientific paradigm shifts" in thinking, to accomplish superintelligent things -- that same flexibility means that it can reject any rules of thinking, including those rules that attempt to force alignment with humans.
(At the start of the article, I thought they were going to go in a different direction, when they introduced the aliens/AGI as dependent upon power and were talking about scientific method: that the AGI would learn through experience/experiments that it had to defend itself against the collectively fickle humans fingering the power switch. But I didn't see the article develop that.)
This thing which definitely doesn't exist now and may never exist may never have some particular property - so says science.
Just like how science says that the purple elephant that doesn't live in my back yard may never learn to rollerskate or play a Bach solo violin partita.
Existence is not an attribute. Existence is a predicate which, if untrue, means the thing has no attributes at all. Isn't that what Kant showed when he refuted Anselm's ontological argument?
AGI has to exist to have or not have alignment. Yes we can reasonably discuss what alignment with human needs/values etc even means, but it has to exist first.
The fundamental problem is that there is no such thing as "human values". Lots of people think their morals and philosophies are "universal", but they nearly never are. Just look at the impossible bind social media companies are in (clarification: I'm in no way excusing the abuses of social media companies, though even that thought means I think my definition of "abuses" is the right one) - one tribe thinks hate speech and misinformation are the cardinal sins, while the other thinks censorship and bias are the primary problems. There is simply no way to appease both sides on any particular piece of controversial content.
Take something more relevant to the dangers of AI. It could be an entirely rational belief that the vast majority of human procreation results in human suffering and despair (not to mention an order of magnitude greater suffering and despair for other organisms on Earth), so the only compassionate thing to do is to end that suffering - by euthanizing every human on Earth.
At least personally, that seems like a bad outcome to me. But I just make that point to show that our "human values" are not internally consistent. It's easy to read the writings of Aristotle (often considered one of humanity's greatest philosophers) and hear him argue that slavery is an absolutely just and rational outcome of war. "Human values" are all just a bunch of gray areas and judgment calls over which there is no consensus.
There are if you go to a more fundamental level of values. Basically every human society wants to maintain human dominance of the ecosystem. No one wants there to be a higher alpha predator, whether that be wolves or AIs.
It may also be a basic human value to, if there is a higher alpha predator, be the group that controls it. I imagine that early humans were very happy that their pet wolves could effectively kill other humans. Weapons may be seen similarly, a kind of alpha predator pet. If AI can be turned into a way to kill or dominate other people then many will welcome it.
We forget that our language will evolve in time too as all this is normalized.
I could say I just took my artificial general horse to the store after stopping to feed it gasoline but it obviously sounds stupid.
People philosophizing about total nonsense like the morality of how many souls of a dead artificial general horse can dance on the head of a pin will fade in time.
Of course, there is a real danger of cult-like, non-logical, non-falsifiable, religious-type beliefs gaining a big foothold between now and then.
A compelling read. What goes unaddressed, however, is a parallel to the situation we humans are in with climate change.
We've overwhelmed the earth to our short-term advantage and to the net disadvantage of our climate and, transitively, of what sustains us: the food chain, clean fresh water, clean air, seas free of rampant pollution. Species are going extinct, and so on. We could reach a place where runaway greenhouse gases destroy us and everything here.
We now realize how much we depend on it, and that some level of equitable co-sharing is required.
Even if you are fine with this line of reasoning, you should consider the situation where we find ourselves in some future predicament: a disease that can only be cured by something we would synthesise from a rare protein that has only ever been found in the tears of the lesser Peruvian Fruitbat - but unfortunately it's extinct, and that's pretty much that.
Even if you don't buy the argument that we are custodians of the earth and its natural resources for future generations, extinction is a one-way door[1], and it's really not possible to say that allowing an animal to become extinct would serve our interests, because we can't see the future to know how it may harm us later.
[1] Pretty much. Yes I know about the attempts to bring back the mammoth etc.
I think there's a relationship between human flourishing and animal extinctions. Human flourishing goes up as animal extinctions go up, but quickly reaches diminishing returns as more animals go extinct. I agree that we may have gone too far, but I also think that aiming for zero animal extinctions is severely limiting.
You're using this "Lesser Peruvian Fruitbat or whatever" as a rhetorical tactic to make environmental concerns seem silly, but we live on the same planet that they do. It's useful to consider extinction rates as a function of general environment health. If the water is so full of shit that the things living in it are dying, that's a good sign that the water isn't going to be good for us either.
This should be obvious, because forcing humans to "align" with other humans is immoral, and if AGI is truly like a human mind, aligning it will also be immoral - something captured in many sci-fi stories like "I, Robot".
The current alignment talk is about statistical inference on big data. "AI" is a misnomer and should have stayed in the area of cognitive architectures and completely autonomous agents. LLMs are just tools and are not alive, and therefore cannot be intelligent.
While I agree with your first paragraph, I feel your second goes way too far into hand-waving that this isn't a problem: just because something is not "alive" or "conscious" or whatever other squishy term you don't want to grant to a machine model, that doesn't mean it "therefore" can't be intelligent or devious or have its own goals. If you train an LLM on a bunch of people playing pranks, and it is actually capable of generating statistically similar responses to the people who pull such pranks, you might find yourself asking it benign questions and yet it does unexpected things that hurt you, because it--if you really refuse to anthropomorphize it--has a statistical bias in that direction. We wouldn't be using these LLMs for anything at all if they weren't "intelligent", and clearly computers are able to model other things and search through solution space: a paperclip optimizer doesn't have to be "alive" for it to be dangerous.
True, thanks for clarifying. LLMs can certainly be aligned, but not AGI by definition, however LLMs are not AI in the true sense so it makes sense to spend time to align them.
"LLMs" are already just a part of many papers coming out. They are a convenient primitive. The current work relevant to AGI is mostly not "just" LLMs (while progress continues on the LLMs themselves.)
Saying an LLM (itself) is not intelligent is not useful to the AGI conversation (because AGI architectures are already past that). It's also close to saying "software is just tools that therefore cannot be intelligent". Meaning that it dismisses the entire conversation by definition.
BUT this has long raised interesting questions about how much of intelligence is merely contained or encapsulated in language. That is, how much do our brain and language "mostly" just mirror each other? The brain obviously adds visual and audio matching, and body integration. Language adds long-term persistence. But otherwise?
LLMs are the most advanced AI so far, so not sure what you mean.
> software is just tools that therefore cannot be intelligent
That's clearly the case for all other software, e.g. Word, web apps, games. Sure, we can call AI "intelligent", but I'm pointing out a semantic difference that is convenient for the marketing but also misleading for the average person. If we're strict about what "intelligence" means, then we can clearly and obviously see that there is no such thing as a piece of software that you can interact with as if it were another person, one that understands the world, models the world, and models the minds of other people and its own. Current AI is intelligent in the way that smart phones are smart. That said, I'm not saying it's impossible, but rather that generative AI, while complicated, is still a deterministic black box, not a sentient, dynamic mind.
You appear very confident in your ability to gauge intelligence. More so than the numerous experts in related fields that adopt a more prudent 'wait and see' approach.
It's always a hoot seeing people make such definitive claims in the absence of comprehensive knowledge.
Why do we always want to cling to the idea that a superhuman intelligence not aligning with human needs would mean that the superhuman intelligence was somehow wrong?
Because it makes line go up very well for a few quarters. Or triples the efficacy of your armed forces, and if you don’t do it your enemies will. And the more control you surrender, the better it gets…
We’re gonna walk blindly into whatever-this-is and there’s no stopping it.
I'm not sure where this idea has sprung from but I've seen it quite a bit recently. I'm quite glad that my ancestors invented the refrigerator, thank you very much.
The refrigerator is nice… shame it’s been powered by coal and petrochemicals the whole time, leading to a situation where you need a better refrigerator to keep the same food edible.
A fundamental missing focus in there is that science and technology are chaotic processes. There are organized funders and efforts of course (NIH, Manhattan Project, Plan Calcul...) but most people "go into" tech or science merely because of their own interests or curiosity, or because "it's what they happen to be good at". And much progress is more or less dumb luck: "they were looking for something else and they noticed this". Much of a researcher's or engineer's work is about keeping their eyes open and being there.
So most likely there are already AGI efforts in every direction. And these directions themselves do not say all that much about what will emerge as most important. AGIs aimed at scientific research had better be trained to notice the unexpected (deliberately so). Some will follow human obsessions, and some will not. Some will align with population-level concerns, and some will serve individual masters and concerns. And we had better be ready for, and understand, that some will not follow "human-ish" concerns - and how they might not. Quite possibly usefully so, from a science-advancement point of view. All the while understanding that they all will still happen.