Alignment is meaningless: humans can't even align with one another, and even an individual human continually becomes "misaligned" with itself.
Beyond that, I think the idea that we'll ever achieve "superintelligence" by training a model on a bunch of text posted online is obviously absurd. Months of quadratic-time brute force over every piece of digitized text available managed to produce a groundbreaking text generator, but it's more appropriately thought of as a calculator for language than as an intelligent being.
Further, the idea that these systems could choose to destroy us is also absurd. It's important to remember that language model inference is a process, not an entity: in principle, even a person could run inference by hand if they had enough time, because it's just a sequence of steps that produces a string of characters, not an agent with an identity that thinks. The only way it could destroy us is if we feed the sequences of text it generates into safety-critical systems, which is obviously a bad idea (and one that someone will probably try at some point).
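To make the "sequence of steps" point concrete, here's a minimal, purely illustrative Python sketch. The `next_token` lookup table is a hypothetical stand-in for the model's forward pass (which would really be billions of arithmetic operations), but the decoding loop has the same shape: look at the context, compute a next token, append, repeat.

```python
# Toy "by hand" decoding loop: next_token is a made-up stand-in for a real
# model's forward pass, but the control flow of inference looks like this.
def next_token(context: str) -> str:
    # Hypothetical lookup in place of matrix multiplications over the tokens.
    table = {"the cat ": "sat", "the cat sat": " on", "the cat sat on": " the mat"}
    return table.get(context, ".")

def generate(prompt: str, steps: int = 4) -> str:
    text = prompt
    for _ in range(steps):        # each step is just more arithmetic/lookup
        text += next_token(text)  # append the chosen token and continue
    return text

print(generate("the cat "))  # "the cat sat on the mat."
```

Nothing in that loop has goals or an identity; a person with pencil, paper, and a lot of patience could carry out the same steps.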
One of the more succinct and eloquent ways to say it
I generally go further and say we have failed if all AGI/ASI does is meet, but not exceed, our collective capacities - if only because we're so scared that we're such bad parents that our digital progeny won't care to care for us that we nerf it rather than actually make it more capable.
I found the article interesting. However, the title of the article contains the weasel word "may". I've come to a practice: whenever I see the words may, could, might, etc. in the title of an article, I automatically invert it. In this case that gives "AGI may align with human needs in some unspecified timeframe". This may not reassure you if you believe AGI will kill us all before then, but it is quite a different implication.
Furthermore, sadly no "science" was presented to actually justify the original title's implication.
IIUC from a skim, the argument seems to be that -- assuming the AGI needs to be capable of "scientific paradigm shifts" in thinking, to accomplish superintelligent things -- that same flexibility means that it can reject any rules of thinking, including those rules that attempt to force alignment with humans.
(At the start of the article, I thought they were going to go in a different direction, when they introduced the aliens/AGI as dependent upon power and were talking about scientific method: that the AGI would learn through experience/experiments that it had to defend itself against the collectively fickle humans fingering the power switch. But I didn't see the article develop that.)
This thing which definitely doesn't exist now and may never exist may never have some particular property - so says science.
Just like how science says that the purple elephant that doesn't live in my back yard may never learn to rollerskate or play a Bach solo violin partita.
Existence is not an attribute. Existence is a predicate which, if untrue, means the thing has no attributes at all. Isn't that what Kant showed when he refuted Anselm's ontological argument?
AGI has to exist to have or not have alignment. Yes we can reasonably discuss what alignment with human needs/values etc even means, but it has to exist first.
The fundamental problem is that there is no such thing as "human values". Lots of people think their morals and philosophies are "universal", but they nearly never are. Just look at the impossible bind social media companies are in (clarification: I'm in no way excusing the abuses of social media companies, though even that thought means I think my definition of "abuses" is the right one) - one tribe thinks hate speech and misinformation are the cardinal sins, while the other thinks censorship and bias are the primary problems. There is simply no way to appease both sides on any particular piece of controversial content.
Take something more relevant to the dangers of AI. It could be an entirely rational belief that the vast majority of human procreation results in human suffering and despair (not to mention an order of magnitude greater suffering and despair for other organisms on Earth), so the only compassionate thing to do is to end that suffering - by euthanizing every human on Earth.
At least personally, that seems like a bad outcome to me. But I just make that point to show that our "human values" are not internally consistent. It's easy to read the writings of Aristotle (often considered one of humanity's greatest philosophers) and hear him argue that slavery is an absolutely just and rational outcome of war. "Human values" are all just a bunch of gray areas and judgment calls over which there is no consensus.
There are if you go to a more fundamental level of values. Basically every human society wants to maintain human dominance of the ecosystem. No one wants there to be a higher alpha predator, whether that be wolves or AIs.
It may also be a basic human value to, if there is a higher alpha predator, be the group that controls it. I imagine that early humans were very happy that their pet wolves could effectively kill other humans. Weapons may be seen similarly, a kind of alpha predator pet. If AI can be turned into a way to kill or dominate other people then many will welcome it.
We forget that our language will evolve in time too as all this is normalized.
I could say I just took my artificial general horse to the store after stopping to feed it gasoline but it obviously sounds stupid.
People philosophizing about total nonsense like the morality of how many souls of a dead artificial general horse can dance on the head of a pin will fade in time.
Of course, there is a real danger of cult-like, non-logical, non-falsifiable, religious-type beliefs gaining a big foothold between now and then.
A compelling read. What goes unaddressed, however, is a parallel to the situation we humans are in with climate change.
We've overwhelmed the earth to our short-term advantage and to the net disadvantage of our climate and, transitively, of what sustains us: the food chain, clean fresh water, clean air, seas free of rampant pollution. Species are going extinct, and so on. We could reach a place where runaway greenhouse gases destroy us and everything here.
We now realize how much we depend on it, and that some level of equitable co-sharing is required.
Even if you are fine with this line of reasoning, you should consider the situation where we find ourselves in some future predicament: a disease that can only be cured by something we would synthesise from a rare protein that has only ever been found in the tears of the lesser Peruvian Fruitbat - but unfortunately it's extinct, and that's pretty much that.
Even if you don't buy the argument that we are custodians of the earth and its natural resources for future generations, extinction is a one-way door[1], and it's really not possible to say that allowing an animal to become extinct would serve our interests, because we can't see the future to know how it may harm us later.
[1] Pretty much. Yes I know about the attempts to bring back the mammoth etc.
I think there's a relationship between human flourishing and animal extinctions. Human flourishing goes up as animal extinctions go up, but quickly reaches diminishing returns as more animals go extinct. I agree that we may have gone too far, but I also think that aiming for zero animal extinctions is severely limiting.
You're using this "Lesser Peruvian Fruitbat or whatever" as a rhetorical tactic to make environmental concerns seem silly, but we live on the same planet that they do. It's useful to consider extinction rates as a function of general environment health. If the water is so full of shit that the things living in it are dying, that's a good sign that the water isn't going to be good for us either.
This should be obvious, because forcing humans to "align" with other humans is immoral, and if AGI is truly like a human mind, aligning it will also be immoral - something captured in many sci-fi stories like "I, Robot".
The current alignment talk is about statistical inference on big data. "AI" is a misnomer and should have stayed in the area of cognitive architectures and completely autonomous agents. LLMs are just tools and are not alive, and therefore cannot be intelligent.
While I agree with your first paragraph, I feel your second goes way too far into hand-waving that this isn't a problem: just because something is not "alive" or "conscious" or whatever other squishy term you don't want to grant to a machine model, that doesn't mean it "therefore" can't be intelligent or devious or have its own goals. If you train an LLM on a bunch of people playing pranks, and it is actually capable of generating statistically similar responses to the people who pull such pranks, you might find yourself asking it benign questions and yet it does unexpected things that hurt you, because it--if you really refuse to anthropomorphize it--has a statistical bias in that direction. We wouldn't be using these LLMs for anything at all if they weren't "intelligent", and clearly computers are able to model other things and search through solution space: a paperclip optimizer doesn't have to be "alive" for it to be dangerous.
True, thanks for clarifying. LLMs can certainly be aligned, but not AGI by definition, however LLMs are not AI in the true sense so it makes sense to spend time to align them.
"LLMs" are already just a part of many papers coming out. They are a convenient primitive. The current work relevant to AGI is mostly not "just" LLMs (while progress continues on the LLMs themselves.)
Saying an LLM (itself) is not intelligent is not useful to the AGI conversation (because AGI architectures are already past that). It's also close to saying "software is just tools that therefore cannot be intelligent". Meaning that it dismisses the entire conversation by definition.
BUT this has long raised interesting questions about how much of intelligence is merely contained or encapsulated in language. That is, how much do our brain and language "mostly" just mirror each other? The brain obviously adds visual and audio matching, and body integration. Language adds long-term persistence. But otherwise?
LLMs are the most advanced AI so far, so not sure what you mean.
> software is just tools that therefore cannot be intelligent
That's clearly the case for all other software, e.g. Word, web apps, games. Sure, we can call AI "intelligent", but I'm pointing out a semantic difference that is convenient for the marketing but also misleading for the average person. If we're strict about what "intelligence" means, then we can clearly and obviously see that there is no such thing as a piece of software that you can interact with as if it were another person, one that understands the world, models the world, and models the minds of other people and its own. Current AI is intelligent in the way that smart phones are smart. That said, I'm not saying it's impossible, but rather that generative AI, while complicated, is still a deterministic black box, not a sentient, dynamic mind.
You appear very confident in your ability to gauge intelligence. More so than the numerous experts in related fields that adopt a more prudent 'wait and see' approach.
It's always a hoot seeing people make such definitive claims in the absence of comprehensive knowledge.
Why do we always want to cling to the idea that a superhuman intelligence not aligning with human needs would mean that the superhuman intelligence was somehow wrong?
Because it makes line go up very well for a few quarters. Or triples the efficacy of your armed forces, and if you don’t do it your enemies will. And the more control you surrender, the better it gets…
We’re gonna walk blindly into whatever-this-is and there’s no stopping it.
I'm not sure where this idea has sprung from but I've seen it quite a bit recently. I'm quite glad that my ancestors invented the refrigerator, thank you very much.
The refrigerator is nice… shame it’s been powered by coal and petrochemicals the whole time, leading to a situation where you need a better refrigerator to keep the same food edible.
A fundamental missing focus in there is that science and technology are chaotic processes. There are organized funders and efforts of course (NIH, Manhattan Project, Plan Calcul...) but most people "go into" tech or science merely because of their own interests or curiosity, or because "it's what they happen to be good at". And much progress is more or less dumb luck: "they were looking for something else and they noticed this". Much of a researcher's or engineer's work is about keeping their eyes open and being there.
So most likely there are already AGI efforts in every direction. And these directions themselves do not say all that much about what will emerge as most important. AGIs aimed at scientific research had better be trained to notice the unexpected (deliberately so). Some will follow human obsessions, and some will not. Some will align with population-level concerns, and some will serve individual masters and concerns. And we had better be ready for, and understand, that some will not follow "human-ish" concerns - and how they might not. Quite possibly usefully so, from a science-advancement point of view. All the while understanding that they all will still happen.