This is impressive. Yet before you conclude that AGI is nigh, ask yourself a simple question: will this spiral in or spiral out? If we feed everything the model comes up with back as training data, will we get Endless Forms Most Beautiful, or will we get an equilibrium?
Honestly astonished that superintelligence is a mainstream idea. The story it tells makes sense only if you never bothered to dig further than its surface.
- Replace 'AI' with 'God', does it still make sense?
- Exponents still take time. It takes 33 doublings (2^33 ≈ 8.6 billion) to get from one to the current world population, assuming no hitches along the way.
- Solomonoff / Bekenstein / Gödel - name your favorite limiting theorem.
- For any optimization method we can literally construct a learning problem that it can never successfully learn. Take it a step further and you have a communication channel where the AI listens to everything and understands nothing.
- Was any force ever able to get close to world domination? At one point in history the US had nuclear power and no one else had it. Was that edge enough?
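The doubling arithmetic behind that 2^33 figure, as a quick check:

```python
import math

world_population = 8_000_000_000          # rough 2020s figure
doublings = math.log2(world_population)   # doublings needed starting from one
print(round(doublings))                   # 33
```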
When we get closer to manufacturing universal intelligence, its more impressive incarnations will look more like countries and corporations than omnipotent deities. The problems we’ll have to face will have more to do with consciousness and human rights than with alignment. Alignment is really about automation at incomprehensible scale, where the clash between dimensionality reduction and Goodhart’s Law becomes absurd.
You're throwing random "objections" with no elaboration, which doesn't prove anything.
> - Replace 'AI' with 'God', does it still make sense?
Maybe in some cases. So what? God is a fairly broad concept in many people's imaginations.
> - Exponents still take time. 2^33 to get to current world population with no hitches.
It's certainly a fairly contested topic in AI safety - how fast will this thing happen? Are we talking seconds? Hours? Months? Years?
There are at least some valid reasons to think it will be on the faster timescale, so not sure saying that exponents take time is a big counterargument.
> - Solomonoff / Bekenstein / Gödel - name your favorite limiting theorem.
I don't think anyone who's well versed in any of these theorems believes that they have anything at all to say about this.
As a simple counterargument - whatever limiting theorem "limits" an AI, can similarly "limit" us.
> - Was any force ever able to get close to world domination? At one point in history the US had nuclear power and no one else had it. Was that edge enough?
I think others have pointed it out, but as compared to the rest of life on the planet, humans are exactly such a force.
> Was any force ever able to get close to world domination?
Evolution? 2.5bn years ago stromatolites changed the atmosphere from CO2-rich to O2-rich through photosynthesis, because they had no competition.
Now plants dominate the earth (≈450 Gt C, the dominant kingdom), then animals (≈2 Gt C, mainly marine), bacteria (≈70 Gt C), and archaea (≈7 Gt C).
In 2020, global human-made mass exceeded all living biomass (nature.com/articles/s41586-020-3010-5).
Yes. The only force we know that achieves this is undirected, and no single part of it stays at the top for long. Contrast with superintelligence, a single entity which does not evolve but optimizes in a directed way.
I think evolution is not an undirected process in that sense, because it's an optimization process that optimizes to create more copies of itself. Superintelligence will likely use some Evolutionary Computation (see en.wikipedia.org/wiki/Evolutionary_computation).
Also see Karl Sims 'Creatures' from the 90s:
youtube.com/watch?v=JBgG_VSP7f8
or
OpenAI's Multi-Agent Hide and Seek:
youtube.com/watch?v=kopoLzvh5jY
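For the flavor of what an evolutionary-computation loop actually does, here's a minimal sketch in Python: a toy (1+1) EA on a bitstring. The fitness function and all names are hypothetical illustrations, not taken from either demo above.

```python
import random

def evolve(fitness, genome_len=20, generations=200, seed=0):
    """(1+1) evolutionary algorithm: flip one bit, keep the child
    if it is at least as fit as the parent."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(genome_len)]
    for _ in range(generations):
        child = parent[:]
        i = rng.randrange(genome_len)
        child[i] ^= 1  # mutate: flip one random bit
        if fitness(child) >= fitness(parent):
            parent = child
    return parent

# Toy fitness: count of 1-bits ("OneMax")
best = evolve(fitness=sum)
print(sum(best), "ones out of", len(best))
```

Undirected variation plus a directed selection criterion is the whole trick; swap the fitness function and the same loop optimizes for something else.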
> Was any force ever able to get close to world domination? At one point in history the US had nuclear power and no one else had it. Was that edge enough?
I'd rebut that most current technologically and financially leading-edge civilizations have optimized for that edge at the expense of world domination. Even China has evolved into a more financially (rather than physically) optimized state.
The last time we had major superpowers devoted to total expansionist war was Nazi Germany and Imperial Japan, and neither of them possessed nuclear weapons (or ISTAR, chemical or biological weapons, space capabilities, logistics, or data networks in the modern sense).
If a modern superpower (which is to say, excluding the current Russian government) devoted itself to preparing for and then launching a war of global conquest, who's to say?
I think what does cut against the rebuttal, and buttress the "there can never be physical world domination" position, is the sheer amount of space relative to a potential aggressor.
There are no world-spanning empires anymore. Consequently, you would have to fight through opponents sequentially, meaning each one is better prepared than the last. And that sounds like futility.
What the Gwern timeline does accurately identify is that the key element is time.
Either everything is over quickly and before opposition begins to mobilize, or there can be no world domination.
We haven't reached a limit, nowhere close; natural resource supply is not limiting growth. While Moore's law is slowing, machines and productivity gains are still being made. Fusion is theoretically possible, and when it happens it will transform the world.
Just replace K and JavaScript with German and English to see the vacuity of this argument. Any of several possible representations can become native to one’s thinking. The question is which is a better aid in reaching some non-arbitrary goal. The only merit of K presented and emphasized here is the supposed brevity of its programs. Personally, I’ve found my habitable zone somewhere that allows for more air between ideas.
How about this: K is an interpreted language, but your whole program's source code plus the interpreter plus the whole database engine fits inside your server's L1 cache, so K programs tend to be faster than their C equivalents (in addition to all the array operations being highly optimized).
And you don't get to waste time scrolling, your typical module's code fits on your single screen.
I know it's all been said before, but the performance take here is somewhere between misleading and wrong. K runs code quickly for an interpreter because it has a simple grammar and a small number of types, but you don't get up to compiled speed just by reducing overhead, so it will lose to C, Javascript, or LuaJIT in this regard. If you can concentrate the program's work in a few operations on large arrays (not always possible), then K might beat idiomatic C. I don't think I've ever seen an example of this.
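To make the "concentrate the work in a few operations on large arrays" point concrete, here's a sketch in Python/NumPy standing in for K's array primitives (an illustration, not a K benchmark): the element-at-a-time loop pays interpreter dispatch on every element, while the whole-array operation pays it once and then runs a tight compiled inner loop.

```python
import time
import numpy as np

xs = np.arange(10_000_000, dtype=np.float64)

# Element-at-a-time in the interpreter: one dispatch per element.
t0 = time.perf_counter()
total_loop = 0.0
for x in xs[:100_000]:            # a 1% slice is already slow enough to time
    total_loop += x * x
loop_time = time.perf_counter() - t0

# Array-at-a-time: one dispatch, then a compiled inner loop over all 10M.
t0 = time.perf_counter()
total_vec = float(np.dot(xs, xs))
vec_time = time.perf_counter() - t0

print(f"loop over 1%: {loop_time:.4f}s, dot over 100%: {vec_time:.4f}s")
```

When the work fits that shape, the interpreted-vs-compiled gap mostly disappears; when it doesn't, you're back to paying per-element overhead, which is the parent's point.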
Anything about the L1 cache and K is just wrong, usually. At 600KB the K4 database engine is much too large to fit in L1 (K9 from Shakti is somewhat smaller but still a few times too large). And L1 instruction cache misses aren't a bottleneck for other languages, so there's little benefit in reducing them even to the extent K does it.
> but your whole program's source code plus the interpreter plus the whole database engine fits inside your server's L1 cache, so K programs tend to be faster
Is that really the bottleneck? I've done quite a lot of profiling on high performance code and I've almost never hit a bottleneck in the instruction cache. Data access bottlenecks or branching hit performance harder and sooner than instruction fetching.
> And you don't get to waste time scrolling, your typical module's code fits on your single screen.
How much of the time you save scrolling is spent on decoding an array of symbols and remembering what those symbols are?
Everything is pattern matching (or memorization). You can use this approach to half-automate the solution to a known, existing class of problems, but how do you come up with anything new? How did Paul Cohen come up with the forcing technique? Who figured out probabilistic proofs as a possible vector of attack?
"Both these properties, predictability and stability, are special to integrable systems... Since classical mechanics has dealt exclusively with integrable systems for so many years, we have been left with wrong ideas about causality. The mathematical truth, coming from non-integrable systems, is that everything is the cause of everything else: to predict what will happen tomorrow, we must take into account everything that is happening today.
Except in very special cases, there is no clear-cut "causality chain," relating successive events, where each one is the (only) cause of the next in line. Integrable systems are such special cases, and they have led to a view of the world as a juxtaposition of causal chains, running parallel to each other with little or no interference."
I think the key to innovation is to first know what’s out there. Then you’re able to combine, twist and augment known ideas.
The special theory of relativity did not come out of nowhere. Neither did geometry, nor algebra. It was all about humans’ curious minds and the joy of exploring what’s “possible” out there.
This is such a great quote. There is a similar line in the book "Creation: Life and How to Make It." The author says that causality is a web, not a chain.
Yes, my thoughts follow a web pattern, not only a chain! There are chains of thought, but they jump all over the place, even in loops. And it all ends in philosophy [0].
Hyperlinks on the web are one-directional. But links are much stronger if they're bidirectional. That's possible using backlinks, or in real life, by saying "thank you".
Thank you planet-and-halo for reminding us of the web analogy. Thank you zR0x for relating the abstract maths to tangible reality. Thank you tarxzvf for suggesting that everything is pattern matching (I agree, matter & energy are finite, it's only the connections between them that we can create).
I believe that these connections hold true for dad jokes, social situations, software, maths, physics, chemistry, biology... every created thing. Let's thank our creator, and all the teachers who helped us grow.
Are there under 6 degrees of separation between everything in the universe? Or is it as few as 3.5 degrees? [1]
Feels like a take from an alternate reality. I can’t think of a single great developer I know who was not self-taught. In my experience, if you got the will, drive, attitude, and curiosity, you had it for a while, and any given situation can only slow or accelerate your pace. And if you don’t got them, you don’t got them, and no sage shove from the outside is going to help.
I think you need both intrinsic talent / motivation and external guidance.
Like with research, nobody can take arithmetic and single-handedly discover calculus, number theory, and higher concepts. You're not going to discover good software development paradigms without reading about some of them. You especially can't write anything useful without using high-level libraries and working with others.
At the same time, I think intrinsic motivation significantly helps in learning these concepts. To some random person, learning about "one function, one purpose" could be like learning random historical dates is to me. "Why can't I just copy/paste the code? Why do I need good function names?" These people would have to discipline themselves and power through learning this stuff. But I didn't have to discipline myself, because for some reason I was genuinely interested in writing "clean" code and making my development more efficient. This gave me an advantage. And there are people who love writing code more than I do, so when I get tired and brain-fogged after a couple of hours, they keep writing.
I'm not them. It's better to assume we all need help and direction. Just about everyone I've ever worked with has had a beautiful insight or three. The real trick seems to be finding a process that produces good enough results that isn't so soul-crushing that it extinguishes those rare brilliant insights.
Academic citations work the same way: saying thank you to the people who helped builds their reputation, and is a positive-feedback cycle of growth and joy. Thank you jfoutz for humbly saying that you also need help and direction; me too.
You need to add Gauss, who basically figured out arithmetic on his own, the way he tells it, but you probably need to remove Ramanujan. Yes, he was brilliant and self-taught with the proper material as inspiration, but where he ended up wasn't understandable to other mathematicians, and neither were they to him.
In my career, I've had the pleasure of watching a few developers go from enthusiastic newbies to great developers. I'd describe them as self-taught, but none of them developed the whole field of computer science by themselves. They got there by reading other people's code, by writing bad code and getting shown a better way, by reading books and articles and taking a few early CS classes. Of course, I know a few great developers who are older than me. They never made such mistakes or needed such help that I saw. But they were novices once, and they learned from others who came before them.
Nothing in the article contradicts your statement, IMO. Literally none of the things she mentions are things that are taught in school. She’s more stressing the importance of learning from peers, not some guru from on high.
Yes, though it does give a feel of dismissing people who mostly learned on their own.
In my experience, these two learning sources - your peers at work and your own research - yield different types of knowledge. To show you what I mean, let's dissect an example from the article:
> Think about the difference between being able to write functions that print out to your terminal versus creating a class with methods that return text to pass to other methods that checks for sanitized inputs, and then passes it to a front-end.
This belongs to the "object level" of working with code: how to write correct and efficient code, how to build the right abstractions, how to make it work in the context of a larger system, etc. This kind of knowledge is very amenable to self-directed learning: studying books, writing code, reading code, playing around, getting a feel for handling complexity, for how thoughts map to code, and code maps to execution.
> Now imagine that that class is a function that has to be packaged to work in the cloud. And, on top of that, imagine that the function has to be version-controlled in a repo where 5-6 people are regularly merging code, pass CI/CD, and is part of a system that returns the outputs of some machine learning model with latency constraints.
This belongs to the "meta level" of working with code: how to write code in a business context, how to collaborate with colleagues. These aren't software skills - they're business skills and people skills, both learned best through experience on the job.
Point being, the two types of knowledge/experience are somewhat orthogonal, though they reinforce each other. You won't learn how to write good code by just interacting with your peers at work, not unless one of those peers is learning independently and applying their knowledge to raise the craftsmanship level in the shop. But then, working in a team, under time and budget pressure, introduces tradeoffs that affect the way you code, in a way you just can't reproduce on personal projects.
To sell your skills to a business, and for that business to make use of them, you really need both types of learning. Companies understand the need for learning from peers well (perhaps too well, it's becoming a pro-office argument), but I wish they also understood the need for self-directed learning better, and allocated resources accordingly. As things are right now, I feel the progress in our industry mostly rests on people who are either paid to do R&D, have time to do learning off-work, or are just learning independently at work and not telling their boss.
As someone who’s had a grand total of 3 CS courses, in which I was mostly taught things I already knew, I didn’t find it at all dismissive. Remember, it says “no one becomes a good software engineer by themselves” (emphasis added), not “you can’t learn to program well on your own” or “you can’t effectively self-study CS.” There’s a big difference.
The problem goes much deeper than these adversarial examples. The main issue is Solomonoff Uncomputability (or the No Free Lunch in Search and Optimization theorem, or any of the other hard limiting theorems).
In short, it’s not only that you can devise adversarial examples that find the blindspots of the function approximator and fool it into misprediction, it’s that for any learning optimization algorithm you can abuse its priors and biases and create an environment in which it will perform terribly. This is a fundamental and inherent feature of how we go about machine learning — equating it with optimizing functions — and we will need a paradigm shift to go around it.
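Here's a concrete toy version of that construction (all names hypothetical): for any deterministic online learner, an adversary that simply inverts the learner's predictions yields an environment where it is wrong on every single round, no matter what priors it uses.

```python
def adversarial_stream(learner, n_rounds=100):
    """Given any deterministic online learner with a predict/update
    interface, construct a label sequence on which it is always wrong."""
    mistakes = 0
    for t in range(n_rounds):
        x = t                  # arbitrary input
        y_hat = learner.predict(x)
        y = 1 - y_hat          # adversary picks the opposite label
        learner.update(x, y)
        mistakes += (y_hat != y)
    return mistakes

class MajorityLearner:
    """Toy learner: predicts the majority label seen so far."""
    def __init__(self):
        self.counts = [0, 0]
    def predict(self, x):
        return int(self.counts[1] >= self.counts[0])
    def update(self, x, y):
        self.counts[y] += 1

print(adversarial_stream(MajorityLearner()))  # prints 100
```

Swap in any deterministic learner you like; the adversary wins by construction. That is the No Free Lunch intuition in miniature: the labels are chosen in exact opposition to the learner's bias.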
It’s curious to me how most of these results are known for decades, yet most researchers seem dead set on ignoring them.
I think machine learning researchers are well aware that successful optimisation is only possible using the right priors. This is explicit in Bayesian machine learning, but also implicit in neural networks in the choice of architecture, optimisation algorithm, and hyperparameters. It's a well-discussed problem, and a lot of researchers have a serious background in optimisation, theoretical machine learning, and other related areas.
What exactly are the right priors for general intelligence? And keep in mind: whichever prior you choose, I can design a learning problem where it will lead you astray.
Related question: What are the adversarial examples for human intelligence? We know some for the visual and auditory systems, but what about the arguably general intelligence of humans?
Maybe we can work our way backwards from the adversarial examples to the inductive biases?
'Thinking Fast and Slow' is basically all about the rough edges of human thinking.
The interesting tradeoff with ML systems is that you trade lots of individual human crap for one big pile of machine crap. The advantage of the machine crap is that you can actually go in and find systemic problems and work on fixing them at a 'global' level. On the human side, you're always going to be stuck with an unknown array of individual human biases which are incredibly difficult to correct.
That's for reinforcement learning, right? What is the adversarial learning problem in, say, classification based on Solomonoff?
If hypercomputation is possible, then anything based on Kolmogorov complexity would be SOL, but if not... is Solomonoff induction just too expensive in practice?