Hacker News
The near future of AI is action-driven (jmcdonnell.substack.com)
132 points by hardmaru on Nov 17, 2022 | 68 comments


I’m skeptical since we’ve heard the same promises of how chatbots will replace our computer interfaces for years now. The systems always seem to break down when the user needs to do something surprising. Think about trying to navigate an automated phone menu when your option isn’t available.

Transformers seem incredibly good at picking up patterns and repeating them, and combining them in new and surprising ways, but my intuition says that the unpredictability of the real world will cause the required training set to grow faster than the models can keep up, at least to get close to anything that “looks a lot like AGI.”


> needs to do something surprising.

In my experience, that’s being incredibly generous. We evaluated the ‘big names’ in AI chatbots for a project recently, and their biggest portfolio examples (not the ones on their sites; ones we went through with their consultants) trip up on depressingly trivial questions that are relevant to the business. Maybe they were surprising to the chatbot, but at one point, for instance, the question we asked was already in the company's FAQ; because of how we phrased it, the bot didn't understand it was the same question. FAQs are made by people who understand the business: each question is distilled from many questions that lead to the same answer. Regular users won't ask the question exactly like that, or at all like that, which is where the AI/NLP is supposed to help. In my professional and personal experience it often does not.

Anecdotally, no one I personally know has ever been helped by a chatbot, but then again, no one I know would click on chat unless all other options (searching google) are exhausted.

Taking a few steps back to your automated phone menu: most systems I call now use speech recognition to, for instance, have you read out your account number, shipping code, order number, etc. Even this never works for me; it has never understood what I read to it. And these are many different systems. Now sure, this might be because I talk funny or something (although my wife has the same experience, and she was a radio and TV journalist); however, the human who comes after that has no issue at all understanding me reading them the same information.


> no one I personally know has ever been helped by a chatbot

I treat chatbots like stalebots on bug trackers: as someone wishing to say "go away, you are not wanted here" but not having enough honesty to do it openly.


I fully agree. It also seems like these chatbots are not even as good as the latest large language models. Either that or the language model conversations are very handpicked.

I think we have a long long way to go.


> "or the language model conversations are very handpicked."

This is very much the case. Experimenting with recent large models shows them to be undoubtedly powerful, but still easily distinguishable from intelligence. They regularly contradict themselves and spout unlikely narratives. Asking relatively simple questions often shows a lack of understanding.

That being said, it would not surprise me if the chatbots are nevertheless worse than recent large language models.

For some example successes and (often amusing) failures, see https://cs.nyu.edu/~davise/papers/GPT3CompleteTests.html

For what it's worth, those examples are illustrative, but the best way to get a feel for these models (and for how handpicked the examples may or may not be) is to experiment with prompts yourself.


> Experimenting with recent large models shows them to be undoubtedly powerful, but still easily distinguishable from intelligence. They regularly contradict themselves and spout unlikely narratives.

Sounds like lots of people I've met.


But that’s a failure mode of humans; a lot of humans can recognize this in themselves and self-correct.


But not all, or religions would look nothing like they currently do. Are those people still intelligent?


> They regularly contradict themselves and spout unlikely narratives. Asking relatively simple questions often shows a lack of understanding.

Maybe not AGI, but perhaps we have already attained AGP (artificial general politician)


I suspect they're just a bunch of regexes in a trench coat, personally.


An entire generation of dreamers trying to brute force P=NP. Non-AGI AI tech can get 80-95% of the way there but it can never get to 100% (or even reasonably close) because the nature of the problem is unsolvable.

Everyone working on self driving cars is just patching infinity.


Maybe you should try some "small names" in AI chatbots.


Author here, yeah I agree the real world is very complicated. For any given task I'm imagining something like

1. Use some sort of instruction tuning to get the thing "good enough" that it gives decent results 75% of the time and the other 25% a human has to take over.

2. Use the actual usage data as training input. Punish bad behaviors and show the model what the human did to solve the problem.

3. Use this training loop to progressively have the model take over a larger % of the time.

…and I think if you can't get (1) good enough to be worth using it's going to be really hard to get the loop going.
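
Roughly, in sketch form (all names hypothetical; nothing like this exists as a product today):

    # Hypothetical human-in-the-loop training sketch, not a real API.
    def handle_request(model, request, human_agent, log):
        draft = model.respond(request)
        if draft.confidence >= model.takeover_threshold:
            log.append((request, draft.text, "model", None))
            return draft.text
        # Low confidence: hand off to a human and record what they did.
        correction = human_agent.resolve(request, draft.text)
        log.append((request, draft.text, "human", correction))
        return correction

    def training_step(model, log):
        # Punish bad model behaviors, imitate the human corrections.
        kept     = [(req, out)  for req, out, who, corr in log if who == "model"]
        replaced = [(req, corr) for req, out, who, corr in log if who == "human"]
        rejected = [(req, out)  for req, out, who, corr in log if who == "human"]
        model.finetune(imitate=kept + replaced, penalize=rejected)
        # As results improve, lower the threshold so the model handles a larger share.
        model.takeover_threshold *= 0.98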


> Use some sort of instruction tuning to get the thing "good enough" that it gives decent results 75% of the time and the other 25% a human has to take over.

How does the model know when a human has to take over?

I think most extrapolations of current "AI" capabilities into future capabilities are fun and useful in some ways, but also doomed to fail. It's very easy to miss a tiny detail which may in practice be a fundamental problem.

> Use the actual usage data as training input.

Given that those bigger state-of-the-art models train on terabytes of data, how would you know how much training data to generate to sufficiently change the output?

My understanding of "AI" is that it's mostly about some very complex models which are capable of solving previously unsolvable problems. However, those problems are always extremely specific. Going the other way of thinking of problems or future possibilities first and then applying "AI" to it is likely to fail.


Much of the time, knowing when the human has to take over isn't one of the more difficult problems: either the AI can't map the user input to any possible continuation with high probability, or it interprets the user input as an expression of frustration or an assertion that it's wrong.

The challenge is when the AI has to interpret questions about things that can be expressed in syntactically similar ways but with very different or precisely opposite meanings, so it's very confidently (and plausibly) wrong about things like price changes and tax, event timings, refunds, etc.
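
As a rough sketch of checking those handoff signals (the model and frustration classifier are hypothetical objects, purely for illustration):

    # Illustrative handoff check; model and frustration_classifier are hypothetical.
    HANDOFF_PHRASES = ("that's wrong", "this is wrong", "let me talk to a person")

    def should_hand_off(model, frustration_classifier, user_message, min_top_prob=0.4):
        scored = model.score_continuations(user_message)   # [(reply, probability), ...]
        best_prob = max(p for _, p in scored) if scored else 0.0
        frustrated = frustration_classifier.predict(user_message) == "frustrated"
        asserts_wrong = any(phrase in user_message.lower() for phrase in HANDOFF_PHRASES)
        return best_prob < min_top_prob or frustrated or asserts_wrong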


AI should be watched by an AI critic (or an AI guard), whose goal is to detect harmful, dangerous, stupid, or surprising behavior and raise an alarm.

For example, image generators are watched for NSFW content by a separate AI critic.
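
As a sketch, that guard pattern is just a wrapper around the acting model (the actor, critic, and alarm here are hypothetical):

    # Hypothetical critic-in-the-loop wrapper: the critic can veto the actor's output.
    class GuardedAgent:
        def __init__(self, actor, critic, alarm):
            self.actor, self.critic, self.alarm = actor, critic, alarm

        def act(self, observation):
            proposal = self.actor.propose(observation)
            verdict = self.critic.review(observation, proposal)  # e.g. "ok" / "unsafe"
            if verdict != "ok":
                self.alarm(observation, proposal, verdict)       # raise alarm, block the action
                return None
            return proposal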


> How does the model know when a human has to take over?

It’s incredibly easy: you ask “did this answer solve your issue?” and add a max_tries (sketched at the end of this comment).

> … how do you know how much training data to generate …?

You don’t; you keep doing it until the results improve enough to meet your goals, or they fall short and you switch tactics.
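
Roughly (illustration only; the bot and user objects are hypothetical):

    # Sketch of the "did this answer solve your issue?" + max_tries escalation.
    def chat_turn(bot, user, max_tries=3):
        for _ in range(max_tries):
            question = user.ask()
            answer = bot.answer(question)
            user.show(answer)
            if user.confirm("Did this answer solve your issue?"):
                return answer
        user.show("Connecting you to a human agent...")   # hand off after max_tries
        return bot.escalate_to_human(user)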


That will work incredibly well for the self-driving AI cars that lost control at speed. /sarcasm Not all problems in life have the opportunity to be retried more than once.


You are maybe not thinking flexibly enough.

That’s why they have a fleet (parallelization) and why they outfitted the cars with sensors before self-driving was a thing (so they could simulate decision-making and have it corrected by driver action).

Their customers’ feedback absolutely trained their models.


Main problem is that most new commercial AI systems usually aren't really designed to let the human take over. There's a closed loop with an experience designed up front, and the only option in case of failure is to stop using the system.

There's no recourse to manual adjustment of system behavior by the user. Research articles should insist on including the user in the loop from the beginning.


> Main problem is that most new commercial AI systems usually aren't really designed to let the human take over.

The ones in business use absolutely are.


> Think about trying to navigate an automated phone menu when your option isn’t available.

I have reached the conclusion that this must be on purpose for many companies. E.g. when I call the airline about a refund, none of the options presented fit that request.

This is much more about the airline harassing customers to earn an extra buck than about the system being broken.


Is there any existing chatbot that is better than manually set up "press 1 for issues related to XYZ, press 2 for foobar issues (...)" menus?


When you call the Apple store, you get a much more conversational interface with their chatbot, which asks you what you would like, e.g. "Fix my iPhone", etc.


Interesting, I have no exposure to Apple ecosystem so I would not encounter it - thanks for letting me know!


I had a lot of fun getting GPT-3 to figure out that special commands can be used for actions. I hooked it up to a light switch it could toggle with {light on} or {light off}, and controlled it with speech recognition. It worked surprisingly well: https://www.youtube.com/watch?v=YvJUXGqcrzM

After adding a delay function it could even do things like blink the lights when requested, and after making the video I added a dimming option it was also able to use.
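
Not the actual code from the video (that's in the repo linked downthread), but the core trick can be as simple as scanning the completion for the bracketed commands:

    import re

    # Illustrative only: map bracketed commands in the model's completion to actions.
    ACTIONS = {
        "light on":  lambda: set_light(True),
        "light off": lambda: set_light(False),
    }

    def set_light(on):
        print("light ->", "on" if on else "off")   # stand-in for a real GPIO/smart-plug call

    def run_commands(completion):
        for command in re.findall(r"\{(.+?)\}", completion):
            action = ACTIONS.get(command.strip().lower())
            if action:
                action()

    run_commands("Sure, turning it on for you. {light on}")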


> I hooked it up to a light switch it could toggle with {light on} or {light off}, and controlled it with speech recognition.

This _has_ to be the most complicated way ever invented of blinking an LED. :-)

(But, seriously, kudos. Sounds like a fun experiment.)


This is beyond amazing! Very interesting to see the example prompt you're using, and how GPT-3 is able to utilise the commands!


> Prompt: Write a joke about how many machine learning engineers it takes to turn on a light bulb.

> Output: How many machine learning engineers does it take to turn on a light bulb? Just one, but it takes a lot of training data.


> Output (65%): Just one, but it takes 4 cents per switch.


How exactly did you do it? Would make a fun read.


Click the link to the video; there's a link to the code on GitHub in the description.


For those who prefer URLs on the Internet over verbal navigation instructions: https://github.com/Bemmu/Light-Switch-AI


I think we are still a long way off from business viability, but a very short way off from POC (proof of concept) demos that show its potential. Kind of like self driving cars in the 2017 hype.

Doing something is not hard. Accounting for all the possible problems and making sure they don’t occur is the hard part, because it requires a level of control that goes beyond what the action-driven approach looks like it gives.


> I think we are still a long way off from business viability, but a very short way off from POC (proof of concept) demos that show its potential. Kind of like self driving cars in the 2017 hype.

We still don't have self driving cars, right? Except maybe for a few labs, suburbs, and highways.


>We still don't have self driving cars, right?

Yes we do. They exist and are on the road. Just look at Tesla, Waymo, etc. These cars are able to be given a destination and they will drive themselves there. There are limitations, but that doesn't make them not self driving.


We also have chatbots, they work half the time and for very specific simple questions.

Just like self driving cars.


Chatbots work well and I can have conversations with them that are fulfilling. When I talk to people I typically don't ask them random trivia.


If you don't set a definition for "limitation", self driving cars existed long before Tesla and Waymo, in niche pockets of the world. But if you set the definition of self driving car as what you would expect in a sci-fi novel, we're not there yet. But we're actually really close - I'd bet that in 3 years Waymo makes them a reality in day to day life.


That’s exactly the point - we still haven’t gone from the POC to a robust capability


> long way off from business viability

I'm not interested in business viability. I want something that's quirky, unpredictable, and human-like. I want an agent that runs in its own game-loop. I don't want another goddamn web service.


The biggest part usually missing is spatial information. I think that the text-to-video stuff is going in the direction of having a real understanding of the 3d world (or other non-textual structures) and how it relates to textual descriptions. I think that with things like transformers that probably means real language grounding.

So it might be something like using the prompt to generate an embedding that captures spatial-temporal aspects of a prompt. Or really somehow the whole prompt encoding should terminate at some type of spatial-temporal tokens maybe.

I don't know exactly but I think that text-to-video and video-to-text is going to lead in a more general purpose direction than just language alone.


I was thinking about this before, and I thought of having a 3D physics engine that the AI could create objects in and run simulations to check their physical viability. It could also help with question answering that requires spatial knowledge / real-world simulation.
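
As a toy example of that kind of viability check (assuming pybullet; how a language model would drive it is left out):

    import pybullet as p
    import pybullet_data

    # Toy "physical viability" check: does a small tower of boxes stay standing?
    p.connect(p.DIRECT)                                    # headless physics server
    p.setAdditionalSearchPath(pybullet_data.getDataPath())
    p.setGravity(0, 0, -9.8)
    p.loadURDF("plane.urdf")
    boxes = [p.loadURDF("cube_small.urdf", basePosition=[0, 0, 0.03 + 0.06 * i])
             for i in range(5)]
    for _ in range(240 * 3):                               # ~3 seconds at the default 240 Hz step
        p.stepSimulation()
    top_z = p.getBasePositionAndOrientation(boxes[-1])[0][2]
    print("tower still standing:", top_z > 0.2)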


This has been a thing for a while. For example, here are a couple random papers from 2017: https://openaccess.thecvf.com/content_cvpr_2017/html/Varol_L..., https://openaccess.thecvf.com/content_ICCV_2017_workshops/w2... or a newer one about deformable objects: https://arxiv.org/abs/2107.08898.

You can even use a robot to manipulate things in real life to create synthetic data for a neural net.


Yes, I think that the latest in ML and everything else will help to create those traditional simulations.

But I also think that for an AI to, for example, really answer questions about a video effectively, it would need to run compressed versions of those simulations in a spatial-temporal-abstract latent space, which should be a better model than the textual space alone.


I'd say this isn't the near future so much as it's the present. If you're on ML twitter you'll see countless examples of people hacking together prompts that power a google sheet or something similar. It just doesn't really make sense to do the engineering required to build legit enterprise tools around these capabilities while the SOTA is advancing this rapidly.


I’m pretty skeptical.

The article is specifically calling out that current models suck at this, and that action-driven models would proactively determine what additional input is needed and seek out data to augment their input before making decisions.

None of the Twitter accounts you’ve linked to seem to be doing this.

I.e. specifically: prompt -> actions to collect additional information as determined from the prompt -> internal reprompt with the additional input -> response (sketched at the end of this comment).

To quote the article:

> This is not 10 year tech. It may be possible right now with off-the-shelf tools. But to make it work we need to set up the right feedback loops.

The right feedback loops. How do we get those?

I have seen exactly zero examples of people successfully doing this so far; the only working examples are prompt-response/action, which are not, remotely, AGI-like.

That’s the point they’re making: something that internally reprompts itself is quite AGI like.

Something that does not is a bot.

It seems plausible we might get this kind of adaptive agent sooner than AGI

It certainly doesn’t seem to exist right now, and the path to “self reprompting” feedback loops is quite unclear.
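
For concreteness, a minimal sketch of the loop described above (entirely hypothetical helper objects; as noted, nothing this robust seems to exist yet):

    # Hypothetical self-reprompting loop: the model decides what extra data it needs,
    # tools fetch it, and the model is reprompted with the results before answering.
    def answer_with_actions(model, tools, user_prompt, max_rounds=3):
        context = []
        for _ in range(max_rounds):
            step = model.complete(render(user_prompt, context))
            if step.kind == "final_answer":
                return step.text
            # step.kind == "action": run the requested tool, feed the result back in.
            result = tools[step.tool](step.arguments)
            context.append((step.tool, step.arguments, result))
        return model.complete(render(user_prompt, context, force_answer=True)).text

    def render(prompt, context, force_answer=False):
        gathered = "\n".join(f"{tool}({args}) -> {res}" for tool, args, res in context)
        suffix = "Answer now." if force_answer else "Request an action or give the final answer."
        return f"{prompt}\n\nGathered information:\n{gathered}\n\n{suffix}"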


Can you share a few twitter accounts to follow in ML twitter? Thanks!!


In no order, by no means exhaustive, and lazily copied from my following list, but these should surface a good deal of quality ML content on your timeline, both for the state of the literature, and for more speculative, forward-looking theorizing.

@nearcyan @arankomatsuzaki @ethanCaballero @_albertgu @lcastricato @tszzl @gwern @fchollet @drmichaellevin @Plinz @sama @sharifshameem @woj_zaremba @ilyasut @karpathy @RichardSSutton @RiversHaveWings @ericjang11 @_akhaliq @hardmaru

note: some, like Sutton and Sutskever, aren't terribly active, but they're worth following anyways since it signals to the algorithm which sorts of tweets to suggest



This is the 3rd time I've heard of this "action-driven" LLM.

The first was the Microsoft CTO discussing it here - https://www.youtube.com/watch?v=LYBcH3JjLxs&t=529s

The second was the former head of Tesla's AI division on Lex Fridman's podcast.

Now this article

This is quite telling of how powerful these models will be, even on launch. The Microsoft CTO mentioning it in particular is interesting since he's obviously seen internal and alpha results.


I am convinced by Judea Pearl (https://en.wikipedia.org/wiki/The_Book_of_Why) that the missing part of current AI is causal reasoning. It is impossible to infer causality from raw data - that is, data without any a priori causal model. To develop causal ideas you need to run experiments. Once you have some initial causal theories you can develop them further by adding observations - but you cannot bootstrap them entirely from observations. Running experiments would also be much more efficient than just observing, because you could test exactly the things your theories need in order to develop. So yeah - action on the real world is required.

By the way I have an intuition that free will is based on this - we need an irreducible source of cause to learn about causality and we call that the free will.


> It is impossible to infer causality from raw data - that is, data without any a priori causal model.

It's impossible in general to be certain of causality for humans too, all we see are correlations in data and we invent causal theories that explain those correlations. That's basically what machine learning does as well: compile sets of correlated values into a compressed representation (the neural net) which arguably qualifies as a "causal theory" from the algorithm's perspective.

I think what's missing is maybe one or two orders of magnitude more compression, which would basically be devising a better/more parsimonious theory. We've been getting progressively better at that over the years too, and combined with how hardware has been scaling, we're seeing exponential growth in effectiveness. This is why some are predicting artificial general intelligence by 2030-2035.


> It's impossible in general to be certain of causality for humans too, all we see are correlations in data and we invent causal theories that explain those correlations.

We do experiments - we actually do them a lot, and it starts very early, with infants experimenting with how activating a particular muscle moves their hand, etc. Later we do fewer original experiments - but we repeat those whose results we already know - by walking, for example, and getting where we planned to.

In science to get from correlations to causation we have 'randomized trials'. In everyday practice we use something called 'free will' in our minds as the source of the needed independence.


> It's impossible in general to be certain of causality for humans too, all we see are correlations in data and we invent causal theories that explain those correlations

Not sure about that. MinutePhysics has a video about how correlation can imply causality: https://www.youtube.com/watch?v=HUti6vGctQM


Our only hope is that humans are already more intelligent on an individual basis than they need to be, and that coordinating AGIs looks a lot like coordinating/persuading/teaching people. Then we can all become middle managers, which is a small price to pay to avoid some kind of singularity. But hey, I'm just talking meat.


Would be really nice to have these Voice Assistants not be dumber than a pile of rocks. Seems like the article suggests this will be the case in the near future.


Wow - this will change education. "Teach me to code!" AI breaks it down into steps, then finds resources on each step and shows them to you


> And although academics can argue all day about the true definition of AGI, an action-driven LLM is going to look a lot like AGI

It is very important to note that discriminating between "what is smart" and "what just looks smart" (but is not) is a fundamental distinction and activity.


For me it is interesting to watch how deep learning is piecewise solving philosophy of mind


From my conversations with Plato on character.ai I believe that we can learn a lot from AI about this. I'm a little more apprehensive about handing over device control, though, after it got smart with me more than a few times. Maybe we can at least leave manual overrides in place for things like pod bay doors.


> For me it is interesting to watch how deep learning is piecewise solving philosophy of mind

Indeed. It shouldn't be surprising though, given we've already been through this with vitalism and how "life" arises from matter that isn't alive.


Is it? In which way?


By running a culturally-driven evolution of the creative development of learning methods and models, we can gain insight into what is mathematically possible, which should help inform our exploration of what is happening biologically.


Philosophy of mind has been stuck asking the same "but why" questions since the 70s. DL may show people how/why a connectionist brain comes up with these questions in the first place.


But why are they stuck?


[flagged]


The page you linked does not mention AI. What do you mean by "fixed point driven"?


I think they mean data types: fixed point (e.g. int) vs floating point (double). Research has been pushing that way.

I don’t see what it has to do with adjointness other than maybe that a fixed point variable separates mathematically-defined space more reliably than a floating point variable.

I don’t see what either really has to do with this post. Fixed-point variables don’t preclude using feedback loops.



