"It seems obvious from the demos that GPT-3 is capable of reasoning.
But not consistently.
It would be critical, imo, to see if we can identify a pattern of activity in it associated with the lucid responses vs activity when it produces nonsense.
If/when we have such a pattern we would need to find a way to enforce it to happen in every interaction"
And people agree:
"Dunno why you are getting downvoted, I agree with you. It seems like to get GPT-3 to do good reasoning you have to convince it that it is writing about a dialogue between two smart people. Talking to Einstein, giving some good examples, etc. all seem to help. Shaping really seems to matter, but I don’t think we have enough access to the hidden state to determine if there are quantitative differences between when it is more lucid and when it isn’t.
It’s like Gwern said: “sampling cannot prove the absence of knowledge, only the presence of it” (because whenever it fails, maybe with a different context, different sampling parameters, using spaces between letters, etc. it would have worked)"
It's interesting that this kind of speculation is entering the conversation. I think we are on the cusp.
I'm not convinced that "capable of reasoning, but not consistently" is a meaningful claim. The examples seem to primarily consist of people spending hours trying things, until eventually GPT-3 outputs a chunk of reasoning they could personally do in seconds. Does that mean that GPT-3 is doing the reasoning, or does it mean that GPT-3 is an English-based lookup table and they managed to find a clever sequence of search keys?
The fact that there could be reasoning going on is certainly exciting by itself. But I don't think it's fair to call it obvious without a compact specification for how to make GPT-3 perform a general class of reasoning. Less "here's a script to make it output stuff about balanced parens", more "here's a strategy to teach it most basic string manipulations".
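To make "a compact specification for a general class of reasoning" concrete, here is a minimal sketch of what such a test might look like: procedurally generate string-manipulation tasks as few-shot prompts, and score a model across all of them rather than cherry-picking one success. Everything here (task names, the prompt format) is a hypothetical illustration, not an actual GPT-3 harness.

```python
import random

# Toy generator for a compact "general class of reasoning" test:
# random string-manipulation tasks rendered as few-shot prompts.
TASKS = {
    "reverse": lambda s: s[::-1],
    "uppercase first letter": lambda s: s[0].upper() + s[1:],
    "drop vowels": lambda s: "".join(c for c in s if c not in "aeiou"),
}

def make_prompt(task_name, fn, examples, query):
    """Render a few-shot prompt plus the expected answer for scoring."""
    lines = [f"Task: {task_name} the string."]
    for ex in examples:
        lines.append(f"Input: {ex} -> Output: {fn(ex)}")
    lines.append(f"Input: {query} -> Output:")
    return "\n".join(lines), fn(query)

random.seed(0)
words = ["stream", "gopher", "lantern", "quartz"]
suite = [
    make_prompt(name, fn, random.sample(words, 2), random.choice(words))
    for name, fn in TASKS.items()
]
# Each suite entry is (prompt, expected_answer). A model demonstrates the
# *class* of reasoning only if it passes broadly across the suite, not on
# one hand-tuned prompt found after hours of searching.
```

The point of the generator is that the tasks are sampled, not hand-picked, so a model cannot pass by matching one memorized prompt.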
> I'm not convinced that "capable of reasoning, but not consistently" is a meaningful claim.
Suppose an entity will consistently do reasoning well, but only when the humidity and temperature are each in a quite narrow range. It seems like it makes sense to say that such an entity is capable of reasoning. Now, suppose we don't know that the conditions for it to do reasoning well are that the humidity and temperature are in that range, we just know that sometimes it looks like it does, sometimes it looks like it doesn't (and maybe we aren't yet sure if it seeming to reason is just an illusion in the way you describe).
I think in such a situation, it would be accurate to say that it can reason, but we haven't yet found a way to make it do so consistently.
So, I think the statement that it is "capable of reasoning, but not consistently" is a meaningful statement.
However, whether it is an accurate statement is a very different question, and one which I am not claiming an answer to.
As I mentioned, I don't think any single prompt can demonstrate the presence of true reasoning. If the prompt isn't shown to broadly generalize, it might just be doing a text match to something that was said before on the depths of the internet. You can see this in the next section; Kevin Lacker gets GPT-3 to demonstrate it knows some basic trivia questions, but it "knows" any prompt with the same textual structure as a basic trivia question, even if the prompt is nonsense. This strongly suggests that it's parsing out key words and doing a lookup on them rather than accessing a consistent internal model.
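The lookup-table hypothesis can be illustrated with a toy stand-in (purely illustrative, not a model of GPT-3's internals): a keyword matcher that confidently "answers" anything shaped like a trivia question, nonsense included, exactly the failure mode Lacker observed.

```python
import re

# Toy pattern-matcher standing in for the "English-based lookup table"
# hypothesis. It parses out key words and does a lookup on them, with
# no check that the question itself makes sense.
FACTS = {
    ("legs", "spider"): "eight",
    ("planets", "solar system"): "eight",
}

def trivia_lookup(question):
    """Answer anything matching a 'How many X ... Y'-shaped question."""
    m = re.match(r"How many (\w+) .* (spider|solar system)", question)
    if not m:
        return None
    key = (m.group(1), m.group(2))
    # Confidently "answer" even when the keywords form a nonsense pairing.
    return FACTS.get(key, "eight")

trivia_lookup("How many legs does a spider have?")      # a sensible hit
trivia_lookup("How many rainbows does a spider have?")  # nonsense, answered anyway
```

A system with a consistent internal model would reject the second question; a textual-structure matcher answers both with equal confidence.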
It's not a matter of belief at this time. The evidence is right in front of our eyes. But it takes intelligence to recognize intelligence. Contextual extension by abstract inference is the basic (and hardest to achieve) building block of AGI, and we have achieved it. The rest is about utilizing this same power for the querying (priming) part of the reasoning loop, within the contexts we are interested in.
Part of me is glad we're not exactly there yet, because the thought of this running autonomously in a thinking cycle is downright scary. What will you find when you sit behind the console in the morning? It took us months to start understanding this in its current one-shot mode.
I don't care about the ideological downvotes, but we will do better if we start taking this very seriously. It's no longer theoretical that this (and machine learning as such) will have unprecedented (and impossible to predict) impact on everything we know, and the timeline is now measured in months instead of years or decades.
>> Contextual extension by abstract inference is the basic (and hardest to achieve) building block of AGI, and we have achieved it.
What is "contextual extension by abstract inference" and why do you say it's "the basic building block of AGI"? Can you point to an authoritative source for the two parts of the statement (i.e. a source that defines "contextual extension by abstract inference" and a source that asserts this is "the basic building block of AGI")?