> autonomously design, code and distribute whole apps (but not the most complex ones)
This is a bold claim. Today LLMs have not been demonstrated to be capable of synthesizing novel code. There was a post here just a few days ago about the performance gap between problems that had leaked into the training data and genuinely novel problems that had not.
So if we project forward from the current state of the art, it would be more accurate to say autonomously (re-)design, (re-)code and distribute whole apps. There are two important variables here:
* The size of the context needed to enable that task.
* The ability to synthesize solutions to unseen problems.
While it is possible that "most complex" is carrying a lot of load in that quote, it is worth being clear about what it means.
> Today LLMs have not been demonstrated to be capable of synthesizing novel code.
They are capable of doing that (to some extent). Personally, I've generated plenty of (working) code to solve novel problems, and I'm 100% sure that code wasn't part of the training set.
I’ll second that. A simple example is asking it to write pyplot or TikZ code to draw maps and pictures. I got it to draw a correct floor plan for the White House entirely with Python code. It amazes me that it understands spatial layouts from training only on text such that it can draw physically accurate diagrams, and it understands graphics libraries well enough to draw with them. Apparently predicting text about spatial locations requires an internal spatial map. Thinking about the chain of understanding of different concepts that have to be integrated together to accomplish this shows it’s not a simple task.
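For illustration, here is a minimal sketch of the kind of pyplot code such a prompt produces (the room names and coordinates below are invented placeholders, not the model's actual output and not the real White House layout):

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Each room is (name, x, y, width, height) in arbitrary floor-plan units.
# These values are made up for illustration; they are not the real layout.
rooms = [
    ("State Dining Room", 0, 3, 3, 3),
    ("Blue Room",         3, 3, 3, 3),
    ("East Room",         6, 3, 3, 3),
    ("Cross Hall",        0, 2, 9, 1),
    ("Entrance Hall",     3, 0, 3, 2),
]

fig, ax = plt.subplots(figsize=(6, 4))
for name, x, y, w, h in rooms:
    # Draw each room as an unfilled rectangle with a centered label.
    ax.add_patch(patches.Rectangle((x, y), w, h, fill=False, edgecolor="black"))
    ax.text(x + w / 2, y + h / 2, name, ha="center", va="center", fontsize=7)

ax.set_xlim(-0.5, 9.5)
ax.set_ylim(-0.5, 6.5)
ax.set_aspect("equal")
ax.axis("off")
plt.show()
```

Even a toy version like this requires choosing consistent coordinates so rooms adjoin without overlapping, and that consistency is the part that depends on some internal spatial model rather than on knowing the plotting API.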
> It amazes me that it understands spatial layouts from training only on text such that it can draw physically accurate diagrams, and it understands graphics libraries well enough to draw with them.
Is there evidence of this? The White House floor plan is very well known, and available online in many different formats and representations. Transforming one of those into a sequence of drawing calls would be much easier than reasoning about the layout from scratch.
Have you tried this with a textual description of a building that does not have any floor plans available, i.e. something unique?