Not the OP, but it seems like you might be talking about different things.
Security can be about not adding certain things / not making certain mistakes, like not writing raw SQL queries with data interpolated into the query string, and instead using bindings or an ORM.
If you have an insecure raw query and feed it into an ORM you bolted on top, that's not going to make the query any more secure.
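To make the SQL point concrete, here's a minimal sketch in Python with sqlite3; the table and the injected string are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

user_input = "alice'; DROP TABLE users; --"

# Insecure: data interpolated straight into the query string can change
# the structure of the query itself.
# conn.execute(f"SELECT * FROM users WHERE name = '{user_input}'")

# With bindings, the driver passes the value as data, never as SQL.
rows = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
```

And that's the point above: feeding the already-interpolated string into an ORM's raw-SQL escape hatch afterwards doesn't make it any safer; the binding is what does the work.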
But on the other hand, when you're securing API endpoints, you do add things like authorization, input validation, and parsing.
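A rough sketch of that "adding things" side, with a hypothetical order endpoint; the token set and field names are mine, not from any real API:

```python
from dataclasses import dataclass

VALID_TOKENS = {"secret-token-123"}  # placeholder; real auth would verify a signed token

@dataclass
class CreateOrder:
    product_id: int
    quantity: int

def parse_create_order(payload: dict) -> CreateOrder:
    # Input validation / parsing: reject anything that isn't the expected shape.
    product_id = payload.get("product_id")
    quantity = payload.get("quantity")
    if not isinstance(product_id, int) or not isinstance(quantity, int) or quantity <= 0:
        raise ValueError("invalid order payload")
    return CreateOrder(product_id=product_id, quantity=quantity)

def handle_create_order(auth_token: str, payload: dict) -> str:
    # Authorization: an explicit check added in front of the endpoint logic.
    if auth_token not in VALID_TOKENS:
        raise PermissionError("unauthorized")
    order = parse_create_order(payload)
    return f"order accepted: {order}"
```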
So I think a lot depends on what you mean when you're talking about security.
Security is security: making sure bad things don't happen. In some cases that means a different approach in the code, in some cases additions to the code, and in some cases removing things from the code.
Kinda but not really. The model thinks it's 2024 or 2025 or 2026, but really it has no concept of "now" and thus no sense of past or present... unless it's instructed to think it's a certain date and time. If every time you woke up completely devoid of memory of your past, it would be hard to argue you have a good sense of time.
In the technical sense I mentioned (physical time as the order of changes), it absolutely does have a concept of now, past, and present; it's just different from yours (2024, 2026, ...), and in your projection of time those only exist during inference. The entire autoregressive process, plus any result storage, serves as a memory that preserves the continuity of its time. LLMs are just not very good at ordering, and at many other things in general.
Sounds like something for an investigation to figure out - wonder why they are fighting that so hard. Also sure sounds like a lot of victim blaming considering he died without ever doing anything warranting his death.
A better way to put it is with this example: I put my symptoms into ChatGPT and it gives some generic info with a massive "not-medical-advice" boilerplate and refuses to give specific recommendations. My wife (an NP) puts in anonymous medical questions and gets highly specific, med-terminology-heavy guidance.
That's all to say the learning curve with LLMs is learning how to say things a specific way to reliably get an outcome.
> Picture MS Word where the GUI is just a page and a sidebar for telling an LLM what you want it to do.
Done. And it seems absolutely awful.
"Please bold the text I have selected" instead of a preexisting bold button.
Oh wait, I can just tell it all the tools I commonly use and where to put them... Hmmm, top bar or sidebar? Wow, so much fun getting to make all these decisions!
Ok time to change fonts. "Please add a font picker so I can pick a font"
Hallucinations are not solved, memory is not solved, prompt injection is not solved, context limits are waaay too low while tokens are way too expensive to take advantage of the context we do have, etc. These problems have existed since the very early days of GPT-4, and there is no clear path to them being solved any time soon.
You basically need AGI and we are nowhere close to AGI.
All of the issues you talk about are true. I don’t personally care about AGI; it’s kind of a mishmash of a real thing and a nice package for investors. What I do care about is what has been released and what it can do.
All of the issues you talk about: they aren’t solved but we’ve made amazing progress on all of them. Continual learning is a big one and labs are likely close to some POCs.
Token cost per unit of performance is rapidly going down. GPT-4-level perf costs you 10x less today than it did two years ago. This will continue to be the case as we keep pushing efficiency up.
The AGI question (“are we close?”): tbh, to me these questions are just rabbit holes and bait for flame wars, because no one can agree on what it means, and even if you do (e.g. superhuman perf on all economically viable tasks is maybe a more solid starting point), everyone fights about the ecological validity of the evals.
All I’m saying is: taking coding in a complete vacuum, we’re very, very close to the point where it becomes so obviously beneficial, and failure rates for many things fall below the critical thresholds, that automating even the things people say make engineers unique (working with people to navigate ambiguous issues they aren’t able to articulate well, making the right tradeoffs, etc.) starts looking less like a research challenge and more like an exercise in deployment.
Amazon is not one app, it's hundreds of them bundled into some giant monster.
You could easily replicate the store part of it minimally; at its core it's just an index of products, a basket, and a checkout system (roughly like the sketch below).
There are other parts that make up the whole thing of course.
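A minimal sketch of that core, just to show how small the essential data model is (all names here are mine, purely illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Product:
    sku: str
    name: str
    price_cents: int

@dataclass
class Basket:
    items: dict[str, int] = field(default_factory=dict)  # sku -> quantity

    def add(self, product: Product, quantity: int = 1) -> None:
        self.items[product.sku] = self.items.get(product.sku, 0) + quantity

def checkout(basket: Basket, catalog: dict[str, Product]) -> int:
    # "Checkout" here is just totaling the basket; payments, inventory,
    # shipping, fraud, recommendations, etc. are the hundreds of other apps.
    return sum(catalog[sku].price_cents * qty for sku, qty in basket.items.items())

catalog = {"B01": Product("B01", "USB cable", 799)}
basket = Basket()
basket.add(catalog["B01"], 2)
assert checkout(basket, catalog) == 1598
```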
There is a lot of room between no value and a trillion-dollar company.
It would be great if LLMs did this (asked the relevant, and very pointed, follow-up questions). Instead, today they kind of just go, "okay sure, yeah, here it is. Here's as much of Amazon.com as I can code within my token budget. Good luck drawing the rest of the owl."
This assumes an educated, passionate, and patient user that 99% of people are not. They won't ask for a hammer; they will ask for a rock tied to a stick and get pissed off when it doesn't work like a hammer. They will ask for paint that doesn't drip. They will ask for electrical sockets they can install in their bathtub.
The turnkey option is ostris' ai-toolkit, which has good tutorials on YT and can be run completely locally or via RunPod. Claude Code can set everything up for you (speaking from experience) and can even SSH into RunPod.