I've been playing with AI agents for months, and most of them are pretty bad. They often get stuck in loops, which is frustrating. This happens in MultiOn, AutoGPT, and others.
I've used Devin a few times (see: https://x.com/varunshenoy_/status/1767591341289250961?s=20), and while it's far from perfect, it's by far the best I've seen. It doesn't get stuck in loops, and it keeps trying new things until it succeeds. Devin feels like a fairly competent high school intern.
Interestingly, Devin seems better suited as an entry-level analyst than a software engineer. We've been using it internally to scrape and structure real estate listings. Their stack for web RPA and browser automation works _really_ well. And it makes sense why this matters: if you want a successful agent, you need to give it good tools. Again, it's not flawless, but it gives me hope for the future of AI agents.
Slightly different set of trade-offs, but similar mental model. You always use large batch sizes (compute bound) and the bottleneck usually ends up communication between GPUs/nodes.
Good question. Yes, the 10GB available for batching is in the HBM. In a single forward pass, you move the entire model from HBM -> SRAM exactly once. In a batched forward pass, this is still the case, so you end up doing more compute for the same amount of memory movement.
You can calculate the SRAM as follows: an A100 has 108 SMs, and each SM has 192 KB in SRAM (shared memory, aka its L1 cache) [1]. Multiplied out, this is ~20 MB of total SRAM. This happens to match up with the diagram in the Flash Attention paper [2].
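The arithmetic above is easy to sanity-check. A minimal sketch, using only the numbers quoted in the comment (108 SMs, 192 KB of shared memory per SM):

```python
# Back-of-the-envelope total SRAM for an A100, from the figures above.
num_sms = 108
sram_per_sm_kb = 192  # shared memory (L1) per SM, in KB

total_sram_mb = num_sms * sram_per_sm_kb / 1024
print(f"{total_sram_mb:.2f} MB")  # 108 * 192 KB = 20736 KB = 20.25 MB
```

That ~20 MB is tiny next to the tens of GB of HBM, which is why each forward pass has to stream the whole model from HBM into SRAM, and why batching amortizes that memory movement across more compute.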
Awesome job guys, and thank you for creating it. Curious if you have any insights on long-term memory, and whether there are better ways to do retrieval apart from top-k.
Seems weird that every RAG app uses top-k, especially since you might pull in information irrelevant to the context (e.g. if you were asking for the names of the authors of a paper, you probably only want the top-1 embedding).
Definitely, top-k is a very naive way to do RAG. I think people have experimented with a cross-encoder-like approach, or even letting the LLM choose the sources. We will experiment with more approaches like this :)
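To make the two-stage idea concrete, here's a minimal sketch of top-k retrieval followed by a rerank step. The `embed()` and `cross_score()` functions are toy stand-ins (character frequencies and token overlap), not real models; in practice stage 1 would be a bi-encoder over a vector index and stage 2 a learned cross-encoder:

```python
import math

def embed(text):
    # Toy "embedding": normalized character-frequency vector over a-z.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_score(query, doc):
    # Toy cross-encoder stand-in: fraction of query tokens found in the doc.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def retrieve(query, docs, k=3, rerank=True):
    # Stage 1: cheap top-k by embedding similarity.
    qv = embed(query)
    candidates = sorted(docs, key=lambda d: cosine(qv, embed(d)), reverse=True)[:k]
    # Stage 2: rescore only the k candidates with the more expensive scorer.
    if rerank:
        candidates.sort(key=lambda d: cross_score(query, d), reverse=True)
    return candidates
```

The nice property is that the expensive scorer only ever sees k documents, so reranking cost stays bounded no matter how large the corpus gets.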