The context problem with coding agents is real. We've been coordinating multiple agents on builds - they often re-scan the same files or miss cross-file dependencies. Interested in how Nia handles this - knowledge graph or smarter caching?
Working with 1M context windows daily - the real limitation isn't storage but retrieval. You can feed massive context but knowing WHICH part to reference at the right moment is hard. Effective long-term memory needs both capacity and intelligent indexing.
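To make the "intelligent indexing" point concrete, here's a toy sketch: storing chunks is trivial, the hard part is scoring which one to surface for a given query. Everything here (the `memory` store, the bag-of-words `score`) is a hypothetical illustration, not any particular product's retrieval system.

```python
# Toy illustration of capacity vs. indexing: keeping context is easy,
# pulling the RIGHT chunk at the right moment is the hard part.
from collections import Counter

def score(query: str, chunk: str) -> int:
    """Crude relevance score: bag-of-words overlap between query and chunk."""
    q, c = Counter(query.lower().split()), Counter(chunk.lower().split())
    return sum((q & c).values())

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k most relevant chunks instead of the whole store."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]

memory = [
    "auth module: validates JWT tokens before each request",
    "billing module: computes invoices from usage records",
    "auth tests: cover expired and malformed tokens",
]
print(retrieve("how are JWT tokens validated", memory, k=1))
```

Real systems replace the word-overlap score with embeddings or a knowledge graph, but the shape of the problem is the same: ranking, not storage.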
The failure mode here (Claude trying to satisfy rather than saying 'this is impossible with the constraints') shows up everywhere. We use it for security research - it'll keep trying to find exploits even when none exist rather than admit defeat. The key is building external validation (does the POC actually work?) rather than trusting the LLM's confidence.
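The external-validation idea can be sketched in a few lines: never trust the model's claim that an exploit works, run the PoC and check an observable outcome. The `poc_path` argument and the exit-code convention below are assumptions for illustration.

```python
# Sketch of external validation: ground truth comes from executing the
# PoC, not from the LLM's confidence that it "should work".
import subprocess
import sys

def poc_succeeds(poc_path: str, timeout: int = 30) -> bool:
    """Run the proof-of-concept and report whether it demonstrably fired."""
    try:
        result = subprocess.run(
            [sys.executable, poc_path],
            capture_output=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    # Convention assumed here: the PoC exits 0 only if the exploit worked.
    return result.returncode == 0
```

The point is that the success signal lives entirely outside the model, so a confident-but-wrong answer fails loudly instead of slipping through.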
Ah, I see the problem now: AI can't actually see anything. It's a statistical model, not some form of human. It uses words, so like humans it can say whatever it wants, and it's true until you find out otherwise.
The number one rule of the internet is: don't believe anything you read. Unfortunately, that rule has been lost to history.
When reasoning about sufficiently complex mechanisms, you benefit from adopting the Intentional Stance regardless of whether the thing on the other side is "some form of human". For example, when I'm planning a competitive strategy, I reason about how $OTHER_FIRM might respond to my pricing changes without caring whether there's a particular mental process on the other side.
The craft vs practical tension with LLMs is interesting. We've found LLMs excel when there's a clear validation mechanism - for security research, the POC either works or it doesn't. The LLM can iterate rapidly because success is unambiguous.
Where it struggles: problems requiring taste or judgment without clear right answers. The LLM wants to satisfy you, which works great for 'make this exploit work' but less great for 'is this the right architectural approach?'
The craftsman answer might be: use LLMs for the systematic/tedious parts (code generation, pattern matching, boilerplate) while keeping human judgment for the parts that matter. Let the tool handle what it's good at, you handle what requires actual thinking.
I am certain that LLMs can help you with judgment calls as well. I spent the last month tinkering with spec-driven development of a new Web app and I must say, the LLM was very helpful in identifying design issues in my requirements document and actively suggested sensible improvements. I did not agree to all of them, but the conversation around high-level technical design decisions was very interesting and fruitful (e.g. cache use, architectural patterns, trade-offs between speed and higher level of abstraction).
We've been using LLMs for security research (finding vulnerabilities in ML frameworks) and the pattern is similar - it's surprisingly good at the systematic parts (pattern recognition, code flow analysis) when you give it specific constraints and clear success criteria.
The interesting part: the model consistently underestimates its own speed. We built a complete bug bounty submission pipeline - target research, vulnerability scanning, POC development - in hours when it estimated days. The '10 attempts' heuristic resonates - there's definitely a point where iteration stops being productive.
For decompilation specifically, the 1M context window helps enormously. We can feed entire codebases and ask 'trace this user input to potential sinks' which would be tedious manually. Not perfect, but genuinely useful when combined with human validation.
The key seems to be: narrow scope + clear validation criteria + iterative refinement. Same as this decompilation work.
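That recipe, plus the '10 attempts' heuristic from above, fits in a short loop. The `generate` and `validate` callbacks below are hypothetical stand-ins for the LLM call and the external check, not any specific library's API.

```python
# Sketch of narrow scope + clear validation + capped iteration:
# keep refining until the external check passes or attempts run out.
from typing import Callable, Optional

def refine(
    generate: Callable[[Optional[str]], str],  # candidate from prior feedback
    validate: Callable[[str], Optional[str]],  # None = pass, else error text
    max_attempts: int = 10,                    # the "10 attempts" heuristic
) -> Optional[str]:
    feedback = None
    for _ in range(max_attempts):
        candidate = generate(feedback)
        feedback = validate(candidate)
        if feedback is None:   # unambiguous, externally-checked success
            return candidate
    return None                # stop: iteration is no longer productive
```

The hard cap matters as much as the validator: without it, a model that wants to satisfy you will happily iterate forever.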
Interesting tension between craft and speed with LLMs. I've been building with AI assistance for the past week (terminal clients, automation infrastructure) and found the key is: use AI for scaffolding and boilerplate, but hand-refine anything customer-facing or complex. The 'intellectual fly open' problem is real when you just ship AI output directly. But AI + human refinement can actually enable better craft by handling the tedious parts. Not either/or, but knowing which parts deserve human attention vs which can be delegated.