
The author is correct that agents are becoming more and more capable and that you don't need the IDE to the same extent, but I don't see that as a good thing. I find that IDE-based agentic programming actually encourages you to read and understand your codebase, as opposed to CLI-based workflows. It's so much easier to flip through files, review the changes it made, or highlight a specific function and give it to the agent. Through the CLI, you usually just give it an entire file by typing its name and pray that it manages to find the context by itself. My prompts in Cursor are generally a lot more specific and I get more surgical results than with Claude Code in the terminal, purely because of the convenience of the UX.

But secondly, there's an entire field of LLM-assisted coding that's being almost entirely neglected, and that's code autocomplete models. Fundamentally they're the same technology as agents and should be doing the same thing: indexing your code in the background, filtering the context, etc., but they get much less attention and it does feel like the models are stagnating.

I find that very unfortunate. Compare the two workflows:

With a normal coding agent, you write your prompt, then you have to wait at least a full minute for the result (generally more, depending on the task), breaking your flow and forcing you to task-switch. Then it gives you a giant mass of code, and of course 99% of the time you just approve and test it because it's a slog to read through what it did. If it doesn't work as intended, you get angry at the model and retry your prompt, spending more tokens the longer your chat history grows.

But with LLM-powered autocomplete, when you want, say, a function to do X, you write the comment describing it first, just like you should if you were writing it yourself. You instantly see a small section of code, and if it's not what you want, you can alter your comment. Even if it's not 100% correct, multi-line autocomplete is great because you approve it line by line and can stop when it gets to the incorrect parts. You're not forced to task-switch and you don't lose your concentration, that great sense of "flow".

Fundamentally it's not that different from agentic coding - except instead of prompting in a chatbox, you write comments in the files directly. But I much prefer the quick feedback loop, the ability to ignore outputs you don't want, and the fact that I don't feel like I'm losing track of what my code is doing.


The other thing about non-agent workflows is they’re much, much less compute intensive. This is going to matter.

I agree with you wholeheartedly. It seems like a lot of the work on making AI autocomplete better (better indexing, context management, codebase awareness, etc) has stagnated in favor of full-on agentic development, which simply isn't suited for many kinds of tasks.

The reason the nearest neighbour interpolation can sound better is that the aliasing fills the higher frequencies of the audio with a mirror image of the lower frequencies. While humans are less sensitive to higher frequencies, you still expect them to be there, so some people prefer the "fake" detail from aliasing to them just being outright missing in a more accurate sample interpolation.

It's basically doing an accidental and low-quality form of spectral band replication: https://en.wikipedia.org/wiki/Spectral_band_replication which is used in modern codecs.
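For a concrete (and purely illustrative) sketch of the effect, here's some numpy showing how a zero-order-hold (nearest neighbour) upsample of a 1 kHz tone fills the high band with mirror images of the tone; the rates are arbitrary, not taken from the GBA:

    import numpy as np

    rate_in = 8000                      # low "source" rate, chosen arbitrarily
    t = np.arange(rate_in) / rate_in
    x = np.sin(2 * np.pi * 1000 * t)    # 1 kHz tone

    # Nearest-neighbour (zero-order hold) upsample by 4x to 32 kHz
    x_nn = np.repeat(x, 4)

    # Besides the original 1 kHz line, the spectrum now contains images
    # at 7, 9 and 15 kHz, i.e. "fake" high-frequency content
    spec = np.abs(np.fft.rfft(x_nn))
    freqs = np.fft.rfftfreq(len(x_nn), d=1 / (rate_in * 4))
    print(np.sort(freqs[np.argsort(spec)[-4:]]))   # [ 1000.  7000.  9000. 15000.]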


It's actually the other way round: Aliasing fills the lower frequencies with a mirror image of the higher frequencies. So where do the higher frequencies come from? From the upsampling that happens before the aliasing. _That_ makes the higher frequencies contain (non-mirrored!) copies of the lower frequencies. :-)

Just so that my wrongness isn't there for posterity: This is wrong for a real-valued signal (which is what we're discussing here). I had forgotten about the negative frequencies. So there _is_ a mirror coming from the upsampling. Sorry. :-)

Oh yes you're correct, I think imaging would be the correct term for what's happening (aliasing is high -> low and imaging is low -> high)?

I think I've heard the word “images” being used for these copies, yes.

Interpolation is a bit of a confusing topic, because the most efficient implementation is not the one that lends itself most easily to frequency analysis. But pretty much any rate change (be it up or down) using interpolation can be expressed equivalently using the following set of operations, with appropriately chosen M and N:

  1. Increase the rate by inserting M zeros between each sample. This has the effect of creating the “images” as discussed.
  2. Apply a filter to the resulting signal. For instance, for nearest neighbor this is [1 1 1 … 0 0 0 0 0 …], with (M+1) ones and then just zeroes; effectively, every output sample is the sum of the previous M+1 input samples. This removes some of the original signal and then much more of the images.
  3. Decrease the rate by taking every Nth sample and discarding the rest. This creates aliasing (higher frequencies wrap down to lower, possibly multiple times) as discussed.
The big difference between interpolation methods is the filter in #2. E.g., linear interpolation is effectively the same as a triangular filter, and will filter out somewhat more of the images but also more of the original signal (IIRC). Fancier interpolation methods have more complicated shapes (windowed sinc, etc.).
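To make that three-step view concrete, here's a small numpy sketch (purely illustrative; real resamplers fold the steps into a polyphase structure, but the output is the same):

    import numpy as np

    def resample_naive(x, M, N, taps):
        # 1. insert M zeros between samples -> creates the "images"
        up = np.zeros(len(x) * (M + 1))
        up[::M + 1] = x
        # 2. filter; this is where interpolation methods differ
        y = np.convolve(up, taps)[:len(up)]
        # 3. keep every Nth sample -> leftover images alias back down
        return y[::N]

    x = np.sin(2 * np.pi * np.arange(64) / 16)
    box = np.ones(4)                    # nearest neighbour: M+1 = 4 ones
    tri = np.convolve(box, box) / 4     # linear interpolation: triangle
    nn = resample_naive(x, 3, 1, box)   # plain 4x upsample (N = 1)
    lin = resample_naive(x, 3, 1, tri)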

This also shows why it's useful to have some headroom in your signal to begin with, e.g. a CD-quality signal could represent up to 22.05 kHz but only has (by spec) actual signal up to 20 kHz, so that it's easier to design a filter that keeps the signal but removes the images.


And also, to add to the actual GBA discussion: if you think the resulting sound is too muffled, as many here do, you can simply substitute a filter with a higher cutoff (or a less steep slope). E.g., you could use a fixed 12 kHz lowpass filter (or something like cutoff = min(rate/2, 12000)) instead of always setting the cutoff exactly at the estimated input sample rate. (In a practical implementation, the coefficients would still depend on the input rate.)
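A rough sketch of what that could look like (scipy-flavoured; the 32768 Hz output rate, tap count and example input rate are all just assumptions for illustration, not anyone's actual values):

    from scipy.signal import firwin, lfilter

    def make_lowpass(input_rate, output_rate=32768, numtaps=63):
        # Cap the cutoff at 12 kHz instead of tracking the estimated
        # input rate exactly, so low-rate sources sound less muffled.
        cutoff = min(input_rate / 2, 12000)
        return firwin(numtaps, cutoff, fs=output_rate)

    taps = make_lowpass(input_rate=13379)       # hypothetical estimated channel rate
    # filtered = lfilter(taps, 1.0, upsampled)  # 'upsampled' is a placeholder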

Odd that the author didn’t try giving a latent embedding to the standard neural network (or modulating the activations with a FiLM layer) and instead used static embeddings as the baseline. There’s no real advantage to using a hypernetwork; they tend to be more unstable and difficult to train, and scale poorly unless you train a low-rank adaptation.
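(For reference, FiLM-style conditioning is only a few lines; here's a minimal PyTorch sketch, with all names and shapes made up for illustration rather than taken from the article:)

    import torch
    import torch.nn as nn

    class FiLMBlock(nn.Module):
        """Condition a hidden layer on a per-dataset latent z by predicting
        a scale (gamma) and shift (beta) for its activations, instead of
        generating the whole weight matrix hypernetwork-style."""
        def __init__(self, in_dim, hidden_dim, embed_dim):
            super().__init__()
            self.fc = nn.Linear(in_dim, hidden_dim)
            self.film = nn.Linear(embed_dim, 2 * hidden_dim)

        def forward(self, x, z):
            h = torch.relu(self.fc(x))
            gamma, beta = self.film(z).chunk(2, dim=-1)
            return gamma * h + beta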

Hello. I am the author of the post. The goal of this was to provide a pedagogical example of applying Bayesian hierarchical modeling principles to real world datasets. These datasets often contain inherent structure that is important to explicitly model (eg clinical trials across multiple hospitals). Oftentimes a single model cannot capture this over-dispersion but there is not enough data to split out the results (nor should you).

The idea behind hypernetworks is that they enable Gelman-style partial pooling by explicitly modeling the data generation process while leveraging the flexibility of neural network tooling. I’m curious to read more about your recommendations: their connection to the described problems is not immediately obvious to me, but I would be happy to dig a bit deeper.

I agree that hypernetworks have some challenges associated with them due to the fragility of maximum likelihood estimates. In the follow-up post, I dug into how explicit Bayesian sampling addresses these issues.


Thank you for reading my post, and for your thoughtful critique. And I sincerely apologize for my slow response! You are right that there are other ways to inject latent structure, and FiLM is a great example.

I admit the "static embedding" baseline is a bit of a strawman, but I used it to illustrate the specific failure mode of models that can't adapt at inference time.

I then used the Hypernetwork specifically to demonstrate a "dataset-adaptive" architecture as a stepping stone toward the next post in the series. My goal was to show how even a flexible parameter-generating model eventually hits a wall with out-of-sample stability; this sets the stage for the Bayesian Hierarchical approach I cover later on.

I wasn't familiar with the FiLM literature before your comment, but looking at it now, the connection is spot on. Functionally, it seems similar to what I did here: conditioning the network on an external variable. In my case, I wanted to explicitly model the mapping E->θ to see if the network could learn the underlying physics (Planck's law) purely from data.

As for stability, you are right that Hypernetworks can be tricky in high dimensions, but for this low-dimensional scalar problem (4D embedding), I found it converged reliably.


I think a latent embedding is almost equivalent to the article's hypernetwork, which I take to be y = (Wh + c)v + b, where h is a dataset-specific trainable vector. (The article uses multiple layers ...)
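In code, the two views of a single layer look something like this (a PyTorch sketch of my reading of it, not the article's actual implementation):

    import torch
    import torch.nn as nn

    d_in, d_h, d_emb = 1, 32, 4

    # Hypernetwork view: the dataset vector h generates the layer's weights.
    W = nn.Parameter(0.1 * torch.randn(d_h, d_in, d_emb))
    c = nn.Parameter(0.1 * torch.randn(d_h, d_in))
    b = nn.Parameter(torch.zeros(d_h))

    def hyper_layer(v, h):            # v: (batch, d_in), h: (d_emb,)
        weight = W @ h + c            # (d_h, d_in): one weight matrix per dataset
        return v @ weight.T + b       # y = (Wh + c)v + b

    # Embedding view: concatenate h to the input of an ordinary linear layer,
    # i.e. y = Av + Bh + b'. This captures the same per-dataset conditioning
    # minus the bilinear (Wh)v interaction term, hence "almost" equivalent.
    layer = nn.Linear(d_in + d_emb, d_h)

    def embed_layer(v, h):
        return layer(torch.cat([v, h.expand(v.shape[0], -1)], dim=-1))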

It absolutely is noticeable the moment you have to run several of these Electron “apps” at once.

I have a MacBook with 16GB of RAM and I routinely run out of memory from just having Slack, Discord, Cursor, Figma, Spotify and a couple of Firefox tabs open. I went back to listening to mp3s with a native app to have enough memory to run Docker containers for my dev server.

Come on, I could listen to music, program, chat on IRC or Skype, do graphic design, etc. with 512MB of DDR2 back in 2006, and now you couldn’t run a single one of those Electron apps with that amount of memory. How can a billion dollar corporation doing music streaming not have the resources to make a native app, but the Songbird team could do it for free back in 2006?

I’ve shipped cross platform native UIs by myself. It’s not that hard, and with skyrocketing RAM prices, users might be coming back to 8GB laptops. There’s no justification for a big corporation not to have a native app other than developer negligence.


On that note, I could also comfortably fit a couple of chat windows (Skype) on a 17'' CRT (1024x768) back in those days. It's not just the "browser-based resource hog" bit that sucks - non-touch UIs have generally become way less space-efficient.


Not to go all “ackchually”, but modern GPUs can render in many other ways than rasterising triangles, and they can absolutely draw a cylinder without any tessellation involved. You can use the analytical ray tracing formula, or signed distance fields for a practical way to easily build complex scenes purely with maths: https://iquilezles.org/articles/distfunctions/

Now of course triangles are usually the most practical way to render objects, but it just bugs me when someone says something like “Every smooth surface you've ever seen on a screen was actually tiny flat triangles” when it’s patently false; ray tracing a sphere is pretty much the Hello World of computer graphics and no triangles are involved.
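(That Hello World really is just solving a quadratic; a quick Python sketch, not taken from any particular renderer:)

    import math

    def ray_sphere(origin, direction, center, radius):
        # Solve |o + t*d - c|^2 = r^2, a quadratic in t.
        # Assumes direction is normalised. No triangles anywhere.
        oc = [oi - ci for oi, ci in zip(origin, center)]
        b = sum(di * x for di, x in zip(direction, oc))
        c = sum(x * x for x in oc) - radius * radius
        disc = b * b - c
        if disc < 0:
            return None                 # ray misses the sphere
        t = -b - math.sqrt(disc)
        return t if t > 0 else None

    # Unit sphere 5 units down -z, camera at the origin looking down -z:
    print(ray_sphere((0, 0, 0), (0, 0, -1), (0, 0, -5), 1.0))   # -> 4.0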

Edit: for CAD, direct ray tracing of NURBS surfaces on the GPU exists and lets you render smooth objects with no triangles involved whatsoever, although I’m not sure if any mainstream software uses that method.


MachineWorks and ModuleWorks have added GPU raycasting implementations in their latest versions. It’s very recent.


Worry not, I came here full speed after the first paragraph to say the same thing.


Gaussian splatting is the current rage and also side steps tessellation.


thanks for the idea! I'm writing this while learning about CAD kernels, I'll add raytracing to vcad next!


Have you used an LLM specifically trained for tool calling, in Claude Code, Cursor or Aider?

They’re capable of looking up documentation and correcting their errors by compiling and running tests, and when coupled with a linter, hallucinations are a non-issue.

I don’t really think it’s possible to dismiss a model that’s been trained with reinforcement learning for both reasoning and tool usage as only doing pattern matching. They’re not at all the same beasts as the old style of LLMs based purely on next token prediction of massive scrapes of web data (with some fine tuning on Q&A pairs and RLHF to pick the best answers).


I'm using Claude Code to help me learn Godot game programming.

One interesting thing is that Claude will not tell me if I'm following the wrong path. It will just make the requested change to the best of its ability.

For example, in a Tower Defence game I'm making, I wanted to keep turret position state in an AStarGrid2D. It produced code to do this, but it became harder and harder to follow as I went on. It was only after watching more tutorials that I figured out I was asking for the wrong thing. (TileMapLayer is a much better choice.)

LLMs still suffer from Garbage in Garbage out.


don't use LLMs for Godot game programming.

edit: Major engine changes have occurred after the models were trained, so you will often be given code that refers to nonexistent constants and functions and which is not aware of useful new features.


before coding I just ask the model "what are the best practices in this industry to solve this problem? what tools/libraries/approaches do people use?"

after coding I ask it "review the code, do you see anything for which there are common libraries implementing it? are there ways to make it more idiomatic?"

you can also ask it "this is an idea on how to solve it that somebody told me, what do you think about it, are there better ways?"


> before coding I just ask the model "what are the best practices in this industry to solve this problem? what tools/libraries/approaches do people use?"

Just for the fun of it, and so you lose your "virginity" so to speak: next time the magic machine gives you an answer about "what it thinks", tell it it's wrong in strict language and scold it for misleading you. Tell it to give you the "real" best practices instead of what it spat out. Then sit back and marvel at the machine saying you were right and that it had misled you, producing a completely, somewhat, or slightly different answer (you never know what you get on the slot machine).


Both the before and after are better done manually. What you are describing is fine for the heck of it (I’ve vibe-coded a Whisper-related Rust port today without having any actual Rust skills), but I’d never use fully vibed software in production. That’s irresponsible in multiple ways.


Do you also light candles and chant?


Ask a model to

"Write a chess engine where pawns move backward and kings can jump like nights"

It will keep slipping back into real chess rules. It learned chess; it did not understand the concept of "rules".

Or

Ask it to reverse a made up word like

"Reverse the string 'glorbix'"

It will get it wrong on the first try. You would not fail.

Or even better ask it to...

"Use the dxastgraphx library to build a DAG scheduler."

dxastgraphx is a non-existent library...

Marvel at the results...tried in both Claude and ChatGPT....


I‘ve just tried the dxastgraphx one in pi with Opus 4.5. This was its response:

    I couldn't find a library called dxastgraphx in either pip (Python) or npm (JavaScript) package registries. This library doesn't appear to exist.

    Did you perhaps mean one of these popular DAG/graph libraries?

    Python:
    - networkx - comprehensive graph library with DAG support
    - graphlib - Python standard library (3.9+) with TopologicalSorter
    - dask - parallel computing with DAG task scheduling

    JavaScript/TypeScript:
    - graphlib - graph data structures
    - dagre - DAG layout

    Would you like me to build a DAG scheduler using one of these existing libraries, or would you like me to implement one from scratch? Let me know which language and approach you prefer.


Tried ChatGPT (free version). It answered the first question correctly.

Answer to second question:

"I can do that, but there’s a catch: dxastgraphx is not a known or standard Python (or JS) library as of now. I don’t have any public API or documentation for it to target directly.

To avoid guessing wrong, here are the two sensible options:"


Why would I ask the model to reverse the string 'glorbix,' especially in the context of software engineering?


just tried to reverse the string you provided using Gemini. It worked fine on the first try


In gemini:

> The reverse of the string 'gssssyeekxxuoaqaaaiooo9aaa7zjx' is:

xjz7aaa9oooiiaaaqauxxkyeessssg

> Youve made a mistake

``` I see where I tripped up! Let's re-examine that long string carefully.

The original string you gave me was: `gssssyeekxxuoaqaaaiooo9aaa7zjx`

Let's look at the end of that string: `...aaa7zjx`. When we reverse it, it should start with: `xjz7aaa...`

Here is the precise character-by-character reversal:

*xjz7aaa9oooiiaaaqauxxkyeessssg*

Wait—I see it now. In my previous response, I included an extra *'i'* and missed one *'o'* in the middle section. Let's do a strict count to get it perfect:

* *Original:* `g` `ssss` `y` `ee` `k` `xx` `u` `o` `a` `q` `aaa` *`iooo`* `9` `aaa` `7` `z` `j` `x` * *Correct Reverse:* `x` `j` `z` `7` `aaa` `9` *`oooi`* `aaa` `q` `a` `o` `u` `xx` `k` `ee` `y` `ssss` `g`

The correct reversed string is: *xjz7aaa9oooiiaaaqauoxxkyeessssg*

Thank you for catching that. It’s easy for the eyes to glaze over when there are so many repeating vowels!

Since I've stumbled a bit on these, would you like to give me one more string to see if I can get it right on the first try? ```

After more back and forths it consistently fails at this task: even when it strictly divides the tokens correctly, the final answer is always wrong.


Mine said it used python and got: xjz7aaa9oooiaaaqaouxxkeeyssssg


You’re trying to interrogate a machine as you would a human and presenting this as evidence that machines aren’t humans. Yes, you’re absolutely right! And also completely missing the point.


The discussion is not about being human. It's about being fit for purpose...


There are a few more considerations:

- You can use the GPU for training and run your own fine tuned models

- You can have much higher generation speeds

- You can sell the GPU on the used market in ~2 years time for a significant portion of its value

- You can run other types of models like image, audio or video generation that are not available via an API, or cost significantly more

- Psychologically, you don’t feel like you have to constrain your token spending and you can, for instance, just leave an agent to run for hours or overnight without feeling bad that you just “wasted” $20

- You won’t be running the GPU at max power constantly


If you think medieval artists lacked skill, check out Villard de Honnecourt’s sketchbook, especially the insects on folio 7 and the Christ in Majesty on folio 16:

https://www.medievalists.net/2024/12/sketchbook-villard-honn...

Medieval art is very stylised, but the quality of the lines, the details in the clothes, the crispness of the composition: all of that requires a lot of skill. Check out Jean Bondol’s work, for instance: https://artsandculture.google.com/asset/tapisserie-de-l-apoc...

You may not like the style, but being able to produce works like that requires you to be good at art on some level.


Ok, but the Honnecourt sketches are kind of strong. Not professional by today's standards, but decent. I'd be happy to have done them--but I'm not an artist. The tapestry can be appreciated, like Klimt's 2-D-ish stuff can be appreciated. The style is fine. It's not fantastic work, I wouldn't hang it up, but it's reasonably accomplished.

In general, though, yes, I think medieval European artists were short on skill compared to artists from Europe in pre-medieval and post-medieval times, and art from other places between ~500 and ~1300. They had some skill, but not as much.

Artists with limited technique are a real thing. Not everything is taste or style.


You convinced me that they lack skill.

The clothing does often look good. In folio 16v ( https://www.medievalists.net/wp-content/uploads/2024/12/Vill... ), it's been overdone and appears to be far wrinklier than fabric could support, suggesting that Jesus is embedded in some kind of strange plant.

The faces are terrible in all cases.

In general, perspective is off, anatomy is off, and you get shown things that aren't physically possible.


Are you aware there are artistic styles beyond photorealism?


Yes, but in that case, as this article argues, you'd expect something that looked good. https://static.wikia.nocookie.net/teentitans/images/e/e3/TTO...

The Honnecourt illustrations strongly suggest that (a) photorealism is the goal, but (b) Honnecourt doesn't know how to achieve it. He does things like placing a person's right eye at a different angle to the rest of the face than the left eye. But hey, how likely is it that viewers will notice a malformed human face?


Teen titans as reference xD stop it you're killing me


M? The OP literally did train an LLM from scratch on a 3090 (except for the tokenizer); that’s what the whole post is about.


Good point, I worded that incorrectly and should have been more specific. OP trained an LLM from scratch, but it's GPT-2, with even worse performance than the GPT-2 which OpenAI shipped a few years ago.

I can't edit it now, but OP did not train a useful LLM from scratch; in editing for clarity and tone I think I edited that qualifier away. Somebody searching for a reproducible way to produce a usable model on their own 3090 won't find it in this post. But someone looking to learn how to produce a usable model on their own 3090 will be educated by this post.

"Not a useful LLM" is not a knock on the OP! This is an _excellent_ educational and experiential post. It includes the experimentation with different models that you'll never see in a publication. ANd it showcases the exact limitations you'll have with one 3090. (You're limited in training speed and model size, and you're also limited in how many ideas you can have cooking at once).

The "experiment at home, train a model, and reproduce or fine-tune on someone elses better GPU" is tried and true.

(Again, I want to reiterate that I'm not knocking OP for not producing a "usable LLM" at the end of this post. That's not the point of the post, and it's a good post. My only point is that it's not currently feasible to train a useful general-purpose LLM on one 3090.)


Deepseek via their API also has cached context, although the tokens/s were much lower than Claude's when I tried it. But for background agents the price difference makes it absolutely worth it.

