Indeed. The problem with most AI research today is that it's largely trial and error with large amounts of compute. No room for taking inspiration from nature, which requires more thought and fewer FLOPs.
If by popular fantasy you mean replicating the functional profiles of the visual and language cortex of the brain, then yes. These ideas in neuroscience are popular, but not fantasy. I encourage you to read up on functional organization in the brain; it's very fascinating.
> it’s not scientifically useful
Having structured weights in GPTs enables us to localize and control various concepts and study stuff like polysemanticity, superposition, etc. Other scientific directions include sparse inference (already proven to work) and better model editing. Turns out, topographic structure also helps these models better predict neural data, which is yet another direction we're exploring in computational neuroscience.
I love the paper - don't read into the negative comments. I find that a lot of online feedback (more so on Reddit, much less so on HN, usually) tends to be opinionated and quite misinformed these days. Fantastic work and a fantastic read.
I probably came in too hot on that (dealing with some personal stuff). Although I disagree with the purported impact of the paper, I don't think this is fundamentally incorrect or bad science, and I wish you the best on future research.
It is indeed brain-like in a functional way. Topographic structure is what enables the brain to have low dimensionality and metabolic efficiency. We find that inducing such structure in neural nets significantly lowers their dimensionality and makes them more parameter-efficient (after training, we could take advantage of the structure to remove ~80% of the weights in topographic layers without sacrificing performance).
>After training, we could take advantage of the structure to remove ~80% of the weights in topographic layers without sacrificing performance
This is really interesting to me. Is it that the structure clustered the neurons in such a way that they didn't need to be weighted, because their functions were grouped by similar black-box properties?
> Is it that the structure clustered the neurons in such a way that they didn't need to be weighted
Yep. Because of the structure, we did not have to compute the output of every weight column; we simply copied the outputs of nearby weight columns that were computed.
Yep. That is exactly the idea here. Our compression method is super duper naive. We literally keep every n-th weight column and discard the rest. Turns out that even after getting rid of 80% of the weight columns in this way, we were able to retain the same performance in a 125M GPT.
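The keep-every-n-th-column idea above can be sketched in a few lines. This is a toy illustration in plain Python with hypothetical names, not the actual TopoNets code:

```python
def prune_columns(W, keep_every=5):
    # Naive topographic pruning: keep every n-th weight column
    # (one entry per row here), discard the rest. keep_every=5
    # drops 80% of the columns.
    return [row[::keep_every] for row in W]

def copy_neighbors(y_kept, keep_every=5):
    # At inference, copy each computed output into the slots of its
    # discarded neighbors -- nearby columns are similar after
    # topographic training, so the copies approximate them well.
    out = []
    for v in y_kept:
        out.extend([v] * keep_every)
    return out

# Toy 2x10 weight matrix: after pruning, each row keeps 2 of 10 columns.
W = [list(range(10)), list(range(10, 20))]
W_pruned = prune_columns(W)
```

The point is that the compression requires no clustering or learned mask at all; the spatial layout induced during training is what makes uniform subsampling safe.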
Indeed. What's cool is that we were able to localize literal "regions" in the GPTs which encoded toxic concepts related to racism, politics, etc. A similar video can be found here: https://toponets.github.io
My understanding, coming from mechanistic interpretability, is that models are typically (or always) in superposition, meaning that most or all neurons are forced to encode semantically unrelated concepts because there are more concepts than neurons in a typical LM. We train SAEs (where we apply an L1 sparsity penalty to "encourage" the encoder output latents to yield sparse representations of the originating raw activations) to hopefully disentangle these features, or make them more monosemantic. This allows us to use the SAE as a sort of microscope to see what's going on in the LM, and to apply techniques like activation patching to localize features of interest, which sounds similar to what you've described. I'm curious what this work means for mech interp. Is this a novel alternative for mitigating polysemanticity? Or perhaps neurons are still encoding multiple features, but the features tend to have greater semantic overlap? Fascinating stuff!
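For readers unfamiliar with the SAE objective described above, here is a minimal sketch of the loss term (plain Python; a real SAE uses a learned encoder/decoder in a DL framework, and the exact coefficient is an assumption):

```python
def sae_loss(x, x_hat, latents, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty on the latent
    # activations; the L1 term "encourages" sparse, hopefully
    # more monosemantic codes.
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    l1 = sum(abs(z) for z in latents)
    return mse + l1_coeff * l1
```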
Was it toxicity though as understood by the model, or just a cluster of concepts that you've chosen to label as toxic?
I.e., is this something that could (and therefore will) be turned towards identifying toxic concepts as understood by the Chinese or US government, or towards identifying (say) pro-union concepts so they can be down-weighted in a released model, etc.?
We localized "toxic" neurons by contrasting each neuron's activations for toxic vs. normal texts. It's a method inspired by old-school neuroscience.
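That contrast method can be sketched like this (toy Python; scoring neurons by mean-activation difference is my assumption of the statistic, not necessarily the paper's exact one):

```python
def localize_neurons(acts_toxic, acts_normal, top_k=3):
    # acts_*: one activation vector per text (same length per vector).
    # Score each neuron by its mean activation on toxic texts minus
    # its mean activation on normal texts, then return the indices
    # of the top-k most toxic-selective neurons.
    n = len(acts_toxic[0])
    mean = lambda i, acts: sum(a[i] for a in acts) / len(acts)
    scores = [mean(i, acts_toxic) - mean(i, acts_normal) for i in range(n)]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
```

This is essentially a contrast analysis from classic neuroscience: compare responses under two stimulus conditions and flag the units that differ most.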
The motivation was to induce structure in the weights of neural nets and see if the functional organization that emerges aligns with that of the brain or not. Turns out, it does -- both for vision and language.
The gains in parameter efficiency were a surprise even to us when we first tried it out.
1. Significantly lower dimensionality of internal representations
2. More interpretable (see: https://toponets.github.io)
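One standard way to quantify "lower dimensionality" is the participation ratio of a representation's covariance eigenspectrum; this is my assumed measure for illustration, not necessarily the one used in the paper:

```python
def participation_ratio(eigvals):
    # Effective dimensionality from covariance eigenvalues:
    # PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues).
    # Equals N for a flat spectrum of N eigenvalues; approaches 1
    # when variance concentrates in a single direction.
    s1 = sum(eigvals)
    s2 = sum(v * v for v in eigvals)
    return (s1 * s1) / s2
```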
> 7B model down to 6B
We remove ~80% of the parameters in topographic layers and retain the same performance in the model. The drop in total parameter count is not larger because we did not experiment with applying TopoLoss to all of the layers of the model (that did not align with the goal of the paper).
We are currently performing those strong sparsity experiments internally, and the results look very promising!