
"That allows us to license the open source project under the more permissive MIT license."

I would say:

- decomposition: discover a more general form of the Fourier transform to untangle the underlying factors (a plain-FFT version is sketched after this list)

- memorization: some patterns, such as power laws, recur across many domains

- multitask: exploit cross-domain connections, such as weather vs. electricity
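As a toy illustration of the decomposition point (my own sketch, not from the thread): a plain FFT already untangles two overlapping periodic factors in a synthetic series:

    import numpy as np

    # Synthetic hourly series: a daily and a weekly cycle plus noise
    t = np.arange(24 * 7 * 8)  # 8 weeks of hourly samples
    x = np.sin(2 * np.pi * t / 24) + 0.5 * np.sin(2 * np.pi * t / (24 * 7))
    x += 0.1 * np.random.default_rng(0).standard_normal(len(t))

    # The spectrum separates the two underlying factors
    freqs = np.fft.rfftfreq(len(t))
    spectrum = np.abs(np.fft.rfft(x))
    top = freqs[np.argsort(spectrum)[-2:]]
    print(1 / top)  # ~[168. 24.] -- the two underlying periods, in hours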


Ollama is a user-friendly UI for LLM inference. It is powered by llama.cpp (or a fork of it), which is more power-user oriented and requires command-line wrangling. GGML is the math library behind llama.cpp, and GGUF is the associated file format for storing LLM weights.
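For anyone who wants something in between, the llama-cpp-python bindings wrap the same stack. A minimal sketch, assuming you have some GGUF checkpoint on disk (the model path is a placeholder):

    # pip install llama-cpp-python  (bindings over llama.cpp/GGML)
    from llama_cpp import Llama

    # Any GGUF file works; this path is a placeholder
    llm = Llama(model_path="./models/model.Q4_K_M.gguf")
    out = llm("Q: What does GGUF store? A:", max_tokens=64, stop=["Q:"])
    print(out["choices"][0]["text"])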

I've found llama.cpp (as I understand it, Ollama now uses its own version of it) to work much better in practice: faster and much more flexible.

This is the worst layperson explanation of an AI component I have seen in a long time. It doesn't even seem AI generated.

I think it is, though:

“TurboQuant, QJL, and PolarQuant are more than just practical engineering solutions; they’re fundamental algorithmic contributions backed by strong theoretical proofs. These methods don't just work well in real-world applications; they are provably efficient and operate near theoretical lower bounds.”


I also instinctively reacted to that fragment, but at this point I think this is overreacting to a single expression. It's not just a normal thing to say in English, it's something people have been saying for a long time before LLMs existed.

There are tells all over the page:

> Redefining AI efficiency with extreme compression

"Redefine" is a favorite word of AI. Honestly no need to read further.

> the key-value cache, a high-speed "digital cheat sheet" that stores frequently used information under simple labels

No competent engineer would describe a cache as a "cheat sheet". Cheat sheets are static, but caches dynamically update during execution. Students don't rewrite their cheat sheets during the test, do they? LLMs love their inaccurate metaphors.
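In code terms, a cache is closer to a notebook you keep scribbling in during the exam. A toy memoization sketch (my own, nothing to do with the article):

    def expensive_compute(key):
        return sum(ord(c) for c in key)  # stand-in for real work

    cache = {}  # grows and changes during execution, unlike a static cheat sheet

    def lookup(key):
        if key not in cache:
            cache[key] = expensive_compute(key)  # rewritten mid-"test"
        return cache[key]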

> QJL: The zero-overhead, 1-bit trick

> It reduces each resulting vector number to a single sign bit (+1 or -1). This algorithm essentially creates a high-speed shorthand that requires zero memory overhead.

Why does it keep emphasizing zero overhead? Why is storing a single bit a "trick?" Either there's currently an epidemic of algorithms that use more than one bit to store a bit, or the AI is shoving in extra plausible-sounding words to pad things out. You decide which is more likely.
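For what it's worth, the actual content buried under the filler is just sign-of-random-projection, SimHash style. A toy reconstruction of my own, not the paper's code:

    import numpy as np

    rng = np.random.default_rng(0)
    d, m = 64, 256
    proj = rng.standard_normal((m, d))  # shared JL-style random projection

    def one_bit_sketch(v):
        return np.sign(proj @ v)  # one sign bit per projected coordinate

    # The angle between vectors is recoverable from how often the bits agree:
    # P[signs agree] = 1 - angle/pi for Gaussian projections
    u, v = rng.standard_normal(d), rng.standard_normal(d)
    agree = np.mean(one_bit_sketch(u) == one_bit_sketch(v))
    print(np.cos(np.pi * (1 - agree)))  # estimate of cos(angle(u, v))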

It's 1:30am and I can't sleep, and I still regret wasting my time on this slop.


I'd say you're fixating on the wrong signal here. "Redefine" and "cheat sheet" are normal words people use all the time, and I routinely see worse metaphors in human-written text.

It's the structure and rhythm at the sentence and paragraph levels that's the current tell, as SOTA LLMs all seem to overuse clarification constructs like "it's not X, it's Y", "it's X, a Y and a Z", and "it's X, it's essentially doing Y".

Thing is, I actually struggle to find what's so off-putting about these, given that they're usually used correctly. So far, the best hypothesis I have for what makes AI text stand out is that LLM output is too good. Most text written by real humans (including my own) is shit, with the best of us caring about communicating clearly, and most people not even that; nobody spends time refining the style and rhythm, unless they're writing a poem. You don't expect a blog post or a random Internet article (much less a HN comment) to be written in the same style as a NYT bestseller book for general audience - but LLMs do that naturally, they write text better at paragraph level than most people ever could, which stands out as jarring.

> Either there's currently an epidemic of algorithms that use more than one bit to store a bit, or the AI is shoving in extra plausible-sounding words to pad things out. You decide which is more likely.

Or: those things matter to the authors, and possibly to the audience. Which is reasonable, because LLMs suddenly made the world hit hard against global capacity constraints in compute, memory, and power; between that and edge devices/local use, everyone who pays attention is interested in LLM efficiency.


> they write text better

Not if you view text as a medium for communication, i.e. as a way for a sender to serialize some idea they have in their mind and transfer it to the reader for deserialization.

The AI doesn't know what the sender meant. It can't add any clarity. It can only corrupt and distort whatever message the sender was trying to communicate.

Fixating on these tells is a way for the receiver of the message to detect that it has been corrupted and there is no point in trying to deserialize it. The harder you try to interpret an AI-generated message, the less sense it will make.


LLM prose is very bland and smooth, in the same way that bland white factory bread is bland and smooth. It also tends to use a lot of words to convey very simple ideas, simply because the output is typically decompressed from a small prompt. LLMs are capable of very good data transformation and good writing, but not when asked to write an article from a single sentence.

That's true. It's not that they're incapable of doing better; it's that whoever's prompting them is typically too lazy to add an extra sentence or three (or a link) to steer them to a different region of the latent space. There are easily a couple dozen dimensions almost always left at their default values; it doesn't take much to alter them and nudge the model to sample from a more interesting subspace, style-wise.

(Still, it makes sense to do this as a post-processing style-transfer step, as verbosity is a feature while the model is still processing the "main" request: each token produced is a unit of computation, and the more terse the answer, the dumber it gets. These days that's somewhat mitigated by "thinking" and agentic loops.)


Because it’s a lot of fluff to convey things in a way that’s not very accurate.

Looks like Google canned all their tech writers just to pivot the budget into H100s for training these very same writers.

Capex vs. opex

> "Redefine" is a favorite word of AI. Honestly no need to read further.

You're not wrong, but it certainly is an annoying outcome of AI that we're not allowed to use.. words.. anymore.


"The X Trick" or "The Y Dilemma" or similar snowclones in a header is also a big AI thing. Humans use this construction too, but LLMs love it out of all proportion. I call it The Ludlum Delusion (since that's how every Robert Ludlum book is titled).

There is also the possibility that the article went through the hands of the company's communications department, which has writers who probably write at LLM level.

Another instinctual reaction here: this specific formulation pops out of AI all the time; there might as well have been an em dash in the title.

Genius new idea: replace the em-dashes with semicolons so it looks less like AI.

You're absolutely right. That's not just a genius idea; it's a radical new paradigm.

Damnit.

There goes another bit of my writing style that will get mistaken for an LLM.


I read "this clever step" and immediately came to the comments to see if anyone picked up on it.

It reads like a pop science article while at the same time being way too technical to be a pop science article.

Turing test ain't dead yet.


> Turing test ain't dead yet.

Only because people are lazy, and don't bother with a simple post-processing step: attach a bunch of documents or text snippets written by a human (whether yourself or, say, some respected but stylistically boring author), and ask the LLM to match style/tone.


Maybe they quantized the model parameters a bit too much...

It is AI generated, or was written by someone a bit removed from the technical advances, IMHO. The Johnson-Lindenstrauss lemma is a very specific and powerful concept, yet the article's QJL explanation is vacuous. A knowledgeable human would not have left the reader wondering how it relates to the lemma.

Honestly, the bigger miss is people treating JL as some silver bullet for "extreme" compression, as if preserving pairwise distances for a fixed point set somehow means you still keep the task-relevant structure once you're dealing with modern models.

Try projecting embeddings this way and watch your recall crater the moment you need downstream task performance instead of nearest-neighbor retrieval demos. If you're optimizing for blog-post vibes instead of anything measurable, sure, call it a breakthrough.
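This is easy to check. A toy sketch (my own, with made-up dimensions) comparing exact top-10 retrieval against 1-bit random projections:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, m = 2000, 768, 64  # corpus size, embedding dim, projected dim
    X = rng.standard_normal((n, d))
    q = rng.standard_normal(d)

    proj = rng.standard_normal((m, d))
    Xb, qb = np.sign(X @ proj.T), np.sign(proj @ q)  # 1-bit codes

    exact = np.argsort(X @ q)[-10:]     # true top-10 by dot product
    approx = np.argsort(Xb @ qb)[-10:]  # top-10 by sign agreement
    print(len(set(exact) & set(approx)) / 10)  # recall@10, typically well below 1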


Yeah, and some parts of the article are just bizarre:

> Instead of looking at a memory vector using standard coordinates (i.e., X, Y, Z) that indicate the distance along each axis, PolarQuant converts the vector into polar coordinates using a Cartesian coordinate system. This is comparable to replacing "Go 3 blocks East, 4 blocks North" with "Go 5 blocks total at a 37-degree angle”

Why bother explaining this? Were they targeting a high-school and middle-school reader base??
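For the record, the numbers in that quote only line up if the angle is a compass bearing measured from North; measured from East (the usual convention for polar coordinates) you get ~53°:

    import math

    east, north = 3, 4
    r = math.hypot(east, north)                        # 5.0 blocks
    from_east = math.degrees(math.atan2(north, east))  # ~53.13 degrees
    bearing = math.degrees(math.atan2(east, north))    # ~36.87 -- the quoted "37"
    print(r, from_east, bearing)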


It feels very much like Gemini’s writing style: overly excited, with lots of unnecessary contrasts.

This reminds me of Intel talking about faster web browsing with the new Pentium

Ha, I wasn't old or into it enough at the time to remember that, but it is consistent with just about every IC datasheet ever with their list of possible applications. (Like: logic gate; applications include Walkman, Rocket ship, Fuzzy Logic Washing Machine, mobile phone, AGI co-processor, ...)

A lot of this is happening.

The Dell marketing machine in particular is bludgeoning everyone that will listen about Dell AI PCs. The implication that folks will miss the boat on AI by not having a piddly NPU in their laptop is silly.


The real question is when will you resort to bots for rejecting low-quality PRs, and when will contributing bots generate prompt injections to fool your bots into merging their PRs?


Reminds me of "Universal pre-training by iterated random computation" https://arxiv.org/pdf/2506.20057, with bit less formal approach.

I wonder if there is a closed-form solution for those kinds of initialization methods (call them pre-training if you wish): one that would let attention heads detect a variety of diverse patterns while staying more structured than random init.
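One concrete (toy) version of "more structured than random": seed each head's projection with rows of a Fourier basis, so different heads start sensitive to different frequencies. A sketch of my own, not from the paper:

    import numpy as np

    def fourier_init(n_heads, d_head, d_model):
        """Each head gets cosine rows at its own frequency, phase-shifted per row."""
        pos = np.arange(d_model)
        W = np.zeros((n_heads, d_head, d_model))
        for h in range(n_heads):
            freq = (h + 1) / d_model
            for i in range(d_head):
                phase = np.pi * i / d_head
                W[h, i] = np.cos(2 * np.pi * freq * pos + phase)
        return W / np.sqrt(d_model)  # keep activations near unit scale

    Wq = fourier_init(n_heads=8, d_head=32, d_model=256)
    print(Wq.shape)  # (8, 32, 256)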


I'm partial to "pre-pre-training" myself.


Time to start zig++


It's funny that the real value is now in test suites. Or maybe it always has been...


I don't think this would qualify as clean room (the library was involved in training the model to generate programs as a whole). However, it should be possible to remove the library from the OLMo training data and retrain from scratch.

But what about training without having seen any human-written program? Could a model learn from randomly generated programs?


> I don't think this would qualify as clean room (the library was involved in training the model to generate programs as a whole)

Hm... this is really one for the lawyers, but IMO you could likely argue that the marginal general-coding knowledge gained from any particular library is close to nil.

The hard part here, IMO, would be convincingly arguing that you can wipe out knowledge of the library from the model, whether through fine-tuning or by excluding it from the training set.

> But what about training without having seen any human-written program? Could a model learn from randomly generated programs?

I think the answer at this point is definitely no, but maybe someday. It's a more interesting question for art, since art is more subjective. If we eventually get to a point where a machine can teach itself art from nothing... first of all, how? And second, it would be interesting to see the reaction from people who oppose AI art on the grounds that it trains on artists' work.

Honestly, given all I've seen models do, I wouldn't be too surprised if you could distill a (very bad) image generation model off just an LLM. In a sense this is the end goal of the pelican-riding-a-bicycle benchmark (somewhat tongue in cheek): if an LLM can learn to draw anything with SVGs without ever getting visual input, that would be very interesting :)

