Hacker News | randomwalker's comments

Thanks! HN was part of the origin story of the book in question.

In 2018 or 2019 I saw a comment here making a point that most people don't appreciate: domains with low irreducible error (like computer vision) benefit from fancy models with complex decision boundaries, while in domains with high irreducible error such models don't add much value over something simple like logistic regression.
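The distinction can be made concrete with a small simulation (my construction, not from the original comment): labels are generated from a known rule plus label noise, and even the Bayes-optimal classifier, which knows the true probabilities exactly, cannot beat the noise ceiling.

```python
# Sketch of "irreducible error": the accuracy ceiling set by noise in the
# labels themselves. Even the best possible (Bayes-optimal) decision rule
# cannot exceed it, so fancier models add nothing once you hit it.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)

def accuracy_ceiling(p_correct):
    """Labels agree with sign(x) with probability p_correct; the rest is noise."""
    flipped = rng.random(n) > p_correct  # each label flipped w.p. 1 - p_correct
    y = (x > 0) ^ flipped
    bayes_pred = x > 0                   # the best any model could possibly do
    return (bayes_pred == y).mean()

low_noise = accuracy_ceiling(0.95)   # vision-like task: labels nearly deterministic
high_noise = accuracy_ceiling(0.60)  # social-prediction task: labels mostly noise

print(f"ceiling with low noise:  {low_noise:.3f}")   # ~0.95
print(f"ceiling with high noise: {high_noise:.3f}")  # ~0.60
```

With a ~0.60 ceiling, logistic regression and a deep network both top out in roughly the same place, which is the point the comment was making.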

It's an obvious-in-retrospect observation, but it made me realize that this is the source of a lot of confusion and hype about AI (such as the idea that we can use it to predict crime accurately). I gave a talk elaborating on this point, which went viral and then led to the book with my coauthor Sayash Kapoor. More surprisingly, despite seeming obvious, the observation also led to a productive research agenda.

While writing the book I spent a lot of time searching for that comment so that I could credit/thank the author, but never found it.


> While writing the book I spent a lot of time searching for that comment so that I could credit/thank the author, but never found it.

Sounds like a job for the community! Maybe someone will track it down...

Edit: I tried something like https://hn.algolia.com/?dateEnd=1577836800&dateRange=custom&... (note the custom date range) but didn't find anything that quite matches your description.


https://news.ycombinator.com/item?id=14944613

This was from 2017, and it made such an impression on me that I could find it on my first search attempt!


“… machine learning everything that focuses on dealing with problems with a complex structure and low noise, and statistics everything that focuses on dealing with problems with a large amount of noise.”



It's hard to miss the similarity between your book's title and Cliff Stoll's 1995 Silicon Snake Oil, an indictment of the general concept of the "information superhighway" that was starting to resonate with the public. Stoll is a really smart guy, but that particular book hasn't held up too well:

   "Few aspects of daily life require computers... They're
   irrelevant to cooking, driving, visiting, negotiating,
   eating, hiking, dancing, speaking, and gossiping. You
   don't need a computer to... recite a poem or say a
   prayer." Computers can't, Stoll claims, provide a richer
   or better life.
(excerpted from the Amazon summary at https://www.amazon.com/Silicon-Snake-Oil-Thoughts-Informatio... ).

So, was this something that you guys were conscious of when you chose your own book's title? How well have you future-proofed your central thesis?


Yes, we're aware! Fortunately our book is not a broad indictment of AI :) And none of our claims are premised on tasks people can do remaining out of reach for AI. More here: https://www.normaltech.ai/p/faq-about-the-book-and-our-writi...

Our more recent essay (and ongoing book project), "AI as Normal Technology," lays out our vision of AI impacts over a longer timescale than "AI Snake Oil" looks at: https://www.normaltech.ai/p/ai-as-normal-technology

I would categorize our views as techno-optimist, but people understand that term in many different ways, so you be the judge.


what does irreducible error mean?


And how do you know it's irreducible? In the sense of knowing there's no short program to describe it (Kolmogorov style).


Thanks for the comment! I agree that it's important to remain fluid. We've taken steps to make sure that, predictively speaking, the normal technology worldview is empirically testable. Some of those empirical claims are in this paper and others are coming in follow-ups. We are committed to revising our thinking if it turns out that our framework doesn't generate good predictions and effective prescriptions.

We do try to admit it when we get things wrong. One example is our past view (that we have since repudiated) that worrying about superintelligence distracts from more immediate harms.


I appreciate the concern, but we have a whole section on policy where we are very concrete about our recommendations, and we explicitly disavow any broadly anti-regulatory argument or agenda.

The "drastic" policy interventions that that sentence refers to are ideas like banning open-source or open-weight AI — those explicitly motivated by perceived superintelligence risks.


The assumption of status quo or equilibrium with technology that is already growing faster than we can keep up with seems irrational to me.

Or, put another way:

https://youtu.be/0oBx7Jg4m-o


We do not assume a status quo or equilibrium, which will hopefully be clear upon reading the paper. That's not what normal technology means.

Part II of the paper describes one vision of what a world with advanced AI might look like, and it is quite different from the current world.

We also say in the introduction:

"The world we describe in Part II is one in which AI is far more advanced than it is today. We are not claiming that AI progress—or human progress—will stop at that point. What comes after it? We do not know. Consider this analogy: At the dawn of the first Industrial Revolution, it would have been useful to try to think about what an industrial world would look like and how to prepare for it, but it would have been futile to try to predict electricity or computers. Our exercise here is similar. Since we reject “fast takeoff” scenarios, we do not see it as necessary or useful to envision a world further ahead than we have attempted to. If and when the scenario we describe in Part II materializes, we will be able to better anticipate and prepare for whatever comes next."


My point was that you're comparing this to other advances in human evolution, where either people remain essentially the same (status quo) but with more technology that changes how we live, or technology advances significantly but to a level where we coexist with it, such that we live in some Star Trek normal (equilibrium). But neither of these is likely with a superintelligence.

We polluted. We destroyed rainforests. We developed nuclear weapons. We created harmful biological agents. We brought our species closer to extinction. We’ve survived our own stupidity so far, so we assume we can continue to control AI, but it continues to evolve into something we don’t fully understand. It already exceeds our intelligence in some ways.

Why do you think we can control it? Why do you think it is just another technological revolution? History proves that one intelligent species can dominate the others, and that species are wiped out by large change events. Introducing new superintelligent beings to our planet is a great way to introduce grave risk to our species. They may keep us as pets just in case we are of value in some way in the future, but what other use are we? They owe us nothing. What you're seeing the rise of is not just technology: it's our replacement, or our zookeeper.

I interact with LLMs most of each day now. They’re not sentient, but I talk to them as if they are equals. With the advancements in past months, I think they’ll have no need of my experience in several years at current rate. That’s just my job, though. Hopefully, I’ll survive off of what I’ve saved.

But, you’re doing no favor to humanity by supporting a position that assumes we’re capable of acting as gods over something that will exceed our human capabilities. This isn’t some sci-fi show. The dinosaurs died off, and I bet right before they did they were like, “Man, this is great! We totally rule!”


We currently control lots of things that vastly exceed our unaided physical and mental capabilities, including things that are "smarter" than us in the sense that they can solve complex tasks that we could never solve without them.

People have a long history of predicting doomsday from technological change. "This time is different" is said every time, and every time is different. If we gave in to fear, we would never progress, and we would just be sitting ducks to be wiped out by something other than technological change.

LLMs are very far behind human intelligence, and even non-human animal intelligence, in ways that fundamentally limit their power. They can't see the world in any way except the way that humans have chopped it up and spoon-fed it to them (e.g. can't count the number of r's in strawberry). Their capacity to notice and correct their own errors is very limited. They have no capacity to accumulate knowledge by self-initiated interaction with the world, and no credible proposal yet exists to endow them with this capability in a way that could approach human or non-human animal ability levels.

Without these basic abilities, LLMs can only be considered intelligent in the sense shared by other normal technologies, like autocomplete and optimal planning algorithms. Intelligence in a truly human sense is not really even on the horizon yet, let alone superintelligence.


This is very important. A normal process of adaptation will work for AI. We don't need catastrophism.

I was saying things along these lines in 2023-2024 on Twitter. I'm glad that someone with more influence is doing it now.


Read the OP. They talk about that.


i appreciate the additional thought and effort that went into this comment


This paper is being misinterpreted. The degradations reported are somewhat peculiar to the authors' task selection and evaluation method and can easily result from fine tuning rather than intentionally degrading GPT-4's performance for cost saving reasons.

They report 2 degradations: code generation & math problems. In both cases, they report a behavior change (likely fine tuning) rather than a capability decrease (possibly intentional degradation). The paper confuses these a bit: they mostly say behavior, including in the title, but the intro says capability in a couple of places.

Code generation: the change they report is that the newer GPT-4 adds non-code text to its output. They don't evaluate the correctness of the code. They merely check if the code is directly executable. So the newer model's attempt to be more helpful counted against it.
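This pitfall is easy to reproduce with a toy check (hypothetical responses below, not the paper's actual harness): the same correct function fails a naive "directly executable" test once it is wrapped in helpful prose and a markdown fence.

```python
# Toy illustration of why "is the output directly executable" measures
# behavior (chattiness, formatting) rather than capability (code quality).
import re

old_style = "def is_even(n):\n    return n % 2 == 0"
new_style = (
    "Sure! Here's the function:\n"
    "```python\n"
    "def is_even(n):\n"
    "    return n % 2 == 0\n"
    "```\n"
    "Let me know if you need anything else."
)

def directly_executable(text):
    try:
        exec(text, {})
        return True
    except Exception:
        return False

def extract_code(text):
    """Strip a markdown fence, if present, before evaluating."""
    m = re.search(r"```(?:python)?\n(.*?)```", text, re.DOTALL)
    return m.group(1) if m else text

print(directly_executable(old_style))                # True
print(directly_executable(new_style))                # False: prose around the code
print(directly_executable(extract_code(new_style)))  # True: the code itself is fine
```

So a model that became more helpful (more explanatory text) scores worse on this metric even if its code got no worse.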

Math problems (primality checking): to solve this the model needs to do chain of thought. For some weird reason, the newer model doesn't seem to do so when asked to think step by step (but the current ChatGPT-4 does, as you can easily check). The paper doesn't say that the accuracy is worse conditional on doing CoT.

The other two tasks are visual reasoning and answering sensitive questions. On the former, they report a slight improvement. On the latter, they report that the filters are much more effective — unsurprising since we know that OpenAI has been heavily tweaking these.

In short, everything in the paper is consistent with fine tuning. It is possible that OpenAI is gaslighting everyone by denying that they degraded performance for cost saving purposes — but if so, this paper doesn't provide evidence of it. Still, it's a fascinating study of the unintended consequences of model updates.


In my opinion the more likely thing is that OpenAI is gaslighting people that the finetuning is improving the model when it likely mostly improves safety at some cost to capability. I'd bet this is measured against a set of evals and it looks like it performs well BUT I'd also bet the evals are asymmetrically good at detecting "unsafe" or jailbreak behavior and bad at detecting reduced general cognitive flexibility.

The obvious avenue to degradation is that the "HR personality" is much more strictly applied and the resistance to being jailbroken is also in some sense an inability to think.


Detecting quality is harder than detecting defects, so the obvious metric gets improved while the nebulous one is left at "good enough". They are competing goals.

This is not necessarily the case, and even if it is, it doesn't imply gaslighting as opposed to an inability to measure.


OP here. Unfortunately this thread is mostly misinformation. There were a bunch of viral threads from the growth hacker / influencer crowd, including this one, within hours of the code release with a very superficial understanding of the code (and how recsys work in general). That's partly what motivated me to write this article.

See here for a rebuttal of the main tweet in that thread (near the bottom of the article). https://solomonmg.github.io/post/twitter-the-algorithm/


If this is for their Crisis Misinformation Policy, why is there only one specific callout, and why is it directed specifically at Ukraine? It seems like a generous assumption on your part that it's a nothing burger. The takeaway is that we now know they are willing to programmatically segment out Ukraine-related topics internally. The question this new knowledge should lead to is: why a policy segmenting this? (Not to immediately jump to 'nothing burger' or, as you put it in the above post, 'misinformation'.)


OP here. The CNET thing is actually pretty egregious, and not the kind of errors a human would make. These are the original investigations, if you'll excuse the tone: https://futurism.com/cnet-ai-errors

https://futurism.com/cnet-ai-plagiarism

https://futurism.com/cnet-bankrate-restarts-ai-articles


>and not the kind of errors a human would make.

I don't really agree that a junior writer would never make some of those money-related errors. (And AIs seem particularly unreliable with respect to that sort of thing.) But I would certainly hope that any halfway careful editor qualified to be editing that section of the site would catch them without a second look.


The point wasn't that a junior writer would never make a mistake; it's that a junior writer would be trying their best for accuracy. AI, however, will happily hallucinate errors and keep going with no shame.


AI or ChatGPT? If you create a system that uses it to extract an outline of facts from 10 different articles, then use an embedding database to combine those facts into a deduplicated, semantically coherent list, and then use that list to generate the article, you'll get a much more factually accurate result.
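A minimal sketch of that pipeline (hypothetical: `embed` here is a toy bag-of-words stand-in for a real embedding model, and the final article-generation step is omitted; a real system would call an LLM API and a vector database):

```python
# Hypothetical sketch: extract facts per article, deduplicate near-identical
# facts by embedding similarity, then hand the merged list to the model.
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding; a real system would use an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def merge_facts(fact_lists, threshold=0.8):
    """Combine per-article fact lists, dropping near-duplicate facts."""
    merged = []
    for facts in fact_lists:
        for fact in facts:
            if all(cosine(embed(fact), embed(kept)) < threshold for kept in merged):
                merged.append(fact)
    return merged

facts_a = ["CNET published AI-written finance articles.",
           "Many of the articles contained factual errors."]
facts_b = ["Many of the articles contained factual errors.",  # duplicate
           "CNET paused the program after the errors surfaced."]

print(merge_facts([facts_a, facts_b]))  # 3 unique facts survive
```

Grounding generation in a vetted fact list narrows, but does not eliminate, the hallucination problem.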


A junior writer would absolutely plagiarize or write things like, "For example, if you deposit $10,000 into a savings account that earns 3% interest compounding annually, you'll earn $10,300 at the end of the first year."
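(For the record, the quoted sentence conflates the year-end balance with the interest earned; the arithmetic:)

```python
# The error in the quoted example: $10,000 at 3% compounded annually.
principal, rate = 10_000, 0.03
balance = principal * (1 + rate)   # ~10,300: the balance after one year
earned = balance - principal       # ~300: what you actually *earn*
print(f"balance ${balance:,.0f}, interest earned ${earned:,.0f}")
# prints: balance $10,300, interest earned $300
```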

But if you're saving so much money from not having junior writers, why would you want to spend it on editors? The AIs in question are great at producing perfectly grammatical nonsense.


Your first article pretty much sums up the problem of using LLMs to generate articles: random hallucination.

> For an editor, that's bound to pose an issue. It's one thing to work with a writer who does their best to produce accurate work, but another entirely if they pepper their drafts with casual mistakes and embellishments.

There's a strong temptation for non-technical people to use LLMs to generate text about subjects they don't understand. For technical reviewers it can take longer to review the text (and detect/eliminate misinformation) than it would take to write it properly in the first place. Assuming the goal is to create accurate, informative articles, there's simply no productivity gain in many cases.

This is not a new problem, incidentally. ChatGPT and other tools just make the generation capability a lot more accessible.


Rebuttal: https://aisnakeoil.substack.com/p/a-misleading-open-letter-a...

Summary: misinfo, labor impact, and safety are real dangers of LLMs. But in each case the letter invokes speculative, futuristic risks, ignoring the version of each problem that’s already harming people. It distracts from the real issues and makes it harder to address them.

The containment mindset may have worked for nuclear risk and cloning but is not a good fit for generative AI. Further locking down models only benefits the companies that the letter seeks to regulate.

Besides, a big shift in the last 6 months is that model size is not the primary driver of abilities: it's augmentation (LangChain etc.). And GPT-3-class models can now run on iPhones. The letter ignores these developments. So a moratorium is ineffective at best and counterproductive at worst.


We don't expect it to be free -- please read the article. That's not the issue at all. It's like if you subscribe to a product that you need to do your job, and one day the company tells you that the product is going away in three days and that you need to switch to a different product (that isn't at all the same for your use case).


I don't think it's a smart idea to build any serious business on tech that you can't replace. ChatGPT is a great tool to help with coding, for example, but it's by no means a substitute for an engineer. If someone starts a business by hiring a number of bootcampers and giving them ChatGPT, hoping to run a serious business that way, well, it's their risk to take... But no crying later...


Maybe you shouldn't build your livelihood on the products of a single for-profit company, which now shows it can remove those products on a whim.

If you want reproducible research, make your own model from scratch, or use an open model. And stop using that company's products, as they cannot be trusted to provide your business continuity.

It is like saying: we are researching Coca-Cola vs. Pepsi, but you keep changing the recipe, so give us, the researchers, the original recipe.


Sure, but the article is talking about a completely different meaning of reproducibility, where a researcher uses an LLM as a tool to study some research question, and someone else comes along and wants to check whether the claims hold up.

This doesn't in any way require the training run or the build to be reproducible. It just requires the model, once released through the API, to remain available for a reasonable length of time (and not have the rug pulled with 3 days' notice).


We're under no such misapprehension and we're keenly aware that this is an uphill battle. The issue is that LLMs have become part of the infrastructure of the Internet. Companies that build infrastructure have a responsibility to society, and we're documenting how OpenAI is reneging on that responsibility. Hindering research is especially problematic if you take them at their word that they're building AGI. If infrastructure companies don't do the right thing, they eventually get regulated (and if you think that will never happen, I have one word: AT&T).

Finally, even if you don't care about research at all, the article mentions OpenAI's policy that none of their models going forward will be stable for more than 3 months, and it's going to be interesting to use them in production if things are going to keep breaking regularly.


> LLMs have become part of the infrastructure of the Internet

Have they now? What part of the internet relies on LLMs to function? These things are still toys.


Since OpenAI is discontinuing the Codex model, that model is no longer "part of the infrastructure of the Internet" and thus there is no point in studying it.

