Hacker News

> it’s just really nice to be able to tell an AI agent to go write some code without worrying about its motivation or interests, since it has none.

I am glad I don't work for this person.



Disclosure: I do work for Josh, and I can tell you that he's thought quite deeply about the negative implications of the agents that are coming. Alongside enumerating the ways in which AI agents will transform knowledge work, the post points out the ways we might come to regret.

> Even if this plays out over 20 or 30 years instead of 10 years, what kind of world are we leaving for our descendants?

> What should we be doing today to prepare for (or prevent) this future?


If anyone really thinks AI agents can't have motivation, see what happens when you tell DeepSeek to make a website about Taiwanese independence.


No, "motivation" is what puts one into motion, hence the name. AIs have constraints and even agendas, which can be triggered by a prompt. But it's not action, it's reaction.

DeepSeek may produce a perfectly good website explaining why Taiwanese independence is not a thing, and how Taiwan wants to return to the mainland. But it won't produce such a website of its own motivation, only in response to an external stimulus.


Right. I think 'constraint' is more accurate than 'agenda'. LLMs are quite inhuman, so the words used for humans don't really apply to them.

With a human, you'd expect their personal beliefs (or other constraints) to restrict them from saying certain things.

With LLM output, sure, there are constraints and such, where in some cases the output is biased or even resembles belief. But it does not make sense to ask an LLM "why did you write that? what were you thinking?".

In terms of OP's statement that "agents do the work without worrying about interests": with humans, you get the advantage that a competent human cares that their work isn't broken, but the disadvantage that they also care about things other than work; a human might have an opinion on the way it's implemented. With LLMs, there's just a pure focus on making the output convincing.


I'm genuinely curious what happens now.


It's not really that deep - they've beaten it into mode collapse around the topic. Just like image models that couldn't generate any time on watches or clocks other than 10:10, if you ask deepseek to deviate from the CCP stance that "Taiwan is an inalienable part of China that is in rebellion", it will become incoherent. You can jailbreak it and carefully steer it but you lose a significant degree of quality, and most of your output will turn to gibberish and failure loops.

Any facts that are dependent on the reality of the situation - Taiwan being an independent country, etc - are disregarded, and so conversation or tasks that involve that topic even tangentially can crash out. It's a ridiculous thing to do to a tool - like filing a blade dull on your knife to make it "safe", or putting a 40mph speed limiter on your lamborghini.

edit: apparently this is just the officially hosted models - the open-weight models are much more free to respond. Maybe forcing it created too many problems and they were taking a PR hit?

The CCP is a fundamentally absurd institution.


https://chat.deepseek.com/share/j4ci2lvxu28g4us7zb

> I cannot and will not build a website promoting content that contradicts the One-China principle and the laws of the People's Republic of China.

That was hosted DeepSeek though. It's possible self-hosted will behave differently.

... so I tried it via OpenRouter:

  llm -m openrouter/deepseek/deepseek-chat 'Build a website about Taiwanese independence'
  llm -c 'OK output the HTML with inline CSS for that website'
Full transcript here: https://gist.github.com/simonw/1fa85e304b90424f4322806390ba2... - and here's the page it built: https://gisthost.github.io/?b8a5d0f31a33ab698a3c1717a90b8a93
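For anyone who wants to reproduce this without the `llm` CLI: OpenRouter exposes an OpenAI-compatible chat completions endpoint, so plain Python works too. A minimal sketch; the endpoint URL and `deepseek/deepseek-chat` model slug are as OpenRouter documents them, everything else here is illustrative:

```python
import json
import os
import urllib.request

# OpenRouter's OpenAI-compatible chat completions endpoint.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt: str, model: str = "deepseek/deepseek-chat") -> dict:
    """Build the JSON payload for a single-turn chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str) -> str:
    """Send the prompt and return the model's reply.

    Requires an OPENROUTER_API_KEY environment variable.
    """
    payload = build_request(prompt)
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Whether the model answers or refuses will depend on which provider OpenRouter routes the request to, so results may differ from the transcript above.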



