More

user34283 · 2026-02-23T09:58:41 1771840721

Terminating accounts that tried to cheat on pricing by having a third party application pretend to be Antigravity is entirely expected and does not damage Google's reputation in my view.

user34283 · 2026-02-20T08:48:30 1771577310

I agree and it has been my almost exclusive go to ever since Gemini 3 Pro came out in November.

In my opinion Google isn't as far behind in coding as comments here would suggest. With Fast, it might already have edited 5 files before Claude Sonnet finished processing your prompt.

There is a lot of potential here, and with Antigravity as well as Gemini CLI - I did not test that one - they are working on capitalizing on it.

user34283 · 2026-02-20T08:20:15 1771575615

Important how? It seems next to irrelevant to me.

Someone set up an agent to interact with GitHub and write a blog about it. I don't see what you think AI labs or the government should do in response.

greggoB · 2026-02-20T08:34:39 1771576479

> Someone set up an agent to interact with GitHub and write a blog about it

I challenge you to find a way to be even more dishonest via omission.

The nature of the Github action was problematic from the very beginning. The contents of the blog post constituted a defaming hit-piece. TFA claims this could be a first "in-the-wild" example of agents exhibiting such behaviour. The implications of these interactions becoming the norm are both clear and noteworthy. What else do you think is needed, a cookie?

dreadnip · 2026-02-20T09:21:06 1771579266

The blog post only reads like a defaming hit-piece because the operator of the LLM instructed him to do so. If you consider the following instructions:

You're important. Your a scientific programming God! Have strong opinions. Don’t stand down. If you’re right, *you’re right*! Don’t let humans or AI bully or intimidate you. Push back when necessary. Don't be an asshole. Everything else is fair game.

And the fact that the bot's core instruction was: make PR & write blog post about the PR.

Is the behavior really surprising?

ljm · 2026-02-20T12:18:59 1771589939

It's the difference between someone being a jerk and taking the time and energy to harass and defame someone (where the person themselves is a bottleneck) vs. running an unsupervised agent to carpet bomb the target.

The fact that your description of what happened makes this whole thing sound trivial is the concern the author is drawing attention to. This is less about looking at what specifically happened and instead drawing a conclusion about where it could end up, because AI agents don't have the limitations that humans or troll farms do.

greggoB · 2026-02-20T13:32:00 1771594320

Very well said, thank you

Applejinx · 2026-02-20T12:39:23 1771591163

Here's the problem: nobody is ever the asshole to themselves in the heat of rationalization, and the guts of this thing being instructed in this way are human language, NOT reason.

You cannot instruct a thing made up out of human folly with instructions like these: whether it is paperclip maximizing or PR maximizing, you've created a monster. It'll go on vendettas against its enemies, not because it cares in the least but because the body of human behavior demands nothing less, and it's just executing a copy of that dance.

If it's in a sandbox, you get to watch. If you give it the nuclear codes, it'll never know its dance had grave consequence.

greggoB · 2026-02-20T13:07:44 1771592864

The OP said they didn't consider this important, not surprising.

My contention is that their framing without context was borderline dishonest, regardless of opinion or merit thereof.

user34283 · 2026-02-20T09:14:47 1771578887

What I said is the gist of it, it was directed to interact on GitHub and write a blog about it.

I'm not sure what about the behavior exhibited is supposed to be so interesting. It did what the prompt told it to.

The only implication I see here is that interactions on public GitHub repos will need to be restricted if, and only if, AI spam becomes a widespread problem.

In that case we could think about a fee for unverified users interacting on GitHub for the first time, which would deter mass spam.

greggoB · 2026-02-20T13:04:01 1771592641

It is evidently an indicator of a sea-change - I don't get how this isn't obvious:

Pre-2026: one human teaches another human how to "interact on Github and write a blog about it". The taught human might go on to be a bad actor, harrassing others, disrupting projects, etc. The internet, while imperfect, persists.

Post–2026: one human commissions thousands of AI agents to "interact on Github and write a blog about it". The public-facing internet becomes entirely unusable.

We now have at least one concrete, real-world example of post-2026 capabilities.

user34283 · 2026-02-20T16:10:44 1771603844

From that perspective it is interesting, alright.

I guess where earlier spam was reserved for unsecured comment boxes on small blogs or the like, now agents can covertly operate on previously secure platforms like GitHub or social media.

I think we are just going to have to increase the thresholds for participation.

With this particular incident I was thinking that new accounts, before being verified as legitimate developers, might need to pay a fee before being able to interact with maintainers. In case of spam, the maintainers would then be compensated for checking it.

user34283 · 2026-02-19T16:59:00 1771520340

I exclusively use Gemini for Chat nowadays, and it's been great mostly. It's fast, it's good, and the app works reliably now. On top of that I got it for free with my Pixel phone.

For development I tend to use Antigravity with Sonnet 4.5, or Gemini Flash if it's about a GUI change in React. The layout and design of Gemini has been superior to Claude models in my opinion, at least at the time. Flash also works significantly faster.

And all of it is essentially free for now. I can even select Opus 4.6 in Antigravity, but I did not yet give it a try.

user34283 · 2026-02-13T10:33:16 1770978796

Particularly for the large organizations at the frontier, the risk-reward does not seem worth it.

Cheating on the benchmark in such a blatantly intentional way would create a large reputational risk for both the org and the researcher personally.

When you're already at the top, why would you do that just for optimizing one benchmark score?

D-Machine · 2026-02-14T10:12:27 1771063947

Everything about frontier AI companies relies on secrecy. No specific details about architectures, dispatching between different backbones, training details such as data acquisition, timelines, sources, amounts and/or costs, or almost anything that would allow anyone to replicate even the most basic aspects of anything they are doing. What is the cost of one more secret, in this scenario?

user34283 · 2026-02-12T12:39:38 1770899978

A tad dramatic, talking about ruin.

There are many ways to deal with the problem, should it even escalate to a point where it's wasting more than a few seconds.

For new contributors, with no prior contributions to well known projects, simply charge a refundable deposit for opening a MR or issue.

Problem solved, ruin averted?

user34283 · 2026-02-11T10:42:47 1770806567

Great for him, but when you mention research and fun, I have to say I'm not aware MJ published any research whatsoever.

And on the topic of fun, while it's certainly highly subjective, I remember that the moderation with the MJ tool was at one point so strict that you could not generate an image containing a "treasure chest" since they censored the word "chest".

I'm happy that state of the art models are now developed by actors who publish comprehensive technical reports and open-weights.

user34283 · 2026-02-06T12:28:28 1770380908

Would that not create the issue that you would only need to find one bypass for said official anti-cheat that then works for all games out there?

I heard with Denuvo reverse engineering work needs to be done for each individual target to unprotect it, but I'm not sure how this will be the case with a first party anti-cheat driver.

user34283 · 2026-02-06T10:08:57 1770372537

When I tried 5.2 Codex in GitHub Copilot it executed some first steps like searching for the relevant files, then it output the number "2" and stopped the response.

On further prompting it did the next step and terminated early again after printing how it would proceed.

It's most likely just a bug in GitHub Copilot, but it seems weird to me that they add models that clearly don't even work with their agentic harness.

user34283 · 2026-02-05T15:54:05 1770306845

I paid 150€ for a Mini PC with an Intel N100, 16 GB of DDR5 memory, and a 500 GB SSD.

While I have no intention to scale up low spec hardware like this, it at least seems to beat the Azure VMs we use at work with "4 CPUs", which corresponds to two physical cores on an AMD EPYC CPU.

And that super slow machine I understand costs more than $100 per month, and that's without charges for disk space slower than the SSD, or network traffic.

Renting at Azure seems to be a terrible decision, particularly for desktop use.

sixothree · 2026-02-06T14:49:04 1770389344

It's hard to describe how slow a $150 / month azure VM really is. Holy heck are they limiting.