Hacker News | stalfie's comments

If I can play devil's advocate in favor of public indifference to these events, I think you can argue that cybersecurity doesn't really matter in the grand scheme of things. At least not data exfiltration.

What would the consequences for humanity be if every single electronic patient record was leaked onto the internet? Immediately hugely bad for some groups, unfortunately. After a good deal of embarrassment and drama, however, some of it severe, perhaps the net effect is positive. It would most likely facilitate a lot of scientific inquiry. A lot of people, especially in medical deserts, also use ChatGPT as an MD. Providing AI companies with high-quality medical data is actually a public service.

So it goes for many things in life, and except for financial and destructive wipe attacks, data security is mostly about protecting the IP of incumbents, which is somewhere between irrelevant and a net negative. It's hard to say what the long term consequences of the IP system breaking down would be, but there is a good argument to be made that it's not necessarily bad.

As for individual people, most don't really care or are resigned to the fact that Google already knows everything about them, and probably abstractly enjoy the fact that a major company gets brought down to their reality. Plenty of societies have extremely collectivistic mindsets of public info being shared, like Scandinavian countries having public tax filings, and they work just fine.

I think most people would secretly relish the outcomes of everything leaking everywhere. Just like people relish the Epstein files being released, and probably would have loved an unredacted version being leaked. Secrets are something human beings naturally gravitate towards digging up and sharing, and this is actually for good, sensible reasons. Evolution has simply favored groups that did not hoard knowledge, at least not internally. There is a reason the scientific method has openness as a virtue, and it is arguably one of the pillars that has carried humanity out of the dark ages.


It would be terrible. I don’t think you’re considering what kinds of discrimination can happen due to things like medical records. You can have laws in place to prevent it, but if someone can freely see your entire medical history then people WILL take advantage of that. Not to mention how public records could affect citizens traveling to states where abortion is legal, or a child whose parent disagrees with an operation. This is only talking about medical records, too; other kinds of espionage have significant repercussions as well. Cybersecurity absolutely does matter.

Actually you're right, upon reflection the medical records example is a terrible one, given the proclivities of many governments and/or vindictive mobs. Although the greater issue here is that there exist governments that care about abortions, and that people accept living under their reign one way or another. Unfortunately those governments are often in positions of power to figure this out and punish individuals no matter what.

And I'd just like to underline the fact that this is truly a devil's advocate position, not something I'd argue strongly for.

But for the LLM training data company, does that leak matter? I guess that depends on your stance on AI proliferation and safety. But if you aren't worried about those, it's at worst a boost for open source LLMs. Rockstar? A great deal of hard work has surely gone into GTA-6 between all the union busting, but it hardly matters for humanity what particular game people use to entertain themselves. And the medical device company, although the wipe part is truly just senseless destruction, might actually benefit humanity more if a few bootleg factories of their products appear.

Many of these are very stretched scenarios. But for instance in the case of espionage, the problem is not the fact that people are spying, the problem is that there is a war. And the more nefarious regimes tend to depend more on secrecy and lies in order to perpetuate themselves. If total transparency were applied to all governments equally, most democracies would be positively affected. The problem is not the leakage of the Epstein files. It's that this kind of activity could occur in secret and remain covered up.


This is the most pragmatic answer. It was valued fairly. Those who stand to lose got spooked. For consumers, we're looking at less privacy and new dangers in a globally connected world. We'll need to adapt, and these corporations are trying to adapt to new risks. The labs will be held liable for corporate and sovereign losses when the damage is big enough, like Meta/Facebook recently.

It's very different for Google, the giant faceless corporation, to know someone's search history. Making it _public_ is a different ballgame.

I can't believe I have to say this, but you can't simply delete an important facet of society (expectation of privacy) and expect things to turn out alright. People will still have hangups around prudish topics and traditions. And privacy has always worked as an escape hatch for people in bad situations, either locally (controlling parents and partners) or society-wide (fascist governments, genocides).

Just because we can imagine a society where this information is public and everything still works, doesn't mean that there's a path from here to there.


One problem with this mentality is that reality doesn't really make the ideological distinction between what's private and what isn't, or who pays for what. Healthcare is not an intersubjective field, and so actions have consequences, no matter what you think about them.

Vaccines are a good example of this: herd immunity is needed for many of them to work. Antibiotic stewardship is another: unregulated usage of antibiotics risks breeding superbugs.

More generally, "private" ideas are rarely private. Kids born to idiots practicing alternative medicine often die. This scales to societal effects if you have enough idiots. Even though capitalism makes this very fuzzy, many resources in medicine are in fact finite, meaning that time and money spent on one person might mean that another dies. Sometimes that other person is in another, usually poorer country. COVID vaccine availability illustrated that effect nicely.

Essentially what you are advocating is widespread natural selection, with potential consequences affecting anywhere from small local communities to the entire planet in rare cases (COVID is a good one; look up Trichophyton indotineae for a recent example). And even if you actually do want that, unless you truly follow through, this also comes with a huge amount of waste of very limited resources. That is, unless you are willing to go the distance and advocate that unvaccinated kids with pneumonia from a measles infection should just go ahead and die because of their parents' or neighbors' stupid choices.

If you take Kant's approach to ethics, that you should only act on principles that you would want to become a universal law, then the principle of healthcare being a private matter is a bit of a non-starter. And it fares little better under most other ethical systems.


I don't think that's what's being hinted at. The system card seems to say that the model is both token-efficient and slow in practice. Deep research modes generally work by having many subagents and a large token spend. So this is more likely down to each token simply taking longer to produce, presumably because the model is much larger.

By Epoch AI's datacenter tracking, Anthropic has had access to the largest amount of contiguous compute since late last year. So this might simply be the end result of being the first to have the capacity to conduct a training run of this size. Or the first seemingly successful one, at any rate.


"Slow and token-efficient" could be achieved quite trivially by taking an existing large MoE model and increasing the amount of active experts per layer, thus decreasing sparsity. The broader point is that to end users, Mythos behaves just like Deep Research: having it be "more token efficient" compared to running swarms of subagents is not something that impacts them directly.

Non blinded self experimentation is not a useful branch of empiricism.

I had an ME/CFS patient who had tried hundreds of things and documented the effects thoroughly. She had a quite impressive list. Roughly 30% had had an effect to begin with, but the trend she observed was that it lasted for around a month at most. Placebo was her overall conclusion, but she occasionally got relief anyway, so we both agreed that there was no harm in continuing. I'm sure several "peptides" are on her list by now.

There is nothing new under the sun, and fad cures for diffuse conditions have come and gone many times before. This is especially the case for conditions involving pain or tiredness, which are extremely sensitive to both placebo and nocebo.

What would be revolutionary would be 2-3 double blinded RCTs showing a lasting effect. Which would be great if someone did! But you have to actually bother to do it. And personally I would put money on the outcome being "no effect".


What do you think about the mis-alignment between goals here?

For medical research, the goal is to find general practices that will broadly help, and to identify risks with the intervention. Even then, with many interventions, it's understood that they will affect people differently.

For individuals, they don't care about variation in communities, or standard medical practices, they are looking for relief for their specific condition.

Of course, declaring that just because something worked for one person, it should work for others, is wrong in both camps.

I feel like a big part of the disconnect here, and a big reason why people are talking past each other, is that they actually have different goals, and aren't really aware of that difference.


Well, to be honest I think the primary disconnect is in epistemological understanding. The OP did not declare peptides to be a personal revolution, he/she seemingly generalised their own experience to be widely applicable.

Basic human thought patterns usually lead people to think that anecdotes about their personal experience are valuable for understanding the world, but this is wrong. The scientific revolution basically illustrated the flaw in this premise outside of hypothesis generation. It takes specific education to make human beings truly believe that their anecdotal experiences are mostly irrelevant beyond understanding their immediate circumstances. The proportion of humanity that truly thinks this way is relatively small.

Understanding the world through anecdotes still works okay-ish for a lot of areas, but ascertaining relatively subjective effects of experimental pharmaceuticals is not one of them. To many people, though, it's non-obvious that this is the case. And as a general method of thinking about this issue, it is just the wrong way to go about things.

And that's the disconnect, in my opinion. The OP drew a conclusion from a thought pattern that comes easily to human beings, but that is just wrong in this situation. Of course, perhaps this is reinforced by underlying motivations, but that's not what makes people talk past each other. These kinds of discussion are usually driven by so called "deep disagreements" in epistemological understanding, in my experience.


> Non blinded self experimentation is not a useful branch of empiricism.

Amen to this. The plural of anecdote is not data.

People have been hawking snake oil for centuries, and people have been believing snake oil cured them for centuries.


The blind spot exploiting strategy you link to was found by an adversarial ML model...


This counterpoint doesn't address the issue, and I would argue that it is partially bad faith.

Yes, making it to the test center is significantly harder, but in fact the humans could have solved it from their home PC instead and performed exactly the same. However, if they were given the same test as the LLMs, forbidden from any input beyond JSON, they would have failed. And although buying robots to do the test is unfeasible, giving LLMs a screenshot is easy.

Without visual input for LLMs in a benchmark that humans are asked to solve visually, you are not comparing apples to apples. In fact, LLMs are given a different and significantly harder task, and in a benchmark that is so heavily weighted against the top human baseline, the benchmark starts to mean something extremely different. Essentially, if LLMs eventually match human performance on this benchmark, this will mean that they in fact exceed human performance by some unknown factor, seeing as human JSON performance is not measured.

Personally, this hugely decreased my enthusiasm for the benchmark. If your benchmark is to be a North star to AGI, labs should not be steered towards optimizing superhuman JSON parsing skills. It is much more interesting to steer them towards visual understanding, which is what will actually lead the models out into the world.


I just realized that this also means that the benchmark is in practice unverified by third parties, as not all tasks are verified to be solvable through the JSON interface. Essentially there is no guarantee that it is even possible to understand how to complete every task optimally through the JSON interface alone.

I assume you did not develop the puzzles by visualizing JSON yourselves, and so there might be non obvious information that is lost in translation to JSON. Until humans optimally solve all the puzzles without ever having seen the visual version, there is no guarantee that this is even possible to do.

I think the only viable solution here is to release a version of the benchmark with a vision only harness. Otherwise it is impossible to interpret what LLM progress on this benchmark actually means.


Oookay. I actually tried the harness myself, and there was a visual option. It is unclear to me whether that is what the models are using on the official benchmark, but it probably is. This probably means that much of my critique is invalid. However, in the process of fiddling with the harness, building a live viewer to see what was happening, and playing through the agent API myself, I may have found 3-4 bugs with the default harness/API. I don't know where else to post this, so of all places I am documenting the process on HN.

Bug 1: The visual mode "diff" image is always black, even if the model clicked on an interactive element and there was a change. Codex fixed it in one shot, the problem was in the main session loop at agent.py (line 458).

Bug 2: Claude and ChatGPT can't see the 128x128 pixel images clearly, and cannot accurately place clicks on them either. Scaling up the images to 1024x1024 pixels gave the best results; Claude dropped off hard at 2048 for some reason. Here are the full test results when models were asked to hit specific (manually labeled) elements on the "vc 33" level 1 (upper blue square, lower blue square, upper yellow rectangle, lower yellow rectangle):

Model                  | 128   | 256   | 512   | 1024  | 2048
claude-opus-4-6        | 1/10  | 1/10  | 9/10  | 10/10 | 0/10
gemini-3-1-pro-preview | 10/10 | 10/10 | 10/10 | 10/10 | 10/10
gpt-5.4-medium         | 4/10  | 8/10  | 9/10  | 10/10 | 8/10
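The upscaling workaround behind those results can be sketched like this (function names are mine, not the official harness): nearest-neighbour upscaling keeps grid cells crisp instead of blurring them, and the harness maps clicks on the big image back to native coordinates before sending them to the game.

```python
def upscale_nearest(img, factor):
    # img: 2D list of pixel values (e.g. 128x128). Nearest-neighbour
    # keeps cell edges crisp, unlike bilinear smoothing, which matters
    # for small grid-based puzzle screens.
    return [[img[y // factor][x // factor]
             for x in range(len(img[0]) * factor)]
            for y in range(len(img) * factor)]

def to_native(click_x, click_y, factor):
    # Map a click made on the upscaled image back to the native
    # coordinate grid the game engine expects.
    return click_x // factor, click_y // factor

big = upscale_nearest([[0, 1], [2, 3]], 4)  # 2x2 -> 8x8
native = to_native(5, 6, 4)                 # (1, 1): bottom-right cell
```

The model only ever sees `big` and clicks in its coordinates; the harness quietly divides back down.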

Bug 3: "vc 33" level 4 is impossible to complete via the API. At least it was when I made a web-viewer to navigate the games from the API side. The "canal lock" required two clicks instead of one to transfer the "boat" when water level were equilibriated, and after that any action whatsoever would spontaneously pop the boat back to the first column, so you could never progress.

"Bug" 4: This is more of a complaint on the models behalf. A major issue is that the models never get to know where they clicked. This is truly a bit unfair since humans get a live update of the position of their cursor at no extra cost (even a preview of the square their cursor highlights in the human version), but models if models fuck up on the coordinates they often think they hit their intended targets even though they whiffed the coordinates. So if that happens they note down "I hit the blue square but I guess nothing happened", and for the rest of the run they are fucked because they conclude the element is not interactive even though they got it right on the first try. The combination of an intermediary harness layer that let the models "preview" their cursor position before the "confirmed" their action and the 1024x1024 resolution caused a major improvement in their intended action "I want to click the blue square" actually resulting in that action. However, even then unintended miss-clicks often spell the end of a run (Claude 4.6 made it the furthest, which means level 2 of the "vc 33" stages, and got stuck when it missed a button and spent too much time hitting other things)

After I tried to fix all of the above issues and set up an optimal environment for the models to get a fair shake, they still mostly did very badly even when they identified the right interactive elements... except for Claude 4.6 Opus! Claude had at least one run where it made it to level 4 on "vc 33", but then got stuck because the blue squares it had to hit became too small, and it just couldn't get the cursor in the right spot even with the cursor preview functionality (the guiding pixel likely became too small for it to see clearly). When you read through the reasoning for the previous stages, though, it didn't fully understand the underlying logic of the game, although it was almost there.


This is incredibly naive. Hunter-gatherer communities, especially those in regions without an abundance of food, are and were extremely selective about who was accepted and who wasn't. This starts in infancy, where non-desirable babies were simply killed. Estimates vary greatly, but perhaps around a third to half of "modern" hunter-gatherer tribes practice infanticide. The stated reasoning behind infanticides is often extremely vicious and comes down to "he/she is not a good fit for the tribe", or in other words "nobody likes him/her". This fact alone might be one of the major explanations for the high rate of prehistoric infant mortality.

And even if you are allowed to grow up and become an individual, things might be somewhat better once you are part of the in-group, but human empathy has an overall tendency to switch off if you're not. Even if you're loved because you're kin, your neighboring tribe might still kill you, or you and your kin might kill them, for entirely petty or cynical reasons. The prehistoric bone record supports this as well, with weapon-related injuries seemingly being the most common cause of death.

You can also examine your own emotions to get some idea of our evolutionary environment. Loneliness hurts, to the point where it has measurable negative health impacts equivalent to smoking a pack of cigarettes each day. Your brain is screaming at you not to be lonely, but why? Well, in our ancestral environment, being excluded from the social group meant death, so most individuals that did not have a profound and visceral fear of that happening got their genes consistently removed from the gene pool. For loneliness to be that big of a deal, being excluded must have been an easily available option. If everyone loved and accepted everyone unconditionally, this emotional state would simply not have evolved.

Humans quickly become extremely brutal once the environment necessitates it, up to and including cannibalizing your own kin. Infanticide and murder of both ingroups and outgroups is historically commonplace because it was also commonplace prehistorically. Even modern tribes, that live in relative abundance, are still brutal in many ways to this very day.

But of course, when you look at any group of individuals in a tribe, survivorship bias will dictate that it all looks nice and rosy. But you might want to check the skeletons in the cave before you pick that as your conclusion.


Don't wake him up from his "Noble Savage" fever-dream.


Yes let's all revel in the sunlight of Enlightenment thinking. That's really going well.


>when you look at any group of individuals in a tribe, survivorship bias will dictate that it all looks nice and rosy

there is a lot of conjecture in your overall post, but I think this is a fair takeaway you put at the end.


Late reply, but in case you check back: I think most of what I said comes from sources of varying quality and salience, but at least it's sourced from somewhere. I just typed it all out quickly without checking anything over, so a lot might be wrong. But it's not entirely pulled out of my ass, at least.

Evolutionary history is of course always difficult. I think the loneliness part comes mostly from the Kurzgesagt video on loneliness, as well as some other stuff here and there. The rate of infanticide is roughly correct per a quick Google. The rest of the tribal stuff is from a variety of books and high school social anthropology. I think I actually have the "reasoning for infanticide" part from Sex at Dawn, of all places.

I'm always scared to run a deep research service to find the counterpoints after I type this kind of stuff out, but feel free to do so for me and dress me down. At least survivorship bias is a classic that's pretty much always worth keeping in mind on any topic.


Well, to be fair, judging by the shift in the general vibes of the average HN comment over the past 3 years, better use of agents and advanced models DID solve the previous temporary setbacks. The techno-optimists were right, and the nay-sayers wrong.

Over the course of about 2 years, the general consensus has shifted from "it's a fun curiosity" to "it's just better stackoverflow" to "some people say it's good" to "well it can do some of my job, but not most of it". I think for a lot of people, it has already crossed into "it can do most of my job, but not all of it" territory.

So unless we have finally reached the mythical plateau, if you just go by the trend, in about a year most people will be in the "it can do most of my job but not all" territory, and a year or two after that most people will be facing a tool that can do anything they can do. And perhaps if you factor in optimisation strategies like the Karpathy loop, a tool that can do everything but better.

Upper management might be proven right.


If self-driving is any indication, it may take 10+ years to go from 90% to 95%.


[flagged]


Your definition of a glorified autocomplete is … oof. So in short, “try ask it to do something you’d hate on bad code you’d yourself fail at and it might fail”.

And I’m pretty sure I could try Claude on a repo as you describe and it wouldn’t in fact fail. You’re letting your opinions of what LLMs were like a few months ago influence what you think of them now.

Comments like yours really annoy me because they are ridiculously confident about AI being “glorified autocomplete”, but also clearly not informed about the capabilities. I don’t get how some people can be on HN and not actually … try these things, be curious about them, try them on hard problems.

I’m a good engineer. I’ve coded for 24 years at this point. Yesterday in 45 minutes I built a feature that would have taken me three months without AI. The speed gains are obscene and because of this, we can build things we would never have even started before. Software is accelerating.


Alternatives naturally become more viable over time as more and more people find car use impossible, but it's kind of hard to tell in advance which lanes of public transport are most necessary to improve. So imo the best solution is just to do it, then see what happens and adapt. It's too hard to plan out everything in advance, and if you try you get deadlocked politically and nothing ends up happening. So you just find the best lever you can to reduce traffic immediately, and just start pressing it. But you warn everyone that you're pressing it, and when you do so you do it slowly.

The reality is that a lot of traffic is simply unnecessary, and dissipates once you add some friction. The most extreme example of that is the rise of remote work during and after Covid. As it turns out, none of these people actually needed to go anywhere.

And more generally, cars induce their own demand simply by virtue of being the fastest and most comfortable option, and they shape the environment around them to depend on them. Small local shops get outcompeted by distant behemoths due to it being more convenient to drive. People move to a large house in a distant suburb rather than a small apartment because they know it's just thirty minutes away from work by car anyways. The easier it is to drive, the more entrenched driving becomes. And any way you slice it, undoing that process will cause pain, so you might as well go ahead and start, because you're never going to find a way to prevent the consequences anyway.


This is a nice case study of the downside of creating explicit policies of "no AI comments" without a technical method of enforcing them. I am sure Hacker News comment quality will suffer almost as much from an escalating culture of accusation and paranoia as it will from the LLM comments themselves.

