Hacker News | cs702's comments

The core developers need buy-in from nodes controlling > 50% of the computing power in the network to make any fundamental change to the network.

Thank you for coming on HN and offering to answer questions.[a]

This is a fantastic piece, very timely, evidently well-researched, and also well-written. Judging by the little that I know, it's accurate. Thank you for doing the work and sharing it with the world.

OpenAI may be in a more tenuous competitive position than many people realize. Recent anecdotal evidence suggests the company has lost its lead in the AI race to Anthropic.[b]

Many people here, on HN, who develop software prefer Claude, because they think it's a better product.[c]

Is your understanding of OpenAI's current competitive position similar?

---

[a] You may want to provide proof online that you are who you say you are: https://en.wikipedia.org/wiki/On_the_Internet%2C_nobody_know...

[b] https://www.latimes.com/business/story/2026-04-01/openais-sh...

[c] For example, there are 2x more stories mentioning Claude than ChatGPT on HN over the past year. Compare https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=tru... to https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=tru...
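For anyone who wants to reproduce those counts programmatically, the Algolia HN Search API (the same backend as hn.algolia.com) returns a total hit count directly. A rough sketch - the parameter choices are my own, and the counts won't exactly match the web UI's filters:

```python
import json
import time
import urllib.parse
import urllib.request

API = "https://hn.algolia.com/api/v1/search"

def build_query_url(query: str, days: int = 365) -> str:
    """Build an Algolia HN Search URL counting stories matching `query`
    over the past `days` days (hitsPerPage=0: we only want nbHits)."""
    since = int(time.time()) - days * 86400
    params = {
        "query": query,
        "tags": "story",
        "numericFilters": f"created_at_i>{since}",
        "hitsPerPage": 0,
    }
    return API + "?" + urllib.parse.urlencode(params)

def count_stories(query: str, days: int = 365) -> int:
    """Fetch the total number of matching stories (requires network access)."""
    with urllib.request.urlopen(build_query_url(query, days)) as resp:
        return json.load(resp)["nbHits"]
```

E.g. compare `count_stories("claude")` against `count_stories("chatgpt")`.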


Thank you for this, very much appreciate the thoughtful response.

The piece captures some of the anxieties within OpenAI right now about their competitive position. This obviously ebbs and flows but of late there has been much focus on Anthropic's relative position. We of course mention the allegations of "circular deals" and concerns about partners taking on debt.


Thank you. Yes, I saw that. The company's always been surrounded by endless talk about insane hype, speculative bubbles, and financial engineering. I wasn't asking so much about that.

I was asking more about your informed view on how OpenAI's technology, products, and roadmap are perceived, particularly by customers and partners, in comparison to those of competitors.

If you have an opinion about that, everyone here would love to hear about it.


at this point even Google's AI search results are better than GPT - obviously this is not for full programs, but if you know what you're doing and just want a snippet, that's all you need.

Wild how different experiences people can have. Both Google's models and Anthropic's hallucinate a lot for me, even on the expensive plans and with web searches, for some reason, and none of them come close to the accuracy and hallucination-free responses of ChatGPT Pro, which to me is still SOTA and has been since it was made available. But people keep having the opposite experience, apparently; I just can't make sense of it.

Kagi (assistant.kagi.com) with Kimi K2.5 (their current default) has worked great for me in scenarios where the search result data is more important than the model.

I.e. what I used to use Google for and when I don't want an AI to overly summarize / editorialize result data.


oh, that's probably because I'm a cheapskate and just use the free garbo models. I'm sure the pro version is quite good.

My guess is that the answer to your question, fantastic question, is that nobody knows. I remember having the same thoughts when Covid was first “arriving” if you will: we wanted people in the know to throw us a nugget of information, and they just didn’t know.

As it turns out, and what I’m kind of going with for this LLM shit, is that it’ll play out exactly how you think it will. The companies are all too big to fail, with billionaire backers who would rather commit fraud than lose money.


How would fraud help here? Don't they just need scale of lots of customers paying a little bit? How do you fraud your way into that?

they don't need customers when the customers are each other's companies - for example, the deals OpenAI, Nvidia, and Oracle made

That's not fraud, and it's not sustainable. They aren't going to just keep doing that. It only makes sense if an AI company wants to pay for GPUs with stock, and - more importantly - the GPU company agrees to sell in exchange for stock.

s/fraud/corrupt, illegal $something.

If you're picking on my vocabulary, that's fair. Fraud wasn't the point, I think you're smart enough to realize that.


I appreciate the implication that either you're right or I'm stupid, but maybe you should write the comment you meant to write.

Trading shares for GPUs is not corrupt either.


Ronan Farrow's expertise is investigations into elite amorality, not evaluating technical products. Why are you asking this question?

I didn't ask him to evaluate them. I asked him how customers and partners perceive them.

He's had so many conversations that he likely has a sense of how perceptions of the company and its offerings have changed.

I'm curious.


Much of the article and the general palace intrigue is predicated on the idea that OpenAI has a singularly revolutionary product. If it later turns out to be a commodity, or OpenAI is simply outcompeted nonetheless, then the idea that Sam Altman's personal shortcomings are something to stress about would seem quaint. Just another hubristic tech billionaire acting in bad faith doesn't really command attention the same way as someone "controlling your future".

If you were in charge of deciding what should be done with Sam Altman, what would you choose?

I mean, it's a fair question, though it does make one wonder how extreme the answers could be, so I can see why you're being downvoted.

The problem is that, on paper, everything people like Sam Altman do is often legal, despite it harming so many. We've literally had a major RAM producer pull out of the consumer RAM market. I feel like Sam Altman should be investigated and heavily scrutinized. He is arguably the biggest bubble within the AI bubble, and we're letting him fester too far into it; these circular deals have seemingly stopped for now, but it might only get worse.


Totally. Lying about others can be so harmful. But lying to hostiles in order to protect? Acceptable.

I guess my question was more, if the article author was the judge of fate or morality, what should happen?

As to AI and Sam, I think it's too early to tell what the effects will be. So we should adopt non-judgement, build good ourselves, and see what unfolds.


Many of us prefer OpenAI's Codex, because we think it's a better product.

No comment on the CEO: I just find the product superior in everything but UI/UX and conversation. It's better at quality code.


Who is “us”? It does seem that some scientists prefer Codex for its math capabilities but when it comes to general frontend and backend construction, Claude Code is just as good and possibly made better with its extensive Skills library.

Both Codex and Claude Code fail when it comes to extremely sophisticated programming for distributed systems.


As a scientist (computational physicist, so plenty of math, but also plenty of code, from Python PoCs to explicit SIMD and GPU code, mostly various subsets of C/C++), I can confirm - Codex is qualitatively better for my use cases than Claude. I keep retesting them (not on benchmarks, I simply use both in parallel for my work and see what happens) after every version update, and ever since 5.2, Codex seems further and further ahead. The token limits are also far more generous (and it matters, I found it fairly easy to hit the 5h limit on max tier Claude), but mostly it's about quality - the probability that the model will give me something useful I can iterate on as opposed to discard immediately is much higher with Codex.

For the few times I've used both models side by side on more typical tasks (not so much web stuff, which I don't do much of, but more conventional Python scripts, CLI utilities in C, some OpenGL), they seem much more evenly matched. I haven't found a case where Claude would be markedly superior since Codex 5.2 came out, but I'm sure there are plenty. In my view, benchmarks are completely irrelevant at this point, just use models side by side on representative bits of your real work and stick with what works best for you. My software engineer friends often react with disbelief when I say I much prefer Codex, but in my experience it is not a close comparison.


Have you tried the latest (3.1 pro) Gemini? In my experience, it's notably better for a similar type of problems than Opus 4.6. However, I don't really use OpenAI products to compare.

I actually haven't - I tried Gemini 3.0 Pro in Antigravity and was disappointed enough that I didn't pay much attention to the 3.1 release; it was notably worse than Opus and GPT at the time, and much more prone to "think" in circles or veer off into irrelevant tangents even with fairly precise instructions. I'll give 3.1 a try tomorrow and see what happens.

I've tried both on similar problems and haven't found such a clear-cut difference. I still find neither is able to fully and correctly implement a complex algorithm I worked on in the past, given the same inputs. I'm not sharing the exact benchmark I'm using, but think about something for improving the performance of N^2 operations that are common in physics and you can probably guess the train of thought.

I've had reasonable success using GPT for both neighbor list and Barnes-Hut implementations (also quad/oct-trees more generally), both of which fit your description, haven't tried Ewald summation or PME / P3M. However, when I say "reasonable success", I don't mean "single shot this algo with a minimal prompt", only that the model can produce working and decently optimized implementations with fairly precise guidance from an experienced user (or a reference paper sometimes) much faster than I would write them by hand. I expect a good PME implementation from scratch would make for a pretty decent benchmark.

Think another level of complexity of algorithm, different expansion bases plus a mix of input sources. Also not trying to one-shot it.

I can roughly guess the train of thought and I am a bit surprised that Claude is failing you.

That said, I am puzzled at the algorithms that Claude & GPT "get" and ones that they do not.

(former physicist here. would love to know the kind of things you're working on. email on my profile)


>As a scientist (computational physicist,

Is there one that you prefer for, i dunno, physics?


I'm in that camp -- I have the max-tier subscription to pretty much all the services, and for now Codex seems to win. Primarily because 1) long horizon development tasks are much more reliable with codex, and 2) OpenAI is far more generous with the token limits.

Gemini seems to be the worst of the three, and some open-weight models are not too bad (like Kimi k2.5). Cursor is still pretty good, and copilot just really really sucks.


Claude Code, Codex, and Cursor are old news. If you're having problems, it's because you're not using the latest hotness: Cludge. Everyone is using it now - don't get left behind.

Cludge has been left behind by Clanker, that’s the new hotness. 45B valuation!

ive heard that poob has it for you!

Us = me and say /r/codex or wherever Codex users are. I've tried both, liked both, but in my projects one clearly produces better results, more maintainable code and does a better job of debugging and refactoring.

That's interesting, I actively use both and usually find it to be a toss up which one performs better at a given task. I generally find Claude to be better with complex tool calls and Codex to be better at reviewing code, but otherwise don't see a significant difference.

If you want to find an advocate for Codex that can give a pretty good answer as to why they think it's better, go ask Eric Provencher. He develops https://repoprompt.com/. He spends a lot of time thinking in this space and prefers Codex over Claude, though I haven't checked recently to see if he still has that opinion. He's pretty reachable on Discord if you poke around a bit.

Quite irrelevant what factions think. This or that model may be superior for these and those use cases today, and things will flip next week.

Also, RLHF means that models produce output according to certain human preferences, so it depends on what set of humans provided the feedback and what mood they were in.


On the contrary, I very much care about what the other factions think because I want to know if things have already flipped and the easiest way to do so is just ask someone who's been using the tool. Of course the correct thing to do is to set up some simple evals, but there is a subjective aspect to these tools that I think hearing boots on the ground anecdata helps with.

Haven't done it in a while, but I've done some tasks with both Codex and Claude to compare. In all cases I asked both to put their analysis and plans for implementation into a .md file. Then I asked the other agent to analyze said file for comparison.

In general, Claude was impressed by what Codex produced and noted the parts where it (i.e. Claude) had missed something vs. Codex "thinking of it".

From a "daily driver" perspective I still use Claude all the time as it has plan mode, which means I can guarantee that it won't break out and just do stuff without me wanting it to. With Codex I have to always specify "Don't implement/change, just tell me" and even then it sometimes "breaks out" and just does stuff. Not usually when I start out and just ask it to plan. But after we've started implementation and I review, a simple question of "Why did you do X?" will turn into a huge refactoring instead of just answering my question.

To be fair, that's what most devs do too (at least at first) when you ask them "Why did you do X?" questions. They just assume that you are trying to formulate "Do Y instead of X" as a question, when really you just don't understand their reasoning, and there might be a good reason for doing X. But I guess LLMs aren't sure of themselves, so any questioning of their reasoning obliterates their ego and turns them into submissive code monkeys (or rather: exposes them as such) vs. software engineers who do things for actual reasons (whether you agree with them or not).


Codex has plan mode too - /plan

Any difference in performance on mobile development?

For that I'm not so sure. I tried both early 2025 and was disappointed in their ability to deal with a TCA based app (iOS) and Jetpack compose stuff on Android, but I assume Opus 4.6 and GPT 5.4 are much better.

yeah, I'm not in this "us" you speak of.

Of course you're not one of "us" if you're one of "them".

I've found claude startlingly good at debugging race conditions and other multithreading issues though.

My rule of thumb is that it's good for anything "broad", and weaker for anything "deep". Broad tasks are tasks which require working knowledge of lots of random stuff. It's bad at deep work - like implementing a complex, novel algorithm.

LLMs aren't able to achieve 100% correctness on every line of code. But luckily, 100% correctness is not required for debugging. So it's better at that sort of thing. It's also (comparatively) good at reading lots and lots of code. Better than I am - I get bogged down in details and exhaust quickly.

An example of broad work is something like: "Compile this C# code to WebAssembly, then run it from this Go program. Write a set of benchmarks of the result, and compare it to the C# code running natively, and to this Python implementation. Make a chart of the data and add it to this LaTeX code." Each of the steps is simple if you have expertise in the languages and tools, but a lot of work otherwise. For me to do that, I'd need to figure out C# WebAssembly compilation and Go wasm libraries. I'd need to find a good charting library. And so on.

I think it's decent at debugging because debugging requires reading a lot of code. And there are lots of weird tools and approaches you can use to debug something, and it's not mission-critical that every approach works. Debugging plays to the strengths of LLMs.


Many paying customers say that Anthropic degraded the capabilities of Opus and Claude Code in recent months and that outcomes are worse. There are even discussions on HN about this.

The last one is from yesterday: https://news.ycombinator.com/item?id=47660925


As some other people mentioned, using both/multiple is the way to go if it's within your means.

I've been working on a wide range of projects and I find that the latest GPT-5.2+ models seem to be generally better coders than Opus 4.6; however, the latter tends to be better at big-picture thinking, structuring, and communicating, so I tend to iterate through Opus 4.6 max -> GPT-5.2 xhigh -> GPT-5.3-Codex xhigh -> GPT-5.4 xhigh. I've found GPT-5.3-Codex is the most detail-oriented, but not necessarily the best coder. One interesting thing: for my high-stakes project, I have one coder lane but use all the models to do independent review, and they tend to catch different subsets of implementation bugs. I also notice huge behavioral changes based on changing AGENTS.md.

In terms of the apps, while Claude Code was ahead for a long while, I'd say Codex has largely caught up in terms of ergonomics, and in some things, like the way it lets you inline or append steering, I like it better now (or where it's far, far ahead - compaction is night and day better in Codex).

(These observations are based on about 10-20B/mo combined cached tokens, human-in-the-loop, so heavy usage and most code I no longer eyeball, but not dark factory/slop cannon levels. I haven't found (or built) a multi-agent control plane I really like yet.)


Codex won me over with one simple thing: reliability. It crashed less, had less load shedding, and its configuration is well designed.

I do regular evaluations of both Codex and Claude (though not to statistical significance) and I'm of the opinion there is more in-group variance in outcome performance than between them.


This is the way. Eg. IME Gemini is really damn good at sql.

Not a scientist and use codex for anything complex.

I enjoy using CC more and use it for non coding tasks primarily, but for anything complex (honestly most of what I do is not that complex), I feel like I am trading future toil for a dopamine hit.


I’m one of those ‘us’, Claude’s outputs require significant review and iteration effort (to put it bluntly they get destroyed by gpt and Gemini). I’m basically using sonnet to do code search and write up since it is a better (more human-like) writer than gpt and faster and more reliable than gemini, but that’s about it.

I have been using Codex AND Claude side by side for the same project*, with the same prompts.

Codex has been consistently better on almost every level.

* (an open source framework for 2D games in Godot 4.6 GDScript, mostly using AI to review existing code)


I also find Codex much more generous in terms of what you get with a Pro ($20/mo) subscription. I use it pretty much non-stop and I have yet to hit a limit. Weekly reset is much better as well.

I prefer GLM 5.1 and MiniMax 2.7. With a better harness like Forge Code, I have better results for way less money than by using GPT and Opus.

Usage limits are more generous and GPT 5.4 is a good model, but yes, UI/UX lags behind Claude Code. Currently I'm especially missing /rewind with code restoration and proper support for plugin marketplaces

GPT/claude/gemini is pretty interchangeable at this point.

Absolutely not the case. They're complementary.

Does this work for people? To me having a "better product" would be completely irrelevant if the use cases are evil.

i find myself being more productive with codex/copilot on coding tasks, but claude does seem to be better at planning

Shill talk

He’s replying in this Twitter thread - perhaps someone with an account can ask there and link his comment here?

https://xcancel.com/RonanFarrow/status/2041127882429206532#m


Here is the actual link, not a link to some weird third-party site that can't be trusted.

https://x.com/RonanFarrow/status/2041127882429206532


FYI xcancel is just a mirror that allows reading replies without needing an account.

Whereas X can be trusted?

Yes? It's the data source, not a third-party. How is this even a question?

There's pedantic, and then there's needlessly pedantic.

xcancel is a valid workaround for X links on Hacker News and is sufficient for original attribution.


X restricts what you can view without logging in. Many folks don't want to log in to X, for obvious reasons. Posting an xcancel link is kinda like folks posting various `archive` URLs to bypass paywalls, work around overloaded servers, etc. That's an extremely common practice here that usually goes without comment.

What is an "obvious reason" one might not want to log into X? I can't think of any rational reason.

Personally, I prefer Claude for coding, but I still prefer ChatGPT for hashing out ideas for my projects (which tend to be game designs). So I use both.

Yeah we moved to Claude a few months ago, mostly because the devs kept using it anyway. Altman stuff is interesting but at the end of the day you just go with whatever tool works

It's worth noting Codex has 2x more stories than Claude https://hn.algolia.com/?query=codex

But by page 5, those stories have around 50-60 karma, while Claude's page 5 is still 500+.

(I found your comment surprising based on my daily HN reading recollection - I mostly read the top N daily and feel I only occasionally see Codex stories.)


> You may want to provide proof online that you are who you say you are

Unfortunately it probably doesn't even matter here on HN considering how brigaded down this story is predictably getting.

But yeah, it was a fantastic piece.


It wasn't getting "brigaded down" - it set off a software penalty called the flamewar detector. I turned that off as soon as I saw it.

Thank you for keeping HN sane :-)


Profit-seeking at society's expense.

Also known as rent-seeking: "The act of growing one's existing wealth by manipulating public policy or economic conditions without creating new wealth. Rent-seeking activities have negative effects on the rest of society. They result in reduced economic efficiency through misallocation of resources, stifled competition, reduced wealth creation, lost government revenue, heightened income inequality, heightened debt levels, risk of growing corruption and cronyism, decreased public trust in institutions, and potential national decline."[a]

Sigh.

---

[a] https://en.wikipedia.org/wiki/Rent-seeking




Trouble has been brewing in private credit for quite a while, but lenders and investors have been reluctant to write anything down, resorting to all kinds of "extend and pretend" games to avoid write-downs.[a]

tick-tock, tick-tock, tick-tock...

---

[a] https://news.ycombinator.com/item?id=47351462


You can always tell when there is a problem. When things are fine, the companies keep the profits to themselves. When things start to get dicey, they foist it off onto retail investors.

Private equity (PE) is increasingly being introduced into 401(k) plans, driven by a 2025 executive order encouraging "democratization" of alternative assets. - Google AI


It's why, as a retail investor, you should never buy things that would otherwise not have been available to you (but were to those "elite"/institutional investors previously).

Think pre-IPO buy-in. Investors in the know and other well connected institutional investors get first dibs on all of the good ones. The bad ones are pawned off to retail investors. It's no different with private credit and private equity. These sorts of deals have good ones and bad ones - the good ones will have been taken by the time it flows down to retail.


This can't be a rule to die on, though. Retail would never have bought GOOG, or TSLA, or AAPL if that were the case. Maybe I'm just being pedantic.


Google and Apple didn't go through ten funding rounds like today's startups do. Apple had one angel and three rounds; Google had one angel and literally just an A round after that; then retail investors could capture all the upside. Now there's way more time for private investors to pick the bones clean before it gets dumped on the public.


I think you're both right. Those were great opportunities, but the proportion of such opportunities which are made available to retail traders has greatly diminished over time.

There's a great chart out there somewhere (I couldn't find it) which breaks down the impact of private equity on the availability of such opportunities in public markets. It showed a dozen or so companies (like Google, Apple, Uber, Stripe, etc) and broke down their market cap gains into two parts, "pre IPO" and "post IPO" gains. Of course, the pre-IPO gains were only available to private equity (or, at best, accredited investors), whereas the post-IPO gains were available to retail traders as well.

"Older" companies like GOOG & AAPL were much more likely to have experienced the vast majority of their gains after their IPOs, meaning retail investors could have made big money by betting on them early. Meanwhile, newer companies (like Facebook, Uber, Stripe, etc) were much more likely to have yielded the vast majority of their gains before their IPOs, meaning retail investors didn't have the opportunity to benefit from big returns.


That's quite an interesting observation.

I suspect that the reason those "newer" companies were able to have the majority of their gains reaped pre-IPO is that during that time period it was easy to acquire capital from private investors without resorting to public-market IPOs, whereas in the era of Google and Apple there was not the same level of private investment.

And I think it has to do with low interest rates. During Google's early years, it was difficult for private investors to obtain low-cost loans. Therefore, public markets looked like an easier path for companies to raise money.

The "newer" companies in your list are mostly post-GFC, during a period of ultra-low interest rate. This makes money easy for private investors to obtain, and so companies have an easier time getting funding from those private sources. The IPO is realistically not a funding mechanism, but an exit mechanism for those early private investors.


Yep, I think you're spot on.

If you're familiar with Ray Kurzweil's work, I wonder whether this phenomenon might be related. Kurzweil notes that better technology begets better technology in a self-reinforcing and ever-accelerating cycle of technological advancement. His thesis implies rapidly evolving capital requirements. Massive amounts of nimble private capital, secure in the hands of highly competent people with relevant domain expertise, may well be an important precondition for continual acceleration.


Survivorship bias; and the corporate-finance world of today is completely unrecognizable compared to the world of Google and Apple. Just look at the resulting performance of the SPAC craze.


Even for good assets there's a price you shouldn't pay. People are joking(?) about triple-layer SPVs where you can get pre-IPO exposure but at higher-than-IPO price.


> Private equity (PE) is increasingly being introduced into 401(k) plans, driven by a 2025 executive order encouraging "democratization" of alternative assets

Thanks for the reminder! I need to switch my plan away from a TDF to avoid this.


Funny enough Chinese State owned banks have been doing much the same for quite some time. No one ever defaults, loans are extended as long as it takes. Presumably the threat of being called into the next party meeting to explain yourself is sufficient motivation for the people running the business to pivot as many times as it takes until they find a way to make money. Worst case the state swaps someone else into leadership.

I say this to say... who knows? I guess if you shuffle deck chairs fast enough everything works out fine (?)


The larger you are, the larger the rounding errors are, the more money that can disappear due to a failure and explained away or extended or written off or whatever euphemism you want to pick. But the sum of rounding errors is less likely to itself be a rounding error. It works until it doesn't, and Evergrande collapsing with $300 billion in Chinese real estate debt will be a case study for years to come.


Isn't the real underlying risk here concentration, as opposed to diversification?

If you have unlimited capital and time horizon, because you're a nation with the power to tax and print money, then you can keep this game going for a long time.

The only thing that mandates it stops are if (a) too many of your loans are correlated with the same thing that crashes (e.g. energy, tulips, AI, etc) or (b) too many of your loans are tied together in a single entity (either because it combined multiple smaller entities or because it tied itself into all their financial arrangements).


The only problem is allowing regulated US banks with an implicit gov guarantee to lend money to them.


There are limited ways to short these positions, which would probably add some fuel to the fire.


I don't see it as adding fuel to the fire. I see it as helping the market price companies correctly


Its a balancing act.


But what will break the clock?


> what will break the clock ?

So unlike money-market funds, these private-credit funds can gate withdrawals and extend and pretend by turning cash coupons into PIKs. So I don't actually see credit concerns directly driving liquidity issues for the banks that didn't hold the risk on their balance sheet glares Germanically.

Instead, I think the contagion risk is psychological. Which is an unsatisfying answer. But if there are massive losses on e.g. DBIP and DB USA halts withdrawals, then the 2% stock loss Morgan Stanley suffered when it capped withdrawals [1] could become a bigger issue.

[1] https://www.wsj.com/livecoverage/stock-market-today-dow-sp-5...


I believe the gate feature can be waived, though it creates a precarious situation. It ends up with the same psychology as a bank run: people (institutions) get concerned because they can't access funds, or they think the queue to exit a failing fund is too long, filled each quarter (i.e., by the time they redeem, NAV has collapsed).


> the gated feature can be waived

Or never invoked. It's a safety feature for the fund and, arguably, systemic stability.


Totally - it's supposed to prevent a collapse of confidence, but at the same time can signal one. A double-edged sword.


You can't gate redemptions forever amigo.

People eventually want to spend their money.


As Buffett said, "only when the tide goes out do you learn who has been swimming naked" - luckily, skimming the news, there's no obvious huge exogenous macroeconomic shocks on the horizon that could cause "the tide to go out" so to speak, so everything should be ok for now.


Umm... Couldn't the whole Iran debacle be such a shock? If the effects are not contained?


Woosh.


What kind of trouble is brewing from the migration of partner capital commitment to credit based on NAV?

What is the risk, probability of actualizing the risk, and the outcome of actualized risk?

The tick-tock, tick-tock routine reads like baseless fearmongering to me.


My understanding is that many private credit funds have been very lax about conducting basic due diligence on the creditworthiness of borrowers.

For example, take First Brands, a multi-billion-dollar company that filed for bankruptcy last year. First Brands had pledged the same assets as collateral for loans from multiple private-credit funds. Those loans were being carried at a fantasy NAV of 100 cents on the dollar, until suddenly they were not. Did none of these lenders submit UCC filings so other lenders could check which assets had already been pledged as collateral? Did none of these lenders ever check to see which assets had already been pledged? Did all these lenders make loans based on blind trust?

Failing to check and verify that assets have not been pledged as collateral to other lenders is an amateur mistake. It's reckless, really. The equivalent in home-mortgage lending would be a mortgage lender never even bothering to check whether a homeowner is taking out multiple first-lien mortgages simultaneously on the same home, then forgetting to record the first lien on the property title.

My take is that for many private credit funds, NAVs are basically fantasy.
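To make the multi-pledging problem concrete, here's a minimal sketch (all names hypothetical) of the due-diligence step described above: consulting a central registry, analogous to UCC filings, before lending against an asset, so the same collateral cannot be pledged to two lenders at once.

```python
class CollateralRegistry:
    """Toy stand-in for a UCC-style lien registry."""

    def __init__(self):
        # asset_id -> name of the lender holding the first lien
        self._liens = {}

    def check_and_file(self, asset_id, lender):
        """File a lien on an asset, or refuse if one already exists."""
        if asset_id in self._liens:
            raise ValueError(
                f"{asset_id} already pledged to {self._liens[asset_id]}"
            )
        self._liens[asset_id] = lender


registry = CollateralRegistry()
registry.check_and_file("receivables-2024", "Fund A")  # first lien: accepted
try:
    # A second fund tries to lend against the same collateral.
    registry.check_and_file("receivables-2024", "Fund B")
except ValueError as e:
    print(e)  # Fund B discovers the asset is already encumbered
```

The point of the sketch is that the check is cheap: one lookup before funding. The First Brands situation is what happens when every lender skips that lookup.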


Do you know if First Brands' actions are considered fraud? Or was this entirely on the lenders to make sure they were in the clear regarding the collateral? It doesn't excuse the lack of diligence, but I'm curious if there was some assumption of good faith that may have played a role in what diligence was or was not done.


Only a court can decide if the actions are fraud, but they sure look like it to me. Fraud doesn't excuse the lack of due diligence.


If lenders are in fact not performing due diligence and passing off bad credit as good... that sounds suspiciously like the 2008 era, when no one cared about creditworthiness and everyone just wanted to generate lines of credit.

Oh boy, if this is the case, oh boy.

Lessons not learned indeed.


Remember, the lesson was that Daddy Government won’t let you fail. Barring any federal regulations, there’s no reason for financial entities to not repeat the exact “mistakes” that caused the 2008 (2007) Great Recession.

The lesson isn’t being ignored- it’s being used as justification.


Once you get outside of things that are highly standardized (like home loans to individuals) you quickly find out that no matter how regulated, finance is done on a handshake.


That's true, but only to a point. Due diligence is not uncommon, especially with more traditional forms of credit.

I resorted to the mortgage-lending analogy so others could quickly grok what multi-pledging means.


... in two hours:

> No credentials. No insider knowledge. And no human-in-the-loop. Just a domain name and a dream. ... Within 2 hours, the agent had full read and write access to the entire production database.

Having seen firsthand how insecure some enterprise systems are, I'm not exactly surprised. Decision makers at the top are focused first and foremost on corporate and personal exposure to liability, also known as CYA in corporate-speak. The nitty-gritty details of security are always left to people far down the corporate chain who are supposed to know what they're doing.


Linkbait headline. Without context, these figures mean nothing.

You can see the debt-to-GDP ratios here:

https://fred.stlouisfed.org/series/GFDEGDQ188S

https://fred.stlouisfed.org/series/GFDGDPA188S


The article has more details than just the headline. For example:

> Maya MacGuineas, president of the Committee for a Responsible Federal Budget (CRFB), said that interest payments on the debt are expected to exceed $1 trillion this year, and will surpass $2 trillion by 2036.

That’s very concerning. There’s no plan to run balanced budgets and stop deficits. And no plan to reduce debt. And no plan on economic competitiveness against China. American politics is mostly dominated by irrelevant things that won’t fix the fundamental problems that will come to affect us in the future.


The plan is a smash-and-grab for whoever is smart enough to scam their way through it, and then probably default like every other empire.


They can always at some point pull a modern day Nixon Shock and cancel everything. Make all that debt go poof.


Am I supposed to take away from these plots that we're all good, since the ratio has been steadily above the prior 1940s record since 2020? Or that everything is okay because it was shooting up and has recently course-corrected to a somewhat straighter line?

The article seems to be communicating that this rate of spending is not sustainable.


Japan debt/GDP is more than twice the US's.

https://economics.stackexchange.com/questions/11792/what-lev...

Not to say that it's okay, and Japan's economy certainly has issues with stagnation due to the debt load, but it's also not a "we have imminent hyperinflation" kind of thing either.

The concern with the past five months isn't so much the level of debt, it's the rate of change - we're increasing it faster than in the past... and this isn't a COVID-level crisis or a 2008-style deep recession either where Keynesian logic might make more sense.


Is Japan still the largest creditor in the world???


> Japan debt/GDP is more than twice the US's.

Japan borrows at a 0.75% interest rate, compared to the US's current 3.5%.



The real question is what percentage of GDP is directly created (or continues to exist) because of the increased debt. When this metric was created, GDP was more organic and less debt-driven.


The article literally addresses this:

> Economists aren’t necessarily worried by the total level of debt (in fact, government debt is a necessary foundation of global markets). Rather it’s the debt-to-GDP ratio, which measures a nation’s borrowing against its growth
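For reference, the ratio the quote describes is just borrowing divided by output. A minimal sketch, using hypothetical round numbers rather than actual US figures:

```python
def debt_to_gdp(debt, gdp):
    """Return debt as a percentage of GDP."""
    return 100.0 * debt / gdp


# e.g. a hypothetical $36T of debt against $29T of annual GDP:
ratio = debt_to_gdp(36e12, 29e12)
print(f"{ratio:.0f}% of GDP")  # roughly 124%
```

This is why the same nominal debt can be alarming or benign: the denominator (growth) matters as much as the numerator (borrowing).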



That looks pretty bad, and getting worse.


TL;DR: The authors found current-generation AI agents are too unreliable, too untrustworthy, and too unsafe for real-world use.

Quoting from the abstract:

"We report an exploratory red-teaming study of autonomous language-model–powered agents deployed in a live laboratory environment with persistent memory, email accounts, Discord access, file systems, and shell execution. Over a two-week period, twenty AI researchers interacted with the agents under benign and adversarial conditions."

"Observed behaviors include unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, denial-of-service conditions, uncontrolled resource consumption, identity spoofing vulnerabilities, cross-agent propagation of unsafe practices, and partial system takeover."


> current-generation AI agents are too unreliable, too untrustworthy, and too unsafe for real-world use

...a completely unsurprising result, but it's nice to see published experiments.

Any agent system using current LLMs is likely to exhibit undesirable traits that derive from the training data.


> undesirable traits that derive from the training data

The research areas of model alignment and safety are attempting to address this fundamental problem - and have yet to solve it convincingly.

Problems like emergent misalignment can make things even worse.

https://www.nature.com/articles/s41586-025-09937-5


"The invisible hand" of free markets has become truly invisible...

