timr's comments | Hacker News

> And now, with Opus 4.5 or Codex Max or Gemini 3 Pro we can write substantial programs one-shot from a single prompt and they work. Amazing!

People have been doing this parlor trick with various "substantial" programs [1] since GPT 3. And no, the models aren't better today, unless you're talking about being better at the same kinds of programs.

[1] If I have to see one more half-baked demo of a running game or a flight sim...


"And no, the models aren't better today"

Can you expand on that? It doesn't match my experience at all.


It’s a vague statement that I obviously cannot defend in all interpretations, but what I mean is: the performance of models at making non-trivial applications end-to-end, today, is not practically better than it was a few years ago. They’re (probably) better at making toys or one-shotting simple stuff, and they can definitely (sometimes) crank out shitty code for bigger apps that “works”, but they’re just as terrible as ever if you actually understand what quality looks like and care to keep your code from descending into entropy.

I think "substantial" is doing a lot of heavy lifting in the sentence I quoted. For example, I’m not going to argue that aspects of the process haven’t improved, or that Claude 4.5 isn't better than GPT 4 at coding, but I still can’t trust any of the things to work on any modestly complex codebase without close supervision, and that is what I understood the broad argument to be about. It's completely irrelevant to me if they slay the benchmarks or make killer one-shot N-body demos, and it's marginally relevant that they have better context windows or now hallucinate 10% less often (in that they're more useful as tools, which I don't dispute at all), but if you want to claim that they're suddenly super-capable robot engineers that I can throw at any "substantial" problem, you have to bring evidence, because that's a claim that defies my day-to-day experience. They're just constantly so full of shit, and that hasn't changed, at all.

FWIW, this line of argument usually turns into a motte-and-bailey fallacy, where someone makes an outrageous claim (e.g. "models have recently gained the ability to operate independently as a senior engineer!"), and when challenged on the hyperbole, retreats to a more reasonable position ("Claude 4.5 is clearly better than GPT 3!"), but with the speculative caveat that "we don't know where things will be in N years". I'm not interested in that kind of speculation.


Have you spent much time with Codex 5.1 or 5.2 in OpenAI Codex, or Claude Opus 4.5 in Claude Code, over the last ~6 weeks?

I think they represent a meaningful step change in what models can build. For me they are the moment we went from building relatively trivial things unassisted to building quite large and complex systems that take multiple hours, often still triggered by a single prompt.

Some personal examples from the past few weeks.

- A spec-compliant HTML5 parsing library by Codex 5.2: https://simonwillison.net/2025/Dec/15/porting-justhtml/

- A CLI-based transcript export and publishing tool by Opus 4.5: https://simonwillison.net/2025/Dec/25/claude-code-transcript...

- A full JavaScript interpreter in dependency-free Python (!) https://github.com/simonw/micro-javascript - and here's that transcript published using the above-mentioned tool: https://static.simonwillison.net/static/2025/claude-code-mic...

- A WebAssembly runtime in Python which I haven't yet published

The above projects all took multiple prompts, but were still mostly built by prompting Claude Code for web on my iPhone in between Christmas family things.

I have a single-prompt one:

- A Datasette plugin that integrates Cloudflare's CAPTCHA system: https://github.com/simonw/datasette-turnstile - transcript: https://gistpreview.github.io/?2d9190335938762f170b0c0eb6060...

I'm not confident any of these projects would have worked with the coding agents and models we had four months ago. There is no chance they would've worked with the models available in January 2025.


Are you using Stop hooks to keep Claude running on a task until it completes, or is it doing that by itself?


I'm not using those yet.

I mainly give it clear tasks like "keep going until all these tests pass", but I do keep an eye on it and occasionally tell it to keep going.


I’ve used Sonnet 4.5 and Codex 5 and 5.1, but not in their native environment [1].

Setting aside the fact that your examples are mostly “replicate this existing thing in language X” [2], again, I’m not saying that the models haven’t gotten better at crapping out code, or that they’re not useful tools. I use them every day. They're great tools, when someone actually intelligent is using them. I also freely concede that they're better tools than a year ago.

The devil is (as always) in the details: how many prompts did it take? what exactly did you have to prompt for? how closely did you look at the code? how closely did you test the end result? Remember that I can, with some amount of prompting, generate perfectly acceptable code for a complex, real-world app, using only GPT 4. But even the newest models generate absolute bullshit on a fairly regular basis. So telling me that you did something complex with an unspecified amount of additional prompting is fine, but not particularly responsive to the original claim.

[1] Copilot, with a liberal sprinkling of ChatGPT in the web UI. Please don’t engage in “you’re holding it wrong” or "you didn't use the right model" with me - I use enough frontier models on a regular basis to have a good sense of their common failings and happy paths. Also, I am trying to do something other than experiment with models, so if I have to switch environments every day, I’m not doing it. If I have to pay for multiple $200 memberships, I’m not doing it. If they require an exact setup to make them “work”, I am unlikely to do it. Finally, if your entire argument here hinges on a point release of a specific model in the last six weeks…yeah. Not gonna take that seriously, because it's the same exact argument, every six weeks. </caveats>

[2] Nothing really wrong with this -- most programming is an iterative exercise of replicating pre-existing things with minor tweaks -- but we're pretty far into the bailey now, I think. The original argument was that you can one-shot a complex application. Now we're in "I can replicate a large pre-existing thing with repeated hand-holding". Fine, and completely within my own envelope for model performance, but not really the original claim.


I know you said don't engage in "you're holding it wrong"... but have you tried these models running in a coding agent tool loop with automatic approvals turned on?

Copilot style autocomplete or chatting with a model directly is an entirely different experience from letting the model spend half an hour writing code, running that code and iterating on the result uninterrupted.
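
(To spell out what that loop looks like at its core: below is a minimal, illustrative Python sketch. The `model` object and its next_step() method are hypothetical stand-ins, not any particular product's API - the point is only that the model's proposed commands are executed without a human approval step and the output is fed back in until it decides it is done.)

  # Illustrative sketch of a coding-agent "tool loop" with auto-approvals.
  # The `model` object and next_step() are hypothetical stand-ins, not a real API.
  import subprocess

  def run_shell(command):
      # Auto-approved execution: no human confirmation before each command.
      result = subprocess.run(command, shell=True, capture_output=True, text=True)
      return result.stdout + result.stderr

  def agent_loop(model, task):
      context = [task]
      while True:
          step = model.next_step(context)          # propose a shell command, or declare done
          if step.done:
              return step.answer                   # the model decides the task is complete
          context.append(run_shell(step.command))  # feed the output back and iterate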

Here's an example where I sent a prompt at 2:38pm and it churned away for 7 minutes (executing 17 bash commands), then I gave it another prompt and it churned for half an hour and shipped 7 commits with 160 passing tests: https://static.simonwillison.net/static/2025/claude-code-mic...

I completed most of that project on my phone.


> I know you said don't engage in "you're holding it wrong"... but have you tried these models running in a coding agent tool loop with automatic approvals turned on?

edit: I wrote a different response here, then I realized we might be talking about different things.

Are you asking if I let the agents use tools without my prior approval? I do that for a certain subset of tools (e.g. run tests, do requests, run queries, certain shell commands, even use the browser if possible), but I do not let the agents do branch merges, deploys, etc. I find that the best models are just barely good enough to produce a bad first draft of a multi-file feature (e.g. adding an entirely new controller+view to a web app), and I would never ever consider YOLOing their output to production unless I didn't care at all. I try to get to tests passing clean before even looking at the code.

Also, while I am happy to let Copilot burn tokens in this manner and will regularly do it for refactors or initial drafts of new features, I'm honestly not sure if the juice is worth the squeeze -- I still typically have to spend substantial time reworking whatever they create, and the revision time required scales with the amount of time they spend spinning. If I had to pay per token, I'd be much more circumspect about this approach.


Yes, that's what I meant. I wasn't sure if you meant classic tab-based autocomplete Copilot or the tool-based agent mode of Copilot.

Letting it burn tokens on running tests and refactors (but not letting it merge branches or deploy) is the thing that feels like a huge leap forward to me. We are talking about the same set of capabilities.


Ah, definitely agent-based copilot. I don't even have the autocomplete stuff turned on anymore, because I found it annoying.


What do you class as a "substantial program"?

For me it is something I can describe in a single casual prompt.

For example I wrote a fully working version of https://tools.nicklothian.com/llm_comparator.html in a single prompt. I refined it and added features with more prompts, but it worked from the start.


Good question. No strict line, and it's always going to be subjective and a little bit silly to categorize, but when I'm debating this argument I'm thinking: a product that does not exist today (obviously many parts of even a novel product will be completely derivative, and that's fine), with multiple views, controllers, and models, and a non-trivial amount of domain-specific business logic. Likely 50k+ lines of code, but obviously that's very hand-wavy and not how I'd differentiate.

Think: SaaS application that solves some domain-specific problem in corporate accounting, versus "in-browser spreadsheet", or "first-person shooter video game with AI, multi-player support, editable levels, networking and high-resolution 3D graphics" vs "flappy bird clone".

When you're working on a product of this size, you're probably solving problems like the ones cited by simonw multiple times a week, if not daily.


I don't think anyone is claiming they can one-shot a 50k-line SaaS app.

I think you'd get close on something like Lovable but that's not really one shot either.


But re-reading your statement, you seem to be claiming that there are no 50k-line SaaS apps that have been built even using multi-shot techniques (ie, building a feature at a time).

In that case my Vibe-Prolog project would count: https://github.com/nlothian/Vibe-Prolog/

  - It's 45K of python code
  - It isn't a duplicate of another program (indeed, the reason it isn't finished is because it is stuck between ISO Prolog and SWI Prolog and I need to think about how to resolve this, but I don't know enough Prolog!)
  - Not a *single* line of code is hand written. 
Ironically this doesn't really prove that the current frontier models are better because large amounts of code were written with non-frontier models (You can sort of get an idea of what models were used with the labels on https://github.com/nlothian/Vibe-Prolog/pulls?q=is%3Apr+is%3...)

But - importantly - this project is what convinced me that the frontier models are much better than the previous generation. There were numerous times I tried the same thing in a non-frontier model which couldn't do it, and then I'd try it in Claude, Codex or Gemini and it would succeed.


Humanity is not going extinct.


Not directly because of climate change, no. Some areas will be fairly unaffected or might even improve for human use (e.g. Siberia).

However, it will cause the collapse of ecosystems we rely on for food, because the change is too rapid for nature to handle, and it will shift which areas are viable for human habitation and agriculture. That means many, many people will have to move.

And of course mass forced migration combined with shrinking resources is a recipe for global war. See how popular migrants are now in many countries, and consider half the world having to migrate to survive. Poor people living in areas that become uninhabitable (and who never caused the problem in the first place) will move to a better place where it's likely the current inhabitants will protest.

And a global war is very likely to lead to extinction with the WMD tech humanity has now.

All we need to do is stop trying to be richer than everyone else and to work together :(


> However, it will cause the collapse of ecosystems we rely on for food, because the change is too rapid

Citation absolutely required. This is not in the IPCC reports, which are already quite extreme in their projections.

The IPCC sixth assessment report has an entire section on ecosystem impacts, and while a number of changes are projected with varying degrees of confidence, the word “collapse” is nowhere to be found, except for the following sentence:

> It is not known at which level of global warming an abrupt permafrost collapse…compared to gradual thaw (Turetsky et al., 2020) would have to be considered an important additional risk.

https://www.ipcc.ch/report/ar6/wg2/chapter/chapter-2/


The change here is due to logging, not some inevitable climate feedback loop. Cut down fewer trees than you grow, and the situation reverses.

In fact, the natural feedback cycle of increasing CO2 in the atmosphere is for greenery to increase, not decrease.


Easy solution then, let's just cut down on logging.

...waits 50 years...


or plant more trees. immediate.


The link you are replying to is explicit that forests are carbon sinks (which is just a scientific fact), and that the change here is due to logging.

Planting more trees than you cut down is an effective way of offsetting CO2 emissions.


And… “increased logging, rising emissions in peatland forests and declining carbon sink of mineral soils.”


Yes, “increased logging” means logging. It’s the primary change cited by the paper.


Depends entirely on the stage of the disease and the aggressiveness of the cancer! Getting an aggressive brain cancer when you had early stage Alzheimer’s [1] would be tragic. The tradeoff would be years of life.

For the record, I have no idea what the actual risk tradeoff is, but the point of regulation is that nobody does. You can’t have informed consent when you can’t be informed.

[1] Aside: Alzheimer’s is relatively early stage, as dementias go. It’s frequently diagnosed by onset in younger people.


This was true when I was a grad student, decades ago. It was true when I worked in a lab as an undergraduate before that.

Specifics of the current environment aside, welcome to academic life. Unless you are one of the exceptionally fortunate few to have a permanent fellowship of some sort (e.g. Howard Hughes), your primary job as a research professor is to raise funding.


It really depends on what you mean by "decades", but I've been in the system for a generation and what you're saying doesn't match what I see on the ground.

During the doubling of the NIH budget under Clinton and Bush the younger, times were great. After, budgets stagnated and things were harder, but there was still funding out there. The disruption we're seeing now is a completely different animal: program officers are gone, fewer and less detailed summary statements go out, some programs are on hiatus (SBIR/STTR), and if you have something in the till it was wasted time, &c. NSF is a complete train wreck.

My startup had an STTR in for the last cycle and we can't talk to the program officer about our summary statement, nor can we resubmit, nor are we likely to be funded. That's a lot of lost time and money for a startup that, since we're atoms and not bits, is funded on a shoestring budget. The only time something like this happened in my memory was the shutdown in 2013 and that wasn't even close to the disruption we're seeing now.


I was also in science during Clinton, and what I’m saying was true then. The increase in funding went hand in hand with a massive increase in people seeking funding. So maybe there was some golden era of happy times when nobody had to chase grants, but it hasn’t been in my lifetime.

But again, I explicitly said that my point was independent of recent changes in funding. I am no longer in science, but it seems to be true that funding has declined. That doesn’t mean that chasing grants is something unprecedented for scientists to be doing.


The Clinton era was the golden age for life sciences (can't speak to others) and it's been a decline since then, either stagnant or a sharper downturn. Now? Complete operational collapse, a completely different animal altogether, and it's not one agency, it's all agencies. You seem to be saying that chasing grants is not unprecedented, which has been true since Galileo and the patron system, but that isn't a profound observation, it's the status quo. What I and others on the ground are saying is that now is a sudden and profound shift, with committed funding pulled, applications in process effectively frozen, and new awardees simultaneously decimated, in a way that makes it impossible to sustain the basic and translational research enterprise. And outside of the feds, there isn't a viable source of patient capital to turn to on the scale we've been operating.


Yes, I understand your claim that things are tighter now; I've repeatedly acknowledged that fact, and in any case, I have no personal basis to dispute the argument. But again, that's not related to the point I'm making.

One last time: OP was complaining that the group has to spend all of its time raising funding, but that's always been true in my lifetime. There's never been a magical age where being a PI (or even a senior lab member) wasn't a perpetual process of raising funds, and anyone going into science should know this. Hence my comment: welcome to academia.

For whatever it's worth, this is basically reason #1 that most PhD grads I know voluntarily jumped off the hamster wheel. Anyone who gets a PhD and expects to be doing labwork as a PI is deeply deluded, and it needs to be shouted for the folks in the back: you are signing up for a lifetime of writing grants, teaching classes, and otherwise doing bureaucratic schleps. The current administration did not suddenly make this true.


I read SubiculumCode's post in the same context as bane's, speaking to the current environment.

You're saying that a group having to spend all of its time fundraising has always been true in your lifetime and you link it to your time as a grad student decades ago and earlier when you were an undergrad. Do I have that right? The dominance of fundraising might have been true for your specific experience and viewpoint, but I don't understand your basis for claiming it was universal: it certainly wasn't my experience (R1 engineering, not software) nor my colleagues around that time.

Complaints about fundraising and administrivia have always been plentiful, but actual time spent on teaching and service and research was dominant, with the expected proportions of the three-legged stool varying based on role and institution. What SubiculumCode, bane, and I are reacting to now is the dramatic shift in how dominant (because funding has been pulled, and funding allocation methods have suddenly shifted) and unproductive (fewer summary statements, less or no feedback from SROs and POs, eliminated opportunities for resubmissions) that work has become. The closest I can remember to the current situation was the aftermath of the 2008 recession and the 2013 government shutdown, and that pales in comparison to the disruption of today.

edit: best study I could casually find is Anderson and Slade (https://link.springer.com/article/10.1007/s11162-015-9376-9) from 2016 that estimates grant writing at about 10% effort.


> You're saying that a group having to spend all of its time fundraising has always been true in your lifetime and you link it to your time as a grad student decades ago and earlier when you were an undergrad. Do I have that right?

I mean, yes...but everyone on this thread admits that it's still true (in fact, worse today), so I'm not sure what point you're making with this. Y'all are arguing that it's worse now, which is not a claim I am disputing [1]. The entire point of citing my "old" experience is that, in fact, we were all doing the same stuff back in the stone ages. I also haven't forgotten or misremembered due to my advancing age [2].

> The dominance of fundraising might have been true for your specific experience and viewpoint, but I don't understand your basis for claiming it was universal: it certainly wasn't my experience (R1 engineering, not software) nor my colleagues around that time.

OK. I never said my experience was universal. I was in the biological sciences, not engineering. To be clear, I'm not claiming experience in economics or english literature, either.

Again, I don't dispute that things might be worse today, but the situation is absolutely not new, and any grad student in the sciences [3] who expects otherwise has been seriously misled. That is my point.

[1] To be clear, I'm not saying it is or isn't worse today. I am making no claim with regard to the severity of the fundraising market. The market can be a bajillion times worse than when I came up, and my point is still valid -- back then, professors spent nearly all of their time chasing money! Today, professors spend nearly all of their time chasing money!

[2] This is a joke. I'm not old, and my experiences are not as ancient as you're implying. I understand that every generation clings to the belief that their struggles are unique in time, but it's probably a bad idea to take that notion seriously.

[3] Yes, I made the general claim "in the sciences". Because insults about age aside, and even though the specifics will vary from year to year and topic to topic, it's very important to realize that if you become a professor in the sciences, this is what you will be doing. You will not be in the lab making gadgets or potions or whatever -- you will be filling out grants, making slide decks, reviewing papers, and giving talks. If you cannot handle this life, quit now. It will not get better.

There are certainly ways to go work in a lab and do "fun stuff" forever, but a) you often don't need a graduate degree for these, and b) you shouldn't be deluded about which path you're on.


But clearly there was some science going on. Any time spent writing grants rather than doing research feels wasteful, but it's the way to get funding. The percentage of time spent doing that is changing, and the percentage of grant applications that get funding is going way down, demonstrating a big change in the amount of effort that goes directly to waste. Unfunded grant applications are not evidence of bad research, but merely of the funding level.


Science gets done by the people you hire with the money you raise. And yes, everyone in a group is always thinking about the next grant.

I’m not joking. I’m not exaggerating. This is the job, and it’s always been this way (at least in my lifetime). Maybe it’s worse because of the current administration, but complaining that academic life is mostly about grant writing is like a fish complaining about water.


Undoubtedly the complaints are constant, but that is not evidence that the amount of work wasted on unsuccessful grant proposals is constant.


I really wish people would stop trying to gaslight all of us into believing the current crisis is just business as usual.

Yes, previous US presidents told some lies.

Yes, previous US presidents and politicians had some unsavory associations or potential conflicts of interest.

Yes, previously some labs spent too much time writing grants and not enough actually doing research.

The problem is, these things are becoming the norm now, and your anecdotal memory of "aw, man, we spent all our time doing that back in the day!" is not a reliable indicator that really, nothing has changed, we should just stop complaining. Especially since we know that human memory is not only fallible, it is prone to specifically being better at remembering the exceptional, and the unpleasant.


Nope. My PhD lab never laid off any research scientists in almost 30 years, until 47 and DOGE came along.


They’re not the same. おう is discernible from おお, and the difference can be important.

That said, this is far from the most important problem in Japanese pronunciation for westerners, and at speed the distinction between them can become very subtle.


Yes, for instance こうり (小売) is completely different from こおり (氷).

If you're trying to say that when those two denote /o:/ it is a different /o:/, you are laughably wrong.

It is not reliably discernible as a statistical fact you can gather from a population sample of native speakers over many words, if they are asked to speak normally (not using spelling as emphasis, or using the words in a song).


> If you're trying to say that when those two denote /o:/ it is a different /o:/, you are laughably wrong.

There's literally a different sound, which is why the difference in kana exists. Disagree if you like -- as I said, it's subtle -- but I don't know why you feel the need to be insulting about it. Writing an inaccurate non-kana symbol for the two sounds is no more an argument than saying that the sounds are identical because they share a common romanization.

There are some words where you can more clearly hear the difference than others. Consider, for example, the pronunciation of 紅茶, vs your example of 氷. It's not wrong to pronounce the former as a long o, but you can hear the difference when natives say it. Similarly, こういう is not said as こおいう, and 公園 is not こおえん.


The difference in kana was not recently selected in order to represent a feature of the contemporary language. It is historic!!!


I think the confusion here is in the placement of the vowels. おお and おう do sound identical when pronounced as a single unit, but the おう in 小売 (こ.うり) isn't a single unit; it's just an お that happens to be next to a う.


This might be true. I’ve never thought about it deeply enough!


Do you have an academic source that describes this difference in pronunciation in native speakers in normal usage?


I’m new to the language and thought these would be the same. But I just listened to some words with the two and the おお definitely has like a bigger o sound. That’s quite subtle.


You’ll hear it more easily with time. It’s hard to completely separate stuff like this from context (i.e. it’s far more rare to have a collision in sound that makes sense if you know the rest of the sentence), but it does matter for discriminating between words when you’re trying to look words up, for example.


I've never heard of the /o:/ of おう and おお being different. I've never seen a small child, or foreign speaker, being corrected in this matter; i.e that they are using the wrong /o:/ for the word and should make it sound like this instead.

This is literally not a thing that exists outside of some foreigners' imaginations. You will sooner hear a difference from $1000 speaker cables before you hear this, and it will only be if you are the one who paid.

You may be letting pitch accent deceive you. In words that contain /o:/, it's possible for that to be a pitch boundary, so that pitch rises during the /o:/, and that can contrast with another /o:/ word where that doesn't happen.

The 頬 word in Japanese is "kinda funny" in that it has a ほお variant and a ほほ variant. It has always stood out in my mind as peculiar. I'd swear I've heard an in-between "ほ・お" that sounds somewhat reminiscent of "uh oh", with a bit of a volume dip or little stop that makes it sound like two /o/ vowels. It could be that the speaker intends ほほ, but the second /h/ sound is not articulated clearly. It may even be that the ほほ spelling was invented to try to represent this situation (which is a wild guess, based on zero research). In any case, the situation with that cheeky little word doesn't establish anything general about おお/こお/そお/とお...

I've been fooled by my imagination before. For instance, many years ago I thought I would swear that I heard the object marker を sound like "WO" in some songs; i.e. exactly how it is typed in romaji-based input methods, because it belongs to the わ group. Like "kimi-o" sounding like "kimi-wo". Today I'm convinced it is just a kind of 空耳 (soramimi). Or the artifact of /i/ followed by /o/ without interruption, becoming a diphthong that passes through /u/: it may be real, but unintentional. It's one of those things that if you convince yourself is real, you will tend to interpret what you are hearing in favor of that.

E.g. in Moriyama Naotarō's "Kisetsu no mado de" (季節の窓で), right in the first verse. https://www.youtube.com/watch?v=8FjvNqg3034

That's actually a good example because there are so many covers of it; you can check whether you hear the "whoopy wo" from different speakers.

There is a similar situation in the pronunciation of 千円. There is a ghost "ye" that appears to the foreign ear, to the point that we have developed the exonym "yen" for the Japanese currency!!! The reality is more like that the /n/ is nasalized, similarly to what happens when it is followed by /g/. https://www.youtube.com/watch?v=5ONt6a1o-hg

OK, finally, let's crack open a 1998 edition of the NHK日本語撥音辞典. On pages 832-833, we have all the /ho:/ words, with their pronunciations including pitch accents:

ホー with falling accent after ホ: 方、砲、鵬、朴

And, our cheeky word 頬 gets a separate entry here due to its pronunciations ホー and ほほ. Both have a falling pitch after the leading ほ, like 方. No difference is noted.

ホー with pitch rising at the "o": 法、報

So of course if you compare someone saying 法律 vs 頬, there will be a difference. But a lot of longer ほお words have the same rising pitch, like 法. 法律 (ほうりつ) vs 放り出す (ほおりだす) is the same.

Fairly intuitively, 頬張る (ほおばる) has rising pitch at the お, in spite of 頬 by itself exhibiting falling pitch.


> This is literally not a thing that exists outside of some foreigners' imaginations.

I think you're a little obsessed with this. It's not pitch accent and I'm not "being fooled", but if you want to insist that you know better...fine? You do you!

> OK, finally, let's crack open a 1998 edition of the NHK日本語撥音辞典. On pages 832-833, we have all the /ho:/ words, with their pronunciations including pitch accents: ホー with falling accent after ホ: 方、砲、鵬、朴

I've already given you examples where you can often hear the difference if you try. These "ho-words" are completely unrelated, and non-responsive. You seem to be arguing about something else (or just trying to name-drop the NHK pronunciation guide).

Anyway, there are two distinct sounds in the kana table for う and お. They're individually pronounced differently, so why you're so resistant to the idea that combinations of the two might also have a difference in pronunciation, I don't really know. I've personally had native teachers tell me this, and I hear it all the time. Go ask a native to slowly sound out the individual mora for a word like 紅茶 vs. say, 大阪 -- that's how I first heard it.

Anyway, I'm not really interested in debating this further. It's a very, very minor point. Good luck with your study.


> there are two distinct sounds in the kana table for う and お.

Oh no, that totally escaped my feeble attention. Boy, do I feel sheepishly stupid now.

> Go ask a native to slowly sound out the individual mora

In fact, now that you point it out, even if I do that myself, it's obvious they are different: ko-u-cha, o-o-sa-ka!

Well, I've just been going about this all wrong, barking up the wrong tree.

In hindsight it now makes total sense that they wouldn't just use う as a marker to indicate that the previous お is long. That's what ー is for; whereas う has a sound!

Ohohsaka, coacha: gonna practice that.


The kind of products hidden behind sales calls are generally the sort where the opinion of IC-level tech staff is next to irrelevant. With these kinds of products, the purchase decision is being made at a group level, the contract sizes are large, and budgetary approvals are required. It’s a snowball the size of a house, and it started rolling down the mountain months (or years) before it got to your desk. Literally nobody cares if you buy a single license or not, and if you (personally) refuse to try it because it doesn’t have self-service, you’ll be ignored for being the bad stereotype of an “engineer”, or worse.

About the only time you’ll be asked to evaluate such a product as an IC is when someone wants an opinion about API support or something equivalent. And if you refuse to do it, the decision-makers will just find the next guy down the hall who won’t be so cranky.


I think this is true at larger organizations, but even a "small/medium" startup can easily sign contracts for single services for $100k+, and in my experience, salespeople really do care about commissions at those price points. A lot of software gets a foothold in an org by starting with the ICs, and individuals, not groups, are often the ones that request or approve software. Github and Slack are good examples of services that make very good use of their ability to self-serve their customers out of the gate, in spite of also supporting very large orgs.

In these conversations, I never ever see the buyers justifying or requesting a sales process involving people and meetings and opaque pricing.

It’s true that complicated software needs more talking, but there is a LOT of software that could be bought without a meeting. The sales department won’t stand for it though.


> A lot of software gets a foothold in an org by starting with the ICs, and individuals, not groups, are often the ones that request or approve software.

Not really. Even if we keep the conversation in the realm of startups (which are not representative of anything other than chaos), ICs have essentially no ability to take unilateral financial risk. The Github “direct to developer” sales model worked for Github at that place and time, but even they make most of their money on custom contracts now.

You’re basically picking the (very) few services that are most likely to be acquired directly by end users. Slack is like an org-wide bike-shedding exercise, and Github is a developer tool. But once the org gets big enough, the contracts are all mediated by sales.

Outside of these few examples, SaaS software is almost universally sold to non-technical business leaders. Engineers have this weird, massive blind spot for the importance of sales, even if their own paycheck depends on it.


This is really not true in my experience. In fact, all my experience has been with products that aren’t THAT expensive, and the individual dev teams do decide. These are SaaS products, and sometimes the total cost is under $1000 a year, and I still can’t get prices without contacting sales.

Also, it isn’t just ICs. I have worked as a senior director, with a few dozen people reporting into me… and I still never want to talk to a sales person on the phone about a product. I want to be able to read the docs, try it out myself, maybe sign up for a small plan. Look, if you want to put the extras (support contracts, bulk discounts, contracting help, etc) behind a sales call, fine. But I need to be able to use your product at a basic level before I would ever do a sales call.


You know that Google literally spends billions to ensure that people don’t switch, right?

That’s possible because they’re immensely profitable.


Isn't the billions just setting the default? The ability to switch is the same as far as I understand it.


The default is what matters.


> When we compare those to what actually happened up to 2025, we see that we are slightly worse right now than their highest sea level prediction that was made.

No. The paper does not show that. Figure 3 shows that recent sea level rise, accounting for measurement uncertainty, is in line with projections of any of the models (around 2mm per year). In any case, they call out explicitly that the recent data is of insufficient duration to make the comparison you’re trying to make.

Temperature data in figure one is more or less exactly in the uncertainty window of the models (not shocking, considering that they’re calibrated to reproduce recent data).


I'm sorry, but I double checked and I do think you have it wrong. Figure 3 is for "sea level rise _rate_", and that one is indeed high but not significantly so.

Quoting "The satellite-based linear trend 1993–2011 is 3.2± 0.5 mm yr−1 , which is 60% faster than the best IPCC estimate of 2.0 mm yr−1 for the same interval"

But, as the authors point out, the worst case forecasts that were within-data, are so for the wrong reasons. Quote "The model(s) defining the upper 95-percentile might not get the right answer for the right reasons, but possibly by overestimating past temperature rise."

My previous comment is regarding Figure 2, i.e. "Sea Level". I would invite you to read the whole paper. It is only 3 pages and written without jargon.


Sea level rise rate is what matters (we cannot measure "sea level" absolutely, and therefore must work in terms of relative rates of change). The authors explicitly tell you that the data is not sufficient to conclude what they're alluding to:

> this period is too short to determine meaningful changes in the rate of rise

Now, you note that the authors openly acknowledge that the rate of rise is measured in low-single-digit units of millimeters per year. So, why is the y-axis of Figure 2 measured in centimeters?

Hint: it’s because every point on that plot is a wild extrapolation.

This paper is not good, btw. The fact that it’s “only three pages” should be a blinking red sign telling you that it is not serious. Just read the more recent IPCC reports, because they deal with the question of updates from prior reports.


> Hint: it’s because every point on that plot is a wild extrapolation.

I don't understand, or do not spot the issue you are seeing. Could you expand a bit?


The plot you're citing is an imaginary projection 100 years into the future given what was known up to the year on the x-axis. That is why the units are 100x larger.

The uncertainty on the rate of change is quite large (relatively speaking); therefore, any 100-year projection has huge, compounded uncertainty. Figure 2 is not useful for determining anything about the present.
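
To make that concrete with rough numbers from this thread (a back-of-the-envelope illustration only, not the paper's actual model):

  # Back-of-the-envelope illustration with numbers from this thread (not the paper's model):
  # a small uncertainty in the yearly rate becomes a wide spread over a century,
  # before any model or emissions-scenario uncertainty is added on top.
  rate_mm_per_year = 2.0   # roughly the rate discussed above
  uncertainty = 0.5        # +/- 0.5 mm/yr measurement uncertainty
  years = 100

  low_cm = (rate_mm_per_year - uncertainty) * years / 10    # 15.0 cm
  high_cm = (rate_mm_per_year + uncertainty) * years / 10   # 25.0 cm
  print(low_cm, high_cm)

The measurement uncertainty alone spans a 10 cm range after a century, which is why that plot is in centimeters even though the measured quantity is millimeters per year.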

