More

benschulz · 2025-09-25T14:57:17 1758812237

This seems like an uncharitable reading of the post.

> They talk about the importance of purity for automatic optimizations but in the real world there’s all sorts of practical reasons for needing to debug production compiled code

I imagine they're talking about their defaults. One can commonly reconfigure how different build profiles work.

> Also blaming the users of your language for your language not being able to meet their needs isn’t a good look.

Isn't that what the whole post is about though? They even say the following.

> Returning to earth: we may be academics, but we are trying to build a real programming language. That means listening to our users and that means we have to support print debugging. The question is how?

vlovich123 · 2025-09-25T17:02:30 1758819750

> What do you mean turning off the fuel for the engines crashes the plane? I thought you said this was a safe airplane?!

Things like this - they're painting users of programming languages as the ones being unreasonable.

> I imagine they're talking about their defaults. One can commonly reconfigure how different build profiles work.

From the article:

> We don’t want published packages to (a) lie to the type and effect system, or (b) contain print debugging statements > As a result, using dprintln in production mode causes a compilation error.

There is no documentation about the existence of build profiles or how they might work. I think you're reading too charitably.

benschulz · 2025-08-05T07:04:23 1754377463

Most approaches, I assume, will leverage conditional compilation: When (deterministic simulation) testing, use the deterministic async runtime. Otherwise, use the default runtime. That means there's no (runtime) overhead at the cost of increased complexity.

I'm using DST in a personal project. My biggest issue is that significant parts of the ecosystem either require or prefer your runtime to be tokio. To deal with that, I re-implemented most of tokio's API on top of my DST runtime. Running my DST tests involves patching dependencies which can get messy.

benschulz · on Jan 30, 2025

Because the compiler optimizes based on the assumption that consecutive reads yield the same value. Reading from uninitialized memory may violate that assumption and lead to undefined behavior.

(This isn't the theoretical ivory tower kind of UB. Operating systems regularly remap a page that hasn't yet been written to.)

kazinator · on Jan 30, 2025

If you read something where you have not written, who cares whether the compiler optimizes things such that if you read from there again, you get the same value, even though that is not true?

lmm · on Jan 31, 2025

Anyone who wants to be able to sanely debug. Code is imperfect, mistakes happen. If the compiler can optimise so that any mistake anywhere in your program could mean insane behaviour anywhere else in your program, then you get, well, C.

(E.g. imagine doing a write to an array at offset x - this is safe in Rust, so the compiler turns that into code that checks that x is within the bounds of that array, then writes at that offset. If the value of x can change, then now this code can overwrite some other variable anywhere in your program, giving you a bug that's very hard to track down)

kazinator · on Jan 31, 2025

I see what you're getting at: situations in which the compiler trusts that the location has not changed, but needs to re-load it because the cached value is not available. When the location is reloaded, the security test (like a bounds check) is not re-applied to it, yet the value being trusted is not the one that had been checked.

This is not exactly an optimization though, in the sense that it will mess up even thoroughly unoptimized code (with more likelihood, due to caching optimizations being absent).

So that is to say, even the generation of basic unoptimized intermediate code for a language construct relies on assumptions like that certain quantities will not spontaneously deviate from their last stored value.

That's baked into the code generation template for the construct that someone may well have written by hand. If it is optimization, it is that coder's optimization.

The intermediate code for a checked array access, though, should be indicating that the value of the indexing expression is to be moved into a temporary register. The code which checks the value and performs the access refers to that temporary register. Only if the storage for the temporary registers (the storage to which they are translated by the back end) changes randomly would there be a problem. Like if some dynamically allocated location is used as an array index, e,g. array[foo.i] where foo is a reference to something heap allocated, the compiler cannot emit code which checks the range of foo.i, and then again refers to foo.i in the access. It has to evaluate foo.i to an abstract temporary, and refer to that. In the generated target code, that will be a machine register, or a location on the stack. If the machine register or stack are flaky, all bets are off, sure. But we have been talking about memory that is only flaky until it is written to. The temporary in question is written to!

lmm · on Jan 31, 2025

> The intermediate code for a checked array access, though, should be indicating that the value of the indexing expression is to be moved into a temporary register. The code which checks the value and performs the access refers to that temporary register. Only if the storage for the temporary registers (the storage to which they are translated by the back end) changes randomly would there be a problem.

You'd almost certainly pass it as a function parameter, prima facie in a register/on the stack, sure, and therefore in unoptimised code nothing weird would happen. But an optimising compiler might inline the function call, observe that the value doesn't escape, and then if registers are already full it might choose to access the same memory address twice (no reason to copy it onto the stack, and spilling other registers would cost more).

I don't know how likely this exact scenario is, but it's the kind of thing that can happen. Today's compilers stack dozens of optimisation passes, most of which don't know anything about what the others are doing, and all of which make basic assumptions like that the values at memory addresses aren't going to change under them (unless they're specifically marked as volatile). When one of those assumptions is broken, even compiler authors can't generally predict what the effects will be.

kazinator · on Feb 1, 2025

Makes sense. When a temporary is the result of a simple expression with no side effects that is expected to evaluate to the same value each time, the temporary can be taken back. An obvious example of this is constant folding. We set a temporary t27 to 42. Well, that can just be 42 everywhere, so we don't need the temporary. The trust "evaluate to same value each time" is based on assumptions, which, if they are wrong, things are screwed.

kazinator · on Jan 30, 2025

How common is it for operating systems to do anything other than this:

1. Initially map the not-yet-written page to a read-only page full of zeros (the same one for all allocations: only one exists in the whole system).

2. When a write takes place, copy-on-write clone that page to a newly allocated zero-filled-page, then allow the write to proceed.

lmm · on Jan 31, 2025

The "Giving advice about use of memory" section of the article answers this question directly.

benschulz · on April 14, 2022

IIRC integer literals are the blocking issue here. Bounds checking (and elision) happens anyway, but when `Index<T>` is implemented for multiple integer types, `foo[0]` becomes ambiguous.

benschulz · on Jan 25, 2022

> It's difficult to escape the notion that FreeBSD is becoming an operating system by, and for, FreeBSD developers.

I'm curious: Why _shouldn't_ it be an OS primarily for its developers? Why should they focus on others when they're the one building it?

throwawaylinux · on Jan 25, 2022

We could quibble about what exactly "primarily" means, but that's not the phrase he used which is "by, and for" without the qualifier. So here's two reasons to make FreeBSD for others as well:

- They use a lot of code that they don't develop. If FreeBSD is not for others then those external projects and developers would be disinclined to make their stuff work on FreeBSD.

- Every new FreeBSD developer comes from a non-FreeBSD developer who is interested in FreeBSD and probably uses it. More developers ~= better FreeBSD for FreeBSD developers.

- More users (very roughly) means more money. Whether that's money to pay for more FreeBSD developers, or incentive to make your hardware work with FreeBSD or port your software to it, there are some positive effects on the system and possibly your wallet.

- Personal satisfaction to develop software lots of people use. Also the recognition that comes with that can get you a job or help you meet people and go places.

So lots of reasons. Even being purely selfish and hoping to extract the most from it, there are plausible reasons why some amount of focus on others might be the best way to go about developing a the project.

beowulfey · on Jan 25, 2022

Well, it’s fine if you aren’t a FreeBSD developer yourself and don’t want to use FreeBSD.

How many people run TempleOS these days, I wonder?

nix23 · on Jan 25, 2022

>How many people run TempleOS these days, I wonder?

Probably more then ever ;)

benschulz · on Jan 12, 2022

I wish there was an `Either` type in std. I realize that there used to be one and we have `Result` now. However, now that we have `impl Trait` it's worth revisiting, I believe. If we don't we'll have the one from `itertools`, the one from `futures` etc.

In most GCed languages it wouldn't matter so much because you'd return a boxed `Iterator` or `Future` or what have you. But in Rust you generally want to avoid the allocation.

paavohtl · on Jan 12, 2022

IMHO most of the time it's better to create a special 2-case enum for your specific use case. It's 3 lines of code, which gives you significantly clearer naming than "Either" and "Left/Right".

throwaway6734 · on Jan 12, 2022

I ran into this recently and solved the problem via Enums. Not sure of the performance impact.

the_mitsuhiko · on Jan 12, 2022

I think most people end up regretting code that users itertools::Either at the moment and use it reluctantly.

benschulz · on Jan 12, 2022

Can you expand on that? As long as the public type is opaque I don't see what there is to regret.

zaarn · on Jan 12, 2022

As someone with difficulties telling left and right apart, using Either isn't particularly straight forward. This plus the left and right variant become meaningless on their own.

Ie, if I wanted an enum of either a string or an integer, it becomes "Either<&str, i64>".

But what I wanted is that it's either a string identifier or the database Id of something to be later referenced, which might be better described as "enum IdentOrDbId { Ident(String), DbId(i64) }".

This is of course a much simplified example of things I've faced.

benschulz · on Jan 12, 2022

Sorry, based on the replies I'm getting I did not make myself clear. I only want to use Either in cases like the following.

    fn some_func() -> impl SomeTrait {
        if some_condition {
            Either::Left(a)
        } else {
            Either::Right(b)
        }
    }

Both a and b will implement SomeTrait which is all callers care about. However, because they're structurally different they must be wrapped in an Either that delegates all methods from SomeTrait to a or b respectively.

codeflo · on Jan 12, 2022

I don't think that would work in general: trait methods can have signatures for which you can't synthesize an implementation for the sum type (something that takes a second Self as an argument, like std::ops::Add, comes to mind).

For object-safe traits where this would be possible, you can at least do this at the cost of an allocation, as you probably know:

  fn some_func() -> Box<dyn SomeTrait> {
      if some_condition {
          Box::new(a)
      } else {
          Box::new(b)
      }
  }

the_mitsuhiko · on Jan 12, 2022

The moment you have an Either::Left(Either::Left(x)) you're starting to revisit everything. Yet that's what happens with impl Iterator a lot when you have branches.

benschulz · on Nov 18, 2021

Quoting from the paper's description of Phase 1:

> (b) If an acceptor receives a prepare request with number n greater than that of any prepare request to which it has already responded, then it responds to the request with a promise not to accept any more proposals numbered less than n and with the highest-numbered proposal (if any) that it has accepted.

It says an acceptor must respond with the highest-numbered proposal (if any) that it has accepted.

How is acceptor C going to do that after step 9? That's where the bug is introduced, I think, not anywhere in the paper.

ignoramous · on Nov 18, 2021

mjb (author) links to an SO post which goes:

> If you look at _Lamport's paxos proofs_ he treats an accept as a promise... But this is not pointed out in _Paxos Made Simple_. In fact, it appears Lamport took great pains to specify that an accept was not a promise.

> The problem is when you combine the weaker portions of both variants; as the OP did and several implementations do. Then you run into this catastrophic bug.

That is, Lamport's proofs and Paxos Made Simple contradict each other ever so slightly to trip everyone all at the same time:

> What's more I looked through several proprietary and open-source paxos implementations and they all had the bug submitted by the OP!

https://stackoverflow.com/questions/29880949/contradiction-i...

Lamport himself acknowledges as much: https://www.youtube.com/watch?t=4398&v=8-Bc5Lqgx_c

benschulz · on Nov 18, 2021

Lamport acknowledges the report, not the error. I'm still not convinced there even is an error.

The state after step 9 should be the same as after step 7, i.e. `A(-:-,100) B(100:b,100) C(100:b,-)` because C needs to retain the "highest-numbered proposal" it accepted, not the one it accepted last. That means the nine steps outlined in the post/on StackOverflow do not, by themselves, demonstrate any problem.

So what additional steps are missing/what alternative steps actually produce an inconsistency/divergence?

ignoramous · on Dec 1, 2021

There isn't an "error" / "bug" in either papers when one looks at them in isolation. Together, they seem to have a very subtle difference in the algorithm to trip most distributed systems practioners.

nvarsj · on Nov 18, 2021

It's a misunderstanding of the paper I think. The blog post author and SO post both make the mistake of altering the Paxos Made Simple algorithm to allow sending phase 2 messages to any majority set of acceptors. But that doesn't work unless you also alter the P2b wording so that an acceptor also treats prior accepts as promises. I can see there is a slight ambiguity in the reading of the PMS paper itself that would make you think this is okay, but saying it is a bug in the paper is a bit of stretch.

benschulz · on Oct 21, 2021

I believe this is mostly due to the switch to LLVM 13[1].

[1]: https://twitter.com/ryan_levick/status/1443202538099073027

nicoburns · on Oct 21, 2021

I believe the new pass manager isn’t due to be enabled by default until the next version (1.57)

runevault · on Oct 21, 2021

Yeah the significant improvements from 13 will require that last I heard.

eska · on Oct 21, 2021

That seems to have been a mixed bag. But they also enabled PGO (or was it LTO?) and that was mentioned to be a bigger improvement.

woodruffw · on Oct 21, 2021

PGO requires a runtime profile, so I doubt they've enabled that by default :-)

Rust has had LTO for quite a while, and it's normally a source of longer compilation times rather than shorter ones (since LTO in LLVM-world involves mashing all of the bitcode together and (re-)running a lot of expensive analyses to further optimize across translation unit boundaries.

OTOH they've been making continuous improvements to the incremental compilation mode since 1.51/2, so that's probably among the sources of improvements here.

mastax · on Oct 22, 2021

They're referring to the use of PGO when building the compiler itself: https://github.com/rust-lang/rust/pull/88069

woodruffw · on Oct 22, 2021

TIL. That's fantastic!

benschulz · on Oct 7, 2021

I don't know that artist in particular, but perhaps you can find them on The Mod Archive?

https://modarchive.org/index.php?request=search&query=axel+f...

crtasm · on Oct 7, 2021

Axel F is the theme tune from 80s film Beverly Hills Cop https://www.youtube.com/watch?v=6E93VEydSRM

mod archive is a great site!

benschulz · on July 30, 2021

I always assumed it's the verb uring[1], but that might well be wrong.

[1]: https://en.wiktionary.org/wiki/uring