But a lockless algorithm would have the same amount of cache-line bouncing in the uncontended case.
The LWN series on lockless algorithms goes into a lot of detail about the atomic primitives they use, but apart from ring buffers, linked lists, and RCU, I didn't actually see any mention of higher-level lockless algorithms. For single-producer, single-consumer queues and for linked lists, lockless algorithms are known and performant, but many others are either not that performant or come with severe restrictions. For example, RCU sounds great, but you have to worry about grace periods, and you should only use it when reads vastly outnumber writes.
Just looking at this initially, thread_safe_var_init is not correct. There is no guarantee whatsoever that `*vpp = vp` actually does anything: the compiler is perfectly free to never materialize vp at all and just build the object in place through vpp, which would let readers observe partial initialization, and the situation is even worse on ARM, where the hardware can reorder the stores. You need some sort of memory barrier between the initialization and that last assignment, and the pointer should probably be volatile/std::atomic.
I've added use of Helgrind's `ANNOTATE_HAPPENS_BEFORE()` and `ANNOTATE_HAPPENS_AFTER()` macros to the functions in `atomics.c` and now this is Helgrind-clean as well as TSAN-clean.
The only bugs found by TSAN (as noted in another reply) were the one you found and two assert()s that read shared state without atomics. Those two asserts could be a real problem if one ran a non-NDEBUG build in production.
> O(N log(N)) where N is the maximum number of live threads that have read the variable and M is the number of values that have been set and possibly released).
If you reference N twice and M zero times, why did you define M? Did you make a typo in the O() or is this intentional?
> Most threads only ever need to call thread_safe_var_get().
> reference count the data.
Does getting a new value automatically release a reference to the old one (so it's no longer safe to read)?