Recognizing your username, I probably shouldn't argue, but wouldn't iterating across (say) two component arrays only cost you two pages in the cache at any given time, since you're doing a sequential scan? You should have the same number of cache misses overall, unless you're doing something very complicated for each entity that causes the cache to evict one of the pages.
Of course, if you access N components you need N pages in the cache concurrently, which is going to fall over for a not-too-large N. But N=2 or N=3 seems unlikely to kill spatial locality.
I can imagine it gets a little more complicated with prefetch, but you're still using the prefetched pages -- you just need to prefetch pages for two separate arrays (potentially at different rates based on component size) rather than one. Do these details end up snowballing in a way I'm not seeing, or are there details I'm just missing outright?
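Concretely, the pattern I have in mind is two parallel sequential scans, something like this sketch (the Position/Velocity components and the integrate function are made-up names for illustration, not from any particular ECS):

    #include <cstddef>
    #include <vector>

    struct Position { float x, y, z; };
    struct Velocity { float dx, dy, dz; };

    // One sequential read stream (velocities) and one sequential
    // read/write stream (positions), advancing in lockstep.
    void integrate(std::vector<Position>& positions,
                   const std::vector<Velocity>& velocities,
                   float dt) {
        for (std::size_t i = 0; i < positions.size(); ++i) {
            positions[i].x += velocities[i].dx * dt;
            positions[i].y += velocities[i].dy * dt;
            positions[i].z += velocities[i].dz * dt;
        }
    }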
> two component arrays only cost you two pages in the cache at any given time
It sounds like you're thinking of virtual memory (i.e. pages of memory either being in RAM or on disk). But CPU caching is about the much smaller L1, L2, and L3 caches directly on the chip itself.
Let's say you have two kinds of components, A and B. You have those stored in two contiguous arrays:
AAAAAAAAAAAAAAAAAAA...
BBBBBBBBBBBBBBBBBBB...
Each entity has an A and B and your system needs to access both of those to do its work. The code will look like:
for each entity:
    access some data in component A
    access some data in component B
    do some computation
On the first access of A, the A component for that entity and a bunch of subsequent As for other entities get loaded into a cache line. On the next access of B, the B for that entity along with a bunch of subsequent Bs gets loaded into a cache line. If you are lucky, the A and B arrays will be at addresses such that the chip is able to keep them in different cache lines. At that point, I think you'll mostly be OK.
But if you're unlucky and they are fighting for the same cache line, then each access can end up evicting the previous one, forcing a main-memory lookup for every single component.
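To make "fighting for the same cache line" concrete: in a set-associative cache, the set an address lands in comes from the middle bits of that address. Here's a toy model of the mapping (the 64-byte line and 64-set geometry are assumed numbers for illustration; real chips vary):

    #include <cstdint>
    #include <cstdio>

    constexpr std::uintptr_t kLineSize = 64;  // bytes per line (assumed)
    constexpr std::uintptr_t kNumSets  = 64;  // sets in the cache (assumed)

    // Which set an address maps to: drop the offset-within-line bits,
    // then take the low bits of what remains.
    std::uintptr_t cache_set(std::uintptr_t addr) {
        return (addr / kLineSize) % kNumSets;
    }

    int main() {
        // If the A and B arrays start a multiple of kLineSize * kNumSets
        // (4 KiB here) apart, their elements map to the same sets and
        // compete for the few ways in each set -- in a direct-mapped
        // cache they evict each other on every access.
        std::uintptr_t a = 0x100000;
        std::uintptr_t b = a + kLineSize * kNumSets;  // worst-case spacing
        std::printf("set(A[0]) = %ju, set(B[0]) = %ju\n",
                    (std::uintmax_t)cache_set(a),
                    (std::uintmax_t)cache_set(b));
        return 0;
    }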
> It sounds like you're thinking of virtual memory (i.e. pages of memory either being in RAM or on disk).
I think I mostly just used the wrong term (page instead of line) -- I'm on board with the distinction you're drawing.
> But if you're unlucky and they are fighting for the same cache line, then each access can end up evicting the previous one and forcing a main memory look up for every single component.
Yeah, I think this specifically is what I'm having a hard time imagining. I guess it depends on which memory regions get mapped into which cache lines, and I think I'm assuming something closer to fully-associative, while you're allowing for something closer to direct-mapped.
I can see the trouble -- you don't really have much control over the mapping, unless you make some really deep assumptions about the CPU architecture and go to great lengths to position your arrays appropriately in memory. That would certainly be in the original spirit of data-oriented design, but it's perhaps a bit beyond reasonable for most systems.
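(For what it's worth, the cheapest version of "positioning" I can picture is just staggering each array's starting offset by a line or two so their bases don't land in the same cache set. A rough sketch, assuming a 64-byte line; a real version would also record the base pointer so it can be freed later:)

    #include <cstdlib>
    #include <cstddef>

    // Allocate the index-th component array with an extra cache-line
    // stagger, so array bases don't all map to the same cache set.
    void* alloc_staggered(std::size_t bytes, std::size_t index) {
        constexpr std::size_t kLine = 64;    // assumed line size
        std::size_t offset = index * kLine;  // array 0: +0, array 1: +64, ...
        char* base = static_cast<char*>(std::malloc(bytes + offset));
        // NB: a real allocator would stash `base` somewhere to free it.
        return base ? base + offset : nullptr;
    }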
I'm not an expert at the details of hardware architecture, but my understanding is that you're basically stuck with whatever associativity and cache policy the chip supports. That, and the addresses that your components happen to be at, will determine whether they end up fighting over cache lines or not.
'false sharing' == two items next to each other in memory are accessed by different threads; because they are close in memory they end up in the same cache line, which is 'shared' between two CPUs: read-only is okay, but read & write is a performance killer.
Here it's a cache (associativity) conflict; as gugagore said, it can impact performance even in a read-only situation. And yes, you're describing a way to avoid the conflicts.
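For the false-sharing case specifically, the usual fix is padding/aligning per-thread data out to the line size, e.g. (a sketch assuming a 64-byte line and C++17):

    #include <atomic>
    #include <thread>
    #include <vector>

    // alignas(64) gives each counter its own cache line, so two threads
    // incrementing neighboring counters never write to the same line.
    struct alignas(64) PaddedCounter {
        std::atomic<long> value{0};
    };

    int main() {
        std::vector<PaddedCounter> counters(4);
        std::vector<std::thread> workers;
        for (int t = 0; t < 4; ++t)
            workers.emplace_back([&counters, t] {
                for (int i = 0; i < 1000000; ++i)
                    counters[t].value.fetch_add(1, std::memory_order_relaxed);
            });
        for (auto& w : workers) w.join();
        return 0;
    }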