> I don't understand why it is slower. It has to be zeroed anyway.
Memory pages freed from userspace might be reused in kernelspace.
If, for instance, the memory is re-used in the kernel's page cache, then the kernel doesn't need to zero it out before copying the to-be-cached data into the page.
Edit: I seem to remember that back in the 1990s the kernel, at least in some cases, wouldn't zero out pages previously used by the kernel before giving them to userspace, sometimes resulting in kernel secrets being leaked to arbitrary userspace processes. Maybe I'm misremembering, and it was just leakage of secrets between userspace processes. In any case, in the 1990s, Linux was way too lax about leaking data from freed pages.
And if the system isn't idle but also isn't using all of its physical memory, a freed page might not be zeroed for a very long time.
> Is it not zeroed if the memory is assigned to the same process???
idk what the current state of this is in Linux, but at least in the past, on some systems, this was the case for certain use cases involving mapped memory.
As far as I know, the Linux kernel never inspects the userspace thread to adjust behavior based on what the thread is going to do next. This would be a very brittle sort of optimization.
More importantly, it's not safe. Another thread in the same process can see ptr between the malloc and the memcpy!
Edit: also, of course, malloc and memcpy are C runtime functions, not syscalls, so checking what happens after malloc() would require the kernel to have much more sophisticated analysis than just looking a few instructions ahead of the calling thread's %eip/%rip. While handling malloc()'s mmap() or brk() allocation, the kernel would need to be able to look one or two call frames up the call stack, past the metadata accounting that malloc is doing to keep track of the newly acquired memory, perhaps look at a few conditional branches, trace through the GOT and PLT entries to see where the memcpy call is actually going, and do so in a way that is robust to changes in the C runtime implementation. (Of course, in practice, most C compilers will inline a memcpy implementation, so in the common case, it wouldn't have to chase the GOT and PLT entries, but even then, it's way too complicated for the kernel to figure out if anything non-trivial is happening between mmap()/brk() and the memory being overwritten.)
Edit 2: To be robust in the completely general case, even if it were trivial to identify the inlined memcpy implementation, and "something non-trivial happens" were clearly defined, determining whether "something non-trivial happens" between mmap()/brk() and memcpy() would involve solving the halting problem. (Impossible in the general case.)
malloc() == a 'reservation' of memory (but not paged in!)
// Only if touched / updated is the memory actually paged in
A copy _might_ not even be a real copy if the kernel is smart enough (and able) to set up a hardware trigger to force a copy on writes to that area: the same physical memory backs two distinct logical memory ranges until one of them is written to, at which point the page is actually copied and the two diverge.
That's a good point that Linux doesn't actually allocate the pages until they're faulted in by a read or write. So, if it were doing some kind of thread inspection optimization, it would presumably just need to check if the faulting thread is currently in a loop that will overwrite at least the full page.
However, that wouldn't solve the problem of other threads in the same process being able to see the page before it's fully overwritten, or debugging processes, or using a signal handler to invisibly jump out of the initialization loop in the middle, etc. There are workarounds to all of these issues, but they all have performance and complexity costs.
malloc gets memory from the heap, which may or may not be paged in or reused. That means you may get recycled memory from the heap (which is up to the CRT).
If you want to make sure it is zero, you will want calloc. If you know you are going to copy something in on the next step, like your example, you probably can skip calloc and just use malloc. calloc is nice for when you are doing things like linked lists/trees/buffers and do not want extra steps to clean out the pointers or data.
Just a guess, but since apps can fail to free memory correctly, you probably have to zero it on both allocation and deallocation (to be secure) when you enable the feature. So you aren't swapping one for the other; you are now doing both.
> Just a guess but since apps can fail to free memory correctly
That's not relevant here; from the perspective of the kernel, pages are either assigned to a process or they're not. If an application fails to free memory correctly, that only means it'll keep having pages assigned to it that it no longer uses, but eventually those pages will always be released (by the kernel upon termination of the process, in the worst case).
That is the worst case if the process had leaked that part of the heap, but it is the optimal case on process exit. On an OS with any kind of process isolation, walking over most of the heap before exiting just to "correctly free it" is a pure waste of CPU cycles, and in the worst case even of IO bandwidth (when it causes parts of the heap to be paged back in).
Paging those pages in can be avoided entirely if the intention is just to zero them. The kernel could either just "forget" them, or use copy-on-write with a zeroed page as the backing.
The point is that you do not want to do any kind of heap cleanup before exit. The intention isn't to zero the pages, but to outright discard all of them (which is going to be done by the kernel anyway).
In the normal configuration:
Is it not zeroed if the memory is assigned to the same process???
Is it zeroed when the system is idle???
Is it zeroed in batches that are more memory friendly???