
This is an awesome blog post! Could the fact that the bug was stochastic have something to do with multi-threading? Also, how do you use the BIOS to zero out that section of memory?


No, given the early stage of boot at which it crashed, no threading was happening. The randomness arose because the kernel's global and static variables might or might not start out zeroed, depending on a physically random process (electrical discharge) rather than being guaranteed by software.

Most bootloaders (well, a BIOS usually refers to one step before the bootloader, but still) have a pretty primitive command shell, through which you issue the commands telling it how to load the initial kernel (e.g. from storage, or over the network). My guess would be she had to add a line to the boot script that zeroed out the relevant RAM; that, or rewrite the bootloader and add a loop in machine code to zero out the memory.


There was already code to zero out the BSS, shared across all the PowerPC bootloaders; the call to it had just gotten lost when our enthusiastic fellow kernel dev rewrote bootloaders for platforms they couldn't test. I assume I just added back the call to the existing code.


No, as the post states, the non-determinism is due to the fact that DRAM cells lose their charge over time unless they are constantly refreshed. When the system is booted after having been powered off for a long time, the DRAM cells are all discharged, and thus uninitialized memory will be 0. The kernel was relying on the memory in the BSS section being 0, but nothing was actually zeroing it out. Therefore, the code would only work if the memory happened to be 0 due to being discharged.
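To make that concrete, here is a rough user-space sketch in C of the failure mode and the fix. It is not the actual PowerPC bootloader code: `fake_bss`, `zero_bss`, `simulate_power_on`, and `boot` are all hypothetical names, the 0xA5 fill is just a stand-in for whatever stale charge is left in RAM, and a real bootloader would clear the range between the linker-provided BSS start and end symbols rather than an ordinary array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's BSS region. In a real bootloader
 * the bounds would come from linker symbols, not from an array. */
#define FAKE_BSS_SIZE 64
unsigned char fake_bss[FAKE_BSS_SIZE];

/* Zero every byte in [start, end), as a bootloader is expected to do
 * before jumping into the kernel. */
void zero_bss(unsigned char *start, unsigned char *end)
{
    assert(start <= end);
    for (unsigned char *p = start; p < end; ++p)
        *p = 0;
}

/* Model RAM contents at power-on: all zero after a long power-off,
 * otherwise leftover garbage (0xA5 is an arbitrary assumption). */
void simulate_power_on(int ram_fully_discharged)
{
    memset(fake_bss, ram_fully_discharged ? 0x00 : 0xA5, FAKE_BSS_SIZE);
}

/* Returns 1 if the "kernel" would find all of its globals and statics
 * zeroed, 0 if it would see garbage. */
int boot(int ram_fully_discharged, int bootloader_clears_bss)
{
    simulate_power_on(ram_fully_discharged);
    if (bootloader_clears_bss)
        zero_bss(fake_bss, fake_bss + FAKE_BSS_SIZE);

    for (size_t i = 0; i < FAKE_BSS_SIZE; ++i)
        if (fake_bss[i] != 0)
            return 0;
    return 1;
}
```

In this model, `boot(1, 0)` succeeds only by luck (cold RAM), `boot(0, 0)` is the intermittent crash, and `boot(0, 1)` is the behavior after the call to the zeroing code is restored.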



