
If your child processes spawn grandchild processes and then exit or otherwise die, and envkey-source is running as PID 1, then envkey-source will become the parent of those orphaned grandchild processes. When those orphans exit, envkey-source must check those processes' exit statuses. Until then, those process IDs can't be reused, and the processes stick around as zombie processes.

In other words, PID 1 is special, because it may have child processes that it never created, and it needs to be aware of them and handle them properly. Otherwise, you can end up leaking zombie processes.

It sounds like envkey-source isn't aware that it may adopt orphaned child processes. Killing them isn't the main issue. Checking their exit statuses is the main issue.



Thanks, I'll have to look into this more deeply. Currently cleanup is being left to the watched process, but it sounds like more rigorous monitoring of grandchild processes is needed.



Also, after you SIGTERM/SIGKILL your child processes, you check their exit statuses, right? Otherwise your child processes also sit around as zombie processes until you exit; at that point the zombie orphans become children of init, and init properly "reaps" the zombies by checking their exit statuses.


The process management code lives here: https://github.com/envkey/envkey/blob/main/public/sdks/envke...

Basically, on unix systems, the command you pass in to envkey-source is run via:

exec.Command("sh", "-c", c)

(c is the command you passed as a string.)

Stdout/stderr is piped through, and .Wait() is called on the command. If envkey-source is in watch mode, it will send a SIGTERM when the environment is updated, then re-run the process once the initial process has died. I can verify that, for example, if a server listening on ports is restarted in this way, the process will die and the ports will be cleared before the new process is started (this has been well-tested).

Do you see a problem with this approach? We will prioritize making all this bulletproof.


In the short term you could just tell people to use your 'eval' approach, and punt on the issue. :)

Looking at your code, what's missing is a SIGCHLD handler. Basically, your code doesn't know when one of its children dies. You're making an assumption that you know how many children you currently have, based on how many you spawned; but this is misleading due to PID1 semantics re: orphaned processes.

SIGCHLD lets you know that a child process has died. When a SIGCHLD arrives, your program should (must!) call 'waitpid' (or one of its related functions) to wait on the dead child process. You don't need to waitpid inside the signal handler; you just need to make sure that every dead child eventually gets waited on. Several SIGCHLDs arriving close together can be delivered as one, so keep calling waitpid until no exited children are left rather than counting signals.

This is in a different language, but here's a nicely written article about implementing PID1 in Rust:

https://www.fpcomplete.com/rust/pid1/

Someone in Go-land must have written a similar module. Your solution might be an 'import' away.


Thank you. This comment sums up why I love HN! We will improve this ASAP.

Is there somewhere I can ping you once we make the changes? Would be great to be sure we haven't missed anything in your estimation.

Edit: made an issue to track this: https://github.com/envkey/envkey/issues/3


Sure. I've just subscribed to the ticket.

I'm not an expert on the topic. Like you said, we're on HN: there's probably five people here who have written PID1 for an actual Unix. :) But I'm happy to take a look.



