If your child processes spawn grandchild processes and then exit or otherwise di...

danenania · on March 30, 2022

Thanks, I'll have to look into this more deeply. Currently cleanup is being left to the watched process, but it sounds like more rigorous monitoring of grandchild processes is needed.

sciurus · on March 30, 2022

For some examples of what to do, see

https://github.com/Yelp/dumb-init

https://github.com/krallin/tini

KMag · on March 30, 2022

Also, after you SIGTERM/SIGKILL your child processes, you check their exit statuses, right? Otherwise your child processes also sit around as zombie processes until you exit, at which point the zombie orphans become children of init, and init properly "reaps" them by checking their exit statuses.

danenania · on March 30, 2022

The process management code lives here: https://github.com/envkey/envkey/blob/main/public/sdks/envke...

Basically, on unix systems, the command you pass in to envkey-source is run via:

exec.Command("sh", "-c", c)

(c is the command you passed as a string.)

Stdout and stderr are piped through, and .Wait() is called on the command. If envkey-source is in watch mode, it sends a SIGTERM when the environment is updated, then re-runs the command once the initial process has died. I can verify that, for example, if a server listening on ports is restarted in this way, the process will die and the ports will be cleared before the new process is started (this has been well-tested).

Do you see a problem with this approach? We will prioritize making all this bulletproof.

gmfawcett · on March 30, 2022

In the short term you could just tell people to use your 'eval' approach, and punt on the issue. :)

Looking at your code, what's missing is a SIGCHLD handler. Basically, your code doesn't know when one of its children dies. You're making an assumption that you know how many children you currently have, based on how many you spawned; but this is misleading due to PID1 semantics re: orphaned processes.

SIGCHLD lets you know that a child process has died. When you receive SIGCHLD, your program should (must!) call 'waitpid' (or one of its related functions) to wait on the dead child process. You don't need to waitpid inside the signal handler; you just need to make sure that every dead child eventually gets waited on. Several exits can be delivered as a single SIGCHLD, so keep calling waitpid until it reports no more dead children.
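A rough Go sketch of that idea, under some assumptions: the helper names (watchChildren, startDetached, reapAll) are made up for illustration, and the child is started without calling cmd.Wait, because a catch-all wait on pid -1 can steal the exit status that a concurrent cmd.Wait is expecting. Since several child exits can be delivered as one SIGCHLD, the sketch drains every exited child with a non-blocking wait each time the signal arrives rather than counting signals.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// watchChildren (hypothetical helper) subscribes to SIGCHLD.
func watchChildren() chan os.Signal {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGCHLD)
	return sigs
}

// startDetached (hypothetical helper) starts c via the shell and returns
// its pid. It deliberately does not call cmd.Wait; reaping is left to reapAll.
func startDetached(c string) (int, error) {
	cmd := exec.Command("sh", "-c", c)
	if err := cmd.Start(); err != nil {
		return 0, fmt.Errorf("start %q: %w", c, err)
	}
	return cmd.Process.Pid, nil
}

// reapAll collects every child that has already exited, without blocking,
// and returns their wait statuses keyed by pid.
func reapAll() map[int]syscall.WaitStatus {
	reaped := make(map[int]syscall.WaitStatus)
	for {
		var status syscall.WaitStatus
		pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
		if err == syscall.EINTR {
			continue // interrupted by a signal, try again
		}
		if err != nil || pid <= 0 {
			// pid == 0: children exist but none has exited yet.
			// ECHILD: there are no children left at all.
			return reaped
		}
		reaped[pid] = status
	}
}

func main() {
	sigs := watchChildren() // subscribe before starting any child

	pid, err := startDetached("exit 3")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("started child", pid)

	<-sigs // at least one child has changed state
	for p, st := range reapAll() {
		fmt.Printf("reaped pid %d, exit status %d\n", p, st.ExitStatus())
	}
}
```

When the program really is PID 1, orphaned grandchildren are re-parented to it, and the same reapAll loop would pick them up as they exit.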

This is in a different language, but here's a nicely written article about implementing PID1 in Rust:

https://www.fpcomplete.com/rust/pid1/

Someone in Go-land must have written a similar module. Your solution might be an 'import' away.

danenania · on March 30, 2022

Thank you. This comment sums up why I love HN! We will improve this ASAP.

Is there somewhere I can ping you once we make the changes? Would be great to be sure we haven't missed anything in your estimation.

Edit: made an issue to track this: https://github.com/envkey/envkey/issues/3

gmfawcett · on March 30, 2022

Sure. I've just subscribed to the ticket.

I'm not an expert on the topic. Like you said, we're on HN: there's probably five people here who have written PID1 for an actual Unix. :) But I'm happy to take a look.