After many years, I settled on a constraint-based design philosophy:
1. type checking, data marshaling, sanity checks, and object signatures
2. user rate-limits and quota enforcement for access, actions, and API interfaces
3. expected-runtime checks with watchdog timers (every thread has a time limit and a failure-mode handler)
4. controlled periodic runtime restarts (prevents slow leaks from shared libs, or Python pinning all your cores because reasons etc.)
5. regression tests on boundary conditions become the system auditor post-deployment
6. disable multi-core support in favor of n core-bound instances of programs consuming the same queue/channel (there is a long explanation why this makes sense for our use-cases)
7. Documentation is often out of date, but if the v.r.x.y API is still permuting on x or y, then avoid the project like old fish left in the hot sun. Bloat is one thing, but chaotic interfaces are a huge warning sign.
8. The "small modular programs that do one thing well" advice from the *nix crowd also makes absolute sense for large infrastructure. Sure, a monolith is easier in the beginning, but no one person can keep track of millions of lines of commits.
9. Never trust the user (including yourself), and automate as much as possible.
10. A "dead man's switch" that temporarily locks interfaces if certain rules are violated (e.g. host health, code health, or an unexpected reboot in a colo)
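Point 1 can be sketched as a boundary validator that checks types and basic invariants before data enters the system. The record shape and field names here are invented for illustration, not taken from the original post:

```python
# Validate types and sanity-check invariants at the boundary, so bad
# data is rejected before it propagates. Field names are hypothetical.
def sanity_check_order(order: dict) -> dict:
    if not isinstance(order, dict):
        raise TypeError(f"expected dict, got {type(order).__name__}")
    if not isinstance(order.get("sku"), str) or not order["sku"]:
        raise ValueError("sku must be a non-empty string")
    if not isinstance(order.get("qty"), int) or order["qty"] <= 0:
        raise ValueError("qty must be a positive int")
    return order  # passes through unchanged once validated
```

The same checks can be applied again when the object crosses the next interface; the cost is trivial next to debugging silent corruption later.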
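For point 2, a token bucket is one common way to enforce per-user rate limits; this is a minimal sketch with made-up parameters, not the author's actual quota system:

```python
# Token-bucket rate limiter: tokens refill at a fixed rate up to a burst
# capacity; each request spends a token or is rejected.
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

One bucket per user (or per API key, per action type) gives independent quotas for access, actions, and API calls.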
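Point 3 (every thread gets a time limit and a failure-mode handler) might look roughly like this; the function and handler names are assumptions for the sketch:

```python
# Run a worker under a wall-clock watchdog: if it does not finish within
# timeout_s, invoke the failure-mode handler (log, alert, schedule restart).
import threading

def run_with_watchdog(worker, timeout_s: float, on_timeout) -> None:
    done = threading.Event()

    def wrapped():
        try:
            worker()
        finally:
            done.set()

    t = threading.Thread(target=wrapped, daemon=True)
    t.start()
    if not done.wait(timeout_s):
        on_timeout()  # failure-mode handler fires; daemon thread is abandoned
```

A real deployment would also record which thread tripped the watchdog, since repeated trips are exactly the signal the periodic-restart rule (point 4) acts on.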
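Point 4 in miniature: a supervisor loop that bounds each worker's lifetime and restarts it, so slow leaks never accumulate. This subprocess sketch is illustrative only; in production you would lean on systemd, runit, or your orchestrator's equivalent:

```python
# Supervisor that re-runs a command, terminating it once its lifetime is
# reached (controlled restart). Returns the exit codes observed.
import subprocess

def supervise(cmd, lifetime_s: float, max_cycles: int):
    codes = []
    for _ in range(max_cycles):
        proc = subprocess.Popen(cmd)
        try:
            proc.wait(timeout=lifetime_s)   # worker exited on its own
        except subprocess.TimeoutExpired:
            proc.terminate()                # lifetime reached: restart it
            proc.wait()
        codes.append(proc.returncode)
    return codes
```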
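Point 6 can be sketched with n single-purpose worker processes all consuming one shared queue, instead of one multi-threaded process. The worker logic here is a stand-in, and actual core pinning (e.g. `os.sched_setaffinity`) is Linux-only, so it is omitted:

```python
# n worker processes drain the same queue; a None sentinel per worker
# signals clean shutdown. Squaring stands in for real work.
import multiprocessing as mp

def worker(q, results):
    while True:
        item = q.get()
        if item is None:          # sentinel: shut down cleanly
            break
        results.put(item * item)

def run(n_workers: int, items):
    q, results = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(q, results)) for _ in range(n_workers)]
    for p in procs:
        p.start()
    for item in items:
        q.put(item)
    for _ in procs:
        q.put(None)               # one sentinel per worker
    for p in procs:
        p.join()
    return sorted(results.get() for _ in items)
```

Each process stays simple and single-threaded, crashes are isolated, and scaling is just changing n.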
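And point 10 as a toy: an interface that locks itself when any health rule fails and stays locked until a human re-arms it. The rule shape is invented for illustration:

```python
# Dead man's switch: any failed rule locks the interface; unlocking
# requires a deliberate operator action (rearm), never an automatic retry.
class DeadMansSwitch:
    def __init__(self, rules):
        self.rules = rules      # zero-arg callables returning True if healthy
        self.locked = False

    def check(self) -> bool:
        if not all(rule() for rule in self.rules):
            self.locked = True
        return not self.locked  # False means the interface is locked

    def rearm(self) -> None:
        self.locked = False
```

The important property is the asymmetry: tripping is automatic, recovery is not, which forces a human to look at why host or code health degraded.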
As a side note, assuming anyone could keep up with the ecosystem of library changes across a large monolith is silly.
Good code, in my opinion, is something so reliable you don't have to touch it again for 5 years. Such designs should not require human maintenance to remain operational.
There is a strange beauty in simple, efficient designs, rather than staring at something that has obviously forgotten its original purpose:
https://en.wikipedia.org/wiki/File:Giant_Knife_1.jpg
https://en.wikipedia.org/wiki/Second-system_effect
Good luck, and have a wonderful day =3