Personally I like to treat the Git in GitOps as an implementation detail. The underlying principle of GitOps is that your entire environment’s setup and config is written in code and version controlled. So you can, in theory, pick any version of your entire env, throw it at a blank slate, and reliably get the environment specified by that Git hash.
Then there’s the constant reconciliation between your version-controlled env specification and the actual env, and how you automatically resolve differences. The most important principle is that the version-controlled code/config is the absolute truth, and something needs to figure out how to bend the world to match.
But importantly in all of this, Git isn’t that important. Version control is important, infrastructure as code is important, but Git isn’t. Arguably Git isn’t a great tool for GitOps due to issues like the ones you mention. But the huge ecosystem around Git makes the pain worth it.
I would argue the “correct” solution to your problem is a tool that automatically creates the correct cherry-picks and reverts for you, based on a request to roll back application X.
Treat Git as a dumb version control system, and broadly ignore “good practice”, because a lot of those good practices are designed for software development, not infrastructure development. We need to develop new working practices, built on top of Git’s fundamental components, rather than trying to rationalise existing working practices against the new problems that appear in GitOps.
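To make that concrete, here is a minimal sketch of what such a tool might compute. It assumes each app's config lives under its own directory (something like `apps/app-a/`) and that the commit history has already been loaded as data. The `Commit` type, the `plan_rollback` function and the directory layout are all made up for illustration; the only real Git command involved is `git revert --no-commit`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    sha: str
    paths: tuple[str, ...]  # files touched by the commit


def plan_rollback(history: list[Commit], app_dir: str, target_sha: str) -> list[str]:
    """Return git commands that undo app_dir's changes made after target_sha.

    history is ordered oldest to newest. Commits that touch only other
    apps are skipped, so those apps keep their current state.
    """
    prefix = app_dir.rstrip("/") + "/"
    shas = [c.sha for c in history]
    if target_sha not in shas:
        raise ValueError(f"unknown commit: {target_sha}")
    newer = history[shas.index(target_sha) + 1:]

    commands = []
    # Walk newest first, so later changes are undone before earlier ones.
    for commit in reversed(newer):
        touched_app = [p for p in commit.paths if p.startswith(prefix)]
        if not touched_app:
            continue
        if len(touched_app) != len(commit.paths):
            # Mixed commit: reverting it would also roll back other apps.
            raise ValueError(f"commit {commit.sha} touches more than {app_dir}")
        commands.append(f"git revert --no-commit {commit.sha}")
    return commands
```

The sketch refuses to handle commits that mix several apps, which is one argument for keeping one app per commit in the config repo.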
> So you can, in theory, pick any version of your entire env, throw it at a blank slate, and reliably get the environment specified by that Git hash.
The trap here is that this only works for stateless infrastructure. If you do it with stateful resources, you'll lose all data. Your GitOps tool will happily recreate EC2 instances, S3 buckets and RDS instances, all empty/initialized to whatever you defined.
There are two different thoughts here, I think. If you're using GitOps in Kubernetes, then the application and its setup (Pods) aren’t tied to Nodes (EC2), and both can be torn down and rebuilt without state issues. When state is required, PVCs and StatefulSets come into play.
For managed services like S3 and RDS, there are other GitOps tools like Crossplane.io which you can use for similar GitOps management. But the paradigm shift might also be that you add GitOps config to perform regular backups, and also add config to ensure that if it is being recreated, it restores from a backup.
> When state is required, PVCs and StatefulSets come into play.
And that's the problematic part. With GitOps you manage the infrastructure, which in the case of PVCs means the PVC manifests. You need to manage the data separately.
But even if you exclude the data, some PVC manifest changes, such as a size change, can be tricky. Some properties are also immutable (storage class, access modes, etc.), so you cannot change them without recreating the PVC.
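A rough sketch of the check a pipeline could run before applying a PVC change. The `classify_pvc_change` function is hypothetical, and the list of immutable fields only covers the two named above, so treat it as incomplete and check it against your Kubernetes version.

```python
# Assumption: only the fields mentioned above; not an exhaustive list.
IMMUTABLE_FIELDS = ("storageClassName", "accessModes")


def _requested_storage(spec: dict):
    """Read spec.resources.requests.storage, or None if it is not set."""
    return spec.get("resources", {}).get("requests", {}).get("storage")


def classify_pvc_change(old_spec: dict, new_spec: dict) -> str:
    """Return "none", "resize", or "recreate" for a change to a PVC spec."""
    for field in IMMUTABLE_FIELDS:
        if old_spec.get(field) != new_spec.get(field):
            # The change cannot be applied to the existing PVC.
            return "recreate"
    if _requested_storage(old_spec) != _requested_storage(new_spec):
        # Growing needs a storage class that allows volume expansion;
        # shrinking is not supported.
        return "resize"
    return "none"
```

A "recreate" result is the signal that someone has to think about the data before the change is merged.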
You can decide you want to avoid the problem completely and store your data outside of Kubernetes. Now you have two problems...
If you use GitOps to manage apps, you'd better isolate them somehow, for example by putting each app in a different directory. That way, a revert for App A wouldn't cause problems for B and C.
But frankly, GitOps works best with stateless apps. Managing stateful apps is possible but you need to take care of state yourself.
This one is easy. I say this in spite of the spate of different answers that say otherwise... it should be easy?
You version your apps, of course, and you publish some artifact that represents the release. Historically this has been a Helm chart, but for Flux we are seeing many people use OCI repositories for this now. They give many of the benefits of Helm repositories without most of the drawbacks, and the way Flux uses them you retain traceability to the Git commit that started your release. Even Helm itself has already adopted OCI repositories in the current version (we are just waiting for many chart publishers to catch up, and we are getting there!).
The app manages its own manifests in the app repo. The app devs deploy from the main branch or a dev branch on their own app repo, but everyone else who uses the app deploys from a release artifact. Those artifacts are tagged with semver numbers, so you can automatically move to the next version of the app as soon as it's published with a valid signature.
If your app devs are the only ones using the app, this should not change anything: they are building for production, so the app should be versioned and managed like any production concern. Whether it's for distribution or not, you still do releases.
It's not any more complicated than what you are already doing with `docker build` and `docker push`, I assure you; it's nearly the same. And since those OCI manifest release tags all logically come from a Git tag, there's traceability, and it is still GitOps in every important sense of the word.
Automation is expressed as policy: a directive states declaratively that an app is always on the latest published version at any given time. A semver range with a wildcard (`spec.ref.semver` on a Flux source such as an `OCIRepository`) accomplishes this very simply, with a one-line addition to your config in Flux.
When you need to roll back app A, you remove the automation (in Flux, that is simply the wildcard) and pin your GitOps config for that one app, in the cluster repo in Git, to the particular version that doesn't have the issue. Once the issue is resolved, you remove the pin and put the automation back in place.
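The difference between "follow the latest release" and "pinned" can be sketched in a few lines. This is not how Flux implements it, just an illustration: `resolve` and `parse` are hypothetical helpers, only plain `MAJOR.MINOR.PATCH` versions are handled, and the only wildcard understood is `*`.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split "1.10.0" into (1, 10, 0) so versions compare numerically."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def resolve(published: list[str], constraint: str) -> str:
    """Pick the version to deploy.

    constraint is "*" (always follow the latest release) or an exact
    version such as "1.4.2" (a pin set during incident response).
    """
    if constraint == "*":
        return max(published, key=parse)
    if constraint not in published:
        raise ValueError(f"pinned version {constraint} was never published")
    return constraint
```

Rolling back app A is then a one-line change of its constraint from `*` to the last good version, and rolling forward is the reverse change, with every other app left untouched.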
As an added benefit, you get a permanent history that shows when incident response began, how long the automation was disabled, and what version was pinned at that time, so you can calculate metrics like "MTTR" and "MTBF" that we love so much.
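For example, if you extract the timestamps of the "pin" and "unpin" commits from that history, MTTR is just the average gap between them. A small sketch, assuming the pairs have already been collected (how you pull them out of the commit log is left open, and `mean_time_to_recovery` is a hypothetical helper):

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average the pinned-to-unpinned duration across incidents.

    Each incident is a (pinned_at, unpinned_at) pair of commit timestamps.
    """
    if not incidents:
        raise ValueError("no incidents recorded")
    total = sum((unpinned - pinned for pinned, unpinned in incidents), timedelta())
    return total / len(incidents)
```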
Imagine 10 apps deployed, all of them actively deployed, let's say a few times a day.
You want to go back 10 days for App A. But in doing so you would revert the whole state, with all apps as they were 10 days ago.
The only way is to cherry-pick the particular commits and revert them.
No? I mean, how can Git be useful for rolling single components back and forth?