They have so many of the pieces: spinning up and changing the infrastructure (Terraform), connecting the services (Consul), running the apps (Nomad), but no way of their own to tell you how well it is all doing. Also, monitoring is quite sticky and high margin. I think it makes sense, but I have no special insight.
I'm not so sure. There's very little wrong with Prometheus or Influx + Grafana. We're about due another iteration of logging tooling though, now that we've all gone Graylog -> Splunk -> ELK -> Loki. (And they all suck.)
The thing with Netdata is that you don't have to compromise on the number of metrics or on cardinality, as the data stays with the node. Netdata.cloud can aggregate on the fly without storing anything. Check it out.
That's why we put the CPUThrottlingHigh alert into the kubernetes-mixin project. It at least lets folks know. The Node Exporter, for example, is always throttled and I don't mind. For the user-facing parts I'd rather not be in the same situation. Ultimately latency should tell me, though.
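To make the idea concrete, here is a rough sketch of the kind of check a throttling alert performs: compare the number of throttled CFS periods against the total number of CFS periods over some window. This is not the mixin's actual rule (that one is a PromQL expression); the function names are made up, and the 25% default threshold is an assumption for illustration.

```python
def throttled_ratio(throttled_periods: float, total_periods: float) -> float:
    """Fraction of CFS scheduling periods in which the container was throttled."""
    if total_periods <= 0:
        # No periods observed in the window: treat as not throttled.
        return 0.0
    return throttled_periods / total_periods


def should_alert(throttled_periods: float, total_periods: float,
                 threshold: float = 0.25) -> bool:
    """True when the throttled fraction exceeds the (assumed) threshold."""
    return throttled_ratio(throttled_periods, total_periods) > threshold
```

The inputs would typically be the increase of the throttled-periods and total-periods counters over the same window, so the ratio describes recent behaviour rather than the container's whole lifetime.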
Prometheus supports writing (replicating) data to a remote endpoint on a per-scrape basis, using a protocol called remote-write.
You can pretty easily set that up on any Prometheus instance.
There are quite a few implementations that can receive those remote-write requests: https://prometheus.io/docs/operating/integrations/#remote-en...
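For a feel of what the receiving side looks at first, here is a minimal sketch of a header check. As far as I understand the remote-write specification, requests carry a snappy-compressed protobuf body and announce that in their headers; the exact header values below reflect that understanding and should be checked against the spec. The function name is hypothetical, and the sketch deliberately does not decode the body.

```python
# Header names are compared case-insensitively, as HTTP requires.
REQUIRED_HEADERS = {
    "content-encoding": "snappy",
    "content-type": "application/x-protobuf",
}
VERSION_HEADER = "x-prometheus-remote-write-version"


def check_remote_write_headers(headers: dict) -> list:
    """Return a list of problems; an empty list means the headers look right."""
    normalized = {k.lower(): v.strip().lower() for k, v in headers.items()}
    problems = []
    for name, expected in REQUIRED_HEADERS.items():
        actual = normalized.get(name)
        if actual != expected:
            problems.append(f"{name}: expected {expected!r}, got {actual!r}")
    if VERSION_HEADER not in normalized:
        problems.append(f"{VERSION_HEADER}: missing")
    return problems
```

A real receiver would go on to decompress the body and decode the protobuf write request, which needs libraries outside the standard library.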
That's probably exactly what you're looking for. In fact, I gave a talk about a similar scenario at KubeCon San Diego: https://www.youtube.com/watch?v=FrcfxkbJH20
Disclosure: I work on Thanos and Thanos Receiver, which implements that protocol.
You want to start measuring as close to your users as possible. In most cases that would be some sort of load balancer. I don't think there's much more we can do without going to the client side.
The problem with kustomize for us monitoring / observability people is that it is limited to templating Kubernetes manifests. However, we also want to template a lot of other things, for example Prometheus configuration. Jsonnet can bridge the gap between the two worlds and, in the end, generate a ConfigMap YAML file that embeds another YAML file, the Prometheus configuration, as an example.
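The "document inside a document" idea can be sketched in a few lines. This is not jsonnet and not how kube-prometheus does it; it is just an illustration in Python, with hypothetical names and values throughout. The standard library has no YAML serializer, so the sketch emits JSON, which is generally also valid YAML, for both the inner config and the outer manifest.

```python
import json


def prometheus_config(scrape_interval: str, targets: list) -> dict:
    """Build a tiny Prometheus config as plain data (illustrative values)."""
    return {
        "global": {"scrape_interval": scrape_interval},
        "scrape_configs": [
            {"job_name": "example", "static_configs": [{"targets": targets}]},
        ],
    }


def config_map(name: str, namespace: str, config: dict) -> dict:
    """Wrap the config in a ConfigMap, serialized as a string under data."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        # The inner document is a string value of the outer document.
        "data": {"prometheus.yaml": json.dumps(config, indent=2)},
    }


manifest = config_map("prometheus-config", "monitoring",
                      prometheus_config("30s", ["localhost:9090"]))
print(json.dumps(manifest, indent=2))
```

The point is that both layers are generated from the same data structures, so the Prometheus config can be parameterized with the same tooling as the Kubernetes manifest around it.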
Maintainer of jsonnet-bundler, kube-prometheus and some monitoring mixins, which are all based on jsonnet, here:
Currently we're keeping a close eye on CUE, but not really using it as of right now. However, during the holiday break I've been trying to get into CUE again, and there are some things I need to figure out before I can tell how to incorporate CUE into some of our jsonnet projects, or replace them with it, if we really want that.
Some parts of CUE seem like an obvious improvement over what jsonnet currently offers. So 2020 will be exciting in that regard.
It's quite simple to get started on the Go backend side of things, and the frontend isn't too complicated either.