Risk analysis and security compliance in Kube-prometheus
Kube-prometheus is an open-source project written in Jsonnet, which bundles Kubernetes manifests, Grafana dashboards, and Prometheus rules combined with documentation and scripts to provide easy-to-operate, end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator. Prometheus-Operator and Kube-prometheus have grown quite popular in the Kubernetes monitoring space and are used by several companies across the world.
With that in mind, we want kube-prometheus to be a secure application by default. To accomplish this, on Jan 17th we introduced manifest scanning in our CI with Kubescape. Kubescape is an open-source Kubernetes tool that provides risk analysis and security compliance checks via CLI, backed by great threat documentation. It made it simple for us to identify possible threats in our configuration and tackle them one by one.
In this blog post I’ll show where we stood when we first introduced security scanning to our CI, the actions we took to reduce threat risk, and what level of security we managed to achieve:
The Initial Scanning
The initial result can be seen below:
The first thing to notice is that a lot of tests were skipped. This happens because Kubescape can scan a running Kubernetes cluster in addition to YAML manifests. Since kube-prometheus does not represent a running cluster, our CI only scans our generated YAML manifests.
Now, for the non-skipped tests:
Passed tests
- Allowed hostpath — Risk score: 0%
- Application credentials in configuration files — Risk score: 0%
- Cluster-admin binding — Risk score: 0%
- Exec into container — Risk score: 0%
- Insecure capabilities — Risk score: 0%
- Non-root containers — Risk score: 0%
- Privileged container — Risk score: 0%
- Resource policies — Risk score: 0%
Partially or completely failed tests
- Allow privilege escalation — Risk score: 100%
- Automatic mapping of service account — Risk score 100%
- Container hostPort — Risk score: 16%
- Host PID/IPC privileges — Risk score: 16%
- HostNetwork access — Risk score: 16%
- Immutable container filesystem — Risk score: 100%
- Ingress and Egress blocked — Risk score: 100%
- Linux hardening — Risk score: 100%
Although all failed tests do represent security threats, some of them cannot be fixed due to the nature of our components, forcing us to create security exceptions.
As shown in the image above, we started with an overall 33.39% security risk. I’ll go over every failed test, and the efforts made by the kube-prometheus team to solve them!
Allow privilege escalation
Vulnerability documentation: https://hub.armo.cloud/docs/c-0016
Looking at Kubernetes documentation (Jan 2022), we see the following:
AllowPrivilegeEscalation: Controls whether a process can gain more privileges than its parent process. This bool directly controls whether the no_new_privs flag gets set on the container process. AllowPrivilegeEscalation is true always when the container is: 1) run as Privileged OR 2) has CAP_SYS_ADMIN.
Since “AllowPrivilegeEscalation is true always when the container is: 1) run as Privileged OR 2) has CAP_SYS_ADMIN”, we believe the opposite holds as well: if a container is not running as Privileged and doesn’t have the CAP_SYS_ADMIN capability, then AllowPrivilegeEscalation is effectively false. Kubescape doesn’t care, though; it asks us to explicitly set this field to false anyway. None of the components used by kube-prometheus run as privileged or carry this risky Linux capability, so we can set allowPrivilegeEscalation to false without worry.
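In manifest terms, the fix is a one-line addition to each container’s security context. A minimal sketch (the container name and image below are placeholders, not the exact kube-prometheus manifests):

```yaml
# Illustrative container spec; name and image are placeholders.
containers:
  - name: example-exporter
    image: example.org/exporter:latest
    securityContext:
      allowPrivilegeEscalation: false # explicit, so scanners can verify it
```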
Due to the nature of our project, we build our own configuration for some components, but we also import configuration from other projects.
Here are the involved efforts to eliminate this vulnerability:
Automatic mapping of service account
Vulnerability documentation: https://hub.armo.cloud/docs/c-0034
Again reading the Kubernetes documentation: if automountServiceAccountToken is not set to false, the service account token will be mounted into the container. Such a token may provide access to the Kubernetes API, which could be harmful in case of an attack.
Some of our components do need to talk to the Kubernetes API, though, so we cannot disable this field for every component:
- Prometheus-Adapter — Talks with API server to collect info from nodes, pods and the metrics.k8s.io API
- Prometheus-Operator, Kube-state-metrics and Blackbox-exporter — Not because of what those components do themselves, but because of the kube-rbac-proxy sidecar that runs alongside all three of them. Kube-rbac-proxy is added to mitigate another security risk.
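For every component that never calls the Kubernetes API, the mitigation is a single field on the pod spec. A sketch, with placeholder names rather than the actual kube-prometheus manifests:

```yaml
# Sketch of a pod that never talks to the Kubernetes API (names are placeholders).
apiVersion: v1
kind: Pod
metadata:
  name: example-exporter
spec:
  automountServiceAccountToken: false # no service account token mounted
  containers:
    - name: exporter
      image: example.org/exporter:latest
```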
Container hostPort
Vulnerability documentation: https://hub.armo.cloud/docs/c-0044
It was quite hard to decide whether we could disable this one or not. Looking at the vulnerability docs, hostPort is dangerous because it binds a host port to the container, and it can cause scheduling problems on big clusters.
For kube-prometheus there is no strong reason to have it enabled, and we almost removed it completely in this PR, but while testing we noticed that hostPort was still there.
It turns out that Kubernetes always sets a hostPort if the pod runs with hostNetwork access. Since we can’t remove hostNetwork from node-exporter, we can’t remove this vulnerability either. We focused on documenting it better and adding an exception to the Kubescape config.
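The behaviour can be seen in a pod spec like the simplified sketch below (illustrative, not the exact kube-prometheus node-exporter manifest):

```yaml
# Simplified node-exporter-like pod spec (illustrative).
spec:
  hostNetwork: true # needed to collect the node's network statistics
  containers:
    - name: node-exporter
      image: quay.io/prometheus/node-exporter:latest
      ports:
        - containerPort: 9100
          # Even if omitted, the API server defaults hostPort to
          # containerPort whenever hostNetwork is true:
          hostPort: 9100
```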
Host PID/IPC privileges
Vulnerability documentation: https://hub.armo.cloud/docs/c-0038
HostPID and HostIPC may give the container direct access to the host and should be avoided unless extremely necessary. Unfortunately, node-exporter depends on hostPID to be able to gather statistics from the host.
Our action here was simply to document it and add an exception to Kubescape’s config.
HostNetwork access
Vulnerability documentation: https://hub.armo.cloud/docs/c-0041
hostNetwork controls whether the pod may use the node’s network namespace. Doing so gives the pod access to the loopback device and to services listening on localhost, and could be used to snoop on the network activity of other pods on the same node.
Just like the previous vulnerability, this one cannot be removed, since node-exporter depends on host network access to gather network statistics from the node.
We’ve added an exception to Kubescape’s config.
Immutable container filesystem
Vulnerability documentation: https://hub.armo.cloud/docs/c-0017
By not setting readOnlyRootFilesystem to true at the container level, we allow potential attackers to download and run malicious files inside our containers, while most of our components don’t need any write access at all. If a container does need write access, remediation can be done by granting write access only to a volume mount specific to the path the application needs.
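The remediation pattern looks roughly like this (component name, image and path are illustrative, not the exact kube-prometheus manifests):

```yaml
# Sketch: read-only root filesystem with one writable path (names are illustrative).
containers:
  - name: example-component
    image: example.org/component:latest
    securityContext:
      readOnlyRootFilesystem: true # root FS is immutable
    volumeMounts:
      - name: tmp
        mountPath: /tmp # the only writable path the application needs
volumes:
  - name: tmp
    emptyDir: {}
```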
And oof, after trying to roll this out we got reports of broken components opened by our community, with a PR promptly opened for a quick fix 😄.
Ingress and Egress blocked
Vulnerability documentation: https://hub.armo.cloud/docs/c-0030
The biggest challenge here was actually configuring our CI to test with NetworkPolicies enabled. Our end-to-end tests deploy our stack into a KinD cluster and hit the Prometheus API to check that the stack is working as expected; one of those checks makes sure that Prometheus is able to scrape all components of kube-prometheus. Since misconfigured NetworkPolicies can definitely break the stack, we wanted to make sure our CI covers them as well.
Although Kubernetes accepts NetworkPolicy resources, it won’t do anything with them unless a network plugin capable of handling them is installed. We went with Calico, after reading this comment on an issue requesting NetworkPolicy support in the kubernetes-sigs/kind GitHub repository.
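As an illustration of the kind of policy involved, a NetworkPolicy admitting only scrape traffic from Prometheus might look like the sketch below (the policy name, labels and port are placeholders, not the actual kube-prometheus policies):

```yaml
# Hypothetical policy: only Prometheus pods may reach node-exporter's metrics port.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape # placeholder name
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: node-exporter
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: prometheus
      ports:
        - port: 9100
          protocol: TCP
```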
Linux hardening
Vulnerability documentation: https://hub.armo.cloud/docs/c-0055
This one can be quite controversial for users of kube-prometheus. It can be easily remediated by adding a default seccomp profile to the container’s security context, for example:

```yaml
- name: sec-ctx-demo
  command: [ "sh", "-c", "sleep 1h" ]
  securityContext:
    seccompProfile:
      type: RuntimeDefault # Set a default seccomp profile
```
While definitely more secure than running containers with the Unconfined seccomp profile, relying on RuntimeDefault means that our components probably get access to more Linux capabilities than necessary, depending on the container runtime used.
To remove this dependency on container-runtime defaults, we can make sure that all components have only what is really needed by explicitly declaring which Linux capabilities each component gets. And the truth is, we can drop all capabilities for every component except two:
- Node-exporter — May require CAP_SYS_TIME if timex collector is enabled.
- Blackbox-exporter — Requires NET_RAW if doing ICMP probes.
Everything else can just be dropped:
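In manifest form, “dropping everything” and selectively adding back a capability looks roughly like this (a sketch of the securityContext fragments, abbreviated rather than the full kube-prometheus manifests):

```yaml
# Default for most components: drop every Linux capability.
securityContext:
  capabilities:
    drop: ["ALL"]
---
# blackbox-exporter sketch: drop everything, add back only NET_RAW for ICMP probes.
securityContext:
  capabilities:
    drop: ["ALL"]
    add: ["NET_RAW"]
```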
With our kube-prometheus maintainer hats on, there is no point going beyond manifest scanning during CI. Every user of our project is responsible for keeping security compliance after the stack is deployed.
That said, we know that only scanning manifests during CI is not enough to guarantee low risk scores. A common pattern in Kubernetes is the use of CRDs and operators that create and manage new resources as custom resources get applied to the cluster. Kube-prometheus uses the prometheus-operator to manage the Prometheus, Alertmanager, Thanos-sidecar and Config-reloader containers, which are only present after our manifests are applied to a running cluster.
To continue our tests, we deployed kube-prometheus and the Armo cluster components into a KinD cluster so we could get a deeper look at vulnerabilities that couldn’t be caught by CI.
As expected, resources managed by the operator also needed some love.
There is also another PR to enable readOnlyRootFilesystem on the Prometheus container, which turned out to be trickier than expected. Users of the prometheus-operator can use the QueryLogFile field to enable the Prometheus query log feature. Users can set any path, and the volume/volumeMount must be set accordingly, but there are two caveats:
- Users can mount an extra persistentVolume to the queryLog path, which means we don’t want to create an extra volume for it.
- Users can set the queryLog path to /dev/stdout, which will make the logs available to the container’s stdout. Another possibility where we don’t want an extra volume.
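For reference, the field in question lives on the Prometheus custom resource managed by the prometheus-operator. A minimal sketch (the resource name is illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s # illustrative resource name
spec:
  # Either a filesystem path (which needs a writable volume once the
  # root filesystem is read-only)...
  # queryLogFile: /var/log/prometheus/query.log
  # ...or stdout, where no extra volume should be created:
  queryLogFile: /dev/stdout
```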
At the time when this blog was written, the solution for the problem above is still under discussion.
If you want to know more about what we’re doing, or even participate in making this project better, feel free to join us on different communication channels: