In the previous post, Boost Your EKS Efficiency: Migrating from Cluster-Autoscaler to Karpenter, we added Karpenter to our AWS EKS cluster, created via Terraform and reachable through a private VPN.
With that in place, we can deploy our applications and scale them in the most appropriate way.
However, we also need to monitor those applications and the infrastructure they use.
So what is the fastest and most efficient way to monitor a Kubernetes cluster and its nodes, both Linux and Windows?
The kube-prometheus stack was created precisely to provide a complete, ready-made solution that includes everything needed to monitor your Kubernetes cluster.
In one go we install not only Prometheus and Grafana but also other required software, such as the exporter for Windows nodes, and we get all the configurations, dashboards and alerts already prepared by a specialized community.
In this post we will see how to install kube-prometheus on our test infrastructure via Terraform in the fastest way possible, so that we are ready to monitor our microservices.
It is possible to install kube-prometheus via its Helm chart, and therefore also via Terraform thanks to the Terraform Helm provider.
We will use a ready-made Terraform module built for this purpose, which we can find on the Terraform Registry under the name “prometheus-stack”.
Resuming the project of the previous post, migrating-from-cluster-autoscaler-to-karpenter, we remove the installation of cluster-autoscaler, keep the installation of Karpenter, and add the installation of kube-prometheus via the module mentioned above.
module "kube_prometheus_stack" {
source = "sparkfabrik/prometheus-stack/sparkfabrik"
version = "4.0.0"
count = local.kube_prometheus_enabled ? 1 : 0
prometheus_stack_chart_version = "66.3.0"
namespace = "kube-prometheus-stack"
prometheus_stack_additional_values = ["${file("${path.module}/files/kube-prometheus-stack-additional-values.yaml")}"]
}
Here the kube-prometheus Helm chart values override file is used to indicate which nodes the stack's pods can be scheduled on and to enable Windows node monitoring.
alertmanager:
  alertmanagerSpec:
    tolerations:
      - key: "type"
        operator: "Equal"
        value: "service"
        effect: "NoSchedule"
prometheusOperator:
  admissionWebhooks:
    patch:
      tolerations:
        - key: "type"
          operator: "Equal"
          value: "service"
          effect: "NoSchedule"
  tolerations:
    - key: "type"
      operator: "Equal"
      value: "service"
      effect: "NoSchedule"
prometheus:
  prometheusSpec:
    tolerations:
      - key: "type"
        operator: "Equal"
        value: "service"
        effect: "NoSchedule"
thanosRuler:
  thanosRulerSpec:
    tolerations:
      - key: "type"
        operator: "Equal"
        value: "service"
        effect: "NoSchedule"
grafana:
  tolerations:
    - key: "type"
      operator: "Equal"
      value: "service"
      effect: "NoSchedule"
kube-state-metrics:
  tolerations:
    - key: "type"
      operator: "Equal"
      value: "service"
      effect: "NoSchedule"
windowsMonitoring:
  enabled: true
In the eks.tf file we also add the gp3 storage class, marked as default, which is needed by kube-prometheus as well as by our microservices.
resource "kubernetes_storage_class" "gp3" {
count = local.gp3_storage_class_enabled ? 1 : 0
metadata {
name = "gp3"
annotations = {
"storageclass.kubernetes.io/is-default-class" : true
}
}
storage_provisioner = "ebs.csi.aws.com"
reclaim_policy = "Delete"
allow_volume_expansion = true
volume_binding_mode = "WaitForFirstConsumer"
}
We are ready to create our test infrastructure by adding kube-prometheus for Kubernetes monitoring.
The project related to this post can be found here: monitoring-eks-with-kube-prometheus
The procedure is very similar to the one used in the previous post, but in this case we will also enable the deployment of kube-prometheus and add a Karpenter node pool for microservices that also use Windows nodes.
Creating VPC, VPN, and EKS
First of all, let’s create our infrastructure, including EKS and private VPN to access it.
To do this, simply set your IP in vpn_allowed_cidr_blocks and set the Karpenter, gp3 storage class, and kube-prometheus flags to false in the locals.tf file, then run terraform init and terraform apply.
karpenter_enabled = false
gp3_storage_class_enabled = false
kube_prometheus_enabled = false
vpn_allowed_cidr_blocks = ["your_ip/32"]
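Then initialize the project and apply:
terraform init
terraform apply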
Connecting to your EKS
Get the ovpn file with the following command to connect via your private VPN to your EKS cluster.
terraform output -raw ec2_client_vpn_configuration
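For example, we can save the configuration to a file and connect with the OpenVPN client (assuming an OpenVPN CLI is available on your machine):
terraform output -raw ec2_client_vpn_configuration > client.ovpn
sudo openvpn --config client.ovpn  # requires the OpenVPN client installed locally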
Add the cluster to your kubeconfig:
aws eks update-kubeconfig --name test-kube-prometheus --region us-east-1
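Once connected, a quick check confirms the cluster is reachable:
kubectl get nodes -o wide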
Deploy Karpenter
We log in to the public registry that hosts the Karpenter Helm chart, enable the Karpenter deployment and the default gp3 storage class in locals.tf, and then run terraform apply again.
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws
karpenter_enabled = true
gp3_storage_class_enabled = true
kube_prometheus_enabled = false
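After terraform apply completes, we can verify that the gp3 storage class exists and is marked as the default:
kubectl get storageclass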
Deploy the kube-prometheus stack
Once Karpenter is installed, we apply to our EKS cluster the resources needed to have not only a node pool for workloads, as in the previous post, but also a node pool dedicated to service software such as Prometheus and Grafana.
You can find these resources in the file “karpenter-node-pools-linux.yaml” inside the “k8s-tests” folder.
We can then enable the flag for the kube-prometheus deployment and run terraform apply.
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: workloads
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  role: Karpenter-test-kube-prometheus
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: test-kube-prometheus
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: test-kube-prometheus
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
        iops: 10000
        deleteOnTermination: true
        throughput: 125
  tags:
    Test: test-kube-prometheus
    NodeGroup: workloads
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: workloads
spec:
  template:
    metadata:
      labels:
        nodegroup: "workloads"
    spec:
      nodeClassRef:
        name: workloads
        group: karpenter.k8s.aws
        kind: EC2NodeClass
      taints:
        - key: type
          value: workloads
          effect: NoSchedule
      requirements:
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["m"]
        - key: "karpenter.k8s.aws/instance-cpu"
          operator: In
          values: ["4", "8"]
        - key: "karpenter.k8s.aws/instance-hypervisor"
          operator: In
          values: ["nitro"]
        - key: "karpenter.k8s.aws/instance-generation"
          operator: Gt
          values: ["2"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
  limits:
    cpu: 500
    memory: 500Gi
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: service
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  role: Karpenter-test-kube-prometheus
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: test-kube-prometheus
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: test-kube-prometheus
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
        iops: 10000
        deleteOnTermination: true
        throughput: 125
  tags:
    Test: test-kube-prometheus
    NodeGroup: service
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: service
spec:
  template:
    metadata:
      labels:
        nodegroup: "service"
    spec:
      nodeClassRef:
        name: service
        group: karpenter.k8s.aws
        kind: EC2NodeClass
      taints:
        - key: type
          value: service
          effect: NoSchedule
      requirements:
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c", "m", "r"]
        - key: "karpenter.k8s.aws/instance-cpu"
          operator: In
          values: ["4", "8", "16", "32"]
        - key: "karpenter.k8s.aws/instance-hypervisor"
          operator: In
          values: ["nitro"]
        - key: "karpenter.k8s.aws/instance-generation"
          operator: Gt
          values: ["2"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
  limits:
    cpu: 500
    memory: 500Gi
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
kubectl apply -f karpenter-node-pools-linux.yaml
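We can check that the node classes and node pools have been registered:
kubectl get ec2nodeclasses,nodepools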
karpenter_enabled = true
gp3_storage_class_enabled = true
kube_prometheus_enabled = true
You will now be able to see the pods of the kube-prometheus stack in the namespace of the same name, for example via Freelens.
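Or from the command line:
kubectl get pods -n kube-prometheus-stack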
Let’s add Linux and Windows deployments
Let’s add a Windows Karpenter node pool and then some sample application deployments, both Linux and Windows.
This way we can verify that the kube-prometheus stack with its ready-made configurations and dashboards allows us to monitor applications using Kubernetes nodes with both operating systems.
Let’s apply, in order, the following files, which define the resources Kubernetes needs; the corresponding apply commands are shown after the manifests.
“karpenter-node-pools-windows.yaml”
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: win
spec:
  amiSelectorTerms:
    - alias: windows2019@latest
  role: Karpenter-test-kube-prometheus
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: test-kube-prometheus
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: test-kube-prometheus
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 10000
        deleteOnTermination: true
        throughput: 125
  tags:
    Test: test-kube-prometheus
    NodeGroup: win
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: win
spec:
  template:
    metadata:
      labels:
        nodegroup: "win"
    spec:
      nodeClassRef:
        name: win
        group: karpenter.k8s.aws
        kind: EC2NodeClass
      taints:
        - key: type
          value: win
          effect: NoSchedule
      requirements:
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["c", "m", "r"]
        - key: "karpenter.k8s.aws/instance-cpu"
          operator: In
          values: ["4", "8", "16", "32"]
        - key: "karpenter.k8s.aws/instance-hypervisor"
          operator: In
          values: ["nitro"]
        - key: "karpenter.k8s.aws/instance-generation"
          operator: Gt
          values: ["2"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
  limits:
    cpu: 1000
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
“deployments-windows.yaml”
apiVersion: apps/v1
kind: Deployment
metadata:
  name: windows-server-iis-ltsc2019
  namespace: default
spec:
  selector:
    matchLabels:
      app: windows-server-iis-ltsc2019
      tier: backend
      track: stable
  replicas: 1
  template:
    metadata:
      labels:
        app: windows-server-iis-ltsc2019
        tier: backend
        track: stable
    spec:
      containers:
        - name: windows-server-iis-ltsc2019
          image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
          ports:
            - name: http
              containerPort: 80
          imagePullPolicy: IfNotPresent
          command:
            - powershell.exe
            - -command
            - "Add-WindowsFeature Web-Server; Invoke-WebRequest -UseBasicParsing -Uri 'https://dotnetbinaries.blob.core.windows.net/servicemonitor/2.0.1.6/ServiceMonitor.exe' -OutFile 'C:\\ServiceMonitor.exe'; echo '<html><body><br/><br/><H1>Windows Container Workshop - Windows LTSC2019!!!<H1></body><html>' > C:\\inetpub\\wwwroot\\iisstart.htm; C:\\ServiceMonitor.exe 'w3svc'; "
      nodeSelector:
        nodegroup: win
        kubernetes.io/os: windows
      tolerations:
        - key: type
          operator: Equal
          value: win
          effect: NoSchedule
“deployments-linux.yaml”
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: default
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
          resources:
            limits:
              cpu: '1'
              memory: 1Gi
            requests:
              cpu: '1'
              memory: 1Gi
      nodeSelector:
        nodegroup: workloads
      tolerations:
        - key: type
          operator: Equal
          value: workloads
          effect: NoSchedule
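We apply the manifests in the order given:
kubectl apply -f karpenter-node-pools-windows.yaml
kubectl apply -f deployments-windows.yaml
kubectl apply -f deployments-linux.yaml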
At this point Karpenter will request the necessary nodes from AWS EC2. A Windows node can take up to about 7 minutes to become ready, while for Linux one minute can be enough; in this case, moreover, we are reusing the same node pool used for the kube-prometheus pods.
Through Freelens, for example, you will be able to see the pods of both applications running, together with the nodes they landed on.
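The same information is available from the command line:
kubectl get pods -n default -o wide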
Let’s connect to Grafana and see the dashboards
Now, through Freelens, we can port-forward to the Kubernetes service of the Grafana instance included in the kube-prometheus stack.
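The same can be done with kubectl; note that the service name below assumes the default release name used by the module, so adjust it if yours differs:
kubectl port-forward -n kube-prometheus-stack svc/kube-prometheus-stack-grafana 3000:80  # then open http://localhost:3000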
The username and password for Grafana are stored in a Kubernetes secret, which can be viewed and decoded, again via Freelens.
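From the CLI the password can be decoded like this (again, the secret name reflects the release name, so it may differ in your setup):
kubectl get secret -n kube-prometheus-stack kube-prometheus-stack-grafana -o jsonpath='{.data.admin-password}' | base64 -d  # secret name is an assumption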
Log in and open the Dashboards section.
There you will find all the Grafana dashboards that the kube-prometheus stack prepares by default to monitor your Kubernetes cluster, covering pods on both Linux and Windows nodes.
Grafana dashboards to monitor Kubernetes cluster
Some of the dashboards are dedicated to the Kubernetes cluster itself and its services, such as the API, kubelet, CoreDNS and networking.
Grafana dashboards to monitor Kubernetes Linux workloads
There are several dashboards where you can monitor your microservices on Kubernetes Linux nodes.
You can monitor everything from node level down to pod level, even selecting the namespace.
Grafana dashboards to monitor Kubernetes Windows workloads
If we need Kubernetes clusters with Windows nodes, the kube-prometheus stack already provides dashboards and related metrics to monitor workloads running in Windows pods.
Additional dashboards for Grafana
To get more detail than the dashboards provided as standard with kube-prometheus offer, it is possible to add other dashboards published by the community. Below we see one dedicated to monitoring Kubernetes clusters with Windows nodes, called “Windows monitoring”.
Ready-made Prometheus alert manager alerting rules
With kube-prometheus, in addition to the ready-made metrics and dashboards, we also get a ready-made set of alerting rules for Alertmanager, as we can see in the following screenshot.
Next steps
We have seen how to add monitoring to our test Kubernetes cluster in the fastest and most effective way possible.
Of course, in this post we have only scratched the surface of the possibilities and actions necessary for monitoring a Kubernetes cluster.
In the next posts we will go further, for example by seeing how to expose our applications and their APIs to the users who need them.