
Monitoring your containers in an AKS cluster with Prometheus

Monitoring and alerting is arguably one of the most important parts of Cloud Engineering and DevOps. It's the difference between knowing your client's stack is up and hearing from the client that it's down. Most of us have SLAs to abide by (for good reason). Today we're going to learn how to spin up Prometheus in an AKS cluster to monitor our applications.

Pre-reqs:
1. Intermediate knowledge of Kubernetes
2. An AKS cluster spun up in Azure

AKS now supports deploying Prometheus via Helm, so we'll use that for an automated way to spin this up. This installs kube-prometheus, a packaged, containerized distribution of the Prometheus stack. With raw Prometheus, the operator needs a few custom resources defined:

1. Prometheus: Defines a desired Prometheus deployment.
2. ServiceMonitor: Specifies how groups of services should be monitored.
3. Alertmanager: Defines a desired Alertmanager deployment to handle the alerts Prometheus fires.

With kube-prometheus, all of that is packaged for you, so there is nothing to configure by hand. It also comes with Grafana, an open source UI for visualizing and exploring the metrics from your deployments/services.
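
To give a feel for what one of these custom resources looks like, here is a minimal ServiceMonitor sketch. The example-app name, its label, and the web port are placeholders for whatever your own service exposes:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
spec:
  selector:
    matchLabels:
      app: example-app   # must match the labels on your Service
  endpoints:
    - port: web          # the named Service port that exposes /metrics
      interval: 30s      # how often Prometheus scrapes it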

First, let's go into our Cloud Shell in Azure to confirm we can access our AKS cluster. If you cannot see anything after running kubectl get nodes -o wide, you will need to authenticate by running:

az aks get-credentials --name AKSTest10 --resource-group AKSTest

Your --name and --resource-group will of course be different, so please put in the values for your own cluster.

Next we'll have to configure Helm by running helm init in your Cloud Shell. If all is well, you will see that Helm and Tiller are installed. Helm is a package manager for Kubernetes, much like yum or apt in Linux. Tiller is the server-side component that talks directly to the Kubernetes API to install your Helm resources into the cluster.
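
To confirm both sides are up, you can check that the client and the Tiller server report matching versions:

helm version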

After that, we'll add the CoreOS chart repository to Helm. CoreOS maintains the prometheus-operator and kube-prometheus charts we'll be installing, and has deep roots in container automation, security, and scalability. For more info, please visit: https://coreos.com/why/

helm repo add coreos https://s3-eu-west-1.amazonaws.com/coreos-charts/stable/
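
If Helm cannot find the charts in a later step, refreshing the local chart index usually sorts it out:

helm repo update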




We need a namespace for Prometheus to live in, so let's go ahead and create a monitoring namespace by doing the following:

kubectl create namespace monitoring
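
You can quickly confirm the namespace exists before moving on:

kubectl get namespace monitoring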

Now let's go ahead and install the Prometheus operator and kube-prometheus. For this, we will need to give Tiller an RBAC (Role Based Access Control) policy covering both the kube-system and monitoring namespaces.

Please apply the following YAML files with kubectl apply -f. For example, if you name a file rbac.yml, run kubectl apply -f rbac.yml to apply its RBAC policies.


apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system


# The second file creates a tiller service account in the monitoring
# namespace and gives the binding its own name so it doesn't overwrite
# the kube-system binding above.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller-monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: monitoring


After the above has been applied, you should be able to install Prometheus.
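
One caveat: if Tiller was initialized before the tiller service account existed, you may need to rebind it to that account, which in Helm 2 looks like this:

helm init --service-account tiller --upgrade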

The following two commands will install the Prometheus operator and kube-prometheus:

helm install coreos/prometheus-operator --name prometheus-operator --namespace monitoring

helm install coreos/kube-prometheus --name kube-prometheus --set global.rbacEnable=false --namespace monitoring

If both commands succeed, Helm will print a release summary for each chart.
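
It's also worth checking that the pods themselves came up; the exact names will vary with your release names:

kubectl get pods --namespace monitoring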



By installing kube-prometheus, we got the following:

1. A Prometheus server
2. A service monitor that allows us to monitor the cluster itself
3. Grafana, installed with a set of pre-configured dashboards
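
If you're curious, you can list the custom resources the operator created; these resource types only exist because the operator registered them with the cluster:

kubectl get prometheuses,servicemonitors,alertmanagers --namespace monitoring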

Now that we have everything set up, I'd like to access my Grafana dashboard from my localhost. In a production environment you'd ideally use a jumpbox in Azure for this, but for a demo, localhost is fine. You'll have to connect to the AKS cluster by running the following in a PowerShell window:

az aks get-credentials --name AKSTest10 --resource-group AKSTest

Please run the above with your own values (your AKS cluster name and resource group).

Now that we have our server up, we have to connect to the pod that is running it. We can do this with Kubernetes port forwarding. Please use the following command:

kubectl --namespace monitoring port-forward $(kubectl get pod --namespace monitoring -l prometheus=kube-prometheus -l app=prometheus -o template --template "{{(index .items 0).metadata.name}}") 9090:9090

This forwards port 9090 on the Prometheus server pod to port 9090 on our local machine, so we can reach the Prometheus UI from our own computer. (Grafana comes later, on its own port.)


If all is working as it should, you should be able to go to http://localhost:9090/targets and see the Prometheus targets page.

Open up your Cloud Shell again and this time, go to Bash. Run the following to get your username and password to log in.

echo username:$(kubectl get secret --namespace monitoring kube-prometheus-grafana -o jsonpath="{.data.user}" | base64 --decode; echo)

echo password:$(kubectl get secret --namespace monitoring kube-prometheus-grafana -o jsonpath="{.data.password}" | base64 --decode; echo)

Next we'll forward port 3000 to the pod that hosts Grafana (do this in another PowerShell window on your localhost, keeping the first window open so it continues forwarding traffic on 9090):

kubectl --namespace monitoring port-forward $(kubectl get pod --namespace monitoring -l app=kube-prometheus-grafana -o template --template "{{(index .items 0).metadata.name}}") 3000:3000

Now open up http://localhost:3000/login and log in with the username and password from the previous step.




And there you have it! Your container monitoring solution is up and running!

