Adding a health check to Prometheus

So you pulled the official Prometheus docker image and added Prometheus to your monitoring stack. Next, you tried to add a health check for Prometheus and got stuck? Keep reading to find out how to do it 🙂

Previously, we discussed how to create a custom Prometheus exporter for Jenkins. Lately, I discovered a bug in it: the custom Jenkins exporter was sometimes up before Jenkins was available, so it couldn't connect to Jenkins and extract metrics from it. To solve this, the exporter has to start only after Jenkins is available, and we can check for availability using a health check.
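To make the ordering idea concrete, here is a minimal docker-compose sketch of an exporter that waits for a healthy Jenkins. It is not taken from the sample repository: the service names, the build path and the /login URL are assumptions for illustration, and it assumes the Jenkins image has curl available. Also note that the long depends_on form with a condition is not accepted by every Compose file version, so check which one your file uses.

```yaml
# Hypothetical fragment; service names, build path and URL are illustrative.
services:
  jenkins:
    image: jenkins/jenkins:lts
    healthcheck:
      # -f makes curl exit with an error on HTTP error responses,
      # such as the 503 Jenkins serves while it is still starting.
      test: ["CMD", "curl", "-f", "http://localhost:8080/login"]
      interval: 10s
      timeout: 5s
      retries: 10
      start_period: 40s

  jenkins-exporter:
    build: ./exporter   # assumed location of the exporter's Dockerfile
    depends_on:
      jenkins:
        # Start the exporter only after the Jenkins health check passes.
        condition: service_healthy
```

With this in place, docker-compose should hold back the exporter container until Jenkins reports healthy, instead of starting both at once.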

Below, I’m going to show how to add a health check to Prometheus in a sample Jenkins monitoring project. I assume a containerized setup, whether your services run under an orchestrator or you use docker-compose.

Prerequisites

Install on your machine:

  • docker
  • docker-compose
  • git

The tutorial assumes familiarity with:

  • basic git, docker and docker-compose commands.

Isn’t adding a health check as easy as using curl?

It turns out that it’s not. The official Docker image of Prometheus is based on a custom BusyBox image, which doesn’t have curl installed. What is BusyBox? In short, it’s a small runtime environment with its own compact versions of common Linux utilities, and curl is not one of them. So how do you add a health check to Prometheus without curl? Using wget instead is the option I’ll show below.

Sample Prometheus monitoring stack

To see how wget replaces curl in the health check, follow the steps below:

  • clone the repository from my GitHub: git clone git@github.com:w7089/jenkins-monitoring.git
  • cd jenkins-monitoring
  • build all docker images: docker-compose build
  • run the monitoring stack: docker-compose -p jenkins-monitoring up -d
  • you should see that all services are healthy in the output of docker-compose -p jenkins-monitoring ps

The Prometheus health check looks like this:

    healthcheck:
      test: ["CMD", "wget", "http://localhost:9090"]
      interval: 10s
      timeout: 15s
      retries: 10
      start_period: 40s 
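As a possible variation on the check above, here is a sketch that avoids writing anything to disk. A plain wget call saves the response to a file in the container's working directory, which a health check doesn't need. This version probes Prometheus's dedicated /-/healthy endpoint in spider mode instead. It assumes the BusyBox wget in your image supports the --spider option; the service name and image tag are illustrative and not copied from the sample repository.

```yaml
# Hypothetical variant; service name and image tag are illustrative.
services:
  prometheus:
    image: prom/prometheus
    healthcheck:
      # -q silences output; --spider checks the URL without saving the body.
      # /-/healthy is Prometheus's own health endpoint.
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:9090/-/healthy"]
      interval: 10s
      timeout: 15s
      retries: 10
      start_period: 40s
```

The timing values are the same as in the original check, so only the test command differs.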

Summary

I hope the post helped you save time. As always, feel free to share and comment.

Recommended books on Amazon:

Prometheus: Up & Running: Infrastructure and Application Performance Monitoring