Aiven Part 3: Grafana with InfluxDB, Monitors and Alerts

Published: Dec 5, 2022 by Isaac Johnson

We’ve covered using Datadog and other tools to monitor and display metrics. Today we’ll wrap our series on Aiven.io by looking at setting up a hosted Grafana instance and tying a custom DNS name to it.

We’ll set up a hosted InfluxDB to collect metrics from Kafka and Redis, then show how to use native Aiven integrations to link InfluxDB to Grafana.

Lastly, we’ll dig into Grafana queries and dashboards and wrap up by setting up Email, Teams and Discord integrations (including setting up a new Discord server).

Grafana

We first create the service from the Services page. It’s one of the least expensive hosted services from Aiven.io, starting (presently) at US$35/mo in GCP.

/content/images/2022/11/aiventwo-96.png

Once it’s running, we can log in with the credentials on the service page

/content/images/2022/11/aiventwo-45.png

And use them at the Grafana URL

/content/images/2022/11/aiventwo-46.png

Adding InfluxDB

Of the options to gather metrics in Aiven, InfluxDB is one of the less expensive at US$60/mo for a startup-size instance in US Central

/content/images/2022/11/aiventwo-50a.png

I’ll create the new service and wait for it to get past ‘Rebuilding’

/content/images/2022/11/aiventwo-51.png

Custom DNS for Grafana

We start by creating a custom domain name

/content/images/2022/11/aiventwo-42.png

and

/content/images/2022/11/aiventwo-43.png

Hmm.. that domain has issues; let’s use our primary

I’ll create a CNAME record

$ cat r53-aiven-grafana.json
{
  "Comment": "CREATE awx fb.s CNAME record ",
  "Changes": [
    {
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "grafana.freshbrewed.science",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [
          {
            "Value": "grafana-3e22cc9-isaac-1040.aivencloud.com"
          }
        ]
      }
    }
  ]
}

$ aws route53 change-resource-record-sets --hosted-zone-id Z39E8QFU0F9PZP --change-batch file://r53-aiven-grafana.json
{
    "ChangeInfo": {
        "Id": "/change/C06282161TELBVGW1ZHNQ",
        "Status": "PENDING",
        "SubmittedAt": "2022-11-24T14:47:15.942Z",
        "Comment": "CREATE awx fb.s CNAME record "
    }
}
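For anyone who would rather generate the change batch than hand-edit the JSON file, here is a minimal Python sketch that builds the same structure. The helper name `cname_change_batch` and the comment text are my own choices for illustration; the record name and target are the ones used above. It only produces the JSON; submitting it is still left to the `aws route53` command shown above.

```python
import json


def cname_change_batch(name: str, target: str, ttl: int = 300, comment: str = "") -> dict:
    """Build a Route 53 change batch that creates a single CNAME record."""
    return {
        "Comment": comment,
        "Changes": [
            {
                "Action": "CREATE",
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": target}],
                },
            }
        ],
    }


# Same record name and Aiven hostname as in the JSON file above.
batch = cname_change_batch(
    "grafana.freshbrewed.science",
    "grafana-3e22cc9-isaac-1040.aivencloud.com",
    comment="CREATE grafana CNAME record",
)
change_batch_json = json.dumps(batch, indent=2)
print(change_batch_json)
```

Redirecting the output to a file gives something that can be passed to `--change-batch file://...` exactly as before.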

Then apply it in Aiven.io

/content/images/2022/11/aiventwo-44.png

In a bit, we’ll see it has a proper cert and responds

/content/images/2022/11/aiventwo-47.png

Then we can log in

/content/images/2022/11/aiventwo-48.png

Adding Data Sources

Before we can graph anything, we need to add a data source

/content/images/2022/11/aiventwo-49.png

Besides setting it up in Grafana, we can set up data sources from Aiven via the Integrations section

/content/images/2022/11/aiventwo-50.png

With InfluxDB now running, let’s add it as the Metrics integration for Kafka

/content/images/2022/11/aiventwo-52.png

In Grafana, we’ll then add a new Integration

/content/images/2022/11/aiventwo-53.png

We’ll want to “Receive Data”

/content/images/2022/11/aiventwo-54.png

At which point we can choose the InfluxDB

/content/images/2022/11/aiventwo-55.png

Within Grafana we now see InfluxDB listed in Data Sources

/content/images/2022/11/aiventwo-56.png

To query it, we can use InfluxQL, e.g.

SELECT * FROM "_internal".."database" LIMIT 10
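The same InfluxQL can also be run outside Grafana against the InfluxDB 1.x HTTP `/query` endpoint. Below is a minimal sketch that only builds the request URL. The hostname and port are placeholders (assumptions, not the real service URI), and authentication is left out; the actual URI and credentials would come from the service page in the Aiven console.

```python
from urllib.parse import urlencode


def influx_query_url(base_url: str, database: str, query: str) -> str:
    """Return a URL for the InfluxDB 1.x /query endpoint with db and q parameters."""
    params = urlencode({"db": database, "q": query})
    return f"{base_url.rstrip('/')}/query?{params}"


# Hypothetical host and port; replace with the service URI from the Aiven console.
url = influx_query_url(
    "https://influx.example.com:12345",
    "_internal",
    'SELECT * FROM "_internal".."database" LIMIT 10',
)
print(url)
```

The resulting URL can be fetched with any HTTP client once credentials are supplied.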

Or use the Explore query builder. For instance, we can query the CPU usage

/content/images/2022/11/aiventwo-57.png

I can then add it to a new Dashboard

/content/images/2022/11/aiventwo-58.png

and save it with a name

/content/images/2022/11/aiventwo-59.png

Alerting with Grafana

To send any kind of alert, we first need to define a “contact point”

We can get there from Alerting/Contact Points, then selecting “New contact point”

/content/images/2022/11/aiventwo-60.png

We can, for instance, define an email address to which we will send alerts.

/content/images/2022/11/aiventwo-61.png

We can send a test mail

/content/images/2022/11/aiventwo-62.png

However, this just informed me SMTP was not set up out of the box

/content/images/2022/11/aiventwo-63.png

In an on-prem install, we would likely edit the .ini file used by Grafana and fill in the [smtp] section.
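As a rough illustration of what that on-prem change looks like, this sketch renders an `[smtp]` section using the standard `enabled`, `host`, `user`, `password` and `from_address` keys. The email address and password are placeholders, and the helper name `smtp_ini_section` is hypothetical; treat it as a starting point rather than a complete Grafana config.

```python
import configparser
import io


def smtp_ini_section(host: str, user: str, password: str, from_address: str) -> str:
    """Render an [smtp] section suitable for pasting into grafana.ini."""
    # interpolation=None so a '%' in the password is written literally.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["smtp"] = {
        "enabled": "true",
        "host": host,
        "user": user,
        "password": password,
        "from_address": from_address,
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# Placeholder values; substitute your own account details.
ini_text = smtp_ini_section(
    "smtp.gmail.com:587",
    "someone@example.com",
    "app-password-here",
    "someone@example.com",
)
print(ini_text)
```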

For Aiven.io, we will do it in the Advanced Configuration area

/content/images/2022/11/aiventwo-64.png

I’ll enter the settings for Gmail

/content/images/2022/11/aiventwo-65.png

I found that using my real account password won’t work when 2FA is enabled.

/content/images/2022/11/aiventwo-68.png

You need to follow these steps to generate an “App” password we can use for SMTP

/content/images/2022/11/aiventwo-66.png

which I can now see in the list (and revoke later)

/content/images/2022/11/aiventwo-67.png
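If you want to confirm an app password works before pasting it into Aiven, a small script with Python’s standard `smtplib` is one option. This is only a sketch: the addresses and password are placeholders, and the actual send is commented out so nothing leaves your machine until you fill in real values.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Create a short plain-text test email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "SMTP app password test"
    msg.set_content("If you can read this, the app password works for SMTP.")
    return msg


def send_via_gmail(msg: EmailMessage, user: str, app_password: str) -> None:
    """Send the message through Gmail's SMTP submission port using STARTTLS."""
    with smtplib.SMTP("smtp.gmail.com", 587, timeout=30) as server:
        server.starttls()
        server.login(user, app_password)
        server.send_message(msg)


# Placeholder addresses; replace with your own before sending.
message = build_test_message("someone@example.com", "someone@example.com")
print(message["Subject"])
# Uncomment once real credentials are in place:
# send_via_gmail(message, "someone@example.com", "app-password-here")
```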

Which brings me to a minor issue I have with Aiven - I don’t like seeing my passwords left visible in plain text

/content/images/2022/11/aiventwo-69.png

and we can see it worked

/content/images/2022/11/aiventwo-70.png

Creating Grafana alerts

Before we begin, we’ll create a folder

/content/images/2022/11/aiventwo-71.png

Then we can create a query

/content/images/2022/11/aiventwo-72.png

we can set the evaluation behavior

/content/images/2022/11/aiventwo-73.png

then set our notification label

/content/images/2022/11/aiventwo-74.png

We can now create a Notification Policy that uses it

/content/images/2022/11/aiventwo-75.png

and now we can see it has been saved

/content/images/2022/11/aiventwo-76.png
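To make the relationship between labels, policies and contact points a bit more concrete, here is a deliberately simplified model of the routing idea: an alert carries labels, each policy lists the labels it expects, and the first policy whose labels all match decides the contact point. This is an illustration only, not Grafana’s implementation; it handles exact-match labels and nothing else, and the label name `notify` and the contact point names are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """A simplified notification policy: exact-match labels and a contact point."""
    contact_point: str
    matchers: dict = field(default_factory=dict)


def route(alert_labels: dict, policies: list, default_contact_point: str) -> str:
    """Return the contact point of the first policy whose matchers all match."""
    for policy in policies:
        if all(alert_labels.get(key) == value for key, value in policy.matchers.items()):
            return policy.contact_point
    # Nothing matched, so fall back to the default policy's contact point.
    return default_contact_point


# Hypothetical labels and contact point names.
policies = [
    Policy("email", {"notify": "email"}),
    Policy("discord", {"notify": "discord"}),
]
chosen = route({"alertname": "HighCPU", "notify": "email"}, policies, "default-email")
print(chosen)
```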

We can create other contact points such as Teams and PagerDuty

/content/images/2022/11/aiventwo-77.png

Quick Note on Teams: This has been plaguing me for a while - I can never sign in. I’ve been stymied by this over and over. I finally figured it out today. There are two (insert profane expletive) Teams apps: one with a Blue “T” tile and one with a White “T” tile.

/content/images/2022/11/aiventwo-78.png

The White tile, which Windows would always suggest when searching for Teams, is not going to let you sign into Orgs. The Blue tile will. I think they only recently added “(work or school)” to the app name.

Real Alerts and Metrics

Several days later I actually got a real alert

/content/images/2022/11/aiventwo-99.png

Had I set a proper description, I might have put in a link to wiki pages, or who to call, perhaps a runbook on remediation.

/content/images/2022/11/aiventwo-100.png

However, as we can see above, 5 minutes later it was resolved.

While the error message would lead me to believe it was from Kafka, the spike (since I aggregated all the metrics) was actually from Grafana itself (times are in Zulu/GMT)

/content/images/2022/11/aiventwo-101.png

I’m only sending Kafka metrics out to Datadog and CloudWatch

/content/images/2022/11/aiventwo-102.png

But we can see that in both cases, our Kafka cluster was fine

/content/images/2022/11/aiventwo-103.png

(The drop at the end is because I just stopped the cluster in the last 5 minutes)

I can see something similar (including the drop-off of delivered metrics) in CloudWatch

/content/images/2022/11/aiventwo-104.png

Setting up Teams

We can go to the Connectors for the Channel

/content/images/2022/11/aiventwo-79.png

Then we’ll configure an Incoming Webhook

/content/images/2022/11/aiventwo-80.png

I’ll use an icon (which I’ll leave for you to grab here):

/content/images/2022/11/43383564-afa9ea6c-93db-11e8-855b-de8be4f79756.png

And we can fill out the Webhook details

/content/images/2022/11/aiventwo-81.png

We’ll then want to copy that URL

/content/images/2022/11/aiventwo-82.png

Back in Grafana, we can set up a Teams contact point

/content/images/2022/11/aiventwo-83.png

And then test it

/content/images/2022/11/aiventwo-84.png

And see the results

/content/images/2022/11/aiventwo-85.png
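The incoming webhook can also be exercised without Grafana, which helps when working out whether a problem sits on the Teams side or the Grafana side. The sketch below builds a simple MessageCard-style POST request with the standard library. The webhook URL is a placeholder, the helper name is hypothetical, and the call that actually sends the request is commented out.

```python
import json
import urllib.request


def teams_request(webhook_url: str, title: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying a minimal MessageCard payload."""
    payload = {
        "@type": "MessageCard",
        "@context": "http://schema.org/extensions",
        "summary": title,
        "title": title,
        "text": text,
    }
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; paste the one copied from the Teams connector.
teams_req = teams_request(
    "https://example.webhook.office.com/webhookb2/placeholder",
    "Test alert",
    "Sent from a script rather than Grafana",
)
print(teams_req.get_method(), teams_req.full_url)
# Uncomment to actually send:
# urllib.request.urlopen(teams_req, timeout=10)
```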

Discord

More and more, I’m seeing my colleagues looking to Discord as the ‘new’ Slack. Let’s set up Discord (which, unlike Teams, doesn’t require me to endlessly sign out and in again to swap orgs).

I’ll go ahead and “Add a server”

/content/images/2022/11/aiventwo-86.png

Then I’ll choose to “Create My Own”

/content/images/2022/11/aiventwo-87.png

I’ll just choose “For me and my friends” on the next prompt. I don’t think it matters.

/content/images/2022/11/aiventwo-88.png

Then we can give it a name and icon

/content/images/2022/11/aiventwo-89.png

We can right-click the server icon to get to Integrations so we can add the webhook

/content/images/2022/11/aiventwo-90.png

which brings us to the Integrations page

/content/images/2022/11/aiventwo-91.png

Just like in Teams, we can use the Grafana icon

/content/images/2022/11/aiventwo-92.png

I can “Copy Webhook URL” which puts it in the clipboard. Something like

https://discord.com/api/webhooks/1046asdfasdfsadfasdfasdfasdfa/asdfsadfsadfasdfasdfasdf
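Before handing the URL to Grafana, it can be sanity-checked with a quick script. Discord webhooks accept a JSON body with a `content` field (limited to 2000 characters) and an optional `username`. The sketch below builds such a request; the URL is a placeholder, the `User-Agent` value is an arbitrary string I chose as a precaution since some clients report being rejected with the default one, and the send itself is commented out.

```python
import json
import urllib.request


def discord_request(webhook_url: str, content: str, username: str = "Grafana") -> urllib.request.Request:
    """Build a POST request that would post a message through a Discord webhook."""
    # Discord limits message content to 2000 characters.
    payload = {"content": content[:2000], "username": username}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "User-Agent": "webhook-test/0.1",  # arbitrary value, an assumption
        },
        method="POST",
    )


# Placeholder URL; use the one from "Copy Webhook URL".
discord_req = discord_request(
    "https://discord.com/api/webhooks/placeholder-id/placeholder-token",
    "Test message from a script",
)
print(discord_req.get_method(), discord_req.full_url)
# Uncomment to actually send:
# urllib.request.urlopen(discord_req, timeout=10)
```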

We can now use that as a Contact Point in Grafana

/content/images/2022/11/aiventwo-93.png

As before, we can test the alert

/content/images/2022/11/aiventwo-94.png

and see it fires in our ‘#general’ channel

/content/images/2022/11/aiventwo-95.png

Summary

Today we set up hosted instances of Grafana and InfluxDB. We ingested metrics to InfluxDB via native Aiven.io integrations, then turned around and tied the InfluxDB to Grafana. In Grafana we verified we could see these metrics, then worked through adding Contact Points, Labels and Notification Policies. We tested sending out alerts to Email, Teams and Discord. Later we received a proper alert from Grafana and looked at metrics gathered in Aiven, Datadog and AWS to identify the time and service.

Arguably, InfluxDB and Grafana make a pretty simple setup for those looking for a straightforward hosted APM system. Together they run just under US$100/mo (US$35 for Grafana plus US$60 for InfluxDB at the prices above), with our only limitation being server space. We can easily tie these to Aiven.io services or external services. We can even use the Aiven.io VPC private endpoint option if we wish to ingest metrics without traversing the public internet.

/content/images/2022/11/aiventwo-97.png

Looking at all Aiven.io offers, one can see it isn’t a replacement for our clouds, but a natural complementary augmentation - creating turnkey supported hosting solutions in an easy to use way. For instance, this vendor first came on my radar when looking for an MSK-like solution in GCP (where no Managed Kafka option presently exists).

aiven grafana influxdb

Isaac Johnson

Cloud Solutions Architect

Isaac is a CSA and DevOps engineer who focuses on cloud migrations and devops processes. He also is a dad to three wonderful daughters (hence the references to Princess King sprinkled throughout the blog).
