Sendgrid and Datadog

Published: Jul 15, 2022 by Isaac Johnson

One question I’ve had to discuss lately is how to monitor SendGrid events. If you are in AWS, you likely use SES, but the other clouds generally rely on SendGrid for cloud-based email.

We can leverage Datadog to monitor and alert on our Email stats in a common dashboard as well as ingest events as logs.

Setup

In Datadog, add the Twilio Sendgrid Integration

/content/images/2022/07/sendgriddatadog-01.png

Next, follow the steps to add a SendGrid API key

/content/images/2022/07/sendgriddatadog-02.png

We can create the API Key with Full Access (excludes billing and email address validation) or use Restricted access to include “Stats - Read Access”

/content/images/2022/07/sendgriddatadog-03.png

I’ll tag data from Sendgrid with “IsaacSendGrid” (Note: later I found that “Full Access” did not properly send metrics and switched to Restricted Access and enabled Stats)

/content/images/2022/07/sendgriddatadog-04.png

What we can now see are the types of blocks and deliveries SendGrid collects.

The following metrics will be tracked by this integration:

| Name | Unit | Description |
| --- | --- | --- |
| sendgrid.emails.blocks | emails | The number of emails that were not allowed to be delivered by ISPs. |
| sendgrid.emails.bounce_drops | emails | The number of emails that were dropped because of a bounce. |
| sendgrid.emails.bounces | emails | The number of emails that bounced instead of being delivered. |
| sendgrid.emails.clicks | emails | The number of links that were clicked in your emails. |
| sendgrid.emails.daily.total.blocks | emails | The daily number of emails that were not allowed to be delivered by ISPs. |
| sendgrid.emails.daily.total.bounce_drops | emails | The daily number of emails that were dropped because of a bounce. |
| sendgrid.emails.daily.total.bounces | emails | The daily number of emails that bounced instead of being delivered. |
| sendgrid.emails.daily.total.clicks | emails | The daily number of links that were clicked in your emails. |
| sendgrid.emails.daily.total.deferred | emails | The daily number of emails that temporarily could not be delivered. |
| sendgrid.emails.daily.total.delivered | emails | The daily number of emails SendGrid was able to confirm were actually delivered to a recipient. |
| sendgrid.emails.daily.total.invalid_emails | emails | The daily number of recipients who had malformed email addresses or whose mail provider reported the address as invalid. |
| sendgrid.emails.daily.total.opens | emails | The daily total number of times your emails were opened by recipients. |
| sendgrid.emails.daily.total.processed | emails | Requests from your website, application, or mail client via SMTP Relay or the API that SendGrid processed daily. |
| sendgrid.emails.daily.total.requests | emails | The daily number of emails that were requested to be delivered. |
| sendgrid.emails.daily.total.spam_report_drops | emails | The daily number of emails that were dropped due to a recipient previously marking your emails as spam. |
| sendgrid.emails.daily.total.spam_reports | emails | The daily number of recipients who marked your email as spam. |
| sendgrid.emails.daily.total.unique_clicks | emails | The daily number of unique recipients who clicked links in your emails. |
| sendgrid.emails.daily.total.unique_opens | emails | The daily total of unique recipients who opened your emails. |
| sendgrid.emails.daily.total.unsubscribe_drops | emails | The daily number of emails dropped due to a recipient unsubscribing from your emails. |
| sendgrid.emails.daily.total.unsubscribes | emails | The daily number of recipients who unsubscribed from your emails. |
| sendgrid.emails.deferred | emails | The number of emails that temporarily could not be delivered. |
| sendgrid.emails.delivered | emails | The number of emails SendGrid was able to confirm were actually delivered to a recipient. |
| sendgrid.emails.invalid_emails | emails | The number of recipients who had malformed email addresses or whose mail provider reported the address as invalid. |
| sendgrid.emails.opens | emails | The total number of times your emails were opened by recipients. |
| sendgrid.emails.processed | emails | Requests from your website, application, or mail client via SMTP Relay or the API that SendGrid processed. |
| sendgrid.emails.requests | emails | The number of emails that were requested to be delivered. |
| sendgrid.emails.spam_report_drops | emails | The number of emails that were dropped due to a recipient previously marking your emails as spam. |
| sendgrid.emails.spam_reports | emails | The number of recipients who marked your email as spam. |
| sendgrid.emails.unique_clicks | emails | The number of unique recipients who clicked links in your emails. |
| sendgrid.emails.unique_opens | emails | The number of unique recipients who opened your emails. |
| sendgrid.emails.unsubscribe_drops | emails | The number of emails dropped due to a recipient unsubscribing from your emails. |
| sendgrid.emails.unsubscribes | emails | The number of recipients who unsubscribed from your emails. |

The metrics may take some time to appear. Here we see evidence that they are being tracked

/content/images/2022/07/sendgriddatadog-05.png

We can ingest the logs if we follow the steps for webhooks

/content/images/2022/07/sendgriddatadog-06.png

And they will show up in our log queries with “source:sendgrid”

/content/images/2022/07/sendgriddatadog-07.png
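To get a feel for what those log entries contain, here is a minimal Python sketch that tallies event types in one webhook batch. SendGrid's Event Webhook posts a JSON array of event objects; the sample batch, addresses, and function names below are hypothetical and only there for illustration.

```python
import json
from collections import Counter

# Hypothetical sample batch, shaped like a SendGrid Event Webhook POST body.
SAMPLE_BATCH = json.dumps([
    {"email": "a@example.com", "timestamp": 1657900000, "event": "delivered"},
    {"email": "b@example.com", "timestamp": 1657900005, "event": "dropped",
     "reason": "Bounced Address"},
    {"email": "c@example.com", "timestamp": 1657900009, "event": "processed"},
])


def count_events(body: str) -> Counter:
    """Count how many events of each type appear in one webhook batch."""
    events = json.loads(body)
    return Counter(e.get("event", "unknown") for e in events)


def dropped_events(body: str) -> list:
    """Return only the events whose type is 'dropped'."""
    return [e for e in json.loads(body) if e.get("event") == "dropped"]
```

Calling `count_events(SAMPLE_BATCH)` gives a per-type tally, and `dropped_events(SAMPLE_BATCH)` returns the records worth alerting on.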

I did note that the “Full Access” API key did not actually work for metrics. I changed to Restricted Access and enabled Read Access on Stats

/content/images/2022/07/sendgriddatadog-09.png

and then I could see Metrics

/content/images/2022/07/sendgriddatadog-08.png
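If metrics do not show up, one way to sanity check the key outside of Datadog is to call SendGrid's v3 stats endpoint directly. The sketch below is my own illustration, not what the Datadog integration runs internally; the function names are hypothetical, and it assumes the usual response shape of a list of days, each with a `stats` list holding a `metrics` object.

```python
import json
import urllib.request
from collections import Counter

STATS_URL = "https://api.sendgrid.com/v3/stats"  # SendGrid v3 global stats endpoint


def build_stats_request(api_key: str, start_date: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for stats since start_date (YYYY-MM-DD)."""
    return urllib.request.Request(
        f"{STATS_URL}?start_date={start_date}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def total_metrics(stats_response: list) -> Counter:
    """Sum every metric across all days in a parsed stats response."""
    totals = Counter()
    for day in stats_response:
        for entry in day.get("stats", []):
            totals.update(entry.get("metrics", {}))
    return totals


def fetch_totals(api_key: str, start_date: str) -> Counter:
    """Call the API; an HTTP error status raises urllib.error.HTTPError."""
    request = build_stats_request(api_key, start_date)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return total_metrics(json.load(resp))
```

Running `fetch_totals(my_key, "2022-07-01")` with a key that lacks Stats read access should fail with an HTTP error rather than return counts, which is a quick way to tell a permissions problem from a slow first sync.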

For those wondering how I test Sendgrid, I generally use an Azure Logic App

/content/images/2022/07/sendgriddatadog-10.png

with the Sendgrid Action

/content/images/2022/07/sendgriddatadog-11.png
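If you do not have a Logic App handy, a small script against SendGrid's v3 Mail Send endpoint can generate test traffic as well. This is a rough sketch using only the standard library; the function name and its arguments are hypothetical, and the sender needs to be a verified sender on your account.

```python
import json
import urllib.request

MAIL_SEND_URL = "https://api.sendgrid.com/v3/mail/send"  # SendGrid v3 Mail Send endpoint


def build_mail_request(api_key: str, sender: str, recipient: str,
                       subject: str, body: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single plain-text email."""
    payload = {
        "personalizations": [{"to": [{"email": recipient}]}],
        "from": {"email": sender},
        "subject": subject,
        "content": [{"type": "text/plain", "value": body}],
    }
    return urllib.request.Request(
        MAIL_SEND_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends the message; SendGrid answers with 202 Accepted when it queues the email, and the matching events should then arrive through the webhook.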

Checking for issues

Let’s say we want to catch Dropped emails.

We could write a query to do just that

/content/images/2022/07/sendgriddatadog-12.png

We can then create a new Monitor

/content/images/2022/07/sendgriddatadog-13.png

and choose the source as logs

/content/images/2022/07/sendgriddatadog-14.png

The full monitor would look as such

/content/images/2022/07/sendgriddatadog-15.png

We can talk through the parts.

At the top, we see the source as sendgrid and the particular event as parsed from the logs being “dropped”. We chose a window of 4 hours to see that indeed we had some dropped messages (showing the graph as red)

/content/images/2022/07/sendgriddatadog-16.png

In the next section, we set the monitor to alert if the count rises above 1 in a 5m window and to send a notification to my Teams channel. I could also trigger a PagerDuty alert.

/content/images/2022/07/sendgriddatadog-17.png
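The same monitor can be expressed as a definition for the Datadog Monitors API rather than clicked together in the UI. The sketch below only builds the payload. It assumes the parsed event type is exposed as an `@event` attribute (check your own log facets, as the real attribute name may differ), and the monitor name, function name, and Teams handle are hypothetical.

```python
import json


def build_dropped_monitor(notify_handle: str, threshold: int = 1,
                          window: str = "5m") -> dict:
    """Build a log-alert monitor definition for dropped SendGrid emails."""
    # Assumption: the event type is parsed into an `@event` log attribute.
    query = (
        'logs("source:sendgrid @event:dropped")'
        f'.index("*").rollup("count").last("{window}") > {threshold}'
    )
    return {
        "name": "SendGrid dropped emails",  # hypothetical monitor name
        "type": "log alert",
        "query": query,
        "message": (
            f"SendGrid reported dropped emails in the last {window}. "
            f"{notify_handle}"
        ),
        "options": {"thresholds": {"critical": threshold}},
    }


if __name__ == "__main__":
    # "@teams-alerts" is a placeholder for a Microsoft Teams notification handle.
    print(json.dumps(build_dropped_monitor("@teams-alerts"), indent=2))
```

The resulting JSON could be posted to the Monitors API (`/api/v1/monitor`) with an API key and an application key, which makes the alert easy to keep in source control alongside other monitors.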

The last part talks about staying in alert for at least 3 rounds if it hasn’t been addressed

/content/images/2022/07/sendgriddatadog-18.png

Which will trigger an alert as such

/content/images/2022/07/sendgriddatadog-19.png

Lastly, we can see the metrics we gather in our Dashboards. This can be useful to bundle with other widgets such as GH Action and AzDO monitors

/content/images/2022/07/sendgriddatadog-20.png

Summary

The integration between SendGrid and Datadog is pretty slick. I found log parsing the easiest way to get reliable metrics and alerts; however, I’ll continue to watch and examine the native metrics collected via the API key.

A couple of gotchas I found: First, my long-term SendGrid account was “closed” and I had no way to re-open it. This rather annoyed me as, while I used it infrequently, I did rely on it and had set it up with verified senders. I would need to create a new account (with new credentials) just to get back to a good place. The strange behavior was that some emails sent and some dropped, and the metrics and logs (via webhooks) did not fire. So it was “closed” but still sort of worked.

/content/images/2022/07/sendgriddatadog-21.png

I often get asked about New Relic as well. New Relic did have a plugin for years (albeit not frequently updated). However, in 2021 New Relic stopped supporting plugins and SendGrid then pointed to this guide for migrating off the plugin. The summary is to use webhooks and a lambda to populate an S3 bucket for ingestion as logs. Then there are steps for manually using NRQL to create dashboards by hand.

I have no axe to grind on NR; however, the Datadog experience with tags and integrations is clearly significantly easier to set up and use.


Isaac Johnson


Cloud Solutions Architect

Isaac is a CSA and DevOps engineer who focuses on cloud migrations and devops processes. He also is a dad to three wonderful daughters (hence the references to Princess King sprinkled throughout the blog).
