Published: Jul 15, 2022 by Isaac Johnson
One question that I’ve had to discuss lately is how to watch SendGrid events. If you are in AWS, you likely use SES. But the other clouds generally rely on SendGrid for a cloud-based email service.
We can leverage Datadog to monitor and alert on our Email stats in a common dashboard as well as ingest events as logs.
Setup
In Datadog, add the Twilio Sendgrid Integration
Next, follow the steps to add a SendGrid API key
We can create the API Key with Full Access (excludes billing and email address validation) or use Restricted access to include “Stats - Read Access”
I’ll tag data from Sendgrid with “IsaacSendGrid” (Note: later I found that “Full Access” did not properly send metrics and switched to Restricted Access and enabled Stats)
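Since the key's permissions turned out to matter, it can help to confirm that a key can read stats before handing it to Datadog. Below is a minimal sketch, assuming the SendGrid v3 global stats endpoint (`GET /v3/stats`) and a Bearer token header; the function names are my own and the sample values in the usage note are illustrative.

```python
import json
import urllib.parse
import urllib.request

SENDGRID_STATS_URL = "https://api.sendgrid.com/v3/stats"


def build_stats_request(api_key, start_date, aggregated_by="day"):
    """Build (but do not send) a GET request for SendGrid global stats."""
    query = urllib.parse.urlencode(
        {"start_date": start_date, "aggregated_by": aggregated_by}
    )
    return urllib.request.Request(
        f"{SENDGRID_STATS_URL}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def total_metric(stats_response, metric):
    """Sum one metric (e.g. "delivered") across every date bucket."""
    return sum(
        entry["metrics"].get(metric, 0)
        for day in stats_response
        for entry in day.get("stats", [])
    )


def fetch_stats(api_key, start_date):
    """Send the request. A key without Stats read access should surface
    as a urllib.error.HTTPError (assumption: typically 401 or 403)."""
    request = build_stats_request(api_key, start_date)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `fetch_stats("SG.your-key", "2022-07-01")` and then `total_metric(result, "delivered")` gives a quick sanity check that the numbers Datadog should be receiving actually exist on the SendGrid side.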
What we can now track are the types of blocks and deliveries SendGrid collects.
The following metrics will be tracked by this integration:
NAME | UNIT | DESCRIPTION |
---|---|---|
sendgrid.emails.blocks | emails | The number of emails that were not allowed to be delivered by ISPs. |
sendgrid.emails.bounce_drops | emails | The number of emails that were dropped because of a bounce. |
sendgrid.emails.bounces | emails | The number of emails that bounced instead of being delivered. |
sendgrid.emails.clicks | emails | The number of links that were clicked in your emails. |
sendgrid.emails.daily.total.blocks | emails | The daily number of emails that were not allowed to be delivered by ISPs. |
sendgrid.emails.daily.total.bounce_drops | emails | The daily number of emails that were dropped because of a bounce. |
sendgrid.emails.daily.total.bounces | emails | The daily number of emails that bounced instead of being delivered. |
sendgrid.emails.daily.total.clicks | emails | The daily number of links that were clicked in your emails. |
sendgrid.emails.daily.total.deferred | emails | The daily number of emails that temporarily could not be delivered. |
sendgrid.emails.daily.total.delivered | emails | The daily number of emails SendGrid was able to confirm were actually delivered to a recipient. |
sendgrid.emails.daily.total.invalid_emails | emails | The daily number of recipients who had malformed email addresses or whose mail provider reported the address as invalid. |
sendgrid.emails.daily.total.opens | emails | The daily total number of times your emails were opened by recipients. |
sendgrid.emails.daily.total.processed | emails | Requests from your website, application, or mail client via SMTP Relay or the API that SendGrid processed daily. |
sendgrid.emails.daily.total.requests | emails | The daily number of emails that were requested to be delivered. |
sendgrid.emails.daily.total.spam_report_drops | emails | The daily number of emails that were dropped due to a recipient previously marking your emails as spam. |
sendgrid.emails.daily.total.spam_reports | emails | The daily number of recipients who marked your email as spam. |
sendgrid.emails.daily.total.unique_clicks | emails | The daily number of unique recipients who clicked links in your emails. |
sendgrid.emails.daily.total.unique_opens | emails | The daily total of unique recipients who opened your emails. |
sendgrid.emails.daily.total.unsubscribe_drops | emails | The daily number of emails dropped due to a recipient unsubscribing from your emails. |
sendgrid.emails.daily.total.unsubscribes | emails | The daily number of recipients who unsubscribed from your emails. |
sendgrid.emails.deferred | emails | The number of emails that temporarily could not be delivered. |
sendgrid.emails.delivered | emails | The number of emails SendGrid was able to confirm were actually delivered to a recipient. |
sendgrid.emails.invalid_emails | emails | The number of recipients who had malformed email addresses or whose mail provider reported the address as invalid. |
sendgrid.emails.opens | emails | The total number of times your emails were opened by recipients. |
sendgrid.emails.processed | emails | Requests from your website, application, or mail client via SMTP Relay or the API that SendGrid processed. |
sendgrid.emails.requests | emails | The number of emails that were requested to be delivered. |
sendgrid.emails.spam_report_drops | emails | The number of emails that were dropped due to a recipient previously marking your emails as spam. |
sendgrid.emails.spam_reports | emails | The number of recipients who marked your email as spam. |
sendgrid.emails.unique_clicks | emails | The number of unique recipients who clicked links in your emails. |
sendgrid.emails.unique_opens | emails | The number of unique recipients who opened your emails. |
sendgrid.emails.unsubscribe_drops | emails | The number of emails dropped due to a recipient unsubscribing from your emails. |
sendgrid.emails.unsubscribes | emails | The number of recipients who unsubscribed from your emails. |
It may take some time for data to appear. Here we see evidence that the metrics are being tracked.
We can ingest the logs if we follow the steps for webhooks
And they will show up in our log queries with “source:sendgrid”
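To get a feel for what those log entries contain, here is a small sketch that tallies events from a webhook body. It assumes the SendGrid Event Webhook format: a JSON array of objects, each with an `event` field such as `delivered` or `dropped`. The sample payload is made up for illustration and is not real data from my account.

```python
import json
from collections import Counter

# Illustrative payload only; addresses, IDs and reasons are invented.
SAMPLE_PAYLOAD = """[
  {"email": "a@example.com", "timestamp": 1657900000, "event": "processed", "sg_event_id": "evt-1"},
  {"email": "a@example.com", "timestamp": 1657900005, "event": "delivered", "sg_event_id": "evt-2"},
  {"email": "b@example.com", "timestamp": 1657900010, "event": "dropped", "sg_event_id": "evt-3", "reason": "Bounced Address"}
]"""


def count_events(raw_body):
    """Count webhook events by their "event" type."""
    events = json.loads(raw_body)
    return Counter(event.get("event", "unknown") for event in events)


def dropped_recipients(raw_body):
    """Return the recipient addresses of every "dropped" event."""
    return [
        event["email"]
        for event in json.loads(raw_body)
        if event.get("event") == "dropped"
    ]


if __name__ == "__main__":
    print(count_events(SAMPLE_PAYLOAD))
    print(dropped_recipients(SAMPLE_PAYLOAD))
```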
I did note that the “Full Access” API key did not actually work for metrics. I changed to Restricted Access and enabled Read access on Stats
and then I could see Metrics
For those wondering how I test Sendgrid, I generally use an Azure Logic App
with the Sendgrid Action
Checking for issues
Let’s say we want to catch Dropped emails.
We could write a query to do just that
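For reference, the same kind of search can be run outside the UI. This is a sketch only, assuming the Datadog Logs Search API (`POST /api/v2/logs/events/search`) on the US site and assuming the parsed event type is exposed as the attribute `@event`; check the attribute name in your own Log Explorer, as it may differ.

```python
import json
import urllib.request

# Assumption: US1 site; other Datadog sites use a different hostname.
DATADOG_LOGS_SEARCH_URL = "https://api.datadoghq.com/api/v2/logs/events/search"

# Assumption: "@event" is the parsed attribute holding the SendGrid event type.
DROPPED_QUERY = "source:sendgrid @event:dropped"


def build_dropped_search(api_key, app_key, since="now-4h", limit=25):
    """Build (but do not send) a log search for dropped SendGrid emails."""
    body = {
        "filter": {"query": DROPPED_QUERY, "from": since, "to": "now"},
        "sort": "-timestamp",  # newest first
        "page": {"limit": limit},
    }
    return urllib.request.Request(
        DATADOG_LOGS_SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` with real keys would return the matching log events as JSON.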
We can then create a new Monitor
and choose the source as logs
The full monitor would look as such
We can talk through the parts.
At the top, we see the source as sendgrid and the particular event, as parsed from the logs, being “dropped”. We chose a window of 4 hours to confirm that we did indeed have some dropped messages (showing the graph as red).
In the next sections, we see that we will alert if the count jumps to more than 1 in a 5m window and send an alert to my Teams channel. I could also trigger a PagerDuty alert.
The last part talks about staying in alert for at least 3 rounds if it hasn’t been addressed
Which will trigger an alert as such
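The monitor walked through above could also be expressed as a payload for the Datadog Monitors API (`POST /api/v1/monitor`). Treat this as a rough sketch rather than an export of my actual monitor: the `@event` attribute name and the `@teams-my-channel` handle are placeholders, and the re-notification settings are left out.

```python
import json


def build_dropped_email_monitor(teams_handle, threshold=1, window="5m"):
    """Build a log-alert monitor definition for dropped SendGrid emails."""
    # Assumption: "@event" is the parsed attribute holding the event type.
    log_query = "source:sendgrid @event:dropped"
    return {
        "name": "SendGrid dropped emails",
        "type": "log alert",
        # Count matching logs over the window and compare to the threshold.
        "query": (
            f'logs("{log_query}").index("*")'
            f'.rollup("count").last("{window}") > {threshold}'
        ),
        "message": (
            f"SendGrid reported dropped emails in the last {window}. "
            f"{teams_handle}"
        ),
        # The critical threshold has to match the value used in the query.
        "options": {"thresholds": {"critical": threshold}},
    }


if __name__ == "__main__":
    # "@teams-my-channel" is a hypothetical notification handle.
    monitor = build_dropped_email_monitor("@teams-my-channel")
    print(json.dumps(monitor, indent=2))
```

Keeping the definition in code like this makes it easier to recreate the monitor in another account.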
Lastly, we can see the metrics we gather in our Dashboards. This can be useful to bundle with other widgets such as GH Action and AzDO monitors.
Summary
The integration between SendGrid and Datadog is pretty slick. I found log parsing the easiest way to get reliable metrics and alerts; however, I’ll continue to watch and examine the native metrics collected via the API key.
One gotcha I found: my long-term SendGrid account was “closed” and I had no way to re-open it. This rather annoyed me as, while I used it infrequently, I did rely on it and had set it up with verified senders. I would need to create a new account (with new credentials) just to get back to a good place. The strange behavior was that some emails sent and some dropped, and the metrics and logs (via webhooks) did not fire. So it was “closed” but still sort of worked.
I often get asked about New Relic as well. New Relic did have a plugin for years (albeit not frequently updated). However, in 2021 New Relic stopped supporting plugins and SendGrid then pointed to this guide for migrating off the plugin. The summary is to use webhooks and a lambda to populate an S3 bucket for ingestion as logs. Then there are steps for manually using NRQL to create dashboards by hand.
I have no axe to grind with NR; however, the Datadog experience with tags and integrations is clearly much easier to set up and use.