Datadog Cost Integration

Published: Oct 5, 2023 by Isaac Johnson

This summer Datadog came out with Cloud Cost Management, a new feature in its Observability suite. Today we’ll set up Cloud Costs with AWS and Azure and see what we can accomplish. We’ll ingest some real data, look at dashboards, and even set up monitors and alerts.

Setup

We’ll find the new Cloud Cost Management under Infrastructure

/content/images/2023/10/datadogcc-01.png

Our next step is to set up AWS or Azure

/content/images/2023/10/datadogcc-02.png

AWS

We’ll next add an AWS account

/content/images/2023/10/datadogcc-03.png

Let’s look at some of the options

/content/images/2023/10/datadogcc-04.png

We can add one or multiple accounts. In my case, I just want to monitor one.

We can select an existing DD API key or create a new one.

Lastly, we can opt in (or not) to Lambda Log Forwarders and Cloud Security Management

Clicking “Launch CloudFormation” kicks us over to the AWS console

/content/images/2023/10/datadogcc-05.png

We then click “Create stack” to launch

/content/images/2023/10/datadogcc-06.png

which creates it

/content/images/2023/10/datadogcc-07.png

It seems I must already have had the role, as the stack failed with an “already exists” error

/content/images/2023/10/datadogcc-08.png

I tried again, this time making sure to use a new name

/content/images/2023/10/datadogcc-09.png

Which launched

/content/images/2023/10/datadogcc-10.png

I could watch as it created the resources

/content/images/2023/10/datadogcc-11.png

This meant I now had AWS added, but I still needed a CUR (Cost and Usage Report) in an S3 bucket for this to work.

/content/images/2023/10/datadogcc-12.png

I was about to follow the guide on creating a CUR when I realized we already did this for Configure8.

/content/images/2023/10/datadogcc-13.png

I’ll try that now

/content/images/2023/10/datadogcc-14.png

I need to create a Policy on the Role. I could create a standalone Policy and attach it to the DataDogIntegration2023 role, but I decided to do it inline instead

/content/images/2023/10/datadogcc-15.png

I’ll then click Create to add the policy

/content/images/2023/10/datadogcc-16.png

which created it

/content/images/2023/10/datadogcc-17.png
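For reference, the inline policy is roughly of this shape - a sketch based on what Datadog asks for to read the CUR bucket; the bucket name and prefix below are placeholders for my actual values:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DDCloudCostListBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-cur-bucket"
    },
    {
      "Sid": "DDCloudCostGetBill",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-cur-bucket/my-prefix/*"
    },
    {
      "Sid": "DDCloudCostListCURs",
      "Effect": "Allow",
      "Action": ["cur:DescribeReportDefinitions"],
      "Resource": "*"
    }
  ]
}
```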

In Datadog I clicked “Get Started with Cloud Costs” to check all the configurations.

Here it found I needed a slash on my prefix:

/content/images/2023/10/datadogcc-18.png

I changed it per the settings (though the “validate” box looks funny)

/content/images/2023/10/datadogcc-19.png

and then tried again. This time it didn’t like the daily granularity, nor the Refresh setting that was set

/content/images/2023/10/datadogcc-20.png

Let’s just create a new one

/content/images/2023/10/datadogcc-21.png

I’ll set it to GZIP compression, hourly granularity, and a new bucket

/content/images/2023/10/datadogcc-22.png
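For those who prefer the CLI, the equivalent report definition could be created with `aws cur put-report-definition --report-definition file://cur.json`. A sketch, where the report name, bucket, prefix, and region are placeholders for my values:

```json
{
  "ReportName": "datadog-cur",
  "TimeUnit": "HOURLY",
  "Format": "textORcsv",
  "Compression": "GZIP",
  "AdditionalSchemaElements": ["RESOURCES"],
  "S3Bucket": "my-new-cur-bucket",
  "S3Prefix": "datadog",
  "S3Region": "us-east-1",
  "ReportVersioning": "CREATE_NEW_REPORT"
}
```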

I now have a new one I can use

/content/images/2023/10/datadogcc-23.png

Once set in the configuration

/content/images/2023/10/datadogcc-25.png

I also needed to update the IAM policy because it’s a different bucket

/content/images/2023/10/datadogcc-24.png

I clicked Getting Started, then removed the slash from the prefix and tried again. This time we finally got it to show all greens

/content/images/2023/10/datadogcc-26.png

I can now view AWS costs in Cost Explorer

/content/images/2023/10/datadogcc-27.png

Azure

Let’s now add Azure.

The Azure entry I found was a very old VSE (Visual Studio Enterprise) subscription that was no longer valid. This meant I needed to jump over to “Azure Integrations” to add a new one

It had already prefilled the old sub, so I would need to clear all that out

/content/images/2023/10/datadogcc-28.png

I almost always create SPs manually because, well, I like to do it that way. But this time I’ll try the ARM wizard

/content/images/2023/10/datadogcc-29.png

I’ll create a new Resource Group

/content/images/2023/10/datadogcc-30.png

I then need to copy a couple of values over

/content/images/2023/10/datadogcc-31.png

We can optionally enable custom metrics, which essentially consumes the App Insights metrics and exposes them in Datadog. I could see some pretty awesome uses for that, so I’ll enable it. Those who are cost conscious should be aware this will increase your metrics usage

/content/images/2023/10/datadogcc-32.png

The other thing I plan to enable is Cloud Security Posture Management, part of Datadog’s security suite. I heard some good things about it at a recent event and would love to explore it further.

/content/images/2023/10/datadogcc-33.png

In the Service Principal tab, you’ll want to at least give the SP a name. I narrowed the scope to just my Org Directory.

/content/images/2023/10/datadogcc-34.png

Clicking Register popped me out of the template (a bit annoying, since I lost my progress), but it did get me to the App Registration (SP) so I could create a Client Secret, which I would need when going back.

/content/images/2023/10/datadogcc-35.png

I now had all the pieces and could click create

/content/images/2023/10/datadogcc-36.png

Once the deployment was complete

/content/images/2023/10/datadogcc-37.png

I realized I would need a storage account for the reports we will create in a moment. I created one in the same resource group we just made.

/content/images/2023/10/datadogcc-40.png

I then created a container in the account to hold the reports

/content/images/2023/10/datadogcc-41.png

Now that I have a tenant, I want to pull in Subscription-level costs. I’ll want to add the Storage Blob Data Reader and Cost Management Reader role assignments to the SP

/content/images/2023/10/datadogcc-38.png

I left the defaults for all the storage account settings save for moving it to LRS, as it’s just reports.

I’ll want to create Cost Management exports under the Billing and Cost Management area

/content/images/2023/10/datadogcc-39.png

Here I used the Storage Account I created and set “reports” as the directory into which to save reports.

/content/images/2023/10/datadogcc-42.png

STOP. Did you see the mistake? Clearly I did not. In fact, I had a slew of steps detailing the usage of this report, but those steps were wrong.

The “Scope” shown in that prior screenshot was my email address, meaning we were creating a cost report for the billing entity and not the subscription or resource group. Because my “Billing Account” is a personal one, not an EA (Enterprise Agreement), I was originally blocked later on.

I needed to change the “Scope” to my subscription:

/content/images/2023/10/datadogcc-85.png

Then create both an Amortized and an Actual report

/content/images/2023/10/datadogcc-86.png

Yes, you need both; they will be asked for later.

When created, click “Run Now” to generate an initial report.
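Under the hood, each of these is just a Cost Management Export resource. A sketch of the request body for the Amortized variant (the Actual one differs only in the definition type); the subscription ID, resource group, storage account, and dates are placeholders:

```json
{
  "properties": {
    "schedule": {
      "status": "Active",
      "recurrence": "Daily",
      "recurrencePeriod": {
        "from": "2023-10-06T00:00:00Z",
        "to": "2024-10-06T00:00:00Z"
      }
    },
    "format": "Csv",
    "deliveryInfo": {
      "destination": {
        "resourceId": "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>",
        "container": "reports",
        "rootFolderPath": "reports"
      }
    },
    "definition": {
      "type": "AmortizedCost",
      "timeframe": "MonthToDate"
    }
  }
}
```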

While we wait on that first report, we can go take care of an IAM change by finding that new Storage Account, and going to the IAM section

/content/images/2023/10/datadogcc-48.png

There, we will add a new Role Assignment

/content/images/2023/10/datadogcc-49.png

I’ll pick “Storage Blob Data Reader”

/content/images/2023/10/datadogcc-50.png

I’ll then add my SP. I might have created two so I’m adding both for good measure

/content/images/2023/10/datadogcc-51.png

I’ll then click through to complete it

/content/images/2023/10/datadogcc-52.png

The other thing I need to do is add “Cost Management Reader” for the SP(s) to my sub. I can go to IAM on the subscription and click “Add” and select Role Assignment.

/content/images/2023/10/datadogcc-53.png

I’ll select Cost Management Reader and click next

/content/images/2023/10/datadogcc-54.png

and as before, I’ll add the SP(s) for the reports

/content/images/2023/10/datadogcc-55.png

Back in my Datadog window, I’ll now enter the Billing Account ID for the reports

/content/images/2023/10/datadogcc-60.png

I was blocked from using the Billing Account scope because Datadog appears to assume an Enterprise Agreement, which has a numeric Billing Account ID. Personal accounts instead get a GUID.

/content/images/2023/10/datadogcc-58.png

Now when I pick Subscription, I can see those reports - these are the two we created earlier

/content/images/2023/10/datadogcc-87.png

Reviewing Costs

Let’s review what we have with AWS

/content/images/2023/10/datadogcc-63.png

I can “Analyze” which brings me into a Datadog window that feels very familiar

/content/images/2023/10/datadogcc-64.png

We can start to explore and figure out where the extra costs are originating

We can see it was from a large metrics ingestion from CloudWatch - very likely Datadog pulling metrics to put together costs. For the first couple of days, it spiked.

I immediately hopped into AWS Cost Management to see if we were growing. I still feel the burn of that huge Azure bill from last month.

It appears to be settling down

/content/images/2023/10/datadogcc-66.png

That said, I did get a billing alarm yesterday

/content/images/2023/10/datadogcc-67.png

which triggered PagerDuty

/content/images/2023/10/datadogcc-68.png

Something I can do in Datadog that I cannot do over in AWS is group by Usage Type, then get a summary with a comparison over a period of time.

For instance, I can look at my second biggest cost, S3 usage, and see where those costs come from and how they compare to the prior week.

/content/images/2023/10/datadogcc-69.png

Monitors and Alerts

Since we are in Datadog, we can also leverage monitors and alerts.

For instance, we can create a Cost Monitor that compares 7 days of rolling data to see if the Amortized costs are trending in the wrong direction

/content/images/2023/10/datadogcc-70.png

At the bottom of the image above you can see I can send to emails, PagerDuty - anything I have tied as a Notification Channel in Datadog.

For instance, I’ll create one that emails me when there is a greater than $5 spike in the bill.

/content/images/2023/10/datadogcc-71.png

I can now see the new “5 Doller Jumps” Monitor in my monitors. Its underlying query is:

formula("sum:aws.cost.unblended{aws_cost_type IN (Usage,DiscountedUsage,SavingsPlanCoveredUsage)}").last("7d").change("absolute") > 5

/content/images/2023/10/datadogcc-72.png
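As a sanity check on what that query means, here is a small Python sketch of the “absolute change over 7d exceeds $5” logic - this is not Datadog’s evaluator, and the cost figures are invented for illustration:

```python
# Sketch of the monitor's trigger condition: alert when the summed cost's
# absolute change across the 7-day window exceeds the $5 threshold.

def cost_change_alert(daily_costs, threshold=5.0):
    """True when the window's last value minus its first value
    exceeds the threshold (an 'absolute change')."""
    if len(daily_costs) < 2:
        return False
    return (daily_costs[-1] - daily_costs[0]) > threshold

# Made-up week of daily cost totals drifting upward by about $5.70
week = [31.10, 31.40, 31.55, 32.00, 33.75, 35.20, 36.80]
print(cost_change_alert(week))  # prints True
```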

When triggered, I can see the alert hit my inbox

/content/images/2023/10/datadogcc-73.png

and I can click the link in the email to AWS or to Datadog events

/content/images/2023/10/datadogcc-74.png

This gives me another advantage: I can see the history of events for this monitor. That is, if I started to notice a trend over time, or spikes on certain months or days, that would all become clear looking at the events.

/content/images/2023/10/datadogcc-75.png

Azure

I want to point out that Datadog helped me fix my Azure ingestion error on the day of publication, so we’ll have to come back to Azure Cloud Costs later.

That said, I can see it listed as an account:

/content/images/2023/10/datadogcc-88.png

And just as with AWS, there are Azure Cost Dashboards as well

/content/images/2023/10/datadogcc-89.png

It’s just blank for me until I get some data

/content/images/2023/10/datadogcc-90.png

Dashboards

You had to assume Datadog had already built out some slick prebaked dashboards, and indeed they have.

Here is the AWS one:

/content/images/2023/10/datadogcc-76.png

What you see here is a whole lot of advice along with the numbers. This reminds me of a lot of “cost optimization” products I’ve used recently.

In the bottom left we can see a breakdown (based on my time selector and filters) of costs by product:

/content/images/2023/10/datadogcc-77.png

Since S3 is my larger cost, let’s look at the “S3 Cost Overview” dashboard.

/content/images/2023/10/datadogcc-78.png

It surprised me a bit that my ‘cf-logs’ bucket was costing me a lot more than the two backend sites hosted in AWS

/content/images/2023/10/datadogcc-79.png

I can double check, but that bucket is used by CloudFront for logs

/content/images/2023/10/datadogcc-80.png

And it’s been growing ever since Dec 2021 when it was created.

/content/images/2023/10/datadogcc-81.png

It just stores old logs I’ll never look at

/content/images/2023/10/datadogcc-82.png

And looking into the Lifecycle policy, I can see the issue - we expire them after 60d but don’t actually delete them!

/content/images/2023/10/datadogcc-83.png

I’ll update the policy

/content/images/2023/10/datadogcc-84.png
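The gotcha here is common on versioned buckets: an Expiration rule only adds a delete marker, so noncurrent versions pile up unless you expire them too. A sketch of a corrected lifecycle configuration, assuming versioning is the culprit (the rule ID and day counts are mine, not necessarily what the console generated); it can be applied with `aws s3api put-bucket-lifecycle-configuration`:

```json
{
  "Rules": [
    {
      "ID": "expire-and-delete-cf-logs",
      "Status": "Enabled",
      "Filter": {},
      "Expiration": { "Days": 60 },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 1 },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}
```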

It can take a few days to have them start to change, but we’ll circle back on that later.

Summary

My outstanding threads with Datadog support on Azure were sorted too late to demonstrate the Azure Costs in this writeup. Additionally, I’ll be keeping a close eye on the CloudWatch usage, as the timing of the spike lines up with the Datadog CUR CloudFormation launch. If my CW usage continues to grow, I may need to pull that back.

As we saw, I was able to find and fix some waste in S3. Any time a tool immediately saves me money, that makes it a winner for me. I would like to see the Setup and Onboarding improved; when we use Datadog with cloud metrics and log ingestion, it’s much, much simpler. I would love to see a one-click CloudFormation or blueprint - something straightforward.

For me to use this as a full replacement for my current setup, I would also need GCP support (which is coming).

Addendum

I am a fan of Datadog. Most of their new features work great.

However, this one cost me a lot, and I have been fighting back the ingestion ever since. My Azure bills spiked from around $10 to upwards of $45.

The one way I was able to stop it was to go to the Azure Integration and change the client secret so it would be invalid.

/content/images/2023/11/ddadd-01.png

I also disabled the “Enable Resource Collection” section

/content/images/2023/11/ddadd-02.png

I’m fairly certain it is the cause of these massive metrics spikes (“Native Metric Queries”).

/content/images/2023/11/ddadd-03.png

If I cannot stop the cost overrun, I’ll end up having to shut down this subscription and I would much much prefer not to do that.

I already shut down the AWS integration when it spiked.

Datadog PagerDuty Azure AWS


Isaac Johnson


Cloud Solutions Architect

Isaac is a CSA and DevOps engineer who focuses on cloud migrations and devops processes. He also is a dad to three wonderful daughters (hence the references to Princess King sprinkled throughout the blog).
