Datadog is an incredibly powerful APM and infrastructure monitoring and alerting tool. Terraform is an incredibly powerful infrastructure automation tool. If you are scripting your infrastructure with Terraform, you’ll want to make sure that your monitors and alerts are scripted as well.

Before scripting with Terraform, make sure you have set up a Datadog API key and app key. It is common to pass both keys to the provider through variables.

In a file called variables.tf, declare the following variables:

variable "datadog_api_key" {
  default = ""
}

variable "datadog_app_key" {
  default = ""
}

In a file called terraform.tfvars, place your Datadog API and app keys (do not commit this file to source control if you can help it):

datadog_api_key = "************"
datadog_app_key = "************"

Now, in a file called main.tf, place your provider and pass your key variables to it:

provider "datadog" {
  api_key = var.datadog_api_key
  app_key = var.datadog_app_key
}
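On Terraform 0.13 and later, providers are resolved by a registry source address, so it can help to declare where the Datadog provider comes from. The block below is a minimal sketch under that assumption; it is not part of the original setup, and you would normally add a version constraint matching the provider release you have tested.

```hcl
# Assumes Terraform 0.13 or later, where required_providers accepts a source address.
terraform {
  required_providers {
    datadog = {
      # Registry address of the Datadog provider.
      source = "DataDog/datadog"
      # A "version" constraint can be added here to pin the provider release.
    }
  }
}
```

This can live in main.tf next to the provider block; terraform init then downloads the provider from that address.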

Initialize the Datadog provider from the command line:

> terraform init

Now you are ready to create Datadog monitors from Terraform. Let’s look at how to set up a simple drive space alert. First, let’s go back to variables.tf and add one more variable for our drive space alert thresholds.

variable "c_disk_thresholds" {
  type = map(number)

  default = {
    critical = 90
    warning  = 85
    ok       = 80
  }
}

The variable above is a map containing default values for the ‘ok’, ‘warning’, and ‘critical’ thresholds. The monitor measures disk usage as a percentage, so we use 80, 85, and 90 respectively.
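If you expect people to override these defaults, a validation rule can catch a warning threshold that is not below the critical one. This is only a sketch: it assumes Terraform 0.13 or later (where variable validation is available), it is an alternative form of the declaration above rather than an additional variable, and the error message wording is made up for illustration.

```hcl
# Alternative declaration of c_disk_thresholds with a sanity check.
# Assumes Terraform 0.13+; use this instead of the plain declaration, not alongside it.
variable "c_disk_thresholds" {
  type = map(number)

  default = {
    critical = 90
    warning  = 85
    ok       = 80
  }

  validation {
    # Reject maps where the warning level would never fire before critical.
    condition     = var.c_disk_thresholds["warning"] < var.c_disk_thresholds["critical"]
    error_message = "The warning threshold must be lower than the critical threshold."
  }
}
```

With this in place, terraform plan fails early if someone sets, say, warning = 95 and critical = 90 in terraform.tfvars.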

We are going to add one more variable for the alert footer text. This is where you are going to want to put your recipients.

variable "datadog_alert_footer" {
  default = <<EOF
@your-dd-slack-user @you@yourdomain.com
EOF
}

Now that we have all our variables in place, we can create our alert. Our goal with this alert is to monitor the C drive on all Windows agents, warning at 85% capacity and going critical at 90% capacity. Either create a new .tf file, or just add this right in main.tf.

resource "datadog_monitor" "c_disk_free" {
  name           = "{{host.name}} C Low Free Space"
  query          = "avg(last_5m):avg:system.disk.in_use{device:c:} by {host} * 100 > ${var.c_disk_thresholds.critical}"
  type           = "metric alert"
  notify_no_data = false
  include_tags   = true
  thresholds     = var.c_disk_thresholds

  message = <<EOM
{{#is_alert}}
C Drive Usage is {{value}} percent.
{{/is_alert}}
{{#is_recovery}}
C Drive Usage returned to a safe state, {{value}} percent.
{{/is_recovery}}
${var.datadog_alert_footer}
EOM
}
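Newer major releases of the Datadog provider (3.x, as far as I know) replace the thresholds map attribute with a monitor_thresholds block. If the resource above is rejected on your provider version, a variant along these lines may work. Treat it as a sketch: it reuses the variables declared earlier, it replaces the resource above rather than sitting beside it, and it leaves out the ‘ok’ threshold because the provider documentation, as I understand it, describes that threshold as applying to service checks only.

```hcl
# Variant for provider versions that use the monitor_thresholds block.
# Replaces the resource above; do not declare both with the same name.
resource "datadog_monitor" "c_disk_free" {
  name           = "{{host.name}} C Low Free Space"
  type           = "metric alert"
  query          = "avg(last_5m):avg:system.disk.in_use{device:c:} by {host} * 100 > ${var.c_disk_thresholds["critical"]}"
  notify_no_data = false
  include_tags   = true

  # Thresholds are set per attribute instead of passing the whole map.
  monitor_thresholds {
    critical = var.c_disk_thresholds["critical"]
    warning  = var.c_disk_thresholds["warning"]
  }

  message = <<EOM
{{#is_alert}}
C Drive Usage is {{value}} percent.
{{/is_alert}}
{{#is_recovery}}
C Drive Usage returned to a safe state, {{value}} percent.
{{/is_recovery}}
${var.datadog_alert_footer}
EOM
}
```

The critical value in the query and the critical threshold both come from the same map entry, which keeps them in sync.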

Run terraform plan to review the changes, then apply them if there are no issues:

> terraform plan
> terraform apply

Once your changes are applied, you will see your new alert monitoring all Windows server C drives!
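If you want to find the new monitor quickly after an apply, an output can print its ID. This is an optional sketch; the output name c_disk_monitor_id is hypothetical, and it assumes the monitor resource is named c_disk_free as above.

```hcl
# Prints the Datadog monitor ID after terraform apply.
# The output name is illustrative; the resource address matches the monitor defined earlier.
output "c_disk_monitor_id" {
  value = datadog_monitor.c_disk_free.id
}
```

The value is shown at the end of terraform apply and can be read again later with terraform output.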

DataDog Alert Profile
May 21 20
derrickatalto9