
Datadog is a powerful tool for APM, infrastructure monitoring, and alerting. Terraform is an equally powerful infrastructure automation tool. If you are scripting your infrastructure with Terraform, you’ll want your monitors and alerts scripted as well.
You’ll need a Datadog API key and app key before scripting with Terraform. It is common to pass the API key and app key to the provider as variables.
In a file called variables.tf, declare the following variables:
variable "datadog_api_key" {
  default = ""
}

variable "datadog_app_key" {
  default = ""
}
In a file called terraform.tfvars, place your Datadog API and app keys (avoid committing this file to source control):
datadog_api_key = "************"
datadog_app_key = "************"
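Since these keys are secrets, it can help to tell Terraform to treat them that way. The sketch below is an alternative to the declarations in variables.tf, not an addition to them, and it assumes Terraform 0.14 or later, where the sensitive argument is available. Dropping the empty default means Terraform will ask for a value instead of silently using a blank key.

```hcl
# Alternative variables.tf declarations (assumes Terraform 0.14+).
# "sensitive = true" redacts the values in plan and apply output.
variable "datadog_api_key" {
  type      = string
  sensitive = true
}

variable "datadog_app_key" {
  type      = string
  sensitive = true
}
```

If you would rather not keep a terraform.tfvars file at all, Terraform also reads variable values from environment variables named TF_VAR_datadog_api_key and TF_VAR_datadog_app_key.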
Now, in a file called main.tf, place your provider and pass your key variables to it:
provider "datadog" {
  api_key = var.datadog_api_key
  app_key = var.datadog_app_key
}
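On Terraform 0.13 and later, terraform init needs to know where the Datadog provider comes from, because it is not published under the hashicorp namespace. A minimal sketch of the extra block, assuming you are on one of those versions and want the provider from the public registry:

```hcl
# Tells "terraform init" to fetch the provider from the DataDog namespace
# on the public Terraform Registry (assumes Terraform 0.13+).
terraform {
  required_providers {
    datadog = {
      source = "DataDog/datadog"
    }
  }
}
```

You can add a version constraint next to source if you want to pin the provider release you have tested against.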
Initialize the Datadog provider from the command line:
> terraform init
Now you are ready to create Datadog monitors from Terraform. Let’s look at how to set up a simple drive space alert. First, let’s go back to variables.tf and add one more variable for our drive space alert thresholds.
variable "c_disk_thresholds" {
  # Thresholds are percentages of disk capacity in use.
  type = map(number)

  default = {
    critical = 90
    warning  = 85
    ok       = 80
  }
}
The variable above is a map with default values for the ‘ok’, ‘warning’, and ‘critical’ thresholds. Our monitor measures disk usage as a percentage, so we use 80, 85, and 90 respectively.
We are going to add one more variable for the alert footer text. This is where you put your notification recipients.
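If someone overrides these defaults, it is easy to end up with thresholds in the wrong order. As a sketch of one way to guard against that, the declaration below swaps the map for an object type and adds a validation block. It is an alternative to the declaration above, and it assumes Terraform 0.13 or later, where variable validation is available.

```hcl
# Alternative declaration with an ordering check (assumes Terraform 0.13+).
variable "c_disk_thresholds" {
  type = object({
    critical = number
    warning  = number
    ok       = number
  })

  default = {
    critical = 90
    warning  = 85
    ok       = 80
  }

  # Reject values where the thresholds are not strictly increasing.
  validation {
    condition = (
      var.c_disk_thresholds.ok < var.c_disk_thresholds.warning &&
      var.c_disk_thresholds.warning < var.c_disk_thresholds.critical
    )
    error_message = "Thresholds must satisfy ok < warning < critical."
  }
}
```

With this in place, terraform plan fails early with the error message instead of creating a monitor that warns after it has already gone critical.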
variable "datadog_alert_footer" {
default = <<EOF
@your-dd-slack-user @you@yourdomain.com
EOF
}
Now that we have all our variables in place, we can create our alert. Our goal with this alert is to monitor the C drive on all Windows agents, warning at 85% capacity and going critical at 90% capacity. Either create a new .tf file, or just add this right in main.tf.
resource "datadog_monitor" "c_disk_free" {
  name = "{{host.name}} C Low Free Space"

  # system.disk.in_use is reported as a fraction (0 to 1), so multiply by 100
  # to compare against the percentage thresholds.
  query = "avg(last_5m):avg:system.disk.in_use{device:c:} by {host} * 100 > ${var.c_disk_thresholds.critical}"
  type  = "metric alert"

  notify_no_data = false
  include_tags   = true
  thresholds     = var.c_disk_thresholds

  message = <<EOM
{{#is_alert}}
C Drive Usage is {{value}} percent.
{{/is_alert}}
{{#is_recovery}}
C Drive Usage returned to a safe state, {{value}} percent.
{{/is_recovery}}
${var.datadog_alert_footer}
EOM
}
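One caveat on provider versions: the thresholds map argument used in the resource above comes from older releases of the Datadog provider. As far as we know, the 3.x releases replaced it with a monitor_thresholds block, so on a newer provider the resource would look roughly like the sketch below. Treat it as a replacement for the block above, written under that assumption, and check it against the documentation for the provider version you have installed.

```hcl
# Same monitor, sketched for Datadog provider 3.x and later (assumption:
# "thresholds" has been replaced by the "monitor_thresholds" block).
resource "datadog_monitor" "c_disk_free" {
  name  = "{{host.name}} C Low Free Space"
  type  = "metric alert"
  query = "avg(last_5m):avg:system.disk.in_use{device:c:} by {host} * 100 > ${var.c_disk_thresholds.critical}"

  notify_no_data = false
  include_tags   = true

  # "ok" is left out here: the provider documentation describes it as
  # applying to service checks rather than metric alerts.
  monitor_thresholds {
    critical = var.c_disk_thresholds.critical
    warning  = var.c_disk_thresholds.warning
  }

  message = <<EOM
{{#is_alert}}
C Drive Usage is {{value}} percent.
{{/is_alert}}
{{#is_recovery}}
C Drive Usage returned to a safe state, {{value}} percent.
{{/is_recovery}}
${var.datadog_alert_footer}
EOM
}
```

Note that the critical value in monitor_thresholds has to match the number at the end of the query, which is why both read from the same variable.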
Run terraform plan, and if there are no issues, apply:
> terraform plan
> terraform apply
Once your changes are applied, you will see your new alert monitoring all Windows server C drives!
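If you want a quick way to confirm what was created, one option is to expose the monitor's ID as an output. This is a small optional sketch; the output name c_disk_monitor_id is just a name picked for illustration.

```hcl
# Prints the ID of the monitor after "terraform apply".
# The output name is arbitrary; the resource address matches the monitor above.
output "c_disk_monitor_id" {
  value = datadog_monitor.c_disk_free.id
}
```

After the next apply, terraform output c_disk_monitor_id shows the ID, which you can use to look the monitor up in Datadog.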
