In a previous blog post, we talked about using SSM parameters with ECS to pull secrets from a remote store. If you are using EKS instead of ECS, you’ve probably noticed that this is not a built-in feature. Kubernetes has built-in Secrets, but base64 encoding is not encryption, and many teams still prefer an external secret store that keeps secret values in a central location and only allows access to authenticated users and services.

Fortunately, the GoDaddy engineering team has created an open source project that helps with this challenge. The external-secrets project allows us to reference AWS secrets within Kubernetes pods.

Before you even install it, you’ll probably have an obvious first question: “How will the pod be granted access to Secrets Manager?” Let’s cover that first, since it needs to be set up before we install the external-secrets controller and CRD.

The pod that requests the “ExternalSecret” needs AWS authentication credentials. The quick and easy way is to give your EC2 node’s IAM role access to Secrets Manager, but this is not recommended: if you grant access to the entire node, then any pod running on it can read your secrets, even one launched by an attacker, and you can’t define granular access that only allows certain pods to reach certain secrets. If you want to follow the principle of least privilege, the best way forward is IAM Roles for Service Accounts (IRSA). The tl;dr here is that you associate the service account that runs the external-secrets controller with an IAM role that grants it access to AWS services, so that it can create secrets for you.

Configuring this happens in 3 fairly easy steps that AWS has already documented for us:

  1. Enable IAM Roles for Service Accounts
  2. Create an IAM Role and Policy for your Service Account
  3. Associate an IAM Role to a Service Account
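Under the hood, the role from step 2 needs a trust policy that lets the cluster’s OIDC provider assume it on behalf of one specific service account. A rough sketch of what that trust policy looks like (the account ID, region, and OIDC provider ID are placeholders you’d fill in from your own cluster):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<account_id>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<oidc_id>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.<region>.amazonaws.com/id/<oidc_id>:sub": "system:serviceaccount:default:external-secrets"
        }
      }
    }
  ]
}
```

The Condition block is what scopes the role to a single namespace/service-account pair instead of the whole cluster.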

These docs are fantastic, but in our case we need to push these changes out to 3 completely separate environments (dev, test, and prod) in a reliable, repeatable way, with segregation between them, and I don’t want to do the work by hand. We are going to script it with Terraform, separating our environments by Terraform workspace. If you would rather create the objects manually by following the guides above, feel free to skip the Terraform section and rejoin us at Step 2 below.

Step 1: Terraforming

This is a rough representation of what we need to create.

The following Terraform code is what we use to create an EKS cluster with the OIDC provider enabled and IAM Roles for Service Accounts preconfigured. For this example we will keep it simple: for each environment, we create one service account and one environment-specific role that only allows access to secrets whose names match that environment’s prefix (or a global prefix). After running this, we will have a service account named external-secrets associated with the apps_role_dev IAM role. A shared service account name is fine since we have a different cluster for each environment, but because the clusters live in the same AWS account, the role names need to be segregated.

cluster.tf

resource "aws_eks_cluster" "eks_cluster" {
  name                      = "${var.cluster_name}-${terraform.workspace}"
  role_arn                  = aws_iam_role.eks_cluster_role.arn
  enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]

  vpc_config {
    subnet_ids = concat(var.public_subnets, var.private_subnets)
  }

  timeouts {
    delete = "30m"
  }
}

irsa.tf

Configuring IAM Roles for Service Accounts is actually pretty easy with Terraform, with the help of the eks-irsa module. All we need to do is pass in the name of the role we want it to create, the cluster information, and any additional policy (like the one below for pulling secrets), and it does the hard work. We still have to set up the OIDC provider here, but that too is easy with Terraform. Also notice how we use the Terraform workspace within the iamSecretPolicy resource to restrict what this role will be able to access.

data "tls_certificate" "eks_cert" {
  url        = aws_eks_cluster.eks_cluster.identity[0].oidc[0].issuer
  depends_on = [aws_eks_cluster.eks_cluster]
}

resource "aws_iam_openid_connect_provider" "openid_provider" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks_cert.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.eks_cluster.identity[0].oidc[0].issuer
  depends_on      = [aws_eks_cluster.eks_cluster]
}

module "eks-irsa" {
  source              = "nalbam/eks-irsa/aws"
  version             = "0.13.2"
  name                = "apps_role_${terraform.workspace}"
  region              = var.aws_region
  cluster_name        = aws_eks_cluster.eks_cluster.name
  cluster_names       = [aws_eks_cluster.eks_cluster.name]
  kube_namespace      = "default"
  kube_serviceaccount = "external-secrets"
  policy_arns         = [aws_iam_policy.iamSecretPolicy.arn]
  depends_on          = [aws_eks_cluster.eks_cluster]
}

resource "aws_iam_policy" "iamSecretPolicy" {
  name        = "${terraform.workspace}_secretPolicy"
  path        = "/"
  description = "Allow access to ${terraform.workspace} secrets"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "secretsmanager:GetResourcePolicy",
          "secretsmanager:GetSecretValue",
          "secretsmanager:DescribeSecret",
          "secretsmanager:ListSecretVersionIds"
        ]
        Effect = "Allow"
        Resource = [
          "arn:aws:secretsmanager:${var.aws_region}:${var.account_id}:secret:${terraform.workspace}/*"
        ]
      },
    ]
  })
}

The cluster is now ready to use service accounts linked to IAM roles to pull secrets from AWS, and we had Terraform create an IAM role for us that is already set up. The great thing about this is that it doesn’t have to be specific to secrets! Your pods tied to service accounts now have the power to perform all sorts of AWS automation.
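For example, any pod that runs under the annotated service account picks up the role’s credentials automatically via the injected web identity token. A sketch (the pod name, image, and command are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: aws-automation-example
spec:
  serviceAccountName: external-secrets  # annotated with eks.amazonaws.com/role-arn
  containers:
    - name: app
      image: amazon/aws-cli
      command: ["aws", "secretsmanager", "list-secrets"]
```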

Step 2: Installing the external-secrets library

helm.tf

resource "helm_release" "external-secrets" {
  name       = "external-secrets"
  repository = "https://external-secrets.github.io/kubernetes-external-secrets/"
  chart      = "kubernetes-external-secrets"
  verify     = false

  values = [
    templatefile("./helm/kubernetes-external-secrets/values.yml", {
      roleArn = module.eks-irsa.arn
    })
  ]

  set {
    name  = "metrics.enabled"
    value = "true"
  }

  set {
    name  = "service.annotations.prometheus\\.io/port"
    value = "9127"
    type  = "string"
  }
}

./helm/kubernetes-external-secrets/values.yml

Here we pass the role ARN that the IRSA module created into the external-secrets values. The external-secrets service account will be created and annotated with that role ARN via a templatefile.

serviceAccount:
  name: "external-secrets"
  annotations:
    eks.amazonaws.com/role-arn: "${roleArn}"
securityContext:
  fsGroup: 65534

Step 3: Putting it all together

Now that the cluster is configured properly and the external-secrets library is installed, there is nothing stopping us from using the ExternalSecret CRD to create a secret that our pods can use.

apiVersion: "kubernetes-client.io/v1"
kind: ExternalSecret
metadata:
  name: test-db-secret
spec:
  backendType: secretsManager
  data:
    - key: dev/database
      name: DB_CREDENTIALS

This will generate an Opaque secret object in Kubernetes, but you no longer have to keep the secret values in your YAML, and they will only be pulled for pods running under a service account/IAM role with access to those secrets!
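The generated secret can then be consumed like any other Kubernetes secret. A minimal sketch (the pod name and image are illustrative; test-db-secret and DB_CREDENTIALS come from the ExternalSecret above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: my-app:latest
      env:
        - name: DB_CREDENTIALS
          valueFrom:
            secretKeyRef:
              name: test-db-secret  # the secret generated by the ExternalSecret
              key: DB_CREDENTIALS
```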

Jul 5 21
derrickatalto9

One of the nice things about running your container workloads on AWS ECS is the ability to use AWS Systems Manager Parameters to store sensitive values and inject them into your containers as Docker secrets.

This is a more secure option because your sensitive variables do not need to be accessible from your deployment tools; they only have to be available to the ECS/EC2 instance via an IAM role. For our example, we are going to use Terraform to create the SSM Parameter and the service that consumes it. Also, injecting the wrong endpoint configuration into the wrong environment becomes a thing of the past.

We will need to make sure that the EC2 instance is configured to talk to the Systems Manager and has permission to pull parameter values.

Step 1: Configure SSM Agent

You’ll want to configure the SSM Agent on your ECS nodes, preferably in the user data file. Example:

# install pip
curl -O https://bootstrap.pypa.io/get-pip.py
python get-pip.py --user
export PATH=~/.local/bin:$PATH

# install AWS CLI
pip install --upgrade --user awscli
export PATH=/home/ec2-user/.local/bin:$PATH

# install SSM Agent and dependencies
sudo yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
sudo yum install -y polkit

# install CloudWatch agent
curl -o amazon-cloudwatch-agent.rpm https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
sudo rpm -U ./amazon-cloudwatch-agent.rpm

Step 2: IAM Role Permissions

Whatever role your EC2 instances launch with will need to have access to retrieve SSM Parameter Store params. Example IAM policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ssm:DescribeParameters"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["ssm:GetParameters"],
      "Resource": "arn:aws:ssm:us-east-1:<accountid>:parameter/*"
    }
  ]
}

Now your EC2 instances have permission to call SSM, and they have the AWS CLI and SSM Agent installed. Next we need to create some parameters. You can use the AWS console for this, but for this example we are using Terraform.

Step 3: Create an SSM Parameter

SSM Parameters lend themselves well to a nested path naming structure. In this example, we separate parameters by environment and application.

locals {
  db_host = "db1.mydomain.local"
}

resource "aws_ssm_parameter" "db_host" {
  name  = "/production/myapp/db-host"
  type  = "String"
  value = "${local.db_host}"
}
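For values that are actually secret, like passwords rather than hostnames, the same resource can store an encrypted SecureString instead. A sketch, assuming a var.db_password supplied from a tfvars file kept out of source control:

```hcl
resource "aws_ssm_parameter" "db_password" {
  name  = "/production/myapp/db-password"
  type  = "SecureString"        # encrypted with the account's default SSM KMS key
  value = "${var.db_password}"  # hypothetical variable; keep it out of source control
}
```

Note that anything reading a SecureString parameter also needs kms:Decrypt on the key used to encrypt it.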

Don’t forget to run:

> terraform plan

and

> terraform apply

Step 4: Inject SSM Parameter into ECS

In your task definition, add each value in a secrets block, using valueFrom instead of value, with an ARN reference to your parameter.

"secrets": [
  {
    "name": "DB_HOST",
    "valueFrom": "arn:aws:ssm:${var.aws_region}:<account_id>:parameter/production/myapp/db-host"
  }
]

Using this technique, access to sensitive information is much more restricted than if you were keeping these values within your code. Developers only need environment variables for local development, and container orchestration handles injecting the right value for the right app in the right environment for hosted applications. Secrets are now on a “need to know” basis. Fantastic!

May 21 20
derrickatalto9

 

Datadog is an incredibly powerful APM and infrastructure monitoring and alerting tool. Terraform is an incredibly powerful infrastructure automation tool. If you are scripting your infrastructure with Terraform, you’ll want to make sure that your monitors and alerts are scripted as well.

You’ll need to make sure you have set up a Datadog API key and app key before scripting with Terraform. It is common to use variables for your API key and app key when you include your provider.

In a file called variables.tf, declare the following variables:

variable "datadog_api_key" {
  default = ""
}

variable "datadog_app_key" {
  default = ""
}

In a file called terraform.tfvars, place your DataDog API and app keys (do not commit this file to source control if you can help it):

datadog_api_key = "************"
datadog_app_key = "************"

Now, in a file called main.tf, place your provider and pass your key variables to it:

provider "datadog" {
  api_key = "${var.datadog_api_key}"
  app_key = "${var.datadog_app_key}"
}

Initialize the Datadog provider from the command line

> terraform init

Now you are ready to create Datadog monitors from Terraform. Let’s look at how to set up a simple drive space alert. First, let’s go back to variables.tf and add one more variable for our drive space alert thresholds.

variable "c_disk_thresholds" {
  type = "map"
  default = {
    critical = 90
    warning  = 85
    ok       = 80
  }
}

The variable above is a map containing default values for the ‘ok’, ‘warning’, and ‘critical’ thresholds. Our monitor measures percentage of disk used, so we use 80, 85, and 90 respectively.

We are going to add one more variable for the alert footer text. This is where you are going to want to put your recipients.

variable "datadog_alert_footer" {
  default = <<EOF
@your-dd-slack-user @you@yourdomain.com
EOF
}

Now that we have all our variables in place, we can create our alert. Our goal is to monitor every Windows agent’s C drive, warning at 85% capacity and going critical at 90%. Either create a new .tf file, or just add this right in main.tf.

resource "datadog_monitor" "c_disk_free" {
  name  = "{{host.name}} C Low Free Space"
  query = "avg(last_5m):avg:system.disk.in_use{device:c:} by {host} * 100 > ${var.c_disk_thresholds.critical}"
  type  = "metric alert"

  notify_no_data = false
  include_tags   = true
  thresholds     = "${var.c_disk_thresholds}"

  message = <<EOM
{{#is_alert}}
C Drive Usage is {{value}} percent.
{{/is_alert}}
{{#is_recovery}}
C Drive Usage returned to a safe state, {{value}} percent.
{{/is_recovery}}
${var.datadog_alert_footer}
EOM
}

Run terraform plan, and apply if there are no issues:

> terraform apply

Once your changes are applied, you will see your new alert monitoring all Windows server C drives!

DataDog Alert Profile