
Monitoring Your Microservices on AWS with Terraform and Grafana – Security – Grape Up

Welcome back! This is the last part of this series. If this is the first part you’ve seen, we recommend reading the previous parts first. So, there is an application you’ve created: it consists of multiple microservices talking to each other and some monitoring tools. Now, let’s encrypt this communication.

Securing backend services

Our Java backend services need to be able to expose secured endpoints and talk to each other over HTTPS. For this, we will use keystores and truststores. Keystores are used for storing private keys and certificates with public keys. If our server needs to use HTTPS, then during the SSL handshake the server looks for a private key in the keystore and presents its corresponding public key and certificate to the client. The client then looks for the associated certificate in its truststore. If the certificate or its Certificate Authority is not present in the truststore, we’ll get an SSLHandshakeException.

For our purposes we used self-signed certificates, because we wanted secure, encrypted communication between services and domain verification (confirmed by a trusted CA) was not important in this case. However, you can use any type of SSL certificate for this purpose – another way to go would be to use a free trusted certificate from https://letsencrypt.org/. You can also simply buy a valid certificate from a CA of your choice, for the domain name you own.

We generated our keys and self-signed certificates using keytool, a great tool distributed with the JDK. In order to avoid manual steps (where possible), we generate the keystore with Terraform, using the local-exec provisioner:

keytool -genkey -noprompt -dname "CN=*.${var.certificate_domain}, OU=Organizational Unit, O=Organization, L=Location, S=State, C=Country" -keystore keystore.jks -alias selfsigned_foo -keyalg RSA -keysize 2048 -validity 3950 -storepass changeit
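A minimal sketch of how this call can be wrapped in Terraform, assuming a null_resource with a local-exec provisioner (the resource name is illustrative; var.certificate_domain comes from the command above):

resource "null_resource" "foo_keystore" {
  # Generate the keystore with the wildcard self-signed certificate locally,
  # so it can be packaged into the service Docker images afterwards.
  provisioner "local-exec" {
    command = <<-EOT
      keytool -genkey -noprompt \
        -dname "CN=*.${var.certificate_domain}, OU=Organizational Unit, O=Organization, L=Location, S=State, C=Country" \
        -keystore keystore.jks -alias selfsigned_foo \
        -keyalg RSA -keysize 2048 -validity 3950 -storepass changeit
    EOT
  }
}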

We use a so-called wildcard certificate, valid for all our subdomains. By subdomains, we mean the private subdomains created by the service discovery service and used for service-to-service communication in private subnets. With a wildcard certificate we can use the same one for every microservice (each microservice ultimately gets its own subdomain).

With keytool we can export the certificate in text (PEM) format when needed:

keytool -export -alias selfsigned_foo -keystore keystore.jks -rfc -file cert.pem -storepass changeit

The next step is to configure the services to actually use the certificate. Since we don’t want to complicate things with SSL on localhost, we decided to run secured connections only in the cloud. Therefore we created a new Spring profile dedicated to all cloud-related configuration. One part of that configuration is SSL (defined at the application properties level, in a YAML file):

server:
  port: 443
  ssl:
    key-store: file:keystore.jks
    key-store-password: changeit
    key-store-type: jks
    key-alias: selfsigned_foo

Using application properties for this purpose is definitely the easiest way. The referenced keystore is added as a separate file to the Docker image.

The next step is to import the certificate into the truststore of every service that can be called in the cloud (with the -cacerts flag we simply import the certificate into the container’s default Java truststore):

keytool -importcert -noprompt -alias selfsigned_foo -file cert.pem -cacerts -storepass changeit

We do this at the Dockerfile level, for every Java microservice running in a separate Docker container.

These two operations allow us to expose our endpoints over HTTPS and to trust the responses we receive. We repeated the same steps for the config server and every other microservice in our system. Again, a self-signed certificate was enough for our needs and allowed us to automate the process with Terraform, but this should be adjusted to your goal.

Once we have secured the services – what about the load balancers? They also need to provide HTTPS endpoints. For the load balancers, we created SSL certificates in AWS Certificate Manager – one certificate per subdomain, attached to the corresponding load balancer.

resource "aws_acm_certificate" "certificate" {
 domain_name = "${local.domain_name}.${var.environment_zone_name}"
 validation_method = "DNS"
}
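Since the certificate uses DNS validation, it is only issued once the validation records exist in the hosted zone. A hedged sketch of the accompanying resources, assuming the Route 53 zone ID is exposed as an illustrative variable var.environment_zone_id:

resource "aws_route53_record" "certificate_validation" {
  # One validation CNAME record per domain on the certificate.
  for_each = {
    for dvo in aws_acm_certificate.certificate.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      type   = dvo.resource_record_type
      record = dvo.resource_record_value
    }
  }

  zone_id = var.environment_zone_id
  name    = each.value.name
  type    = each.value.type
  ttl     = 60
  records = [each.value.record]
}

resource "aws_acm_certificate_validation" "certificate" {
  # Waits until ACM has seen the DNS records and issued the certificate.
  certificate_arn         = aws_acm_certificate.certificate.arn
  validation_record_fqdns = [for record in aws_route53_record.certificate_validation : record.fqdn]
}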

We attach the certificate (using its ARN) to the load balancer listener:

resource "aws_lb_listener" "foo_https_listener" {
 load_balancer_arn = aws_lb.foo_ecs_alb.arn
 port              = "443"
 protocol          = "HTTPS"
 certificate_arn   = var.certificate_arn
 ssl_policy        = "ELBSecurityPolicy-TLS-1-2-2017-01"
 default_action {
   type             = "forward"
   target_group_arn = aws_lb_target_group.foo_target_group.arn
 }
}
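If the load balancer also keeps a plain HTTP listener on port 80, a common companion – just a sketch here, not part of the setup described above – is to redirect it to the HTTPS listener:

resource "aws_lb_listener" "foo_http_redirect" {
  load_balancer_arn = aws_lb.foo_ecs_alb.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type = "redirect"

    redirect {
      # Send all plain HTTP traffic to the HTTPS listener defined above
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}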

We also have to adjust the exposed port for the container, the target group, and the security groups (both for the load balancer and for ECS task inbound traffic):

(...)
   "portMappings": [
     {
       "containerPort": 443,
       "hostPort": 443
     }
   ]
(...)
(...)
 load_balancer {
   target_group_arn = aws_lb_target_group.foo_target_group.arn
   container_name = "foo"
   container_port = 443
 }
(...)
resource "aws_lb_target_group" "foo_target_group" {
 name        = "foo"
 port        = 443
 protocol    = "HTTPS"
 target_type = "ip"
 vpc_id      = var.vpc_id

 health_check {
   port                = 443
   protocol            = "HTTPS"
   path                = "/actuator/health"
   matcher             = "200"
 }

 depends_on = [
   aws_lb.foo_ecs_alb
 ]
}
resource "aws_security_group" "foo_lb_to_ecs" {
 name = "allow_lb_inbound_foo"
 description = "Allow inbound Load Balancer calls"
 vpc_id = var.vpc_id

 ingress {
   from_port       = 443
   protocol        = "tcp"
   to_port         = 443
    security_groups = [aws_security_group.alb_sg.id]
 }
}
resource "aws_security_group" "alb_sg" {
 name        = "alb-sg"
 description = "Inet to ALB"
 vpc_id      = var.vpc_id

 ingress {
   protocol    = "tcp"
   from_port   = 443
   to_port     = 443
   cidr_blocks = [
     "0.0.0.0/0"
   ]
 }

 egress {
   protocol    = "-1"
   from_port   = 0
   to_port     = 0
   cidr_blocks = [
     "0.0.0.0/0"
   ]
 }
}
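For reference, these security groups only take effect once they are attached to the load balancer and to the ECS service. A rough sketch of how that attachment can look, assuming Fargate and illustrative variables (var.public_subnet_ids, var.private_subnet_ids, var.ecs_cluster_id, var.foo_task_definition_arn) that do not appear in the snippets above:

resource "aws_lb" "foo_ecs_alb" {
  name               = "foo-alb"
  load_balancer_type = "application"
  # Internet-facing security group defined above
  security_groups    = [aws_security_group.alb_sg.id]
  subnets            = var.public_subnet_ids
}

resource "aws_ecs_service" "foo" {
  name            = "foo"
  cluster         = var.ecs_cluster_id
  task_definition = var.foo_task_definition_arn
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets         = var.private_subnet_ids
    # Only the load balancer may reach the task on port 443
    security_groups = [aws_security_group.foo_lb_to_ecs.id]
  }

  # plus the load_balancer block shown earlier
}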

And that’s enough to enable full HTTPS communication on the Java side – all microservices (including the config server) talk to each other over secured connections, and any external request forwarded through the load balancer is also served over HTTPS.

Securing monitoring

As we need every service in our system to be secured, we have to provide additional configuration for the monitoring services.

For Grafana we specify the protocol and certificate paths in the grafana.ini file (in the [server] and [security] sections), used when running Grafana from the official Docker image. We can also enable the HSTS response header:

(...)
[server]
protocol = https
(...)
cert_file = /etc/grafana/grafana.crt
cert_key = /etc/grafana/grafana.key
(...)
[security]
cookie_secure = true
(...)
strict_transport_security = true
(...)

For Prometheus and Loki it is a little bit more complicated – neither of them provides HTTPS support out of the box. So to expose them over HTTPS we used nginx as a reverse proxy. Nginx runs in the same container, exposes HTTPS on a specific port, and proxies the traffic to the particular service running on HTTP on localhost (inside the container). This way, communication between containers is always secured.

Sample nginx configuration for Prometheus:

server {
  listen 9090 ssl;
  ssl_certificate /etc/nginx/ssl/server.crt;
  ssl_certificate_key /etc/nginx/ssl/server.key;
  server_name prometheus_ssl;
  location / {
    # Prometheus itself runs on plain HTTP inside the container, on port 9091
    proxy_pass http://localhost:9091;
  }
}

It exposes port 9090 over HTTPS and proxies all traffic to the Prometheus instance running in the container on port 9091.

A similar operation was configured for Loki:

server {
  listen 3100 ssl;
  ssl_certificate /etc/nginx/ssl/server.crt;
  ssl_certificate_key /etc/nginx/ssl/server.key;
  server_name loki_ssl;
  location / {
    # assumption: Loki listens on plain HTTP on port 3101 inside the container
    # (set via http_listen_port in loki.yml); adjust to your configuration
    proxy_pass http://localhost:3101;
  }
}

We also list the accepted TLS protocol versions explicitly:

ssl_protocols TLSv1 TLSv1.1 TLSv1.2 TLSv1.3;

The Dockerfiles had to be adjusted as well. First, the one for Loki:

FROM grafana/loki

COPY loki.yml .

COPY entrypoint.sh .
COPY loki.crt /etc/nginx/ssl/server.crt
COPY loki.key /etc/nginx/ssl/server.key
COPY lokihttps.conf /etc/nginx/conf.d/default.conf
USER root
RUN apk add nginx --no-cache && \
   sed -i 's/user nginx;//g' /etc/nginx/nginx.conf && \
   chown -R loki:loki /var/lib/nginx && \
   chown -R loki:loki /var/lib/nginx/logs && \
   chown -R loki:loki /var/tmp/nginx && \
   chown -R loki:loki /var/log/nginx && \
   chmod +x entrypoint.sh && \
   mkdir -p /run/nginx && \
   chown -R loki:loki /run/nginx
USER loki

ENTRYPOINT ./entrypoint.sh
EXPOSE 3100
And the one for Prometheus:

FROM alpine

COPY prometheus.yml .

COPY entrypoint.sh .
COPY prometheus.crt /etc/nginx/ssl/server.crt
COPY prometheus.key /etc/nginx/ssl/server.key
COPY prometheushttps.conf /etc/nginx/http.d/default.conf
RUN apk add prometheus nginx --no-cache && \
   sed -i 's/user nginx;//g' /etc/nginx/nginx.conf && \
   chmod +x entrypoint.sh && \
   mkdir -p /run/nginx

ENTRYPOINT ./entrypoint.sh
EXPOSE 9090

As you can see, it got a little bit more complicated – we have to copy the files with keys and certificates into the Docker images, as well as the nginx configuration. In order to add nginx to the Loki image, we had to switch to the root user to be able to install the package, and then do some other tricks to resolve errors related to running nginx on this particular Alpine Linux image.

Accepting HTTPS connections

That’s it for exposing our monitoring infrastructure over HTTPS – but how do we make it accept the other HTTPS domains? For Loki there is no problem: Loki does not call any service, the services call Loki, so as long as they have the proper certificate in their truststores, we are good to go. What about Prometheus? This one is a little more tricky – Prometheus calls the services for their metrics, therefore it has to be able to call HTTPS addresses and accept the self-signed certificates.

How to do it? Simply disable validation of the server certificate in the Prometheus configuration for the scraped services:

scrape_configs:
 - job_name: 'cloud-config-server'
   metrics_path: '/actuator/prometheus'
   scrape_interval: 5s
   scheme: https
   tls_config:
       insecure_skip_verify: true
   dns_sd_configs:
     - names:
       - '$cloud_config_server_url'
       type: 'A'
       port: 443
 - job_name: 'foo'
   metrics_path: '/actuator/prometheus'
   scrape_interval: 5s
   scheme: https
   tls_config:
       insecure_skip_verify: true
   dns_sd_configs:
     - names:
        - '$foo_url'
       type: 'A'
       port: 443
(...)

This configuration has to be repeated for all jobs defined in the prometheus.yml file.

For Grafana we need to disable SSL verification at the data source level.

For example, the original loki_datasource.yml:

apiVersion: 1

datasources:
  - name: Loki
    type: loki
    access: proxy
    url:
    jsonData:
      maxLines: 1000

With TLS verification disabled it becomes:

apiVersion: 1

datasources:
  - name: Loki
    type: loki
    access: proxy
    url:
    jsonData:
      maxLines: 1000
      tlsSkipVerify: true

The same must be done for each defined datasource.

And that’s it – every component in our system exposes its endpoints over HTTPS and is able to accept calls from our subdomains, secured with a self-signed certificate.

Summing up the series

After performing all the steps, you should have a working application, designed and built in a microservice architecture, secured, monitored, and successfully deployed in the AWS cloud. The chosen monitoring stack allows us to flexibly adjust Grafana dashboards, with the relevant log streams, to our needs.

We obviously skipped some common, more basic parts for readability purposes. We assume that the reader has at least a basic understanding of the web application development process and familiarity with AWS and Terraform. Hopefully, we helped you resolve some of your problems. Thank you very much if you’ve made it this far 🙂