Provisioning Grafana on DigitalOcean Kubernetes Service

The LGTM stack is my essential observability stack, and deploying the architecture on a vendor-agnostic basis allows me to:

  • Guarantee up-time in the event I need to switch due to cost increases
  • Re-deploy the stack to a client that has adopted another cloud provider

Anything that I find I can monitor or pull metrics from ends up in Grafana. I currently use four data sources, with dashboards and alerts pulling and presenting information from all of them:

  • Prometheus: For monitoring real-time metrics such as CPU usage and weather
  • InfluxDB: For storing and monitoring historical metrics, such as stock market data
  • Loki: For monitoring system logs
  • Elasticsearch: For storing transactions, documents and RSS feeds.

I chose a Managed Kubernetes offering as a basis for deployment, as opposed to virtual machines or self-hosted Kubernetes, for two reasons:

  • Uptime is guaranteed by the vendor
  • I don’t have to maintain a Kubernetes cluster at a systems-level

Deploying a DigitalOcean Kubernetes Cluster

Overview of my DOKS cluster from the DigitalOcean Dashboard, where my cluster is a single Premium AMD node.

Droplet layout

I’m deploying my stack with two dedicated right-sized nodes living in two separate node pools. One is labelled as fixed and will contain only one instance, whereas the other is scalable with a maximum of three. With this design, I accomplish two essentials for the architecture:

  • Cost Efficiency
    • I prevent over-provisioning by right-sizing, and rely on DigitalOcean’s control plane to scale the node pool when necessary, such as a larger dataset being held in memory by a data source
  • Availability & Scalability
At least one node should be available at all times, allowing applications to keep running through a single node failure. This is separate from a HA Control Plane, which offers a different benefit.

Cluster Location

lon1 is about 17km away from where I live, so I’ve deployed my cluster there. The location is not really important for my use case, as I plan to do all data ingestion from within the cluster, but it’s cool to think it’s just a few streets down from me.

I played with the idea of taking availability even further by deploying to both ams1 and lon1 in a warm-standby style, but that’s a story for another post.

DOKS Cluster Resource in Terraform
resource "digitalocean_kubernetes_cluster" "primary" {
  name = "zai-lon1"
  region = "lon1"
  version = "1.30.4-do.0"
  vpc_uuid = digitalocean_vpc.primary.id
  auto_upgrade = false
  surge_upgrade = true
  ha = false
  registry_integration = true

  node_pool {
    name = "k8s-lon1-dedicated"
    size = "s-1vcpu-1gb-amd"
    node_count = 1
  }

  node_pool {
    name = "k8s-lon1-burst"
    size = "s-1vcpu-1gb-amd"
    node_count = 1
  }

  tags = local.resource_tags

  maintenance_policy {
    day = "saturday"
    start_time = "04:00"
  }
}

Deploying A Load Balancer

Traefik is my load balancer of choice. It’s written in Go, performant, and integrates very well into the Kubernetes ecosystem. Strictly, I’m not using it as a load balancer but as an ‘application gateway’, so that a single IP address can handle routing to many Kubernetes services based on HTTP attributes such as the Host header or the request path.

I’m also using it as a front for all of the web services in the cluster, so I can manage TLS certificates for all applications from the same location rather than in a unique configuration per application. I’m not so concerned about inter-service TLS at this point; my concern is traffic over the public internet.

Traefik has its own operator-style controller that works with the Ingress resource definition within Kubernetes. When an Ingress resource is created, Traefik automatically creates a route based on the specification. This means that I can easily declare that https://grafana.zai.dev on the LoadBalancer routes to the grafana service on port 80.
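As an illustrative sketch (names here are assumptions; the Grafana chart below generates an equivalent resource for me), such a route declared as an Ingress looks like:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
spec:
  # "traefik" is the IngressClass created by the Traefik chart's defaults.
  ingressClassName: traefik
  rules:
    - host: grafana.zai.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana   # assumed Service name
                port:
                  number: 80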

When a LoadBalancer object is created within the Kubernetes cluster, DigitalOcean’s operator will proceed to create a Load Balancer and the charge will be applied accordingly.

The Traefik helm chart by default creates a LoadBalancer resource, which configures DigitalOcean to reserve a static public IP address that can be reached from the public internet. I don’t need to provide any additional configuration.

resource "helm_release" "traefik" {
  name       = "traefik"
  repository = "https://traefik.github.io/charts"
  chart      = "traefik"
  version    = "30.1.0"
}

Deploying an Identity Provider

My public-facing Grafana instance requiring external authentication

Authentication is required since Grafana is publicly accessible. With the same mindset as with Traefik, I’d like to centrally control authentication & authorization rather than defining it at a per-application level.

I adopted Keycloak as it acts as an Identity Provider and / or Broker, supporting both OpenID Connect and SAML. OIDC is a common standard across many apps, including Grafana.

I use GitHub as an Identity Provider for Keycloak, and Keycloak as an Identity Provider for Grafana. I take this approach as it:

  • Allows me to integrate more OIDC or SAML compatible applications into my own provider
  • Reduces management of external accounts to a single point (rather than configuring GitHub per-application)
  • Allows me to add additional roles on top of GitHub accounts, which Grafana needs to recognize who’s an Administrator.
  • Allows me to integrate LDAP in the future

I won’t go through deploying the Keycloak configuration in this post (a future one is coming with more detail on configuring Keycloak), but based on the OAuth2 specification, I have available to me the client_id, client_secret, auth_url, token_url and api_url, which I pipe into grafana.ini in the next stage. I could also receive these details from GitHub directly by creating an OAuth2 application.
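As a sketch of how those credentials reach the pod (the Secret name here is an assumption), the Grafana chart’s extraSecretMounts value mounts a Secret at the path that the $__file{} references below expect:

extraSecretMounts:
  # Assumed Secret "grafana-oidc-credentials" holding keys: id, secret,
  # auth_url, token_url, api_url.
  - name: oidc-credentials
    secretName: grafana-oidc-credentials
    mountPath: /etc/secrets/oidc_credentials
    readOnly: true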

Deploying Grafana with Helm

To stay agile in my deployments, I isolate the Grafana container from its configuration by delivering all configuration through ConfigMaps. With that, I can version-control the running Grafana deployment without worrying about losing stored work such as dashboards.

Provisionable elements such as Dashboards, Alerts and Datasources can be loaded into Grafana through its provisioning directory, defined below as /etc/grafana/provisioning. By using Kubernetes ConfigMaps, I can maintain this configuration outside of the instance itself.

By enabling the sidecar containers, I save myself from maintaining this volumeMount: the sidecars act as operators that watch for ConfigMaps with specific labels (described below), mount them into the Grafana pod, and instruct Grafana to reload the configuration without restarting the instance.

Within the grafana.ini file, auth.generic_oauth instructs Grafana on how to connect to an identity provider. Here, I pipe in the values received from Keycloak (or GitHub) above. To force authentication through Keycloak, I set disable_login_form.

The $__file{} operator reads a value from a file on disk, allowing me to further protect the OAuth2 credentials by storing them in a Secret. I use HashiCorp Vault to protect secrets through ServiceAccount-based authentication, but that’s outside the scope of this post.

role_attribute_path allows me to map user roles defined within Keycloak to Grafana roles, allowing me to centralize “how to define an administrator” across multiple applications, while scopes instructs Keycloak on what data Grafana requires in order to successfully authenticate and authorize.

Finally, ingress is the bridge between the Grafana instance and the load balancer. Within the Helm chart, an Ingress resource will be created that will point to the Service created by the chart, accessible on the domain grafana.zai.dev.

tls provides instructions on how to load the TLS Certificate associated with grafana.zai.dev. In my case, I store the certificate inside a Secret named grafana-tls.
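Assuming the certificate and key already exist on disk (filenames below are placeholders), creating that Secret is a one-liner:

kubectl create secret tls grafana-tls \
  --cert=grafana.zai.dev.crt \
  --key=grafana.zai.dev.key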

resource "helm_release" "grafana" {
  name       = local.grafana_deployment_name
  repository = local.grafana_repository
  chart      = "grafana"
  version    = var.grafana_chart_version

  values = [
    yamlencode({ "grafana.ini" = {
      analytics = {
        check_for_updates = true
      },
      grafana_net = {
        url = "https://grafana.net"
      },
      log = {
        mode = "console"
      },
      paths = {
        data         = "/var/lib/grafana/",
        logs         = "/var/log/grafana",
        plugins      = "/var/lib/grafana/plugins",
        provisioning = "/etc/grafana/provisioning"
      },
      server = {
        domain   = "grafana.zai.dev",
        root_url = "https://grafana.zai.dev"
      },
      "auth.generic_oauth" = {
        enabled             = true,
        name                = "Keycloak",
        allow_sign_up       = true,
        client_id           = "$__file{/etc/secrets/oidc_credentials/id}",
        client_secret       = "$__file{/etc/secrets/oidc_credentials/secret}",
        disable_login_form  = true
        auth_url            = "$__file{/etc/secrets/oidc_credentials/auth_url}",
        token_url           = "$__file{/etc/secrets/oidc_credentials/token_url}",
        api_url             = "$__file{/etc/secrets/oidc_credentials/api_url}",
        scopes              = "openid profile email offline_access roles",
        role_attribute_path = "contains(realm_access.roles[*], 'admin') && 'Admin' || contains(realm_access.roles[*], 'editor') && 'Editor' || 'Viewer'"
      } 
      },
      "sidecar" = {
        "datasources" = { "enabled" = true },
        "alerts" = { "enabled" = true },
        "dashboards" = { "enabled" = true }
      },
      ingress = {
        enabled = true,
        hosts   = ["grafana.zai.dev"]
        tls = [
          {
            secretName = "grafana-tls",
            hosts = ["grafana.zai.dev"]
          }
        ]
      },
      assertNoLeakedSecrets = false,
    })
  ]
}

Deploying Datasources for Grafana

PersistentVolumes are key to reliability. Without them, each data source has nowhere to store its data across crashes or reboots. All the Helm charts for each data source, by default, create a PersistentVolumeClaim and rely on an external factor, human or automated, to create a PersistentVolume with matching labels.

DigitalOcean’s Operator will create a Volume / Block Store whenever a PersistentVolumeClaim resource is created with any do-* storageClass.

By default, DOKS clusters have do-block-storage as a default storage class for PVCs. Once the block storage has been created, the operator will then create a PersistentVolume with matching labels so that the internal Kubernetes operator can take care of the binding between PVs and PVCs natively.
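For illustration, a minimal claim (name and size are arbitrary) that triggers the operator to provision a Volume and its matching PersistentVolume looks like:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: do-block-storage
  resources:
    requests:
      storage: 4Gi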

Deploying Prometheus

Prometheus is ideal for alerting on real-time numeric metrics, and doesn’t require much configuration for a small deployment. The chart includes the core Prometheus stack: Alertmanager, Push Gateway and a node metrics exporter.

The chart’s default configuration provides Kubernetes service discovery: it hooks into Service creation, looks for prometheus.io/* annotations, and instructs Prometheus to start scraping the annotated Services. At a minimum, these annotations look like:

  • prometheus.io/scrape=true
    • Tells prometheus to actively scrape this Service
  • prometheus.io/path=/metrics
    • Prometheus scrapes on HTTP. It will request this path.
  • prometheus.io/port=9090
    • Prometheus will connect to an HTTP server on this port within the service’s Endpoints

This means that I don’t have to modify the Prometheus configuration directly when expanding the services that my Kubernetes cluster hosts. By simply appending annotations to any new service that exposes metrics in the Prometheus exposition format, I immediately get its data visible within Grafana.
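For example, a hypothetical exporter Service annotated for discovery might look like:

apiVersion: v1
kind: Service
metadata:
  name: my-exporter   # hypothetical exporter
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "9090"
spec:
  selector:
    app: my-exporter
  ports:
    - port: 9090
      targetPort: 9090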

resource "helm_release" "prometheus" {
  name       = "prometheus"
  repository = "https://prometheus-community.github.io/helm-charts"
  chart      = "prometheus"
  version    = "25.26.0"
}

Deploying Elasticsearch

Elasticsearch is great for analyzing documents and transactions where the data type varies; strictly speaking, it’s a search engine. I use this data source for analyzing articles and stock market transactions.

My first problem was how resource-hungry Elasticsearch is by nature. I had to dial down its memory usage to match the volume of content I push through it. A 512MiB heap appears to be the right number, as 256MiB causes it to fail to initialize. Increasing this value alongside the replicas value will give me higher availability.

Because of the 512MiB heap, I had to upsize my Kubernetes node, as the cluster reported insufficient memory to deploy Elasticsearch.

To get data into Elasticsearch, I use the Telegraf Elasticsearch output and connect the input to either RabbitMQ, a WebSocket, or an HTTP polling feed. When I’m generating data through Python or Node.js, I don’t push the data directly from the code; I push it through RabbitMQ for Telegraf to handle. I do this so that I can throttle the flow of data and avoid a burst taking Elasticsearch down.
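A minimal sketch of that pipeline, using Telegraf’s amqp_consumer input and elasticsearch output (broker URL, queue and index names are assumptions):

[[inputs.amqp_consumer]]
  # Assumed in-cluster RabbitMQ endpoint and queue.
  brokers = ["amqp://rabbitmq:5672"]
  queue = "twelvedata"
  data_format = "json"

[[outputs.elasticsearch]]
  urls = ["https://elasticsearch-master:9200"]
  # Date-suffixed index, matching the twelvedata-* pattern used later.
  index_name = "twelvedata-%Y.%m.%d"
  username = "elastic"
  password = "<ELASTIC_PASSWORD>"
  insecure_skip_verify = true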

resource "helm_release" "elasticsearch" {
  name       = "elasticsearch"
  repository = "https://helm.elastic.co"
  chart      = "elasticsearch"
  version    = "8.5.1"

  set {
    name  = "replicas"
    value = 1
  }

  set {
    name = "resources.requests.memory"
    value = "1Gi"
  }

  set {
    name = "resources.limits.memory"
    value = "1Gi"
  }

  set {
    name = "heapSize"
    value = "512Mi"
  }

  set {
    name  = "minimumMasterNodes"
    value = 1
  }

  set {
    name  = "volumeClaimTemplate.resources.requests.storage"
    value = "4Gi"
  }

  set {
    name = "cluster.initialMasterNodes"
    value = "elasticsearch-master"
  }
}
  • cluster.initialMasterNodes is needed in this Helm chart as it instructs Elasticsearch to “find itself”. elasticsearch-master is the name of the Kubernetes Service that gets created; in turn, kube-dns resolves elasticsearch-master to the IP address of the Elasticsearch instance.
  • I restrict the size of the DigitalOcean volume through volumeClaimTemplate.resources.requests.storage, as by default it’s around 20Gi.
  • minimumMasterNodes and replicas are restricted to 1 as I don’t need more than one instance of Elasticsearch. If I increase the number of replicas and begin to shard, Grafana shouldn’t need additional configuration to cater for that.

Deploying InfluxDB

InfluxDB is my time-series database of choice when working with historical data that will need batch processing at some point (e.g. Grafana Alerting), such as Apple HealthKit and stock market data. Flux, InfluxDB’s query language, is extremely powerful in comparison to PromQL, but with that expressiveness comes a performance hit.

I also use Telegraf to ingest data into InfluxDB, with inputs pointing solely at RabbitMQ. I use Node.js to listen to WebSocket streams and push data points to RabbitMQ for ingestion. Because of the amount of streaming data I plan to put into InfluxDB, I set persistence.size to a generous 12Gi.

As the chart hadn’t been updated in a while and pinned an image tag that was causing me errors, I manually set image.tag to the latest available version.

resource "helm_release" "influx" {
  name = "influxdb"
  repository = "https://helm.influxdata.com/"
  chart = "influxdb2"
  version = "2.1.2"

  set {
    name = "image.tag"
    value = "2.7.10"
  }

  set {
    name = "persistence.size"
    value = "12Gi"
  }
}

Deploying Loki

Loki is the most complex to configure, but I find it the most intuitive way to store system and application logs for Grafana. I deploy it in a single binary configuration and use DigitalOcean Spaces as the backend storage for the logs themselves. Relying on block storage could prove problematic, as millions of messages would require constant re-provisioning of storage.

resource "helm_release" "loki" {
  name = "loki"
  repository = "https://grafana.github.io/helm-charts"
  chart = "loki"
  version = "6.18.0"

  values = [
    yamlencode({
        loki = {
          commonConfig = {
            replication_factor = 1
          }
          storage = {
            type = "s3"
            bucketNames = {
              chunks = "<SPACES_BUCKET>",
              ruler = "<SPACES_BUCKET>",
              admin = "<SPACES_BUCKET>",
            },
            s3 = {
              s3 = "s3://<SPACES_URL>",
              endpoint = "lon1.digitaloceanspaces.com",
              region = "lon1",
              secretAccessKey = "<SPACES_KEY>",
              accessKeyId = "<SPACES_ID>",
            }
          }
          schemaConfig = {
            configs = [
              {
                from = "2024-04-01",
                store = "tsdb",
                object_store = "s3",
                schema = "v13",
                index = {
                  "prefix" = "loki_index_",
                  "period" = "24h"
                }
              }
            ]
          },
        },
        deploymentMode = "SingleBinary",
        backend = { replicas = 0 },
        read = { replicas = 0 },
        write = { replicas = 0 },
        singleBinary = { replicas = 1 },
        chunksCache = { allocatedMemory = 2048 }
      })
  ]
}

Pushing logs to Loki

Loki exposes an API endpoint for pushing logs to, similar in spirit to the Prometheus Push Gateway. One tool, Promtail, will follow all container logs created by all pods in a Kubernetes cluster and stream them to the Loki push API.

resource "helm_release" "promtail" {
  name = "promtail"
  repository = "https://grafana.github.io/helm-charts"
  chart = "promtail"
  version = "6.16.6"

  values = [
    yamlencode({
        config = {
          clients = [{url = "http://loki-gateway/loki/api/v1/push", tenant_id = "zai"}]
        }
      })
  ]
}
  • loki-gateway is the default name of the Kubernetes Service created by the Loki helm chart. The kube-dns service resolves it to the Endpoint IP address of the Loki instance.

Deploying Provisioned Components for Grafana

Deploying Grafana with sidecar containers enabled provisions operators that listen for ConfigMaps with specific labels for Dashboards, Alerts and Datasources. Simply, each sidecar takes the value of a matching ConfigMap and puts it into Grafana’s provisioning directory.

Grafana’s provisioning directory is defined by paths.provisioning within grafana.ini, which can be injected upon deploying the Grafana helm chart within the "grafana.ini" key. In my case, this path is /etc/grafana/provisioning.

Grafana natively reads its provisioning directory and loads the contents into the instance, regardless of whether it’s containerized or running directly on the system.

Provisioning Dashboards

For dashboards, a grafana_dashboard label needs to exist, but its value is irrelevant. I use templatefile() to load the file as a string into main.json. This will allow me, in the future, to handle the renaming of data sources used within a dashboard, or to manipulate a dashboard directly from Terraform.

I design dashboards directly within Grafana, export them as JSON and store them alongside the Terraform module for use by the ConfigMap. In the following resource, my exported dashboard will end up under /etc/grafana/provisioning/main.json.

Within the export menu, Grafana does provide the option to export using HCL (Terraform). I don’t opt for this option as this requires Grafana to be up and running in order to execute the resource. With the approach of declaring Dashboards via ConfigMap, I can re-deploy the dashboard in one go and remove the direct dependency on the Grafana instance running.

resource "kubernetes_config_map" "grafana_dashboards" {
  metadata {
    name = "grafana-dashboards"
    labels = {
      grafana_dashboard = "1"
    }
  }

  data = {
    "main.json" = templatefile("/path/to/dashboard.json", {})
  }
}

Provisioning Alerts

I follow the same approach as above for declaring alerts, with the exception that grafana_alert is the expected label from the sidecar.


resource "kubernetes_config_map" "grafana_alerts" {
  metadata {
    name = "grafana-alerts"
    labels = {
      grafana_alert = "1"
    }
  }

  data = {
    "alerts.json" = templatefile("/path/to/alert.json", {})
  }
}

Provisioning Data-sources

I build the configuration myself when it comes to data sources. The specification varies between each data source. Thanks to using Terraform to deploy each data source, I can re-use the variables used to define the Service names of each data source so that Grafana can find them correctly.

Provisioning Prometheus as a data source

resource "kubernetes_config_map" "prometheus_grafana_discovery" {
  metadata {
    name = "prometheus-grafana-datasource"
    labels = {
      grafana_datasource = "prometheus"
    }
  }

  data = {
    "prometheus.yml" = yamlencode({
        apiVersion = 1,
        datasources = [
          {
            name = var.prometheus_deployment_name,
            type = "prometheus",
            url = "http://${var.prometheus_deployment_name}.${helm_release.prometheus.namespace}.svc.cluster.local",
            access = "proxy"
          }
        ]
    })
  }
}

With the above resource declared in Kubernetes, I then just manipulate datasources = [] to match the following specifications for each data source:

Specification for Loki

"apiVersion": 1
"datasources":
- "jsonData":
    "httpHeaderName1": "X-Scope-OrgID"
  "name": "prometheus-server"
  "secureJsonData":
    "httpHeaderValue1": "1"
  "type": "loki"
  "url": "http://loki.default.svc.cluster.local"
  • X-Scope-OrgID injects an organization (tenant) ID into the HTTP headers so that Loki accepts Grafana’s requests.
  • loki is the default name of the Kubernetes service created by the Helm chart

Specification for Elasticsearch

Elasticsearch needs one declaration per index (if splitting the data by index). I create an index for each source of data being ingested into Elasticsearch, and suffix it with the date of ingestion.

For authentication, I use the password for the elastic user as defined by the Helm chart. By default, this is randomly generated and stored within a Secret. I also use the tlsSkipVerify flag, as additional configuration would be needed for Elasticsearch to present a TLS certificate that Grafana trusts. Since the traffic is internal, I’m not that concerned at this point.

elasticsearch-master is the default name of the service created by the Helm chart.
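Assuming the credentials Secret generated by the 8.x chart (the name below is my assumption; verify it against your release), the password can be pulled with kubectl:

kubectl get secret elasticsearch-master-credentials \
  -o jsonpath='{.data.password}' | base64 -d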

"apiVersion": 1
"datasources":
- "basicAuth": true
  "basicAuthUser": "elastic"
  "jsonData":
    "index": "twelvedata-*"
    "timeField": "@timestamp"
    "tlsSkipVerify": true
  "name": "Elasticsearch (Twelve Data)"
  "secureJsonData":
    "basicAuthPassword": "<ELASTIC_PASSWORD>"
  "type": "elasticsearch"
  "url": "https://elasticsearch-master:9200"
- "basicAuth": true
  "basicAuthUser": "elastic"
  "jsonData":
    "index": "coinbase-*"
    "timeField": "@timestamp"
    "tlsSkipVerify": true
  "name": "Elasticsearch (Coinbase)"
  "secureJsonData":
    "basicAuthPassword": "<ELASTIC_PASSWORD>"
  "type": "elasticsearch"
  "url": "https://elasticsearch-master:9200"

Specification for InfluxDB

"apiVersion": 1
"datasources":
- "jsonData":
    "default_bucket": "default"
    "organization": "influxdata"
    "version": "Flux"
  "name": "InfluxDB"
  "secureJsonData":
    "token": "<API KEY>"
  "type": "influxdb"
  "url": "http://influxdb-influxdb2:80"
  • The Flux version forces InfluxDB v2, which in turn requires a default_bucket and organization. These values are defined by the Helm chart; its defaults are used here.
  • token is also defined by the Helm chart and stored within a Secret. I opt for the randomly generated default.
  • influxdb-influxdb2 is the default name of the Service created by the Helm chart.

With all this in place, I have a Terraform module that deploys a Grafana stack onto DigitalOcean’s Kubernetes platform, while maintaining portability.

Deploying AWS Site-to-Site on OpenWRT

I want to connect to resources on AWS from my home with the least operational overhead, leading me to deploy AWS Site-to-Site for connecting resources from my home to a VPC.

The Environment

Some resources I want to access are:

  • g4dn.xlarge EC2 instances used for streaming games
  • t2.micro EC2 instances hosting Home Assistant
  • RDS (PostgreSQL) instances hosting OpenStreetMap data

Home Environment

When setting up a connection from AWS to my home, I have to consider the following:

  • I live in West London, relatively close to the eu-west-2 data center
    • I have a VPC in eu-west-2 running on the 10.1.0.0/16 network
  • I use a publicly-offered ISP for accessing the internet
  • There are two hops (routers) between the public internet and my home network
    • The first hop is the router provided by the ISP to connect to the internet
      • This network lives on the 192.168.0.0/24 subnet
    • The second hop is my own off-the-shelf router from ASUS running OpenWRT
      • My home network lives on the 10.0.0.0/24 subnet
      • The router has 8MB of usable storage for packages and configuration

Setting up AWS Site-to-Site

AWS Site-to-Site is one of Amazon’s offerings for bridging an external network to a VPC over the public internet. Some alternatives are:

  • AWS Client VPN (based on OpenVPN)
    • More expensive
    • More complex, often tends to be slower without optimization
  • Self-managed VPN
    • Allows use of any VPN technology, such as Wireguard
    • Allows custom metric monitoring
    • Requires management of VPC topologies and firewalls
    • Can be more expensive

I chose Site-to-Site on this occasion so I could learn how IPSec works in more detail, and saw deploying it to OpenWRT as a challenge. It’s also a lot cheaper than a firewall license, EC2 rental and public IP charges.

Deploying a Virtual Private Gateway

A Virtual Private Gateway is the AWS-side endpoint of an IPSec tunnel. It also hosts the configuration of the local BGP instance, and is what drives the propagation of routes between the IPSec tunnels and the VPC routing tables.

Dashboard view of the Virtual Private Gateway. I rely on an ASN generated by Amazon for this instance.
resource "aws_vpn_gateway" "main" {
  vpc_id = data.aws_vpc.main.id
}

There’s not much to configure with the VPG, so I left it with its defaults.

Deploying a Customer Gateway

A customer gateway represents the local end of the IPSec tunnel and the BGP daemon running on it. In my case, this is the OpenWRT router.

Dashboard view of the customer gateway, which represents my OpenWRT Router itself and the BGP daemon running on it. AWS by default provides an ASN of 65000, but I don’t have any need to customize it.
resource "aws_customer_gateway" "openwrt" {
  bgp_asn    = 65000
  ip_address = "<WAN Address of OpenWRT>"
  type       = "ipsec.1"
}

Deploying a Site-to-Site VPN Connection

The VPN Connection itself is what connects a VPG (AWS Endpoint) to a customer gateway (Local endpoint) in the form of an IPSec VPN connection.

Dashboard view of the Site-to-Site VPN. Everything is left as the default. For the purposes of building the automated workflow and testing connectivity, local and remote network CIDRs are 0.0.0.0/0
resource "aws_vpn_connection" "main" {
  customer_gateway_id = aws_customer_gateway.openwrt.id
  vpn_gateway_id      = aws_vpn_gateway.main.id
  type                = aws_customer_gateway.openwrt.type

  tunnel1_ike_versions = ["ikev1", "ikev2"]
  tunnel1_phase1_dh_group_numbers = [2, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
  tunnel1_phase1_encryption_algorithms = ["AES128", "AES128-GCM-16", "AES256", "AES256-GCM-16"]
  tunnel1_phase1_integrity_algorithms = ["SHA1", "SHA2-256", "SHA2-384", "SHA2-512"]
  tunnel1_phase2_dh_group_numbers = [2, 5, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
  tunnel1_phase2_encryption_algorithms = ["AES128", "AES128-GCM-16", "AES256", "AES256-GCM-16"]
  tunnel1_phase2_integrity_algorithms = ["SHA1", "SHA2-256", "SHA2-384", "SHA2-512"]

}

These (tunnel1_*) are the default values set by AWS and should be locked down; for the purpose of testing, I left them all at their defaults. These settings are directly tied to the IPSec encryption settings described below.

Connecting OpenWRT via IPSec

Ansible Role Variables

I’ve designed my Ansible role to be able to configure AWS IPSec tunnels with the bare minimum configuration. All information that the role requires is provided by Terraform upon provisioning of the AWS Site-to-Site configuration.

bgp_remote_as: "64512"
ipsec_tunnels:
  - ifid: "301"
    name: xfrm0
    inside_addr: <Inside IPv4>
    gateway: <Endpoint Tunnel 1>
    psk: <PSK Tunnel 1>
  - ifid: "302"
    name: xfrm1
    inside_addr: <Inside IPv4>
    gateway: <Endpoint Tunnel 2>
    psk: <PSK Tunnel 2>
  • bgp_remote_as refers to the ASN of the Virtual Private Gateway, and is strictly used by the BGP Daemon offered by Quagga.
    • BGP is used to propagate routes to-and-from AWS.
    • When a Site-to-Site VPN is configured to use Dynamic routing, it will state that the tunnel is Down if AWS cannot reach the BGP instance.
  • ipsec_tunnels is used by XFRM and strongSwan to;
    • Build one XFRM interface per-tunnel
    • Build one alias interface bound to each XFRM interface for static routing
    • Configure the static routing of the XFRM interfaces
    • Configure the BGP daemon neighbours
    • Configure one IPSec endpoint per-tunnel
    • Configure one IPSec tunnel for each XFRM interface

Required packages

I used three components for this workflow to function, and a fourth for debugging security association errors.

  • strongswan-full
    • strongSwan provides an IPSec implementation for OpenWRT with full support for UCI. The -full variation of the package is overkill, but you never know!
  • quagga-bgpd
    • A BGP implementation light enough to run on OpenWRT. quagga comes in as a dependency
  • luci-proto-xfrm
    • A virtual interface for use by IPSec, where a tunnel requires a vif to bind to.
  • ip-full
    • Provides an xfrm argument for debugging IPSec connections with.
name: install required packages
opkg:
  name: "{{ item }}"
loop:
  - strongswan-full
  - quagga-bgpd
  - luci-proto-xfrm
  - ip-full

Adding the XFRM Interface

OpenWRT LuCI dashboard, showing the final result of the interfaces tab. I declare two XFRM interfaces, one per VPN tunnel provided by AWS, each with an IPv4 assigned that matches the Inside IPv4 CIDRs defined within the AWS Site-to-site configuration. The IPv4 address is applied to an alias of the adapter rather than the adapter itself as the XFRM interface doesn’t support static IP addressing via UCI.
Ansible Task
name: configure xfrm adapter
uci:
  command: section
  key: network
  type: interface
  name: "{{ item.name }}"
  value:
    tunlink: 'wan'
    mtu: '1300'
    proto: xfrm
    ifid: "{{ item.ifid }}"
loop: "{{ ipsec_tunnels }}"
/etc/config/network – UCI Configuration
config interface 'xfrm0'
	option ifid '301'
	option tunlink 'wan'
	option mtu '1300'
	option proto 'xfrm'

config interface 'xfrm1'
	option tunlink 'wan'
	option mtu '1300'
	option proto 'xfrm'
	option ifid '302'

I use the uci task to deploy adapter configurations, creating one interface per tunnel provided by AWS.

  • tunlink sets the IPSec tunnel to connect to & from the wan interface
  • mtu is 1300 by default, I didn’t need to configure this value
  • ifid is defined because strongSwan uses it to bind an IPSec tunnel to a network interface. This is separate from the name of the interface.

AWS needs to communicate with the BGP instance running on OpenWRT. The value of Inside IPv4 CIDR instructs AWS which IPs to listen on for their BGP instance, and which IP to connect to for fetching routes. The CIDRs will be restricted to the /30 prefix, which provides the range of 4 IP addresses, 2 of which are usable.

As an example, here is the Inside IPv4 CIDR of 169.254.181.60/30 and what that means.

  Index  IP Address       Responsibility
  0      169.254.181.60   Network address
  1      169.254.181.61   Reserved for the AWS side of the IPSec tunnel
  2      169.254.181.62   Reserved for the OpenWRT side of the IPSec tunnel
  3      169.254.181.63   Broadcast address

With this known, we know that:

  • AWS has a BGP instance listening on the address at index 1 (169.254.181.61); OpenWRT must be configured to use the address at index 2 (169.254.181.62).
  • AWS expects a BGP neighbour at index 2 (169.254.181.62); the BGP daemon running on OpenWRT must be configured to peer with the neighbour at index 1 (169.254.181.61).
  • AWS knows how to route traffic across the 169.254.181.60/30 network; OpenWRT needs to learn a route for 169.254.181.60/30 too.

Configuring the IP Address on the IPSec tunnel

I create an alias on top of the originating XFRM interface so that I can utilize the static protocol within UCI to configure static routing in a declarative way.

Ansible Task
name: create xfrm alias for static addressing
uci:
  command: section
  key: network
  type: interface
  name: "{{ item.name }}_s"
  value:
    proto: static
    ipaddr:
      - "{{ item.inside_addr | ipaddr('net') | ipaddr(2) }}"
    device: "@{{ item.name }}"
loop: "{{ ipsec_tunnels }}"
/etc/config/network – UCI Configuration
config interface 'xfrm0_s'
	option proto 'static'
	option device '@xfrm0'
	list ipaddr '169.254.211.46/30'

config interface 'xfrm1_s'
	option proto 'static'
	list ipaddr '169.254.181.62/30'
	option device '@xfrm1'

I use ipaddr('net') | ipaddr(2) to simplify my Ansible configuration. inside_addr is 169.254.181.60/30, and these filters return the address at index 2 of that network, giving the result of 169.254.181.62/30.

This ensures two things:

  • The xfrm interface persistently holds the 169.254.181.62/30 IP Address
  • The Linux routing table holds a route of 169.254.181.60/30 via the xfrm interface

This resolves the issue of OpenWRT knowing what IP Address to use and how to route the traffic.
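Once the alias interface is up, the kernel route can be sanity-checked over SSH (the output line is illustrative):

ip route | grep xfrm
# 169.254.181.60/30 dev xfrm1 proto kernel scope link src 169.254.181.62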

Setting up IPSec

Because I’m using strongSwan, I can also use UCI to configure the IPSec tunnel. With this workflow, IPSec configuration is broken down into three elements:

  • Endpoint
    • Primarily what’s known as “IKE Phase 1”: how I will connect to the other end.
  • Tunnel
    • Primarily what’s known as “IKE Phase 2”: how I pass traffic through to the other end.
  • Encryption
    • A set of rules describing how to handle the cryptography.

IPSec Encryption

What’s defined here drives whether Phase 1 will succeed, and must match the AWS VPN Encryption settings.

Ansible Task
name: define ipsec encryption
uci:
  command: section
  key: ipsec
  type: crypto_proposal
  name: "aws"
  value:
    is_esp: '1'
    dh_group: modp1024
    encryption_algorithm: aes128
    hash_algorithm: sha1
/etc/config/ipsec – UCI Configuration
config crypto_proposal 'aws'
	option is_esp '1'
	option dh_group 'modp1024'
	option encryption_algorithm 'aes128'
	option hash_algorithm 'sha1'

In my case, I’m:

  • Using AES128 for encryption of the traffic
  • Using SHA1 as the integrity algorithm for ensuring packets are correct upon arrival
  • Naming the crypto_proposal aws for use by the Endpoint and the Tunnel

AES128 and SHA1 are supported by the configuration defined on the VPN configuration above.

Declaring the IPSec Endpoint

Ansible Task
name: configure ipsec remote
uci:
  command: section
  key: ipsec
  type: remote
  name: "{{ item.name }}_ep"
  value:
    enabled: "1"
    gateway: "{{ item.gateway }}"
    local_gateway: "<Public IP>"
    local_ip: "10.0.0.1"
    crypto_proposal:
      - aws
    tunnel:
      - "{{ item.name }}"
    authentication_method: psk
    pre_shared_key: "{{ item.psk }}"
    fragmentation: yes
    keyingretries: '3'
    dpddelay: '30s'
    keyexchange: ikev2
loop: "{{ ipsec_tunnels }}"
/etc/config/ipsec – UCI Configuration
config remote 'xfrm0_ep'
	option enabled '1'
	option gateway '<Tunnel 1 IP>'
	option local_gateway '<Public IP>'
	option local_ip '10.0.0.1'
	list crypto_proposal 'aws'
	list tunnel 'xfrm0'
	option authentication_method 'psk'
	option pre_shared_key '<PSK>'
	option fragmentation '1'
	option keyingretries '3'
	option dpddelay '30s'
	option keyexchange 'ikev2'

config remote 'xfrm1_ep'
	option enabled '1'
	option gateway '<Tunnel 2 IP>'
	option local_gateway '<Public IP>'
	option local_ip '10.0.0.1'
	list crypto_proposal 'aws'
	list tunnel 'xfrm1'
	option authentication_method 'psk'
	option pre_shared_key '<PSK>'
	option fragmentation '1'
	option keyingretries '3'
	option dpddelay '30s'
	option keyexchange 'ikev2'
  • The gateway is known as the Outside IP Address on AWS
  • local_gateway points to the WAN Address of OpenWRT
  • local_ip points to the LAN address of OpenWRT
  • crypto_proposal points to aws (Defined above)
  • tunnel points to the name of the interface that this IPSec endpoint represents.
    • Since there are two IPSec endpoints, two of these remotes are created. I use the interface name (from xfrm) across all duplicates to make sure that it’s visibly clear what’s being used where.
  • pre_shared_key is the PSK that gets generated (or set) within the VPN Tunnel.
    • This is unique per-tunnel, meaning that there should be two different PSKs per Site-to-site VPN connection. They can be found under the Modify VPN Tunnel Options selection.

Configuring the IPSec Tunnel

The tunnel instructs strongSwan how to bind the IPSec tunnel to an interface. The key here is the ifid of the XFRM interfaces defined earlier.

Ansible Task
name: configure ipsec tunnel
uci:
  command: section
  key: ipsec
  type: tunnel
  name: "{{ item.name }}"
  value:
    startaction: start
    closeaction: start
    crypto_proposal: aws
    dpdaction: start
    if_id: "{{ item.ifid }}"
    local_ip: "10.0.0.1"
    local_subnet:
      - 0.0.0.0/0
    remote_subnet:
      - 0.0.0.0/0
loop: "{{ ipsec_tunnels }}"
/etc/config/ipsec – UCI Configuration
config tunnel 'xfrm0'
	option startaction 'start'
	option closeaction 'start'
	option crypto_proposal 'aws'
	option dpdaction 'start'
	option if_id '301'
	option local_ip '10.0.0.1'
	list local_subnet '0.0.0.0/0'
	list remote_subnet '0.0.0.0/0'

config tunnel 'xfrm1'
	option startaction 'start'
	option closeaction 'start'
	option crypto_proposal 'aws'
	option dpdaction 'start'
	option if_id '302'
	option local_ip '10.0.0.1'
	list local_subnet '0.0.0.0/0'
	list remote_subnet '0.0.0.0/0'
  • Like the AWS configuration, I define the local_subnet and remote_subnet to 0.0.0.0/0. This is so I can focus on testing connectivity.
  • if_id points to the XFRM interface that’s representing the tunnel in iteration.
    • The if_id must match the tunnel in iteration, as the Inside IPv4 CIDRs have been bound to an interface.

Configuring BGP on OpenWRT

In order to apply BGP routes on the AWS side, route propagation must be enabled at the routing-table level. Otherwise, a static route pointing to my home network (10.0.0.0/24) via the Virtual Private Gateway must be declared.

I opted for Quagga when using BGP on OpenWRT.

router bgp 65000
bgp router-id {{ ipsec_inside_cidrs[0] | ipaddr('net') | ipaddr(2) | split('/') | first }}
{% for ipsec_inside_cidr in ipsec_inside_cidrs %}
neighbor {{ ipsec_inside_cidr | ipaddr('net') | ipaddr(1) | split('/') | first }} remote-as {{ bgp_remote_as }}
neighbor {{ ipsec_inside_cidr | ipaddr('net') | ipaddr(1) | split('/') | first }} soft-reconfiguration inbound
neighbor {{ ipsec_inside_cidr | ipaddr('net') | ipaddr(1) | split('/') | first }} distribute-list localnet in
neighbor {{ ipsec_inside_cidr | ipaddr('net') | ipaddr(1) | split('/') | first }} distribute-list all out
neighbor {{ ipsec_inside_cidr | ipaddr('net') | ipaddr(1) | split('/') | first }} ebgp-multihop 2
{% endfor %}
/etc/quagga/bgpd.conf – Rendered Template
router bgp 65000
bgp router-id 169.254.211.46
neighbor 169.254.211.45 remote-as 64512
neighbor 169.254.211.45 soft-reconfiguration inbound
neighbor 169.254.211.45 distribute-list localnet in
neighbor 169.254.211.45 distribute-list all out
neighbor 169.254.211.45 ebgp-multihop 2
neighbor 169.254.181.61 remote-as 64512
neighbor 169.254.181.61 soft-reconfiguration inbound
neighbor 169.254.181.61 distribute-list localnet in
neighbor 169.254.181.61 distribute-list all out
neighbor 169.254.181.61 ebgp-multihop 2
  • Like earlier, I use ipaddr('net') | ipaddr(1) to take the first usable address from the CIDR
  • remote-as defines the AWS-side ASN.
    • BGP at its core defines routes based on the path to an AS, a layer on top of IP addresses.
    • It’s designed to work with direct connections, not over the internet.
      • ISPs & exchanges will, however, use BGP at their level to forward the traffic on.
  • router bgp states what the ASN of the OpenWRT router is. Because I used the default of 65000 from AWS, I place that here.
  • bgp router-id is set to the first XFRM interface’s IP address, since the same BGP instance is shared by both tunnels in the event that one tunnel goes down. AWS does not validate the router-id.
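If the quagga vtysh package is installed, the BGP session and the routes learned from AWS can be inspected directly:

vtysh -c 'show ip bgp summary'
vtysh -c 'show ip bgp neighbors'
vtysh -c 'show ip route bgp'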

Verifying the connection to IPSec

Using the swanctl command, I can identify whether my applied configuration is successful when logged into my OpenWRT router using SSH.

Start swanctl

I don’t use the legacy ipsec init script, instead, directly using the swanctl one. Under the hood, this will convert the UCI configuration into a strongSwan configuration located at /var/swanctl/swanctl.conf

/etc/init.d/swanctl start
ipsec statusall
Output of the ipsec statusall command, where both VPN tunnels are ESTABLISHED and INSTALLED. Established denotes that IKE Phase 1 (Encryption negotiation) was successful and Installed denotes that IKE Phase 2 (Authorization, the tunnel creation itself) was successful and is now in use.
Connection can also be verified from the AWS Console by looking at the value of Details: if it doesn’t say IPSEC IS DOWN, the connection was successful. Status only shows Up when BGP can be reached from AWS; when using Dynamic (not static) routing for Site-to-Site, AWS doesn’t declare a connection up unless BGP is reachable at the second usable address of the Inside IPv4 CIDR.
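strongSwan’s native CLI offers the same information in more detail, which is useful when diagnosing proposal mismatches:

swanctl --list-conns
swanctl --list-sas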

Routing traffic to & from the XFRM Interface

I finally need to instruct OpenWRT to allow packets destined for xfrm0 or xfrm1 to be forwarded. The fact that the Linux routing table states that 10.1.0.0/16 is reached via xfrm0 (a route applied via BGP) is enough to know that either xfrm0 or xfrm1 is the interface required.

By default, a flag of REJECT is defined. By applying the following firewall rules, packets successfully go through to the AWS VPC.

Ansible Task
name: install firewall zone
uci:
  command: section
  key: firewall
  type: zone
  find_by:
    name: 'tunnel'
  value:
    input: REJECT
    output: ACCEPT
    forward: REJECT
    network:
      - xfrm0
      - xfrm1
name: install firewall forwarding
uci:
  command: section
  key: firewall
  type: forwarding
  find_by:
    dest: 'tunnel'
  value:
    src: lan
/etc/config/firewall – UCI Configuration
config zone
	option name 'tunnel'
	option input 'REJECT'
	option output 'ACCEPT'
	option forward 'REJECT'
	list network 'xfrm0'
	list network 'xfrm1'

config forwarding
	option src 'lan'
	option dest 'tunnel'

Final tasks

The final steps of the Ansible playbook are to instruct the UCI framework to save the changes to disk, and to reload the configuration of all required services.

name: commit changes
uci:
  command: commit
name: enable required services
service:
  name: "{{ item }}"
  enabled: yes
  state: reloaded
loop:
  - swanctl
  - quagga
  - network

I then invoke the Ansible playbook by using a local-exec provisioner on a null_resource within terraform, where the AWS Site-to-Site resource is a dependency. Along the lines of:

resource "null_resource" "cluster" {
  provisioner "local-exec" {
    command = <<EOT
  ansible-playbook \
    -I ${var.openwrt_address}, \
    -e 'aws_tunnel_ips=${aws_vpn_connection.main.tunnel1_address},${aws_vpn_connection.tunnel2_address}' 
    playbook.yaml \
    -e 'aws_psk=${aws_vpn_connection.main.tunnel1_preshared_key},${aws_vpn_connection.main.tunnel2_preshared_key}' 
    playbook.yaml
    EOT
  }
}

This is a shortened version of what I have, but by simply piping the outputs of the AWS Site-to-Site resource into the Ansible playbook, my router is automatically configured whenever I create a Site-to-Site resource.

With IPSec now deployed, I can communicate directly with my resources hosted on AWS as if they were local.

Bus route-finding with PostGIS and ArangoDB

I created a bus route-finding tool using PostGIS and ArangoDB.

I wanted a different way to navigate from village to village using bus routes. Transport for London (TfL) provides spider maps (which is almost the perfect answer), but they don’t show long bus routes and are specific to London.

While online mapping services are the ‘de-facto’ approach for finding specific routes, I find that they don’t tend to resolve long routes well and sometimes don’t show all services available. Instead, I’d like a way to know which routes are scheduled to pass through which village and allow myself (or a tool) to resolve that.

An example of a spider map for Heathrow Airport. (From the TfL Website). I can visually see which buses I can use to reach Hounslow, Feltham or Hayes without worrying about the stops in-between, but I can’t see what’s further than these towns.

To get an overview of possible routes at scale, I built a workflow that uses:

  • OpenStreetMap data stored in PostgreSQL to resolve bus route connections
  • ArangoDB to store resolved bus route connections and the villages that they connect to
  • Python as an intermediary for bringing data from PostgreSQL to ArangoDB and presenting routes

Obtaining & Interpreting the OpenStreetMap Data

I imported the London dataset from Geofabrik into PostgreSQL using osm2pgsql. I have a database instance running locally on my MacBook Pro, but a cloud-managed instance (e.g using RDS) would work just as well.

osm2pgsql -d osm -U osmuser --create --slim --hstore -C 2000 --number-processes 4 greater-london-latest.osm.pbf

# -d [database name]
# -U [postgres username]
# -C 2000; for limiting the amount of RAM usage by the import tool

Visualizing the OpenStreetMap Data

QGIS is a great tool that lets me explore the OpenStreetMap dataset directly from PostgreSQL (or file) in its intended form: a map. I’ll be using it to drive how my SQL queries are structured and to see which data points I can use to reach the end goal.

Adding a data source for viewing in QGIS can be done by right-clicking PostgreSQL and hitting “New Connection” inside the browser

Since I’m working on my local MacBook, simply providing the host and database is enough. I don’t need authentication in my ‘rapid-development’ scenario.

Once imported, I extracted planet_osm_line for routes and planet_osm_point for villages or ‘places of interest’. I do this by applying a filter of route = 'bus' on the planet_osm_line layer, and place = 'suburb' on planet_osm_point.

The Attribute Table in QGIS for planet_osm_point, which shows the dataset in its relational form, or in the way that PostgreSQL would present it. Here is where I isolate that I need to filter place by suburb.
Representation of the data points within planet_osm_point table from PostgreSQL, filtered down to show only entries that are of type suburb.
Representation of the planet_osm_line table from PostgreSQL, filtered down to show only entries that represent a bus route.

What am I trying to achieve?

The end goal is to be able to resolve which planet_osm_line[] can reach from planet_osm_point A -> planet_osm_point B, or in simpler terms, “Which buses can I use to get from town A to town B?”. Not specifically “How can I get from my house to the specific store on the other side of town?”.

Example of how I’ll achieve this in West London. Within GIS, a Buffer expands a point into an area, while an Intersection captures any geometry that intersects with another. The idea is to get all lines that pass through the blue circle.

I was able to create a 1km boundary around planet_osm_point entries directly in PostgreSQL by using the ST_Buffer GIS function, and to return matches conditionally using the ST_Intersects GIS function. Additionally, to filter out all the irrelevant data points, I return only points where place has a value of suburb, and only lines where the value of route is bus.

WITH suburb_buffer AS (
    SELECT name, place, ST_Buffer(way, 1000) AS buffer
    FROM planet_osm_point
    WHERE place = 'suburb'
)
SELECT DISTINCT line.ref, sb.name
FROM planet_osm_line line
JOIN suburb_buffer sb ON ST_Intersects(line.way, sb.buffer)
WHERE line.route = 'bus';
line.ref   sb.name
SL8        Hanwell
SL8        Hillingdon
SL8        Shepherd’s Bush

With this data returned, I can clearly see that the path of the SL8 bus route passes through Hanwell, Hillingdon and Shepherd’s Bush.

I’m relying on the database to handle as much of the resolving logic as possible, as opposed to any Python code, mainly to reduce time spent maintaining Python code. In order, this query does the following:

  1. suburb_buffer represents a list of all suburbs according to OpenStreetMap.
    • way is a single point stored as GIS data.
    • I request a circle around each way point with a 1km radius, and store it as buffer.
  2. sb is a single iteration of suburb_buffer, which represents planet_osm_points.
  3. line is a single iteration of planet_osm_line, restricted by line.route = 'bus'
  4. line is also restricted by whether ST_Intersects() returns True.
    • ST_Intersects() verifies whether the line in iteration matches the point in iteration.

I then moved this query into a Python script that translates the results into ArangoDB, so that I can use the best tool for each job: as a graph database, ArangoDB can quickly perform complex relational queries without much computational overhead.

Three collections exist in ArangoDB, where each row returned from PostgreSQL is converted into one document in each collection (a Python sketch of this step follows the graph view below):

  1. route holds all bus routes, taken from line.ref (planet_osm_line)
  2. location holds all suburbs, taken from sb.name (planet_osm_point)
  3. route_connection is the confirmation returned from ST_Intersects() (route passes through location)
    • _to and _from are required keys in an Edge document.
    • After creating the route and location, I;
      • Take the _id of each
      • Set _to on the route_connection to the _id of location
      • Set _from on the route_connection to the _id of route
A graph view with Hayes (from planet_osm_point) as a starting point. A purple node represents a suburb, a grey node represents a route, and a line states that a route passes through a suburb.
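A minimal sketch of that translation step, using the python-arango client (connection details are assumptions; the collections must already exist, with route_connection created as an edge collection, and upsert/error handling is omitted):

from arango import ArangoClient

# Assumed connection details for a local ArangoDB instance.
client = ArangoClient(hosts="http://localhost:8529")
db = client.db("buses", username="root", password="")

routes = db.collection("route")
locations = db.collection("location")
connections = db.collection("route_connection")

# One row from the PostGIS query: (line.ref, sb.name) = ("SL8", "Hanwell").
route = routes.insert({"_key": "SL8", "ref": "SL8"})
location = locations.insert({"_key": "Hanwell", "name": "Hanwell"})

# The edge records that route SL8 passes through Hanwell.
connections.insert({"_from": route["_id"], "_to": location["_id"]})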

AQL (ArangoDB Query Language) is used to traverse the graph and resolve routes.

FOR vtx, edge, path IN 1..6 ANY 'location/{START}' route_connection
    FILTER vtx.name == "{TARGET}"
    RETURN path

Here, I traverse from location/{START} to a location where location.name is {TARGET}. This is checked against the route_connection collection. For the result below, {START} is “Yiewsley” and {TARGET} is “Shepherd’s Bush”.

Finally, I built an “API bridge”, returning a JSON representation of calculated routes for use by a possible Vue.js interface or the like.

"698 ➡ Hillingdon ➡ SL8, N207 ➡ Shepherd's Bush ➡"
{
	"Hillingdon": {
		"routes": ["698", "U1", "U3", "U5"],
		"next": {
			"Shepherd's Bush": {
				"routes": ["SL8"]
			}
		}
	}
}
A basic CLI implemented in Python, where a route is calculated from Yiewsley to Shepherd’s Bush. It definitely presents a lot more routes, some with a lot more stops, but it’s a start! This takes approximately 0.2 seconds to perform its calculation.

In the future, I’d like to render spider maps using this data, or to be able to interpret this data on the go on my iPhone. But at least now I can get all the possible routes (extreme, quick, and sometimes not practical, since the 698 is the local school bus) from one village to another without a specific location in mind.

I’d also like to explore optimizing the data structure within the graph database, perhaps moving the route collection into route_connection, and eventually including stops and distances to provide more accurate results.

Deploying Region-locked AWS Organizations using Terraform

As a solutions architect, I was tasked with building an AWS Organizations hierarchy for a Canadian startup that needed to comply with local laws and enable multi-site configurations for networking.

To get started, I built an AWS Organizations hierarchy using Terraform. I chose Terraform because it allows me to use the same workflow for building organizations across multiple clouds. This post will focus on building an Organizational Unit (OU) tree for regions and localities.

To create OUs, I have a “basic” Terraform module that is a wrapper around the aws_organizations_organizational_unit resource. To make it reusable, I expose the name and parent. I then specialize the “basic” module into ones specific to each organization by injecting tags and appending a postfix to the name of the OU, such as the region or locality.

For compliance, I restrict at the OU level which regions can be used by the AWS account and any IAM users assuming a role in it. I use a Service Control Policy (SCP) to deny access to all regions except those specified in the local.regions value. Because a lot of AWS core infrastructure, such as Billing, lives in us-east-1 and us-east-2, I always include them in local.regions.
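A plausible definition of local.regions, consistent with that rule (and with the duplicated us-east entries visible in the rendered North America policy further down), would be:

locals {
  # Merge the OU's allowed regions with the regions AWS core services rely on.
  regions = concat(var.regions, ["us-east-1", "us-east-2"])
}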

Since I need to cater for both compliance and multi-site, I used my modules to build the OUs in the following hierarchy:

  • Root Organization
    • Region OU (e.g: North America)
      • Country OU (e.g: Canada)
        • Locality OU (e.g: Vancouver)

And with Terraform modules structured in the following way:

  • Root Organization
    • Client Module
      • Client Root Organizational Unit
      • Region / Country / Locality Module
        • Base OU Module
          • Region / Country / Locality Organizational Unit
        • Region Policies
          • SCP Policy

In the case of Vancouver, while the Seattle local zone or us-west-2 region is closer, it’s not located within Canada which may be a problem when looking at local labor laws and compliance, so Calgary (ca-west-1) is the next best thing. I’m waiting for the Vancouver local zone to become publicly available so that I can use that, but it will fall under Calgary anyway.

This means that my SCPs restrict the organizational units to the following regions:

  • North American OU
    • us-west-1, us-west-2, us-east-1, us-east-2, ca-central-1, ca-west-1
  • Canadian OU
    • us-east-1, us-east-2, ca-central-1, ca-west-1
  • Vancouver OU
    • us-east-1, us-east-2, ca-west-1

Because of the hierarchy approach, I can place AWS accounts in parent OUs with shared resources such as VPCs, databases, S3 and EFS shares. This will be hugely beneficial when working across multiple sites.

My reusable modules follow this structure:

Core Organizational Unit

This holds the default values for all OUs within the organization, where tags for example would be shared.

resource "aws_organizations_organizational_unit" "root" {
  name      = var.name
  parent_id = var.parent

  tags = var.tags
}

Inheriting the Basic OU into Locality, Country & Region OUs

I re-use the basic module to make it follow a strict naming and tag convention based on the context (e.g: locality, country and region). This module is for the context and not specifically the region in question. The region in question will then re-use this module.

This makes sure that the NA Region and European Region have the same fundamentals between them.

module "basic" {
  source = "../basic"
  parent = var.parent
  name = "${var.name} - ${var.locality}"
  tags = local.tags
}
module "policies" {
  source = "../../../regions/policies"
  policy_name = local.policy_name
  target_id = module.basic.id
  regions = var.regions
}

Inheriting the Context OU Module into literal regions

Here, I take the region context module and adapt it specifically to North America. The same logic applies to country and locality. This simply enforces that the tags and name of the OU contain the region, and that the generated SCPs block all regions except those provided.

module "region" {
  source = "../../templates/organization/region"
  region = "North America"
  parent = var.parent
  name = var.name
  tags = local.tags
  regions = [
    "us-east-1",
    "us-east-2",
    "us-west-1",
    "us-west-2",
    "ca-central-1",
    "ca-west-1"
    ]
}

Generating the SCPs from Terraform

policy_name here is the same as the name of an OU with spaces removed. Since the restriction is written as a Deny rule, the StringNotEquals test is needed to match every region outside the allow-list.

data "aws_iam_policy_document" "region_restriction" {
  statement {
    sid = "RestrictRegionFor${var.policy_name}"
    effect    = "Deny"
    actions   = ["*"]
    resources = ["*"]

    condition {
      test = "StringNotEquals"
      variable = "aws:RequestedRegion"
      values = local.regions
    }
  }
}
resource "aws_organizations_policy" "region_restriction" {
  name    = "RestrictRegionFor${var.policy_name}"
  content = data.aws_iam_policy_document.region_restriction.json
  type = "SERVICE_CONTROL_POLICY"
}
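
Two details aren’t shown above, so here’s a hedged sketch of how they might look. The data source reads local.regions rather than var.regions, so a plausible locals block merges the caller’s regions with the always-required ones; and an attachment resource binds the generated SCP to the OU passed in as target_id:

# Assumed locals: merge caller-supplied regions with the regions
# that global services such as Billing depend on.
locals {
  regions = distinct(concat(var.regions, ["us-east-1", "us-east-2"]))
}

# Attach the generated SCP to the target OU.
resource "aws_organizations_policy_attachment" "region_restriction" {
  policy_id = aws_organizations_policy.region_restriction.id
  target_id = var.target_id
}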

Declaring a Regional OU for an Organization

Finally, I can use the North American OU to declare an OU that restricts any AWS Accounts inside to only create resources within North America.

module "region-na" {
  source = "../regions/north-america"
  parent = aws_organizations_organizational_unit.root.id
  name = var.name
  tags = local.tags
}

I can do the same with Canada, and Vancouver.

module "region-ca" {
  source = "../regions/north-america/canada"
  parent = module.region-na.id
  name = var.name
  tags = module.region-na.tags
}

module "region-yvr" {
  source = "../regions/north-america/canada/vancouver"
  parent = module.region-ca.id
  name = var.name
  tags = module.region-ca.tags
}

By the end of the deployment, my hierarchy matches the OU tree laid out earlier in this post.

And the attached SCP policies look like the following. Note that SCPs apply at every level of the hierarchy, so an AWS account’s usable regions end up being the intersection of the allow-lists of all of its ancestors:

ZAI – North America

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RestrictRegionForZAINorthAmerica",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "us-east-1",
            "us-east-2",
            "us-west-1",
            "us-west-2",
            "ca-central-1",
            "ca-west-1",
            "us-east-1",
            "us-east-2"
          ]
        }
      }
    }
  ]
}

ZAI – Canada

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RestrictRegionForZAICanada",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "ca-central-1",
            "ca-west-1",
            "us-east-1",
            "us-east-2"
          ]
        }
      }
    }
  ]
}

ZAI – Vancouver

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RestrictRegionForZAIVancouver",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "ca-west-1",
            "us-east-1",
            "us-east-2"
          ]
        }
      }
    }
  ]
}

Here it is in action: browsing a region blocked by the SCP results in access-denied errors across the console.

Idea: Adopting Serverless for Trading Operations https://zai.dev/2024/10/05/%f0%9f%92%a1-adopting-serverless-for-trading-operations/ Sat, 05 Oct 2024 21:20:54 +0000 https://zai.dev/?p=943 I’m not very into day-trading, but I see potential in the market from time to time, so I came up with this idea for an automated trading system built exclusively on AWS services.

The system will use Lambda, Timestream, EventBridge, S3, SQS and SageMaker to create a serverless architecture for monitoring and trading on the stock market, using the Twelvedata and Coinbase APIs for pulling in market data and executing trades, respectively.

To start, I will use EventBridge schedules as an alternative to cron jobs to add symbols to an SQS queue for ingestion, which keeps the design fully serverless. For FOREX, the schedule will run every hour; for crypto and stock symbols, every 15 minutes. This is a good balance, as I’m not a professional trader and don’t need to burn through API calls.
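
As a sketch, that schedule-to-queue wiring could look like the following in Terraform; the names, the IAM role and the symbol payload are all illustrative assumptions:

resource "aws_sqs_queue" "ingestion" {
  name = "symbol-ingestion" # hypothetical queue name
}

# EventBridge Scheduler pushes a batch of symbols onto the queue every
# 15 minutes; a second schedule with rate(1 hour) would cover FOREX.
resource "aws_scheduler_schedule" "crypto" {
  name                = "crypto-symbols"
  schedule_expression = "rate(15 minutes)"

  flexible_time_window {
    mode = "OFF"
  }

  target {
    arn      = aws_sqs_queue.ingestion.arn
    role_arn = aws_iam_role.scheduler.arn # assumed role with sqs:SendMessage
    input    = jsonencode({ symbols = ["BTC/USD", "ETH/USD"] })
  }
}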

I will have five Lambda functions:

  1. The first Lambda function will listen to the SQS queue and query Twelvedata for the mentioned symbols. It will then insert the data directly into Timestream (see the wiring sketch after this list).
  2. The second Lambda function will be triggered by an alert from Timestream when new data is available. For safety (and to start with), I have configured this alert to trigger hourly. The function will throw the data at the SageMaker model. If the model predicts a positive yield, the Lambda function will pass the symbol to the third lambda function via another SQS Queue.
  3. The third Lambda function will execute a transaction on Coinbase.
  4. The fourth Lambda function will monitor Twelvedata and Coinbase for hot & trending symbols and add them to the monitoring queue, triggered by another EventBridge schedule.
  5. The fifth Lambda function will create a *.csv dataset from the data within Timestream.
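
The queue-to-function wiring for the first step is a single resource; a minimal sketch, assuming the queue from the earlier snippet and a hypothetical function name:

# Deliver messages from the ingestion queue to the first Lambda.
resource "aws_lambda_event_source_mapping" "ingest" {
  event_source_arn = aws_sqs_queue.ingestion.arn
  function_name    = aws_lambda_function.ingest.arn # hypothetical function
  batch_size       = 10
}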

I will use Secrets Manager to securely store the API keys for Twelvedata and Coinbase.

I’m not an AI expert and don’t know much about the specifics of training a model, so I’ll be using SageMaker Canvas to train it. Canvas is the easiest way into training models without writing a Python script.

Finally, at the end of each day, I’ll extract a dataset from the Timestream database into a *.csv and store it in S3, then pass this file onto SageMaker for training. I’ll use one last EventBridge schedule to trigger this workflow.

Hopefully by following this approach, I’ll have a fully functioning market monitoring and trading system.

Outline: Extending a home network setup in AWS https://zai.dev/2024/09/22/outline-extending-a-home-network-setup-in-aws/ Sun, 22 Sep 2024 12:30:18 +0000 https://zai.dev/?p=929 As someone without a permanent base, I needed a secure and flexible cloud infrastructure that allowed me to spawn powerful machines when needed. To achieve this, I built an isolated network on AWS.

I began by creating a Terraform module that provisions the infrastructure needed, such as

  • VPCs
  • Subnets
  • Routing tables
  • EC2 instances

While the module is tailored to AWS, I plan to keep the variable names consistent with other modules that re-create the setup on different cloud platforms, such as Exoscale.

The isolated network is centered around an EC2 instance that acts as a router between a public VPC and a private VPC, much like an at-home router. The instance has two elastic network interfaces (ENIs), one attached to the public VPC and the other to the private VPC. It runs VyOS, which I configure using Ansible via the local-exec provisioner in Terraform upon creation.

data "aws_ami" "vyos" {
  most_recent = true

  filter {
    name   = "name"
    values = ["VyOS 1.4.0-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["679593333241"]
}
resource "aws_instance" "vyos" {
  ami           = data.aws_ami.vyos.id
  availability_zone = data.aws_availability_zones.region.names[0]
  instance_type = "t3.small"

  network_interface {
    network_interface_id = aws_network_interface.public.id
    device_index         = 0
  }

  network_interface {
    network_interface_id = aws_network_interface.local.id
    device_index         = 1
  }

  provisioner "local-exec" {
        command = "ansible-playbook -i \"${aws_eip.public.public_ip},\" <path_to_playbook>"
    }
}

The public VPC has an internet gateway attached to it, and all instances in the public VPC have internet access. The router instance is the only instance that resides in the public VPC. Both VPCs have a subnet within a single availability zone (AZ), as a single EC2 instance cannot span two AZs.

resource "aws_internet_gateway" "gw" {
}

resource "aws_internet_gateway_attachment" "gw" {
  internet_gateway_id = aws_internet_gateway.gw.id
  vpc_id              = aws_vpc.public.id
}

Each VPC has a routing table to correctly route traffic. The public VPC routes all traffic towards the internet gateway, while the private VPC needs no explicit routes, since the implicit local route already covers traffic within the VPC.

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.public.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.gw.id
  }
}

resource "aws_route_table" "internal" {
  vpc_id = aws_vpc.internal.id
}

resource "aws_route_table_association" "internal" {
  subnet_id      = aws_subnet.internal.id
  route_table_id = aws_route_table.internal.id
}

resource "aws_route_table_association" "public" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}

I connect to my isolated network primarily through my OpenWRT-based router using WireGuard. I also use the WireGuard client on my Mac or phone to connect to the cluster when I’m outside. Keep an eye out for my posts detailing how I deploy VyOS on AWS and configure OpenWRT to connect to WireGuard.
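
For the tunnel itself, the router instance needs WireGuard’s UDP port open on its public interface. A minimal sketch, where the group name is hypothetical and I’m assuming WireGuard’s default port:

resource "aws_security_group" "wireguard" {
  name   = "wireguard-ingress" # hypothetical name
  vpc_id = aws_vpc.public.id

  ingress {
    description = "WireGuard tunnel traffic"
    from_port   = 51820 # WireGuard's default port, assumed here
    to_port     = 51820
    protocol    = "udp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}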

I attached an Elastic IP to the router instance, which lets me destroy and re-build the instance without issue. This is useful when I don’t need the network running, like when I’m flying, or when I’m actively improving the instance.

resource "aws_eip" "public" {
  domain   = "vpc"
}

resource "aws_eip_association" "public" {
  network_interface_id   = aws_network_interface.public.id
  allocation_id = aws_eip.public.id
}

If I need to access any other AWS resource, I add a VPC Endpoint for that resource directly to the private VPC. For example, I use S3FS to mount S3 storage directly on the instance and DynamoDB for building JSONL files for machine learning tasks.
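
For S3, that’s a single gateway endpoint; a sketch, where the region in the service name is an assumption:

resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.internal.id
  service_name      = "com.amazonaws.eu-west-2.s3" # assumed region
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.internal.id]
}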

Use Cases

I create a Windows instance inside the subnet when I need to do remote work involving large downloads while I’m outdoors. I also create larger instances for working with AI/Machine Learning models when my Mac can’t load them, or when I don’t have the storage at a given time.

Multi-Region Setup

To transfer the setup to another region, I simply change the region variable in my Terraform module, and it magically appears in the new region.

Exoscale Exporter for Prometheus https://zai.dev/2024/09/04/exoscale-exporter-for-prometheus/ Wed, 04 Sep 2024 11:18:20 +0000 https://zai.dev/?p=830

I’ve built a Prometheus exporter for Exoscale, allowing me to visualize cloud spending and resource usage from a central location alongside AWS and DigitalOcean.

The Exoscale exporter is built using Go, leverages the latest version of Exoscale’s Go API (egoscale v3), and includes basic integration tests and automatic package builds for all major platforms and architectures.

Some of the metrics exported are;

  • Organization Information: Usage, Address, API Keys
  • Compute Resource Summary: Instances, Kubernetes, Node Pools
  • Storage Resource Summary: SOS Buckets & Usage, Block Volumes
  • Networking Resource Summary: Domain & Records, Load Balancers

By integrating organizational data from Exoscale into the Prometheus ecosystem, I can now configure alerts for spending or resource usage, either for Exoscale specifically or across all platforms, using Alertmanager.

I can also use Grafana to identify where I may have left resources behind, whether I created them manually or an IaC run didn’t clean up properly.

Metric browser in Grafana, showing some values exported from the exporter

I decided to deploy the exporter to my Kubernetes cluster, scraping at the default interval of 2 minutes. This is a good balance between;

  • When a new billing amount gets updated (hourly)
  • How often infrastructure elements themselves get updated (could be minutely)
  • How much data gets consumed by the time-series

I chose a Kubernetes cluster rather than a serverless solution or a dedicated VM so that I can optimize the cost of running the exporter by sharing resources, in addition to abstracting the cloud provider away from the application.
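
Since the rest of my tooling is Terraform, the deployment can be expressed with the kubernetes provider. A minimal sketch, where the image, namespace and annotation-based scrape discovery are all assumptions:

resource "kubernetes_deployment" "exoscale_exporter" {
  metadata {
    name      = "exoscale-exporter" # hypothetical name
    namespace = "monitoring"
  }

  spec {
    replicas = 1

    selector {
      match_labels = { app = "exoscale-exporter" }
    }

    template {
      metadata {
        labels = { app = "exoscale-exporter" }
        # Assumes annotation-based discovery; a ServiceMonitor would be
        # the equivalent under the Prometheus Operator.
        annotations = {
          "prometheus.io/scrape" = "true"
          "prometheus.io/port"   = "9999" # placeholder metrics port
        }
      }

      spec {
        container {
          name  = "exporter"
          image = "ghcr.io/example/exoscale-exporter:latest" # placeholder image
        }
      }
    }
  }
}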

Building AMD64 QEMU Images remotely using Libvirt and Packer https://zai.dev/2024/05/24/building-amd64-qemu-images-remotely-using-libvirt-and-packer/ Fri, 24 May 2024 22:05:58 +0000 https://zai.dev/?p=675

I need to build images based on the AMD64 architecture while working from an ARM64 machine. While this is possible directly using the qemu-system-x86_64 binary, it tends to be extremely slow due to the overhead of emulating x86 instructions on ARM.

Workbench

  • Ubuntu 22.04 LTS with libvirt installed
  • MacBook Pro M2 with the Packer build files

Configuring the Libvirt Plugin

Connecting to the libvirt host

When using the libvirt plugin, I need to provide a Libvirt URI.

source "libvirt" "image" {
    libvirt_uri = "qemu+ssh://${var.user}@${var.host}/session?keyfile=${var.keyfile}&no_verify=1"
}
  • qemu+ssh:// denotes that I’ll be using the QEMU / KVM Backend and connecting via SSH. The connection method denotes the rest of the arguments of the string
  • ${var.user}@${var.host} is in the SSH syntax, this is the username and hostname of the machine that is running libvirt
  • /session is to isolate the running builds from those on the system level. /system would work just as well.
  • keyfile=${var.keyfile} is used to automatically authenticate to the remote machine without the need of a password. This is useful in the future when I automatically trigger the packer build from a Git repository
  • no_verify=1 is added so that I can throw the build at any machine and have it “just work”. This is usually advised against due to spoofing attacks.

Communicating with the libvirt guest

communicator {
    communicator                 = "ssh"
    ssh_username                 = var.username
    ssh_bastion_host             = var.host
    ssh_bastion_username         = var.user
    ssh_bastion_private_key_file = var.private_key
  }
  • The difference between ssh_* and ssh_bastion_* is that the first refers to the target virtual machine being built, and the latter refers to the “middle-man” machine.
    • I require this as I don’t plan to expose the VM to a network outside of the machine hosting it.
    • Since I won’t have access from my local workstation, I need to communicate with the virtual machine via the machine that is hosting it.
    • By adding ssh_bastion_* arguments, I’m telling Packer that in order to communicate with the VM, it needs to access the bastion machine first and execute all SSH commands through it.

Configuring the libvirt daemon

My Observations

I came across a “Permission Denied” error when attempting to upload an existing image (in my case, the KVM Ubuntu Server Image). This was due to AppArmor not being provided a trust rule upon creation of the domain. This error is first visible in the following form directly from Packer:

==> libvirt.example: DomainCreate.RPC: internal error: process exited while connecting to monitor: 2024-05-24T16:41:42.574660Z qemu-system-x86_64: -blockdev {"node-name":"libvirt-2-format","read-only":false,"driver":"qcow2","file":"libvirt-2-storage","backing":null}: Could not open '/var/lib/libvirt/images/packer-cp8c6ap1ijp2kss08iv0-ua-artifact': Permission denied

At first, I assumed there was an obvious permissions problem, and at first glance there did in fact appear to be one: upon creation, the file had root-only permissions, where only the root user can read/write.

# ls -lah /var/lib/libvirt/images
-rw------- 1 root root  925M May 24 16:41 packer-cp8c6ap1ijp2kss08iv0-ua-artifact

This makes sense, since libvirtd runs as the root user, which is the default configuration from the Ubuntu repository. I didn’t see any configuration option to control what the permissions should be after an upload with libvirt either. It looked like the problem, since all QEMU instances run under a non-root user, libvirt-qemu:

# ps -aux | grep libvirtd
# ps -aux | grep qemu

root      145945  0.4  0.1 1778340 28760 ?       Ssl  16:43   0:10 /usr/sbin/libvirtd
libvirt+    3312  2.2 11.1 4473856 1817572 ?     Sl   May12 405:19 /usr/bin/qemu-system-x86_64

My second observation was that all images created directly within libvirt (e.g: with virt-manager) had what looked like “correct” permissions, those that matched the user that QEMU would eventually run under;

# ls -lah /var/lib/libvirt/images
-rw-r--r-- 1 libvirt-qemu kvm   11G May 24 17:11 haos_ova-11.1.qcow2

Since no-one else had reported this particular issue when using the libvirt plugin, I had gone down the route of PEBKAC.

Allowing packer-uploaded images as backing store

Thanks to a discussion on Stack Overflow, I found that AppArmor had been blocking the request to the specific file in question.

# dmesg -w
[1081541.249157] audit: type=1400 audit(1716568577.970:119): apparmor="DENIED" operation="open" profile="libvirt-25106acc-cfd8-40f7-a7c6-f5c1c63bc16c" name="/var/lib/libvirt/images/packer-cp8c6ap1ijp2kss08iv0-ua-artifact" pid=43927 comm="qemu-system-x86" requested_mask="w" denied_mask="w" fsuid=64055 ouid=64055

Here, I can see that AppArmor is doing three things;

  • Denying an open request to the QEMU Image
    • apparmor="DENIED"
    • operation="open"
  • Denying writing to the QEMU Image
    • denied_mask="w"
  • Using a profile that is specific to the domain being launched
    • profile="libvirt-25106acc-cfd8-40f7-a7c6-f5c1c63bc16c"
    • This is achieved because libvirt will automatically push AppArmor rules upon creation of a domain. This also means that libvirt will be using some form of template file or specification to create rules.

This means that I need to find the template file that libvirt is using to design the rules, and allow for writing to packer-uploaded QEMU Images.

# /etc/apparmor.d/libvirt/TEMPLATE.qemu
# This profile is for the domain whose UUID matches this file.
# 

#include <tunables/global>

profile LIBVIRT_TEMPLATE flags=(attach_disconnected) {
  #include <abstractions/libvirt-qemu>
  /var/lib/libvirt/images/packer-** rwk,
}

As mentioned in the Stack Overflow post, simply adding /var/lib/libvirt/images/packer-** rwk, to the template file is enough to get past this issue.

End Result

By bringing everything together, I get a successful QCOW2 image visible in my default storage pool. I’m using the Ansible provisioner within the build block so that I can keep the execution steps separate from the Packer build script, and re-usable across different cloud providers.
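
A minimal sketch of that build block, where the playbook path is a hypothetical placeholder:

build {
  sources = ["source.libvirt.image"]

  # Keeping provisioning in a playbook lets the same steps be re-used
  # by builders targeting other cloud providers.
  provisioner "ansible" {
    playbook_file = "./playbooks/provision.yaml" # hypothetical path
  }
}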

Configuring Traefik for Cross-Namespace Ingress https://zai.dev/2024/04/08/configuring-traefik-for-cross-namespace-ingress/ Mon, 08 Apr 2024 00:20:11 +0000 https://zai.dev/?p=662 When installing Traefik, whether with plain Kubernetes or K3s, ingress objects in one namespace cannot reference resources in another by default. Since Traefik typically runs under the kube-system namespace, this is a problem, as I don’t want any of my production deployments running in a namespace intended to hold components essential to the Kubernetes cluster.

In my scenario, I inherited Traefik by installing K3s on my homelab and plan to deploy Traefik to a production cluster for my pipeline project in the future.

The Simple Fix

All that Traefik requires is the providers.kubernetesCRD.allowCrossNamespace setting to be forced to true. It was changed to default to false in a previous version.

Inherited or not, Traefik can be deployed using Helm charts (which is the case under K3s). Under Helm, a configuration override can be applied with the HelmChartConfig object. Once this object is deployed, the installed deployment restarts with the default configuration merged with that defined in the new HelmChartConfig object.

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    providers:
      kubernetesCRD:
        allowCrossNamespace: true

Or, if it’s needed in an HCL / Terraform context:

resource "kubernetes_manifest" "traefik" {
  manifest = {
    "apiVersion" = "helm.cattle.io/v1"
    "kind"       = "HelmChartConfig"
    "metadata" = {
      "name"      = "traefik"
      "namespace" = "kube-system"
    }
    "spec" = {
      "valuesContent" = file("${path.module}/traefik-config.yml")
    }
  }
}
# traefik-config.yml
providers:
  kubernetesCRD:
    allowCrossNamespace: true
Using AWS CodeBuild to execute Ansible playbooks https://zai.dev/2024/04/06/using-aws-codebuild-to-execute-ansible-playbooks/ Sat, 06 Apr 2024 19:31:19 +0000 https://zai.dev/?p=600 I wanted a clean and automatable way to package third-party software into *.deb format (and other formats, if needed, in the future), and I had three ways to achieve that;

  • The simple way: Write a Bash script
  • The easy way: Write a Python script
  • My chosen method: Write an Ansible role

While any of the options could get me where I wanted, the Ansible route felt a lot cleaner: I can clearly state (and see) what packages I am building, either at the command-line level or at the playbook level, rather than maintaining a separate configuration file in yet another format to drive what to build and where, as the Bash or Python approaches would require.

The playbook approach also allows me to monitor and execute a build on a remote machine, should I wish to build cross-platform or need larger resources for testing.

In this scenario, I’ll be executing the Ansible role locally on the CodeBuild instance.

Configuring the CodeBuild Environment

Using GitHub as a source

I have one git repository per Ansible playbook, so by linking CodeBuild to the repository in question I’m able to (eventually) automatically trigger the execution of CodeBuild upon a pushed commit on the main branch.

The only additional setting under sources that I define is the Source version, as I don’t want build executions happening for all branches (as that can get costly).

CodeBuild Environment

For the first iteration of this setup, I am installing the (same) required packages at every launch. This is not the best way to handle pre-installation in terms of cost and build speed; in this instance, I’ve chosen to ignore that and “brute-force” my way through to get a proof-of-concept. The settings below are what I used (a Terraform equivalent follows the list):

  • Provisioning Model: On-demand
    • I’m not pushing enough packages to require a dedicated fleet, so spinning up VMs in response to a pushed commit (~5 times a week) is good enough.
  • Environment Image: Managed Image
    • As stated above, I had my focus towards a proof-of-concept that running Ansible under CodeBuild was possible. A custom image with pre-installed packages is the way to go in the long run.
  • Compute: EC2
    • Since I’m targeting *.deb format, I choose Ubuntu as the operating system. The playbook I’m expecting to execute doesn’t require GPU resources either.
    • Amazon Lambda doesn’t support Ubuntu, nor is able to execute Ansible (directly). I’d have to write a wrapper in Python that will execute the Ansible Playbook which is more overhead.
    • Depending on the build time and the size of the resulting package, I had to adjust the memory accordingly. This may be because I’m making use of the /tmp directory by default.
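
Expressed as Terraform, the environment above might look like the following; the project name, role, repository URL and image tag are illustrative assumptions:

resource "aws_codebuild_project" "ansible_deb" {
  name         = "ansible-deb-packaging" # hypothetical name
  service_role = aws_iam_role.codebuild.arn

  source {
    type      = "GITHUB"
    location  = "https://github.com/example/ansible-playbook.git" # placeholder
    buildspec = "buildspec.yml"
  }

  # Only build the main branch, mirroring the Source version setting.
  source_version = "main"

  environment {
    type         = "LINUX_CONTAINER"
    compute_type = "BUILD_GENERAL1_SMALL"
    image        = "aws/codebuild/standard:7.0" # Ubuntu-based managed image
  }

  artifacts {
    type = "NO_ARTIFACTS" # packages are pushed to S3 from Ansible instead
  }
}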

buildspec.yml

I store the following file at the root level of the same Git repository that contains the Ansible playbook.

version: 0.2

phases:
  pre_build:
    commands:
      - apt install -y ansible python3-botocore python3-boto3
      - ansible-galaxy install -r requirements.yaml
      - ansible-galaxy collection install amazon.aws
  build:
    commands:
      - ansible-playbook build.yaml
artifacts:
  files:
    - /tmp/*.deb

As stated above, I’m always installing the required system packages prior to interacting with Ansible. This step (the apt install line) should be moved into a pre-built image that this CodeBuild environment then sources from.

I keep the role (and therefore, tasks) separate from the playbook itself, which is why I use ansible-galaxy to install the requirements. Each time the session is started, it pulls down a fresh copy of any requirements. This can differ from playbook to playbook.

I use the role for the execution steps, and the playbook (or inventory) to hold the settings that influence the execution, such as (in this scenario) what the package name is and how to package it.

I explicitly include the amazon.aws Ansible collection in this scenario as I’m using the S3 module to pull down sources (or builds of third-party software) and to push built packages up to S3. I do this via Ansible rather than storing them in Git due to their size, and rather than using CodeDeploy because I don’t plan on deploying the packages to infrastructure, but to a repository.

I did have some issues using the Artifacts option within CodeBuild as well, which led to pushing from Ansible.

Finally, ansible-playbook can be executed once all the prerequisites are in place. The only adaptation needed at the playbook level is that localhost is listed as the target, which ensures the playbook executes on the local machine.

---
- hosts: localhost

Once all the configuration and repository setup is done, the build executed successfully and I received my first Debian package via CodeBuild using Ansible.
