AWS – Zak Abdel-Illah https://zai.dev Automation Enthusiast

Deploying AWS Site-to-Site on OpenWRT https://zai.dev/2024/12/10/deploying-aws-site-to-site-configurations-to-openwrt/ Tue, 10 Dec 2024 16:33:06 +0000 https://zai.dev/?p=1090

I want to connect to resources on AWS from my home with the least operational overhead, leading me to deploy AWS Site-to-Site for connecting resources from my home to a VPC.

The Environment

Some resources I want to access are:

  • g4dn.xlarge EC2 instances used for streaming games
  • t2.micro EC2 instances hosting Home Assistant
  • RDS (PostgreSQL) instances for hosting OpenStreetMap data

Home Environment

When setting up a connection from AWS to my home, I have to consider the following specifications:

  • I live in West London, relatively close to the eu-west-2 data center
    • I have a VPC in eu-west-2 running on the 10.1.0.0/16 network
  • I use a publicly-offered ISP for accessing the internet
  • There are two hops (routers) between the public internet and my home network
    • The first hop is the router provided by the ISP to connect to the internet
      • This network lives on the 192.168.0.0/24 subnet
    • The second hop is my own off-the-shelf router from ASUS running OpenWRT
      • My home network lives on the 10.0.0.0/24 subnet
      • The router has 8MB of usable storage for packages and configuration

Setting up AWS Site-to-Site

AWS Site-to-Site is one of Amazon’s offerings for bridging an external network to a VPC over the public internet. Some alternatives are:

  • AWS Client VPN (based on OpenVPN)
    • More expensive
    • More complex, often tends to be slower without optimization
  • Self-managed VPN
    • Allows use of any VPN technology, such as Wireguard
    • Allows custom metric monitoring
    • Requires management of VPC topologies and firewalls
    • Can be more expensive

I chose to use Site-to-Site on this occasion so I could learn how IPSec works in more detail, and I saw deploying it to OpenWRT as a challenge. It’s also a lot cheaper than a firewall license, EC2 rental and public IP charges.

Deploying a Virtual Private Gateway

A Virtual Private Gateway is the AWS-side endpoint of an IPSec tunnel. It also hosts the configuration of the local BGP instance, and is what drives the propagation of routes between the IPSec tunnels and the VPC routing tables.

Dashboard view of the Virtual Private Gateway. I rely on an ASN generated by Amazon for this instance.
resource "aws_vpn_gateway" "main" {
  vpc_id = data.aws_vpc.main.id
}

There’s not much to configure with the VPG, so I left it with its defaults.

Deploying a Customer Gateway

A customer gateway represents the local end of the IPSec tunnel and the BGP daemon running on it. In my case, this is the OpenWRT router.

Dashboard view of the customer gateway, which represents my OpenWRT Router itself and the BGP daemon running on it. AWS by default provides an ASN of 65000, but I don’t have any need to customize it.
resource "aws_customer_gateway" "openwrt" {
  bgp_asn    = 65000
  ip_address = "<WAN Address of OpenWRT>"
  type       = "ipsec.1"
}

Deploying a Site-to-Site VPN Connection

The VPN Connection itself is what connects a VPG (AWS Endpoint) to a customer gateway (Local endpoint) in the form of an IPSec VPN connection.

Dashboard view of the Site-to-Site VPN. Everything is left as the default. For the purposes of building the automated workflow and testing connectivity, local and remote network CIDRs are 0.0.0.0/0
resource "aws_vpn_connection" "main" {
  customer_gateway_id = aws_customer_gateway.openwrt.id
  vpn_gateway_id      = aws_vpn_gateway.main.id
  type                = aws_customer_gateway.openwrt.type

  tunnel1_ike_versions = ["ikev1", "ikev2"]
  tunnel1_phase1_dh_group_numbers = [2, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
  tunnel1_phase1_encryption_algorithms = ["AES128", "AES128-GCM-16", "AES256", "AES256-GCM-16"]
  tunnel1_phase1_integrity_algorithms = ["SHA1", "SHA2-256", "SHA2-384", "SHA2-512"]
  tunnel1_phase2_dh_group_numbers = [2, 5, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
  tunnel1_phase2_encryption_algorithms = ["AES128", "AES128-GCM-16", "AES256", "AES256-GCM-16"]
  tunnel1_phase2_integrity_algorithms = ["SHA1", "SHA2-256", "SHA2-384", "SHA2-512"]

}

These (tunnel1_*) are the default values set by AWS and should be locked down. For the purpose of testing, I left them all at their defaults. These settings are directly tied to the IPSec encryption settings described below.

Connecting OpenWRT via IPSec

Ansible Role Variables

I’ve designed my Ansible role to be able to configure AWS IPSec tunnels with the bare minimum configuration. All information that the role requires is provided by Terraform upon provisioning of the AWS Site-to-Site configuration.

bgp_remote_as: "64512"
ipsec_tunnels:
  - ifid: "301"
    name: xfrm0
    inside_addr: <Inside IPv4>
    gateway: <Endpoint Tunnel 1>
    psk: <PSK Tunnel 1>
  - ifid: "302"
    name: xfrm1
    inside_addr: <Inside IPv4>
    gateway: <Endpoint Tunnel 2>
    psk: <PSK Tunnel 2>
  • bgp_remote_as refers to the ASN of the Virtual Private Gateway, and is strictly used by the BGP Daemon offered by Quagga.
    • BGP is used to propagate routes to-and-from AWS.
    • When a Site-to-Site VPN is configured to use Dynamic routing, it will state that the tunnel is Down if AWS cannot reach the BGP instance.
  • ipsec_tunnels is used by XFRM and strongSwan to:
    • Build one XFRM interface per-tunnel
    • Build one alias interface bound to each XFRM interface for static routing
    • Configure the static routing of the XFRM interfaces
    • Configure the BGP daemon neighbours
    • Configure one IPSec endpoint per-tunnel
    • Configure one IPSec tunnel for each XFRM interface

Required packages

I used three components for this workflow to function, and a last one for debugging security association errors.

  • strongswan-full
    • strongSwan provides an IPSec implementation for OpenWRT with full support for UCI. The -full variation of the package is overkill, but you never know!
  • quagga-bgpd
    • A BGP implementation light enough to run on OpenWRT. quagga comes in as a dependency
  • luci-proto-xfrm
    • A virtual interface for use by IPSec, where a tunnel requires a vif to bind to.
  • ip-full
    • Provides the ip xfrm subcommand for debugging IPSec connections.
name: install required packages
opkg:
  name: "{{ item }}"
loop:
  - strongswan-full
  - quagga-bgpd
  - luci-proto-xfrm
  - ip-full

Adding the XFRM Interface

OpenWRT LuCI dashboard, showing the final result of the interfaces tab. I declare two XFRM interfaces, one per VPN tunnel provided by AWS, each with an IPv4 assigned that matches the Inside IPv4 CIDRs defined within the AWS Site-to-site configuration. The IPv4 address is applied to an alias of the adapter rather than the adapter itself as the XFRM interface doesn’t support static IP addressing via UCI.
Ansible Task
name: configure xfrm adapter
uci:
  command: section
  key: network
  type: interface
  name: "{{ item.name }}"
  value:
    tunlink: 'wan'
    mtu: '1300'
    proto: xfrm
    ifid: "{{ item.ifid }}"
loop: "{{ ipsec_tunnels }}"
/etc/config/network – UCI Configuration
config interface 'xfrm0'
	option ifid '301'
	option tunlink 'wan'
	option mtu '1300'
	option proto 'xfrm'

config interface 'xfrm1'
	option tunlink 'wan'
	option mtu '1300'
	option proto 'xfrm'
	option ifid '302'

I use the uci task to deploy adapter configurations. I create one interface per-tunnel provided by AWS.

  • tunlink sets the IPSec tunnel to connect to & from the wan interface
  • mtu is 1300 by default; I didn’t need to configure this value
  • ifid is defined as strongSwan will use this to bind an IPSec tunnel to a network interface. This is separate from the name of the interface.

AWS needs to communicate with the BGP instance running on OpenWRT. The value of Inside IPv4 CIDR instructs AWS which IPs to listen on for their BGP instance, and which IP to connect to for fetching routes. The CIDRs will be restricted to the /30 prefix, which provides the range of 4 IP addresses, 2 of which are usable.

As an example, here is the Inside IPv4 CIDR of 169.254.181.60/30 and what that means.

IP Index | IP Address     | Responsibility
0        | 169.254.181.60 | Network address
1        | 169.254.181.61 | IP Address reserved for AWS-side of the IPSec tunnel
2        | 169.254.181.62 | IP Address reserved for the OpenWRT-side of the IPSec tunnel
3        | 169.254.181.63 | Broadcast address

With this known, we know that:

On the AWS side of the IPSec tunnel | On the OpenWRT side of the IPSec tunnel
AWS has a BGP instance listening on the IP Address on the first index (169.254.181.61) | OpenWRT needs to be configured to use the IP address on the second index (169.254.181.62)
AWS is expecting a BGP neighbour on the second index (169.254.181.62) | The BGP daemon running on OpenWRT needs to be configured to use the neighbor at the first index (169.254.181.61)
AWS knows how to route traffic across the 169.254.181.60 network | OpenWRT needs to know to route traffic on the 169.254.181.60 network.
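
The /30 breakdown above can be reproduced with Python's standard ipaddress module. This is only a minimal sketch for checking the arithmetic; the function name describe_inside_cidr is made up for illustration and is not part of the Ansible role.

```python
import ipaddress


def describe_inside_cidr(cidr: str) -> dict:
    """Split an AWS Inside IPv4 CIDR (/30) into the four roles listed above."""
    network = ipaddress.ip_network(cidr, strict=True)
    if network.prefixlen != 30:
        raise ValueError("expected a /30 Inside IPv4 CIDR")
    return {
        "network": str(network[0]),    # index 0: network address
        "aws": str(network[1]),        # index 1: AWS side of the tunnel
        "openwrt": str(network[2]),    # index 2: OpenWRT side of the tunnel
        "broadcast": str(network[3]),  # index 3: broadcast address
    }


roles = describe_inside_cidr("169.254.181.60/30")
print(roles)
```

Indexing the network object is the same idea as the index column in the table: index 1 belongs to AWS and index 2 to OpenWRT.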

Configuring the IP Address on the IPSec tunnel

I create an alias on top of the originating XFRM interface so that I can utilize the static protocol within UCI to configure static routing in a declarative way.

Ansible Task
name: create xfrm alias for static addressing
uci:
  command: section
  key: network
  type: interface
  name: "{{ item.name }}_s"
  value:
    proto: static
    ipaddr:
      - "{{ item.inside_addr | ipaddr('net') | ipaddr(2) }}"
    device: "@{{ item.name }}"
loop: "{{ ipsec_tunnels }}"
/etc/config/network – UCI Configuration
config interface 'xfrm0_s'
	option proto 'static'
	option device '@xfrm0'
	list ipaddr '169.254.211.46/30'

config interface 'xfrm1_s'
	option proto 'static'
	list ipaddr '169.254.181.62/30'
	option device '@xfrm1'

I use ipaddr('net') | ipaddr(2) to simplify my Ansible configuration. If inside_addr is 169.254.181.60/30, ipaddr('net') resolves the network and ipaddr(2) selects the address at index 2 within it, giving the result of 169.254.181.62/30.

This will ensure two things:

  • The xfrm interface persistently holds the 169.254.181.62/30 IP Address
  • The Linux routing table holds a route of 169.254.181.60/30 via the xfrm interface

This resolves the issue of OpenWRT knowing what IP Address to use and how to route the traffic.

Setting up IPSec

Because I’m using strongSwan, I can also use UCI to configure the IPSec tunnel. With this workflow, IPSec configuration is broken down into three elements:

  • Endpoint
    • Primarily what’s known as “IKE Phase 1”. This is the “How I will connect to the other end”.
  • Tunnel
    • Primarily known as “IKE Phase 2”. This is the “How do I pass traffic through to the other end”.
  • Encryption
    • A set of rules to describe how to handle the cryptography.

IPSec Encryption

What’s defined here drives whether Phase 1 will succeed, and must match the AWS VPN Encryption settings.

Ansible Task
name: define ipsec encryption
uci:
  command: section
  key: ipsec
  type: crypto_proposal
  name: "aws"
  value:
    is_esp: '1'
    dh_group: modp1024
    encryption_algorithm: aes128
    hash_algorithm: sha1
/etc/config/ipsec – UCI Configuration
config crypto_proposal 'aws'
	option is_esp '1'
	option dh_group 'modp1024'
	option encryption_algorithm 'aes128'
	option hash_algorithm 'sha1'

In my case, I’m:

  • Using AES128 for encryption of the traffic
  • Using SHA1 as the integrity algorithm for ensuring packets are correct upon arrival
  • Naming the crypto_proposal aws for use by the Endpoint and the Tunnel

AES128, SHA1 and DH group 2 (modp1024) are all accepted by the tunnel options defined on the VPN connection above.
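
Since a mismatch here only shows up as a failed negotiation, it can help to check the proposal against the AWS tunnel options before deploying. The following is a rough sketch under stated assumptions: the lookup tables are partial and written by hand for illustration, proposal_mismatches is a hypothetical helper, the DH group is checked against the phase 1 list only, and ikev2 is the keyexchange value the endpoint uses in this post.

```python
# Options accepted by AWS for tunnel 1, copied from the tunnel1_* values above.
AWS_ALLOWED = {
    "ike_versions": {"ikev1", "ikev2"},
    "phase1_dh_groups": {2, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24},
    "encryption": {"AES128", "AES128-GCM-16", "AES256", "AES256-GCM-16"},
    "integrity": {"SHA1", "SHA2-256", "SHA2-384", "SHA2-512"},
}

# Partial strongSwan-keyword to AWS-name lookups (a few entries, for illustration).
DH_GROUPS = {"modp1024": 2, "modp2048": 14}
ENCRYPTION = {"aes128": "AES128", "aes256": "AES256"}
INTEGRITY = {"sha1": "SHA1", "sha256": "SHA2-256"}


def proposal_mismatches(dh_group, encryption, hash_algorithm, keyexchange):
    """Return the settings that the AWS side would not accept."""
    problems = []
    # Unknown keywords map to None, which is never in the allowed sets.
    if DH_GROUPS.get(dh_group) not in AWS_ALLOWED["phase1_dh_groups"]:
        problems.append(f"dh_group {dh_group}")
    if ENCRYPTION.get(encryption) not in AWS_ALLOWED["encryption"]:
        problems.append(f"encryption_algorithm {encryption}")
    if INTEGRITY.get(hash_algorithm) not in AWS_ALLOWED["integrity"]:
        problems.append(f"hash_algorithm {hash_algorithm}")
    if keyexchange not in AWS_ALLOWED["ike_versions"]:
        problems.append(f"keyexchange {keyexchange}")
    return problems


print(proposal_mismatches("modp1024", "aes128", "sha1", "ikev2"))
```

An empty list means every value in the crypto_proposal is something AWS is willing to negotiate. Once the tunnel options are locked down on the AWS side, the sets at the top would shrink accordingly.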

Declaring the IPSec Endpoint

Ansible Task
name: configure ipsec remote
uci:
  command: section
  key: ipsec
  type: remote
  name: "{{ item.name }}_ep"
  value:
    enabled: "1"
    gateway: "{{ item.gateway }}"
    local_gateway: "<Public IP>"
    local_ip: "10.0.0.1"
    crypto_proposal:
      - aws
    tunnel:
      - "{{ item.name }}"
    authentication_method: psk
    pre_shared_key: "{{ item.psk }}"
    fragmentation: yes
    keyingretries: '3'
    dpddelay: '30s'
    keyexchange: ikev2
loop: "{{ ipsec_tunnels }}"
/etc/config/ipsec – UCI Configuration
config remote 'xfrm0_ep'
	option enabled '1'
	option gateway '<Tunnel 1 IP>'
	option local_gateway '<Public IP>'
	option local_ip '10.0.0.1'
	list crypto_proposal 'aws'
	list tunnel 'xfrm0'
	option authentication_method 'psk'
	option pre_shared_key '<PSK>'
	option fragmentation '1'
	option keyingretries '3'
	option dpddelay '30s'
	option keyexchange 'ikev2'

config remote 'xfrm1_ep'
	option enabled '1'
	option gateway '<Tunnel 2 IP>'
	option local_gateway '<Public IP>'
	option local_ip '10.0.0.1'
	list crypto_proposal 'aws'
	list tunnel 'xfrm1'
	option authentication_method 'psk'
	option pre_shared_key '<PSK>'
	option fragmentation '1'
	option keyingretries '3'
	option dpddelay '30s'
	option keyexchange 'ikev2'
  • The gateway is known as the Outside IP Address on AWS
  • local_gateway points to the WAN Address of OpenWRT
  • local_ip points to the LAN address of OpenWRT
  • crypto_proposal points to aws (Defined above)
  • tunnel points to the name of the interface that this IPSec endpoint represents.
    • Since there are two IPSec endpoints, two of these remotes are created. I use the interface name (from xfrm) across all duplicates to make sure that it’s visibly clear what’s being used where.
  • pre_shared_key is the PSK that gets generated (or set) within the VPN Tunnel.
    • This is unique per-tunnel, meaning that there should be two different PSKs per Site-to-site VPN connection. They can be found under the Modify VPN Tunnel Options selection.

Configuring the IPSec Tunnel

The tunnel instructs strongSwan how to bind the IPSec tunnel to an interface. The key here is the ifid of the XFRM interfaces defined earlier.

Ansible Task
name: configure ipsec tunnel
uci:
  command: section
  key: ipsec
  type: tunnel
  name: "{{ item.name }}"
  value:
    startaction: start
    closeaction: start
    crypto_proposal: aws
    dpdaction: start
    if_id: "{{ item.ifid }}"
    local_ip: "10.0.0.1"
    local_subnet:
      - 0.0.0.0/0
    remote_subnet:
      - 0.0.0.0/0
loop: "{{ ipsec_tunnels }}"
/etc/config/ipsec – UCI Configuration
config tunnel 'xfrm0'
	option startaction 'start'
	option closeaction 'start'
	option crypto_proposal 'aws'
	option dpdaction 'start'
	option if_id '301'
	option local_ip '10.0.0.1'
	list local_subnet '0.0.0.0/0'
	list remote_subnet '0.0.0.0/0'

config tunnel 'xfrm1'
	option startaction 'start'
	option closeaction 'start'
	option crypto_proposal 'aws'
	option dpdaction 'start'
	option if_id '302'
	option local_ip '10.0.0.1'
	list local_subnet '0.0.0.0/0'
	list remote_subnet '0.0.0.0/0'
  • Like the AWS configuration, I set the local_subnet and remote_subnet to 0.0.0.0/0. This is so I can focus on testing connectivity.
  • if_id points to the XFRM interface that represents the tunnel currently being iterated over.
    • The if_id must match the ifid of that tunnel’s XFRM interface, as the Inside IPv4 CIDRs have been bound to an interface.
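
Because a wrong if_id fails silently (the tunnel comes up but traffic lands on the wrong interface, or on none), a small consistency check can be worthwhile. This is a sketch only: the dictionaries mirror the UCI sections shown in this post, and mismatched_tunnels is a hypothetical helper that relies on the convention used here of naming each tunnel after its XFRM interface.

```python
# ifid values from /etc/config/network, keyed by XFRM interface name.
xfrm_interfaces = {"xfrm0": {"ifid": "301"}, "xfrm1": {"ifid": "302"}}

# if_id values from /etc/config/ipsec, keyed by tunnel name.
ipsec_tunnels = {"xfrm0": {"if_id": "301"}, "xfrm1": {"if_id": "302"}}


def mismatched_tunnels(interfaces, tunnels):
    """Return tunnel names whose if_id differs from the same-named interface's ifid."""
    return [
        name
        for name, tunnel in tunnels.items()
        if interfaces.get(name, {}).get("ifid") != tunnel["if_id"]
    ]


print(mismatched_tunnels(xfrm_interfaces, ipsec_tunnels))
```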

Configuring BGP on OpenWRT

In order for BGP routes to be applied on the AWS side, route propagation must be enabled at the route table level. Otherwise, a static route pointing to my home network (10.0.0.0/24) via the Virtual Private Gateway must be declared.

I opted for Quagga when using BGP on OpenWRT.

router bgp 65000
bgp router-id {{ ipsec_tunnels[0].inside_addr | ipaddr('net') | ipaddr(2) | split('/') | first }}
{% for tunnel in ipsec_tunnels %}
neighbor {{ tunnel.inside_addr | ipaddr('net') | ipaddr(1) | split('/') | first }} remote-as {{ bgp_remote_as }}
neighbor {{ tunnel.inside_addr | ipaddr('net') | ipaddr(1) | split('/') | first }} soft-reconfiguration inbound
neighbor {{ tunnel.inside_addr | ipaddr('net') | ipaddr(1) | split('/') | first }} distribute-list localnet in
neighbor {{ tunnel.inside_addr | ipaddr('net') | ipaddr(1) | split('/') | first }} distribute-list all out
neighbor {{ tunnel.inside_addr | ipaddr('net') | ipaddr(1) | split('/') | first }} ebgp-multihop 2
{% endfor %}
/etc/quagga/bgpd.conf – Rendered Template
router bgp 65000
bgp router-id 169.254.211.46
neighbor 169.254.211.45 remote-as 64512
neighbor 169.254.211.45 soft-reconfiguration inbound
neighbor 169.254.211.45 distribute-list localnet in
neighbor 169.254.211.45 distribute-list all out
neighbor 169.254.211.45 ebgp-multihop 2
neighbor 169.254.181.61 remote-as 64512
neighbor 169.254.181.61 soft-reconfiguration inbound
neighbor 169.254.181.61 distribute-list localnet in
neighbor 169.254.181.61 distribute-list all out
neighbor 169.254.181.61 ebgp-multihop 2
  • Like earlier, I use ipaddr('net') | ipaddr(1) to select an address from the CIDR, this time the first one (the AWS side). split('/') | first then drops the prefix length.
  • remote-as defines the AWS-side ASN.
    • BGP at its core defines routes based on the path to an AS, a layer on top of IP addresses.
    • It’s designed to work with direct connections, not over-the-internet.
      • ISPs & exchanges will, however, use BGP at their level to forward the traffic on.
  • router bgp states what the ASN of the OpenWRT router is. Because I used the default of 65000 from AWS, I place that here.
  • bgp router-id is set to the first XFRM interface’s IP address, since the same BGP instance will be shared by both tunnels in the event that one tunnel goes down. AWS does not do a validation check on the router-id.
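
To sanity-check what the template should render for a given pair of Inside IPv4 CIDRs, the same logic can be written in plain Python. This is an approximation of the Jinja template above rather than the role's actual code; render_bgpd and NEIGHBOR_OPTIONS are names chosen for this sketch.

```python
import ipaddress

# Per-neighbor statements, in the same order as the template.
NEIGHBOR_OPTIONS = (
    "remote-as {remote_as}",
    "soft-reconfiguration inbound",
    "distribute-list localnet in",
    "distribute-list all out",
    "ebgp-multihop 2",
)


def render_bgpd(local_as, remote_as, inside_cidrs):
    """Build bgpd.conf text: router-id from index 2, neighbors from index 1."""
    networks = [ipaddress.ip_network(cidr) for cidr in inside_cidrs]
    lines = [
        f"router bgp {local_as}",
        # The OpenWRT-side address of the first tunnel doubles as the router-id.
        f"bgp router-id {networks[0][2]}",
    ]
    for network in networks:
        for option in NEIGHBOR_OPTIONS:
            lines.append(f"neighbor {network[1]} " + option.format(remote_as=remote_as))
    return "\n".join(lines)


config = render_bgpd(65000, 64512, ["169.254.211.44/30", "169.254.181.60/30"])
print(config)
```

With the two CIDRs used in this post, the output should line up with the rendered /etc/quagga/bgpd.conf shown above.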

Verifying the connection to IPSec

Using the swanctl command, I can identify whether my applied configuration is successful when logged into my OpenWRT router using SSH.

Start swanctl

I don’t use the legacy ipsec init script; instead, I use the swanctl one directly. Under the hood, this converts the UCI configuration into a strongSwan configuration located at /var/swanctl/swanctl.conf

/etc/init.d/swanctl start
ipsec statusall
Output of the ipsec statusall command, where both VPN tunnels are ESTABLISHED and INSTALLED. Established denotes that IKE Phase 1 (Encryption negotiation) was successful and Installed denotes that IKE Phase 2 (Authorization, the tunnel creation itself) was successful and is now in use.
Connection can also be verified from the AWS Console, by looking at the value of Details. If the connection doesn’t say IPSEC IS DOWN, the connection was successful. Status is only up when BGP can be reached from AWS. When using Dynamic (not static) routing in the configuration for Site-to-Site, AWS doesn’t declare a connection up unless BGP is reachable at the second address available in the Inside IPv4 CIDR.

Routing traffic to & from the XFRM Interface

Finally, I need to instruct OpenWRT to allow the forwarding of packets that are destined for xfrm0 or xfrm1. The Linux routing table states that 10.1.0.0/16 is reached via xfrm0 (a route learned via BGP), which is enough to know that either xfrm0 or xfrm1 is the interface required.

By default, a flag of REJECT is defined. By applying the following firewall rule, packets successfully go through to the AWS VPC.

Ansible Task
- name: install firewall zone
  uci:
    command: section
    key: firewall
    type: zone
    find_by:
      name: 'tunnel'
    value:
      input: REJECT
      output: ACCEPT
      forward: REJECT
      network:
        - xfrm0
        - xfrm1

- name: install firewall forwarding
  uci:
    command: section
    key: firewall
    type: forwarding
    find_by:
      dest: 'tunnel'
    value:
      src: lan
/etc/config/firewall – UCI Configuration
config zone
	option name 'tunnel'
	option input 'REJECT'
	option output 'ACCEPT'
	option forward 'REJECT'
	list network 'xfrm0'
	list network 'xfrm1'

config forwarding
	option src 'lan'
	option dest 'tunnel'

Final tasks

The final steps of the Ansible playbook are to instruct the UCI framework to save the changes to disk, and to reload the configuration of all required services.

- name: commit changes
  uci:
    command: commit

- name: enable required services
  service:
    name: "{{ item }}"
    enabled: yes
    state: reloaded
  loop:
    - swanctl
    - quagga
    - network

I then invoke the Ansible playbook by using a local-exec provisioner on a null_resource within Terraform, where the AWS Site-to-Site resource is a dependency. Along the lines of:

resource "null_resource" "cluster" {
  # Referencing aws_vpn_connection.main below makes it an implicit dependency
  provisioner "local-exec" {
    # <<- strips the leading indentation from the heredoc
    command = <<-EOT
      ansible-playbook \
        -i ${var.openwrt_address}, \
        -e 'aws_tunnel_ips=${aws_vpn_connection.main.tunnel1_address},${aws_vpn_connection.main.tunnel2_address}' \
        -e 'aws_psk=${aws_vpn_connection.main.tunnel1_preshared_key},${aws_vpn_connection.main.tunnel2_preshared_key}' \
        playbook.yaml
    EOT
  }
}

This is a shortened version of what I have, but by simply feeding the outputs of the AWS Site-to-Site resource into the Ansible playbook, my router is automatically configured correctly whenever I create a Site-to-Site resource.
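
As an illustration of how the Terraform outputs could be reshaped into the role variables described earlier, here is a short Python sketch. The dictionary keys mirror attribute names of the aws_vpn_connection resource (tunnelN_address, tunnelN_preshared_key, tunnelN_inside_cidr); the gateway addresses and PSKs are placeholders, and build_ipsec_tunnels along with the 301/302 numbering convention are assumptions made for this example rather than what the real playbook does.

```python
import json


def build_ipsec_tunnels(vpn):
    """Turn the two AWS tunnels into the ipsec_tunnels list used by the role."""
    tunnels = []
    for index in (1, 2):
        tunnels.append({
            "ifid": str(300 + index),     # 301 and 302, as in this post
            "name": f"xfrm{index - 1}",   # xfrm0 and xfrm1
            "inside_addr": vpn[f"tunnel{index}_inside_cidr"],
            "gateway": vpn[f"tunnel{index}_address"],
            "psk": vpn[f"tunnel{index}_preshared_key"],
        })
    return tunnels


# Placeholder values standing in for the Terraform outputs.
vpn = {
    "tunnel1_address": "203.0.113.10",
    "tunnel1_preshared_key": "example-psk-1",
    "tunnel1_inside_cidr": "169.254.211.44/30",
    "tunnel2_address": "203.0.113.20",
    "tunnel2_preshared_key": "example-psk-2",
    "tunnel2_inside_cidr": "169.254.181.60/30",
}

# ansible-playbook accepts a JSON document as the value of -e / --extra-vars.
extra_vars = json.dumps({
    "bgp_remote_as": "64512",
    "ipsec_tunnels": build_ipsec_tunnels(vpn),
})
print(extra_vars)
```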

With IPSec now deployed, I can communicate directly with my resources hosted on AWS as if they were local.

Streaming a bike riders’ journey with AWS MediaLive https://zai.dev/2024/11/25/streaming-a-bike-riders-journey-with-aws-medialive/ Mon, 25 Nov 2024 14:11:11 +0000 https://zai.dev/?p=995

I want to showcase my bike journeys with live content from my GoPro and an overlay of my rough location on the stream. I’d like to use multiple GoPros mounted in different locations on my bike, with a ‘remote’ to switch the main on-stream view between them with minimal effort.

Practical Considerations

When designing an approach, I had to consider:

  • Streaming over a mobile connection meant that there could be dead zones
    • I’d need to ensure that the stream runs consistently in the event of any outage, such as providing a “waiting for connection” loop footage
  • The battery life of all equipment could mean I need to carry a lot of backup power
  • Risk of theft and too much weight if I carried too much equipment.
  • Determining whether 4G was enough or if I required 5G.
    • Taking a look at what the bitrate of the GoPro in streaming mode is
  • Having a portable network that supports IPSec or OpenVPN to work with AWS-offered VPNs.
    • By adopting cloud computing, I don’t need to carry a computer with me or need to maintain a machine’s uptime for processing the video streams

AWS Components

  • Site-to-site or client VPN from the network that the GoPro is connected to into a VPC in AWS.
    • The VPC is located in the closest region to me to reduce latency.
    • As RTMP is unsecured, I need an alternative method to protect the stream in transit
  • Stream the GoPro footage into Elemental MediaLive RTMP Inputs
  • Send GPS updates from my mobile phone to API Gateway
    • AWS Batch or EC2 will generate overlay images
  • API Gateway and Lambda expose functions for my “remote” to control the stream.

The Technical Journey I’ll Undergo

Choosing the right equipment for the job

To start, I’ll be looking at the bigger picture and choosing the right physical equipment that covers my needs of a long lasting battery life, portable networking while having the least weight on the bike.

Exploring Configuration for IPSec & OpenVPN

I will explore the configuration of IPSec and OpenVPN for getting my GoPros into the AWS network.

Exploring MediaLive and the Elemental Suite

I’ll be investigating what MediaLive and the Elemental suite have to offer when it comes to streaming for my use-case and designing the optimal streaming pipeline for my rides.

Developing APIs for MediaLive and Exploring API Gateway

I will develop APIs for MediaLive and explore API Gateway for controlling the stream on the road. I’ll have to define the API specification and decide on the appropriate language to write the Lambda functions in.

Exploring AWS Batch or EC2 for On-Demand Element Rendering

To render overlay elements on the fly, I will explore AWS Batch or EC2. As the Elemental suite doesn’t cover this specific case of on-screen graphics (to the level of customization I want), I’d need to explore the best tool for the job that scales well.

Determining Which Codecs to Adopt and Where to Stream To

I will investigate the best way to distribute my stream, and how to conform my MediaLive channel pipeline to the target.

Storing All Configuration as Code

To ensure that the setup is reproducible and easy to maintain, I will store all the configuration as code using Terraform and Ansible.

I’ll also be travelling on a creative journey during the process, such as designing the graphics, choosing the best shots for the stream and improving the experience for the user.

Deploying Region-locked AWS Organizations using Terraform https://zai.dev/2024/10/07/deploying-region-locked-aws-organizations-using-terraform/ Mon, 07 Oct 2024 12:30:03 +0000 https://zai.dev/?p=960

As a solutions architect, I was tasked with building an AWS Organizations hierarchy for a Canadian startup that needed to comply with local laws and enable multi-site configurations for networking.

To get started, I built an AWS Organizations hierarchy using Terraform. I chose Terraform because it allows me to use the same workflow for building organizations across multiple clouds. This post will focus on building an Organizational Unit (OU) tree for regions and localities.

To create OUs, I have a “basic” Terraform module that is a wrapper on the aws_organizations_organizational_unit resource. To make it reusable, I expose the name and parent. I then specialize the “basic” Terraform module into ones more specific to each organization by injecting tags and appending a postfix to the name of the OU, such as the region or locality.

For compliance, I restrict at the OU level which regions can be used by the AWS account and by any IAM users assuming a role in that account. I use a Service Control Policy (SCP) to deny access to all regions except for those specified in the local.regions value. Because a lot of core AWS infrastructure, such as Billing, is located within us-east-1 and us-east-2, I always need to include them in the local.regions value.

Since I need to cater for both compliance and multi-site, I used my modules to build the OUs in the following hierarchy:

  • Root Organization
    • Region OU (e.g: North America)
      • Country OU (e.g: Canada)
        • Locality OU (e.g: Vancouver)

And with Terraform modules structured in the following way:

  • Root Organization
    • Client Module
      • Client Root Organizational Unit
      • Region / Country / Locality Module
        • Base OU Module
          • Region / Country / Locality Organizational Unit
        • Region Policies
          • SCP Policy

In the case of Vancouver, while the Seattle local zone or us-west-2 region is closer, it’s not located within Canada which may be a problem when looking at local labor laws and compliance, so Calgary (ca-west-1) is the next best thing. I’m waiting for the Vancouver local zone to become publicly available so that I can use that, but it will fall under Calgary anyway.

This means that my SCPs restrict the organizational units to the following regions:

  • North American OU
    • us-west-1, us-west-2, us-east-1, us-east-2, ca-central-1, ca-west-1
  • Canadian OU
    • us-east-1, us-east-2, ca-central-1, ca-west-1
  • Vancouver OU
    • us-east-1, us-east-2, ca-west-1
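
Because a Deny in a parent OU's SCP also applies to everything beneath it, a child OU can only meaningfully allow regions that its parent allows too. The sketch below checks that property for the three lists above; OU_REGIONS, OU_PARENT and regions_outside_parent are names invented for this example.

```python
# Allowed regions per OU, copied from the list above.
OU_REGIONS = {
    "North America": {"us-west-1", "us-west-2", "us-east-1", "us-east-2",
                      "ca-central-1", "ca-west-1"},
    "Canada": {"us-east-1", "us-east-2", "ca-central-1", "ca-west-1"},
    "Vancouver": {"us-east-1", "us-east-2", "ca-west-1"},
}

# Child OU -> parent OU.
OU_PARENT = {"Canada": "North America", "Vancouver": "Canada"}


def regions_outside_parent(ou_regions, ou_parent):
    """Map each child OU to the regions it allows but its parent denies."""
    result = {}
    for child, parent in ou_parent.items():
        extra = ou_regions[child] - ou_regions[parent]
        if extra:
            result[child] = sorted(extra)
    return result


print(regions_outside_parent(OU_REGIONS, OU_PARENT))
```

An empty result means each level only narrows the set of regions allowed by the level above it.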

Because of the hierarchy approach, I can have AWS accounts in parents with shared resources such as VPCs, Databases, S3 and EFS shares. This will be hugely beneficial when working across multiple sites.

My re-usable modules have the following structure:

Core Organizational Unit

This holds the default values for all OUs within the organization, where tags for example would be shared.

resource "aws_organizations_organizational_unit" "root" {
  name      = var.name
  parent_id = var.parent

  tags = var.tags
}

Inheriting the Basic OU into Locality, Country & Region OUs

I re-use the basic module to make it follow a strict naming and tag convention based on the context (e.g: locality, country and region). This module is for the context and not specifically the region in question. The region in question will then re-use this module.

This makes sure that the NA Region and European Region have the same fundamentals between them.

module "basic" {
  source = "../basic"
  parent = var.parent
  name = "${var.name} - ${var.locality}"
  tags = local.tags
}
module "policies" {
  source = "../../../regions/policies"
  policy_name = local.policy_name
  target_id = module.basic.id
  regions = var.regions
}

Inheriting the Context OU Module into literal regions

Here, I take the region context module and adapt it specifically to North America. The same logic applies to country and locality. This simply enforces that the tags and name of the OU contain the region and that the SCPs generated block all regions except the regions provided.

module "region" {
  source = "../../templates/organization/region"
  region = "North America"
  parent = var.parent
  name = var.name
  tags = local.tags
  regions = [
    "us-east-1",
    "us-east-2",
    "us-west-1",
    "us-west-2",
    "ca-central-1",
    "ca-west-1"
    ]
}

Generating the SCPs from Terraform

policy_name here is the same as the name of an OU with spaces removed. Since the SCP works through a Deny statement, the StringNotEquals test is needed so that the Deny matches every region except those listed.

data "aws_iam_policy_document" "region_restriction" {
  statement {
    sid = "RestrictRegionFor${var.policy_name}"
    effect    = "Deny"
    actions   = ["*"]
    resources = ["*"]

    condition {
      test = "StringNotEquals"
      variable = "aws:RequestedRegion"
      values = local.regions
    }
  }
}
resource "aws_organizations_policy" "region_restriction" {
  name    = "RestrictRegionFor${var.policy_name}"
  content = data.aws_iam_policy_document.region_restriction.json
  type = "SERVICE_CONTROL_POLICY"
}
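
The definition of local.regions is not shown in this post. Assuming it appends us-east-1 and us-east-2 to the regions passed into the module, the resulting policy document can be sketched in Python as follows. region_restriction_policy and ALWAYS_ALLOWED are hypothetical names, and the de-duplication step is an addition of this sketch (the North America policy further down lists us-east-1 and us-east-2 twice, which is harmless but noisy).

```python
import json

# Regions that are always allowed, per the reasoning about core AWS infrastructure.
ALWAYS_ALLOWED = ["us-east-1", "us-east-2"]


def region_restriction_policy(policy_name, regions):
    """Build an SCP that denies every action outside the allowed regions."""
    # dict.fromkeys removes duplicates while keeping the original order.
    allowed = list(dict.fromkeys(list(regions) + ALWAYS_ALLOWED))
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": f"RestrictRegionFor{policy_name}",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": allowed}
                },
            }
        ],
    }


policy = region_restriction_policy("ZAIVancouver", ["ca-west-1"])
print(json.dumps(policy, indent=2))
```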

Declaring a Regional OU for an Organization

Finally, I can use the North American OU to declare an OU that restricts any AWS Accounts inside to only create resources within North America.

module "region-na" {
  source = "../regions/north-america"
  parent = aws_organizations_organizational_unit.root.id
  name = var.name
  tags = local.tags
}

I can do the same with Canada, and Vancouver.

module "region-ca" {
  source = "../regions/north-america/canada"
  parent = module.region-na.id
  name = var.name
  tags = module.region-na.tags
}

module "region-yvr" {
  source = "../regions/north-america/canada/vancouver"
  parent = module.region-ca.id
  name = var.name
  tags = module.region-ca.tags
}

By the end of the deployment, my hierarchy looks like the following:

And the attached SCPs look like the following. Because an explicit Deny at any level of the hierarchy applies to every account beneath it, an AWS account is effectively limited to the regions allowed by its closest (most restrictive) OU:

ZAI – North America

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RestrictRegionForZAINorthAmerica",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "us-east-1",
            "us-east-2",
            "us-west-1",
            "us-west-2",
            "ca-central-1",
            "ca-west-1",
            "us-east-1",
            "us-east-2"
          ]
        }
      }
    }
  ]
}

ZAI – Canada

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RestrictRegionForZAICanada",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "ca-central-1",
            "ca-west-1",
            "us-east-1",
            "us-east-2"
          ]
        }
      }
    }
  ]
}

ZAI – Vancouver

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RestrictRegionForZAIVancouver",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "ca-west-1",
            "us-east-1",
            "us-east-2"
          ]
        }
      }
    }
  ]
}
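
To make the combined effect of the three policies concrete, here is a minimal Python sketch of how the Deny statements stack. It is a simplified model and not the AWS policy engine: it only considers the aws:RequestedRegion condition shown above, and the function name is_region_allowed is my own.

```python
# Simplified model of the SCPs above: each policy denies every request whose
# region is NOT in its list, so a request must pass every policy on the path
# from the top-level OU down to the account.
NORTH_AMERICA = {"us-east-1", "us-east-2", "us-west-1", "us-west-2",
                 "ca-central-1", "ca-west-1"}
CANADA = {"ca-central-1", "ca-west-1", "us-east-1", "us-east-2"}
VANCOUVER = {"ca-west-1", "us-east-1", "us-east-2"}


def is_region_allowed(region, scp_chain):
    """Return True when no SCP in the chain denies the requested region."""
    return all(region in allowed for allowed in scp_chain)


# An account under the Vancouver OU inherits all three SCPs.
vancouver_chain = [NORTH_AMERICA, CANADA, VANCOUVER]
print(is_region_allowed("ca-west-1", vancouver_chain))     # True
print(is_region_allowed("ca-central-1", vancouver_chain))  # False
```

An account placed directly under the Canada OU would be checked against only the first two sets, which is why ca-central-1 is usable there but not under Vancouver.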

Here it is in action, when exploring a region that is blocked by the SCP:

Idea: Adopting Serverless for Trading Operations https://zai.dev/2024/10/05/%f0%9f%92%a1-adopting-serverless-for-trading-operations/ Sat, 05 Oct 2024 21:20:54 +0000 https://zai.dev/?p=943

I’m not very into day-trading, but I see the potential in the market from time to time, so I came up with this idea to create an automated trading system using AWS services exclusively.

The system will use Lambda, Timestream, EventBridge, S3, SQS and SageMaker to create a serverless architecture for monitoring and trading on the stock market, using the Twelvedata and Coinbase APIs for pulling in market data and executing trades, respectively.

To start, I will use EventBridge schedules as an alternative to cron jobs to add symbols to an SQS queue for ingestion, which fits the serverless architecture. For FOREX, the schedule will run every hour; for crypto and the stock market, it will run every 15 minutes. This is a good balance, as I’m not a professional trader and don’t need to use too many API calls.
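
As a rough sketch of that scheduling step, the snippet below pairs each asset class with an EventBridge rate expression and builds the SQS message bodies a schedule would enqueue. The dictionary keys, the build_messages helper and the JSON message layout are assumptions for illustration, not a finished design.

```python
import json

# EventBridge rate expressions for each asset class, matching the intervals above.
SCHEDULES = {
    "forex": "rate(1 hour)",
    "crypto": "rate(15 minutes)",
    "stocks": "rate(15 minutes)",
}


def build_messages(asset_class, symbols):
    """Build one SQS message body per symbol for the ingestion queue."""
    # Assumed message layout: {"asset_class": ..., "symbol": ...}
    return [json.dumps({"asset_class": asset_class, "symbol": s}) for s in symbols]


for body in build_messages("forex", ["EUR/USD", "GBP/USD"]):
    print(SCHEDULES["forex"], body)
```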

I will have five Lambda functions:

  1. The first Lambda function will listen to the SQS queue and query Twelvedata for the mentioned symbols. It will then insert the data directly into Timestream.
  2. The second Lambda function will be triggered by an alert from Timestream when new data is available. For safety (and to start with), I will configure this alert to trigger hourly. The function will throw the data at the SageMaker model. If the model predicts a positive yield, the Lambda function will pass the symbol to the third Lambda function via another SQS queue.
  3. The third Lambda function will execute a transaction on Coinbase.
  4. The fourth Lambda function will monitor Twelvedata and Coinbase for hot & trending symbols and add them to the monitoring queue, triggered by another EventBridge schedule.
  5. The fifth Lambda function will create a *.csv dataset from the data within Timestream.
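
To give a feel for the first function, here is a minimal sketch of a handler that reads symbols from an SQS-triggered Lambda event. The Records/body layout is the standard shape of an SQS event; the JSON message format and the fetch_quote and store callables are assumptions standing in for the Twelvedata request and the Timestream write.

```python
import json


def handle_sqs_event(event, fetch_quote, store):
    """Process one SQS batch: look up each symbol and store the result.

    fetch_quote and store are hypothetical callables injected by the caller;
    in the real function they would wrap the Twelvedata API and Timestream.
    """
    processed = []
    for record in event.get("Records", []):
        # Assumed message format: {"symbol": "BTC/USD"}
        symbol = json.loads(record["body"])["symbol"]
        store(symbol, fetch_quote(symbol))
        processed.append(symbol)
    return processed


# Local dry run with in-memory stand-ins instead of real services.
sample_event = {"Records": [{"body": json.dumps({"symbol": "BTC/USD"})}]}
saved = {}
handle_sqs_event(sample_event, lambda s: {"symbol": s, "close": None}, saved.__setitem__)
print(saved)
```

Injecting the two callables keeps the handler testable without network access, which matters when the real dependencies are paid APIs.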

I will use Secrets Manager to securely store the API keys for Twelvedata and Coinbase.

I’m not an AI expert and don’t know much about the specifics of training a model, so I’ll be using the SageMaker Canvas feature to train the model. Canvas is the easiest way into training AI models that doesn’t require writing a Python script.

Finally, at the end of each day, I’ll extract a dataset from the Timestream database into a *.csv and store it in S3, then pass this file onto SageMaker for training. I’ll use one last EventBridge schedule to trigger this workflow.
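
The export step could look something like the sketch below, which turns query rows into CSV text before it is uploaded to S3. The column names and the rows_to_csv helper are assumptions; the real columns depend on how the Timestream table is laid out.

```python
import csv
import io


def rows_to_csv(rows, fieldnames=("time", "symbol", "close")):
    """Serialise query rows (dicts) into CSV text ready for upload to S3.

    The column names are assumptions; use whatever the Timestream query returns.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample_rows = [
    {"time": "2024-10-05T21:00:00Z", "symbol": "BTC/USD", "close": 62000.5},
]
print(rows_to_csv(sample_rows))
```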

Hopefully by following this approach, I’ll have a fully functioning market monitoring and trading system.

Outline: Extending a home network setup in AWS https://zai.dev/2024/09/22/outline-extending-a-home-network-setup-in-aws/ Sun, 22 Sep 2024 12:30:18 +0000 https://zai.dev/?p=929

As someone without a permanent base, I needed a secure and flexible cloud infrastructure that allowed me to spawn powerful machines when needed. To achieve this, I built an isolated network on AWS.

I began by creating a Terraform module that provisions the infrastructure needed, such as;

  • VPCs
  • Subnets
  • Routing tables
  • EC2 instances

While the module is tailored to AWS, I plan to keep the variable names consistent with other modules that re-create the setup for different cloud platforms, such as Exoscale.

The isolated network is centered around an EC2 instance, which acts as a router between a public VPC and a private VPC, similar to an at-home router. The EC2 instance has two ENI adapters, one attached to the public VPC and the other attached to the private VPC. The EC2 instance is running VyOS, which I configured using Ansible and the local-exec provisioner in Terraform upon creation.

data "aws_ami" "vyos" {
  most_recent = true

  filter {
    name   = "name"
    values = ["VyOS 1.4.0-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["679593333241"]
}

resource "aws_instance" "vyos" {
  ami               = data.aws_ami.vyos.id
  availability_zone = data.aws_availability_zones.region.names[0]
  instance_type     = "t3.small"

  # Public-facing interface (the "WAN" side)
  network_interface {
    network_interface_id = aws_network_interface.public.id
    device_index         = 0
  }

  # Private interface (the "LAN" side)
  network_interface {
    network_interface_id = aws_network_interface.local.id
    device_index         = 1
  }

  # Configure VyOS with Ansible once the instance exists
  provisioner "local-exec" {
    command = "ansible-playbook -i \"${aws_eip.public.public_ip},\" <path_to_playbook>"
  }
}
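
The trailing comma after the IP address in the -i argument is deliberate: it makes Ansible treat the value as an inline host list rather than a path to an inventory file. The sketch below builds the same command in Python to make that explicit; the ansible_command helper, the address and the playbook name are placeholders of my own.

```python
import shlex


def ansible_command(host, playbook):
    """Build the ansible-playbook argv for a single host without an inventory file."""
    # The trailing comma makes Ansible read the value as a host list,
    # not as a path to an inventory file.
    return ["ansible-playbook", "-i", f"{host},", playbook]


# 203.0.113.10 is a documentation address and router.yaml a hypothetical playbook.
print(shlex.join(ansible_command("203.0.113.10", "router.yaml")))
```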

The public VPC has an internet gateway attached to it, and all instances in the public VPC have internet access. The router instance is the only instance that resides in the public VPC. Both VPCs have a subnet within a single availability zone (AZ), as a single EC2 instance cannot span two AZs.

resource "aws_internet_gateway" "gw" {
}

resource "aws_internet_gateway_attachment" "gw" {
  internet_gateway_id = aws_internet_gateway.gw.id
  vpc_id              = aws_vpc.public.id
}

Each VPC has a routing table to correctly route traffic. The public VPC routes all traffic towards the internet gateway, while the private VPC’s table only carries the default local route, so instances inside it can reach each other.

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.public.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.gw.id
  }
}

resource "aws_route_table" "internal" {
  vpc_id = aws_vpc.internal.id
}

resource "aws_route_table_association" "internal" {
  subnet_id      = aws_subnet.internal.id
  route_table_id = aws_route_table.internal.id
}

resource "aws_route_table_association" "public" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}

I connect to my isolated network primarily through my OpenWRT-based router using WireGuard. I also use the WireGuard client on my Mac or phone to connect to the cluster when I’m outside. Keep an eye out for my posts detailing how I deploy VyOS on AWS and configure OpenWRT to connect to WireGuard.

I attached an Elastic IP to the router instance, which lets me destroy and re-build the instance without issue. This is useful when I don’t need the network running, like when I’m flying, or when I’m actively improving the instance.

resource "aws_eip" "public" {
  domain = "vpc"
}

resource "aws_eip_association" "public" {
  network_interface_id = aws_network_interface.public.id
  allocation_id        = aws_eip.public.id
}

If I need to access any other AWS resource, I add a VPC Endpoint for that resource directly to the private VPC. For example, I use S3FS to mount S3 storage directly on the instance and DynamoDB for building JSONL files for machine learning tasks.

Use Cases

I create a Windows instance inside the subnet when I need to do remote work that involves large downloads while I’m outdoors. I also create larger instances for working with AI/Machine Learning models when my Mac isn’t able to load them or when I don’t have enough local storage at a given time.

Multi-Region Setup

To transfer the setup to another region, I simply change the region variable in my Terraform module, and it magically appears in the new region.

Overview: Extending my home network to the cloud https://zai.dev/2024/09/11/overview-extending-my-home-network-to-the-cloud/ Wed, 11 Sep 2024 22:02:52 +0000 https://zai.dev/?p=915

As a frequent traveller, I found it impractical to maintain a physical system infrastructure, so I relocated my home infrastructure to the cloud.

Establishing a VPN Connection

To begin, I set up a VPN connection from my OpenWRT router to the cloud provider using WireGuard. I created two VPCs in the cloud provider – one public and one private – to mimic the “WAN-LAN” scenario of at-home routers.

This setup provides isolation similar to a home network, where the resources on the private network can only be accessed by other resources on the same network, but are still able to communicate with the outside world.

The intention is to have the private network as an extension to my “home” (at any given time).

Deploying a Cloud Router

I deployed a virtual machine that will act as a router spanning both networks. This needs to be across both networks as I need an endpoint to connect to (which requires an internet-exposed network) while still being able to access private resources.

I chose VyOS as the cloud router’s operating system because it is configuration-driven, allowing for an Infrastructure-as-Code (IaC) approach for easy re-deployment on any cloud provider.

Utilizing Object Storage for Plex Media Server

I adopted object storage to take advantage of the “unlimited” data offered by the cloud provider, and configured s3fs to mount the object storage on a specific node. With this, Plex can access data directly from the object storage bucket without many configuration changes or plugins to Plex.

The VPN connection allows me to access the Plex server securely as if it were local on both my PS5 and laptop. This setup ensures that the Plex interface remains non-accessible to the public and bypasses the bandwidth limit when proxying via the official Plex servers.

Securely Pushing Metrics from In-House Devices

By using the VPN connection, I can push metrics from my in-house devices, such as weather sensors, directly without exposing my Prometheus instance to the public internet.

The VPN’s security layer wraps around all traffic, eliminating the need for implementing a CA chain for Prometheus when using platforms such as AWS IoT or Grafana Cloud (where devices are expected to communicate with a public HTTPS endpoint).

Automating At-Home Devices with HomeAssistant

I use HomeAssistant within the cloud provider to automate my at-home devices without worrying about downtime or maintaining a device inside my home. HomeAssistant is scriptable, easily re-deployable, and can bridge a wide range of IoT devices under a single platform, such as HomeKit and Hue.

I can now utilize my old infrastructure without worrying about maintaining hardware, and plan to deploy many services to the private cloud. Keep an eye out for a deeper breakdown on how I deployed and configured each element of my private cloud.

Using AWS CodeBuild to execute Ansible playbooks https://zai.dev/2024/04/06/using-aws-codebuild-to-execute-ansible-playbooks/ Sat, 06 Apr 2024 19:31:19 +0000 https://zai.dev/?p=600

I wanted a clean and automatable way to package third party software into *.deb format (and multiple others, if needed, in the future), and I had three ways to achieve that;

  • The simple way: Write a Bash script
  • The easy way: Write a Python script
  • My chosen method: Write an Ansible role

While any of the options could get me where I wanted, the Ansible route felt a lot cleaner: I can clearly state (and see) which packages I am building, either from the command line or from a playbook, rather than maintaining a separate configuration file in yet another format to drive what to build and where, as the Bash or Python approaches would need.

The playbook approach also allows me to monitor and execute a build on a remote machine, should I wish to build cross-platform or need larger resources for testing.

In this scenario, I’ll be executing the Ansible role locally on the CodeBuild instance.

Configuring the CodeBuild Environment

Using GitHub as a source

I have one git repository per Ansible playbook, so by linking CodeBuild to the repository in question I’m able to (eventually) automatically trigger the execution of CodeBuild upon a pushed commit on the main branch.

The only additional setting under sources that I define is the Source version, as I don’t want build executions happening for all branches (as that can get costly).

CodeBuild Environment

For the first iteration of this setup, I am installing the (same) required packages at every launch. This is not the best way to handle pre-installation in terms of cost and build speed. In this instance, I’ve chosen to ignore this and “brute-force” my way through to get a proof-of-concept.

  • Provisioning Model: On-demand
    • I’m not pushing enough packages to require a dedicated fleet, so spinning up VMs in response to a pushed commit (~5 times a week) is good enough.
  • Environment Image: Managed Image
    • As stated above, I had my focus towards a proof-of-concept that running Ansible under CodeBuild was possible. A custom image with pre-installed packages is the way to go in the long run.
  • Compute: EC2
    • Since I’m targeting *.deb format, I choose Ubuntu as the operating system. The playbook I’m expecting to execute doesn’t require GPU resources either.
    • AWS Lambda doesn’t support Ubuntu, nor can it execute Ansible (directly). I’d have to write a wrapper in Python that executes the Ansible playbook, which is more overhead.
    • Depending on the build time and size of the result package, I had to adjust the memory required accordingly. However, this may be because I’m making use of the /tmp directory by default.

buildspec.yml

I store the following file at the root level of the same Git repository that contains the Ansible playbook.

version: 0.2

phases:
  pre_build:
    commands:
      - apt install -y ansible python3-botocore python3-boto3
      - ansible-galaxy install -r requirements.yaml
      - ansible-galaxy collection install amazon.aws
  build:
    commands:
      - ansible-playbook build.yaml
artifacts:
  files:
    - /tmp/*.deb

As stated above, I’m always installing the required System packages prior to interacting with Ansible. This line (apt install) should be moved into a pre-built image that this CodeBuild environment will then source from.

I keep the role (and therefore, tasks) separate from the playbook itself, which is why I use ansible-galaxy to install the requirements. Each time the session is started, it pulls down a fresh copy of any requirements. This can differ from playbook to playbook.

I use the role for the execution steps, and the playbook (or inventory) to hold the settings that influence the execution, such as (in this scenario) what the package name is and how to package it.

I explicitly include the amazon.aws Ansible collection in this scenario as I’m using the S3 module to pull down sources (or builds of third party software) and to push built packages up to S3. I’m doing this via Ansible rather than storing the files in Git due to their size, and rather than using CodeDeploy as I don’t plan on deploying the packages to infrastructure but to a repository.

I also had some issues using the Artifacts option within CodeBuild, which led me to push from Ansible.

Finally, ansible-playbook can be executed once all the prerequisites are installed. The only adaptation needed at the playbook level is that localhost is listed as a target. This ensures that the playbook will execute on the local machine.

---
- hosts: localhost

Once all the configuration and repository setup is done, the build executed successfully and I received my first Debian package via CodeBuild using Ansible.

Building a PKI using Terraform https://zai.dev/2024/02/24/building-a-pki-using-terraform/ Sat, 24 Feb 2024 21:09:08 +0000 https://zai.dev/?p=568

As part of building a hybrid infrastructure, I explored different technologies for achieving a stable VPN connection from on-premises to the AWS infrastructure and found AWS Client VPN, the client-to-site feature nested within AWS VPC. I explored this prior to AWS Site-to-Site VPN as I didn’t have the right setup for handling IPSec/L2TP tunnels at the time, and already had OpenVPN handy on my MacBook.

Since I would be using OpenVPN (As that’s what AWS Client VPN uses), I require TLS certificates as a method of authentication and encryption. While AWS provides certificate management features, it does have a cost, making it less suitable for my testing requirements.

I’ve opted to use Terraform to create a custom PKI solution locally, and to prepare for the re-use in larger infrastructure projects.

Working Environment

  • Machine
    • MacBook Pro M2
  • Technologies
    • Terraform

Terraform Module Breakdown

terraform {
  required_providers {
    tls = {
      source = "hashicorp/tls"
    }
    pkcs12 = {
      source = "chilicat/pkcs12"
    }
  }
}

I’m making use of the following providers within my Terraform project

  • hashicorp/tls
    • For generating the private keys, certificate requests and certificates themselves
  • chilicat/pkcs12
    • For combining the private key & certificate together, a requirement for using the OpenVPN client without embedding the data inside the *.ovpn configuration file (which didn’t come out-of-the-box from AWS)

/**
 * Private key for use by the self-signed certificate, used for
 * future generation of child certificates. As long as the state
 * remains unchanged, the private key and certificate should not
 * re-update at every re-run unless any variable is changed.
 */
resource "tls_private_key" "pem_ca" {
  algorithm = var.algorithm
}

I’ve made the algorithm of the private key controllable from a global variable, as customer requirements may call for a different algorithm. This resource returns a PEM-formatted key.

/**
 * Generation of the CA Certificate, which is in turn used by
 * the client.tf and server.tf submodules to generate child
 * certificates
 */
resource "tls_self_signed_cert" "ca" {
  private_key_pem = tls_private_key.pem_ca.private_key_pem
  is_ca_certificate = true

  subject {
    country             = var.ca_country
    province            = var.ca_province
    locality            = var.ca_locality
    common_name         = var.ca_cn
    organization        = var.ca_org
    organizational_unit = var.ca_org_name
  }

  validity_period_hours = var.ca_validity

  allowed_uses = [
    "digital_signature",
    "cert_signing",
    "crl_signing",
  ]
}

I then used the tls_self_signed_cert resource to generate the CA certificate itself, providing the private key generated prior into the private_key_pem attribute. Again, by providing global variables for the CA subject and validity, I’m able to re-run the same Terraform module for multiple clients under different workspaces (or by referencing it from larger modules).

The subject fields I decided to expose are a way to describe exactly what the TLS certificate is for and who it belongs to without needing to dive back into the module.

By adding cert_signing and crl_signing to the allowed_uses list, it adds permissions to the certificate for signing child certificates. This is essential as I would still need to generate the certificates for the OpenVPN server and the client.

This resource returns a PEM-formatted certificate.

/**
 * Return the certificate itself. It's the responsibility of
 * the user of this module to determine whether the certificate should
 * be stored locally, transferred or submitted directly to a cloud
 * service
 */
output "ca_certificate" {
  value = tls_self_signed_cert.ca.cert_pem
  sensitive = true
  description = "generated ca certificate"
}

Finally, I return the CA certificate and its key from the module for the user to place where they need to be, for example;

To a local file

resource "local_file" "ca_key" {
  # The module returns PEM text, so use content rather than content_base64
  content  = module.pki.ca_private_key
  filename = "${path.module}/certs/ca.key"
}

resource "local_file" "ca" {
  content  = module.pki.ca_certificate
  filename = "${path.module}/certs/ca.crt"
}

To the AWS Certificate Manager

resource "aws_acm_certificate" "ca" {
  private_key = module.pki.ca_private_key
  certificate_body = module.pki.ca_certificate
}

Server & Client Certificates

# Named csr_client / pem_client to match the references in the later snippets
resource "tls_cert_request" "csr_client" {
  for_each        = var.clients # or var.servers
  private_key_pem = tls_private_key.pem_client[each.key].private_key_pem
    # or the server equivalent
  dns_names       = [each.key]

  subject {
    # try() returns the first argument that evaluates without an error
    country             = try(each.value.country, var.default_client_subject.country, var.default_subject.country)
    province            = try(each.value.province, var.default_client_subject.province, var.default_subject.province)
    locality            = try(each.value.locality, var.default_client_subject.locality, var.default_subject.locality)
    common_name         = try(each.value.cn, var.default_client_subject.cn, var.default_subject.cn)
    organization        = try(each.value.org, var.default_client_subject.org, var.default_subject.org)
    organizational_unit = try(each.value.ou, var.default_client_subject.ou, var.default_subject.ou)
  }
}

Regardless of whether I’m generating a server or client TLS certificate, both need to go through the ‘certificate request’ process, which is to;

  1. Generate a private key for the server or client
  2. Generate a certificate signing request (CSR) based on the private key
  3. Use the CSR to get a CA-signed certificate

In this example, I made use of the try function to achieve a value priority in the following order;

  1. Resource-level
    • Do I have a value specific to the server or client?
  2. Class-level
    • Do I have a value specific to the target type?
  3. Module-level
    • Do I have a global default?
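
The same priority order can be sketched outside Terraform. The Python below mirrors the three-level fallback for a single subject field; the resolve_subject_field helper and the sample default values are hypothetical and only illustrate the lookup order.

```python
def resolve_subject_field(field, resource, class_defaults, module_defaults):
    """Return the first value found at resource, class, then module level."""
    for source in (resource, class_defaults, module_defaults):
        if field in source:
            return source[field]
    raise KeyError(field)


mbp = {"country": "GB", "org": "ZAI"}            # resource-level values
default_client_subject = {"locality": "London"}  # class-level (hypothetical)
default_subject = {"country": "US", "locality": "N/A", "cn": "example"}  # module-level (hypothetical)

for name in ("country", "locality", "cn"):
    print(name, resolve_subject_field(name, mbp, default_client_subject, default_subject))
```

As with try in Terraform, a missing key at one level simply falls through to the next, and only a field missing everywhere is an error.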

In the resource above, each refers to a key / value pair that has the same shape for clients as it does for servers, where the key is the machine name and the value is the subject data. Here is a sample of the *.tfvars.json file that drives this behaviour.

{
  "clients": {
    "mbp": {
      "country": "GB",
      "locality": "GB",
      "org": "ZAI",
      "org_name": "ZAI",
      "province": "GB"
    }
  }
}

In an ideal (and secure) scenario, private keys should never be transmitted over the wire; instead, each machine generates a CSR and transmits that. Since this is aimed at test environments, security is not a concern for me. Should I want to do the generation securely, I’ve exposed the following variable as a way to override the CSR generation.

variable "client_csrs" {
  type = map(string) # machine name => PEM-encoded CSR
  description = "csrs to use instead of generating them within this module"
  default = {}
}

Getting the signed certificate

resource "tls_locally_signed_cert" "client" {
  for_each = var.clients
  cert_request_pem = tls_cert_request.csr_client[each.key].cert_request_pem
  ca_private_key_pem = tls_private_key.pem_ca.private_key_pem
  ca_cert_pem = tls_self_signed_cert.ca.cert_pem

  validity_period_hours = var.client_certificate_validity

  allowed_uses = [
    "digital_signature",
    "key_encipherment",
    "server_auth", # for server-side
    "client_auth", # for client-side
  ]
}

Once the *.csr is generated (or provided), I’m able to use the tls_locally_signed_cert resource type to sign it with the CA certificate and the CA’s private key. The cert_request_pem, ca_private_key_pem and ca_cert_pem inputs accept raw PEM data, so nothing needs to be saved to disk before being passed in.

Relying on the data within the terraform state file allows me to also rule out any “external influence” when troubleshooting, as there will be only a single source of truth.

Adding either server_auth or client_auth (depending on use-case) to allowed_uses permits the use of the signed certificate for authentication, as required by OpenVPN.

Converting from *.PEM to PKCS12

resource "pkcs12_from_pem" "client" {
  for_each = var.clients
  ca_pem          = tls_self_signed_cert.ca.cert_pem
  cert_pem        = tls_locally_signed_cert.client[each.key].cert_pem
  private_key_pem = tls_private_key.pem_client[each.key].private_key_pem
  password = "123" # Testing purposes
  encoding = "legacyRC2"
}

Using the pkcs12_from_pem resource type from chilicat makes this process simple, as long as I have access to the private key in addition to the certificate and the CA certificate.

For compatibility with the OpenVPN Connect application, I needed to enforce the encoding of legacyRC2, rather than the modern encryption that’s offered by easy-rsa.

Returning the certificates

output "client_certificates" {
  value = [ for cert in tls_locally_signed_cert.client : cert.cert_pem ]
  description = "generated client certificates in ordered list form"
  sensitive = true
}

Finally, I return the generated certificates and their *.p12 equivalent from the module. I mark this data as sensitive due to the inclusion of private keys.

For the value, I needed to iterate over a map of resources (as I had used the for_each argument earlier to handle key/value pairs) and re-build a single list with the result.

As mentioned above, it is then the responsibility of the user to determine what to do with the generated certificates, be it storing them locally or pushing them to AWS.
