DevOps – Zak Abdel-Illah
https://zai.dev
Automation Enthusiast

Deploying AWS Site-to-Site on OpenWRT
https://zai.dev/2024/12/10/deploying-aws-site-to-site-configurations-to-openwrt/
Tue, 10 Dec 2024 16:33:06 +0000

I want to connect to resources on AWS from my home with the least operational overhead, which led me to deploy an AWS Site-to-Site VPN between my home network and a VPC.

The Environment

Some resources I want to access are;

  • g4dn.xlarge EC2 instances used for streaming games
  • t2.micro EC2 instances hosting Home Assistant
  • RDS (PostgreSQL) instances for hosting OpenStreetMap data

Home Environment

When setting up a connection from AWS to my home, I have to consider the following specifications;

  • I live in West London, relatively close to the eu-west-2 data center
    • I have a VPC in eu-west-2 running on the 10.1.0.0/16 network
  • I use a publicly-offered ISP for accessing the internet
  • There are two hops (routers) between the public internet and my home network
    • The first hop is the router provided by the ISP to connect to the internet
      • This network lives on the 192.168.0.0/24 subnet
    • The second hop is my own off-the-shelf router from ASUS running OpenWRT
      • My home network lives on the 10.0.0.0/24 subnet
      • The router has 8MB of usable storage for packages and configuration

Setting up AWS Site-to-Site

AWS Site-to-Site is one of Amazon’s offerings for bridging an external network to a VPC over the public internet. Some alternatives are;

  • AWS Client VPN (based on OpenVPN)
    • More expensive
    • More complex, often tends to be slower without optimization
  • Self-managed VPN
    • Allows use of any VPN technology, such as Wireguard
    • Allows custom metric monitoring
    • Requires management of VPC topologies and firewalls
    • Can be more expensive

I chose Site-to-Site on this occasion so I could learn how IPSec works in more detail, and I saw deploying it to OpenWRT as a challenge. It’s also a lot cheaper than a firewall license, EC2 rental and public IP charges.

Deploying a Virtual Private Gateway

A Virtual Private Gateway is the AWS-side endpoint of an IPSec tunnel. It also hosts the configuration of the local BGP instance, and is what drives the propagation of routes between the IPSec tunnels and the VPC routing tables.

Dashboard view of the Virtual Private Gateway. I rely on an ASN generated by Amazon for this instance.
resource "aws_vpn_gateway" "main" {
  vpc_id = data.aws_vpc.main.id
}

There’s not much to configure with the VPG, so I left it with its defaults.

Deploying a Customer Gateway

A customer gateway represents the local end of the IPSec tunnel and the BGP daemon running on it. In my case, this is the OpenWRT router.

Dashboard view of the customer gateway, which represents my OpenWRT Router itself and the BGP daemon running on it. AWS by default provides an ASN of 65000, but I don’t have any need to customize it.
resource "aws_customer_gateway" "openwrt" {
  bgp_asn    = 65000
  ip_address = "<WAN Address of OpenWRT>"
  type       = "ipsec.1"
}

Deploying a Site-to-Site VPN Connection

The VPN Connection itself is what connects a VPG (AWS Endpoint) to a customer gateway (Local endpoint) in the form of an IPSec VPN connection.

Dashboard view of the Site-to-Site VPN. Everything is left as the default. For the purposes of building the automated workflow and testing connectivity, local and remote network CIDRs are 0.0.0.0/0
resource "aws_vpn_connection" "main" {
  customer_gateway_id = aws_customer_gateway.openwrt.id
  vpn_gateway_id      = aws_vpn_gateway.main.id
  type                = aws_customer_gateway.openwrt.type

  tunnel1_ike_versions = ["ikev1", "ikev2"]
  tunnel1_phase1_dh_group_numbers = [2, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
  tunnel1_phase1_encryption_algorithms = ["AES128", "AES128-GCM-16", "AES256", "AES256-GCM-16"]
  tunnel1_phase1_integrity_algorithms = ["SHA1", "SHA2-256", "SHA2-384", "SHA2-512"]
  tunnel1_phase2_dh_group_numbers = [2, 5, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
  tunnel1_phase2_encryption_algorithms = ["AES128", "AES128-GCM-16", "AES256", "AES256-GCM-16"]
  tunnel1_phase2_integrity_algorithms = ["SHA1", "SHA2-256", "SHA2-384", "SHA2-512"]

}

These (tunnel1_*) are the default values set by AWS and should be locked down. For the purpose of testing, I left them all at their defaults. These settings are directly tied to the IPSec encryption settings described below.

Connecting OpenWRT via IPSec

Ansible Role Variables

I’ve designed my Ansible role to be able to configure AWS IPSec tunnels with the bare minimum configuration. All information that the role requires is provided by Terraform upon provisioning of the AWS Site-to-Site configuration.

bgp_remote_as: "64512"
ipsec_tunnels:
  - ifid: "301"
    name: xfrm0
    inside_addr: <Inside IPv4>
    gateway: <Endpoint Tunnel 1>
    psk: <PSK Tunnel 1>
  - ifid: "302"
    name: xfrm1
    inside_addr: <Inside IPv4>
    gateway: <Endpoint Tunnel 2>
    psk: <PSK Tunnel 2>
  • bgp_remote_as refers to the ASN of the Virtual Private Gateway, and is strictly used by the BGP Daemon offered by Quagga.
    • BGP is used to propagate routes to-and-from AWS.
    • When a Site-to-Site VPN is configured to use Dynamic routing, it will state that the tunnel is Down if AWS cannot reach the BGP instance.
  • ipsec_tunnels is used by XFRM and strongSwan to;
    • Build one XFRM interface per-tunnel
    • Build one alias interface bound to each XFRM interface for static routing
    • Configure the static routing of the XFRM interfaces
    • Configure the BGP daemon neighbours
    • Configure one IPSec endpoint per-tunnel
    • Configure one IPSec tunnel for each XFRM interface
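
As a rough sketch of how the Terraform outputs could be shaped into the ipsec_tunnels variable above: the function below takes the attributes exported by the aws_vpn_connection resource (tunnel1_address, tunnel1_preshared_key, tunnel1_inside_cidr and their tunnel2 equivalents) as a plain dictionary. The function name, the starting ifid of 301 and the sample values are assumptions made for illustration, not part of my role.

```python
def build_ipsec_tunnels(vpn: dict, first_ifid: int = 301) -> list:
    """Shape aws_vpn_connection attributes into the role's ipsec_tunnels list."""
    tunnels = []
    for index in (1, 2):  # a Site-to-Site VPN connection provides two tunnels
        tunnels.append({
            "ifid": str(first_ifid + index - 1),
            "name": f"xfrm{index - 1}",
            "inside_addr": vpn[f"tunnel{index}_inside_cidr"],
            "gateway": vpn[f"tunnel{index}_address"],
            "psk": vpn[f"tunnel{index}_preshared_key"],
        })
    return tunnels


# Placeholder values only (documentation IP range, fake keys)
example = {
    "tunnel1_inside_cidr": "169.254.211.44/30",
    "tunnel1_address": "203.0.113.10",
    "tunnel1_preshared_key": "psk-one",
    "tunnel2_inside_cidr": "169.254.181.60/30",
    "tunnel2_address": "203.0.113.20",
    "tunnel2_preshared_key": "psk-two",
}
ipsec_tunnels = build_ipsec_tunnels(example)
```

The resulting list has the same keys as the YAML above, so it could be dumped to a vars file or passed as extra vars.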

Required packages

This workflow needs three packages to function, plus a fourth for debugging security association errors.

  • strongswan-full
    • strongSwan provides an IPSec implementation for OpenWRT with full support for UCI. The -full variation of the package is overkill, but you never know!
  • quagga-bgpd
    • A BGP implementation light enough to run on OpenWRT. quagga comes in as a dependency
  • luci-proto-xfrm
    • A virtual interface for use by IPSec, where a tunnel requires a vif to bind to.
  • ip-full
    • Provides the xfrm subcommand of ip (ip xfrm) for debugging IPSec connections.
name: install required packages
opkg:
  name: "{{ item }}"
loop:
  - strongswan-full
  - quagga-bgpd
  - luci-proto-xfrm
  - ip-full

Adding the XFRM Interface

OpenWRT LuCI dashboard, showing the final result of the interfaces tab. I declare two XFRM interfaces, one per VPN tunnel provided by AWS, each with an IPv4 assigned that matches the Inside IPv4 CIDRs defined within the AWS Site-to-site configuration. The IPv4 address is applied to an alias of the adapter rather than the adapter itself as the XFRM interface doesn’t support static IP addressing via UCI.
Ansible Task
name: configure xfrm adapter
uci:
  command: section
  key: network
  type: interface
  name: "{{ item.name }}"
  value:
    tunlink: 'wan'
    mtu: '1300'
    proto: xfrm
    ifid: "{{ item.ifid }}"
loop: "{{ ipsec_tunnels }}"
/etc/config/network – UCI Configuration
config interface 'xfrm0'
	option ifid '301'
	option tunlink 'wan'
	option mtu '1300'
	option proto 'xfrm'

config interface 'xfrm1'
	option tunlink 'wan'
	option mtu '1300'
	option proto 'xfrm'
	option ifid '302'

I use the uci task to deploy adapter configurations. I create one interface per-tunnel provided by AWS.

  • tunlink sets the IPSec tunnel to connect to & from the wan interface
  • mtu is 1300 by default, I didn’t need to configure this value
  • ifid is defined as strongSwan will use this to bind an IPSec tunnel to a network interface. This is separate from the name of the interface.

AWS needs to communicate with the BGP instance running on OpenWRT. The value of Inside IPv4 CIDR instructs AWS which IPs to listen on for their BGP instance, and which IP to connect to for fetching routes. The CIDRs will be restricted to the /30 prefix, which provides the range of 4 IP addresses, 2 of which are usable.

As an example, here is the Inside IPv4 CIDR of 169.254.181.60/30 and what that means.

IP Index | IP Address     | Responsibility
0        | 169.254.181.60 | Network address
1        | 169.254.181.61 | IP address reserved for the AWS side of the IPSec tunnel
2        | 169.254.181.62 | IP address reserved for the OpenWRT side of the IPSec tunnel
3        | 169.254.181.63 | Broadcast address

With this known, we know that;

On the AWS side of the IPSec tunnel | On the OpenWRT side of the IPSec tunnel
AWS has a BGP instance listening on the IP address at the first index (169.254.181.61) | OpenWRT needs to be configured to use the IP address at the second index (169.254.181.62)
AWS is expecting a BGP neighbour at the second index (169.254.181.62) | The BGP daemon running on OpenWRT needs to be configured to use the neighbour at the first index (169.254.181.61)
AWS knows how to route traffic across the 169.254.181.60/30 network | OpenWRT needs to know how to route traffic on the 169.254.181.60/30 network
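
The breakdown of a /30 can be checked with Python’s standard ipaddress module. This is only a sketch to illustrate the arithmetic; the function name and the dictionary keys are my own labels for this example.

```python
import ipaddress


def tunnel_addresses(inside_cidr: str) -> dict:
    """Split an Inside IPv4 CIDR into the four roles described above."""
    network = ipaddress.ip_network(inside_cidr)
    if network.prefixlen != 30:
        raise ValueError("expected a /30 Inside IPv4 CIDR")
    return {
        "network": str(network.network_address),      # index 0
        "aws_side": str(network[1]),                  # index 1
        "openwrt_side": str(network[2]),              # index 2
        "broadcast": str(network.broadcast_address),  # index 3
    }


addresses = tunnel_addresses("169.254.181.60/30")
```

Indexing the network with 1 and 2 picks the two usable addresses, which is the same selection the role makes when it assigns an address to the OpenWRT side.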

Configuring the IP Address on the IPSec tunnel

I create an alias on top of the originating XFRM interface so that I can utilize the static protocol within UCI to configure static routing in a declarative way.

Ansible Task
name: create xfrm alias for static addressing
uci:
  command: section
  key: network
  type: interface
  name: "{{ item.name }}_s"
  value:
    proto: static
    ipaddr:
      - "{{ item.inside_addr | ipaddr('net') | ipaddr(2) }}"
    device: "@{{ item.name }}"
loop: "{{ ipsec_tunnels }}"
/etc/config/network – UCI Configuration
config interface 'xfrm0_s'
	option proto 'static'
	option device '@xfrm0'
	list ipaddr '169.254.211.46/30'

config interface 'xfrm1_s'
	option proto 'static'
	list ipaddr '169.254.181.62/30'
	option device '@xfrm1'

I use ipaddr('net') | ipaddr(2) to simplify my Ansible configuration. Given an inside_addr of 169.254.181.60/30, ipaddr('net') confirms the value is a network and ipaddr(2) selects the address at index 2 within it, giving the result of 169.254.181.62/30.

This will ensure two things;

  • The xfrm interface persistently holds the 169.254.181.62/30 IP Address
  • The Linux routing table holds a route of 169.254.181.60/30 via the xfrm interface

This resolves the issue of OpenWRT knowing what IP Address to use and how to route the traffic.

Setting up IPSec

Because I’m using strongSwan, I can also use UCI to configure the IPSec tunnel. With this workflow, IPSec configuration is broken down into three elements;

  • Endpoint
    • Primarily what’s known as “IKE Phase 1”. This is the “How I will connect to the other end”.
  • Tunnel
    • Primarily known as “IKE Phase 2”. This is the “How do I pass traffic through to the other end”.
  • Encryption
    • A set of rules to describe how to handle the cryptography.

IPSec Encryption

What’s defined here drives whether Phase 1 will succeed, and must match the AWS VPN Encryption settings.

Ansible Task
name: define ipsec encryption
uci:
  command: section
  key: ipsec
  type: crypto_proposal
  name: "aws"
  value:
    is_esp: '1'
    dh_group: modp1024
    encryption_algorithm: aes128
    hash_algorithm: sha1
/etc/config/ipsec – UCI Configuration
config crypto_proposal 'aws'
	option is_esp '1'
	option dh_group 'modp1024'
	option encryption_algorithm 'aes128'
	option hash_algorithm 'sha1'

In my case, I’m;

  • Using AES128 for encryption of the traffic
  • Using SHA1 as the integrity algorithm for ensuring packets are correct upon arrival
  • Naming the crypto_proposal aws for use by the Endpoint and the Tunnel

AES128 and SHA1 are supported by the configuration defined on the VPN configuration above.
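
To make the “must match” requirement concrete, here is a small sketch that checks a strongSwan-style proposal against the lists from the VPN connection above. The lookup tables only translate the handful of names used in this post (modp1024 is Diffie-Hellman group 2, modp1536 is group 5, modp2048 is group 14); the function and variable names are assumptions for illustration.

```python
# strongSwan keyword -> name or number used in the AWS tunnel options
ENCRYPTION = {"aes128": "AES128", "aes256": "AES256"}
INTEGRITY = {"sha1": "SHA1", "sha256": "SHA2-256", "sha384": "SHA2-384", "sha512": "SHA2-512"}
DH_GROUPS = {"modp1024": 2, "modp1536": 5, "modp2048": 14}


def proposal_accepted(proposal: dict, aws_options: dict) -> bool:
    """True when every element of the local proposal is offered by AWS."""
    return (
        ENCRYPTION[proposal["encryption_algorithm"]] in aws_options["encryption_algorithms"]
        and INTEGRITY[proposal["hash_algorithm"]] in aws_options["integrity_algorithms"]
        and DH_GROUPS[proposal["dh_group"]] in aws_options["dh_group_numbers"]
    )


# Phase 1 defaults copied from the aws_vpn_connection resource above
aws_phase1 = {
    "encryption_algorithms": ["AES128", "AES128-GCM-16", "AES256", "AES256-GCM-16"],
    "integrity_algorithms": ["SHA1", "SHA2-256", "SHA2-384", "SHA2-512"],
    "dh_group_numbers": [2, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
}
aws_proposal = {"encryption_algorithm": "aes128", "hash_algorithm": "sha1", "dh_group": "modp1024"}
matches = proposal_accepted(aws_proposal, aws_phase1)
```

If the AWS lists are later locked down, a check like this would flag a proposal that no longer matches before the tunnel fails to negotiate.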

Declaring the IPSec Endpoint

Ansible Task
name: configure ipsec remote
uci:
  command: section
  key: ipsec
  type: remote
  name: "{{ item.name }}_ep"
  value:
    enabled: "1"
    gateway: "{{ item.gateway }}"
    local_gateway: "<Public IP>"
    local_ip: "10.0.0.1"
    crypto_proposal:
      - aws
    tunnel:
      - "{{ item.name }}"
    authentication_method: psk
    pre_shared_key: "{{ item.psk }}"
    fragmentation: yes
    keyingretries: '3'
    dpddelay: '30s'
    keyexchange: ikev2
loop: "{{ ipsec_tunnels }}"
/etc/config/ipsec – UCI Configuration
config remote 'xfrm0_ep'
	option enabled '1'
	option gateway '<Tunnel 1 IP>'
	option local_gateway '<Public IP>'
	option local_ip '10.0.0.1'
	list crypto_proposal 'aws'
	list tunnel 'xfrm0'
	option authentication_method 'psk'
	option pre_shared_key '<PSK>'
	option fragmentation '1'
	option keyingretries '3'
	option dpddelay '30s'
	option keyexchange 'ikev2'

config remote 'xfrm1_ep'
	option enabled '1'
	option gateway '<Tunnel 2 IP>'
	option local_gateway '<Public IP>'
	option local_ip '10.0.0.1'
	list crypto_proposal 'aws'
	list tunnel 'xfrm1'
	option authentication_method 'psk'
	option pre_shared_key '<PSK>'
	option fragmentation '1'
	option keyingretries '3'
	option dpddelay '30s'
	option keyexchange 'ikev2'
  • The gateway is known as the Outside IP Address on AWS
  • local_gateway points to the WAN Address of OpenWRT
  • local_ip points to the LAN address of OpenWRT
  • crypto_proposal points to aws (Defined above)
  • tunnel points to the name of the interface that this IPSec endpoint represents.
    • Since there are two IPSec endpoints, two of these remotes are created. I use the interface name (from xfrm) across all duplicates to make sure that it’s visibly clear what’s being used where.
  • pre_shared_key is the PSK that gets generated (or set) within the VPN Tunnel.
    • This is unique per-tunnel, meaning that there should be two different PSKs per Site-to-site VPN connection. They can be found under the Modify VPN Tunnel Options selection.

Configuring the IPSec Tunnel

The tunnel instructs strongSwan how to bind the IPSec tunnel to an interface. The key here is the ifid of the XFRM interfaces defined earlier.

Ansible Task
name: configure ipsec tunnel
uci:
  command: section
  key: ipsec
  type: tunnel
  name: "{{ item.name }}"
  value:
    startaction: start
    closeaction: start
    crypto_proposal: aws
    dpdaction: start
    if_id: "{{ item.ifid }}"
    local_ip: "10.0.0.1"
    local_subnet:
      - 0.0.0.0/0
    remote_subnet:
      - 0.0.0.0/0
loop: "{{ ipsec_tunnels }}"
/etc/config/ipsec – UCI Configuration
config tunnel 'xfrm0'
	option startaction 'start'
	option closeaction 'start'
	option crypto_proposal 'aws'
	option dpdaction 'start'
	option if_id '301'
	option local_ip '10.0.0.1'
	list local_subnet '0.0.0.0/0'
	list remote_subnet '0.0.0.0/0'

config tunnel 'xfrm1'
	option startaction 'start'
	option closeaction 'start'
	option crypto_proposal 'aws'
	option dpdaction 'start'
	option if_id '302'
	option local_ip '10.0.0.1'
	list local_subnet '0.0.0.0/0'
	list remote_subnet '0.0.0.0/0'
  • Like the AWS configuration, I set local_subnet and remote_subnet to 0.0.0.0/0. This is so I can focus on testing connectivity.
  • if_id points to the XFRM interface that’s representing the tunnel in iteration.
    • The if_id must match the tunnel in iteration, as the Inside IPv4 CIDRs have been bound to an interface.

Configuring BGP on OpenWRT

In order to apply BGP routes on the AWS side, route propagation must be enabled at the routing table level. Otherwise, a static route pointing to my home network (10.0.0.0/24) via the Virtual Private Gateway must be declared.

I opted for Quagga when using BGP on OpenWRT.

router bgp 65000
bgp router-id {{ ipsec_inside_cidrs[0] | ipaddr('net') | ipaddr(2) | split('/') | first }}
{% for ipsec_inside_cidr in ipsec_inside_cidrs %}
neighbor {{ ipsec_inside_cidr | ipaddr('net') | ipaddr(1) | split('/') | first }} remote-as {{ bgp_remote_as }}
neighbor {{ ipsec_inside_cidr | ipaddr('net') | ipaddr(1) | split('/') | first }} soft-reconfiguration inbound
neighbor {{ ipsec_inside_cidr | ipaddr('net') | ipaddr(1) | split('/') | first }} distribute-list localnet in
neighbor {{ ipsec_inside_cidr | ipaddr('net') | ipaddr(1) | split('/') | first }} distribute-list all out
neighbor {{ ipsec_inside_cidr | ipaddr('net') | ipaddr(1) | split('/') | first }} ebgp-multihop 2
{% endfor %}
/etc/quagga/bgpd.conf – Rendered Template
router bgp 65000
bgp router-id 169.254.211.46
neighbor 169.254.211.45 remote-as 64512
neighbor 169.254.211.45 soft-reconfiguration inbound
neighbor 169.254.211.45 distribute-list localnet in
neighbor 169.254.211.45 distribute-list all out
neighbor 169.254.211.45 ebgp-multihop 2
neighbor 169.254.181.61 remote-as 64512
neighbor 169.254.181.61 soft-reconfiguration inbound
neighbor 169.254.181.61 distribute-list localnet in
neighbor 169.254.181.61 distribute-list all out
neighbor 169.254.181.61 ebgp-multihop 2
  • Like earlier, I use ipaddr('net') | ipaddr(1) to increment the IP address from the CIDR
  • remote-as defines the AWS-side ASN.
    • BGP at its core defines routes based on the path to an AS, a layer on top of IP addresses.
    • It’s designed to work with direct connections, not over-the-internet.
      • ISPs & exchanges will, however, use BGP at their level to forward the traffic on.
  • router bgp states what the ASN of the OpenWRT router is. Because I used the default of 65000 from AWS, I place that here.
  • bgp router-id is set to the first XFRM interface’s IP address, since the same BGP instance will be shared by both tunnels in the event that one tunnel goes down. AWS does not do a validation check on the router-id.
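
The address selection done by the template can be reproduced outside of Ansible. The sketch below only renders the router-id and two of the neighbour statements, to show which address of each /30 ends up where; it is not a replacement for the template, and the function name is made up for this example.

```python
import ipaddress


def render_bgpd(inside_cidrs, local_as, remote_as):
    """Render a reduced bgpd.conf: router-id from the first tunnel, one neighbour per tunnel."""
    networks = [ipaddress.ip_network(cidr) for cidr in inside_cidrs]
    lines = [
        f"router bgp {local_as}",
        f"bgp router-id {networks[0][2]}",  # OpenWRT side of the first tunnel
    ]
    for network in networks:
        neighbour = network[1]  # AWS side of each tunnel
        lines.append(f"neighbor {neighbour} remote-as {remote_as}")
        lines.append(f"neighbor {neighbour} ebgp-multihop 2")
    return "\n".join(lines)


config = render_bgpd(["169.254.211.44/30", "169.254.181.60/30"], 65000, 64512)
```

With the two Inside IPv4 CIDRs from this post, the neighbours come out as 169.254.211.45 and 169.254.181.61, matching the rendered file above.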

Verifying the connection to IPSec

Using the swanctl command, I can identify whether my applied configuration is successful when logged into my OpenWRT router using SSH.

Start swanctl

I don’t use the legacy ipsec init script; instead, I use the swanctl one directly. Under the hood, this will convert the UCI configuration into a strongSwan configuration located at /var/swanctl/swanctl.conf.

/etc/init.d/swanctl start
ipsec statusall
Output of the ipsec statusall command, where both VPN tunnels are ESTABLISHED and INSTALLED. Established denotes that IKE Phase 1 (Encryption negotiation) was successful and Installed denotes that IKE Phase 2 (Authorization, the tunnel creation itself) was successful and is now in use.
Connection can also be verified from the AWS Console, by looking at the value of Details. If the connection doesn’t say IPSEC IS DOWN, the connection was successful. Status is only up when BGP can be reached from AWS. When using Dynamic (not static) routing in the configuration for Site-to-Site, AWS doesn’t declare a connection up unless BGP is reachable at the second address available in the Inside IPv4 CIDR.
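
The same status can be read programmatically. The EC2 DescribeVpnConnections API returns a VgwTelemetry list per connection, with one entry per tunnel. The sketch below summarises such a response; the sample dictionary is hand-written in the shape of that response (placeholder IDs and IPs), and the function name is my own.

```python
def summarise_tunnels(response: dict) -> list:
    """Return (outside IP, status, accepted BGP routes) for every tunnel."""
    summary = []
    for connection in response.get("VpnConnections", []):
        for telemetry in connection.get("VgwTelemetry", []):
            summary.append((
                telemetry["OutsideIpAddress"],
                telemetry["Status"],
                telemetry.get("AcceptedRouteCount", 0),
            ))
    return summary


# Hand-written sample; a live response would come from
# boto3.client("ec2").describe_vpn_connections()
sample_response = {
    "VpnConnections": [{
        "VpnConnectionId": "vpn-EXAMPLE",
        "VgwTelemetry": [
            {"OutsideIpAddress": "203.0.113.10", "Status": "UP", "AcceptedRouteCount": 1},
            {"OutsideIpAddress": "203.0.113.20", "Status": "DOWN",
             "StatusMessage": "IPSEC IS DOWN", "AcceptedRouteCount": 0},
        ],
    }]
}
tunnel_summary = summarise_tunnels(sample_response)
tunnels_up = [ip for ip, status, _ in tunnel_summary if status == "UP"]
```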

Routing traffic to & from the XFRM Interface

Finally, I need to instruct OpenWRT to allow forwarding of packets that are destined for xfrm0 or xfrm1. The route that BGP installs in the Linux routing table (10.1.0.0/16 via xfrm0 or xfrm1) already determines which interface the traffic leaves through, so only the firewall needs to permit it.

By default, forwarding is set to REJECT. By applying the following firewall rule, packets successfully go through to the AWS VPC.

Ansible Task
- name: install firewall zone
  uci:
    command: section
    key: firewall
    type: zone
    find_by:
      name: 'tunnel'
    value:
      input: REJECT
      output: ACCEPT
      forward: REJECT
      network:
        - xfrm0
        - xfrm1

- name: install firewall forwarding
  uci:
    command: section
    key: firewall
    type: forwarding
    find_by:
      dest: 'tunnel'
    value:
      src: lan
/etc/config/firewall – UCI Configuration
config zone
	option name 'tunnel'
	option input 'REJECT'
	option output 'ACCEPT'
	option forward 'REJECT'
	list network 'xfrm0' 'xfrm1'

config forwarding
	option src 'lan'
	option dest 'tunnel'

Final tasks

The final steps of the Ansible playbook are to instruct the UCI framework to save the changes to disk, and to reload the configuration of all required services.

- name: commit changes
  uci:
    command: commit

- name: enable required services
  service:
    name: "{{ item }}"
    enabled: yes
    state: reloaded
  loop:
    - swanctl
    - quagga
    - network

I then invoke the Ansible playbook by using a local-exec provisioner on a null_resource within Terraform, where the AWS Site-to-Site resource is a dependency. Along the lines of:

resource "null_resource" "cluster" {
  provisioner "local-exec" {
    command = <<-EOT
      ansible-playbook \
        -i ${var.openwrt_address}, \
        -e 'aws_tunnel_ips=${aws_vpn_connection.main.tunnel1_address},${aws_vpn_connection.main.tunnel2_address}' \
        -e 'aws_psk=${aws_vpn_connection.main.tunnel1_preshared_key},${aws_vpn_connection.main.tunnel2_preshared_key}' \
        playbook.yaml
    EOT
  }
}

This is a shortened version of what I have, but by simply passing the outputs of the AWS Site-to-Site resource to the Ansible playbook, my router is automatically configured correctly when I create a Site-to-Site resource.
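
To spell out the argument layout of that command, here is a sketch that assembles the same invocation as an argument list, which avoids shell quoting issues if it were ever run from a script instead of local-exec. The function name and the sample values are placeholders; the extra-var names aws_tunnel_ips and aws_psk are the ones used in the command above.

```python
import shlex


def playbook_command(openwrt_address, tunnel_ips, psks, playbook="playbook.yaml"):
    """Build the ansible-playbook argument list for a single OpenWRT host."""
    return [
        "ansible-playbook",
        "-i", f"{openwrt_address},",  # trailing comma marks an inline host list
        "-e", f"aws_tunnel_ips={','.join(tunnel_ips)}",
        "-e", f"aws_psk={','.join(psks)}",
        playbook,
    ]


command = playbook_command(
    "10.0.0.1",
    ["203.0.113.10", "203.0.113.20"],
    ["psk-one", "psk-two"],
)
printable = shlex.join(command)  # shell-safe string, for logging only
```

The list could be handed to subprocess.run(command, check=True) as-is.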

With IPSec now deployed, I can communicate directly with my resources hosted on AWS as if they were local.

Deploying Region-locked AWS Organizations using Terraform
https://zai.dev/2024/10/07/deploying-region-locked-aws-organizations-using-terraform/
Mon, 07 Oct 2024 12:30:03 +0000

As a solutions architect, I was tasked with building an AWS Organizations hierarchy for a Canadian startup that needed to comply with local laws and enable multi-site configurations for networking.

To get started, I built an AWS Organizations hierarchy using Terraform. I chose Terraform because it allows me to use the same workflow for building organizations across multiple clouds. This post will focus on building an Organizational Unit (OU) tree for regions and localities.

To create OUs, I have a “basic” Terraform module that is a wrapper on the aws_organizations_organizational_unit resource. To make it reusable, I expose the name and parent. I then specialize the “basic” Terraform module into ones more specific to each organization by injecting tags and appending a postfix to the name of the OU, such as the region or locality.

For compliance, I restrict at the OU level which regions can be used by the AWS account and any IAM users assuming the role of this AWS account. I use a Service Control Policy (SCP) to deny access to all regions except for those specified in the local.regions value. Because a lot of core infrastructure for AWS, such as Billing, is located within us-east-1 and us-east-2, I always need to include them in the local.regions value.

Since I need to cater for both compliance and multi-site, I used my modules to build the OUs in the following hierarchy:

  • Root Organization
    • Region OU (e.g: North America)
      • Country OU (e.g: Canada)
        • Locality OU (e.g: Vancouver)

And with Terraform modules structured in the following way:

  • Root Organization
    • Client Module
      • Client Root Organizational Unit
      • Region / Country / Locality Module
        • Base OU Module
          • Region / Country / Locality Organizational Unit
        • Region Policies
          • SCP Policy

In the case of Vancouver, while the Seattle local zone or us-west-2 region is closer, it’s not located within Canada which may be a problem when looking at local labor laws and compliance, so Calgary (ca-west-1) is the next best thing. I’m waiting for the Vancouver local zone to become publicly available so that I can use that, but it will fall under Calgary anyway.

This means that my SCPs restrict the organizational units to the following regions:

  • North American OU
    • us-west-1, us-west-2, us-east-1, us-east-2, ca-central-1, ca-west-1
  • Canadian OU
    • us-east-1, us-east-2, ca-central-1, ca-west-1
  • Vancouver OU
    • us-east-1, us-east-2, ca-west-1

Because of the hierarchy approach, I can have AWS accounts in parents with shared resources such as VPCs, Databases, S3 and EFS shares. This will be hugely beneficial when working across multiple sites.

My re-usable modules follow the following structure:

Core Organizational Unit

This holds the default values for all OUs within the organization, where tags for example would be shared.

resource "aws_organizations_organizational_unit" "root" {
  name      = var.name
  parent_id = var.parent

  tags = var.tags
}

Inheriting the Basic OU into Locality, Country & Region OUs

I re-use the basic module to make it follow a strict naming and tag convention based on the context (e.g: locality, country and region). This module is for the context and not specifically the region in question. The region in question will then re-use this module.

This makes sure that the NA Region and European Region have the same fundamentals between them.

module "basic" {
  source = "../basic"
  parent = var.parent
  name = "${var.name} - ${var.locality}"
  tags = local.tags
}
module "policies" {
  source = "../../../regions/policies"
  policy_name = local.policy_name
  target_id = module.basic.id
  regions = var.regions
}

Inheriting the Context OU Module into literal regions

Here, I take the region context module and adapt it specifically to North America. The same logic applies to country and locality. This simply enforces that the tags and name of the OU contain the region, and that the generated SCPs block all regions except those provided.

module "region" {
  source = "../../templates/organization/region"
  region = "North America"
  parent = var.parent
  name = var.name
  tags = local.tags
  regions = [
    "us-east-1",
    "us-east-2",
    "us-west-1",
    "us-west-2",
    "ca-central-1",
    "ca-west-1"
    ]
}

Generating the SCPs from Terraform

policy_name here is the same as the name of an OU with spaces removed. Because the SCP is written as a Deny rule, the condition has to use the StringNotEquals test, so that only requests for regions outside local.regions are denied.

data "aws_iam_policy_document" "region_restriction" {
  statement {
    sid = "RestrictRegionFor${var.policy_name}"
    effect    = "Deny"
    actions   = ["*"]
    resources = ["*"]

    condition {
      test = "StringNotEquals"
      variable = "aws:RequestedRegion"
      values = local.regions
    }
  }
}
resource "aws_organizations_policy" "region_restriction" {
  name    = "RestrictRegionFor${var.policy_name}"
  content = data.aws_iam_policy_document.region_restriction.json
  type = "SERVICE_CONTROL_POLICY"
}
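
For comparison, the same policy document can be assembled in a few lines of Python. This is a sketch, not what my module does: it assumes that local.regions is the provided region list with us-east-1 and us-east-2 appended, and it additionally removes duplicates. The function name is hypothetical.

```python
import json


def region_restriction_policy(policy_name, regions, always_allowed=("us-east-1", "us-east-2")):
    """Build a Deny-outside-these-regions SCP as a plain dictionary."""
    # dict.fromkeys keeps the first occurrence of each region, preserving order
    allowed = list(dict.fromkeys([*regions, *always_allowed]))
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": f"RestrictRegionFor{policy_name}",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"StringNotEquals": {"aws:RequestedRegion": allowed}},
        }],
    }


policy = region_restriction_policy("ZAICanada", ["ca-central-1", "ca-west-1"])
policy_json = json.dumps(policy, indent=2)
```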

Declaring a Regional OU for an Organization

Finally, I can use the North American OU to declare an OU that restricts any AWS Accounts inside to only create resources within North America.

module "region-na" {
  source = "../regions/north-america"
  parent = aws_organizations_organizational_unit.root.id
  name = var.name
  tags = local.tags
}

I can do the same with Canada, and Vancouver.

module "region-ca" {
  source = "../regions/north-america/canada"
  parent = module.region-na.id
  name = var.name
  tags = module.region-na.tags
}

module "region-yvr" {
  source = "../regions/north-america/canada/vancouver"
  parent = module.region-ca.id
  name = var.name
  tags = module.region-ca.tags
}

By the end of the deployment, my hierarchy looks like the following:

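And the attached SCPs look like the following. SCPs are inherited, so an AWS account is subject to every policy between the root and its own OU; because each region list here is a subset of its parent’s list, the SCP on the direct parent of an AWS account is effectively the one that decides: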

ZAI – North America

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RestrictRegionForZAINorthAmerica",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "us-east-1",
            "us-east-2",
            "us-west-1",
            "us-west-2",
            "ca-central-1",
            "ca-west-1",
            "us-east-1",
            "us-east-2"
          ]
        }
      }
    }
  ]
}

ZAI – Canada

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RestrictRegionForZAICanada",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "ca-central-1",
            "ca-west-1",
            "us-east-1",
            "us-east-2"
          ]
        }
      }
    }
  ]
}

ZAI – Vancouver

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RestrictRegionForZAIVancouver",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "ca-west-1",
            "us-east-1",
            "us-east-2"
          ]
        }
      }
    }
  ]
}
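
How the three policies combine for an account inside the Vancouver OU can be sketched as follows. This is a deliberately simplified model: it only understands the single Deny and StringNotEquals statement shape shown above and ignores the rest of SCP evaluation. All function names are made up for this example.

```python
def deny_outside(regions):
    """Build a reduced policy shaped like the SCPs above."""
    return {"Statement": [{
        "Effect": "Deny",
        "Condition": {"StringNotEquals": {"aws:RequestedRegion": list(regions)}},
    }]}


def is_denied(policy, requested_region):
    """A Deny statement applies when the region is NOT in its list."""
    for statement in policy["Statement"]:
        allowed = statement["Condition"]["StringNotEquals"]["aws:RequestedRegion"]
        if statement["Effect"] == "Deny" and requested_region not in allowed:
            return True
    return False


def region_permitted(policies, requested_region):
    """A request passes only if no policy on the path denies it."""
    return not any(is_denied(policy, requested_region) for policy in policies)


north_america = deny_outside(
    ["us-east-1", "us-east-2", "us-west-1", "us-west-2", "ca-central-1", "ca-west-1"]
)
canada = deny_outside(["ca-central-1", "ca-west-1", "us-east-1", "us-east-2"])
vancouver = deny_outside(["ca-west-1", "us-east-1", "us-east-2"])
vancouver_chain = [north_america, canada, vancouver]
```

In this model, region_permitted(vancouver_chain, "ca-central-1") is False even though the Canada policy lists that region, because the Vancouver policy further down the path denies it.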

Here it is in action, when exploring a region that is blocked by the SCP Policy:

Outline: Extending a home network setup in AWS
https://zai.dev/2024/09/22/outline-extending-a-home-network-setup-in-aws/
Sun, 22 Sep 2024 12:30:18 +0000

As someone without a permanent base, I needed a secure and flexible cloud infrastructure that allowed me to spawn powerful machines when needed. To achieve this, I built an isolated network on AWS.

I began by creating a Terraform module that provisions the infrastructure needed, such as

  • VPCs
  • Subnets
  • Routing tables
  • EC2 instances

While the module is tailored to AWS, I plan to keep the variable names consistent with other modules that re-create the setup for different cloud platforms, such as Exoscale.

The isolated network is centered around an EC2 instance, which acts as a router between a public VPC and a private VPC, similar to an at-home router. The EC2 instance has two ENI adapters, one attached to the public VPC and the other attached to the private VPC. The EC2 instance is running VyOS, which I configured using Ansible and the local-exec provisioner in Terraform upon creation.

data "aws_ami" "vyos" {
  most_recent = true

  filter {
    name   = "name"
    values = ["VyOS 1.4.0-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["679593333241"]
}
resource "aws_instance" "vyos" {
  ami           = data.aws_ami.vyos.id
  availability_zone = data.aws_availability_zones.region.names[0]
  instance_type = "t3.small"

  network_interface {
    network_interface_id = aws_network_interface.public.id
    device_index         = 0
  }

  network_interface {
    network_interface_id = aws_network_interface.local.id
    device_index         = 1
  }

  provisioner "local-exec" {
    command = "ansible-playbook -i \"${aws_eip.public.public_ip},\" <path_to_playbook>"
  }
}

The public VPC has an internet gateway attached to it, and all instances in the public VPC have internet access. The router instance is the only instance that resides in the public VPC. Both VPCs have a subnet within a single availability zone (AZ), as a single EC2 instance cannot span two AZs.

resource "aws_internet_gateway" "gw" {
}

resource "aws_internet_gateway_attachment" "gw" {
  internet_gateway_id = aws_internet_gateway.gw.id
  vpc_id              = aws_vpc.public.id
}

Each VPC has a routing table to correctly route traffic. The public VPC routes all traffic towards the internet gateway, while the private VPC only routes traffic locally between the hosts in its subnet.

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.public.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.gw.id
  }
}

resource "aws_route_table" "internal" {
  vpc_id = aws_vpc.internal.id
}

resource "aws_route_table_association" "internal" {
  subnet_id      = aws_subnet.internal.id
  route_table_id = aws_route_table.internal.id
}

resource "aws_route_table_association" "public" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}

I connect to my isolated network primarily through my OpenWRT-based router using WireGuard. I also use the WireGuard client on my Mac or phone to connect to the cluster when I’m outside. Keep an eye out for my posts detailing how I deploy VyOS on AWS and configure OpenWRT to connect to WireGuard.

I attached an Elastic IP to the router instance, which lets me destroy and re-build the instance without issue. This is useful when I don’t need the network running, like when I’m flying, or when I’m actively improving the instance.

resource "aws_eip" "public" {
  domain = "vpc"
}

resource "aws_eip_association" "public" {
  network_interface_id = aws_network_interface.public.id
  allocation_id        = aws_eip.public.id
}

If I need to access any other AWS resource, I add a VPC Endpoint for that resource directly to the private VPC. For example, I use S3FS to mount S3 storage directly on the instance and DynamoDB for building JSONL files for machine learning tasks.
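As a rough sketch of what such endpoints might look like: S3 and DynamoDB both support gateway endpoints, which attach to a route table. The resource names `s3` and `dynamodb` and the `var.region` variable are assumptions for illustration; `aws_vpc.internal` and `aws_route_table.internal` are the resources referenced earlier.

```hcl
# Gateway endpoints add routes to the private VPC's route table, so
# instances can reach S3 and DynamoDB without any internet access.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.internal.id
  service_name      = "com.amazonaws.${var.region}.s3" # var.region is assumed
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.internal.id]
}

resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = aws_vpc.internal.id
  service_name      = "com.amazonaws.${var.region}.dynamodb"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.internal.id]
}
```

Most other services use interface endpoints instead, which need subnet and security group arguments rather than a route table.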

Use Cases

I create a Windows instance inside the subnet when I'm out and need to do remote work that involves large downloads. I also create larger instances for working with AI/Machine Learning models when my Mac isn’t able to load them or when I’m short on local storage.

Multi-Region Setup

To transfer the setup to another region, I simply change the region variable in my Terraform module, and it magically appears in the new region.
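A minimal sketch of how such a region variable could be wired into the provider. The variable name, its description and the default are assumptions; the actual module may declare it differently.

```hcl
variable "region" {
  type        = string
  description = "AWS region to deploy the isolated network into"
  default     = "eu-west-2" # assumed default
}

# Every AWS resource in the configuration follows this setting.
provider "aws" {
  region = var.region
}
```

With this in place, the region could be overridden per run, for example with `terraform apply -var="region=eu-west-1"`.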

Building AMD64 QEMU Images remotely using Libvirt and Packer https://zai.dev/2024/05/24/building-amd64-qemu-images-remotely-using-libvirt-and-packer/ Fri, 24 May 2024 22:05:58 +0000 https://zai.dev/?p=675

I need to build images for the AMD64 architecture while working from an ARM64 machine. While this is possible directly using the qemu-system-x86_64 binary, it tends to be extremely slow due to the overhead of emulating x86-64 instructions on the ARM architecture.

Workbench

  • Ubuntu 22.04 LTS with libvirt installed
  • MacBook Pro M2 with the Packer build files

Configuring the Libvirt Plugin

Connecting to the libvirt host

When using the libvirt plugin, I need to provide a Libvirt URI.

source "libvirt" "image" {
    libvirt_uri = "qemu+ssh://${var.user}@${var.host}/session?keyfile=${var.keyfile}&no_verify=1"
}

  • qemu+ssh:// denotes that I’ll be using the QEMU / KVM backend and connecting via SSH. The connection method determines which of the remaining URI arguments apply.
  • ${var.user}@${var.host} follows SSH syntax: the username and hostname of the machine that is running libvirt.
  • /session isolates the running builds from those at the system level. /system would work just as well.
  • keyfile=${var.keyfile} authenticates to the remote machine automatically, without the need for a password. This will be useful in the future when I trigger the Packer build automatically from a Git repository.
  • no_verify=1 is added so that I can throw the build at any machine and have it “just work”. This is usually advised against, as skipping host key verification leaves the connection open to spoofing attacks.
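The URI relies on three input variables. A minimal sketch of their declarations, assuming they live in the same Packer template; the descriptions are illustrative rather than taken from the original build files.

```hcl
variable "user" {
  type        = string
  description = "SSH user on the machine running libvirt"
}

variable "host" {
  type        = string
  description = "Hostname or IP address of the machine running libvirt"
}

variable "keyfile" {
  type        = string
  description = "Path to the private key used to reach the libvirt host"
}
```

The values could then be supplied at build time, for example with `packer build -var "user=..." -var "host=..." -var "keyfile=..." .`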

Communicating with the libvirt guest

communicator {
  communicator                 = "ssh"
  ssh_username                 = var.username
  ssh_bastion_host             = var.host
  ssh_bastion_username         = var.user
  ssh_bastion_private_key_file = var.private_key
}

  • The difference between ssh_* and ssh_bastion_* is that the former refers to the target virtual machine being built, and the latter refers to the “middle-man” machine.
    • I require this as I don’t plan to expose the VM to a network outside of the machine hosting it.
    • Since I won’t have access from my local workstation, I need to communicate with the virtual machine via the machine that is hosting it.
    • By adding ssh_bastion_* arguments, I’m telling Packer that in order to communicate with the VM, it needs to access the bastion machine first and then execute all SSH commands through it.

Configuring the libvirt daemon

My Observations

I came across a “Permission Denied” error when attempting to upload an existing image (in my case, the KVM Ubuntu Server Image). This was due to AppArmor not being provided a trust rule upon creation of the domain. This error is first visible in the following form directly from Packer:

==> libvirt.example: DomainCreate.RPC: internal error: process exited while connecting to monitor: 2024-05-24T16:41:42.574660Z qemu-system-x86_64: -blockdev {"node-name":"libvirt-2-format","read-only":false,"driver":"qcow2","file":"libvirt-2-storage","backing":null}: Could not open '/var/lib/libvirt/images/packer-cp8c6ap1ijp2kss08iv0-ua-artifact': Permission denied

At first, I assumed that there was an obvious permissions problem, and at first glance there was. When looking at this file upon creation, it was owned by root, with only the root user able to read or write it.

# ls -lah /var/lib/libvirt/images
-rw------- 1 root root  925M May 24 16:41 packer-cp8c6ap1ijp2kss08iv0-ua-artifact

This makes sense since libvirtd is running under the root user, which is the default configuration from the Ubuntu repository. I didn’t see any configuration option to control the permissions libvirt applies after an upload either. I assumed this was the problem, since all QEMU instances run under a non-root user, libvirt-qemu.

# ps -aux | grep libvirtd
# ps -aux | grep qemu

root      145945  0.4  0.1 1778340 28760 ?       Ssl  16:43   0:10 /usr/sbin/libvirtd
libvirt+    3312  2.2 11.1 4473856 1817572 ?     Sl   May12 405:19 /usr/bin/qemu-system-x86_64

My second observation was that all images created directly within libvirt (e.g: with virt-manager) had what looked like “correct” permissions, those that matched the user that QEMU would eventually run under;

# ls -lah /var/lib/libvirt/images
-rw-r--r-- 1 libvirt-qemu kvm   11G May 24 17:11 haos_ova-11.1.qcow2

Since no-one else had reported this particular issue when using the libvirt plugin, I assumed the problem was on my end (PEBKAC).

Allowing packer-uploaded images as backing store

Thanks to this discussion on Stack Overflow, I found that AppArmor had been blocking the request to the specific file in question.

# dmesg -w
[1081541.249157] audit: type=1400 audit(1716568577.970:119): apparmor="DENIED" operation="open" profile="libvirt-25106acc-cfd8-40f7-a7c6-f5c1c63bc16c" name="/var/lib/libvirt/images/packer-cp8c6ap1ijp2kss08iv0-ua-artifact" pid=43927 comm="qemu-system-x86" requested_mask="w" denied_mask="w" fsuid=64055 ouid=64055

Here, I can see that AppArmor is doing three things;

  • Denying an open request to the QEMU Image
    • apparmor="DENIED"
    • operation="open"
  • Denying writing to the QEMU Image
    • denied_mask="w"
  • Using a profile that is specific to the domain being launched
    • profile="libvirt-25106acc-cfd8-40f7-a7c6-f5c1c63bc16c"
    • This is achieved because libvirt will automatically push AppArmor rules upon creation of a domain. This also means that libvirt will be using some form of template file or specification to create rules.

This means that I need to find the template file that libvirt uses to generate the rules, and allow writing to Packer-uploaded QEMU images.

# /etc/apparmor.d/libvirt/TEMPLATE.qemu
# This profile is for the domain whose UUID matches this file.
# 

#include <tunables/global>

profile LIBVIRT_TEMPLATE flags=(attach_disconnected) {
  #include <abstractions/libvirt-qemu>
  /var/lib/libvirt/images/packer-** rwk,
}

As mentioned in the Stack Overflow post, simply adding /var/lib/libvirt/images/packer-** rwk, to the template file is enough to get past this issue.

End Result

By bringing everything together, I get a successful QCOW2 image visible in my default storage pool. I’m using the Ansible provisioner within the build block so that I can keep the execution steps separate from the Packer build script, and re-usable across different cloud providers.
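A minimal sketch of what such a build block might look like. The playbook path is hypothetical, and it assumes the Packer Ansible plugin is installed alongside the libvirt plugin.

```hcl
build {
  # Refers to the source "libvirt" "image" block shown earlier.
  sources = ["source.libvirt.image"]

  # Hypothetical playbook path. Ansible runs on the machine running
  # Packer and reaches the VM through the configured communicator.
  provisioner "ansible" {
    playbook_file = "${path.root}/playbook.yml"
  }
}
```

Keeping the provisioning steps in the playbook means the same file can be pointed at a different source block when targeting another provider.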

Configuring Traefik for Cross-Namespace Ingress https://zai.dev/2024/04/08/configuring-traefik-for-cross-namespace-ingress/ Mon, 08 Apr 2024 00:20:11 +0000 https://zai.dev/?p=662

When Traefik is installed on either Kubernetes or K3s, it cannot detect Ingress objects in namespaces other than the one it runs in. Since Traefik typically runs under the kube-system namespace, this is a problem, as I don’t want any of my production deployments running in a namespace intended to hold the essential elements of the Kubernetes cluster.

In my scenario, I inherited Traefik by installing K3s on my homelab and plan to deploy Traefik to a production cluster for my pipeline project in the future.

The Simple Fix

All that Traefik requires is for the providers.kubernetesCRD.allowCrossNamespace setting to be set to true. This had been set to false by default in a previous version.

Inherited or not, Traefik can be deployed using Helm charts (which is the case under K3s). With the K3s Helm controller, a configuration override can be applied through a HelmChartConfig object. Once this object is deployed, the installed deployment restarts with a configuration merged from the defaults and the values defined within the new HelmChartConfig object.

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    providers:
      kubernetesCRD:
        allowCrossNamespace: true

Or, if it’s needed in an HCL / Terraform context:

resource "kubernetes_manifest" "traefik" {
  manifest = {
    "apiVersion" = "helm.cattle.io/v1"
    "kind"       = "HelmChartConfig"
    "metadata" = {
      "name"      = "traefik"
      "namespace" = "kube-system"
    }
    "spec" = {
      "valuesContent" = file("${path.module}/traefik-config.yml")
    }
  }
}

# traefik-config.yml
providers:
  kubernetesCRD:
    allowCrossNamespace: true
Using AWS CodeBuild to execute Ansible playbooks https://zai.dev/2024/04/06/using-aws-codebuild-to-execute-ansible-playbooks/ Sat, 06 Apr 2024 19:31:19 +0000 https://zai.dev/?p=600

I wanted a clean and automatable way to package third-party software into *.deb format (and others, if needed in the future), and I had three ways to achieve that;

  • The simple way: Write a Bash script
  • The easy way: Write a Python script
  • My chosen method: Write an Ansible role

While any of the options could get me where I wanted, the Ansible route felt a lot cleaner, as I can clearly state (and see) which packages I am building, either at the command line or in a playbook. With the Bash or Python approaches, I would have to maintain a separate configuration file, in yet another format, to drive what to build and where.

The playbook approach also allows me to monitor and execute a build on a remote machine, should I wish to build cross-platform or need larger resources for testing.

In this scenario, I’ll be executing the Ansible role locally on the CodeBuild instance.

Configuring the CodeBuild Environment

Using GitHub as a source

I have one git repository per Ansible playbook, so by linking CodeBuild to the repository in question I’m able to (eventually) automatically trigger the execution of CodeBuild upon a pushed commit on the main branch.

The only additional setting under sources that I define is the Source version, as I don’t want build executions happening for all branches (as that can get costly).

CodeBuild Environment

For the first iteration of this setup, I am installing the (same) required packages at every launch. This is not the best way to handle pre-installation in terms of cost and build speed. In this instance, I’ve chosen to ignore this and “brute-force” my way through to get a proof-of-concept.

  • Provisioning Model: On-demand
    • I’m not pushing enough packages to require a dedicated fleet, so spinning up VMs in response to a pushed commit (~5 times a week) is good enough.
  • Environment Image: Managed Image
    • As stated above, I had my focus towards a proof-of-concept that running Ansible under CodeBuild was possible. A custom image with pre-installed packages is the way to go in the long run.
  • Compute: EC2
    • Since I’m targeting *.deb format, I choose Ubuntu as the operating system. The playbook I’m expecting to execute doesn’t require GPU resources either.
    • AWS Lambda doesn’t support Ubuntu, nor is it able to execute Ansible directly. I’d have to write a Python wrapper to execute the Ansible playbook, which is more overhead.
    • Depending on the build time and size of the result package, I had to adjust the memory required accordingly. However, this may be because I’m making use of the /tmp directory by default.
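For reference, the settings above could be sketched in Terraform roughly as follows. This is an illustration rather than the configuration actually used: the project name, the `var.codebuild_role_arn` and `var.repository_url` variables and the compute size are placeholders, and the image tag is an assumption (one of the Ubuntu-based managed images).

```hcl
resource "aws_codebuild_project" "ansible" {
  name           = "ansible-deb-build"    # hypothetical name
  service_role   = var.codebuild_role_arn # assumed: IAM role created elsewhere
  source_version = "main"                 # only build the main branch

  source {
    type      = "GITHUB"
    location  = var.repository_url # assumed variable
    buildspec = "buildspec.yml"
  }

  environment {
    type         = "LINUX_CONTAINER"            # on-demand EC2 compute
    image        = "aws/codebuild/standard:7.0" # managed Ubuntu image (assumed tag)
    compute_type = "BUILD_GENERAL1_SMALL"       # raise for larger packages
  }

  artifacts {
    type = "NO_ARTIFACTS" # assumes the playbook uploads the packages itself
  }
}
```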

buildspec.yml

I store the following file at the root level of the same Git repository that contains the Ansible playbook.

version: 0.2

phases:
  pre_build:
    commands:
      - apt install -y ansible python3-botocore python3-boto3
      - ansible-galaxy install -r requirements.yaml
      - ansible-galaxy collection install amazon.aws
  build:
    commands:
      - ansible-playbook build.yaml
artifacts:
  files:
    - /tmp/*.deb

As stated above, I’m always installing the required System packages prior to interacting with Ansible. This line (apt install) should be moved into a pre-built image that this CodeBuild environment will then source from.

I keep the role (and therefore, tasks) separate from the playbook itself, which is why I use ansible-galaxy to install the requirements. Each time the session is started, it pulls down a fresh copy of any requirements. This can differ from playbook to playbook.

I use the role for the execution steps, and the playbook (or inventory) to hold the settings that influence the execution, such as (in this scenario) what the package name is and how to package it.

I explicitly include the amazon.aws Ansible collection in this scenario as I’m using the S3 module to pull down sources (or builds of third-party software) and to push built packages up to S3. I’m doing this via Ansible rather than storing them in Git due to their size, and rather than using CodeDeploy as I don’t plan on deploying the packages to infrastructure but to a repository.

I also had some issues using the Artifacts option within CodeBuild, which led me to push from Ansible instead.

Finally, ansible-playbook can be executed once all the prerequisites are in place. The only adaptation needed at the playbook level is that localhost is listed as the target. This ensures that the playbook executes on the local machine.

---
- hosts: localhost

Once all the configuration and repository setup is done, the build executed successfully and I received my first Debian package via CodeBuild using Ansible.

Packaging proprietary software for Debian using Ansible https://zai.dev/2024/02/25/packaging-proprietary-software-for-debian-using-ansible/ Sun, 25 Feb 2024 18:47:33 +0000 https://zai.dev/?p=598

Managing software installations for an animation studio can be a time-consuming process, especially if relying on the provided installation methods, and even more so when using a distribution that isn’t officially supported by the software.

I needed a streamlined workflow that allowed me to download DCCs from their creators, package them into *.deb files and deploy them to a local server for eventual installation on workstations. This allows me to properly control software versions at the system level, and to rapidly build DCC-backed Docker containers without needing to run the lengthy install processes. I achieved this in the form of an Ansible role.

The Role

I will use SideFX® Houdini as an example for how to interpret this role.

{{ package }} refers to one item within packages

{{ source }} refers to an individual item within sources

packages:
- name: houdini
  version: 20.0.625
  release: 1
  category: graphics
  architecture: amd64
  author: "WhoAmI? <someone@goes.here>"
  description: Houdini 20
  sources:
    - name: base
      s3:
        bucket: MyLocalPrivateS3Bucket
        object: "houdini/houdini-20.0.625-linux_x86_64_gcc11.2.tar.gz"
      path: /houdini.tar.gz
      archive: yes
  scripts:
    pre_build:
      - "{{ lookup('first_found', 'package_houdini.yaml')}}"
  repo:
    method: s3
    bucket: MyLocalPrivateS3Bucket
    object_prefix: "houdini"

---
- include_tasks: build.yml
  loop: "{{ packages }}"
  loop_control:
    loop_var: package

I want to execute an entire block in iteration, once per package to build. I want to use a block so that I can make use of the rescue: clause for cleanup in the event of a packaging failure, and to make use of the always: clause to release the package to a local repository.

In order to achieve this, I need to put the block in a separate task file and iterate with include_tasks, as a block doesn’t accept a loop: argument. I’ve also named the loop variable package, as I will be running further loops within the block.

- block:
  - name: create build directory
    file:
      path: /tmp/build
      state: directory

Getting the installer archive

From an HTTP source

  - name: download archive
    ansible.builtin.get_url:
      url: "{{ source.url }}"
      dest: "{{ source.path }}"
      mode: '0440'
    loop: "{{ package.sources }}"
    loop_control:
      loop_var: source
    when: "source.url is defined"

If I download the package to an on-prem server (to perform a virus scan), I may choose to serve the files from an HTTP server.

From S3

  - name: download from s3 bucket
    amazon.aws.s3_object:
      bucket: "{{ source.s3.bucket }}"
      object: "{{ source.s3.object }}"
      dest: "{{ source.path }}"
      mode: get
    loop: "{{ package.sources }}"
    loop_control:
      loop_var: source
    when: "source.s3 is defined"

If I’m working with a cloud-based studio, I may opt to store the archive on S3 to keep costs down, especially if I hit multiple errors and need to debug the playbook (as bringing the archive into AWS again on every run comes at a cost).

From the Ansible controller

  - name: copy archives
    copy:
      src: "{{ source.path }}"
      dest: "{{ source.path }}"
    loop: "{{ package.sources }}"
    loop_control:
      loop_var: source
    when:
      - "source.s3 is not defined"
      - "source.url is not defined"

If it’s my first time and I’m on-premises, I may choose to deliver the package straight from my machine to a build machine.

From network storage attached to the remote machine

Or if I decide to store the archive on the network storage that’s attached to a build machine, I can choose to pull the file from the network directly without the need to copy.

Getting to the Application

  - name: extract archives
    unarchive:
      src: "{{ source.path }}"
      dest: /tmp/build.src
      remote_src: yes
    loop: "{{ package.sources }}"
    loop_control:
      loop_var: source
    when: "source.archive is defined"

Should I need to extract an archive, I provide an archive: yes attribute on the source in question. The value isn’t relevant; the fact that the attribute exists is enough for Ansible to trigger this task based on the when: clause.

  - name: copy archives
    copy:
      src: "{{ source.path }}"
      dest: /tmp/build.src
    loop: "{{ package.sources }}"
    loop_control:
      loop_var: source
    when: "source.archive is not defined"

If we have the other case of not needing to extract anything, we can just copy the source over to the building directory.

  - name: prepare package
    include_tasks: "{{ item }}"
    loop: "{{ package.scripts.pre_build }}"
    vars:
      build_prefix: /tmp/build

Finally, we need to lay out the contents in order for dpkg to create the package correctly. The packaging process requires a directory that will act as the “target root directory”.

If I have the file /tmp/build/test and build the directory /tmp/build, I would get a build.deb file that will create a /test file once installed.

Following this logic, I need to install the application I want into /tmp/build as if it were the root filesystem.

Since every application is different, I implemented the concept of pre_build scripts. The role itself handles pulling sources and releasing packages, preparing and destroying the build directories, and rendering the templates and executing the packaging system. What it doesn’t handle is how to get the contents of the application itself into place.

package.scripts.pre_build points to a list of task files to run to prepare the application for packaging. These should not be specific to any distribution. As an example, for SideFX® Houdini, my pre_build is the following:

- name: find houdini installers
  ansible.builtin.find:
    paths: /tmp/build.src
    patterns: 'houdini.install'
    recurse: yes
  register: found_houdini_installers

- name: install houdini to directory
  shell: "{{ item.path }} \
    --make-dir \
    --auto-install \
    --no-root-check \
    /tmp/build/opt/hfsXX.X"
  loop: "{{ found_houdini_installers.files }}"

Here, I use the find module to search for any Houdini installers (using recurse: yes, since they may be nested), and then run the Houdini installer as if I were at the machine.

I removed some additional flags from the snippet that make the installation automatic (--acceptEULA) and selective (--no-install-license-server). The components excluded by the latter I plan to package individually (e.g: houdini-sesinetd, maya2024-houdini-engine etc.) but haven’t got round to it.

  - include_tasks: debian.yml
    when: ansible_os_family == 'Debian'

Finally, since I’m packaging for Debian, I place the procedures for building a Debian package into a debian.yml file. I plan to extend the packaging role to include Arch Linux and Fedora-based distributions in the future.

- name: create debian directory
  file:
    path: /tmp/build/DEBIAN
    state: directory

- name: render debian control template
  template:
    src: debian.control.j2
    dest: /tmp/build/DEBIAN/control

At the “root” level, we require a DEBIAN directory, with a control file inside. This is the minimum to get a deb package. I’m making use of a Jinja2 template for the control file as it’s cleaner to manage down the road.

DEBIAN/control

Package: {{ package.name }}
Version: {{ package.version }}-{{ package.release }}
Section: {{ package.category }}
Priority: optional
Architecture: {{ package.architecture }}
Maintainer: {{ package.author }}
Description: {{ package.description }}

Building the package

- name: execute package build
  shell: "dpkg --build /tmp/build /tmp/{{ package.name }}-{{ package.version }}-{{ package.release }}-{{ package.architecture }}.deb"

- set_fact:
    local_package_location: "/tmp/{{ package.name }}-{{ package.version }}-{{ package.release }}-{{ package.architecture }}.deb"

I give the resulting *.deb a strict naming convention to avoid file conflicts. Lastly, once the package is built, I use set_fact to return the path to the *.deb file from the sub-task so that the build.yml file can deploy it to where it needs to go, be it S3 or a local repository. I do this as I may be building more than Debian packages in the future.

Building a PKI using Terraform https://zai.dev/2024/02/24/building-a-pki-using-terraform/ Sat, 24 Feb 2024 21:09:08 +0000 https://zai.dev/?p=568

As part of building a hybrid infrastructure, I explored different technologies for achieving a stable VPN connection from on-premises to AWS and found AWS Client VPN, a feature nested within AWS VPC. I explored this prior to AWS Site-to-Site VPN as I didn’t have the right setup for handling IPSec/L2TP tunnels at the time, and already had OpenVPN handy on my MacBook.

Since I would be using OpenVPN (as that’s what AWS Client VPN uses), I require TLS certificates for authentication and encryption. While AWS provides certificate management features, they come at a cost, making them less suitable for my testing requirements.

I’ve opted to use Terraform to create a custom PKI solution locally, and to prepare for the re-use in larger infrastructure projects.

Working Environment

  • Machine
    • MacBook Pro M2
  • Technologies
    • Terraform

Terraform Module Breakdown

terraform {
  required_providers {
    tls = {
      source = "hashicorp/tls"
    }
    pkcs12 = {
      source = "chilicat/pkcs12"
    }
  }
}

I’m making use of the following providers within my Terraform project

  • hashicorp/tls
    • For generating the private keys, certificate requests and certificates themselves
  • chilicat/pkcs12
    • For combining the private key & certificate together, a requirement for using OpenVPN client without embedding the data inside the *.ovpn configuration file (which didn’t come out-of-the-box from AWS)

/**
 * Private key for use by the self-signed certificate, used for
 * future generation of child certificates. As long as the state
 * remains unchanged, the private key and certificate should not
 * re-update at every re-run unless any variable is changed.
 */
resource "tls_private_key" "pem_ca" {
  algorithm = var.algorithm
}

I’ve made the algorithm of the certificates controllable from a global variable due to customer requirements possibly needing to adopt a different level of encryption. This resource returns a PEM-formatted key.
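A sketch of how that variable might be declared. The description, the default and the validation rule are assumptions; the allowed values are the algorithms that the tls_private_key resource accepts.

```hcl
variable "algorithm" {
  type        = string
  description = "Key algorithm for every private key in the module"
  default     = "RSA" # assumed default

  # Fail at plan time rather than inside the tls provider.
  validation {
    condition     = contains(["RSA", "ECDSA", "ED25519"], var.algorithm)
    error_message = "The algorithm must be one of RSA, ECDSA or ED25519."
  }
}
```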

/**
 * Generation of the CA Certificate, which is in turn used by
 * the client.tf and server.tf submodules to generate child
 * certificates
 */
resource "tls_self_signed_cert" "ca" {
  private_key_pem = tls_private_key.pem_ca.private_key_pem
  is_ca_certificate = true

  subject {
    country             = var.ca_country
    province            = var.ca_province
    locality            = var.ca_locality
    common_name         = var.ca_cn
    organization        = var.ca_org
    organizational_unit = var.ca_org_name
  }

  validity_period_hours = var.ca_validity

  allowed_uses = [
    "digital_signature",
    "cert_signing",
    "crl_signing",
  ]
}

I then used the tls_self_signed_cert resource to generate the CA certificate itself, providing the private key generated prior into the private_key_pem attribute. Again, by providing global variables for the ca subject and validity, I’m able to re-run the same terraform module for multiple clients under different workspaces (or by referencing this into larger modules).

The subject fields I decided to expose describe exactly what the TLS certificate belongs to, and where, without needing to dive back into the module.

Adding cert_signing and crl_signing to the allowed_uses list permits the certificate to sign child certificates. This is essential, as I still need to generate the certificates for the OpenVPN server and the client.

This resource returns a PEM-formatted certificate.

/**
 * Return the certificate itself. It's the responsibility of
 * the user of this module to determine whether the certificate should
 * be stored locally, transferred or submitted directly to a cloud
 * service
 */
output "ca_certificate" {
  value = tls_self_signed_cert.ca.cert_pem
  sensitive = true
  description = "generated ca certificate"
}
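The usage examples that follow also read a ca_private_key output from the module. Its definition isn't shown here, so the following is an assumed sketch that mirrors the certificate output:

```hcl
output "ca_private_key" {
  value       = tls_private_key.pem_ca.private_key_pem
  sensitive   = true
  description = "private key of the generated ca certificate"
}
```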

Finally, I return the CA Certificate and its key from the module for the user to place it where it needs to be, for example;

To a local file

resource "local_file" "ca_key" {
  # content rather than content_base64: the module returns raw PEM text
  content  = module.pki.ca_private_key
  filename = "${path.module}/certs/ca.key"
}
resource "local_file" "ca" {
  content  = module.pki.ca_certificate
  filename = "${path.module}/certs/ca.crt"
}

To the AWS Certificate Manager

resource "aws_acm_certificate" "ca" {
  private_key = module.pki.ca_private_key
  certificate_body = module.pki.ca_certificate
}

Server & Client Certificates

resource "tls_cert_request" "csr_client" { # or a server equivalent
  for_each = var.clients # or var.servers
  private_key_pem = tls_private_key.pem_clients[each.key].private_key_pem
    # or pem_servers[each.key]
  dns_names = [each.key]

  subject {
    country = try(each.value.country, try(var.default_client_subject.country, var.default_subject.country))
    province = try(each.value.province, try(var.default_client_subject.province, var.default_subject.province))
    locality = try(each.value.locality, try(var.default_client_subject.locality, var.default_subject.locality))
    common_name = try(each.value.cn, try(var.default_client_subject.cn, var.default_subject.cn))
    organization = try(each.value.org, try(var.default_client_subject.org, var.default_subject.org))
    organizational_unit = try(each.value.ou, try(var.default_client_subject.ou, var.default_subject.ou))
  }
}
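The private keys referenced by private_key_pem aren't shown above. A minimal sketch of how they might be generated, assuming they share the module-wide algorithm variable:

```hcl
# One private key per client, keyed by machine name.
resource "tls_private_key" "pem_clients" {
  for_each  = var.clients # or var.servers for pem_servers
  algorithm = var.algorithm
}
```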

Regardless of whether generating a server or client TLS certificate, both need to go through the ‘certificate request’ process, which is to;

  1. Generate a private key for the server or client
  2. Generate a certificate signing request based on the private key
  3. Use the CSR to obtain a CA-signed certificate

In the certificate request above, I made use of the try function to achieve a value priority in the following order;

  1. Resource-level
    • Do I have a value specific to the server or client?
  2. Class-level
    • Do I have a value specific to the target type?
  3. Module-level
    • Do I have a global default?
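Since try() accepts any number of arguments and returns the first one that evaluates without an error, the same priority could also be written without nesting. A sketch for a single field; the variable types are my guess at how the inputs are declared, and the client_countries local is purely illustrative.

```hcl
variable "clients" {
  type    = map(map(string)) # machine name => subject fields (assumed type)
  default = {}
}

variable "default_client_subject" {
  type    = map(string) # assumed type
  default = {}
}

variable "default_subject" {
  type    = map(string) # assumed type
  default = {}
}

locals {
  # Resource-level first, then class-level, then module-level.
  client_countries = {
    for name, subject in var.clients :
    name => try(subject.country, var.default_client_subject.country, var.default_subject.country)
  }
}
```

If none of the three levels defines the field, try() itself fails, so the module-level default acts as the required baseline.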

And each refers to a key / value pair that has the same shape for clients as for servers, where the key is the machine name and the value is the subject data. Here is a sample of the *.tfvars.json file that drives this behaviour.

{
  "clients": {
    "mbp": {
      "country": "GB",
      "locality": "GB",
      "org": "ZAI",
      "org_name": "ZAI",
      "province": "GB"
    }
  }
}

In an ideal (and secure) scenario, private keys should never be transmitted over the wire; instead, each machine generates its own key and CSR and transmits only the CSR. Since this is aimed at test environments, security is not a concern for me. Should I want to do the generation securely, I’ve exposed the following variable as a way to override the CSR generation.

variable "client_csrs" {
  type = map
  description = "csrs to use instead of generating them within this module"
  default = {}
}
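One way the override might be consumed, as a sketch rather than the module's actual wiring: prefer a CSR supplied by the caller and fall back to the generated one. The client_csr_pems local is a hypothetical name.

```hcl
locals {
  # A missing key in var.client_csrs raises an error, which try()
  # catches before falling back to the CSR generated in the module.
  client_csr_pems = {
    for name in keys(var.clients) :
    name => try(var.client_csrs[name], tls_cert_request.csr_client[name].cert_request_pem)
  }
}
```

The signing resource could then read `local.client_csr_pems[each.key]` for its cert_request_pem input.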

Getting the signed certificate

resource "tls_locally_signed_cert" "client" {
  for_each = var.clients
  cert_request_pem = tls_cert_request.csr_client[each.key].cert_request_pem
  ca_private_key_pem = tls_private_key.pem_ca.private_key_pem
  ca_cert_pem = tls_self_signed_cert.ca.cert_pem

  validity_period_hours = var.client_certificate_validity

  allowed_uses = [
    "digital_signature",
    "key_encipherment",
    "server_auth", # for server-side
    "client_auth", # for client-side
  ]
}

Once the *.csr is generated (or provided), I’m able to use the tls_locally_signed_cert resource type to connect that data with the CA Certificate for signing against the private key of the CA Certificate. The cert_request_pem, ca_private_key_pem and ca_cert_pem inputs allow me to do so using the raw PEM format, without needing to save to disk before passing the data in.

Relying on the data within the terraform state file allows me to also rule out any “external influence” when troubleshooting, as there will be only a single source of truth.

Adding either server_auth or client_auth (depending on use-case) to allowed_uses permits the use of the signed certificate for authentication, as required by OpenVPN.

Converting from *.PEM to PKCS12

resource "pkcs12_from_pem" "client" {
  for_each = var.clients
  ca_pem          = tls_self_signed_cert.ca.cert_pem
  cert_pem        = tls_locally_signed_cert.client[each.key].cert_pem
  private_key_pem = tls_private_key.pem_clients[each.key].private_key_pem
  password = "123" # Testing purposes
  encoding = "legacyRC2"
}

Using the pkcs12_from_pem resource type from chilicat makes this process simple, as long as I have access to the private key in addition to the certificate and ca.

For compatibility with the OpenVPN Connect application, I needed to enforce the encoding of legacyRC2, rather than the modern encryption that’s offered by easy-rsa.

Returning the certificates

output "client_certificates" {
  value = [ for cert in tls_locally_signed_cert.client : cert.cert_pem ]
  description = "generated client certificates in ordered list form"
  sensitive = true
}

Finally, I return the generated certificates and their *.p12 equivalent from the module. I mark this data as sensitive due to the inclusion of private keys.

For the value, I needed to iterate over a map of resources (as I had used the for_each input earlier to handle key/value pairs) and rebuild a single list from the result.

As mentioned above, it is then the responsibility of the user to determine what to do with the generated certificates, be it storing them locally or pushing them to AWS.

Authenticating DigitalOcean for Terraform OSS https://zai.dev/2023/12/05/authenticating-digitalocean-for-terraform-oss/ Tue, 05 Dec 2023 19:21:25 +0000 https://zai.dev/?p=542

Terraform DigitalOcean Provider with API tokens from DigitalOcean

Scenario

Why?

I’m diving into Terraform as part of my adventure into the DevOps world, which I’ve taken an interest in over the past few months.

  • I use 2 workstations with DigitalOcean
    • MacBook; for when I’m out and about
    • ArchLinux; for when I’m at home

Generating the API Tokens

Under API, located within the dashboard’s menu (on the left-hand side), I’m presented with the option to Generate New Token.

This is followed by an interface to define;

  • Name
    • I typically name this token zai.dev or personal, as it will be shared across my devices. While this approach isn’t the most secure (ideally, I should have one token per machine), I’m opting for the convenience of having one token for my user profile.
  • Expiry date
    • Since I’m sharing the token across workstations (including my laptop, which may be prone to theft), I set the expiration to the lowest possible value of 30 days.
  • Write permissions
    • Since I’ll be using Terraform, and its main purpose is to ‘sculpt’ infrastructure, the token it uses to connect to DigitalOcean needs write permissions.

Authenticating DigitalOcean Spaces

As the Terraform Provider allows the creation of Spaces, DigitalOcean’s equivalent to AWS S3 buckets, I should also create tokens for it. By navigating to the “Spaces Keys” tab under the API option, I can repeat the same steps as above.

Installing the Tokens

Continuing from the setup of environment variables in my Synchronizing environment variables across Workstations post, I need to add 3 environment variables for connecting to DigitalOcean.

  • DIGITALOCEAN_TOKEN
    • This is the value that is given to you after hitting “Generate Token” on the Tokens tab
  • SPACES_ACCESS_KEY_ID
    • This is the value that is given to you after hitting “Generate Token” on the Spaces Keys tab
  • SPACES_SECRET_ACCESS_KEY
    • This is the one-time value that is given to you alongside the SPACES_ACCESS_KEY_ID value
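
Before running Terraform, it can be handy to confirm that all three variables are actually present in the current shell. The following is a minimal sketch, not part of my actual setup; the function name check_do_env is made up for illustration, and it only checks that each variable is non-empty, not that the token is valid.

```shell
# Report any DigitalOcean variables that are missing from the environment.
# Returns 0 when all three are set and non-empty, 1 otherwise.
# (check_do_env is a hypothetical helper name; it uses plain globals for POSIX sh.)
check_do_env() {
  missing=0
  for name in DIGITALOCEAN_TOKEN SPACES_ACCESS_KEY_ID SPACES_SECRET_ACCESS_KEY; do
    # Look up the variable whose name is stored in $name (POSIX-safe indirection).
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  # Do not leave a copy of the last secret lying around in the shell.
  unset value
  return "${missing}"
}
```

Something like check_do_env && terraform plan would then refuse to start when a variable has not been exported.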

Whilst I’m at it, I’m going to add the following environment variables so that I can use any S3-compatible tool to communicate with my object storage, such as the AWS CLI’s s3 cp command for pushing build artifacts.

  • AWS_ACCESS_KEY_ID=${SPACES_ACCESS_KEY_ID}
  • AWS_SECRET_ACCESS_KEY=${SPACES_SECRET_ACCESS_KEY}
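
With those two variables exported, the AWS CLI picks up the Spaces credentials from the environment, and only the endpoint needs to be pointed at DigitalOcean. Below is a rough sketch assuming the AWS CLI is installed; the helper names spaces_endpoint and push_artifact, the Space name my-space and the region slug nyc3 are placeholders for illustration rather than values from my setup.

```shell
# Build the S3-compatible endpoint URL for a Spaces region slug (e.g. nyc3).
spaces_endpoint() {
  printf 'https://%s.digitaloceanspaces.com\n' "$1"
}

# Copy a local file into a Space using the AWS CLI.
# Usage: push_artifact <file> <space-name> <region-slug>
# The CLI reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment.
push_artifact() {
  aws s3 cp "$1" "s3://$2/" --endpoint-url "$(spaces_endpoint "$3")"
}
```

For example, push_artifact ./build/app.tar.gz my-space nyc3 would upload the archive to the root of the hypothetical my-space Space.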

To keep things tidy, I created a separate environment file for DigitalOcean, under ~/.config/zai/env/digitalocean.sh

# Credentials read by the Terraform DigitalOcean provider
export DIGITALOCEAN_TOKEN="<DO_TOKEN>"
export SPACES_ACCESS_KEY_ID="<SPACES_KEY>"
export SPACES_SECRET_ACCESS_KEY="<SPACES_SECRET>"
# Reuse the Spaces keys for S3-compatible tools
export AWS_ACCESS_KEY_ID="${SPACES_ACCESS_KEY_ID}"
export AWS_SECRET_ACCESS_KEY="${SPACES_SECRET_ACCESS_KEY}"
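
For a per-service file like this to take effect, something has to source it when a shell starts. The loader below is one possible sketch of that step, assuming every file in the directory ends in .sh; the function name load_env_dir is hypothetical, and the real mechanism is the one described in the Synchronizing environment variables across Workstations post.

```shell
# Source every *.sh file in an environment directory.
# Defaults to ~/.config/zai/env; pass another path as the first argument to override.
load_env_dir() {
  env_dir="${1:-$HOME/.config/zai/env}"
  for env_file in "${env_dir}"/*.sh; do
    # Skip the unexpanded glob when the directory has no matching files.
    [ -r "${env_file}" ] || continue
    . "${env_file}"
  done
}
```

Calling load_env_dir from a shell profile would export the variables from digitalocean.sh along with any other service files stored beside it.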