Mitigating SPOF: Migrating and Replicating Container Images

Published: Mar 8, 2022 by Isaac Johnson

Recently my cluster took another nose dive. This really panicked me as I cannot blog (easily) without it - my private Github runners are hosted in my cluster and, moreover, so is the container registry (Harbor). Harbor is, to put it mildly, finicky. It's a bit delicate and tends to fall down hard more often than I would like.

I do not fault Harbor, however. I beat the hell out of my cluster and it’s made out of commodity hardware (literally discarded laptops):

/content/images/2022/03/mycluster.png

To mitigate this risk, and I think it’s a good strategy for anyone who depends on private hosting of a container registry, I decided it was high time to set up some basic replication to another cloud provider.

Current Setup

Presently, we have a Github Action set up to build a new Container image whenever we change a Dockerfile.

/content/images/2022/03/crbackup-41.png

Looking at the actual Github Action flow, you might notice we have also added Datadog events for tracking and notifications.

name: GitHub Actions Demo
on:
  push:
    paths:
    - "**/Dockerfile"
jobs:
  HostedActions:
    runs-on: ubuntu-latest
    steps:
      - run: echo "🎉 The job was automatically triggered by a $ event."
      - run: echo "🐧 This job is now running on a $ server hosted by GitHub!"
      - run: echo "🔎 The name of your branch is $ and your repository is $."
      - name: Check out repository code
        uses: actions/checkout@v2
      - run: echo "💡 The $ repository has been cloned to the runner."
      - run: echo "🖥️ The workflow is now ready to test your code on the runner."
      - name: Check on Binaries
        run: |
          which az
          which aws
      - name: Build Dockerfile
        run: |
          export BUILDIMGTAG="`cat ghRunnerImage/Dockerfile | tail -n1 | sed 's/^.*\///g'`"
          docker build -t $BUILDIMGTAG ghRunnerImage
          docker images
      - name: Tag and Push
        run: |
          export BUILDIMGTAG="`cat ghRunnerImage/Dockerfile | tail -n1 | sed 's/^.*\///g'`"
          export FINALBUILDTAG="`cat ghRunnerImage/Dockerfile | tail -n1 | sed 's/^#//g'`"
          docker tag $BUILDIMGTAG $FINALBUILDTAG
          docker images
          echo $CR_PAT | docker login harbor.freshbrewed.science -u $CR_USER --password-stdin
          docker push $FINALBUILDTAG
        env: # Or as an environment variable
          CR_PAT: ${{ secrets.CR_PAT }}
          CR_USER: ${{ secrets.CR_USER }}
      - run: echo "🍏 This job's status is $."
      - name: Build count
        uses: masci/datadog@v1
        with:
          api-key: ${{ secrets.DATADOG_API_KEY }}
          metrics: |
            - type: "count"
              name: "dockerbuild.runs.count"
              value: 1.0
              host: ${{ github.repository_owner }}
              tags:
                - "project:${{ github.repository }}"
                - "branch:${{ github.head_ref }}"
  run-if-failed:
    runs-on: ubuntu-latest
    needs: HostedActions
    if: always() && (needs.HostedActions.result == 'failure')
    steps:
      - name: Datadog-Fail
        uses: masci/datadog@v1
        with:
          api-key: ${{ secrets.DATADOG_API_KEY }}
          events: |
            - title: "Failed building Dockerfile"
              text: "Branch ${{ github.head_ref }} failed to build"
              alert_type: "error"
              host: ${{ github.repository_owner }}
              tags:
                - "project:${{ github.repository }}"
      - name: Fail count
        uses: masci/datadog@v1
        with:
          api-key: ${{ secrets.DATADOG_API_KEY }}
          metrics: |
            - type: "count"
              name: "dockerbuild.fails.count"
              value: 1.0
              host: ${{ github.repository_owner }}
              tags:
                - "project:${{ github.repository }}"
                - "branch:${{ github.head_ref }}"

This works fine insofar as it builds the new image and pushes it to my private Harbor.
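For context, the tag parsing above relies on a convention: the last line of the Dockerfile is assumed to be a comment containing the fully qualified image tag. With a last line like the one below (just an illustration), the two sed pipelines resolve to the full and short tags respectively:

$ tail -n1 ghRunnerImage/Dockerfile
#harbor.freshbrewed.science/freshbrewedprivate/myghrunner:1.1.4

# FINALBUILDTAG: strip the leading '#' to get the full registry path and tag
$ tail -n1 ghRunnerImage/Dockerfile | sed 's/^#//g'
harbor.freshbrewed.science/freshbrewedprivate/myghrunner:1.1.4

# BUILDIMGTAG: keep only what follows the last '/', i.e. the short name:tag
$ tail -n1 ghRunnerImage/Dockerfile | sed 's/^.*\///g'
myghrunner:1.1.4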

Quick Note on DD Events

By tracking our Events in Datadog, I can see Azure DevOps stats commingled with Github Actions. Looking back 12 months, for instance, we can see when I transitioned from AzDO over to Github:

/content/images/2022/03/crbackup-46.png

Using Datadog CI tracking, we can track our Github Action workflows with more detail than the raw events:

/content/images/2022/03/crbackup-47.png

While CI tracking is a feature add-on, we can basically dig into our builds as if they were traces, which can be quite useful for large enterprises that have to contend with very long CI processes.

/content/images/2022/03/crbackup-48.png

SPOF issues

Back to the topic at hand. The issue, of course, arises when the cluster is down. I have a SPOF (Single Point of Failure). This needs mitigation.

To start with, let’s see if we can push some images to Clouds that offer low-cost or free Container Registries.

Create IBM CR

IBM Cloud may not be the biggest, but one thing it does offer is a very nice free tier Container Registry.

Login to IBM Cloud and go to Container Registry under Kubernetes:

/content/images/2022/03/crbackup-01.png

Here we see we presently do not have any repositories synced. To sync some, we’ll need an API key. We covered the steps in our last guide but let’s do it one more time.

Install the CLI

builder@DESKTOP-QADGF36:~$ which ibmcloud
builder@DESKTOP-QADGF36:~$ curl -fsSL https://clis.cloud.ibm.com/install/linux | sh
Current platform is linux64. Downloading corresponding IBM Cloud CLI...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.0M  100 12.0M    0     0  25.9M      0 --:--:-- --:--:-- --:--:-- 25.9M
Download complete. Executing installer...
Bluemix_CLI/
Bluemix_CLI/bin/
Bluemix_CLI/bin/ibmcloud
Bluemix_CLI/bin/ibmcloud.sig
Bluemix_CLI/bin/NOTICE
Bluemix_CLI/bin/LICENSE
Bluemix_CLI/bin/CF_CLI_Notices.txt
Bluemix_CLI/bin/CF_CLI_SLC_Notices.txt
Bluemix_CLI/autocomplete/
Bluemix_CLI/autocomplete/bash_autocomplete
Bluemix_CLI/autocomplete/zsh_autocomplete
Bluemix_CLI/install
Bluemix_CLI/uninstall
Bluemix_CLI/install_bluemix_cli
Superuser privileges are required to run this script.
[sudo] password for builder:
Install complete.
builder@DESKTOP-QADGF36:~$ which ibmcloud
/usr/local/bin/ibmcloud

Install the CR Plugin:

$ ibmcloud plugin install container-registry -r 'IBM Cloud'
Looking up 'container-registry' from repository 'IBM Cloud'...
Plug-in 'container-registry 0.1.560' found in repository 'IBM Cloud'
Attempting to download the binary file...
 10.41 MiB / 10.41 MiB [==========================================================================================================================================================================] 100.00% 0s
10915840 bytes downloaded
Installing binary...
OK
Plug-in 'container-registry 0.1.560' was successfully installed into /home/builder/.bluemix/plugins/container-registry. Use 'ibmcloud plugin show container-registry' to show its details.

Next, login:

$ ibmcloud login -a https://cloud.ibm.com
API endpoint: https://cloud.ibm.com

Email> isaac.johnson@gmail.com

Password>
Authenticating...
OK

Targeted account Isaac Johnson's Account (2189de6cf2374fcc8d59b2f019178a0b)


Select a region (or press enter to skip):
1. au-syd
2. in-che
3. jp-osa
4. jp-tok
5. kr-seo
6. eu-de
7. eu-gb
8. ca-tor
9. us-south
10. us-east
11. br-sao
Enter a number> 9
Targeted region us-south


API endpoint:      https://cloud.ibm.com
Region:            us-south
User:              isaac.johnson@gmail.com
Account:           Isaac Johnson's Account (2189de6cf2374fcc8d59b2f019178a0b)
Resource group:    No resource group targeted, use 'ibmcloud target -g RESOURCE_GROUP'
CF API endpoint:
Org:
Space:

Ensure our Region is US-South

$ ibmcloud cr region-set us-south
The region is set to 'us-south', the registry is 'us.icr.io'.

OK

Next we want to add a namespace (if not already there)

$ ibmcloud cr namespace-add freshbrewedcr
No resource group is targeted. Therefore, the default resource group for the account ('Default') is targeted.

Adding namespace 'freshbrewedcr' in resource group 'Default' for account Isaac Johnson's Account in registry us.icr.io...

The requested namespace is already owned by your account.

OK

Even though I am aware I created a "MyKey", I'll do it again (to rotate creds)

$ ibmcloud iam api-key-create MyKey -d "this is my API key" --file ibmAPIKeyFile
Creating API key MyKey under 2189de6cf2374fcc8d59b2f019178a0b as isaac.johnson@gmail.com...
OK
API key MyKey was created
Successfully save API key information to ibmAPIKeyFile

We can see the values (albeit obfuscated for the blog)

$ cat ibmAPIKeyFile | sed 's/[0-9]/0/g' | sed 's/apikey": ".*/apikey": "----masked-----","/g'
{
        "id": "ApiKey-d0000aad-0f00-0000-b0ef-0000f00e0000",
        "crn": "crn:v0:bluemix:public:iam-identity::a/0000de0cf0000fcc0d00b0f000000a0bF::apikey:ApiKey-d0000aad-0f00-0000-b0ef-0000f00e0000",
        "iam_id": "IBMid-000000Q0G0",
        "account_id": "0000de0cf0000fcc0d00b0f000000a0b",
        "name": "MyKey",
        "description": "this is my API key",
        "apikey": "----masked-----","
        "locked": false,
        "entity_tag": "0-e0e00a00000b0cb00000d000000a0e00",
        "created_at": "0000-00-00T00:00+0000",
        "created_by": "IBMid-000000Q0G0",
        "modified_at": "0000-00-00T00:00+0000"
}

Next I want to create a Principal - an IBM Service ID:

$ ibmcloud iam service-id-create newk3s-default-id --description "New Service ID for IBM Cloud Container Registry in Kubernetes Cluster k3s namespace default"
Creating service ID newk3s-default-id bound to current account as isaac.johnson@gmail.com...
OK
Service ID newk3s-default-id is created successfully

ID            ServiceId-asdfasdfasdf-asdfasdf-asdf-asdf-asdfasfasdfasdf
Name          newk3s-default-id
Description   New Service ID for IBM Cloud Container Registry in Kubernetes Cluster k3s namespace default
CRN           crn:v1:bluemix:public:iam-identity::a/asdfasdfasdfasdfasdfasdf::serviceid:ServiceId-asdfasfd-0a60-asfd-asdf-asdfasdfasdf
Version       1-asdfasdfasdfasdfasdfasdfas
Locked        false

We’ll bind that ID to a Manager role via an IAM Policy

$ ibmcloud iam service-policy-create newk3s-default-id --roles Manager
Creating policy under current account for service ID newk3s-default-id as isaac.johnson@gmail.com...
OK
Service policy is successfully created


Policy ID:   a8fd6b2b-0df2-4a0b-b668-313bd1111ade
Version:     1-a626e911650f4ab06b666d42d496305c
Roles:       Manager
Resources:
             Service Type   All resources in account
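One note: written this way, the policy grants Manager over all services in the account, as the output shows. If you want to scope it to just the Container Registry, the CLI takes a --service-name flag; something like this should work (shown as a sketch rather than what I ran):

$ ibmcloud iam service-policy-create newk3s-default-id --roles Manager --service-name container-registry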

Lastly, get an API Key so we can use it

$ ibmcloud iam service-api-key-create newk3s-default-key newk3s-default-id --description 'API key for service ID newk3s-default-id is Kubernetes cluster localk3s namespace default'
Creating API key newk3s-default-key of service ID newk3s-default-id under account 2189de6cf2374fcc8d59b2f019178a0b as isaac.johnson@gmail.com...
OK
Service ID API key newk3s-default-key is created

Please preserve the API key! It cannot be retrieved after it's created.

ID            ApiKey-asdfasdf-asdf-asdf-asdf-985asdfasdfb2f5e9015
Name          newk3s-default-key
Description   API key for service ID newk3s-default-id is Kubernetes cluster localk3s namespace default
Created At    2022-02-23T13:56+0000
API Key       ******************MYNEWAPIKEY******************
Locked        false

Harbor Replication to IBM Cloud

First, we need to create a new Harbor Registry Endpoint

/content/images/2022/03/crbackup-02.png

For the endpoint URL use the IBM CR endpoint we saw above when we ensured our region was US South (us.icr.io).

The username will be iamapikey and the “Access Secret” should match the API Key we just created (******MYNEWAPIKEY******).

/content/images/2022/03/crbackup-03.png

You can click “Test Connection” (as I did above) to see that the login works. Then save to add to endpoints

/content/images/2022/03/crbackup-04.png
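As a quick sanity check outside of Harbor, the same credentials can be tested with a plain docker login from any machine; the username is literally iamapikey and the password is the service ID API key we just created (masked here):

$ echo '******MYNEWAPIKEY******' | docker login us.icr.io -u iamapikey --password-stdin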

Under Administration/Replications, click “+ New Replication Rule” to create a new rule

We will use our CR and the namespace we created earlier (freshbrewedcr). In this case, I’m setting the Trigger mode to Manual (instead of Scheduled or Event Based).

/content/images/2022/03/crbackup-05.png

Next, let us replicate our images

/content/images/2022/03/crbackup-06.png

It will prompt us to confirm, then start an execution

/content/images/2022/03/crbackup-07.png

However, no format seemed to work.

/content/images/2022/03/crbackup-08.png

I just got 404 errors

/content/images/2022/03/crbackup-09.png

I even verified I was within my storage quota in settings:

/content/images/2022/03/crbackup-10.png

I should point out that IBM CR paid-tier pricing is pretty inexpensive:

/content/images/2022/03/crbackup-11.png

Replication to Ali Cloud

I tried the same with AliCloud

First, I created a valid endpoint

/content/images/2022/03/crbackup-12.png

Then a replication rule (I also tried "**" as well as images only)

/content/images/2022/03/crbackup-13.png

And that too failed

/content/images/2022/03/crbackup-14.png

Replication to ACR

Not willing to give up just yet, I tried one more time, this time to Azure.

First, I created a new Container Registry from “Create a Resource”

/content/images/2022/03/crbackup-49.png

Fill in a name, location, resource group and SKU

/content/images/2022/03/crbackup-50.png

Then click create

/content/images/2022/03/crbackup-51.png

When complete

/content/images/2022/03/crbackup-52.png

We can go to the CR and under Settings/Access keys, enable the admin user

/content/images/2022/03/crbackup-53.png

When enabled, we will see two passwords and the username (either password works).

/content/images/2022/03/crbackup-54.png
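If you prefer the CLI to the portal, the same registry and its admin credentials can be handled with az (the registry and resource group names here are just placeholders):

$ az acr create -n myfbregistry -g myResourceGroup --sku Basic --admin-enabled true
$ az acr credential show -n myfbregistry --query "{user:username, pass:passwords[0].value}" -o tsv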

Logging into Harbor, I’ll add a new Registry Endpoint.

In my first attempt, I used “Azure ACR” for the type (seems redundant to say Azure Azure Container Registry)

/content/images/2022/03/crbackup-55.png

Then I created a replication rule

/content/images/2022/03/crbackup-56.png

And invoked it

/content/images/2022/03/crbackup-57.png

The problem was that it did not copy to the destination. For whatever reason it copied from the source right back to the same target, that target being my Harbor registry. It has to be some sort of odd bug.

I went back and created a new Endpoint, this time using a standard “Docker Registry” provider

/content/images/2022/03/crbackup-58.png

I just changed the destination in the replication rule to use “AzureCR2” and, upon invoking it manually, it worked just fine.

Checking ACR, I could see the sample repo was successfully synced over

/content/images/2022/03/crbackup-59.png

Going back to the replication rule, I wanted to set it up to sync just the ghrunner image that I’ve been building in the Github Actions workflow.

Besides the name (highlighted below), I also set the Trigger Mode to “Event Based” which triggers on a push (or delete).

/content/images/2022/03/crbackup-60.png

Upon pushing the image in the Github Actions build, I saw it start a sync to ACR in the Executions pane

/content/images/2022/03/crbackup-61.png

Checking the completions, I could see more details

/content/images/2022/03/crbackup-62.png

And it completed in minutes (instead of the seconds the earlier sample sync took)

/content/images/2022/03/crbackup-63.png

Checking ACR, I can see the new image created

/content/images/2022/03/crbackup-64.png

Including dates and layers if I check the manifest for the Container Repository

/content/images/2022/03/crbackup-65.png

It will take some days to gather costs, but at $0.167/day for the ACR Basic SKU, provided I stay in my 10Gb allotment, I might get a bill for $5/month or so. The Standard SKU will run you $20/mo in Central US (for 100Gb) and just over $50/mo for Premium (with 500Gb and a few more bells and whistles).

Manual Replication to IBM and Ali Cloud

If we cannot get Harbor to do it, we can do it manually

/content/images/2022/03/crbackup-42.png

Just for sanity, let’s force our latest images over, even if done by hand.

First, I’ll do IBM Cloud

$ ibmcloud cr login
Logging in to 'us.icr.io'...
Logged in to 'us.icr.io'.

OK

Then I’ll login to AliCloud

$ docker login --username=isaac.johnson@gmail.com registry-intl.cn-hangzhou.aliyuncs.com
Password:
Login Succeeded

Logging in with your password grants your terminal complete access to your account.
For better security, log in with a limited-privilege personal access token. Learn more at https://docs.docker.com/go/access-tokens/

And of course, to access private CR images, I have to log in to Harbor as well

$ docker login --username=myusername harbor.freshbrewed.science
Password:
Login Succeeded

Logging in with your password grants your terminal complete access to your account.
For better security, log in with a limited-privilege personal access token. Learn more at https://docs.docker.com/go/access-tokens/

In Harbor, I’ll get the latest image from each repository. The pull command can simply be copied to the clipboard

/content/images/2022/03/crbackup-15.png

For each latest tag we:

Pull an image

$ docker pull harbor.freshbrewed.science/freshbrewedprivate/hello-argo@sha256:d524aa802151c2db01ecef2a58526b6759e1b32a79e3c84d0574aadb464338f1
harbor.freshbrewed.science/freshbrewedprivate/hello-argo@sha256:d524aa802151c2db01ecef2a58526b6759e1b32a79e3c84d0574aadb464338f1: Pulling from freshbrewedprivate/hello-argo
59bf1c3509f3: Pull complete
Digest: sha256:d524aa802151c2db01ecef2a58526b6759e1b32a79e3c84d0574aadb464338f1
Status: Downloaded newer image for harbor.freshbrewed.science/freshbrewedprivate/hello-argo@sha256:d524aa802151c2db01ecef2a58526b6759e1b32a79e3c84d0574aadb464338f1
harbor.freshbrewed.science/freshbrewedprivate/hello-argo@sha256:d524aa802151c2db01ecef2a58526b6759e1b32a79e3c84d0574aadb464338f1

Then we can tag and push to our namespace in AliCloud:

$ docker tag harbor.freshbrewed.science/freshbrewedprivate/hello-argo@sha256:d524aa802151c2db01ecef2a58526b6759e1b32a79e3c84d0574aadb464338f1 registry-intl.cn-hangzhou.aliyuncs.com/freshbrewed/hello-argo:latest

$ docker push registry-intl.cn-hangzhou.aliyuncs.com/freshbrewed/hello-argo:latest
The push refers to repository [registry-intl.cn-hangzhou.aliyuncs.com/freshbrewed/hello-argo]
8d3ac3489996: Pushing [=======================>                           ]  2.604MB/5.586MB

# later 

$ docker push registry-intl.cn-hangzhou.aliyuncs.com/freshbrewed/hello-argo:latest
The push refers to repository [registry-intl.cn-hangzhou.aliyuncs.com/freshbrewed/hello-argo]
8d3ac3489996: Pushed
latest: digest: sha256:d524aa802151c2db01ecef2a58526b6759e1b32a79e3c84d0574aadb464338f1 size: 528

IBM Cloud as well

$ docker tag harbor.freshbrewed.science/freshbrewedprivate/hello-argo@sha256:d524aa802151c2db01ecef2a58526b6759e1b32a79e3c84d0574aadb464338f1 us.icr.io/freshbrewedcr/hello-argo:latest

$ docker push us.icr.io/freshbrewedcr/hello-argo:latest
The push refers to repository [us.icr.io/freshbrewedcr/hello-argo]
8d3ac3489996: Pushed
latest: digest: sha256:d524aa802151c2db01ecef2a58526b6759e1b32a79e3c84d0574aadb464338f1 size: 528

That was a small image; next is the larger (and more important) one, my GH Runner. This has a fixed tag I use, so I should ensure I push with that. This image alone would eclipse my free storage in IBM CR, so I will just push it to Ali Cloud (Hangzhou).
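If you are unsure how close you are to those limits, the CR plugin can report the quota and current usage before you push anything:

$ ibmcloud cr quota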

/content/images/2022/03/crbackup-16.png


$ docker pull harbor.freshbrewed.science/freshbrewedprivate/myghrunner@sha256:6d350a042d79213291ec25ec6ddf26161ba9766523776669035b630e6cfccf6a
harbor.freshbrewed.science/freshbrewedprivate/myghrunner@sha256:6d350a042d79213291ec25ec6ddf26161ba9766523776669035b630e6cfccf6a: Pulling from freshbrewedprivate/myghrunner
7b1a6ab2e44d: Already exists
55dd4dfd1d49: Pull complete
f6b94392ad06: Pull complete
65229a11bf89: Pull complete
8bb28339f4b9: Pull complete
636988ecc88a: Pull complete
96ef816f01dc: Pull complete
1390e4b4b99f: Pull complete
1797fb937420: Pull complete
2187f1c13de7: Pull complete
a48278b9a27e: Pull complete
0c663509afae: Pull complete
95d63a2e2f97: Pull complete
16af35381f27: Pull complete
f1c9dde1cd07: Pull complete
90694ba1c517: Pull complete
4a1b598016fd: Pull complete
Digest: sha256:6d350a042d79213291ec25ec6ddf26161ba9766523776669035b630e6cfccf6a
Status: Downloaded newer image for harbor.freshbrewed.science/freshbrewedprivate/myghrunner@sha256:6d350a042d79213291ec25ec6ddf26161ba9766523776669035b630e6cfccf6a
harbor.freshbrewed.science/freshbrewedprivate/myghrunner@sha256:6d350a042d79213291ec25ec6ddf26161ba9766523776669035b630e6cfccf6a
builder@DESKTOP-QADGF36:~$ docker tag harbor.freshbrewed.science/freshbrewedprivate/myghrunner@sha256:6d350a042d79213291ec25ec6ddf26161ba9766523776669035b630e6cfccf6a registry-intl.cn-hangzhou.aliyuncs.com/freshbrewed/myghrunner:1.1.4
builder@DESKTOP-QADGF36:~$ docker push registry-intl.cn-hangzhou.aliyuncs.com/freshbrewed/myghrunner:1.1.4
The push refers to repository [registry-intl.cn-hangzhou.aliyuncs.com/freshbrewed/myghrunner]
7b04bab33dc8: Pushed
2765bcf5a4cf: Pushing [>                                                  ]  1.617MB/212.1MB
8f353eccf0c6: Pushed
0eb265869108: Pushed
d0ce75e79200: Pushed
ad981eb5b164: Pushing [>                                                  ]  3.849MB/1.227GB
c125104d22bb: Pushing [==================================================>]  3.584kB
faab2751ac3f: Waiting
f1b9ccc8c9bc: Waiting
1addd6a3aaed: Waiting
0152563ce9e4: Waiting
cde6f8325730: Waiting
29a611cadaf9: Waiting
6d22ac30d31a: Preparing
b79a45cdc0c0: Waiting
c4b3d29e5837: Waiting
9f54eef41275: Waiting

This took well over 30m. However, in the end it was saved out to a Hangzhou-based CR

/content/images/2022/03/crbackup-17.png

I can ensure that this repo is moved to private by clicking edit and changing from public to private

/content/images/2022/03/crbackup-18.png

Disaster Recovery (DR)

So how might we use that image?

We can see the image in use in the RunnerDeployment object in k8s:

$ kubectl get runnerdeployment my-jekyllrunner-deployment -o yaml | tail -n 14 | head -n4
      image: harbor.freshbrewed.science/freshbrewedprivate/myghrunner:1.1.3
      imagePullPolicy: IfNotPresent
      imagePullSecrets:
      - name: myharborreg

The key thing here is I’ll want to create an image pull secret and then use the Ali Cloud image

$ kubectl create secret docker-registry alicloud --docker-server=registry-intl.cn-hangzhou.aliyuncs.com --docker-username=isaac.johnson@gmail.com --docker-password=myalipassword --docker-email=isaac.johnson@gmail.com
secret/alicloud created

Now I can change my RunnerDeployment to use AliCloud:

builder@DESKTOP-QADGF36:~/Workspaces/jekyll-blog$ kubectl get runnerdeployment my-jekyllrunner-deployment -o yaml > my-jekyllrunner-deployment.yaml
builder@DESKTOP-QADGF36:~/Workspaces/jekyll-blog$ kubectl get runnerdeployment my-jekyllrunner-deployment -o yaml > my-jekyllrunner-deployment.yaml.bak
builder@DESKTOP-QADGF36:~/Workspaces/jekyll-blog$ vi my-jekyllrunner-deployment.yaml
builder@DESKTOP-QADGF36:~/Workspaces/jekyll-blog$ diff my-jekyllrunner-deployment.yaml.bak my-jekyllrunner-deployment.yaml
39c39
<       image: harbor.freshbrewed.science/freshbrewedprivate/myghrunner:1.1.3
---
>       image: registry-intl.cn-hangzhou.aliyuncs.com/freshbrewed/myghrunner:1.1.4
42c42
<       - name: myharborreg
---
>       - name: alicloud
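Editing the exported YAML works; an equivalent approach (untested here, and assuming the same field layout shown above) would be a one-line merge patch against the RunnerDeployment:

$ kubectl patch runnerdeployment my-jekyllrunner-deployment --type merge \
    -p '{"spec":{"template":{"spec":{"image":"registry-intl.cn-hangzhou.aliyuncs.com/freshbrewed/myghrunner:1.1.4","imagePullSecrets":[{"name":"alicloud"}]}}}}'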

If there is any question about tags in AliCloud, one can look in the Tags section at the bottom to see the tag names

/content/images/2022/03/crbackup-19.png

Now just apply

$ kubectl apply -f my-jekyllrunner-deployment.yaml
runnerdeployment.actions.summerwind.dev/my-jekyllrunner-deployment configured

And now we can see it pulling the image:

$ kubectl get pods | tail -n 5
my-jekyllrunner-deployment-phkdv-hs97z                   2/2     Running             0                3m20s
my-jekyllrunner-deployment-phkdv-96v49                   2/2     Running             0                3m20s
my-jekyllrunner-deployment-z25qh-2xcrb                   0/2     ContainerCreating   0                51s
my-jekyllrunner-deployment-z25qh-twjm6                   0/2     ContainerCreating   0                51s
my-jekyllrunner-deployment-z25qh-qd8sq                   0/2     ContainerCreating   0                51s

$ kubectl describe pod my-jekyllrunner-deployment-z25qh-2xcrb | tail -n5
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  45s   default-scheduler  Successfully assigned default/my-jekyllrunner-deployment-z25qh-2xcrb to isaac-macbookpro
  Normal  Pulling    43s   kubelet            Pulling image "registry-intl.cn-hangzhou.aliyuncs.com/freshbrewed/myghrunner:1.1.4"

And a while later, we see the pods do come up:

$ kubectl get pods | tail -n 5
my-jekyllrunner-deployment-z25qh-2xcrb                   2/2     Running             1                 15h
my-jekyllrunner-deployment-z25qh-qd8sq                   2/2     Running             1                 15h
my-jekyllrunner-deployment-z25qh-twjm6                   2/2     Running             1                 15h
harbor-registry-harbor-database-0                        1/1     Running             5 (15h ago)       2d14h
my-runner-deployment-njn2k-szd5p                         2/2     Running             0                 12h

And verifying it is using the new image

$ kubectl describe pod my-jekyllrunner-deployment-z25qh-2xcrb | grep Image:
    Image:          registry-intl.cn-hangzhou.aliyuncs.com/freshbrewed/myghrunner:1.1.4
    Image:          docker:dind

Saving to File storage

There is another way to save out files for transport and archive - that is to dump them to disk.

/content/images/2022/03/crbackup-43.png

For instance, if I have an image I pulled, I can easily tgz it and save to external file storage.

For example, first I’ll list the ghrunner images (and if need be, do a “docker pull <image>”)

$ docker images | grep ghrunner
registry-intl.cn-hangzhou.aliyuncs.com/freshbrewed/myghrunner   1.1.4                                                   2d1863525ab8   2 months ago    2.44GB
harbor.freshbrewed.science/freshbrewedprivate/myghrunner        <none>                                                  2d1863525ab8   2 months ago    2.44GB
harbor.freshbrewed.science/freshbrewedprivate/myghrunner        1.1.3                                                   585b0d6e69a0   2 months ago    2.47GB
harbor.freshbrewed.science/freshbrewedprivate/myghrunner        1.1.1                                                   670715c8c9a2   2 months ago    2.26GB
harbor.freshbrewed.science/freshbrewedprivate/myghrunner        1.1.2                                                   670715c8c9a2   2 months ago    2.26GB
harbor.freshbrewed.science/freshbrewedprivate/myghrunner        1.1.0                                                   16fb58a29b01   2 months ago    2.21GB

Using docker save we can dump a local container image to a tarball. To save space, we’ll gzip it en route.

$ docker save registry-intl.cn-hangzhou.aliyuncs.com/freshbrewed/myghrunner:1.1.4 | gzip > /mnt/c/Users/isaac/Documents/myghrunner.docker.1.1.4.tgz

$ ls -lh /mnt/c/Users/isaac/Documents/myghrunner.docker.1.1.4.tgz
-rwxrwxrwx 1 builder builder 593M Feb 24 07:46 /mnt/c/Users/isaac/Documents/myghrunner.docker.1.1.4.tgz

Then I can just save it to the storage of my choice, be it OneDrive, Box, Google Drive, etc.

/content/images/2022/03/crbackup-20.png

When it comes time to load it back, we can just bring the tgz down, then load and push to a new CR:

$ docker tag registry-intl.cn-hangzhou.aliyuncs.com/freshbrewed/myghrunner:1.1.4 mynewharbor.freshbrewed.science/freshbrewedprivate/ghrunner:1.1.4 && docker push mynewharbor.freshbrewed.science/freshbrewedprivate/ghrunner:1.1.4
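That tag-and-push assumes the image has already been loaded back into the local docker cache; the load step itself (docker load handles the gzipped archive directly) would be roughly:

$ docker load -i /mnt/c/Users/isaac/Documents/myghrunner.docker.1.1.4.tgz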

For some size context: my Docker install reports the full ghrunner image as 2.44Gb. The gzipped tar is 592Mb and the uncompressed tarball is 2.35Gb.

/content/images/2022/03/crbackup-21.png

Why file storage might be preferred

If we are to assume that these images are only needed in an emergency situation, it makes far more sense to just back them up to a file store.

For one, consider costs. In Azure, as mentioned earlier on the Basic, Standard and Premium SKUs, we get 10, 100 and 500Gb of included storage for $5/mo, $20/mo and $50/mo respectively. However, if we used standard GPv2 Hot LRS Block Blob storage in Central US, we would pay US$2.08/mo today for 100Gb of storage, $20.80 for 1Tb, and $41.60 for 2Tb. Using storage lifecycle policies, one could really pack on the savings by moving to archive storage.

If just saving for rainy days, one can get 10Tb of Archive storage for US$10/mo as of this writing:

/content/images/2022/03/crbackup-21.png

And if curious, retrieving, say, 50Gb of that Archive storage would cost about $1, so it’s not egregious to fetch an image in an emergency situation.
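As a rough sketch with the az CLI (the storage account and container names below are placeholders, and you would still need rights on the account, here assumed via --auth-mode login), the tarball could go straight into the Archive tier:

$ az storage blob upload \
    --auth-mode login \
    --account-name mybackupsa \
    --container-name dockerbackups \
    --name myghrunner.docker.1.1.4.tgz \
    --file /mnt/c/Users/isaac/Documents/myghrunner.docker.1.1.4.tgz \
    --tier Archive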

Other ways to store images

One could download and install the OSS versions of Nexus or the Community edition of Artifactory.

I did set up a local Nexus to try:

/content/images/2022/03/crbackup-22.png

However, the underlying issue is that both Nexus and JFrog Artifactory are large Java apps with a fair number of steps to set up TLS. Beyond which, if I used self-signed certs, I might have challenges getting my local docker client to accept them. Docker, nowadays, really hates using HTTP endpoints. There are ways to set up the client to use insecure registries, but then you have to deal with Kubernetes using them as well.
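For reference, the docker side of that is an entry in /etc/docker/daemon.json and, on k3s, a registries.yaml on each node (the host and port below are hypothetical; I did not end up going this route):

$ cat /etc/docker/daemon.json
{
  "insecure-registries": ["192.168.1.32:5000"]
}

$ cat /etc/rancher/k3s/registries.yaml
mirrors:
  "192.168.1.32:5000":
    endpoint:
      - "http://192.168.1.32:5000"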

That said, we can set up Nexus OSS and it will have a default Maven releases endpoint for artifacts:

/content/images/2022/03/crbackup-23.png

Since I’m in-network, I can just use the WebUI to upload the TGZ’ed docker image

/content/images/2022/03/crbackup-24.png

after which I see it listed:

/content/images/2022/03/crbackup-25.png

and could download with:

$ wget http://192.168.1.32:8081/repository/maven-releases/dockerbackup/ghrunner/1.1.4/ghrunner-1.1.4-docker.tgz

I can also use curl to upload a file:

$ docker save myrunner:1.1.3 | gzip > /mnt/c/Users/isaac/Documents/myghrunner.docker.1.1.3.tgz
$ curl -u admin:MYPASSWORD --upload-file /mnt/c/Users/isaac/Documents/myghrunner.docker.1.1.3.tgz http://192.168.1.32:8081/repository/maven-releases/dockerbackup/ghrunner/1.1.3/ghrunner-1.1.3.tgz

and we can see it is there now

/content/images/2022/03/crbackup-26.png

Build to NAS

The following is particular to a Synology NAS. I’ve had magnificent luck with mine and highly recommend them.
(I get nothing for saying that, I just really like their products. I’ve had a DS216play with WD Red WD60EFRX drives for 6 years now that has had zero issues. The only downside is that Plex transcoding is limited on that model.)

Setting up User, Share and SFTP in Synology DiskStation

First, I’ll create a new user in the NAS just for this purpose

/content/images/2022/03/crbackup-27.png

And disable the ability to change password

/content/images/2022/03/crbackup-28.png

In the user add wizard, I’ll skip adding to any shares, but I will enable FTP usage

/content/images/2022/03/crbackup-29.png

Next, I’ll create a new shared folder and make sure it is not restricted to only admins (it is by default)

/content/images/2022/03/crbackup-30.png

Upon creation, I can add my builder user for R/W access

/content/images/2022/03/crbackup-31.png

Lastly, under File Services, enable SFTP (if not already) and click apply

/content/images/2022/03/crbackup-32.png

We can validate our local ghRunner can reach it by exec’ing into the pod and trying to upload a file

$ kubectl exec -it my-jekyllrunner-deployment-z25qh-2xcrb -- /bin/sh
Defaulted container "runner" out of: runner, docker
$ cd /tmp
$ touch asdf
$ sftp builder@sassynassy
builder@sassynassy's password:
Connected to sassynassy.
sftp> cd dockerBackup
sftp> put asdf
Uploading asdf to /dockerBackup/asdf
asdf                                                                                             100%    0     0.0KB/s   00:00
sftp> exit

I’ll need to save the password to GH Secrets so I can use it in the pipeline later

/content/images/2022/03/crbackup-33.png
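As an alternative to clicking through the UI, the gh CLI can set the secret as well; SSHPASS is just the name I'm assuming to match the environment variable used in the workflow, and the command will prompt for the value:

$ gh secret set SSHPASS --repo idjohnson/jekyll-blog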

I also added a block in the Dockerfile to add expect and sshpass

# Install Expect and SSHPass

RUN sudo apt update -y \
  && umask 0002 \
  && sudo apt install -y sshpass expect

which spurred a fresh image build. This, in the end, is what we will modify to back up to a NAS on build

/content/images/2022/03/crbackup-34.png

Using that updated image, we can leverage sshpass to upload the built container:

Assuming my externally exposed SFTP is 73.242.50.55 on port 12345 (it isn’t, fyi)


      - name: SFTP
        run: |
          export BUILDIMGTAG="`cat ghRunnerImage/Dockerfile | tail -n1 | sed 's/^.*\///g'`"
          export FILEOUT="`cat ghRunnerImage/Dockerfile | tail -n1 | sed 's/^.*\///g' | sed 's/:/-/g'`".tgz
          docker save $BUILDIMGTAG | gzip > $FILEOUT
          
          mkdir ~/.ssh || true
          ssh-keyscan -p 12345 73.242.50.55 >> ~/.ssh/known_hosts
          sshpass -e sftp -P 12345 -oBatchMode=no -b - builder@73.242.50.55 << !
            cd dockerBackup
            put $FILEOUT
            bye
          !
        env: # Or as an environment variable
          SSHPASS: ${{ secrets.SSHPASS }}

Here we can see it actually ran and uploaded from the ephemeral Github-provided Actions Runner:

/content/images/2022/03/crbackup-35.png

And we can see it is backed up

/content/images/2022/03/crbackup-36.png

And if we wanted to trust the existing runner to push the new runner image, we could use the local IP (or hostname) and move the upload to a follow-up stage:

name: GitHub Actions Demo
on:
  push:
    paths:
    - "**/Dockerfile"
jobs:
  HostedActions:
    runs-on: ubuntu-latest
    steps:
      - run: echo "🎉 The job was automatically triggered by a $ event."
      - run: echo "🐧 This job is now running on a $ server hosted by GitHub!"
      - run: echo "🔎 The name of your branch is $ and your repository is $."
      - name: Check out repository code
        uses: actions/checkout@v2
      - run: echo "💡 The $ repository has been cloned to the runner."
      - run: echo "🖥️ The workflow is now ready to test your code on the runner."
      - name: Check on Binaries
        run: |
          which az
          which aws
      - name: Build Dockerfile
        run: |
          export BUILDIMGTAG="`cat ghRunnerImage/Dockerfile | tail -n1 | sed 's/^.*\///g'`"
          docker build -t $BUILDIMGTAG ghRunnerImage
          docker images
      - name: Tag and Push
        run: |
          export BUILDIMGTAG="`cat ghRunnerImage/Dockerfile | tail -n1 | sed 's/^.*\///g'`"
          export FINALBUILDTAG="`cat ghRunnerImage/Dockerfile | tail -n1 | sed 's/^#//g'`"
          docker tag $BUILDIMGTAG $FINALBUILDTAG
          docker images
          echo $CR_PAT | docker login harbor.freshbrewed.science -u $CR_USER --password-stdin
          docker push $FINALBUILDTAG
        env: # Or as an environment variable
          CR_PAT: ${{ secrets.CR_PAT }}
          CR_USER: ${{ secrets.CR_USER }}
      - name: Build count
        uses: masci/datadog@v1
        with:
          api-key: ${{ secrets.DATADOG_API_KEY }}
          metrics: |
            - type: "count"
              name: "dockerbuild.runs.count"
              value: 1.0
              host: ${{ github.repository_owner }}
              tags:
                - "project:${{ github.repository }}"
                - "branch:${{ github.head_ref }}"
      - run: echo "🍏 This job's status is ${{ job.status }}."
  push-to-nas:
    runs-on: self-hosted
    needs: HostedActions
    steps:
      - name: Check out repository code
        uses: actions/checkout@v2
      - name: Pull From Tag
        run: |
          export BUILDIMGTAG="`cat ghRunnerImage/Dockerfile | tail -n1 | sed 's/^.*\///g'`"
          export FINALBUILDTAG="`cat ghRunnerImage/Dockerfile | tail -n1 | sed 's/^#//g'`"
          docker images
          echo $CR_PAT | docker login harbor.freshbrewed.science -u $CR_USER --password-stdin
          docker pull $FINALBUILDTAG
        env: # Or as an environment variable
          CR_PAT: ${{ secrets.CR_PAT }}
          CR_USER: ${{ secrets.CR_USER }}
      - name: SFTP Locally
        run: |
          export BUILDIMGTAG="`cat ghRunnerImage/Dockerfile | tail -n1 | sed 's/^.*\///g'`"
          export FILEOUT="`cat ghRunnerImage/Dockerfile | tail -n1 | sed 's/^.*\///g' | sed 's/:/-/g'`".tgz
          export FINALBUILDTAG="`cat ghRunnerImage/Dockerfile | tail -n1 | sed 's/^#//g'`"

          docker save $FINALBUILDTAG | gzip > $FILEOUT
          
          mkdir ~/.ssh || true
          ssh-keyscan sassynassy >> ~/.ssh/known_hosts
          sshpass -e sftp -oBatchMode=no -b - builder@sassynassy << !
            cd dockerBackup
            put $FILEOUT
            bye
          !
        env: # Or as an environment variable
          SSHPASS: ${{ secrets.SSHPASS }}
  run-if-failed:
    runs-on: ubuntu-latest
    needs: [HostedActions, push-to-nas]
    if: always() && (needs.HostedActions.result == 'failure' || needs.push-to-nas.result == 'failure')
    steps:
      - name: Datadog-Fail
        uses: masci/datadog@v1
        with:
          api-key: ${{ secrets.DATADOG_API_KEY }}
          events: |
            - title: "Failed building Dockerfile"
              text: "Branch ${{ github.head_ref }} failed to build"
              alert_type: "error"
              host: ${{ github.repository_owner }}
              tags:
                - "project:${{ github.repository }}"
      - name: Fail count
        uses: masci/datadog@v1
        with:
          api-key: ${{ secrets.DATADOG_API_KEY }}
          metrics: |
            - type: "count"
              name: "dockerbuild.fails.count"
              value: 1.0
              host: ${{ github.repository_owner }}
              tags:
                - "project:${{ github.repository }}"
                - "branch:${{ github.head_ref }}"

The new block being push-to-nas:


  push-to-nas:
    runs-on: self-hosted
    needs: HostedActions
    steps:
      - name: Check out repository code
        uses: actions/checkout@v2
      - name: Pull From Tag
        run: |
          export BUILDIMGTAG="`cat ghRunnerImage/Dockerfile | tail -n1 | sed 's/^.*\///g'`"
          export FINALBUILDTAG="`cat ghRunnerImage/Dockerfile | tail -n1 | sed 's/^#//g'`"
          docker images
          echo $CR_PAT | docker login harbor.freshbrewed.science -u $CR_USER --password-stdin
          docker pull $FINALBUILDTAG
        env: # Or as an environment variable
          CR_PAT: ${{ secrets.CR_PAT }}
          CR_USER: ${{ secrets.CR_USER }}
      - name: SFTP Locally
        run: |
          export BUILDIMGTAG="`cat ghRunnerImage/Dockerfile | tail -n1 | sed 's/^.*\///g'`"
          export FILEOUT="`cat ghRunnerImage/Dockerfile | tail -n1 | sed 's/^.*\///g' | sed 's/:/-/g'`".tgz
          export FINALBUILDTAG="`cat ghRunnerImage/Dockerfile | tail -n1 | sed 's/^#//g'`"

          docker save $FINALBUILDTAG | gzip > $FILEOUT
          
          mkdir ~/.ssh || true
          ssh-keyscan sassynassy >> ~/.ssh/known_hosts
          sshpass -e sftp -oBatchMode=no -b - builder@sassynassy << !
            cd dockerBackup
            put $FILEOUT
            bye
          !
        env: # Or as an environment variable
          SSHPASS: ${{ secrets.SSHPASS }}

Which we can see run

/content/images/2022/03/crbackup-37.png

This clearly takes longer, as we have to wait for and invoke an on-prem agent to pull the large image and then push it to the local NAS.

/content/images/2022/03/crbackup-38.png

18 minutes instead of 11:

/content/images/2022/03/crbackup-39.png

However, this is a lot more secure as we do not need to expose SFTP on the public internet for the Github-provided ephemeral agents to use.

And because I now save the images to a local NAS, I can pull them down later to load into any CR I desire

/content/images/2022/03/crbackup-40.png
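A restore from the NAS would look roughly like this (the filename matches the $FILEOUT naming from the workflow, and the target registry below is just an example):

$ sftp builder@sassynassy:dockerBackup/myghrunner-1.1.4.tgz .
$ docker load -i myghrunner-1.1.4.tgz
$ docker tag harbor.freshbrewed.science/freshbrewedprivate/myghrunner:1.1.4 us.icr.io/freshbrewedcr/myghrunner:1.1.4
$ docker push us.icr.io/freshbrewedcr/myghrunner:1.1.4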

Summary

We covered setting up CRs in IBM Cloud, Ali Cloud and Azure, then trying Harbor replication. With ACR we successfully triggered a post-build replication of our ghrunner image. For IBM and Ali Cloud, we showed manual replication and how to use them in a DR situation.

We worked through saving container images as files and backing up to a NAS manually. Then we automated the process by building and pushing both the image to a private CR and saving the tgz of the image to a remote FTP site (also on-prem).

Lastly, we finalized the process (albeit with a slight performance hit) by moving the packaging and transmission of the backup image onto the same on-prem Github runner whose image we are building.

/content/images/2022/03/crbackup-44.png

Our ultimate goal would be a turn-key build that would not only build the image but also update the runner itself (the very thing running the build).

In order to really be successful on that, however, we would need to create proper tests with a phased rollout (lest we take down the runner deployment and have to manually back it out to fix).

/content/images/2022/03/crbackup-45.png

One approach would be to have two deployments. The “staging” runners would use the candidate image in a staging RunnerDeployment.

A staging RunnerDeployment (based on my existing my-jekyllrunner-deployment) might look like:
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: staging-deployment
  namespace: default
spec:
  replicas: 1
  selector: null
  template:
    metadata: {}
    spec:
      dockerEnabled: true
      dockerdContainerResources: {}
      env:
      - name: AWS_DEFAULT_REGION
        value: us-east-1
      - name: AWS_ACCESS_KEY_ID
        valueFrom:
          secretKeyRef:
            key: USER_NAME
            name: awsjekyll
      - name: AWS_SECRET_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            key: PASSWORD
            name: awsjekyll
      - name: DATADOG_API_KEY
        valueFrom:
          secretKeyRef:
            key: DDAPIKEY
            name: ddjekyll
      image: harbor.freshbrewed.science/freshbrewedprivate/myghrunner:$NEWIMAGE
      imagePullPolicy: Always
      imagePullSecrets:
      - name: myharborreg
      labels:
      - staging-deployment
      repository: idjohnson/jekyll-blog
      resources: {}

Then in our validation flow before promotion we would use the custom label in our testing stage

jobs:
  validation:
    runs-on: staging-deployment

I should also point out that the SummerWind style RunnerDeployment does have a field “dockerRegistryMirror” that, provided you use container registry repositories with public endpoints, would allow a fallback endpoint. It is really intended to work around docker.io rate-limiting, but would likely be sufficient for DR on a private registry.


      # Optional Docker registry mirror
      # Docker Hub has an aggressive rate-limit configuration for free plans.
      # To avoid disruptions in your CI/CD pipelines, you might want to setup an external or on-premises Docker registry mirror.
      # More information:
      # - https://docs.docker.com/docker-hub/download-rate-limit/
      # - https://cloud.google.com/container-registry/docs/pulling-cached-images
      dockerRegistryMirror: https://mirror.gcr.io/

But that is a topic for a follow-up blog coming soon that will cover next steps in testing and rollout, as well as further cost mitigation strategies.

harbor ibmcloud alicloud synology nas dr github datadog
