Adding trusted certs to nodes on TKGS 7.0 U2

A new feature added to TKGS as of 7.0 Update 2 is support for adding private SSL certificates to the “trust” on TKG cluster nodes.

This is very important as it finally provides a supported mechanism to use on-premises Harbor and other image registries.

It’s done by adding the encoded CAs to the “TkgServiceConfiguration”. The template for the TkgServiceConfiguration looks like this:

apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TkgServiceConfiguration
metadata:
  name: tkg-service-configuration
spec:
  defaultCNI: antrea
  proxy:
    httpProxy: http://<user>:<pwd>@<ip>:<port>

  trust:
    additionalTrustedCAs:
      - name: first-cert-name
        data: base64-encoded string of a PEM encoded public cert 1
      - name: second-cert-name
        data: base64-encoded string of a PEM encoded public cert 2

Notice that there are two new sections under spec; one for proxy and one for trust. This article is going to focus on trust for additional CAs.

If your registry uses a self-signed cert, you’ll just encode that cert itself. If you take advantage on an Enterprise CA or similar to sign your certs, you’d encoded and import the “signing”, “intermediate” and/or “root” CA.

Example

Let’s add the certificate for a standalone Harbor (not the built-in Harbor instance in TKGS, its certificate is already trusted)

Download the certificate by clicking the “Registry Certificate” link

Run base64 -i <ca file> to return the base64 encoded content:

Provide a simple name and copy and paste the encoded cert into the data value:

Apply the TkgServiceConfiguration

After setting up your file. Apply it to the Supervisor cluster:

kubectl apply -f ./TanzuServiceConfiguration.yaml

Notes

  • Existing TKG clusters will not automatically inherit the trust for the certificates
  • Clusters created after the TKGServiceConfiguration is applied will get the certificates
  • You can scale an existing TKG cluster to trigger a rollout with the certificates
  • You can verify the certificates exist by connecting through SSH to the nodes and locating the certs under /etc/ssl/certs:

Retrieving the Admin Password for Harbor Image Registry in Tanzu Kubernetes Grid Service

In TKGS on vSphere 7.0 through (at least) 7.0.1d, a Harbor Image Registry may be enabled for the vSphere Cluster (Under Configure|Namespaces| Image Registry). This feature currently (as of 7.0.1d) requires the Pod Service, which in turn requires NSX-T integration.

As of 7.0.1d, the self-signed certificate created for this instance of Harbor is added to the trust for nodes in TKG clusters, making it easier (possible?) to use images from Harbor.

When you login to harbor as a user, you’ll notice that the menu is very sparse. Only the ‘admin’ account can access the “Administration” menu.

To get logged in as the ‘admin’ account, we’ll need to retrieve the password from a secret for the harbor controller in the Supervisor cluster.

Steps:

  • SSH into the vCenter Server as root, type ‘shell’ to get to bash shell
  • Type ‘/usr/lib/vmware-wcp/decryptK8Pwd.py‘ to return information about the Supervisor Cluster. The results include the IP for the cluster as well as the node root password
  • While still in the SSH session on the vCenter Server, ssh into the Supervisor Custer node by entering ‘ssh root@<IP address from above>’. For the password, enter the PWD value from above.
  • Now, we have a session as root on a supervisor cluster control plane node.
  • Enter ‘kubectl get ns‘ to see a list of namespaces in the supervisor cluster. You’ll see a number of hidden, system namespaces in addition to those corresponding to the vSphere namespaces. Notice there is a namespace named “vmware-system-registry” in addition to one named “vmware-system-registry-#######”. The namespace with the number is where Harbor is installed.
  • Run ‘kubectl get secret -n vmware-system-registry-######‘ to get a list of secrets in the namespace. Locate the secret named “harbor-######-controller-registry”.
  • Run this to return the decoded admin password: kubectl get secret -n vmware-system-registry-###### harbor-######-controller.data.harborAdminPassword}' | base64 -d | base64 -d
  • In the cases I seen so far, the password is about 16 characters long, if it’s longer than that, you may not have decoded it entirely. Note that the value must be decoded twice.
  • Once you’ve saved the password, enter “exit” three times to get out of the ssh sessions.

Notes

  • Don’t manipulate the authentication settings
  • The process above is not supported; VMware GSS will not help you complete these steps
  • Some features may remain disabled (vulnerability scanning for example)
  • As admin, you may configure registries and replication (although it’s probably unsupported with this built-in version of Harbor for now)

Use Helm to deploy Harbor with Annotations for Velero

So, lets say you want to deploy an instance of Harbor to your “services” kubernetes cluster.  The cluster is protected by a scheduled Velero backup Velero pickup all resources in all namespaces by default, but we need to add an annotation to indicate a persistent volume that should be included in the backup.  Without this annotation, Velero will not include the PV in the backup.

First, let’s create a namespace we want to install Harbor to:
kubectl create ns harbor
Then, we’ll make sure helm has the chart for Harbor
helm repo add harbor https://helm.goharbor.io
helm repo update

Finally, we’ll install harbor
helm install harbor harbor/harbor --namespace harbor \
--set expose.type=loadBalancer,expose.tls.enabled=true,expose.tls.commonName=harbor.ragazzilab.com,\
externalURL=harbor.ragazzilab.com,harborAdminPassword=harbor,\
redis.podAnnotations."backup\.velero\.io/backup-volumes"=data,\
registry.podAnnotations."backup\.velero\.io/backup-volumes"=registry-data,\
trivy.podAnnotations."backup\.velero\.io/backup-volumes"=data,\
database.podAnnotations."backup\.velero\.io/backup-volumes"=database-data,\
chartmuseum.podAnnotations."backup\.velero\.io/backup-volumes"=chartmuseum-data,\
jobservice.podAnnotations."backup\.velero\.io/backup-volumes"=job-logs

Notice a few of the configurations we’re passing here:

  • expose.tls.commonName is the value that will be used by the gnerated TLS certificate
  • externalURL is the FQDN that we’ll use to reach Harbor (post deploy, you’ll get the loadBalancer IP and add the DNS record for it)
  • harborAdminPassword is the password assigned by default to the admin account – clearly this should be changed immediately
  • The next items are for the podAnnotations; the syntax was unexpectedly different.  Notice there’s a dot instead of an equals-sign between the key and the value.  Also notice that the dots in the value must be escaped.

Once Harbor is deployed, you can get the loadBalancer’s IP and point your browser at it.

Now, we can wait for the Velero backup job to run or kick off a one-off backup.

Not So Fast...

I noticed that Harbor did not start properly after restore.  This was because postgres in the database pod expects a specific set of permissions – which were apparently different as a result of the restore.  The log on the database pod only read FATAL: data directory “/var/lib/postgresql/data” has group or world access

To return Harbor to functionality post-restore, I had to take the following steps:

  1. Edit the database statefulSet: kubectl edit StatefulSet harbor-harbor-database -n harbor
  2. Replace the command in the “change-permission-of-directory” initContainer from chown -R 999:999 /var/lib/postgresql/data to chmod -R 0700 /var/lib/postgresql/data
  3. Save changes and bounce the database pod by running kubectl delete po -n harbor harbor-harbor-database-0
  4. Bounce the remaining pods that are in CrashLoopBackup (because they’re trying to connect to the database)

 

harborThanks to my friend and colleague Hemanth AVS for help with the podAnnotations syntax!

 

 

Replicating images from DockerHub to Harbor

Harbor LogoI found the documentation for actually replicating images from DockerHub to a local Harbor instance to be missing.  So here’s what I’ve found:

Objective: Replicate the images for the Yelb sample application to local Harbor repo

Set-up and Prereqs

  1. A local Harbor instance – I’ll be using an Enterprise PKS foundation with Harbor 1.10
  2. An account for DockerHub

Steps

      1. Login to Harbor Web GUI as an administrator. Navigate to Administration/Registries
      2. Add Endpoint for local Harbor by clicking ‘New Endpoint’ and entering the following:
        • Provider: harbor
        • Name: local (or FQDN or whatever)
        • Description: optional
        • Endpoint URL: the actual URL for your harbor instance beginning with https and ending with :443
        • Access ID: username for an admin or user that at least has Project Admin permission to the target Projects/namespaces
        • Access Secret: Password for the account above
        • Verify Remote Cert: typically checked
      3. Add Endpoint for Docker Hub by clicking ‘New Endpoint’ and entering the following:
        • Provider: docker-hub
        • Name: dockerhub (or something equally profound)
        • Description: optional
        • Endpoint URL: pre-populated/li>
        • Access ID: username for your account at dockerhub
        • Access Secret: Password for the account above
        • Verify Remote Cert: typically checked

        Notice that this is for general dockerhub, not targeting a particular repo.

      4. Configure Replications for the Yelb Images
        You may create replications for several images at once using a variety of filters, but I’m going to create a replication rule for each image we need. I think this makes it easier to identify a problem, removes the risk of replicating too much and makes administration easier. Click ‘New Replication Rule‘ enter the following to create our first rule:

        • Name: yelb-db-0.5
        • Description: optional
        • Replication Mode: Pull-based (because we’re pulling the image from DockerHub)
        • Source registry: dockerhub
        • Source Registry Filter – Name: mreferre/yelb-db
        • Source Registry Filter – Tag: 0.5
        • Source Registry Filter – Resource: pre-populated
        • Destination Namespace: yelb (or whatever Project you want the images saved to)
        • Trigger Mode: Select ‘Manual’ for a one-time sync or select ‘Scheduled’ if you want to ensure the image is replicated periodically. Note that the schedule format is cron with seconds, so 0 0 23 * * 5 would trigger the replication to run every Friday at 23:00:00. Scheduled replication makes sense when the tag filter is ‘latest’ for example
        • Override: leave checked to overwrite the image if it already exists
        • Enable rule: leave checked to keep the rule enabled
      5. Add the remaining Replication Rules:
        Name Name Filter Tag Filter Dest Namespace
        yelb-ui-latest mreferre/yelb-ui latest yelb
        yelb-appserver-latest mreferre/yelb-appserver latest yelb
        redis-4.0.2 library/redis 4.0.2 yelb

        Note that redis is an official image, so we have to include library/

Adding a private Docker registry to a PKS 1.5 Windows Kubernetes cluster

Pivotal Container Service (PKS) 1.5 and Kubernetes 1.14 bring *beta* support for Workers running Windows.  This means that we can provide the advantages of Kubernetes to a huge array of applications running on Windows.  I see this especially useful for Windows applications that you don’t have the source code for and/or do not want to invest in reworking it for .NET core or languages that run on Linux.

In nearly all cases, you’ll need an image with your applications’ dependencies or configuration and in the real world, we don’t want those in the public space like dockerhub.  Enter Private Docker Repositories.

PKS Enterprise includes VMware Harbor as a private registry, it’s very easy to deploy alongside PKS and provides a lot of important functionality.  The Harbor interface uses TLS/SSL; you may use a self-signed, enterprise PKI-signed or public CA-signed certificate.  If you chose to not use a public CA-signed certificate ($!), the self-signed or PKI-signed certificate must be trusted by the docker engine on each Kubernetes worker node.

Clusters based on Ubuntu Xenial Stemcells:

  • The operator/administrator simply puts the CA certificate into the “Trusted Certificates” box of the Security section in Ops Manager.
  • When BOSH creates the VMs for kubernetes clusters, the trusted certificates are added to the certificate store automatically.
  • If using an enterprise PKI where all of the internal certificates are signed by the Enterprise CA, this method makes it very easy to trust and “un-trust” CAs.

Clusters based on Windows 2019 Stemcells:

This is one of those tasks that is easier to perform on Linux that it is on Windows.  Unfortunately, Windows does not automatically add the Trusted Certificates from Ops Manager to the certificate store, so extra steps are required.

    1. Obtain the Registry CA Certificate.  In Harbor, you may click the “REGISTRY CERTIFICATE” link while in a Project.  Save the certificate to where the BOSH cli is installed (Ops Manager typically).
    2.  Connect BOSH cli to director.  This may be done on the Ops Manager.
    3. List BOSH-managed vms to identify the service_instance deployment corresponding to the targeted K8s cluster by matching the VM IP address to the IP address of the master node as reported by PKS cluster.
    4. Run this command to copy the certificate to the Windows worker
      bosh -e ENV -d DEPLOYMENT scp root.cer WINDOWS-WORKER:/
      • ENV – your environment alias in the BOSH cli
      • DEPLOYMENT – the BOSH deployment that corresponds to the k8s cluster; ex: service-instance_921bd35d-c46d-4e7a-a289-b577ff743e15
      • WINDOWS-WORKER – the instance name of the specific Windows worker VM; ex: windows-worker/277536dd-a7e6-446b-acf7-97770be18144

      This command copies the local file named root.cer to the root folder on the Windows VM

    5. Use BOSH to SSH into the Windows Worker.
      bosh -e ENV -d DEPLOYMENT ssh WINDOWS-WORKER
      • ENV – your environment alias in the BOSH cli
      • DEPLOYMENT – the BOSH deployment that corresponds to the k8s cluster; ex: service-instance_921bd35d-c46d-4e7a-a289-b577ff743e15
      • WINDOWS-WORKER – the instance name of the specific Windows worker VM; ex: windows-worker/277536dd-a7e6-446b-acf7-97770be18144

      SSH into Windows node, notice root.cer on the filesystem
    6. In the Windows SSH session run “powershell.exe” to enter powershell
    7. At the PS prompt, enter
      Import-certificate -filepath .\root.cer -CertStoreLocation 
      Cert:\LocalMachine\Root

      The example above imports the local file “root.cer” into the Trusted Root Certificate Store

    8. Type “exit” twice to exit PS and SSH
    9. Repeat steps 5-8 for each worker node.

Add docker-registry secret to k8s cluster

Whether the k8s cluster is running Windows workers or not, you’ll want to add credentials for authenticating to harbor.  These credentials are stored in a secret. To add the secret, use this command:

kubectl create secret docker-registry harbor \
--docker-server=HARBOR_FQDN \
--docker-username=HARBOR_USER \
--docker-password=USER_PASS \
--docker-email=USER_EMAIL
  • HARBOR_FQDN – FQDN for local/private Harbor registry
  • HARBOR_USER – name of user in Harbor with access to project and repos containing the desired images
  • USER_PASS – username for the above account
  • USER_EMAIL – email adddress for the above account

Note that this secret is namespaced; it needs to be added to the namespace of the deployments that will reference it

More info

Here’s an example deployment yaml for a Windows K8s cluster that uses a local private docker registry.  Note that Windows clusters cannot leverage NSX-T yet, so this example uses a NodePort to expose the service.

Automating PKS Upgrades

Last night, Pivotal announced new versions of PKS and Harbor, so I thought it’s time to simplify the upgrade process. Here is a concourse pipeline that essentially aggregates the upgrade-tile pipeline so that PKS and Harbor are upgraded in one go.

What it does:

  1. Runs on a schedule – you set the time and days it may run
  2. Downloads the latest version of PKS and Harbor from Pivnet- you set the major.minor version range
  3. Uploads the PKS and Harbor releases to your BOSH director
  4. Determines whether the new release is missing a stemcell, downloads it from PivNet and uploads it to BOSH director
  5. Stages the tiles/releases
  6. Applies changes

What you need:

  1. A working Concourse instance that is able to reach the Internet to pull down the binaries and repo
  2. The fly cli and credentials for your Concourse.
  3. A token from your PivNet account
  4. An instance of PKS 1.0.2 or 1.0.3 AND Harbor 1.4.x deployed on Ops Manager
  5. Credentials for your Ops Manager
  6. (optional) A token from your GitHub account

How to use the pipeline:

  1. Download params.yml and pipeline.yml from here.
  2. Edit the params.yml by replacing the values in double-parentheses with the actual value. Each line has a bit explaining what it’s expecting.  For example, ((ops_mgr_host)) becomes opsmgr.pcf1.domain.local
    • Remove the parens
    • If you have a GitHub Token, pop that value in, otherwise remove ((github_token))
    • The current pks_major_minor_version regex will get the latest 1.0.x.  If you want to pin it to a specific version, or when PKS 1.1.x is available, you can make those changes here.
    • The ops_mgr_usr and ops_mgr_pwd credentials are those you use to logon to Ops Manager itself.  Typically set when the Ops Manager OVA is deployed.
    • The schedule params should be adjusted to a convenient time to apply the upgrade.  Remember that in addition to the PKS Service being offline (it’s a singleton) during the upgrade, your Kubernetes clusters may be affected if you have the “Upgrade all Clusters” errand set to run in the PKS configuration, so schedule wisely!

  3. Open your cli and login to concourse with fly

    fly -t concourse login -c http://concourse.domain.local:8080 -u username -p password

  4. Set the new pipeline. Here, I’m naming the pipeline “PKS_Upgrade”. You’ll pass the pipeline.yml with the “-c” param and your edited params.yml with the “-l” param

    fly -t concourse sp -p PKS_Upgrade -c pipeline.yml -l params.yml

    Answer “y” to “Apply Configuration”…

  5. Unpause the pipeline so it can run when in the scheduled window

    fly -t concourse up -p PKS_Upgrade

  6. Login to the Concourse web to see our shiny new pipeline!

    If you don’t want to deal with the schedule and simply want it to upgrade on-demand, use the pipeline-nosched.yml instead of pipeline.yml, just be aware that when you unpause the pipeline, it’ll start doing its thing.  YMMV, but for me, it took about 8 minutes to complete the upgrade.

Behind the scenes
It’s not immediately obvious how the pipeline does what it does. When I first started out, I found it frustrating that there just isn’t much to the pipeline itself. To that end, I tried making pipelines that were entirely self-contained. This was good in that you can read the pipeline and see everything it’s doing; plus it can be made to run in an air-gapped environment. The downside is that there is no separation, one error in any task and you’ll have to edit the whole pipeline file.
As I learned a little more and poked around in what others were doing, it made sense to split the “tasks” out, keep them in a GitHub public repo and pull it down to run on-demand.

Pipelines generally have two main sections; resources and jobs.
Resources are objects that are used by jobs. In this case, the binary installation files, a zip of the GitHub repo and the schedule are resources.
Jobs are (essentially) made up of plans and plans have tasks.
Each task in most pipelines uses another source yml. This task.yml will indicate which image concourse should build a container from and what it should do on that container (typically, run a script). All of these task components are in the GitHub repo, so when the pipeline job runs, it clones the repo and runs the appropriate task script in a container built on an image pulled from dockerhub.

More info
I’ve got a several pipelines in the repo.   Some of them do what they’re supposed to. 🙂 Most of them are derived from others’ work, so many thanks to Pivotal Services and Sabha Parameswaran