Pivotal Cloud Foundry – Brian Ragazzi

PAS with NSX-T Tip: use a fresh IP Block

I’ve fought with this for an embarrassingly long time. Had a failed PAS (Pivotal Application Services) deployment (missed several of the NSX configuration requirements) but removed the cruft and tried again and again and again. In each case, PAS and NCP would deploy, but fail on the PAS smoke_test errand. The error message said more detail is in the log.

Which Log?!

I ssh’d into the clock_global VM and found the smoke_test logs. They stated that the container for instance {whatever} could not be created and an error of NCP04004. This pointed me to the Diego Cells (where the containers would be created) and I poked around in the /var/vcap/sys/log/garden logs there. They stated that the interface for the instance could not be found. Ok, this is sounding more like an NSX problem.

I ended up parsing through the NSX Manager event log and found this gem:

Ah-ha! Yup, I’d apparently allocated a couple of /28 subnets from the IP Block. So when the smoke test tried to allocate a /24, the “fixed” subnet size had already been set to /28, causing the error.

Resolution was to simply remove all of the allocated subnets from the IP block. This could have been avoided by either not reusing an existing IP Block or using the settings in the NCP configuration to create a new IP Block with a given CIDR.

Automating PKS Upgrades

Last night, Pivotal announced new versions of PKS and Harbor, so I thought it’s time to simplify the upgrade process. Here is a concourse pipeline that essentially aggregates the upgrade-tile pipeline so that PKS and Harbor are upgraded in one go.

What it does:

Runs on a schedule – you set the time and days it may run
Downloads the latest version of PKS and Harbor from Pivnet- you set the major.minor version range
Uploads the PKS and Harbor releases to your BOSH director
Determines whether the new release is missing a stemcell, downloads it from PivNet and uploads it to BOSH director
Stages the tiles/releases
Applies changes

What you need:

A working Concourse instance that is able to reach the Internet to pull down the binaries and repo
The fly cli and credentials for your Concourse.
A token from your PivNet account
An instance of PKS 1.0.2 or 1.0.3 AND Harbor 1.4.x deployed on Ops Manager
Credentials for your Ops Manager
(optional) A token from your GitHub account

How to use the pipeline:

Download params.yml and pipeline.yml from here.
Edit the params.yml by replacing the values in double-parentheses with the actual value. Each line has a bit explaining what it’s expecting. For example, ((ops_mgr_host)) becomes opsmgr.pcf1.domain.local
- Remove the parens
- If you have a GitHub Token, pop that value in, otherwise remove ((github_token))
- The current pks_major_minor_version regex will get the latest 1.0.x. If you want to pin it to a specific version, or when PKS 1.1.x is available, you can make those changes here.
- The ops_mgr_usr and ops_mgr_pwd credentials are those you use to logon to Ops Manager itself. Typically set when the Ops Manager OVA is deployed.
- The schedule params should be adjusted to a convenient time to apply the upgrade. Remember that in addition to the PKS Service being offline (it’s a singleton) during the upgrade, your Kubernetes clusters may be affected if you have the “Upgrade all Clusters” errand set to run in the PKS configuration, so schedule wisely!
Open your cli and login to concourse with fly

fly -t concourse login -c http://concourse.domain.local:8080 -u username -p password
Set the new pipeline. Here, I’m naming the pipeline “PKS_Upgrade”. You’ll pass the pipeline.yml with the “-c” param and your edited params.yml with the “-l” param

fly -t concourse sp -p PKS_Upgrade -c pipeline.yml -l params.yml

Answer “y” to “Apply Configuration”…
Unpause the pipeline so it can run when in the scheduled window

fly -t concourse up -p PKS_Upgrade
Login to the Concourse web to see our shiny new pipeline!

If you don’t want to deal with the schedule and simply want it to upgrade on-demand, use the pipeline-nosched.yml instead of pipeline.yml, just be aware that when you unpause the pipeline, it’ll start doing its thing. YMMV, but for me, it took about 8 minutes to complete the upgrade.

Behind the scenes
It’s not immediately obvious how the pipeline does what it does. When I first started out, I found it frustrating that there just isn’t much to the pipeline itself. To that end, I tried making pipelines that were entirely self-contained. This was good in that you can read the pipeline and see everything it’s doing; plus it can be made to run in an air-gapped environment. The downside is that there is no separation, one error in any task and you’ll have to edit the whole pipeline file.
As I learned a little more and poked around in what others were doing, it made sense to split the “tasks” out, keep them in a GitHub public repo and pull it down to run on-demand.

Pipelines generally have two main sections; resources and jobs.
Resources are objects that are used by jobs. In this case, the binary installation files, a zip of the GitHub repo and the schedule are resources.
Jobs are (essentially) made up of plans and plans have tasks.
Each task in most pipelines uses another source yml. This task.yml will indicate which image concourse should build a container from and what it should do on that container (typically, run a script). All of these task components are in the GitHub repo, so when the pipeline job runs, it clones the repo and runs the appropriate task script in a container built on an image pulled from dockerhub.

More info
I’ve got a several pipelines in the repo. Some of them do what they’re supposed to. 🙂 Most of them are derived from others’ work, so many thanks to Pivotal Services and Sabha Parameswaran

PKS and NSX-T: I did everything wrong

I’ve fought with PKS and NSX-T for a month or so now. I’ll admit it: I did everything wrong, several times. One thing for certain, I know how NOT to configure it. So, now that I’ve finally gotten past my configuration issues, it makes sense to share the ~~pain~~ lessons learned.

Set your expectations correctly. PKS is literally a 1.0 product right now. It’s getting a lot of attention and will make fantastic strides very quickly, but for now, it can be cumbersome and confusing. The documentation is still pretty raw. Similarly, NSX-T is very young. The docs are constantly referring you to the REST API instead of the GUI – this is fine of course, but is a turn-off for many. The GUI has many weird quirks. (when entering a tag, you’ll have to tab off of the value field after entering a value, since it is only checked onBlur)
Use Chrome Incognito NSX-T does not work in Firefox on Windows. It works in Chrome, but I had issues where the cache would problems (the web GUI would indicate that backup is not configured until I closed Chrome, cleared cache and logged in again)
Do not use exclamation point in the NSX-T admin password Yep, learned that the hard way. Supposedly, this is resolved in PKS 1.0.3, but I’m not convinced as my environment did not wholly cooperate until I reset the admin password to something without an exclamation point in it
Tag only one IP Pool with ncp/external I needed to build out several foundations on this environment and wanted to keep them in discrete IP space by created multiple “external IP Pools” and assigning each to its own foundation. Currently the nsx-cli.sh script that accompanies PKS with NSX-T only looks for the “ncp/external” tag on IP Pools, if more than one is found, it quits. I suppose you could work around this by forking the script and passing an additional “cluster” param, but I’m certain that the NSBU is working on something similar
Do not take a snapshot of the NSX Manager This applies to NSX for vSphere and NSX-T, but I have made this mistake and it was costly. If your backup solution relies on snapshots (pretty much all of them do), be sure to exclude the NSX Manager and…
Configure scheduled backups of NSX Manager I found the docs for this to be rather obtuse. Spent a while trying to configure a FileZilla SFTP or even IIS-FTP server until it finally dawned on me that it really is just FTP over SSH. So, the missing detail for me was that you’ll just need a linux machine with plenty of space that the NSX Manager can connect to – over SSH – and dump files to. I started with this procedure, but found that the permissions were too restrictive.
Use concourse pipelines This was an opportunity for me to really dig into concourse pipelines and embrace what can be done. One moment of frustration came when PKS 1.0.3 was released and I discovered that the parameters for vSphere authentication had changed. In PKS 1.0 through 1.0.2, there was a single set of credentials to be used by PKS to communicate with vCenter Server. As of 1.0.3, this was split into credentials for master and credentials for workers. So, the pipeline needed a tweak in order to complete the install. I ended up putting in a conditional to check the release version, so the right params are populated. If interested, my pipelines can be found at https://github.com/BrianRagazzi/concourse-pipelines
Count your Load-Balancers In NSX-T, the load-balancers can be considered a sort of empty appliance that Virtual Servers are attached to and can itself attach to a Logical Router. The load-balancers in-effect require pre-allocated resources that must come from an Edge Cluster. The “small” load-balancer consumes 2 CPU and 4GB RAM and the “Large” edge VM provides 8 CPU and 16GB RAM. So, a 2-node Edge Cluster can support up to FOUR active/standby Load-Balancers. This quickly becomes relevant when you realize that PKS creates a new load-balancer when a new K8s cluster is created. If you get errors in the diego databse with the ncp job when creating your fifth k8s cluster, you might need to add a few more edge nodes to the edge cluster.

Configure your NAT rules as narrow as you can. I wasted a lot of time due to mis-configured NAT rules. The log data from provisioning failures did not point to NAT mis-configuration, so wild geese were chased. Here’s what finally worked for me:

Router	Priority	Action	Source	Destination	Translated	Description
Tier1 PKS Management	512	No NAT	[PKS Management CIDR]	[PKS Service CIDR]	Any	No NAT between management and services
	512	No NAT	[PKS Service CIDR]	[PKS Management CIDR]	Any	No NAT between management and services
	1024	DNAT	Any	[External IP for Ops Manager]	[Internal IP for Ops Manager]	So Ops Manager is reachable
		DNAT	Any	[External IP for PKS Service]	[Internal IP for PKS Service] (obtain from Status tab of PKS in Ops Manager)	So PKS Service (and UAA) is reachable
		SNAT	[Internal IP for PKS Service]	Any	[External IP for PKS Service]	Return Traffic for PKS Service
	2048		[PKS Management CIDR]	[Infrastructure CIDR] (vCenter Server, NSX Manager, DNS Servers)	[External IP for Ops Manager]	So PKS Management can reach infrastructure
	2048		[PKS Management CIDR]	[Additional Infrastructure] (NTP in this case)	[External IP for Ops Manager]	So PKS Management can reach infrastructure
Tier1 PKS Services	512	No NAT	[PKS Service CIDR]	[PKS Management CIDR]	Any	No NAT between management and services
	512	No NAT	[PKS Management CIDR]	[PKS Service CIDR]	Any	No NAT between management and services
	1024	SNAT	[PKS Service CIDR]	[Infrastructure CIDR] (vCenter Server, NSX Manager, DNS Servers)	[External IP] (not the same as Ops Manager and PKS Service, but in the same L3 network)	So PKS Services can reach infrastructure
	1024	SNAT	[PKS Service CIDR]	[Additional Infrastructure] (NTP in this case)	[External IP]	So PKS Services can reach infrastructure

Resolutions for 2018

If I put it here, I’m much more likely to follow-through. Like many, I work best under some pressure. Here is a list of what I want to do differently (with regard to technology) next year.

Do more blogging. I can make a ton of excuses for not blogging as much this year. I love sharing what I’ve learned; the more new stuff I learn, the more I share. So….
Do more for NSX for vSphere and NSX-T. I feel strongly that SDN is critical to the future of how datacenters operate. NSX is the logical leader in this space and will only grow in interest. There is still a tendency to replicate what was done with pre-SDN technology and I’d like to see modern ways to solve problems while finding and pushing the limits of what can be done in SDN.
Do more with containers and PKS. The technologies that Pivotal provides are cutting edge. Already and continuing, containers and applications-as-code methods are growing and will define the datacenter of the future. Just as a few years ago, we stopped thinking of hardware servers as single-purpose, we’ll embrace multiple workloads within a VM.
Do more coding. I love concourse and pipelines, but have a lot to learn. Let’s find the limits of BOSH and pipelines. Can we not only deploy, but automate the operation and maintenance of a PaaS solution?
Do more coding. I feel that as we move to “applications-as-code”, it’s important to understand what that means to developers and operators. What sort of problems become irrelevant in this approach? What molehills become mountains?

Hope to see you next year!

Building Stand-Alone BOSH and Concourse

This should be the last “how to install concourse” post; With this, I think I’ve covered all the interesting ways to install it. Using BOSH is by-far my favorite approach. After this, I hope to post more related to the use of concourse and pipelines.

Overview

There are three phases to this deployment:

BOSH-start – We’ll set up an ubuntu VM to create the BOSH director from. We’ll be using BOSH v2 and not bosh-init
BOSH Director – This does all the work for us, but has to be instructed how to connect to vSphere
Concourse – We’ll use a deployment manifest in BOSH to deploy concourse

I took the approach that – where possible – I would manually download the files and transfer them to the target, rather than having the install process pull the files down automatically. In my case, I went through a lot of trial-and-error, so I did not want to pull down the files every time. In addition, I’d like to get a feel for what a self-contained (no Internet access) solution would look like. BTW, concourse requires Internet access in order to get to docker hub for a container to run its pipelines.

Starting position

Make sure you have the following:

Working vSphere environment with some available storage and compute capacity
At least one network on a vSwitch or Distributed vSwitch with available IP addresses
Account for BOSH to connect to vSphere with permissions to create folders, resource pools, and VMs
An Ubuntu VM template. Mine is 16.04 LTS
PuTTY, Win-SCP or similar tools

BOSH-start

Deploy a VM from your Ubuntu template. Give it a name – I call mine BOSH-start – and IP address, power it on. In my case, I’m logged in as my account to avoid using root unless necessary.

Install dependencies:

sudo apt-get install -y build-essential zlibc zlib1g-dev ruby ruby-dev openssl \
libxslt-dev libxml2-dev libssl-dev libreadline6 libreadline6-dev \
libyaml-dev libsqlite3-dev sqlite3

Download BOSH CLI v2, make it executable and move it to the path. Get the latest version of the BOSH v2 CLI here.

wget https://s3.amazonaws.com/bosh-cli-artifacts/bosh-cli-2.0.16-linux-amd64
chmod +x ~/Downloads/bosh-cli-*
sudo mv ~/Downloads/bosh-cli-* /usr/local/bin/bosh

BOSH Director

Git Director templates

mkdir ~/bosh-1
cd ~/bosh-1
git clone https://github.com/cloudfoundry/bosh-deployment

Create a folder and use bosh to create the environment. This command will create several “state” files and our BOSH director with the information you provide. Replace the values in red with your own.


bosh create-env bosh-deployment/bosh.yml \
    --state=state.json \
    --vars-store=creds.yml \
    -o bosh-deployment/vsphere/cpi.yml \
    -o bosh-deployment/vsphere/resource-pool.yml \
    -o bosh-deployment/misc/dns.yml \
    -v internal_dns=<DNS Servers ex: [192.168.100.10,192.168.100.11]>
    -v director_name=<name of BOSH director. eg:boshdir> \
    -v internal_cidr=<CIDR for network ex: 172.16.9.0/24> \
    -v internal_gw=<Gateway Address> \
    -v internal_ip=<IP Address to assign to BOSH director> \
    -v network_name="<vSphere vSwitch Port Group>" \
    -v vcenter_dc=<vSphere Datacenter> \
    -v vcenter_ds=<vSphere Datastore> \
    -v vcenter_ip=<IP address of vCenter Server> \
    -v vcenter_user=<username for connecting to vCenter Server> \
    -v vcenter_password=<password for that account> \
    -v vcenter_templates=<location for templates ex:/BOSH/templates> \
    -v vcenter_vms=<location for VM.  ex:/BOSH/vms> \
    -v vcenter_disks=<folder on datastore for bosh disks.  ex:bosh-1-disks> \
    -v vcenter_cluster=<vCenter Cluster Name> \
    -v vcenter_rp=<Resource Pool Name>

One note here; if you do not add the line for dns.yml and internal_dns, your BOSH director will use 8.8.8.8 as its DNS server and won’t be able to find anything internal. This will take a little while to download the bits and set up the Director for you.

Connect to Director. The following commands will create an alias for the new BOSH environment named “bosh-1”. Replace 10.0.0.6 with the IP of your BOSH Director from the create-env command:

# Configure local alias
bosh alias-env bosh-1 -e 10.0.0.6 --ca-cert <(bosh int ./creds.yml --path /director_ssl/ca)
export BOSH_CLIENT=admin
export BOSH_CLIENT_SECRET=`bosh int ./creds.yml --path /admin_password`
bosh -e bosh-1 env

Next we’ll need a “cloud config”. This indicates to BOSH Director how to configure the CPI for interaction with vSphere. You can find examples and details here. For expediency, What I ended up with is below. As usual, you’ll want to update the values in red to match your environment. Save this file as ~/bosh-1/cloud-config.yml on the BOSH-start VM

azs:
- name: z1
  cloud_properties:
    datacenters:
    - name: <vSphere Datacenter Name>
    - clusters: 
      - <vSphere Cluster Name>: {resource_pool: <Resource Pool in that cluster>}
properties:
  vcenter:
    address: <IP of FQDN of vCenter Server>
    user: <account to connect to vSphere with>
    password: <Password for that account>
    default_disk_type: thin
    enable_auto_anti_affinity_drs_rules: false
    datacenters:
    - name: <vSphere Datacenter Name>
      vm_folder: /BOSH/vms
      template_folder: /BOSH/templates
      disk_path: prod-disks
      datastore_pattern: <regex filter for datastores to use ex: '\AEQL-THICK0\d' >
      persistent_datastore_pattern: <regex filter for datastores to use ex: '\AEQL-THICK0\d' >
      clusters:
      - <vSphere Cluster Name>: {resource_pool: <Resource Pool in that cluster>}

vm_types:
- name: default
  cloud_properties:
    cpu: 2
    ram: 4096
    disk: 16_384
- name: large
  cloud_properties:
    cpu: 2
    ram: 8192
    disk: 32_768

disk_types:
- name: default
  disk_size: 16_384
  cloud_properties:
    type: thin
- name: large
  disk_size: 32_768
  cloud_properties:
    type: thin

networks:
- name: default
  type: manual
  subnets:
  - range: <network CIDR where to place VMs ex:192.168.10.0/26>
    reserved: <reserved range in that CIDR ex:[192.168.10.1-192.168.10.42] >
    gateway: <gateway address for that network>
    az: z1
    dns: <DNS Server IPs ex: [192.168.100.50,192.168.100.150] >
    cloud_properties:
      name: <name of port group to attach created VMs to>

compilation:
  workers: 5
  reuse_compilation_vms: true
  az: z1
  vm_type: large
  network: default

Update Cloud Config with our file:
```
bosh -e bosh-1 update-cloud-config ./cloud-config
```
This is surprisingly fast. You should now have a functional BOSH Director.

Concourse

Let’s deploy something with BOSH!

Prereqs:

Copy the URLs for the Concourse and Garden runC BOSH releases from here
Copy the URL for the latest Ubuntu Trusty stemcell for vSphere from here

Upload Stemcell. You’ll see it create a VM with a name beginning with “sc” in vSphere
```
bosh -e bosh-1 upload-stemcell <URL to stemcell>
```

Upload Garden runC release to BOSH

bosh -e bosh-1 upload-release <URL to garden-runc tgz>

Upload Concourse release to BOSH

bosh -e bosh-1 upload-release <URL to concourse tgz>

A BOSH deployment must have a stemcell, a release and a manifest. You can get a concourse manifest from here, or start with the one I’m using. You’ll notice that a lot of the values here must match those in our cloud-config. Save the concourse manifest as ~/concourse.yml

---
name: concourse

releases:
- name: concourse
  version: latest
- name: garden-runc
  version: latest

stemcells:
- alias: trusty
  os: ubuntu-trusty
  version: latest

instance_groups:
- name: web
  instances: 1
  # replace with a VM type from your BOSH Director's cloud config
  vm_type: default
  stemcell: trusty
  azs: [z1]
  networks: [{name: default}]
  jobs:
  - name: atc
    release: concourse
    properties:
      # replace with your CI's externally reachable URL, e.g. https://ci.foo.com
      external_url: http://concourse.mydomain.com

      # replace with username/password, or configure GitHub auth
      basic_auth_username: myuser
      basic_auth_password: mypass

      postgresql_database: &atc_db atc
  - name: tsa
    release: concourse
    properties: {}

- name: db
  instances: 1
  # replace with a VM type from your BOSH Director's cloud config
  vm_type: large
  stemcell: trusty
  # replace with a disk type from your BOSH Director's cloud config
  persistent_disk_type: default
  azs: [z1]
  networks: [{name: default}]
  jobs:
  - name: postgresql
    release: concourse
    properties:
      databases:
      - name: *atc_db
        # make up a role and password
        role: atc_db
        password: mypass

- name: worker
  instances: 1
  # replace with a VM type from your BOSH Director's cloud config
  vm_type: default
  stemcell: trusty
  azs: [z1]
  networks: [{name: default}]
  jobs:
  - name: groundcrew
    release: concourse
    properties: {}
  - name: baggageclaim
    release: concourse
    properties: {}
  - name: garden
    release: garden-runc
    properties:
      garden:
        listen_network: tcp
        listen_address: 0.0.0.0:7777

update:
  canaries: 1
  max_in_flight: 1
  serial: false
  canary_watch_time: 1000-60000
  update_watch_time: 1000-60000

A couple of notes:

The Worker instance will need plenty of space, especially if you’re planning to use PCF Pipeline Automation, as it’ll have to download the massive binaries from PivNet. You’ll want to make sure that you have a sufficiently large vm type defined in your cloud config and assigned as worker in the Concourse manifest

Now, we have everything we need to deploy concourse. Notice that we’re using BOSH v2 and the deployment syntax is a little different than in BOSH v1. This command will create a handful of VMs, compile a bunch of packages and push them to the VMs. You’ll a couple extra IPs for the compilation VMs – these will go away after the deployment is complete.
```
bosh -e bosh-1 -d concourse deploy ./concourse.yml
```
Odds are that you’ll have to make adjustments to the cloud-config and deployment manifest. If so, you can easily apply updates to the cloud-config with the bosh update-cloud-config command.
If the deployment is completely hosed up and you need to remove it, you can do so with
```
bosh -e bosh-1 -d concourse stop &&  bosh -e bosh-1 -d concourse deld
```

Try it out

Get the IP address of the web instance by running
```
bosh -e bosh-1 vms
```
From the results, identify the IP address of the web instance:
Point your browser to http://<IP of web instance>:8080
Click Login, Select “main” team and login with the username and password (myuser and mypass in the example) you used in the manifest

References:

http://bosh.io/docs/cloud-config.html – Help for figuring out the schema of the cloud config
http://bosh.io/docs/networks.html – Help figuring out the network section of the cloud config
http://concourse.ci/clusters-with-bosh.html – Where I got most of my information. Note that (as of July 2017) the deploying method linked on this page only works for BOSH v1
Thanks Danny Berger for this comment on github, saved me pulling all my hair out.

Building a Concourse CI VM on Ubuntu

Recently, I’ve found myself needing a Concourse CI system. I struggled with the documentation on concourse.ci, couldn’t find any comprehensive build guides. Knew for certain I wasn’t going to use VirtualBox. So, having worked it out; thought I’d share what I went through to get to a working system.

Starting Position
Discovered that the CentOS version I was using previously did not have a compatible Linux kernel version. CentOS 7.2 uses kernel 3.10, Concourse requires 3.19+. So, I’m starting with a freshly-deployed Ubuntu Server 16.04 LTS this time.

Prep Ubuntu
Not a lot we have to do, but still pretty important:

Make sure port for concourse is open

~~sudo ufw allow 8080 sudo ufw status~~

sudo ufw disable

I disabled the firewall on ubuntu because it was preventing the concourse worker and concourse web from communicating.
Update and make sure wget is installed

apt-get update apt-get install wget

Postgresql
Concourse expects to use a postgresql database, I don’t have one standing by, so let’s install it.

Pretty straightforward on Ubuntu too:

apt-get install postgresql postgresql-contrib

Enter y to install the bits. On Ubuntu, we don’t have to take extra steps to configure the service.
Ok, now we have to create an account and a database for concourse. First, lets create the linux account. I’m calling mine “concourse” because I’m creative like that.

adduser concourse passwd concourse
Next, we create the account (aka “role” or “user”) in postgres via the createuser command. In order to do this, we have to switch to the postgres account, do that with sudo:

sudo -i -u postgres

Now, while in as postgres we can use the createuser command

createuser –interactive

You’ll enter the name of the account, and answer a couple of special permissions questions.
While still logged in as postgres, run this command to create a new database for concourse. I’m naming my database “concourse” – my creativity is legendary. Actually, I think it makes life easier if the role and database are named the same

createdb concourse
Test by switching users to the concourse account and making sure it can run psql against the concourse databaseWhile in psql, use this command to set the password for the account in postgress

ALTER ROLE concourse WITH PASSWORD 'changeme';
Type \q to exit psql

Concourse
Ok, we have a running postgresql service and and account to be used for concourse. Let’s go.

Create a folder for concourse. I used /concourse, but you can use /var/lib/whatever/concourse if you feel like it.
Download the binary from concourse.ci/downloads.html into your /concourse folder using wget or transfer via scp.
Create a symbolic link named “concourse” to the file you downloaded and make it executable

ln -s ./concourse_linux_amd64 ./concourse chmod +x ./concourse_linux_amd64
Create keys for concourse

cd /concourse

mkdir -p keys/web keys/worker

ssh-keygen -t rsa -f ./keys/web/tsa_host_key -N ”
ssh-keygen -t rsa -f ./keys/web/session_signing_key -N ”
ssh-keygen -t rsa -f ./keys/worker/worker_key -N ”
cp ./keys/worker/worker_key.pub ./keys/web/authorized_worker_keys
cp ./keys/web/tsa_host_key.pub ./keys/worker
Create start-up script for Concourse. Save this as /concourse/start.sh:

/concourse/concourse web \
–basic-auth-username myuser \
–basic-auth-password mypass \
–session-signing-key /concourse/keys/web/session_signing_key \
–tsa-host-key /concourse/keys/web/tsa_host_key \
–tsa-authorized-keys /concourse/keys/web/authorized_worker_keys \
–external-url http://192.168.103.81:8080 \
–postgres-data-source postgres://concourse:changeme@127.0.0.1/concourse?sslmode=disable

/concourse/concourse worker \
–work-dir /opt/concourse/worker \
–tsa-host 127.0.0.1 \
–tsa-public-key /concourse/keys/worker/tsa_host_key.pub \
–tsa-worker-private-key /concourse/keys/worker/worker_key

The items in red should definitely be changed for your environment. “external_url” uses the IP address of the VM its running on. and the username and password values in the postgres-data-source should reflect what you set up earlier. Save the file and be sure to set it as executable (chmod +x ./start.sh)
Run the script “./start.sh”. You should see several lines go by concerning worker-collectors and builder-reapers.
- If you instead see a message about authentication, you’ll want to make sure that 1) the credentials in the script are correct, 2) the account has not had it’s password set in linux or in postgres
- If you instead see a message about the connection not accepting SSL, be sure that the connection string in the script includes “?sslmode=disable” after the database name
Test by pointing a browser at the value you assigned to the external_url. You should see “no pipelines configured”. You can login using the basic-auth username and password you specified in the startup script.
Success!
Back in your SSH session, you can kill it with <CRTL>+C

Finishing Up
Now we just have to make sure that concourse starts when the system reboots. I am certain that there are better/safer/more reliable ways to do this, but here’s what I did:
Use nano or your favorite text editor to add “/concourse/start.sh” to /etc/rc.local ABOVE the line that reads “exit 0”
Now, reboot your VM and retest the connectivity to the concourse page.

Thanks

EMC ECS Community Edition project for how to start the script on boot.

Mitchell Anicas’ very helpful post on setting up postgres on Ubuntu.

Concourse.ci for some wholly inadequate documentation

Alfredo Sánchez for bringing the issue with Concourse and CentOS to my attention

Building a Concourse CI VM on CentOS

WARNING

It has been brought to my attention that CentOS does not have a compatible Linux kernel, so I’ve redone this post using Ubuntu instead.

Starting Position
I’m starting with a freshly-deployed CentOS 7 VM. I use Simon’s template build, so it comes up quickly and reliably. Logged on as root.

Prep CentOS
Not a lot we have to do, but still pretty important:

Open firewall post for concourse

firewall-cmd --add-port=8080/tcp --permanent firewall-cmd --reload

optionally, you can open 5432 for postgres if you feel like it
Update and make sure wget is installed

yum update yum install wget

Postgresql
Concourse expects to use a postgresql database, I don’t have one standing by, so let’s install it.

Pretty straightforward on CentOS:

yum install postgresql-server postgresql-contrib

Enter y to install the bits.
When that step is done, we’ll set it up with this command:

sudo postgresql-setup initdb
Next, we’ll update the postgresql config to allow passwords. Use your favorite editor to open /var/lib/pgsql/data/pg_hba.conf We need to update the value in the method column for IPv4 and IPv6 connections from “ident” to “md5” then save the file.
Before

After
Now, let’s start postgresql and set it to run automatically

sudo systemctl start postgresql sudo systemctl enable postgresql
Ok, now we have to create an account and a database for concourse. First, lets create the linux account. I’m calling mine “concourse” because I’m creative like that.

adduser concourse passwd concourse
Next, we create the account (aka “role” or “user”) in postgres via the createuser command. In order to do this, we have to switch to the postgres account, do that with sudo:

sudo -i -u postgres

Now, while in as postgres we can use the createuser command

createuser –interactive

You’ll enter the name of the account, and answer a couple of special permissions questions.
While still logged in as postgres, run this command to create a new database for concourse. I’m naming my database “concourse” – my creativity is legendary. Actually, I think it makes life easier if the role and database are named the same

createdb concourse
Test by switching users to the concourse account and making sure it can run psql against the concourse databaseWhile in psql, use this command to set the password for the account in postgress

ALTER ROLE concourse WITH PASSWORD 'changeme';
Type \q to exit psql

Concourse
Ok, we have a running postgresql service and and account to be used for concourse. Let’s go.

Create a folder for concourse. I used /concourse, but you can use /var/lib/whatever/concourse if you feel like it.
Download the binary from concourse.ci/downloads.html into your /concourse folder using wget or transfer via scp.
Create a symbolic link named “concourse” to the file you downloaded and make it executable

ln -s ./concourse_linux_amd64 ./concourse chmod +x ./concourse_linux_amd64
Create keys for concourse

cd /concourse

mkdir -p keys/web keys/worker

ssh-keygen -t rsa -f ./keys/web/tsa_host_key -N ”
ssh-keygen -t rsa -f ./keys/web/session_signing_key -N ”
ssh-keygen -t rsa -f ./keys/worker/worker_key -N ”
cp ./keys/worker/worker_key.pub ./keys/web/authorized_worker_keys
cp ./keys/web/tsa_host_key.pub ./keys/worker
Create start-up script for Concourse. Save this as /concourse/start.sh:

/concourse/concourse web \
–basic-auth-username myuser \
–basic-auth-password mypass \
–session-signing-key /concourse/keys/web/session_signing_key \
–tsa-host-key /concourse/keys/web/tsa_host_key \
–tsa-authorized-keys /concourse/keys/web/authorized_worker_keys \
–external-url http://192.168.103.81:8080 \
–postgres-data-source postgres://concourse:changeme@127.0.0.1/concourse?sslmode=disable

/concourse/concourse worker \
–work-dir /opt/concourse/worker \
–tsa-host 127.0.0.1 \
–tsa-public-key /concourse/keys/worker/tsa_host_key.pub \
–tsa-worker-private-key /concourse/keys/worker/worker_key

The items in red should definitely be changed for your environment. “external_url” uses the IP address of the VM its running on. and the username and password values in the postgres-data-source should reflect what you set up earlier. Save the file and be sure to set it as executable (chmod +x ./start.sh)
Run the script “./start.sh”. You should see several lines go by concerning worker-collectors and builder-reapers.
- If you instead see a message about authentication, you’ll want to make sure that 1) the credentials in the script are correct, 2) the account has not had it’s password set in linux or in postgres and 3) the pg_hba.conf fie has been updated to use md5 instead of ident
- If you instead see a message about the connection not accepting SSL, be sure that the connection string in the script includes “?sslmode=disable” after the database name
Test by pointing a browser at the value you assigned to the external_url. You should see “no pipelines configured”
Success!
Back in your SSH session, you can kill it with <CRTL>+X

Finishing Up
Now we just have to make sure that concourse starts when the system reboots. I am certain that there are better/safer/more reliable ways to do this, but here’s what I did:

echo "/concourse/start.sh" >> /etc/rc.d/rc.local chmod +x /etc/rc.d/rc.local

Now, reboot your VM and retest the connectivity to the concourse page.

Thanks

EMC ECS Community Edition project for how to start the script on boot.

Mitchell Anicas’ very helpful post on setting up postgres on CentOS.

Concourse.ci for some wholly inadequate documentation

Configuring NSX Load-Balancer for PCF

There’s not a lot of specific information out there for this configuration. There’s some guidance from Pivotal and some how-tos from VMware, so with a little additional detail, we should be able to figure this out.

Edit – 2/1/17 – Updated with OpenSSL configuration detail
Edit – 3/20/17 – Updated SubjectAltNames in config

Preparation

SSL Certificate. You’ll need the signed public cert for your URL (certnew.cer), the associated private key (pcf.key) and the public cert of the signing CA (root64.cer).
1. Download and install OpenSSL
2. Create a config file for your request – paste this into a text file:
  
  [ req ] default_bits = 2048 default_keyfile = rui.key distinguished_name = req_distinguished_name encrypt_key = no prompt = no string_mask = nombstr req_extensions = v3_req
  [ v3_req ] basicConstraints = CA:FALSE keyUsage = digitalSignature, keyEncipherment extendedKeyUsage = serverAuth, clientAuth subjectAltName = DNS: *.pcf.domain.com, DNS:ServerShortName, IP:ServerIPAddress, DNS: *.system.pcf.domain.com, DNS: *.apps.pcf.domain.com, DNS:*.login.system.pcf.domain.com, DNS: *.uaa.system.pcf.domain.com
  [ req_distinguished_name ] countryName = US stateOrProvinceName = State localityName = City 0.organizationName = Company Name organizationalUnitName = PCF commonName = *.pcf.domain.com
3. Replace the values in red with those appropriate for your environment. Be sure to specify the server name and IP address as the Virtual IP and its associated DNS record. Save the file as pcf.cfg. You’ll want to use the wildcard “base” name as the common name and the server name, as well as the *.system, *.apps, *.login.system and *.uaa.system Subject Alt Names.
4. Use OpenSSL to create the Certificate Site Request (CSR) for the wildcard PCF domain.
  
  openssl req -new -newkey rsa:2048 -nodes -keyout pcf.key -out pcf.csr -config pcf.cfg
5. Use OpenSSL to convert the key to RSA (required for NSX to accept it)
  
  openssl rsa -in pcf.key -out pcfrsa.key
6. Submit the CSR (pcf.csr) to your CA (Microsoft Certificate Services in my case), retrieve the certificate (certnew.cer) and certificate chain (certnew.p7b) base-64 encoded.
7. Double-click certnew.p7b to open certmgr. Export the CA certificate as 64-bit encoded x509 to a file (root64.cer is the file name I use)
Networks. You’ll need to know what layer 3 networks the PCF components will use. In my case, I set up a logical switch in NSX and assigned the gateway address to the DLR. Probably should make this a 24-bit network, so there’s room to grow, but not reserving a ridiculous number of addresses. We’re going to carve up the address space a little, so make a note of the following:
- Gateway and other addresses you typically reserve for network devices. (eg: first 9 addresses 1-9)
- Address that will be assigned to the NSX load balancer. Just need one (eg: 10)
- Addresses that will be used by the PCF Routers. At least two. These will be configured as members in the NSX Load Balancer Pool.
DNS, IP addresses. PCF will use “system” and “apps” subdomains, plus whatever names you give any apps deployed. This takes some getting used to – not your typical application. Based on the certificate we created earlier, I recommend just creating a “pcf” subdomain. In my case, the network domain (using AD-DNS) is ragazzilab.com and I’ve created the following:
- pcf.ragazzilab.com subdomain
- *.pcf.ragazzilab.com A record for the IP address I’m going to assign to the NSX Load-Balancer

NSX

Assuming NSX is already installed and configured. Create or identify an existing NSX Edge that has an interface on the network where PCF will be / is deployed.

Assign the address we noted above to the inteface under Settings|Interfaces
Under Settings|Certificates, add the our SSL certificates
- Click the Green Plus and select “CA Certificate”. Paste the content of the signing CA public certificate (base64.cer) into the Certificate Contents box. Click OK.
- Click the Green Plus and select “Certificate”. Paste the content of the signed public cert (certnew.cer) into the Certificate Contents box and paste the content of the RSA private key (pcfrsa.key) into the Private Key box. Click OK.
Under Load Balancer, create an Application Profile. We need to ensure that NSX inserts the x-forwarded-for HTTP headers. To do that, we need to be able to decrypt the request and therefore must provide the certificate information. I found that Pool Side SSL had to be enabled and using the same Service and CA Certificates.
Router Application Profile
Create the Service Monitor. What worked for me is a little different from what is described in the GoRouter project page. The key points are that we want to specify the useragent and look for a response of “ok” with a header of “200 OK”.
Service Monitor for PCF Router
Create the Pool. Set it to ROUND-ROBIN using the Service Monitor you just created. When adding the routers as members, be sure to set the port to 443, but the Monitor Port to 80.
Router Pool
Create the Virtual Server. Specify the Application Profile and default Pool we just created. Obviously, specify the correct IP Address.
Virtual Server Configuration

PCF – Ops Manager

Assuming you’ve already deployed the Ops Manager OVF, use the installation dashboard to edit the configuration for Ops Manager Director. I’m just going to highlight the relevant areas of the configuration here:

Networks. Under “Create Networks”, be sure that the Subnet specified has the correct values. Pay special attention to the reserved IP ranges. These should be the addresses of the network devices and the IP address assigned to the load-balancer. Do not include the addresses we intend to use for the routers though. Based on the example values above, we’ll reserve the first 10 addresses.

Ops Manager Director will probably use the first/lowest address in range that is not reserved.

PCF – Elastic Runtime

Next, we’ll install Elastic Runtime. Again, I’ll highlight the relevant sections of the configuration.

Domains. In my case it’s System Domain = system.pcf.ragazzilab.com and Apps Domain = apps.pcf.ragazzilab.com
Networking.
- Set the Router IPs to the addresses (comma-separated) you noted and added to as members to the NSX load-balancer earlier.
- Leave HAProxy IPs empty
- Select the point-of-entry option for “external load balancer, and it can forward encrypted traffic”
- Paste the content of the signed certificate (certnew.cer) into the Certificate PEM field. Paste the content of the CA public certificate (root64.cer) into the same field, directly under the certificate content.
- Paste the content of the private key (pcf.key) into the Private Key PEM field.
- Check “Disable SSL Certificate verification for this environment”.
Resource Config. Be sure that the number of Routers is at least 2 and equal to the number of IP addresses you reserved for them.

Troubleshooting

Help! The Pool Status is down when the Service Monitor is enabled.

This could occur if your routers are behaving differently from mine. Test the response by sending a request to one of the routers through curl and specifying the user agent as HTTP-Monitor/1.1

curl -v -A “HTTP-Monitor/1.1” “http://{IP of router}”

The value in the yellow box should go into the “Expected” field of the Service Monitor and the value in the red box should go into the “Receive” field. Note that you should not get a 404 response, if you do, check that he user agent is set correctly.

Notes

This works for me and I hope it works for you. If you have trouble or disagree, please let me know.

Pivotal Cloud Foundry vApp startup order workflow

After installing Pivotal Cloud Foundry (PCF) on vSphere, you’ll have a collection of at least 21 (probably closer to 60!) VMs with names that probably don’t match anyone’s convention. Although, as noted in the PCF documentation, there is a correct order to starting up and shutting down the VMs in PCF, the installer does not configure a vApp so that we can control that order. So, I dragged all the PCF VMs into a vApp and starting trying to determine which ones are in which role and quickly realized that it’s a pain.

Creating an AZ in Ops Manager on vSphere

As an aside, when you create your Availability Zone, you point it at a vSphere cluster and, optionally, a Resource Pool. Unfortunately, if you specify a vApp Name instead of a Resource Pool name, BOSH will fail to deploy the VMs. So, I’ve typically leave the Resource Pool field blank and then drag the VMs into a vApp post-deployment.

I put together a workflow that will help place the PCF VMs into correct startup/shutdown groups for you.

Instructions for Use

Download the package from here
Import the package into vRealize Orchestrator
If you haven’t already, create a new vApp in your cluster and drag the Ops Manager, Ops Manager Director and all of the Elastic Runtime VMs into the vApp
Run the “PCFvAppStartupOrder” workflow, select your new vApp as the input, click Submit
If the PCF installation is scaled out to more VMs, just drag them to the vApp and rerun the workflow

How it works/What it does

The correct order is stored in a string array
The deployment, job and director custom fields are read for each VM in the vApp to get the VM’s assigned role
For the Ops Manager, the Notes field is read and if found, it is placed at the top of the startup sequnce
Unknown VMs are assigned a startup order higher than the last in the array. This way, they start last and power-off first
Unknown VMs are those where the “deployment” field does not start with “cf”; with exceptions for Ops Manager (Notes field) and Ops Manager Director (“director” field value is “bosh-init”)

Additional suggestions and notes

Adjust the resources for the vApp based on VMware best practices and what makes sense for your environment
Use this at your own risk, there is no implied warranty