PAS with NSX-T Tip: use a fresh IP Block

I’ve fought with this for an embarrassingly long time.  Had a failed PAS (Pivotal Application Services) deployment (missed several of the NSX configuration requirements) but removed the cruft and tried again and again and again.  In each case, PAS and NCP would deploy, but fail on the PAS smoke_test errand. The error message said more detail is in the log.

Which Log?!

I ssh’d into the clock_global VM and found the smoke_test logs. They stated that the container for instance {whatever} could not be created and an error of NCP04004. This pointed me to the Diego Cells (where the containers would be created) and I poked around in the /var/vcap/sys/log/garden logs there. They stated that the interface for the instance could not be found. Ok, this is sounding more like an NSX problem.

I ended up parsing through the NSX Manager event log and found this gem:

IP Block Error

Ah-ha! Yup, I’d apparently allocated a couple of /28 subnets from the IP Block. So when the smoke test tried to allocate a /24, the “fixed” subnet size had already been set to /28, causing the error.

Resolution was to simply remove all of the allocated subnets from the IP block. This could have been avoided by either not reusing an existing IP Block or using the settings in the NCP configuration to create a new IP Block with a given CIDR.

Advertisement

vCloud Director Database Issues

We had a situation recently where vCloud Director 1.5.1 (that’s one-dot-five-dot-one, not five-dot-one) failed to delete an external network, giving this error:

Could not execute JDBC batch update
– The DELETE statement conflicted with the REFERENCE constraint “allocated_ip_add_fk”. The conflict occurred in database “vCloud”, table “dbo.network_assigned_ip”, column ‘allocated_ip_add_id’.

This indicates that the record in the network_assigned_ip table cannot be deleted because it has a reference to a record in another table.

I had previously deleted all the related vApp networks and org networks and removed any stranded items.

To find the erroneous record, we first obtained the logical network id based on the network name:

Select * from [vCloud].[dbo].[logical_network] where name like ‘%BAD_NET%’ –note the network name has been set to “BAD_NET”

This returned one record, the first column value is the logical network ID, so we’re going to use this value to identify any assigned IP addresses in that logical networks via this query:

SELECT * from network_assigned_ip where logical_net_id = 0xLOGICALNETWORKID –replace with correct Logical network id value from previous query

Yes, I know how to do an INNER JOIN, but that’s not the point…

At this point, we had no left over IP addresses for that network, so we deleted the errant record from the logical_network table.

Now, the external network no longer appears, but everything’s not rosy.

vCD will only let you bind one network to a given port group and the database indicates that the port group I need to bind to a new external network is still in use. In my case this query was expected to have a record for the port group but did not:

Select * from [dbo].[ui_portgroups_avail_list_view

This information is stored in two different tables; vlan_in_use and real_network. Searching the vlan_in_use table for my VLAN ID did not return any records, so there must be a lingering item in real_network. I found it by querying the real_networks table and deleted that record.

Now ui_portgroups_avail_list_view includes a record for my port group, so I’m good-to-go.

Hopefully this information will be helpful to someone. I strongly suggest working with VMware support and having a case open and a very recent database backup before making any changes to the vCloud database.