Archive

Archive for the ‘Troubleshooting’ Category

Removing NSX-T VIBs from ESXi hosts

I’d wanted to revert my environment from (an incomplete install of) NSX-T v2.0 back to NSX for vSphere v6.3.x, but found that the hosts would not complete preparation.  The logs indicated that something was “claimed by multiple non-overlay vibs”.

Error in esxupdate.log

I found that the hosts still had the NSX-T VIBs loaded, so to remove them, here’s what I did:

  1. Put the host in maintenance mode.  This is necessary to “de-activate” the VIBs that may be in use
  2. Login to the host via SSH
  3. Run

    /etc/init.d/netcpad stop

  4. Run this all in one line; note the the order of the vibs is important

    esxcli software vib remove -n nsx-ctxteng -n nsx-hyperbus -n nsx-platform-client -n nsx-nestdb -n nsx-aggservice -n nsx-da -n nsx-esx-datapath -n nsx-exporter -n nsx-host -n nsx-lldp -n nsx-mpa -n nsx-netcpa -n nsx-python-protobuf -n nsx-sfhc -n nsx-support-bundle-client -n nsxa -n nsxcli -n nsx-common-libs -n nsx-metrics-libs -n nsx-nestdb-libs -n nsx-rpc-libs -n nsx-shared-libs

  5. reboot the host
Advertisements

A downside to VVols

03/17/2017 Comments off

I picked up a Dell Equallogic PS6000 for my homelab.  Updated it to the latest firmware and discovered it’s capable of VVols.  Yay!  I created a container and (eventually) migrated nearly everything to it.  Seriously, every VM except  Avamar VE.  Started creating and destroying VMs; DRS is happily moving VMs among the hosts.

UNTIL (dun dun dun)

The Equallogic VSM, running the VASA storage provider gets stuck during a vMotion.  Hmm, I notice that all of the powered-off VMs now have a status of “inaccessible”.  On the hosts, the VVol “datastore” is inaccessible.  

Ok, that’s bad.  Thank goodness for Cormac Hogan’s post about this issue.  It boils down to a chicken-and-egg problem.  vCenter relies on the VASA provider to supply information about the VVol.  If the VASA provider resides on the VVols, there’s no apparent way to recover it.  There’s no datastore to find the vmx and re-register, the connections to the VVols are based on the VM, so if it’s not running, there’s no connection to it.

To resolve, I had to create a new instance of the Equallogic VSM, re-register it with vCenter, re-register it as a VASA provider and add the Equallogic group.  Thankfully, the array itself is the source-of-truth for the VVol configuration, so the New VSM picked it up seamlessly.

So your options are apparently to place the VSM/VASA provider on a non-VVol or build a new one every time it shuts down.  Not cool.

 

vRealize Automation DEM worker cannot connect to Orchestrator

08/23/2016 Comments off

In vRA 6.2, using vRO 6.0, you may find that the data collection and other vRO workflows fail with the error “You must have at least one properly configured vCenter Orchestrator endpoint that is reachable”.  The IaaS/Monitoring/Log will show which DEM worker threw the error.  When you check the DEM worker logs for that instance, if you find the message “Could not create SSL/TLS secure channel. —> System.Net.WebException: The request was aborted: Could not create SSL/TLS secure channel“, you have probably been affected by VMKB 2123455 and MS KB 3061588.

Although both articles seem to suggest that removing the offending patch will solve the problem, I think figuring out exactly which patch is rather awkward.  The easier fix is to apply a quick registry hack to your DEM workers (and wherever the vRA Designer runs).

  1. Logon as an account with admin rights (suggest the account your IaaS services run under)
  2. locate or add the key

    HKLM\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\
    KeyExchangeAlgorithms\Diffie-Hellman

  3. Add/update the DWORD value ClientMinKeyBitLength and set the value to 512 decimal (200 hex)
  4. Restart the DEM worker service

 

Notes:
The Microsoft patch sets the default minimum group size to 1024.  It appears that the vRO 6.0.x appliances use something less than that.  This registry hack indicates that SCHANNEL should accept keys as small as 512 bits.  I suggest only applying this to the necessary and affected machines since it does lower the bar for the DHE security requirements.

Thanks to Zach Milleson for reminding me that this workaround may not resolve everyone’s issue, depending on which MS patches are installed.  If this workaround doesn’t work for you, you may have to locate and remove the offending patch.  YMMV.

Resolving Site Recovery Manager Error “Unable to create protection group. No VRM registered with vCenter Server”

Just a quick note here about this error.  I felt kinda silly once I realized what I’d done.

Scenario:

  1. Installed a vCenter Server at both sites
  2. Deployed a vSphere Replication 6.1 appliance at both sites.  Connected each to its corresponding PSC and vCenter Server
  3. Installed Site Recovery Manager 6.1 on each site, registered to corresponding vCenter Server
  4. Complete mappings and basic configurations in SRM, confirm VR is available at each site

    SRM configured with VR

    SRM configured with VR

  5. Attempt to create a protection group
  6. Receive Error “Unable to create protection group. No VRM registered with vCenter Server …”
  7. Google frantically

Its not clear from the documentation, but you have to configure at least one VM for replication using vSphere Replication section of Web Client first.  Totally counter-intuitive.  Just right-click the target VM and select “All vSphere Replication Actions | Configure Replication” and walk through the wizard.  Once complete, you can return to the SRM configuration and succesfully set up a Protection Group and subsequent Recovery Plan.

Weak Diffie-Hellman key in vRealize Orchestrator

{Edited Oct 19 2015 to reflect updated information inVMKB 2131619}

Recent versions of Google Chrome and Mozilla Firefox have begun rejecting connections using SSLv3 ciphers. Chrome complains of a weak ephemeral Diffie-Hellman public key, calling it a “disastrous misconfiguration”.  Firefox’s message also complains of a weak ephemeral Diffie-Hellman key in Server Key Exchange, but doesn’t foreshadow impending doom.

Interestingly (I guess), Internet Explorer 11, still happily connects…

Firefox message on vRO configuration page

Firefox message on vRO configuration page

Chrome message on vRO configuration page

Chrome message on vRO configuration page

Let’s fix Orchestrator so that we can use FF and Chrome…

Procedure

Confirmed this works on the vCO Appliance v5.5.2.1 through v6.0.2.1 and on the vRealize Automation Appliance v6.2.x

  1. SSH into the appliance
  2. Enter this to navigate to the configuration for the configuration page

    cd /etc/vco/configuration

  3. Enter this to backup the server.xml file

    cp ./server.xml ./serverxml.backup

  4. Use vi, or whatever you’re familiar with, to edit server.xml and replace the line that reads (as one line)

    ciphers=“TLS_DHE_RSA_WITH_AES_256_CBC_SHA,
    TLS_DHE_DSS_WITH_AES_256_CBC_SHA,
    TLS_RSA_WITH_AES_256_CBC_SHA,
    TLS_DHE_RSA_WITH_AES_128_CBC_SHA,
    TLS_DHE_DSS_WITH_AES_128_CBC_SHA,
    TLS_RSA_WITH_AES_128_CBC_SHA”

    with (again, as one line)

    ciphers=“TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
    TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
    TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,
    TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA,
    TLS_ECDH_RSA_WITH_AES_256_CBC_SHA,
    TLS_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA,
    TLS_RSA_WITH_3DES_EDE_CBC_SHA”

  5. save the file
  6. Repeat the steps above for /etc/vco/app-server/server.xml
  7. Restart the vco-server and vcp-configurator services

    service restart vco-server
    service restart vco-configurator

That’s it, you should be good to go. There’s probably other VMware applications that will need the same treatment though.

Not Diffie-Hellman

Dang, wrong Hellman

Fix – Unable to import vCAC/vRA certificates into Orchestrator

07/17/2015 Comments off

Problem:

While in the vRealize Orchestrator Client you find that the Library/Configuration/SSL Trust Manager/”Import a certificate from URL” workflow returns an error reading “InternalError: handshake alert: unrecognized_name” when provided. The URL the resolves to the Load-Balancer VIP for the vCAC/vRA appliances.

 

Background:

Signed SSL certificate installed on vCAC/vRA Appliance, SSL Passthrough on NSX/vCNX Load-Balancer, vCAC/vRA Settings/Hostname set to resolve to VIP, matching SSL cert.

 

Fix:

  1. SSH into the vCAC Appliance as root
  2. Backup /etc/apache2/vhosts.d/vcac.conf to vcac.conf.bak
  3. Use vi to edit /etc/apache2/vhosts.d/vcac.conf
  4. Scroll down to  <virtualHost _default_:443>
  5. Add these lines

    ServerName fqdn.of.appliance.node

    ServerAlias: load.balancer.name

  6. Scroll further to ensure these params aren’t listed elsewhere, remove or revise if so.
  7. save the file and exit vi
  8. restart the vCAC/vRA services

NSX-v and vCNS Coexistence

06/25/2014 Comments off

It may or may not be apparent, but NSX for vSphere is in many ways the next version of vCNS. In my lab, I’ve attempted to keep vCNS while adding NSX to the same vCenter server. The license key or configuration apparently overlap; if vShield Manager boots up first, NSX indicates that it’s not licensed. If NSX manager boots first, the vShield Manager states that it’s not licensed.

I did not, but should have, performed an upgrade from vCNS to NSX and now will have to add NSX Edge Gateways to replace the vShield Edges.