Resolutions for 2018

If I put it here, I’m much more likely to follow through.  Like many, I work best under some pressure.  Here is a list of what I want to do differently (with regard to technology) next year.

  1. Do more blogging.  I can make a ton of excuses for not blogging as much this year.  I love sharing what I’ve learned; the more new stuff I learn, the more I share.  So….
  2. Do more for NSX for vSphere and NSX-T.  I feel strongly that SDN is critical to the future of how datacenters operate.  NSX is the logical leader in this space and will only grow in interest.  There is still a tendency to replicate what was done with pre-SDN technology and I’d like to see modern ways to solve problems while finding and pushing the limits of what can be done in SDN.
  3. Do more with containers and PKS.  The technologies that Pivotal provides are cutting edge.  Containers and applications-as-code methods are already growing and will define the datacenter of the future.  Just as we stopped thinking of hardware servers as single-purpose a few years ago, we’ll embrace multiple workloads within a VM.
  4. Do more coding.  I love Concourse and pipelines, but have a lot to learn.  Let’s find the limits of BOSH and pipelines.  Can we not only deploy, but also automate the operation and maintenance of a PaaS solution?
  5. Do more coding.  I feel that as we move to “applications-as-code”, it’s important to understand what that means to developers and operators.  What sort of problems become irrelevant in this approach?  What molehills become mountains?

Hope to see you next year!



How UCS QOS System Class can affect VMware NSX VXLANs

In a recent (and still ongoing) installation, we encountered a wide variety of sporadic network traffic issues affecting the VMs connected to NSX Logical Switches (VXLANs).

Some of the symptoms were:

  • Can ping a device, but not get a complete tracert
  • Can connect to a server over HTTPS, but not its neighbor (both webservers)
  • vCAC Gugent cannot connect to vCAC server
  • Cannot vmkping over the VXLAN TCP/IP stack with a payload larger than 1470 bytes

The last bullet made it pretty clear that the issue was related to the MTU.  We had no visibility into the configuration of the north-south layer 3 devices, but had been assured that they were configured for 1600 byte frames.

In the NSX for vSphere implementation of VXLAN, each frame sent by a VM is wrapped in additional outer headers (about 50 bytes in total), so a full-size 1500-byte packet grows beyond 1500 bytes (to roughly 1550 bytes).
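
The arithmetic behind that size can be sketched in a few lines of Python. This is only an illustration: the header sizes are the standard ones for untagged Ethernet and IPv4 without options, and the function names are made up for this post. It also shows why a ping test is a handy probe: on a path stuck at a 1500-byte MTU, the largest unfragmented ICMP payload is 1472 bytes, which lines up with the failures we saw just above 1470.

```python
# Standard header sizes in bytes (IPv4 without options, untagged Ethernet).
INNER_ETHERNET = 14  # the guest's own MAC header is carried inside the tunnel
VXLAN_HEADER = 8
UDP_HEADER = 8
IPV4_HEADER = 20
ICMP_HEADER = 8


def required_transport_mtu(guest_mtu=1500):
    """IP MTU the transport network needs to carry a full-size guest packet."""
    return guest_mtu + INNER_ETHERNET + VXLAN_HEADER + UDP_HEADER + IPV4_HEADER


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


print(required_transport_mtu(1500))  # 1550
print(max_ping_payload(1500))        # 1472
print(max_ping_payload(1600))        # 1572
```

With a transport MTU of 1600, a don't-fragment ping with a 1572-byte payload should succeed; if it only works up to about 1472, something in the path is still limited to 1500.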

I checked the UCS service profiles and the vNIC Templates; they looked something like this:

UCS vNIC template

It certainly looks like it’s set for jumbo frames, but also notice the second red ellipse there: QoS Policy. If you pay attention to things like that, you might also notice the warning about the MTU size.

The QoS Policy assigned to the vNIC (template) uses an egress priority based on a QoS System Class.

UCS QoS Policy

The QoS System Classes specify not only a class-of-service and a weight, but also an MTU! In my case, I changed the vNIC Template QoS Policy to one with a System Class whose MTU is 9216. Once this change was made, the VMs behaved as expected.
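
To make the relationship concrete, here is a rough Python sketch of the check I effectively did by eye. It is not a UCS API: the `VnicSettings` record, the `mtu_problems` function, and the sample numbers (a 9000-byte vNIC MTU and a 1500-byte "normal" System Class) are assumptions for illustration only. It treats the smaller of the two MTUs as the one that matters, which matches what I observed rather than anything I have confirmed in documentation.

```python
from dataclasses import dataclass


@dataclass
class VnicSettings:
    """Hypothetical record of the settings involved; not a UCS object."""
    name: str
    vnic_mtu: int          # MTU entered on the vNIC (template)
    system_class_mtu: int  # MTU of the QoS System Class behind the QoS Policy


def mtu_problems(vnic, required_mtu=1600):
    """Return human-readable warnings for a vNIC that may drop VXLAN frames."""
    problems = []
    if vnic.vnic_mtu > vnic.system_class_mtu:
        problems.append(
            f"{vnic.name}: vNIC MTU {vnic.vnic_mtu} exceeds "
            f"System Class MTU {vnic.system_class_mtu}"
        )
    # Assumption: the smaller of the two values is what actually applies.
    effective = min(vnic.vnic_mtu, vnic.system_class_mtu)
    if effective < required_mtu:
        problems.append(
            f"{vnic.name}: effective MTU {effective} is below {required_mtu}"
        )
    return problems


# Assumed values: jumbo MTU on the vNIC, default-sized System Class.
for line in mtu_problems(VnicSettings("vnic-a", 9000, 1500)):
    print(line)

# After moving to a System Class with an MTU of 9216, nothing is reported.
print(mtu_problems(VnicSettings("vnic-a", 9000, 9216)))  # []
```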

UCS System Class

A couple of notes:

  • If your vNICs (or vNIC templates) do not specify a QoS Policy, UCS appears to use the MTU given on the vNIC itself
  • If you do not have an enabled QoS System Class with an MTU of 9216, you’ll have to type the value in; the dropdown list only contains “normal” and “fc”

This is another of those posts where I just stumbled upon the fix and needed to write it up before I forgot. Hopefully this will save someone a lot of time later.