VMware NSX-T: ESXi network connectivity lost when Broadcom 57500 series network cards are used. Solution available

Issue Description

The symptom is that ESXi hosts and all of their virtual machines are randomly isolated from the network. In the NSX Manager (System -> Fabric -> Hosts -> Cluster), the BFD (Bidirectional Forwarding Detection) tunnels are shown in a down state.

 

In this screenshot, the BFD tunnels are up and running.

The NSX and ESXi versions are not relevant. The problem lies in the driver and firmware of the Broadcom network card, in my case a Broadcom NetXtreme E-Series 57504 Quad-port 25Gb.

Setup of the affected environment:

VMware ESXi 7.0U3L
VMware NSX 4.1.0.2
Broadcom NetXtreme E-Series 57504: bnxtnet driver 222.0.155.0, firmware 22.21.07.80
Server: Dell PowerEdge R7525


VMware documented this problem in KB 90494, "ESXi host BFD tunnels are down in an NSX environment".

According to that KB, the workaround is to downgrade the NIC firmware to 218.0.219.13 / pkg 21.85.21.91.

The disadvantage of the older firmware is that it is not certified (see the VMware compatibility matrix) for use with current bnxtnet drivers. This introduces operational risk and may reduce network throughput and increase latency.

Solution

A permanent solution is now available, documented by Dell and Broadcom.

According to the Broadcom release notes, BFD control packets were being dropped, which caused the issue. It is resolved by the new bnxtnet driver release 223.0.152.2 together with firmware version 22.31.13.70.
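To check whether a host already runs the fixed combination, the driver and firmware versions can be compared numerically against the releases named above. The following is a minimal Python sketch, not a VMware or Broadcom tool: the helper names `parse_version` and `has_fix` are hypothetical, and the sketch assumes that any driver or firmware version at or above the fixed releases also contains the fix (confirm this in the release notes for the version you actually deploy). The version strings can be read, for example, from the Driver Info section of `esxcli network nic get -n vmnicX` on each host.

```python
# Hypothetical helper (not an official tool): checks whether a bnxtnet NIC's
# driver and firmware are at or above the fixed releases named in this post.
# Assumption: newer versions than the fixed ones also contain the fix.

FIXED_DRIVER = "223.0.152.2"
FIXED_FIRMWARE = "22.31.13.70"


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '22.21.07.80' into (22, 21, 7, 80) for numeric comparison."""
    # Keep only the first token, so a string such as
    # '22.31.13.70 /pkg 22.31.13.70' still parses (assumed input format).
    head = version.strip().split()[0]
    return tuple(int(part) for part in head.split("."))


def has_fix(driver: str, firmware: str) -> bool:
    """Return True only if BOTH driver and firmware meet the fixed versions."""
    return (parse_version(driver) >= parse_version(FIXED_DRIVER)
            and parse_version(firmware) >= parse_version(FIXED_FIRMWARE))


if __name__ == "__main__":
    # Versions from the affected environment described in this post.
    print(has_fix("222.0.155.0", "22.21.07.80"))  # False
    # Versions of the fixed release.
    print(has_fix("223.0.152.2", "22.31.13.70"))  # True
```

Versions are compared as integer tuples rather than as plain strings, so that a component like "07" or a three-digit number such as "155" is ordered correctly. Both values have to pass, because the fix requires the new driver together with the new firmware.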

 

This release has also been certified by Dell and VMware, as listed in the VMware compatibility matrix. Dell also documents the solution in the context of VxRail:

Dell VxRail: Intermittent issues with network dropouts across the entire cluster and NSXT tunnels showing down with Broadcom 25Gb Ethernet Adapter



