VMware NSX-T: ESXi network connectivity lost when Broadcom 57500 series network cards are used. Solution available
The symptom is that ESXi hosts and all of their virtual machines are randomly isolated from the network. In NSX Manager (System->Fabric->Hosts->Cluster), the BFD (Bidirectional Forwarding Detection) tunnels are shown in a down state.
In this screenshot BFD tunnels are up and running.
The NSX and ESXi versions are not important. The issue lies in the driver and firmware of the Broadcom NIC in use, in my case a NetXtreme E-Series 57504 quad-port 25Gb.
Setup of the environment that was affected by the issue:
VMware ESXi 7.0U3L
VMware NSX 18.104.22.168
Broadcom NetXtreme E-Series 57504: bnxtnet driver 22.214.171.124 FW: 22.21.07.80
Server: Dell PowerEdge R7525
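To compare a host against the setup above, the driver and firmware versions of a NIC can be read on the ESXi host with `esxcli network nic get -n vmnic0`. The following is a minimal Python sketch that pulls the "Driver Info" fields out of that output. The sample text, the NIC name `vmnic0`, the placeholder version strings and the helper name `parse_driver_info` are assumptions made for illustration; they are not values from the affected environment.

```python
# Illustrative excerpt of `esxcli network nic get -n vmnic0` output.
# The version strings are placeholders, not real Broadcom releases.
SAMPLE_OUTPUT = """\
   Auto Negotiation: true
   Driver Info:
         Bus Info: 0000:41:00:0
         Driver: bnxtnet
         Firmware Version: 1.2.3.4
         Version: 5.6.7.8
   Link Detected: true
   Name: vmnic0
"""


def parse_driver_info(text):
    """Return the key/value pairs nested under 'Driver Info:' (hypothetical helper)."""
    info = {}
    in_block = False
    block_indent = 0
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        indent = len(line) - len(line.lstrip())
        if stripped == "Driver Info:":
            # Remember how deep the header is; nested fields are indented further.
            in_block = True
            block_indent = indent
            continue
        if in_block:
            if indent <= block_indent:
                break  # back at the outer level, the block has ended
            # Split on the first colon only, so values like bus addresses stay intact.
            key, _, value = stripped.partition(":")
            info[key.strip()] = value.strip()
    return info


if __name__ == "__main__":
    info = parse_driver_info(SAMPLE_OUTPUT)
    print("Driver:  ", info.get("Driver"), info.get("Version"))
    print("Firmware:", info.get("Firmware Version"))
```

The text would normally come from running the command over SSH on each host; the parsed values can then be checked against the versions listed in the KB and the compatibility matrix.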
VMware documented this problem in the KB article "ESXi host BFD tunnels are down in an NSX environment" (90494).
According to that KB, the workaround is to downgrade the NIC firmware to 126.96.36.199 /pkg 188.8.131.52.
The disadvantage of the old firmware is that it is not certified for use with current bnxtnet drivers (see the VMware compatibility matrix), which carries operational risk and may reduce network throughput and increase latency.
There is now a permanent solution available, documented by Dell and Broadcom.
According to the Broadcom release notes, BFD control packets were dropped, which caused the issue. It is solved by the new bnxtnet release 184.108.40.206 together with firmware version 220.127.116.11.
This release has also been certified by Dell and VMware, based on this VMware compatibility matrix. Dell also documents the solution in the context of VxRail: