03-20-2018 04:17 AM - edited 03-01-2019 01:27 PM
We updated two of our UCS ESXi hosts to enic version 2.3.0.14, and now have a strange problem with ASAv VMs running on those hosts (never had this with the old enic 2.3.0.7):
As soon as an ASAv is migrated to one of the two updated hosts, it is no longer reachable via SSH, and after a while some sessions passing through the firewall on trunk ports are lost. We usually notice this with SMTP sessions that simply time out.
We used to have a very similar problem with HP elxnet drivers a couple of years ago, but I don't remember in which version that was fixed.
Has anyone seen something similar?
03-20-2018 07:07 AM - edited 03-20-2018 07:08 AM
Greetings.
Do you have any other Linux-type guestVMs that you move to the hosts in question, and do they keep their SSH reachability?
You might need to use the ESXi pktcap-uw packet capture utility to confirm the frames at both the VMNIC/uplink level and the DVS port level.
Is this a standard VMware DVS, or the ACI VMM-integrated one?
Thanks,
Kirk...
03-20-2018 08:00 AM
03-20-2018 08:36 AM - edited 03-20-2018 08:54 AM
Yeah, packet captures cut to the chase.
If you see packet issues, especially when captured at the VMnic/uplink level, you may want to open a TAC case.
That makes me want to ask another question: can you initiate the SSH from another guestVM in the same VLAN, on the same host? (You might have to check which vmnic it's pinned to, to make sure the traffic stays inside the DVS.)
How often is this reproducible?
Does this only happen from another subnet? What about the same subnet, or from a guestVM on the same host, etc.?
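To run those same-subnet/cross-subnet checks quickly, a throwaway reachability probe can help. This is a generic sketch to run from a test guestVM; the addresses are placeholders (not from this thread), and it uses bash's /dev/tcp rather than anything ESXi-specific:

```shell
#!/bin/bash
# Probe TCP/22 on each address with a 3-second timeout per host.
# The addresses used below are invented example values.
check_ssh() {
  local host
  for host in "$@"; do
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/22" 2>/dev/null; then
      echo "$host: SSH reachable"
    else
      echo "$host: no answer"
    fi
  done
}

check_ssh 10.0.1.10 10.0.2.10
```

Running it from peers in the same subnet, a different subnet, and on the same host narrows down where the frames get lost.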
Referencing VMware article https://kb.vmware.com/s/article/2051814, some sample capture commands:
# pktcap-uw --uplink vmnic2 --dir 0 -o /tmp/inbound.cap & pktcap-uw --uplink vmnic2 --dir 1 -o /tmp/outbound.cap &
The above is actually two commands, run in the background, capturing frames in both directions.
As noted in the VMware article, you'll have to kill the capture processes afterwards with:
# kill $(lsof | grep pktcap-uw | awk '{print $1}' | sort -u)
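A side note on that kill one-liner: it only works because it feeds `awk '{print $1}'` into `kill`, i.e. the first column of lsof output on the ESXi host is a process ID. A quick off-box simulation with invented sample lines shows what the pipeline extracts:

```shell
# Invented sample in the shape of ESXi lsof output (first column = PID):
lsof_sample='2098987 pktcap-uw  FILE  4  /tmp/inbound.cap
2098988 pktcap-uw  FILE  4  /tmp/outbound.cap
1001    hostd      FILE  3  /var/log/hostd.log'

# Same pipeline as the kill command, minus the kill itself:
printf '%s\n' "$lsof_sample" | grep pktcap-uw | awk '{print $1}' | sort -u
# → 2098987 and 2098988 (one PID per capture direction; hostd is filtered out)
```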
Thanks,
Kirk...
04-06-2018 03:50 AM
After a lot of debugging we found out that we were completely on the wrong track: the reason for our ASAvs misbehaving was that they happened to share those two hosts with two VMs of a different ASAv cluster. Those VMs had their interfaces in promiscuous mode, for reasons yet unknown, and were processing packets destined for other firewalls in the same subnet that ended up on the same ESXi host. Since they didn't know about those sessions, they answered with RSTs, effectively terminating the connections.
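To make that failure mode concrete, here is a toy model (pure shell, all flow values invented) of what the promiscuous-mode firewall was effectively doing: any sniffed segment that is not in its own connection table gets answered with a RST, tearing down some other firewall's session:

```shell
# Flows this firewall actually owns (invented example values):
KNOWN_FLOWS="10.0.0.5:34512->192.0.2.25:25"

# Decide what happens to a sniffed TCP segment for the given flow.
handle_segment() {
  case " $KNOWN_FLOWS " in
    *" $1 "*) echo "forward"  ;;
    *)        echo "send RST" ;;   # unknown session: reset it
  esac
}

handle_segment "10.0.0.5:34512->192.0.2.25:25"   # its own flow
handle_segment "10.0.0.9:40001->192.0.2.25:25"   # another firewall's flow
```

In promiscuous mode the second case fires constantly, because the VM sees every frame on the port group, not just its own, which matches the SMTP timeouts described at the start of the thread.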
We disabled promiscuous mode on the dvSwitch port groups in question, and the problem has gone away for now. Why that firewall cluster behaves differently from all the others we have is the next unsolved puzzle.