07-17-2015 10:33 AM - edited 03-08-2019 01:01 AM
We have deployed an HSRP pair of CSR1000v routers on clustered ESXi servers utilizing the VMware 5.5 distributed switch. The routers are used to switch our privately addressed hosts in different networks/vlans on the distributed switch in an infrastructure service provider "cloud" environment.
In our development we've found that we are getting DUP ping responses (3 DUP responses to be exact) from Linux hosts that ping other Linux hosts on the same network when either one of the Linux hosts is on the same clustered ESXi server as our active CSR.
Some observations:
1. The DUP responses do not happen for Windows hosts under the same circumstances.
2. The DUP responses from the Linux hosts go away when the HSRP configuration is removed from the routers.
3. Linux, Windows, and the CSRs are all using the same virtual host adapter type (vmxnet3).
4. The CSR interfaces are set up with basic HSRP, no ip redirects, and no proxy-arp.
5. The security settings for the VLANs on the VMware distributed switch are set to "Accept" for promiscuous mode and forged transmits (the only way HSRP seems to work).
Has anyone seen this type of problem or have any suggestions on how to resolve/troubleshoot it?
Thanks,
Mike
07-18-2015 05:43 AM
Hello Mike
There seem to be a lot of similar issues with VMware. Take a look at this one:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1017612
Hope this helps.
Vinit
07-20-2015 10:48 AM
[sfuller@rhel12 ~]$ ping -c2 192.168.22.1
PING 192.168.22.1 (192.168.22.1) 56(84) bytes of data.
64 bytes from 192.168.22.1: icmp_seq=1 ttl=255 time=4.96 ms
64 bytes from 192.168.22.1: icmp_seq=2 ttl=255 time=1.37 ms
--- 192.168.22.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 1.372/3.166/4.961/1.795 ms

[sfuller@rhel12 ~]$ ping -c2 192.168.22.3
PING 192.168.22.3 (192.168.22.3) 56(84) bytes of data.
64 bytes from 192.168.22.3: icmp_seq=1 ttl=255 time=3.05 ms
64 bytes from 192.168.22.3: icmp_seq=2 ttl=255 time=2.73 ms
--- 192.168.22.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 2.739/2.899/3.059/0.160 ms

[sfuller@rhel12 ~]$ ping -c2 192.168.22.4
PING 192.168.22.4 (192.168.22.4) 56(84) bytes of data.
64 bytes from 192.168.22.4: icmp_seq=1 ttl=255 time=0.427 ms
64 bytes from 192.168.22.4: icmp_seq=1 ttl=255 time=0.428 ms (DUP!)
64 bytes from 192.168.22.4: icmp_seq=1 ttl=254 time=0.514 ms (DUP!)
64 bytes from 192.168.22.4: icmp_seq=1 ttl=254 time=0.518 ms (DUP!)
64 bytes from 192.168.22.4: icmp_seq=2 ttl=255 time=0.497 ms
--- 192.168.22.4 ping statistics ---
2 packets transmitted, 2 received, +3 duplicates, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.427/0.476/0.518/0.049 ms
Packet 35. Echo req
Ethernet II, Src: 00:50:56:b0:c3:e7, Dst: 00:50:56:b0:1a:71
Internet Protocol Version 4, Src: 192.168.22.100, Dst: 192.168.22.4
IP TTL=64

Packet 36. Echo rep
Ethernet II, Src: 00:50:56:b0:1a:71, Dst: 00:50:56:b2:66:6a
Internet Protocol Version 4, Src: 192.168.22.4, Dst: 192.168.22.100
IP TTL=255

Packet 37. Echo rep
Ethernet II, Src: 00:50:56:b0:c3:e7, Dst: 00:50:56:b2:66:6a
Internet Protocol Version 4, Src: 192.168.22.4, Dst: 192.168.22.100
IP TTL=254

Packet 38. Echo rep
Ethernet II, Src: 00:50:56:b0:1a:71, Dst: 00:50:56:b2:66:6a
Internet Protocol Version 4, Src: 192.168.22.4, Dst: 192.168.22.100
IP TTL=255

Packet 39. Echo rep
Ethernet II, Src: 00:50:56:b0:c3:e7, Dst: 00:50:56:b2:66:6a
Internet Protocol Version 4, Src: 192.168.22.4, Dst: 192.168.22.100
IP TTL=254
PORT-ID    ED-BY              TEAM-PNIC  DNAME     PKTTX/s  MbTX/s  PKTRX/s  MbRX/s  %DRPTX  %DRPRX
[..]
100663301  4553795:rhel12     void       vSwitch4     0.00    0.00     0.00    0.00    0.00    0.00
100663302  4706701:winxp3     void       vSwitch4     0.00    0.00     0.00    0.00    0.00    0.00
100663304  4727660:c1kv-f5-1  void       vSwitch4   119.53    0.09   119.72    0.09    0.00    0.00
100663305  4727766:c1kv-f5-2  void       vSwitch4     0.59    0.00     0.39    0.00    0.00    0.00
~ # pktcap-uw --switchport 100663301 --dir 0 -o vss_cap_rhel12_rx.pcap
The switch port id is 0x06000005
The dir is Rx
The output file is vss_cap_rhel12_rx.pcap
No server port specifed, select 38149 as the port
Local CID 2
Listen on port 38149
Accept...Vsock connection from port 1032 cid 2
Dump: 2, broken : 0, drop: 0, file err: 0Destroying session 8
Dumped 2 packet to file vss_cap_rhel12_rx.pcap, dropped 0 packets.
Done.
[sfuller@centos651 vmstore]$ tshark -r vss_cap_rhel12_rx_2.pcap
1 0.000000 192.168.22.100 -> 192.168.22.4 ICMP 98 Echo (ping) request id=0xb421, seq=1/256, ttl=64
2 1.000503 192.168.22.100 -> 192.168.22.4 ICMP 98 Echo (ping) request id=0xb421, seq=2/512, ttl=64
[sfuller@centos651 vmstore]$
~ # pktcap-uw --switchport 100663305 --dir 1 -o vss_cap_c1kv1-f5-2_tx_2.pcap
The switch port id is 0x06000009
The dir is Tx
The output file is vss_cap_c1kv1-f5-2_tx_2.pcap
No server port specifed, select 47141 as the port
Local CID 2
Listen on port 47141
Accept...Vsock connection from port 1040 cid 2
Dump: 7, broken : 0, drop: 0, file err: 0Destroying session 16
Dumped 7 packet to file vss_cap_c1kv1-f5-2_tx_2.pcap, dropped 0 packets.
Done.
[sfuller@centos651 vmstore]$ tshark -r vss_cap_c1kv1-f5-2_tx_2.pcap
1 0.000000 192.168.22.3 -> 224.0.0.2 HSRP 62 Hello (state Active)
2 2.688828 192.168.22.3 -> 224.0.0.2 HSRP 62 Hello (state Active)
3 2.873541 192.168.22.100 -> 192.168.22.4 ICMP 98 Echo (ping) request id=0xb421, seq=1/256, ttl=64
4 2.874020 192.168.22.100 -> 192.168.22.4 ICMP 98 Echo (ping) request id=0xb421, seq=1/256, ttl=63
5 3.874027 192.168.22.100 -> 192.168.22.4 ICMP 98 Echo (ping) request id=0xb421, seq=2/512, ttl=64
6 3.874991 192.168.22.100 -> 192.168.22.4 ICMP 98 Echo (ping) request id=0xb421, seq=2/512, ttl=63
7 5.361105 192.168.22.3 -> 224.0.0.2 HSRP 62 Hello (state Active)
[sfuller@centos651 vmstore]$
[sfuller@centos651 vmstore]$ tshark -V -r vss_cap_c1kv1-f5-2_tx_2.pcap
Frame 3: 98 bytes on wire (784 bits), 98 bytes captured (784 bits)
[..]
Ethernet II, Src: Vmware_b2:66:6a (00:50:56:b2:66:6a), Dst: Vmware_b0:1a:71 (00:50:56:b0:1a:71)
    Destination: Vmware_b0:1a:71 (00:50:56:b0:1a:71)
        Address: Vmware_b0:1a:71 (00:50:56:b0:1a:71)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Source: Vmware_b2:66:6a (00:50:56:b2:66:6a)
        Address: Vmware_b2:66:6a (00:50:56:b2:66:6a)
Frame 4: 98 bytes on wire (784 bits), 98 bytes captured (784 bits)
[..]
Ethernet II, Src: Vmware_b0:c3:e7 (00:50:56:b0:c3:e7), Dst: Vmware_b0:1a:71 (00:50:56:b0:1a:71)
    Destination: Vmware_b0:1a:71 (00:50:56:b0:1a:71)
        Address: Vmware_b0:1a:71 (00:50:56:b0:1a:71)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Source: Vmware_b0:c3:e7 (00:50:56:b0:c3:e7)
        Address: Vmware_b0:c3:e7 (00:50:56:b0:c3:e7)
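If it helps anyone repeat this check on a larger capture, here is a rough Python sketch that scans tshark summary lines and flags any echo request seen with more than one TTL, which is the signature of a routed copy. The function name `routed_copies` and the regular expression are my own illustration, and the sample lines are simply the ICMP lines from the capture above; adjust the pattern if your tshark version prints a different summary format.

```python
import re
from collections import defaultdict

# Matches the "id=..., seq=N/M, ttl=T" tail of a tshark ICMP summary line.
ECHO = re.compile(r"id=(0x[0-9a-f]+), seq=(\d+)/\d+, ttl=(\d+)")

def routed_copies(lines):
    """Return {(icmp_id, seq): [ttls]} for echoes seen with more than one TTL."""
    ttls = defaultdict(set)
    for line in lines:
        match = ECHO.search(line)
        if match:  # non-ICMP lines (e.g. HSRP hellos) are skipped
            key = (match.group(1), int(match.group(2)))
            ttls[key].add(int(match.group(3)))
    # Keep only echoes that appear with several TTLs, highest TTL first.
    return {k: sorted(v, reverse=True) for k, v in ttls.items() if len(v) > 1}

sample = [
    "1 0.000000 192.168.22.3 -> 224.0.0.2 HSRP 62 Hello (state Active)",
    "3 2.873541 192.168.22.100 -> 192.168.22.4 ICMP 98 Echo (ping) request id=0xb421, seq=1/256, ttl=64",
    "4 2.874020 192.168.22.100 -> 192.168.22.4 ICMP 98 Echo (ping) request id=0xb421, seq=1/256, ttl=63",
    "5 3.874027 192.168.22.100 -> 192.168.22.4 ICMP 98 Echo (ping) request id=0xb421, seq=2/512, ttl=64",
    "6 3.874991 192.168.22.100 -> 192.168.22.4 ICMP 98 Echo (ping) request id=0xb421, seq=2/512, ttl=63",
]

print(routed_copies(sample))
```

Both sequence numbers show up with TTL 64 and TTL 63, so each request reached this port once directly and once after passing through a router.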
2 - HSRP standby router to Linux via HSRP active router for 1 above
3 - HSRP standby router to Linux direct for 2 above
4 - HSRP standby router to Linux via HSRP active router for 2 above
07-20-2015 10:48 AM
Steve,
Thank you for your response! I appreciate the details you added. We were seeing some of the same traffic sources from tcpdump on our Linux host, showing the additional ping responses and the decremented TTL counter. My first impression was that the router was responding in duplicate to traffic that should have been handled by the distributed switch. Your response makes the exchange much clearer!
Thank you for also clarifying that Windows boxes respond in the same way (although they hide it in the generic CLI response). That was a mystery to me that makes more sense now.
I'm still a bit curious why it only happens when one of the hosts is on the same ESXi server as the CSR but not if the CSR is on a different host. Regardless of CSR location, I would have figured the interaction would be identical.
To answer your questions:
I too only see duplicate responses when pinging the standby router. The HSRP vip and active router IPs do not exhibit the DUP response.
Thank you again for looking into this. I'm still struggling to find a solution to the problem.
-Mike
07-20-2015 11:34 AM
Hi Mike,
I think the reason you only see the duplicates when the CSR is on the same host as your Linux VM is explained in the Promiscuous Portgroup Myth post. While the VMware switch doesn't learn MAC addresses and which ports they're connected to, the real switch you have connecting your hosts does.
In the diagram on that post, if VM A is the active HSRP router, and VM B is the standby HSRP router, then when VM C (your Linux host) pings the standby router, that traffic will not get sent out port A to Host A as the real switch knows the destination MAC address of the standby router is out port B.
As for a solution, I'm not sure at this stage. I noticed that the show controllers command on the CSR that is HSRP active shows "Software MAC Filter Enabled", which is probably how they make the router listen to traffic destined for both the BIA (if you can call a vSphere-assigned MAC a BIA) and the HSRP MAC. The fact that it's processing all MACs could be a bug, or perhaps it's by design.
Any chance you can raise a TAC case against it? I think we're pretty close, but getting that last bit is probably going to take some knowledge of the inner workings of the beast.
Regards
07-20-2015 01:39 PM
Thanks Steve,
I just finished reading the linked blog (I missed it my first go round). It makes sense now. I can't open a TAC case on this yet. We've set this up in our development lab as an evaluation and have just sent the PO out to make the purchase with support for the routers. Hopefully in the next week or so we can actually open a TAC case.
In the meantime, we're attacking this from our VMware contract side as well.
Thanks,
Mike
07-24-2015 10:21 AM
This caught my eye. I can also confirm that I can replicate the issue on both 3.13.1S and 3.14.1S with the above-mentioned configuration (HSRP, and only when one of the guests is on the same hypervisor as the CSR). I've captured with EPC and can see this behaviour from the router. It's as if, with HSRP configured, the router thinks this ICMP is destined for itself and punts it. I need to look at the punt statistics more closely, but I'll raise it with TAC in the meantime and let you know how I get on.
Interestingly, a colleague of mine who has a variety of services behind these routers mentioned issues with incoming MySQL connections too - not necessarily persistent connections, and only when HSRP is configured. I've yet to confirm this as we're looking to create a more isolated environment (as I'm sure this will speed up the TAC case).
Sandy
07-24-2015 11:27 AM
Thank you for the confirmation Sandy. I appreciate the additional sets of eyes on the problem to see if there is a solution. A TAC case would be great! We're still waiting for our procurement to finish off the licensing and support contract purchase with Cisco before I can open a case.
-Mike
08-26-2015 12:48 AM
We just recently received our service contract (procurement moves at a snail's pace here) and I was able to open a TAC case on my issue.
Thanks to your suggestions and feedback, Steve and Sandy, the Cisco engineer was able to quickly suggest an option to work around the issue. By forcing the routers to use their BIA instead of the floating virtual MAC with the "standby use-bia" command, the duplicate pings cleared up!
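For readers wondering what the "floating virtual MAC" is: HSRP derives it from the group number, using the well-known prefixes 0000.0c07.acXX for version 1 and 0000.0c9f.fXXX for version 2, whereas `standby use-bia` makes the router use the interface's own address instead. A small Python sketch of that derivation follows; the helper name `hsrp_virtual_mac` and the group numbers are illustrative assumptions, since this thread does not state which group or HSRP version is in use.

```python
def hsrp_virtual_mac(group: int, version: int = 1) -> str:
    """Return the default HSRP virtual MAC for a group (hypothetical helper)."""
    if version == 1:
        if not 0 <= group <= 255:
            raise ValueError("HSRPv1 groups range from 0 to 255")
        return f"0000.0c07.ac{group:02x}"   # last byte is the group number
    if version == 2:
        if not 0 <= group <= 4095:
            raise ValueError("HSRPv2 groups range from 0 to 4095")
        return f"0000.0c9f.f{group:03x}"    # last 12 bits are the group number
    raise ValueError("HSRP version must be 1 or 2")

# Example group numbers only; substitute the group configured on your interface.
print(hsrp_virtual_mac(1))       # 0000.0c07.ac01
print(hsrp_virtual_mac(10, 2))   # 0000.0c9f.f00a
```

Without `use-bia`, the active router has to accept frames for this second address in addition to its vSphere-assigned one.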
I am going to do some failover testing to make sure that this option does not cause issues with connected hosts, but initially things look promising.
Additionally, the TAC engineer mentioned that a Cisco case (CSCup28090) was filed to add support for this feature by default with FHRPs. The case is not public facing yet.
-Mike
08-26-2015 12:48 AM
Hi Mike,
Thanks for coming back to the forum and posting the answer. Unfortunately I don't think you can mark your own answer as correct :)
Regards
09-02-2015 05:02 AM
Hi Mike,
I'm glad you've found a workaround. Though for completeness, I thought it would be helpful to share what I’ve found.
The actual root cause is not the use of an FHRP as such. It is a consequence of using one on VMware, whose way of handling unknown MACs is to flood the frame out all ports, combined with a CSR bug in which the router does not ignore L3 packets whose L2 destination is not its own. See: CSCuv63708.
What is happening is this: because the portgroup restricts sending to unknown MAC addresses unless promiscuous mode is configured, the CSR gets a copy of the frame when it's flooded. Well, that is if the CSR is sitting on the same hypervisor as one of the VMs. Despite the L2 header not being addressed to the CSR, the CSR runs the frame through the routing algorithm, decrements the TTL, rewrites the source MAC to its own, and forwards the frame out the destination interface, which, if that is on the same LAN, results in duplicates. This is not the correct behavior. In the physical world this is unlikely to happen, as a downstream switch will not (unless configured to) flood out all ports.
If you have physical switches in between your hypervisors, then your physical switches will stop some of this flooded traffic, for example where the destination MAC is already known on the receiving port. It's also worth noting that this is the case for all traffic, not just ICMP. ICMP just makes the issue visible through its DUP packet output.
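To make the sequence concrete, here is a minimal Python sketch of it, not a model of the real vSwitch or CSR datapath. It assumes a simplified delivery rule (a frame reaches the port that owns the destination MAC plus any promiscuous listener), treats only the HSRP active router as promiscuous, and uses a hypothetical `router_checks_dst_mac` flag to stand in for the correct behavior. The MACs, IPs and TTLs are the ones from the captures earlier in this thread.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    kind: str   # "request" or "reply"
    ttl: int

# (name, MAC, IP) taken from the captures in this thread.
LINUX = ("linux", "00:50:56:b2:66:6a", "192.168.22.100")
ACTIVE = ("active", "00:50:56:b0:c3:e7", "192.168.22.3")
STANDBY = ("standby", "00:50:56:b0:1a:71", "192.168.22.4")
NODES = [LINUX, ACTIVE, STANDBY]
MAC_OF_IP = {ip: mac for _, mac, ip in NODES}

def simulate(router_checks_dst_mac: bool) -> list[int]:
    """Return the TTL of every echo reply the Linux host sees for one ping."""
    replies = []
    first = Frame(LINUX[1], STANDBY[1], LINUX[2], STANDBY[2], "request", 64)
    queue = deque([("linux", first)])
    while queue:
        sender, frame = queue.popleft()
        for name, mac, ip in NODES:
            if name == sender:
                continue
            promiscuous = name == "active"  # assumption: only the active CSR
            if frame.dst_mac != mac and not promiscuous:
                continue                    # normal NIC filtering
            if frame.dst_ip == ip:
                if frame.kind == "request":  # answer the echo request
                    queue.append((name, Frame(mac, MAC_OF_IP[frame.src_ip],
                                              ip, frame.src_ip, "reply", 255)))
                elif name == "linux":
                    replies.append(frame.ttl)
            elif name == "active":
                if router_checks_dst_mac and frame.dst_mac != mac:
                    continue                # correct: not addressed to us, drop
                # Buggy path: route it anyway, TTL - 1, source MAC rewritten.
                queue.append((name, Frame(mac, MAC_OF_IP[frame.dst_ip],
                                          frame.src_ip, frame.dst_ip,
                                          frame.kind, frame.ttl - 1)))
    return replies

print(simulate(router_checks_dst_mac=False))  # [255, 255, 254, 254]
print(simulate(router_checks_dst_mac=True))   # [255]
```

Under these assumptions the buggy case yields four replies with TTLs 255, 255, 254 and 254, the same pattern as the one reply plus three DUPs in the ping output above: the request is duplicated once, and each of the two replies is duplicated again on the way back.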
Cisco have marked the bug as severe, but currently there are only 2 support cases listed and 1 known affected version. I’ve demonstrated that this exists in more than one version, and would appreciate it if those reading this took their case back to TAC and had it added to the support request.
We’re also working with VMware to see whether there is a workaround that changes this behavior while still maintaining functionality.
Sandy
09-02-2015 10:38 AM
Thanks for posting an update Sandy.
Regards