Nexus 9000 can't ping own HSRP address

chris · 09-23-2015

Hi,

I have a problem with a Nexus 9372PX not responding to pings to the HSRP address of a group it participates in.

Topology

2 x 9372 connected in vPC domain 11

connected to

2 x 9372 connected in vPC domain 21

The HSRP group spans all four chassis, so we have one Active, one Standby and two Listen routers.
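For anyone wondering why a four-router group ends up with exactly one Active, one Standby and the rest in Listen, here is a simplified sketch of the HSRP election: highest priority wins, and ties go to the highest interface IP address. It assumes all routers come up together (or preemption is enabled), so the ranking alone decides the roles; without preemption, a higher-priority router that comes up later does not take over. The switch names, priorities and IPs are hypothetical, not taken from Chris's setup.

```python
from ipaddress import IPv4Address


def hsrp_roles(routers: dict[str, tuple[int, str]]) -> dict[str, str]:
    """Assign Active/Standby/Listen from (priority, interface IP) per router.

    Simplified model: assumes every router starts at the same time (or
    preemption is enabled), so the ranking alone decides the roles.
    """
    ranked = sorted(
        routers,
        key=lambda name: (routers[name][0], IPv4Address(routers[name][1])),
        reverse=True,  # highest priority first, then highest IP address
    )
    return {
        name: "Active" if i == 0 else "Standby" if i == 1 else "Listen"
        for i, name in enumerate(ranked)
    }


# Hypothetical group: two vPC pairs, default priority 100 except one switch.
group = {
    "dom11-sw1": (110, "192.168.1.2"),
    "dom11-sw2": (100, "192.168.1.3"),
    "dom21-sw1": (100, "192.168.1.4"),
    "dom21-sw2": (100, "192.168.1.5"),
}
for name, role in hsrp_roles(group).items():
    print(f"{name}: {role}")
```

With these values dom11-sw1 wins on priority, dom21-sw2 becomes Standby on the highest IP among the priority-100 routers, and the other two sit in Listen.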

The 9K diagonally opposite the Active cannot ping "its" HSRP address.

The HSRP MAC address is present on all switches with the G flag set.

Now, if I run ethanalyzer, I see the echo request packets going to the control plane and no response, but they also never stop, even though only 5 were requested.

The more pings I run to "its" HSRP address, the slower all responses from the control plane become, even for addresses that are pingable; e.g. one of the other switches gets a response time of over 1100 ms when pinging an SVI on this 9K. Traffic switched through the box seems unaffected.

Peer-gateway is not enabled, and the same issue occurs when it is enabled.

NX-OS 6.1(2)I3(2)

Is HSRP supported in this configuration?

What is causing the increase in latency to the control plane? (I am aware of control plane policing, but not of any shaping.)

Any ideas?

Thanks

Chris

chris · 09-23-2015

The bug below describes pretty much the issue I'm experiencing, but I am also suffering low ping response to all IPs on the box, as if the control plane were shaping traffic, and I can't find any command output to support that.

https://tools.cisco.com/bugsearch/bug/CSCuq09078

Steve Fuller · 09-23-2015

Hi Chris,

It seems you’ve found the answer to your question regarding the HSRP setup, but thought I’d add something regarding your earlier point on ethanalyzer.

“Now, if i run ethanalyzer i see the echo request packets going to the control plane, no response, but also they never stop even though only 5 were requested.”

Are you running a display or capture filter with ethanalyzer, and if so, on which interfaces does the traffic ingress/egress the switch? If it’s the 40GE ports, i.e., eth1/49 – 53, it will ingress/egress the switch via the Cisco Northstar ASIC on the Generic Expansion Module, and there’s a bug, CSCup35239 (title: Packets egressing via Northstar port not seen with ethanalyzer), which means the display or capture filter doesn’t work for that traffic. I came across this recently, and it seems an internal header is added to the frame at the point of capture. If you ping the switch there is a response, but because of the additional 16 bytes on the frame, the capture/display filter never matches the pattern it’s looking for at the offset in the frame it’s inspecting.
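To illustrate why an extra 16 bytes defeats the filter: a capture filter such as "icmp" on Ethernet is compiled into fixed-offset checks, roughly "EtherType at byte 12 is 0x0800 and the IPv4 protocol byte at offset 23 is 1". The sketch below mimics that check in plain Python. It assumes an untagged Ethernet frame, and the 16-byte internal header is a hypothetical all-zero placeholder, since its real contents aren't documented here.

```python
ETH_HDR_LEN = 14
ETHERTYPE_OFFSET = 12               # EtherType field in an untagged Ethernet header
IP_PROTO_OFFSET = ETH_HDR_LEN + 9   # IPv4 protocol field -> absolute offset 23
ETHERTYPE_IPV4 = 0x0800
IPPROTO_ICMP = 1


def build_icmp_frame() -> bytes:
    """Minimal untagged Ethernet + IPv4 header carrying ICMP (payload omitted)."""
    dst_mac = bytes.fromhex("00000c9ff065")  # HSRP virtual MAC from the thread
    src_mac = bytes.fromhex("001122334455")  # hypothetical source MAC
    ethertype = ETHERTYPE_IPV4.to_bytes(2, "big")
    ipv4 = bytes([
        0x45, 0x00, 0x00, 0x54,   # version/IHL, DSCP/ECN, total length 84
        0x00, 0x00, 0x00, 0x00,   # identification, flags/fragment offset
        0xFF, IPPROTO_ICMP,       # TTL 255, protocol ICMP
        0x00, 0x00,               # header checksum (not computed in this sketch)
        192, 168, 1, 2,           # source 192.168.1.2
        192, 168, 1, 1,           # destination 192.168.1.1
    ])
    return dst_mac + src_mac + ethertype + ipv4


def icmp_filter_matches(frame: bytes) -> bool:
    """Offset-based check in the style of a compiled 'icmp' capture filter."""
    if len(frame) <= IP_PROTO_OFFSET:
        return False
    ethertype = int.from_bytes(frame[ETHERTYPE_OFFSET:ETHERTYPE_OFFSET + 2], "big")
    return ethertype == ETHERTYPE_IPV4 and frame[IP_PROTO_OFFSET] == IPPROTO_ICMP


frame = build_icmp_frame()
internal_header = bytes(16)      # hypothetical placeholder for the added header
shifted = internal_header + frame

print(icmp_filter_matches(frame))    # True: fields sit at the expected offsets
print(icmp_filter_matches(shifted))  # False: every field moved 16 bytes later
```

The packet itself is intact in both cases; only the offsets the filter reads have moved, which is consistent with pings being answered while the filter shows nothing.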

In terms of your later question, “but i am also suffering low ping responce to all IPs on the box”, can you explain what you mean by low ping response? Is this after you’ve removed HSRP from one of the vPC domains, as recommended in the bug you found? Are you losing pings, or is the response time just high, as seen previously?

Regards

chris · 09-24-2015

Steve,

Thanks for the reply. With regard to ethanalyzer, I was grepping for ICMP, looking for pings from the same interface that also has the HSRP address, i.e. from 192.168.1.2 to 192.168.1.1.

interface Vlan101
  ip address 192.168.1.2/24
  hsrp version 2
  hsrp 101
    ip 192.168.1.1

Switch# ethanalyzer local interface inband limit-captured-frames 10000 | grep -i icm | no-more

2015-09-22 23:09:06.099113 192.168.1.2 -> 192.168.1.1 ICMP Echo (ping) request
2015-09-22 23:09:06.099115 192.168.1.2 -> 192.168.1.1 ICMP Echo (ping) request
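To put numbers on "they never stop", a hypothetical helper can tally the echo requests in ethanalyzer's text output per source/destination pair, compare the count with what ping was asked to send, and report the smallest gap between repeats. It assumes the one-line summary format shown above; the function names are illustrative, not part of NX-OS.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches ethanalyzer summary lines like the two captured above.
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6})\s+"
    r"(?P<src>\S+) -> (?P<dst>\S+) ICMP Echo \(ping\) request"
)


def echo_requests(lines):
    """Group echo-request timestamps by (source, destination) pair."""
    seen = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
            seen[(m["src"], m["dst"])].append(ts)
    return seen


def summarize(seen, requested):
    """Per pair: count, whether it exceeds the pings sent, and the smallest gap."""
    report = {}
    for pair, stamps in seen.items():
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        report[pair] = {
            "count": len(stamps),
            "exceeds_requested": len(stamps) > requested,
            "min_gap": min(gaps) if gaps else None,
        }
    return report


capture = [
    "2015-09-22 23:09:06.099113 192.168.1.2 -> 192.168.1.1 ICMP Echo (ping) request",
    "2015-09-22 23:09:06.099115 192.168.1.2 -> 192.168.1.1 ICMP Echo (ping) request",
]
print(summarize(echo_requests(capture), requested=5))
```

On a full capture, a count well above the ping count, with gaps of microseconds rather than a normal ping interval, would point at the same request being seen repeatedly rather than new pings.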

But thanks for the info, it's good to know.

In the later question I left out the very important word "times". When ethanalyzer sees the packets destined for the HSRP address, they never stop, and the more pings I send to the HSRP address, the more it drags the control plane down. Below are results from its vPC peer, connected over 10 Gbps, pinging the SVI address, while in between I fired off more pings to the HSRP address.

switch# ping 192.168.1.2 vrf test count 10
PING 192.168.1.2 (192.168.1.2): 56 data bytes
64 bytes from 192.168.1.2: icmp_seq=0 ttl=62 time=68.228 ms

switch# ping 192.168.1.2 vrf test count 10
PING 192.168.1.2 (192.168.1.2): 56 data bytes
64 bytes from 192.168.1.2: icmp_seq=0 ttl=62 time=416.185 ms

--- 192.168.1.2 ping statistics ---
10 packets transmitted, 10 packets received, 0.00% packet loss
round-trip min/avg/max = 416.185/425.58/426.66 ms

switch# ping 192.168.1.2 vrf test count 10
PING 192.168.1.2 (192.168.1.2): 56 data bytes
64 bytes from 192.168.1.2: icmp_seq=0 ttl=62 time=1101.235 ms

If I break the link between the two vPC domains, isolating the two pairs of switches so that HSRP runs as a plain Active/Standby pair, everything returns to normal operation.

Steve Fuller · 09-25-2015

Hi Chris,

When you try to ping the HSRP IP address, the switch is going to ARP for it, and the active HSRP router will respond with the HSRP MAC address of 0000.0c9f.f065.
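That MAC follows directly from the group number: with the `hsrp version 2` configuration Chris posted, the default IPv4 virtual MAC is 0000.0c9f.fXXX with the group in the low 12 bits, while HSRP version 1 uses 0000.0c07.acXX. The sketch below assumes the default virtual MAC (none statically configured for the group) and IPv4 groups only.

```python
def hsrp_virtual_mac(group: int, version: int = 2) -> str:
    """Default IPv4 HSRP virtual MAC for a group, in Cisco dotted notation."""
    if version == 1:
        if not 0 <= group <= 0xFF:
            raise ValueError("HSRPv1 groups are 0-255")
        value = 0x00000C07AC00 | group   # 0000.0c07.acXX
    elif version == 2:
        if not 0 <= group <= 0xFFF:
            raise ValueError("HSRPv2 groups are 0-4095")
        value = 0x00000C9FF000 | group   # 0000.0c9f.fXXX
    else:
        raise ValueError("version must be 1 or 2")
    digits = f"{value:012x}"
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


print(hsrp_virtual_mac(101))             # 0000.0c9f.f065
print(hsrp_virtual_mac(101, version=1))  # 0000.0c07.ac65
```

Group 101 is 0x065, which is why every switch in the group installs the same 0000.0c9f.f065 entry.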

The thing that’s confusing me is that, as you mentioned in your first post, “The HSRP mac address is present in all switches with the G set.” This means that any of the four routers could process an ICMP echo request packet destined to the HSRP IP and MAC address.

So when you ping the HSRP IP address, I’m not sure which switch the packet will actually go to. Does it even leave the device sending the ping? If the above is true, i.e., any of the four switches can process the packet, why would it leave the switch sourcing the ping?

Can you post the output of show mac address-table vlan 101 from each of the four switches so we can try to figure out where the packet goes?

The other thing that looks a little odd here is the TTL on the ping response. On the Nexus 9000s I’ve played with, ping is sent with an initial TTL of 255, yet your ping response shows a received TTL of 62. Even if the ping were sent with an initial TTL of 64, it has still gone through more than one “router hop”. That said, I know that on the Nexus 7000 with vPC, the TTL is decremented for traffic that has crossed the vPC peer link.
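The arithmetic behind that reasoning is a common heuristic, not something the thread confirms for these boxes: assume the sender used one of the usual initial TTLs, take the smallest one that is at least the received value, and the difference is the number of decrements. Each decrement may be a routed hop or, per the vPC peer-link behaviour mentioned above, something else. The list of initial TTLs is an assumption of the heuristic.

```python
# Initial TTLs commonly used by IP stacks; this list is an assumption.
COMMON_INITIAL_TTLS = (64, 128, 255)


def infer_hops(received_ttl: int) -> tuple[int, int]:
    """Guess (initial TTL, number of decrements) from a received TTL.

    Picks the smallest common initial TTL >= the received value, so it is
    only a guess: a sender using a non-default TTL will fool it.
    """
    if not 1 <= received_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    for initial in COMMON_INITIAL_TTLS:
        if received_ttl <= initial:
            return initial, initial - received_ttl
    raise AssertionError("unreachable: 255 covers every valid TTL")


for ttl in (62, 254):
    initial, hops = infer_hops(ttl)
    print(f"ttl={ttl}: probably started at {initial}, decremented {hops} time(s)")
```

A received TTL of 62 reads as two decrements from 64, whereas 254 reads as a single decrement from 255.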

Confused? You bet I am!!!

Regards

chris · 09-25-2015

Steve,

Apologies, I led you up the garden path with the TTL. The switches are not configured with 192.168 addressing, so I've been altering things before cutting and pasting. I was using the pings to illustrate the increase in response time and didn't pay attention to the TTL or to which set I cut and pasted; they should read ttl=254. Those actual pings were to a Linux box, not the switch.

Which switch actually responds to pings of the HSRP address? I have asked myself this and haven't found the answer in any documentation I have read. I presumed they all would, to save effort; I may get around to figuring this out at some point.

I haven't got the output of the MAC tables, but they all had their own VLAN interface MAC and the HSRP MAC present, both with the G flag.