Solved: EIGRP q cnt 1 peering with Nexus 7K

Matt Glosson · ‎01-03-2018

I have a 4331 whose IP address on Gi0/0/2 is 10.0.10.37, mask 255.255.255.240 (/28). It is connected to two Nexus 7000 routing switches on their Vlan808 interfaces. There is also another router on the same VLAN (10.0.10.36), which doesn't have issues. The 7Ks are 10.0.10.34 and 10.0.10.35. One of the 7Ks (10.0.10.34) does not peer over EIGRP properly... see the output of "show ip eigrp neighbor":

H  Address    Interface Hold Uptime   SRTT  RTO  Q   Seq
                                      (sec) (ms) Cnt Num
14 10.0.10.34 Gi0/0/2   13   00:00:23 1     5000 1   58026
13 10.0.10.36 Gi0/0/2   14   00:01:48 10    100  0   41814
12 10.0.10.35 Gi0/0/2   12   00:01:48 7     100  0   57944

I did a traceroute (which should get a direct, single-hop response) to both 7K Vlan808 interfaces, first to 10.0.10.34:

hospice4331#trace 10.0.10.34
Type escape sequence to abort.
Tracing the route to lls-n7k--vlan808.example.com (10.0.10.34)
VRF info: (vrf in name/id, vrf out name/id)
 1 lln-n7k--vlan808.example.com (10.0.10.35) 5 msec 5 msec 5 msec
 2 lls-n7k--vlan808.example.com (10.0.10.34) 5 msec 6 msec 4 msec

... then to 10.0.10.35:

hospice4331#trace 10.0.10.35
Type escape sequence to abort.
Tracing the route to lln-n7k--vl808.example.com (10.0.10.35)
VRF info: (vrf in name/id, vrf out name/id)
1 lln-n7k--vlan808.example.com (10.0.10.35) 5 msec 6 msec 4 msec

The fact that the first traceroute shows two hops (and that it is to the same 7K that has the nonzero Q Cnt) makes me wonder if it's a 7K vPC problem, as the vPC configuration specifies both peer-switch and peer-gateway (set up by my predecessor). Oddly, the other router (actually a 3750 with a 'no switchport' L3 interface) doesn't suffer from the same issue. Is this a vPC problem, perhaps? Other thoughts? I could provide a topology diagram if need be, but I thought I'd try floating the question without one first.
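
A quick sanity check of the addressing described above, as a minimal Python sketch using only the standard ipaddress module: all three neighbors fall inside the 4331's own /28, so a traceroute to any of them would normally be expected to complete in a single hop. The variable names are illustrative only.

```python
import ipaddress

# Address and mask of the 4331's Gi0/0/2, as given in the post.
iface = ipaddress.ip_interface("10.0.10.37/255.255.255.240")

# Neighbors listed in the "show ip eigrp neighbor" output.
neighbors = ["10.0.10.34", "10.0.10.35", "10.0.10.36"]

print(iface.network)  # 10.0.10.32/28
for n in neighbors:
    # True means the neighbor is directly connected (same subnet).
    on_link = ipaddress.ip_address(n) in iface.network
    print(n, "on-link" if on_link else "off-link")
```

Every neighbor prints as on-link, so the second hop seen in the trace to 10.0.10.34 is not explained by the IP addressing itself.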

Peter Paluch · ‎01-04-2018

Matt,

Indeed, if you are using vPC, this would be one of the typical symptoms when trying to peer an external device in EIGRP through a vPC to both vPC peers.

The catch, as has already been partially commented in this thread, is that with peer-gateway, each of the vPC peers adopts the MAC address of the other peer for routing purposes. Now, if the 4331 sends an EIGRP packet to the vPC peer B, and the packet due to load-balancing lands on the physical link toward vPC peer A, the vPC peer A will start routing the EIGRP packet to forward it to peer B across the peer-link - but by routing it, it will need to decrement the TTL, and as EIGRP packets are sent with a TTL of 1, they will effectively perish in the process.

The following document summarizes what topologies are supported for routing adjacencies over vPCs:

https://www.cisco.com/c/en/us/support/docs/ip/ip-routing/118997-technote-nexus-00.html

For Nexus 7000, since 7.2(0)D1(1), you can configure the layer3 peer-router in the vPC domain. This command has the effect of not decrementing the TTL of packets routed and forwarded across the peer-link. With peer-gateway and routed adjacencies over vPCs, this command is a must - but on Nexus 7000 platforms, the feature is only supported with F2E and F3 linecards.

Whether you are actually hit by packets expiring on one vPC peer while being routed to the other depends, obviously, on whether the packets are hashed onto the physical link toward the vPC peer other than the one they are addressed to, which in turn depends on the source and destination IP and MAC addresses. In addition, since you've mentioned that a 3750 switch does not appear to have these problems: many IOS versions used to send EIGRP packets with their TTL set to 2 (likely a "fossil" left over from the old days of hub-and-spoke Frame Relay topologies). A TTL of 2 would allow the EIGRP packets to survive being routed through one vPC peer to the other, which would explain why the 3750 works without issues.
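
The behavior described above can be sketched as a toy model in Python. This is a simplification under stated assumptions, not a description of the actual NX-OS forwarding path: peer-gateway is assumed to be enabled, "A" and "B" are hypothetical labels for the two vPC peers, and the functions `deliver` and `trace` are made up for illustration.

```python
def deliver(dst_peer, ingress_peer, ttl, l3_peer_router=False):
    """Return True if a unicast packet addressed to dst_peer reaches it.

    ingress_peer is the vPC peer whose physical link the port-channel
    hash happened to select. peer-gateway is assumed to be enabled.
    """
    if ingress_peer == dst_peer:
        return ttl >= 1  # landed on the right peer, no routing hop
    if not l3_peer_router:
        ttl -= 1  # ingress peer routes the packet, decrementing the TTL
    return ttl >= 1  # a packet whose TTL reached 0 is dropped


def trace(dst_peer, ingress_peer, max_ttl=5):
    """List which peer answers each traceroute probe (TTL 1, 2, ...)."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        if deliver(dst_peer, ingress_peer, ttl):
            hops.append(dst_peer)
            break
        hops.append(ingress_peer)  # TTL expired on the ingress peer
    return hops


print(deliver("B", "B", ttl=1))                       # True
print(deliver("B", "A", ttl=1))                       # False
print(deliver("B", "A", ttl=2))                       # True
print(deliver("B", "A", ttl=1, l3_peer_router=True))  # True
print(trace("B", "A"))                                # ['A', 'B']
```

In this model a TTL-1 packet hashed to the wrong peer is lost, while a TTL of 2 or layer3 peer-router lets it through. The two-entry result of `trace("B", "A")` would also be consistent with the two-hop traceroute to 10.0.10.34 in the original post, although the thread does not confirm which link the hash actually selected.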

Best regards,
Peter


leinad427 · ‎01-04-2018

OK, so you're saying that one of the routers is working properly, correct? Sorry, a little tipsy... Please run the following command on all routers, starting with the one that is working fine:

show ip eigrp neighbors

Post the results, starting with the router that is communicating fine over EIGRP.

Now run show ip eigrp neighbors detail

Post the result, again starting with the router that is working fine.

And lastly, run show ip eigrp interfaces XX (where XX is the interface you are running EIGRP on).

Post the results, starting with the working router.

pigallo · ‎01-04-2018

Hi,

In this situation I would probably check the ARP table and investigate why the .34 IP address resolves to the wrong MAC address. I would also try fixing this temporarily with a static ARP entry. I'm not sure whether that will work in the end, because I don't know which features you have running underneath.

However, your main goal is to find out why L3-to-L2 resolution results in a double hop, because EIGRP packets have TTL=1 and will expire after passing the first hop. That is probably why your adjacency ends up with a unidirectional problem, shown by the Q count of 1.

Matt Glosson · ‎01-04-2018

ARP actually showed the switches as they really were. However, your second paragraph was right on, and in light of it I looked for a way to increase the TTL to 2. Not finding one, I went with a physical solution (see my comment on Peter's reply).

Matt Glosson · ‎01-04-2018

As usual, that was an excellent answer. I actually looked into manually setting the EIGRP TTL to 2 on the 4331, based on the earlier thread, but saw no way to do it. Unfortunately, the hardware in our 7Ks is not able to run anything newer than 6.2(16), so we can't go with your second solution. Instead, I figured out a way to run the 4331's connection (it's the far end of a wireless bridge) to a 'no switchport' interface on an IP Services 3560 we keep for just such occasions. I had to jumper it through a few patch panels and push the limits of copper cabling (it ended up being 118 meters), but it seems to be working error-free for now (and we're monitoring it carefully).

But it's still great to have an answer to that vexing question.