Re: Bad hop count - 224.0.0.2 224.0.0.5 behavior

mury · 04-05-2023

I noticed a very high bad hop count on our 9407. When I run debug ip error, the bulk of the errors are from dispose ip.hopcount on multicast addresses, i.e. 224.0.0.2, .5, and .102.

We have suspected a routing loop in the network for some time now. However, I'm not sure if this is evidence of a routing loop, or just normal behavior for a Cisco to error out multicast traffic in this manner. This behavior is happening on multiple vlans and multiple vrfs. Thanks for any input.

IP statistics:
Rcvd: 3828131010 total, 376837528 local destination
2 format errors, 0 checksum errors, 167802478 bad hop count

Apr 5 22:49:48.471: IP: s=10.51.31.3 (Vlan431), d=224.0.0.5 (nil), g=224.0.0.5, len 68, dispose ip.hopcount
Apr 5 22:49:48.705: IP: s=10.51.31.2 (Vlan431), d=224.0.0.2 (nil), g=224.0.0.2, len 48, dispose ip.hopcount
Apr 5 22:49:49.000: IP: s=10.99.2.254 (Vlan420), d=224.0.0.2 (nil), g=224.0.0.2, len 48, dispose ip.hopcount
Apr 5 22:49:49.185: IP: s=10.99.2.253 (Vlan420), d=224.0.0.2 (nil), g=224.0.0.2, len 48, dispose ip.hopcount
Apr 5 22:49:49.215: IP: s=172.20.22.253 (Vlan682), d=224.0.0.2 (nil), g=224.0.0.2, len 48, dispose ip.hopcount
Apr 5 22:49:49.215: IP: s=10.51.30.2 (Vlan430), d=224.0.0.2 (nil), g=224.0.0.2, len 48, dispose ip.hopcount
Apr 5 22:49:49.256: IP: s=192.168.85.5 (Vlan460), d=224.0.0.5 (nil), g=224.0.0.5, len 132, dispose ip.hopcount
Apr 5 22:49:49.640: IP: s=10.105.2.254 (Vlan703), d=224.0.0.2 (nil), g=224.0.0.2, len 48, dispose ip.hopcount
Apr 5 22:49:49.805: IP: s=10.105.2.253 (Vlan703), d=224.0.0.2 (nil), g=224.0.0.2, len 48, dispose ip.hopcount
Apr 5 22:49:49.823: IP: s=10.105.2.253 (Vlan703), d=10.105.2.40 (Vlan703), g=10.105.2.40, len 52, dispose udp.noport
Apr 5 22:49:49.860: IP: s=10.51.30.3 (Vlan430), d=224.0.0.2 (nil), g=224.0.0.2, len 48, dispose ip.hopcount
Apr 5 22:49:49.897: IP: s=10.105.2.254 (Vlan703), d=10.105.2.40 (nil), g=0.0.0.0, len 52, dispose udp.noport
Apr 5 22:49:50.005: IP: s=10.99.2.253 (Vlan420), d=224.0.0.5 (nil), g=224.0.0.5, len 68, dispose ip.hopcount
Apr 5 22:49:50.351: IP: s=192.168.85.1 (Vlan460), d=224.0.0.5 (nil), g=224.0.0.5, len 132, dispose ip.hopcount
Apr 5 22:49:50.391: IP: s=10.51.30.3 (Vlan430), d=224.0.0.5 (nil), g=224.0.0.5, len 68, dispose ip.hopcount
Apr 5 22:49:50.411: IP: s=10.8.2.252 (Vlan800), d=224.0.0.102 (nil), g=224.0.0.102, len 80, dispose ip.baddest
Apr 5 22:49:50.562: IP: s=172.20.22.254 (Vlan682), d=224.0.0.2 (nil), g=224.0.0.2, len 48, dispose ip.hopcount
Apr 5 22:49:50.670: IP: s=192.168.85.3 (Vlan460), d=224.0.0.5 (nil), g=224.0.0.5, len 132, dispose ip.hopcount
Apr 5 22:49:50.780: IP: s=10.51.31.3 (Vlan431), d=224.0.0.2 (nil), g=224.0.0.2, len 48, dispose ip.hopcount
Apr 5 22:49:50.800: IP: s=10.8.2.253 (Vlan800), d=224.0.0.102 (nil), g=224.0.0.102, len 80, dispose ip.baddest
Apr 5 22:49:51.555: IP: s=10.51.31.2 (Vlan431), d=224.0.0.2 (nil), g=224.0.0.2, len 48, dispose ip.hopcount
Apr 5 22:49:51.760: IP: s=10.99.2.254 (Vlan420), d=224.0.0.2 (nil), g=224.0.0.2, len 48, dispose ip.hopcount
Apr 5 22:49:51.793: IP: s=10.105.2.253 (Vlan703), d=10.105.2.40 (Vlan703), g=10.105.2.40, len 52, dispose udp.noport
Apr 5 22:49:51.793: IP: s=10.105.2.254 (Vlan703), d=10.105.2.40 (nil), g=0.0.0.0, len 52, dispose udp.noport
Apr 5 22:49:51.893: IP: s=192.168.85.2 (Vlan460), d=224.0.0.5 (nil), g=224.0.0.5, len 152, dispose ip.hopcount
Apr 5 22:49:52.085: IP: s=10.99.2.253 (Vlan420), d=224.0.0.2 (nil), g=224.0.0.2, len 48, dispose ip.hopcount
Apr 5 22:49:52.165: IP: s=172.20.22.253 (Vlan682), d=224.0.0.2 (nil), g=224.0.0.2, len 48, dispose ip.hopcount
Apr 5 22:49:52.165: IP: s=10.51.30.2 (Vlan430), d=224.0.0.2 (nil), g=224.0.0.2, len 48, dispose ip.hopcount

Kanan Huseynli · 04-05-2023

Hi,

Bad hop count means the device received an IP packet with TTL=1, which becomes TTL=0 once the packet-switching logic decrements it.
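As a rough illustration of that TTL check, here is a toy model in Python. It is only a sketch of the generic IP forwarding rule, not the actual IOS code path, and the function name `forward_or_dispose` is made up for this example. OSPF multicast packets are sent with TTL=1, so any such packet that the device tries to forward rather than consume locally would land in the dispose branch.

```python
def forward_or_dispose(ttl):
    """Toy model: decrement the TTL of a packet that is about to be forwarded.

    Returns (new_ttl, disposition). The disposition strings mimic the debug
    output in this thread; they are labels for the example, not an IOS API.
    """
    new_ttl = ttl - 1
    if new_ttl <= 0:
        # TTL exhausted: the packet is dropped and "bad hop count" increments.
        return new_ttl, "dispose ip.hopcount"
    return new_ttl, "forward"


print(forward_or_dispose(1))   # a TTL=1 packet cannot be forwarded
print(forward_or_dispose(64))  # a normal routed packet goes on with TTL 63
```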

This is normal for multicast traffic, especially routing protocol traffic. 224.0.0.5 is the OSPF multicast address (so you most probably have OSPF in your environment). 224.0.0.2 is the all-routers IPv4 address on the segment (some protocols use it, e.g. HSRPv1). 224.0.0.102 is used by HSRPv2.

It can also be caused by a routing loop, where the TTL expires in transit.

Search for unicast addresses. Filter the show (sh) output with the "| exclude 224.0.0" option. If the output still contains lots of lines (especially repeated ones), you may have an L3 loop, and further diagnosis is needed.
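If the debug output has been saved to a file, the same filtering can be done offline. The following Python sketch assumes the line format shown in the original post; the regular expression and the function name `tally_non_link_local` are assumptions made for this example, not part of any Cisco tool. It drops destinations in 224.0.0.x and counts how often each remaining (source, destination, reason) combination repeats.

```python
import re
from collections import Counter

# Matches the "debug ip error" lines as they appear in this thread.
LINE_RE = re.compile(
    r"IP: s=(?P<src>\S+) \([^)]*\), d=(?P<dst>\S+) \([^)]*\), "
    r"g=\S+, len \d+, dispose (?P<reason>\S+)"
)


def tally_non_link_local(lines, exclude_prefix="224.0.0."):
    """Count (src, dst, reason) tuples whose destination is outside exclude_prefix."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None or m["dst"].startswith(exclude_prefix):
            continue  # unparseable line, or link-local multicast we want to hide
        counts[(m["src"], m["dst"], m["reason"])] += 1
    return counts


sample = [
    "Apr 5 22:49:49.805: IP: s=10.105.2.253 (Vlan703), d=224.0.0.2 (nil), g=224.0.0.2, len 48, dispose ip.hopcount",
    "Apr 5 22:49:49.823: IP: s=10.105.2.253 (Vlan703), d=10.105.2.40 (Vlan703), g=10.105.2.40, len 52, dispose udp.noport",
    "Apr 5 22:49:51.793: IP: s=10.105.2.253 (Vlan703), d=10.105.2.40 (Vlan703), g=10.105.2.40, len 52, dispose udp.noport",
]

for key, n in tally_non_link_local(sample).most_common():
    print(n, *key)
```

Repeated unicast entries with reason ip.hopcount would be the ones worth chasing as a possible loop. In the capture posted above, the only unicast entries are udp.noport, not ip.hopcount.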

HTH,
Please rate and mark as an accepted solution if you have found any of the information provided useful.

MHM Cisco World · 04-05-2023

This is a common multicast issue.

From the Cisco doc:

In order to solve the issue, you need to increase the TTL. This is done at the application level on the Sender. For more information, refer to your multicast application instruction manual.

https://www.cisco.com/c/en/us/support/docs/ip/ip-multicast/16450-mcastguide0.html

Giuseppe Larosa · 04-06-2023

Hello @mury ,

All IPv4 multicast addresses in the range 224.0.0.x have link-local scope, and they cannot be routed or forwarded even if the TTL in the packets is greater than 1.

The output of debug ip error on your device is misleading, but for the multicast addresses you mentioned it is normal that they cannot pass through the device: they should be either processed by the local device or discarded, depending on which protocols are active on the receiving L3 interface (OSPFv2, HSRPv2 and so on).
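To make the link-local rule concrete, here is a small Python sketch that classifies a destination address using the standard ipaddress module. The group labels are only the ones named in this thread, and the function name `describe` is hypothetical; it is not an exhaustive registry of multicast groups.

```python
import ipaddress

# 224.0.0.0/24 is the link-local multicast range discussed above (224.0.0.x).
LINK_LOCAL_MCAST = ipaddress.ip_network("224.0.0.0/24")

# Labels taken from this thread only; other groups exist in this range.
KNOWN_GROUPS = {
    "224.0.0.2": "all routers (used by HSRPv1)",
    "224.0.0.5": "OSPF",
    "224.0.0.102": "HSRPv2",
}


def describe(dst):
    """Classify a destination IPv4 address string for triage of debug lines."""
    addr = ipaddress.ip_address(dst)
    if addr in LINK_LOCAL_MCAST:
        label = KNOWN_GROUPS.get(dst, "unlisted group")
        return f"link-local multicast, never forwarded: {label}"
    if addr.is_multicast:
        return "multicast outside 224.0.0.0/24"
    return "not multicast"


for dst in ("224.0.0.5", "224.0.0.102", "10.105.2.40"):
    print(dst, "->", describe(dst))
```

Every multicast destination in the posted debug output falls in the first branch, which fits the explanation that these packets are meant to be consumed or discarded on the local segment.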

Hope to help

Giuseppe

mury · 04-06-2023

To all who have responded, first of all thank you, but I am still surprised by this output. I think I really need to stress the part of "is this normal for Cisco to error out these multicast packets in this way?"

Following up on what I think Giuseppe is saying. Let's say this device receives multicast traffic, heck let's even say 224.0.0.5 specifically. Shouldn't the device either:

1) Receive the traffic, decrement TTL, and process the packet. AND not increment "bad hop count" along with ip errors such as "dispose ip.hopcount." It's not an error.

2) Receive the traffic, decrement TTL, and be unable to process the packet (for some reason,) in which case I would be more accepting of seeing the evidence of errors as I see here.

There are many vlans and many vrfs on this device. That is also in my head a very high number of "bad hop count" errors. OSPF is running everywhere. It is between sites, customers, also from customer A link 1 to customer A link 2. It seems to me I still have a problem. Some devices in the network have way too many IP addresses (interfaces) for me to keep straight in my head.

I probably need to go through each of these errors and see what each one is actually doing, but hypothetically, would any of these scenarios generate the specific logging I'm seeing:

1) OSPF is configured on a connected device, but we do not have OSPF configured toward it. In other words, would the lack of passive-interface configuration create this exact error? I know it's undesirable, but I'm curious whether anyone knows if that is the error you would see in a debug.

2) Giuseppe, you said that 224.0.0.X is link local. I always thought you could adjust the TTL on OSPF to 254 or something like that. I'll have to dig deeper into that one. Assuming you are incorrect about it only being link local (which is unlikely; I've read many of your posts and you are very knowledgeable), would it be possible to have a multicast routing loop, even in the absence of a unicast loop? There are redundant private links between some locations that have been configured with different vlans, and on a lot of devices STP is turned off. I know there were previous problems on this part of the network. It feels to me like they had a very real routing loop problem, including unicast, and solved it by creating multiple vlans for the same purpose (i.e. traffic). I am wondering whether, even though the unicast loop (at least in this part of the network) has been solved, multicast traffic could still be spinning endlessly through some of these vlans/vrfs, or perhaps even outside of these vlans/vrfs.

Eh, I probably need to go through each and every line of debug output and figure out exactly what is happening. I do appreciate all your help so far, and any additional insight anybody may have.

I hope you all have a good day.

MHM Cisco World · 04-06-2023

Is your switch connected to an NSK?

mury · 04-06-2023

Not sure what NSK stands for.

NSK = Nexus?

If so, yes.

MHM Cisco World · 04-06-2023

Yes, I mean Nexus.

I think the solution is two commands that must be added under the vpc domain on the NSK:

peer-gateway

layer3 peer-router

A packet that passes through the vPC link has its TTL decremented by one.

Check for the commands above and try applying them.

mury · 04-06-2023

My knowledge of VPCs is very weak. My knowledge of Nexus in general isn't so hot either. However, those commands are already present.

Here is the config from the Nexus:

NexusA:

vpc domain 900
  peer-switch
  role priority 4096
  system-priority 4096
  peer-keepalive destination x:x:x:x::b source x:x:x:x::a
  delay restore 150
  peer-gateway
  layer3 peer-router
  ipv6 nd synchronize
  ip arp synchronize

NexusB:

vpc domain 900
  peer-switch
  role priority 8192
  system-priority 4096
  peer-keepalive destination x:x:x:x::a source x:x:x:x::b
  delay restore 150
  peer-gateway
  layer3 peer-router
  ipv6 nd synchronize
  ip arp synchronize

MHM Cisco World · 04-06-2023

What Nexus platform is this?

N3K, N5K, N7K, or N9K?

mury · 04-06-2023

They are 5672s.

MHM Cisco World · 04-08-2023

Cisco Nexus 5000 Series NX-OS Interfaces Operations Guide, Release 5.1(3)N1(1) - Cisco Nexus 5500 Platform Layer 3 and vPC Operations [Cisco Nexus 5000 Series Switches] - Cisco
This is an unsupported connection on the N5K.