09-24-2013 09:03 AM - edited 03-07-2019 03:39 PM
All,
I'm not sure how to go about troubleshooting this one. I have a site that has pim dense enabled. I had to do this for some local multicast traffic they needed to support, but I did it for all of our sites. The other sites have normal looking multicast routing tables while this one site has quite a few more entries than normal. The equipment is the same across the board, but the IOS versions differ.
On this single router, here's what I'm seeing. Notice that the events fire every minute and the number of changes is astronomical. I do not have an IGP at this location. We peer with our provider over BGP and that's about it. I'm assuming this could just be a bug. I'm not losing any traffic, but I noticed high CPU utilization, which is what pushed me down this path.
Last 15 triggered multicast RPF check events
RPF backoff delay: 500 msec
RPF maximum delay: 5 sec
DATE/TIME BACKOFF PROTOCOL EVENT RPF CHANGES
Sep 24 10:55:19.076 500 msec BGP Unknown 55572
Sep 24 10:54:19.075 500 msec BGP Unknown 55480
Sep 24 10:53:19.186 500 msec BGP Unknown 55542
Sep 24 10:52:18.973 500 msec BGP Unknown 55498
Sep 24 10:51:18.972 500 msec BGP Unknown 55417
Sep 24 10:50:18.970 500 msec BGP Unknown 55541
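For anyone who wants to check the "every minute" pattern rather than eyeball it, here is a minimal Python sketch that parses rows of this output and measures the gap between consecutive events. The function names, the regular expression, and the `year` default are my own assumptions for illustration (the router output carries no year), and it only handles rows whose backoff is given in msec.

```python
import re
from datetime import datetime

# Matches event rows such as:
#   Sep 24 10:55:19.076 500 msec BGP Unknown 55572
EVENT_ROW = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<backoff>\d+) msec\s+(?P<protocol>\S+)\s+"
    r"(?P<event>.+?)\s+(?P<changes>\d+)$"
)

SAMPLE = """\
Sep 24 10:55:19.076 500 msec BGP Unknown 55572
Sep 24 10:54:19.075 500 msec BGP Unknown 55480
Sep 24 10:53:19.186 500 msec BGP Unknown 55542
Sep 24 10:52:18.973 500 msec BGP Unknown 55498
Sep 24 10:51:18.972 500 msec BGP Unknown 55417
Sep 24 10:50:18.970 500 msec BGP Unknown 55541
"""


def parse_rpf_events(text, year=2013):
    """Return (time, protocol, event, rpf_changes) tuples, oldest first."""
    events = []
    for line in text.splitlines():
        match = EVENT_ROW.match(line.strip())
        if not match:
            continue  # skip headers and anything that is not an event row
        when = datetime.strptime(
            f"{year} {match['stamp']}", "%Y %b %d %H:%M:%S.%f"
        )
        events.append(
            (when, match["protocol"], match["event"], int(match["changes"]))
        )
    return sorted(events)


def gaps_in_seconds(events):
    """Seconds elapsed between each event and the next one."""
    times = [event[0] for event in events]
    return [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]


events = parse_rpf_events(SAMPLE)
print(gaps_in_seconds(events))
print([changes for _, _, _, changes in events])
```

On the six rows above, the gaps all come out within a quarter of a second of 60 s, which suggests a periodic trigger rather than individual route changes.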
The other thing that I'm seeing is a source address that doesn't exist. I've been watching the mroute table, and the source continues to refresh. I've cleared the table and it refreshes again. I've tracked it back to matching only on our default route in BGP, so the source itself doesn't exist on my network. I'm not sure what could be causing this....
Thanks!
John
09-24-2013 01:40 PM
I've disabled triggered RPF checks for the time being. This has "resolved" the issue, but I don't believe it solves the underlying problem. I found two unknown routes in the mcast routing table, and I've created a route to null0 for both of them, simply because they're located on that local network and I don't route for them.
Triggered RPF checks happen when the routing table changes from what I've understood, but the routing table in this case simply isn't changing. I also can't clear these stats as I've yet to find the command for it. I'm unsure as to why BGP is registering Unknown when at this point I don't have a single mcast route that's not local. We use bgp to the provider, but I don't have pim enabled on the wan side of the router. (I guess I could enable it to see what would happen.)
We have a single peer, no igp, and there's no way of the router learning the same route from two different sources to even register an rpf failure.
Thanks,
John
09-24-2013 08:12 PM
John
Can you provide a simple network diagram that shows the sources and receivers and where you are seeing these drops?
09-25-2013 05:04 AM
Thanks Amjad. Here's a very simple diagram. I redid my box recently and didn't have visio...sorry
The PE is into our mpls provider. The CE is the branch router. The route that I was learning from, which has since been fixed, was an unknown route and was matching on the default route toward the core via the PE. This is what I initially thought was causing the rpf failure. For that situation, I enabled the subnet that the unknown host was on as a secondary subnet on the router, and that fixed the rpf failure, but it didn't stop logging these events. I'm updating the router this morning to a different version of ios to see if that helps with the issue. In comparison, another site that has the same router model, but different ios, had 2 rpf failures as opposed to 5500...it doesn't seem like these are incremental either, but I'm not sure.
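To illustrate why the unknown source failed RPF while it matched only the default route, and why adding its subnet locally fixed it, here is a rough Python sketch of the lookup. It is a simplified model, not how IOS implements it: the interface names `WAN` and `LAN` and the 192.168.1.0/24 prefix are assumptions made up for the example.

```python
import ipaddress


def rpf_interface(source, routes):
    """Return the interface of the longest-prefix route covering source."""
    address = ipaddress.ip_address(source)
    best = None
    for prefix, interface in routes:
        network = ipaddress.ip_network(prefix)
        if address in network and (
            best is None or network.prefixlen > best[0].prefixlen
        ):
            best = (network, interface)
    return best[1] if best else None


def rpf_check(source, arrival_interface, routes):
    """Pass only if the packet arrived on the interface used to reach its source."""
    return rpf_interface(source, routes) == arrival_interface


# Hypothetical routing tables: before the fix only the BGP default exists,
# afterwards the source's subnet is also known on the LAN side.
before = [("0.0.0.0/0", "WAN")]
after = before + [("192.168.1.0/24", "LAN")]

print(rpf_check("192.168.1.61", "LAN", before))  # False: RPF points at WAN
print(rpf_check("192.168.1.61", "LAN", after))   # True: RPF points at LAN
```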
Thanks!
John
09-25-2013 06:25 AM
Some additional information:
MRT(0): Triggered RPF backoff timer started, cause: BGP
MRT(0): Triggered RPF check after 500 msec back off delay
MRT(0): Triggered RPF reset after 1000 msec delay
I ran "debug ip mrouting rpf-events" and received the above. I also ran "debug ip bgp ipv4 unicast" and I wasn't seeing any changes/updates. What in the world is going on?
HTH,
John
09-25-2013 10:58 AM
Thanks John
A few questions:
- Can you post show ip mroute from both the CE and the edge switch?
- Where are these sources and receivers located? Are they in the same site?
- In "show ip mroute count", do you see drops because of RPF?
09-25-2013 11:11 AM
Here you go:
(*, 239.255.255.254), 00:07:49/00:02:23, RP 0.0.0.0, OIF count: 1, flags: DC
(*, 239.255.255.253), 00:07:25/stopped, RP 0.0.0.0, OIF count: 1, flags: DC
(10.125.1.40, 239.255.255.253), 00:01:49/00:01:10, OIF count: 0, flags: PT
(*, 239.255.255.250), 00:07:50/stopped, RP 0.0.0.0, OIF count: 1, flags: DC
(192.168.1.61, 239.255.255.250), 00:00:57/00:02:02, OIF count: 0, flags: PT
(*, 224.0.72.62), 00:07:48/00:02:12, RP 0.0.0.0, OIF count: 1, flags: DC
(*, 232.44.44.233), 00:07:50/00:02:32, RP 0.0.0.0, OIF count: 1, flags: DC
(*, 224.0.255.135), 00:07:50/00:02:49, RP 0.0.0.0, OIF count: 1, flags: DC
(*, 224.0.1.22), 00:07:49/00:02:23, RP 0.0.0.0, OIF count: 1, flags: DC
(*, 224.0.1.55), 00:07:44/00:02:28, RP 0.0.0.0, OIF count: 1, flags: DC
(*, 224.0.1.60), 00:07:50/00:02:24, RP 0.0.0.0, OIF count: 1, flags: DC
(*, 224.0.1.40), 00:07:50/00:02:22, RP 0.0.0.0, OIF count: 1, flags: DCL
(*, 224.0.1.84), 00:07:49/00:02:19, RP 0.0.0.0, OIF count: 1, flags: DC
There aren't any senders at the moment.
IP Multicast Statistics
14 routes using 7810 bytes of memory
11 groups, 0.27 average sources per group
Forwarding Counts: Pkt Count/Pkts(neg(-) = Drops) per second/Avg Pkt Size/Kilobits per second
Other counts: Total/RPF failed/Other drops(OIF-null, rate-limit etc)
Group: 239.255.255.254, Source count: 0, Packets forwarded: 0, Packets received: 0
Group: 239.255.255.253, Source count: 1, Packets forwarded: 0, Packets received: 2
Source: 10.125.1.40/32, Forwarding: 0/0/0/0, Other: 2/0/2
Group: 239.255.255.250, Source count: 2, Packets forwarded: 0, Packets received: 15
Source: 192.168.1.60/32, Forwarding: 0/-1/0/0, Other: 5/0/5
Source: 192.168.1.61/32, Forwarding: 0/0/0/0, Other: 10/0/10
Group: 224.0.72.62, Source count: 0, Packets forwarded: 0, Packets received: 0
Group: 232.44.44.233, Source count: 0, Packets forwarded: 0, Packets received: 0
Group: 224.0.255.135, Source count: 0, Packets forwarded: 0, Packets received: 0
Group: 224.0.1.22, Source count: 0, Packets forwarded: 0, Packets received: 0
Group: 224.0.1.55, Source count: 0, Packets forwarded: 0, Packets received: 0
Group: 224.0.1.60, Source count: 0, Packets forwarded: 0, Packets received: 0
Group: 224.0.1.40, Source count: 0, Packets forwarded: 0, Packets received: 0
Group: 224.0.1.84, Source count: 0, Packets forwarded: 0, Packets received: 0
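Per the legend in the output, the "Other" field reads Total/RPF failed/Other drops. As a quick way to pull those counters out per source, here is a small Python sketch; the regular expressions and function name are my own assumptions and are only matched against the line format shown above.

```python
import re

GROUP_ROW = re.compile(r"Group:\s+(?P<group>[\d.]+),")
SOURCE_ROW = re.compile(
    r"Source:\s+(?P<source>\S+),\s+Forwarding:\s+\S+,\s+"
    r"Other:\s+(?P<total>\d+)/(?P<rpf_failed>\d+)/(?P<other_drops>\d+)"
)

SAMPLE = """\
Group: 239.255.255.253, Source count: 1, Packets forwarded: 0, Packets received: 2
Source: 10.125.1.40/32, Forwarding: 0/0/0/0, Other: 2/0/2
Group: 239.255.255.250, Source count: 2, Packets forwarded: 0, Packets received: 15
Source: 192.168.1.60/32, Forwarding: 0/-1/0/0, Other: 5/0/5
Source: 192.168.1.61/32, Forwarding: 0/0/0/0, Other: 10/0/10
"""


def drop_counters(text):
    """Return one dict per Source line with its Total/RPF failed/Other drops."""
    rows = []
    group = None
    for line in text.splitlines():
        group_match = GROUP_ROW.search(line)
        if group_match:
            group = group_match["group"]  # remember which group follows
            continue
        source_match = SOURCE_ROW.search(line)
        if source_match:
            rows.append(
                {
                    "group": group,
                    "source": source_match["source"],
                    "total": int(source_match["total"]),
                    "rpf_failed": int(source_match["rpf_failed"]),
                    "other_drops": int(source_match["other_drops"]),
                }
            )
    return rows


for row in drop_counters(SAMPLE):
    print(row["group"], row["source"], row["rpf_failed"], row["other_drops"])
```

For the three sources listed, the RPF failed counter is 0 in every case and all received packets fall under other drops.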
And there aren't any drops due to RPF. Since I posted the last message, the count has climbed to a high of 17, and I still can't find any sources. The only difference in the BGP process between this site and another that is working is that I've changed the timers here, but I wouldn't think that would do anything:
DATE/TIME BACKOFF PROTOCOL EVENT RPF CHANGES
Sep 25 13:06:10.687 500 msec BGP Unknown 19
Sep 25 13:05:10.686 500 msec BGP Unknown 19
Sep 25 13:04:10.684 500 msec BGP Unknown 15
Here is a working site:
Last 15 triggered multicast RPF check events
RPF backoff delay: 500 msec
RPF maximum delay: 5 sec
DATE/TIME BACKOFF PROTOCOL EVENT RPF CHANGES
Sep 25 13:07:31.630 500 msec BGP Unknown 0
Sep 25 13:06:31.629 500 msec BGP Unknown 0
Sep 25 13:05:31.628 500 msec BGP Unknown 0
Sep 25 13:04:31.626 500 msec BGP Unknown 0
Something is going on, but I can't see what it is. Also, the non-working and working sites now run the same IOS. I copied the IOS version from the working site's router, which was newer, to the non-working one this morning and reloaded to see if it was an IOS issue. While I no longer have the 55000 changes, I'm still seeing more than I should.
The edge switch is a Dell, and only supports igmp.
The only reason I need to support multicast is that I had a server that was sending multicast to communicate with hosts, and the switches were broadcasting that traffic. Configuring IGMP snooping and dense mode on the switches resolved that issue. Now I have this one. The server is in the same site as all of the receivers.
Thanks!!
09-25-2013 11:13 AM
According to a document from Cisco, a normal event would look like the following:
DATE/TIME BACKOFF PROTOCOL EVENT RPF CHANGES
Mar 7 03:24:10.505 500 msec Static Route UP 0
Mar 7 03:23:11.804 1000 sec BGP Route UP 3
Mar 7 03:23:10.796 500 msec ISIS Route UP 0
I can't find any documentation for the "UNKNOWN" event. I was running "debug ip routing" and I didn't see any rib changes at all, but the rpf change was still registering.
Thanks,
John
09-25-2013 11:33 AM
John
All of these 224.0.255.X groups have receivers but no source, meaning some clients asked to join them. These groups are most probably used by an application or device on your network (firewall, antivirus, ...). You need to check whether this is legitimate and, if so, why the source is not transmitting any traffic.
You can find that from the OIF, which shows where the joins came from, and since you are using IGMP snooping it will be easy to trace on the switch.
The same goes for the other groups that don't have a source: check why the receivers sent the join. It looks like the router doesn't know of any source for these groups, which is why no packets are being forwarded for them.
In general, any (*,G) in your ip mroute table means the router received a join request. If there is no matching (S,G), either there is no source or a network issue is keeping the source's packets from reaching the router.
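As a rough way to list the groups in that situation, here is a short Python sketch that reads the one-line-per-entry mroute listing posted earlier and reports groups with a (*,G) entry but no (S,G). The regular expression and function name are assumptions for illustration, and the sample is only a subset of that listing.

```python
import re

# Matches the start of entries such as "(*, 224.0.1.40), ..." or
# "(10.125.1.40, 239.255.255.253), ..."
ENTRY = re.compile(r"^\((?P<source>\*|[\d.]+),\s*(?P<group>[\d.]+)\)")

SAMPLE = """\
(*, 239.255.255.254), 00:07:49/00:02:23, RP 0.0.0.0, OIF count: 1, flags: DC
(*, 239.255.255.253), 00:07:25/stopped, RP 0.0.0.0, OIF count: 1, flags: DC
(10.125.1.40, 239.255.255.253), 00:01:49/00:01:10, OIF count: 0, flags: PT
(*, 239.255.255.250), 00:07:50/stopped, RP 0.0.0.0, OIF count: 1, flags: DC
(192.168.1.61, 239.255.255.250), 00:00:57/00:02:02, OIF count: 0, flags: PT
(*, 224.0.1.40), 00:07:50/00:02:22, RP 0.0.0.0, OIF count: 1, flags: DCL
"""


def groups_without_sources(text):
    """Return groups that have a (*,G) entry but no (S,G) entry, in input order."""
    star_groups = []
    groups_with_sources = set()
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue
        if match["source"] == "*":
            if match["group"] not in star_groups:
                star_groups.append(match["group"])
        else:
            groups_with_sources.add(match["group"])
    return [g for g in star_groups if g not in groups_with_sources]


print(groups_without_sources(SAMPLE))
# ['239.255.255.254', '224.0.1.40']
```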
Also, you can use sparse mode or sparse-dense mode. It will save resources and you will not see unwanted groups on your router.
Regards
09-25-2013 11:49 AM
Switching to sparse mode is not resolving the issue. The problem isn't how many multicast routes I have, but rather that I'm seeing triggered RPF changes, which supposedly come from the unicast routing table making a change. Debugging the routing table doesn't show that....
Address          Interface          Ver/   Nbr    Query  DR     DR
                                    Mode   Count  Intvl  Prior
x.x.x.x          FastEthernet0/0    v2/S   0      30     1      x.x.x.x
You can see that it's in sparse mode now above. I've cleared the multicast routing table, and I'm still getting the rpf changes:
RPF backoff delay: 500 msec
RPF maximum delay: 5 sec
DATE/TIME BACKOFF PROTOCOL EVENT RPF CHANGES
Sep 25 13:48:10.736 500 msec BGP Unknown 16
Sep 25 13:47:10.735 500 msec BGP Unknown 15
Sep 25 13:46:10.734 500 msec BGP Unknown 15
Sep 25 13:45:10.733 500 msec BGP Unknown 16
I'm about to lab this up to see if I can get it to fail in the lab....
09-25-2013 12:19 PM
John
- Changing to sparse mode is only meant to get rid of these unneeded groups; I don't believe it is related to the original issue.
- For these RPF changes, can you please provide show ip route? If the routing table is big, just provide show ip route static.
09-25-2013 12:21 PM
In addition to that:
show ip bgp dampening dampened-paths
show ip bgp dampening flap-statistics
show ip bgp dampening parameters