Re: Extreme amount of RPF events

John Blakley · ‎09-24-2013

All,

I'm not sure how to go about troubleshooting this one. I have a site that has PIM dense mode enabled. I had to do this for some local multicast traffic they needed to support, but I did it for all of our sites. The other sites have normal-looking multicast routing tables, while this one site has quite a few more entries than normal. The equipment is the same across the board, but the IOS versions differ.

On this single router, here's what I'm seeing. Notice that it happens every minute and the number of changes is astronomical. I do not have an IGP at this location. We peer with our provider via BGP and that's about it. I'm assuming this could just be a bug. I'm not losing any traffic, but I noticed high CPU utilization, which is what pushed me down this path.

Last 15 triggered multicast RPF check events

RPF backoff delay: 500 msec

RPF maximum delay: 5 sec

DATE/TIME             BACKOFF    PROTOCOL   EVENT         RPF CHANGES

Sep 24 10:55:19.076   500 msec   BGP        Unknown       55572

Sep 24 10:54:19.075   500 msec   BGP        Unknown       55480

Sep 24 10:53:19.186   500 msec   BGP        Unknown       55542

Sep 24 10:52:18.973   500 msec   BGP        Unknown       55498

Sep 24 10:51:18.972   500 msec   BGP        Unknown       55417

Sep 24 10:50:18.970   500 msec   BGP        Unknown       55541

The other thing that I'm seeing is a source address that doesn't exist. I've been watching the mroute table, and the source continues to refresh. I've cleared the table and it refreshes again. I've tracked it back to matching only on our default route in BGP, so the source itself doesn't exist on my network. I'm not sure what could be causing this....
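
For readers following along, here is why a source that matches only the default route matters for multicast: the RPF check takes the interface from the longest-prefix match for the source address, and a packet passes only if it arrived on that interface. The Python sketch below is a minimal model of that lookup, not a reproduction of IOS internals. The routing table, interface names, and example addresses are assumptions made up for illustration.

```python
import ipaddress

# Hypothetical unicast routing table for a branch router: (prefix, interface).
# Prefixes and interface names are illustrative assumptions, not from the thread.
ROUTES = [
    ("0.0.0.0/0", "Serial0/0"),             # default route learned via BGP
    ("10.125.1.0/24", "FastEthernet0/0"),   # connected LAN
]


def rpf_interface(source, routes=ROUTES):
    """Return (prefix, interface) of the longest prefix covering `source`."""
    addr = ipaddress.ip_address(source)
    best = None
    for prefix, interface in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, interface)
    if best is None:
        return None
    return str(best[0]), best[1]


def passes_rpf(source, arrival_interface, routes=ROUTES):
    """A packet passes RPF only if it arrived on the interface toward its source."""
    match = rpf_interface(source, routes)
    return match is not None and match[1] == arrival_interface


if __name__ == "__main__":
    # A LAN-side host whose subnet is not in the table falls through to the default route.
    print(rpf_interface("192.168.1.61"))
    print(passes_rpf("192.168.1.61", "FastEthernet0/0"))
    # Adding a route for that subnet on the LAN interface changes the result.
    fixed = ROUTES + [("192.168.1.0/24", "FastEthernet0/0")]
    print(passes_rpf("192.168.1.61", "FastEthernet0/0", fixed))
```

Under these assumptions, a LAN-side source covered only by the default route resolves to the WAN interface and fails the check, while a more specific route for its subnet on the LAN interface lets the same packet pass.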

Thanks!

John

HTH, John *** Please rate all useful posts ***

John Blakley · ‎09-24-2013

I've disabled triggered RPF checks for the time being. This has "resolved" the issue, but I don't believe it solves the underlying problem. I found two unknown routes in the mcast routing table, and I've created a route to null0 for both of them, simply because they're located on that local network and I don't route for them.

Triggered RPF checks happen when the routing table changes, from what I've understood, but the routing table in this case simply isn't changing. I also can't clear these stats, as I've yet to find the command for it. I'm unsure why BGP is registering Unknown when at this point I don't have a single mcast route that's not local. We use BGP to the provider, but I don't have PIM enabled on the WAN side of the router. (I guess I could enable it to see what would happen.)

We have a single peer and no IGP, and there's no way for the router to learn the same route from two different sources and so register an RPF failure.

Thanks,

John

amabdelh · ‎09-24-2013

John
Can you provide a simple network diagram that shows the source and receivers, and where these drops are being seen?

John Blakley · ‎09-25-2013

Thanks Amjad. Here's a very simple diagram. I redid my box recently and didn't have visio...sorry

The PE connects into our MPLS provider. The CE is the branch router. The route that I was learning from, which has since been fixed, was an unknown route and was matching on the default route toward the core via the PE. This is what I initially thought was causing the RPF failure. For that situation, I enabled the subnet that the unknown host was on as a secondary subnet on the router, and that fixed the RPF failure, but it didn't stop these events from being logged. I'm updating the router this morning to a different version of IOS to see if that helps with the issue. In comparison, another site that has the same router model, but a different IOS, had 2 RPF failures as opposed to 5500. It doesn't seem like these are incremental either, but I'm not sure.

Thanks!

John

John Blakley · ‎09-25-2013

Some additional information:

MRT(0): Triggered RPF backoff timer started, cause: BGP

MRT(0): Triggered RPF check after 500 msec back off delay

MRT(0): Triggered RPF reset after 1000 msec delay

I ran "debug ip mrouting rpf-events" and received the above. I also ran "debug ip bgp ipv4 unicast" and I wasn't seeing any changes/updates. What in the world is going on?

HTH,
John

*** Please rate all useful posts ***

amabdelh · ‎09-25-2013

Thanks John

A few questions:

- Can you post "show ip mroute" from both the CE and the edge switch?

- Where are these sources and receivers located? Are they in the same site?

- In "show ip mroute count", do you see drops because of RPF?

John Blakley · ‎09-25-2013

Here you go:

(*, 239.255.255.254), 00:07:49/00:02:23, RP 0.0.0.0, OIF count: 1, flags: DC

(*, 239.255.255.253), 00:07:25/stopped, RP 0.0.0.0, OIF count: 1, flags: DC

(10.125.1.40, 239.255.255.253), 00:01:49/00:01:10, OIF count: 0, flags: PT

(*, 239.255.255.250), 00:07:50/stopped, RP 0.0.0.0, OIF count: 1, flags: DC

(192.168.1.61, 239.255.255.250), 00:00:57/00:02:02, OIF count: 0, flags: PT

(*, 224.0.72.62), 00:07:48/00:02:12, RP 0.0.0.0, OIF count: 1, flags: DC

(*, 232.44.44.233), 00:07:50/00:02:32, RP 0.0.0.0, OIF count: 1, flags: DC

(*, 224.0.255.135), 00:07:50/00:02:49, RP 0.0.0.0, OIF count: 1, flags: DC

(*, 224.0.1.22), 00:07:49/00:02:23, RP 0.0.0.0, OIF count: 1, flags: DC

(*, 224.0.1.55), 00:07:44/00:02:28, RP 0.0.0.0, OIF count: 1, flags: DC

(*, 224.0.1.60), 00:07:50/00:02:24, RP 0.0.0.0, OIF count: 1, flags: DC

(*, 224.0.1.40), 00:07:50/00:02:22, RP 0.0.0.0, OIF count: 1, flags: DCL

(*, 224.0.1.84), 00:07:49/00:02:19, RP 0.0.0.0, OIF count: 1, flags: DC

There aren't any senders at the moment.

IP Multicast Statistics

14 routes using 7810 bytes of memory

11 groups, 0.27 average sources per group

Forwarding Counts: Pkt Count/Pkts(neg(-) = Drops) per second/Avg Pkt Size/Kilobits per second

Other counts: Total/RPF failed/Other drops(OIF-null, rate-limit etc)

Group: 239.255.255.254, Source count: 0, Packets forwarded: 0, Packets received: 0

Group: 239.255.255.253, Source count: 1, Packets forwarded: 0, Packets received: 2

Source: 10.125.1.40/32, Forwarding: 0/0/0/0, Other: 2/0/2

Group: 239.255.255.250, Source count: 2, Packets forwarded: 0, Packets received: 15

Source: 192.168.1.60/32, Forwarding: 0/-1/0/0, Other: 5/0/5

Source: 192.168.1.61/32, Forwarding: 0/0/0/0, Other: 10/0/10

Group: 224.0.72.62, Source count: 0, Packets forwarded: 0, Packets received: 0

Group: 232.44.44.233, Source count: 0, Packets forwarded: 0, Packets received: 0

Group: 224.0.255.135, Source count: 0, Packets forwarded: 0, Packets received: 0

Group: 224.0.1.22, Source count: 0, Packets forwarded: 0, Packets received: 0

Group: 224.0.1.55, Source count: 0, Packets forwarded: 0, Packets received: 0

Group: 224.0.1.60, Source count: 0, Packets forwarded: 0, Packets received: 0

Group: 224.0.1.40, Source count: 0, Packets forwarded: 0, Packets received: 0

Group: 224.0.1.84, Source count: 0, Packets forwarded: 0, Packets received: 0
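
The "Other" field packs three counters (total/RPF failed/other drops), which is easy to misread by eye. As a rough aid, the Python sketch below pulls those counters out of pasted "show ip mroute count" text. The regular expressions are assumptions based only on the lines shown above, and other IOS releases may format the output differently.

```python
import re

# A few lines copied from the output above.
SAMPLE = """\
Group: 239.255.255.253, Source count: 1, Packets forwarded: 0, Packets received: 2
Source: 10.125.1.40/32, Forwarding: 0/0/0/0, Other: 2/0/2
Group: 239.255.255.250, Source count: 2, Packets forwarded: 0, Packets received: 15
Source: 192.168.1.60/32, Forwarding: 0/-1/0/0, Other: 5/0/5
Source: 192.168.1.61/32, Forwarding: 0/0/0/0, Other: 10/0/10
"""

GROUP_RE = re.compile(r"^\s*Group:\s*(\S+?),")
SOURCE_RE = re.compile(r"^\s*Source:\s*(\S+?)/\d+,.*Other:\s*(\d+)/(\d+)/(\d+)")


def drop_counters(text):
    """Return one dict per (source, group) with the three 'Other' counters."""
    rows = []
    group = None
    for line in text.splitlines():
        m = GROUP_RE.match(line)
        if m:
            group = m.group(1)  # remember which group the next sources belong to
            continue
        m = SOURCE_RE.match(line)
        if m:
            total, rpf_failed, other = (int(x) for x in m.groups()[1:])
            rows.append({
                "group": group,
                "source": m.group(1),
                "total": total,
                "rpf_failed": rpf_failed,
                "other_drops": other,
            })
    return rows


if __name__ == "__main__":
    for row in drop_counters(SAMPLE):
        print(row)
```

For the sample lines, every source reports zero in the RPF-failed position, and all of the counted drops fall under the "other drops" counter.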

And there aren't any drops due to RPF. Since I posted the last message, the RPF change count has climbed to a high of 17, and I still can't find any sources. The only difference in the BGP process between this site and another that is working is that I've changed the timers on this one, but I wouldn't think that would do anything:

DATE/TIME             BACKOFF    PROTOCOL   EVENT         RPF CHANGES

Sep 25 13:06:10.687   500 msec   BGP        Unknown       19

Sep 25 13:05:10.686   500 msec   BGP        Unknown       19

Sep 25 13:04:10.684   500 msec   BGP        Unknown       15

Here is a working site:

Last 15 triggered multicast RPF check events

RPF backoff delay: 500 msec

RPF maximum delay: 5 sec

DATE/TIME             BACKOFF    PROTOCOL   EVENT         RPF CHANGES

Sep 25 13:07:31.630   500 msec   BGP        Unknown       0

Sep 25 13:06:31.629   500 msec   BGP        Unknown       0

Sep 25 13:05:31.628   500 msec   BGP        Unknown       0

Sep 25 13:04:31.626   500 msec   BGP        Unknown       0
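
The two event tables above can also be compared programmatically. The Python sketch below parses pasted "show ip rpf events" rows and reports the gap between events and the largest RPF change count per site. The regular expression and the assumed year are guesses based only on the rows shown in this thread, so treat it as a starting point rather than a general parser.

```python
import re
from datetime import datetime

PROBLEM_SITE = """\
DATE/TIME BACKOFF PROTOCOL EVENT RPF CHANGES
Sep 25 13:06:10.687 500 msec BGP Unknown 19
Sep 25 13:05:10.686 500 msec BGP Unknown 19
Sep 25 13:04:10.684 500 msec BGP Unknown 15
"""

WORKING_SITE = """\
DATE/TIME BACKOFF PROTOCOL EVENT RPF CHANGES
Sep 25 13:07:31.630 500 msec BGP Unknown 0
Sep 25 13:06:31.629 500 msec BGP Unknown 0
Sep 25 13:05:31.628 500 msec BGP Unknown 0
Sep 25 13:04:31.626 500 msec BGP Unknown 0
"""

# timestamp, backoff in msec, protocol, event text, RPF change count
EVENT_RE = re.compile(
    r"^\s*(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\.\d{3})\s+(\d+)\s+msec\s+(\S+)\s+(.+?)\s+(\d+)\s*$"
)


def parse_events(text, year=2013):
    """Parse event rows; header lines and anything unrecognised are skipped."""
    events = []
    for line in text.splitlines():
        m = EVENT_RE.match(line)
        if not m:
            continue
        stamp, backoff, proto, event, changes = m.groups()
        # The output carries no year, so one is assumed for the timestamp.
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")
        events.append((when, int(backoff), proto, event, int(changes)))
    return events


def summarize(events):
    """Return event count, largest change count, and gaps between events in seconds."""
    times = sorted(e[0] for e in events)
    gaps = [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]
    return {
        "events": len(events),
        "max_changes": max((e[4] for e in events), default=0),
        "gaps_sec": gaps,
    }


if __name__ == "__main__":
    print("problem site:", summarize(parse_events(PROBLEM_SITE)))
    print("working site:", summarize(parse_events(WORKING_SITE)))
```

On the rows pasted here, both sites log a BGP-triggered event roughly every 60 seconds. The difference between them is the number of RPF changes per event, not how often the event fires.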

Something is going on, but I can't see what it is. Also, the non-working and working sites now run the same IOS. I copied the IOS image from the working site's router, which was newer, to the non-working one this morning and reloaded to see if it was an IOS issue. While I no longer have the 55000 changes, I'm still seeing more than I should.

The edge switch is a Dell, and only supports IGMP.

The only reason for the need to support multicast is that I had a server that was sending multicast to communicate with hosts, and the switches were broadcasting that traffic. Configuring IGMP snooping and dense mode on the switches resolved that issue. Now I have this problem. The server is in the same site as all of the receivers.

Thanks!!

John Blakley · ‎09-25-2013

According to a document from Cisco, a normal event would look like the following:

DATE/TIME             BACKOFF    PROTOCOL   EVENT         RPF CHANGES

Mar 7 03:24:10.505    500 msec   Static     Route UP        0

Mar 7 03:23:11.804    1000 sec   BGP        Route UP        3

Mar 7 03:23:10.796    500 msec   ISIS       Route UP        0

I can't find any documentation for the "Unknown" event. I was running "debug ip routing" and I didn't see any RIB changes at all, but the RPF change was still registering.

Thanks,

John

amabdelh · ‎09-25-2013

John

All of these groups in 224.0.255.X have no source but do have receivers; some clients have requested to join them. These groups are most probably used by an application or device on your network (firewall, antivirus, ...). You need to check whether this is legitimate and, if so, why the source is not transmitting any traffic.

You can find that from the OIF: it will show you where these joins came from, and since you are using IGMP snooping it will be easy to trace them on the switch.

The same goes for the other groups that don't have a source: check why the receivers sent the join. It looks like the router doesn't have any idea about these groups, and that's why no packets are being forwarded for them.

In general, any (*,G) in your ip mroute table means the router received a join request. If you don't have a matching (S,G), then either there is no source, or there is a network issue and the router is not getting packets from the source.
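
As a quick way to apply that rule to a pasted table, the Python sketch below lists the groups that have a (*,G) entry but no (S,G). It assumes the one-line-per-entry layout John posted earlier in the thread; the function names are made up for illustration, and the full multi-line "show ip mroute" output would need a different parser.

```python
import re

# A few entries copied from the mroute listing earlier in the thread.
SAMPLE = """\
(*, 239.255.255.254), 00:07:49/00:02:23, RP 0.0.0.0, OIF count: 1, flags: DC
(*, 239.255.255.253), 00:07:25/stopped, RP 0.0.0.0, OIF count: 1, flags: DC
(10.125.1.40, 239.255.255.253), 00:01:49/00:01:10, OIF count: 0, flags: PT
(*, 239.255.255.250), 00:07:50/stopped, RP 0.0.0.0, OIF count: 1, flags: DC
(192.168.1.61, 239.255.255.250), 00:00:57/00:02:02, OIF count: 0, flags: PT
(*, 224.0.72.62), 00:07:48/00:02:12, RP 0.0.0.0, OIF count: 1, flags: DC
"""

# Matches the leading "(source, group)" pair of an entry line.
ENTRY_RE = re.compile(r"^\s*\(([^,]+),\s*([^)]+)\)")


def sources_by_group(text):
    """Map each group to the list of sources seen in (S,G) entries."""
    groups = {}
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if not m:
            continue
        source, group = m.group(1).strip(), m.group(2).strip()
        sources = groups.setdefault(group, [])  # a (*,G) line registers the group
        if source != "*":
            sources.append(source)
    return groups


def groups_without_sources(text):
    """Groups that were joined (have an entry) but show no (S,G)."""
    return sorted(g for g, s in sources_by_group(text).items() if not s)


if __name__ == "__main__":
    print(groups_without_sources(SAMPLE))
```

Groups returned by groups_without_sources are the ones worth tracing back to their receivers through the OIF and the switch's IGMP snooping table.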

Also, you can use sparse mode or sparse-dense mode; it will save resources and you will not see unwanted groups on your router.

regards

John Blakley · ‎09-25-2013

Switching to sparse mode is not resolving the issue. The problem isn't how many multicast routes I have; it's that I'm seeing triggered RPF changes, which supposedly come from a change in the unicast routing table. Debugging the routing table doesn't show any such change....

Address          Interface                Ver/   Nbr    Query  DR     DR
                                          Mode   Count  Intvl  Prior

x.x.x.x          FastEthernet0/0          v2/S   0      30     1      x.x.x.x

You can see that it's in sparse mode now above. I've cleared the multicast routing table, and I'm still getting the rpf changes:

RPF backoff delay: 500 msec

RPF maximum delay: 5 sec

DATE/TIME             BACKOFF    PROTOCOL   EVENT         RPF CHANGES

Sep 25 13:48:10.736   500 msec   BGP        Unknown       16

Sep 25 13:47:10.735   500 msec   BGP        Unknown       15

Sep 25 13:46:10.734   500 msec   BGP        Unknown       15

Sep 25 13:45:10.733   500 msec   BGP        Unknown       16

I'm about to lab this up to see if I can get it to fail in the lab....

amabdelh · ‎09-25-2013

John

- Changing to sparse mode is meant to get rid of these unneeded groups; I don't believe it is related to the original issue.

- For these RPF events, can you please provide the output of "show ip route"? If the routing table is big, just provide "show ip route static".

amabdelh · ‎09-25-2013

In addition to that, please provide:

show ip bgp dampening dampened-paths

show ip bgp dampening flap-statistics

show ip bgp dampening parameters