02-17-2012 03:33 AM - edited 03-04-2019 03:18 PM
Hi all,
I have two Cisco 3845 routers which receive a multicast stream via a tunnel interface, i.e. Tunnel163 (PIM Dense mode is enabled).
These routers are both connected to a LAN segment (FastEthernet0/1/0) where receivers are.
I observe the following very strange behavior:
Router1# show ip mroute 224.100.6.163 output (extract):
(192.168.163.22, 224.100.6.163), 00:32:57/00:02:38, flags: T
Incoming interface: Tunnel163, RPF nbr 173.1.163.2
Outgoing interface list:
FastEthernet0/1/0, Prune/Dense, 00:31:33/00:02:40, A
Router1# show ip pim neighbor fastEthernet 0/1/0 (extract)
Neighbor Interface Uptime/Expires Ver DR
Address Prio/Mode
100.1.6.252 FastEthernet0/1/0 00:23:21/850 msec v2 1 / S P G
Router1#show ip igmp groups 224.100.6.163 (extract)
IGMP Connected Group Membership
Group Address Interface Uptime Expires Last Reporter Group Accounted
224.100.6.163 FastEthernet0/1/0 5d01h 00:02:33 100.1.6.11
Router1 is the assert winner (highest IP address) and it sees the IGMP join requests, yet it is pruning the interface.
It's really confusing to me. It happens sometimes and it lasts until I manually issue clear ip mroute *
Unfortunately I cannot migrate to Sparse Mode, so I have to fix this problem.
Any help is really appreciated.
02-17-2012 06:28 AM
Ciao Gianluca,
while this happens, is the twin router (100.1.6.252) also in prune state, or is it forwarding traffic downstream?
R
02-17-2012 06:39 AM
Ciao,
the other router is also in prune state and thus there is no multicast traffic on the LAN (I checked with Wireshark).
Thanks for your interest,
Gianluca
02-17-2012 06:42 AM
Hi Gianluca,
When this issue occurs, what does the show ip igmp group command tell you when issued on both these routers?
Best regards,
Peter
02-17-2012 06:54 AM
Hi,
please find the output of show ip igmp groups below.
Thanks for your interest,
Gianluca
Router1#show ip igmp groups 224.100.6.163 (extract)
IGMP Connected Group Membership
Group Address Interface Uptime Expires Last Reporter Group Accounted
224.100.6.163 FastEthernet0/1/0 5d01h 00:02:33 100.1.6.11
Router2#show ip igmp groups 224.100.6.163 (extract)
IGMP Connected Group Membership
Group Address Interface Uptime Expires Last Reporter Group Accounted
224.100.6.163 FastEthernet0/1/0 00:08:12 00:02:54 100.1.6.12
02-17-2012 07:06 AM
Hi Gianluca,
Thank you for your response. Hmmm... the group is subscribed indeed. I see only two logical reasons for an interface to be in a pruned state for a particular group:
I am considering exploring the second option. If you temporarily disabled PIM on Router2 (if that is permissible) and cleared the mroute table on Router1, would the situation stabilize, and would the interface remain continuously in Forward state?
Also consider using the ip pim neighbor-filter command on your interface to decrease the possibility of PIM spoofing attacks.
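For illustration, a minimal sketch of such a filter on Router1, assuming that Router2 (100.1.6.252, as seen in the show ip pim neighbor output earlier in the thread) is the only legitimate PIM neighbor on that LAN. The ACL number 10 is an arbitrary choice and is not taken from the actual configuration:

```
! Assumption: 100.1.6.252 is the only legitimate PIM neighbor on this segment
access-list 10 permit 100.1.6.252
!
interface FastEthernet0/1/0
 ip pim neighbor-filter 10
```

With this in place, PIM messages arriving on the interface from any address not permitted by the ACL should be ignored. Router2 would need a mirror-image filter permitting Router1's address on the same LAN.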
Riccardo, any further ideas on this?
Best regards,
Peter
02-17-2012 07:50 AM
I was thinking of enabling some debugs on that group and see if we can get something useful out of it.
debug ip pim 224.100.6.163
also, I would check if there is some known issue on the IOS Gianluca is running. Gianluca, which release do you have on your c3845?
Riccardo
02-20-2012 09:07 AM
Hi Peter and Riccardo,
thanks for your valuable help. Coming to your hints:
1) Even assuming that an illegitimate source is sending prune messages, wouldn't Router1 go on forwarding 224.100.6.163 traffic nevertheless because of local IGMP receivers?
2) Router1 and Router2 are quite loaded and I am a bit afraid about turning on debugging commands (this has already been cause of troubles in the past). I would prefer to keep this option as a last resort.
3) Current release is:
C3845-ADVIPSERVICESK9-M V15.0(1)M4 Release SW (fc1)
Ciao,
Gianluca
02-20-2012 10:18 AM
Hello Gianluca,
1) Even assuming that an illegitimate source is sending prune messages, wouldn't Router1 go on forwarding 224.100.6.163 traffic nevertheless because of local IGMP receivers?
Good question. If Router1 were not the Assert winner, then receiving a Prune message would cause it to stop forwarding the multicast stream despite its knowledge of subscribed stations.
2) Router1 and Router2 are quite loaded and I am a bit afraid about turning on debugging commands (this has already been cause of troubles in the past). I would prefer to keep this option as a last resort.
I am afraid we are getting close to the last-resort options. I believe that the debug commands Riccardo suggested should not produce excessive output or load, and they will most probably give us some more hints about what is happening.
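One common way to keep the impact of a debug low on a loaded router is to send the output to the logging buffer instead of the console. The following is only a sketch under that assumption; the buffer size of 64000 bytes is an arbitrary value and should be adapted to the memory available:

```
! Assumption: buffer size is arbitrary; adjust to available memory
configure terminal
 no logging console
 logging buffered 64000 debugging
 end
!
debug ip pim 224.100.6.163
! ... wait for the problem to occur, then inspect the buffer ...
show logging
undebug all
```

Restricting the debug to the single group, as Riccardo suggested, already limits the output to PIM events for 224.100.6.163.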
Best regards,
Peter
02-23-2012 11:06 PM
Hello Gianluca,
Any news in this matter?
Best regards,
Peter
02-25-2012 11:21 AM
Hello Peter,
I have been analysing some sniffer captures on the LAN to which Router1 and Router2 are connected. I'm trying to check the exchange of PIM and OSPF packets to understand if something strange happens when the problem occurs. However, it is not an easy task.
I'm also waiting for Riccardo to let me know about any known issue with the IOS release. Finally, I'm monitoring the CPU load of the routers. I wonder whether the problem is more likely to appear under stress conditions, but the average CPU load is around 40%, which should not be a critical value, I think.
Ciao,
Gianluca
02-25-2012 02:09 PM
Hello Gianluca,
I have a feeling that Riccardo merely suggested that it would be reasonable to look for known issues - you can visit the bug toolkit yourself at http://cisco.com/go/bugs . Needless to say, though, I'll try to reach out to him and find out if he did some internal search on this issue.
Regarding your sniffing work - I can imagine that it is difficult. I hope that the sniffer traces will help us narrow down the cause of the problem, though.
Best regards,
Peter
03-01-2012 02:47 AM
Hi again Peter,
actually I'm not entitled to use the Cisco bug toolkit (I guess you need SMARTNET support or similar). This is why I would be extremely grateful if someone could help with this check.
Gianluca
03-05-2012 08:35 AM
Hello Gianluca,
I have found a couple of bug reports that could theoretically pertain to this behavior. However, they all should have been fixed in the IOS version you have now, which leaves me somewhat confused. Still, do you have the option of upgrading to a 15.1M IOS?
Did you arrive at any conclusion after analyzing your sniffer traces?
Best regards,
Peter
03-11-2012 04:20 PM
Dear Peter,
I have some news indeed. I have isolated the problem and understood what it is related to (but I'm not sure yet whether it is a bug or expected behavior). I have also reproduced it in a simulation scenario. My feeling is that two factors play a fundamental role:
- having PIM State-Refresh configured on all the routers
- having an equal cost multi-path problem
Please refer to the diagram above, which is more or less the topology I have to deal with. C2 is the multicast source. R4 is the multicast IGMP receiver (actually it's a host, but it's represented as a router because that made it easier for me to set up the simulation: it just joins the multicast group and does not take part in PIM or OSPF). R1<->R2 and R1<->R3 have the same cost. Both LAN paths between R2 and R3 also have the same cost. Under normal conditions R3 is the assert winner (equal metric towards the source but higher IP address) and the multicast forwarder.
Now, if I shut down the R1<->R3 connection, R2 becomes the assert winner (best metric towards the source, of course), but it remains in the pruned state (this is already strange). If I reactivate the R1<->R3 link, R3 becomes the assert winner again, but it also remains in the pruned state and no traffic is forwarded on SW2, at least until I manually issue clear ip mroute *.
Now let's come to the interesting thing. The subnet on SW3 is higher (100.1.8.0/24) than the subnet on SW2 (100.1.6.0/24). When R3 loses the direct connection to R1, it has two equal-cost paths toward C2, but it has to choose only one RPF interface. In this case it selects the neighbor with the higher IP address (as expected according to PIM behavior) and thus the interface attached to SW3. Amazingly enough, if I change the subnet on SW3 so that it is lower than the SW2 subnet (e.g. 100.1.5.0/24), there is no issue at all and everything works perfectly!
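To see which RPF interface and neighbor R3 actually picks in each case, the output of commands like the following could be compared before and after changing the SW3 subnet. This is only a sketch: 192.168.163.22 is the source address from the first post, so in the simulation the address of C2 would have to be substituted.

```
! Source address taken from the first post; substitute C2's address in the lab
show ip rpf 192.168.163.22
show ip route 192.168.163.22
show ip pim neighbor
show ip mroute 224.100.6.163
```

The show ip rpf output lists the RPF interface and RPF neighbor chosen for the source, which should make it visible whether the selection flips between the SW2 and SW3 segments.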
What's more, even if I disable PIM state refresh everywhere (without changing the subnets), there is no issue.
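For reference, one way State Refresh handling can be turned off on IOS is sketched below. It is an assumption that this matches what was actually done in the test. The global command ip pim state-refresh disable stops the router from processing and forwarding State Refresh messages, while origination is controlled separately per interface with ip pim state-refresh origination-interval.

```
! Assumption: applied on every PIM router in the test topology
configure terminal
 ip pim state-refresh disable
 end
```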
Sorry for the very long post. Things are a bit less obscure now, but still I don't clearly understand what happens. I can post the output of debug ip pim 224.100.6.163 if you think this can help!
Thanks,
Gianluca