I had an issue where the multicast streams being published took over 4 minutes to recover when an HSRP failover occurred.
I know PIM is independent of HSRP and that the multicast streams were being forwarded by the HSRP standby router, so the
multicast drop was not unexpected when the standby rebooted; unicast traffic was unaffected. But why would multicast
take so long to recover for the subscribers to the streams? The multicast setup is PIM sparse mode with the RP configured as an anycast address
located in the core. Multicast was only fully operational again once the standby had recovered.
Any thoughts appreciated.
What did the "show ip mroute <group>" output show? Were any RPF failure counters incrementing?
Also, was the OIL complete or showing NULL?
Multicast requires live troubleshooting, so it is hard to tell what was going on,
but based on the information shared, RPF failures are a possible cause.
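If it happens again, the checks asked about above could be captured on both HSRP routers while the streams are down, roughly as follows. This is only a sketch: 239.1.1.1 (group), 10.1.1.10 (source) and 10.255.255.1 (anycast RP) are placeholder addresses that do not come from this thread, and the exact output differs between platforms and software releases.

```text
! Placeholders: group 239.1.1.1, source 10.1.1.10, anycast RP 10.255.255.1
! Mroute state for the group: incoming interface, RPF neighbor, outgoing interface list
show ip mroute 239.1.1.1
! Per-group packet counters, including the "RPF failed" drop count
show ip mroute 239.1.1.1 count
! Interface and neighbor the router expects to use toward the source and toward the RP
show ip rpf 10.1.1.10
show ip rpf 10.255.255.1
! Which router is currently PIM DR on the subscriber-facing interface
show ip pim interface
show ip pim neighbor
! Group-to-RP mapping currently in use
show ip pim rp mapping
```

Running the count command twice, a few seconds apart, shows whether the RPF failure counter is actually incrementing.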
I'm coming at this after the event, so I don't have the multicast group data or outgoing interfaces from the time of the event, just the logging showing the PIM DR state change. I'm trying to work out the best way to debug the problem and ensure another crash doesn't cause the same problem. I can't run HSRP Aware PIM.
Since you cannot run HSRP Aware PIM, an alternative is to configure IP SLA with a tracking object, so that if the primary goes down and the secondary takes over, the static/default route points to the next hop via the standby router.
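A minimal sketch of that idea in IOS syntax is below. The addresses, the SLA number and the track number are all placeholders that are not taken from this thread, it assumes the routes are configured on the device that points at the HSRP pair, and the syntax varies somewhat between releases, so treat it as a starting point rather than a tested configuration.

```text
! Probe the primary router (placeholder 10.1.1.2) every 5 seconds
ip sla 1
 icmp-echo 10.1.1.2
 frequency 5
ip sla schedule 1 life forever start-time now
!
! Track object 10 goes down when the probe stops succeeding
track 10 ip sla 1 reachability
!
! Preferred default route via the primary, installed only while track 10 is up
ip route 0.0.0.0 0.0.0.0 10.1.1.2 track 10
! Floating backup via the standby router (placeholder 10.1.1.3), administrative distance 250
ip route 0.0.0.0 0.0.0.0 10.1.1.3 250
```

Keep in mind that a tracked static route changes the unicast path used for the RPF check; it does not by itself change which router is elected PIM DR.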
Please let me know if that helps.