Multicast troubleshooting

B-7462682
Frequent Visitor

Multicast failover is slow, causing multicast traffic to drop for around 30 seconds.

How low can the IGMP timers (ip igmp query-interval and ip igmp query-max-response-time) be set?

Are there any risks with setting them too low?

Will lowering these timers improve multicast failover performance?

12 Replies

Atul_Choudhary
Frequent Visitor
ip igmp query-interval
Configures how often the IGMP querier sends IGMP host-query messages from an interface. The querier sends these messages to discover which multicast groups have members on the router's attached networks. Cisco IOS software uses a default IGMP query interval of 60 seconds, which differs from the RFC standard default of 125 seconds.
The show ip igmp interface command displays both the configured query interval and the received query interval in its output.
 
ip igmp query-max-response-time
Configures the maximum response time advertised in IGMP queries, that is, the maximum amount of time hosts have to respond to an IGMP query.
 

How low can the IGMP timers (ip igmp query-interval and ip igmp query-max-response-time) be set?

Are there any risks with setting them too low? 
Since the Cisco default query interval (60 s) is already lower than the RFC standard (125 s), there should be no need to decrease it further in most scenarios.
The IGMP max response time defaults to 10 seconds. On most platforms you should be able to configure it as low as 1 second, but a very low value (1-2 s) forces all hosts to respond to each query within a very short window, which can cause high CPU utilization, transient packet loss, or an IGMP report implosion (microbursts).
 
However, if for some reason you need to tune IGMP for ultrafast failover:
ip igmp query-interval 20
ip igmp query-max-response-time 5
I wouldn't advise going lower than this unless it is a very specific case.
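
As a rough illustration of where these two commands go, here is a minimal sketch. The interface name Vlan10 is a placeholder and not something taken from this thread. Both commands are applied per interface, on the router interface facing the receivers, and the max response time has to stay below the query interval.

```
! Sketch only: Vlan10 is a hypothetical receiver-facing interface
interface Vlan10
 ip igmp query-interval 20
 ip igmp query-max-response-time 5
```

Afterwards, show ip igmp interface Vlan10 should list the new query interval and max response time.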
 
Secondly, IGMP is not used to detect path failures; it only maintains receiver membership.
 
The failover issue you are observing might be due to one or more of these:
PIM (Protocol Independent Multicast) convergence
RPF (Reverse Path Forwarding) recalculation
Upstream tree rebuild
 
Or it may be a non-multicast issue, such as unicast routing (OSPF/EIGRP/BGP) convergence.
 

Joseph W. Doherty
Hall of Fame

You've asked about IGMP timers, but what kind of multicast failure(s) are you dealing with? Other factors that might affect multicast failover time include PIM timers, IGMP snooping, and shared vs. source trees.


Can you please share how much delay (in seconds) you see when you power off Core Switch A?
Where are the receivers and the source? Are they connected to the same LAN, or at another site reached via routing?

Were you able to determine whether only the multicast traffic sees the failover delay?

 

On the access switches, do I only need to enable IGMP snooping on them, or is there any additional configuration required to improve the multicast failover?

I'm not an expert on multicast, especially on how long failover recovery takes, but I suspect that more than enabling IGMP snooping might be needed.

From your description of cutting power on core A, is the active RP on it?

If the active RP fails, I suspect that's possibly a significant fault.

Possibly to speed up failover, some configuration changes might make a difference, but also what might be required is redundant hardware, such as a chassis with dual sups or one of the Stack variants.

Yes the active RP is on switch A.

Hello @B-7462682 ,

>> the active RP is on switch A.

Your fault test is to power off switch A. Switch B is configured to act as a PIM sparse mode multicast router.

If switch B is using switch A as its RP, the failure has a great impact: the RP fails, and switch B (depending on its actual configuration) may fall back to PIM dense mode or elect itself as the new RP.

Both of these possible reactions to powering off switch A take time, and they are related neither to IGMP timers nor to PIM timers, since you have already configured a 1-second PIM hello interval and PIM with BFD.

The possible solutions are:

a) Implement Anycast RP between switch A and switch B: a shared IP address on a loopback interface (loop1) is the RP address, and MSDP runs between the two switches, sourced from loop0 (which carries a unique IP address on each switch), to keep them in sync about active multicast sources.

See:

https://www.cisco.com/c/en/us/td/docs/ios/solutions_docs/ip_multicast/White_papers/anycast.html

Anycast RP configuration is different on Nexus:

https://www.cisco.com/c/en/us/support/docs/ip/ip-multicast/115011-anycast-pim.html

 

b) As an alternative, if switch A and switch B are the same model and that model supports VSS, an SVL pair, or stacking, the two switches can become a single box to the outside world.

Hope to help

Giuseppe

 

@B-7462682 , @Giuseppe Larosa  appears to confirm my suspicion that losing the active RP can be a significant fault.  He and @Atul_Choudhary both suggest using Anycast.  This is a great recommendation.  Before seeing either of their replies, while doing my own multicast research, I came across that too.  However, I recall reading it's IPv4 only (there's something else for IPv6).  I also didn't come across how fast it is.

But I'm also guessing it's possible that core switch A is an active multicast transit router as well; if so, its total failure can have an even bigger impact on failover time.

Something else I recall from my research: loss of an active RP might have no impact on active multicast flows. It's very much an "it depends" situation. Without going into all the factors, topology matters, as can hardware redundancy.

If your goal is to minimize any multicast service interruption, major network changes and/or upgrades might be necessary.

Atul_Choudhary
Frequent Visitor

You might have done this already, but let me ask again:

 

Have you tried setting the PIM SPT threshold to 0?

ip pim spt-threshold 0

Maybe try a highly aggressive IGMP query interval. It is not really advisable, but it is worth checking the results:
ip igmp query-interval 5
ip igmp query-max-response-time 1

Make sure Core B is also configured as an RP:
ip pim rp-address <Core B address>

or 

Try Anycast RP on both switches, so the RP never goes down:

interface loopback0
 ip address 10.10.10.10 255.255.255.255
!
ip pim rp-address 10.10.10.10
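
The snippet above only shows the shared loopback. A slightly fuller sketch of Anycast RP with MSDP, as described earlier in the thread, might look like the following on Core A. It assumes classic IOS / IOS XE syntax (Nexus differs) and follows the loop0/loop1 split suggested earlier. The unique Loopback0 addresses 10.0.0.1 and 10.0.0.2 are made up for illustration, and only 10.10.10.10 comes from the post above.

```
! Core A (sketch only; Core B mirrors this with 10.0.0.2 on Loopback0
! and 10.0.0.1 as its MSDP peer)
interface Loopback0
 description Unique per-switch address, used as MSDP source (hypothetical)
 ip address 10.0.0.1 255.255.255.255
!
interface Loopback1
 description Shared Anycast RP address, identical on both cores
 ip address 10.10.10.10 255.255.255.255
 ip pim sparse-mode
!
ip pim rp-address 10.10.10.10
ip msdp peer 10.0.0.2 connect-source Loopback0
ip msdp originator-id Loopback0
```

Both loopbacks need to be advertised in the IGP, and the shared address should not end up as a routing protocol router ID. With this in place, every other router keeps pointing at 10.10.10.10 and simply reaches whichever core is still up.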

 

Hello
It sounds like your PIM isn't configured for HSRP failover, so when you fail over, PIM waits on its own timers (PIM hello and hold times).
I suggest applying PIM RP redundancy on the HSRP interfaces, or at a minimum lowering your PIM query interval to 1 second (this may take a CPU hit). You could also apply BFD on those links for faster PIM failure detection.
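
As a rough sketch of the last two suggestions, assuming IOS / IOS XE syntax: the interface name Vlan10 and the BFD timer values are placeholders rather than values from this thread, and ip pim bfd support depends on platform and software release.

```
! Sketch only: Vlan10 and the BFD timers are hypothetical
interface Vlan10
 ip pim sparse-mode
 ip pim query-interval 1
 bfd interval 300 min_rx 300 multiplier 3
 ip pim bfd
```

The same settings would need to be applied on the PIM neighbor at the other end of the link, since BFD sessions only come up when both sides are configured.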


Please rate and mark as an accepted solution if you have found any of the information provided useful.
This then could assist others on these forums to find a valuable answer and broadens the community’s global network.

Kind Regards
Paul