IGMP querying router is 0.0.0.0 (and not "this system")

nnraymond
Level 1

I'm trying to get to the bottom of a persistent, intermittent problem with wired computer networking on one of our biggest VLANs, which is handled by Layer 3 routing on our Catalyst 6509. The problem sprang up at a time when no configuration changes had been made to the network, so it is a bit of a mystery. Our wireless clients on VLAN 16 are not affected. Our APs on VLAN 12 are not affected. Our servers on VLAN 110 are not affected. Just VLAN 8, which has our wired workstations.

Pinging internal hosts (Firepower internal interface, Umbrella VAs, etc.) results in 1% packet loss during low overall network activity, but it spikes higher (I've seen over 20%) at other times of the day, I think in relation to overall network traffic. Wired Windows workstations will randomly decide there is no internet and report that to users, who then think their computers are offline. Ethernet links are staying up, load on the Firepower is not too high, the Umbrella VAs are not overloaded, and other buildings that are connected by fiber links to the Catalyst 6509 are not seeing any issues with any clients (wired or wireless).

 

Doing a show log on the 6509 returns this:

 

*Feb 1 18:30:51.470: %IGMP-3-QUERY_INT_MISMATCH: Received a non-matching query interval 125, from querier address 0.0.0.0
*Feb 1 18:33:41.434: %IGMP-3-QUERY_INT_MISMATCH: Received a non-matching query interval 125, from querier address 0.0.0.0
*Feb 1 18:35:51.446: %IGMP-3-QUERY_INT_MISMATCH: Received a non-matching query interval 125, from querier address 0.0.0.0
*Feb 1 18:38:01.442: %IGMP-3-QUERY_INT_MISMATCH: Received a non-matching query interval 125, from querier address 0.0.0.0

...

 

That "%IGMP-3-QUERY_INT_MISMATCH: Received a non-matching query interval 125, from querier address 0.0.0.0" repeats every two minutes, which I thought was odd, so I did a show ip igmp interface, and that revealed a discrepancy on VLAN 8 compared to all the other VLANs. On every other VLAN, the "Multicast designated router (DR)" matched the "IGMP querying router" and both were "(this system)"; VLAN 8 instead shows:

 

Multicast designated router (DR) is 10.2.8.1 (this system)
IGMP querying router is 0.0.0.0

 

I assume the IGMP querying router being 0.0.0.0 instead of 10.2.8.1 (this system) is the cause of all those "%IGMP-3-QUERY_INT_MISMATCH" messages? What would cause this? Looking at the VLAN configurations, nothing is jumping out at me as the source... here is show run interface vlan8:

 

interface Vlan8
description *** workstations 8.1 - 11.255 ***
ip address 10.2.3.1 255.255.255.0 secondary
ip address 10.2.8.1 255.255.252.0
ip helper-address 10.2.1.5
no ip redirects
ip pim sparse-dense-mode
end

 

And here is show run interface vlan16:

 

interface Vlan16
description *** wireless-devices .16.1 - .31.255 ***
ip address 10.2.16.1 255.255.240.0
ip helper-address 10.2.1.5
ip pim sparse-dense-mode
end

 

Importantly, would fixing this IGMP issue help with the packet loss issues I'm seeing on VLAN 8, or is this a separate problem?

9 Replies

marce1000
Hall of Fame
Hall of Fame

 

 - FYI : https://bst.cloudapps.cisco.com/bugsearch/bug/CSCts38007

 M.




When I click that link, all I get is "HTTP Status 400 – Bad Request".

Hello


@nnraymond wrote:

Our wireless clients on VLAN 16 are not affected. Our APs on VLAN 12 are not affected. Our servers on VLAN 110 are not affected. Just VLAN 8, which has our wired workstations.

 

Pinging internal hosts (Firepower internal interface, Umbrella VAs, etc.) results in 1% packet loss during low overall network activity but spikes higher (I've seen over 20%) at other times of the day, I think in relation to overall network traffic. Wired Windows PC workstations will think there is no internet randomly and report that to users who think their computers are offline.


Have you tried relocating a couple of clients onto a different VLAN and testing connectivity? If you don't experience any connectivity loss then, that should confirm it is indeed down to that specific VLAN.

Do these clients have both their wired and WLAN NICs activated at the same time?

 

Regarding IGMP, on that core switch can you post:
sh ip igmp interface vlan 8
sh ip igmp groups


Please rate and mark as an accepted solution if you have found any of the information provided useful.
This then could assist others on these forums to find a valuable answer and broadens the community’s global network.

Kind Regards
Paul

The wired clients on VLAN 8 mostly do not have wireless interfaces, so none of them are configured to connect to Wi-Fi. People are abandoning their desktops and turning to things like Chromebooks to get their work done while I try to figure this out. I'll try moving a PC to a different wired VLAN and re-testing.

 

Here is the output of show ip igmp interface vlan 8:

 

Vlan8 is up, line protocol is up
Internet address is 10.2.8.1/22
IGMP is enabled on interface
Current IGMP host version is 2
Current IGMP router version is 2
IGMP query interval is 125 seconds
IGMP configured query interval is 60 seconds
IGMP querier timeout is 250 seconds
IGMP configured querier timeout is 120 seconds
IGMP max query response time is 10 seconds
Last member query count is 2
Last member query response interval is 1000 ms
Inbound IGMP access group is not set
IGMP activity: 2417 joins, 2415 leaves
Multicast routing is enabled on interface
Multicast TTL threshold is 0
Multicast designated router (DR) is 10.2.8.1 (this system)
IGMP querying router is 0.0.0.0
No multicast groups joined by this system

 

And the output of show ip igmp groups:

 

IGMP Connected Group Membership
Group Address Interface Uptime Expires Last Reporter Group Accounted
239.255.255.254 Vlan110 1w1d 00:02:25 10.2.1.5
239.255.255.250 Vlan15 2w5d 00:02:32 10.2.15.10
239.255.255.250 Vlan110 38w6d 00:02:29 10.2.1.55
239.255.255.250 Vlan8 38w6d 00:04:11 10.2.3.208
224.0.1.1 Vlan110 38w6d 00:02:29 10.2.1.215
224.0.1.60 Vlan8 20w5d 00:04:11 10.2.3.77
224.0.1.40 Vlan7 38w6d 00:02:30 10.2.7.1
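
One thing the interface output above highlights: the configured query interval is 60 seconds, but the operational interval is 125 seconds, matching the interval in the mismatch log. Queries sourced from 0.0.0.0 usually come from a device acting as an IGMP snooping querier somewhere on the VLAN. As a hedged sketch (the interval value and the idea that this quiets the log are assumptions; the underlying fix is finding the device sourcing queries from 0.0.0.0), the intervals could be aligned on the 6509 like this:

```
configure terminal
 interface Vlan8
  ! Match the 125-second interval seen from the 0.0.0.0 querier
  ip igmp query-interval 125
 end
```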

Hello

What exactly are the users on that VLAN experiencing?

Do they have DHCP allocation, and are the addressing options they receive (DHCP server/DNS/gateway) correct?

Are you able to ping the default gateway or other network resources from the client? Does this happen to all users at the same time, or is it intermittent?

To rule out any multicast interference, disable PIM for that VLAN as an interim measure and test again.



Kind Regards
Paul
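
The interim measure suggested above could look like the following on the 6509 (a sketch assuming the Vlan8 SVI shown earlier in the thread; it can be re-enabled later with ip pim sparse-dense-mode if multicast routing on the VLAN is still needed):

```
configure terminal
 interface Vlan8
  ! Interim test: stop routing multicast on this SVI
  no ip pim sparse-dense-mode
 end
```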

DHCP is fine for all users, no issues there. I've spent some time in the building today gathering first-hand experiences, and I've learned that many computers have no trouble at all (contrary to the initial impressions I was given). The users whose computers aren't working well have either switched to Wi-Fi devices (which don't show any issues anywhere in the building) or are going to other locations in the building and using other computers when they can.

I tracked down one user who has in the past experienced so much loss of connectivity that for most of the day she couldn't use websites without her computer saying she wasn't connected to the internet every two minutes; however, she said that today it's been much better and she has been able to use her computer. I'm sitting at it right now. This is what show interface Gi1/0/11 returns on the switch her computer is connected to:

 

GigabitEthernet1/0/11 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is c067.afc1.028b (bia c067.afc1.028b)
Description: *** User Ports ***
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:08:02, output 00:00:00, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 41405
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 187000 bits/sec, 37 packets/sec
5 minute output rate 831000 bits/sec, 122 packets/sec
11046348 packets input, 4226689539 bytes, 0 no buffer
Received 154813 broadcasts (129837 multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 129837 multicast, 0 pause input
0 input packets with dribble condition detected
60490970 packets output, 14685158923 bytes, 0 underruns
0 output errors, 0 collisions, 1 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out

 

Is that 41405 total output drops something to be concerned about? Here's the configuration of that port, which is our standard user port configuration:

 

interface GigabitEthernet1/0/11
description *** User Ports ***
switchport access vlan 8
switchport voice vlan 4
spanning-tree portfast edge
end

 

I just did a constant ping test from that PC to the internal interface of the Firepower firewall, here's what I got:

 

Packets: Sent = 614, Received = 614, Lost = 0 (0% loss),

Approximate round trip times in milli-seconds:

Minimum = 0ms, Maximum = 238ms, Average = 1ms

 

Besides some odd maximums that show up periodically, no loss is good. Are those occasional high ping times something to be concerned with? This PC is connected to a 3750X switch stack which has a 10-gig fiber connection to the Catalyst 6509, and the Firepower is plugged into a 10-gig fiber connection on the Catalyst as well.

Hello

Glad to hear it's not as bad as you first thought.


@nnraymond wrote:

Is that 41405 total output drops something to be concerned about? Here's the configuration of that port, which is our standard user port configuration:

 

 

Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 41405


About 1 in 1,500 packets is getting dropped; however, this could all be historical, as the interface counters have never been cleared.

Also, it looks like the access port is hardcoded. Does it really need to be? Have you tried letting the interface negotiate its own speed/duplex settings?



Kind Regards
Paul
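
The 1-in-1,500 estimate can be sanity-checked against the counters quoted above (41405 total output drops against 60490970 packets output); a small Python sketch:

```python
def drop_ratio(drops: int, packets_out: int) -> float:
    """Fraction of outbound packets dropped on the interface."""
    return drops / packets_out

r = drop_ratio(41_405, 60_490_970)
print(f"{r:.4%}")           # roughly 0.07% of output packets
print(f"1 in {1 / r:.0f}")  # about 1 in 1461
```

So the lifetime drop rate is under a tenth of a percent, which by itself would not explain 20% ping loss; clearing the counters and re-checking during a bad period would show whether the drops are current or historical.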

That access port is not hardcoded; it's set to auto and is dynamically negotiating speed and duplex.

 

I just completed another, longer ping test to the internal Firepower interface from a different PC connected to another switch stack in the building, and this is what I got:

 

Packets: Sent = 7253, Received = 7224, Lost = 29 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 249ms, Average = 1ms

 

Not sure why some pings are getting lost, and again, there are some high maximums. What can I do to narrow those issues down?
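
One detail worth noting about the two ping runs: Windows' ping rounds the loss percentage down, so 29 lost out of 7253 still displays as "0% loss". The actual loss can be computed directly (a trivial sketch):

```python
def loss_pct(sent: int, lost: int) -> float:
    """Packet loss as a percentage of packets sent."""
    return 100.0 * lost / sent

print(loss_pct(614, 0))              # first run: 0.0
print(round(loss_pct(7253, 29), 2))  # second run: 0.4
```

A sustained ~0.4% loss on a wired LAN path is still worth chasing, even though the ping summary rounds it away.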

So doing a no ip pim sparse-dense-mode on Vlan8 has solved the performance issues, which were still sporadically severe. Why did disabling that fix it?

 

I've also been doing some cross-checking, since we are transitioning our switch closets from Cisco to Meraki, and it looks like Meraki only supports sparse mode:

 

https://documentation.meraki.com/MS/Layer_3_Switching/Troubleshooting_Layer_3_Multicast 

 

Any reason I shouldn't just turn off PIM on all VLANs at this point? I have a lurking suspicion that my predecessor set up ip pim sparse-dense-mode years ago to try to work around limitations of AirPlay across VLANs (which was ultimately solved later when Apple revised AirPlay so it worked via Bluetooth initiation followed by a non-broadcast temporary Wi-Fi network privately set up between the devices doing AirPlay).
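
If multicast routing turns out to still be needed somewhere, one commonly used alternative to removing PIM entirely (and one compatible with Meraki's sparse-mode-only support) is PIM sparse mode with a static rendezvous point. A sketch, where the RP address is a placeholder you'd replace with a loopback on the 6509:

```
configure terminal
 ! Placeholder RP address - replace with a loopback on the 6509
 ip pim rp-address 10.2.0.1
 interface Vlan16
  no ip pim sparse-dense-mode
  ip pim sparse-mode
 end
```

Sparse mode avoids the periodic flood-and-prune behavior of dense mode, which is one reason sparse-dense deployments without a working RP can degrade into flooding.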