Core switch not generating arp requests in response to ping

IntegraXP · ‎07-13-2020

Hi, I have inherited a network which is experiencing an intermittent issue. The network structure is two 4500 core switches which handle all routing between the various vlans. There are several access switches (2960s) connected to the core switches.

There is a management vlan and all switches have their management ip configured on that vlan.

What occurs is that I cannot connect to an access switch from a desktop PC. Ping/ssh fails. If I log onto the core switch (which has an IP address in the management vlan) and ping the access switch, the ping from the PC immediately starts working.

The issue also affects other 'passive' devices - boxes which don't send packets unless they are spoken to, like printers. The behaviour is always the same, regardless of which vlan the device is on. There is no communication with the device until you ping it from the core switch.

My theory is that the core switch is not sending an arp request in response to an inbound syn packet destined for a device on another interface. If a device is 'chatty' then the core switch's arp table will already contain an entry and so the communication works fine. The problem only occurs for devices which don't see much activity.
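The behaviour expected from the core can be sketched as a toy model in Python. This is only an illustration of the theory above, not of how the 4500 is implemented: the class name, the addresses and the choice to queue the packet while arp is pending are all assumptions (real platforms may drop the first packet instead of holding it).

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class RoutedInterface:
    """Toy model of one routed vlan interface with its own arp cache."""
    name: str
    arp_cache: dict = field(default_factory=dict)  # ip -> mac
    pending: dict = field(default_factory=dict)    # ip -> payloads waiting on arp
    sent: list = field(default_factory=list)       # log of frames put on the wire

    def forward(self, dst_ip, payload):
        """Route a packet out of this interface towards dst_ip."""
        mac = self.arp_cache.get(dst_ip)
        if mac is not None:
            # 'Chatty' device: the entry already exists, so forward directly.
            self.sent.append(("unicast", mac, payload))
            return "forwarded"
        # 'Quiet' device: hold the packet and broadcast one arp request.
        first_packet = dst_ip not in self.pending
        self.pending.setdefault(dst_ip, []).append(payload)
        if first_packet:
            self.sent.append(("arp-request", BROADCAST, dst_ip))
        return "arp-pending"

    def arp_reply(self, ip, mac):
        """Record the reply and release anything that was waiting on it."""
        self.arp_cache[ip] = mac
        for payload in self.pending.pop(ip, []):
            self.sent.append(("unicast", mac, payload))
```

In this model the symptom described would correspond to the arp request never reaching the quiet device (or never being sent), so the entry is only created once something else, such as a ping from the core itself, populates it.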

My problem is that I don't know what this functionality is even called, so I don't know what could be blocking it. This is not a proxy arp issue, it's much more basic.

The core of the problem:

I have a core switch and an access switch
I have a PC on vlan1 (desktops)
The core switch is configured with IPs on vlan1 and vlan2 (management) and routes between them
The access switch only has an ip on vlan2 (management)
Initially the PC cannot ping/ssh to the IP of the access switch on vlan2.
I log on to the core switch and ping the IP address of the access switch on vlan2, this works
Only then can I ping/ssh to the access switch from the PC

The real setup has a lot more vlans and the issue affects different devices on all of the vlans, but it's always the 'quiet' devices.

Any suggestions as to where to even begin looking?

Reza Sharifi · ‎07-13-2020

Hi,

Is this only happening to vlan 2 (the management vlan)? In other words, if vlan 1 is a user vlan, can the users access the Internet, or access servers via HTTP or HTTPS? In general, is this impacting user applications?

Also, how do the 2960s connect to the core, using one uplink, 2 uplinks or Portchannels?

What is the IOS version for both the core and access switches?

HTH

IntegraXP · ‎07-13-2020

The two core switches are stacked and run with hsrp on all vlan interfaces. They have portchannels configured to the access switches.

A couple of the access switches are stacked, some are not.

All configurations have two fibers running back to core1 and core2 for redundancy.

The problem occurs with both stacked and non-stacked configurations.

The issue is, as far as I can tell, from any vlan to any vlan. There aren't that many quiet devices and it doesn't happen all the time.

It *does* happen almost all the time ssh'ing to the access switches. Luckily that only affects me.

Versions:

Core - output from show ver:

IOS-XE version 03.09.00.E release (fc1)

ROM: 15.0(1r)SG12

Core - output from show ver running:

Base version 03.09.00.E

IOS 152-5.E

All the access switches are running 15.2(2)E6

Hope that helps

IntegraXP · ‎07-13-2020

The impact for users is mainly noticeable in printers which occasionally 'disappear' and stop responding.

I have also noticed that when an interface has a secondary ip configured, the problem occurs more frequently for the devices on the secondary.

For example, an interface has 192.168.100.1/24 and 192.168.200.1/24 on a single vlan. There will be problems communicating with a printer on, say, 192.168.200.234. If I move the printer to a printer vlan where the core switch only has a single ip, then the problem almost disappears, but it's still occurring.
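To make the addressing concrete, here is a small Python sketch using the standard `ipaddress` module. The addresses are the ones quoted above; the list and function names are made up for illustration. It only shows which of the interface's addresses shares a subnet with a given host, and makes no claim about how the switch itself chooses a source address.

```python
import ipaddress

# Addresses quoted in the post: one vlan interface, primary plus secondary.
INTERFACE_ADDRESSES = [
    ipaddress.ip_interface("192.168.100.1/24"),  # primary
    ipaddress.ip_interface("192.168.200.1/24"),  # secondary
]


def matching_address(host):
    """Return the interface address whose subnet contains host, or None."""
    host_ip = ipaddress.ip_address(host)
    for iface in INTERFACE_ADDRESSES:
        if host_ip in iface.network:
            return iface
    return None


# The example printer falls inside the secondary subnet.
print(matching_address("192.168.200.234"))
```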

I have one printer in just this situation at the moment. I have to leave a ping open continuously to the printer to keep it 'alive'.
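As a stopgap, the continuous ping could be scripted instead of left running in a terminal. The following is a minimal Python sketch, not a fix: the function names and the 30 second interval are arbitrary assumptions, and the interval would need tuning against whatever timer is actually expiring.

```python
import platform
import subprocess
import time


def build_ping_command(host, system=None):
    """Build a single-echo ping command for the local operating system."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", host]


def keep_alive(host, interval_s=30.0, rounds=None,
               run=subprocess.run, sleep=time.sleep):
    """Ping host every interval_s seconds and return the number of successes.

    rounds=None loops until interrupted. run and sleep are injectable so the
    loop can be exercised without touching the network.
    """
    successes = 0
    done = 0
    while rounds is None or done < rounds:
        result = run(build_ping_command(host),
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode == 0:
            successes += 1
        done += 1
        sleep(interval_s)
    return successes
```

Calling `keep_alive("192.168.200.234")` would loop until interrupted with Ctrl-C.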

brselzer · ‎07-13-2020

Hello,

There is a very common issue where we don't flood packets if we don't have the mac learned on 4500:

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvb78700

Check it out and see if it matches. Fixed in 3.9.2 or later. You might want to look at jumping to 3.11.2 as I believe that is the current recommended version.

Hope that helps!

-Bradley Selzer
CCIE# 60833

IntegraXP · ‎07-13-2020

Hmm, it *sort* of sounds like it could be the issue. What the bug refers to as an "Unknown Unicast Flood" is retransmission of broadcasts over vlan trunks, no?

The behaviour on a simple router doesn't have much to it: packet received, arp resolution, packet transmitted

Now with vlan extension over multiple switches, the core could decide to not forward broadcast packets to other switches, but why? You do that and IP breaks.
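For what it's worth, broadcast and unknown unicast are handled as separate cases on a layer 2 switch, even though both normally end up flooded. A toy Python model of the textbook behaviour is below. The `flood_unknown_unicast` flag is purely hypothetical, added to show what "not flooding when the mac isn't learned" would look like in principle; it is not a description of the actual defect in CSCvb78700, and the port names are invented.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class Layer2Switch:
    """Toy learning switch: one vlan, no spanning tree, no aging."""

    def __init__(self, ports, flood_unknown_unicast=True):
        self.ports = list(ports)
        self.mac_table = {}  # mac -> port it was last seen on
        # Hypothetical switch to model the faulty behaviour.
        self.flood_unknown_unicast = flood_unknown_unicast

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the source mac, then return the list of egress ports."""
        self.mac_table[src_mac] = in_port
        others = [p for p in self.ports if p != in_port]
        if dst_mac == BROADCAST:
            return others  # broadcast (e.g. an arp request): always flooded
        if dst_mac in self.mac_table:
            out_port = self.mac_table[dst_mac]
            return [out_port] if out_port != in_port else []
        # Unknown unicast: destination mac has not been learned (or aged out).
        return others if self.flood_unknown_unicast else []
```

In this model a quiet device is exactly the one whose mac is missing from the table, so with flooding disabled its traffic goes nowhere until the device itself transmits and gets learned again.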

This is something which is so basic to the operation of a router that I am mystified that it could have gotten through QA. It's not some esoteric functionality. The implication is that some sort of logic was developed to limit what is forwarded over trunks, but it wasn't tested... at all.

Thanks for the details of the bug. I would never have found that by searching. I'll get onto the head office people as they're the ones with the cisco maintenance details and they will have to organise any updates.