
Access switches can ping endpoint, core cannot.

brnhornt
Level 1

Hi Folks,

Interesting issue I am troubleshooting.  Back story:

Layer 3 Core switch with single fiber trunk ports heading out to multiple Layer 2 access/idf switches.  

Hosts connected to various access/idf switches are complaining about losing connectivity for 10-20 seconds at a time.

Pinging the end user's IP address is successful from all of the access switches...however pings fail from the core switch.  After so many seconds...pings return to normal from the core as well.  So an access switch on a different floor, which has to traverse through the core switch, can ping the address just fine during times that the core cannot.

No logs or issues about Spanning Tree.  During an "outage" the spanning tree on the core remains the same as during successful pings.  

No issues with MAC addresses or ARP.  During an "outage" the end point's mac address and arp entry are still correct in the core switch.

Any ideas on what to check next?  This problem appears to be impacting only a handful of users in a site of 200-some.

10 Replies

Iulian Vaideanu
Level 4

Do all those handful of users suffer from the issue simultaneously?  Are they connected to the same switch, or in the same vlan?  When the issue occurs, can they ping one another?

Is the addressing scheme something like "affected host IP A1.B1.C1.D1 on vlan V1 with gateway A1.B1.C1.1 on the core" and "access switch IP A2.B2.C2.D2 on (management) vlan V2 with gateway A2.B2.C2.1 also on the core"?

Good question, and one I should have covered up front.  No, they do not suffer the outage at the same time.  Users are connected to various IDF switches (same VLAN) but I am focused in on two at the moment.  When the issue occurs they appear to be able to ping all addresses in their VLAN.

Address scheme is simple:

Core vlan1 - 172.16.204.1 255.255.252.0

Host 1 - 172.16.207.147 255.255.252.0 gateway 172.16.204.1
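
As a quick sanity check on that addressing, here is a minimal sketch using Python's standard `ipaddress` module. The helper name `same_subnet` is made up for illustration; the addresses and masks are the ones posted above. It only confirms that the host and the core's vlan1 interface sit in the same /22, so a ping from the core to the host is resolved by ARP directly and is not routed.

```python
import ipaddress


def same_subnet(a: str, b: str) -> bool:
    """Return True if two 'address/netmask' strings fall in the same network.

    Hypothetical helper for illustration only.
    """
    net_a = ipaddress.ip_interface(a).network
    net_b = ipaddress.ip_interface(b).network
    return net_a == net_b


# Values taken from the post: core vlan1 and Host 1, both with 255.255.252.0.
core = "172.16.204.1/255.255.252.0"
host = "172.16.207.147/255.255.252.0"

print(ipaddress.ip_interface(host).network)
print(same_subnet(core, host))
```

Both addresses land in 172.16.204.0/22 (172.16.204.0 through 172.16.207.255), so the gateway and the host are on the same segment.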

vantipov
Level 1

You probably already ruled this out, but just in case, I would look at the fiber link interface statistics on both ends (the core and the access layer switch).  You might be dealing with some unidirectional drops or drops related to packets of greater than a certain size.

uplink interfaces look good/clear on both ends

Hello

Are these affected users on different VLANs?

When you ping from the core, are you sourcing from different L3 interfaces, and if so, does it time out on any interface or just a particular one?

Are you using any static addressing?

What's the CPU/memory utilization of the core?

Are you pruning the trunks?

What type of core is it (stacked, VSS, etc.)?

Are you running any first hop redundancy protocol (HSRP, VRRP, GLBP)?

Could you please post the running config of the core and an affected closet switch?

res

paul


Please rate and mark as an accepted solution if you have found any of the information provided useful.
This then could assist others on these forums to find a valuable answer and broadens the community’s global network.

Kind Regards
Paul

No...same vlan.

Yes...core cannot ping no matter what vlan (only 4) it is sourced from

Not on this vlan

11% cpu 363408580 free memory

No trunk pruning

Two stacked 3750X's

No routing protocols

Hello

So between hosts in the same VLAN there is no L3 routing involved.

All stats look clean as well, you say.

I would start looking at the hosts: software firewall, AV scanning, infection, etc.

Have you tried disabling all non-MS services on a host to see whether that has an effect?

res

paul



More digging via Wireshark shows that the host is receiving the ICMP requests from the core but not replying.  However...during the "outage" the host and many other devices are issuing ARP broadcasts looking to resolve the default gateway (IP of vlan 1 of the core)...after several (seemingly random) seconds (up to 30ish) I see the core reply to the ARP request and then pings/traffic flow normally.
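
For anyone wanting to put numbers on that ARP delay from a capture, here is a rough sketch in plain Python. It assumes the ARP packets have already been exported from the capture into simple `(timestamp, kind, ip)` tuples; the function name `arp_reply_delays` and the sample timestamps are made up for illustration and are not real capture data. Only the gateway address 172.16.204.1 comes from this thread.

```python
from typing import Iterable, List, Tuple

# (timestamp in seconds, "request" or "reply", IP address being resolved)
ArpEvent = Tuple[float, str, str]


def arp_reply_delays(events: Iterable[ArpEvent], gateway_ip: str) -> List[float]:
    """Measure how long ARP requests for gateway_ip went unanswered.

    For each run of requests, return the seconds between the first
    unanswered request and the reply that ends the run.
    """
    delays: List[float] = []
    first_unanswered = None
    for ts, kind, ip in sorted(events):
        if ip != gateway_ip:
            continue  # ignore ARP traffic for other addresses
        if kind == "request" and first_unanswered is None:
            first_unanswered = ts
        elif kind == "reply" and first_unanswered is not None:
            delays.append(ts - first_unanswered)
            first_unanswered = None
    return delays


# Made-up timestamps, for illustration only.
sample = [
    (0.0, "request", "172.16.204.1"),
    (1.0, "request", "172.16.204.1"),
    (2.0, "request", "172.16.204.1"),
    (28.5, "reply", "172.16.204.1"),
    (100.0, "request", "172.16.204.1"),
    (100.5, "reply", "172.16.204.1"),
]
print(arp_reply_delays(sample, "172.16.204.1"))
```

Delays of many seconds, as opposed to the usual few milliseconds, would line up with the outage windows the users are reporting.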

Any broadcast storm control configured on the switches?

brnhornt
Level 1

After finding the ARP issues via Wireshark we decided to stop burning time troubleshooting.  A reboot of the switch resolved the issue.
