05-25-2016 08:08 AM - edited 03-08-2019 05:56 AM
Hi Folks,
Interesting issue I am troubleshooting. Back story:
Layer 3 core switch with single fiber trunk ports heading out to multiple Layer 2 access/IDF switches.
Hosts connected to various access/IDF switches are complaining about losing connectivity for 10-20 seconds at a time.
Pinging the end user's IP address is successful from all of the access switches; however, pings fail from the core switch. After some number of seconds, pings return to normal from the core as well. So an access switch on a different floor, which has to traverse the core switch, can ping the address just fine during times when the core cannot.
No logs or issues about Spanning Tree. During an "outage" the spanning tree on the core remains the same as during successful pings.
No issues with MAC addresses or ARP. During an "outage" the end point's mac address and arp entry are still correct in the core switch.
Any ideas on what to check next? This problem appears to be impacting only a handful of users in a site of 200-some.
05-25-2016 08:30 AM
Do all those handful of users suffer from the issue simultaneously? Are they connected to the same switch, or in the same vlan? When the issue occurs, can they ping one another?
Is the addressing scheme something like "affected host IP A1.B1.C1.D1 on vlan V1 with gateway A1.B1.C1.1 on the core" and "access switch IP A2.B2.C2.D2 on (management) vlan V2 with gateway A2.B2.C2.1 also on the core"?
05-25-2016 08:36 AM
Good question, and one I should have addressed up front. No, they do not suffer the outage at the same time. Users are connected to various IDF switches (same VLAN), but I am focusing on two at the moment. When the issue occurs, they appear to be able to ping all addresses in their VLAN.
Address scheme is simple:
Core vlan1 - 172.16.204.1 255.255.252.0
Host 1 - 172.16.207.147 255.255.252.0 gateway 172.16.204.1
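As a quick sanity check on the addressing above, here is a minimal Python sketch (standard-library ipaddress module only) that confirms the host and the core VLAN interface fall in the same /22, so traffic between them should be switched at Layer 2 rather than routed. The variable names are illustrative, not from any configuration.

```python
import ipaddress

# Addresses and mask exactly as posted in the thread.
core = ipaddress.ip_interface("172.16.204.1/255.255.252.0")
host = ipaddress.ip_interface("172.16.207.147/255.255.252.0")

# If the scheme is consistent, both interfaces resolve to the same network.
same_subnet = core.network == host.network

print(core.network)   # network containing the core VLAN interface
print(host.network)   # network containing Host 1
print(same_subnet)    # True when host and gateway share a subnet
```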
05-25-2016 08:41 AM
You probably already ruled this out, but just in case, I would look at the fiber link interface statistics on both ends (the core and the access layer switch). You might be dealing with some unidirectional drops or drops related to packets of greater than a certain size.
05-25-2016 09:03 AM
uplink interfaces look good/clear on both ends
05-25-2016 08:52 AM
Hello
Are these affected users on different VLANs?
When you ping from the core, are you sourcing from different L3 interfaces? If so, does it time out on any interface or just a particular one?
Are you using any static addressing?
What's the CPU/memory utilization of the core?
Are you pruning the trunks?
What type of core is it (stacked, VSS, etc.)?
Are you running any first hop redundancy protocol (HSRP, VRRP, GLBP)?
Could you please post the running config of the core and an affected closet switch?
res
paul
05-25-2016 09:09 AM
No...same vlan.
Yes...core cannot ping no matter what vlan (only 4) it is sourced from
Not on this vlan
11% cpu 363408580 free memory
No trunk pruning
Two stacked 3750X's
No routing protocols
05-25-2016 10:16 AM
Hello
So between hosts in the same VLAN there is no L3 routing involved,
and you say all the stats look clean as well.
I would start looking at the hosts: software firewall, AV scanning, infection, etc.
Have you tried disabling all non-Microsoft services on a host to see whether that has an effect?
res
paul
05-25-2016 11:53 AM
More digging via Wireshark shows that the host is receiving the ICMP requests from the core but not replying. However, during the "outage" the host and many other devices are issuing ARP broadcasts looking to resolve the default gateway (the IP of VLAN 1 on the core). After several (seemingly random) seconds (up to 30 or so) I see the core reply to the ARP request, and then pings/traffic flow normally.
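To put numbers on how long the gateway takes to answer, something like the following Python sketch could be run against the capture. It assumes the ARP frames have already been exported from Wireshark into simple records; the ArpEvent type, the gateway_arp_delays function, and the sample timestamps are all hypothetical and for illustration only, not taken from the actual capture.

```python
from typing import Iterable, List, NamedTuple, Optional


class ArpEvent(NamedTuple):
    time: float      # capture timestamp in seconds
    op: str          # "request" or "reply"
    sender_ip: str   # IP of the device that sent the frame
    target_ip: str   # IP being resolved (request) or answered (reply)


def gateway_arp_delays(events: Iterable[ArpEvent], gateway_ip: str) -> List[float]:
    """For each run of unanswered requests for gateway_ip, return the seconds
    between the first request in the run and the gateway's next reply."""
    delays: List[float] = []
    first_request: Optional[float] = None
    for ev in sorted(events, key=lambda e: e.time):
        if ev.op == "request" and ev.target_ip == gateway_ip:
            # Remember only the first request of an unanswered run.
            if first_request is None:
                first_request = ev.time
        elif ev.op == "reply" and ev.sender_ip == gateway_ip:
            if first_request is not None:
                delays.append(ev.time - first_request)
                first_request = None
    return delays


# Made-up sample: two requests for the gateway, answered 12.5 s after the first.
sample = [
    ArpEvent(0.0, "request", "172.16.207.147", "172.16.204.1"),
    ArpEvent(1.0, "request", "172.16.207.147", "172.16.204.1"),
    ArpEvent(12.5, "reply", "172.16.204.1", "172.16.207.147"),
]
print(gateway_arp_delays(sample, "172.16.204.1"))  # [12.5]
```

Delays that cluster in the 10-30 second range across many hosts would line up with the outage durations users are reporting and point at the core's ARP handling rather than at any individual host.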
05-26-2016 12:37 AM
Is any broadcast storm control configured on the switches?
05-31-2016 06:51 AM
After finding the ARP issues via Wireshark we decided to stop burning time troubleshooting. A reboot of the switch resolved the issue.