09-11-2011 09:05 PM - edited 03-07-2019 02:09 AM
Hey all,
We have a very odd issue. Every now and again a switch will not be pingable: it drops about 10 pings and then the pings come back. During this time traffic is not affected at all, only the pings.
It seems that until the switch is pinged it will appear down (our monitoring software tells us it's down); once a few pings are sent, the switch comes back.
Any ideas?
Thanks
09-12-2011 12:03 AM
Check:
1) ARP timeouts
2) Physical uplinks
3) Importantly, *** Spanning-Tree ***
HTH
09-12-2011 12:08 AM
Thanks,
All our switches have only one link back to the root bridge, so I don't think it can be spanning-tree.
The uplinks seem to stay up when the switches are unpingable.
How do I check ARP timeouts?
Thanks
09-12-2011 12:11 AM
I assume that your root bridge is handling all Layer 3 routing? If so, check your config on the SVIs. If you have a router, then check the router interface.
As for the design: in a switched environment, if you only have one uplink, you have a single point of failure.
09-12-2011 12:15 AM
The show interfaces EXEC command displays the ARP timeout value. The value follows the "Entry Timeout:" heading, as seen in the following example from the show interfaces command:
ARP type: ARPA, PROBE, Entry Timeout: 14400 sec
Default is 4 hours (14400 sec)
Can you also check the diagnostics once?
If a diagnostic test has failed, then that could be the reason for the issue.
Sweta
Please rate useful posts.
09-12-2011 12:24 AM
The ARP timeout is at 4:00:00 (4 hours).
How do I check the diagnostics?
Andrew: the root bridge isn't doing the routing, as we have a Nortel L3 switch. I will check that.
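As a quick sanity check on the numbers in this thread: the 4:00:00 reported above is the same value as the 14400 sec default, just printed as h:mm:ss. A minimal Python sketch of the conversion (the helper name `hms_to_seconds` is made up for illustration):

```python
def hms_to_seconds(text: str) -> int:
    """Convert an 'h:mm:ss' timeout string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in text.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    # The value reported in this thread.
    print(hms_to_seconds("4:00:00"))
```

So the timeout is still at the default, which by itself neither confirms nor rules out ARP as the cause.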
09-12-2011 02:13 AM
show diagnostic result module all - for 6500 switch
09-12-2011 11:02 AM
Any chance that you have a duplicate address somewhere?
09-12-2011 05:07 PM
Sweta: unfortunately we don't have 6500 Cisco switches, only L2 Cisco switches (2950, 2960).
Glen: no chance of that.
The only thing I can think of is that the management VLAN SVI is shutting down, as all other traffic works fine except the pings. Is there a way to check how long an SVI has been up for?
09-12-2011 07:04 PM
Hi Scott, I'm not too sure whether you can check how long the management interface has been up for, but the logs will surely tell you if the interface has gone up or down, and when that happened. Perhaps try to match the times that you could not ping the switches with the logs of your Layer 2 and Layer 3 switches.
Make sure that all switches have their clocks synchronized with an NTP server.
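If it helps, here is a rough Python sketch of that matching step. It assumes you already have timestamped ping results and the timestamps of your log entries as `datetime` values; the function names, the 30-second slack, and the sample data are all made up for illustration, and it only makes sense once the clocks are in sync.

```python
from datetime import datetime, timedelta


def outage_windows(samples):
    """Group consecutive failed pings into (first_failure, last_failure) windows.

    samples: iterable of (datetime, bool) pairs in time order,
    where the bool is True when the ping got a reply.
    """
    windows = []
    start = last = None
    for when, ok in samples:
        if not ok:
            if start is None:
                start = when
            last = when
        elif start is not None:
            windows.append((start, last))
            start = last = None
    if start is not None:  # still failing at the end of the data
        windows.append((start, last))
    return windows


def logs_near(window, log_times, slack=timedelta(seconds=30)):
    """Return the log timestamps that fall within `slack` of an outage window."""
    begin, end = window
    return [t for t in log_times if begin - slack <= t <= end + slack]


if __name__ == "__main__":
    t0 = datetime(2011, 9, 12, 19, 0, 0)  # made-up sample data
    samples = [(t0 + timedelta(seconds=i), not (3 <= i <= 12)) for i in range(20)]
    log_times = [t0 + timedelta(seconds=5), t0 + timedelta(minutes=10)]
    for window in outage_windows(samples):
        print(window, logs_near(window, log_times))
```

An outage window with no log entries near it on either the Layer 2 or Layer 3 side would suggest the interface itself never went down.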
Cheers,
Fabio
09-12-2011 07:06 PM
Also, when you SSH/telnet into the switches, run the "term mon" command. It will shed some light.
09-12-2011 11:59 PM
Thanks Fabio, I checked the logs and there is nothing. What does term mon do? I haven't been able to get any info from it.
09-13-2011 02:03 AM
Hey Scott,
term mon is short for terminal monitor; it sends the switch's log messages to your current telnet/SSH session. You will find more information about it here:
https://learningnetwork.cisco.com/thread/8961
Is it only your NMS that's not able to ping a switch, or have you also confirmed that from another machine, e.g. desktops/servers? In case you have not, I suggest doing so.
I have seen NMS systems trigger false alarms before, and I would not be surprised if that's the case in your scenario.
If it is your NMS:
What NMS are you using? Is it customizable in some way? Maybe you could tweak the settings so it does not trigger false alarms.
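To confirm from a second machine, something like the Python sketch below would give you a timestamped record that is independent of the NMS. It is only a sketch: the function names are made up, it shells out to the operating system's own `ping` (`-c` on Linux/macOS, `-n` on Windows), and it treats exit status 0 as a reply, which can be misleading on some Windows versions.

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_command(host, system=None):
    """Build a one-shot ping command for the local OS."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", host]


def ping_once(host, run=subprocess.run):
    """Return True if a single ping exits with status 0."""
    result = run(ping_command(host), stdout=subprocess.DEVNULL,
                 stderr=subprocess.DEVNULL)
    return result.returncode == 0


def watch(host, interval=1.0, rounds=60):
    """Print one timestamped line per probe."""
    for _ in range(rounds):
        status = "reply" if ping_once(host) else "NO REPLY"
        print(f"{datetime.now().isoformat(timespec='seconds')} {host} {status}")
        time.sleep(interval)
```

Call `watch()` with the switch's management IP and leave it running across one of the periods when the NMS reports the switch as down, then compare the two records.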
HTH
Cheers,
Fabio
09-13-2011 02:08 AM
Hi Fabio,
No, it seems to be other computers/servers as well. It seems to stay down until pinged.
09-13-2011 04:57 AM
Just to be sure I understand you Scott, when you say other computers/servers as well, are you saying that those other servers are down or that the other servers notice that the switch is down?
What type of physical connection do you have to this switch? If it's copper, it may be worth your while to run 'test cable-diagnostics tdr int fa#/#' and then 'show cable-diagnostics tdr int fa#/#', if these are supported on your image. This should help you rule out physical trouble. If it's fiber, it may be worth the trouble to check dB loss on the strands with a Fluke if you have one.
These are not UP/DOWN tests; they are meant to show whether traffic is being lost to link trouble that doesn't actually bring down the link but is enough to cause performance problems.