08-26-2015 02:18 PM - edited 03-08-2019 01:32 AM
My network infrastructure is set up with several 3750-Xs as my switch stack. From there I have several satellite 3650 switches that connect back to the core via two fiber pairs set up as a port channel. To manage my switches I have a separate management VLAN.
Here's my issue: while working on something unrelated I noticed I could no longer SSH to one of my switches from either my Linux machine or my Win 7 machine (neither of these machines has an IP in the management VLAN). This switch trunks back to the core switch stack (like several others) and also has two switches that trunk into it to get back to the switch stack (they're "farther out," so to speak). I can SSH into those just fine, then SSH "back" into the switch I can no longer SSH into from my desktop machines.
At first I couldn't even SSH into the problem switch from other switches "closer" to the core switch stack including the core switch stack itself, then (through no change I made, and I'm the only one who should be working on these switches) suddenly I could.
Troubleshooting this further:
-I can ping all of the management IPs from my desktops besides the problem switch
-I can ping the problem switch's management IP from all of my switches, even when I couldn't connect into it from some of them
-SSH debug shows nothing helpful
-All switches have the same version of SSH
-Checking the allowed VLANs, the management VLAN is allowed on the trunk heading to the problem switch from the core stack
-I keep versioning history on all of my switches and this switch's config hasn't been changed for at least 4 weeks. Even then, none of the changes made to the problem switch or the core switch within the last year have had anything to do with SSH communications, and I know I've SSH'd into the switch recently with no issue.
This one is a head scratcher for me. Any help is appreciated.
08-27-2015 02:16 PM
Adam,
Thank you for the configurations. To me, it appears that the configuration contains a grave error in that the ip default-gateway is configured to be an IP address that is outside the IP network in the management VLAN. If the IP subnet in the management VLAN is 20.1.200.0/24 then I believe we both agree on the fact that the IP address 20.1.1.1 does not belong to this network, and for each address that is outside 20.1.200.0/24, you need to use a gateway. You surely see that this is a chicken-and-egg problem: To reach the "gateway" at 20.1.1.1, you need a gateway!
You have mentioned that in the management VLAN, there is a device 20.1.200.1 you have called the management VLAN gateway. Can you configure the problem switch with ip default-gateway 20.1.200.1 instead of the existing 20.1.1.1 and see if anything changes?
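The on-link check described above can be sketched in a few lines of Python using the standard `ipaddress` module. This is only an illustration: the /24 mask is the assumption stated above, the interface address 20.1.200.5 is the one reported for the problem switch, and `is_on_link` is a hypothetical helper name.

```python
import ipaddress

def is_on_link(gateway: str, interface: str) -> bool:
    """Return True if `gateway` lies inside the subnet of `interface`.

    `interface` is given as 'address/prefix', e.g. '20.1.200.5/24'.
    """
    iface = ipaddress.ip_interface(interface)
    return ipaddress.ip_address(gateway) in iface.network

# Assumed /24 management subnet on VLAN 200
print(is_on_link("20.1.1.1", "20.1.200.5/24"))    # False: not reachable without another gateway
print(is_on_link("20.1.200.1", "20.1.200.5/24"))  # True: directly reachable in the management VLAN
```

A gateway for which this returns False cannot be reached directly from the management VLAN interface, which is the chicken-and-egg situation described above.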
Best regards,
Peter
08-26-2015 02:46 PM
Hello
Can you SSH from the switch attaching the linux / Win 7 machines?
res
Paul
08-27-2015 07:53 AM
Paul, if you're asking if I can SSH from my core switch stack to the problem switch, the answer is yes.
08-26-2015 06:56 PM
Adam,
A couple of questions in addition to Paul's:
For some reason, it appears that the problem switch had troubles communicating with someone outside its management VLAN, so I am first focusing on the basic IP settings of the switch at this moment. Please note that with no ip routing and without ip default-gateway, the switch will resort to Proxy ARP client operation which could account for the intermittent connectivity.
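To illustrate what Proxy ARP client operation means here, the following Python sketch is a simplified model (not the actual IOS logic) of which address a non-routing device sends its ARP request for. The function name `arp_target`, the /24 mask, and the workstation address 10.0.0.50 are assumptions made up for the example.

```python
from ipaddress import ip_address, ip_interface

def arp_target(dest, interface, default_gateway=None):
    """Return the address a non-routing host would ARP for to reach `dest`.

    Simplified model: on-link destinations are resolved directly,
    off-link destinations go via the default gateway if one is set,
    otherwise the host ARPs for the remote address itself and depends
    on some router answering on its behalf (Proxy ARP).
    """
    iface = ip_interface(interface)
    dest_ip = ip_address(dest)
    if dest_ip in iface.network:
        return dest_ip                      # same subnet: ARP for the destination
    if default_gateway is not None:
        return ip_address(default_gateway)  # off-link: ARP for the gateway
    return dest_ip                          # no gateway: rely on Proxy ARP

print(arp_target("20.1.200.1", "20.1.200.5/24"))               # 20.1.200.1
print(arp_target("10.0.0.50", "20.1.200.5/24"))                # 10.0.0.50 (Proxy ARP case)
print(arp_target("10.0.0.50", "20.1.200.5/24", "20.1.200.1"))  # 20.1.200.1
```

In the Proxy ARP case, connectivity depends entirely on whether a router on the segment chooses to answer, which is why it can come and go.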
Best regards,
Peter
08-27-2015 08:08 AM
Peter,
1. My switches are configured with ip default-gateway
2. I cannot ping the problem switch's management IP at any time since I started troubleshooting this
3. I get a "connection timed out" error when running that command.
08-27-2015 08:09 AM
Further information, the status of the management VLAN interface on the switch is "up and up".
08-27-2015 08:50 AM
Adam,
Thanks for the answers! Okay, and how about the opposite direction?
Thanks!
Best regards,
Peter
08-27-2015 09:47 AM
1. The problem switch can ping its default gateway and any active address in the management VLAN
2. The problem switch can't ping my workstation, or any subnet outside of the management VLAN which is curious.
3. Sh ip redirect shows nothing being redirected.
08-27-2015 09:52 AM
Adam,
2. The problem switch can't ping my workstation, or any subnet outside of the management VLAN which is curious.
This is curious indeed. Assuming that the problem switch can ping its default gateway, can it also ping another interface on its default gateway, i.e. can it ping another IP address configured on the device acting as the default gateway?
Can you also test the obligatory traceroute from the problem switch toward your Linux host to see if the problem switch truly uses the intended default gateway, and see where it gets stuck?
Finally, is the default gateway IP address a real IP address, or is it a virtual IP address provided by HSRP, VRRP, or GLBP?
Best regards,
Peter
08-27-2015 10:08 AM
Here's the strange thing I just realized: the management VLAN gateway is on the same device as the PC VLAN gateway. So I can ping one of two VLAN interfaces on the same switch. It's almost like there's a translation problem, but only for one switch, and that switch has had no changes made to it.
The gateway IP address is assigned to a virtual VLAN interface.
08-27-2015 10:28 AM
Adam,
I've got a large list here for you but we need to narrow down the problem. So far, it appears to be limited either to the problem switch or to its default gateway.
Thanks!
Best regards,
Peter
08-27-2015 11:28 AM
1. The problem switch can ping the management VLAN gateway, and I just realized the switch cannot ping its own default gateway, even though the switch behind/"farther" away from the core stack can ping it just fine. See #5 for traceroute results.
2. sh ip route on a problem switch and working switch look the same, shows the correct default gateway, no routes in the table, and ICMP redirect cache is empty
3. I did not mean to be misleading here, there's no NATting going on. I was referring to the process of information moving from VLAN to another as "translating" which I realize is probably not technically correct.
4. The only ACLs we have are on snmp access.
5. Traceroute from a good switch to its default gateway shows step 1 hitting the management VLAN gateway, which is on the same switch as the default gateway, which is the same gateway for my PC; step 2 is my PC. Traceroute from the problem switch to its default gateway doesn't even show it hitting the management VLAN; it just blank traces for 30 steps then quits.
6. There is too much traffic on my network right now to verify any specific pings. But based on the results from #5, it looks like when my problem switch tries to go to any VLAN besides the management VLAN (including its own default gateway), it doesn't know where to go. Working switches, by contrast, still know to go to the default gateway to get out when pinging things outside the management VLAN.
What makes this more confusing is that the problem switch is a production switch with at least 20 machines communicating with several different subnets without an issue.
08-27-2015 11:30 AM
Hi Adam,
Regarding 1) - I am somewhat confused. You wrote: "The problem switch can ping the management VLAN gateway, and I just realized the switch cannot ping its own default gateway." You make a distinction between the management VLAN gateway and a default gateway? What does it mean when you say that the problem switch can ping the management VLAN gateway and cannot ping its own default gateway? There can be only one gateway for a switch in the no ip routing mode.
Regarding 6) - You could make the ACL specific to the IP address of the problem switch, e.g.:
access-list 199 permit icmp host problem-switch-IP any
This would make sure that the debugs are only related to your pinging from the problem switch.
Additional question: Is the problem switch properly able to resolve the MAC address of its default gateway in the ARP table? Does the show ip arp contain the same MAC address for the same default gateway IP address both on a working switch and on the problem switch?
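If you save the show ip arp output from both switches, the comparison can be done mechanically. The Python sketch below assumes the usual IOS column layout (Protocol, Address, Age, Hardware Addr, Type, Interface); the sample outputs and MAC addresses are made up purely for illustration and are not taken from any real device.

```python
def arp_table(show_ip_arp_output: str) -> dict:
    """Map IP address -> hardware address from 'show ip arp' text."""
    table = {}
    for line in show_ip_arp_output.splitlines():
        fields = line.split()
        # Data rows start with 'Internet'; the header row is skipped.
        if len(fields) >= 4 and fields[0] == "Internet":
            table[fields[1]] = fields[3].lower()
    return table

def same_gateway_mac(output_a: str, output_b: str, gateway: str) -> bool:
    """True only if both outputs resolve `gateway` to the same MAC."""
    mac_a = arp_table(output_a).get(gateway)
    mac_b = arp_table(output_b).get(gateway)
    return mac_a is not None and mac_a == mac_b

# Hypothetical sample outputs (invented addresses)
WORKING = """Protocol  Address     Age (min)  Hardware Addr   Type  Interface
Internet  20.1.200.1         12  0011.2233.4455  ARPA  Vlan200
"""
PROBLEM = """Protocol  Address     Age (min)  Hardware Addr   Type  Interface
Internet  20.1.200.1          0  Incomplete      ARPA
"""

print(same_gateway_mac(WORKING, WORKING, "20.1.200.1"))  # True
print(same_gateway_mac(WORKING, PROBLEM, "20.1.200.1"))  # False
```

An Incomplete entry or a differing MAC on the problem switch would both show up as a mismatch.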
Best regards,
Peter
08-27-2015 11:51 AM
Peter, for example, the switch has a default gateway of 20.1.1.1. The management VLAN is VLAN 200. The switch also has a VLAN 200 interface address of 20.1.200.5. The management VLAN's gateway is 20.1.200.1. The problem switch can ping 20.1.200.1, but not 20.1.1.1.
08-27-2015 11:39 AM
Additional note, 5 weeks ago this switch stopped reporting versioning information. Looking back 5 weeks ago on my core switch, the only change that's possibly relevant is the passive-interface management VLAN that I removed from the core switch's EIGRP statement for troubleshooting purposes.
I've just added it back in, in hopes that was the issue. It's been 15 minutes and I still cannot SSH in, so maybe I just need to give it more time?
That also supposes that for some reason it only affected one switch instead of all twelve that are currently in production. Also, we need to take into account that none of the other switches besides the core stack have EIGRP on them, so I'm not sure why that would even affect them.