Solved: Cisco 3750/3650 switch SSH issue

Adam Hudson · 08-26-2015

My network infrastructure is set up with several 3750-Xs as my switch stack. From there I have several satellite 3650 switches that connect back to the core via two fiber pairs set up as a port channel. To manage my switches I have a separate management VLAN.

Here's my issue: while working on something unrelated, I noticed I could no longer SSH to one of my switches from either my Linux machine or my Win 7 machine (neither of these machines has an IP in the management VLAN). This switch trunks back to the core switch stack (like several others) and also has two switches that trunk into it to get back to the switch stack (they're "farther out", so to speak). I can SSH into those just fine, then SSH "back" into the switch I can no longer reach from my desktop machines.

At first I couldn't even SSH into the problem switch from other switches "closer" to the core switch stack, including the core switch stack itself. Then, through no change I made (and I'm the only one who should be working on these switches), suddenly I could.

Troubleshooting this further:
- I can ping all of the management IPs from my desktops except the problem switch's
- I can ping the problem switch's management IP from all of my switches, even when I couldn't connect to it from some of them
- SSH debug shows nothing helpful
- All switches run the same version of SSH
- The management VLAN is allowed on the trunk from the core stack to the problem switch
- I keep version history on all of my switches, and this switch's config hasn't changed in at least 4 weeks. Even then, none of the changes made to the problem switch or the core switch within the last year have had anything to do with SSH, and I know I've SSH'd into the switch recently with no issue.

This one is a head scratcher for me. Any help is appreciated.

Peter Paluch · 08-27-2015

Adam,

Thank you for the configurations. To me, it appears that the configuration contains a grave error in that the ip default-gateway is configured to be an IP address that is outside the IP network in the management VLAN. If the IP subnet in the management VLAN is 20.1.200.0/24 then I believe we both agree on the fact that the IP address 20.1.1.1 does not belong to this network, and for each address that is outside 20.1.200.0/24, you need to use a gateway. You surely see that this is a chicken-and-egg problem: To reach the "gateway" at 20.1.1.1, you need a gateway!
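The chicken-and-egg argument can be checked mechanically: a switch running with no ip routing can only hand a frame to a next hop that is on-link, i.e. inside the subnet of its management SVI. The Python sketch below uses the standard `ipaddress` module to test that. The SVI address 20.1.200.5 is the one Adam posts later in the thread; the /24 mask is an assumption taken from Peter's "if the subnet is 20.1.200.0/24", and the helper name `gateway_is_on_link` is hypothetical.

```python
import ipaddress


def gateway_is_on_link(svi_cidr: str, gateway: str) -> bool:
    """Return True if `gateway` lies inside the subnet of the SVI address `svi_cidr`.

    Hypothetical helper modelling the rule that a host-mode switch
    (no ip routing) can only ARP for a gateway on its own subnet.
    """
    svi = ipaddress.ip_interface(svi_cidr)  # e.g. 20.1.200.5/24 -> network 20.1.200.0/24
    return ipaddress.ip_address(gateway) in svi.network


# Assumed management SVI: 20.1.200.5 with a /24 mask.
SVI = "20.1.200.5/24"

for gw in ("20.1.1.1", "20.1.200.1"):
    if gateway_is_on_link(SVI, gw):
        status = "on-link, usable as ip default-gateway"
    else:
        status = "off-link, would itself need a gateway"
    print(f"{gw}: {status}")
```

Under these assumptions, 20.1.1.1 is reported as off-link and 20.1.200.1 as on-link, which matches Peter's suggestion to point ip default-gateway at 20.1.200.1.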

You have mentioned that in the management VLAN, there is a device 20.1.200.1 you have called the management VLAN gateway. Can you configure the problem switch with ip default-gateway 20.1.200.1 instead of the existing 20.1.1.1 and see if anything changes?

Best regards,
Peter


paul driver · 08-26-2015

Hello

Can you SSH from the switch that the Linux / Win 7 machines attach to?

res

Paul

Adam Hudson · 08-27-2015

Paul, if you're asking if I can SSH from my core switch stack to the problem switch, the answer is yes.

Peter Paluch · 08-26-2015

Adam,

A couple of questions in addition to Paul's:

1. Does the problem switch have a default gateway configured? If it is configured with ip routing, then the default route must either be configured using ip route 0.0.0.0 0.0.0.0 or learned via a routing protocol. If no ip routing is configured, then the default gateway must be configured using the ip default-gateway command. These approaches are not interchangeable (ip default-gateway will be ignored with ip routing; with no ip routing, ip route 0.0.0.0 0.0.0.0 probably won't be accepted at all).
2. During the time when the problem switch was not SSH-able from your Linux machine, could you at least ping it successfully?
3. What does ssh -v problem-switch-IP say on your Linux machine? Any interesting clues there?
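The rule in the first question above (which default setting a switch honors depends on whether ip routing is enabled) can be written down as a tiny decision function. This is a hypothetical, simplified model of the behavior Peter describes, not Cisco code; the function name and the example addresses are illustrative.

```python
from typing import Optional


def effective_default(ip_routing: bool,
                      default_gateway: Optional[str],
                      default_route: Optional[str]) -> Optional[str]:
    """Hypothetical model of which default setting a switch honors.

    default_gateway: value of `ip default-gateway`, if configured.
    default_route:   next hop of `ip route 0.0.0.0 0.0.0.0 ...` or of a
                     default learned from a routing protocol, if any.
    Returns the next hop used for off-subnet traffic, or None if there is none.
    """
    if ip_routing:
        # With `ip routing`, `ip default-gateway` is ignored.
        return default_route
    # With `no ip routing`, only `ip default-gateway` applies; a default route
    # is not used (Peter expects it probably would not even be accepted).
    return default_gateway


# Hypothetical values for illustration.
print(effective_default(False, "20.1.200.1", None))  # 20.1.200.1
print(effective_default(True, "20.1.200.1", None))   # None: gateway ignored
```

The second call shows the trap the question is probing: turning on ip routing silently makes an ip default-gateway line irrelevant.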

For some reason, it appears that the problem switch had trouble communicating with hosts outside its management VLAN, so I am focusing first on the basic IP settings of the switch. Please note that with no ip routing and without ip default-gateway, the switch will resort to Proxy ARP client operation, which could account for the intermittent connectivity.
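To make the Proxy ARP point concrete, the sketch below models which IP address a host-mode switch would ARP for when sending to a given destination. It is a simplified, hypothetical model: on-subnet destinations are ARPed for directly, off-subnet destinations go to an on-link default gateway, and with no gateway at all the switch ARPs for the destination itself and relies on a router answering via Proxy ARP. What IOS actually does with an off-link ip default-gateway is not modeled; the sketch simply treats such a gateway as unresolvable (the chicken-and-egg case). The function name and addresses are illustrative assumptions.

```python
import ipaddress
from typing import Optional


def arp_target(svi_cidr: str, default_gateway: Optional[str], dst: str) -> Optional[str]:
    """Which IP a switch in `no ip routing` mode would ARP for to reach `dst`.

    Simplified, hypothetical model:
      * dst on the SVI subnet         -> ARP for dst directly
      * dst off-subnet, no gateway    -> ARP for dst anyway (Proxy ARP client)
      * dst off-subnet, gateway set   -> ARP for the gateway if it is on-link
    Returns None when the configured gateway is itself off-link.
    """
    net = ipaddress.ip_interface(svi_cidr).network
    if ipaddress.ip_address(dst) in net:
        return dst
    if default_gateway is None:
        return dst  # rely on a router answering via Proxy ARP
    if ipaddress.ip_address(default_gateway) in net:
        return default_gateway
    return None  # gateway cannot be resolved on this subnet


SVI = "192.168.255.2/24"  # hypothetical management SVI
print(arp_target(SVI, "192.168.255.1", "172.16.1.1"))  # 192.168.255.1
print(arp_target(SVI, None, "172.16.1.1"))             # 172.16.1.1 (Proxy ARP)
print(arp_target(SVI, "10.0.0.1", "172.16.1.1"))       # None (gateway off-link)
```

The middle case is why connectivity can appear to work "sometimes": it depends on some router on the segment being willing to answer ARP on behalf of the remote destination.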

Best regards,
Peter

Adam Hudson · 08-27-2015

Peter,

1. My switches are configured with ip default-gateway

2. I have not been able to ping the problem switch's management IP at any time since I started troubleshooting this.

3. I get a "connection timed out" error when running that command.

Adam Hudson · 08-27-2015

Further information: the status of the management VLAN interface on the switch is "up and up".

Peter Paluch · 08-27-2015

Adam,

Thanks for the answers! Okay, and how about the opposite direction?

1. Can the problem switch ping its default gateway or any host in its management VLAN?
2. Can the problem switch ping your Linux station, or any other host in any other VLAN?
3. What does the problem switch indicate in its show ip redirect output? Can you post it here?

Thanks!

Best regards,
Peter

Adam Hudson · 08-27-2015

1. The problem switch can ping its default gateway and any active address in the management VLAN.

2. The problem switch can't ping my workstation, or any subnet outside of the management VLAN, which is curious.

3. Sh ip redirect shows nothing being redirected.

Peter Paluch · 08-27-2015

Adam,

2. The problem switch can't ping my workstation, or any subnet outside of the management VLAN which is curious.

This is curious indeed. Assuming that the problem switch can ping its default gateway, can it also ping another interface on its default gateway, i.e. can it ping another IP address configured on the device acting as the default gateway?

Can you also test the obligatory traceroute from the problem switch toward your Linux host to see if the problem switch truly uses the intended default gateway, and see where it gets stuck?

Finally, is the default gateway IP address a real IP address, or is it a virtual IP address provided by HSRP, VRRP, or GLBP?

Best regards,
Peter

Adam Hudson · 08-27-2015

Here's the strange thing I just realized: the management VLAN gateway is on the same device as the PC VLAN gateway. So from the problem switch I can ping one of two VLAN interfaces on the same switch, but not the other. It's almost like there's a translation problem, but only for one switch, and one that's had no changes made to it.

The gateway IP address is assigned to a virtual VLAN interface.

Peter Paluch · 08-27-2015

Adam,

I've got a large list here for you, but we need to narrow down the problem. So far, it appears to be limited either to the problem switch or to its default gateway.

So you can ping only one of potentially many interface Vlan interfaces on the default gateway from the problem switch, correct? Please let's be absolutely sure about this: Assuming that the problem switch is 192.168.255.2/24 and its default gateway is 192.168.255.1/24, and this default gateway also has 172.16.1.1/24 configured on another of its interfaces, the problem switch can ping 192.168.255.1 but cannot ping 172.16.1.1, correct?
What does the problem switch say when you enter the show ip route command? I need you to enter that command and comment on the output.
What do you mean by saying that there's a "translation problem" - what kind of translation? Is there perhaps any NAT in action?
Are there any ACLs applied on the default gateway, or any kind of PBR?
Are there any ACLs applied on the problem switch?
Any wisdom gained from the traceroute command on the problem switch toward your Linux machine?
If you create the following ACL on the default gateway:

access-list 199 permit icmp any any

and run debug ip packet 199 detail on the default gateway (don't forget terminal monitor if accessing it remotely), can you confirm with absolute certainty that when the problem switch pings some other IP address on the default gateway, the default gateway reports these pings being received and replied to in the debug? If they are being replied to, can you confirm from the debug that they are sent back through the proper SVI?

Thanks!

Best regards,
Peter

Adam Hudson · 08-27-2015

1. The problem switch can ping the management VLAN gateway, but I just realized the switch cannot ping its own default gateway, even though the switch behind it ("farther" away from the core stack) can ping it just fine. See #5 for traceroute results.

2. sh ip route on the problem switch and on a working switch looks the same: it shows the correct default gateway, no routes in the table, and an empty ICMP redirect cache.

3. I did not mean to be misleading here; there's no NATting going on. I was referring to the process of traffic moving from one VLAN to another as "translating", which I realize is probably not technically correct.

4. The only ACLs we have are on SNMP access.

5. Traceroute from a good switch to its default gateway shows hop 1 hitting the management VLAN gateway, which is on the same switch as the default gateway (which is also the gateway for my PC); hop 2 is my PC. Traceroute from the problem switch to its default gateway doesn't even show it hitting the management VLAN gateway; it just shows timeouts for all 30 hops, then quits.

6. There is too much traffic on my network right now to verify any specific pings. But based on the results from #5, it looks like when my problem switch tries to reach any VLAN besides the management VLAN (including its own default gateway), it doesn't know where to go, whereas working switches, when pinging anything outside the management VLAN, still know to use the default gateway to get out.

What makes this more confusing is that the problem switch is a production switch with at least 20 machines communicating with several different subnets without an issue.

Peter Paluch · 08-27-2015

Hi Adam,

Regarding 1) - I am somewhat confused. You wrote: "The problem switch can ping the management VLAN gateway, and I just realized the switch cannot ping its own default gateway." You make a distinction between the management VLAN gateway and a default gateway? What does it mean when you say that the problem switch can ping the management VLAN gateway and cannot ping its own default gateway? There can be only one gateway for a switch in the no ip routing mode.

Regarding 6) - You could make the ACL specific to the IP address of the problem switch, e.g.:

access-list 199 permit icmp host problem-switch-IP any

This would make sure that the debugs are only related to your pinging from the problem switch.
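Why the narrower ACL helps can be shown with a small model of how a permit entry selects packets: `any` matches every address, while `host A.B.C.D` matches exactly one (a /32). The sketch below is a hypothetical simplification (no wildcard masks, no ports, and it ignores the implicit deny at the end of a real IOS ACL). It assumes the problem switch's address is 20.1.200.5, as Adam posts later, and uses 20.1.200.6 as a hypothetical second switch.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Ace:
    """One simplified 'permit <protocol> <src> <dst>' entry (no ports, no wildcard masks)."""
    protocol: str
    src: str  # "any" or "host A.B.C.D"
    dst: str  # "any" or "host A.B.C.D"

    @staticmethod
    def _net(spec: str) -> ipaddress.IPv4Network:
        # "any" covers every IPv4 address; "host X" covers exactly X.
        if spec == "any":
            return ipaddress.ip_network("0.0.0.0/0")
        keyword, addr = spec.split()
        if keyword != "host":
            raise ValueError(f"unsupported address spec: {spec!r}")
        return ipaddress.ip_network(f"{addr}/32")

    def matches(self, protocol: str, src: str, dst: str) -> bool:
        return (protocol == self.protocol
                and ipaddress.ip_address(src) in self._net(self.src)
                and ipaddress.ip_address(dst) in self._net(self.dst))


broad = Ace("icmp", "any", "any")               # access-list 199 permit icmp any any
narrow = Ace("icmp", "host 20.1.200.5", "any")  # permit icmp host <problem switch> any

# Hypothetical pings toward 20.1.1.1 from the problem switch and from another switch.
for src in ("20.1.200.5", "20.1.200.6"):
    print(src, broad.matches("icmp", src, "20.1.1.1"), narrow.matches("icmp", src, "20.1.1.1"))
```

With the broad entry, pings from every switch would appear in the debug; with the host entry, only packets sourced from the problem switch do, which keeps the output readable on a busy network.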

Additional question: Is the problem switch properly able to resolve the MAC address of its default gateway in the ARP table? Does the show ip arp contain the same MAC address for the same default gateway IP address both on a working switch and on the problem switch?

Best regards,
Peter

Adam Hudson · 08-27-2015

Peter, for example, the switch has a default gateway of 20.1.1.1. The management VLAN is VLAN 200. The switch's VLAN 200 interface address is 20.1.200.5. The management VLAN's gateway is 20.1.200.1. The problem switch can ping 20.1.200.1, but not 20.1.1.1.

Adam Hudson · 08-27-2015

Additional note: 5 weeks ago this switch stopped reporting versioning information. Looking back 5 weeks on my core switch, the only possibly relevant change is the passive-interface for the management VLAN, which I removed from the core switch's EIGRP configuration for troubleshooting purposes.

I've just added it back in, in hopes that was the issue. It's been 15 minutes and I still cannot SSH in, so maybe I just need to give it more time?

That also supposes that for some reason it affected only one switch instead of all twelve that are currently in production. We also need to take into account that none of the switches besides the core stack run EIGRP, so I'm not sure why that would even affect them.