andrew.butterworth
Rising star

Catalyst 9300-24T stack stops forwarding all traffic?

I built a stack of two C9300-24T switches yesterday to replace a pair of Catalyst 3560Xs.  Very simple cut-and-paste configuration, with the HSRP configuration removed and the former HSRP VIPs configured as the physical interface addresses.

About an hour after the swap out of the 3560X's to the C9300's it just stopped working.  Everything connected through the switch at L3 just stopped.

There are two CPE routers connected on a /29 subnet running HSRP - one connected to switch #1 and one connected to switch #2.  HSRP and L2 connectivity between these two CPE routers was working, however they couldn't ping the SVI interface on the C9300.

From a 2960X L2 switch connected on a trunk to this stack I could see CDP information for the C9300, however the C9300 could not see the 2960X as a CDP neighbour.  From the L2 2960X I couldn't ping the C9300, and from the C9300 I couldn't ping anything.

 

After troubleshooting for an hour from the console and not getting anywhere I just rebooted it and everything came back.  I checked the stacking cables and they were all finger tight, so I don't think it's just a loose stacking cable.  I left it for a couple of hours and everything was OK.

However, looking at our monitoring platform, it failed again this morning at 6:15am.  I can log on to the CPEs and HSRP is OK between them; they can ping each other but they can't ping the SVI interface on the switch stack.  I suspect a reboot of the C9300 will temporarily fix this, but it will recur.

 

Has anyone seen this behaviour?  Luckily there are no staff on site today.

 

 

15 REPLIES 15
Leo Laohoo
VIP Community Legend

What firmware is the stack running on?

17.3.4.  Both have Network Advantage licenses.

 

balaji.bandi
VIP Master

Maybe ARP? (If the ARP timeout is the default of 14400 seconds.)  You could perhaps have cleared the ARP table rather than rebooting.

Anyway, I hope it is resolved by the reboot; that might have cleared the ARP table, or the timeout expiring may have been a coincidence.

 

What version of code is on the Cat 9300?  What do the logs show?

It may be worth connecting a device to the Cat 9300 and running a debug.

 

 

 

BB

***** Rate All Helpful Responses *****

How to Ask The Community for Help

It's not ARP.  It worked all night and failed at 6:15 this morning.

I think it's either a hardware or a stacking issue.

From the two CPE's there is no ARP response for the switch SVI, however the two CPEs can reach each other through the VLAN on the C9300 stack (VLAN 3000), and HSRP is working OK.

 

I put another pair of these in with a very similar configuration at one of the customer's other sites and they are working fine.

 

I have raised a TAC case.

As you mentioned, you moved the config from the old switches to the new ones.  Maybe some commands got messed up during the move (just my thinking).

 

You also mentioned that the other new stack with a simple config works fine, so I suspect a config issue.

 

Do you see any logs on the switch?

 

Note : 17.3.4 is latest Code

BB


It worked all night without issues.  It's not configuration.

The configuration is very simple - VTP off, Rapid-PVST+, several VLANs and SVI interfaces, some trunks to ESXi boxes, 2 x 2 x 1Gbps port-channel trunks to two access C2960Xs, static default route to HSRP VIP on the CPE routers, ip routing enabled.  That's pretty much it (AAA, NTP etc all the usual stuff).

 

It is either hardware or a stacking issue.

 

It is a new bug. 

Good luck with TAC. 

Let us know the Bug ID when the time comes.

And you do not see any logs, it just halted?  As @Leo Laohoo says, this could be a bug.  If it turns out to be one, let us know the bug ID for our reference and to help others.

 

BB


The switch stack doesn't halt.  It responds fine on the console, it just doesn't have any connectivity.

There was nothing logged.  The stack was reloaded from the console with the reload command.

 

I don't have access to it now as it's not responding on any IPv4 address.

I am just trying to work out if I can configure reverse telnet on a C1100 console port so I can get a console connected to the switch.  Doesn't look like reverse telnet works on the C1100 series though....

Before the stack gets rebooted, post the complete output to the command "sh platform software status con brief".

Just managed to get back to site.

 

switch#sho platform software status control-processor brief
Load Average
 Slot  Status  1-Min  5-Min 15-Min
1-RP0 Healthy   0.21   0.41   0.29
2-RP0 Healthy   0.41   0.91   0.54

Memory (kB)
 Slot  Status    Total     Used (Pct)     Free (Pct) Committed (Pct)
1-RP0 Healthy  7757632  2863576 (37%)  4894056 (63%)   3820908 (49%)
2-RP0 Healthy  7757632  2809492 (36%)  4948140 (64%)   3242004 (42%)

CPU Utilization
 Slot  CPU   User System   Nice   Idle    IRQ   SIRQ IOwait
1-RP0    0   1.79   0.49   0.00  97.60   0.00   0.09   0.00
         1   0.90   0.50   0.00  98.50   0.00   0.10   0.00
         2   1.00   0.80   0.00  98.19   0.00   0.00   0.00
         3   1.00   0.60   0.00  98.39   0.00   0.00   0.00
         4   0.80   0.50   0.00  98.69   0.00   0.00   0.00
         5   1.20   0.70   0.00  98.10   0.00   0.00   0.00
         6   1.00   0.50   0.00  98.49   0.00   0.00   0.00
         7   1.40   0.50   0.00  98.10   0.00   0.00   0.00
2-RP0    0   1.10   0.90   0.00  98.00   0.00   0.00   0.00
         1   0.90   0.50   0.00  98.60   0.00   0.00   0.00
         2   1.00   0.30   0.00  98.69   0.00   0.00   0.00
         3   1.40   0.40   0.00  98.20   0.00   0.00   0.00
         4   0.80   0.70   0.00  98.50   0.00   0.00   0.00
         5   1.20   0.50   0.00  98.30   0.00   0.00   0.00
         6   1.00   0.50   0.00  98.50   0.00   0.00   0.00
         7   1.20   0.90   0.00  97.89   0.00   0.00   0.00
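
For anyone who wants to check this kind of output from a monitoring script rather than by eye, here is a minimal Python sketch that pulls the per-slot status, load averages and memory percentage out of the text above. It assumes the output has the same layout as the paste in this thread; the function names `parse_status` and `unhealthy` are made up for illustration and are not part of any Cisco tooling.

```python
import re

# Load and memory sections copied from the output posted in this thread.
SAMPLE = """\
Load Average
Slot Status 1-Min 5-Min 15-Min
1-RP0 Healthy 0.21 0.41 0.29
2-RP0 Healthy 0.41 0.91 0.54

Memory (kB)
Slot Status Total Used (Pct) Free (Pct) Committed (Pct)
1-RP0 Healthy 7757632 2863576 (37%) 4894056 (63%) 3820908 (49%)
2-RP0 Healthy 7757632 2809492 (36%) 4948140 (64%) 3242004 (42%)
"""

SLOT_RE = re.compile(r"\d+-RP\d+")  # e.g. "1-RP0"


def parse_status(text):
    """Return {slot: {...}} with load and memory details per slot.

    Hypothetical helper: assumes the section headings and column order
    shown in the thread ("Load Average", "Memory (kB)", "CPU Utilization").
    """
    slots = {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        fields = stripped.split()
        if not fields:
            continue
        if stripped.startswith("Load Average"):
            section = "load"
            continue
        if stripped.startswith("Memory"):
            section = "mem"
            continue
        if stripped.startswith("CPU Utilization"):
            section = "cpu"
            continue
        # Skip column headers and per-core CPU rows that carry no slot name.
        if not SLOT_RE.fullmatch(fields[0]):
            continue
        slot = slots.setdefault(fields[0], {})
        if section == "load":
            slot["load_status"] = fields[1]
            slot["load"] = tuple(float(x) for x in fields[2:5])
        elif section == "mem":
            slot["mem_status"] = fields[1]
            slot["mem_used_pct"] = int(fields[4].strip("()%"))
    return slots


def unhealthy(slots):
    """List slots whose load or memory status is anything but 'Healthy'."""
    return sorted(
        name
        for name, data in slots.items()
        if data.get("load_status") != "Healthy"
        or data.get("mem_status") != "Healthy"
    )


if __name__ == "__main__":
    parsed = parse_status(SAMPLE)
    print(parsed)
    print("Unhealthy slots:", unhealthy(parsed))
```

As in this thread, a clean result here only says the control processors are not overloaded; it says nothing about whether control-plane traffic is actually reaching them.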

 

When switch #1 is active it can't even ping its own SVI interfaces.  L2 is working fine - i.e. 'show mac address-table' shows learnt MACs.  However LACP won't work and the switch can't see any CDP neighbours, although they can see it.

I am convinced this is a faulty switch.

 

Andy

Nothing looks suspicious.  

Can I see the output to the following commands: 

  1. dir
  2. dir flash-1:core
  3. dir flash-2:core
  4. dir crashinfo-1:
  5. dir crashinfo-2:

I replaced the switch I suspected was faulty (switch #1) with a spare I had and it's now working, so I am fairly sure it's faulty hardware (chip shortage, corners cut maybe?).

I now have the 'faulty' switch at home and I'll do some testing with it tomorrow and update the TAC case.

It appears that any traffic that should reach the 'control-plane', isn't - i.e. STP, ARP, CDP, LACP.  L2 forwarding appears to be the only thing the switch is doing.

 

Seems a bizarre hardware fault.

 

Andy

andrew.butterworth
Rising star

So this turned out to be a faulty stacking cable.  On the redundantly cabled 2-switch stack, one of the four stacking ports was reporting CRC errors (show switch stack-port detail).  The stack worked OK at first, however after a period of time the faulty cable then caused the issue that we were experiencing.  Simply replacing the cable doesn't resolve the issue once it has happened and all control-plane functions have stopped working.  The stack doesn't even recognise that the cable was replaced.

Removing and restoring the power (with the faulty cable replaced) got it back working.  It has now been working OK for 3 days with no CRC errors reported on the stack ports, so I think it's solved, however it leaves an unanswered question.

 

I have got the cable at home and have been testing it on the stack I have here temporarily.  It is definitely the cable that is faulty.  CRC errors are observed immediately after booting.
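
If you want to watch for this on other stacks, the check can be scripted. Below is a rough Python sketch that scans saved output of `show switch stack-port detail` for non-zero CRC counters. The exact layout of that output is not shown in this thread, so the sketch only assumes that each counter sits on a line containing the word "CRC" and ending in an integer; the sample text and the function name `find_crc_errors` are hypothetical and are not real switch output.

```python
import re

# Assumption: a counter line mentions "CRC" and ends with an integer count.
CRC_LINE = re.compile(r"^\s*(.*\bCRC\b.*?)\s+(\d+)\s*$")


def find_crc_errors(text):
    """Return [(label, count)] for every CRC counter line with count > 0."""
    hits = []
    for line in text.splitlines():
        match = CRC_LINE.match(line)
        if match and int(match.group(2)) > 0:
            hits.append((match.group(1), int(match.group(2))))
    return hits


if __name__ == "__main__":
    # Hypothetical sample for illustration only, NOT captured from a switch.
    sample = "CRC Errors\n  Data CRC   12\n  Ringword CRC   0\n"
    for label, count in find_crc_errors(sample):
        print(f"{label}: {count}")
```

Running something like this shortly after boot and again a few hours later would show whether the counters are climbing, which is what pointed to the bad cable here.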

 

I have updated the TAC case with the details and asked why a faulty stacking cable would cause such a catastrophic fault.

 

Cheers

Andy