Catalyst 3850 Online Diagnostics: Failing Switch?

I'm having issues with a Catalyst 3850 switch. It was a member of a stack of 3 switches, but I removed it from the stack because of strange flapping of ports and stack ports. Now that it is no longer a part of the stack, the stack is stable. This leads me to believe I'm definitely dealing with the problem switch. The switch is currently running IOS-XE Version 3.3.5SE, which is the recommended release from the downloads area for this switch.

I was initially having problems with getting the switch to boot after updating the OS. After a couple of attempts at applying the software image using a couple of different methods, thinking that something was corrupt, I eventually got it to reliably boot after erasing the startup-config and going through the basic prompted configuration.

I changed the diagnostic bootup level to complete, and 'show post' tells me that all tests pass (MBIST, PHY Loopback, Thermal Temp, Thermal Fan, and SIF).

Running 'diagnostic start switch 3 test basic' results in a reload with the following:

<Tue Mar 10 11:42:23 2015> Message from sysmgr: Reason Code:[0] Reset Reason:Reset/Reload requested by [stack-manager]. [GOLD Memory Test]

Also, every time it reloads, I see:

Unmounting ng3k filesystems...
Unmounted /dev/sda3...
Warning! - some ng3k filesystems may not have unmounted cleanly...
Please stand by while rebooting the system...
Restarting system.

It seems strange to me that file system warnings like that would be normal. Can anyone confirm either way?

Bootup appears normal:

Booting...Initializing and Testing RAM ++++@@@@####...################################++@@++@@++@@++@@++@@++@@++@@++@@++@@++@@++@@++@@++@@++@@++@@++@@done.
Memory Test Pass!

Base ethernet MAC Address: dc:a5:f4:d3:63:00

Interface GE 0 link down***ERROR: PHY link is down
Initializing Flash...

flashfs[7]: 0 files, 1 directories
flashfs[7]: 0 orphaned files, 0 orphaned directories
flashfs[7]: Total bytes: 6784000
flashfs[7]: Bytes used: 1024
flashfs[7]: Bytes available: 6782976
flashfs[7]: flashfs fsck took 1 seconds....done Initializing Flash.
Getting rest of image
Reading full image into memory....done
Reading full base package into memory...: done = 79122052
Nova Bundle Image
Kernel Address    : 0x6042d350
Kernel Size       : 0x402ecf/4206287
Initramfs Address : 0x60830220
Initramfs Size    : 0xdb9c62/14392418
Compression Format: .mzip

Bootable image at @ ram:0x6042d350
Bootable image segment 0 address range [0x81100000, 0x82110000] is in range [0x80180000, 0x90000000].
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@boot_system: 380
Loading Linux kernel with entry point 0x81653a10 ...
Bootloader: Done loading app on core_mask: 0xf

### Launching Linux Kernel (flags = 0x5)

All packages are Digitally Signed
Starting System Services

cisco WS-C3850-48T (MIPS) processor with 4194304K bytes of physical memory.
Processor board ID FOC1729V1M2
2048K bytes of non-volatile configuration memory.
4194304K bytes of physical memory.
250456K bytes of Crash Files at crashinfo:.
1609272K bytes of Flash at flash:.
0K bytes of Dummy USB Flash at usbflash0:.
0K bytes of  at webui:.

Base Ethernet MAC Address          : dc:a5:f4:d3:63:00
Motherboard Assembly Number        : 73-14444-05
Motherboard Serial Number          : FOC17292043
Model Revision Number              : G0
Motherboard Revision Number        : B0
Model Number                       : WS-C3850-48T
System Serial Number               : FOC1729V1M2

Press RETURN to get started!

I can get DiagGoldPktTest and DiagPhyLoopbackTest to run successfully if I run them separately. The DiagThermalTest, DiagFanTest, and DiagScratchRegisterTest tests run at regular intervals by default. I'm not attempting DiagStackCableTest at this point because I have this switch disconnected from the stack. The switch reloads when attempting DiagMemoryTest:

switch3#show diagnostic result switch 3

Current bootup diagnostic level: complete

switch 3:   SerialNo : FOC1729V1M2

  Overall Diagnostic Result for switch 3 : MINOR ERROR
  Diagnostic level at card bootup: minimal

  Test results: (. = Pass, F = Fail, U = Untested)

    1) DiagGoldPktTest:

   Port  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F

   Port 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
         .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .

   Port 49 50 51 52 53 54 55 56
         .  .  .  .  .  .  .  .

    2) DiagThermalTest -----------------> .
    3) DiagFanTest ---------------------> .
    4) DiagPhyLoopbackTest:

   Port  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F  F

   Port 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
         .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .

   Port 49 50 51 52 53 54 55 56
         .  .  .  .  .  .  .  .

    5) DiagScratchRegisterTest ---------> .
    6) DiagStackCableTest --------------> U
    7) DiagMemoryTest ------------------> U

switch3#diagnostic start switch 3 test 7
Diagnostic[switch 3]: Running test(s) 7 may disrupt normal system operation
Do you want to continue? [no]: yes
<Tue Mar 10 12:04:20 2015> Message from sysmgr: Reason Code:[0] Reset Reason:Reset/Reload requested by [stack-manager]. [GOLD Memory Test]

We do not have a current Cisco support contract on the switch, but it is still well within the 5-year warranty, at about 1.5 years since purchase. When I first contacted Cisco support, I was told that Cisco has no record of the serial number and that I should contact my reseller. Before I start jumping through what appears to be a labyrinth of administrative hoops to get the switch replaced, can anyone help me confirm a hardware problem and/or fix this thing?

Thanks in advance! ...and sorry for the long post!



Nobody has any recommendations on this?? I thought for sure one of the many experts here could give me an idea what to do with this thing.



Hi Chris,

What is the original issue? Stack ports flapping?

Have you tried another stack cable to check?





According to the diagnostics you ran on this switch, the results for DiagGoldPktTest and DiagPhyLoopbackTest indicate that half the switch (in this case, 1 of the 2 UADP ASICs) has failed. I'm surprised the switch passed all POST tests given that diagnostic output. You should be able to call Cisco's support line and get a replacement under the Enhanced Limited Lifetime Warranty (E-LLW). Would it be possible for you to post the show tech output for this switch?
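
If you want to double-check which ports are failing, the per-port results can be tallied with a short script. This is only a rough Python sketch, assuming the 'show diagnostic result' output has been saved as text in the same layout as posted earlier in this thread. The function name failed_ports is made up for illustration and is not part of any Cisco tool.

```python
import re
import sys


def failed_ports(diag_text):
    """Map each per-port test name to the list of ports marked 'F'.

    Assumes the layout shown in this thread: a numbered test header,
    then pairs of lines (a 'Port' row followed by a status row).
    """
    results = {}
    test = None   # name of the test whose rows are being read
    ports = None  # port numbers from the most recent 'Port' row
    for line in diag_text.splitlines():
        header = re.match(r"\s*\d+\)\s+(\w+)", line)
        if header:
            test = header.group(1)
            ports = None
            continue
        tokens = line.split()
        if not tokens or test is None:
            continue
        if tokens[0] == "Port":
            ports = [int(t) for t in tokens[1:]]
        elif ports is not None and all(t in (".", "F", "U") for t in tokens):
            # Status row: pair each status with the port above it.
            for port, status in zip(ports, tokens):
                if status == "F":
                    results.setdefault(test, []).append(port)
            ports = None
    return results


if __name__ == "__main__":
    # Usage (hypothetical file name): python diag_ports.py diag_output.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            for name, bad in failed_ports(fh.read()).items():
                print(f"{name}: failed ports {bad}")
```

Run against the output posted above, this should list ports 1 through 24 for both DiagGoldPktTest and DiagPhyLoopbackTest, which is the half-the-switch pattern I'm describing.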

I can post something from there, but it gives a metric crap ton of information with simply 'show tech.' It's been scrolling across my screen for almost 10 minutes. Is there a particular piece of show tech that is of interest?

Do you have a TFTP server set up? If so, you can run the following command to export the output to a text file that you can upload here (rather than trying to post the entire show tech dump).

show tech | redirect tftp://ip-address-of-tftp-server/filename.text
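
Once the file is on the TFTP server, you don't have to read the whole thing to find one piece. Here is a rough Python sketch for splitting it into sections. It assumes each command's output in the file starts with a dashed header line such as '------------------ show version ------------------', so check your file first, since the layout may differ by release. The helper name split_show_tech is hypothetical.

```python
import re
import sys

# Assumed header layout: dashes, the command text, dashes.
SECTION_HEADER = re.compile(r"^-{5,}\s*(show .+?)\s*-{5,}\s*$")


def split_show_tech(text):
    """Return a dict mapping each 'show ...' header to its output text."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = SECTION_HEADER.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    # Lines before the first header are ignored.
    return {name: "\n".join(body).strip() for name, body in sections.items()}


if __name__ == "__main__":
    # Usage: python split_tech.py filename.text "show version"
    if len(sys.argv) > 2:
        with open(sys.argv[1]) as fh:
            sections = split_show_tech(fh.read())
        print(sections.get(sys.argv[2], "section not found"))
```

If the headers in your file look different, adjust the pattern in SECTION_HEADER to match before relying on the result.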