N9K stuck in an endless reboot

stuartkendrick
Level 1

Does anyone have experience with a Nexus 9000 stuck in an endless reboot loop?

I rebooted an N9K-C93180YC-FX24 running 9.3(10). I had intended to replace startup-config with a modified version to try out a new configuration, but I mistakenly skipped the 'cp startup-config.new startup-config' step just prior to the reboot.

Now the box is stuck in an endless reboot cycle. It stays up long enough that I have ~45 seconds to ssh into it (via mgmt0) and run a few 'show' commands before it reboots again.

A selection of log messages:

[...]

2023-03-25T01:25:05.496835-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:09.652 PDT: %KERN-6-SYSTEM_MSG: [ 115.239675] ktah_nl_asics_int_reg[0][26] pid=27820 is_fast_intr=0 - kernel
2023-03-25T01:25:05.497361-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:09.652 PDT: %KERN-6-SYSTEM_MSG: [ 115.239687] ktah_nl_data_ready: rcvd netlink intr registraiton for tid 27820 - kernel
2023-03-25T01:25:05.497620-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:09.652 PDT: %KERN-6-SYSTEM_MSG: [ 115.239690] - kernel
2023-03-25T01:25:05.498015-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:09.652 PDT: %KERN-6-SYSTEM_MSG: [ 115.239690] ktah_nl_asics_int_reg[0][24] pid=27820 is_fast_intr=0 - kernel
2023-03-25T01:25:06.497898-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:10.653 PDT: %VDC_MGR-5-VDC_STATE_CHANGE: vdc 1 state changed to updating
2023-03-25T01:25:06.498023-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:10.654 PDT: %VDC_MGR-5-VDC_STATE_CHANGE: vdc 1 state changed to active
2023-03-25T01:25:07.499240-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:11.655 PDT: %UFDM-6-FIB_IPv6_CONSISTENCY_CHECKER_STOP: FIB IPv6 consistency checker stopped on slot ALL
2023-03-25T01:25:07.499644-07:00 gigapop-b-rtr : message repeated 2 times: [ 2023 Mar 25 01:25:11.655 PDT: %UFDM-6-FIB_IPv6_CONSISTENCY_CHECKER_STOP: FIB IPv6 consistency checker stopped on slot ALL]
2023-03-25T01:25:07.500040-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:11.655 PDT: %VMM-2-VMM_SERVICE_ERR: VDC1: Service SAP Aclmgr SAP for slot 1 returned error 0x4265000e (Invalid Interface) in if_bind sequence
2023-03-25T01:25:07.500177-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:11.655 PDT: %IM-3-IM_RESP_ERROR: Component MTS_SAP_VMM opcode:MTS_OPC_IM_IF_VDC_BIND in vdc:1 returned error:Invalid Interface
2023-03-25T01:25:07.500454-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:11.655 PDT: %IM-3-IM_SEQ_ERROR: Error (Invalid Interface) while communicating with component <Internal Error> opcode:N/A (for:RID_MODULE: 1)
2023-03-25T01:25:07.500737-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:11.655 PDT: %MODULE-5-MOD_REINIT: Re-initializing module 1 (Serial number: )
2023-03-25T01:25:07.501009-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:11.656 PDT: %DIAGCLIENT-4-LC_OFFLINE_FOR_CURR_SLOT: Received LC offline event for the current slot:1. Ignoring it
2023-03-25T01:25:07.501279-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:11.657 PDT: %DIAGMGR-4-CURR_SLOT_OFFLINE: Recevied an LC offline event for the current slot:1. Ignoring it
2023-03-25T01:25:07.501547-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:11.657 PDT: %KERN-6-SYSTEM_MSG: [ 117.090853] obfl_set_mmc_rr initialized on mmcblk0p2 blksize=512, cpu=7 - kernel
2023-03-25T01:25:07.501839-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:11.657 PDT: %KERN-6-SYSTEM_MSG: [ 117.090858] obfl_set_mmc_rr: tv sec is 641eafe7, usec is 52457 rr=4 rr_str=Im SAP - kernel
2023-03-25T01:25:07.502114-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:11.657 PDT: %KERN-6-SYSTEM_MSG: [ 117.092768] writing reset reason succeeded with retval=0 on cpu=7 - kernel
2023-03-25T01:25:31.390542-07:00 edge-b-esx : 2023 Mar 25 01:31:30.507 PDT: %ETHPORT-5-IF_DOWN_NONE: Interface Ethernet1/49 (description:To gigapop-b-rtr) is down (Transceiver Absent)
2023-03-25T01:27:26.925100-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:30.801 PDT: %CDP-5-NEIGHBOR_ADDED: Device wst-mgmt-l4-esx discovered of type cisco WS-C2960X-48TS-L with port GigabitEthernet1/0/7 on incoming port mgmt0 with ip addr 10.71.12.24 and mgmt ip 10.71.12.24
2023-03-25T01:27:26.925285-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:30.801 PDT: %KERN-6-SYSTEM_MSG: [ 79.893833] usdk_pcie_find: pci_enable_device on 1137:104:5 suceeded - kernel
2023-03-25T01:27:26.925530-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:30.801 PDT: %KERN-6-SYSTEM_MSG: [ 79.893838] usdk_pcie_find: pci_device copy to user bar: 0 V:1137 D:104i B:5 suceeded - kernel
2023-03-25T01:27:26.925780-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:30.801 PDT: %LLDP-5-SERVER_ADDED: Server with Chassis ID 00b6.70be.ca80 Port ID Gi1/0/7 management address 10.71.12.24 discovered on local port mgmt0 in vlan 12 with enabled capability Bridge
2023-03-25T01:27:28.927395-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:32.804 PDT: %DAEMON-4-SYSTEM_MSG: SNMPSET not sending all the DME property values. Use SNMPSET for deleting the same config - syslog[21244]
2023-03-25T01:27:39.567836-07:00 indra stuartk /home/netops/bin/reset-device[56583]: gigapop-b-rtr has hit three pings, continuing to next host
2023-03-25T01:27:50.688306-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.564 PDT: %KERN-6-SYSTEM_MSG: [ 103.678824] buffer allocated : ffff880638f40000, buffer size: 135168 tag 196608 - kernel
2023-03-25T01:27:50.688518-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.564 PDT: %KERN-6-SYSTEM_MSG: [ 103.678870] buffer allocated : ffff880638f80000, buffer size: 135168 tag 196609 - kernel
2023-03-25T01:27:50.688894-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.565 PDT: %KERN-6-SYSTEM_MSG: [ 103.678899] buffer allocated : ffff880638fc0000, buffer size: 135168 tag 196610 - kernel
2023-03-25T01:27:50.689253-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.565 PDT: %KERN-6-SYSTEM_MSG: [ 103.678926] buffer allocated : ffff880639000000, buffer size: 135168 tag 196611 - kernel
2023-03-25T01:27:50.689617-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.565 PDT: %KERN-6-SYSTEM_MSG: [ 103.678952] buffer allocated : ffff880639040000, buffer size: 135168 tag 196612 - kernel
2023-03-25T01:27:50.693436-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.570 PDT: %KERN-6-SYSTEM_MSG: [ 103.678979] buffer allocated : ffff880639080000, buffer size: 135168 tag 196613 - kernel
2023-03-25T01:27:50.693652-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.570 PDT: %KERN-6-SYSTEM_MSG: [ 103.679005] buffer allocated : ffff8806390c0000, buffer size: 135168 tag 196614 - kernel
2023-03-25T01:27:50.694033-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.570 PDT: %KERN-6-SYSTEM_MSG: [ 103.679031] buffer allocated : ffff880639100000, buffer size: 135168 tag 196615 - kernel

[...]

2023-03-25T01:27:50.724820-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.583 PDT: %KERN-6-SYSTEM_MSG: [ 103.680749] buffer allocated : ffff880638dd8000, buffer size: 12288 tag 196700 - kernel
2023-03-25T01:27:50.725183-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.583 PDT: %KERN-6-SYSTEM_MSG: [ 103.680762] buffer allocated : ffff880638a14000, buffer size: 12288 tag 196701 - kernel
2023-03-25T01:27:50.725540-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.583 PDT: %KERN-6-SYSTEM_MSG: [ 103.680776] buffer allocated : ffff880638db8000, buffer size: 12288 tag 196702 - kernel
2023-03-25T01:27:50.725898-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.583 PDT: %KERN-6-SYSTEM_MSG: [ 103.680789] buffer allocated : ffff880638e98000, buffer size: 12288 tag 196703 - kernel
2023-03-25T01:27:50.726260-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.583 PDT: %KERN-6-SYSTEM_MSG: [ 103.682185] Retrieving the kernel buffer ffff880638f40000 tag 196608 - kernel
2023-03-25T01:27:50.726618-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.583 PDT: %KERN-6-SYSTEM_MSG: [ 103.682204] Retrieving the kernel buffer ffff880638f80000 tag 196609 - kernel
2023-03-25T01:27:50.726985-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.583 PDT: %KERN-6-SYSTEM_MSG: [ 103.682216] Retrieving the kernel buffer ffff880638fc0000 tag 196610 - kernel

[...]

2023-03-25T01:27:50.760659-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.597 PDT: %KERN-6-SYSTEM_MSG: [ 103.683357] Retrieving the kernel buffer ffff880638db8000 tag 196702 - kernel
2023-03-25T01:27:50.761035-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.597 PDT: %KERN-6-SYSTEM_MSG: [ 103.683368] Retrieving the kernel buffer ffff880638e98000 tag 196703 - kernel
2023-03-25T01:27:50.761393-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.597 PDT: %KERN-6-SYSTEM_MSG: [ 103.694224] buffer allocated : ffff88063cb82000, buffer size: 4096 tag 16187392 - kernel
2023-03-25T01:27:50.761759-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.597 PDT: %KERN-6-SYSTEM_MSG: [ 103.694252] buffer allocated : ffff880638528000, buffer size: 4096 tag 16187393 - kernel
2023-03-25T01:27:50.762125-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.597 PDT: %KERN-6-SYSTEM_MSG: [ 103.694265] buffer allocated : ffff880638cd3000, buffer size: 4096 tag 16187394 - kernel
2023-03-25T01:27:50.762496-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.597 PDT: %KERN-6-SYSTEM_MSG: [ 103.694278] buffer allocated : ffff880638f2a000, buffer size: 4096 tag 16187395 - kernel
2023-03-25T01:27:50.762860-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.597 PDT: %KERN-6-SYSTEM_MSG: [ 103.694290] buffer allocated : ffff880638a24000, buffer size: 4096 tag 16187396 - kernel
2023-03-25T01:27:50.763229-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.597 PDT: %KERN-6-SYSTEM_MSG: [ 103.694303] buffer allocated : ffff880638974000, buffer size: 4096 tag 16187397 - kernel
2023-03-25T01:27:50.763592-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.597 PDT: %KERN-6-SYSTEM_MSG: [ 103.694316] buffer allocated : ffff880638ea3000, buffer size: 4096 tag 16187398 - kernel
2023-03-25T01:27:50.763957-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.597 PDT: %KERN-6-SYSTEM_MSG: [ 103.694328] buffer allocated : ffff880638e05000, buffer size: 4096 tag 16187399 - kernel
2023-03-25T01:27:50.764333-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.597 PDT: %KERN-6-SYSTEM_MSG: [ 103.694341] buffer allocated : ffff880638924000, buffer size: 4096 tag 16187400 - kernel
2023-03-25T01:27:50.764696-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:54.597 PDT: %KERN-6-SYSTEM_MSG: [ 103.694353] buffer allocated : ffff88063850b000, buffer size: 4096 tag 16187401 - kernel
2023-03-25T01:27:51.722397-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:55.599 PDT: %KERN-6-SYSTEM_MSG: [ 104.466248] usdk_pcie_find: pci_enable_device on 1137:103:9 suceeded - kernel
2023-03-25T01:27:51.722588-07:00 gigapop-b-rtr : 2023 Mar 25 01:27:55.599 PDT: %KERN-6-SYSTEM_MSG: [ 104.466253] usdk_pcie_find: pci_device copy to user bar: 0 V:1137 D:103i B:9 suceeded - kernel

And ... then it reboots

Capturing on the console, I don't see any sign of hardware problems, and the output of 'show ver' and 'show hardware' seems clean.


I have a TAC case open, but TAC is busy, and I'm realizing that I don't have a model for understanding what might be happening. Am I seeing the result of a corrupted file system? Failing memory?
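
One way to narrow down a log excerpt like the one above is to filter on the severity digit in the NX-OS message tag (%FACILITY-SEVERITY-MNEMONIC), where lower numbers are more severe. The following is a minimal Python sketch, assuming the syslog has been saved off-box as text; the function name `severe_messages` and the default threshold of 3 are illustrative choices, not part of any Cisco tooling. Run against the excerpt above, it would keep the VMM-2 and IM-3 "Invalid Interface" lines logged just before the reset reason is written, and drop the KERN-6 noise.

```python
import re

# Matches the NX-OS message tag, e.g. "%VMM-2-VMM_SERVICE_ERR: ..."
NXOS_TAG = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Za-z0-9_]+):\s*(?P<text>.*)"
)

def severe_messages(lines, max_severity=3):
    """Return (severity, facility, mnemonic, text) tuples for messages
    whose severity digit is at or below max_severity (hypothetical helper)."""
    hits = []
    for line in lines:
        m = NXOS_TAG.search(line)
        if m and int(m["severity"]) <= max_severity:
            hits.append((int(m["severity"]), m["facility"], m["mnemonic"], m["text"]))
    return hits

if __name__ == "__main__":
    # Three lines taken from the excerpt in this post.
    sample = [
        "2023-03-25T01:25:06.498023-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:10.654 PDT: %VDC_MGR-5-VDC_STATE_CHANGE: vdc 1 state changed to active",
        "2023-03-25T01:25:07.500040-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:11.655 PDT: %VMM-2-VMM_SERVICE_ERR: VDC1: Service SAP Aclmgr SAP for slot 1 returned error 0x4265000e (Invalid Interface) in if_bind sequence",
        "2023-03-25T01:25:07.500177-07:00 gigapop-b-rtr : 2023 Mar 25 01:25:11.655 PDT: %IM-3-IM_RESP_ERROR: Component MTS_SAP_VMM opcode:MTS_OPC_IM_IF_VDC_BIND in vdc:1 returned error:Invalid Interface",
    ]
    for severity, facility, mnemonic, text in severe_messages(sample):
        print(severity, facility, mnemonic, text)
```

This only ranks messages by their own severity tag; it does not by itself identify a root cause.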

 

--sk

1 Reply

stuartkendrick
Level 1

I used the following to get out of the loop (I had to type fast):

copy startup-config startup-config.broken

copy startup-config.backup startup-config

reload

It turns out the 'broken' version was missing the 'interface Ethernet' stanzas. I don't know how that happened, but I suspect that this induced the reboot cycling.
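
A sanity check before the next reload could catch this kind of damage. Below is a minimal Python sketch, assuming the candidate config has been copied off-box as plain text; the helper names `ethernet_stanzas` and `looks_complete`, the `minimum` threshold, and the sample configs are all hypothetical. It simply counts 'interface Ethernet' stanza headers and flags a config that has none.

```python
import re

# A stanza header such as "interface Ethernet1/49" at the start of a line.
ETH_STANZA = re.compile(r"^interface Ethernet\S+", re.MULTILINE)

def ethernet_stanzas(config_text):
    """Return the 'interface Ethernet...' header lines found in a config."""
    return ETH_STANZA.findall(config_text)

def looks_complete(config_text, minimum=1):
    """True if the config has at least `minimum` Ethernet stanzas
    (the threshold is an illustrative assumption)."""
    return len(ethernet_stanzas(config_text)) >= minimum

if __name__ == "__main__":
    # Hypothetical fragments, not the real configs from this switch.
    good = "interface mgmt0\n  vrf member management\ninterface Ethernet1/1\n  no shutdown\n"
    broken = "hostname example-rtr\ninterface mgmt0\n  vrf member management\n"
    for name, text in (("good", good), ("broken", broken)):
        status = "ok" if looks_complete(text) else "no Ethernet stanzas"
        print(name, status)
```

Comparing the stanza count against a known-good backup would be a stricter test than a fixed minimum.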

--sk