
NCS-5502-SE: INFRA-VM_MANAGER-3-MSG_RESOURCE_ALLOCATION_UNRECOVERABLE

wgrassot
Level 1

Good day,

I have an NCS 5502-SE that fails to reload properly. It loops endlessly trying to start the admin VM, with these error messages:

%INFRA-VM_MANAGER-4-INFO : Info: VM Manager started. arguments -W
0_0_0Jun 6 04:36:38 : Send To Helper Failed - Msg : vm_manager[3945]: %INFRA-VM_MANAGER-3-MSG_RESOURCE_ALLOCATION_UNRECOVERABLE_ERROR : Allocation of a critical resource failed [garbled console output]
[...]59]: %INFRA-VM_MANAGER-3-MSG_RESOURCE_ALLOCATION_UNRECOVERABLE_ERROR : Allocation of a critical resource failed in vm_manager (Unable to connect to virtd daemon)
0_0_0Jun 6 04:36:49 : Send To Helper Failed - Msg : vm_manager[3973]: %INFRA-VM_MANAGER-4-INFO : Info: VM Manager [garbled console output] vm_manager (Unable to connect to virtd daemon)
0/RP0/ADMIN0:Jun 6 04:37:00.853 : pm[2525]: %INFRA-Process_Manager-3-PROCESS_DOWN : Process vm_manager (IID: 0) crashed or exited too often and exceeded max_respawn count.
0/RP0/ADMIN0:Jun 6 04:37:00.855 : shelf_mgr[2696]: %INFRA-SHELF_MGR-3-FAULT_ACTION_VM_RELOAD : Admin VM reload requested for card 0/RP0
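
For anyone triaging a similar boot loop from a console capture, here is a minimal Python sketch that pulls the `%FACILITY-SEVERITY-MNEMONIC` token out of each line and counts how often each message repeats. The regular expression, the `summarize` helper and the `SAMPLE` list are illustrative assumptions, not a Cisco tool; the sample lines are copied from this thread.

```python
import re
from collections import Counter

# Matches the %FACILITY-SEVERITY-MNEMONIC token, for example
# %INFRA-VM_MANAGER-3-MSG_RESOURCE_ALLOCATION_UNRECOVERABLE_ERROR.
# The facility may itself contain hyphens, so it is matched lazily
# up to the single-digit severity field.
MSG_RE = re.compile(
    r"%(?P<facility>[A-Za-z0-9_]+(?:-[A-Za-z0-9_]+)*?)"
    r"-(?P<severity>[0-7])"
    r"-(?P<mnemonic>[A-Z0-9_]+)"
)


def summarize(lines):
    """Count occurrences of each (facility, severity, mnemonic) triple."""
    counts = Counter()
    for line in lines:
        for m in MSG_RE.finditer(line):
            key = (m["facility"], int(m["severity"]), m["mnemonic"])
            counts[key] += 1
    return counts


# Hypothetical input: a few console lines copied from this thread.
SAMPLE = [
    "%INFRA-VM_MANAGER-4-INFO : Info: VM Manager started. arguments -W",
    "vm_manager[2713]: %INFRA-VM_MANAGER-3-MSG_RESOURCE_ALLOCATION_UNRECOVERABLE_ERROR : "
    "Allocation of a critical resource failed in vm_manager (Unable to connect to virtd daemon)",
    "0/RP0/ADMIN0:Jun 6 04:37:00.853 : pm[2525]: %INFRA-Process_Manager-3-PROCESS_DOWN : "
    "Process vm_manager (IID: 0) crashed or exited too often and exceeded max_respawn count.",
    "0/RP0/ADMIN0:Jun 6 04:37:00.855 : shelf_mgr[2696]: %INFRA-SHELF_MGR-3-FAULT_ACTION_VM_RELOAD : "
    "Admin VM reload requested for card 0/RP0",
]

if __name__ == "__main__":
    for (facility, severity, mnemonic), n in summarize(SAMPLE).items():
        print(f"severity {severity}  {facility}  {mnemonic}  x{n}")
```

Run against a longer capture, the counts make it easier to see which message starts each retry cycle and which ones are only consequences of it.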

The only stable part of the boot process is the platform's Linux host, which is accessible by pressing "Ctrl-o" at this prompt during boot:

"If you want to connect to available console, press 'Ctrl-o' <'o' as in orange>"

host login:
Password:

During the initial boot of the NCS 5502-SE, I can interrupt the boot with the "DEL" or "ESC" key, but the only choice offered is to boot from the hard disk. No USB boot option is available.

Any tips on what to do from the Linux host to diagnose the issue, troubleshoot it, or manually kick-start the processes that launch the admin VM and the IOS XR VM?

Thanks

 


smilstea
Cisco Employee

This is probably going to be an RMA.

You can try to delete files on harddisk and disk0, but I don't see how if the VMs aren't up. A full disk is one possible root cause, as the system is complaining that it cannot allocate enough resources to the VM.

What code is this?

An option to try before RMA is to USB boot the device and then check the SMART values for the SSDs.

 

Sam

Hi Sam,

thank you for your feedback.

The thing is that I can log in to the Linux host, so from there I may have some way of freeing up space. But which directories and files should I look for?

[host:0_RP0:/misc/disk1]$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/panini_vol_grp-host_lv223 991512 498324 425604 54% /
none 32730564 0 32730564 0% /dev
run 32737168 460 32736708 1% /run
tmpfs 32737168 20 32737148 1% /var/volatile
tmpfs 32737168 0 32737168 0% /media/ram
none 32737168 0 32737168 0% /dev/shm
none 32737168 0 32737168 0% /dev/shm
/dev/mapper/pci_disk1-ssd_disk1_hostos 1882244 2860 1765720 1% /misc/disk1
/dev/mapper/panini_vol_grp-host_data_scratch_lv0 1781464 7704 1665216 1% /misc/scratch
/dev/mapper/panini_vol_grp-host_data_config_lv0 87152 52 80508 1% /misc/config
/dev/mapper/panini_vol_grp-host_data_log_lv0 471624 129152 307208 30% /var/log
none 512 0 512 0% /mnt
/dev/ram7 14839 14839 0 100% /mnt/ram7
run 32737168 460 32736708 1% /run/netns
run 32737168 460 32736708 1% /run/lxc
/dev/loop0 2453228 1370544 938352 60% /lxc_rootfs/panini_vol_grp-calvados_lv223
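
As a quick check on output like the above, this is a minimal Python sketch that parses `df` text and lists the filesystems at or above a usage threshold. The `parse_df` and `nearly_full` helpers and the 90% threshold are assumptions chosen for illustration; the sample rows are copied from the output above. It only covers what the host OS can see, not the disks inside the VMs.

```python
def parse_df(text):
    """Parse plain `df` output (1K blocks) into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        # Data rows have exactly six fields; the header line splits into seven.
        if len(parts) != 6 or not parts[4].endswith("%"):
            continue
        filesystem, _blocks, _used, avail, pct, mount = parts
        rows.append({
            "filesystem": filesystem,
            "avail_kb": int(avail),
            "used_pct": int(pct.rstrip("%")),
            "mount": mount,
        })
    return rows


def nearly_full(rows, threshold=90):
    """Return the rows whose usage is at or above `threshold` percent."""
    return [r for r in rows if r["used_pct"] >= threshold]


# Hypothetical input: a subset of the rows posted in this thread.
SAMPLE = """
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/panini_vol_grp-host_lv223 991512 498324 425604 54% /
/dev/mapper/pci_disk1-ssd_disk1_hostos 1882244 2860 1765720 1% /misc/disk1
/dev/mapper/panini_vol_grp-host_data_log_lv0 471624 129152 307208 30% /var/log
/dev/ram7 14839 14839 0 100% /mnt/ram7
/dev/loop0 2453228 1370544 938352 60% /lxc_rootfs/panini_vol_grp-calvados_lv223
"""

if __name__ == "__main__":
    for row in nearly_full(parse_df(SAMPLE)):
        print(f'{row["mount"]}: {row["used_pct"]}% used ({row["filesystem"]})')
```

On the rows shown, the only entry at or above the threshold is `/dev/ram7` mounted on `/mnt/ram7`; none of the LVM-backed host filesystems is close to full.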

From the Linux host, can I delete some files to free up space so that the admin and IOS XR VMs can boot?

 

Sam,

some output from the Linux host:

 

[host:0_RP0:/misc/disk1]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 59.6G 0 disk
|-sda1 8:1 0 858.3M 0 part
|-sda2 8:2 0 5.7G 0 part
|-sda3 8:3 0 29.6G 0 part
| |-panini_vol_grp-host_data_scratch_lv0 253:6 0 1.8G 0 lvm /misc/scratch
| |-panini_vol_grp-host_data_log_lv0 253:7 0 492M 0 lvm /var/log
| |-panini_vol_grp-host_data_config_lv0 253:8 0 92M 0 lvm /misc/config
| |-panini_vol_grp-calvados_data_lv0 253:9 0 2G 0 lvm
| |-panini_vol_grp-xr_lv223 253:10 0 4G 0 lvm
| |-panini_vol_grp-host_lv223 253:11 0 1000M 0 lvm /
| |-panini_vol_grp-xr_lcp_lv223 253:12 0 4G 0 lvm
| |-panini_vol_grp-calvados_lv223 253:13 0 2.5G 0 lvm
| |-panini_vol_grp-xr_lv230 253:14 0 4G 0 lvm
| |-panini_vol_grp-host_lv230 253:15 0 1000M 0 lvm
| |-panini_vol_grp-xr_lcp_lv230 253:16 0 4G 0 lvm
| `-panini_vol_grp-calvados_lv230 253:17 0 2.5G 0 lvm
|-sda4 8:4 0 19.1M 0 part
|-sda5 8:5 0 19.7G 0 part
| |-pci_disk1-ssd_disk1_hostos 253:1 0 1.9G 0 lvm /misc/disk1
| |-pci_disk1-ssd_disk1_calvados_1 253:2 0 5.8G 0 lvm
| |-pci_disk1-ssd_disk1_xr_1 253:3 0 5.8G 0 lvm
| |-pci_disk1-xr_data_lv0 253:4 0 3G 0 lvm
| `-pci_disk1-xr_lcp_data_lv0 253:5 0 3G 0 lvm
`-sda6 8:6 0 3.8G 0 part
`-app_vol_grp-app_lv0 253:0 0 3.8G 0 lvm
sdb 8:16 1 14.8G 0 disk
`-sdb1 8:17 1 14.8G 0 part
loop0 7:0 0 2.5G 0 loop /lxc_rootfs/panin
loop1 7:1 0 490M 0 loop
loop2 7:2 0 490M 0 loop
loop3 7:3 0 1G 0 loop
loop4 7:4 0 5.7G 0 loop

That won't work. The XR and sysadmin modes are VMs, so from the hostOS it's like looking at one giant disk image. The hostOS cannot read files from the individual VMs.

Sam,

understood. But from the Linux host everything looks OK hardware-wise. It looks more like a software issue with loading vm_manager, or with "Unable to connect to virtd daemon".

So the solution of last resort would be to boot from external USB. But when I press "ESC" or "DEL" during boot, I get a DOS-like screen, and I cannot get to the "Cisco Boot Manager" with the arrow or Tab keys.

The arrow and Tab keys are unresponsive; please see the screenshot. Any hints on getting the router to boot from external USB?

wgrassot_1-1686073541010.png

 

wgrassot
Level 1

Attaching a screenshot, as the formatting makes the text hard to read:

 

wgrassot
Level 1

To sum up, and to give a bit of background on when it happened:

I was testing the creation of a golden ISO for version 7.5.2, and it all went fine: the machine booted 7.5.2 from the golden ISO. From there I upgraded to 7.7.2, also using a golden ISO, and that went fine as well. I did not "install commit" the golden 7.7.2, as I wanted to test the rollback to golden 7.5.2. So I issued "install reload location all" in the admin VM, which is supposed to bring me back to 7.5.2. Since then, the router fails to complete the boot process:

The host OS is stable, but the admin VM boots, is accessible for two minutes, and then reloads with these messages on the terminal:

"shelf_mgr[2696]: %INFRA-SHELF_MGR-3-FAULT_ACTION_VM_RELOAD : Admin VM reload requested for card 0/RP0"

Msg : vm_manager[2713]: %INFRA-VM_MANAGER-3-MSG_RESOURCE_ALLOCATION_UNRECOVERABLE_ERROR : Allocation of a critical resource failed in vm_manager (Unable to connect to virtd daemon)

Send To Helper Failed - Msg : vm_manager[3964]: %INFRA-VM_MANAGER-4-INFO : Info: VM Manager started. arguments -W

and this keeps going on forever.

The observation: the host OS boots and is stable (I can access it with CTRL-O), while the admin VM boots and keeps reloading after being up for two minutes. I managed to log in to the admin VM and had enough time to issue "sh version" before it rebooted again:

admin connected from 127.0.0.1 using console on sysadmin-vm:0_RP0
sysadmin-vm:0_RP0# sh version
Wed Jun 7 07:33:19.534 UTC+00:00

Cisco IOS XR Admin Software, Version 7.5.2
Copyright (c) 2013-2022 by Cisco Systems, Inc.

Build Information:
Built By : ingunawa
Built On : Tue Apr 26 16:15:30 PDT 2022
Build Host : iox-ucs-101
Workspace : /auto/srcarchive14/prod/7.5.2/ncs5500/ws
Built By : mlinga
Build On : Wed Feb 24 05:06:11 PST 2021
Build Host : sjc-ads-1587
Workspace : /auto/panini-projs2/brcm-sdk-6.5.21/tp-075
Workspace EFR : thirdparty EFR-00000416239 Lineup
Version : 7.5.2
Location : /opt/cisco/calvados/packages/
Label : 7.5.2-552_PCCWG_v1

System uptime is 2 minutes

sysadmin-vm:0_RP0#

Don't know if it helps to shed light.

So I have two minutes to do something in the admin VM to try to stop it from rebooting.
