Re: XRv9K messages and problems

a-gould · ‎11-02-2020

My XRv9K 7.2.1 nodes in EVE-NG are unstable. Occasionally the interfaces are removed completely. Sometimes the nodes reboot. I see these messages:

0/RP0/ADMIN0:Nov 3 03:05:04.214 UTC: vm_manager[2832]: %INFRA-VM_MANAGER-4-INFO : Info: vm_manager started VM default-sdr--2
0/RP0/ADMIN0:Nov 3 03:17:06.093 UTC: mediasvr[2774]: %MEDIASVR-MEDIASVR-4-PARTITION_USAGE_ALERT : High disk usage alert : host /misc/scratch exceeded 84%
0/RP0/ADMIN0:Nov 3 03:32:06.103 UTC: mediasvr[2774]: %MEDIASVR-MEDIASVR-4-PARTITION_USAGE_ALERT : High disk usage alert : host /misc/scratch exceeded 84%
0/RP0/ADMIN0:Nov 3 03:47:06.116 UTC: mediasvr[2774]: %MEDIASVR-MEDIASVR-4-PARTITION_USAGE_ALERT : High disk usage alert : host /misc/scratch exceeded 84%

0/RP0/ADMIN0:Nov 3 02:35:18.101 UTC: vm_manager[2832]: %INFRA-VM_MANAGER-4-INFO : Info: vm_manager started VM default-sdr--2
0/RP0/ADMIN0:Nov 3 02:37:19.278 UTC: vm_manager[2832]: %INFRA-VM_MANAGER-3-MSG_HEARTBEAT_FAILURE : VM default-sdr--1 failed to maintain heartbeat (VM missed multiple heartbeats).
0/RP0/ADMIN0:Nov 3 02:37:19.436 UTC: sdr_mgr[2788]: %SM-SDR_MANAGER-3-MSG_VM_RELOAD_ON_HB_FAILURE : Info :SDR NM : VM Reload on HB failure, sdr default-sdr, vmid 1.
Connection closed by foreign host.
Tue Nov 3 02:39:41 UTC 2020 (/opt/cisco/hostos/bin/xr_con_telnet_wrapper.sh): XR console connection lost to port 9001
Tue Nov 3 02:40:02 UTC 2020 (/opt/cisco/hostos/bin/xr_con_telnet_wrapper.sh): XR console connected on port 9001
Telnet escape character is '^Q'.
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^Q'.
init: Unable to create device: /dev/kmsg
mkdir: cannot create directory '/run': File exists
bootlogd: ioctl(/dev/pts/2, TIOCCONS): Device or resource busy
Configuring network interfaces... done.
Starting system message bus: dbus.

...it seems like the node is restarting.

This happens routinely.
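
For anyone who wants to quantify how often this happens, here is a minimal Python sketch that counts message mnemonics in a saved console log. The regular expression is only an assumption based on the line format quoted above, and `count_mnemonics` is a hypothetical helper name, not part of any Cisco or EVE-NG tool.

```python
import re
from collections import Counter

# Assumed line format, taken from the messages quoted above, e.g.:
# 0/RP0/ADMIN0:Nov 3 02:37:19.278 UTC: vm_manager[2832]: %INFRA-VM_MANAGER-3-MSG_HEARTBEAT_FAILURE : ...
SYSLOG_RE = re.compile(
    r"^(?P<loc>\S+?):(?P<ts>\w{3}\s+\d+\s+[\d:.]+)\s+UTC:\s+"
    r"(?P<proc>\w+)\[\d+\]:\s+%(?P<mnemonic>[A-Z0-9_-]+)\s+:"
)


def count_mnemonics(lines):
    """Count how many times each %FACILITY-SEVERITY-NAME mnemonic appears."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_RE.match(line.strip())
        if match:  # non-syslog lines (boot output, telnet banners) are skipped
            counts[match.group("mnemonic")] += 1
    return counts


# Sample input copied from the log excerpt in this post.
sample = [
    "0/RP0/ADMIN0:Nov 3 02:35:18.101 UTC: vm_manager[2832]: %INFRA-VM_MANAGER-4-INFO : Info: vm_manager started VM default-sdr--2",
    "0/RP0/ADMIN0:Nov 3 02:37:19.278 UTC: vm_manager[2832]: %INFRA-VM_MANAGER-3-MSG_HEARTBEAT_FAILURE : VM default-sdr--1 failed to maintain heartbeat (VM missed multiple heartbeats).",
    "0/RP0/ADMIN0:Nov 3 02:37:19.436 UTC: sdr_mgr[2788]: %SM-SDR_MANAGER-3-MSG_VM_RELOAD_ON_HB_FAILURE : Info :SDR NM : VM Reload on HB failure, sdr default-sdr, vmid 1.",
    "Connection closed by foreign host.",
]

counts = count_mnemonics(sample)
heartbeat_failures = counts["INFRA-VM_MANAGER-3-MSG_HEARTBEAT_FAILURE"]
print(f"heartbeat failures: {heartbeat_failures}")
```

In practice the `sample` list would be replaced by the lines of a captured console log, so the heartbeat failures can be compared before and after a change to the lab.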

Any idea why I'm having these problems?

-Aaron Gould

aaron@gvtc.com

omz · ‎11-17-2020

This happens all the time in EVE-NG with NX-OS v9K. I don't know why. I think it's virtualisation/resource related, but I'm not sure.

Out of 7 nodes in my topology, at least 1 has issues. Reloading the box fixes it most of the time.

a-gould · ‎11-23-2020

OK, so I think the virtual environment (EVE-NG) provider has done something to make the system more stable. My EVE-NG XRv9K nodes have been very stable for several days.

They also have a separate bare-metal product that they put me on, which I believe has fewer layers of virtualization, so my EVE-NG XRv9K nodes there are stable too.

-Aaron

Ferenc Keninger · ‎03-07-2021

Which version of EVE-NG and XRv9K are you using? I constantly experience the same symptom with EVE-NG Community Edition and with XRv9K 6.5.1 and 7.2.2. Very frustrating...

a-gould · ‎03-08-2021

I'm running EVE-NG version 2.0.3-110 (I think this is the Pro version).

My XRv-9K is 7.2.1

I'm totally good now and have been since I made my comment on 11-23-2020 10:08 PM.

The EVE-NG provider did something to stabilize my XRv routers, and later they moved me to a bare-metal install, which I think made it even better.

Ferenc Keninger · ‎10-02-2021

I am still struggling with this issue, even with XRv9K 7.4.1.

For those who do not experience this issue: how many nodes are you running in parallel? For me, 4 nodes seem to run fine, but once the 5th node has completely booted, other nodes start restarting after a few minutes.

I have 128 GB of RAM and 24 vCPUs allocated for my lab.

I wonder if CCIE labs also experience these kinds of issues.

frank.yong.zhao · ‎10-15-2021

I have the same issue with XRv9K 7.3.1. At first there was no issue, but after about 1 day the nodes could not be accessed. I checked with EVE-NG, and after some discussion, disabling UKSM kept the devices normal for longer, but after about 2 days they could not be accessed either. Juniper vMX does not have this issue.

jijain · ‎08-25-2022

Hi, I am facing a similar issue in EVE-NG. Has this issue been resolved for you? If yes, could you please let me know how?

Ferenc Keninger · ‎08-26-2022

In my experience, disabling UKSM and allocating 20 GB of RAM to each node resolved the issue. I have 128 GB of RAM and allocated 110 GB to EVE-NG, so I can run 5 XRv9K nodes without issues. 6 nodes fail.
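
The arithmetic behind those numbers can be sketched in a few lines of Python. This is only a rough estimate: it assumes that, with UKSM disabled, every node's RAM is backed in full, and it ignores hypervisor overhead. `max_nodes` is a hypothetical helper name used for illustration.

```python
def max_nodes(available_ram_gb: int, per_node_ram_gb: int) -> int:
    """Largest number of nodes whose full RAM allocation fits in the available RAM."""
    return available_ram_gb // per_node_ram_gb


EVE_RAM_GB = 110   # RAM allocated to EVE-NG (from the post above)
NODE_RAM_GB = 20   # RAM allocated to each XRv9K node (from the post above)

limit = max_nodes(EVE_RAM_GB, NODE_RAM_GB)
print(f"nodes that fit: {limit}")

for n in (5, 6):
    needed = n * NODE_RAM_GB
    verdict = "fits" if needed <= EVE_RAM_GB else "exceeds the allocation"
    print(f"{n} nodes need {needed} GB: {verdict}")
```

Five nodes need 100 GB, which fits in 110 GB, while six need 120 GB, which is consistent with the sixth node causing failures.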

jijain · ‎08-26-2022

OK, thanks.