11-02-2020 09:00 PM
My XRv9K 7.2.1 nodes in EVE-NG are unstable. Occasionally the interfaces are removed completely. Sometimes the nodes reboot. I see these messages.
0/RP0/ADMIN0:Nov 3 03:05:04.214 UTC: vm_manager[2832]: %INFRA-VM_MANAGER-4-INFO : Info: vm_manager started VM default-sdr--2
0/RP0/ADMIN0:Nov 3 03:17:06.093 UTC: mediasvr[2774]: %MEDIASVR-MEDIASVR-4-PARTITION_USAGE_ALERT : High disk usage alert : host /misc/scratch exceeded 84%
0/RP0/ADMIN0:Nov 3 03:32:06.103 UTC: mediasvr[2774]: %MEDIASVR-MEDIASVR-4-PARTITION_USAGE_ALERT : High disk usage alert : host /misc/scratch exceeded 84%
0/RP0/ADMIN0:Nov 3 03:47:06.116 UTC: mediasvr[2774]: %MEDIASVR-MEDIASVR-4-PARTITION_USAGE_ALERT : High disk usage alert : host /misc/scratch exceeded 84%
0/RP0/ADMIN0:Nov 3 02:35:18.101 UTC: vm_manager[2832]: %INFRA-VM_MANAGER-4-INFO : Info: vm_manager started VM default-sdr--2
0/RP0/ADMIN0:Nov 3 02:37:19.278 UTC: vm_manager[2832]: %INFRA-VM_MANAGER-3-MSG_HEARTBEAT_FAILURE : VM default-sdr--1 failed to maintain heartbeat (VM missed multiple heartbeats).
0/RP0/ADMIN0:Nov 3 02:37:19.436 UTC: sdr_mgr[2788]: %SM-SDR_MANAGER-3-MSG_VM_RELOAD_ON_HB_FAILURE : Info :SDR NM : VM Reload on HB failure, sdr default-sdr, vmid 1.
Connection closed by foreign host.
Tue Nov 3 02:39:41 UTC 2020 (/opt/cisco/hostos/bin/xr_con_telnet_wrapper.sh): XR console connection lost to port 9001
Tue Nov 3 02:40:02 UTC 2020 (/opt/cisco/hostos/bin/xr_con_telnet_wrapper.sh): XR console connected on port 9001
Telnet escape character is '^Q'.
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^Q'.
init: Unable to create device: /dev/kmsg
mkdir: cannot create directory '/run': File exists
bootlogd: ioctl(/dev/pts/2, TIOCCONS): Device or resource busy
Configuring network interfaces... done.
Starting system message bus: dbus.
...it seems like it's restarting.
This happens routinely.
Any idea why I'm having these problems?
-Aaron Gould
aaron@gvtc.com
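
For anyone comparing notes, a minimal sketch of how the syslog lines quoted above could be tallied by message mnemonic, to show how often a heartbeat failure precedes a reload. The regular expression and the `count_mnemonics` helper are assumptions made for illustration, not part of any Cisco tool. The sample lines are copied from the output in this post.

```python
import re
from collections import Counter

# Three lines copied from the console output quoted in the post.
SAMPLE_LOG = """\
0/RP0/ADMIN0:Nov 3 03:17:06.093 UTC: mediasvr[2774]: %MEDIASVR-MEDIASVR-4-PARTITION_USAGE_ALERT : High disk usage alert : host /misc/scratch exceeded 84%
0/RP0/ADMIN0:Nov 3 02:37:19.278 UTC: vm_manager[2832]: %INFRA-VM_MANAGER-3-MSG_HEARTBEAT_FAILURE : VM default-sdr--1 failed to maintain heartbeat (VM missed multiple heartbeats).
0/RP0/ADMIN0:Nov 3 02:37:19.436 UTC: sdr_mgr[2788]: %SM-SDR_MANAGER-3-MSG_VM_RELOAD_ON_HB_FAILURE : Info :SDR NM : VM Reload on HB failure, sdr default-sdr, vmid 1.
"""

# Assumed shape of the message tag: %CATEGORY-GROUP-SEVERITY-MNEMONIC
TAG = re.compile(r"%([A-Z0-9_]+)-([A-Z0-9_]+)-(\d)-([A-Z0-9_]+)")


def count_mnemonics(log_text: str) -> Counter:
    """Count how many times each message mnemonic appears in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TAG.search(line)
        if match:
            counts[match.group(4)] += 1
    return counts


if __name__ == "__main__":
    for mnemonic, count in count_mnemonics(SAMPLE_LOG).most_common():
        print(f"{count:3d}  {mnemonic}")
```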
11-17-2020 12:52 AM
This happens all the time in EVE-NG with NX-OS v9K. I don't know why. I think it's virtualisation/resource related, but I'm not sure.
Out of 7 nodes in my topology, at least 1 has issues. Reloading the box fixes it most times.
11-23-2020 10:08 PM
OK, so I think the virtual environment (EVE-NG) provider has done something to make the system more stable. My EVE-NG xrv9k nodes have been very stable for several days.
They also have a separate bare metal product that they put me on, which I believe has fewer layers of virtualization, so my EVE-NG xrv9k nodes there are stable too.
-Aaron
03-07-2021 12:31 PM
Which version of EVE-NG and xrv9k are you using? I constantly experience the same symptom with EVE-NG Community Edition and with xrv9k 6.5.1 and 7.2.2. Very frustrating...
03-08-2021 07:38 AM
I'm running EVE-NG version: 2.0.3-110 (I think this is the Pro version)
My XRv-9K is 7.2.1
I'm totally good now and have been since I made my comment on 11-23-2020 10:08 PM
The EVE-NG provider did something to stabilize my XRv routers, and then later they moved me to a bare metal install, which I think made it even better.
10-02-2021 04:04 AM
I am still struggling with this issue even with xrv9k 7.4.1.
Those who do not experience this issue: how many nodes are you running in parallel? For me, 4 nodes seem to run fine, but once the 5th node has completely booted, other nodes start restarting after a few minutes.
I have 128 GB RAM and 24 vCPUs allocated for my lab.
I wonder if CCIE labs also experience these kinds of issues?
10-15-2021 02:46 AM
I have the same issue with xrv9k 7.3.1. At first there was no issue, but after about 1 day the nodes could not be accessed. I checked with EVE-NG, and after some discussion the suggestion was that disabling UKSM would keep the devices working normally for a long time. But after about 2 days they could not be accessed again... Juniper vMX does not have this issue.
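
To confirm whether UKSM is actually off on the EVE-NG host, something like the sketch below could be used. It assumes the UKSM kernel patch exposes its switch at `/sys/kernel/mm/uksm/run` (with `1` meaning running and `0` meaning stopped). That path and the `uksm_state` / `interpret` helpers are assumptions for illustration, so verify them on your own host.

```python
from pathlib import Path

# Assumed sysfs location of the UKSM on/off switch; check it exists on your host.
UKSM_RUN = Path("/sys/kernel/mm/uksm/run")


def interpret(value: str) -> str:
    """Map the raw file content to a readable state."""
    value = value.strip()
    if value == "1":
        return "enabled"
    if value == "0":
        return "disabled"
    return "unknown"


def uksm_state(path=UKSM_RUN) -> str:
    """Read the switch file; return 'unknown' if it is missing or unreadable."""
    try:
        return interpret(Path(path).read_text())
    except OSError:
        return "unknown"


if __name__ == "__main__":
    print(f"UKSM is {uksm_state()}")
```

A result of "unknown" only means the file could not be read at the assumed path, not that UKSM is absent.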
08-25-2022 11:59 AM
Hi, I am facing a similar issue in EVE-NG. Has this issue been resolved for you? If yes, could you please let me know how?
08-26-2022 12:38 AM
In my experience, disabling UKSM and allocating 20 GB RAM for each node resolved the issue. I have 128 GB RAM and allocated 110 GB to EVE-NG, so I can run 5 xrv9k nodes without issues. 6 nodes fail.
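
The arithmetic behind those numbers, as a rough sketch: with UKSM disabled there is no memory deduplication between nodes, so each node needs its full allocation backed by real RAM. The `max_nodes` helper is hypothetical and ignores host and hypervisor overhead.

```python
def max_nodes(available_ram_gb: int, ram_per_node_gb: int) -> int:
    """Number of nodes that fit when every node gets its full RAM allocation."""
    if ram_per_node_gb <= 0:
        raise ValueError("ram_per_node_gb must be positive")
    return available_ram_gb // ram_per_node_gb


if __name__ == "__main__":
    # 110 GB given to EVE-NG, 20 GB per xrv9k node (figures from the post above).
    print(max_nodes(110, 20))
```

With 110 GB and 20 GB per node this gives 5, which matches the observation that 5 nodes run and a 6th fails.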
08-26-2022 02:48 AM
ok thanks