02-22-2019 09:45 AM
Hi All
We have built an on-prem Viptela lab on Viptela version 18.4, running on VMware ESXi 6.5. I know that per the Cisco docs the supported ESXi versions are 5.5 and 6.0, but I have found many blogs where people ran the lab on 6.5 without any issue.
Now the problem: once everything is deployed, the vSphere web client shows high CPU usage for the vBond and the vEdges.
While the vManage and vSmart run fine at 300 to 500 MHz, the vBond and vEdges hog around 8 to 10 GHz. If I limit the CPU to 2.3 GHz, the processes become sluggish; even pings initiated from these devices exceed 1000 to 2000 ms.
Note that all the devices in the lab use 2 CPU cores.
02-22-2019 10:03 AM
02-22-2019 10:30 AM
Hi Ekhabaro
I have already gone through that document and understand the poll-mode driver logic. However, that is not the issue here. Per the data sheet, 2 cores with a limit of 2 GHz should be fine for a vBond and a vEdge.
In my case, limiting the CPU to 2 GHz or 3 GHz makes CLI access, command response, etc. slow, so the transport is affected as well.
Check the snippet below.
While the CPU limit is set to 2.3 GHz:
vManage# ping 10.81.80.142 vpn 512 count 10
Ping in VPN 512
PING 10.81.80.142 (10.81.80.142) 56(84) bytes of data.
64 bytes from 10.81.80.142: icmp_seq=1 ttl=64 time=313 ms
64 bytes from 10.81.80.142: icmp_seq=2 ttl=64 time=1148 ms
64 bytes from 10.81.80.142: icmp_seq=3 ttl=64 time=151 ms
64 bytes from 10.81.80.142: icmp_seq=4 ttl=64 time=790 ms
64 bytes from 10.81.80.142: icmp_seq=5 ttl=64 time=40.0 ms
64 bytes from 10.81.80.142: icmp_seq=6 ttl=64 time=1039 ms
64 bytes from 10.81.80.142: icmp_seq=7 ttl=64 time=40.4 ms
64 bytes from 10.81.80.142: icmp_seq=8 ttl=64 time=440 ms
64 bytes from 10.81.80.142: icmp_seq=9 ttl=64 time=1040 ms
64 bytes from 10.81.80.142: icmp_seq=10 ttl=64 time=44.2 ms
--- 10.81.80.142 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 8999ms
rtt min/avg/max/mdev = 40.074/504.812/1148.243/433.299 ms, pipe 2
When the limit is set to 10 GHz or more:
vManage# ping 10.81.80.142 vpn 512 count 10
Ping in VPN 512
PING 10.81.80.142 (10.81.80.142) 56(84) bytes of data.
64 bytes from 10.81.80.142: icmp_seq=1 ttl=64 time=0.359 ms
64 bytes from 10.81.80.142: icmp_seq=2 ttl=64 time=0.226 ms
64 bytes from 10.81.80.142: icmp_seq=3 ttl=64 time=0.290 ms
64 bytes from 10.81.80.142: icmp_seq=4 ttl=64 time=0.267 ms
64 bytes from 10.81.80.142: icmp_seq=5 ttl=64 time=0.295 ms
64 bytes from 10.81.80.142: icmp_seq=6 ttl=64 time=0.277 ms
64 bytes from 10.81.80.142: icmp_seq=7 ttl=64 time=0.222 ms
64 bytes from 10.81.80.142: icmp_seq=8 ttl=64 time=0.265 ms
64 bytes from 10.81.80.142: icmp_seq=9 ttl=64 time=0.285 ms
64 bytes from 10.81.80.142: icmp_seq=10 ttl=64 time=0.265 ms
--- 10.81.80.142 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9000ms
rtt min/avg/max/mdev = 0.222/0.275/0.359/0.037 ms
04-16-2019 11:53 AM - edited 04-16-2019 01:31 PM
In the lab we have VMware ESXi 6.0.0, build 7967664.
On a vEdge Cloud with 2 CPUs I ran pings in VPN 0 and VPN 512.
The CPUs show 2992 MHz used.
vEdgeCloud2# vsh
vEdgeCloud2:~$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Stepping:              2
CPU MHz:               2494.224
BogoMIPS:              4988.44
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
vEdgeCloud2:~$
The pings in VPN 0 and VPN 512 to vManage are below; I do not see outliers above 1000 ms.
vEdgeCloud2# ping vpn 512 10.48.87.227
Ping in VPN 512
PING 10.48.87.227 (10.48.87.227) 56(84) bytes of data.
64 bytes from 10.48.87.227: icmp_seq=1 ttl=64 time=0.351 ms
64 bytes from 10.48.87.227: icmp_seq=2 ttl=64 time=0.202 ms
64 bytes from 10.48.87.227: icmp_seq=3 ttl=64 time=0.156 ms
64 bytes from 10.48.87.227: icmp_seq=4 ttl=64 time=0.196 ms
64 bytes from 10.48.87.227: icmp_seq=5 ttl=64 time=0.205 ms
vEdgeCloud2# ping 192.168.0.227
Ping in VPN 0
PING 192.168.0.227 (192.168.0.227) 56(84) bytes of data.
64 bytes from 192.168.0.227: icmp_seq=1 ttl=63 time=0.466 ms
64 bytes from 192.168.0.227: icmp_seq=2 ttl=63 time=0.606 ms
64 bytes from 192.168.0.227: icmp_seq=3 ttl=63 time=0.499 ms
64 bytes from 192.168.0.227: icmp_seq=4 ttl=63 time=0.394 ms
64 bytes from 192.168.0.227: icmp_seq=5 ttl=63 time=0.633 ms
A cloud vEdge and a vBond both use the DPDK poll-mode driver for packet handling.
Here are pings from vManage to the cloud vEdge.
vmanage# ping vpn 0 10.10.10.233
Ping in VPN 0
PING 10.10.10.233 (10.10.10.233) 56(84) bytes of data.
64 bytes from 10.10.10.233: icmp_seq=1 ttl=64 time=0.834 ms
64 bytes from 10.10.10.233: icmp_seq=2 ttl=64 time=0.919 ms
64 bytes from 10.10.10.233: icmp_seq=3 ttl=64 time=0.744 ms
64 bytes from 10.10.10.233: icmp_seq=4 ttl=64 time=0.654 ms
64 bytes from 10.10.10.233: icmp_seq=5 ttl=64 time=0.613 ms
64 bytes from 10.10.10.233: icmp_seq=6 ttl=64 time=0.592 ms
^C
--- 10.10.10.233 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5000ms
rtt min/avg/max/mdev = 0.592/0.726/0.919/0.119 ms
vmanage# ping vpn 512 10.48.87.233
Ping in VPN 512
PING 10.48.87.233 (10.48.87.233) 56(84) bytes of data.
64 bytes from 10.48.87.233: icmp_seq=1 ttl=64 time=0.255 ms
64 bytes from 10.48.87.233: icmp_seq=2 ttl=64 time=0.205 ms
64 bytes from 10.48.87.233: icmp_seq=3 ttl=64 time=0.157 ms
64 bytes from 10.48.87.233: icmp_seq=4 ttl=64 time=0.214 ms
^C
--- 10.48.87.233 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.157/0.207/0.255/0.039 ms
vmanage#
The latency seems rather big.
04-16-2019 12:38 PM - edited 04-16-2019 12:39 PM
Hi Danny,
In my case the ping response is quick (I'm not limiting the CPUs; see below). The question is why the CPU usage is so high (my findings are at the end; please correct me if they're wrong). I have read this document: https://www.cisco.com/c/en/us/support/docs/routers/vedge-router/213351-understand-high-cpu-utilization-that-is.html
vEdgeCloud with 2 CPUs (lab, no data plane traffic):
top - 20:15:29 up 9:12, 1 user, load average: 2.16, 2.19, 2.15
Tasks: 130 total, 4 running, 126 sleeping, 0 stopped, 0 zombie
Cpu0 : 2.0%us, 4.3%sy, 0.0%ni, 93.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 81.7%us, 18.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1909344k total, 1289140k used, 620204k free, 77468k buffers
Swap: 0k total, 0k used, 0k free, 234184k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
697 root 20 0 795m 73m 34m R 100 3.9 552:28.75 fp-um-1
<cut>
vEdgeCloud with 4 CPUs (lab, no data plane traffic):
top - 20:20:08 up 9:16, 1 user, load average: 4.67, 4.23, 4.11
Tasks: 143 total, 3 running, 140 sleeping, 0 stopped, 0 zombie
Cpu0 : 1.9%us, 4.8%sy, 0.0%ni, 93.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 82.3%us, 17.6%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 78.6%us, 21.4%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 79.1%us, 20.9%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1909344k total, 1322380k used, 586964k free, 81332k buffers
Swap: 0k total, 0k used, 0k free, 250196k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
709 root 20 0 944m 73m 35m R 301 4.0 1671:16 fp-um-1
<cut>
Based on the above-linked document (part #1):
Based on the linked document (part #2):
Can you confirm my findings?
The reason for the 100% CPU usage (on the data plane) is packet processing running in a loop (the fp-um process). In other words, it is expected behavior, right?
Question: Is there any guideline from Cisco for limiting CPU in a VM environment for vEdge Cloud routers, or is the best practice to leave it unlimited?
martin
04-16-2019 01:35 PM
Yes, your findings are correct. The data-plane processors run high due to constant polling, while the control processor is virtually idle in the absence of control traffic. The system status shows the number of control versus data processors:
vEdgeCloud2# show system status

Viptela (tm) vedge Operating System Software
Copyright (c) 2013-2017 by Viptela, Inc.
Controller Compatibility:
Version: 18.4.1
Build: 29

System logging to host   is disabled
System logging to disk   is enabled

System state:            GREEN. All daemons up
System FIPS state:       Enabled
Last reboot:             Unknown. Core files found
CPU-reported reboot:     Not Applicable
Boot loader version:     Not applicable
System uptime:           18 days 04 hrs 14 min 10 sec
Current time:            Tue Apr 16 22:34:26 CEST 2019

Load average:            1 minute: 1.10, 5 minutes: 1.20, 15 minutes: 1.17
Processes:               193 total
CPU allocation:          2 total, 1 control, 1 data
CPU states:              1.98% user, 4.30% system, 93.70% idle
Memory usage:            5963804K total, 2212872K used, 2933796K free
                         106852K buffers, 710284K cache

Disk usage:              Filesystem  Size  Used  Avail  Use%  Mounted on
                         /dev/root   7615M 1011M 6178M  14%   /

Personality:             vedge
Model name:              vedge-cloud
Services:                None
vManaged:                false
Commit pending:          false
Configuration template:  None
vEdgeCloud2#
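The constant-polling behaviour can be sketched in a few lines. This is an illustrative Python model only, not the real implementation (the actual fp-um data plane is DPDK-based C code; `EmptyQueue` and `poll_mode_rx` are invented names): a receive loop that never blocks burns a full core even when zero packets arrive, which is exactly why the data cores sit at ~100%.

```python
import time

class EmptyQueue:
    """Stand-in for a NIC receive queue that never has packets."""
    def drain(self):
        return []

def poll_mode_rx(queue, budget_s=0.05):
    """DPDK-style receive loop: spin on the queue, never block.
    Consumes a full core whether or not traffic arrives."""
    deadline = time.monotonic() + budget_s
    polls = pkts = 0
    while time.monotonic() < deadline:
        polls += 1                  # spins flat out -> ~100% CPU to the OS
        pkts += len(queue.drain())  # almost always returns zero packets
    return polls, pkts

polls, pkts = poll_mode_rx(EmptyQueue())
print(f"{polls} polls, {pkts} packets received")  # many polls, 0 packets
```

An interrupt-driven driver would sleep between packets and show near-idle CPU; the poll-mode design trades that idle time for lower and more predictable forwarding latency.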
04-16-2019 10:12 PM
Hi Danny
Like I mentioned earlier, we are using ESXi 6.5. Also, the CPU utilisation is nominal in the shell or at the CLI prompt. But in vSphere, utilisation peaks at 8.9 GHz when the CPU allocation is unlimited, and confining it to any value below 8.9 GHz causes SSH access and ICMP delays (refer to the attachment). I'm pretty sure this will affect data traffic too.
BR02A:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8176 CPU @ 2.10GHz
Stepping: 4
CPU MHz: 2095.078
BogoMIPS: 4190.15
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 39424K
Regards
Ajinkya P.
04-17-2019 12:59 AM - edited 04-17-2019 01:00 AM
Hi Ajinkya,
8 GHz is normal because you are using 4 CPUs per vEdge. Each CPU in your case is 2.1 GHz, so it is ~8 GHz in total. Share your CPU usage with the "top" command, then press "1" (show all CPUs).
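A back-of-envelope check of that arithmetic (the helper name is made up; the clock speeds are the ones mentioned in this thread): the worst case vSphere can show is simply every vCPU running flat out at the host core's clock.

```python
def peak_cpu_demand_ghz(vcpus: int, clock_ghz: float) -> float:
    """Worst-case CPU demand vSphere can report for a VM:
    all vCPUs running flat out at the host core's clock speed."""
    return round(vcpus * clock_ghz, 2)

# 4 vCPUs on a 2.1 GHz Xeon Platinum 8176 -> the ~8-9 GHz seen in vSphere
print(peak_cpu_demand_ghz(4, 2.1))  # 8.4
# 2 vCPUs on a 2.2 GHz Xeon Silver 4114; vSphere actually shows ~2.6 GHz
# because only the data core spins while Cpu0 stays mostly idle
print(peak_cpu_demand_ghz(2, 2.2))  # 4.4
```

The observed value lands between one busy data core and this ceiling, since the control core (Cpu0) contributes very little.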
Here is my output for vEdgeCloud with 2 and 4 cpus (no data plane traffic):
2x CPU (ESX shows 2.6GHz):
vEdge41:~$ lscpu | egrep "CPU\(s\)|Model\ name:"
CPU(s):                2
On-line CPU(s) list:   0,1
Model name:            Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz

top - 10:43:32 up 17:29, 1 user, load average: 2.10, 2.12, 2.11
Tasks: 135 total, 2 running, 133 sleeping, 0 stopped, 0 zombie
Cpu0 : 3.0%us, 5.0%sy, 0.0%ni, 92.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 80.5%us, 19.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1905884k total, 1331648k used, 574236k free, 94204k buffers
Swap: 0k total, 0k used, 0k free, 251004k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
683 root 20 0 793m 73m 35m R 100 4.0 1050:04 fp-um-1
4x CPU - the same vEdge (ESX shows 7.5GHz):
vEdge41:~$ lscpu | egrep "CPU\(s\)|Model\ name:"
CPU(s): 4
On-line CPU(s) list: 0-3
Model name: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
top - 10:52:32 up 1 min, 1 user, load average: 3.30, 1.11, 0.39
Tasks: 150 total, 2 running, 148 sleeping, 0 stopped, 0 zombie
Cpu0 : 11.8%us, 11.8%sy, 0.0%ni, 75.9%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 70.8%us, 19.1%sy, 0.0%ni, 10.0%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 64.3%us, 24.3%sy, 0.0%ni, 11.1%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 64.8%us, 22.8%sy, 0.0%ni, 12.4%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1905884k total, 1271804k used, 634080k free, 41224k buffers
Swap: 0k total, 0k used, 0k free, 250120k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
709 root 20 0 942m 74m 35m R 300 4.0 4:07.54 fp-um-1
As we discussed earlier, the CPUs used for the data plane show higher utilization (because of the fp-um process). The first CPU (Cpu0) is used only for the control plane and shows "normal" utilization. The reason: VMs have no hardware components for efficient packet forwarding, so the solution is a software process running in a tight loop => high CPU load.
martin
04-17-2019 12:30 PM
For data-plane performance, we should sink/originate traffic on a service VPN. All the test results so far target the control processor (ICMP, SSH to the device, etc.).
Data-processor performance should be tested as follows:
host 1 --- VPN 1 --- vEdge1 --- transport VPN 0 --- vEdge2 --- VPN 1 --- host 2
Pings from host 1 to host 2 will show data-plane behaviour.
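When comparing such runs from host 1 (with vs. without a CPU limit), it helps to pull the numbers out of the ping summary programmatically. A small hypothetical helper (`parse_rtt` is an invented name, not part of any Viptela tooling) that parses the `rtt min/avg/max/mdev` line which Linux and Viptela `ping` both print:

```python
import re

def parse_rtt(ping_output: str) -> dict:
    """Extract min/avg/max/mdev (in ms) from a Linux-style `ping` summary."""
    m = re.search(
        r"rtt min/avg/max/mdev = "
        r"([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms",
        ping_output,
    )
    if m is None:
        raise ValueError("no rtt summary line found")
    keys = ("min", "avg", "max", "mdev")
    return dict(zip(keys, (float(g) for g in m.groups())))

# Summary line from the 2.3 GHz-limited run earlier in this thread:
stats = parse_rtt("rtt min/avg/max/mdev = 40.074/504.812/1148.243/433.299 ms, pipe 2")
print(stats["avg"])  # 504.812
```

Feeding it the unlimited-CPU run instead would show the average dropping from ~505 ms to well under 1 ms, which quantifies the effect of the limit.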
04-16-2019 12:52 AM
Hi,
did you solve this issue? I'm also fighting with the same behavior.
martin
04-16-2019 10:30 AM
Hi Martin
I have not been able to resolve this issue. I have tried every version from 17.x to 18.4.x; all show the same issue on ESXi 6.5.
The Cisco folks are not replying either. :(
04-16-2019 11:17 AM
If it's in production with support, open an SR with TAC. My scenario is in a lab.
08-13-2021 07:56 AM
For anyone reading this discussion:
There is a command that reduces the CPU consumption:
vedge# config
Entering configuration mode terminal
vedge(config)# system eco-friendly-mode
vedge(config-system)# commit
10-18-2023 01:18 AM
Great, this solved my problem.