Re: vBond and vEdge CPU Usage

Ajinkya.Purohit · 02-22-2019

Hi all,

We have built an on-prem Viptela lab running Viptela 18.4 on VMware ESXi 6.5. I know that, according to the Cisco docs, the supported ESXi versions are 5.5 and 6.0, but I have found many blogs where people have run the lab on 6.5 without any issue.

The problem is that once everything is deployed, the vSphere Web GUI shows high CPU usage for the vBond and vEdges.

While vManage and vSmart run fine at 300 to 500 MHz, the vBond and vEdges are consuming around 8 to 10 GHz. If I limit the CPU to 2.3 GHz, the processes become sluggish, and even pings initiated from these devices take 1000 to 2000 ms.

Note that all devices in the lab use 2 CPU cores.

ekhabaro · 02-22-2019

Hi, this article should explain this behavior: https://www.cisco.com/c/en/us/support/docs/routers/vedge-router/213351-understand-high-cpu-utilization-that-is.html

Ajinkya.Purohit · 02-22-2019

Hi Ekhabaro

I have already gone through this document and understand the poll-mode driver logic. However, that is not the issue here. According to the data sheet, 2 cores with a limit of 2 GHz should be fine for vBond and vEdge.

In my case, limiting the CPU to 2 GHz or 3 GHz makes CLI access, command responses, etc. slow, and the transport is affected as well.

See the output below.

With the CPU limit set to 2.3 GHz:

vManage# ping 10.81.80.142 vpn 512 count 10
Ping in VPN 512
PING 10.81.80.142 (10.81.80.142) 56(84) bytes of data.
64 bytes from 10.81.80.142: icmp_seq=1 ttl=64 time=313 ms
64 bytes from 10.81.80.142: icmp_seq=2 ttl=64 time=1148 ms
64 bytes from 10.81.80.142: icmp_seq=3 ttl=64 time=151 ms
64 bytes from 10.81.80.142: icmp_seq=4 ttl=64 time=790 ms
64 bytes from 10.81.80.142: icmp_seq=5 ttl=64 time=40.0 ms
64 bytes from 10.81.80.142: icmp_seq=6 ttl=64 time=1039 ms
64 bytes from 10.81.80.142: icmp_seq=7 ttl=64 time=40.4 ms
64 bytes from 10.81.80.142: icmp_seq=8 ttl=64 time=440 ms
64 bytes from 10.81.80.142: icmp_seq=9 ttl=64 time=1040 ms
64 bytes from 10.81.80.142: icmp_seq=10 ttl=64 time=44.2 ms

--- 10.81.80.142 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 8999ms
rtt min/avg/max/mdev = 40.074/504.812/1148.243/433.299 ms, pipe 2

With the limit set to 10 GHz or more:

vManage# ping 10.81.80.142 vpn 512 count 10
Ping in VPN 512
PING 10.81.80.142 (10.81.80.142) 56(84) bytes of data.
64 bytes from 10.81.80.142: icmp_seq=1 ttl=64 time=0.359 ms
64 bytes from 10.81.80.142: icmp_seq=2 ttl=64 time=0.226 ms
64 bytes from 10.81.80.142: icmp_seq=3 ttl=64 time=0.290 ms
64 bytes from 10.81.80.142: icmp_seq=4 ttl=64 time=0.267 ms
64 bytes from 10.81.80.142: icmp_seq=5 ttl=64 time=0.295 ms
64 bytes from 10.81.80.142: icmp_seq=6 ttl=64 time=0.277 ms
64 bytes from 10.81.80.142: icmp_seq=7 ttl=64 time=0.222 ms
64 bytes from 10.81.80.142: icmp_seq=8 ttl=64 time=0.265 ms
64 bytes from 10.81.80.142: icmp_seq=9 ttl=64 time=0.285 ms
64 bytes from 10.81.80.142: icmp_seq=10 ttl=64 time=0.265 ms

--- 10.81.80.142 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9000ms
rtt min/avg/max/mdev = 0.222/0.275/0.359/0.037 ms
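The summary line ping prints (min/avg/max/mdev) can be recomputed from the per-packet times, which makes the jitter in the limited run easy to quantify. This is a minimal Python sketch, assuming the iputils definition of mdev as the population standard deviation of the RTTs, sqrt(mean(t^2) - mean(t)^2). Because the per-packet times shown above are rounded, the results differ slightly from the printed summary lines. The helper times_from_ping_output is a hypothetical convenience for pasted output, not part of any Cisco tool.

```python
import math
import re

# Per-packet times (ms) copied from the two vManage ping runs above.
LIMITED_2_3GHZ = [313, 1148, 151, 790, 40.0, 1039, 40.4, 440, 1040, 44.2]
UNLIMITED_10GHZ = [0.359, 0.226, 0.290, 0.267, 0.295,
                   0.277, 0.222, 0.265, 0.285, 0.265]


def rtt_stats(times_ms):
    """Return (min, avg, max, mdev) in the style of iputils ping."""
    n = len(times_ms)
    avg = sum(times_ms) / n
    mean_sq = sum(t * t for t in times_ms) / n
    # mdev is the population standard deviation, not the mean absolute deviation
    mdev = math.sqrt(max(mean_sq - avg * avg, 0.0))
    return min(times_ms), avg, max(times_ms), mdev


def times_from_ping_output(text):
    """Extract the time=... values (ms) from pasted ping output."""
    return [float(v) for v in re.findall(r"time=([\d.]+) ms", text)]


if __name__ == "__main__":
    for label, times in (("2.3 GHz limit", LIMITED_2_3GHZ),
                         ("10 GHz limit", UNLIMITED_10GHZ)):
        lo, avg, hi, mdev = rtt_stats(times)
        print(f"{label}: min/avg/max/mdev = "
              f"{lo:.3f}/{avg:.3f}/{hi:.3f}/{mdev:.3f} ms")
```

An mdev close to the average (about 433 ms against about 505 ms) shows the delay is bursty rather than a constant offset.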

Danny De Ridder · 04-16-2019

In the lab we have VMware ESXi 6.0.0 (build 7967664).

On a vEdge Cloud with 2 CPUs, I ran pings in VPN 0 and VPN 512.

The CPUs show 2992 MHz in use.

vEdgeCloud2# vsh
vEdgeCloud2:~$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Stepping:              2
CPU MHz:               2494.224
BogoMIPS:              4988.44
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
vEdgeCloud2:~$

Below are the pings to vManage in VPN 0 and VPN 512. I do not see outliers above 1000 ms.

vEdgeCloud2# ping vpn 512 10.48.87.227
Ping in VPN 512
PING 10.48.87.227 (10.48.87.227) 56(84) bytes of data.
64 bytes from 10.48.87.227: icmp_seq=1 ttl=64 time=0.351 ms
64 bytes from 10.48.87.227: icmp_seq=2 ttl=64 time=0.202 ms
64 bytes from 10.48.87.227: icmp_seq=3 ttl=64 time=0.156 ms
64 bytes from 10.48.87.227: icmp_seq=4 ttl=64 time=0.196 ms
64 bytes from 10.48.87.227: icmp_seq=5 ttl=64 time=0.205 ms

vEdgeCloud2# ping 192.168.0.227
Ping in VPN 0
PING 192.168.0.227 (192.168.0.227) 56(84) bytes of data.
64 bytes from 192.168.0.227: icmp_seq=1 ttl=63 time=0.466 ms
64 bytes from 192.168.0.227: icmp_seq=2 ttl=63 time=0.606 ms
64 bytes from 192.168.0.227: icmp_seq=3 ttl=63 time=0.499 ms
64 bytes from 192.168.0.227: icmp_seq=4 ttl=63 time=0.394 ms
64 bytes from 192.168.0.227: icmp_seq=5 ttl=63 time=0.633 ms

vEdge Cloud and vBond both use the DPDK driver for packet handling.

Here are pings from vManage to the cloud vEdge.

vmanage# ping vpn 0 10.10.10.233
Ping in VPN 0
PING 10.10.10.233 (10.10.10.233) 56(84) bytes of data.
64 bytes from 10.10.10.233: icmp_seq=1 ttl=64 time=0.834 ms
64 bytes from 10.10.10.233: icmp_seq=2 ttl=64 time=0.919 ms
64 bytes from 10.10.10.233: icmp_seq=3 ttl=64 time=0.744 ms
64 bytes from 10.10.10.233: icmp_seq=4 ttl=64 time=0.654 ms
64 bytes from 10.10.10.233: icmp_seq=5 ttl=64 time=0.613 ms
64 bytes from 10.10.10.233: icmp_seq=6 ttl=64 time=0.592 ms
^C
--- 10.10.10.233 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5000ms
rtt min/avg/max/mdev = 0.592/0.726/0.919/0.119 ms
vmanage# ping vpn 512 10.48.87.233
Ping in VPN 512
PING 10.48.87.233 (10.48.87.233) 56(84) bytes of data.
64 bytes from 10.48.87.233: icmp_seq=1 ttl=64 time=0.255 ms
64 bytes from 10.48.87.233: icmp_seq=2 ttl=64 time=0.205 ms
64 bytes from 10.48.87.233: icmp_seq=3 ttl=64 time=0.157 ms
64 bytes from 10.48.87.233: icmp_seq=4 ttl=64 time=0.214 ms
^C
--- 10.48.87.233 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.157/0.207/0.255/0.039 ms
vmanage#

The latency seems rather high.

Martin Kyrc · 04-16-2019

Hi Danny,

In my case, the ping response is quick (I'm not limiting the CPUs, see below). The question is why CPU usage is so high (my findings, which may be correct, are at the end). I have read this document: https://www.cisco.com/c/en/us/support/docs/routers/vedge-router/213351-understand-high-cpu-utilization-that-is.html

vEdgeCloud with 2 CPUs (lab, no data plane traffic):

top - 20:15:29 up  9:12,  1 user,  load average: 2.16, 2.19, 2.15
Tasks: 130 total,   4 running, 126 sleeping,   0 stopped,   0 zombie
Cpu0  :  2.0%us,  4.3%sy,  0.0%ni, 93.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 81.7%us, 18.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1909344k total,  1289140k used,   620204k free,    77468k buffers
Swap:        0k total,        0k used,        0k free,   234184k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  697 root      20   0  795m  73m  34m R  100  3.9 552:28.75 fp-um-1
<cut>

vEdgeCloud with 4 CPUs (lab, no data plane traffic):

top - 20:20:08 up  9:16,  1 user,  load average: 4.67, 4.23, 4.11
Tasks: 143 total,   3 running, 140 sleeping,   0 stopped,   0 zombie
Cpu0  :  1.9%us,  4.8%sy,  0.0%ni, 93.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 82.3%us, 17.6%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 78.6%us, 21.4%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 79.1%us, 20.9%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1909344k total,  1322380k used,   586964k free,    81332k buffers
Swap:        0k total,        0k used,        0k free,   250196k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  709 root      20   0  944m  73m  35m R  301  4.0   1671:16 fp-um-1
<cut>

Based on the linked document above (part #1):

The first core (Cpu0) is used for the control plane and has normal utilization.
The other cores (Cpu1, or Cpu1, 2, 3, etc.) are used for data plane processing, with almost 100% utilization.
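The split described in part #1 can be checked from the per-core lines of top (press "1" to show them). This Python sketch computes busy% = 100 - idle% for each CpuN line and flags cores pinned near 100%. The 95% threshold is an arbitrary assumption for illustration, not a Cisco figure, and the sample lines are copied from the 4-CPU output above.

```python
import re

# Per-core lines copied from the 4-CPU vEdgeCloud top output above.
TOP_SAMPLE = """\
Cpu0  :  1.9%us,  4.8%sy,  0.0%ni, 93.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 82.3%us, 17.6%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 78.6%us, 21.4%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 79.1%us, 20.9%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
"""

# Captures the core name and its idle percentage from one top line.
CPU_LINE = re.compile(r"^(Cpu\d+)\s*:.*?([\d.]+)%id", re.MULTILINE)

# Hypothetical cut-off: a core at least this busy is treated as a spinning poller.
POLLER_BUSY_PCT = 95.0


def per_core_busy(top_text):
    """Map each CpuN to its busy percentage (100 - idle)."""
    return {cpu: round(100.0 - float(idle), 1)
            for cpu, idle in CPU_LINE.findall(top_text)}


def classify(busy):
    """Label each core 'poller' (pinned) or 'other' (control or idle)."""
    return {cpu: ("poller" if pct >= POLLER_BUSY_PCT else "other")
            for cpu, pct in busy.items()}


if __name__ == "__main__":
    busy = per_core_busy(TOP_SAMPLE)
    for cpu, label in classify(busy).items():
        print(f"{cpu}: {busy[cpu]:5.1f}% busy -> {label}")
```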

Based on the linked document (part #2):

The fp-um process on the VM (vEdgeCloud) is equivalent to the fast path on hardware (vEdgeNNNN).
In other words, it is used for efficient packet processing based on the Data Plane Development Kit (DPDK) framework (details here: https://doc.dpdk.org/guides/prog_guide/poll_mode_drv.html).
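To see why a poll-mode driver keeps a core at 100% even without traffic, here is a toy Python model of a busy-poll receive loop. It is not DPDK or fp-um code: real DPDK applications call rte_eth_rx_burst() from C in an endless loop, and the burst size of 32 here is only an illustrative assumption. The point is that the loop spins, and burns CPU, whether or not packets arrive, whereas an interrupt-driven receiver would sleep until woken.

```python
from collections import deque

BURST_SIZE = 32  # illustrative assumption, a value common in DPDK examples


def rx_burst(ring, max_pkts):
    """Toy stand-in for a burst receive: take up to max_pkts, never block."""
    pkts = []
    while ring and len(pkts) < max_pkts:
        pkts.append(ring.popleft())
    return pkts


def poll_loop(ring, iterations):
    """Spin a fixed number of times (a real poller spins forever).

    Returns (iterations, packets_handled, empty_polls). Every iteration
    costs CPU time, including the ones that find the ring empty.
    """
    handled = 0
    empty_polls = 0
    for _ in range(iterations):
        pkts = rx_burst(ring, BURST_SIZE)
        if not pkts:
            empty_polls += 1  # no sleep here: go straight back to polling
            continue
        handled += len(pkts)  # "forward" the packets
    return iterations, handled, empty_polls


if __name__ == "__main__":
    # Lab with no data plane traffic: all polls are empty, yet the loop never rests.
    print(poll_loop(deque(), 100_000))
    # 100 queued packets drain in 4 bursts; the remaining polls find nothing.
    print(poll_loop(deque(range(100)), 10))
```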

Can you confirm my findings?

The reason for the 100% CPU usage (for the data plane) is packet processing running in a loop (the fp-um process). In other words, it is intended behavior. Right?

Question: Is there any guideline from Cisco for limiting CPU for vEdgeCloud routers in a VM environment? Or is the best practice to leave it unlimited?

martin

Danny De Ridder · 04-16-2019

Yes, your findings are correct. The data plane processors run high due to constant polling, and the control processor is virtually idle in the absence of control traffic. The system status shows the number of control versus data processors:

vEdgeCloud2# show system status

Viptela (tm) vedge Operating System Software
Copyright (c) 2013-2017 by Viptela, Inc.
Controller Compatibility:
Version: 18.4.1
Build: 29


System logging to host  is disabled
System logging to disk is enabled

System state:            GREEN. All daemons up
System FIPS state:       Enabled

Last reboot:             Unknown. Core files found
CPU-reported reboot:     Not Applicable
Boot loader version:     Not applicable
System uptime:           18 days 04 hrs 14 min 10 sec
Current time:            Tue Apr 16 22:34:26 CEST 2019

Load average:            1 minute: 1.10, 5 minutes: 1.20, 15 minutes: 1.17
Processes:               193 total
CPU allocation:          2 total,   1 control,   1 data
CPU states:              1.98% user,   4.30% system,   93.70% idle
Memory usage:            5963804K total,    2212872K used,   2933796K free
                         106852K buffers,  710284K cache

Disk usage:              Filesystem      Size   Used  Avail   Use %  Mounted on
                         /dev/root       7615M  1011M  6178M   14%   /


Personality:             vedge
Model name:              vedge-cloud
Services:                None
vManaged:                false
Commit pending:          false
Configuration template:  None

vEdgeCloud2#
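To pull the control/data split out of this output in a script (for example, when checking several vEdges), a small parser for the "CPU allocation" line is enough. This Python sketch assumes the line keeps exactly the format shown above; other releases may print it differently.

```python
import re
from typing import NamedTuple, Optional


class CpuAllocation(NamedTuple):
    total: int
    control: int
    data: int


# Matches e.g. "CPU allocation:          2 total,   1 control,   1 data"
ALLOC_RE = re.compile(
    r"CPU allocation:\s*(\d+)\s+total,\s*(\d+)\s+control,\s*(\d+)\s+data")


def parse_cpu_allocation(status_text: str) -> Optional[CpuAllocation]:
    """Return the CPU split from 'show system status' text, or None."""
    m = ALLOC_RE.search(status_text)
    if m is None:
        return None
    return CpuAllocation(*(int(g) for g in m.groups()))


if __name__ == "__main__":
    alloc = parse_cpu_allocation(
        "CPU allocation:          2 total,   1 control,   1 data")
    print(alloc)
    # The data cores are the ones expected to sit near 100% (poll mode).
    print(f"{alloc.data} of {alloc.total} cores run the data plane poller")
```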

Ajinkya.Purohit · 04-16-2019

Hi Danny

As I mentioned earlier, we are using ESXi 6.5. The CPU utilisation looks nominal from the shell or CLI prompt, but in vSphere it peaks at 8.9 GHz when the CPU allocation is unlimited, and confining it to any value below 8.9 GHz causes SSH access and ICMP delays (see attachment). I'm pretty sure this will affect data traffic too.

BR02A:~$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Platinum 8176 CPU @ 2.10GHz
Stepping:              4
CPU MHz:               2095.078
BogoMIPS:              4190.15
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              39424K

Regards

Ajinkya P.

Martin Kyrc · 04-17-2019

Hi Ajinkya,

8 GHz is normal because you are using 4 CPUs per vEdge. In your case each CPU runs at 2.1 GHz, so that is ~8 GHz in total. Share your CPU usage with the "top" command, then press "1" (to show all CPUs).
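Martin's point can be written out as arithmetic. vSphere reports a VM's CPU usage in MHz summed over its vCPUs, so the nominal ceiling is vCPUs x clock (4 x 2.1 GHz = 8.4 GHz for BR02A). A slightly finer sketch counts the poll-mode data vCPUs as fully busy and the control vCPU at the busy% read from top. This is a rough model under stated assumptions: it ignores turbo clocks and hypervisor overhead, so vSphere figures such as the 8.9 GHz reported above, or the 2.6 GHz in the output below, can come out higher than the estimate.

```python
def nominal_ceiling_mhz(vcpus, clock_mhz):
    """Upper bound at nominal clock: every vCPU busy 100%."""
    return vcpus * clock_mhz


def estimated_usage_mhz(data_vcpus, control_vcpus, clock_mhz, control_busy_pct):
    """Rough estimate: data vCPUs at 100% (poll mode), control vCPUs at
    the busy% read from top (100 - idle). Ignores turbo and overhead."""
    data = data_vcpus * clock_mhz
    control = control_vcpus * clock_mhz * control_busy_pct / 100.0
    return data + control


if __name__ == "__main__":
    # BR02A: 4 vCPUs on a 2.1 GHz Xeon Platinum 8176
    print(nominal_ceiling_mhz(4, 2100))
    # vEdge41: 2 vCPUs on a 2.2 GHz Xeon Silver 4114, Cpu0 about 8% busy
    print(estimated_usage_mhz(1, 1, 2200, 8.0))
```

The first call gives 8400 MHz and the second 2376.0 MHz, against the 8.9 GHz and 2.6 GHz that vSphere actually showed.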

Here is my output for vEdgeCloud with 2 and 4 CPUs (no data plane traffic):

2x CPU (ESX shows 2.6GHz):

vEdge41:~$ lscpu | egrep "CPU\(s\)|Model\ name:"
CPU(s):                2
On-line CPU(s) list:   0,1
Model name:            Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz

top - 10:43:32 up 17:29,  1 user,  load average: 2.10, 2.12, 2.11
Tasks: 135 total,   2 running, 133 sleeping,   0 stopped,   0 zombie
Cpu0  :  3.0%us,  5.0%sy,  0.0%ni, 92.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 80.5%us, 19.5%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1905884k total,  1331648k used,   574236k free,    94204k buffers
Swap:        0k total,        0k used,        0k free,   251004k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  683 root      20   0  793m  73m  35m R  100  4.0   1050:04 fp-um-1

4x CPU - the same vEdge (ESX shows 7.5GHz):

vEdge41:~$ lscpu | egrep "CPU\(s\)|Model\ name:"
CPU(s):                4
On-line CPU(s) list:   0-3
Model name:            Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz

top - 10:52:32 up 1 min,  1 user,  load average: 3.30, 1.11, 0.39
Tasks: 150 total,   2 running, 148 sleeping,   0 stopped,   0 zombie
Cpu0  : 11.8%us, 11.8%sy,  0.0%ni, 75.9%id,  0.4%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 70.8%us, 19.1%sy,  0.0%ni, 10.0%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 64.3%us, 24.3%sy,  0.0%ni, 11.1%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 64.8%us, 22.8%sy,  0.0%ni, 12.4%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1905884k total,  1271804k used,   634080k free,    41224k buffers
Swap:        0k total,        0k used,        0k free,   250120k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  709 root      20   0  942m  74m  35m R  300  4.0   4:07.54 fp-um-1

As we discussed earlier, the CPUs used for the data plane have higher utilization (because of the fp-um process). The first CPU (Cpu0) is used only for the control plane, with "normal" utilization. The reason: VMs have no hardware components for "effective" packet forwarding, so the solution for effective packet forwarding is a software process running in a loop, which means high CPU load.

martin

Danny De Ridder · 04-17-2019

To test data plane performance, we should sink/originate traffic in a service VPN. All test results so far target the control processor: ICMP, SSH to the device, etc.

Data processor performance should be tested as follows:

host 1 --- VPN 1 --- vEdge1 --- transport VPN 0 --- vEdge2 --- VPN 1 --- host 2

Pings from host 1 to host 2 should show data plane behaviour.
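A small Python script can run that test from host 1 and extract the rtt summary, so runs with and without a CPU limit can be compared. It assumes host 1 is a Linux machine with iputils ping, which prints the "rtt min/avg/max/mdev" line seen in this thread; the host 2 address you pass in is your own, not one from the thread.

```python
import re
import subprocess
from typing import Optional, Tuple

Rtt = Tuple[float, float, float, float]

# iputils summary line, e.g. "rtt min/avg/max/mdev = 0.222/0.275/0.359/0.037 ms"
SUMMARY_RE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_rtt_summary(output: str) -> Optional[Rtt]:
    """Return (min, avg, max, mdev) from ping output, or None if absent."""
    m = SUMMARY_RE.search(output)
    return tuple(float(g) for g in m.groups()) if m else None


def ping_host(host: str, count: int = 10) -> Optional[Rtt]:
    """Ping from host 1 so the packets cross both vEdges' data plane."""
    result = subprocess.run(["ping", "-c", str(count), host],
                            capture_output=True, text=True, check=False)
    return parse_rtt_summary(result.stdout)
```

From host 1, call ping_host() with host 2's address once with the CPU limit in place and once without, then compare the avg and mdev values.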

Martin Kyrc · 04-16-2019

Hi,
did you solve this issue? I'm also struggling with the same behavior.

martin

Ajinkya.Purohit · 04-16-2019

Hi Martin

I have not been able to resolve this issue. I have tried all versions from 17.x to 18.4.x, and all have the same issue on ESXi 6.5.

The Cisco folks are not replying either. :(

Martin Kyrc · 04-16-2019

If it's in production with support, open an SR with TAC. My scenario is in a lab.

Ajinkya.Purohit · 04-16-2019

PJO2 · 08-13-2021

For anyone reading this discussion:

There is a command that reduces CPU consumption:

vedge# config
Entering configuration mode terminal
vedge(config)# system eco-friendly-mode
vedge(config-system)# commit

wuhao0015 · 10-18-2023

Great, that solved my problem.