
vManage TLOCs High Latency

maverick0
Level 1

I'm trying to understand the cause of the high latency that I'm seeing in the latency graph on vManage.

I have a few questions about how vManage collects statistics from devices:

  1. Is the loss, latency, and jitter information collected from BFD Poll Interval?
  2. When I run a test directly over the transport, I don't see high latency on the transport.
  3. Does vManage test the underlay?

Please see the attachment.



Hi,

1) What vManage shows is based on BFD. By default, the poll interval is 10 minutes and 6 poll intervals are used (60 minutes), and the mean values over that window are reported. Your picture shows tunnel statistics, so the BFD sessions between the tunnel endpoints (the respective TLOCs of the routers) are used.

2) How do you test? Are you sure which exit interface the router uses in your tests?

3) vManage does not run separate tests.

See the "Bidirectional Forwarding Detection (BFD)" section of the SD-WAN CVD:

https://www.cisco.com/c/en/us/td/docs/solutions/CVD/SDWAN/cisco-sdwan-design-guide.html
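As a rough illustration of point 1, the sketch below models the averaging window: one average latency value per poll interval, with the reported mean taken over the last "multiplier" intervals. It assumes the default 10-minute poll interval and the 6-interval window mentioned above, and that the reported value is a plain arithmetic mean of the per-interval averages. The function name and the sample values are hypothetical, and the exact computation on the device may differ.

```python
from collections import deque

POLL_INTERVAL_MIN = 10  # default poll interval (10 minutes), per the reply above
MULTIPLIER = 6          # number of poll intervals in the averaging window


def rolling_mean_latency(per_interval_ms, multiplier=MULTIPLIER):
    """Return the reported mean latency after each poll interval.

    Each element of per_interval_ms is the average latency (ms) measured
    during one poll interval. The reported mean covers at most the last
    `multiplier` intervals (hypothetical model of the BFD window).
    """
    window = deque(maxlen=multiplier)  # oldest interval drops out automatically
    means = []
    for sample in per_interval_ms:
        window.append(sample)
        means.append(sum(window) / len(window))
    return means


# Hypothetical data: a single 10-minute spike of 600 ms on a ~20 ms path.
samples = [20, 20, 20, 600, 20, 20, 20, 20, 20, 20]
means = rolling_mean_latency(samples)
print(f"window = {MULTIPLIER * POLL_INTERVAL_MIN} minutes")
for i, m in enumerate(means, start=1):
    print(f"after interval {i}: mean {m:.1f} ms")
```

Under this model the single spike keeps the reported mean elevated for six intervals (one hour) before it leaves the window, so a short live test run afterwards can look much better than the graph.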

HTH,
Please rate and mark as an accepted solution if you have found any of the information provided useful.

Hi @Kanan Huseynli 

Thank you for your reply.

I opened a ticket with the service provider and they informed me that they did not find any issues in the transport. However, I am still seeing that the latency values remain high.

Because of this, I am investigating the reason why the latency in the overlay is so high even though the transport is not showing the same behavior.

Sincerely.

svemulap@cisco.com
Cisco Employee
Hi maverick0,

For 1: BFD runs on the edge device, in the data path. You can see these values with the "show app-route stats" CLI command ("show sdwan app-route stats" on cEdge).
It shows the values measured between the two endpoints of each tunnel.
Sample output from a lab cEdge node:

EDGE-BRANCH-03-1#show sdwan app-route stats
app-route statistics 10.2.9.2 10.2.5.2 ipsec 12346 12366
remote-system-ip 1.1.10.1
local-color mpls
remote-color mpls
sla-class-index 0
fallback-sla-class-index None
app-probe-class-list None
mean-loss 0
mean-latency 1
mean-jitter 1
interval 0
total-packets 661
loss 0
average-latency 1
average-jitter 1
tx-data-pkts 0
rx-data-pkts 0
ipv6-tx-data-pkts 0
--More--

For 2: This is not an exact comparison. You need to test from the public IP at one end of the tunnel to the public IP at the other end.

For 3: vManage only collects the data from the edge device and displays it in the GUI.

In addition to the design guide mentioned in the other reply, take a look at:

https://www.cisco.com/c/en/us/support/docs/routers/sd-wan/214510-troubleshoot-bidirectional-forwarding-de.html
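To check these values per tunnel rather than per color, the app-route stats output shown under point 1 can be split into one record per tunnel. Below is a minimal Python sketch that assumes the layout of that sample (a header line starting with "app-route statistics", then one "key value" line per field). The function name is hypothetical, the sample is trimmed, and reading the five header fields as source IP, destination IP, encapsulation, source port and destination port is an assumption. Repeated keys (such as per-interval counters further down the full output) overwrite earlier ones, so only the mean-* values should be relied on.

```python
# Trimmed copy of the lab output shown above.
SAMPLE = """\
app-route statistics 10.2.9.2 10.2.5.2 ipsec 12346 12366
remote-system-ip 1.1.10.1
local-color mpls
remote-color mpls
sla-class-index 0
mean-loss 0
mean-latency 1
mean-jitter 1
total-packets 661
--More--
"""


def parse_app_route_stats(text):
    """Split 'show sdwan app-route stats' style output into one dict per tunnel."""
    tunnels = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("--More--"):
            continue  # skip blank lines and the pager prompt
        if line.startswith("app-route statistics"):
            # Assumed header order: src IP, dst IP, encapsulation, src port, dst port
            src, dst, encap, sport, dport = line.split()[2:7]
            current = {"src-ip": src, "dst-ip": dst, "encap": encap,
                       "src-port": int(sport), "dst-port": int(dport)}
            tunnels.append(current)
        elif current is not None:
            # Every other line is "key value"; later duplicates overwrite earlier ones
            key, _, value = line.partition(" ")
            current[key] = value.strip()
    return tunnels


for t in parse_app_route_stats(SAMPLE):
    print(t["remote-system-ip"], t["local-color"], t["remote-color"], t["mean-latency"])
```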

HTH

Hi svemulap@cisco.com 

Thank you for your reply.

What I am trying to understand is whether the values vManage shows in the UI are collected from all BFD sessions or per transport.

When I go to Monitor -> Network -> Select Device -> WAN -> TLOC, the table below the graph shows the statistics by color. Are these values collected from BFD on each transport?

In the attachment you can see that custom2 shows a high latency value (2,814.59 ms). Why?

Sincerely.


Hi,

It is based on the aggregated average loss, latency, and jitter information for the TLOC color. The time interval in the graph is determined by the BFD application-aware routing poll interval. Check the subsection below:

https://www.cisco.com/c/en/us/td/docs/routers/sdwan/configuration/Monitor-And-Maintain/monitor-maintain-book/m-network.html#concept_nh2_my2_tlb

Regarding why custom2 has higher latency: check the statistics of all tunnels to see exactly which tunnel (sourced from the custom2 TLOC) has the higher latency.
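A small sketch of why the per-tunnel view matters: when a color's value is an average over its tunnels, a single bad tunnel can push the whole color up. This assumes a plain arithmetic mean across tunnels (vManage may aggregate differently), and every tunnel key and latency value below is invented for illustration.

```python
from statistics import mean

# Hypothetical per-tunnel mean latencies (ms), keyed by
# (local color, remote system IP, remote color). All values are invented.
tunnels = {
    ("custom2", "1.1.10.1", "mpls"): 20,
    ("custom2", "1.1.10.2", "custom2"): 30,
    ("custom2", "1.1.10.3", "biz-internet"): 6010,
    ("mpls", "1.1.10.1", "mpls"): 15,
}


def color_summary(tunnels):
    """Average tunnel latency per local color and report the worst tunnel."""
    by_color = {}
    for (color, remote_ip, remote_color), latency in tunnels.items():
        by_color.setdefault(color, []).append(((remote_ip, remote_color), latency))
    summary = {}
    for color, entries in by_color.items():
        worst = max(entries, key=lambda e: e[1])  # tunnel with the highest latency
        summary[color] = {
            "avg_ms": mean(lat for _, lat in entries),
            "worst_tunnel": worst[0],
            "worst_ms": worst[1],
        }
    return summary


for color, s in color_summary(tunnels).items():
    print(f"{color}: avg {s['avg_ms']:.1f} ms, worst {s['worst_tunnel']} at {s['worst_ms']} ms")
```

Here two healthy custom2 tunnels and one slow one average out to about 2 seconds for the whole color, which is why the tunnel list, not the TLOC summary, shows where to look.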

HTH,

maverick0
Level 1

I was checking the vEdge's events and found some entries related to high CPU (system + user). The vEdge 1000 has only one CPU for both the control and data planes. Could this affect the tunnels?

Hi maverick0,

A clarification for your statement:
> The vEdge 1000 has only one CPU for both control and data plane.
No. The vEdge 1K has 4 vCPUs (3 for data and 1 for control).

vEdge1K# show system status | inc "CPU allo"
CPU allocation: 4 total, 1 control, 3 data
vEdge1K#

vEdge1K# show hardware inventory

                HW
                DEV
HW TYPE         INDEX  VERSION  PART NUMBER  PART INFO   SERIAL NUMBER  HW DESCRIPTION
-------------------------------------------------------------------------------------------------------
Chassis         0      7.1      vEdge-1000   vEdge-1000  11OG5xxxxxx.   vEdge-1000. CPLD rev: 0xB, PCB rev: G.


I can see 4 in the show system status output, but I am not sure whether this number represents physical CPUs or cores. I am asking because when I run the "top" command in vshell, I can only see one CPU, as shown in the attachment.

The show system status output is accurate.
The vEdge 1000 is a 4-core node (1 control core and 3 data cores).
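If you want to check this allocation across several devices from a script, the CPU allocation line from "show system status" is easy to parse. A minimal Python sketch, assuming the line keeps the "N total, N control, N data" wording shown above; the function name is hypothetical.

```python
import re

# Line copied from the show system status output above.
LINE = "CPU allocation: 4 total, 1 control, 3 data"


def parse_cpu_allocation(line):
    """Return {'total': n, 'control': n, 'data': n} from the CPU allocation line."""
    counts = {role: int(n)
              for n, role in re.findall(r"(\d+)\s+(total|control|data)", line)}
    # Sanity check: control cores plus data cores should equal the total.
    if counts.get("control", 0) + counts.get("data", 0) != counts.get("total"):
        raise ValueError(f"unexpected CPU allocation line: {line!r}")
    return counts


print(parse_cpu_allocation(LINE))
```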

Good to know.

I have observed some critical events related to CPU usage on the vEdge device, but I am not sure whether they are causing the high latency in a tunnel.

One tunnel shows high latency (more than 2000 ms), but only in one direction.

How many tunnels does this vEdge model support?

Could high CPU usage be causing the one-way latency?

Hi,

The latency for a TLOC is an aggregated value. That's why I asked you to search the tunnel list.

What do you mean by one-way? How did you check? Normally, if there is latency from A to B, there should be latency from B to A as well.

HTH,

Hi @Kanan Huseynli 

Exactly. The latency is seen from A to B, but not from B to A, on the same tunnel.

Hi,

Could you share a screenshot? In practice, I don't see a reason for such a scenario. BFD is calculated two-way (I didn't find explicit documentation, but I believe it works that way), and I checked in our environment: the values from A to B and from B to A are the same (with at most a very small difference). Maybe you are looking at the wrong tunnel?
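To illustrate the reasoning, here is a minimal model that assumes, as the reply suggests but without documentation to confirm it, that BFD latency is derived from round-trip probes. A probe sent by A crosses A->B and comes back B->A; a probe sent by B crosses B->A and comes back A->B. Both ends therefore see the same sum of one-way delays. The delay values are hypothetical.

```python
def round_trip_ms(forward_ms, reverse_ms):
    """Round-trip time of a probe that crosses the path once in each direction."""
    return forward_ms + reverse_ms


# Hypothetical one-way delays on the same tunnel: A->B is slow, B->A is fast.
delay_a_to_b = 2000.0
delay_b_to_a = 10.0

rtt_seen_by_a = round_trip_ms(delay_a_to_b, delay_b_to_a)  # A probes B
rtt_seen_by_b = round_trip_ms(delay_b_to_a, delay_a_to_b)  # B probes A

print(f"A measures {rtt_seen_by_a} ms, B measures {rtt_seen_by_b} ms")
```

Under this model a slow path in only one direction still shows up at both ends, so a high value visible from only one side more likely points to a different tunnel or a different time window than to a one-way delay.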

HTH,
