07-07-2023 10:35 PM - edited 07-07-2023 10:37 PM
vManage - 20.9.3
cEdges - 20.9.3a
I noticed that with an AAR policy applied, all traffic is dropped for 15-20 minutes after the device reloads.
I started a packet trace and saw DROP 483 (SdwanDataPolicyDrop):
show platform packet-trace packet 1
Packet: 1 CBUG ID: 1
Summary
Input : GigabitEthernet0/0/0.920
Output : GigabitEthernet0/0/0.920
State : DROP 483 (SdwanDataPolicyDrop)
Timestamp
Start : 560305080124 ns (07/08/2023 05:12:19.338387 UTC)
Stop : 560305192379 ns (07/08/2023 05:12:19.338500 UTC)
Path Trace
Feature: IPV4(Input)
Input : GigabitEthernet0/0/0.920
Output : <unknown>
Source : 172.26.98.4
Destination : 172.18.7.22
Protocol : 1 (ICMP)
Feature: CFT
API : cft_handle_pkt
packet capabilities : 0x0000018c
input vrf_idx : 0
calling feature : STILE
direction : Input
triplet.vrf_idx : 6
triplet.network_start : 0x100bf92
triplet.triplet_flags : 0x00000000
triplet.counter : 26
cft_bucket_number : 1313395
cft_l3_payload_size : 64
cft_pkt_ind_flags : 0x00000000
cft_pkt_ind_valid : 0x00000931
tuple.src_ip : 172.26.98.4
tuple.dst_ip : 172.18.7.22
tuple.src_port : 5060
tuple.dst_port : 51060
tuple.vrfid : 4
tuple.l4_protocol : ICMP
tuple.l3_protocol : IPV4
vrf_nums : 1
pkt_sb.num_flows : 0
pkt_sb.tuple_epoch : 26
returned cft_error : 14
returned fid : 0
Feature: NBAR
Packet number in flow: N/A
Classification state: Final
Classification name: ping
Classification ID: 1404 [CANA-L7:479]
Candidate classification sources:
N/A
Classification visibility name: ping
Classification visibility ID: 1404 [CANA-L7:479]
Number of matched sub-classifications: 0
Number of extracted fields: 0
Is PA (split) packet: False
Is FIF (first in flow) packet: False
TPH-MQC bitmask value: 0x0
Source MAC address: 70:0B:4F:FF:C7:C1
Destination MAC address: 00:87:64:80:06:30
Traffic Categories:
ms-office-365/category: unset
ms-office-365/service-area: unset
sdavc/feed-id: 0
webex/region: 0
Feature: SDWAN App Route Policy
VPN ID : 15
VRF : 6
Policy Name : _VPN-12_Branch-Voice_AAR-VOIP-BRANCH_VPN-10-11_15_Branch_AAR-DATA-BRANCH-VPN-10-11_15_Branch (CG:3)
Seq : 1
Req SLA : Default (1)
Act SLA : __all_tunnels__ (0)
Policy Flags : 0x21
Fallback to best Path : no
SLA Strict : Yes
Actual Color : Undetermined (0)
Preferred Color : biz-internet public-internet (0x30)
Tunnel Match Reason : MATCHED_NONE_SLA_STRICT
I use AAR to force VoIP traffic onto the MPLS transport and to keep the rest of the traffic off the MPLS transport.
sh sdwan policy from-vsmart
from-vsmart sla-class Default
loss 25
latency 300
jitter 100
from-vsmart sla-class Realtime
loss 1
latency 150
jitter 30
from-vsmart app-route-policy _VPN-12_Branch-Voice_AAR-VOIP-BRANCH_VPN-10-11_13_15-16_Branch_AAR-DATA-BRANCH
vpn-list VPN-10-11_13_15-16_Branch
sequence 1
match
source-data-prefix-list aar-data-global
destination-ip 0.0.0.0/0
action
sla-class Default
sla-class strict
sla-class preferred-color biz-internet public-internet
vpn-list VPN-12_Branch-Voice
sequence 1
match
source-ip 10.10.0.0/16
destination-ip 10.10.0.0/16
action
backup-sla-preferred-color biz-internet public-internet
sla-class Realtime
no sla-class strict
sla-class preferred-color mpls
from-vsmart lists vpn-list VPN-10-11_13_15-16_Branch
vpn 10-11
vpn 13
vpn 15-16
from-vsmart lists vpn-list VPN-12_Branch-Voice
vpn 12
from-vsmart lists data-prefix-list aar-data-global
ip-prefix 172.16.0.0/12
ip-prefix 192.168.0.0/19
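To double-check which sequence the traced packet hits, here is a minimal Python sketch using only the standard `ipaddress` module. The prefixes and VPN numbers are copied from the policy output above. The helper name `matches_data_sequence` is hypothetical, and treating the match as a plain VPN and source-prefix membership test is my assumption about the match logic, not something read from the router.

```python
import ipaddress

# Copied from "from-vsmart lists data-prefix-list aar-data-global"
AAR_DATA_GLOBAL = [
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/19"),
]

# Copied from "from-vsmart lists vpn-list VPN-10-11_13_15-16_Branch"
DATA_VPNS = {10, 11, 13, 15, 16}


def matches_data_sequence(vpn_id: int, src_ip: str) -> bool:
    """Hypothetical helper: would this packet match sequence 1 of the data vpn-list?

    The destination is not checked because the sequence matches 0.0.0.0/0.
    """
    src = ipaddress.ip_address(src_ip)
    return vpn_id in DATA_VPNS and any(src in net for net in AAR_DATA_GLOBAL)


# Packet from the trace: VPN ID 15, source 172.26.98.4
print(matches_data_sequence(15, "172.26.98.4"))
```

Under these assumptions the traced packet lands in the sequence with `sla-class Default` and `sla-class strict`, which is consistent with "Req SLA : Default" and "SLA Strict : Yes" in the trace.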
Accepted solution:
07-14-2023 01:23 AM
Based on our investigation, this looks like a misbehavior.
Remote device fails -> BFD sessions go down -> the local device still tries to build tunnels to the previously known devices -> the local device keeps collecting SLA parameters for the next poll intervals and includes them in the SLA measurement. This happens because of OMP graceful restart (known TLOCs are not purged when OMP peering is down, which is reasonable).
The misbehavior is that remote devices still include those poll intervals in the calculation while BFD is down (100% loss).
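To make the effect concrete, here is a rough Python sketch. It assumes that the mean loss is a plain average of the per-interval loss percentages over the last `multiplier` poll intervals. That averaging rule and the names `mean_loss` and `meets_sla` are illustrative assumptions, not the documented algorithm. The 25% threshold is the loss value of the Default SLA class from the policy output, and the multiplier of 2 is the value reported in this thread.

```python
from statistics import mean

DEFAULT_SLA_LOSS = 25  # percent, from "sla-class Default / loss 25"


def mean_loss(interval_losses, multiplier):
    """Average the most recent `multiplier` per-interval loss values (assumed rule)."""
    return mean(interval_losses[-multiplier:])


def meets_sla(interval_losses, multiplier, max_loss=DEFAULT_SLA_LOSS):
    """True if the averaged loss stays within the SLA class loss threshold."""
    return mean_loss(interval_losses, multiplier) <= max_loss


# One interval measured while BFD was down (100% loss), then one clean interval.
print(meets_sla([100, 0], multiplier=2))  # False
# Two clean intervals.
print(meets_sla([0, 0], multiplier=2))    # True
```

If a 100%-loss interval is counted like this, the tunnel fails the SLA class until that interval ages out, and with `sla-class strict` the traffic is dropped instead of falling back to another path.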
07-08-2023 12:41 AM
It happens when I enable the Strict/Drop option. If I change it to Load Balance, traffic passes, but then I notice that the MPLS transport starts forwarding more than just VoIP traffic.
07-08-2023 07:44 AM - edited 07-08-2023 07:46 AM
Did you run the packet trace on the reloaded device or on one of the remote devices? And what are your BFD parameters (poll interval and multiplier)?
07-08-2023 08:24 AM - edited 07-08-2023 08:25 AM
I ran it after rebooting. When vManage and the cEdges were on version 20.7 it worked normally, but now after a reload the device waits for the poll interval. Stupid behaviour after rebooting... Cisco SD-WAN is getting worse (that's my opinion; I'm comparing with VMware, which I use for 100 branches).
The default poll interval is 10 minutes, so traffic doesn't pass for 10 minutes after a reboot.
07-09-2023 01:19 AM
So, the rebooted device and the device where you ran the trace are the same, right?
What I suspect: when a remote device fails, the tunnels (BFD) to that device go down, as expected. But any local device that had information about that remote node (the one that reloaded) also counts poll-interval results (% loss) for the previously known tunnels (TLOC to TLOC).
07-09-2023 07:17 AM
Yes, it's the same. I repeated it on several devices: C8000V, 1111X, 4331. They all behave the same way: after a reload, traffic is dropped until one poll interval expires. It happens only when the AAR action is Strict/Drop.
07-09-2023 01:28 PM
If it is the same device, that is strange behavior. For remote sites it would be understandable because of the previously known BFD information and poll interval, but for the local router (where the reboot happens) it is strange.
Share the "show sdwan app-route stats" and "show sdwan app-route sla" outputs taken immediately after the reboot.
07-09-2023 11:44 PM
I'll try to do it today (show sdwan app-route stats).
Should the "show sdwan app-route sla" command show anything special?
show sdwan app-route sla
INDEX  NAME             LOSS  LATENCY  JITTER  APP PROBE CLASS ID  APP PROBE CLASS  FALLBACK BEST TUNNEL
---------------------------------------------------------------------------------------------------------
0      __all_tunnels__  0     0        0       0                   None             None
1      Default          25    300      100     0                   None             None
2      Realtime         1     150      30      0                   None             None
07-11-2023 04:48 AM
No, this command shows the SLA class definitions. We can then compare them with the actual tunnel values to see whether a tunnel meets the SLA or not.
Reboot the device and share the output of show sdwan app-route stats immediately afterwards.
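As a quick illustration of that comparison, here is a minimal Python sketch. The thresholds are copied from the "show sdwan app-route sla" output in this thread. The function name `sla_classes_met` is hypothetical, and using "less than or equal" for every metric is an assumption on my part.

```python
# (loss %, latency ms, jitter ms) per SLA class, from "show sdwan app-route sla"
SLA_CLASSES = {
    "Default": (25, 300, 100),
    "Realtime": (1, 150, 30),
}


def sla_classes_met(mean_loss, mean_latency, mean_jitter):
    """Return the names of the SLA classes whose thresholds the tunnel stays within.

    Assumption: a tunnel meets a class when every metric is <= its threshold.
    """
    return [
        name
        for name, (loss, latency, jitter) in SLA_CLASSES.items()
        if mean_loss <= loss and mean_latency <= latency and mean_jitter <= jitter
    ]


print(sla_classes_met(0, 0, 0))       # a tunnel with all-zero means
print(sla_classes_met(10, 200, 50))   # within Default only
print(sla_classes_met(30, 10, 10))    # loss above both thresholds
```

Feed it the mean-loss, mean-latency, and mean-jitter values of each tunnel from "show sdwan app-route stats" to see which classes that tunnel would qualify for under this assumption.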
07-11-2023 08:01 AM
show sdwan app-route stats remote-system-ip 10.80.100.102
app-route statistics 10.20.10.10 10.10.10.10 ipsec 12346 12346
remote-system-ip 10.80.100.102
local-color biz-internet
remote-color public-internet
sla-class-index 0
fallback-sla-class-index None
app-probe-class-list None
mean-loss 0
mean-latency 0
mean-jitter 0
interval 0
total-packets 0
loss 0
average-latency 0
average-jitter 0
tx-data-pkts 0
rx-data-pkts 0
ipv6-tx-data-pkts 0
ipv6-rx-data-pkts 0
interval 1
total-packets 0
loss 0
average-latency 0
average-jitter 0
tx-data-pkts 0
rx-data-pkts 0
ipv6-tx-data-pkts 0
ipv6-rx-data-pkts 0
app-route statistics 10.30.10.10 10.10.10.10 ipsec 12346 12346
remote-system-ip 10.80.100.102
local-color public-internet
remote-color public-internet
sla-class-index 0
fallback-sla-class-index None
app-probe-class-list None
mean-loss 0
mean-latency 0
mean-jitter 0
interval 0
total-packets 0
loss 0
average-latency 0
average-jitter 0
tx-data-pkts 0
rx-data-pkts 0
ipv6-tx-data-pkts 0
ipv6-rx-data-pkts 0
interval 1
total-packets 0
loss 0
average-latency 0
average-jitter 0
tx-data-pkts 0
rx-data-pkts 0
ipv6-tx-data-pkts 0
ipv6-rx-data-pkts 0
app-route statistics 192.168.1.198 192.168.1.219 ipsec 12346 12346
remote-system-ip 10.80.100.102
local-color mpls
remote-color mpls
sla-class-index 0
fallback-sla-class-index None
app-probe-class-list None
mean-loss 0
mean-latency 0
mean-jitter 0
interval 0
total-packets 0
loss 0
average-latency 0
average-jitter 0
tx-data-pkts 0
rx-data-pkts 0
ipv6-tx-data-pkts 0
ipv6-rx-data-pkts 0
interval 1
total-packets 0
loss 0
average-latency 0
average-jitter 0
tx-data-pkts 0
rx-data-pkts 0
ipv6-tx-data-pkts 0
ipv6-rx-data-pkts 0
I ran it immediately after the restart, and there were no packets until the first poll interval had elapsed. If I change the AAR action to Load Balance, packets flow immediately after the restart.
07-11-2023 01:30 PM
Please share several outputs taken within the 10-minute time frame, for example after 3, 5, and 8 minutes.
Are the tunnels (BFD) up during the first 10 minutes (the poll interval)?
What are the BFD configuration parameters? Interval 10 min, multiplier 2?
07-11-2023 07:37 PM
After startup, all of the BFD sessions are up. BFD was configured with an interval of 5 min and a multiplier of 2.
Exactly after 5 minutes I can see the packet counters increase:
show sdwan app-route stats remote-system-ip 10.80.100.102
app-route statistics 10.20.10.10 10.10.10.10 ipsec 12346 12346
remote-system-ip 10.80.100.102
local-color biz-internet
remote-color public-internet
sla-class-index 0
fallback-sla-class-index None
app-probe-class-list None
mean-loss 1
mean-latency 1
mean-jitter 0
interval 0
total-packets 166
loss 3
average-latency 1
average-jitter 0
tx-data-pkts 809
rx-data-pkts 0
ipv6-tx-data-pkts 0
ipv6-rx-data-pkts 0
interval 1
total-packets 0
loss 0
average-latency 0
average-jitter 0
tx-data-pkts 0
rx-data-pkts 0
ipv6-tx-data-pkts 0
ipv6-rx-data-pkts 0
app-route statistics 10.30.10.10 10.10.10.10 ipsec 12346 12346
remote-system-ip 10.80.100.102
local-color public-internet
remote-color public-internet
sla-class-index 0
fallback-sla-class-index None
app-probe-class-list None
mean-loss 0
mean-latency 1
mean-jitter 0
interval 0
total-packets 167
loss 1
average-latency 1
average-jitter 0
tx-data-pkts 1134
rx-data-pkts 0
ipv6-tx-data-pkts 0
ipv6-rx-data-pkts 0
interval 1
total-packets 0
loss 0
average-latency 0
average-jitter 0
tx-data-pkts 0
rx-data-pkts 0
ipv6-tx-data-pkts 0
ipv6-rx-data-pkts 0
app-route statistics 192.168.1.198 192.168.1.219 ipsec 12346 12346
remote-system-ip 10.80.100.102
local-color mpls
remote-color mpls
sla-class-index 0
fallback-sla-class-index None
app-probe-class-list None
mean-loss 0
mean-latency 0
mean-jitter 0
interval 0
total-packets 166
loss 1
average-latency 0
average-jitter 0
tx-data-pkts 169
rx-data-pkts 0
ipv6-tx-data-pkts 0
ipv6-rx-data-pkts 0
interval 1
total-packets 0
loss 0
average-latency 0
average-jitter 0
tx-data-pkts 0
rx-data-pkts 0
ipv6-tx-data-pkts 0
ipv6-rx-data-pkts 0
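To compare several of these snapshots quickly, here is a small Python sketch that pulls the per-tunnel counters out of the text and computes the loss percentage of an interval. The parsing relies only on the layout shown in this thread; the names `parse_app_route_stats` and `loss_percent` are hypothetical, and the sketch does not claim to reproduce how the router derives mean-loss from these counters. The sample is a shortened copy of the first tunnel above.

```python
SAMPLE = """\
app-route statistics 10.20.10.10 10.10.10.10 ipsec 12346 12346
remote-system-ip 10.80.100.102
local-color biz-internet
remote-color public-internet
mean-loss 1
interval 0
total-packets 166
loss 3
tx-data-pkts 809
rx-data-pkts 0
interval 1
total-packets 0
loss 0
tx-data-pkts 0
rx-data-pkts 0
"""


def parse_app_route_stats(text):
    """Parse 'show sdwan app-route stats' text into a list of tunnel dicts."""
    tunnels = []
    target = None  # dict that the next "key value" line is stored in
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[:2] == ["app-route", "statistics"]:
            tunnel = {"src": parts[2], "dst": parts[3], "intervals": {}}
            tunnels.append(tunnel)
            target = tunnel
        elif target is None:
            continue  # ignore anything before the first tunnel header
        elif parts[0] == "interval":
            target = tunnel["intervals"].setdefault(int(parts[1]), {})
        elif len(parts) == 2:
            target[parts[0]] = parts[1]
    return tunnels


def loss_percent(interval):
    """Lost probes as a percentage of total probes in one interval."""
    total = int(interval.get("total-packets", 0))
    return 100.0 * int(interval.get("loss", 0)) / total if total else 0.0


for t in parse_app_route_stats(SAMPLE):
    first = t["intervals"][0]
    print(t["local-color"], round(loss_percent(first), 2), first["rx-data-pkts"])
```

Running it on snapshots taken at 3, 5, and 8 minutes makes it easy to see when total-packets starts increasing and whether rx-data-pkts stays at 0.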
07-12-2023 01:28 AM
You don't have any rx-data, which is strange.
What do you see on the remote node (10.80.100.102)? Share the same output, the BFD status, etc.
07-12-2023 01:56 AM
I'll check and share it with you.
Hm, maybe this is new behaviour for AAR strict? It depends on the first poll interval: if I change the poll interval to 10 min, traffic isn't sent until the first poll interval has elapsed.
07-12-2023 02:52 AM - edited 07-12-2023 12:22 PM
I don't think so, because it is a "bad user experience" when the customer has to wait for a poll interval. Let's see what happens on the remote node after the local router reboots. Check the same output on 10.80.100.102 (system IP) while the local device is rebooting, after the reboot, and within the first poll interval.
I suspect the problem is in the return traffic. This can also be verified with a capture on the remote device.