01-28-2012 11:05 AM - edited 03-04-2019 03:03 PM
I have a customer with a main office and 9 branch offices. Their phone system is an NEC NetLink, which I admittedly know little about, as I was not involved in its implementation. The main office has an ASR 1001 with a trunk Ethernet handoff to Time Warner, and each branch office has a 2921 with an access handoff to Time Warner. The branches are a mix of 5Mbps and 50Mbps circuits, and the 5Mbps circuits often get saturated, resulting in the phone system going offline.
The way the phone guy explained it to me is that there are hosts at each branch that communicate back to the 'main' host via a TCP heartbeat. Within the phone system itself, all signaling (heartbeat) traffic is marked with DSCP 24 (CS3) and the RTP voice traffic is marked with DSCP 46 (EF). The Time Warner 'cloud' is nothing more than L2 VLANs using QinQ to provide virtual point-to-point links, with no QoS in between (just traffic policing to ensure we stay near our CIR).
Here is the sanitized config for the main office ASR:
!
class-map match-any voip-signaling
 match dscp cs3
class-map match-any voip-rtp
 match dscp ef
class-map match-any phone
 match access-group 103
 match qos-group 46
 match qos-group 24
 match dscp ef
 match dscp cs3
!
policy-map Branch1
 class phone
  priority 384 2048
policy-map Branch2
 class phone
  priority 384 2048
policy-map Branch3
 class phone
  priority 384 2048
policy-map Branch4
 class phone
  priority 384 2048
policy-map Branch5
 class phone
  priority 384 2048
policy-map Branch6
 class phone
  priority 384 2048
policy-map Branch7-voip
 class voip-signaling
  bandwidth 500
  set dscp cs3
 class voip-rtp
  set dscp ef
  priority 2000
policy-map Branch8
 class phone
  priority 2048
policy-map Branch9
 class phone
  priority 384 2048
policy-map Branch1-Shape
 class class-default
  shape average 50000000
  service-policy Branch1
policy-map Branch2-Shape
 class class-default
  shape average 5000000
  service-policy Branch2
policy-map Branch3-Shape
 class class-default
  shape average 5000000
  service-policy Branch3
policy-map Branch4-Shape
 class class-default
  shape average 5000000
  service-policy Branch4
policy-map Branch5-Shape
 class class-default
  shape average 50000000
  service-policy Branch5
policy-map Branch6-Shape
 class class-default
  shape average 50000000
  service-policy Branch6
policy-map Branch7-Shape
 class class-default
  shape average 5000000
  service-policy Branch7-voip
policy-map Branch8-Shape
 class class-default
  shape average 50000000
  service-policy Branch8
policy-map Branch9-Shape
 class class-default
  shape average 5000000
  service-policy Branch9
!
interface GigabitEthernet0/0/1
 no ip address
 speed 1000
 no negotiation auto
!
interface GigabitEthernet0/0/1.1525
 description Branch1 50Mbps Link
 bandwidth 50000
 encapsulation dot1Q 1525
 ip address 192.168.254.5 255.255.255.252
 service-policy output Branch1-Shape
!
interface GigabitEthernet0/0/1.1526
 description Branch3 5Mbps Link
 bandwidth 5000
 encapsulation dot1Q 1526
 ip address 192.168.254.13 255.255.255.252
 service-policy output Branch3-Shape
!
interface GigabitEthernet0/0/1.1527
 description Branch2 5Mbps Link
 bandwidth 5000
 encapsulation dot1Q 1527
 ip address 192.168.254.9 255.255.255.252
 service-policy output Branch2-Shape
!
interface GigabitEthernet0/0/1.1528
 description Branch4 5Mbps Link
 bandwidth 5000
 encapsulation dot1Q 1528
 ip address 192.168.254.17 255.255.255.252
 service-policy output Branch4-Shape
!
interface GigabitEthernet0/0/1.1529
 description Branch5 50Mbps Link
 bandwidth 50000
 encapsulation dot1Q 1529
 ip address 192.168.254.21 255.255.255.252
 service-policy output Branch5-Shape
!
interface GigabitEthernet0/0/1.1530
 description Branch6 50Mbps Link
 bandwidth 50000
 encapsulation dot1Q 1530
 ip address 192.168.254.25 255.255.255.252
 service-policy output Branch6-Shape
!
interface GigabitEthernet0/0/1.1531
 description Branch7 5Mbps Link
 bandwidth 5000
 encapsulation dot1Q 1531
 ip address 192.168.254.29 255.255.255.252
 service-policy output Branch7-Shape
!
interface GigabitEthernet0/0/1.1532
 description Branch8 50Mbps Link
 bandwidth 50000
 encapsulation dot1Q 1532
 ip address 192.168.254.33 255.255.255.252
 service-policy output Branch8-Shape
!
interface GigabitEthernet0/0/1.1533
 description Branch9 5Mbps Link
 bandwidth 5000
 encapsulation dot1Q 1533
 ip address 192.168.254.37 255.255.255.252
 service-policy output Branch9-Shape
!
The two branches experiencing outages are Branch3 and Branch7. During the outages, I can see that the point-to-point circuit for that branch is completely saturated (from main office to branch; very low usage the other way). The strange thing is that Branch2 and Branch4 appear to be unaffected, although they also have 5Mbps circuits that actually seem to stay saturated more often than Branch3 and Branch7. Based on the config snippet above, does this seem like a QoS problem on the ASR or a misconfiguration in the phone system? I lean toward the latter, but I keep getting pushback that it is a network issue, and as you can see above, I changed the policy-map for Branch7, but it made no difference whatsoever.
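For what it's worth, one alternative per-branch child policy would put RTP alone in the LLQ and give the CS3 heartbeat its own guaranteed bandwidth class, so a burst of voice can never starve the heartbeat. This is only a sketch; the 1000 kbps and 128 kbps figures are placeholders to be sized against actual call volume, not values from the config above:

policy-map BranchX-voip
 class voip-rtp
  priority 1000
 class voip-signaling
  bandwidth 128
policy-map BranchX-Shape
 class class-default
  shape average 5000000
  service-policy BranchX-voip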
01-28-2012 01:42 PM
Hello,
Where is your phone system located at the HQ? Do you have classification and marking on the interface connected to the phone system (the voice VLAN interface)? Can you post the complete config, highlighting the voice LAN interface?
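If the phone system's own marking turns out to be unreliable, one option is to re-mark at ingress on the router's LAN interface. A rough sketch only; ACL 110, the PBX address 192.168.1.10, and interface Gi0/0/0 are placeholders, and a real policy would split RTP (EF) from signaling (CS3) by port:

access-list 110 permit ip host 192.168.1.10 any
class-map match-all pbx-hosts
 match access-group 110
policy-map mark-pbx-ingress
 class pbx-hosts
  set dscp ef
interface GigabitEthernet0/0/0
 service-policy input mark-pbx-ingress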
Regards,
Mohamed
01-28-2012 07:15 PM
The phone system is comprised of several NEC SV8100s (one at each branch), with the 'main' one being at Branch8. As I said before, I was not involved in the implementation, or it would have been at the main office. As it is now, all voice traffic must traverse the circuit back to the main office, then the circuit to Branch8, which is not very efficient. That said, the link to Branch8 is never completely saturated (seldom exceeds 50% of the CIR), and I can see very clearly that the point of contention is the egress interface of the ASR.
This is a pretty flat network, too. Everything rides on the default VLAN (1), and the switches are a mix of Netgear and Cisco (2950) with very flat configs. The tagging I referred to is supposedly done by the phone system itself.
01-29-2012 04:59 AM
Disclaimer
The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.
Liability Disclaimer
In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.
Posting
What's curious is your description that two branches have issues with link saturation and two don't.
Is it possible there's branch-to-branch traffic? If there is, it's possible that when one branch is saturated from the HQ, another branch oversubscribes bandwidth to the same destination branch.
Have you confirmed/tested that you can obtain the same bandwidth to "like" branches? It's possible your service provider's policing (I'm assuming branches with 5 Mbps have "faster" physical handoffs) differs between 5 Mbps links.
In your last post, when you say ". . . I can see very clearly that the point of contention is the egress interface of the ASR," do you mean your subinterfaces with their shapers, or the physical interface itself? Does the HQ link also have less than physical interface bandwidth?
Have you confirmed that the problem HQ subinterfaces, when congested, show traffic flowing through the LLQ class? Have you also confirmed the LLQ class isn't discarding packets? The LLQ is configured with a non-default (?) Bc; if so, why? (NB: there may be nothing wrong with the Bc setting, but if it has been manually configured, I'd like to understand the reasoning behind it.)
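To check this, the per-class counters on the congested subinterfaces (Branch3 and Branch7 in the config above) should show matches and drops under the priority class during the busy period, e.g.:

show policy-map interface GigabitEthernet0/0/1.1526
show policy-map interface GigabitEthernet0/0/1.1531

Look at "pkts matched" for the priority class to confirm voice is actually hitting the LLQ, and at "total drops" in both the priority and default classes.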
01-29-2012 08:01 AM
Branch to branch traffic still passes through the HQ, as this is not an MPLS cloud, but rather point to point links (logical star). We use MRTG to graph out bandwidth utilization, and the egress traffic for the ASR's subinterface connecting to Branch4 actually appears to stay saturated longer, yet they say their phones work fine. The more I look at it, the more it points to differences in the actual phone system between branches, as only 2 branches suffer if the QoS policies match across the board. The physical interface of the ASR is a 1Gbps full-duplex handoff, with the physical interfaces for all branches (5Mbps or 50Mbps) at 100Mbps full-duplex. I'm not sure about the 50Mbps links, but the ISP is policing traffic on the 5Mbps links to ensure I stay under 6Mbps. I'm using 'shape average 5000000' to keep the egress traffic from exceeding 5Mbps on those links, so I could try to raise it closer to the ISP's limit of 6Mbps, but our CIR is 5Mbps. According to MRTG, the router is doing what it is told, as I will see utilization spike to 5Mbps and sustain that level for an hour straight, but it never goes over 5Mbps.
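On the shaping side, one knob sometimes tuned on slow shaped links is a smaller Bc, which shortens the shaping interval (Tc = Bc/CIR) so voice packets wait less per interval. A sketch only, assuming a 5 ms Tc on the 5Mbps links (Bc = 5,000,000 × 0.005 = 25,000 bits); note that on the ASR's HQF scheduler the default interval may already be adequate:

policy-map Branch3-Shape
 class class-default
  shape average 5000000 25000
  service-policy Branch3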
The saturation is mostly in the morning, as all the PCs get powered on around the same time and many of the staff download large reports on the previous day's activity. These reports are the cause of the congestion, and while they are being downloaded, the phones suffer. Sometimes it is just garbled voice from RTP packets being dropped, and sometimes the system reboots at one branch, the latter being more common.
There is a TCP heartbeat from each branch back to Branch8, and if this heartbeat is lost, the local system reboots into a standalone operating mode. Once the heartbeat is re-established, the system reboots again to return to a master/slave operating mode.
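If the heartbeat's TCP port can be confirmed with the phone vendor, it could be classified explicitly rather than relying on the CS3 marking surviving end to end. A sketch only; ACL 120 and port 60000 are placeholders, to be replaced with the real NetLink heartbeat port:

access-list 120 permit tcp any any eq 60000
access-list 120 permit tcp any eq 60000 any
class-map match-any netlink-heartbeat
 match access-group 120

This class could then be given a small guaranteed bandwidth (say, bandwidth 64) in each branch policy-map, ahead of the bulk data traffic.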
01-29-2012 06:54 PM
The easiest explanation is something involving the difference in phone systems, if that is truly the only real difference between the sites that have this issue and the sites that don't.
You may want to further investigate whether VoIP packets are being properly marked. You might try defining IP SLA tests using the same markings.
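For instance, a udp-jitter probe marked EF from the ASR toward a branch would show whether EF traffic gets through cleanly during the busy period. A sketch only, assuming `ip sla responder` is enabled on the Branch7 2921 and that 192.168.254.30 is its WAN address (the far end of the .28/30 subnet in the config above); ToS 184 corresponds to DSCP EF, and a second probe with ToS 96 (CS3) would exercise the signaling marking:

ip sla 10
 udp-jitter 192.168.254.30 16384
 tos 184
 frequency 60
ip sla schedule 10 life forever start-time now

Results can then be checked with show ip sla statistics 10, watching for packet loss and latency spikes that line up with the morning report downloads.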
Do your MRTG graphs cover both sides of the link, and do they confirm that one side's egress matches the other side's ingress?