We have been experiencing a 2-5 second delay for the following:
Picking up the handset to get dialtone
If I answer a call and start saying 1-2-3-4-5, the other party doesn't start hearing me until I say 5. This kind of delay isn't as common as the constant 2 second delay.
We are running a centralized CM 3.3(2) (with latest SP). Also we are using Cat4224 in our 3 remote offices. I've worked with the TAC to try and capture the delays but they are not seeing anything. Also, we've ruled out a high CPU on the CM.
This seems to be a rare issue but it is obviously very frustrating for our users.
We'll need more information.
Some questions for you.
Where does the delay NOT occur, and, in contrast, where specifically does it occur? Please provide specifics: VLANs, QoS, etc.
Has this delay always been evident since the CallManager installation or is it a new(er) issue?
Is the delay IP Phone to IP Phone, IP Phone to PSTN Gateway or BOTH? Code levels of the gateways, etc.
If the delay occurs on the LAN that CallManager is connected to, we will need more information regarding the LAN infrastructure, the models of the servers in the CallManager cluster, and what the NICs of the CCM servers are set to, i.e. 100/Full, etc.
If the delay affects only the REMOTE sites, we will need more information on the WAN design and routing architecture, the remote LAN infrastructure, VLANs, etc.
If the delay affects BOTH LAN and Remote/WAN, let's first remedy the LAN/CallManager delay and then reach out to the edge.
This document is very good for AVVID Design:
Voice Quality Support Page:
The delay does not occur on the local LAN, just the remote sites. This has been an issue for about 1 year now and happens on about 25-30% of calls. The delay happens more with PSTN to IP calls and also internal PBX (through a VG200) to IP. Each remote office has a Cat4224. I updated one Cat to the latest IOS version and we are still experiencing issues. I've also updated to the latest patches on CM (3.3.2). I've noticed this issue is fairly prevalent on the forums and don't see any solutions. Anyone have any luck?
If everything you've said here is true, you almost certainly have a signaling QoS issue. Here's how to break it down:
> Receiving/placing calls
> Picking up the handset to get dialtone
> Transferring calls
If we assume (and this is a big assumption) that we have exactly one root cause, then the dialtone issue is the key. Picking up the handset to get dialtone involves no gateways, no PSTN circuits, no media streaming. It consists of the phone interacting with CallManager using SCCP signaling and CallManager doing some limited dialplan analysis. Our problem domain consists of:
* The phone(s)
* CallManager itself
* The network in-between the phone and CallManager
Now let's take another one of your statements:
> The delay does not occur on the local LAN, just the remote sites.
Assuming again (but I think this is reasonable) that you have the same types of phones running the same firmware loads at your host site as you do at the remotes, you can probably eliminate the phones. Also, this lets us eliminate the CallManager - CPU utilization or other issues would affect all phones more or less equally. In your case, the remote sites have the issue and the host site is fine. Our problem domain now consists of:
* The network in-between the phone and CallManager
This seems kind of the obvious place to look, given that only the remotes are affected. Okay, so what could be the problem? You're not complaining of any voice quality problems, so you probably have some sort of QoS set up (or have lots of extra capacity). Are you sure that you're doing QoS properly for signaling traffic?
Depending on your settings and version of CallManager, signaling may take place with DSCP AF31 or CS3. Make sure you don't have switches configured to not trust (rewrite down) the CallManager's DSCP, and make sure the phones at the remote site are having their DSCP preserved as well. Use a sniffer to verify this.
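On Catalyst switches, for example, trusting the markings on the ports facing CallManager and the phones looks roughly like this (Catalyst 3550-style syntax; interface numbers are just placeholders for your environment):

mls qos
!
interface FastEthernet0/1
 description Port facing CallManager server
 ! accept the DSCP CallManager sets rather than rewriting it to zero
 mls qos trust dscp
!
interface FastEthernet0/2
 description IP Phone port
 ! trust markings only when a Cisco phone is detected via CDP
 mls qos trust device cisco-phone
 mls qos trust cos

If your switches run different code, the command syntax will vary, but the idea is the same: every hop between the phone and CallManager has to either trust or correctly re-mark the signaling DSCP.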
Make sure your QoS configuration is actually catching signaling traffic. Check this with "show policy-map interface". Make sure you have enough bandwidth set aside for signaling. I've seen people configure as little as 16kb/s for signaling on lower speed circuits; if you have a number of phones at your remote site you can eat a lot of that up just with keepalives! If the circuit is relatively small and under load, your signaling traffic may be getting policed down too low.
Thanks for your feedback!
Just to clarify a few things. All phones are running the same load and CallManager has no CPU issues. We do have plenty of bandwidth between sites.
I'm including a sh run of our 4224. Can you see if anything in the config could be causing my issues?
I'm not that familiar with DSCP AF31 or CS3. How can I verify that my switches are trusting the CM's DSCP? What will I be looking for when I use the sniffer?
Also, I don't have any policy-maps configured. Please take a look at my config. Very much appreciated!
Your 4224 configuration has no obvious problems. One thing I question is using a DSCP of CS5 rather than EF for voice media streams. I'm assuming you had a specific reason for that. The dial-peers should default to AF31 or CS3 for signaling, so that should be fine.
I'm more interested in the devices that connect your remote sites to your host site. Can you post the configuration from your remote site WAN router, and your host site router as well? Also, we kind of need you to quantify what "plenty of bandwidth" means. What speed circuits are they, and what sort of technology/topology? Point to point T1, frame-relay, ATM, IP VPN/MPLS, VPN over the public internet, etc.
For all IPT remote offices we have point-to-point T1s (1.544 Mb/s each) connected with Cisco 2600 routers. I will attach the config for one of our remote sites. Thanks again!
It looks like you're doing your classification based on precedence rather than DSCP, which is kind of the old way to do it but it'll work. You're also specifying some ports, which should be redundant but we'll let that go. But the real problem is that you appear to have accidentally pasted together two unrelated access lists and used them for QoS.
On both the host and remote, you have a single access-list 101 which appears to be used both for "interesting" dialer traffic and for QoS classification into the VoIP-Control MQC bucket. I'm guessing this was a cut/paste error of some sort. The effect is that access-list 101 permits all IP traffic except EIGRP, which has the further effect that all traffic except EIGRP is classed into the 16k bandwidth VoIP-Control bucket. You effectively have no signaling QoS at all, because all the rest of your traffic is lumped in with it by the "permit ip any any" ACL statement.
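To illustrate, the pattern I'm describing looks something like this (reconstructed from your description, not your exact config, so treat the addresses and numbers as placeholders):

! One ACL doing double duty - this is the problem
access-list 101 deny   eigrp any any
access-list 101 permit ip any any
!
dialer-list 1 protocol ip list 101   ! "interesting" traffic for the dialer
!
class-map match-all VoIP-Control
 match access-group 101              ! same ACL reused for QoS - matches nearly everything

Because the last statement in the ACL is "permit ip any any", the class-map sweeps virtually all traffic into the small signaling queue.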
Before making any changes, save the output of "show policy interface" on your WAN routers. You should see a very large amount of traffic going into the VoIP-Control bucket and none at all into the default bucket.
Remove access-list 101 and rebuild it into two different access lists. Build one ACL for your interesting traffic dialer-list if you still need it, and build another for QoS classification. Apply the QoS ACL to the VoIP-Control class-map. Just for good measure, adjust the policy-map for VoIP-Control to 128kb from 16kb. This won't do any harm to your network; it won't reserve it away or prevent other applications from using it in the normal case. It just makes sure there's plenty of bandwidth available to VoIP signaling traffic if you have network congestion.
After the change, check the output of "show policy interface" again. You should see most of your normal network traffic falling into the default bucket, and a much smaller rate of traffic accumulating in the VoIP-Control bucket.
Do you have any examples for implementing the new ACL for the Qos classification. I want to make sure the ACL is implemented correctly. Thanks again!
Let's get you to the point of trusting DSCP (the newer version of IP precedence) rather than using ACLs. Remove your existing QoS class-maps and the service-policy statement on your Serial interface, on the routers at both ends of the T1, and replace them with this:
class-map match-any VoIP-Control
 match ip dscp af31
 match ip dscp cs3
class-map match-any VoIP-RTP
 match ip dscp ef
!
! on the Serial interface:
 service-policy output QoS-Policy
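If your policy-map needs rebuilding as well, it should look something like this (the priority value here is just an example; size it for your circuit and number of calls):

policy-map QoS-Policy
 class VoIP-RTP
  ! low-latency priority queue for voice media
  priority 768
 class VoIP-Control
  ! guaranteed bandwidth for signaling, per the earlier recommendation
  bandwidth 128
 class class-default
  fair-queue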
This eliminates the need for ACL 101, unless you're still using it for a dialer-list (which I didn't see, but you edited the configs).
I'm seeing some variance on your routers for QoS settings on your dial peers, which determines what QoS the router will mark on traffic coming from that dial peer. Let's make that consistent. On each of your "dial-peer XXX voip" peers, configure the following:
dial-peer voice XXX voip
 ip qos dscp ef media
 ip qos dscp cs3 signaling
The "ip qos dscp ef media" line will likely not appear in the config after you set it, because it's the default. I'm mainly listing it because you have it explicitly configured to cs5 in a few places I saw and we want to get you back to the default, which is more correct. The default DSCP for signaling is AF31, and that would work fine, but lately Cisco has been recommending the use of CS3 so we'll go that way.
You need to make this change to every router on your network that has dial-peers. Your router that is configured with "ip precedence" might have old code that doesn't support using DSCP in dial-peers. If it won't take those commands, don't worry about it, the defaults should still be fine.
This configuration should get you to a point where the voice gateways are marking traffic correctly and the routers are treating it correctly in terms of queueing and priority. There could still be potential issues at the host side re: the 3550 or 4000 switches rewriting COS/DSCP to zero depending on how they're configured, but let's see where we're at with these changes first.
Use the "show policy interface" command I gave in my post above to make sure that traffic is actually falling into the right buckets. Test signaling traffic by having someone at the remote site pick up a handset and toggle the switch-hook repeatedly. This is a quick and dirty way to generate some signaling traffic. You should see the counters on that signaling queue go up relatively rapidly on both sides of the T1. Obviously, signaling should be more consistent and responsive if this was the fix.
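If the full output is noisy, you can also zero in on just the signaling class while your tester toggles the switch-hook (the interface name is a placeholder for yours):

! run this twice, a few seconds apart
show policy-map interface Serial0/0 output class VoIP-Control
! the packet counters under "Match: ip dscp af31" / "Match: ip dscp cs3"
! should be climbing between the two runs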
You need to do 'match-any' rather than 'match-all' for your QoS class-maps. AF31 and CS3 are mutually exclusive; you'll never match both. Once you do that, you should hopefully see some stuff fall into the VoIP-Control bucket. Be sure to apply this configuration to your remote(s) as well as the host so that QoS works the other way too. Let us know how it goes!
I made the change and now QoS is working properly. Thanks! I've included the sh policy interface. I am doing some testing with users now (pick up/hang up tests). Can you think of anything else besides QoS (or lack thereof) that could be causing this issue? Again, I really appreciate your assistance!
Service-policy output: QoS-Policy

  Class-map: VoIP-RTP (match-all)
    9005 packets, 1835430 bytes
    5 minute offered rate 0 bps, drop rate 0 bps
    Match: ip dscp ef
    Output Queue: Conversation 264
      Bandwidth 768 (kbps) Burst 19200 (Bytes)
      (pkts matched/bytes matched) 1067/177058
      (total drops/bytes drops) 0/0

  Class-map: VoIP-Control (match-any)
    7683 packets, 400095 bytes
    5 minute offered rate 1000 bps, drop rate 0 bps
    Match: ip dscp af31
      7683 packets, 400095 bytes
      5 minute rate 1000 bps
    Match: ip dscp cs3
      0 packets, 0 bytes
      5 minute rate 0 bps
    Output Queue: Conversation 265
      Bandwidth 128 (kbps) Max Threshold 64 (packets)
      (pkts matched/bytes matched) 213/11169
      (depth/total drops/no-buffer drops) 0/0/0

  Class-map: class-default (match-any)
    236108 packets, 166146683 bytes
    5 minute offered rate 6000 bps, drop rate 0 bps
    Flow Based Fair Queueing
      Maximum Number of Hashed Queues 256
      (total queued/total drops/no-buffer drops) 0/0/0
I can't really think of anything else it can be. It virtually has to be signaling QoS causing the problem since it's only happening for the remotes and there isn't a voice quality problem. The old configuration of putting everything into the 16k bandwidth queue would have actually been worse than no QoS configuration at all, since you would default to weighted fair queueing and the bandwidth queue itself I believe is FIFO. When you do your testing, do it with and without extra load on the circuit - see how responsive the phones are when you're downloading large files at the remote site.
Please do let us know if it fixes the problem for you. Also, please do remember to set "Right Info" ratings and/or the check the Solve checkbox for posts from people that helped you in this and other matters; it highlights the posts for other NetPro users with the same problem and also keeps our fragile egos inflated. :)
QoS is set up on all routers and we had a couple of delay issues today. Here they are...
1) Had a delay where the client could hear us while we were still seeing 'connecting' on the display.
2) Call was received and after hitting the "transfer" button there was a 5 second delay and the client ended up hanging up.
Would a delay issue affect when a user receives a call and hits the transfer button? Also, I talked to a few users today and they said occasionally there will be echoes and voice distortion. I know there are many possible causes of this but I wanted to bring it up. The echo can happen internally and externally (through the PSTN), with the IP user being the one that hears their own voice (not the client). We also have a VG200 that is connected to a PBX (half the offices are IP and the others are PBX). Thanks again, you are really getting us on the right page.