Re: Delay when answer/place calls

CHRIS KALETH · ‎04-29-2004

We have been experiencing a 2-5 second delay for the following:

Receiving/placing calls

Picking up the handset to get dialtone

Transferring calls

If I answer a call and start saying 1-2-3-4-5 the other party will start hearing the conversation when I say 5. This kind of delay isn't as typical as a constant 2 second delay.

We are running a centralized CM 3.3(2) (with latest SP). Also we are using Cat4224 in our 3 remote offices. I've worked with the TAC to try and capture the delays but they are not seeing anything. Also, we've ruled out a high CPU on the CM.

This seems to be a rare issue but it is obviously very frustrating for our users.

Daniel Baum · ‎05-01-2004

We'll need more information.

Some questions for you.

Where does the delay NOT occur and, in contrast, where specifically does it occur? Please provide specifics: VLANs, QoS, etc.

Has this delay always been evident since the CallManager installation or is it a new(er) issue?

Is the delay IP Phone to IP Phone, IP Phone to PSTN Gateway or BOTH? Code levels of the gateways, etc.

If the delay occurs on the LAN that CallManager is connected to, we will need more information about the LAN infrastructure, the models of the servers in the CallManager cluster, and what the NICs of the CCM servers are set to, i.e., 100/Full, etc.

If the delay affects only the REMOTE sites, we will need more information about the WAN design and routing architecture, the remote LAN infrastructure, VLANs, etc.

If the delay affects BOTH LAN and Remote/WAN, let's first remedy the LAN/CallManager delay and then reach out to the edge.

This document is very good for AVVID Design:

http://www.cisco.com/application/pdf/en/us/guest/netsol/ns17/c649/ccmigration_09186a00800d67ed.pdf

Voice Quality Support Page:

http://www.cisco.com/pcgi-bin/Support/browse/psp_view.pl?p=Technologies:Voice:QoS&s=Verification_and_Troubleshooting

Dan

CHRIS KALETH · ‎05-21-2004

The delay does not occur on the local LAN, just the remote sites. This has been an issue for about 1 year now and happens on about 25-30% of calls. The delay happens more with PSTN to IP calls and also internal PBX (through a VG200) to IP. Each remote office has a Cat4224. I updated one Cat4224 to the latest IOS version and we are still experiencing issues. I've also updated to the latest patches on CM (3.3.2). I've noticed this issue is fairly prevalent on the forums and don't see any solutions. Has anyone had any luck?

jasyoung · ‎05-21-2004

If everything you've said here is true, you almost certainly have a signaling QoS issue. Here's how to break it down:

> Receiving/placing calls

> Picking up the handset to get dialtone

> Transferring calls

If we assume (and this is a big assumption) that we have exactly one root cause, then the dialtone issue is the key. Picking up the handset to get dialtone involves no gateways, no PSTN circuits, no media streaming. It consists of the phone interacting with CallManager using SCCP signaling, CallManager doing some limited dialplan analysis (remember you can have dial patterns for PLAR), and CallManager signaling the phone to start playing local dialtone. That isn't streaming audio from anywhere, that's the phone generating dialtone locally based on SCCP signaling instructions from CallManager. Our problem domain now consists of:

* The phone(s)

* CallManager

* The network in-between the phone and CallManager

Now let's take another one of your statements:

> The delay does not occur on the local LAN, just the remote sites.

Assuming again (but I think this is reasonable) that you have the same types of phones running the same firmware loads at your host site as you do at the remotes, you can probably eliminate the phones. Also, this lets us eliminate the CallManager - CPU utilization or other issues would affect all phones more or less equally. In your case, the remote sites have the issue and the host site is fine. Our problem domain now consists of:

* The network in-between the phone and CallManager

This seems kind of the obvious place to look, given that only the remotes are affected. Okay, so what could be the problem? You're not complaining of any voice quality problems, so you probably have some sort of QoS set up (or have lots of extra capacity). Are you sure that you're doing QoS properly for signaling traffic?

Depending on your settings and version of CallManager, signaling may take place with DSCP AF31 or CS3. Make sure you don't have switches configured to not trust (rewrite down) the CallManager's DSCP, and make sure the phones at the remote site are having their DSCP preserved as well. Use a sniffer to verify this.
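For anyone unfamiliar with these markings, here is a small Python sketch of what the code points look like on the wire. The DSCP numbers are the standard values; the helper name `describe` is made up for illustration. In a sniffer trace, the field to look at is the ToS/DiffServ byte in the IP header, where DSCP occupies the top six bits.

```python
# Standard DSCP code points mentioned in this thread (decimal values).
DSCP = {"cs3": 24, "af31": 26, "cs5": 40, "ef": 46}

def describe(name):
    """Return (dscp, tos_byte_hex, ip_precedence) for a DSCP name."""
    dscp = DSCP[name]
    tos_byte = dscp << 2      # DSCP is the top 6 bits of the ToS byte
    precedence = dscp >> 3    # legacy IP precedence is the top 3 bits
    return dscp, f"0x{tos_byte:02X}", precedence

for name in DSCP:
    print(name, describe(name))
```

So signaling packets should show a ToS byte of 0x68 (AF31) or 0x60 (CS3). If a trace taken at the far side of the path shows 0x00 on those same packets, something in between is rewriting the marking.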

Make sure your QoS configuration is actually catching signaling traffic. Check this with "show policy-map interface". Make sure you have enough bandwidth set aside for signaling. I've seen people configure as little as 16kb/s for signaling on lower speed circuits; if you have a number of phones at your remote site you can eat a lot of that up just with keepalives! If the circuit is relatively small and under load, your signaling traffic may be getting policed down too low.

CHRIS KALETH · ‎05-22-2004

Thanks for your feedback!

Just to clarify a few things. All phones are running the same load and CallManager has no CPU issues. We do have plenty of bandwidth between sites.

I'm including a sh run of our 4224. Can you see if anything in the config could be causing my issues?

I'm not that familiar with DSCP AF31 or CS3. How can I verify that my switches are not configured to not trust the CM's DSCP? What will I be looking for when I use the sniffer?

Also, I don't have any policy-maps configured. Please take a look at my config. Very much appreciated!

jasyoung · ‎05-22-2004

Your 4224 configuration has no obvious problems. One thing I question is using a DSCP of CS5 rather than EF for voice media streams. I'm assuming you had a specific reason for that. The dial-peers should default to AF31 or CS3 for signaling, so that should be fine.

I'm more interested in the devices that connect your remote sites to your host site. Can you post the configuration from your remote site WAN router, and your host site router as well? Also, we kind of need you to quantify what "plenty of bandwidth" means. What speed circuits are they, and what sort of technology/topology? Point to point T1, frame-relay, ATM, IP VPN/MPLS, VPN over the public internet, etc.

CHRIS KALETH · ‎05-22-2004

For all IPT remote offices we have point-to-point T1s (1.544 Mb/s each), connected with Cisco 2600 routers. I will attach the config for one of our remote sites. Thanks again!

Remote Router Host Router <> Cat3550 Backbone Switch <> Cisco 2600 <> Cat4000 (Call Manager connected to Cat4000)

jasyoung · ‎05-22-2004

It looks like you're doing your classification based on precedence rather than DSCP, which is kind of the old way to do it but it'll work. You're also specifying some ports, which should be redundant but we'll let that go. But the real problem is that you appear to have accidentally pasted together two unrelated access lists and used them for QoS.

On both the host and remote, you have a single access-list 101 which appears to be used both for "interesting" dialer traffic and for QoS classification into the VoIP-Control MQC bucket. I'm guessing this was a cut/paste error of some sort. The effect is that access-list 101 permits all IP traffic except EIGRP, which has the further effect that all traffic except EIGRP is classed into the 16k bandwidth VoIP-Control bucket. You effectively have no signaling QoS at all, because all the rest of your traffic is lumped in with it by the "permit ip any any" ACL statement.
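To illustrate why that one ACL swallows everything, here is a Python sketch of first-match ACL evaluation. The two entries are a reconstruction of access-list 101 from the description above (deny EIGRP, then permit ip any any), not a copy of the actual config, and `acl_permits` and `classify` are made-up helper names. The sketch models only the ACL-based class, not the rest of the policy.

```python
# Reconstructed (hypothetical) access-list 101: deny EIGRP, then permit all IP.
ACL_101 = [("deny", "eigrp"), ("permit", "ip")]

def acl_permits(acl, protocol):
    """First matching entry wins; 'ip' matches any IP protocol."""
    for action, proto in acl:
        if proto == "ip" or proto == protocol:
            return action == "permit"
    return False  # implicit deny at the end of every ACL

def classify(protocol):
    """A class-map that matches on the ACL catches whatever it permits."""
    return "VoIP-Control" if acl_permits(ACL_101, protocol) else "class-default"

for proto in ("tcp", "udp", "icmp", "eigrp"):
    print(proto, "->", classify(proto))
```

Only EIGRP lands in class-default here. SCCP signaling and a bulk file transfer are both TCP, so they end up competing inside the same small queue.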

Before making any changes, save the output of "show policy-map interface" on your WAN routers. You should see a very large amount of traffic going into the VoIP-Control bucket and none at all into the default bucket.

Remove access-list 101 and rebuild it into two different access lists. Build one ACL for your interesting traffic dialer-list if you still need it, and build another for QoS classification. Apply the QoS ACL to the VoIP-Control class-map. Just for good measure, adjust the policy-map for VoIP-Control to 128kb from 16kb. This won't do any harm to your network; it won't reserve it away or prevent other applications from using it in the normal case. It just makes sure there's plenty of bandwidth available to VoIP signaling traffic if you have network congestion.

After the change, check the output of "show policy-map interface" again. You should see most of your normal network traffic falling into the default bucket, and a much smaller rate of traffic accumulating in the VoIP-Control bucket.

CHRIS KALETH · ‎05-24-2004

Do you have any examples of implementing the new ACL for the QoS classification? I want to make sure the ACL is implemented correctly. Thanks again!

jasyoung · ‎05-24-2004

Let's get you to the point of trusting DSCP (the new version of IP precedence) rather than using ACLs. On the routers at both ends of the T1, remove your existing QoS class-maps and the service-policy statement on your Serial interface, and replace them with this:

class-map match-any VoIP-Control
 match ip dscp af31
 match ip dscp cs3
class-map match-any VoIP-RTP
 match ip dscp ef
policy-map QoS-Policy
 class VoIP-RTP
  priority 768
 class VoIP-Control
  bandwidth 128
 class class-default
  fair-queue
interface Serial0/0
 service-policy output QoS-Policy

This eliminates the need for ACL 101, unless you're still using it for a dialer-list (which I didn't see, but you edited the configs).
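As a rough sanity check on the numbers in that policy, the Python sketch below adds up the reservations against a T1. The 75% ceiling is, as far as I know, the IOS default for how much of an interface's bandwidth can be reserved. The per-call figure assumes G.711 with 20 ms packetization and ignores layer 2 overhead; the codec was not stated in this thread, so treat it as an assumption.

```python
T1_KBPS = 1544            # point-to-point T1 line rate
MAX_RESERVED_PCT = 75     # assumed IOS default reservable share
PRIORITY_KBPS = 768       # class VoIP-RTP
BANDWIDTH_KBPS = 128      # class VoIP-Control
G711_CALL_KBPS = 80       # assumption: 160 B payload + 40 B IP/UDP/RTP every 20 ms

reservable = T1_KBPS * MAX_RESERVED_PCT // 100
reserved = PRIORITY_KBPS + BANDWIDTH_KBPS
calls = PRIORITY_KBPS // G711_CALL_KBPS

print(f"reservable {reservable} kb/s, reserved {reserved} kb/s, fits: {reserved <= reservable}")
print(f"roughly {calls} simultaneous G.711 calls in the priority queue")
```

If the sites use a compressed codec such as G.729, the per-call figure is much lower and the priority value could be reduced accordingly.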

I'm seeing some variance on your routers for QoS settings on your dial peers, which determines what QoS the router will mark on traffic coming from that dial peer. Let's make that consistent. On each of your "dial-peer XXX voip" peers, configure the following:

dial-peer voice XXX voip
 ip qos dscp ef media
 ip qos dscp cs3 signaling

The "ip qos dscp ef media" line will likely not appear in the config after you set it, because it's the default. I'm mainly listing it because you have it explicitly configured to cs5 in a few places I saw and we want to get you back to the default, which is more correct. The default DSCP for signaling is AF31, and that would work fine, but lately Cisco has been recommending the use of CS3 so we'll go that way.

You need to make this change to every router on your network that has dial-peers. Your router that is configured with "ip precedence" might have old code that doesn't support using DSCP in dial-peers. If it won't take those commands, don't worry about it, the defaults should still be fine.

This configuration should get you to a point where the voice gateways are marking traffic correctly and the routers are treating it correctly in terms of queueing and priority. There could still be potential issues at the host side re: the 3550 or 4000 switches rewriting COS/DSCP to zero depending on how they're configured, but let's see where we're at with these changes first.

Use the "show policy-map interface" command I gave in my post above to make sure that traffic is actually falling into the right buckets. Test signaling traffic by having someone at the remote site pick up a handset and toggle the switch-hook repeatedly. This is a quick and dirty way to generate some signaling traffic. You should see the counters on that signaling queue go up relatively rapidly on both sides of the T1. Obviously, signaling should be more consistent and responsive if this was the fix.

CHRIS KALETH · ‎05-26-2004

I've added the host router config again, along with the output of the sh policy interface command. I cleared the counters and had a user pick up/hang up about 20 times, and I did not see any signaling traffic. The VoIP-Control class should be seeing packets for signaling, correct?

jasyoung · ‎05-26-2004

You need to do 'match-any' rather than 'match-all' for your QoS class-maps. AF31 and CS3 are mutually exclusive; you'll never match both. Once you do that, you should hopefully see some stuff fall into the VoIP-Control bucket. Be sure to apply this configuration to your remote(s) as well as the host so that QoS works the other way too. Let us know how it goes!
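The difference is easy to see in a small Python sketch. A packet carries exactly one DSCP value, so requiring every match statement to be true can never succeed when two statements name different values. The function names here are illustrative only.

```python
# DSCP values a VoIP-Control class-map is matching on.
CONTROL_DSCPS = {"af31": 26, "cs3": 24}

def match_all(packet_dscp):
    """class-map match-all: every match statement must be true."""
    return all(packet_dscp == value for value in CONTROL_DSCPS.values())

def match_any(packet_dscp):
    """class-map match-any: one true match statement is enough."""
    return any(packet_dscp == value for value in CONTROL_DSCPS.values())

for name, dscp in (("af31", 26), ("cs3", 24), ("default", 0)):
    print(f"{name}: match-all={match_all(dscp)} match-any={match_any(dscp)}")
```

With match-all, neither AF31 nor CS3 packets ever hit the class, which is consistent with a signaling counter that stays at zero.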

CHRIS KALETH · ‎05-26-2004

I made the change and now QoS is working properly. Thanks! I've included the sh policy interface output. I am doing some testing with users now (pick up/hang up tests). Can you think of anything else besides QoS (or lack thereof) that could be causing this issue? Again, I really appreciate your assistance!

Service-policy output: QoS-Policy

  Class-map: VoIP-RTP (match-all)
    9005 packets, 1835430 bytes
    5 minute offered rate 0 bps, drop rate 0 bps
    Match: ip dscp ef
    Queueing
      Strict Priority
      Output Queue: Conversation 264
      Bandwidth 768 (kbps) Burst 19200 (Bytes)
      (pkts matched/bytes matched) 1067/177058
      (total drops/bytes drops) 0/0

  Class-map: VoIP-Control (match-any)
    7683 packets, 400095 bytes
    5 minute offered rate 1000 bps, drop rate 0 bps
    Match: ip dscp af31
      7683 packets, 400095 bytes
      5 minute rate 1000 bps
    Match: ip dscp cs3
      0 packets, 0 bytes
      5 minute rate 0 bps
    Queueing
      Output Queue: Conversation 265
      Bandwidth 128 (kbps) Max Threshold 64 (packets)
      (pkts matched/bytes matched) 213/11169
      (depth/total drops/no-buffer drops) 0/0/0

  Class-map: class-default (match-any)
    236108 packets, 166146683 bytes
    5 minute offered rate 6000 bps, drop rate 0 bps
    Match: any
    Queueing
      Flow Based Fair Queueing
      Maximum Number of Hashed Queues 256
      (total queued/total drops/no-buffer drops) 0/0/0
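A quick way to read these counters is to work out the average packet size and packet share per class. The Python sketch below only does arithmetic on the numbers pasted above; nothing in it comes from outside this output.

```python
# (packets, bytes) per class, copied from the show output above.
counters = {
    "VoIP-RTP": (9005, 1835430),
    "VoIP-Control": (7683, 400095),
    "class-default": (236108, 166146683),
}

total_packets = sum(pkts for pkts, _ in counters.values())

for name, (pkts, nbytes) in counters.items():
    avg = nbytes / pkts
    share = 100 * pkts / total_packets
    print(f"{name}: avg {avg:.1f} bytes/packet, {share:.1f}% of packets")
```

The signaling packets come out small, around 52 bytes on average, and every one of them matched AF31 rather than CS3. Most of the traffic now falls into class-default, which is what the corrected classification should produce.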

jasyoung · ‎05-26-2004

I can't really think of anything else it can be. It virtually has to be signaling QoS causing the problem since it's only happening for the remotes and there isn't a voice quality problem. The old configuration of putting everything into the 16k bandwidth queue would have actually been worse than no QoS configuration at all, since you would default to weighted fair queueing and the bandwidth queue itself I believe is FIFO. When you do your testing, do it with and without extra load on the circuit - see how responsive the phones are when you're downloading large files at the remote site.

Please do let us know if it fixes the problem for you. Also, please do remember to set "Right Info" ratings and/or check the Solve checkbox for posts from people that helped you in this and other matters; it highlights the posts for other NetPro users with the same problem and also keeps our fragile egos inflated. :)

CHRIS KALETH · ‎05-26-2004

QoS is set up on all routers and we had a couple of delay issues today. Here they are...

1) Had a delay with the client hearing us and us seeing 'connecting' on the display.

2) A call was received and after hitting the "transfer" button there was a 5 second delay, and the client ended up hanging up.

Would a delay issue affect what happens when a user receives a call and hits the transfer button? Also, I talked to a few users today and they said occasionally there will be echoes and voice distortion. I know there are many possible causes of this but I wanted to bring it up. The echo can happen internally and externally (through PSTN), with the IP user being the one that hears their own voice (not the client). We also have a VG200 that is connected to a PBX (half of the offices are IP and the others are PBX). Thanks again, you are really getting us on the right page.