T3 suddenly has 38ms latency

philip.r.hayes · ‎08-24-2009

Hello,

I am a new employee and I'm looking at this for the first time.

We have two 3825's running 12.3(14)T3, with a "Subrate T3/E3 port" connection between them.

The remote office is complaining that the speed of certain applications dropped suddenly. It's only this one office, and it's affecting all applications. As far as we can tell, no changes have been made to either device and no new applications have been added to the desktops in the remote office.

I cleared the counters on both ends and for several days they have been running clean. Yet, ping times remain at a steady 38ms (bandwidth utilization is about 10-20%).

Here's the only thing I can find:

R1:
card type t3 1
controller T3 1/0
 cablelength 10

R2: (remote office)
card type t3 1
controller T3 1/0
 cablelength 210

When I did a 'show controllers t3', both routers show the same thing:

Framing is C-BIT Parity, Line Code is B3ZS, Clock Source is Internal

The only thing that pops out is the cable length setting and it's been set that way for years. I can't seem to find out much about that setting.

Anyone?

paolo bevilacqua · ‎08-24-2009

You should have clock source line.

The cable length setting introduces attenuation, which is sometimes needed.

Also check "show interface" and "show controllers T3" for errors.
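
For reference, a minimal sketch of what the clock source change might look like. It assumes the controller T3 1/0 shown in the original post and assumes the carrier supplies clocking on the circuit. Treat it as an illustration to check against your own configuration and the carrier's provisioning, not as a verified fix:

```
! Sketch only: take clocking from the line (the carrier) instead of the
! router's internal oscillator. Controller 1/0 is taken from the original post.
configure terminal
 controller T3 1/0
  clock source line
 end
! Afterwards, confirm the "Clock Source is ..." field and the error counters.
show controllers t3
```

On a carrier-provided circuit both ends would normally be set this way. On a back-to-back link with no carrier clock, one end would instead stay on internal.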

philip.r.hayes · ‎08-25-2009

'Show interface' and 'show controllers t3' are coming up clean. Once AT&T was done monitoring (and they didn't find anything), I cleared the counters.

Joseph W. Doherty · ‎08-24-2009

It's a bit unclear: besides the remote office complaining about an application's speed, have ping times always been about 38 ms, or are they only now 38 ms? If the latter, what were they before?

Re: bandwidth utilization is only about 10 to 20%; measured how and over what time period?

What's the amount of available bandwidth?

Re: counters are clean, all counters?

What queuing method are you currently using on the interfaces?

PS:

Also curious, is there a reason you're still running a "T" train version?

philip.r.hayes · ‎08-25-2009

As far as I can tell, it was not always 38ms. BW is 44210.

I cleared all counters once I got brought into it. The problem started in the week prior to me starting my job; ergo, a lot of network monitoring (snmp polling) was not in place.

CEF is enabled on both routers. Otherwise, there is no queuing configured (default).

Not sure what "train version" means.

vmiller · ‎08-25-2009

Different applications behave differently over the WAN. Once you clear the counters and get a sense of real-time behavior and utilization, I'd suggest a packet capture for the app that gets the most complaints.

philip.r.hayes · ‎08-25-2009

The first thing to consider is that across the board, ping time is 38ms; constantly.

However, now that I have snmp polling working, I see "Output discards" on the remote router that show:

Output Requests: 292791

Output Discards: 109154

That pretty much makes me think that 'cef' is at the heart of the problem.

I also just found that with "sho int ser1/0", both sides show:

DSU mode 0, bandwidth 44210, real bandwidth 44210, scramble 0

That makes me think that both are set as DSU's(?). If that's wrong, I'm surprised it works at all.

paolo bevilacqua · ‎08-25-2009

It's not wrong. Clock internal on both sides is certainly wrong.

But nothing can be said until you post the complete output of show controllers t3 and show interface.

philip.r.hayes · ‎08-25-2009

Here are the two outputs from each router: (attached files)

My past experience has been to set one as DSU and the other to CSU but I had a hard time searching cisco.com for whether it was the same or different for their t3 card. I've always done t1's, PRI's, etc. The weird part is that other than the remote side complaining about speed, this is working and as you can see from the output below, there are no errors. Also, this link has been in use for quite some time without any problems. Go figure.

BTW: I appreciate the support of this forum!!

paolo bevilacqua · ‎08-25-2009

There is no "CSU vs. DSU" mode setting for T3. Actually, there is none for T1 either, but some T1 hardware has CSU/DSU functionality embedded, while other hardware does not. What that difference actually is, is not pertinent to this discussion.

The show commands you've included do not show any problem, although, as I mentioned, having clock source internal on both sides is always and definitely wrong.

Try removing the service-policy from the T3 interface and, most importantly, check the alleged slowdowns with your own eyes.
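
A rough sketch of that test, assuming the policy is attached in the output direction on Serial1/0 (the interface named earlier in the thread). POLICY-NAME is a placeholder, since the actual policy name is only in the attached files:

```
! Sketch only. POLICY-NAME is a placeholder; use the name and direction
! shown in the running configuration.
! First look at the per-class counters to see whether the policy itself is dropping.
show policy-map interface Serial1/0
configure terminal
 interface Serial1/0
  no service-policy output POLICY-NAME
 end
```

Noting the policy name and direction beforehand makes it easy to reapply with the matching "service-policy" command once the test is done.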

philip.r.hayes · ‎08-26-2009

I believe this is the problem. When this circuit was brought up, no clock timing was set. My guess is that AT&T's internal settings or provisioning may have been changed. Even if the change was not customer-impacting, it may have been just enough to turn what may have been a 25ms delay into a 38ms delay. With 1/3 of the packets being dropped from 'cef' and sent to process switching (1 packet per second), if each side of the circuit is doing some kind of frame syncing or buffering, that could explain what's been happening. I'll reply back after making the changes.

vmiller · ‎08-26-2009

Your reports of drops would be consistent with timing slips. With each end doing its own timing, things can "appear" to work.

paolo bevilacqua · ‎08-26-2009

Thing is, this theory is not supported by evidence: clock problems show up as errors in show controller and show interface (not as CEF drops), but you have none.

On the other hand, CEF drops can be caused by malformed packets; maybe one or more PCs got infected and are sending rogue traffic.
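
One possible way to see where the drops are being counted, as a sketch. It assumes these show commands are available on the 12.3T image in use (later IOS releases changed the CEF statistics commands), and Serial1/0 is the interface named earlier in the thread:

```
! Sketch only: compare CEF-level drop counters with the interface counters.
! Per-reason CEF drop counts
show cef drop
! Packets punted from CEF to a slower switching path, with reasons
show cef not-cef-switched
! Output drops and the input/output error counters on the T3 interface
show interfaces Serial1/0
```

If the interface and controller error counters stay at zero while the CEF drop counters climb, that would point away from clocking and toward the traffic itself.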