Intermittent Problem with Frame Relay Circuits

incanus555
Level 1

Here's the problem: Every few hours or so, we stop receiving packets on either one of two Serial interfaces on our Cisco 2610 router. However, the interface line protocol stays "up" the whole time. Reloading the router fixes the problem temporarily, as does "shutdown" then "no shutdown" on the interface in question. Our ISP and Frame Relay providers (separate entities) said it was probably our router, so we bought a new one, and it's still happening. Now, both said entities are pointing fingers at each other instead of fixing the problem. Can anyone help? I can provide any information that I can get from our router, and some from those other two parties involved. I may be able to get one or both of them to join in this discussion. This is a serious issue for my company, and I'm out of ideas! Any help would be greatly appreciated.

-A. Bronson

24 Replies

dbellaze
Level 4

When the problem occurs, are the other interfaces on the router still responsive?

Can you post the following info when you are having the problem?

show int ser x

show frame pvc

sh frame lmi

show proc cpu

show service-module

These would be helpful to rule out IOS or config problems.

show ver

show run

Daniel

When the problem occurs, it seems to only affect one interface or the other, in no particular pattern. Packets input on this interface suddenly stop incrementing. Packets output slow to a fraction of what they were. We still see a normal flow of packets both in and out of the other interface, though, so it appears to be working. Normal for us is about:

5 minute input rate 1048000 bits/sec, 138 packets/sec

5 minute output rate 365000 bits/sec, 109 packets/sec

I waited until the next time the line went down and this is the output of the commands you requested. I didn't catch it right away, so I'm not sure how long it's been down for. No more than 1 hour or so.

(see attachments due to size)

These are just the stats for the affected interface. Let me know if the other (working) interface stats are needed as well. I will post show ver and show run outputs if necessary also.

The only thing I can really see is that your interface has had 19 resets; other than that, you haven't missed many keepalives or anything.

Can you post your sh ver and configuration (all of it)? You can change the real IPs if you'd like.

Daniel

OK, I've attached the output of show ver and show run, as requested. I replaced the IPs and passwords with *'s, and I left out the access lists, since they are huge. I've tried removing the ACLs altogether for a while and the problem still happened, so they should be fine.

(see attachment)

Thank you for helping. I hope we can figure this out.

Just curious, but why are you bridging on your circuits?

Daniel

I asked our ISP the same question when I began trying to figure this mess out. Apparently it's because we go frame relay to Verizon, then ATM to our ISP. It is my understanding that it was necessary to use bridged virtual interfaces to do this, although I did not configure the router myself. Unfortunately, I'm not well-versed in the WAN department.

Yeah, I don't understand why you have bridged interfaces either. I would get rid of the bridge groups, put the IP addresses directly on the point-to-point links on your router, and see if that works for you. Ensure that you have out-of-band management of your D/E router before changing the configuration. Tell us how that works out for you.

Just to clear that up, the interface resets are from a script that I'm running, which automatically shuts down the interface and restarts it when it sees packets/second drop to 0.
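The script itself was not posted, so the following is only a minimal sketch of the kind of check it describes: read the rate line from "show interfaces" output and decide whether the input packet rate has dropped to 0. The function names `parse_rates` and `should_bounce` are hypothetical, and how the output is collected and how shutdown / no shutdown is issued are left out on purpose.

```python
import re

# Matches the rate lines printed by "show interfaces", e.g.
#   5 minute input rate 1048000 bits/sec, 138 packets/sec
RATE_RE = re.compile(
    r"\d+ (?:minute|second) (?P<direction>input|output) rate "
    r"(?P<bps>\d+) bits/sec, (?P<pps>\d+) packets/sec"
)


def parse_rates(show_interface_text):
    """Return {'input': (bps, pps), 'output': (bps, pps)} for the rate lines found."""
    rates = {}
    for m in RATE_RE.finditer(show_interface_text):
        rates[m.group("direction")] = (int(m.group("bps")), int(m.group("pps")))
    return rates


def should_bounce(show_interface_text):
    """Hypothetical rule: bounce the interface when the input packet rate reads 0."""
    rates = parse_rates(show_interface_text)
    if "input" not in rates:
        return False  # no rate line found; do nothing rather than guess
    return rates["input"][1] == 0


# Rate lines quoted in this thread: normal operation vs. the stalled interface.
healthy = (
    "  5 minute input rate 1048000 bits/sec, 138 packets/sec\n"
    "  5 minute output rate 365000 bits/sec, 109 packets/sec\n"
)
stalled = "  5 minute input rate 90000 bits/sec, 0 packets/sec\n"

print(should_bounce(healthy), should_bounce(stalled))
```

Because the displayed rate is an average over the load interval, a check like this reacts some time after traffic actually stops, which is worth keeping in mind when matching interface resets against outage times.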

smif101
Level 4

Does this problem happen every day? What version of IOS are you running on both routers? What is the config of both router interfaces? Is this a new circuit, or has the circuit been up for a while and now all of a sudden this started happening? Have you done an extended end-to-end test with a firebird yet? If not, that would be your next step, and from there you should be able to pinpoint the problem.

To answer your questions: Yes, this happens every day. Several times a day, actually. The config for the interfaces can be seen on my last post, in the attached file. This is not a new circuit. We have had it for a few years, and have had it go down like this once or twice before, about a year ago. We never figured out what was causing it to happen, nor did our Frame Relay provider. It just started working again. Now, the problem is back with a vengeance, going down every few hours. I have replaced our router (with same model, more memory), and replaced the cables that connect the two interfaces on the router to the Frame Relay provider's "smart jack". The jack and everything else in that direction is their equipment, and they say they've replaced everything they can between my location and their central office.

What is this firebird you speak of? If it will help pinpoint the problem, I'd like to try it. But we don't have much testing equipment. The best we have is a Compas tester.

Such issues are very difficult to tackle because there are so many people involved. Since you and your providers say you have tried everything, I will point towards a problem inside the FR/ATM core network. This could be a problem on the ATM side that is not well communicated towards the FR side. FR/ATM interworking is not perfect. I suppose you do not have much control on the ATM side, since you post only information about FR endpoints.

Your PVC consists of many little pieces that could switch between different paths inside the FR/ATM network for various reasons (physical circuit or cabling problems, card failures, improper clocking settings, and who knows what else). This rerouting is done to protect PVCs from long outages, and if it occurs, it is a good sign that your provider is struggling for availability. Such reroutings are usually massive, so they can't go unnoticed. Because many circuits are rerouted, this can take time and result in short outages. If your PVC cannot be rerouted because of a lack of available bandwidth, you will experience a longer outage. When the failure condition is cleared, PVCs could be switched back to their original path (either manually, automatically, or on a schedule) and you might see another short interruption in your service.

Ask your provider's customer support if they see failures of your PVCs or frequent reroutings of your PVCs inside their network core. (Supply them with some details about the times at which this occurs, so they can search their log files.) If they cannot answer this, they should escalate the problem to the people who can.

If your PVCs are indeed frequently rerouted or failing, you can ask them to temporarily force your PVCs through the most reliable path (if one exists) until they resolve the root cause of the rerouting or failure-to-reroute events (either inside their FR/ATM equipment, or together with their physical circuit provider). Remember afterwards to tell them to let your PVCs move freely between paths again, in case the currently reliable path fails in the future. They could have many PVCs and forget they gave you some specialized treatment.

I sympathize with your problem (and the providers' ;-).

Good luck!

Thanks for your help and suggestions, guys. Unfortunately, we still haven't figured it out. One thing we are having problems understanding is this line in the serial interface stats:

5 minute input rate 90000 bits/sec, 0 packets/sec

How can we have ~10k/second in but 0 packets/second in?
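To put numbers on why that line looks odd, here is a small arithmetic sketch using only the rates quoted in this thread. It assumes that a displayed "0 packets/sec" means "fewer than 1 packet per second", and the helper names are made up for illustration. It does not explain the cause; it only shows that the two counters are hard to reconcile as describing the same traffic.

```python
def implied_avg_packet_bytes(bits_per_sec, packets_per_sec):
    """Average bytes per packet implied by a pair of rate counters."""
    if packets_per_sec == 0:
        return None  # undefined when the displayed packet rate is 0
    return bits_per_sec / 8 / packets_per_sec


def min_bytes_if_under_one_pps(bits_per_sec):
    """Lower bound on average packet size, ASSUMING fewer than 1 packet/sec."""
    return bits_per_sec / 8


# Normal operation, from the rates posted earlier in the thread.
print(round(implied_avg_packet_bytes(1048000, 138)))  # input side
print(round(implied_avg_packet_bytes(365000, 109)))   # output side

# Failure state: 90000 bits/sec with a displayed rate of 0 packets/sec.
print(min_bytes_if_under_one_pps(90000))
```

In normal operation the implied average is roughly 949 bytes per packet inbound. In the failure state, 90000 bits/sec at under one packet per second would mean packets averaging more than 11250 bytes, far larger than the 1500-byte default serial MTU (the MTU actually configured here is in the attachment and is not assumed).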

incanus555
Level 1

To make it easier for anyone who would like to help me here, I've made a file full of my router's stats during normal operation (that is, between occurrences of the problem). It is attached.

Thanks for your help!

When did you last upgrade? I see you have an image from August.

So, you are using BVI + bridging because the frame relay circuit demands bridged packets? What is on the other side of the link? Another Cisco? Or internet access?

When it fails, can you check ALL of your interfaces (BVI included) and see if the interface queues are "full" (e.g. 76/75)? There are some bugs where packets get "stuck" in the queue and eventually stop traffic. Usually only a reboot, not a shut/no shut, will solve the problem.

An example of a recent queue-filling bug is

http://www.cisco.com/en/US/products/products_security_advisory09186a00803448c7.shtml

Though that particular one does not affect your version of software.
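If it helps to check every interface at once, here is a rough sketch that scans saved "show interfaces" output for queues whose size has reached their maximum (the 76/75 case). The sample text is made up for illustration and is not from the router in this thread, the function name `full_queues` is hypothetical, and the exact wording of the queue line varies between IOS releases, so the pattern may need adjusting.

```python
import re

# Matches e.g. "Input queue: 76/75/..." or the older "input queue 0/75, 0 drops".
QUEUE_RE = re.compile(
    r"(?P<name>input|output) queue:? (?P<size>\d+)/(?P<max>\d+)", re.IGNORECASE
)


def full_queues(show_interfaces_text):
    """Return [(interface, queue, size, max)] for queues at or over their limit."""
    hits = []
    current = None
    for line in show_interfaces_text.splitlines():
        # Interface header lines start in column 0, e.g. "Serial0/0 is up, ..."
        if line and not line[0].isspace() and " is " in line:
            current = line.split()[0]
        for m in QUEUE_RE.finditer(line):
            size, limit = int(m.group("size")), int(m.group("max"))
            if size >= limit:
                hits.append((current, m.group("name").lower(), size, limit))
    return hits


# Illustrative sample only (NOT real output from the poster's router).
SAMPLE = """\
Serial0/0 is up, line protocol is up
  Input queue: 76/75/1204/0 (size/max/drops/flushes); Total output drops: 0
  Output queue: 0/40 (size/max)
BVI1 is up, line protocol is up
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
"""

for interface, queue, size, limit in full_queues(SAMPLE):
    print(f"{interface}: {queue} queue {size}/{limit}")
```

An empty result only means no queue was full at the moment the output was captured, so it is worth capturing it while the interface is actually stalled.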

When did it start happening? What changed?
