We've got a Cisco 2821 router that periodically stops routing all traffic. It seems to happen about once every two weeks, and I can't find anything that could be causing it. There are no entries in the log, and the router stays up and running but requires a restart to begin processing traffic again. We're running 12.4(13r)T11.
Any thoughts, or troubleshooting steps to track this down?
We've got a single external WAN link running at 100 Mb/s full duplex. Here's the relevant config data:
ip address x.x.x.x 255.255.255.248
no ip unreachables
no ip proxy-arp
ip nbar protocol-discovery
ip flow ingress
ip flow egress
ip nat outside
ip route-cache flow
no mop enabled
One configuration change you can make to provide more stability is to replace the default route with a true next hop address. Something like this:
ip route 0.0.0.0 0.0.0.0 a.b.c.d
no ip route 0.0.0.0 0.0.0.0 gig0/0
Where a.b.c.d is the actual IP address of the next hop.
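For anyone applying this, a minimal sketch of the full session might look like the following. It assumes the existing default route points at GigabitEthernet0/0, as in the example above, and a.b.c.d is still a placeholder for the real next hop address. Adding the new route before removing the old one should avoid a window with no default route at all.

```
configure terminal
 ! add the default route via the next hop first
 ip route 0.0.0.0 0.0.0.0 a.b.c.d
 ! then remove the interface-based default route
 no ip route 0.0.0.0 0.0.0.0 GigabitEthernet0/0
end
! confirm the default route is now known via a.b.c.d
show ip route 0.0.0.0
! save the change so it survives the next reload
copy running-config startup-config
```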
During the failure, can you access the router via the LAN interface? If not, can you access it via the console?
Once you have access while the failure is occurring, execute:
show ip interface brief -or- show interface
Are all the interfaces up/up that should be?
Does a 'show ip route' display the default route in the routing table?
Does a 'show arp' show that the router can resolve a MAC address for the IP address of the default gateway?
Can you ping the default gateway?
Start with these questions and move out from there.
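Run from the console during the failure, the checks above could look roughly like this. It is only a sketch: GigabitEthernet0/0 is assumed to be the WAN interface, and a.b.c.d is a placeholder for the default gateway's address.

```
! are the interfaces that should be up/up actually up/up?
show ip interface brief
show interfaces GigabitEthernet0/0
! is the default route still in the routing table?
show ip route 0.0.0.0
! is there a resolved MAC address for the default gateway?
show ip arp a.b.c.d
! does the gateway answer?
ping a.b.c.d
```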
Thanks Chris, I'll give the next hop fix a go. We're (fortunately) not down now so I'll have to wait until next time to troubleshoot this stuff further. Thanks for your help!
I believe that it is a very good suggestion from Chris. Part of what happens when you have a static default route pointed at an Ethernet interface, as you have, is that the router treats every destination as if it were directly connected on that interface, so it must ARP for every destination to which it forwards traffic. And once the router puts a destination into the ARP table it will keep that entry, and keep renewing that entry. When you boot the router the ARP table is very small. But as you forward to more destinations on the Internet the table gets larger. After two weeks I wonder how large the table has gotten? I wonder if you are running into some constraint that prevents adding more destinations to the ARP table?
I believe that the best long term solution is to change the static default route as suggested by Chris (and +5 to him for the suggestion). If you are interested in doing some testing to see if my theory of the problem is correct then I have a couple of suggestions:
- When you start to have the problem, turn on debug arp. Make sure that your logging mechanism (logging buffered or logging monitor) is set for the debug level. Look in the log for what happens when the router ARPs for a destination. Is there some error?
- Clear the ARP cache. If my theory is correct, the router should start forwarding traffic again.
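As a rough sketch, the two tests could be run like this. The buffer size of 64000 bytes is an arbitrary example value, and show arp summary may not exist on every release; if it is missing, plain show arp lists the entries instead. Debug output can load a busy router, so it is probably wise to leave debug arp on only briefly.

```
! send debug-level messages to the buffered log
configure terminal
 logging buffered 64000 debugging
end
! how large has the ARP table grown?
show arp summary
! watch what happens when the router ARPs for a destination
debug arp
show logging
undebug all
! test the theory: flush the table and see whether forwarding resumes
clear arp-cache
show arp summary
```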
Thanks for the detailed explanation, Richard. I think you're absolutely right regarding the ARP table growing too large. It fits right in line with my original theory that it may be due to our large number of internal NAT hosts.
Hi Chris, thanks a lot for the good suggestion you gave Ben, and thank you, Richard, for the enlightenment you brought to this matter. And thanks, Ben, for bringing up this issue, which I've been facing for weeks now with the same Cisco 2821 router in my company, with no solution. I was stuck restarting the router every 4 to 5 hours to restore service (that is probably because my memory, 256 MB, is smaller than yours). Any time I entered the show arp command, the router would display an endless ARP table. Thanks to the different suggestions brought up in this discussion, I just solved my problem with two simple commands: ip route 0.0.0.0 0.0.0.0 x.x.x.x (the next hop's IP address) and no ip route 0.0.0.0 0.0.0.0 gigabitethernet0/0. After 30 minutes the ARP table contains only 8 entries. Yippee!! I can rest now! lol