02-20-2009 03:17 AM - edited 03-04-2019 03:39 AM
Hello,
I have a strange problem with several BGP peerings on the same 12008 router, apparently caused by underlying L2 issues.
We are connected by means of a port-channel to a switch not owned by our company.
Please follow the output below:
sho proc cpu
CPU utilization for five seconds: 3%/0%; one minute: 22%; five minutes: 18%
sho ip bgp summ | i 195.69.145.117
195.69.145.117 4 41420 142685 143545 0 0 0 10:17:47 Active   <-- the BGP session has been stuck for 10 hours now
sho arp | i 195.69.145.117
Internet 195.69.145.117 0 000c.db1f.a400 ARPA Port-channel1
ping 195.69.145.117
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 195.69.145.117, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
sho ip cef 195.69.145.117
195.69.145.117/32, version 26316038, epoch 0, connected, cached adjacency 195.69.145.117
0 packets, 0 bytes
Flow: AS 0, mask 23
via 195.69.145.117, Port-channel1, 0 dependencies
next hop 195.69.145.117, Port-channel1
valid cached adjacency
clear ip arp 195.69.145.117
ping 195.69.145.117
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 195.69.145.117, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/4 ms
sho arp | i 195.69.145.117
Internet 195.69.145.117 0 000c.db1f.a400 ARPA Port-channel1
sho ip bgp summ | i 195.69.145.117
195.69.145.117 4 41420 142688 143555 68197271 0 0 00:00:51 8   <-- and the BGP session is established again!
The router is running gsr-p-mz 12.0(32)S11, and I could not find any reference to this kind of problem in the Bug Toolkit.
There was an EtherChannel/MAC problem with a 12008 before, but it was resolved a long time ago.
Is there a way to determine whether this is a problem on the router itself or a problem with the non-Cisco LAN infrastructure the router is connected to? I've noticed that the ARP entry's age is 0 minutes both before and after the clear, which implies constant traffic coming from the peer's IP, maybe because the BGP peering is "Active". I suspect it has something to do with the EtherChannel and ARP/CEF on the router.
Does somebody have any idea what is happening here?
Thanks & Kind Rgds,
Dirk Versavel
02-20-2009 05:24 AM
Hello Dirk,
The IP address and MAC address in the ARP entry did not change after clear ip arp, so the old entry was not wrong or overwritten by a third-party device.
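To make that before/after comparison less error-prone than reading the lines by eye, here is a minimal Python sketch. It assumes the six-column "show arp" layout seen in the output above; the names ArpEntry, parse_arp_line and same_binding are made up for this illustration.

```python
from typing import NamedTuple


class ArpEntry(NamedTuple):
    protocol: str
    address: str
    age: str        # minutes as shown by the router
    mac: str
    encap: str
    interface: str


def parse_arp_line(line: str) -> ArpEntry:
    """Split one 'show arp' line into its six whitespace-separated columns."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"unexpected 'show arp' line: {line!r}")
    return ArpEntry(*fields)


def same_binding(before: str, after: str) -> bool:
    """True when both lines map the same IP to the same MAC on the same interface."""
    b, a = parse_arp_line(before), parse_arp_line(after)
    return (b.address, b.mac.lower(), b.interface) == (
        a.address, a.mac.lower(), a.interface)


# The two lines captured before and after 'clear ip arp' in the first post.
before = "Internet 195.69.145.117 0 000c.db1f.a400 ARPA Port-channel1"
after = "Internet 195.69.145.117 0 000c.db1f.a400 ARPA Port-channel1"
print(same_binding(before, after))
```

An unchanged binding only shows that the router's ARP table was consistent; it says nothing about what the peer has cached for the router's address.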
What is the usage level of the member links of the port-channel?
Do you see a fair distribution?
What happens if you try an extended ping with a different source address when the direct ping fails?
If you see traffic coming to you while the session is stuck, it should only be the BGP attempts to restart the session.
It looks like the forwarding path from your neighbor to your router works, but traffic from your router, and in particular the BGP session packets, is lost somewhere.
What type of linecards are involved?
How much memory is installed in the linecards?
How many BGP routes are present in the routing table,
and how many entries are in the CEF table?
The latter have to be replicated on all linecards.
If the traffic volume allows: what happens with only one active member link in the bundle?
Hope to help
Giuseppe
02-20-2009 06:53 AM
Hi Giuseppe,
thanks man for your valuable contribution to this forum. I suspect the port-channeling of 2 distributed CEF cards imposes a huge burden on the CPU.
Moreover, my BGP peerings sometimes fall back to Active, and then the CPU goes through the roof.
Maybe there is no other solution but upgrading the system to redundant 10Gig connections without port channeling.
You can find all info in the attached text file.
Thanks and have a nice weekend,
Dirk
02-23-2009 12:27 AM
Hi Giuseppe,
The system our router is connected to uses an ARP sponge.
The problem is apparently that our router sometimes sends out only unicast ARP requests (no broadcast) and does not reply to ARP requests in a timely manner. At that point the ARP sponge takes over and responds with its own MAC address, thus actively breaking the BGP session(s).
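To picture the failure mode, here is a toy Python model of the sponge behaviour as described above. It is only a sketch under assumptions: the class ToySponge, the threshold of 3 unanswered requests, the MAC 02:00:00:00:00:01 and the address 192.0.2.1 are all invented for illustration and are not taken from the actual sponge software or from our setup.

```python
from collections import defaultdict
from typing import Optional


class ToySponge:
    """Toy model of an ARP sponge; not the real implementation."""

    def __init__(self, sponge_mac: str, threshold: int = 3):
        self.sponge_mac = sponge_mac
        self.threshold = threshold           # hypothetical value
        self.unanswered = defaultdict(int)   # target IP -> requests without a reply

    def saw_request(self, target_ip: str) -> Optional[str]:
        """Count an ARP request; return the sponge MAC once the threshold is exceeded."""
        self.unanswered[target_ip] += 1
        if self.unanswered[target_ip] > self.threshold:
            return self.sponge_mac
        return None

    def saw_reply(self, sender_ip: str) -> None:
        """A genuine ARP reply from the host resets its counter."""
        self.unanswered[sender_ip] = 0


sponge = ToySponge("02:00:00:00:00:01")   # made-up sponge MAC
router = "192.0.2.1"                      # documentation address, not our router
# Four requests for the router's IP that the router never answers.
answers = [sponge.saw_request(router) for _ in range(4)]
print(answers)
```

In this model a router that answers slowly or not at all lets the counter pass the threshold, after which peers learn the sponge's MAC instead of the router's and their BGP packets no longer reach it.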
Cheers,
Dirk
02-23-2009 02:15 AM
Hello Dirk,
In this case you can fix it with a static ARP entry on your router for the eBGP neighbor, and the neighbor can do the same on their side.
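If there is more than one neighbor, the static entries could be generated rather than typed. The following is a minimal Python sketch that renders IOS "arp <ip> <mac> arpa" global configuration lines; the helper name static_arp_lines is hypothetical, and the single peer shown is the IP/MAC pair from the "show arp" output earlier in this thread. MAC addresses are assumed to be in Cisco dotted notation already.

```python
from typing import Dict, List


def static_arp_lines(peers: Dict[str, str]) -> List[str]:
    """Render one IOS 'arp <ip> <mac> arpa' global-config line per peer.

    MAC addresses are expected in dotted form (xxxx.xxxx.xxxx).
    """
    return [f"arp {ip} {mac.lower()} arpa" for ip, mac in sorted(peers.items())]


# Peer taken from the 'show arp' output in the first post.
peers = {"195.69.145.117": "000c.db1f.a400"}
for line in static_arp_lines(peers):
    print(line)
```

Verify each MAC against the peer before applying such lines: a static entry built from a sponged ARP reply would pin the wrong address.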
Hope to help
Giuseppe
02-23-2009 04:47 AM
Hi Giuseppe,
This would indeed be a solution, but there are about 70 peers that we don't maintain, so this option is not really scalable. However, it would be suitable for testing purposes: for instance, our peering organization claims that when they arping our MAC address they receive a reply to only 50% of the requests sent.
Cheers,
Dirk