ASR9K BGP issue

pgyogeshkumar
Level 1

I have configured BGP on an ASR9K. The BGP sessions are established, the prefixes are learnt, and end-to-end reachability is fine.

However, after a certain amount of time, the BGP session goes down and never comes back up automatically.

After I issue the command below, the BGP neighbor is established again.

clear bgp <peer id>

This is hampering network performance. Can you please suggest the possible root cause, so that I can resolve the issue?


Philip D'Ath
VIP Alumni

What appears in the log when the BGP neighbour goes down?  This will probably be a big hint.

What does "show ip bgp nei ..." show when the issue is happening?

Have you got any limits configured for the peer, such as max number of prefixes or anything else?

Hi

I have set neither peer limits nor a prefix list.

But in the show bgp neighbor output I can see a strange reason for why it went down.

Last reset 1d06h, due to Shutdown during SEVERE low memory condition (CEASE notification sent - out of resource)
Peer shut down during SEVERE low memory condition.
Reduce memory usage and use 'clear bgp <Peer IP>' to restore peering
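
If you want to check for this condition across many peers, here is a minimal Python sketch that pulls the "Last reset" reason out of saved show bgp neighbor text. The regular expression is an assumption based only on the lines quoted above; other releases may format the line differently, and the function names are made up for this example.

```python
import re

# Pattern for the "Last reset" line as it appears in the output quoted above.
# Assumption: other software releases may word this line differently.
LAST_RESET = re.compile(r"Last reset\s+(?P<age>\S+),\s+due to\s+(?P<reason>.+)")


def find_last_reset(output):
    """Return (age, reason) from the first 'Last reset' line, or None."""
    for line in output.splitlines():
        match = LAST_RESET.search(line)
        if match:
            return match.group("age"), match.group("reason").strip()
    return None


def is_low_memory_shutdown(output):
    """True when the last reset reason mentions a low memory condition."""
    reset = find_last_reset(output)
    return reset is not None and "low memory" in reset[1].lower()


sample = """\
Last reset 1d06h, due to Shutdown during SEVERE low memory condition (CEASE notification sent - out of resource)
Peer shut down during SEVERE low memory condition.
"""

print(find_last_reset(sample))
print(is_low_memory_shutdown(sample))
```

A peer flagged this way will not recover on its own; as the output itself says, it needs memory usage reduced and then a manual clear.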

Uh - I think the answer is obvious. It says your router ran out of memory, so it shut down the BGP peer.

Can you post your BGP config relating to this peering session and we'll see if we can reduce the memory being used. Either that or you get more memory for your ASR.

You aren't using redistribution between two routing protocols are you (and potentially creating a redistribution loop that sucks up all your memory)?

I guess it is also possible you have a buggy software version with a memory leak. What is the exact model of ASR you have and what software version are you running?

How frequently is the issue happening?

Could you please run the below commands:

Show memory compare start

wait for 4-5 minutes

Show memory compare end

Show memory compare report

This will help you figure out if there is any kind of memory leaking happening on the router and which are the top processes consuming more memory during those 4-5 minutes.

You may have to run several iterations of this process to get concrete data, but this should help.

- Also check for blocked processes on the RP using the show processes blocked command. Blocked processes could lead to such issues. Check whether a BGP- or TCP-related process is going into a blocked state for some reason.

- I have sometimes noticed that next-hop information does not get updated, which may cause the BGP process to consume more memory. This may again be due to a software defect.

Also, please share the output of show processes bgp and show install active summary.

I am hoping you have installed all the SMUs that Cisco recommends for your release.

Regards

Vinit

Thanks
--Vinit

Apologies for the late reply. Below are the outputs as requested. I am not sure which process to stop in order to free memory. I also tried restarting process 1048, which is BGP.

The ASR model I have is:

cisco ASR9K Series (MPC8641D) processor with 4194304K bytes of memory.
MPC8641D processor at 1333MHz, Revision 2.2
ASR 9006 4 Line Card Slot Chassis with V1 AC PEM

4 Management Ethernet
40 GigabitEthernet
4 TenGigE
4 DWDM controller(s)
4 WANPHY controller(s)
219k bytes of non-volatile configuration memory.
977M bytes of compact flash card.
67988M bytes of hard disk.
1605616k bytes of disk0: (Sector size 512 bytes).
1605616k bytes of disk1: (Sector size 512 bytes).

-------------------------------------------------------------------------------------------------------------------------

Show memory compare report


Mon Jan 18 14:29:53.473 EDT
JID name mem before mem after difference mallocs restart/exit/new
--- ---- ---------- --------- ---------- ------- ----------------
326 mibd_entity 201934220 202205260 271040 4258
65890 malloc_dump 29496 0 29496 415 E
65885 more 13604 0 13604 14 E
329 mibd_route 4039680 4043408 3728 47
1135 snmpd 3681156 3684372 3216 76
425 sysdb_shared_data_nc 712024 712844 820 12
318 lrd 842428 843100 672 5
72 mdio_sup 32412 32912 500 4
328 mibd_interface 110304424 110304728 304 6
65902 exec 212676 212756 80 3
1043 mpls_ldp 3174592 3174640 48 1
429 sysdb_svr_admin 1976996 1976956 -40 -7
89 qnet 786752 786696 -56 -2
186 dsc 410588 410524 -64 -1
161 cepki 150284 150052 -232 -3
382 qsm 1388236 1387968 -268 -3
95 syslog_dev 243656 243360 -296 -4
97 sysmgr 321588 321124 -464 -6
92 shmwin_svr 71212 70748 -464 -6
317 lpts_pa 2546528 2545960 -568 -5
314 locald_DSC 437552 435768 -1784 -38
339 netio 4211348 4209324 -2024 -48
243 gsp 5101596 5097516 -4080 -2
427 sysdb_shared_sc 4848956 4844848 -4108 -99
440 tcp 1946168 1932336 -13832 -7
430 sysdb_svr_local 5737092 5719916 -17176 -361
426 sysdb_shared_nc 3856588 3838284 -18304 -408
424 sysdb_mc 3566884 3534796 -32088 -377
184 devc-vty 272472 226052 -46420 -79
65875 sshd_child_handler 102684 13604 -89080 -1494 R
65876 exec 210724 29040 -181684 -1355 R
355 parser_server 22739980 21673072 -1066908 -60


You are now free to remove snapshot memcmp_start.out and memcmp_end.out under /harddisk:/malloc_dump
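
To rank a report like the one above without reading it by eye, here is a rough Python sketch. It assumes the column order shown in the header (JID, name, mem before, mem after, difference, mallocs, optional flag) and assumes the flag letters are R, E, and N for restart/exit/new; only R and E actually appear in this output. The names `parse_report` and `top_growers` are invented for the example.

```python
import re
from typing import NamedTuple, Optional


class Row(NamedTuple):
    jid: int
    name: str
    before: int
    after: int
    difference: int
    mallocs: int
    flag: Optional[str]  # R, E or N (assumed meaning: restart/exit/new)


# One data row: JID, name, four integers, then an optional single-letter flag.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(-?\d+)(?:\s+([REN]))?\s*$"
)


def parse_report(text):
    """Parse data rows; header, separator and timestamp lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            jid, name, before, after, diff, mallocs, flag = m.groups()
            rows.append(Row(int(jid), name, int(before), int(after),
                            int(diff), int(mallocs), flag))
    return rows


def top_growers(rows, count=3):
    """Unflagged processes with the largest positive growth, biggest first."""
    steady = [r for r in rows if r.flag is None and r.difference > 0]
    return sorted(steady, key=lambda r: r.difference, reverse=True)[:count]


# A few rows copied from the report above.
report = """\
JID name mem before mem after difference mallocs restart/exit/new
--- ---- ---------- --------- ---------- ------- ----------------
326 mibd_entity 201934220 202205260 271040 4258
65890 malloc_dump 29496 0 29496 415 E
329 mibd_route 4039680 4043408 3728 47
1135 snmpd 3681156 3684372 3216 76
440 tcp 1946168 1932336 -13832 -7
355 parser_server 22739980 21673072 -1066908 -60
"""

for row in top_growers(parse_report(report)):
    print(row.name, row.difference)
```

On these rows the ranking is mibd_entity, mibd_route, snmpd. Note that the bgp process (JID 1048) does not show up in this report at all, so one short interval is not enough to say where the memory is going.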

--------------------------------------------------------------------------------------------------------------------

show install active summary
Mon Jan 18 14:32:22.204 EDT
Default Profile:
SDRs:
Owner
Active Packages:
disk0:asr9k-fpd-px-5.3.0
disk0:asr9k-k9sec-px-5.3.0
disk0:asr9k-li-px-5.3.0
disk0:asr9k-bng-px-5.3.0
disk0:asr9k-doc-px-5.3.0
disk0:asr9k-mini-px-5.3.0
disk0:asr9k-mpls-px-5.3.0
disk0:asr9k-optic-px-5.3.0
disk0:asr9k-services-px-5.3.0
disk0:asr9k-video-px-5.3.0
disk0:asr9k-asr901-nV-px-5.3.0
disk0:asr9k-mgbl-px-5.3.0

----------------------------------------------------------------------------------------------------------------------

show processes bgp
Mon Jan 18 14:33:11.337 EDT
Job Id: 1048
PID: 1722376452
Executable path: /disk0/iosxr-routing-5.3.0/0x100000/bin/bgp
Instance #: 1
Version ID: 00.00.0000
Respawn: ON
Respawn count: 2
Last started: Mon Jan 18 03:55:17 2016
Process state: Run (last exit due to SIGTERM)
Package state: Normal
Started on config: default
Feature name: ON
Tag : default
Process group: v4-routing
core: MAINMEM
Max. core: 0
Placement: Placeable
startup_path: /pkg/startup/bgp.startup
Ready: 1.158s
Available: 27.863s
Process cpu time: 34.394 user, 4.008 kernel, 38.402 total
JID TID CPU Stack pri state TimeInState HR:MM:SS:MSEC NAME
1048 1 0 424K 10 Receive 0:00:02:0241 0:00:03:0443 bgp
1048 2 1 424K 10 Receive 0:00:09:0252 0:00:26:0822 bgp
1048 3 0 424K 10 Receive 10:37:53:0620 0:00:00:0000 bgp
1048 4 0 424K 10 Receive 10:37:51:0800 0:00:00:0001 bgp
1048 5 1 424K 10 Receive 0:30:48:0099 0:00:00:0006 bgp
1048 6 1 424K 10 Sigwaitinfo 10:37:52:0497 0:00:00:0000 bgp
1048 7 1 424K 10 Receive 10:04:31:0726 0:00:00:0043 bgp
1048 8 0 424K 10 Receive 0:00:00:0245 0:00:00:0272 bgp
1048 9 1 424K 10 Receive 6:31:03:0899 0:00:00:0207 bgp
1048 10 1 424K 10 Receive 0:51:50:0360 0:00:00:0014 bgp
1048 11 0 424K 10 Nanosleep 0:00:03:0197 0:00:01:0414 bgp
1048 12 0 424K 10 Receive 10:37:52:0258 0:00:00:0000 bgp
1048 13 1 424K 10 Receive 10:37:52:0240 0:00:00:0000 bgp
1048 14 1 424K 10 Receive 0:00:02:0767 0:00:00:0459 bgp
1048 15 0 424K 10 Receive 10:25:48:0318 0:00:00:0009 bgp
1048 16 0 424K 10 Receive 0:00:09:0737 0:00:00:0974 bgp
1048 17 1 424K 10 Receive 0:00:00:0800 0:00:01:0463 bgp
1048 18 1 424K 10 Receive 0:51:52:0365 0:00:00:0009 bgp
1048 19 1 424K 10 Receive 0:51:50:0362 0:00:00:0001 bgp
1048 20 0 424K 10 Receive 0:50:52:0363 0:00:00:0053 bgp
1048 21 0 424K 10 Receive 0:51:44:0358 0:00:00:0032 bgp
1048 22 0 424K 10 Receive 0:00:51:0129 0:00:00:0048 bgp
1048 23 0 424K 10 Receive 0:00:44:0793 0:00:01:0438 bgp
1048 24 1 424K 10 Receive 10:37:51:0324 0:00:00:0000 bgp
1048 25 0 424K 10 Receive 0:51:50:0332 0:00:00:0996 bgp
1048 26 1 424K 10 Receive 10:37:51:0320 0:00:00:0000 bgp
1048 27 0 424K 10 Receive 0:00:00:0749 0:00:00:0695 bgp
-------------------------------------------------------------------------------

Hi All,

Can you please advise how to free memory without impacting any other running process?

Thanks

Have you got dual supervisor engines?

Hi Phil..

Yes, dual supervisor engines are available.

Why don't you use the ISSU feature and do a hitless software upgrade to resolve the issue?

Hi,

I have noticed you have all the packages activated on your chassis. Do you need all of them?

It is not good practice to have all the packages activated whether or not you are using them. They certainly occupy memory; try deactivating and removing unnecessary packages to free some memory.

Also, how many BGP routes do you have on your ASR? Can you minimize them?

Also check whether your RIB can be reduced (summarization).

I would also recommend upgrading to 5.3.3 with SP1; if there is a memory leak, that should resolve it.
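
To get a feel for how much summarization could help, here is a small Python sketch using the standard library `ipaddress` module. The prefixes are made-up examples, not taken from this router, and the script only shows which contiguous prefixes collapse into a shorter one; it does not configure anything.

```python
import ipaddress


def summarize(prefixes):
    """Collapse adjacent or overlapping prefixes into the fewest covering networks."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


# Hypothetical prefixes for illustration only.
prefixes = [
    "10.1.0.0/24",
    "10.1.1.0/24",
    "10.1.2.0/24",
    "10.1.3.0/24",
    "192.0.2.0/25",
]

print(summarize(prefixes))  # ['10.1.0.0/22', '192.0.2.0/25']
```

Four contiguous /24s become one /22 here. Whether a summary is safe to advertise still depends on your routing design, which the script knows nothing about.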

Cheers!

Hi,

When an ASR9K crosses a memory threshold such as the "severe" level in your case, it tries to recover memory, and BGP is among the biggest memory consumers of the protocols. You can check the memory thresholds configured on your box with "sh watch thresh mem conf loc all", and the current memory status with "sh mem sum loc all"; this will show you the location where memory is running out.

As a quick fix to avoid the BGP peering being torn down, you can change the memory thresholds via

ASR9010(config)#watchdog threshold memory location <location> minor <> severe <> critical <>

Please note that changing the configured values is not recommended, as these are the best values to have on the box. However, you can change them to get the peer back immediately.

I would suggest you open a case with TAC and run memory compare for a day or two to identify what is eating up your memory.
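
If you collect memory figures over a day or two, a sketch like the following can point out processes that grow in every interval, which is the usual leak signature. It assumes you have already reduced each sample to a process-name-to-bytes mapping; the process names and numbers below are placeholders, not data from this router.

```python
def steady_growers(snapshots):
    """Return {name: total_growth} for processes that grew in every interval.

    snapshots is a time-ordered list of dicts mapping process name to bytes.
    Processes missing from any snapshot are ignored.
    """
    if len(snapshots) < 2:
        return {}
    names = set(snapshots[0])
    for snap in snapshots[1:]:
        names &= set(snap)
    result = {}
    for name in names:
        series = [snap[name] for snap in snapshots]
        # Strictly increasing across every consecutive pair of samples.
        if all(later > earlier for earlier, later in zip(series, series[1:])):
            result[name] = series[-1] - series[0]
    # Largest total growth first.
    return dict(sorted(result.items(), key=lambda kv: kv[1], reverse=True))


# Placeholder samples for illustration only.
snapshots = [
    {"proc_a": 1000, "proc_b": 500, "proc_c": 200},
    {"proc_a": 1100, "proc_b": 480, "proc_c": 210},
    {"proc_a": 1300, "proc_b": 520, "proc_c": 215},
]

print(steady_growers(snapshots))
```

A process that only grows in some intervals (proc_b here) is left out on purpose, since normal churn goes up and down while a leak tends to climb steadily.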

Hello

Are you receiving the full Internet BGP route table and, if so, is this required?

Do you have inbound soft reconfiguration enabled? This is very memory intensive and not required with later IOS versions.

Have you checked your peering router for a possible memory issue?


Please rate and mark as an accepted solution if you have found any of the information provided useful.
This then could assist others on these forums to find a valuable answer and broadens the community’s global network.

Kind Regards
Paul