Serg_tsk
Beginner

ASR 9k1 ipsub_ma iedged crash

We have an ASR 9k1 router running XR 6.2.25 with the BNG PIE installed.

The iedged and ipsub_ma processes crash when the role of the unit changes in the SRG group:

 

LC/0/0/CPU0:Jan 17 14:20:11.139 : srg_agt[356]: %SUBSCRIBER-SRG-5-ROLE_MASTER : BNG Subscriber Redundancy role change to MASTER from None for group 1 reason PEER-DOWN
LC/0/0/CPU0:Jan 17 14:20:11.141 : srg_agt[356]: %SUBSCRIBER-SRG-5-ACTIVATION_COMPLETE : BNG Subscriber sessions activation on SLAVE completed for group 1
LC/0/0/CPU0:Jan 17 14:20:27.486 : srg_agt[356]: %SUBSCRIBER-SRG-5-ROLE_SLAVE : BNG Subscriber Redundancy role change to SLAVE from Master for group 1 reason PEER-UP
LC/0/0/CPU0:Jan 17 14:20:28.042 : dumper[56]: %OS-DUMPER-7-DUMP_REQUEST : Dump request for process pkg/bin/ipsub_ma
LC/0/0/CPU0:Jan 17 14:20:28.057 : dumper[56]: %OS-DUMPER-7-DUMP_ATTRIBUTE : Dump request with attribute 7 for process pkg/bin/ipsub_ma
LC/0/0/CPU0:Jan 17 14:20:28.059 : dumper[56]: %OS-DUMPER-4-SIGSEGV : Thread 1 received SIGSEGV - Segmentation Fault
LC/0/0/CPU0:Jan 17 14:20:28.059 : dumper[56]: %OS-DUMPER-4-SIGSEGV_INFO : Accessed BadAddr 0x1 at PC 0x40043af0. Signal code 1 - SEGV_MAPPER. Address not mapped.
LC/0/0/CPU0:Jan 17 14:20:28.059 : dumper[56]: %OS-DUMPER-4-CRASH_INFO : Crashed pid = 532651 (pkg/bin/ipsub_ma)
LC/0/0/CPU0:Jan 17 14:20:28.063 : dumper[56]: %OS-DUMPER-7-PROC_PAGES : Process memory pages 684
LC/0/0/CPU0:Jan 17 14:20:28.080 : dumper[56]: %OS-RSVDPMEM-7-NO_MATCHING_STRING : Failed to find any line in /etc/platform_reserved_physmem for infra-structure : buffman
LC/0/0/CPU0:Jan 17 14:20:28.091 : dumper[56]: %OS-DUMPER-6-FALLBACK_CHOICE : Fall back choice:  0(bootflash:/dumper) in use

 

In addition, iedged crashes at random times without any apparent reason. There are multiple lines like these in the log:

 

LC/0/0/CPU0:Jan 17 14:18:57.697 : iedged[214]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is not ready. Reason: [V4 Subscriber infra process(es) is unavailable].
LC/0/0/CPU0:Jan 17 14:18:57.699 : iedged[214]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is not ready. Reason: [V6 Subscriber infra process(es) is unavailable].
LC/0/0/CPU0:Jan 17 14:18:58.532 : iedged[214]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is ready. Reason: [V4 Subscriber infra process(es) is available].
LC/0/0/CPU0:Jan 17 14:18:58.532 : iedged[214]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is ready. Reason: [V6 Subscriber infra process(es) is available].
LC/0/0/CPU0:Jan 17 14:19:01.425 : iedged[214]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is not ready. Reason: [V4 Subscriber infra process(es) is unavailable].
LC/0/0/CPU0:Jan 17 14:19:03.724 : iedged[214]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is ready. Reason: [V4 Subscriber infra process(es) is available].
LC/0/0/CPU0:Jan 17 14:19:06.009 : iedged[214]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is not ready. Reason: [V4 Subscriber infra process(es) is unavailable].
LC/0/0/CPU0:Jan 17 14:19:06.009 : iedged[214]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is not ready. Reason: [V6 Subscriber infra process(es) is unavailable].
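
To see how often the subscriber infra flaps between "ready" and "not ready", the log lines above can be tallied with a short script. This is only a sketch: the regular expression is an assumption derived from the sample lines in this thread, and `summarize_throttle` is a hypothetical helper name, not a Cisco tool.

```python
import re
from collections import Counter

# Pattern inferred from the SESSION_THROTTLE samples in this thread (assumption).
THROTTLE_RE = re.compile(
    r"(?P<proc>\w+)\[\d+\]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : "
    r"Subscriber Infra is (?P<state>not ready|ready)\. "
    r"Reason: \[(?P<af>V4|V6) "
)

def summarize_throttle(lines):
    """Count events per (process, address family, state)."""
    counts = Counter()
    for line in lines:
        m = THROTTLE_RE.search(line)
        if m:
            counts[(m["proc"], m["af"], m["state"])] += 1
    return counts

if __name__ == "__main__":
    import sys
    # Usage: python throttle_summary.py < router.log
    for key, n in sorted(summarize_throttle(sys.stdin).items()):
        print(key, n)
```

A high count of "not ready" events for one address family in a short window would confirm that the infra processes are restarting repeatedly rather than failing once.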

 

What could be causing this? Thank you

Best Regards

Sergey

12 REPLIES 12
Aleksandar Vidakovic
Cisco Employee

hi Sergey,

we would need the crashinfo and coredump of the ipsub_ma process, along with some other information. Would you mind opening a service request with TAC for this issue also? That would be the optimal way to provide you support.

/Aleksandar

Aleksandar, thank you for your reply. The service contract for this equipment has expired, but I have attached the coredump and crashinfo to this message. On XR 5.3.4, ipsub_ma did not crash and iedged crashed once or twice. We have 1k test IPoE sessions, SRG warm redundancy, DHCP relay and AAA authorization. The access interface is a physical 10G link and the uplink is a Bundle-Ether interface consisting of two 10G physical links.

 

The iedged crashes look similar to https://quickview.cloudapps.cisco.com/quickview/bug/CSCuv60353, but that bug has been fixed, so I have no idea what is causing this.

 

Thanks

Best regards

Sergey

hi Sergey,

Geo-Redundancy for LC-based subscribers is not supported yet. This should come in release 6.5.1. Can you use bundle as access-interface instead? 

regards,

/Aleksandar

I'm really surprised: geo-redundancy has been working since the end of 2017, although it sometimes crashes. But OK, I'll try to terminate subscriber sessions on Bundle-Ether. How many non-LC sessions can the ASR 9k1 handle in this case?

One more question: when subscriber sessions migrate from Master to Slave in warm mode, the Account-Session ID is numeric, for example 00006624. When the sessions come back to the Master, the Account-Session ID changes to a value containing letters, for example 04006a2c. As a result, the RADIUS server can't send a CoA for the modified session. Is this normal?
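
A side note on the two example IDs: both are valid 8-digit hexadecimal strings, so the second one may simply be a hex value whose digits happen to include letters. The sketch below only demonstrates that reading; whether the router actually formats the ID as a 32-bit hex number, and whether the top byte carries any meaning, is an assumption and is not stated in this thread. `parse_session_id` is a hypothetical helper.

```python
def parse_session_id(text: str) -> int:
    """Interpret an 8-character session ID as a 32-bit hex value (assumption)."""
    if len(text) != 8:
        raise ValueError(f"expected 8 characters, got {len(text)}")
    return int(text, 16)

if __name__ == "__main__":
    for sid in ("00006624", "04006a2c"):
        value = parse_session_id(sid)
        # Print the full value and the most significant byte separately.
        print(sid, value, hex(value >> 24))
```

If the RADIUS server parses the ID as a decimal number, the second form would fail to parse, so matching on the ID as an opaque string is the safer choice on the server side.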

 

Thank you

Best regards

Sergey

On Typhoon line cards (and hence asr9001 as well), the scale remains 32k per NP and 64k per LC. If bundle members are shared between two NPs, the scale will be 32k per LC. So on asr9001 the scale is the same whether you use LC-based or RP-based subscribers. The LC-based subscriber model makes a difference on a multi-LC chassis because the overall scale is higher.
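
The scale rule in this reply can be written down as a small function. This is a sketch that assumes the per-LC figure is 64k sessions (in the same units as the 32k per-NP figure); the function name and the use of plain "k" units are illustrative choices, not anything from Cisco documentation.

```python
SESSIONS_PER_NP_K = 32  # "32k per NP", taken from the reply
SESSIONS_PER_LC_K = 64  # per-LC limit, assumed to be 64k

def lc_scale_k(bundle_members_span_two_nps: bool) -> int:
    """Return the per-LC session scale in thousands.

    When bundle members are shared between two NPs, the reply says the
    per-LC scale drops to the per-NP figure.
    """
    if bundle_members_span_two_nps:
        return SESSIONS_PER_NP_K
    return SESSIONS_PER_LC_K

if __name__ == "__main__":
    print("bundle on one NP  :", lc_scale_k(False), "k sessions")
    print("bundle on two NPs :", lc_scale_k(True), "k sessions")
```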

 

Session ID shouldn't change. Let us know if you see this when you move to RP-based subscribers.

 

/Aleksandar

Good morning colleagues.

I moved the subscribers to Bundle-Ether, but the session IDs still changed in warm-standby mode. For example:

 

A "clean" session on the Master:

 

RP/0/RSP0/CPU0:asr9k1_master#sh subscriber session filter ipv4-address 10.243.108.16 det int                    
Tue Jan 23 07:18:48.730
Interface:                Bundle-Ether91.403.ip273
Circuit ID:               Unknown
Remote ID:                Unknown
Type:                     IP: Packet-trigger
IPv4 State:               Up, Tue Jan 23 07:17:37 2018
IPv4 Address:             10.243.108.16, VRF: default
IPv4 Up helpers:          0x00000040 {IPSUB}
IPv4 Up requestors:       0x00000040 {IPSUB}
Mac Address:              0021.913b.6861
Account-Session Id:       00043971
Nas-Port:                 1526829937
User name:                0403.0021913b6861.10.243.108.16
Formatted User name:      0403.0021913b6861.10.243.108.16
Client User name:         unknown
Outer VLAN ID:            403
Subscriber Label:         0x00000043
Created:                  Tue Jan 23 07:17:37 2018
State:                    Activated
Authentication:           unauthenticated
Authorization:            authorized
Ifhandle:                 0x0001fba0
Session History ID:       8
Access-interface:         Bundle-Ether91.403
SRG Flags:                0x00004000
Policy Executed:

  event Session-Start match-first [at Tue Jan 23 07:17:37 2018]
    class type control subscriber class-default do-until-failure [Succeeded]
      10 set-timer TIMER_UNAUTH 1 [cerr: No error][aaa: Success]
      20 activate dynamic-template DYNTPL_IP_SUB_26 [cerr: No error][aaa: Success]
      30 authorize aaa list default [cerr: No error][aaa: Success]
  event Timer-Expiry match-first [at Tue Jan 23 07:18:37 2018]
    class type control subscriber UNAUTH_TIMER_CLASS do-all [Succeeded]
      10 set-timer TIMER_UNAUTH 3 [cerr: No error][aaa: Success]
      20 authorize aaa list default [cerr: No error][aaa: Success]
Session Accounting: disabled
Last COA request received: unavailable
User Profile received from AAA:
 Attribute List: 0x4a0129c8
1:  session-timeout len=  4  value= 3600(e10)
2:  primary-dns     len=  4  value= 10.117.162.226
3:  inacl           len= 17  value= ACL_PERMIT_ANY_IN
4:  outacl          len= 18  value= ACL_PERMIT_ANY_OUT
5:  sub-qos-policy-in len= 14  value= QOS_100000K_IN
6:  sub-qos-policy-out len= 15  value= QOS_100000K_OUT
7:  sub-pbr-policy-in len= 14  value= PBR_PERMIT_ANY
Services:
  Name        : DYNTPL_IP_SUB_26
  Service-ID  : 0x4000002
  Type        : Multi Template
  Status      : Applied
-------------------------
[Event History]
   Jan 23 07:17:37.280 IPv4 Start
   Jan 23 07:17:37.664 IPv4 Up
   Jan 23 07:18:37.312 SUBDB produce done [many]


Session migrated from Master to Slave once:

RP/0/RSP0/CPU0:asr9k1_slave#sh subscriber session filter ipv4-address 10.243.108.16 det int
Tue Jan 23 07:22:00.224 TOMSK
Interface:                Bundle-Ether91.403.ip190
Circuit ID:               Unknown
Remote ID:                Unknown
Type:                     IP: Packet-trigger
IPv4 State:               Up, Tue Jan 23 07:21:27 2018
IPv4 Address:             10.243.108.16, VRF: default
IPv4 Up helpers:          0x00000040 {IPSUB}
IPv4 Up requestors:       0x00000040 {IPSUB}
Mac Address:              0021.913b.6861
Account-Session Id:       000005ad
Nas-Port:                 1526829937
User name:                0403.0021913b6861.10.243.108.16
Formatted User name:      0403.0021913b6861.10.243.108.16
Client User name:         unknown
Outer VLAN ID:            403
Subscriber Label:         0x000005d8
Created:                  Tue Jan 23 07:21:25 2018
State:                    Activated
Authentication:           unauthenticated
Authorization:            authorized
Ifhandle:                 0x00016060
Session History ID:       10
Access-interface:         Bundle-Ether91.403
SRG Flags:                0x00024000
Policy Executed:

Session Accounting: disabled
Last COA request received: unavailable
User Profile received from AAA:
 Attribute List: 0x4a012c50
1:  session-timeout len=  4  value= 3600(e10)
2:  primary-dns     len=  4  value= 10.117.162.226
3:  inacl           len= 17  value= ACL_PERMIT_ANY_IN
4:  outacl          len= 18  value= ACL_PERMIT_ANY_OUT
5:  sub-qos-policy-in len= 14  value= QOS_100000K_IN
6:  sub-qos-policy-out len= 15  value= QOS_100000K_OUT
7:  sub-pbr-policy-in len= 14  value= PBR_PERMIT_ANY
Services:
  Name        : DYNTPL_IP_SUB_26
  Service-ID  : 0x4000002
  Type        : Multi Template
  Status      : Applied
-------------------------
[Event History]
   Jan 23 07:21:26.912 IPv4 Up
   Jan 23 07:21:26.912 SUBDB produce done


Session came back from Slave to Master:


RP/0/RSP0/CPU0:asr9k1_Master#sh subscriber session filter ipv4-address 10.243.108.16 det int
Tue Jan 23 07:21:38.603 TOMSK
Interface:                None
Circuit ID:               Unknown
Remote ID:                Unknown
Type:                     IP: Packet-trigger
IPv4 State:               Up Pending, Tue Jan 23 07:21:37 2018
IPv4 Address:             10.243.108.16, VRF: default
Mac Address:              0021.913b.6861
Account-Session Id:       00000aee
Nas-Port:                 1526829937
User name:                0403.0021913b6861.10.243.108.16
Formatted User name:      0403.0021913b6861.10.243.108.16
Client User name:         unknown
Outer VLAN ID:            403
Subscriber Label:         0x000005da
Created:                  Tue Jan 23 07:21:37 2018
State:                    Connected
Authentication:           unauthenticated
Authorization:            authorized
Ifhandle:                 0x00000000
Session History ID:       0
Access-interface:         Bundle-Ether91.403
SRG Flags:                0x00064004
Policy Executed:

Session Accounting: disabled
Last COA request received: unavailable
User Profile received from AAA:
 Attribute List: 0x4a012ba0
1:  session-timeout len=  4  value= 3600(e10)
2:  primary-dns     len=  4  value= 10.117.162.226
3:  inacl           len= 17  value= ACL_PERMIT_ANY_IN
4:  outacl          len= 18  value= ACL_PERMIT_ANY_OUT
5:  sub-qos-policy-in len= 14  value= QOS_100000K_IN
6:  sub-qos-policy-out len= 15  value= QOS_100000K_OUT
7:  sub-pbr-policy-in len= 14  value= PBR_PERMIT_ANY
Services:
  Name        : DYNTPL_IP_SUB_26
  Service-ID  : 0x4000002
  Type        : Multi Template
  Status      : Request PD Association
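
To compare the three captures above field by field, the `Key: value` lines can be loaded into dictionaries and diffed. This is a minimal sketch: the parsing rule (split on the first colon, skip indented lines) is an assumption based on the output pasted here, and `parse_fields` and `changed_fields` are hypothetical helpers.

```python
def parse_fields(output: str) -> dict:
    """Collect top-level 'Key: value' lines from a pasted show output."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        # Skip lines without a colon, blank keys, and indented detail lines.
        if sep and key.strip() and not key[0].isspace():
            fields.setdefault(key.strip(), value.strip())
    return fields

def changed_fields(before: dict, after: dict, keys) -> dict:
    """Return {key: (before, after)} for the keys whose values differ."""
    return {
        k: (before.get(k), after.get(k))
        for k in keys
        if before.get(k) != after.get(k)
    }

if __name__ == "__main__":
    master = parse_fields("Account-Session Id:       00043971\nNas-Port:   1526829937")
    slave = parse_fields("Account-Session Id:       000005ad\nNas-Port:   1526829937")
    print(changed_fields(master, slave, ["Account-Session Id", "Nas-Port"]))
```

Run against the full outputs, this makes it easy to see that Nas-Port and the user name stay stable across the switchover while Account-Session Id, Subscriber Label and Ifhandle change.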

 

I tried to switch SRG-mode to hot-standby. I saw error messages when sessions were migrating between nodes:

 

[Event History]
   Jan 23 08:03:44.768 SUBDB produce done(fail) [many]

 

Interface:                None
Circuit ID:               Unknown
Remote ID:                Unknown
Type:                     IP: Packet-trigger
IPv4 State:               Up, Tue Jan 23 07:48:46 2018
IPv4 Address:             10.227.251.178, VRF: default
IPv4 Up helpers:          0x00000040 {IPSUB}
IPv4 Up requestors:       0x00000040 {IPSUB}
Mac Address:              c8d3.a3ac.2533
Account-Session Id:       00003323
Nas-Port:                 1526871075
User name:                0564.c8d3a3ac2533.10.227.251.178
Formatted User name:      0564.c8d3a3ac2533.10.227.251.178
Client User name:         unknown
Outer VLAN ID:            564
Subscriber Label:         0x00000fc3
Created:                  Tue Jan 23 07:27:02 2018
State:                    Activated
Authentication:           unauthenticated
Authorization:            authorized
Ifhandle:                 0x00000000
Session History ID:       328
Access-interface:         Bundle-Ether91.564
SRG Flags:                0x00030000
Policy Executed:

  event Timer-Expiry match-first [at Tue Jan 23 07:46:47 2018]
    class type control subscriber UNAUTH_TIMER_CLASS do-all [Succeeded] {repeated 1}
      10 set-timer TIMER_UNAUTH 3 [cerr: No error][aaa: Success]
      20 authorize aaa list default [cerr: 'iEdge' detected the 'warning' condition 'iEdge SVM, Unable to complete this request'][aaa: Success]
Session Accounting: disabled
Last COA request received: unavailable
User Profile received from AAA: None
No Services
[Event History]
   Jan 23 07:48:45.568 IPv4 Up
   Jan 23 08:03:46.816 SUBDB produce done(fail) [many]

 

I have attached the configuration of both routers to this post. What am I doing wrong?

 

The second question: state-control-routes doesn't work. When a node becomes master, the summary routes do not appear among the BGP advertised routes, and show route subscriber shows only the /32 hosts and the subscribers' virtual interfaces.

The third question: suppose I put one physical port into one Bundle-Ether and a second physical port, from another NP, into a second bundle. The first bundle will be Master in SRG 1 and the second bundle will be Slave in SRG 2. Can I terminate 64k sessions in total (balanced across SRGs in hot-standby mode) on two 9k1 routers? That is, 32k sessions per SRG, or 16k in each of 4 SRGs?

 

Thank you

Best regards

Sergey

 

Hey all,

 

I faced a service impact on the subscribers, and the ASR 9000 was showing the logs below at the time the issue occurred.

 

LC/0/1/CPU0:Feb  8 20:52:49.402 IST: pppoe_ma[172]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is not ready. Reason: [V4 Subscriber infra process(es) is unavailable].

LC/0/1/CPU0:Feb  8 20:52:49.402 IST: pppoe_ma[172]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is not ready. Reason: [V6 Subscriber infra process(es) is unavailable].

LC/0/1/CPU0:Feb  8 20:52:49.403 IST: dhcpv6d[127]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is not ready. Reason: [V4 Subscriber infra process(es) is unavailable].

LC/0/1/CPU0:Feb  8 20:52:49.403 IST: dhcpv6d[127]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is not ready. Reason: [V6 Subscriber infra process(es) is unavailable].

LC/0/1/CPU0:Feb  8 20:52:49.403 IST: iedged[177]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is not ready. Reason: [V4 Subscriber infra process(es) is unavailable].

LC/0/1/CPU0:Feb  8 20:52:49.403 IST: iedged[177]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is not ready. Reason: [V6 Subscriber infra process(es) is unavailable].

 

Does anyone know the cause of these messages? The issue was resolved after the external RADIUS server was rebooted.

 

Thank you.

Also, does anyone know of a reference for looking up the log messages sent by the ASR 9000?

 

Thank you.

The syslog messages that you have shared are the highest level of notification that something is not going well in the subscriber infra. This message on its own is not meant to provide any indications of the cause. Considering how the issue was resolved, it's possible that the comms with radius was somehow blocked. As some of the messages must be exchanged with radius using a reliable delivery mechanism, queues can fill up when there's no response. For root cause analysis it's best to collect the show tech bng and open a case with our TAC. As always, it's best to run one of the more recent extended maintenance releases. That always makes the analysis easier.
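
The queue effect described above can be illustrated with a toy model. This is not how IOS XR is implemented; it is only a sketch of the general idea that a sender using reliable delivery keeps each request pending until it is acknowledged, and must refuse new work once the pending queue is full. The class name and the `max_pending` limit are hypothetical.

```python
from collections import deque

class ReliableSender:
    """Toy model: requests stay queued until the server acknowledges them."""

    def __init__(self, max_pending: int):
        self.max_pending = max_pending
        self.pending = deque()

    def ready(self) -> bool:
        # "Ready" means there is still room for another unacknowledged request.
        return len(self.pending) < self.max_pending

    def send(self, request) -> bool:
        """Queue a request; return False if it had to be throttled."""
        if not self.ready():
            return False
        self.pending.append(request)
        return True

    def ack(self) -> None:
        """The server answered the oldest outstanding request."""
        if self.pending:
            self.pending.popleft()

if __name__ == "__main__":
    sender = ReliableSender(max_pending=2)
    # With no acks arriving (unresponsive server), the third request is throttled.
    print([sender.send(f"req-{i}") for i in range(3)])
    sender.ack()  # server responds again
    print(sender.ready())
```

In this model, an unresponsive server leaves every request pending, so the sender reports "not ready" until acknowledgements start flowing again, which matches the observation that rebooting the RADIUS server cleared the condition.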

 

best,

/Aleksandar

Thank you Aleksandar,

 

Is there a reference guide that explains the syslog messages received?

 

many thanks.

"Cisco CLI Analyzer" tool should be the one to use for decoding error messages. You can download it from https://www.cisco.com/c/en/us/support/web/tools-catalog.html 

 

/Aleksandar