
BNG cluster - redundancy

elnur.mammadov
Level 1

Hello everyone,

 

What would be the best way to achieve link/cluster redundancy into the access plant given the second diagram? 

What is the best practice for redundant BNG configuration?

The cluster is a pair of ASR9001s with a 2-port 10 Gbps line card in each, and the access nodes can be considered third-party L2 switches chained to each other over 10 Gbps links (this is the OSP for FTTx).

BNG (experimental, both PPPoE and IPoE) works on a bundle-ether interface whose members are the east and south interfaces on distinct cluster members. I haven't tested cluster redundancy yet.

EAPS ring redundancy with setup #1 works, but it has a single point of failure: the access node that carries the uplinks.

In the second setup with the TLS (basically an L2 bridge without any protection protocol), bundle-ether link redundancy only works when links directly connected to the cluster go down. As per the Cisco papers, bundle-ether interfaces are for point-to-point links, so it is not supposed to work if an intermediate link goes down. I understand that: there is no signaling.

I guess my question is: how would you suggest implementing link redundancy in setup 2 with TLS, and thus eliminating the single point of failure of the EAPS setup? I've already contacted the vendor of the access nodes on the subject and am awaiting their response; it has been a little over a year.

Thank you.

http://en.wikipedia.org/wiki/Ethernet_Automatic_Protection_Switching

Regards

Elnur

28 Replies

Hi Xander,

Not quite set yet, how did you know? :) If you do not mind, I have more problems and a few questions.

I'm moving forward; overall, XR seems more intuitive to me once you get used to it :)

Problem: if the ring is open in the middle, both nodes must play. Everything works as expected on one interface but not on the other. Please take a look below: the only difference is that one of them is on the LC of one chassis and the other is on the LC of the other chassis (this is an ASR9001S nV cluster). The local DHCP server shows the session stuck in the INIT_DPM_WAIT state. I thought it was because of the shared pool (some sort of protection against split brain) and shut down the working interface, but that did not help. Is there some particular reason you know of that might be causing this?

How do I debug IPOE sessions? I googled all I could, no luck.

interface TenGigE1/0/0/2.47

 description rign0-ipoe-west
 ipv4 point-to-point
 ipv4 unnumbered Loopback20
 service-policy type control subscriber ES_IPOE_PM0
 shutdown
 encapsulation dot1q 47
 ipsubscriber ipv4 l2-connected
  initiator dhcp
 !
interface TenGigE0/0/0/2.47

 description rign0-ipoe-east
 ipv4 point-to-point
 ipv4 unnumbered Loopback20
 service-policy type control subscriber ES_IPOE_PM0
 encapsulation dot1q 47
 ipsubscriber ipv4 l2-connected
  initiator dhcp
 !
!

Thank you.

Elnur

Dear Xander,

Please advise: what is the INIT_DPM_WAIT state? The problem I have looks like an L2 issue with the ring equipment; the ASR is doing great! Anyhow, please let me know if you suspect (or know) that anything is wrong with the ASR in this regard; ignore otherwise :) (I'm already planning procurement of a CPT 200 based ring.)

The question about debugging IPoE sessions still stands; please advise.

The IOS ISG based portal we developed uses portbundle to identify redirected (captured) sessions. As I understand it, there is no portbundle support in XR (for increased scalability?). I need option 82 to automate service provisioning in the billing system. Given the client IP, extracting it from accounting records is trivial; the only problem I see is with VRFs (client IP collision). A portal per VRF (with additional interfaces) could be used to avoid this collision, but I have not thought it through yet. Do you see any problems with this? Please advise.
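A minimal sketch of the collision problem described above: if accounting records are indexed by client IP alone, overlapping addresses in different VRFs clash, whereas keying on (VRF, client IP) keeps them apart. The class, method, and VRF names here are hypothetical, and how the portal learns the VRF of a redirected session is left open (a portal per VRF is one way).

```python
from typing import Dict, Optional, Tuple

# (vrf_name, client_ip) -> option 82 circuit ID taken from accounting records.
SessionKey = Tuple[str, str]


class SessionIndex:
    """Hypothetical in-memory index of active subscriber sessions."""

    def __init__(self) -> None:
        self._circuit_ids: Dict[SessionKey, str] = {}

    def record_start(self, vrf: str, client_ip: str, circuit_id: str) -> None:
        # Called when an accounting start record arrives.
        self._circuit_ids[(vrf, client_ip)] = circuit_id

    def record_stop(self, vrf: str, client_ip: str) -> None:
        # Called when an accounting stop record arrives; unknown keys are ignored.
        self._circuit_ids.pop((vrf, client_ip), None)

    def lookup(self, vrf: str, client_ip: str) -> Optional[str]:
        # Returns the circuit ID, or None if no session is known.
        return self._circuit_ids.get((vrf, client_ip))


index = SessionIndex()
# Same client IP in two VRFs (VRF names are made up) does not collide.
index.record_start("vrf-a", "10.0.0.5", "10.192.64.20:1-9-3-0-eth-47:-47")
index.record_start("vrf-b", "10.0.0.5", "10.192.64.40:1-18-3-0-eth-47:-47")
print(index.lookup("vrf-a", "10.0.0.5"))
print(index.lookup("vrf-b", "10.0.0.5"))
```

The two lookups return different circuit IDs even though the client IP is identical, which is the property a portal shared across VRFs would need.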

Is there another document besides this with iEdge (CoA) commands?

Thank you very much!

Elnur

This is what DPM debug says

LC/0/0/CPU0:Jul 24 08:14:37.759 : dhcpd[153]: DPM INTERNAL: TP13: DPM Session create called for client DHCPV4
LC/0/0/CPU0:Jul 24 08:14:37.759 : dhcpd[153]: DPM INTERNAL: TP14: DPM Session create params client DHCPV4: chaddr 3a02.7138.f050chaddr_len 6, client_id , client_id_len 0, parentIfHandle 0x4000740 (67110720)
LC/0/0/CPU0:Jul 24 08:14:37.759 : dhcpd[153]: DPM INTERNAL: TP15: Session create params client DHCPV4 circuit_id 31302E3139322E36342E32303A312D392D332D302D6574682D34373A2D3437, circuit_id_len 31
LC/0/0/CPU0:Jul 24 08:14:37.759 : dhcpd[153]: DPM INTERNAL: TP38: Session create params client DHCPV4 vendor_id , vendor_id_len 0
LC/0/0/CPU0:Jul 24 08:14:37.759 : dhcpd[153]: DPM INTERNAL: TP57: Session create params client DHCPV4  port 2 chan 0subif 47 rack 0 slot 0 instance 8
LC/0/0/CPU0:Jul 24 08:14:37.759 : dhcpd[153]: DPM INTERNAL: TP66: Session create params client DHCPV4 nas port type 22tag_count 1 outer tag 47 inner tag 0
LC/0/0/CPU0:Jul 24 08:14:58.041 : dhcpd[153]: DPM INTERNAL: TP50: Session start response callback received with sub_label = 0x4000014 (67108884), client = DHCPV4 ctx = 0x3001351 (50336593), result = 'AAA_BASE' detected the 'fatal' condition 'Invalid state (aaa base lib error)', trans_id = 0x1379 (4985)
LC/0/0/CPU0:Jul 24 08:14:58.041 : dhcpd[153]: DPM ERROR: TP7: Session start response for sub_label 0x4000014 (67108884) client DHCPV4result is not ok: 'AAA_BASE' detected the 'fatal' condition 'Invalid state (aaa base lib error)'
LC/0/0/CPU0:Jul 24 08:14:58.042 : dhcpd[153]: DPM INTERNAL: TP119: Session disconnect called for client DHCPV4 reason Session start failure
LC/0/0/CPU0:Jul 24 08:14:58.042 : dhcpd[153]: DPM ERROR: TP36: Invalid sub_label passed to disconnect sessionfor client DHCPV4
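
For reference, the circuit_id in the TP15 line appears to be hex-encoded ASCII, so the option 82 value can be read directly from the debug output. A minimal Python sketch, assuming the value is plain ASCII (the function name is illustrative and not part of any Cisco tooling):

```python
def decode_circuit_id(hex_string: str) -> str:
    """Decode a hex-encoded DHCP option 82 circuit ID into ASCII text."""
    return bytes.fromhex(hex_string).decode("ascii")


# Value copied from the TP15 debug line above.
circuit_id_hex = "31302E3139322E36342E32303A312D392D332D302D6574682D34373A2D3437"
circuit_id = decode_circuit_id(circuit_id_hex)
print(circuit_id, len(circuit_id))  # 10.192.64.20:1-9-3-0-eth-47:-47 31
```

The decoded length of 31 matches the circuit_id_len reported in the same debug line.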

I think this is a known issue. Can you recover from this scenario by doing a proc restart of the dhcp process on LC0 and the RP?

If so, you may be hitting CSCuo78296. It doesn't have a release note yet, but I'll check that out. Try the proc restart first.

Besides the link you have found for the VSA/attributes, there is not much else today, however our doc group is putting some effort in documenting this properly, so you'll likely see it appearing on CCO at some point soon.

regards

xander

Thank you Xander, I was lost

Unfortunately, I was unable to access the bug info for CSCuo78296.

About debugging: I wonder what this is. Several times the debug output just stopped (this is why I struggled above; I thought I was not enabling it correctly). I have to undebug and debug everything again to get output back on the monitor terminals. Several times it even closed the telnet session altogether (the one where debug was enabled); the other active telnet sessions stayed alive but stopped showing debug output as well.

 

As to the restart of dhcpd, it did not help on the affected member, so I decided to reboot the affected cluster member. For some reason it dropped out of the cluster. I tried to bring it back by rebooting the unaffected member, but that did not bring the lost member back, and now the same DPM problem is affecting this single member as well. The output below is from it:

 

RP/1/RSP0/CPU0:ironman0#process restart dhcpd location all
Fri Jul 25 03:14:54.678 Baku
Location all can affect the stability of the System. Proceed? [confirm]On node node1_0_CPU0 ...
RP/1/RSP0/CPU0:Jul 25 03:14:55.308 : sysmgr_control[65886]: %OS-SYSMGR-4-PROC_RESTART_NAME : User elnour (vty0) requested a restart of process dhcpd at 1/0/CPU0
On node node1_RSP0_CPU0 ...
complete
RP/1/RSP0/CPU0:ironman0#LC/1/0/CPU0:Jul 25 03:14:55.394 : ipsub_ma[242]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is not ready. Reason: [V4 Subscriber infra process(es) is unavailble].
RP/1/RSP0/CPU0:Jul 25 03:14:55.439 : ipsub_ma[290]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is not ready. Reason: [V4 Subscriber infra process(es) is unavailble].
LC/1/0/CPU0:Jul 25 03:14:56.048 : dhcpd[153]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is not ready. Reason: [V4 Subscriber infra process(es) is unavailble].
RP/1/RSP0/CPU0:Jul 25 03:14:56.117 : dhcpd[1080]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is not ready. Reason: [V4 Subscriber infra process(es) is unavailble].
LC/1/0/CPU0:Jul 25 03:14:58.120 : dhcpd[153]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is ready. Reason: [V4 Subscriber infra process(es) is availble].
LC/1/0/CPU0:Jul 25 03:14:58.120 : ipsub_ma[242]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is ready. Reason: [V4 Subscriber infra process(es) is availble].
RP/1/RSP0/CPU0:Jul 25 03:14:58.168 : dhcpd[1080]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is ready. Reason: [V4 Subscriber infra process(es) is availble].
RP/1/RSP0/CPU0:Jul 25 03:14:58.168 : ipsub_ma[290]: %SUBSCRIBER-SUB_UTIL-5-SESSION_THROTTLE : Subscriber Infra is ready. Reason: [V4 Subscriber infra process(es) is availble].

RP/1/RSP0/CPU0:ironman0#
RP/1/RSP0/CPU0:ironman0#
RP/1/RSP0/CPU0:ironman0#show proc dhcpd location all
Fri Jul 25 03:15:04.433 Baku
node:      node1_0_CPU0
-------------------------------------------------------------------------------
                  Job Id: 153
                     PID: 512176
         Executable path: /iosxr-fwding-5.1.1/bin/dhcpd
              Instance #: 1
              Version ID: 00.00.0000
                 Respawn: ON
           Respawn count: 7
  Max. spawns per minute: 12
            Last started: Fri Jul 25 03:14:55 2014
           Process state: Run (last exit status : 1)
           Package state: Normal
       Started on config: cfg/gl/dhcpd/profile/IPOE_BASE/0x3/server/type
                    core: MAINMEM
               Max. core: 0
               Placement: None
            startup_path: /pkg/startup/dhcpd.startup
                   Ready: 0.616s
        Process cpu time: 0.229 user, 0.032 kernel, 0.261 total
JID   TID CPU Stack pri state        TimeInState    HR:MM:SS:MSEC   NAME
153    1    3  128K  10 Join           0:00:08:0379    0:00:00:0191 dhcpd
153    2    2  128K  10 Sigwaitinfo    0:00:08:0813    0:00:00:0000 dhcpd
153    3    3  128K  10 Receive        0:00:08:0614    0:00:00:0000 dhcpd
153    4    3  128K  10 Receive        0:00:03:0507    0:00:00:0013 dhcpd
153    5    3  128K  10 Receive        0:00:07:0352    0:00:00:0001 dhcpd
153    6    2  128K  10 Receive        0:00:00:0060    0:00:00:0027 dhcpd
153    7    2  128K  10 Receive        0:00:00:0309    0:00:00:0028 dhcpd
153    8    3  128K  10 Receive        0:00:08:0383    0:00:00:0000 dhcpd
153    9    1  128K  10 Receive        0:00:08:0323    0:00:00:0001 dhcpd
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
node:      node1_RSP0_CPU0
-------------------------------------------------------------------------------
                  Job Id: 1080
                     PID: 1851601
         Executable path: /disk0/iosxr-fwding-5.1.1/0x204/bin/dhcpd
              Instance #: 1
              Version ID: 00.00.0000
                 Respawn: ON
           Respawn count: 7
  Max. spawns per minute: 12
            Last started: Fri Jul 25 03:14:55 2014
           Process state: Run (last exit status : 1)
           Package state: Normal
       Started on config: cfg/gl/dhcpd/
           Process group: central-services
                    core: OFF
               Max. core: 0
               Placement: Placeable
            startup_path: /pkg/startup/dhcpd.startup
                   Ready: 0.695s
        Process cpu time: 0.215 user, 0.038 kernel, 0.253 total
1080   1    1  144K  10 Join           0:00:08:0228    0:00:00:0208 dhcpd
1080   2    3  144K  10 Receive        0:00:08:0800    0:00:00:0000 dhcpd
1080   3    3  144K  10 Sigwaitinfo    0:00:08:0745    0:00:00:0000 dhcpd
1080   4    2  144K  10 Receive        0:00:03:0460    0:00:00:0010 dhcpd
1080   5    1  144K  10 Receive        0:00:07:0356    0:00:00:0000 dhcpd
1080   6    2  144K  10 Condvar        0:00:07:0947    0:00:00:0001 dhcpd
1080   7    0  144K  10 Receive        0:00:00:0123    0:00:00:0022 dhcpd
1080   8    3  144K  10 Receive        0:00:00:0168    0:00:00:0010 dhcpd
1080   9    3  144K  10 Receive        0:00:08:0227    0:00:00:0000 dhcpd
1080   10   3  144K  10 Receive        0:00:08:0178    0:00:00:0002 dhcpd
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------

 

RP/1/RSP0/CPU0:ironman0#show subscriber session all
Fri Jul 25 03:34:16.625 Baku
Codes: IN - Initialize, CN - Connecting, CD - Connected, AC - Activated,
       ID - Idle, DN - Disconnecting, ED - End

Type         Interface                State     Subscriber IP Addr / Prefix
                                                LNS Address (Vrf)
--------------------------------------------------------------------------------
PPPoE:PTA    BE100.47.pppoe1          AC        185.26.184.0 (default)
IP:DHCP      No                       CN        -
RP/1/RSP0/CPU0:ironman0#debug dpm
Fri Jul 25 03:34:20.302 Baku
RP/1/RSP0/CPU0:ironman0#LC/1/0/CPU0:Jul 25 03:34:27.160 : dhcpd[153]: DPM INTERNAL: TP50: Session start response callback received with sub_label = 0x400001d (67108893), client = DHCPV4 ctx = 0x200002a (33554474), result = 'AAA_BASE' detected the 'fatal' condition 'Invalid state (aaa base lib error)', trans_id = 0x2a (42)
LC/1/0/CPU0:Jul 25 03:34:27.161 : dhcpd[153]: DPM INTERNAL: TP119: Session disconnect called for client DHCPV4 reason Session start failure
LC/1/0/CPU0:Jul 25 03:34:27.160 : dhcpd[153]: DPM ERROR: TP7: Session start response for sub_label 0x400001d (67108893) client DHCPV4result is not ok: 'AAA_BASE' detected the 'fatal' condition 'Invalid state (aaa base lib error)'
LC/1/0/CPU0:Jul 25 03:34:27.161 : dhcpd[153]: DPM ERROR: TP36: Invalid sub_label passed to disconnect sessionfor client DHCPV4

RP/1/RSP0/CPU0:ironman0#undebug allLC/1/0/CPU0:Jul 25 03:34:34.527 : dhcpd[153]: DPM INTERNAL: TP13: DPM Session create called for client DHCPV4
LC/1/0/CPU0:Jul 25 03:34:34.527 : dhcpd[153]: DPM INTERNAL: TP14: DPM Session create params client DHCPV4: chaddr 2a02.7138.f076chaddr_len 6, client_id , client_id_len 0, parentIfHandle 0x440003c0 (1140851648)
LC/1/0/CPU0:Jul 25 03:34:34.527 : dhcpd[153]: DPM INTERNAL: TP15: Session create params client DHCPV4 circuit_id 31302E3139322E36342E34303A312D31382D332D302D6574682D34373A2D3437, circuit_id_len 32
LC/1/0/CPU0:Jul 25 03:34:34.527 : dhcpd[153]: DPM INTERNAL: TP38: Session create params client DHCPV4 vendor_id , vendor_id_len 0
LC/1/0/CPU0:Jul 25 03:34:34.527 : dhcpd[153]: DPM INTERNAL: TP57: Session create params client DHCPV4  port 2 chan 0subif 47 rack 1 slot 0 instance 8
LC/1/0/CPU0:Jul 25 03:34:34.527 : dhcpd[153]: DPM INTERNAL: TP66: Session create params client DHCPV4 nas port type 22tag_count 1 outer tag 47 inner tag 0

Fri Jul 25 03:34:34.635 Baku
All possible debugging has been turned off
RP/1/RSP0/CPU0:ironman0#
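
When debug output like the above gets interleaved with the CLI prompt, it can help to filter a saved capture offline. A rough Python sketch that pulls out only the DPM ERROR lines, assuming the line layout seen in this capture (the regular expression and names are my own, not a documented log format):

```python
import re
from typing import List, NamedTuple


class DpmEvent(NamedTuple):
    node: str
    timestamp: str
    trace_point: int
    message: str


# Layout assumed from the capture: <node>:<Mon DD HH:MM:SS.mmm> : dhcpd[pid]: DPM <level>: TP<n>: <msg>
DPM_LINE = re.compile(
    r"(?P<node>(?:LC|RP)/[^:\s]+):(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}\.\d{3}) : "
    r"dhcpd\[\d+\]: DPM (?P<level>INTERNAL|ERROR): TP(?P<tp>\d+): (?P<msg>.*)"
)


def parse_dpm_errors(log_text: str) -> List[DpmEvent]:
    """Return the DPM ERROR events found in a saved debug capture."""
    events = []
    for line in log_text.splitlines():
        # search() rather than match(), so a prompt glued to the front is skipped.
        found = DPM_LINE.search(line)
        if found and found.group("level") == "ERROR":
            events.append(
                DpmEvent(
                    node=found.group("node"),
                    timestamp=found.group("ts"),
                    trace_point=int(found.group("tp")),
                    message=found.group("msg"),
                )
            )
    return events


# Three lines copied from the capture above.
SAMPLE = "\n".join([
    "LC/1/0/CPU0:Jul 25 03:34:27.161 : dhcpd[153]: DPM INTERNAL: TP119: Session disconnect called for client DHCPV4 reason Session start failure",
    "LC/1/0/CPU0:Jul 25 03:34:27.160 : dhcpd[153]: DPM ERROR: TP7: Session start response for sub_label 0x400001d (67108893) client DHCPV4result is not ok: 'AAA_BASE' detected the 'fatal' condition 'Invalid state (aaa base lib error)'",
    "LC/1/0/CPU0:Jul 25 03:34:27.161 : dhcpd[153]: DPM ERROR: TP36: Invalid sub_label passed to disconnect sessionfor client DHCPV4",
])

for event in parse_dpm_errors(SAMPLE):
    print(event.node, event.timestamp, f"TP{event.trace_point}", event.message)
```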

Ok this is not good. We need to investigate this more closely. At this point I think a TAC case might be best to continue the triage allowing us to give some dedication to this problem.

I would recommend that you first check with XR 512, if this is a lab environment, to make sure this is not a known issue in 511.

If the behavior in XR512 is the same or similar, collect this same logging and have it entered in a TAC case.

would that work for you?

regards

xander

Thank you Xander,

Unfortunately, at the moment I don't have Smartnet, I'll purchase it ASAP, in the meanwhile I would really appreciate if you could provide XR512 for tests, is it possible?

Thank you.

Regards

Elnur

oops! :) haha no problem.

If you have a CCO ID you should be able to download XR512 no problem. I could post it on file exchange on CCO, but that still requires a CCO ID.

regards

xander

Thank you Xander, you are a good man!

I'm not sure:) (is it the login I use to access cisco.com?) I think my CCO ID is elnur.mammadov

 

Thank you

Elnur

yup that is the one Elnur, I verified in the CCO lookup tool and that uid indeed exists. So with that you should be able to go to the:

support, then downloads, add asr9000 in the search box.

when you have that then select all releases from the left, 5 and take the 512 (ED) image to add to your cart for download.

regards!

xander

Hi Xander

It still does not work, I tried, please find the image attached.

What about file exchange?

 

Thank you! 

 

Regards,

Elnur

Hello everyone,

I see this thread was last active 3 years ago, but I hope someone can answer and advise me.

I would like to know the main difference between BNG cluster and BNG geo-redundancy. What are the pros and cons? Which one is the better solution?

I have 3 x ASR9006 used as BNGs in different locations. I want to do load sharing and redundancy for them, and I'm not sure which solution to use.

Thanks.

xthuijs
Cisco Employee

hi laung,

cluster is taking 2 physical chassis and make them a single logical router.

this requires a brain extension (EOBC extension) and a dataplane extension between the chassis (aka IRL). because it is a single control plane the overall scale doesn't increase as much. but since it is now a single control plane, active/active LAG can be done.

geored is superior. you can have 2 devices use another one as backup. the control plane is separate, so the scale is higher. no need for low latency EOBC extensions as geored uses iccp to communicate and sync state for the sessions.

also cluster is no longer supported in XR6 and not on rsp880/tomahawk linecards either btw.

in short geored is what you want to be looking at for your scenario.

cheers

xander

Xander,

 

Google found this http://www.ciscoknowledgenetwork.com/files/96_03-15-11_BNG_evolution_seminar_v0.9.pdf

On page 19 I was very happy to see the attached snippet :)

 

Regards

Elnur