ASR9000/XR : Understanding SSRP Session State Redundancy Protocol for IC-SSO

xthuijs · ‎03-03-2011

Introduction

This document answers frequently asked questions about the Session State Redundancy Protocol (SSRP).

SSRP is an integral part of achieving sub second failover for MLP and PPP interfaces, extremely useful in mobile backhaul deployment scenarios.

While APS (Automated Protection Switching) provides for Sonet level failover, with regular MR-APS your PPP sessions would still need to re-establish. By using MR-APS your outtage time ranges around 20 seconds (protocol bound as PPP sesions have to re-establish).

By leveraging SSRP, you will synchronize your ppp sessions from the working/active to the standby/protect, which means that you eliminate the need for PPP session re-establishment and that saves a lot of time.

Further making use of the IP-FRR (Fast ReRoute) technology brings the failover and time to forward down to several hundreds of msecs!

IP-FRR has a shadow route in the FIB (forwarding information base) pointing primarily to the mlp egress interface and secondary to the standby MR-APS peer. Whenever the primary route disappears, a sub second switch is made to the secondary route.

SSRP is Session State Redundancy Protocol which is used in the ASR9000 to achieve IC-SSO or Inter Chassis Stateful Switchover.

The configuration for SSRPis highlighted in different articles, but for arguments sake of this Q&A a config snippet is highlighted:

Interface configuration (Required on EACH serial and mlp interface that needs to be sync’d)

interface Multilink0/3/0/0/1

ipv4 address 30.30.34.2 255.255.255.252

ssrp group 1 id 10001 ppp

encapsulation ppp

ssrp location 0/3/CPU0 << specifies a slot that is subject to synchronisation

group 1 profile ssrp1 <<< define the group (unique group per LC) and attached profile

ssrp profile ssrp1 << profile definition

security ttl max-hops 1 << TTL hops

peer ipv4 address 192.168.1.2 << peer addr (this peer address needs to be in the same vrf as all multilink interfaces)

Q&A

(1)

Can I reuse the SSRPprofile for multiple interfaces?

The SSRPprofile is associated with an SSRPgroup - and a group can be used for multiple interfaces.

Whether an SSRPprofile can be used for multiple groups? If so, the answer is theoretically yes - assuming both groups want

to use the same peer IPv4 address. In practice, this setup would be that common (since if the groups are using the same peer ip-addr,

they could be combined into one group). The only possible use I can think of is when 2 LCs in the same (or even different) routers want to connect

to the same peer, they want to use the same peer address, but a different TCP port number - in which case the same profile but different group-ids

could be used.

You need a group per linecard effectively, so if I have 2 linecards, then I could create 2 group ID's but referencing the same profile as they are going to the same peer, that should work fine.

(2)

Does the group number need to be unique per linecard or per interface,

So if I have a 4 OC12 on a linecard, can I have them all use the same group number?

The same group number can be used for multiple interfaces, as long as they are all on the same LC, and are all replicating to the same LC on the peer. So in general, you should be using the same group for all interfaces on a LC, unless you're replicating to multiple routers/LCs.

(3)

Config:

interface Multilink0/2/0/0/2

ipv4 address 192.168.101.3 255.255.255.254

ssrp group 1 id 10002 ppp

Considering this: Link this to the group number 1, which corresponds to the same group that I linked the CPU from “1” to the ID is here 10002, it needs to be unique, do the members of the T1’s need to have the same ID or basically the ID needs to be unique per interface that is sync’d that is each serial and each mlp if need to have a unique ID and the combination of group+id identifies a unique interface in a unique slot that gets synced to the standby?

The combination of group+id needs to be sufficient to uniquely identify a PPP interface (i.e. the serial or multilink interface). So within each

group, it's perfectly safe to start the ID numbering 1,2,3... We detect at config verification time whether the user attempts to configure

the same ID to 2 interfaces within the same group - and fail the config.

So the group+id identifies a unique interface, hence the same id cannot be used on any other interface on that uses the same group.

(4)

Is there is a limit of how many interfaces I can sync per group? and box wide?

Not explicitly, no - config allows any number of interfaces to be configured per-group. I think officially we only support up to 2600 interfaces currently.

(5)

How is mastership determined in SSRP? Does it look at the APS state or something?

In our implementation of SSRP, SSRP doesn't actually know whether it is Active or Standby - it also runs assuming it is both active and standby. PPP itself is the one to decide, on a per-interface basis, whether each serial/multilink interface is active or standby - and this is based (indirectly) on the APS state. Depending on the whether an interface is active or standby, PPP either replicates the state to the peer over SSRP, or requests the peer to send it the active state.

(6)

Wondering that if both sides are perceiving them to be master, then RTR-left might send a sync state of say UP while we are receiving state down from the right side router, what will be the end result, I mean how would a router know to override its own state by what it received? Would that then be derived by the APS state, say if I am APS master, then I don't override, if I am slave/backup/standby, I am using my received SSRP state?

On each router, each interface will determine its 'redundancy state' (i.e. whether it is Active or Standby) from the APS state (i.e. whether the SONET interface is up or down). When Active, it will simply ignore any SSRPState messages it receives, while when Standby, it will just use whatever info it receives in SSRPState messages. So if both sides think they are Active, this doesn't cause any problems (both sides just ignore SSRPState messages).

(7)

Can I link say an interface in slot 2 to an interface on the standby in slot 4? or does it require the same phy mapping?

Same physical mapping isn't required at all - so linking slot 2 to slot 4 would be fine.

(8)

Would a state sync packet effectively include: group/id/PPP-Phase/State/Options is that how it sort of looks what we'd be sending

across? How many interfaces fit in a signle update packet? Say if a single state/interface is 100 bytes, would we pack 15 interfaces in one update? Or is it a packet per interface?

The State messages contain the group, id, the CP (or authentication protocol) the state is for, whether it is Up or Down, and all the negotiated PPP Options.We batch up multiple interfaces into single packets, allowing packets of up to 3000 bytes.

(9)

How does the syncing work? if I have an interface doing an IPCP reneg say for instance, does that get replicated immediately?

Effectively, yes. When IPCP renegotations start, a State-Down message will be sent to the standby router, indicating that IPCP is no longer up (although in fact we delay these State-Down messages by ~10s, to avoid clearing replicated state during certain MR-APS events). Then when IPCP negotiations complete again, a State-Up message is sent to the standby router, indicating that IPCP is up again, and containing the negotiated IPCP options.

(10)

Can I tune this timer or is it fixed? so if it goes down, we wait 10 secs for sending a stat down, if the if comes backup within 10 seconds, would I send out a state up message anyway?

The timer can't be tuned - it's fixed to 10s. If it comes back up within 10s, we'd send a State-Up anyway.

(11)

How is pacing implemented in SSRP, I mean if there are a lot of updates sent, is there some sort of ack’ing and retransmission?

There is no acking. We treat TCP as reliable - so as long as the TCP socket APIs don't return an error locally, we assume the peer has received the message. If TCP does return an error, we resync to recover. Similarly, if the TCP connection flaps, we resync to recover. Additionally, the Standby router is able at any time to request a replay of state for particular interfaces, if for some reason it hasn't yet received the state.

(12)

Is there any full sync operation that we sync ALL states from ALL interfaces under any circumstance?

Whenever the TCP connection for a group gets established (both for the first time, and if it goes down and comes back up again), a full resync is performed for that group.

Related Information

Configuring PPP and SSRP on the ASR9000

Carlos A. Silva · ‎02-20-2013

Xander:

Hopefully you can help me figure this out.

I'm currently testing an ASR9010 with redundant RSPs and 2 STM1c SPAs with local APS configuration.

When doing failover testing, that is failing the 'working' STM1c #1, everything goes well and the 'protect' STM1c #2 takes over.

now i have 3 interfaces working for this test: one MLPPP, one HDLC and one FR.

If I fail STM1c#2, STM1c#1 takes over but the MLPPP interfase never comes back, even though the HDLC and FR do.

After this second failover, the only way to get the MLPPP interface working again is to either reload the box or do a redundancy switchover. Again, doesn't matter what I do HDLC/FR always recover and work. MLPPP switches over the first time around and never works again.

Do you know of a bug or am I missing special configuration required for MLPPP/APS config?

Thanks in advance!

c.

xthuijs · ‎02-21-2013

Carlos,

Without a config, debugs, version used and show command outputs it is hard to identify the precise reason for this failure.

This is meant to be working just fine and used widely by many folks.

I would recommend to capture the show tech APS, debug ppp nego, show aps <x>, show ppp int <x> and version with installed smu's from this test case witht he markation of when the aps failover was initiated (how do you do that btw, with a fiber pull or manual cmd failover or...?). With that info, it may be best to open a TAC case.

thanks

xander

Carlos A. Silva · ‎02-21-2013

Thanks for your reply, Xander. Actually, we do have a case open, with no response yet.

As to how we simulate the failover: we shutdown the sonet controller and we pull the fibers. Same results on both. I really appreciate you taking the time.