Problem with ASA active/standby set-up after migrating to new ISP circuits

mitchen
Level 2

We have an Active/Standby ASA5540 firewall set-up with the Primary Active unit at our head office site (Site A) and the Secondary Standby unit at our DR site (Site B).

Both sites had their "outside" interfaces directly connected to our ISP (we connect the ASA outside interface to the provider's NTE at each site). This all seemed to work reasonably well - our active traffic would go through Site A and, in the event of a failure of the Site A firewall or interface, comms would fail over to Site B.

We recently decided to upgrade the bandwidth of our outside links to the ISP. This meant getting completely new circuits and new NTEs installed, but we requested that we keep the same IP addressing for the new circuits (we have a number of VPN connections so didn't want to have to change configuration).

So, come time to move to the new circuits, we presumed it would just be a case of changing the interface speed on the ASA interface (from 10 to 100) and moving the cables across from old NTE to new NTE. Meanwhile the ISP would activate the "new" ports on their network switch and shut down the "old" ports. And this could be carried out relatively quickly to minimise any disruption.

However, this is not how it panned out. It seems that when the ISP activates the new ports, Site B takes over as Active firewall and the Site A firewall has its outside interface marked as "failed" - the ISP had to shut down the Site B link in order to allow us to pass traffic through the Site A firewall and circuit again. And we are left with the situation where we effectively DON'T have our Active/Standby set-up with automatic failover any longer! We can either have Site A active and passing traffic and Site B marked as "failed" on its outside interface, or vice versa.

I don't know too much about the ISP's set-up to be honest but, as far as I'm aware, the ISP connects both the circuits for Site A and Site B to the same network switch in their datacentre and to the same VLAN.

Can anyone suggest what the problem might be and how to resolve it? I'm assuming it has to be something at the ISP end since I don't really understand what else could be necessary from our point of view (i.e. what else would we need to do other than move the cables and configure the new interface speed)? It's as if there is some sort of conflict on the ISP's network switch - I don't know if it is something to do with the way the standby ASA takes over the active ASA IP and MAC address and that somehow gets the ISP network switch into a state of confusion?

Does anyone have any ideas/suggestions?  Naturally we are a bit disappointed since we hoped this would be a relatively straightforward task to migrate to our new circuits with increased bandwidth!

Thanks.

34 Replies

Hi Jouni,

thanks, I'm not sure either - but your guess would certainly make sense. 

Let's see what tomorrow brings - maybe the ISP will make some progress for me!

ARP is replicated from Active to Standby only if the customer has a stateful failover link defined:

Stateful Failover

When stateful failover is enabled, the active unit continually passes per-connection state information to the standby unit. After a failover occurs, the same connection information is available at the new active unit. Supported end-user applications are not required to reconnect to keep the same communication session.

The state information passed to the standby unit includes these:

  • The NAT translation table

  • The TCP connection states

  • The UDP connection states

  • The ARP table

  • The Layer 2 bridge table (when it runs in the transparent firewall mode)

  • The HTTP connection states (if HTTP replication is enabled)

  • The ISAKMP and IPSec SA table

  • The GTP PDP connection database

The information that is not passed to the standby unit when stateful failover is enabled includes these:

  • The HTTP connection table (unless HTTP replication is enabled)

  • The user authentication (uauth) table

  • The routing tables

  • State information for security service modules

Value our effort and rate the assistance!

Hello Mitchen,

I want to add my 2 cents here

I definitely agree the problem is on the ISP side (I mean, that's for sure).

Now, the nature of the problem (one of the switch interfaces that belong to that same VLAN needs to be shut down) tells us we could be dealing with an STP problem.

I would encourage them to check for STP blocking the link

show spanning-tree vlan #

And look for both ports (one of them should be in the blocking state).

A port-mirroring session on the switch side to capture all traffic being received would also be great.

Rate all of the helpful posts!!!

Regards,

Jcarvaja

Follow me on http://laguiadelnetworking.com

Julio Carvajal
Senior Network Security and Core Specialist
CCIE #42930, 2xCCNP, JNCIP-SEC

OK, so I understand we can explain multiple scenarios to the customer, but the problem is assuming that the customer has X, Y and Z. If this were a configuration example question I would help, but this is not the case. My intention is not to confuse the customer, because he can go on with thousands of questions that aren't relevant, since he does not manage the ISP devices. If, for example, the customer told me that this happened with an HSRP setup that he controls, I would be looking at that.

I hope that the customer and all of you understand my point and get the ISP involved, so that they can fix the customer's issue or tell us what they see that could be causing the failure, which I still believe has nothing to do with the ASAs.

Value our effort and rate the assistance!

Jumora/Jouni - we do have stateful failover so that explains the ARP info being replicated from Active to Standby and therefore what I am seeing makes sense.

Julio - yes, I agree - that's what I also suspect.

Jumora (and all!) - sorry, I should have made this clear in my original post:  Obviously,  I DO have a call out with the ISP already (and have since escalated this with them) and I also believe the problem is at their end rather than the ASA's and have said as much to them right from the start!  

But, as I explained earlier in the thread, I wanted to make sure I covered all bases by investigating everything at "my" end too - with the bonus of gathering more evidence to beat up the ISP with and increasing my own understanding of the ASAs and their interactions and dependencies. In that regard, I'm very happy to have done so as I have had some great advice and suggestions from everyone (I don't want to necessarily single anyone out as I'm grateful for all help, but Jouni in particular has given some excellent troubleshooting tips which I'm sure will benefit myself, and hopefully others who have chanced upon this forum topic, in many other situations too).

I understand that we are limited in what we can achieve given we (or I) have no control over the ISP side of things but it still seemed to me that it was worthwhile posing the question because there was every chance someone out there could have experienced similar (or could offer helpful troubleshooting advice and suggestions as has certainly been the case)

Happy to close the topic if it has gone outside the boundaries of what the Cisco forum is intended for.  And it's been useful for me even if it's been of no use to any other Cisco networkers! 

Hello Mitchen,

Perfect, so let them know what we are talking about.

If they bring up the fact that the ASAs are showing the ARP entry of the other device, let them know that this happens via the failover link, as the units exchange that information, but we are still being blocked on the ISP side (at least one port at a time).

And if you need help reaching them or trying to explain that to them, let me know and we can set something up so I can help with this.

It's all about getting it fixed, buddy.

Rate all of the helpful posts!!!

Regards,

Jcarvaja

Follow me on http://laguiadelnetworking.com

Julio Carvajal
Senior Network Security and Core Specialist
CCIE #42930, 2xCCNP, JNCIP-SEC

I completely understand your point: you don't want to just point a finger without giving your share of evidence that the ASAs have nothing to do with the failure. Knowing that you have already reached out to the ISP really does help to resolve your issue and, as Julio said, if you have the ISP on a scheduled callback and need our assistance, you can reach out to Julio or myself so we can help out. My point still stands: if we give you assumptions about equipment or protocols that we don't know whether you are running, all we can do is confuse you and overcomplicate the problem you are addressing, because we may inject ideas that you have no reason to bring up with the ISP. Also, the ISP needs to inform you of what is going on, and they should have resolved this for you a lot faster. As a company it's imperative that they understand what they are doing and how it affects your business; if they are not interested in keeping you as a customer, they will produce these types of issues with no urgency to resolve them. Depending on the type of business you are, you can suffer financial repercussions if your network is down and, depending on the service contract you signed with the ISP, they could also suffer financial penalties.

Now, looking at this from the technical side: if the idea of your question had been to understand how to troubleshoot a common failover setup, my first comment would have been that the normal ASA setup is a failover pair interconnected through a switch or multiple switches, with all ports that connect to the ASA in portfast (or trunk portfast if the port facing the ASA is a trunk). Then confirm connectivity between primary and secondary with ICMP between the primary IP and the secondary IP, from both the Active and the Standby device. After that I could have talked about other troubleshooting steps to follow, but this was not the case.

Every tool has a special task that almost always only it can do, but only if it is used for the right task.

I hope that you will continue to use our forum to help you out, and I do apologize if it seemed I was pushing you away. I am just trying to make you understand my point, just as you are trying to make me understand yours.

Value our effort and rate the assistance!

Well, some progress... of sorts!    I had a call with the ISP today and despite them saying initially there were no STP issues, I continued to press using the troubleshooting info we have gathered on this thread.   I got them to supply the port config info for their switch ports connecting to our ASAs (real IP address changed in output):

interface GigabitEthernet1/0/10
 description *** Connection to Customer Primary ASA ***
 switchport access vlan 100
 speed 100
 duplex full
 mls qos vlan-based
 fair-queue
 spanning-tree portfast
 spanning-tree cost 15
!
!
interface GigabitEthernet2/0/20
 description *** Connection to Customer Secondary ASA ***
 switchport access vlan 100
 speed 100
 duplex full
 mls qos vlan-based
 fair-queue
 spanning-tree portfast
 spanning-tree cost 16
!
interface Vlan100
 ip address 1.1.1.1 255.255.255.240
 fair-queue

Apart from the port speed, the spanning-tree cost is the only thing they say has changed from the original settings (it was added as a workaround to allow us to have the active unit that we wanted after the initial problems were experienced).
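The two customer-facing port stanzas can also be compared with each other mechanically, to confirm that they differ only in description and spanning-tree cost. This is a minimal Python sketch and not something the ISP ran; the helper names `parse_stanzas` and `stanza_diff` are made up for illustration, and the config text is simply the output quoted above.

```python
# Config text copied from the ISP output quoted in this post.
CONFIG = """\
interface GigabitEthernet1/0/10
 description *** Connection to Customer Primary ASA ***
 switchport access vlan 100
 speed 100
 duplex full
 mls qos vlan-based
 fair-queue
 spanning-tree portfast
 spanning-tree cost 15
!
interface GigabitEthernet2/0/20
 description *** Connection to Customer Secondary ASA ***
 switchport access vlan 100
 speed 100
 duplex full
 mls qos vlan-based
 fair-queue
 spanning-tree portfast
 spanning-tree cost 16
!
"""


def parse_stanzas(text):
    """Map each 'interface X' header to the list of commands under it."""
    stanzas = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "!":
            continue  # skip blank lines and stanza separators
        if line.startswith("interface "):
            current = line.split(None, 1)[1]
            stanzas[current] = []
        elif current is not None:
            stanzas[current].append(line)
    return stanzas


def stanza_diff(a, b):
    """Commands present in only one of the two stanzas, ignoring descriptions."""
    def keep(cmds):
        return {c for c in cmds if not c.startswith("description ")}
    return sorted(keep(a) ^ keep(b))


stanzas = parse_stanzas(CONFIG)
diff = stanza_diff(
    stanzas["GigabitEthernet1/0/10"],
    stanzas["GigabitEthernet2/0/20"],
)
print(diff)  # ['spanning-tree cost 15', 'spanning-tree cost 16']
```

Both stanzas carry `spanning-tree portfast`, so under this comparison the cost line is the only functional difference between the two ports.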

Then I pressed for output of the show spanning-tree vlan 100 to see if the link was being blocked (as I think we all suspected!) and, lo and behold, it is!

Gi2/0/20         Altn BLK 16        128.23   P2p
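For anyone who gets the full command output rather than a single line, the check for non-forwarding ports can be scripted. The sketch below is only an illustration in Python: it assumes the usual per-port column order of `show spanning-tree vlan` (Interface, Role, Sts, Cost, Prio.Nbr, Type), and the function name `non_forwarding_ports` is hypothetical. The sample input is the one line the ISP supplied.

```python
import re

# Assumed column order: Interface, Role, Sts, Cost, Prio.Nbr, Type
PORT_LINE = re.compile(
    r"^(?P<interface>\S+)\s+(?P<role>\S+)\s+(?P<state>\S+)\s+"
    r"(?P<cost>\d+)\s+(?P<prio_nbr>\d+\.\d+)\s+(?P<type>.+?)\s*$"
)


def non_forwarding_ports(output):
    """Return one dict per port line whose state is anything other than FWD."""
    ports = []
    for line in output.splitlines():
        match = PORT_LINE.match(line.strip())
        # Header and separator lines do not match because Cost must be numeric.
        if match and match.group("state") != "FWD":
            ports.append(match.groupdict())
    return ports


sample = "Gi2/0/20         Altn BLK 16        128.23   P2p"
blocked = non_forwarding_ports(sample)
for port in blocked:
    print(f"{port['interface']}: role={port['role']} "
          f"state={port['state']} cost={port['cost']}")
# Gi2/0/20: role=Altn state=BLK cost=16
```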

So, that is the current state - we have established at least that Spanning Tree on the ISP switches is causing one of the ports to go into the blocking state which explains the issue I am facing.  The question (and, again, I appreciate this is really more for the ISP to answer) is why?    What would be the implications if they simply disabled spanning-tree for those ports?

(Jumora - I fully appreciate the point you're making, but, all the same, I've found this thread has been very valuable with troubleshooting and hope that if anyone finds themselves in a similar situation it may benefit them too.  I know that the ISP is fully responsible for this issue but my aim is to get the problem fixed as quickly as possible and I'll explore every avenue to achieve that, particularly if the ISP are dragging their heels. Besides, the ISP uses Cisco switches so I'm hoping that still fulfils the criteria of the forum so that I can post for further advice! )

Hi again,

Switching isn't really my thing but I thought I would think aloud here a bit. So forgive me if I spew nonsense.

It seems to me that the ports might be from some switch stack.

I would imagine that both ports connected to the ASAs should be "Designated" ports and not "Alternate" like the blocked one is. Did they provide you with the full output of the command "show spanning-tree vlan 100"? If not, could you ask them to provide that and perhaps also the "show vlan id 100" output (though I am not sure that is needed if you can get the whole output of the other command)? That is, if you want to keep asking them for information for this discussion and troubleshooting; as has been stated before, I would imagine the ISP has the people to determine the actual cause of the problem.

Again, with my almost nonexistent knowledge of STP/RSTP, I wonder why this 2/0/20 port would be "Alternate". Doesn't this mean that this port is actually connected so that it has a "route" to the Root switch?

So is this Gi2/0/20 actually connected to your Secondary ASA or is it connected to some other switch? Are these 2 ports mentioned from the same device/stack? Are there other switches between them and the firewalls?

Again, apologies if I have misled with any of the above writing (which is possible). Though I am sure I will be corrected.

- Jouni

Hi Jouni,

yes, I agree again with you - I think both ports connected to the ASAs should be designated.  I think they only altered the costs when trying to investigate this issue so don't think that was the cause of the original problem (but I guess that again would seem to have been due to STP blocking the port)

I only got the output that I have shown above (after some pressing from me on whether that's what was happening!) unfortunately I didn't get the full output from them. 

We connect our ASA outside ports to the carrier's NTE.  The carrier then connects us from there to the ISP's datacentre where they have their switchstack.  I presume they cable from their switchstack to the carrier's NTE in their datacentre.  So effectively, our outside secondary ASA port is connected to this port Gi2/0/20 on their switchstack and our primary ASA outside port is connected to port Gi1/0/10  on their switchstack (whether the carrier equipment has any influence on things I have absolutely no idea whatsoever?)

Ignoring the "cost" which I think they should take out (apparently, it was not there at the start of the issue so presumably can't be the cause anyway) - then I can't see anything obvious that is wrong with their port settings?  So why would their switch decide to put one of the ports to our ASAs in a blocking state?  My switching knowledge is very rusty and I haven't been able to come up with anything in the ASA documentation that gives much guidance (but I guess it's maybe not likely to contain that sort of info)   The ISP are suggesting that they could just turn off Spanning tree for this VLAN - I know the ASAs themselves don't participate in STP but this approach still seems risky to me?   I'd rather know why STP was blocking the port in the first place than potentially risk introducing loops in future e.g. the next time the ISP carries out any changes on their switching network!  Am I being sensible or over-cautious?! 

ps just to emphasise this call IS still with the ISP so - as Jumora has said - the risk is I over-complicate/confuse things by delving into it too much myself.  I obviously don't want to do that but I do understand the risks and appreciate that any advice/suggestions given come with that caveat!

Hi,

Again I am wondering if they have created a looped connection (through port Gi2/0/20) in their network and RSTP has moved Gi2/0/20 to blocked state as its "Alternate"

Here is a link to a Cisco document on this:

http://www.cisco.com/en/US/tech/tk389/tk621/technologies_white_paper09186a0080094cfa.shtml

- Jouni

Thanks - I've asked some more questions of the ISP around the STP setup (no answers forthcoming as yet!) so will wait and see what they have to say next.

Hello,

Glad to know the troubleshooting provided is helping you.

Now, the interesting fact is that the customer is running the interfaces as access, so there should not be any exchange of BPDUs on those interfaces and they should just be in the forwarding state (not participating in RSTP).

You are not using sub-interfaces on the ASAs, right?

Rate all of the helpful posts!!!

Regards,

Jcarvaja

Follow me on http://laguiadelnetworking.com

Julio Carvajal
Senior Network Security and Core Specialist
CCIE #42930, 2xCCNP, JNCIP-SEC

Hi Julio,  nope - no sub interfaces on the ASAs.

Port Gi2/0/20 is receiving a BPDU inbound that was issued by itself (i.e. the ISP switch stack) on another of its ports (i.e. there is a loop / redundant path). This is pretty much the only reason that a port will go into the BLK state.

IMHO it would be a very bad idea to disable STP on that VLAN. If there is indeed a loop, as soon as you do this, everything will go down the toilet.

I presume the ASAs are in routed (L3) mode, and not transparent (L2) mode? If they are transparent, then this changes this conversation somewhat.... In L2 mode, whilst ASAs don't generate BPDUs, they do forward them between interfaces. However, I'm pretty sure a standby ASA won't forward them, although from memory there were some bugs in this area in early releases of 8.2.

In L3 mode, the ASA will just discard BPDUs, and can't create a L2 loop.
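The comparison behind that BLK state can be sketched as a toy model in Python. This is a simplification and not the switch's real implementation: it assumes the stack is the root bridge for the VLAN, the bridge ID and the looped-back port number (128.10) are invented for illustration, and only 128.23 comes from the output quoted earlier in the thread. The class name `PriorityVector` and function `stays_designated` are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class PriorityVector(NamedTuple):
    """Simplified STP priority vector; fields compare in this order, lower wins."""
    root_bridge_id: int
    root_path_cost: int
    sender_bridge_id: int
    sender_port_id: Tuple[int, int]  # (port priority, port number)


def stays_designated(own: PriorityVector,
                     heard: Optional[PriorityVector]) -> bool:
    """A port keeps the designated role only if nothing better is heard on it."""
    return heard is None or own <= heard


STACK_ID = 1  # hypothetical bridge ID for the ISP switch stack

# What Gi2/0/20 (port ID 128.23) would advertise, assuming the stack is root.
own = PriorityVector(STACK_ID, 0, STACK_ID, (128, 23))

# Routed-mode ASA on the far end: BPDUs are discarded, nothing comes back.
no_loop = stays_designated(own, None)

# Loop somewhere in the path: the stack's own BPDU, sent from a port with a
# lower (assumed) port ID, arrives back on Gi2/0/20.
looped_back = PriorityVector(STACK_ID, 0, STACK_ID, (128, 10))
with_loop = stays_designated(own, looped_back)

print(no_loop, with_loop)  # True False
```

A port that loses this comparison and is not the switch's root port is held in a non-forwarding state, which is why a routed-mode ASA on its own should never cause the block.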

Barry Hesk
Intrinsic Network Solutions
