Site to Site Tunnel Forms but does not pass traffic both ways

timsnover · 04-24-2016

I am setting up a site-to-site VPN between a Cisco 819HW router and a Cisco ASA 5520. The tunnel forms successfully, but when I run "sh crypto session detail" I can see that outbound packets are being encrypted (enc'ed) while no inbound packets are being decrypted (dec'ed):

Interface: GigabitEthernet0
Uptime: 00:24:37
Session status: UP-ACTIVE
Peer: 198.101.7.85 port 500 fvrf: (none) ivrf: (none)
Phase1_id: 198.101.7.85
Desc: (none)
Session ID: 0
IKEv1 SA: local 166.157.75.93/500 remote 198.101.7.85/500 Active
Capabilities:(none) connid:2004 lifetime:23:35:22
IPSEC FLOW: permit ip 172.16.47.0/255.255.255.0 0.0.0.0/0.0.0.0
Active SAs: 2, origin: crypto map
Inbound: #pkts dec'ed 0 drop 0 life (KB/Sec) 4357138/7 hours, 35 mins
Outbound: #pkts enc'ed 280 drop 0 life (KB/Sec) 4357039/7 hours, 35 mins
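If you need to check for this symptom across several routers, the pattern above (outbound enc'ed climbing while inbound dec'ed stays at 0) is easy to detect from captured output. A minimal Python sketch, assuming the output has already been collected as text; the function names are hypothetical and the regex only covers the line format shown above:

```python
import re

# Matches the "Inbound:"/"Outbound:" counter lines exactly as they appear in the
# "show crypto session detail" output above; other IOS releases may format them differently.
COUNTER_RE = re.compile(
    r"^\s*(Inbound|Outbound):\s+#pkts\s+(?:dec|enc)'ed\s+(\d+)", re.MULTILINE
)


def sa_counters(output: str) -> dict:
    """Return {'Inbound': n, 'Outbound': n} packet counts parsed from the output."""
    return {direction: int(count) for direction, count in COUNTER_RE.findall(output)}


def is_one_way(output: str) -> bool:
    """True when packets are encrypted outbound but none are decrypted inbound."""
    counts = sa_counters(output)
    return counts.get("Outbound", 0) > 0 and counts.get("Inbound", 0) == 0


SAMPLE = """\
Inbound: #pkts dec'ed 0 drop 0 life (KB/Sec) 4357138/7 hours, 35 mins
Outbound: #pkts enc'ed 280 drop 0 life (KB/Sec) 4357039/7 hours, 35 mins
"""

print(sa_counters(SAMPLE))  # {'Inbound': 0, 'Outbound': 280}
print(is_one_way(SAMPLE))   # True
```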

If I look at the tunnel on the ASA (using ASDM), I can see Bytes Tx going up, but Bytes Rx stays at 0. We have 3 separate VPN ASAs, all 5520s. Sometimes moving the VPN connection from one ASA to another makes the tunnel form and work fine; sometimes even that does not work.

This Cisco 819HW router is being used in the back of an ambulance and is connected to a Sierra Wireless 4G device for its WAN connection. What seems to be happening is that the connection is up and running fine, but then the 4G connection loses its signal or the device is turned off. Once it's turned back on or regains the signal, the tunnel re-forms but stops passing 2-way traffic.

I will be setting up, for now, 13 of these devices and need to know how to reliably have the tunnel come up and pass traffic both ways. Any assistance would be appreciated.

Philip D'Ath · 04-25-2016

We need to see the 819 configuration and the ASA 5520 configuration.

timsnover · 04-25-2016

Thank you for your reply. I have attached the running configs for both devices.

Also, just to let you know, I have had this open as a TAC case with Cisco. When the engineer was on a WebEx with me, he questioned why we have 3 separate crypto maps configured on the 819 router. We have three ASA 5520s (A, B, and C), and if we need to move the tunnel from one ASA to another, we just change the active crypto map on the interface that is connected to the WAN link. The engineer said he had never seen a setup like that, but he was able to get 2-way traffic going by issuing the "clear crypto session" and "clear crypto sa" commands on the 819 router. However, he never said, or figured out, what was causing the problem in the first place. That is what I am trying to determine, so that we don't need to log into the router's WAN IP to issue the "clear" commands to make it operational, especially since that doesn't always work anyway (it isn't working right now). We have this issue on multiple devices, on both 4G and Comcast WAN connections. I am using this one device, which is not currently in production, to find out how to fix all of our VPN connections.

Please let me know if you need to see any other diagnostic info and I appreciate any help you can provide. Thank you.

Philip D'Ath · 04-25-2016

With regard to the 819s, did you know a crypto map can have multiple peer IP addresses? The additional addresses act as backups for the primary.

See the example below. You can also mark one peer as the "default", which the router will treat as the "primary" VPN concentrator.

crypto map mymapA 10 ipsec-isakmp
 set peer 198.101.7.81 default
 set peer 198.101.7.85
 set peer 198.101.7.87
 set transform-set myset
 match address 110

However, you are likely to get intermittent problems at the head end. With your current config, when a tunnel comes up the head end advertises a route to it using reverse route injection (RRI). If the cellular connection then drops and a new VPN is established, the old one is not cleaned up straight away, so the head end now has two paths to the destination to choose from.

I'm guessing this is the problem you experienced, and why it was only solved by manually clearing the crypto session.
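A toy Python model of that failure mode, purely to illustrate the reasoning: it is not how the ASA stores routes internally, and the SPI values are made up. Each active SA installs a reverse route for the spoke's network, so a stale SA leaves a second, dead path until something (such as keepalives) removes it:

```python
from collections import defaultdict

# Hypothetical, simplified model of reverse route injection (RRI) on the head end:
# every active SA for a spoke installs a route to the spoke's protected network.
# SPI values are made up for illustration.
rri_routes = defaultdict(list)  # protected prefix -> list of (peer, SPI) next hops


def sa_up(prefix: str, peer: str, spi: int) -> None:
    rri_routes[prefix].append((peer, spi))


def sa_down(prefix: str, peer: str, spi: int) -> None:
    rri_routes[prefix].remove((peer, spi))


PREFIX = "172.16.47.0/24"  # the 819's protected network from the session output
PEER = "166.157.75.93"     # the 819's WAN address from the session output

sa_up(PREFIX, PEER, 0x1111)  # original tunnel
# The cellular link drops; a new tunnel forms before the old SA is cleaned up.
sa_up(PREFIX, PEER, 0x2222)
print(len(rri_routes[PREFIX]))  # 2 -> two paths, one of them dead

# Once keepalives (dead peer detection) remove the stale SA, one path remains.
sa_down(PREFIX, PEER, 0x1111)
print(rri_routes[PREFIX])  # [('166.157.75.93', 8738)]
```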

On the router, enable ISAKMP keepalives (dead peer detection) to mitigate this. The keepalive interval determines the maximum time the system takes to recover from an issue.

crypto isakmp keepalive 30

You should also enable this on the head end, so both ends can detect and recover from failure cleanly.

tunnel-group a.b.c.d ipsec-attributes
   isakmp keepalive threshold 30
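As a rough worked example of how the keepalive interval bounds recovery time, here is a Python sketch. It assumes a retry interval of 2 seconds (which I believe is the IOS default when "crypto isakmp keepalive" is given no retry value) and 5 retries before the peer is declared dead; the retry count is an assumption, so check the documentation for your release. With a 30-second keepalive the result stays inside a wait of twice the keepalive time (60 seconds):

```python
def worst_case_detection_s(keepalive_s: int, retry_s: int = 2, retries: int = 5) -> int:
    """Rough upper bound (seconds) before a silent peer is declared dead.

    ASSUMPTION: one full keepalive interval passes with no response, then `retries`
    retransmissions are sent `retry_s` seconds apart. The retry count of 5 is an
    assumption, not something stated in this thread.
    """
    return keepalive_s + retries * retry_s


print(worst_case_detection_s(30))  # 40
```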

timsnover · 04-26-2016

Thank you for your replies. I have added the keepalive commands on both the router and the ASA, but that has not resolved the issue. If I move the tunnel to a different ASA, it forms and works fine until the router reboots. Then, once again, the tunnel forms but does not pass 2-way traffic.

As for the suggestion to move to a pair of 4000 routers, I have brought this up with my manager and we are going to look into this. However, that won't happen for a while, so I need to try and get these working properly on the ASA.

I have opened up a new TAC case with Cisco to see if I can get them on another WebEx to take a look.

Philip D'Ath · 04-26-2016

Did you wait at least twice the keepalive time that had been set?

What version software are you using on the 819 and what version on the ASA?

timsnover · 05-11-2016

I'm not sure whether you were notified of my latest reply, since it was addressed to a different person. In case you weren't, please see what I just sent to Nathaniel.

timsnover · 05-12-2016

We have resolved this issue. We changed a setting on the Sierra Wireless 4G device so that it uses private IP addresses on its Ethernet port, which gave the Gig 0 interface a private 192.168.x.x address via DHCP. Previously, on the recommendation of the vendor we purchased through, the 4G device was set up in "Pass through" mode, so it simply passed the Verizon public IP address through to the Gig 0 interface on the router. Once we made this change, everything started working fine on the 3 devices that had the issue, as well as on the ones that were already working the other way.
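Nobody in the thread pinned down why this change fixed it. One plausible explanation, offered here only as an assumption: with a private address on Gig 0, the 4G device is doing NAT, so IKE detects the NAT and (if NAT Traversal is enabled on both ends) the peers carry ESP inside UDP port 4500 instead of sending raw ESP (IP protocol 50, as in the invalid-SPI log in this thread). A hypothetical Python helper expressing that assumption:

```python
import ipaddress


def likely_encapsulation(wan_address: str) -> str:
    """Hypothetical helper: guess how ESP leaves the router for a given WAN address.

    ASSUMPTION: a private address on Gig 0 means the 4G device is doing NAT, so IKE
    detects it and both peers switch to NAT Traversal (ESP carried in UDP port 4500).
    A public address means raw ESP, IP protocol 50.
    """
    if ipaddress.ip_address(wan_address).is_private:
        return "ESP in UDP/4500 (NAT-T)"
    return "ESP (IP protocol 50)"


print(likely_encapsulation("166.157.75.93"))  # ESP (IP protocol 50)
print(likely_encapsulation("192.168.1.10"))   # ESP in UDP/4500 (NAT-T)
```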

Thank you for taking the time to provide me the assistance you did, it was greatly appreciated.

Nathaniel Wood · 05-10-2016

It sounds like you are hitting a bug that I have seen on a couple of ASAs. I can't remember the bug ID, but to sum it up, the inbound/outbound packets over the IPsec tunnel would use different SPIs (Security Parameter Indexes) than the ones they should actually be assigned. Clearing the crypto sessions temporarily forces new SAs with new SPIs to be negotiated, and traffic then passes as expected, but eventually the problem surfaces again.
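To make the SPI point concrete: the receiving side looks up each inbound ESP packet's SA by its SPI (classically together with the destination address and protocol), and if no SA matches, the packet is dropped. A toy Python model, where the "current" SPI is made up and 0xB7A4AA96 is the SPI from the log message posted in this thread:

```python
# Toy model of an inbound IPsec SA database (SADB). Inbound ESP packets are matched
# to an SA by (destination address, SPI, protocol). The "current" SPI is made up;
# 0xB7A4AA96 is the SPI from the invalid-SPI log message in this thread.
sadb = {("166.157.75.94", 0x5A5A5A5A, 50): "current inbound SA"}


def decapsulate(dst: str, spi: int, proto: int) -> str:
    sa = sadb.get((dst, spi, proto))
    if sa is None:
        # On IOS this is the case that produces %CRYPTO-4-RECVD_PKT_INV_SPI.
        return "drop: invalid SPI"
    return f"decrypt with {sa}"


# The peer keeps sending with an SPI from an SA the router no longer has.
print(decapsulate("166.157.75.94", 0xB7A4AA96, 50))  # drop: invalid SPI
print(decapsulate("166.157.75.94", 0x5A5A5A5A, 50))  # decrypt with current inbound SA
```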

If it were me, I would look for a change window to upgrade the ASA(s), not only to fix this issue but also to patch a fairly major IKE buffer overflow bug that has been found. Your fixed code is 8.4(7.30) at this point, and the advisory for that bug is here:

https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20160210-asa-ike

Hope this helps!

timsnover · 05-11-2016

Nathaniel, sorry for taking a while to reply to your post, but I have been working on another project that took priority over this one. Since the last comments I exchanged with Philip 15 days ago, I have had a TAC case open with Cisco and have spent around 10 hours on the phone with them troubleshooting this. We have 13 of these routers, 9 of which are up and in use. Of those 9, 3 have the following issue: if the router reboots while the tunnel to the ASA is up, or the 4G connection drops and the tunnel goes down, then when the systems come back up and exhibit the problem, the log shows the following:

May 11 04:03:26.953: %CRYPTO-4-RECVD_PKT_INV_SPI: decaps: rec'd IPSEC packet has invalid spi for destaddr=166.157.75.94, prot=50, spi=0xB7A4AA96(3081022102), srcaddr=166.157.75.1, input interface=GigabitEthernet0
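If you want to tally these messages across the 3 affected routers (for example, to confirm the invalid-SPI packets always come from 166.157.75.1), a small parser helps. This Python sketch is written against the single log line above only, so treat the pattern as an assumption for other IOS releases:

```python
import re

LOG_LINE = (
    "May 11 04:03:26.953: %CRYPTO-4-RECVD_PKT_INV_SPI: decaps: rec'd IPSEC packet "
    "has invalid spi for destaddr=166.157.75.94, prot=50, spi=0xB7A4AA96(3081022102), "
    "srcaddr=166.157.75.1, input interface=GigabitEthernet0"
)

# Pattern written against the single message above; other releases may differ.
INV_SPI_RE = re.compile(
    r"RECVD_PKT_INV_SPI.*?destaddr=(?P<dst>[\d.]+), prot=(?P<prot>\d+), "
    r"spi=0x(?P<spi_hex>[0-9A-Fa-f]+)\((?P<spi_dec>\d+)\), srcaddr=(?P<src>[\d.]+)"
)


def parse_invalid_spi(line: str):
    """Return dst/src/prot/spi from an invalid-SPI log line, or None if it doesn't match."""
    m = INV_SPI_RE.search(line)
    if m is None:
        return None
    spi = int(m["spi_hex"], 16)
    if spi != int(m["spi_dec"]):
        raise ValueError("hex and decimal SPI values disagree")
    return {"dst": m["dst"], "src": m["src"], "prot": int(m["prot"]), "spi": spi}


print(parse_invalid_spi(LOG_LINE))
# {'dst': '166.157.75.94', 'src': '166.157.75.1', 'prot': 50, 'spi': 3081022102}
```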

So there is the SPI you referenced. The Cisco engineer asked me to find out from Verizon what default gateway and network mask we should be using with the static IP address they provided. They responded that, since they use PPP for these addresses, they don't use a default gateway or mask. I have sent another message to Verizon asking what 166.157.75.1 is; that is the source address sending the IPsec packets with the invalid SPI. Since the 3 routers having the problem are all 166.157.75.x (.93, .94, and .95), that .1 address looks to me like the gateway address for a /24 network, 166.157.75.0. They have yet to respond. The confusing part is that we have another device in service using 166.157.75.99 that has no problems at all, and I have tested 2 others, .96 and .97, which also have no issues. It's just the 3 mentioned above that get the invalid SPI packets if they lose their tunnel and then reconnect. And just so you know, when they are getting the "invalid SPI" message, I can fix them simply by moving them from one of our VPN ASAs to another (we have 3).
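A quick Python check of the /24 guess (an assumption; Verizon said there is no mask) shows that the failing routers, the working routers, and the .1 source all fall inside 166.157.75.0/24, so the subnet by itself does not separate the failing devices from the working ones:

```python
import ipaddress

# ASSUMPTION: 166.157.75.0/24 is only a guess; Verizon said there is no mask.
GUESSED_NET = ipaddress.ip_network("166.157.75.0/24")
FAILING = ["166.157.75.93", "166.157.75.94", "166.157.75.95"]
WORKING = ["166.157.75.96", "166.157.75.97", "166.157.75.99"]
INV_SPI_SOURCE = "166.157.75.1"


def in_guessed_net(addr: str) -> bool:
    return ipaddress.ip_address(addr) in GUESSED_NET


print(all(in_guessed_net(a) for a in FAILING))  # True
print(all(in_guessed_net(a) for a in WORKING))  # True
print(in_guessed_net(INV_SPI_SOURCE))           # True
```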

As to your mention of upgrading the ASA code: we first experienced this problem when our VPN-C, which is non-production, had the latest recommended code and our 2 production ASAs, VPN-A and VPN-B, were running the 8.4 code. I have since upgraded both VPN-C and VPN-B to the latest recommended code, 9.1(7)6. With the new code on 2 of the ASAs and the old code, 8.4(7)23, on the third, the behavior of these 3 routers is the same regardless of which ASA they are connected to.

Philip D'Ath · 05-11-2016

This is starting to sound like a service provider issue to me.

If you are really bored, an interesting test would be to swap a working 819HW with one that has the issue, and see whether the problem follows the device or the circuit.

I'm going to bet it follows the circuit.

Philip D'Ath · 04-25-2016

On a different note, and one you probably won't want to hear: I would use routers, not firewalls, as the head ends for a network like this.

If you did that then you could use DMVPN, which handles so many things automatically. You can also then use dynamic IP addresses for your spokes (much easier to get from a cellular carrier).

DMVPN allows you to do more complex things automatically, such as setting up a mobile command post (or a secondary DR site in case the primary is not usable) which has full and automatic connectivity to everything.

In your case, a pair of 4000 series routers would probably make a great redundant head end. You would want routers with the "AX" feature set. If you deploy them, make sure the head ends have public IP addresses on them with no NAT (this makes it ultra reliable).

http://www.cisco.com/c/en/us/products/routers/4000-series-integrated-services-routers-isr/models-comparison.html