CSR HA EEM

lemontree_61089
Level 1

Hi,

I am configuring 2 CSR in HA mode within AWS as described in this doc:

http://www.cisco.com/c/en/us/td/docs/solutions/Hybrid_Cloud/Intercloud/CSR/AWS/CSRAWS/CSRAWS_4.html

The issue I have is AWS never receives the API request from the CSR, even if the logs said it has been correctly scheduled.

CSR1#sh event manager history events
No. Job Id Proc Status Time of Event Event Type Name
1 23 Actv success Thu Mar 3 15:56:24 2016 syslog applet: replace-route2
2 24 Actv success Thu Mar 3 15:56:24 2016 application callback: onep event service init
3 25 Actv success Thu Mar 3 15:56:30 2016 syslog applet: replace-route2
4 26 Actv success Thu Mar 3 15:56:30 2016 application callback: onep event service init
5 27 Actv success Thu Mar 3 15:56:33 2016 syslog applet: replace-route2
6 28 Actv success Thu Mar 3 15:56:33 2016 application callback: onep event service init
7 29 Actv success Thu Mar 3 15:56:37 2016 syslog applet: replace-route2
8 30 Actv success Thu Mar 3 15:56:37 2016 application callback: onep event service init
9 31 Actv success Thu Mar 3 15:56:39 2016 syslog applet: replace-route2
10 32 Actv success Thu Mar 3 15:56:39 2016 application callback: onep event service init

Security group and network ACL in AWS allow HTTP and HTTPS traffic.

I think the issue is coming from the CSR, as I have been able to perform this configuration on a CSR in a different AWS VPC.

Those two CSR (the one working and the one not working) are from the same AMI.

However, for some reason, I cannot figure out what's wrong here.

CSR1#sh run | sec event
event manager environment CIDR 172.25.255.128/28
event manager environment RTB rtb-83e97fe6
event manager environment ENI eni-97b4e0de
event manager environment REGION ap-southeast-2/10.4.240.2
event manager applet replace-route2
event syslog pattern "LINEPROTO-5-UPDOWN: Line protocol on Interface Loopback1"
action 1.0 publish-event sub-system 55 type 55 arg1 "$RTB" arg2 "$CIDR" arg3 "$ENI" arg4 "$REGION"

I am not entirely sure what I should check to make sure whether the CSR is actually sending the request or not, as it is a bit difficult to take a pcap within AWS.

Any help would be greatly appreciated

EDIT: I have finally been able to take a packet capture while causing an API call. In this capture I cannot see HTTP or HTTPS traffic, which leads me to think the CSR is not doing the API request, even if the log says the opposite :(

27 Replies

Yes, I double checked and the IAM role has been attached to the instance following Cisco documentation (http://www.cisco.com/c/en/us/td/docs/solutions/Hybrid_Cloud/Intercloud/CSR/AWS/CSRAWS/CSRAWS_4.html)

I found the issue, the virtual service was configured on the inside interface instead of the outside interface. It works fine after correcting it.

Thanks!

Sorry for replying on old thread.

But can someone help me understand how this will work?

"When triggered, the EEM applet will use the AWS API ec2-replace-route command to modify the VPC route table to make itself the new target for the default route"

I am unable to understand this action. Where is it defined to call the ec2-replace-route API?

action 1.0 publish-event sub-system 55 type 55 arg1 $RTB arg2 $CIDR arg3 $ENI arg4 $REGION

I suppose some magic is happening under the hood, but I want to know how. Any guidance will be appreciated.

Ahmad,

The "how" is kind of a black box given that it is an API call to Amazon.  The details about this "replace-route" call can be found here:

http://docs.aws.amazon.com/cli/latest/reference/ec2/replace-route.html

In the past it was necessary to maintain a Linux instance that the CSR would connect to, and execute this API call, however with the addition of "sub-system 55 type 55" in the CSR it is possible to accomplish this through the CLI in the EEM Applet itself.  Documentation about how to configure the CSR to make this possible can be found here:

http://www.cisco.com/c/en/us/td/docs/solutions/Hybrid_Cloud/Intercloud/CSR/AWS/CSRAWS/CSRAWS_4.html

Hope that helps,

Nick

Hi Nick,

thank you for explaining this.

I am aware of the replace-route AWS API, so my assumption is that "sub-system 55 type 55" has this command configured in it, and it executes automatically?

Can I see what is actually configured within the "sub-system 55 type 55" config?

Or is it something embedded in the code that I can't see? At least show run didn't give any clue of the replace-route API, so I think this API is encapsulated in "sub-system 55 type 55"? Can we view this?
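To make the "under the hood" step concrete, here is an added example (not Cisco's code and not from the linked doc) that reads the applet's environment variables from a "show run | section event" fragment and maps arg1..arg4 onto the EC2 ReplaceRoute call the thread says the CSR issues. It assumes REGION has the shape "<aws-region>/<dns-server>", as the values posted in this thread suggest; the route-table and ENI IDs are constructed placeholders.

```python
import re

# Constructed sample of "show run | section event" output, in the shape the
# applets in this thread use (placeholder IDs, not real AWS resources).
SAMPLE_CONFIG = """\
event manager environment CIDR 172.25.255.128/28
event manager environment RTB rtb-EXAMPLE01
event manager environment ENI eni-EXAMPLE01
event manager environment REGION ap-southeast-2/10.4.240.2
event manager applet replace-route2
 event syslog pattern "LINEPROTO-5-UPDOWN: Line protocol on Interface Loopback1"
 action 1.0 publish-event sub-system 55 type 55 arg1 "$RTB" arg2 "$CIDR" arg3 "$ENI" arg4 "$REGION"
"""

ENV_RE = re.compile(r"^\s*event manager environment (\S+) (\S+)\s*$", re.M)


def parse_eem_environment(config: str) -> dict:
    """Collect the 'event manager environment NAME VALUE' pairs from a config."""
    return {name: value for name, value in ENV_RE.findall(config)}


def replace_route_params(env: dict) -> dict:
    """Map the applet's variables onto ReplaceRoute parameters.

    Assumption: REGION is "<aws-region>/<dns-server>"; only the region part
    is an API parameter, the DNS server is what the CSR needs to resolve
    the EC2 endpoint (the thread's DNS failures hit exactly this step).
    """
    for key in ("RTB", "CIDR", "ENI", "REGION"):
        if key not in env:
            raise ValueError(f"EEM environment variable {key} is missing")
    region, _, dns_server = env["REGION"].partition("/")
    return {
        "region": region,
        "dns_server": dns_server or None,
        "RouteTableId": env["RTB"],          # arg1
        "DestinationCidrBlock": env["CIDR"],  # arg2
        "NetworkInterfaceId": env["ENI"],     # arg3
    }


def replace_route(env: dict, dry_run: bool = True) -> dict:
    """Return the API arguments; with dry_run=False actually call EC2 via boto3.

    The real call needs credentials, e.g. an instance role allowing
    ec2:ReplaceRoute on the route table, the same least-privilege role the
    CSR HA setup relies on.
    """
    params = replace_route_params(env)
    api_args = {k: params[k] for k in ("RouteTableId", "DestinationCidrBlock", "NetworkInterfaceId")}
    if dry_run:
        return api_args
    import boto3  # imported lazily so the dry run works without AWS access
    ec2 = boto3.client("ec2", region_name=params["region"])
    return ec2.replace_route(**api_args)
```

Running replace_route(parse_eem_environment(SAMPLE_CONFIG)) shows the exact parameters the applet's four arguments become, which is the part "show run" does not reveal.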

Hi Team,

It appears I'm running into a similar issue as above. Here's what I'm running:

Cisco IOS XE Software, Version 03.16.04a.S - Extended Support Release
Cisco IOS Software, CSR1000V Software (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 15.5(3)S4a, RELEASE SOFTWARE (fc1)

When simulating a failover on CSR A, CSR B appears to be making a call, but nothing is happening.

CSR-B#sh event manager history events
No. Job Id Proc Status Time of Event Event Type Name
1 1 Actv success Tue May16 15:34:29 2017 syslog applet: replace-route2
2 2 Actv success Tue May16 15:34:29 2017 application callback: onep event service init
3 3 Actv success Tue May16 16:18:07 2017 syslog applet: replace-route2
4 4 Actv success Tue May16 16:18:07 2017 application callback: onep event service init

I tried no activate and activate on the CSR management interface, but that doesn't seem to do anything.

Here's the config I'm running:

event manager environment RTB rtb-8ad
event manager environment ENI eni-eed
event manager environment CIDR 10.100.0.0/16
event manager environment REGION us-west-2/10.250.250.2
event manager applet replace-route2
event syslog pattern "%BFD-6-BFD_SESS_DESTROYED: BFD-SYSLOG: bfd_session_destroyed, ld:4097 neigh proc:EIGRP, handle:1 act"
action 1.0 publish-event sub-system 55 type 55 arg1 "$RTB" arg2 "$CIDR" arg3 "$ENI" arg4 "$REGION"

virtual-service csr_mgmt
ip shared host-interface GigabitEthernet1
activate
!

I'm using the ax license, spun up the CSR with API role, allowing http/https from outside. May have missed a step, any help would be great.

Thanks!

Hi,

Have you checked your DNS settings? Make sure you can resolve FQDN.

Also make sure that the route to your DNS server is using the interface configured under "virtual-service csr_mgmt".

Thomas

Great, thanks Thomas. Set up DNS ip name server pointing to 8.8.8.8 for testing purposes and it works. Will try failover again and reply.

Thanks

Hi Team,

I'm able to resolve FQDN:

CSR-A#ping ec2.amazonaws.com
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 54.239.29.8, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 91/91/91 ms

Simulated another failover but no go. I also added http/https connections from 54.239.29.8.

CSR-B# sh event manager history events
No. Job Id Proc Status Time of Event Event Type Name
1 7 Actv success Tue May16 18:01:04 2017 syslog applet: replace-route2
2 8 Actv success Tue May16 18:01:04 2017 application callback: onep event service init
3 9 Actv success Tue May16 18:31:37 2017 syslog applet: replace-route2
4 10 Actv success Tue May16 18:31:37 2017 application callback: onep event service init
5 11 Actv success Tue May16 19:13:18 2017 syslog applet: replace-route2
6 12 Actv success Tue May16 19:13:18 2017 application callback: onep event service init
7 13 Actv success Tue May16 19:21:23 2017 syslog applet: replace-route2
8 14 Actv success Tue May16 19:21:23 2017 application callback: onep event service init
9 15 Actv success Tue May16 19:22:10 2017 syslog applet: replace-route2
10 16 Actv success Tue May16 19:22:10 2017 application callback: onep event service init

Kind of stuck on this one.

Thanks

I guess next step now is to do a packet capture to see if anything is actually happening when you trigger the failover. You should see DNS and HTTPS traffic.

Also, make sure to configure NTP on the CSR, I had the issue before that the failover was not working anymore because of lack of NTP.

Moreover, double-check your NACL and SG to make sure you allow HTTPS and DNS. Remember that NACLs are stateless.

Thomas

Hello,

It's working now. I enabled DNS resolution on my VPC and added the correct DNS server to the ECM script, I fat fingered an incorrect one. Thanks for your help guys!

Here's a checklist:

create CSR Role and policy and apply to EC2

Follow ECM script (add correct DNS server :) )

Create tunnel and Routing protocol of choice

enable DNS on router

enable DNS on VPC

enable NTP

Allow DNS and HTTP on SG

If I missed anything please add along!

Thanks again Thomas!

Fabian

Hi all,

Thanks a lot for your support.

It works very well in our environment.

Thanks again, it saved us a lot of time!

Enrico

Thomas,

Just wanted to give you a quick update.  The version 15.5(3)S2 (3.16.2.S), which contains the fix for this issue is now posted to AWS.  You can now deploy that and should be able to configure and get HA to work within AWS.

-Nick