CSR HA EEM

lemontree_61089
Level 1

Hi,

I am configuring two CSRs in HA mode within AWS, as described in this doc:

http://www.cisco.com/c/en/us/td/docs/solutions/Hybrid_Cloud/Intercloud/CSR/AWS/CSRAWS/CSRAWS_4.html

The issue is that AWS never receives the API request from the CSR, even though the logs say it was scheduled correctly.

CSR1#sh event manager history events
No. Job Id Proc Status Time of Event Event Type Name
1 23 Actv success Thu Mar 3 15:56:24 2016 syslog applet: replace-route2
2 24 Actv success Thu Mar 3 15:56:24 2016 application callback: onep event service init
3 25 Actv success Thu Mar 3 15:56:30 2016 syslog applet: replace-route2
4 26 Actv success Thu Mar 3 15:56:30 2016 application callback: onep event service init
5 27 Actv success Thu Mar 3 15:56:33 2016 syslog applet: replace-route2
6 28 Actv success Thu Mar 3 15:56:33 2016 application callback: onep event service init
7 29 Actv success Thu Mar 3 15:56:37 2016 syslog applet: replace-route2
8 30 Actv success Thu Mar 3 15:56:37 2016 application callback: onep event service init
9 31 Actv success Thu Mar 3 15:56:39 2016 syslog applet: replace-route2
10 32 Actv success Thu Mar 3 15:56:39 2016 application callback: onep event service init

The security group and network ACL in AWS allow HTTP and HTTPS traffic.

I think the issue is coming from the CSR, as I have been able to perform this configuration on a CSR in a different AWS VPC.

Both CSRs (the working one and the failing one) were launched from the same AMI.

However, for some reason, I cannot figure out what's wrong here.

CSR1#sh run | sec event
event manager environment CIDR 172.25.255.128/28
event manager environment RTB rtb-83e97fe6
event manager environment ENI eni-97b4e0de
event manager environment REGION ap-southeast-2/10.4.240.2
event manager applet replace-route2
event syslog pattern "LINEPROTO-5-UPDOWN: Line protocol on Interface Loopback1"
action 1.0 publish-event sub-system 55 type 55 arg1 "$RTB" arg2 "$CIDR" arg3 "$ENI" arg4 "$REGION"

I am not entirely sure what I should check to confirm whether the CSR is actually sending the request, as it is a bit difficult to take a pcap within AWS.

Any help would be greatly appreciated.

EDIT: I have finally been able to take a packet capture while triggering an API call. The capture shows no HTTP or HTTPS traffic, which makes me think the CSR is not sending the API request, even though the log says the opposite :(

27 Replies

Nicholas Oliver
Cisco Employee

Thomas,

What is the version of CSR code you are running?  There was a bug with the API calls not occurring correctly.  The bug ID is:

CSCuw35757 - CSR1000V incorrect API call to AWS

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCuw35757/?referring_site=bugquickviewredir

This is resolved in 15.5(3)S2 and 15.6(1)S and later releases.  

As for taking a pcap, it is difficult within the AWS infrastructure; however, you can do it with the IOS-XE based Embedded Packet Capture:

http://www.cisco.com/c/en/us/td/docs/ios-xml/ios/epc/configuration/xe-3s/asr1000/epc-xe-3s-asr1000-book/nm-packet-capture-xe.html

It may not be necessary in this case, but for future reference this is a great tool to use.
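As a rough sketch of what such a capture could look like for this problem: the ACL name AWS-API and the capture name CAP are made-up placeholders, and GigabitEthernet1 is assumed to be the interface facing the VPC. The ACL matches DNS and HTTPS in both directions, since the API call should be a DNS lookup followed by an HTTPS request.

```
! Config mode: hypothetical ACL matching DNS and HTTPS, both directions
ip access-list extended AWS-API
 permit udp any any eq domain
 permit udp any eq domain any
 permit tcp any any eq 443
 permit tcp any eq 443 any
!
! Exec mode: hypothetical capture CAP on the assumed VPC-facing interface
monitor capture CAP interface GigabitEthernet1 both
monitor capture CAP access-list AWS-API
monitor capture CAP start
! ... trigger the EEM applet here ...
monitor capture CAP stop
show monitor capture CAP buffer brief
```

If only DNS packets show up and no TCP/443 follows, the request is most likely being dropped inside the CSR rather than in the AWS network.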

-Nick

Thanks for your answer Nick.

The firmware version I use is 15.5(3)S1a (that's the one provided by AWS), so I could be hitting that bug. I am going to check if I can upgrade it.

Do you know if the only way to get the new IOS version is to get it from Cisco? Or can we get it from AWS?

Thanks again,

Thomas

Thomas,

The release that contains this fix is 3.16.2, which is not yet available on Amazon. It should be published shortly; we're just waiting on the final stages of their approval process. Right now, if you deploy a new instance on Amazon you will see 3.16.1a.S (this is what you have deployed). Once this changes to 3.16.2.S, you can deploy an instance of that version and it will have the fix for CSCuw35757.

-Nick

I tested it today, but it still failed:

Test-vPC-A#show version | incl 16
Cisco IOS XE Software, Version 03.16.02.S - Extended Support Release
Copyright (c) 1986-2016 by Cisco Systems, Inc.
Compiled Tue 09-Feb-16 07:03 by mcpre
Cisco IOS-XE software, Copyright (c) 2005-2016 by cisco Systems, Inc.
Test-vPC-A#

Test-vPC-A#show event manager history events
No. Job Id Proc Status Time of Event Event Type Name
1 5 Actv success Sat Mar26 21:40:10 2016 syslog applet: replace-route2
2 6 Actv success Sat Mar26 21:40:10 2016 application callback: onep event service init
3 7 Actv success Sat Mar26 21:41:35 2016 syslog applet: replace-route2
4 8 Actv success Sat Mar26 21:41:35 2016 application callback: onep event service init
5 9 Actv success Sat Mar26 21:42:56 2016 syslog applet: replace-route2
6 10 Actv success Sat Mar26 21:42:56 2016 application callback: onep event service init
7 11 Actv success Sat Mar26 21:43:40 2016 syslog applet: replace-route2
8 12 Actv success Sat Mar26 21:43:40 2016 application callback: onep event service init
9 13 Actv success Sat Mar26 21:49:22 2016 syslog applet: replace-route2
10 14 Actv success Sat Mar26 21:49:22 2016 application callback: onep event service init
Test-vPC-A#
*Mar 26 21:50:22.384: %BFD-6-BFD_SESS_DESTROYED: BFD-SYSLOG: bfd_session_destroyed, ld:4097 neigh proc:CEF, handle:1 act
Test-vPC-A#conf t
Enter configuration commands, one per line. End with CNTL/Z.
Test-vPC-A(config)#event manager environment ENI eni-dac174b1
Test-vPC-A(config)#
*Mar 26 21:50:57.337: %DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 1.1.1.2 (Tunnel0) is up: new adjacency
*Mar 26 21:50:57.355: %BFD-6-BFD_SESS_CREATED: BFD-SYSLOG: bfd_session_created, neigh 1.1.1.2 proc:EIGRP, idb:Tunnel0 handle:1 act
*Mar 26 21:50:58.282: %BFDFSM-6-BFD_SESS_UP: BFD-SYSLOG: BFD session ld:4097 handle:1 is going UP
Test-vPC-A(config)#
*Mar 26 21:51:03.473: %BFDFSM-6-BFD_SESS_DOWN: BFD-SYSLOG: BFD session ld:4097 handle:1,is going Down Reason: ECHO FAILURE
*Mar 26 21:51:03.473: %BFD-6-BFD_SESS_DESTROYED: BFD-SYSLOG: bfd_session_destroyed, ld:4097 neigh proc:EIGRP, handle:1 act
*Mar 26 21:51:03.475: %DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 1.1.1.2 (Tunnel0) is down: BFD peer down notified
*Mar 26 21:51:03.475: %HA_EM-6-LOG: replace-route2: Will change route now
*Mar 26 21:52:03.474: %BFD-6-BFD_SESS_DESTROYED: BFD-SYSLOG: bfd_session_destroyed, ld:4097 neigh proc:CEF, handle:1 act
Test-vPC-A(config)#
Test-vPC-A(config)#
Test-vPC-A(config)#

EEM applet:

!
event manager environment RTB rtb-824519XX
event manager environment CIDR 0.0.0.0/0
event manager environment REGION eu-central-1/192.168.0.2
event manager environment ENI eni-dac174XX
event manager applet replace-route2
event syslog pattern "\(Tunnel0\) is down: BFD peer down notified"
action 0.5 syslog msg "Will change route now"
action 2.0 publish-event sub-system 55 type 55 arg1 "$RTB" arg2 "$CIDR" arg3 "$ENI" arg4 "$REGION"
!

The IAM role is attached, and the CSR security group allows HTTP/HTTPS.

I redeployed all routers with a recent AMI, and the API call works now.

I am also having issues getting this to work. We have deployed this in the same eu-central-1 region and are using the latest Marketplace AMI:

Cisco IOS XE Software, Version 03.16.02.S - Extended Support Release
Cisco IOS Software, CSR1000V Software (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 15.5(3)S2, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2016 by Cisco Systems, Inc.
Compiled Tue 09-Feb-16 07:03 by mcpre

I have configured the EEM applet so that it can be run manually:

event manager environment RTB rtb-xxxxxx
event manager environment CIDR 0.0.0.0/0
event manager environment REGION eu-central-1/10.1.0.2
event manager environment ENI eni-xxxxxxx
event manager applet replace-route2
event none
action 1.0 syslog msg "Will change route now"
action 2.0 publish-event sub-system 55 type 55 arg1 "$RTB" arg2 "$CIDR" arg3 "$ENI" arg4 "$REGION"
!
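For reference, an applet registered with "event none" does not fire on its own and has to be started from exec mode. A minimal sketch of triggering it and then checking the result, assuming the applet name replace-route2 from the configuration above:

```
! Exec mode: run the "event none" applet by hand
event manager run replace-route2
!
! Confirm the applet and the onep callback were both recorded
show event manager history events
```

Note that a "success" status in the history only shows that the event was published, not that AWS accepted the call, which is why a packet capture is still useful.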

I performed a packet capture on the CSR. You can see DNS queries go out to 10.1.0.2 for ec2.eu-central-1.amazonaws.com, but then nothing else happens. I would expect an HTTPS call to follow, invoking the replace route API. We never see this happen.

Does anyone have any ideas on how to troubleshoot on the CSR for this?  I have turned on EEM and onep debugs but there isn't a whole lot there.

Hi all,

We're hitting strange behavior too.

We saw that the API calls work only when the instance is first created. After a shutdown/restart, API calls stop working. We have to terminate the router and re-create it from a blank AMI to get it working again.

Why?

e-giraudo1 - 

I was reviewing the bug toolkit.  It seems like this might be a good fit for the issue you are experiencing:

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCuy96961

Just a heads up: the workaround given in the bug above fixed our problem:

virtual-service csr_mgmt
 ip shared host-interface GigabitEthernet1
 no activate

<wait wait wait>

virtual-service csr_mgmt
 ip shared host-interface GigabitEthernet1
 activate
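One way to check that the container actually went down and came back up around that wait is to look at its state from exec mode. This is only a sketch, and the exact status strings may differ between releases:

```
! Exec mode: list virtual services and their current state
show virtual-service list
!
! More detail for the management container named in the workaround
show virtual-service detail name csr_mgmt
```

Waiting until csr_mgmt is reported as deactivated before issuing "activate" again is probably safer than a fixed pause.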

I am now seeing a new issue -- when we reload the router, BFD support is unavailable.

router(config)#int t0
router(config-if)#bfd ?
% Unrecognized command

Hello

Regarding the failing API calls: this is a known issue when the CSR is rebooted and is using a DHCP address under the interface instead of a statically assigned address. If you use a statically assigned IP address, the issue goes away. Alternatively, the following EEM script restarts the management container agent, which generates the API calls, after a reboot when using ip dhcp under the interface:

event manager applet restart-csrmgmt
 description "disable/enable csr_mgmt after Gi1 gets DHCP IP address"
 event syslog pattern "%DHCP-6-ADDRESS_ASSIGN: Interface GigabitEthernet1"
 action 1.0 syslog msg "%dCloud-vPodGW: Starting eem restart-csrmgmt"
 action 1.1 cli command "enable"
 action 1.2 cli command "config terminal"
 action 1.3 cli command "virtual-service csr_mgmt"
 action 1.4 cli command "no activate"
 action 1.5 syslog msg "%dCloud-vPodGW: Disabled csr_mgmt and waiting for 10 seconds"
 action 1.6 wait 10
 action 1.7 cli command "activate"
 action 1.8 syslog msg "%dCloud-vPodGW: Enabled csr_mgmt and exiting eem restart-csrmgmt"
 action 1.9 cli command "end"

The missing BFD feature sounds like a licensing issue. Today, BFD requires the AX license.
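A quick way to check this, sketched under the assumption that the router uses the standard CSR 1000V technology-package licensing, is to look at the license level the router booted with and, if needed, raise it. A change of boot level only takes effect after a reload, and the instance must be entitled to the AX package.

```
! Exec mode: show the license level the router is currently running
show version | include License
!
! Config mode: request the AX package on next boot (assumes entitlement)
configure terminal
 license boot level ax
end
! Save the configuration and reload for the new level to apply
```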

It also happens with 15.5(3)S2 and a static IP address assigned in IOS.

You need to stop and start the virtual-service first. :(

Hi all,

I still have the same issue in Version 03.16.02.S unfortunately.

We can see the CSR saying it performed the API call:

CSRSYD1-PROD#sh event manager history events
No. Job Id Proc Status Time of Event Event Type Name
 service init
5 36 Actv success Mon May 9 14:55:02 2016 application callback: onep event service init
6 35 Actv success Mon May 9 14:55:02 2016 syslog applet: replace-route2

But we do not see any API request on the AWS side. The IP addresses are statically configured on the device, and I also tried disabling and re-enabling the virtual management container, but that did not change anything. Same issue on two CSRs running the latest AMI :(

Any help would be greatly appreciated.

Thomas

Are you sure that you attached the IAM role to the CSR instance correctly?