06-03-2019 08:59 AM
Hi all
I have configured HA with CSR1000v as per the following guide (using the v2 instructions with the Python scripts in the guest shell)
The problem I have is that the failover does not work correctly
If I reload the primary router, I get the following file created on the secondary router, so it is recognising the failure, but it doesn't appear to actually be doing anything with the info:
[guestshell@guestshell events]$ more event.2019-06-03\ 15\:43\:00.261285
Event type is peerFail
index 100
routeTableName FlexVPN
route 10.0.0.0/8
nextHop 10.192.226.12
resourceGroup <redacted>
mode secondary
subscriptionId <redacted>
cloud azure
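For anyone comparing event files between the two routers, here is a minimal Python sketch that turns a record like the one above into a dictionary. It assumes the file opens with "Event type is <type>" and that everything after it is whitespace-separated key/value pairs, which is only inferred from this one sample. The function name parse_event and the key eventType are made up for illustration and are not part of the Cisco HA package.

```python
def parse_event(text):
    """Parse an HA event record into a dict.

    Assumption (inferred from a single sample, not from Cisco docs):
    the record starts with "Event type is <type>" and the rest is
    whitespace-separated key/value pairs.
    """
    tokens = text.split()
    event = {}
    # Peel off the leading "Event type is <type>" phrase if present.
    if tokens[:3] == ["Event", "type", "is"] and len(tokens) > 3:
        event["eventType"] = tokens[3]  # hypothetical key name
        tokens = tokens[4:]
    # Pair up the remaining tokens as key, value, key, value, ...
    for key, value in zip(tokens[0::2], tokens[1::2]):
        event[key] = value
    return event


sample = ("Event type is peerFail index 100 routeTableName FlexVPN "
          "route 10.0.0.0/8 nextHop 10.192.226.12 resourceGroup <redacted> "
          "mode secondary subscriptionId <redacted> cloud azure")
event = parse_event(sample)
print("%s: route %s via %s" % (event["eventType"], event["route"], event["nextHop"]))
```

This only confirms what the event recorded (route table, route, next hop); it does not show whether the route table update was actually attempted.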
Is there a way to troubleshoot this? I can't find any useful logging
I have checked the permissions of the VMs on the route table object in azure and it seems fine
Thanks in advance
06-26-2019 08:27 AM
I've had this problem after our CSRs had to be moved to a different Availability Zone. I had to rebuild the CSRs, moved the configuration over 1 for 1, but the HA piece never seemed to kick off.
Everything looked like it was running (azure-ha, waagent and auth-token), but it looks like files and permissions were messed up in the guestshell:
Broken Guestshell (under the /home/guestshell/.local/lib/python2.7/site-packages/csr_azure_ha/client_api folder):
Working Guestshell (same folder):
Notice the additional files and the permissions are all set to the guestshell user.
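If you want to check this without eyeballing a directory listing, a rough Python sketch along these lines counts the files under that folder and lists anything not owned by the user running it. The helper name audit_tree is made up for illustration. It assumes you run it inside the guestshell as the guestshell user, and I don't have an official file count to compare against, so compare the number with a known-good CSR.

```python
import os


def audit_tree(root, expected_uid):
    """Return (file_count, paths not owned by expected_uid) under root."""
    count = 0
    wrong_owner = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Check ownership of both directories and files at this level.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != expected_uid:
                wrong_owner.append(path)
        count += len(filenames)
    return count, wrong_owner


if __name__ == "__main__":
    # Folder from the post above; adjust if your install differs.
    root = ("/home/guestshell/.local/lib/python2.7/site-packages/"
            "csr_azure_ha/client_api")
    count, wrong_owner = audit_tree(root, os.getuid())
    print("files found: %d" % count)
    for path in wrong_owner:
        print("not owned by current user: %s" % path)
```

A count of zero means the folder is missing or empty, which points to a broken install.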
I found that destroying the guestshell and rebuilding sometimes resolves this. I had to destroy the guestshell a couple of times for one CSR.
Steps:
guestshell destroy
guestshell enable
guestshell
pip install csr_azure_guestshell~=1.1 --user
pip install csr_azure_ha~=1.0 --user
I would then create a node and run the verify command; you should see a lot of additional information (API URLs, etc.).
I wish this side was documented better.
HTH.
-Aaron
07-17-2019 02:49 AM - edited 07-17-2019 02:50 AM
I checked the directory you suggested above and I had a lower than expected number of files too (53, I think, in my case)
I destroyed the guestshell and rebuilt it, configured the HA again, and now it's working. Very strange...
I also upgraded to CSR1000v version 16.09.03 in the meantime (when I had this issue I was using 16.09.02), so maybe that was also part of the solution to the guest shell problems?
Whatever the cause, it works now! :)
Many thanks for your help