01-28-2020 07:37 AM - edited 01-28-2020 08:03 AM
Does anyone have more information about this bug? There are no usable notes and no workaround notes. It is marked as fixed, but the Known Fixed Releases section lists no fixed versions.
I am seeing this alert, and getting a flood of emails about it, after restarting the services on the active member. The secondary member took over, but I have received this critical alert over 400 times in the last 24 hours. I am currently running CMX 10.6.2-66.
04-23-2020 06:29 AM
There is now a documented fix listed in 10.6(2.72). However, I installed that yesterday and got my servers paired up in HA, and within 5.5 hours it broke again with this error:
Wed Apr 22 2020 4:21:44 PM | Primary Active | Successfully enabled high availability. Primary is syncing with secondary. |
Wed Apr 22 2020 9:52:25 PM | Primary Active | Redis check failed for master. Attempt to restart redis |
Wed Apr 22 2020 9:52:27 PM | Primary Active | Redis check failed for master even after a restart of the agent |
Wed Apr 22 2020 9:52:28 PM | Primary Failover Invoked | Attempting to failover to secondary. Reason: Redis check get writeable failed for port: 6383Redis check get writeable failed for port: 6383 |
Then it successfully failed over to the secondary server.
04-23-2020 09:16 AM
04-23-2020 11:17 AM
They have a section in the release notes you may want to try:
Tip To clean up long queues and long running processes, we recommend that you schedule a full restart of Cisco CMX once a month during a low activity time, such as late at night or early in the morning. You either can manually restart Cisco CMX or can apply the root patch and create a scheduled CRON job to restart Cisco CMX. The restart takes approximately 5 minutes to complete.
To restart Cisco CMX services, follow these steps:
1. Enter the cmxctl stop -a command.
2. Enter the cmxctl start -a command.
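For the scheduled variant the tip mentions, a minimal sketch of a wrapper script that cron could call is below. It only chains the two commands from the release notes. The script path in the crontab comment, the `--run` switch, and the `CMXCTL` variable are my own placeholders, not anything from Cisco's documentation, and it assumes the root patch is installed so root's crontab is available.

```shell
#!/bin/sh
# Hypothetical wrapper for a monthly CMX restart.
# CMXCTL is an assumed override so the binary location can be adjusted;
# by default it relies on cmxctl being on PATH.
CMXCTL="${CMXCTL:-cmxctl}"

# Stop all CMX services, then start them again only if the stop succeeded.
restart_cmx() {
  "$CMXCTL" stop -a && "$CMXCTL" start -a
}

# Only restart when explicitly asked, so sourcing this file is harmless.
if [ "${1:-}" = "--run" ]; then
  restart_cmx
fi

# Example root crontab entry (hypothetical script path), 03:00 on the 1st
# of every month:
# 0 3 1 * * /root/restart_cmx.sh --run
```

Cron runs with a minimal PATH, so setting `CMXCTL` to the full path of the binary on your appliance is probably safer than relying on the default.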
I don't see how that will help given that mine didn't last longer than 6 hours before it failed over. I've had to rebuild my appliances at least 4 times due to issues like the following:
1. Hard disk size doesn't match up - HA pairing failed. This happened even after I confirmed with TAC that the sizing was identical.
2. Can't log in via SSH with any username/password. Authentication fails when the secondary has been active for an extended period of time.
3. Some appliances failed HA pairing until I swapped the roles between primary and secondary.
4. API becomes inaccessible after a period of time.
I used to think the MSE was really bad, but these appliances take the issues to another level of bad.
05-19-2020 08:03 AM - edited 05-19-2020 08:05 AM
I have an update on this. I had a TAC case with Cisco to determine why we were having these issues. The engineer mentioned that they have seen performance issues when the unique client count approaches 90,000. We had between 120,000 and 140,000 unique clients. You can run the following to see what your unique client counts are.
To determine the number of unique clients, SSH to the active box as cmxadmin:
shell
su
cd /opt/cmx/var/log/location
grep -i "unique device" server* | grep 2020-05 (Note: Grep for whatever dates you are looking for. The example shows May 2020.)
You will see an output similar to this:
server-1.log:2020-05-01T05:00:00,001 [pool-64-thread-1] INFO com.cisco.mse.location.intf.ElementCounters - Cleaning up element counts, unique devices 30308, locally administered macs 289 as partof daily midnight job
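If you want just the numbers rather than the full log lines, here is a rough sketch that pulls the date and the unique device count out of each line. It assumes the log format matches the sample line above. The function name `unique_counts` is made up for illustration.

```shell
# Read CMX location log lines on stdin and print "YYYY-MM-DD count"
# for every line that contains a "unique devices N," entry.
# Assumes the line format shown in the sample output above.
unique_counts() {
  sed -n 's/^.*\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)T.*unique devices \([0-9][0-9]*\),.*$/\1 \2/p'
}

# Example usage from /opt/cmx/var/log/location, highest daily count last:
#   grep -i "unique device" server* | unique_counts | sort -k2 -n
```

Sorting on the second column makes it easy to see how close the peak days come to the 90,000 figure TAC mentioned.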
In our case we also raised the minimum number of detecting APs from 1 to 2 and set our RSSI cutoff to -75. We felt okay changing the minimum number of detecting APs to 2 because we have a dense deployment. To change it, run the following from an SSH session on the active box:
shell
su
curl -X POST -H "Content-Type: application/json" -d '{"minapwithvalidrssi":2}' http://localhost/api/config/v1/filteringParams/1
Hopefully this helps.
08-02-2020 10:04 AM
This is extraordinarily good info, thank you very much.
The only problem is:
From Cisco CMX Release 10.5.0, you must install the root patch to access root user account.
And no updates to fix any of these redis related bugs (CSCvs86719, CSCvu10693) since April. I'm getting the feeling Cisco is abandoning CMX for DNA Spaces, which is unfortunate since not all customers want their user location data sent to a cloud service.