
C9800-40-K9 version 17.9.3 HA issue

cncs-vn
Level 1

Hello!
I'm using two C9800-40-K9 controllers running version 17.9.3. The two devices are connected directly back-to-back through their redundancy ports for HA. I chose to run HA using the RMI+RP option, but last week I encountered a strange problem: when the physical redundancy port (RP) is shut down, the redundancy management interface (RMI) also goes down immediately. The standby WLC then immediately switches to standby-recovery.

As I understand it, the RMI provides dual-active detection and is configured as a virtual interface that serves as a backup path when the physical RP goes down. What is strange is that when the physical RP goes down, the RMI goes down immediately as well.

Can anyone explain this to me? Thanks

9 Replies

Max Jobs
Level 1

Hi buddy..
In Cisco Catalyst 9800 Series Wireless Controllers, the Redundancy Management Interface (RMI) and Redundancy Port (RP) are used together for HA configurations. In this setup, the RMI serves as a virtual management interface that allows communication between the active and standby controllers, while the RP provides physical redundancy for this communication.

When the physical RP port goes down, it should not cause the RMI to immediately shut down. The RMI should remain active on the active controller to maintain communication between the active and standby controllers.

However, it's possible that there may be a misconfiguration or a bug causing this behavior. Here are a few potential reasons why the RMI might be going down when the RP port is shut down:

Misconfiguration: Double-check the configuration of the HA setup, including the RMI and RP settings, to ensure that everything is configured correctly.

Software Bug: There could be a software bug in the specific version of the C9800 software you're using. It's possible that this bug causes the RMI to go down when the RP port is shut down. Consider upgrading to a newer version of the software that may have addressed this issue.

Platform Limitation: Some platforms may have limitations or specific behaviors regarding HA configurations. Check the documentation or consult with Cisco support to determine if there are any platform-specific considerations for your setup.

Interface Dependency: In some cases, the RMI may be dependent on the physical RP port for its operation. If the RP port goes down, it could trigger a failover event or cause the RMI to go down as well. Review the documentation and configuration options to understand any dependencies between these interfaces.

Logs and Diagnostics: Check the logs and diagnostic information on the controllers to see if there are any error messages or events related to the RMI going down when the RP port is shut down. This can provide valuable insight into the root cause of the issue.

If you're unable to resolve the issue with the information provided, consider reaching out to Cisco TAC (Technical Assistance Center) for further assistance. They can help troubleshoot the problem and provide guidance on resolving it.

Thanks for the reply brother
> Misconfiguration: Double-check the configuration of the HA setup, including the RMI and RP settings, to ensure that everything is configured correctly.

The IPs I set are as follows:
Primary WLC:
interface Vlan1
 ip address 10.221.128.115 255.255.255.0 secondary
 ip address 10.221.128.113 255.255.255.0

Secondary WLC:
interface Vlan1
 ip address 10.221.128.116 255.255.255.0 secondary
 ip address 10.221.128.114 255.255.255.0

I have checked many times and found no configuration errors. Can you please help me check the pictures and configuration information again? Thanks bro
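
The addressing above can be sanity-checked offline. The following is a minimal Python sketch, not a Cisco tool: it only confirms that the four addresses from the post share one subnet, are unique, and are usable host addresses. The role labels (`primary_rmi` and so on) are assumptions; in particular, treating each `secondary` address as the RMI address is an inference from the posted config. Passing these checks does not rule out other misconfigurations.

```python
import ipaddress

# Addresses copied from the post. The role labels are assumed names,
# and reading the "secondary" address as the RMI is an inference.
ADDRESSES = {
    "primary_mgmt": "10.221.128.113/24",
    "primary_rmi": "10.221.128.115/24",
    "secondary_mgmt": "10.221.128.114/24",
    "secondary_rmi": "10.221.128.116/24",
}


def check_addressing(addresses):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    interfaces = {name: ipaddress.ip_interface(value) for name, value in addresses.items()}

    # All addresses are expected to sit in the same subnet.
    networks = {iface.network for iface in interfaces.values()}
    if len(networks) != 1:
        problems.append(f"addresses span multiple subnets: {sorted(map(str, networks))}")

    # No address may be used twice.
    ips = [iface.ip for iface in interfaces.values()]
    if len(set(ips)) != len(ips):
        problems.append("duplicate IP address found")

    # Network and broadcast addresses are not usable host addresses.
    for name, iface in interfaces.items():
        if iface.ip in (iface.network.network_address, iface.network.broadcast_address):
            problems.append(f"{name} uses a network or broadcast address")

    return problems


if __name__ == "__main__":
    print(check_addressing(ADDRESSES) or "basic addressing checks passed")
```

For the addresses in the post, `check_addressing(ADDRESSES)` returns an empty list, which is consistent with the addressing itself not being the cause.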



@Max Jobs that looks suspiciously like a ChatGPT style robot answer. Your personal experience and knowledge are welcome here but if you're just regurgitating robot answers which people could have got themselves then that's really not very helpful here!

Rich, thanks for your message. I think we all have a common goal, which is to help each other and improve the knowledge base. And of course respect each other! I don't know why writing a formal text should form in someone's mind that a non-human intelligence must have written it. I hope that the efforts of all of us in this environment will only lead to each other's growth, not discouragement.

Absolutely - are you saying that was not a robot generated answer?

Max Jobs
Level 1

As far as I know, configuration seems to be OK.

What is the version? It's possible that the behavior you're experiencing is due to a software bug in the specific version (17.9.3) of the controller software. Check the release notes for any known issues related to HA, RMI, or RP functionality. If a bug is identified, consider upgrading to a newer software version that addresses the issue.

I'm currently using a C9800-40-K9 on version 17.9.3. I suspect this version has a bug that ties the RMI to the physical RP port, so that when the physical RP port goes down, the RMI goes down at the same time.
I'm not sure if updating the version will solve the problem, and because the system is in production, scheduling maintenance is a big problem.
I think I should open a TAC case and consult the Cisco technical team.

Thank you for your above reply

marce1000
VIP

 

>...Can anyone explain this to me? Thanks
- Have a checkup of the (primary) controller configuration using the CLI command show tech wireless and feed the output into Wireless Config Analyzer.
  Note that the tasks above do not impact a running (HA) controller setup and are safe to execute; this is a 'must do' in all circumstances.

Below are a series of commands useful for HA-SSO troubleshooting; important ones are highlighted:
show redundancy | i ptime|Location|Current Software state|Switchovers
show chassis
show chassis detail
show chassis ha-status local
show chassis ha-status active
show chassis ha-status standby
show chassis rmi
show redundancy
show redundancy history
show redundancy switchover history
show tech wireless redundancy
show redundancy states
show logging process stack_mgr internal to-file bootflash:

 M.



-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
    When the mirror will then always respond to me with ' The only thing that exceeds your brilliance is your beauty! '

Rich R
VIP

It's always best to make sure you are running the TAC recommended code version as per the link below - currently 17.9.4a + APSP8

Have you been through https://www.cisco.com/c/en/us/support/docs/wireless/catalyst-9800-series-wireless-controllers/220277-configure-high-availability-sso-on-catal.html to make sure you've done all the right things in the right order?  Also refer to the best practices guide (link below).

Now just to clarify - you're shutting down the port your vlan 1 is on for your test (and vlan1 is also your RMI port so RMI must logically go down too when vlan 1 is down)?
If so, then that config change will be instantly synchronised to the secondary WLC which then also won't be able to reach the default gateway and then I'd expect it to go into recovery as you've observed.
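
The chain of reasoning above can be written out as a toy model. This is a minimal Python sketch of the hypothesis in this reply only, not of the actual C9800 redundancy state machine. The class, the `vlan1_port_up` flag, and the function names are all hypothetical. It assumes the RMI and the default gateway are both reached only through the port carrying VLAN 1, and that a shutdown on the active WLC is synchronised to the standby.

```python
from dataclasses import dataclass


@dataclass
class Wlc:
    name: str
    vlan1_port_up: bool = True  # hypothetical flag: port carrying VLAN 1 is up

    @property
    def rmi_up(self) -> bool:
        # Assumption: the RMI is an address on Vlan1, so it follows that port.
        return self.vlan1_port_up

    @property
    def gateway_reachable(self) -> bool:
        # Assumption: the default gateway is reached through VLAN 1 only.
        return self.vlan1_port_up


def shut_vlan1_port(active: Wlc, standby: Wlc) -> None:
    """Shut the VLAN 1 port on the active WLC; the change syncs to the standby."""
    active.vlan1_port_up = False
    standby.vlan1_port_up = False


def expected_standby_state(standby: Wlc) -> str:
    """State this reply would expect for the standby, given gateway reachability."""
    return "standby-hot" if standby.gateway_reachable else "standby-recovery"


if __name__ == "__main__":
    active, standby = Wlc("primary"), Wlc("secondary")
    shut_vlan1_port(active, standby)
    print(active.rmi_up, expected_standby_state(standby))
```

If the port that was shut in the test does not carry VLAN 1, this model does not apply and the behaviour needs another explanation.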

Also good to make sure you have the latest chassis firmware installed:
https://www.cisco.com/c/en/us/td/docs/wireless/controller/9800/config-guide/b_upgrade_fpga_c9800.html
The TAC recommended doc also suggests the versions you should be running and recommends latest.
