Cisco Expressway E "Cluster is in a Partitioned State"

fdharmawan

Hi Community Members,

Today I realized that the Expressway E in our deployment is in a partitioned state. The home page shows this message: "This Expressway cluster is in a partitioned state. Do not make any configuration changes until the cluster is operating normally. See Clustering page."

In my deployment, I have 1 publisher and 1 subscriber. On the Clustering page, the peer state on both nodes is "Clustering: Failed (Connection unexpectedly lost)". On the Alarms page, I see several messages:

  1. Unable to establish a TCP connection with "expressway-pub" on ports 4371,4372. I checked with Test-NetConnection in PowerShell: both nodes are reachable on port 4371 but not on 4372. Should I worry about this?
  2. The database is unable to replicate with one or more of the cluster peers. 
  3. An unexpected software error was detected in python[30586]: unknown reason. I checked Maintenance > Diagnostics > Incident reporting > View, and it showed the following:
    File "/share/python/site-packages/ni/utils/filesystem/monitor.py", line 99, in process_file_change
    File "/share/python/site-packages/ni/utils/filesystem/monitor.py", line 115, in _notify_observer
    File "/lib64/python2.7/threading.py", line 736, in start
    error: can't start new thread
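For readers without PowerShell, a rough equivalent of the Test-NetConnection check in point 1 can be sketched in Python. This is a minimal sketch, not a Cisco tool: `check_tcp_ports` is a hypothetical helper name, and a successful connection only shows that something accepts TCP on that port along the network path you test from. It does not show that clustering on that port is healthy.

```python
import socket


def check_tcp_ports(host, ports, timeout=3.0):
    """Return {port: bool}: True if a TCP handshake to host:port completes."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the name and completes the TCP handshake;
            # the socket is closed again as soon as the with-block exits.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, unreachable hosts
            # and name-resolution failures.
            results[port] = False
    return results
```

For example, `check_tcp_ports("expressway-pub", [4371, 4372])` returns one True/False entry per port; replace `"expressway-pub"` with the peer address you want to test, and keep in mind the result only reflects the path from the machine running the script.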

These messages did not all appear on the same day; they occurred over the last couple of months. The last change I made to the system was the upgrade to X12.5.4 in May 2022, and the earliest log message dates from June 2022.

I am about to update the SSL certificate on the Expressway, but given the warning, I am afraid to change the current configuration. Do you have any suggestions?

1 Accepted Solution

The first thing I would recommend is to restart all the nodes in your Expressway E cluster, starting with the master node and then the rest one by one. About the certificate: has it expired, or is it still valid? If it has expired, then depending on your cluster configuration, it could prevent the cluster from forming and reaching an operational state. If that is the case, I would suggest that you renew it.

Hi Roger,

The certificate is still valid, but it expires next month, so I am currently assessing the existing deployment.

So, based on your suggestion, I need to restart the nodes first, starting with the publisher. Is that correct?

That’s what I would do. Please note that the first node in an Expressway cluster is not called the publisher; if I’m not mistaken, the terminology is master or first node.

Hi Roger,

How do I check which node is acting as master? I do not see any information showing that. I did some googling as well, but the results were not helpful.

On the Clustering page you define which node is the master node. The standard setting for this is “1”.

Hi Roger,

On both nodes, the current value is set to '1'. So in this case, are both acting as master?

No. All nodes in the Expressway cluster will have the same number set for the master node. If you had a different number on each node, you would not have a coherent view of which node is the cluster master.
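The reasoning above can be illustrated with a small consistency check. This is a hypothetical sketch, assuming that the master value on the Clustering page is a 1-based index into the ordered list of peer addresses, and that this list is the same on every node; the dictionary keys `configuration_master` and `peers` and the function `resolve_config_master` are made-up names, not an Expressway API. Under that assumption, a value of "1" on both nodes means both agree that the first peer in the list is the master, not that both nodes are masters.

```python
def resolve_config_master(node_configs):
    """Return the address of the master node, or raise ValueError if nodes disagree.

    node_configs maps each node's name to a dict with two hypothetical keys:
      "configuration_master": the 1-based peer number set on that node
      "peers": the ordered list of peer addresses set on that node
    """
    # Every node must point at the same master number.
    masters = {cfg["configuration_master"] for cfg in node_configs.values()}
    if len(masters) != 1:
        raise ValueError(f"nodes disagree on the master: {sorted(masters)}")

    # Assumption: the peer list is identical (same order) on every node.
    peer_lists = {tuple(cfg["peers"]) for cfg in node_configs.values()}
    if len(peer_lists) != 1:
        raise ValueError("nodes disagree on the peer address list")

    master = masters.pop()
    peers = peer_lists.pop()
    if not 1 <= master <= len(peers):
        raise ValueError(f"master number {master} has no matching peer entry")
    # Convert the 1-based master number into a list index.
    return peers[master - 1]
```

With two nodes that both show "1" and list the same peers, the function returns the first peer's address: one master, shared by everyone.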

Regarding point number, if I remember correctly, there is a bug.

What version of Expressway are you using? If you have doubts about the certificates, you can set the cluster to Permissive.

Hi Nithin,

I'm currently running on X12.5.4.

If I change the setting to Permissive, what changes should I expect? From the online resources I have read, my conclusion is that it makes the cluster communication less secure.

Permissive is definitely not as secure as Enforced. With Enforced, the name of the node needs to be present in the SAN of the certificate; otherwise the cluster will not form. With Permissive, the cluster can form even if the name of the node isn’t listed in the certificate. The usual recommendation is to get the cluster to form in Permissive mode before you set it to Enforced.
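The difference between the two modes described above can be modelled as a simple name check. This is a deliberately simplified sketch: `cluster_peer_accepted` and the example hostnames are hypothetical, and real certificate verification also involves the trust chain, validity dates and wildcard handling, none of which is modelled here.

```python
def cluster_peer_accepted(peer_fqdn, san_dns_names, mode):
    """Simplified model of the two cluster certificate modes.

    Enforced:   the peer's name must appear in the certificate's SAN list.
    Permissive: the cluster may form even if the name is missing.
    Chain of trust, expiry and wildcard SANs are NOT checked here.
    """
    if mode == "Permissive":
        return True
    if mode != "Enforced":
        raise ValueError(f"unknown mode: {mode!r}")
    # DNS names compare case-insensitively; ignore a trailing root dot.
    wanted = peer_fqdn.rstrip(".").lower()
    return any(name.rstrip(".").lower() == wanted for name in san_dns_names)
```

This is why getting the cluster to form in Permissive first is useful: if it forms there but not under Enforced, a missing or mismatched SAN entry is a likely suspect.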

fdharmawan

Restarting all the nodes solved the issue. Thank you, Roger.