DNAC Node-2 not joining DNAC Node-1 Cluster

akhil kamalakaran · 07-06-2024

While re-imaging DNAC Node-2 and trying to join it to the DNAC Node-1 cluster, the join fails with the error below. Has anyone seen a similar issue? TAC suggested re-imaging, and we have done that multiple times, but the issue still exists.

Software Version = 2.3.5.5

Hardware Model = 44-core appliance: Cisco part number DN2-HW-APL

Error log:

post_reboot : Fix DNS Nameservers to use node-local DNS for addon nodes
post_reboot : Fix netplan config file to drop un-necessary dns on each nodes

Intra-cluster subnet = 192.168.123.0/24

Node-1 IP = 192.168.123.11

Node-2 IP = 192.168.123.12

Node-3 IP = 192.168.123.13
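As a quick sanity check on the addressing above, here is a minimal Python 3 sketch (standard library only) that confirms each node address falls inside the intra-cluster subnet, is not the network or broadcast address, and is not shared by two nodes. `check_cluster_addressing` is a hypothetical helper, not Cisco tooling; it validates only the plan as written, not live connectivity or the appliance's actual interface configuration.

```python
import ipaddress

# Values copied from the post above; adjust for your own deployment.
INTRA_CLUSTER_SUBNET = "192.168.123.0/24"
NODE_IPS = {
    "node-1": "192.168.123.11",
    "node-2": "192.168.123.12",
    "node-3": "192.168.123.13",
}


def check_cluster_addressing(subnet: str, node_ips: dict) -> list:
    """Hypothetical helper: return a list of problems; an empty list means the plan looks consistent."""
    net = ipaddress.ip_network(subnet, strict=True)
    problems = []
    seen = {}  # maps each address to the first node name that used it
    for name, ip_str in node_ips.items():
        ip = ipaddress.ip_address(ip_str)
        if ip not in net:
            problems.append(f"{name}: {ip} is outside {net}")
        elif ip in (net.network_address, net.broadcast_address):
            problems.append(f"{name}: {ip} is the network or broadcast address of {net}")
        if ip in seen:
            problems.append(f"{name}: {ip} duplicates {seen[ip]}")
        else:
            seen[ip] = name
    return problems


if __name__ == "__main__":
    issues = check_cluster_addressing(INTRA_CLUSTER_SUBNET, NODE_IPS)
    print("addressing OK" if not issues else "\n".join(issues))
```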

ERROR:etcd.client:Request to server https://192.168.123.11:4001 failed: MaxRetryError(u'HTTPSConnectionPool(host=u\'192.168.123.11\', port=4001): Max retries exceeded with url: /v2/keys/maglev/config/node-192.168.123.11?sorted=true&recursive=true (Caused by ReadTimeoutError("HTTPSConnectionPool(host=u\'192.168.123.11\', port=4001): Read timed out. (read timeout=1)",))',)
WARNING:root:[Attempt 3] Connection to etcd failed due to MaxRetryError(u'HTTPSConnectionPool(host=u\'192.168.123.11\', port=4001): Max retries exceeded with url: /v2/keys/maglev/config/node-192.168.123.11?sorted=true&recursive=true (Caused by ReadTimeoutError("HTTPSConnectionPool(host=u\'192.168.123.11\', port=4001): Read timed out. (read timeout=1)",))',). Retrying in 4 seconds...
WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host=u'192.168.123.11', port=4001): Read timed out. (read timeout=1)",)': /v2/keys/maglev/config/node-192.168.123.11?sorted=true&recursive=true
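The log shows the etcd client timing out against the etcd endpoint at 192.168.123.11 on TCP port 4001 (read timeout=1). A read timeout generally means the TCP connection was accepted but no response arrived in time, so one way to narrow things down is to time a plain TCP connect to that port. Below is a minimal Python 3 sketch, assuming a Python 3 interpreter is available on the host you test from. `tcp_probe` is a hypothetical helper: it does not speak TLS or the etcd API, so a successful connect only rules out basic reachability, not an etcd-side problem.

```python
import socket
import time
from typing import Optional, Tuple


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> Tuple[bool, Optional[float], str]:
    """Hypothetical helper: try to open a TCP connection to host:port.

    Returns (reachable, connect_seconds, detail). Only the TCP handshake is
    tested; no TLS session or etcd request is made.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and applies the timeout to the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, "connected"
    except OSError as exc:
        # OSError covers refused, unreachable, and timed-out connects (socket.timeout).
        return False, None, f"{type(exc).__name__}: {exc}"
```

For example, calling `tcp_probe("192.168.123.11", 4001)` from Node-2 returns `(True, <seconds>, "connected")` if the port accepts connections, or `(False, None, <error text>)` otherwise. A connect that succeeds quickly while the etcd client still reports read timeouts points away from basic L3 reachability and toward etcd itself or the TLS/HTTP layer.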

maflesch · 07-19-2024

Yes, I'm familiar with the TAC case that is open. Part of the issue here is how NIC bonding is being defined. TAC is looking into it.

Giordano Lucci · 03-26-2025

Hi, I've been having the same issue. Did you get any fix at all? I've been waiting for the Cisco BU team as well, but it's taking them a long time to answer. @akhil kamalakaran

akhil kamalakaran · 03-26-2025

Hi @Giordano Lucci

Is your DNAC appliance connected to Cisco ACI switches? And are you using LACP bundling mode?

Giordano Lucci · 03-26-2025

Hi @akhil kamalakaran, no, they are not connected to Cisco ACI switches, and no, we are not using LACP, as we use a single link for the enterprise and cluster links. On the switch side, we have configured an access port on the dedicated VLAN.

akhil kamalakaran · 03-26-2025

Actually, our issue was that we connected our DNAC appliance to an ACI switch using LACP, with vPC configured on the switch side. However, after consulting with TAC, we learned that our DNAC software version does not support connectivity to an ACI switch with LACP mode. As a result, we moved our DNAC appliance to a Catalyst switch environment, which resolved the issue.

@Giordano Lucci, what error are you encountering while clustering DNAC nodes?

Additionally, please ensure that the DNAC appliance has proper connectivity to the AD, DNS, Gateway, and NTP servers. Double-check that all connections are intact and that NTP is properly synchronized.
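For the NTP part of that checklist, one quick way to compare a host's clock against an NTP server is a one-shot SNTP query. Below is a minimal Python 3 sketch using only the standard library; `ntp_offset` and `parse_transmit_time` are hypothetical helper names, the offset estimate assumes symmetric network delay and ignores server processing time, and it is no substitute for checking the appliance's own NTP synchronization status.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def parse_transmit_time(packet: bytes) -> float:
    """Hypothetical helper: return the server's transmit timestamp (Unix seconds) from an NTP reply."""
    if len(packet) < 48:
        raise ValueError("NTP reply too short")
    # Transmit timestamp: 32-bit seconds + 32-bit fraction at byte offset 40.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def ntp_offset(server: str, timeout: float = 2.0) -> float:
    """Hypothetical helper: rough clock offset in seconds (server minus local)."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, VN=3, Mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(1024)
        received = time.time()
    # Compare the server's transmit time with the midpoint of our send/receive times.
    return parse_transmit_time(reply) - (sent + received) / 2
```

Usage would look like `ntp_offset("ntp.example.com")`, where the hostname is a placeholder for the NTP server configured on the appliance; an offset far from zero, or a timeout, is worth fixing before retrying the cluster join.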

Giordano Lucci · 03-26-2025

Our main error seems to be related to etcd: as soon as we install the secondary node, the cluster breaks down and nothing seems to work anymore. We get the error "ERROR:etcd.client:Request to server https://xxx.xx.xx.x:4001 failed" every time we enter the CLI on our first node. After some research, it looks like the etcd service is not running or has been shut down on the secondary node. As you can see in the image, there seems to be no etcd service running (this is on our secondary node).

Yes, we checked all connectivity and also tried re-imaging all 3 nodes, but nothing seems to work. We are waiting for a response from the Cisco BU team.

maflesch · 03-26-2025

You are correct. When an additional node is added, the new node reaches out to the existing cluster and attempts to establish etcd membership. Once that happens, if etcd fails, both nodes go down, because etcd expects quorum, which at this point requires 2 or more healthy members. I am curious as to why etcd is failing and why the container isn't even spinning up. I'll check the case.
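The quorum arithmetic behind that explanation is etcd's Raft majority rule: a cluster of n members needs floor(n/2) + 1 healthy members to keep working. The short Python sketch below illustrates that generic etcd behaviour (it is not Catalyst Center code, and the function names are hypothetical). It shows why a two-member cluster, the state right after Node-2 joins, tolerates zero failures, while a three-member cluster tolerates one.

```python
def quorum(members: int) -> int:
    """Majority of members etcd (Raft) needs to stay available: floor(n/2) + 1."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can fail while the remaining ones still form a quorum."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} member(s): quorum={quorum(n)}, tolerates {failure_tolerance(n)} failure(s)")
```

With two members the quorum is 2, so if etcd on either node fails, the remaining member alone cannot form a majority, which matches both nodes going down during the add-on.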

Giordano Lucci · 04-01-2025

Hi @maflesch, did you by any chance manage to take a look at my case? As we are still waiting, any additional information would be crucial.

Best regards

SIB9 · 03-26-2025

Hi

maflesch · 03-26-2025

Catalyst Center (formerly known as Cisco DNA Center) does support connecting to ACI with LACP. Who said it doesn't?

Giordano, what is the SR number opened for this issue, so I can review it?

Giordano Lucci · 03-26-2025

@maflesch I've sent you a PM with the SR number.

Thanks

akhil kamalakaran · 03-26-2025

Hi @maflesch,

Please check this SR too: 697614337