
Cisco hyperflex edge deployment - duplicate IP address issue

nikhil93
Level 1

Dear Team,

 

Hope you are all doing fine. 

 

I am trying to deploy a 2-node HyperFlex cluster using Intersight. I created all the required policies, validated the configuration, and started the deployment. During deployment it fails at 29% with the error shown below.

" Ip address 169.254.1.2 already in use. Please verify there are no duplicate IPs. "

 

I understand that by default, the HX Installer automatically assigns IP addresses in the 169.254.1.X range, to the Hypervisor Data Network and the Storage Controller Data Network. This IP subnet is not user configurable.
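
For reference, 169.254.0.0/16 is the IPv4 link-local (APIPA) block, so an address such as 169.254.1.2 is only meaningful on the local segment. A minimal Python sketch using the standard `ipaddress` module can confirm that classification. The helper name `is_link_local` is only illustrative and is not part of any HyperFlex or Intersight tooling.

```python
import ipaddress


def is_link_local(addr: str) -> bool:
    """Return True if addr is in the link-local block (169.254.0.0/16 for IPv4)."""
    # Hypothetical helper for illustration; it only classifies the address,
    # it does not probe the network.
    return ipaddress.ip_address(addr).is_link_local
```

For example, `is_link_local("169.254.1.2")` is true, while an ordinary private address such as `10.0.0.1` is not link-local.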

 

However, I checked my whole network and I do not see this IP anywhere. It does not respond to ping either.

 

Please help me resolve this issue, as the deployment has been pending for a long time.

 

Best Regards,

Nikhil A Satpute

Accepted Solutions

RedNectar
VIP

Hi @nikhil93 ,

[Edit: 2022.08.08] - Just came across this issue again today. The problem came up because the Storage Data VLAN had another device attached - a Catalyst L3 switch with an IP address assigned. Shutting down the VLAN interface made the problem go away on the next Retry.

RedNectar
[/End Edit]


The error you see (image.png) indicates to me that the VLAN you have chosen for your data network has some other device on it.

IT SHOULD BE GLOBALLY UNIQUE - no firewall interfaces, no other devices on that VLAN. It is strictly there for the SCVMs to be able to exchange data.

The Intersight install process will allocate IPs from 169.254.1.x for the Data subnet (sorry @malkovich_david - for an Intersight install, IP addresses are NOT statically assigned on the Data VLAN) but will still check to see if there is a duplicate.

But the thing that worries me more than anything else is that if this is a 2 node install, the 10Gb/s interfaces of the two nodes (where the Data VLAN lives) should be connected back-to-back, so NO other device should see the IP checking ARPs.

Are the nodes cabled back-to-back on the 10Gb/s interfaces?

 

 

 

RedNectar aka Chris Welsh.
Forum Tips: 1. Paste images inline - don't attach. 2. Always mark helpful and correct answers, it helps others find what they need.

14 Replies

malkovich_david
Level 1
If you see a 169.254.x.x address, it means the system's NICs are set to use DHCP and did not find a DHCP server. HyperFlex should be using static IPs, so check and make sure you entered a static IP, subnet mask, and gateway IP.

Hello David,

 

Thanks for your valuable response.

 

That holds for standard HyperFlex cluster deployments and for 3- or 4-node Edge deployments, but in a 2-node Edge deployment we do not have the option to provide static IPs for the data networks.

 

As per the Cisco documentation, these data networks are automatically assigned addresses from the 169.254.x.x APIPA range by the Cisco HyperFlex installer.

 

Best Regards,

Nikhil A Satpute

gkumark
Cisco Employee

Hi Nikhil,

 

From what you mentioned, you don't seem to have that IP in use anywhere in your network. I suspect this could be a proxy response from one of the interfaces on your router. The installer checks this 169.254.x.x IP using the management interface during the initial validations. If you have a Cisco Catalyst device or an ASA firewall, there is a high chance that one of them is responding as a proxy to these tests.

 

You can either disable proxy ARP specifically on your router or firewall or, as a workaround to move forward, create a rule to drop all traffic on the management VLAN towards the 169.254.x.x network. This should let you continue with the installation.

 

Hope that helps!

-Ganesh

Hello @gkumark 

 

Thanks for your valuable response,

 

I tried applying an ACL deny on the ASA firewall for all traffic destined to 169.254.1.2/32, but it did not work. I also found that these IPs are used on the ASA for internal data network communication, just like HyperFlex.

 

I was hesitant to put an ACL deny on the whole 169.254.x.x network, as my HyperFlex requires those IPs.

 

Are there any workarounds? Can we assign a unique VLAN for the HyperFlex ESXi data network and the controller VM data networks?

 

Looking forward to your valuable response.

 

Best Regards,

Nikhil A Satpute

Hi @RedNectar 

Hope you are doing great !!
Thanks for the response.

Let me address your points one at a time.

1. Regarding the use of a globally unique VLAN for the data traffic between nodes: during the installation from HyperFlex, we do not get any option to specify the VLAN. Is there another way to assign a unique VLAN that is used nowhere else on my network?

2. Regarding other devices using the APIPA address: when we checked the network, we found that the ASA is using these IP addresses for internal data communication. I guess the same process has happened on the ASA.

I have now put an ACL on the firewall to drop all traffic destined to 169.254.1.2/32, but I get the same error.

I am stuck with the same error message after multiple attempts.

Best Regards,
Nikhil A Satpute

Hi @nikhil93 ,

Firstly, if you celebrate Christmas - Merry Christmas! (If not, have a great day anyway)

I have seen this error before - but I can't remember where - I have a sneaking suspicion it was on a dCloud lab - I reported it and about a week later the labs were fixed. (But that may have been another problem that was fixed. My memory has a slow leak)

My memory also failed me in that I did not recall that you do not assign a Data VLAN during a 2-node install (although I suspected this might be the case), so it seems that the installer process is sending ARP requests via the vmk0 (management/vmnetwork) interface rather than vmk1 or vmk2 (data/vmotion) - which is a bug. It is probably just sending the ARP requests on the default VLAN, given that there is no VLAN defined for the Data VLAN.

If possible, what I'd try next (apart from calling TAC) is disconnecting the uplinks from the 1G switch, or shutting down the ports on the uplink switch if you have to work remotely. The CIMC will of course need to remain connected because that's where the Intersight connection comes in.

image.png

 

 

RedNectar aka Chris Welsh.

Greetings.

The check that is done is an HTTPS/443 test against 169.254.1.2.

You will need to get TAC involved, but the easiest way to track down who is generating a response from 169.254.1.2 is a tcpdump session in the CIMC (again, you will need TAC assistance).
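
As a rough first check before opening a TAC case, and assuming the validation really is a TCP connection to port 443 as described above, something like the following Python sketch could be run from a host on the same management VLAN to see whether anything answers for that address. The function name `responds_on_tcp` and the timeout value are hypothetical choices for illustration. This is not the installer's own code, and a `False` result does not rule out an ARP-level reply.

```python
import socket


def responds_on_tcp(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within timeout."""
    try:
        # create_connection performs the TCP handshake; success means some
        # device accepted the connection on behalf of that address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

Calling `responds_on_tcp("169.254.1.2")` from a machine on the management network would then indicate whether some device accepts the connection. Identifying which device it is still needs a packet capture or the ARP table of the local switch or gateway.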

I've had a few cases for this issue, and it was usually fixed by blackholing the traffic (on the router/L3 device) by routing it to a null interface: `ip route 169.254.1.0 255.255.255.0 Null 0`
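
To see how narrowly that null route is scoped, here is a small Python sketch using the standard `ipaddress` module. It takes the prefix and mask from the command above and checks which addresses fall inside it. The helper name `covered_by_null_route` is made up for this example.

```python
import ipaddress

# Prefix and mask copied from the null-route command quoted in this thread.
NULL_ROUTE = ipaddress.ip_network("169.254.1.0/255.255.255.0")
# The full IPv4 link-local block, for comparison.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def covered_by_null_route(addr: str) -> bool:
    """Return True if addr falls inside the blackholed prefix."""
    return ipaddress.ip_address(addr) in NULL_ROUTE
```

Under these assumptions the route covers 169.254.1.2 but not, say, 169.254.2.2, so it blackholes only the 169.254.1.x range mentioned in this thread rather than the whole 169.254.0.0/16 block.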

 

Kirk...

Hi Kirk,

 

Thanks for your valuable input. We have put the deployment on hold for now. I will post an update once it is done.

 

Regards,

Nikhil A Satpute

jeffbilbro
Level 1

Hi Nikhil-

 

We are trying to deploy a new 3-node Hyperflex Edge cluster and are receiving the same error:

 

IP address: 169.254.1.2 is already in use. Please verify there are no duplicate IPs

 

Can you tell me how you were able to solve this?

 

 

Thanks!

-Jeff

Hi jeff,

 

Hope you are doing good !!

 

In my case it was a lab setup, so before we could dig deeper into the issue, we got a POC call and had to move the equipment to the customer site. It worked fine there.

But keep in mind that this typically happens when there is an ASA firewall in your environment, because the ASA uses the 169.254.1.0 network on an internal interface for infrastructure communication.

You have to find a way to mask those internal IPs.

Our good friends here in this thread have already given some solutions. Please follow those.

 

Best of luck. Do post your valuable inputs.

 

Regards,

Nikhil

Hi Jeff,

As I understand it, the problem arises because the installer gets the ESXi (Hyperflex) host to send an ARP request for 169.254.1.2 on ALL interfaces, not just the hx-storage-data VLAN.

If there is a router or firewall attached to one of the other hx VLANs (hx-inband-mgmt or hx-vm-network) then it might reply - my understanding is that this is particularly likely if there is an ASA firewall involved.

If you have access to the gateway device(s) on those VLANs, it seems the solution is to (as @Kirk said above) blackhole the traffic (on the router/L3 device) by routing it to a null interface:

ip route 169.254.1.0 255.255.255.0 Null 0

If you DON'T have access - then I noticed something the other day that MIGHT help.

In Intersight, after you have created your Profile, go to the CONFIGURE > Policies section

Using the 3 dots on the right, edit the cluster-nodeconfig-policy

Click Next

You'll get a screen where you'll be able to use a different IP range that (hopefully) will not get a response from the recalcitrant firewall/router

As I said - I have NOT tried this. If you DO try it - PLEASE let me know the result. If it works I'll edit my original answer to include this additional information.

image.png

RedNectar aka Chris Welsh.

Thanks Chris and Nikhil!

 

I was able to find the device that was responding to this - a router on our network from our parent company.  Despite not having access to it, we do have an ASA 'between' the HX cluster and the router so we were able to put a deny ACL in place for now to get the deployment moving along.  

 

Thanks again!
