ARP unicasts flooding vlan

rpastor · ‎09-15-2011

Hello,

I hope I'm presenting this correctly and to the right group. I have an unusual problem involving ARP. Specifically, I'm seeing a unicast (not broadcast) ARP request flooded throughout a VLAN in our network, asking "Who has 10.25.2.74? Tell 10.25.2.9", with both the source and destination MAC addresses filled in. In other words, 10.25.2.9 is asking a question it already has the answer to: 10.25.2.74's MAC address.

Noteworthy here is that 10.25.2.74 is a virtual IP address -- representing a couple of Exchange servers -- for which I have entered a routine static arp command on all routers, i.e. "arp n.n.n.n nnnn.nnnn.nnnn.nnnn ARPA". Maybe this is a problem with the device sending out the unicasts (a Backup Exec server). But I thought I'd ask if there is something on the Cisco side that I can configure to alleviate this.

Thank you for any feedback.

R. Pastor

Peter Paluch · ‎09-15-2011

Richard,

Do you believe you could record those ARP packets using Wireshark and post the capture file here? It would help to verify the contents of the ARP messages and their Ethernet encapsulation.
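For anyone checking such a capture by hand, the two things worth looking at are the Ethernet destination (broadcast or not) and the target MAC field inside the ARP payload. The Python below is only a rough sketch, assuming an untagged Ethernet II frame carrying IPv4 ARP. The function names and the sample frame are made up for illustration: the IPs come from the first post, the MACs are placeholders, and none of it is taken from a real capture.

```python
import ipaddress
import struct

BROADCAST = b"\xff" * 6


def fmt_mac(raw: bytes) -> str:
    """Format 6 raw bytes in Cisco dotted style, e.g. 0200.0000.0009."""
    h = raw.hex()
    return ".".join(h[i:i + 4] for i in range(0, 12, 4))


def describe_arp(frame: bytes):
    """Summarize an untagged Ethernet II frame carrying IPv4 ARP, else None."""
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP payload
        return None
    eth_dst, eth_src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0806:  # not ARP
        return None
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", frame[14:42])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None  # not Ethernet/IPv4 ARP
    return {
        "eth_dst": fmt_mac(eth_dst),
        "eth_src": fmt_mac(eth_src),
        "is_request": oper == 1,
        "sent_to_broadcast": eth_dst == BROADCAST,
        "sender_ip": str(ipaddress.IPv4Address(spa)),
        "target_ip": str(ipaddress.IPv4Address(tpa)),
        # A normal broadcast request leaves the target MAC as all zeros.
        "target_mac_prefilled": tha != b"\x00" * 6,
    }


# Synthetic frame: placeholder MACs, IPs as given in the first post.
src = bytes.fromhex("020000000009")
dst = bytes.fromhex("02000000004a")
sample = (dst + src + struct.pack("!H", 0x0806)
          + struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
          + src + ipaddress.IPv4Address("10.25.2.9").packed
          + dst + ipaddress.IPv4Address("10.25.2.74").packed)
print(describe_arp(sample))
```

A request with "sent_to_broadcast" false and "target_mac_prefilled" true is the pattern described in the first post.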

Best regards,

Peter

rpastor · ‎09-15-2011

Hi Peter,

Pcap attached. Since posting the question, I've started thinking that the problem might have more to do with the server at 10.128.0.9 (I'd changed the addresses in my first post) and its connection to the network. The source MAC in the pcap does not match the one in my routers' ARP tables for 10.128.0.9. In fact, 10.128.0.9 (the Backup Exec server) has three teamed NICs, all connected to the core Cat 4510. A "show arp" on the 4510 and other routers displays the MAC of one of the other teamed NICs as the one associated with 10.128.0.9, not a4ba.db4f.f6b4 (the one in the pcap).

On the 4510, there is no etherchannel set up for those 3 connections to the server at 10.128.0.9. Do you think an etherchannel would correct this situation?

By the way, pings from the server 0.9 succeed to the virtual IP 0.74 (03bf.0a80.004a).

Thank you again,

Richard

Peter Paluch · ‎09-15-2011

Hello Richard,

I am puzzled by seeing so many ARP requests being sent in a short time. Either the sender has gone berserk or this seems like a switching loop.

As you have your NICs teamed on your server, you should almost certainly create an EtherChannel on your switch so that both endpoints treat the port bundle in the same way. However, does your server support LACP NIC teaming? It would be best if you could run LACP on your server so that the EtherChannel creation is correctly negotiated with your switch.

Anyway, if it is possible, would the flood of ARPs stop if you left just a single port of the three teamed ports running, and deactivated or disconnected the remaining two?

By the way, these ARPs seem to be generated by some specific Windows service, as they are directed towards the NLB special MAC address and also the internal target MAC address in the ARP request is already pre-set.
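The NLB special MAC can be checked against the virtual IP. As far as I know, NLB derives the cluster MAC from the cluster IP: 02-BF followed by the four IP octets in unicast mode, 03-BF followed by the four octets in multicast mode, and 01-00-5E-7F followed by the last two octets in IGMP multicast mode. A small Python sketch under that assumption (nlb_cluster_mac is a made-up helper name, not part of any tool):

```python
import ipaddress

# Two-byte prefixes NLB is understood to use ahead of the four IP octets.
_PREFIX = {"unicast": b"\x02\xbf", "multicast": b"\x03\xbf"}


def nlb_cluster_mac(cluster_ip: str, mode: str = "multicast") -> str:
    """Return the expected NLB cluster MAC in Cisco dotted notation."""
    octets = ipaddress.IPv4Address(cluster_ip).packed
    if mode == "igmp":
        # IGMP multicast mode keeps only the last two octets of the IP.
        raw = b"\x01\x00\x5e\x7f" + octets[2:]
    elif mode in _PREFIX:
        raw = _PREFIX[mode] + octets
    else:
        raise ValueError(f"unknown NLB mode: {mode}")
    h = raw.hex()
    return ".".join(h[i:i + 4] for i in range(0, 12, 4))


print(nlb_cluster_mac("10.128.0.74", "multicast"))  # 03bf.0a80.004a
```

For 10.128.0.74 in multicast mode this gives 03bf.0a80.004a, the MAC quoted for the virtual IP, which suggests the cluster is running NLB in multicast mode.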

Best regards,

Peter

rpastor · ‎09-16-2011

Peter,

Thank you very much. I did disable two of the three teamed ports and the problem continued. Then I just turned off the machine, and enjoyed the improved throughput for a few hours. Then I had to turn it back on so it could do backups. In the meantime, I noticed another Cisco support thread at https://supportforums.cisco.com/thread/154902 which has some relevance. This I will pursue on Monday and will keep you informed.

Talk to you later.

Thanks again,

Rick Pastor

Peter Paluch · ‎09-16-2011

Rick,

Thank you very much. Please keep me informed, this is an interesting issue.

Best regards,

Peter

andrew.butterworth · ‎09-23-2011

This sounds like a (fairly common?) 'Phantom' MAC address issue. You say you have added static ARP entries on the routers, so the routers know where the device is at Layer-3; however, does anything know where the device is at Layer-2? If it is a cluster-type application, then typically what happens is that the ARP reply from the application servers sharing the service contains a 'Phantom' MAC address that doesn't exist. The idea is that traffic to the application must reach all servers, and the switches achieve this by flooding it because they don't know where the MAC address is (it is never seen as a source address, so they can never create a Layer-2 forwarding entry for it). This is by design, and typically you should have small subnets for the servers that only exist on one or a couple of switches.
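The flooding behaviour can be illustrated with a toy model. The Python below is a deliberately simplified sketch of a learning switch (no VLANs, no aging, no special multicast handling); the class name, port numbers and the second host MAC are invented for the example.

```python
class LearningSwitch:
    """Toy Layer-2 switch: learns source MACs, floods unknown destinations."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src, dst):
        """Return the list of ports the frame is sent out of."""
        self.table[src] = in_port  # learning happens on the source only
        out = self.table.get(dst)
        if out is None:
            # Unknown destination: flood to every port except the ingress.
            return sorted(self.ports - {in_port})
        return [] if out == in_port else [out]


sw = LearningSwitch([1, 2, 3, 4])
# The phantom MAC never appears as a source, so it is never learned.
print(sw.receive(1, "a4ba.db4f.f6b4", "03bf.0a80.004a"))  # [2, 3, 4]
# An ordinary host (hypothetical MAC) sends once and is learned on port 2.
sw.receive(2, "0000.0000.0002", "a4ba.db4f.f6b4")
print(sw.receive(1, "a4ba.db4f.f6b4", "0000.0000.0002"))  # [2]
# Frames to the phantom MAC are still flooded, however many are sent.
print(sw.receive(1, "a4ba.db4f.f6b4", "03bf.0a80.004a"))  # [2, 3, 4]
```

Every frame addressed to the phantom MAC goes out of all other ports, which is why a single chatty sender is felt across the whole VLAN.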

HTH

Andy

rpastor · ‎09-23-2011

I was going to give a hopefully more informative reply after hearing from Symantec tech support, but they haven't figured it out yet. In this situation, the problem is not a general one related to the clustered (Exchange) servers with the virtual MAC, but instead is isolated to one server trying to reach the virtual IP. After disabling services etc. on that 10.128.0.9 server, I could say it was a problem with Symantec Backup Exec. Then I disabled email alerts in that program and the problem abated. The significant point is that their email alerts actually work OK, so it's not as if they're not getting to the SMTP server. But after trying to reach the server once, and even after all backup jobs are completed, the Backup Exec service keeps flooding the network with crippling "unicast" ARPs -- no one knows why. In Symantec's defense, this may turn out to be a Microsoft problem for anyone trying to reach their SMTP servers over NLB's virtual NIC. This is why I got into networking in the first place -- to get away from dealing with all the application layer weirdness.

I'll update you when I get an answer, or better resolution. Thanks.

Rick

Michal Gurbski · ‎12-10-2013

Hi,

Do you use NIC teaming? Try disabling this feature.

It can generate a flood of unicast ARP requests.

http://support.microsoft.com/kb/968703

http://www.confusedamused.com/notebook/broadcom-nic-teaming-and-hyper-v-on-server-2008-r2/

http://serverfault.com/questions/237670/windows-2008-r2-servers-sending-arp-requests-for-ips-outside-subnet

Kind regards,

paul driver · ‎12-10-2013

Hello

Until you find the root cause of this issue, and if it is applicable to your situation, you can apply some suppression on the EtherChannel port originating the unicast traffic.

int port-channel xxx

storm-control unicast level x x (100 = no traffic storm action)

storm-control action shutdown

sh storm-control unicast

res

Paul
