getting ARP to shut up (disabling active ARP cache refreshing?)

Marc Luethi
Level 1

Hi all!

I've been racking my brain over this for more than a day now - either it's not possible or I am missing something very obvious.

Background: Troubleshooting/analyzing an issue where the first ARP broadcasts are not forwarded by a switch when the ARPing host's source MAC is not yet in the CAM table. Yes, there is a veery complicated and looooong TAC case for this, still ongoing, and it's not directly related to this post.

Goal 1: to simulate multiple hosts (simple, TCP/IP enabled).  Idea: use one VRF with one routed port for each pseudo host, connected to different ports/ASICs of the problematic switch.

Goal 2: the ARP cache of these pseudo hosts should behave like a simple operating system's ARP cache; entries older than a few minutes should age out, so that the pseudo host is forced to send an ARP broadcast for another host in the same subnet.

Test Rig:

[Note: we don't mean to reproduce the original ARP-broadcast-is-lost problem here. This is just to see if "one VRF equals one host" actually works.]

  • Cat3560 (c3560-ipservicesk9-mz.122-55.SE8)
  • six VRFs ("HOST1", "HOST2",... "HOST6")  with one "no switchport" each.
  • one VRF "ROUTER", with one "interface vlan 53" and six "switchport access vlan 53"
  • the six "no switchports" are cabled to six "switchport access vlan 53" on the same box

    int f0/11
     description * to f0/21*
     no switchport
     ip vrf forwarding HOST1
     ip address 10.33.53.11 255.255.255.0
     arp timeout 300

    int f0/12
     description * to f0/23*
     no switchport
     ip vrf forwarding HOST2
     ip address 10.33.53.12 255.255.255.0
     arp timeout 300

    int f0/13
     description * to f0/23*
     no switchport
     ip vrf forwarding HOST3
     ip address 10.33.53.13 255.255.255.0
     arp timeout 300

     [...]

  • to emulate the problematic switch, we have (on the same Cat3560):

    int range f0/21-26
     switchport mode access
     switchport access vlan 53
     spanning-tree portfast
     switchport nonegotiate

    int vlan 53
     ip vrf forwarding ROUTER
     ip address 10.33.53.254 255.255.255.0

  • no routes (yet) for any of the VRFs, all just happening in everyone's directly connected subnet 10.33.53.0/24. (Yeah right, at a later stage, .254 will become the default gateway for the hosts, but not just yet...)

Observation:

  • After a reboot, the ARP caches of the "HOST"-VRFs are empty.
  • Once I trigger some traffic to 10.33.53.254, I get proper ARP resolution.
  • with arp timeout 300, every 4 minutes, I can see that the HOST-VRFs are ARPing for 10.33.53.254.
  • with arp timeout 240, every 3 minutes, I can see that the HOST-VRFs are ARPing for 10.33.53.254.
  • with arp timeout 180, every 2 minutes, I can see that the HOST-VRFs are ARPing for 10.33.53.254.
  • with arp timeout 120, every 1 minute, I can see that the HOST-VRFs are ARPing for 10.33.53.254.
  • with arp timeout  60, every 1 minute, I can see that the HOST-VRFs are ARPing for 10.33.53.254. 
  • with arp timeout  30, every 1 minute, I can see that the HOST-VRFs are ARPing for 10.33.53.254. 

In the last case (arp timeout 30), the "host's" ARP cache is not empty during the 30 seconds after the "arp timeout" has expired.
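The six observations above fit a simple pattern: the re-ARP seems to go out about 60 seconds before the entry would expire, and never more often than once a minute. A minimal Python sketch of that fit follows. The function name and the two 60-second constants are assumptions read off the numbers in this post, not documented IOS behavior.

```python
def observed_refresh_interval(arp_timeout_s: int) -> int:
    """Seconds between refresh ARPs, fitted only to the observations in this post."""
    lead_s = 60   # assumption: the refresh is sent roughly 60 s before expiry
    floor_s = 60  # assumption: refreshes were never seen more often than once a minute
    return max(arp_timeout_s - lead_s, floor_s)


if __name__ == "__main__":
    for timeout in (300, 240, 180, 120, 60, 30):
        interval = observed_refresh_interval(timeout)
        print(f"arp timeout {timeout:>3} -> re-ARP roughly every {interval} s")
```

With a floor of one minute, an "arp timeout 30" entry would be refreshed only after it has nominally expired, which matches the entry still being present in the cache past the 30-second mark.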

I thought that this might have to do with CEF and ARP and their inner "connectedness" (as in http://blog.ipspace.net/2007/05/what-is-cached-cef-adjacency.html ) - but you can't disable CEF on a 3560.

Therefore, I tried on a 3845 router with no VRFs and CEF disabled, and a (real) host with Wireshark connected to gig0/0. It's the same thing - a given amount of time before the arp timeout occurs, the router ARPs for the entries it has in its ARP cache.

clear arp-cache interface fast0/11 and clear arp interface fast0/11 have an additional effect: they not only cause every interface to re-ARP for its cache entries, they also cause it to broadcast gratuitous ARPs to its neighbors.

The behavior is identical when I swap the "no switchport" for a combination of vlan xyz & interface vlan xyz & switchport access vlan xyz.

Then I found https://supportforums.cisco.com/thread/2131395 , from which I gather that it is standard operating procedure for a Cisco device to send ARP requests to maintain its cache.


So you actually can't "clear" the ARP cache - it repopulates itself instantly.

Questions:

a) why? What is the rationale for active ARP cache maintenance? No host seems to be doing it - why should a router?

b) Can I disable this "feature", and if yes  - how?

best regards and thanks a lot for your thoughts and ideas.

Marc

PS: yes, we've also been thinking "Why don't we get a handful of Raspberry Pis and use those?". A Cisco with its debugging capabilities seemed a more obvious choice.

Accepted Solutions

Peter Paluch
Cisco Employee

Hi Marc,

You've got some great questions at hand!

a) why? What is the rationale for active ARP cache maintenance? No host seems to be doing it - why should a router?

A normal router would not need to do this. However, CEF-based platforms are different. As you know, CEF precomputes the frame rewrite information - basically, it prepares the frame headers beforehand, so if any packets arrive that are to be routed towards or via a particular neighbor, the frame header for that neighbor is already prepared. This way, much of the daunting work when doing routing table lookups and ARP table lookups is already done, and the results are reused. This speeds up routing significantly.

In order to deliver packets to directly connected stations using CEF (which is what the multilayer switch always tries to do - otherwise, delivering packets to directly connected stations would fall back to process switching which could easily overload the CPU), the switch continuously needs to have proper ARP entries available to keep the adjacency table up-to-date. If a directly connected station is up, the multilayer switch assumes it may receive packets for that station at any time, so to avoid process switching, it repeatedly ARPs for this station. If it is up, it will respond, and the switch will keep the rewrite information prepared in the CEF's adjacency table. If it does not respond then it is most probably down or disconnected, and the rewrite information can be removed.
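The mechanism described above can be sketched as a toy model in Python. This is only an illustration of the explanation in this reply, not of how IOS is implemented: the class, its method names, and the MAC address are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyL3Switch:
    """Toy model: cached ARP entries are re-probed instead of silently aging out."""
    arp_cache: dict = field(default_factory=dict)  # ip -> mac
    adjacency: dict = field(default_factory=dict)  # ip -> prebuilt frame rewrite

    def learn(self, ip: str, mac: str) -> None:
        # An ARP reply fills the cache and lets the rewrite be precomputed.
        self.arp_cache[ip] = mac
        self.adjacency[ip] = f"rewrite dst={mac}"

    def refresh(self, ip: str, answering_neighbors: dict) -> None:
        # Re-ARP for a cached entry; answering_neighbors stands in for
        # whoever would reply on the wire (ip -> mac).
        mac = answering_neighbors.get(ip)
        if mac is None:
            # No reply: the station is assumed down, so drop the rewrite.
            self.arp_cache.pop(ip, None)
            self.adjacency.pop(ip, None)
        else:
            self.learn(ip, mac)

    def forward_path(self, ip: str) -> str:
        return "CEF" if ip in self.adjacency else "process switching"


if __name__ == "__main__":
    sw = ToyL3Switch()
    sw.learn("10.33.53.11", "0000.0c00.0011")  # hypothetical MAC
    sw.refresh("10.33.53.11", {"10.33.53.11": "0000.0c00.0011"})
    print(sw.forward_path("10.33.53.11"))  # CEF
    sw.refresh("10.33.53.11", {})          # neighbor no longer answers
    print(sw.forward_path("10.33.53.11"))  # process switching
```

As long as the neighbor keeps answering the refresh, the rewrite stays in place and traffic to it never has to leave the fast path.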

b) Can I disable this "feature", and if yes  - how?

I do not think it is possible, and given the explanation above, I am not sure it would be a wise thing to do.

Feel welcome to ask further!

Best regards,

Peter

Thanks Peter for the enlightening answer.

In the meantime, based on your explanation, we've been using this ARP refreshing of a router to our advantage to solve/hide/mask our original problem.

The devices that suffered from the effect on the "problematic switch" are now being "ARPd" every 3 minutes by their default gateway (in extenso: their distribution L3-switch). This keeps the CAM tables of their respective switchports alive.

Thanks a lot!

Marc



Marc,

It has been a pleasure. Thank you!

Best regards,

Peter
