
Nexus 9K OWN_SRCMAC Error

I am seeing a lot of ARP-4-OWN_SRCMAC errors being thrown by a few pairs of Nexus 9K switches that we have deployed. Here is an example of the errors from one pair:

 

Sep 23 14:30:53 sw2 : 2020 Sep 23 19:30:53.206 UTC: %ARP-4-OWN_SRCMAC: arp [14128] Received packet with a local source MAC address (0062.ec02.dbbf) from 10.76.48.3 on Vlan948

Sep 23 14:30:41  sw2: 2020 Sep 23 19:30:41.196 UTC: %ARP-4-OWN_SRCMAC: arp [14128] Received packet with a local source MAC address (0062.ec02.dbbf) from 10.76.48.3 on Vlan948

Sep 23 14:29:31  sw2 : 2020 Sep 23 19:29:31.110 UTC: %ARP-4-OWN_SRCMAC: arp [14128] Received packet with a local source MAC address (0062.ec02.dbbf) from 10.76.48.3 on Vlan948

Sep 23 14:29:02  sw1 : 2020 Sep 23 19:29:02.179 UTC: %ARP-4-OWN_SRCMAC: arp [13905] Received packet with a local source MAC address (0062.ec02.c8bf) from 10.76.63.2 on Vlan963

Sep 23 14:28:39  sw2 : 2020 Sep 23 19:28:39.068 UTC: %ARP-4-OWN_SRCMAC: arp [14128] Received packet with a local source MAC address (0062.ec02.dbbf) from 10.76.63.3 on Vlan963

Sep 23 14:28:09  sw2 : 2020 Sep 23 19:28:09.358 UTC: %ARP-4-OWN_SRCMAC: arp [14128] Received packet with a local source MAC address (0062.ec02.dbbf) from 10.76.63.3 on Vlan963

Sep 23 14:27:15  sw2 : 2020 Sep 23 19:27:15.214 UTC: %ARP-4-OWN_SRCMAC: arp [14128] Received packet with a local source MAC address (0062.ec02.dbbf) from 10.76.8.3 on Vlan908

Sep 23 14:26:35  sw1 : 2020 Sep 23 19:26:35.314 UTC: %ARP-4-OWN_SRCMAC: arp [13905] Received packet with a local source MAC address (0062.ec02.c8bf) from 10.76.63.2 on Vlan963

Sep 23 14:25:57  sw1 : 2020 Sep 23 19:25:57.142 UTC: %ARP-4-OWN_SRCMAC: arp [13905] Received packet with a local source MAC address (0062.ec02.c8bf) from 10.76.63.2 on Vlan963

Sep 23 14:25:55  sw2 : 2020 Sep 23 19:25:55.351 UTC: %ARP-4-OWN_SRCMAC: arp [14128] Received packet with a local source MAC address (0062.ec02.dbbf) from 10.76.63.3 on Vlan963

Sep 23 14:25:45  sw1 : 2020 Sep 23 19:25:45.131 UTC: %ARP-4-OWN_SRCMAC: arp [13905] Received packet with a local source MAC address (0062.ec02.c8bf) from 10.76.8.2 on Vlan908


I have confirmed that vPC and HSRP are properly working for all VLANs.

Primary Switch (sw1):

interface Vlan948
  no shutdown
  no ip redirects
  ip address 10.76.48.2/24
  hsrp version 2
  hsrp 948
    authentication md5 key-chain hsrp-keys
    preempt delay minimum 60
    priority 110
    ip 10.76.48.1

vpc domain 99
  role priority 20
  system-priority 1000
  peer-keepalive destination 10.76.62.2 source 10.76.62.1 vrf VPCKA
  auto-recovery reload-delay 1200

interface port-channel99
  switchport mode trunk
  switchport trunk native vlan 948
  spanning-tree port type network
  vpc peer-link

interface Ethernet1/35
  description Link to sw2 eth1/35 (po99)
  switchport mode trunk
  switchport trunk native vlan 948
  spanning-tree port type network
  channel-group 99 mode active
  no shutdown

interface Ethernet2/35
  description Link to sw2 eth2/35 (po99)
  switchport mode trunk
  switchport trunk native vlan 948
  spanning-tree port type network
  channel-group 99 mode active
  no shutdown

interface Ethernet3/48
  description VPC Keepalive Link
  no switchport
  vrf member VPCKA
  no ip redirects
  ip address 10.76.62.1/30
  no shutdown

 

Secondary Switch (sw2):

interface Vlan948
  no shutdown
  no ip redirects
  ip address 10.76.48.3/24
  hsrp version 2
  hsrp 948
    authentication md5 key-chain hsrp-keys
    ip 10.76.48.1

vpc domain 99
  role priority 200
  system-priority 1000
  peer-keepalive destination 10.76.62.1 source 10.76.62.2 vrf VPCKA
  auto-recovery reload-delay 1200

interface port-channel99
  switchport mode trunk
  switchport trunk native vlan 948
  spanning-tree port type network
  vpc peer-link

interface Ethernet1/35
  description Link to sw1 eth1/35 (po99)
  switchport mode trunk
  switchport trunk native vlan 948
  spanning-tree port type network
  channel-group 99 mode active
  no shutdown

interface Ethernet2/35
  description Link to sw1 eth2/35 (po99)
  switchport mode trunk
  switchport trunk native vlan 948
  spanning-tree port type network
  channel-group 99 mode active
  no shutdown

interface Ethernet3/48
  description VPC Keepalive Link to sw1 eth3/48
  no switchport
  vrf member VPCKA
  no ip redirects
  ip address 10.76.62.2/30
  no shutdown

 

I've been searching for a fix, but nothing I've found so far has corrected this behavior. I found https://www.cisco.com/c/en/us/td/docs/switches/datacenter/sw/routing_messages/reference/7k_rout_mess_ref_book/7k_rout_mess_ref_2mess.html which included the following information and recommended action:

Error Message     ARP-4-OWN_SRCMAC Format: Received packet with a local source MAC address (%s) from %s on %s
Explanation    There may be a connected router sending packets with local MAC address.

Recommended Action    Check all routers on the interface for a misconfiguration.

 

I have confirmed that there are no other devices that utilize the MAC addresses that we're seeing. I kept searching and I found this information, however, I'm not getting any Spanning-Tree events:

When Spanning-Tree Topology changes occur, NX-OS flushes its ARP cache, because it realizes that particular entries may no longer be valid -- after all, the Layer 2 topology has just changed. As the cache is being rebuilt via gleaning, a bug in the silicon used in the N9K port ASICs causes it, briefly, to forward frames across the vPC Peer Link which, according to vPC rules, it should not. NX-OS then logs the OWN_SRCMAC warning messages as a result. Per TAC, as far as they can tell, this issue is (a) transient, and (b) has yet to cause disruption in customer networks, (c) no software / firmware fix possible, as the bug lives inside silicon. As a result, they typically ignore OWN_SRCMAC messages *if* they can see tight correlation with Spanning-Tree Topology changes.
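One way to test for that "tight correlation" is to compare the OWN_SRCMAC timestamps against the times of known topology changes. This is a rough sketch in Python, assuming both sets of timestamps have already been collected as `datetime` objects. The function name `fraction_near` and the 5-second default window are arbitrary choices for illustration, not a TAC threshold.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def fraction_near(events, changes, window=timedelta(seconds=5)):
    """Return the fraction of `events` that fall within `window` of any
    timestamp in `changes` (both are iterables of datetime objects)."""
    events = list(events)
    changes = sorted(changes)
    if not events:
        return 0.0
    near = 0
    for t in events:
        # Only the closest change before and after t can be within the window.
        i = bisect_left(changes, t)
        neighbours = changes[max(i - 1, 0): i + 1]
        if any(abs(t - c) <= window for c in neighbours):
            near += 1
    return near / len(events)


# Two OWN_SRCMAC timestamps taken from the log lines above.
events = [
    datetime(2020, 9, 23, 19, 30, 53),
    datetime(2020, 9, 23, 19, 29, 31),
]
# Hypothetical topology-change time, made up for illustration only.
changes = [datetime(2020, 9, 23, 19, 30, 51)]
print(fraction_near(events, changes))
```

A result close to 1.0 would support the Spanning-Tree explanation; a result close to 0.0, which is what "I'm not getting any Spanning-Tree events" suggests here, would point elsewhere.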

6 REPLIES
Cisco Employee

Hello!

To get started with this, I have a few questions:

  1. What specific line cards are installed in these Nexus 9500 switches?
  2. What specific NX-OS software release is running on both of these Nexus 9500 switches?
  3. Do these syslogs only cover VLANs 908, 948, and 963, or are these syslogs shown with other VLANs as well?
  4. Are these syslogs constantly appearing all the time, or do they only appear in "bursts" with a constant cadence? (e.g. they always happen at around 14:30 switchtime)

Taken very literally, this syslog most likely means that a switch is receiving an ARP packet that it originally sent. In other words, either some device on the network is reflecting packets back into the network, or there is a Layer 2 loop somewhere in the network. Knowing the scope of the problem (meaning, how many VLANs are affected and how often this message appears) as well as the exact platform we're working with (Nexus 9500 line cards and NX-OS software release) will help us get started in isolating and troubleshooting the issue.

Thank you!

-Christopher


I have more than 1 set of switches that are experiencing this issue. I'm doing a deep dive into one pair of them currently to try to determine the issue. Both switches have the same line cards and software versions listed below.

1. What specific line cards are installed in these Nexus 9500 switches?

NAME: "Slot 1", DESCR: "36p 40G Ethernet Module" PID: N9K-X9636PQ

NAME: "Slot 2", DESCR: "36p 40G Ethernet Module" PID: N9K-X9636PQ

NAME: "Slot 3", DESCR: "48x1/10G-T 4x40G Ethernet Module" PID: N9K-X9464TX2

NAME: "Slot 4", DESCR: "48x1/10G-T 4x40G Ethernet Module" PID: N9K-X9464TX2

2. What specific NX-OS software release is running on both of these Nexus 9500 switches?

NXOS: version 7.0(3)I4(4)

3. Do these syslogs only cover VLANs 908, 948, and 963, or are these syslogs shown with other VLANs as well?

This pair is only reporting on these 3 VLANs. Other switch pairs report on additional VLANs, though.

4. Are these syslogs constantly appearing all the time, or do they only appear in "bursts" with a constant cadence? (e.g. they always happen at around 14:30 switchtime)

They're pretty consistent. It isn't happening every single minute, but in general I see 65-70 events per hour between the pair. Depending on the day, I'll see between 850 and 1000 events. I've pulled the number of events per switch for the month of September so far. The breakdown is 16,030 events for sw1 and 25,346 events for sw2.
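Counts like these can be pulled straight from the collected syslog. This is a minimal sketch in Python, assuming the lines look like the ones quoted earlier in this thread. The regular expression and the `tally` function are illustrative only and are not part of any Cisco tooling.

```python
import re
from collections import Counter

# Matches the OWN_SRCMAC lines as they appear in this thread, e.g.
# "Sep 23 14:30:53 sw2 : 2020 Sep 23 19:30:53.206 UTC: %ARP-4-OWN_SRCMAC: ..."
OWN_SRCMAC = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s*:.*%ARP-4-OWN_SRCMAC:.*"
    r"local source MAC address \((?P<mac>[0-9a-f.]+)\) "
    r"from (?P<ip>[\d.]+) on (?P<intf>\S+)"
)


def tally(lines):
    """Count OWN_SRCMAC events per (host, interface) pair."""
    counts = Counter()
    for line in lines:
        m = OWN_SRCMAC.match(line)
        if m:
            counts[(m["host"], m["intf"])] += 1
    return counts


SAMPLE = [
    "Sep 23 14:30:53 sw2 : 2020 Sep 23 19:30:53.206 UTC: %ARP-4-OWN_SRCMAC: arp [14128] Received packet with a local source MAC address (0062.ec02.dbbf) from 10.76.48.3 on Vlan948",
    "Sep 23 14:30:41  sw2: 2020 Sep 23 19:30:41.196 UTC: %ARP-4-OWN_SRCMAC: arp [14128] Received packet with a local source MAC address (0062.ec02.dbbf) from 10.76.48.3 on Vlan948",
    "Sep 23 14:29:02  sw1 : 2020 Sep 23 19:29:02.179 UTC: %ARP-4-OWN_SRCMAC: arp [13905] Received packet with a local source MAC address (0062.ec02.c8bf) from 10.76.63.2 on Vlan963",
]

for (host, intf), n in sorted(tally(SAMPLE).items()):
    print(host, intf, n)
```

Grouping by host and SVI makes it easy to see whether one switch or one VLAN dominates the event count.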


Hello!

I appreciate the answers! Let's focus on just this pair of switches right now wherein VLANs 908, 948, and 963 are affected.

Would you be willing to share the output of the below commands from both switches?

terminal width 511
show spanning-tree vlan 908
show spanning-tree vlan 948
show spanning-tree vlan 963
show cdp neighbors

My rationale behind these commands is to identify what interfaces (aside from the vPC Peer-Link) share all three VLANs in common (most likely one or more trunks leading down to distribution or access switches.)
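Once the per-VLAN interface lists have been read off the spanning-tree output, finding the shared interfaces is a set intersection. A small sketch in Python follows. The interface names other than Po99 (the peer-link in the posted configuration) are hypothetical placeholders, and `common_interfaces` is a made-up helper name.

```python
def common_interfaces(per_vlan, exclude=()):
    """Return the sorted interfaces that appear in every VLAN's list,
    minus anything named in `exclude` (for example the vPC Peer-Link)."""
    sets = [set(ports) for ports in per_vlan.values()]
    shared = set.intersection(*sets) if sets else set()
    return sorted(shared - set(exclude))


# Hypothetical data: which interfaces forward each affected VLAN.
per_vlan = {
    908: ["Po99", "Po10", "Po20"],
    948: ["Po99", "Po10"],
    963: ["Po99", "Po10", "Po30"],
}
print(common_interfaces(per_vlan, exclude=["Po99"]))
```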

Aside from that, we can also leverage a tool called Ethanalyzer (used for control plane packet captures) to identify the ingress interface of incoming ARP packets that trigger this syslog. I will forewarn you, this can get a little bit complex, so if you feel in over your head, I highly recommend opening a support case with Cisco TAC so that they can troubleshoot this further with you.

First, let's cover how packets are generally forwarded by ASICs. Typically, when an ASIC receives a packet, one of the first steps it must take is to parse that packet so that it understands what the packet is (Ethernet, IPv4, IPv6, etc.). Once the packet is parsed, it has sufficient information to make a forwarding decision on that packet based upon the MAC address table, routing table, the contents of TCAM, etc.

As part of making this forwarding decision, typically a small header is placed on top of the packet containing information important to the ASIC (such as ingress interface, egress interface, internal VLAN ID, traffic class, queuing information, etc.) We colloquially refer to this as a shim header. This shim header is stripped by the switch before the packet is transmitted so as to not "confuse" other Ethernet devices on the network (as they most likely cannot interpret the shim header correctly).

If an ASIC receives a packet that needs to be punted to the control plane/supervisor of the switch, then the packet will still have a shim header on it when it is received by the supervisor's inband interface. When you run Ethanalyzer normally, the shim header is not displayed, as the vast majority of the time, the user is not interested in the data within the shim header. However, in our scenario, the shim header contains ingress interface information that is valuable to us!

Therefore, we need to use Ethanalyzer with the decode-internal keyword, which will allow us to see the shim header (and, therefore, the ingress interface of these ARP packets.) I have included an example from my lab below, wherein an ARP Reply packet from 10.100.0.81 to 10.100.0.82 (which is configured on the local switch) is received by the control plane of the switch. Please note that I have cut out the irrelevant bits of the shim header, as the shim header is quite lengthy - only the "Source Port" element of the shim header is shown below. Also note that Ethanalyzer reports that this is a CPU-inbound packet, which will be important in your environment for distinguishing ingress ARP packets from egress ARP packets.

N9K# ethanalyzer local interface inband decode-internal display-filter arp limit-captured-frames 0 detail
<snip>
CPU-inbound Broadcom RCPU (88650)
    .001 1101 = Source Port: 29    <<<
Ethernet II, Src: 00:de:fb:fb:50:e7 (00:de:fb:fb:50:e7), Dst: e8:65:49:94:80:ff (e8:65:49:94:80:ff)
    Destination: e8:65:49:94:80:ff (e8:65:49:94:80:ff)
        Address: e8:65:49:94:80:ff (e8:65:49:94:80:ff)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Source: 00:de:fb:fb:50:e7 (00:de:fb:fb:50:e7)
        Address: 00:de:fb:fb:50:e7 (00:de:fb:fb:50:e7)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
    Type: 802.1Q Virtual LAN (0x8100)
802.1Q Virtual LAN, PRI: 0, CFI: 0, ID: 4095
    000. .... .... .... = Priority: 0
    ...0 .... .... .... = CFI: 0
    .... 1111 1111 1111 = ID: 4095
    Type: ARP (0x0806)
    Trailer: 000000000000000000000000000000000000
Address Resolution Protocol (reply)
    Hardware type: Ethernet (0x0001)
    Protocol type: IP (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: reply (0x0002)
    [Is gratuitous: False]
    Sender MAC address: 00:de:fb:fb:50:e7 (00:de:fb:fb:50:e7)
    Sender IP address: 10.100.0.81 (10.100.0.81)
    Target MAC address: e8:65:49:94:80:ff (e8:65:49:94:80:ff)
    Target IP address: 10.100.0.82 (10.100.0.82)

At first glance, this Source Port doesn't seem terribly useful - after all, modular switches that run NX-OS do not have a "Port 29"! This is because this integer is the internal port identifier for the ingress interface. We must translate this internal port identifier to a front-panel port that you may recognize through the show interface hardware-mappings command, as shown below:

N9K# show interface hardware-mappings
<snip>
Legends:
       SMod  - Source Mod. 0 is N/A
       Unit  - Unit on which port resides. N/A for port channels
       HPort - Hardware Port Number or Hardware Trunk Id:
       HName - Hardware port name. None means N/A
       FPort - Fabric facing port number. 255 means N/A
       NPort - Front panel port number
       VPort - Virtual Port Number. -1 means N/A
       Slice - Slice Number. N/A for BCM systems
       SPort - Port Number wrt Slice. N/A for BCM systems
       SrcId - Source Id Number. N/A for BCM systems
       MacIdx - Mac index. N/A for BCM systems
       MacSubPort - Mac sub port. N/A for BCM systems

------------------------------------------------------------------
Name       Ifindex  Smod Unit HPort HName FPort NPort VPort SrcId
------------------------------------------------------------------
Eth1/5     1a000800 1    0    29    xe16  255   16    -1    0   

Based upon the above output, we can confirm that the ingress interface for this specific ARP Reply packet is Ethernet1/5. This makes logical sense, as in my lab, Ethernet1/5 owns the 10.100.0.82 IP address that the ARP Reply is targeting!
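With many ports, this lookup is easier to script than to do by eye. The following Python sketch parses a saved copy of the show interface hardware-mappings table and returns the interface names whose HPort matches. It assumes the column layout shown above, and `ports_for_hport` is a hypothetical helper name. On a chassis with several ASIC instances the same HPort value may appear on more than one unit, so the function returns a list rather than a single name.

```python
def ports_for_hport(text, hport):
    """Return the interface names whose HPort column equals `hport`."""
    header = None
    names = []
    for line in text.splitlines():
        fields = line.split()
        # The header row names the columns; data rows follow it.
        if fields[:2] == ["Name", "Ifindex"]:
            header = fields
            continue
        # Skip legend lines, separators, and anything not shaped like a row.
        if header is None or len(fields) != len(header):
            continue
        row = dict(zip(header, fields))
        if row["HPort"].isdigit() and int(row["HPort"]) == hport:
            names.append(row["Name"])
    return names


# Table rows copied from the lab output above.
SAMPLE = """\
------------------------------------------------------------------
Name       Ifindex  Smod Unit HPort HName FPort NPort VPort SrcId
------------------------------------------------------------------
Eth1/5     1a000800 1    0    29    xe16  255   16    -1    0
"""
print(ports_for_hport(SAMPLE, 29))
```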

N9K# show running-config interface Ethernet1/5
<snip>
interface Ethernet1/5
  mtu 9216
  no ip redirects
  ip address 10.100.0.82/30
  no ipv6 redirects
  ip ospf network point-to-point
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
  no shutdown

In your specific environment, we should focus on a single VLAN to make things simple. The switch reports it is receiving ARP packets in VLAN 948 with a source MAC address of 0062.ec02.dbbf. Therefore, you should be able to use the below command to capture ingress ARP packets:

switch# ethanalyzer local interface inband decode-internal display-filter "arp && eth.addr==0062.ec02.dbbf && vlan.id==948" limit-captured-frames 0 detail 

A key thing to note - this will capture both ingress and egress ARP traffic. Chances are fairly high that you will capture a number of "false positive" egress ARP packets before you catch a suspect ingress ARP packet that is triggering this syslog. Remember to look for the "CPU-inbound" text at the top of each shim header - this distinguishes an ingress ARP packet from an egress ARP packet. I would recommend running the terminal length 0 command to disable terminal paging, logging a PuTTY session to the switch while this command runs for about an hour (or until you next see the syslogs trigger), then analyzing the PuTTY session log in a text editor to find any ingress ARP traffic.
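Sifting through an hour of session log by hand is tedious, so a short script can do the first pass. This Python sketch assumes each decoded frame in the log starts with a line containing "Broadcom RCPU", followed by a "Source Port:" line and an "Ethernet II, Src:" line, as in the lab output earlier in this post. Other platforms or releases may format the shim header differently, and `inbound_source_ports` is a hypothetical helper name.

```python
import re


def normalize_mac(mac):
    """Reduce any MAC notation (0062.ec02.dbbf, 00:62:ec:...) to bare hex."""
    return re.sub(r"[^0-9a-f]", "", mac.lower())


def inbound_source_ports(capture_text, local_mac):
    """Return the shim-header Source Port of every CPU-inbound frame
    whose Ethernet source address equals `local_mac`."""
    want = normalize_mac(local_mac)
    ports = []
    inbound = False
    port = None
    for line in capture_text.splitlines():
        stripped = line.strip()
        if "Broadcom RCPU" in stripped:
            # Start of a new frame: remember its direction, reset the port.
            inbound = stripped.lower().startswith("cpu-inbound")
            port = None
            continue
        m = re.search(r"Source Port:\s*(\d+)", stripped)
        if m and port is None:
            port = int(m.group(1))
            continue
        m = re.match(r"Ethernet II, Src: ([0-9A-Fa-f:]+)", stripped)
        if m and inbound and normalize_mac(m.group(1)) == want:
            ports.append(port)
    return ports


# Illustrative input: the first frame reuses the lab shim header with the
# MAC from the syslog substituted in; it is not a real capture.
SAMPLE = """\
CPU-inbound Broadcom RCPU (88650)
    .001 1101 = Source Port: 29
Ethernet II, Src: 00:62:ec:02:db:bf (00:62:ec:02:db:bf), Dst: ff:ff:ff:ff:ff:ff (ff:ff:ff:ff:ff:ff)
CPU-inbound Broadcom RCPU (88650)
    .001 1101 = Source Port: 29
Ethernet II, Src: 00:de:fb:fb:50:e7 (00:de:fb:fb:50:e7), Dst: e8:65:49:94:80:ff (e8:65:49:94:80:ff)
"""
print(inbound_source_ports(SAMPLE, "0062.ec02.dbbf"))
```

Each returned Source Port can then be translated to a front-panel interface with the hardware-mappings table.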

I know this is a lot of information, but I hope you find it helpful in troubleshooting this issue! Thank you!

-Christopher

 


Thank you for the level of detail. I found that very helpful.

There are a significant number of switches in this environment. I've attached txt files which contain the output of the requested commands.

I executed the ethanalyzer command and monitored the syslog entries while it ran. I got a few entries in syslog while I was executing the command. When I reviewed the output of the command, I didn't find any traffic that had a source with the MAC address 0062.ec02.dbbf even though I got the following messages in syslog:

Sep 30 16:31:35 sw2 : 2020 Sep 30 21:31:35.556 UTC: %ARP-4-OWN_SRCMAC: arp [14128] Received packet with a local source MAC address (0062.ec02.dbbf) from 10.76.48.3 on Vlan948
Sep 30 16:30:56 sw2 : 2020 Sep 30 21:30:56.757 UTC: %ARP-4-OWN_SRCMAC: arp [14128] Received packet with a local source MAC address (0062.ec02.dbbf) from 10.76.48.3 on Vlan948
Sep 30 16:30:44 sw2 : 2020 Sep 30 21:30:44.750 UTC: %ARP-4-OWN_SRCMAC: arp [14128] Received packet with a local source MAC address (0062.ec02.dbbf) from 10.76.48.3 on Vlan948
Sep 30 16:30:16 sw2 : 2020 Sep 30 21:30:16.648 UTC: %ARP-4-OWN_SRCMAC: arp [14128] Received packet with a local source MAC address (0062.ec02.dbbf) from 10.76.48.3 on Vlan948

During this window of time, the switch itself did not send out any ARP requests. All captured data showed the MAC/IP as the destination, never the source. I went back through and double-checked every switch in the environment to see if any had this MAC address assigned in any way. The only information I came across was that the primary switch in this pair had it via the vPC Peer-Link(R).

Legend: 
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
*  963     0062.ec02.dbbf   dynamic  0         F      F    Po1000
*   18     0062.ec02.dbbf   static   -         F      F    vPC Peer-Link(R)
* 3450     0062.ec02.dbbf   static   -         F      F    vPC Peer-Link(R)
*  900     0062.ec02.dbbf   static   -         F      F    vPC Peer-Link(R)
*  904     0062.ec02.dbbf   static   -         F      F    vPC Peer-Link(R)
*  908     0062.ec02.dbbf   static   -         F      F    vPC Peer-Link(R)
*  916     0062.ec02.dbbf   static   -         F      F    vPC Peer-Link(R)
*  932     0062.ec02.dbbf   static   -         F      F    vPC Peer-Link(R)
*  944     0062.ec02.dbbf   static   -         F      F    vPC Peer-Link(R)
*  946     0062.ec02.dbbf   static   -         F      F    vPC Peer-Link(R)
*  948     0062.ec02.dbbf   static   -         F      F    vPC Peer-Link(R)
* 3800     0062.ec02.dbbf   static   -         F      F    vPC Peer-Link(R)
*  999     0062.ec02.dbbf   static   -         F      F    vPC Peer-Link(R)

sw2# show mac address-table address 0062.ec02.dbbf
Legend: 
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
G    -     0062.ec02.dbbf   static   -         F      F    sup-eth1(R)
G   18     0062.ec02.dbbf   static   -         F      F    sup-eth1(R)
G 3450     0062.ec02.dbbf   static   -         F      F    sup-eth1(R)
G  900     0062.ec02.dbbf   static   -         F      F    sup-eth1(R)
G  904     0062.ec02.dbbf   static   -         F      F    sup-eth1(R)
G  908     0062.ec02.dbbf   static   -         F      F    sup-eth1(R)
G  916     0062.ec02.dbbf   static   -         F      F    sup-eth1(R)
G  932     0062.ec02.dbbf   static   -         F      F    sup-eth1(R)
G  944     0062.ec02.dbbf   static   -         F      F    sup-eth1(R)
G  946     0062.ec02.dbbf   static   -         F      F    sup-eth1(R)
G  948     0062.ec02.dbbf   static   -         F      F    sup-eth1(R)
G  960     0062.ec02.dbbf   static   -         F      F    sup-eth1(R)
G  963     0062.ec02.dbbf   static   -         F      F    sup-eth1(R)
G 3800     0062.ec02.dbbf   static   -         F      F    sup-eth1(R)
G  995     0062.ec02.dbbf   static   -         F      F    sup-eth1(R)
G  999     0062.ec02.dbbf   static   -         F      F    sup-eth1(R)

I continued to work on this issue this morning and found a way to reliably reproduce it. I ran the ethanalyzer command on both the primary and secondary switches, even though the MAC address belongs only to the secondary switch. I saw ARP requests from sw2 going through sw1 toward a switch that, so far, is connected only to sw1. I would then see the reply on sw2 with the far-end switch's MAC address as the source. When I checked syslog, I saw a message around that same time.

I cleared the ARP entry on sw2 for the single-homed switch and noted that I again saw the request on sw1 and the reply on sw2. This also corresponded to a syslog entry at the exact same time.

I checked the hardware-mappings for that source port on sw2 and verified that it came in on the vPC link between sw1 and sw2. As I had mentioned yesterday, sw1 does have the sw2 MAC address stored as a static entry on the vPC Peer-Link(R). It appears that sw1 is using this static MAC address (the address from sw2) to send information along to sw2. This is resulting in sw2 seeing its own MAC address as the source.


Do you have any suggestions on what I could review at this point? It appears that either due to cabling or STP, ARP requests that originate on sw2 and that flow through sw1 see the response MAC address as its own local MAC instead of a MAC address originating from sw1. It appears that this is due to the sharing of MAC addresses on the vPC link. Is there anything I could do to correct this behavior?
