Solved: Need help with NXOS cisco switch

crazyman143 · ‎05-22-2017

Hi,

Hoping someone can help me with our switch. We are new to the Nexus series and don't know what we're doing.

We have two PCs connected to ports 1 and 3 in vlan 50.

Port 16 is a trunk connected to another switch, which has vlan 50 and a SVI on vlan 50. It should act as default gateway for the PCs.

PCs can't ping each other, although they're in the same vlan.
PCs can't ping the gateway (10.100.50.1) over the trunk.
The SVI on this switch (10.100.50.2) is reachable anywhere on our network.
PCs can reach the SVI on this switch (10.100.50.2) but nothing else.

Can someone see why PCs cannot reach each other or their gateway? I assume I'm missing something simple.

Config attached.

This is a Nexus 3172T w/ PID: N3K-C3172TQ-10GT

NXOS: version 7.0(3)I2(2b)

FEATURE LAN_BASE_SERVICES_PKG

Thank you!

Peter Paluch · ‎05-22-2017

Hello,

Hmmm, let's see.

PCs can't ping each other, although they're in the same vlan.

This might be due to the firewall setting on the PCs. Can you try completely disabling the firewall on the PCs and try pinging them again? If that does not work, are the PCs at least able to populate their ARP caches with their mutual IP/MAC bindings?

PCs can't ping the gateway (10.100.50.1) over the trunk.

That would suggest that the packets are not being passed over the trunk. Can the N3K ping 10.100.50.1? If not, can we verify that both ends on the trunk link are statically configured as trunk? Note that, unlike Catalyst switches, Nexus switches do not support DTP and cannot negotiate a dynamic trunk, so if the other end is a Catalyst, we need to double-check that its port is statically configured as a trunk.
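
As a rough sketch of what "statically configured as trunk" looks like on each side, assuming e1/16 is the N3K uplink (port 16 in the original post) and using a hypothetical GigabitEthernet1/0/1 for the upstream Catalyst port:

```
! Nexus 3000 side (port 16 per the original post)
interface Ethernet1/16
  switchport mode trunk
  ! optional: restrict the trunk to the VLANs in use
  switchport trunk allowed vlan 50

! Upstream Catalyst side (interface name is hypothetical)
interface GigabitEthernet1/0/1
  ! only needed on platforms that also support ISL
  switchport trunk encapsulation dot1q
  switchport mode trunk
```

Running show interface trunk on the Nexus (show interfaces trunk on the Catalyst) should then list the port as trunking, with VLAN 50 among the allowed and forwarding VLANs.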

The SVI on this switch (10.100.50.2) is reachable anywhere on our network.

Can you clarify what you mean by "is reachable"?

PCs can reach the SVI on this switch (10.100.50.2) but nothing else.

This would mean that the PCs can actually successfully communicate with the switch. If the problem with the PCs unable to ping each other is resolved by disabling the firewall on them, then we need to focus on the trunk connectivity.

Looking forward to hearing from you!

Best regards,
Peter

crazyman143 · ‎05-22-2017

Thanks for replying Peter. I appreciate your help. To follow up:

Can you try completely disabling the firewall on the PCs and try pinging them again? If that does not work, are the PCs at least able to populate their ARP caches with their mutual IP/MAC bindings?

Firewalls are disabled.
CAM table shows the PCs' MAC addresses on their respective ports.
ARP cache shows their IP addresses and MACs as well.

The PCs can ping 10.100.50.2, but not each other. However, ping succeeds when I plug them into a TP-Link dumb switch.

Can the N3K ping 10.100.50.1? If not, can we verify that both ends on the trunk link are statically configured as trunk?

Yes, the N3K can ping 10.100.50.1 on the upstream switch over the trunk, but the PCs cannot.

Can you clarify what you mean by "is reachable"?

Reachable just meaning ping. The upstream switch is running in L3 and participates in EIGRP, so I can ping 10.100.50.2 on the N3K from elsewhere in the enterprise, but cannot ping the PCs connected to it.

Peter Paluch · ‎05-22-2017

Hello,

This is starting to be really interesting.

May I ask you to post the complete output of all of the following commands from the Nexus? Please first replace the "PC1" and "PC2" below with the proper IP addresses of the two PCs, then paste the complete list of commands into the CLI of your N3K switch, capture the complete output, and post it here. Thank you!

terminal length 0
show version
show module
show inventory
ping 10.100.50.PC1
ping 10.100.50.PC2
show ip arp 10.100.50.PC1
show ip arp 10.100.50.PC2
show mac address-table
show hardware mac address-table 1
show interface e1/1
show interface e1/3
show interface e1/1 counters errors
show interface e1/3 counters errors
show hardware internal interface e1/1 asic counters
show hardware internal interface e1/3 asic counters
slot 1 show hardware internal interface indiscard-stats instance 0 asic-port 1
slot 1 show hardware internal interface indiscard-stats instance 0 asic-port 3
show spanning-tree vlan 50 detail
terminal no length

Best regards,
Peter

crazyman143 · ‎05-23-2017

Ok, I'm uploading a file with the outputs. A couple of the commands returned errors.

The "slot 1 show hardware...." commands didn't work.

Hopefully you will see what you need. Thanks again!

Peter Paluch · ‎05-23-2017

Hi,

I am concerned about the Forward RxDrops reported for e1/1 and e1/3. A couple of suggestions:

Try repeating the pings from the PCs while watching the output of show hardware internal interface e1/X asic counters, and see whether the increase in the Forward RxDrops matches the number of pings sent from the pinging PC.
You mentioned that the slot 1 commands did not work - perhaps due to a different syntax. Can you please check whether these work instead?

slot 1 show hardware internal interface indiscard-stats front-port 1
slot 1 show hardware internal interface indiscard-stats front-port 3

For testing purposes, have you tried to shut/no-shut the interfaces? Also, what would happen if you created a totally new VLAN and put e1/1 and e1/3 into it? Could the two PCs then ping each other? (If you test this, wait at least 30 seconds after putting the ports into the new VLAN because of RSTP.)
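
The new-VLAN test could look roughly like this on the N3K. This is only a sketch: it assumes e1/1 and e1/3 are access ports, and VLAN 999 and its name are placeholders chosen for illustration (any unused VLAN ID would do).

```
configure terminal
! create a scratch VLAN (999 is a hypothetical, unused ID)
vlan 999
  name TEST
! move both PC ports into it and bounce them
interface Ethernet1/1
  switchport access vlan 999
  shutdown
  no shutdown
interface Ethernet1/3
  switchport access vlan 999
  shutdown
  no shutdown
end
! wait about 30 seconds, then confirm both ports are forwarding
show spanning-tree vlan 999
! compare the drop counters before and after a burst of pings
show hardware internal interface e1/1 asic counters
show hardware internal interface e1/3 asic counters
```

The PCs stay in the same IP subnet, so no addressing changes should be needed for this test.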

Thanks!

Best regards,
Peter

crazyman143 · ‎05-23-2017

Peter,

I think I've just had a light bulb moment. Or, at least a "DUH!" moment.

I re-ran your tests and indeed, the "RxDrop" and "ACL Drop" counters incremented in tandem with the pings.

It hit me when I saw "ACL Drop." I wondered what ACL could exist that is blocking the traffic?

All of the default class-maps that are defined at the top correspond to ACLs that didn't exist. I guess they got removed somehow. I created one for ping that permitted "any any" and voilà. Pings succeed.

So I guess now I just need to figure out how to either remove the class maps, or recreate the default ACLs.

EDIT: Read up on CoPP. I guess the default class maps can't be deleted. I re-ran the setup command and it recreated the default ACLs. I think we're in business now. Thanks for all your help!!!!
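
For anyone retracing this, a sketch of commands that should show whether the CoPP class-maps still have their ACLs behind them. The exact class-map and ACL names depend on the NX-OS release, so none are assumed here:

```
! is a CoPP policy attached to the control plane at all?
show copp status
! list the CoPP class-maps and the ACL names they match on
show class-map type control-plane
! list the ACLs that actually exist; every name referenced above should appear here
show ip access-lists
! per-class counters for the control-plane policy
show policy-map interface control-plane
```

A class-map that references an ACL name missing from the show ip access-lists output is the situation described in this post.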

Peter Paluch · ‎05-23-2017

Hi,

Well, this certainly explains something.

Did you perhaps upgrade your system lately? The "default class-maps" you are mentioning likely correspond to CoPP - Control Plane Policing, and between different NX-OS versions, there can be significant differences. We know of issues where such changes, when they go wrong, can impair connectivity.

I have to mention that CoPP only protects the CPU of the switch, and it is not concerned with transit traffic - at least it should not be. However, the matching rules for CoPP are programmed into TCAM - and if the programming fails, it can obviously start impairing transit traffic as well.

If you can afford to bring this switch out of operation for some 10 minutes, then this is what I strongly recommend:

Back up the running-config. However, from this backup, remove everything that is related to CoPP - the class-maps, the ACLs, the copp-system-policy, etc. Ideally, keep only those things you know you have added yourself to a factory-default configuration.
Erase the startup-config using "write erase".
Reload the switch without saving anything.
After the switch reloads, perform the initial configuration. As part of the initial setup dialog, the switch will generate a vanilla CoPP configuration.
After this, restore your running-config from the backup - remember to skip the CoPP-related configuration.
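
In CLI terms, the steps above might look roughly like this. The file name copp-free-backup.cfg is hypothetical, and the sketch assumes the backup is copied off the switch, edited in a text editor to strip the CoPP-related lines, and placed back on bootflash under the same name before the restore. The final save is also an assumption, not part of the steps above.

```
! 1. save a copy of the current configuration
copy running-config bootflash:copp-free-backup.cfg
!    (edit a copy of this file off-box: remove the CoPP class-maps, ACLs, and policy)
! 2. erase the startup-config and reload without saving
write erase
reload
! 3. complete the initial setup dialog, which regenerates the default CoPP configuration
! 4. merge the edited backup into the running-config
copy bootflash:copp-free-backup.cfg running-config
! 5. (assumption) save once connectivity has been verified
copy running-config startup-config
```

Copying a file into the running-config merges it with what is already there, which is why the CoPP lines have to be removed from the backup first.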

If this procedure is not applicable, then you might want to run the "setup" command, which will go through the common initial settings as if you had erased and reloaded the switch, and reapply them. This might restore the CoPP settings, but frankly, I do not consider it reliable, and I very much prefer the more radical approach above.

Would this be applicable for you? Please let me know.

Best regards,
Peter

crazyman143 · ‎05-23-2017

Peter- Yes, you are absolutely right.

Like I said, we are new to the Nexus OS. I just read up on CoPP and I understand what's going on here now.

The default ACLs which are part of the CoPP config were completely removed.

Fortunately this isn't a production switch; we are just testing it. I went ahead and re-ran the setup and once all the ACLs were created again, we were back in business.

Thanks so much for all your help.

Need help with NXOS cisco switch