Etherchannel - VLAN switching issue across port channels

Will Hrudey
Level 1

Hi All:

I initially posted my issue in the Cisco Small Business forums since it pertains to a pair of stacked SG500 switches. However, having received no replies, I am broadening the audience in the hope that someone has encountered this issue before or has suggestions on how to resolve it. That said, I fully recognize I am posting in the enterprise-grade routing and switching forum.

Nonetheless, I have a pair of SG500-28s in a remote datacenter, stacked with 5G stacking cables and running firmware rev 1.4.1. I have a number of application servers (WS2012 R2) with quad-port NICs (configured as NIC teams) whose ports are equally distributed across both physical switches of this logical switch stack.

On the switch side, each of the 8 physical servers has its four NIC ports distributed across the two physical switches and configured as a trunked port channel that carries a handful of VLANs.

I am finding that any two physical servers can't ping each other on the **same** VLAN. Each server can ping the respective VLAN interface IP on the switch, yet pings between the host systems fail.

Configuration is as follows:

Physical server #1 (virtual Switch Mgmt i/f in host):   172.16.11.41/24   (WS2012 R2)

Physical server #2 (virtual Switch Mgmt i/f in host):   172.16.11.42/24   (WS2012 R2)

 

 

SwitchStack #show run int vlan 11

interface vlan 11

   ip address 172.16.11.1 255.255.255.0

 

Physical server #1 switch config:

interface gigabitethernet1/1/1  

 no mdix

channel-group 1 mode auto

interface gigabitethernet1/1/2

 no mdix

channel-group 1 mode auto

interface gigabitethernet2/1/1

 no mdix

 description HPv01

 channel-group 1 mode auto

interface gigabitethernet2/1/2

 no mdix

channel-group 1 mode auto

 

#show run int po1

interface Port-channel1

switchport trunk allowed vlan add 2-4,10-13

 switchport trunk native vlan 99
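
(For reference, the fully spelled-out trunk config intended on Po1 is roughly the sketch below; the explicit switchport mode trunk line does not appear in the running-config excerpt above and may simply be the port's default mode on these switches.)

interface Port-channel1
 switchport mode trunk
 switchport trunk allowed vlan add 2-4,10-13
 switchport trunk native vlan 99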

 

 

Physical server #2 switch config:

 

interface gigabitethernet1/1/3

 no mdix

channel-group 2 mode auto

interface gigabitethernet1/1/4

 no mdix

channel-group 2 mode auto

interface gigabitethernet2/1/3

 no mdix

channel-group 2 mode auto

interface gigabitethernet2/1/4

 no mdix

channel-group 2 mode auto

 

interface Port-channel2

 switchport trunk allowed vlan add 2-4,10-13

 switchport trunk native vlan 99

 

 

#show int po1

Load balancing: src-dst-mac-ip.

Gathering information.....

Channel  Ports

-------  -----

Po1      Active: gi1/1/1-2,gi2/1/1-2

 

#show int po2

Load balancing: src-dst-mac-ip.

Gathering information.....

Channel  Ports

-------  -----

Po2      Active: gi1/1/3-4,gi2/1/3-4

 

SwitchStack#ping 172.16.11.41

Pinging 172.16.11.41 with 18 bytes of data:

 

18 bytes from 172.16.11.41: icmp_seq=1. time=0 ms

18 bytes from 172.16.11.41: icmp_seq=2. time=0 ms

18 bytes from 172.16.11.41: icmp_seq=3. time=0 ms

18 bytes from 172.16.11.41: icmp_seq=4. time=0 ms

 

----172.16.11.41 PING Statistics----

4 packets transmitted, 4 packets received, 0% packet loss

round-trip (ms) min/avg/max = 0/0/0

 

 

Switchstack#ping 172.16.11.42

Pinging 172.16.11.42 with 18 bytes of data:

 

18 bytes from 172.16.11.42: icmp_seq=1. time=0 ms

18 bytes from 172.16.11.42: icmp_seq=2. time=0 ms

18 bytes from 172.16.11.42: icmp_seq=3. time=0 ms

18 bytes from 172.16.11.42: icmp_seq=4. time=0 ms

 

----172.16.11.42 PING Statistics----

4 packets transmitted, 4 packets received, 0% packet loss

round-trip (ms) min/avg/max = 0/0/0

 

 

And of course the physical servers can ping 172.16.11.1 (the switch VLAN 11 interface). Yet the physical servers can't ping each other on that same VLAN (11).

What am I missing? I have done this configuration before on enterprise-level stacked C3750s and WS2012 R2 hosts without issue, so I think it must be something to do with the SG500 firmware and/or CLI config pertaining to default VLANs. What else could it be? This is L2-switched traffic between two nodes, with the intervening trunks simply carried over L2 EtherChannels. Both trunked EtherChannels between the two WS2012 R2 nodes are configured to carry VLAN 11. The switch stack ARP table looks good. I have confirmed in WS2012 R2 that the virtual switch bound to the 4-port NIC team is tagging VLAN 11.
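
For completeness, the kind of switch-side check that should confirm the L2 path is roughly this (a sketch; exact Sx500 show-command syntax may vary slightly by firmware). Both servers' MACs should show up on VLAN 11 against Po1 and Po2 respectively:

show mac address-table vlan 11
show arp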

Any ideas?

/wh

4 Replies

Will Hrudey
Level 1

Where are all these smart CCIEs hiding?

Hi.

I've never worked with that switch line before.  Out of curiosity, does a non-trunked etherchannel bundle work?  How about a single interface configured as a trunk with no etherchannel?
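
For instance, a quick non-trunked test could put the existing bundle into access mode on the VLAN in question (a rough sketch, with VLAN 11 used just as an example; exact Sx500 syntax may differ):

interface Port-channel1
 switchport mode access
 switchport access vlan 11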

Regards,

Jason

Edit: Also, what happens if you use trunked etherchannel ports on only one switch?

After further characterization, this issue is dynamic in nature. Specifically, pinging the other physical server's Hyper-V management partition (VLAN 4) from a similar machine yields intermittent success, actually more failure than success: the first round of 4 pings works, and then the exact same ping fails even though there have been no config changes. This immediately made me think of the ARP cache, given that this is purely switching logic within the VLAN, so I sniffed all the interfaces on all the application servers and observed the VLAN-specific ARP request (broadcast) and the ARP response (unicast L2) without issue, yet the ping still fails. I even went as far as putting static ARP entries on all the servers, without success (and I triple-checked that the Server 2012 firewall was disabled). There are no VLAN ACLs or protected ports. The behavior is VERY puzzling.

The only success I had was if I pulled all the NIC ports out of the Server 2012 R2 NIC team and configured just one NIC port on Server 2012 (basically bound a virtual switch to it and configured a management IP on VLAN 4), and then configured that single Ethernet port on the SG500 switch stack as a trunk with the exact same trunk configuration I had applied to the aggregated LACP port channel. That works fine, but it is a ridiculously handicapped solution.

I then tried endless incremental config changes and was eventually forced to upgrade to the latest firmware on the switch stack. I had been one minor rev below the latest (1.4.1), staying on that version because it had a published MIB on the Cisco software download page. After I upgraded the switch stack firmware (1.4.2), I was able to get as far as running 4 ports in a NIC team on Server 2012 in switch-INDEPENDENT mode, with each of the NIC ports on the switch stack configured as a trunk carrying the same trunk config I would have applied to the EtherChannel.

While I still can't definitively prove it, this all points to the SG500 firmware. The 1.4.2 release notes do mention some EtherChannel fixes, but they relate to fiber links, not copper. I still do NOT like this configuration: switch outbound traffic to the application server will not be load-balanced and will, I would envision, be dictated by the switch stack CAM table, which is indirectly driven by the outbound load-balancing scheme of the NIC team within Server 2012. Using an EtherChannel configuration (static or dynamic) would allow additional load balancing from the switch stack perspective.
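
For reference, the static versus dynamic distinction on the member ports is roughly this (a sketch; on these switches "auto" negotiates LACP while "on" forces a static channel, and exact syntax may differ slightly by firmware):

interface gigabitethernet1/1/3
 channel-group 2 mode on

i.e. replacing mode auto with mode on across all four members of Po2, leaving the Po2 trunk config unchanged.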

 

When looking back at my trunked multi-chassis EtherChannel design: since my physical servers are fitted with 4-port NICs, by design two ports from a given NIC connect to one physical switch in the stack and the other two connect to the other switch (it is a 2-node stack). I went as far as shutting down switch stack interfaces in a way that isolated specific physical paths. In one case I shut down the appropriate ports so that the source server's outbound traffic was steered onto a specific switch, and shut down the target server's port-channel members on the other switch, so the traffic would come in on Po1 and leave on Po2 on the SAME physical switch in the stack. This produced no change in the results. I then did the inverse, where traffic would come in on Po1 on one switch and leave on Po2 on the other switch in the stack. No change either.
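
Concretely, one of the isolation tests looked roughly like this (a sketch using my port numbers; the idea is to shut the unit-2 members of both Po1 and Po2 so that both servers' traffic stays on switch 1):

interface gigabitethernet2/1/1
 shutdown
interface gigabitethernet2/1/2
 shutdown
interface gigabitethernet2/1/3
 shutdown
interface gigabitethernet2/1/4
 shutdown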

Thus, while I have successfully achieved this configuration using enterprise-grade C3750s, it does not work as it should with these SG500s.

 

Specifically, I am convinced this is a port-channel issue on the SG500 switches. Any thoughts or related experiences? Is there a Cisco Small Business tech on this forum who can refute or confirm my observations?

 

Thanks,

/wh

Hi Will,

Can you disable the software firewall and software IPS on the servers and then test?

Also, check whether the servers can resolve each other's MAC addresses via ARP.

Thanks,

Shaunak
