
Weird CSS/3550 VLAN problem...STP related?

ajlessard
Level 1

Hi, all...

I have a pair of CSS load balancers, each connected to a pair of cross-connected 3550 switches (I'm big on redundancy :)), and I'm using STP for redundancy, with the bridge priority set such that the first CSS is the root. STP seems configured fine, and works well, with the cross-connect between 3550s blocked, as well as one of the links from the second CSS to the second 3550...when I bring down the first 3550, access to devices is still available through the second 3550, blocking goes away, etc., and life is great.

The problem comes when I bring the first 3550 back up. STP kicks in, all the blocking goes back into effect, all is great and as it was before...except that the rebooted 3550 will not, absolutely not, allow IP traffic through anymore from the CSS's on the VLAN. I can't ping the 3550's IP address from CSS or the CSS from the 3550, etc., even though the VLAN configs all look fine.

So I thought it might have something to do with STP, and I actually, over a period of time, shut down every single interface on the first 3550 (including the trunk one), except the one up to the first CSS, rebooted the 3550, etc., all to no avail...even when the 3550 only had one interface, up to the first CSS, they still couldn't ping each other. Ah, but when I rebooted the CSS's, all came back. Hmm...

So I set it up again, caused the failure, saw it all happen again, only this time I shut down all interfaces on the first 3550 except the one from it to the other 3550...and IP works just fine through this link; the CSSs can ping the first 3550 through their links to the second 3550 and its connection to the first 3550. The only trouble appears to be when the CSSs are connected directly to the rebooted 3550 instead; this apparently is something between the CSS and the 3550, and no amount of 3550 reboots, reconfigs, etc., seems to fix this until the CSS reboots.

I can't have this, obviously...if the 3550 dies and comes back, it takes over again for servers attached to it but can't actually pass IP traffic until I reboot the CSSs. I've verified, btw, that this is true for the other 3550...that is, if the other 3550 goes down and comes back, it can't pass IP traffic through the CSS either, but can if the crossover to the other 3550 is active.

Any ideas? This has flummoxed me...when I got down to just the 3550 connected by one port up to one CSS, and they still couldn't see each other, I almost cried. :) Needless to say, any help or pointers would be appreciated...

Arthur

7 Replies

pflunkert
Level 4

Hi Arthur,

the CSS supports configuration of Spanning-Tree Protocol (STP) bridging for an Ethernet interface in a VLAN or for a trunked Ethernet interface. Spanning-tree bridging is used to detect, and then prevent, loops in the network. You can define the bridge spanning-tree path cost, priority, and state for an Ethernet interface or for a trunked Ethernet interface. Ensure you configure the spanning-tree bridging parameters the same on all switches running STP in the network.
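Since bridge priority decides which device becomes the STP root, here is a minimal Python sketch of the election rule: the lowest bridge ID wins, where the bridge ID is the priority followed by the MAC address as a tie-breaker. The priorities and MAC addresses below are hypothetical and are not taken from the posted configs; only the default priority of 32768 is the standard 802.1D value.

```python
# Sketch of STP root election: the bridge with the lowest bridge ID wins.
# A bridge ID compares priority first, then MAC address as a tie-breaker.

def bridge_id(bridge):
    """Return a sortable (priority, mac_as_int) tuple for one bridge."""
    mac_as_int = int(bridge["mac"].replace(":", ""), 16)
    return (bridge["priority"], mac_as_int)

def elect_root(bridges):
    """Return the name of the bridge with the lowest bridge ID."""
    return min(bridges, key=bridge_id)["name"]

# Hypothetical values: NOT taken from the configs attached to this thread.
bridges = [
    {"name": "CSS-1", "priority": 4096, "mac": "00:00:00:00:00:0c"},
    {"name": "CSS-2", "priority": 8192, "mac": "00:00:00:00:00:0b"},
    {"name": "3550-1", "priority": 32768, "mac": "00:00:00:00:00:0a"},
    {"name": "3550-2", "priority": 32768, "mac": "00:00:00:00:00:09"},
]

print(elect_root(bridges))
```

With these made-up values the first CSS is elected because it has the lowest priority, which matches the intent described in the original post. If two bridges share a priority, the lower MAC address wins.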

Your description looks like a bug, though. Which software version do you use on the CSS and on the Cat 3550? Can you post the configs from the CSS and Cat interfaces?

Regards

Peter

Hi, Peter...

Thanks for the response. STP actually *seems* to work fine in this configuration, with links being blocked appropriately and blocked links switching to forwarding when certain devices go down (we discovered all of this while doing failover testing for the site). It also all looks fine when the downed device, in this case the 3550, comes back up...STP re-blocks the appropriate interfaces, etc. However, the re-booted 3550 (in this sample case 3550-2) is unable to communicate using icmp or tcp through the CSS.

In thinking about this more last night, I really think the issue is the CSS: when the rebooted 3550 is attached to the network through the other 3550, traffic flows fine for it, including through the downstream CSSs. It's only when the rebooted 3550 is directly attached to one of the CSSs that IP from it seems completely blocked at the CSS, even though the links are up and forwarding. In fact, as I mentioned, I currently have the rebooted 3550 in question with only a single interface up, connecting it to one of the CSSs on one of the VLANs, so STP is not in the picture, and it still cannot talk to the CSS.

I've attached the configs for all 4 switches (removing the services, content rules, etc. for clarity). To be clear, there are actually 3 VLANs, and for a variety of reasons, the CSSs and 3550s have 3 sets of cross links to connect the VLANs in a redundant manner; the 3550s are configured for switchport access to keep the VLANs completely separate.

In VLAN101, CSS-1 2/3 connects to 3550-1 0/1, and connects 2/4 to 3550-2 0/1. CSS-2 2/3 connects to 3550-1 0/2 and connects 2/4 to 3550-2 0/2. The other two VLANs are connected in a similar manner using 2/5 and 2/6 with fa 0/9 and 0/10, etc. Finally, we also had it set up with the 3550s trunked together on 0/8.

As you can see, however, at this point I have shut down every port on 3550-2 except for port 0/1 connecting it back to CSS1 2/3; that is its only connection back into the network, on VLAN101. In the 5th and 6th attachments I've shown some command outputs: CSS1 (192.168.103.252) and 3550-2 (192.168.103.251) cannot ping each other, and 3550-2 cannot ping anything else on the 101 VLAN, even though it was able to before the initial reboot of 3550-2.
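To make the VLAN101 cabling easier to follow, here is a small Python sketch that models the links as data and computes which devices are physically reachable when a switch or some ports are down. The port pairings come from the VLAN101 paragraph above; the assumption that the inter-switch trunk uses 0/8 on both 3550s, and the function name `reachable`, are illustrative only. STP blocking is deliberately ignored, so this shows cabling, not forwarding state.

```python
from collections import deque

# VLAN101 cabling: each link joins two (device, port) endpoints.
LINKS = [
    (("CSS-1", "2/3"), ("3550-1", "0/1")),
    (("CSS-1", "2/4"), ("3550-2", "0/1")),
    (("CSS-2", "2/3"), ("3550-1", "0/2")),
    (("CSS-2", "2/4"), ("3550-2", "0/2")),
    (("3550-1", "0/8"), ("3550-2", "0/8")),  # trunk; 0/8 on both ends is assumed
]

def reachable(start, down_devices=(), down_ports=()):
    """Return the set of devices physically reachable from start."""
    down_devices = set(down_devices)
    down_ports = set(down_ports)
    seen = {start}
    queue = deque([start])
    while queue:
        device = queue.popleft()
        for end_a, end_b in LINKS:
            # A link is unusable if either of its ports is shut down.
            if end_a in down_ports or end_b in down_ports:
                continue
            for here, there in ((end_a, end_b), (end_b, end_a)):
                neighbor = there[0]
                if (here[0] == device and neighbor not in down_devices
                        and neighbor not in seen):
                    seen.add(neighbor)
                    queue.append(neighbor)
    return seen

# Failover test: 3550-1 is powered off.
print(sorted(reachable("CSS-1", down_devices={"3550-1"})))

# Isolation test: every port on 3550-2 is shut except 0/1.
print(sorted(reachable("3550-2",
                       down_ports={("3550-2", "0/2"), ("3550-2", "0/8")})))
```

In both cases the cabling alone still leaves a path to every remaining device, which is why a 3550-2 that cannot ping CSS-1 over its single live port points at the link or the CSS port rather than at the topology.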

I really appreciate the help; let me know if you need any other info.

Arthur

Hi Arthur,

Please try setting speed and duplex to 10 Mbit/s, full duplex.

Which CSS do you use (hardware and software)?

Regards

Peter

Hi, Peter...

Resetting both sides of the link to 10M/FD had no effect (just tried it). The CSS's use sg0730005 (07.30.0.05), and they are 11503's with one 16-port card.

I believe the CSS is "stuck" and either won't send or receive on the link properly (don't know which) until a reboot clears some internal stack or process. Unfortunately it's not a one-shot deal: doing these things in this order seems to always put the CSS's into this state. Short of setting up a sniffer, is there a way to use debug on the 11503 to determine if it's seeing the incoming packets from the 3550, or what is going on? I've never gotten good use out of the "flow" commands in llama mode on the CSS's. They don't seem to let me limit them to specific IP or MAC addresses even though the documentation says they will, and I get flooded with packet announcements.

Thanks...

Arthur

Hi Arthur,

I found a bug, but I'm not sure whether it is responsible for the error. The bug ID is CSCed32955, and here is the description from the release notes:

Problem:

After power-cycling the Cat2950, the Rx port on the CSS stops incrementing. The Tx port functions properly. Code review indicates that there are more places that we need to check for the FIFO error as indicated by the INTEL Errata #10.

Scenario:

First connect CAT2950 that has LXT9785 with C2 revision on the ethernet ports to the CSS11501 and remove and put the power back on the CAT2950, then the CSS11501 port that connected to the CAT2950 would get stuck following the power up on the CAT2950.

Workaround:

The workaround is to reboot the CSS. To avoid this issue, configure both the Cat2950 and the CSS for a speed of 10 megabits per second.

However, the bug is fixed in version 7.40(0.4). Furthermore, the workaround doesn't work in your case. I will search for another solution tomorrow.
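The symptom in the bug description (Rx stops incrementing while Tx keeps working) can be checked by comparing two snapshots of a port's counters taken some time apart. The Python sketch below shows only the comparison logic; the counter names `rx_frames` and `tx_frames` and the sample numbers are hypothetical, and how the snapshots are collected from the CSS is left open.

```python
def rx_stuck(before, after):
    """Return True when Tx advanced between two snapshots but Rx did not."""
    tx_delta = after["tx_frames"] - before["tx_frames"]
    rx_delta = after["rx_frames"] - before["rx_frames"]
    return tx_delta > 0 and rx_delta == 0

# Hypothetical snapshots of one CSS port, taken a minute apart.
snapshot_1 = {"rx_frames": 120450, "tx_frames": 98310}
snapshot_2 = {"rx_frames": 120450, "tx_frames": 98377}

print(rx_stuck(snapshot_1, snapshot_2))
```

A result of True on the CSS port facing the rebooted 3550 would be consistent with the bug; a port where neither counter moves is more likely just idle or down.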

Regards

Peter

That sounds very similar to what I'm experiencing. Upgrading to 7.40.0.4 will fix this issue?

Arthur

Hi Arthur,

Yes, with 7.40.0.4 it should be fixed. Please try it when possible. If it doesn't work, please post here and I will look for another solution.
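The thread writes the versions in several notations (07.30.0.05, 7.40(0.4), 7.40.0.4). As a rough sketch, the Python snippet below normalizes them to tuples of integers so they can be compared. It assumes that field-by-field numeric comparison matches the release ordering, which is an assumption for illustration and not a documented Cisco rule; `parse_version` is a hypothetical helper.

```python
import re

def parse_version(text):
    """Extract the numeric fields of a version string as a tuple of ints."""
    return tuple(int(field) for field in re.findall(r"\d+", text))

running = parse_version("07.30.0.05")   # version reported on the 11503s
fixed_in = parse_version("7.40(0.4)")   # version named in the bug note

# True would mean the running release already includes the fix.
print(running, fixed_in, running >= fixed_in)
```

Under that assumption the running release sorts before the fixed one, so the upgrade is worth trying.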

regards

Peter