Broadcast storm caused by CSS

VanDeynse
Level 1

Last night we migrated another application to the CSS, and it is giving us a bad headache now. Any ideas about our problem would be helpful; otherwise we have to fall back.

We have a VLAN with all kinds of hardware and 4 CSS in it, working as 2 redundant pairs: 2 of type 11150 and 2 of type 11501, each pair serving a different system (they have nothing to do with each other). (In total we have 9, spread over this and other LANs.)

From sniffing this morning I found out:

A server in the network where the CSS VIPs live went dead. Our routers still had its IP-to-MAC mapping in their ARP cache, and a monitoring platform kept sending messages/pings to the dead server. Since our switches no longer have that MAC address in their forwarding tables, each packet is flooded to all interfaces of that VLAN, including the ones of our CSS. The 2 old CSS just ignore it. The other 2, the 11501s, behave dangerously: they accept the packets, find that they belong back the way they came, and send them back. The number of packets accumulates over and over until the LAN overflows: the full 100 Mb interfaces of the CSS are saturated, the application doesn't work anymore, users are on the phone, etc. Powering off one of them, the backup, logically stops the storm.
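To make the mechanism concrete, here is a toy model of the flooding described above. It is only a sketch under stated assumptions, not a diagnosis: the switch floods an unknown-destination frame to every port except the one it arrived on, each misbehaving CSS sends every such frame back, and each trip through a CSS costs one TTL hop. The function name and the TTL value are hypothetical.

```python
def frames_on_css_ports(reflectors: int, ttl: int) -> int:
    """Count frames crossing CSS-facing ports for ONE flooded packet.

    reflectors: number of devices that send the flooded frame back.
    ttl: hops the packet survives (assumes each reflection decrements it).
    """
    total = 0
    in_flight = reflectors  # the initial flood delivers one copy to each reflector
    while in_flight > 0:
        total += in_flight  # copies arriving at the reflectors
        if ttl <= 1:
            break  # packet expires instead of being sent back
        ttl -= 1
        total += in_flight  # copies sent back towards the switch
        # The switch floods each returned copy to every OTHER reflector,
        # never back out of the port it came in on.
        in_flight *= reflectors - 1
    return total


print(frames_on_css_ports(reflectors=1, ttl=64))  # single reflector: the frame dies out
print(frames_on_css_ports(reflectors=2, ttl=64))  # two reflectors: it bounces until TTL expires
```

In this model a single reflecting device cannot sustain the traffic, because the switch never floods a frame back out of its ingress port, while two reflecting devices keep every packet alive for its whole TTL. That would be consistent with the storm stopping when one CSS is powered off, but it remains an assumption until a trace confirms it.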

This problem can happen again at random times, and it didn't happen during 3 months of testing. Today I tried to power the backup up again, but the storms started over and over again. What did Murphy say again?

I powered the 2 old ones down last night, but the problem still persists.

The only thing I can come up with is to narrow the incoming access list so that it allows only traffic between the 2 CSS and towards the VIPs on them. But I'm not sure this will work, and I can't do it right now since I've got a couple of hundred sessions on the device, causing a continuous throughput of 3 to 4 Mbps.

Any ideas about the nature of this behaviour on those 2 CSS? The other 2 in the same segment don't act this way.
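The filtering idea above can be sketched as a simple decision function. This is not CSS ACL syntax, only the intended logic; all addresses are hypothetical values from the documentation ranges and do not come from this network.

```python
from ipaddress import ip_address

# Hypothetical addresses, for illustration only.
CSS_PEERS = {ip_address("192.0.2.2"), ip_address("192.0.2.3")}  # the two CSS
VIPS = {ip_address("192.0.2.10"), ip_address("192.0.2.11")}     # VIPs hosted on them


def permit(src: str, dst: str) -> bool:
    """Return True if the packet would pass the narrowed incoming access list."""
    s, d = ip_address(src), ip_address(dst)
    if s in CSS_PEERS and d in CSS_PEERS:
        return True  # traffic between the two CSS
    return d in VIPS  # client traffic towards a VIP; everything else is dropped


print(permit("198.51.100.7", "192.0.2.10"))  # monitoring host to a VIP
print(permit("198.51.100.7", "192.0.2.50"))  # flooded packet for the dead server
```

A real rule set would probably also have to allow return traffic from the servers behind the VIPs, which this sketch leaves out.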

2 good css of type 11150 version 6.10 build 201

2 bad css of type 11501 version 6.20 build 3

Upgrading is not really an option, since all the higher versions I tried have problems with HTTP polling towards an ASP page.

Hans

5 Replies

d.parks
Level 1

This sounds like a bridge loop. How are your CSS's configured in terms of redundancy? You generally should not have both CSS's in a redundant pair forwarding traffic at the same time.

Do both sets of CSS's link the same VLANs? If so, you may have a loop there somewhere as well. Is spanning tree enabled?

Thanks for the idea.

The 2 pairs of CSS handle different subnets, so a loop is not possible there.

The CSS are configured as active/passive with an ISC between them. So it is indeed strange that they both act this way; actually neither of them should. I see no difference between the configuration of the 2 working ones and the 2 failing ones.

What do you mean by spanning tree enabled? On our backbone switches this is the case, yes. Do you think something in this area needs to be configured on the CSS as well? If so, could you show me an example?

rgrds

Hans

I've checked the spanning tree configuration on all 4 CSS, and it is enabled on each of them with more or less the same parameters. So that looks fine too. Any further ideas?

HANS

There is something strange in your description.

You say that your switches forward a packet for an unknown address to all their ports [ok] and that both CSS11501 accept the packet and send it back the way it came from [ok - that would be a big problem].

BUT if you disconnect one of the CSS the problem goes away.

I don't see why shutting down 1 CSS would impact the behavior described above.

Could you upload a sniffer trace of the traffic in this VLAN [filtered to just 100 packets] so we can see what you described?

Could you capture a 'script play showtech' on both CSS at the time of the problem?

Upload everything to the forum or send me an email to gdufour@cisco.com

Thanks,

Gilles.

MARK BAKER
Level 4

I have seen a bridge loop caused by a CSS. The configuration was to have a CSS connected to two 6500 switches for redundancy. The CSS does not use the same spanning-tree multicast address as the 6500 switches. This should not be a problem because the multicast traffic should pass through the CSS and be received by the other 6500 which would then detect and block the port connected to the CSS avoiding a L2 loop.

This seemed to work fine in the lab, but when it was put on a live network I would see what I believe is the following behavior: The CSS buffer was overwhelmed by the traffic on the subnet it was connected to. This would cause the spanning-tree traffic through the CSS to be dropped. This would lead to a major spanning-tree loop that would eventually take down the entire campus network.

If you are using two interfaces connected to the same vlan, this could be the case. If you check your root bridge on a switch it will be different from the one seen by the CSS. The CSS will see itself as the root.
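The root bridge check above can be illustrated with a small sketch of spanning-tree root election, where the bridge with the lowest bridge ID (priority first, then MAC address) wins. It assumes, as described in this post, that the CSS and the switches do not process each other's BPDUs and therefore form separate election domains. The names, priorities and MAC addresses are hypothetical.

```python
from typing import NamedTuple


class Bridge(NamedTuple):
    name: str
    priority: int
    mac: str  # colon-separated hex


def elect_root(bridges: list[Bridge]) -> Bridge:
    """Lowest (priority, MAC) wins, as in standard spanning-tree root election."""
    return min(bridges, key=lambda b: (b.priority, int(b.mac.replace(":", ""), 16)))


# Hypothetical devices, for illustration only.
sw1 = Bridge("6500-a", 8192, "00:00:5e:00:53:01")
sw2 = Bridge("6500-b", 32768, "00:00:5e:00:53:02")
css = Bridge("css", 32768, "00:00:5e:00:53:03")

print(elect_root([sw1, sw2]).name)  # root as seen by the switches
print(elect_root([css]).name)       # root as seen by a CSS that hears only its own BPDUs
```

If the two views name different roots, the CSS and the switches are not taking part in the same spanning tree, which is the situation to look for.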

The only reason I had two links in the same vlan was that I had two CSS in redundancy. One was a 11500 and the other a 11050. I wanted the 11500 to be used as the primary even if the primary switch failed. I eventually removed the second link and it ran fine after that. I would rather use the 11050 if the primary switch failed than to cause another L2 loop.

Hope this helps