Strange sniffer traffic on 3550s

innoval · ‎11-24-2004

Hi all,

Our current setup is a 3550 aggregator providing Gbit connections to four 3548s. We have 15 hubs plugged into switch 4, which is where all our end users connect.

We have been having some issues where users complain about network slowness when our backups run during the day. The backups run from the tape server on switch 1 to a database server on switch 2. When I place a sniffer on a hub (which uplinks to switch 4) I can see all traffic going back and forth for the backups. I moved these hubs to switch 3 and I cannot see that traffic anymore.

Can you please throw out some ideas as to why this is happening (I always thought switched traffic only went from A to B), and whether it is a config error or a hardware error?

Many thanks,

I.

Kevin Dorrell · ‎11-24-2004

I presume, because you do not say otherwise, that everything is on the same VLAN.

There are two strange things happening here:

1. Why is the aggregator 3550 forwarding the backup traffic to switch 4, and not just shuffling it back and forth between switches 1 and 2?

2. Why is switch 4 forwarding the traffic to the hubs, when it should know that the MAC addresses of both the tape server and the database server are on its uplink port?

Let's look at the second question first. While the backup is going on, go to switch 4 and interrogate its forwarding database. Can you see the MAC addresses of the tape server and the database server? They should be present, and they should be recorded on the uplink port.

Similarly, go to the aggregator, and search its forwarding table for the MAC addresses of the tape server and the database server. You should see them on the downlinks to their respective switches.

I cannot think why this flooding should be happening. Is the backup system using broadcast or multicast addresses between the tape and the database, or something like that? But that would not explain why switch 3 behaves correctly.

Let us know the results of the two tests I suggested.

Kevin Dorrell

Luxembourg

innoval · ‎11-24-2004

Thanks Kevin,

will give that a go. By "interrogate the fwd db" do you mean the "sh forward" command? I had a look at the command and it goes deep into the interface rather than giving a full map.

I.

marikakis · ‎11-24-2004

I would check the aggregator first.

Are there any SPAN sessions configured?

M.

innoval · ‎11-24-2004

Marikakis,

no SPAN and the aggr is a red herring. Only joined the cluster last week. Problem has been going on for weeks.

I.

innoval · ‎11-24-2004

Last minute addition:

I went through the configs for switches 3 and 4. They are identical except for one thing: switch 3 (the one with the problem) has an IP name-server that does not really exist on the LAN (or any LAN...). All other switches point to valid DNS servers.

Could that have anything to do with this or another red herring?

Thanks,

I.

Kevin Dorrell · ‎11-24-2004

No, I think that's a red herring. Interesting observation though, and one you might like to remedy.

Kevin Dorrell

Luxembourg

Kevin Dorrell · ‎11-24-2004

Just a minute: you say the aggregator is new. I take it that the structure of the cluster is a star based around the 3550. What was the structure before you added the aggregator?

Kevin Dorrell

Luxembourg

Kevin Dorrell · ‎11-24-2004

No, it should be sufficient just to show the forwarding entry for the MAC address of the tape server or the database server. The command is "show mac address-table address nnnn.nnnn.nnnn", and here is the doc:

http://www.cisco.com/univercd/cc/td/doc/product/lan/c3550/12225se/3550cr/cli2.htm#wp2416917

See which interface (if any) each of the switches (aggregator and switch 4) would use to get to the tape and/or database.
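The command above expects the MAC in dotted form (nnnn.nnnn.nnnn), while NIC utilities usually display it with hyphens or colons. A minimal sketch of the conversion follows; the function name to_cisco_mac and the sample address are made up for illustration.

```python
import re


def to_cisco_mac(mac: str) -> str:
    """Convert a MAC such as 00-0B-CD-12-34-56 to the nnnn.nnnn.nnnn form."""
    # Drop the usual separators (hyphen, colon, dot, whitespace), then validate.
    digits = re.sub(r"[-:.\s]", "", mac).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    # Regroup the 12 hex digits into three blocks of four.
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


# Hypothetical address, only to show the resulting command line.
print("show mac address-table address", to_cisco_mac("00-0B-CD-12-34-56"))
```

Running it for each NIC of the tape and database servers gives the exact strings to paste into the switch.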

Kevin Dorrell

Luxembourg

innoval · ‎11-24-2004

Thanks Kevin,

Prompt responses much appreciated. Before the aggregator, the 3 switches were arranged as 2 stacked together and 1 on its own. Currently they are all in a star. But bear in mind that these problems were there for months before "Aggy" was introduced.

I.

innoval · ‎11-24-2004

Kevin,

I will be running an ad hoc backup in a minute to test this. Is it worth running the command for every single NIC on these servers before the ad hoc backup, to see where things are going?

Kevin Dorrell · ‎11-24-2004

It might be an idea to get a hard copy of the forwarding tables in the two switches before you start. But the most important thing is to be aware of the MAC addresses of the tape and the database before you start, so you know what to look for.

Good luck, and let us know how it goes.

Kevin Dorrell

Luxembourg

innoval · ‎11-24-2004

Kevin,

just checked everything (I think it was everything) on the main switches.

1) Aggy can resolve all out to its G0/1

2) All switches can resolve the NICs of the three servers

BUT I found something that looks like a problem:

Each server has 4 NICs. The burned-in MAC of the first NIC is also the virtual MAC of the team.

Switch 1 cannot resolve the 1st (virtual/primary) MAC for either tape server. It can resolve the MAC of the db server fine. All other switches can resolve those virtual MACs.

I can see the problem but I do not really know how to fix or why it happened.

Any help?

Kevin Dorrell · ‎11-24-2004

That could well be the problem. If Sw1 does not know where to send the MAC of the tape servers, then it will flood the packets that have the tape as destination. Where does Sw1 connect to? 'Cos all the ports will get a piece of the action.
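To illustrate the flooding behaviour described above, here is a toy model of a learning switch. It is only a sketch: the class name, port names and MAC labels are invented, and real switches also age entries out, which is not modelled here.

```python
class LearningSwitch:
    """Toy transparent bridge: learn source MACs, flood unknown destinations."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # MAC -> port where it was last seen as a source

    def receive(self, in_port, src, dst):
        """Return the list of ports the frame is sent out of."""
        self.table[src] = in_port  # learning uses the source address only
        out_port = self.table.get(dst)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination lives on the ingress segment, so drop
        return [out_port]


sw1 = LearningSwitch(["uplink", "tape-port", "other-port"])
# The database sends to a tape MAC that Sw1 has not learned yet: flooded.
before = sw1.receive("uplink", src="db", dst="tape")
# The tape sends one frame with that MAC as source, so Sw1 learns it.
sw1.receive("tape-port", src="tape", dst="db")
after = sw1.receive("uplink", src="db", dst="tape")
print("before learning:", before)
print("after learning: ", after)
```

The point of the model is that a switch learns an address only when it sees it as a source; a MAC that is only ever used as a destination keeps being flooded.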

I bet Sw3 knows where to send frames for the tape units, but Sw4 does not. The question is why. Maybe there is someone on Sw3 who is talking to the tape unit, using the MAC address that the database server would be using.

You could try pinging the tape unit from something on Sw4 during the backup, and see if that stops the flooding.

These arrangements with multiple NICs are horrible for switched networks. Especially the ones that respond to an ARP with one address, and then send their traffic using a different one.

Try sniffing the exchange between the tape server and the database server. Is the destination MAC from database to tape different from the source address from tape to database? Or vice versa?
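One way to run that comparison on a capture is to list the destination MACs that never show up as a source. The sketch below assumes the frames have already been exported as (source, destination) pairs in dotted notation; the function names and all addresses are made up for illustration.

```python
def is_group_address(mac):
    """True for broadcast/multicast MACs (lowest bit of the first octet set)."""
    first_octet = int(mac.replace(".", "")[:2], 16)
    return bool(first_octet & 0x01)


def never_sourced(frames):
    """Return unicast destination MACs that never appear as a source."""
    sources = {src for src, _ in frames}
    return {
        dst
        for _, dst in frames
        if dst not in sources and not is_group_address(dst)
    }


# Hypothetical capture: (source MAC, destination MAC) per frame.
capture = [
    ("0000.0000.00db", "0000.0000.0a01"),  # database -> tape team MAC
    ("0000.0000.0a02", "0000.0000.00db"),  # tape replies from another NIC's MAC
    ("0000.0000.00db", "ffff.ffff.ffff"),  # broadcast, ignored by the check
]
print(never_sourced(capture))
```

Any address this reports is one the switches cannot learn from this conversation, so frames sent to it would be flooded.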

Often the only way to fix this is with static MAC forwarding entries.

Kevin Dorrell

Luxembourg

innoval · ‎11-24-2004

Hi Kevin,

thank you very much for the reply.

Bizarre thing is that they seem to come and go... From this same switch I could resolve the MAC from the table 5 minutes ago and then I could not again... Correct me if I am wrong but I do not think "dynamic" means it times out every x minutes...

Can you think of any disadvantages of static entries (apart from the margin of error and the admin cost)?

I.