There is an IBM BladeCenter with 12 blades, connected to the core switch (Cat6509 VSS) via two stacked built-in Cisco 3xxx switches. The links to the core are in an EtherChannel (20 Gbps in total). Another switch (Cat2960) is connected to the core via a single gigabit link. The servers in the BladeCenter are streaming servers, and developers are experiencing bad quality on the video streams received from them. The streams are sent via RTSP, and as the packet size and rate increase, the developers report packet loss. What could be the problem?
You need to look at a lot of things. First, check the interface output on your switches to see where you have output drops, or whether you have any other errors resulting from bad cabling etc.
Also, it sounds like in this sort of network you really should have QoS enabled on your switches to classify RTSP traffic and give it some priority over other "normal" data traffic. Of course you need to "know" your network and take into consideration other mission-critical data, time-sensitive applications and VoIP (if you have any of these).
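As a rough illustration of what that classification could look like, here is a sketch in IOS-style syntax. Everything in it is a placeholder: the ACL, class/policy names, the RTP port range and the DSCP value are assumptions, and exact command support varies by platform (e.g. "mls qos" on 2960/3xxx-class switches), so check your platform's QoS configuration guide before using anything like this.

```
! Sketch only -- names, ports and DSCP value are placeholders
mls qos
!
ip access-list extended RTSP-TRAFFIC
 permit tcp any any eq 554
 permit udp any any range 16384 32767   ! example RTP port range -- adjust to your servers
!
class-map match-all RTSP-CLASS
 match access-group name RTSP-TRAFFIC
!
policy-map MARK-RTSP
 class RTSP-CLASS
  set dscp af41
!
interface GigabitEthernet0/1
 service-policy input MARK-RTSP
```

The idea is simply to mark the streaming traffic at the edge so the rest of the network can queue it ahead of bulk data, in line with whatever corporate QoS policy you settle on.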
You should have a global corporate QoS policy.
Also, you need to monitor the network and interfaces (CiscoWorks? Cacti?) and baseline them. Sounds like you've got your work cut out. First, though, take a look at the switchports to see where the congestion (assuming it is congestion) is to begin with.
The network is in a test phase and there is no traffic other than streaming. I have viewed the interface output - there are no buffer counts or drops; the only counters changing are the input/output packet counts. We have done some interesting testing: when the client receives streams from a server (a laptop) connected to a switch (which connects to the core via another switch), there is no problem with video quality, but when the stream comes from a powerful server in the BladeCenter, the quality is low. The OS on the laptop is Windows XP; on the blade server it is SUSE Linux. So we now suspect two main factors: the problem is either in the OS (SUSE Linux) or in the Cisco 3xxx switch (it is built into the IBM BladeCenter). Could there be anything else?
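Since the switch counters look clean, it may help to quantify the loss at the client instead of judging by picture quality. RTP (which RTSP normally sets up) carries a 16-bit sequence number, so gaps in the received sequence numbers correspond directly to lost packets. A minimal sketch (a hypothetical helper, not part of any tool mentioned in this thread) assuming in-order arrival:

```python
def count_lost_packets(seq_numbers):
    """Count missing RTP packets from a list of received 16-bit
    sequence numbers. Assumes in-order arrival and handles the
    wraparound from 65535 back to 0."""
    lost = 0
    for prev, cur in zip(seq_numbers, seq_numbers[1:]):
        # a gap of 0 means the expected next packet arrived
        lost += (cur - prev - 1) % 65536
    return lost
```

Feeding it sequence numbers extracted from a packet capture on the client would show whether the bad quality really is packet loss, and comparing captures from the laptop test and the blade test would help localize where the loss starts.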
OK, don't you just love it. The Cisco blade switches have ports on the front panel, so you could plug the laptop in there and repeat the tests. That would prove whether or not it is a switch problem, or point towards a Linux problem. I'm inclined to go with a SUSE Linux problem/configuration, but do the test first if you can.
Then take a look at the SUSE box. How is it configured? Is there bonding? Is it active-active? Are the ports hard-coded for speed/duplex? And on the switch?
We have had errors with SUSE Linux on HP blade switches, but now we hard-code everything.
There are no Ethernet ports on the front panel of the switch, only 10 Gb fibre-optic interfaces. We have two Cisco 3xxx switches per BladeCenter, and they are in a stack with a 20 Gb EtherChannel to the core. Every server connects to the 3xxx switches via the backplane over 1 Gb interfaces (one link per switch in the stack) - 2 Gb in total per server (link aggregation is configured on both the servers and the switch, LACP 802.3ad). Can you tell me more about hard-coding on SUSE - maybe that can solve the problem?
Well, you can use mii-tool or ethtool.
ethtool comes installed, whereas mii-tool you usually have to install. With ethtool you can see the speed/duplex capabilities and the current settings.
Just look up on Google how to set speed/duplex on SUSE using ethtool.
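For example, a quick check could look like the following (interface names are examples - check what the bonded slaves are actually called on the blades). One caveat worth hedging on: 1000BASE-T requires autonegotiation by the standard, so forcing gigabit with autoneg off often breaks the link entirely; for gigabit it is usually safer to leave autoneg on and just verify that both ends agree.

```shell
# Show current speed/duplex and what the NIC advertises
ethtool eth0
ethtool eth1

# Hard-coding example (100/full shown here); for gigabit, prefer
# verifying autonegotiation rather than forcing it off
ethtool -s eth0 speed 100 duplex full autoneg off
```

Whatever you set on the server side must match what is configured on the corresponding switch ports, otherwise you get a duplex mismatch and exactly the kind of loss under load you are describing.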
On another note, you're using a different model of switch to the ones we've got... do the blade switches have an ILO connected back to the core as well? Well, the ILO is not actually connected to each switch but to the management card in the chassis. Check that the traffic is actually flowing over the 20 Gb links and not over the ILO link. We've had that problem.
The management port of the 3xxx switch connects to an access port on the management switch (in the management VLAN), but the 20 Gb link is a trunk and allows the streaming VLAN, so it is unlikely that traffic goes via the management port. Another interesting point: there is a laptop with SUSE Linux Enterprise Server 11 installed, acting as a streaming server (for testing purposes), and streams requested from it are of good quality, yet clients receive bad-quality video from the blade servers running the identical OS.
OK, but the laptop only has one link. So next, check the port channels. Can you check the interface statistics on the physical ports of the channels and on the Linux server, and post them? How many uplinks are you using on the servers? Are you creating two pairs of EtherChannels per server/switch?
I.e. you are using 4 NICs - two pairs of two EtherChannels, one EtherChannel to switch A and one to switch B? If so, are they active-passive, or active-active with round-robin load balancing on the server? It does look more like a config issue now.
I can't post the configs here - when I try, a "This message can not be displayed due to its content. Please use the Contact Us link with any questions." message appears. The server has 2 NICs which are bonded and connect to the Port-channel12 interface on the 3xxx switch (actually one link per switch, but these switches are in a stack).
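Since the configs can't be posted, the bond state can at least be summarized from the server itself. On Linux, the bonding driver exposes its status in /proc, which shows the mode, whether LACP actually negotiated, and whether both slaves landed in the same aggregator (bond0 is an assumed name here):

```shell
# Verify the bond mode is 802.3ad and both slaves are up
# and in the same LACP aggregator
cat /proc/net/bonding/bond0
```

The switch-side counterparts would be along the lines of "show etherchannel 12 summary" and "show lacp neighbor" on the 3xxx stack, to confirm both physical ports are bundled (flagged P) rather than suspended or standalone.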
1. From a SUSE PC connected to a switch with one uplink, streaming is fine.
2. From the SUSE servers on the blade system, streaming is not fine.
3. There are no physical L1/L2 errors on the switch.
4. There are no errors on the port-channel.
5. You have disabled each port in the channel alternately and the streaming is still not fine.
6. You are sure the traffic is not going over the management link.
What I would do next:
1. Check the configuration of the SUSE server NICs for errors and speed/duplex.
2. Hard-code speed/duplex on the switch and server. See if things improve.
3. Get another switch with a 10 Gb GBIC interface and plug it into the blade switch. Connect the SUSE laptop to the new switch and see if things improve. If things don't improve here, then it's something between the blade switches and the core. Problem: getting your hands on a switch with a 10 Gb interface and a 10 Gb GBIC (not many people have a stock of these).
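For step 2, the switch side of the hard-coding might look like the sketch below. Big caveats: the interface name is an example, internal backplane-facing ports on blade switches often do not accept speed/duplex commands at all, and on many platforms gigabit ports only operate at full duplex anyway - so treat this as a shape, not a recipe.

```
! Example only -- backplane ports may not accept these commands
interface GigabitEthernet0/1
 speed 1000
 duplex full
!
! Then confirm both ends agree and watch the error counters
show interfaces GigabitEthernet0/1 status
show interfaces GigabitEthernet0/1 counters errors
```

Whatever you do here, make the server-side ethtool/bonding settings match; a mismatch between the two ends is one of the classic causes of loss that only shows up under load.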
By the way, are your blade switches running the latest IOS?