Server has problems doing a ping to non native VLAN IP

mariolaniel · ‎07-19-2005

Hi All,

This one is really odd and I'm not even sure it has to do with the switch itself.

One of my servers is on VLAN 10. When I ping devices from it, I get really high latency to two VLANs out of five.

Here's the ping output from the server to a non-native VLAN where there are no issues.

scoprod: covroot 46> ping 10.10.70.1

PING 10.10.70.1 (10.10.70.1): 56 data bytes

64 bytes from 10.10.70.1: icmp_seq=0 ttl=255 time=0 ms

64 bytes from 10.10.70.1: icmp_seq=1 ttl=255 time=0 ms

64 bytes from 10.10.70.1: icmp_seq=2 ttl=255 time=0 ms

64 bytes from 10.10.70.1: icmp_seq=3 ttl=255 time=0 ms

64 bytes from 10.10.70.1: icmp_seq=4 ttl=255 time=0 ms

64 bytes from 10.10.70.1: icmp_seq=5 ttl=255 time=0 ms

64 bytes from 10.10.70.1: icmp_seq=6 ttl=255 time=0 ms

--- 10.10.70.1 ping statistics ---

7 packets transmitted, 7 packets received, 0% packet loss

round-trip min/avg/max = 0/0/0 ms

Here's the output for one of the non-native VLANs where I encounter high latency when the ping starts.

scoprod: covroot 47> ping 10.10.40.1

PING 10.10.40.1 (10.10.40.1): 56 data bytes

64 bytes from 10.10.40.1: icmp_seq=0 ttl=255 time=0 ms

64 bytes from 10.10.40.1: icmp_seq=1 ttl=255 time=7070 ms

64 bytes from 10.10.40.1: icmp_seq=2 ttl=255 time=6060 ms

64 bytes from 10.10.40.1: icmp_seq=3 ttl=255 time=5050 ms

64 bytes from 10.10.40.1: icmp_seq=4 ttl=255 time=4040 ms

64 bytes from 10.10.40.1: icmp_seq=5 ttl=255 time=3030 ms

64 bytes from 10.10.40.1: icmp_seq=6 ttl=255 time=2020 ms

64 bytes from 10.10.40.1: icmp_seq=7 ttl=255 time=1010 ms

64 bytes from 10.10.40.1: icmp_seq=8 ttl=255 time=0 ms

64 bytes from 10.10.40.1: icmp_seq=9 ttl=255 time=0 ms

64 bytes from 10.10.40.1: icmp_seq=10 ttl=255 time=0 ms

64 bytes from 10.10.40.1: icmp_seq=11 ttl=255 time=0 ms

--- 10.10.40.1 ping statistics ---

12 packets transmitted, 12 packets received, 0% packet loss

round-trip min/avg/max = 0/2356/7070 ms

Does anyone have a clue?

Thanks

Georg Pauwen · ‎07-19-2005

Hello,

Where is your inter-VLAN routing taking place? I would have a look at that (Layer 3) device. Another possibility is that the root switch for the VLAN you are having problems with might be one of your access switches. Not knowing the exact physical and logical setup of your network, I can only suggest making sure that the most central switch is the root for your VLANs.

Regards,

GP

mariolaniel · ‎07-19-2005

The inter-VLAN routing is taking place on a 6509, which is our core switch and also where the server is connected.

That switch is the root switch.

Thanks for the pointers, I'll keep on investigating.

Richard Burts · ‎07-19-2005

I find it interesting that the responses in your high latency ping are sort of bi-modal. There is a group of responses with 0ms and the other responses are all greater than 1000 and nothing in between. It makes me wonder if there is some flood of traffic (or something) that is bogging down forwarding of these ping packets. Or it might be that the destination to which you are pinging has gotten busy and treats ping response as a low priority task.
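One way to look closer at that pattern is to work out when each reply would have arrived. The sketch below is not from the thread: it takes the RTTs from the `ping 10.10.40.1` output above and assumes one probe was sent every 1010 ms, a value inferred only from the constant 1010 ms step between successive RTTs.

```python
# RTTs in ms copied from the "ping 10.10.40.1" output above, indexed by icmp_seq.
rtts_ms = [0, 7070, 6060, 5050, 4040, 3030, 2020, 1010, 0, 0, 0, 0]

# Assumption (not stated in the thread): one probe every 1010 ms,
# inferred from the constant 1010 ms step between successive RTTs.
SEND_INTERVAL_MS = 1010


def arrival_times(rtts, interval_ms):
    """Return the implied arrival time of each reply, in ms after the first probe."""
    return [seq * interval_ms + rtt for seq, rtt in enumerate(rtts)]


arrivals = arrival_times(rtts_ms, SEND_INTERVAL_MS)
for seq, (rtt, arrived) in enumerate(zip(rtts_ms, arrivals)):
    print(f"icmp_seq={seq:2d} rtt={rtt:5d} ms implied arrival={arrived:6d} ms")
```

Under that assumption, the replies for icmp_seq 1 through 8 all have the same implied arrival time. That would fit replies being held somewhere and released in one burst, or the server stalling before it timestamps them, rather than several independently slow round trips. It does not say where the stall happens.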

Are the symptoms consistent? Is it always the same destination that gets high latency?

HTH

Rick

mariolaniel · ‎07-19-2005

Rick,

It is consistent with the same two VLANs; all the others are okay. Tomorrow morning I'll try putting a sniffer on a SPAN port to try to make some sense out of all this. By the way, the only other interesting thing I found, and it could well point to the server (SCO Unix OpenServer 5), is that I have another server for development configured the same way and it's acting almost the same way. The latency is not as high but it is there. Traceroute does not even resolve for those two VLANs on the two servers. It finds the default gateway (its native VLAN interface) and after that it does not know where to go.

It's been acting like that for about a week now and nothing has been done or changed on the servers or the switch.

P.S. Every bit helps.

Thanks,

Mario

Richard Burts · ‎07-19-2005

Mario

Your comment about the server is interesting. I wonder if there are other end stations on these VLANs where you have high latency and if you get the same high latency for all machines in that subnet, or if it is specific to some machines?

I am trying to understand your comment about traceroute. Are you talking about traceroute from those VLANs to somewhere, or traceroute from somewhere to these VLANs? If I understand correctly that the traceroute gets a response from the first hop (default gateway) and then times out on hops after this, it would sound like some routing issue. But a routing issue should not produce high latency; it should produce a broken ping.

Help me understand where the problem is.

HTH

Rick

mariolaniel · ‎07-19-2005

There is no high latency from workstations on these two VLANs using ping or traceroute, but telnet takes forever, and once the users log in the application runs fine.

The traceroutes I was talking about were initiated from the two servers.

Here's an example:

The IP of the server is 10.10.10.28

scoprod: covroot 115> ping 10.10.40.251

PING 10.10.40.251 (10.10.40.251): 56 data bytes

64 bytes from lp16 (10.10.40.251): icmp_seq=0 ttl=59 time=0 ms

64 bytes from lp16 (10.10.40.251): icmp_seq=1 ttl=59 time=7070 ms

64 bytes from lp16 (10.10.40.251): icmp_seq=2 ttl=59 time=6060 ms

64 bytes from lp16 (10.10.40.251): icmp_seq=3 ttl=59 time=5050 ms

64 bytes from lp16 (10.10.40.251): icmp_seq=4 ttl=59 time=4040 ms

64 bytes from lp16 (10.10.40.251): icmp_seq=5 ttl=59 time=3030 ms

64 bytes from lp16 (10.10.40.251): icmp_seq=6 ttl=59 time=2020 ms

64 bytes from lp16 (10.10.40.251): icmp_seq=7 ttl=59 time=1010 ms

64 bytes from lp16 (10.10.40.251): icmp_seq=8 ttl=59 time=0 ms

64 bytes from lp16 (10.10.40.251): icmp_seq=9 ttl=59 time=0 ms

64 bytes from lp16 (10.10.40.251): icmp_seq=10 ttl=59 time=0 ms

64 bytes from lp16 (10.10.40.251): icmp_seq=11 ttl=59 time=0 ms

64 bytes from lp16 (10.10.40.251): icmp_seq=12 ttl=59 time=0 ms

64 bytes from lp16 (10.10.40.251): icmp_seq=13 ttl=59 time=0 ms

--- 10.10.40.251 ping statistics ---

14 packets transmitted, 14 packets received, 0% packet loss

round-trip min/avg/max = 0/2020/7070 ms

scoprod: covroot 116> traceroute 10.10.40.251

traceroute to 10.10.40.251 (10.10.40.251), 30 hops max, 40 byte packets

1 10.10.10.2 (10.10.10.2) 10 ms 0 ms 0 ms

2 lp16 (10.10.40.251) 10 ms 0 ms 0 ms

Yeah I know, it is working now, it wasn't this afternoon.

Same kind of example on a non-defective VLAN:

scoprod: covroot 118> ping 10.10.60.244

PING 10.10.60.244 (10.10.60.244): 56 data bytes

64 bytes from lp9 (10.10.60.244): icmp_seq=0 ttl=59 time=0 ms

64 bytes from lp9 (10.10.60.244): icmp_seq=1 ttl=59 time=0 ms

64 bytes from lp9 (10.10.60.244): icmp_seq=2 ttl=59 time=0 ms

64 bytes from lp9 (10.10.60.244): icmp_seq=3 ttl=59 time=0 ms

64 bytes from lp9 (10.10.60.244): icmp_seq=4 ttl=59 time=0 ms

--- 10.10.60.244 ping statistics ---

14 packets transmitted, 14 packets received, 0% packet loss

round-trip min/avg/max = 0/0/0 ms

scoprod: covroot 119> traceroute 10.10.60.244

traceroute to 10.10.60.244 (10.10.60.244), 30 hops max, 40 byte packets

1 10.10.10.2 (10.10.10.2) 0 ms 0 ms 0 ms

2 lp9 (10.10.60.244) 0 ms 0 ms 0 ms

P.S. In the first example traceroute takes approximately 3 minutes before coming back with the second hop, and in the second one it is immediate.

The routing table is small, mostly VLAN interfaces.

MSFC1-StJoseph#sho ip route

Gateway of last resort is 10.10.1.9 to network 0.0.0.0

S 172.16.0.0/16 [1/0] via 10.10.1.9

S 192.168.200.0/24 [1/0] via 10.10.1.9

10.0.0.0/24 is subnetted, 8 subnets

C 10.10.1.0 is directly connected, Vlan50

C 10.10.10.0 is directly connected, Vlan55

C 10.10.20.0 is directly connected, Vlan60

C 10.10.30.0 is directly connected, Vlan65

C 10.10.40.0 is directly connected, Vlan70

C 10.10.50.0 is directly connected, Vlan75

C 10.10.60.0 is directly connected, Vlan80

C 10.10.70.0 is directly connected, Vlan85

S 192.168.50.0/24 [1/0] via 10.10.1.9

S 192.168.2.0/24 [1/0] via 10.10.70.15

192.168.48.0/27 is subnetted, 1 subnets

S 192.168.48.0 [1/0] via 10.10.1.9

S* 0.0.0.0/0 [1/0] via 10.10.1.9
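As a cross-check of the table above, here is a minimal longest-prefix-match sketch. The route list is transcribed from the `sho ip route` output, with the connected subnets taken as /24 per the "10.0.0.0/24 is subnetted" line; the `lookup` helper is a hypothetical name for illustration and is not how the MSFC itself performs lookups.

```python
import ipaddress

# Routes transcribed from the "sho ip route" output above.
# Connected subnets are /24, per "10.0.0.0/24 is subnetted, 8 subnets".
ROUTES = [
    ("172.16.0.0/16", "via 10.10.1.9"),
    ("192.168.200.0/24", "via 10.10.1.9"),
    ("10.10.1.0/24", "Vlan50"),
    ("10.10.10.0/24", "Vlan55"),
    ("10.10.20.0/24", "Vlan60"),
    ("10.10.30.0/24", "Vlan65"),
    ("10.10.40.0/24", "Vlan70"),
    ("10.10.50.0/24", "Vlan75"),
    ("10.10.60.0/24", "Vlan80"),
    ("10.10.70.0/24", "Vlan85"),
    ("192.168.50.0/24", "via 10.10.1.9"),
    ("192.168.2.0/24", "via 10.10.70.15"),
    ("192.168.48.0/27", "via 10.10.1.9"),
    ("0.0.0.0/0", "via 10.10.1.9"),  # default route, always matches
]


def lookup(dest):
    """Hypothetical helper: return (prefix, exit) for the longest matching route."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(prefix), exit_)
        for prefix, exit_ in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    net, exit_ = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), exit_


for ip in ("10.10.10.28", "10.10.40.251", "10.10.60.244"):
    print(ip, "->", lookup(ip))
```

By this table, the server (10.10.10.28) and both printers (10.10.40.251 and 10.10.60.244) all sit on directly connected subnets, so the slow and the fast destinations are each a single routed hop away. That matches the two-hop traceroutes and suggests the table alone does not explain the difference.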

Telnet sessions from the non-affected VLANs to the servers are also immediate in their response.

Mario