Cisco 6509 ping response time problems

jens.boberg · 05-09-2006

Hi guys and girls,

I'm experiencing a rather annoying problem in a client's network.

Here's the setup:

2 Cisco Catalyst 6509 with SUP720

Cisco Internetwork Operating System Software

IOS (tm) s72033_rp Software (s72033_rp-JK9SV-M), Version 12.2(18)SXD5, RELEASE SOFTWARE (fc3)

Technical Support: http://www.cisco.com/techsupport

Compiled Fri 13-May-05 19:15 by ssearch

Image text-base: 0x4002100C, data-base: 0x42698000

ROM: System Bootstrap, Version 12.2(17r)S2, RELEASE SOFTWARE (fc1)

BOOTLDR: s72033_rp Software (s72033_rp-JK9SV-M), Version 12.2(18)SXD5, RELEASE SOFTWARE (fc3)

Here are the essentials of my configuration:

mls ip multicast flow-stat-timer 9
no mls flow ip
no mls flow ipv6
mls cef error action freeze
spanning-tree mode pvst
no spanning-tree optimize bpdu transmission

redundancy
 mode sso
 main-cpu
  auto-sync running-config

interface Vlan7
 description XXXXX
 ip vrf forwarding XXXXX
 ip address XXX.XXX.XXX.61 255.255.255.0
 ip access-group XXXXX in
 no ip redirects
 standby 1 ip XXX.XXX.XXX.60
 standby 1 authentication XXXXX

router bgp XXXXX
 bgp log-neighbor-changes
 address-family ipv4 vrf XXXXX
  neighbor XXX.XXX.XXX.62 remote-as XXXXX
  neighbor XXX.XXX.XXX.62 activate
  no auto-summary
  no synchronization
  network XXX.XXX.XXX.0 mask 255.255.255.0
 exit-address-family

ip vrf XXXXX
 rd 1:7
 route-target export 1:7

My problem is this: when I am on one VLAN and ping a device on another VLAN, I get response times like this:

Response from XXX.XXX.XXX.XXX: byte=32 tid < 1 ms TTL=127

Response from XXX.XXX.XXX.XXX: byte=32 tid=248ms TTL=127

Response from XXX.XXX.XXX.XXX: byte=32 tid < 1 ms TTL=127

Response from XXX.XXX.XXX.XXX: byte=32 tid=97ms TTL=127

Response from XXX.XXX.XXX.XXX: byte=32 tid < 1 ms TTL=127

Response from XXX.XXX.XXX.XXX: byte=32 tid=209ms TTL=127

Response from XXX.XXX.XXX.XXX: byte=32 tid=9ms TTL=127

Response from XXX.XXX.XXX.XXX: byte=32 tid=48ms TTL=127

Response from XXX.XXX.XXX.XXX: byte=32 tid < 1 ms TTL=127

Response from XXX.XXX.XXX.XXX: byte=32 tid=3ms TTL=127

Response from XXX.XXX.XXX.XXX: byte=32 tid=9ms TTL=127
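To put numbers on the spread in the output above, a small script can pull the round-trip times out of the ping lines. This is only a rough sketch: the names `parse_rtts` and `summarize`, the 5 ms "slow" threshold, and the choice to count "< 1 ms" as 1 ms are all assumptions made for illustration.

```python
import re
import statistics

# Matches "tid=248ms" as well as "tid < 1 ms" (also accepts "time" in place of "tid").
RTT_PATTERN = re.compile(r"(?:tid|time)\s*[<=]\s*(\d+)\s*ms")

SAMPLE = """\
Response from XXX.XXX.XXX.XXX: byte=32 tid < 1 ms TTL=127
Response from XXX.XXX.XXX.XXX: byte=32 tid=248ms TTL=127
Response from XXX.XXX.XXX.XXX: byte=32 tid < 1 ms TTL=127
Response from XXX.XXX.XXX.XXX: byte=32 tid=97ms TTL=127
Response from XXX.XXX.XXX.XXX: byte=32 tid < 1 ms TTL=127
Response from XXX.XXX.XXX.XXX: byte=32 tid=209ms TTL=127
Response from XXX.XXX.XXX.XXX: byte=32 tid=9ms TTL=127
Response from XXX.XXX.XXX.XXX: byte=32 tid=48ms TTL=127
Response from XXX.XXX.XXX.XXX: byte=32 tid < 1 ms TTL=127
Response from XXX.XXX.XXX.XXX: byte=32 tid=3ms TTL=127
Response from XXX.XXX.XXX.XXX: byte=32 tid=9ms TTL=127
""".splitlines()


def parse_rtts(lines):
    """Return the round-trip time in ms for every line that contains one.

    Assumption: "< 1 ms" is recorded as 1 ms, i.e. as an upper bound.
    """
    rtts = []
    for line in lines:
        match = RTT_PATTERN.search(line)
        if match:
            rtts.append(float(match.group(1)))
    return rtts


def summarize(rtts, threshold_ms=5.0):
    """Summarize the samples; 'slow' counts replies above threshold_ms."""
    return {
        "count": len(rtts),
        "min": min(rtts),
        "median": statistics.median(rtts),
        "max": max(rtts),
        "slow": sum(1 for r in rtts if r > threshold_ms),
    }


if __name__ == "__main__":
    print(summarize(parse_rtts(SAMPLE)))
```

Running the same summary on a same-VLAN test and on an inter-VLAN test makes it easier to compare the two cases than eyeballing the raw output.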

I have tried connecting two PCs to the same physical card in one of my 6509s, just in two separate VLANs, and I still have the same problem.

If anyone has any ideas, I would be more than happy to hear them. If you need to see more of the config, just post a reply here.

Thanks in advance

/Jens Boberg

romccallum · 05-10-2006

Phew, this is a needle in a haystack, mate.

First thing is that you are creating SVI interfaces, and this goes against Cisco's recommended practice on a 7600. What you should be doing is creating subinterfaces on a gig interface, preferably on a GE-WAN OSM port, as this has more queues, more "smarts" and generally works better.

Now down to guesswork: are your VLANs stable? Is spanning tree bouncing up and down? Do you have any physical cabling problems?

HTH

mheusinger · 05-10-2006

Hello,

The first, and tricky, thing in my opinion is to locate the problem. You state that the problem exists even for inter-VLAN routing that does not involve MPLS. Ping checks end-to-end connectivity, which involves not only the network but also the TCP/IP protocol stacks of the end systems, including their CPUs.

Does the problem occur when you connect the two end systems to the same VLAN? This is just to be sure we are not chasing artefacts of an MS TCP/IP implementation.

Which other traffic is present at the time you take the measurements (broadcast, unknown, BPDU, CDP, ...)? Can you check, for example with Ethereal, for traffic interfering with your response times?
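As a rough illustration of sorting captured frames into those traffic classes, the sketch below tallies frames by destination MAC address. It assumes the destination MACs have already been exported from the capture as plain strings; the names `classify` and `tally` are made up for this example, and the table only covers a few well-known addresses.

```python
from collections import Counter

# A few well-known destination MAC addresses (not an exhaustive list).
WELL_KNOWN_DST = {
    "ff:ff:ff:ff:ff:ff": "broadcast",
    "01:80:c2:00:00:00": "STP BPDU (IEEE)",
    "01:00:0c:cc:cc:cc": "CDP/VTP/DTP (Cisco)",
    "01:00:0c:cc:cc:cd": "PVST+ BPDU (Cisco)",
    "01:00:5e:00:00:02": "224.0.0.2 multicast (HSRPv1 hellos)",
}


def classify(dst_mac):
    """Map a destination MAC string to a coarse traffic class."""
    mac = dst_mac.strip().lower().replace("-", ":")
    if mac in WELL_KNOWN_DST:
        return WELL_KNOWN_DST[mac]
    # The least significant bit of the first octet marks group (multicast) addresses.
    first_octet = int(mac.split(":")[0], 16)
    if first_octet & 0x01:
        return "other multicast"
    return "unicast"


def tally(dst_macs):
    """Count frames per traffic class."""
    return Counter(classify(mac) for mac in dst_macs)


if __name__ == "__main__":
    # Hypothetical destination addresses, for illustration only.
    example = [
        "FF-FF-FF-FF-FF-FF",
        "01:00:0c:cc:cc:cd",
        "01:00:5e:00:00:02",
        "00:11:22:33:44:55",
    ]
    for traffic_class, frames in tally(example).items():
        print(f"{traffic_class}: {frames}")
```

Unknown unicast flooding cannot be detected from the address alone; it would show up as unicast frames addressed to hosts other than the capturing machine.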

Regards, Martin

jens.boberg · 05-10-2006

That's exactly my problem: I can't seem to find any source of the problem at all.

When I have 2 machines on the same VLAN the problem doesn't show.

It's only when I connect the machines to separate VLANs that I get those weird response times.

To answer both questions in one message: as far as I can tell, there are no problems with spanning tree or the cabling. I have tried almost everything I can think of while troubleshooting this.

I have done an IOS upgrade and an IOS downgrade, changed the cabling, and created two new VLANs carrying no traffic other than my ping tests, and I still have the same problem.

I'm afraid I have missed an important line somewhere in my config, but I just can't seem to find it.

If you need me to post more output, just say the word. Thanks for helping, guys!

mheusinger · 05-11-2006

Hello,

I would use a packet analyzer (like Ethereal from www.ethereal.com) to capture all ICMP packets on both machines. This way you can detect whether the ICMP replies are sent immediately after the ICMP requests are received.

This way you can at least exclude host-specific problems. Once you are sure it is the Catalyst, the mystery is still not resolved, but there might be more information on how and when the response times increase dramatically (pause frames, additional interfering traffic, etc.).

I would also suggest capturing some FTP downloads between the hosts to see whether your problem is specific to ICMP or a general one. Again, the inter-packet times should give a hint as to where the problem lies. Can you supply this information?
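The host check can be sketched as follows: on the capture taken at the replying machine, pair each echo request with its echo reply by sequence number and look at the time in between. The `(timestamp_us, kind, seq)` tuple layout and the name `turnaround_times` are hypothetical and chosen only for this example; they are not an Ethereal export format.

```python
def turnaround_times(packets):
    """Return {seq: microseconds between echo request and echo reply}.

    packets: iterable of (timestamp_us, kind, seq) tuples in capture order,
    where kind is "request" or "reply" (assumed layout, see the note above).
    Replies without a matching request are ignored.
    """
    pending = {}   # seq -> timestamp of the request still waiting for a reply
    result = {}
    for timestamp_us, kind, seq in packets:
        if kind == "request":
            pending[seq] = timestamp_us
        elif kind == "reply" and seq in pending:
            result[seq] = timestamp_us - pending.pop(seq)
    return result


if __name__ == "__main__":
    # Made-up timestamps in microseconds, for illustration only.
    capture = [
        (1_000, "request", 1),
        (1_200, "reply", 1),
        (1_001_000, "request", 2),
        (1_001_150, "reply", 2),
    ]
    for seq, delta in turnaround_times(capture).items():
        print(f"seq {seq}: host answered after {delta} us")
```

If the turnaround on the replying host stays small while the sender still sees replies of 100 ms or more, the delay is being added somewhere between the two hosts rather than inside them.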

Regards, Martin

jens.boberg · 05-12-2006

Hi again,

Here's what I have done now:

1) Did a capture with Ethereal. Everything looks fine: an occasional HSRP message and an occasional STP message, nothing much.

2) Created two new VLANs, with BGP and VRFs configured just like all the other VLANs I have.

3) Tested ping between these interfaces, and still the same problem.

4) Removed the VRF config for these 2 interfaces, and all the ping issues were gone.

My guess is that it has something to do with CEF, but my knowledge in this area is far too limited, so any help I can receive is appreciated.

jens.boberg · 05-16-2006

By the way, I also tried to do some FTP transfers, and they slow down exactly when I get the bad response times.

jens.boberg · 05-18-2006

Seriously, doesn't anyone have a clue as to what my problem could be?

mheusinger · 05-18-2006

Hello Jens,

So let us assume the problem is intrinsic to the Catalyst.

Which hardware is in use?

Do you have any indication of a lack of resources (MSFC CPU or anything else)?

Regards, Martin

jens.boberg · 05-18-2006

Hi Martin,

What sort of hardware are you asking about? As I said in the initial post, there are two Catalyst 6509 chassis with SUP720-3B.

They are both equipped with a NAM2 and one 24-port SFP card per chassis.

I still think there is something wrong with my config, since everything works just fine when I remove the VRF config and just have two separate VLANs.

Thanks for helping.