4500-X as edge BGP router dropping pings

civersen

Recently "upgraded" to a 4500-X on a college campus network with a single 1 Gbps ISP connection to the internet. We receive a single default route via BGP, and with our old Cisco 7200 series VXR router we were dropping a ton of packets during peak hours (~800-900 Mbps down, minimal up).

In general, packet loss to end users has been greatly reduced and performance has greatly improved. We've been running like this for about a week and have started getting reports of 10-20 second outages. Looking into this, we noticed that pings to the inside and outside interface IPs of the 4500-X were being lost. At random times of day, both at peak and off-peak, we see periods of about 10 seconds of ping timeouts, roughly once every 30 minutes. The first symptom we noticed was 1-minute chunks of data missing from our Observium SNMP graphs (see attachment). We then reproduced it by pinging the interface IPs from on and off campus, and the periods of dropped pings coincided with the gaps in the graphs. Ping loss happens more often than the graph gaps, but we're fairly sure it's the same issue.
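To line up the ping-loss periods with the graph gaps, it can help to turn raw ping results into explicit outage windows. Below is a minimal Python sketch, assuming the pings are logged as (timestamp in seconds, replied?) pairs, for example one probe per second. The function name `outage_windows` and the `min_len` threshold are hypothetical and not part of any tool mentioned in this thread; the sample data is synthetic.

```python
from typing import List, Optional, Sequence, Tuple


def outage_windows(samples: Sequence[Tuple[float, bool]],
                   min_len: int = 2) -> List[Tuple[float, float]]:
    """Return (first_fail, last_fail) timestamps for every run of at least
    min_len consecutive failed pings. samples must be in time order."""
    windows: List[Tuple[float, float]] = []
    run_start: Optional[float] = None
    run_count = 0
    last_fail: Optional[float] = None
    for ts, replied in samples:
        if not replied:
            if run_start is None:          # a new run of failures begins
                run_start, run_count = ts, 0
            run_count += 1
            last_fail = ts
        else:
            # A reply ends the current run; keep it only if it was long enough.
            if run_start is not None and run_count >= min_len:
                windows.append((run_start, last_fail))
            run_start = None
    # A run that is still open at the end of the log also counts.
    if run_start is not None and run_count >= min_len:
        windows.append((run_start, last_fail))
    return windows


# Synthetic hour of 1-per-second pings with two 10-second outages.
samples = [(t, not (600 <= t < 610 or 2400 <= t < 2410)) for t in range(3600)]
windows = outage_windows(samples)
print(windows)  # [(600, 609), (2400, 2409)]
starts = [s for s, _ in windows]
print([b - a for a, b in zip(starts, starts[1:])])  # [1800]
```

The spacing between window start times gives a quick check on the "roughly once every 30 minutes" pattern.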

It's important to note that if the ping TTL is increased, none of these pings get dropped. During the affected 10-30 second periods we see ping latency of around 40,000 ms. Also, as far as I can tell, user traffic is not dropped during these periods of high ping latency: my own pings to Google stay smooth and consistent at 8 ms throughout the 10 or so seconds of dropped pings to the 4500-X. Users are reporting otherwise, though. It's going mostly unnoticed, as most people just wait out the 10 seconds and don't open a helpdesk ticket.
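The 40,000 ms replies suggest the probes are being delayed rather than discarded: a ping tool with a short timeout records a late reply as lost, while a longer wait counts it as received. The hypothetical helper below (a sketch, not part of any ping utility) shows how the same synthetic samples produce different loss figures depending on the timeout; `None` stands for a probe that never got a reply, and the 2,000 ms and 60,000 ms timeouts are illustrative values.

```python
from typing import Optional, Sequence


def loss_percent(rtts_ms: Sequence[Optional[float]], timeout_ms: float) -> float:
    """Percent of probes counted as lost: either no reply at all (None)
    or a reply that arrived later than timeout_ms."""
    if not rtts_ms:
        return 0.0
    lost = sum(1 for rtt in rtts_ms if rtt is None or rtt > timeout_ms)
    return 100.0 * lost / len(rtts_ms)


# Synthetic run: 18 normal replies plus 2 replies delayed by ~40 s.
samples = [8.0] * 18 + [40000.0, 40000.0]
print(loss_percent(samples, timeout_ms=2000))   # 10.0
print(loss_percent(samples, timeout_ms=60000))  # 0.0
```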

I've been all over this forum and others, checking and verifying many things. TCAM usage is under 10%, and 'sh proc cpu' always shows 5% or less. CPU and memory are also graphed via SNMP and are very stable: basically a flat line apart from short gaps. Those gaps line up with the periods of high ping latency, so I suppose there could be spikes we aren't capturing. Also, unlike what we saw before with the VXR, the inside and outside interfaces show no dropped packets whatsoever (overruns, etc.).
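The same correlation can be checked against the SNMP data: a polling gap is any pair of consecutive samples spaced well beyond the poll interval. This sketch assumes a hypothetical 60-second poll interval and sample timestamps in seconds; `polling_gaps`, `overlaps`, and the 1.5x slack factor are illustrative choices, not Observium settings, and the poll data is synthetic.

```python
from typing import List, Sequence, Tuple


def polling_gaps(timestamps: Sequence[float], interval: float,
                 slack: float = 1.5) -> List[Tuple[float, float]]:
    """Return (last_good_poll, next_good_poll) pairs whose spacing exceeds
    slack * interval, i.e. where at least one poll appears to be missing."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > slack * interval]


def overlaps(gap: Tuple[float, float], window: Tuple[float, float]) -> bool:
    """True if a polling gap and an outage window intersect in time."""
    return gap[0] < window[1] and window[0] < gap[1]


# Synthetic 60-second polls with the poll at t=660 missing.
polls = [t for t in range(0, 1200, 60) if t != 660]
gaps = polling_gaps(polls, interval=60)
print(gaps)                           # [(600, 720)]
print(overlaps(gaps[0], (600, 609)))  # True
```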

Any feedback is appreciated. I'm hoping there's just something in the config I've missed, or something inherently different about using an L3 switch instead of a router. I can post the output of any show commands that would help.

7 Replies

Hello,

This could be MTU-size related. Can you post the config of your 4500-X?

Config posted.

Hello,

Not sure if your release supports it, but try enabling 'ip cef' globally.

Yeah, I actually enabled it this morning and verified it. I have a feeling it may help the general user experience, since traffic matching the ACL's permit statements will be forwarded more quickly, but it had no effect on the lost/delayed pings.

I wonder if the problem might simply be that ICMP traffic destined for the router is process-switched and therefore gets dropped when the CPU is too busy.

Can you try implementing a simple service policy that allocates 10% of bandwidth to ICMP traffic and check whether pings still get dropped?

access-list 101 permit icmp any any

class-map match-any ICMP_TRAFFIC
 match access-group 101

policy-map ICMP_BANDWIDTH
 class ICMP_TRAFFIC
  bandwidth percent 10

interface TenGigabitEthernet1/1
 ! 'bandwidth' is a queuing action, which is only supported in the output direction
 service-policy output ICMP_BANDWIDTH
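The process-switching theory can be illustrated with a toy model: packets punted to the CPU wait in a bounded queue, so while the CPU is stalled they are first delayed and then, once the queue is full, dropped. This is a hypothetical simulation with made-up parameters (`capacity`, `service_per_tick`) and a function name of my own choosing; it is not a description of how the 4500-X actually handles punted traffic.

```python
from typing import Sequence, Tuple


def simulate_punt_queue(arrivals: Sequence[int], cpu_busy: Sequence[bool],
                        capacity: int, service_per_tick: int) -> Tuple[int, int, int]:
    """Toy model of packets punted to a CPU through a bounded input queue.

    Each tick, arrivals[i] packets try to enter the queue (anything beyond
    capacity is dropped); then, unless cpu_busy[i] is True, the CPU drains
    up to service_per_tick packets. Returns (delivered, dropped, still_queued).
    """
    queued = delivered = dropped = 0
    for arriving, busy in zip(arrivals, cpu_busy):
        accepted = min(arriving, capacity - queued)
        dropped += arriving - accepted
        queued += accepted
        if not busy:
            served = min(queued, service_per_tick)
            queued -= served
            delivered += served
    return delivered, dropped, queued


# One ping per tick for 60 ticks; the CPU stalls for ticks 10-19.
arrivals = [1] * 60
busy = [10 <= t < 20 for t in range(60)]
print(simulate_punt_queue(arrivals, busy, capacity=5, service_per_tick=10))  # (54, 6, 0)
```

In this run the five packets accepted at the start of the stall are answered late (they sit in the queue until tick 20), while the ones arriving after the queue fills are dropped, which mirrors the mix of very slow replies and timeouts described in the original post.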

Iulian Vaideanu

Are any spanning-tree transitions occurring on the path between the graph monitoring server and the 4500-X?

Yes. The path is the 4500-X, then the ASA firewall, then the core L2/L3 switch, which runs STP. 'sh spanning-tree' on the 4500-X shows "No spanning tree instance exists."

I've been suspicious that some L2 protocol facing the internet is causing problems.
