I've been dealing with this problem for a while now.
I have multiple sites with the same MPLS provider. The provider has a Cisco 2431 router at each of my locations, running firmware 12.4t7. I don't have access to the Ciscos, so I'm forced to work with the provider. My LANs all have pretty similar equipment: Adtran 4305/3305 routers, Extreme 250e switches, and about 50 computers.
At one of my locations, I discovered packet loss between my Adtran router (WAN interface) and the Cisco router (LAN interface). The provider was able to confirm the packet loss on their LAN interface (their WAN interface was clean). As a test, we removed my equipment from the equation: the provider plugged a laptop into their Cisco LAN interface and ran a ping test. No packet loss. That put the ball in my court, so I began ruling out issues with my equipment.
This is the topology:
Cisco Router > Extreme Switch > Adtran Router
I have the Extreme switch in between the routers because I have a secondary circuit (DSL) for backup purposes.
Things that I have tried:
To cut this story short...
- I have swapped my Adtran with a known working one.
- I have updated its firmware to the latest.
- I had Adtran support look at my config to see if there was anything fishy in it (there wasn't, and I'm not sure what config could cause intermittent dropped packets, especially when there is a fair load). My config is pretty basic and it's similar to my other sites with the same setup.
- I have swapped network cables (a few times in fact).
- I removed my switch entirely from the topology by plugging the Adtran router directly into the Cisco router using a crossover cable. On the LAN side of the Adtran router, I plugged my laptop in.
There was still packet loss.
So I asked the provider to swap their router. Same model and the same firmware as in all of my other sites. They did.
Unfortunately I'm still seeing packet loss.
The last thing I did was run a packet capture, filtered for ICMP packets between the Adtran router and the Cisco router.
The Adtran is sending request packets and the Cisco is sending reply packets back. Each time there was packet loss, the Cisco was the one that didn't reply to a request packet.
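For anyone wanting to repeat this check, here is a rough sketch of how the requests and replies in a capture can be matched up by ICMP sequence number. The function names and the sample capture are made up for illustration; they are not taken from my actual capture or from any particular tool.

```python
def unanswered_requests(packets):
    """Return sequence numbers of echo requests that never got a reply.

    `packets` is a list of (kind, seq) tuples, where kind is
    "request" or "reply" and seq is the ICMP sequence number.
    """
    requests = [seq for kind, seq in packets if kind == "request"]
    replies = {seq for kind, seq in packets if kind == "reply"}
    return [seq for seq in requests if seq not in replies]


def loss_percent(packets):
    """Percentage of echo requests with no matching reply."""
    requests = [seq for kind, seq in packets if kind == "request"]
    if not requests:
        return 0.0
    return 100.0 * len(unanswered_requests(packets)) / len(requests)


# Hypothetical capture: request 2 never gets a reply.
capture = [
    ("request", 1), ("reply", 1),
    ("request", 2),
    ("request", 3), ("reply", 3),
    ("request", 4), ("reply", 4),
]

print(unanswered_requests(capture))  # [2]
print(loss_percent(capture))         # 25.0
```

A capture taken on one side only shows that a reply never arrived there. It cannot tell whether the request was lost on the way out or the reply was lost on the way back.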
I'm setting up a call with my provider to see what they say about my packet capture findings, but I would still like a stronger argument. It's hard to argue that there is a problem with their equipment when they have swapped it once already. It also doesn't help that I mostly see the problem when there is a fair amount of traffic, and that they have the same equipment (model, firmware, and similar configs) at all of my sites.
I'm not too familiar with Cisco gear so perhaps someone can help on the Cisco side of things.
The IAD2431 has a built-in T-1 and a WIC slot. Is there a second T-1 in the WIC slot? In other words, is there one T-1 or two? Regardless, if you're seeing traffic loss going from the 10/100 network to the IAD, I would recommend that you set up some queuing on your side. If you're paying for voice and data, then the provider will most likely prioritize the voice, but everything else will be dropped if the T-1 is congested.
I know you're probably thinking "we don't use the entire T-1 though". And that might be true if you look at the "average" utilization. You said that the "cisco" didn't reply, but it doesn't sound like we even know whether the packet made it to the destination. If you use some traffic shaping to ensure the Adtran doesn't send more than 1.544Mbps, you might be able to see when the traffic drops are happening. There could be a "microburst" causing this issue, and that would not really be something the provider could help with.
On a Cisco device I would recommend setting up a policing policy with an exceed action of forward. That would allow you to look at the policy-map and see whether you are exceeding the 1.544Mbps. Not sure if you can do that on the Adtran or the Extreme.
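To show what I mean by "average" utilization hiding a microburst, here is a small sketch. The byte counts and the 100 ms sampling interval are invented for illustration; only the 1.544Mbps figure comes from the T-1 rate discussed above.

```python
def mbps(byte_count, seconds):
    """Convert a byte count over a time span into megabits per second."""
    return byte_count * 8 / seconds / 1_000_000


def average_and_peak_mbps(bytes_per_interval, interval_seconds):
    """Return (average rate, highest single-interval rate) in Mbps."""
    total_seconds = len(bytes_per_interval) * interval_seconds
    average = mbps(sum(bytes_per_interval), total_seconds)
    peak = max(mbps(b, interval_seconds) for b in bytes_per_interval)
    return average, peak


# Hypothetical samples: nine quiet 100 ms intervals, then one burst.
samples = [5_000] * 9 + [40_000]
T1_MBPS = 1.544

average, peak = average_and_peak_mbps(samples, 0.1)
print(f"average {average:.2f} Mbps, peak {peak:.2f} Mbps")
print("burst exceeds T-1 rate:", peak > T1_MBPS)
```

With these made-up numbers the one-second average sits well under the line rate while the single burst interval is roughly double it, which is the pattern that a utilization graph averaged over minutes would never show.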
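The idea of a policer whose exceed action is "forward" can be sketched as a token bucket that only counts. This is a simulation under my own assumptions (class name, burst size, and packet timings are all hypothetical), not Cisco's implementation and not device configuration.

```python
class CountingPolicer:
    """Token-bucket policer that forwards everything and only counts."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_sec = rate_bps / 8
        self.burst_bytes = burst_bytes
        self.tokens = float(burst_bytes)  # bucket starts full
        self.last_time = 0.0
        self.conformed = 0
        self.exceeded = 0

    def observe(self, timestamp, size_bytes):
        """Classify one packet; always returns True (packet is forwarded)."""
        elapsed = timestamp - self.last_time
        self.last_time = timestamp
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_sec)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            self.conformed += 1
        else:
            self.exceeded += 1
        return True


# Hypothetical: four 1500-byte packets arriving 1 ms apart at a 1.544 Mbps policer.
policer = CountingPolicer(rate_bps=1_544_000, burst_bytes=3000)
for t in (0.000, 0.001, 0.002, 0.003):
    policer.observe(t, 1500)

print(policer.conformed, policer.exceeded)  # 2 2
```

Nothing is dropped here, so the counters work purely as a measurement: a non-zero exceed count means traffic went over the rate at some point.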
Thanks danrya for replying.
The IAD2431 is actually just using its Ethernet ports; there are no T1 cards in it. The WAN side of the Cisco is plugged into an Actelis EAD for our MPLS circuit.
The pinging I'm doing is just from the WAN port on my Adtran router to the LAN port on their Cisco.
Do you think your traffic shaping suggestion still applies in this case? I'm far from an expert on this so any suggestions are greatly appreciated.
First, sorry that I assumed you're using a T-1.
The questions then become:
1. What is the link speed from the IAD to the switch?
2. What is the link speed from the switch to the Adtran router?
3. What speed MPLS are you paying for?
The answer to your question is: policing will still help you determine whether traffic levels are above what you're paying for. And shaping will allow you to drop or queue packets before you reach that limit, instead of allowing the SP to just "tail drop" any traffic that exceeds the limit.
The other thing, I didn't realize you were pinging the actual IAD device and that was failing. The IAD is a "software" based platform, and all packet forwarding is done in software. The CPU is responsible for everything that happens on the device, and if it's busy forwarding packets when you do a ping, it might drop those pings or be too busy to reply.
I would recommend setting up QoS on your side to shape the traffic to your "paid for" bandwidth. That way you can create multiple queues and prioritize traffic so that any dropped packets have less impact on the users.
If you set up shaping or policing and still see traffic dropping, then you need to ask the provider to do a speed test to ensure you're getting what you're paying for. A simple way to do this on your own would be to set up an FTP server on one side and a client on the other. Connect them directly to the IAD and FTP a file. The FTP transfer will usually tell you how long the download took, and from that you can calculate the bandwidth you're getting.
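As a rough picture of what shaping does, here is a minimal sketch that delays packets so they never leave faster than the configured rate. It uses a single queue with no priority classes and no burst allowance, and the rate and packet sizes are hypothetical, so treat it as an illustration rather than how any router implements it.

```python
def shape(packets, rate_bps):
    """Return departure times for packets sent no faster than rate_bps.

    `packets` is a list of (arrival_time_seconds, size_bytes),
    sorted by arrival time. A packet that arrives while the previous
    one is still being sent waits in the queue instead of being dropped.
    """
    departures = []
    link_free_at = 0.0
    for arrival, size_bytes in packets:
        start = max(arrival, link_free_at)
        # Time needed to send this packet at the shaped rate.
        link_free_at = start + size_bytes * 8 / rate_bps
        departures.append(start)
    return departures


# Hypothetical: three 1250-byte packets arrive at once, shaped to 1 Mbps.
burst = [(0.0, 1250), (0.0, 1250), (0.0, 1250)]
departures = shape(burst, rate_bps=1_000_000)
print([round(t, 3) for t in departures])  # [0.0, 0.01, 0.02]
```

The burst gets spread out over time on your side of the link, which is the difference from policing: packets are delayed rather than counted or dropped.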
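The bandwidth calculation from the FTP transfer is just file size over time, converted to bits. A small sketch, with a made-up file size and duration:

```python
def throughput_mbps(file_size_bytes, transfer_seconds):
    """Achieved throughput in megabits per second for one transfer."""
    if transfer_seconds <= 0:
        raise ValueError("transfer_seconds must be positive")
    return file_size_bytes * 8 / transfer_seconds / 1_000_000


# Hypothetical transfer: a 50,000,000-byte file that took 100 seconds.
measured = throughput_mbps(50_000_000, 100)
print(measured)  # 4.0
```

Expect the result to come in somewhat below the circuit's rated speed, since the file size does not include TCP/IP header overhead.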
Sorry for the delay but I was away for the long weekend.
To answer your questions:
1. auto neg
2. auto neg
3. Paid-for bandwidth is 5 Mbps
I just recently set traffic shaping on my interface at 5 Mbps, with about 3000 priority on phone traffic QoS.
As for the IAD, what puzzles me is that it's the same model and firmware at many of my other sites. If the CPU isn't up to the load, I'm not sure why my other sites aren't experiencing the same issue.
Could there be something configured on the IAD at this one site that isn't mimicked at the rest of my sites?