10-29-2015 07:53 PM - edited 03-08-2019 02:30 AM
Hello all.
This is my first post and I'll try to be as detailed as possible. I am upgrading the core of our network with two NX 6004s that connect north to two Catalyst 7606s. The 6004s also have connections going south to two NX 6001s. Everything is eBGP over P2P links, laid out like this (for clarity's sake, I'm just going to show one of each box):
7606 -> 6004 (Port-Channel - two 10Gb links on both sides)
6004 -> 6001 (40Gb P2P)
The eBGP peerings between the NX boxes come up just fine. The peerings between the 6004 and the 7606 do not come up at all. After digging around and debugging some BGP packets, I noticed that the TCP session never establishes. Forgetting about BGP for the moment, I then noticed that when I run pings from the 7606 with the df-bit set, a size of 1500, and a count of 100 (for example), every 15th packet is dropped, consistently. If I change the size up or down, packets are still dropped, but at a different interval. For example, with a packet size of 1100, it's every 25th packet. With a size of 8000 (when trying to set the MTU manually on the interface), every 3rd packet was dropped. Here is what I have done so far:
Set MTU manually
Set P2P to a single link only
Captured the link with Wireshark (no useful info aside from the missing TCP response, which didn't yield much)
Wiped the NX box clean and configured only the interface
IP TCP PATH-MTU-DISCOVERY was enabled globally on the 7606. I added it to the 6004
Configured static speed and duplex settings
I'm certain I've done a lot more that I cannot think of at the moment (I have it documented at work). When I run debug ip tcp transactions, I notice that the syn_sent to the neighbor (from the original attempt to set up BGP) was timing out. It almost appears as though this is some buffer or window issue with the NX box, but I am coming up short in my research on how to potentially fix this. Before I call TAC, I figured I'd post this.
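As a quick sanity check on the ping results above, the three observations can be turned into "bytes offered per drop cycle" (packet size times the drop interval). This is only a sketch: the function name is made up for illustration, and it assumes the ping size is the full datagram size and that the dropped packet counts toward the cycle.

```python
# Ping observations from the post: (packet size in bytes, "every Nth packet dropped").
observations = [(1100, 25), (1500, 15), (8000, 3)]


def bytes_per_drop_cycle(size, nth):
    """Bytes offered between consecutive drops, counting the dropped packet.

    Hypothetical helper for illustration only.
    """
    return size * nth


# Map each packet size to the byte total of one drop cycle.
cycles = {size: bytes_per_drop_cycle(size, nth) for size, nth in observations}

for size, total in cycles.items():
    print(f"size {size:>5}: {total} bytes per drop cycle")

# How far apart the largest and smallest cycle totals are.
spread = max(cycles.values()) / min(cycles.values())
print(f"max/min ratio: {spread:.2f}")
```

The three totals (27,500, 22,500 and 24,000 bytes) land in the same range even though the packet sizes differ by a factor of seven. That looks more like something counting bytes or rate than a hard MTU limit, though that reading is an interpretation of the numbers and not something the debugs confirmed.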
I'm 99% certain it's not a fiber or L1 issue, as both NX boxes, which have redundant P2P links to both 7606s, are having this exact same issue. I'm also leaning toward a potential bug between IOS and NX-OS; not too sure.
Any help would be appreciated.
Thanks.
-Michael
10-31-2015 02:29 AM
I was able to identify the issue. With the 7600 being so old, someone had at some point configured a copp_management ACL on it. I found this by pure luck, because a coworker suggested peering the 6004 with the 4948, since it is an IOS device. The 4948 was clean and the peering came right up.
For that test I just used a random 192.168 space for the point-to-point link between the 6004 and the 4948. I then left that address space on the 6004 interface, and since we were running single-mode LC to SC, I changed the cable to multimode between the 6004 and the 7606. I then configured the 7606 interface with the 192.168 space as well. Boom, BGP came up.
My initial thought was that there was something funky with all 8 single-mode runs. It didn't make sense to me, but whatever. I then changed everything back to use the 10 space, and what do you know, no BGP peering. Now I was intrigued. Next step: I changed the interface to 172.16 space. No peering. This was screaming some type of ACL to me. I searched the ACLs on the 7606, looking for one that had 192.168 space in it but not 10 space or 172.16 space. I found one listed as copp_management that must have been created years ago. I noticed that the 192.168 entry had hits on it, and when I set the interfaces back to 192.168 space, the count kept increasing. Bingo, I knew this was it. I added a permit for the 10 space, readdressed my interfaces, and BGP came up.
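The elimination logic above (which address spaces does the ACL cover, and which does it miss?) can be sketched with Python's standard ipaddress module. The permit entry and the test addresses below are hypothetical stand-ins; the actual copp_management entries and the real point-to-point addresses are not shown in this thread.

```python
import ipaddress

# Hypothetical permit entry standing in for the copp_management ACL.
acl_permits = [ipaddress.ip_network("192.168.0.0/16")]

# Hypothetical point-to-point addresses, one per address space that was tried.
candidates = {
    "192.168 space": ipaddress.ip_address("192.168.1.1"),
    "10 space": ipaddress.ip_address("10.0.0.1"),
    "172.16 space": ipaddress.ip_address("172.16.0.1"),
}


def is_permitted(addr, permits):
    """Return True if addr falls inside any permitted network."""
    return any(addr in net for net in permits)


def check_all(permits):
    """Check every candidate address against the given permit list."""
    return {name: is_permitted(addr, permits) for name, addr in candidates.items()}


# Before the fix: only the 192.168 space matches a permit entry.
before = check_all(acl_permits)

# After the fix: a permit for the 10 space has been added.
after = check_all(acl_permits + [ipaddress.ip_network("10.0.0.0/8")])

print("before:", before)
print("after: ", after)
```

Running the same check against the real ACL entries would have pointed at the 10 and 172.16 spaces right away, which is the same conclusion the hit counters gave.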
Sheer luck, because if I hadn't addressed my test between the 6004 and 4948 strictly using 192.168 space, I wouldn't have gone down this path. Good to know it was an isolated incident and not a bug between NX-OS and IOS, code, etc.
Thanks for the assistance anyway!