02-13-2004 05:07 AM - edited 03-02-2019 01:34 PM
Hello to anybody well familiar with EIGRP/GRE, and QoS too!
Note: I describe a trouble scenario below and ask just one introductory question at the end of the text. In parallel I will do test-lab research, with tests focused especially on the QoS handling of EIGRP packets over a specific serial link, combined with IPSec/GRE.
Problem: On the links between routers in our production network we see EIGRP neighbor flaps. They occur daily, varying from continuous bursts of the error messages shown below to sporadic occurrences. The link is a 2 Mbps serial WIC-1T (with no explicit bandwidth command setting it to 2 Mbps, and no "max-res-bw 100" command either). I took over troubleshooting of this configuration from somebody else and see that it lacks careful handling of EIGRP traffic: both of those commands are missing, and (not shown yet) a local policy map marks router-originated EIGRP traffic with a certain IP precedence, while the policy map on the serial interface makes EIGRP routing traffic compete with other traffic within the allocated space, according to its IP precedence (by default the total bandwidth available to the policy map is 75% of the interface bandwidth). I will show the policy map later, if needed.
The link also takes some errors, but not nearly as often as the EIGRP flap messages appear (daily), as seen from the buffered log.
Flap messages (they can be seen on the opposite box too):
Bardejov#sh logging | begin Feb 10
Feb 10 04:04:39.550: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial1/0, changed state to down
Feb 10 04:04:49.550: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial1/0, changed state to up
Feb 10 04:04:49.550: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel701, changed state to down
Feb 10 04:04:49.566: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.206 (Tunnel701) is down: interface down
Feb 10 04:04:49.566: destroy peer: 172.16.7.206
Feb 10 04:04:58.786: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.206 (Tunnel701) is up: new adjacency
Feb 10 04:04:59.550: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel701, changed state to up
Feb 10 04:08:23.657: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.206 (Tunnel701) is down: holding time expired
Feb 10 04:08:23.657: destroy peer: 172.16.7.206
Feb 10 04:08:39.941: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.206 (Tunnel701) is up: new adjacency
Feb 10 04:08:39.961: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.206 (Tunnel701) is down: K-value mismatch
Feb 10 04:08:39.965: destroy peer: 172.16.7.206
Feb 10 04:08:44.653: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.206 (Tunnel701) is up: new adjacency
Feb 10 04:16:41.587: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.206 (Tunnel701) is down: holding time expired
Feb 10 04:16:41.587: destroy peer: 172.16.7.206
Feb 10 04:16:45.771: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.206 (Tunnel701) is up: new adjacency
CAUSE ASSUMPTION: I think that when the link is fully loaded for long enough (at least 15 sec, the default EIGRP hold time), hello packets are delayed because of poor priority or bandwidth treatment of EIGRP traffic, and the neighbor flaps.
"K-value mismatch" message: For the first two days of investigating the problem I mainly looked for the reason for this message. The reports I found on the web (nothing on CCO) attribute it to a K-value configuration mismatch between the EIGRP processes on the two routers, caused by different "metric weights" commands. I rule that cause out here and regard this message as the result of some bad IOS code. We (or I) never touch or configure that command!
Config excerpt (the opposite box is the same, except for the IP addresses):
interface Tunnel701
 ip address 172.16.7.205 255.255.255.252
 ip mtu 1600
 ip tcp adjust-mss 1370
 tunnel source 10.107.0.205
 tunnel destination 10.107.0.206
 tunnel path-mtu-discovery
 crypto map CM2
!
interface Serial1/0
 description Bardejov-RLAN,BJ-BJ_NP_16
 ip address 10.107.0.205 255.255.255.252
 service-policy output POLICY_RLAN
 crypto map CM2
 crypto ipsec df-bit copy
 crypto ipsec fragmentation before-encryption
!
router eigrp 1
 passive-interface Serial1/0
 passive-interface Loopback207
 passive-interface FastEthernet0/0
 network 10.207.207.75 0.0.0.0
 network 172.16.7.0 0.0.0.255
 no auto-summary
!
THIS IS IMPORTANT NOW! :
Bardejov#sh interfaces serial 1/0
Serial1/0 is up, line protocol is up
Hardware is PowerQUICC Serial
Description: Bardejov-RLAN,BJ-BJ_NP_16
Internet address is 10.107.0.205/30
MTU 1500 bytes, BW 1544 Kbit, DLY 20000 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation HDLC, loopback not set
Bardejov#sh interfaces tunnel 701
Tunnel701 is up, line protocol is up
Hardware is Tunnel
Internet address is 172.16.7.205/30
MTU 1514 bytes, BW 9 Kbit, DLY 500000 usec,
reliability 255/255, txload 28/255, rxload 28/255
Encapsulation TUNNEL, loopback not set
Keepalive not set
Tunnel source 10.107.0.205, destination 10.107.0.206
Tunnel protocol/transport GRE/IP, key disabled, sequencing disabled
Tunnel TTL 255
Checksumming of packets disabled, fast tunneling enabled
Path MTU Discovery, ager 10 mins, MTU 0, expires never
Last input 00:00:04, output 00:00:04, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 19010
Queueing strategy: fifo
Output queue: 0/0 (size/max)
QUESTION (more may come later):
By default EIGRP uses at most 50% of the interface bandwidth for its traffic. Which interface is the "first enter" interface in this scenario? I think it is the GRE interface, and that its bandwidth of 9 kbps is what matters for routing behaviour.
Thanks to anybody who can give me hints, and possibly analyze this with me in some depth.
02-14-2004 04:57 AM
First, on the k value mismatch:
Feb 10 04:08:39.961: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.206 (Tunnel701) is down: K-value mismatch
This could be because of a new feature in EIGRP, committed just recently, called the "goodbye message." With this feature, EIGRP sends a "goodbye" to a neighbor if it is being deconfigured or shut down, to keep its neighbors from waiting on their hold timer to take the neighbor down (it makes the network converge more quickly around a neighbor known to be going down). The goodbye is carried in a hello packet with all K values set to 255, so a router running code that does not yet understand the feature logs it as a K-value mismatch. If a router times out a neighbor due to its hold timer expiring, I think we will also send a goodbye message to the neighbor we are timing out.
We just redid the documentation on CCO to insert a note about this, but I don't see the change out there yet.
"By default EIGRP uses at most 50% of the interface bandwidth for its traffic. Which interface is the "first enter" interface in this scenario? I think it is the GRE interface, and that its bandwidth of 9 kbps is what matters for routing behaviour."
EIGRP is going to pull its bandwidth (from which to calculate the 50%) from the interface descriptor, which is the tunnel in this case. Since the tunnel is set to 9 kbit/s, EIGRP is only going to use 4.5 kbit/s, which, if there are a good number of routes here, may not be enough.
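To make the arithmetic concrete: with BW 9 Kbit on Tunnel701 and the default 50%, EIGRP paces itself to roughly 4.5 kbit/s. A minimal sketch of one way to raise that ceiling is to give the tunnel a bandwidth statement closer to the real circuit. The value 1544 below is only an assumption, chosen to match the BW shown on Serial1/0 (the circuit itself is described as 2 Mbps). Because the bandwidth value also feeds the EIGRP metric, it would need to be set consistently on both ends:

```
interface Tunnel701
 ! Assumed value in kbit/s; use whatever matches the real circuit
 bandwidth 1544
```

With this in place, the default 50% would work out to about 772 kbit/s available to EIGRP on the tunnel.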
The next step is to look at what the logs on the other router are saying about the neighbor reset. There are several possible cases here:
-- If the logs are showing a stuck in active, then you probably need to increase the bandwidth on this link a bit. I doubt this is the case, but it is possible.
-- If the logs indicate that you are taking the other neighbor down because of a hold timer expiration, then you could be seeing a problem with the link dropping too many packets.
Can you get to the other router, the other end of the tunnel? If you post the logs from that router relating to this EIGRP neighbor, we can probably do a little more analysis, and help figure it out more.
:-)
Russ.W
02-16-2004 11:16 AM
Hi Russ,
Thank you for getting involved, answering my question (I knew that! :-) and providing the first clues.
Here is the log from the opposite router for that day. The timestamps are reliable, as both boxes are NTP-fed from a common source.
Feb 10 04:04:49.534: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.205 (Tunnel701) is down: holding time expired
Feb 10 04:05:02.338: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.205 (Tunnel701) is up: new adjacency
Feb 10 04:08:39.957: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.205 (Tunnel701) is down: peer restarted
Feb 10 04:08:43.469: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.205 (Tunnel701) is up: new adjacency
Feb 10 04:16:45.788: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.205 (Tunnel701) is down: peer restarted
Feb 10 04:16:49.256: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.205 (Tunnel701) is up: new adjacency
Feb 10 04:20:19.263: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.205 (Tunnel701) is down: holding time expired
Feb 10 04:20:31.703: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.205 (Tunnel701) is up: new adjacency
Feb 10 04:25:24.690: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.205 (Tunnel701) is down: peer restarted
Feb 10 04:25:25.150: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.205 (Tunnel701) is up: new adjacency
Feb 10 14:00:20.574: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor 172.16.7.205 (Tunnel701) is down: peer restarted
Today I finished building my lab, with all the cabling and the tunnel transport backbone connections, and tomorrow I will run hard overload tests with customer traffic and EIGRP configured. EIGRP is a great protocol, but I think it is also very sensitive to bandwidth and treatment. I know that, so I will start the test as a smart guy, and then I will put it through hell. :))
Bye for now, I'm looking forward to your next comment.
Peter.
02-16-2004 05:47 PM
Okay, so one end is reporting k value mismatches, and some hold timer expirations, and the other end is reporting hold timer expirations. I'd say you're having a problem getting packets across this link. :-( You could wind your hello timers out, so you can drop more packets without killing the neighbor, but I'm not certain how much of a help this will be. Perhaps setting the hello interval down much lower, but leaving the hold timer up much higher, so there's a 4 or 5 to 1 ratio, rather than a 3 to 1.
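As a rough sketch of the timer idea above, assuming EIGRP AS 1 on Tunnel701 as in the posted config: a 3-second hello with a 15-second hold time gives a 5 to 1 ratio. The numbers are illustrative only. The hold time a router configures is the value it advertises to its neighbor, so the same lines would go on both ends of the tunnel:

```
interface Tunnel701
 ! Illustrative values: hello every 3 s, hold 15 s (5:1 ratio)
 ip hello-interval eigrp 1 3
 ip hold-time eigrp 1 15
```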
Of course, if you're losing this many packets across the link, it doesn't tend to make me think it's going to work well for data, either. At least you aren't seeing retransmission timeouts being exceeded, which would be a much harder problem to diagnose and try to fix....
Anyway, my next step would be to look at the interface counters, and see what's up there. Are we really losing that much traffic on the link? If you ping across the link, are you seeing a lot of dropped packets, or does it look like it's just EIGRP having a problem on this link? If it's just EIGRP, I would increase the bandwidth percent, and play with the hello and hold timers (above), and see if I could get it to stabilize. I would reduce the number of routes being transmitted across the link to the minimum possible (which you may have already done).
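A sketch of what raising the bandwidth percent could look like, assuming the tunnel keeps its 9 Kbit bandwidth: the percentage may be set above 100 when the configured bandwidth is artificially low, so 200 here would allow EIGRP about 18 kbit/s. The value is an illustration, not a recommendation:

```
interface Tunnel701
 ! Illustrative: 200% of the 9 Kbit tunnel bandwidth
 ip bandwidth-percent eigrp 1 200
```

The drop and error counters can then be compared before and after with "show interfaces Serial1/0", "show interfaces Tunnel701" and "show ip eigrp neighbors detail" (the per-neighbor retransmit count is the interesting part).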
I would possibly try unicast neighbors--it shouldn't matter on a point-to-point link like a tunnel, but it might. I would also make certain I test a full query range across the link. There's no point in stable steady state neighbors if a pull of a cable, causing a full set of queries to be sent across the link, is going to reset the neighbors....
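For reference, a unicast neighbor on this tunnel would be configured roughly as below (addresses taken from the posted config; the far end would point back at 172.16.7.205). Once a static neighbor is defined on an interface, EIGRP stops using multicast there, so both routers need the statement or the adjacency will not form:

```
router eigrp 1
 ! Peer address and interface from the posted Tunnel701 config
 neighbor 172.16.7.206 Tunnel701
```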
Anyway, this might be getting long'ish on the forum (?). While I don't mind continuing here, you can also email me off line with more information, if you want, and I might be able to offer suggestions or help.
:-)
Russ.W