It is caused by the way GRE keepalives work. I suggest reading these two documents first:
In short, a router sending a keepalive constructs an IP packet whose source is the remote endpoint and whose destination is the router itself. It then encapsulates this packet in GRE and prepends another IP header with itself as the source and the remote end as the destination. The packet is sent to the remote end, where it is decapsulated and then routed as usual, which returns the inner IP packet to the original sender.
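To make the nesting concrete, here is a minimal Python sketch of the packet layout described above. It only models source and destination addresses, not real headers, and the local address 10.1.1.1 and the names `IPPacket`, `build_keepalive` and `decapsulate` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IPPacket:
    src: str
    dst: str
    # GRE-encapsulated inner packet, if this packet carries one
    payload: Optional["IPPacket"] = None


def build_keepalive(local: str, remote: str) -> IPPacket:
    # Inner packet: pretends to come from the remote end, addressed to ourselves
    inner = IPPacket(src=remote, dst=local)
    # Outer packet: a normal tunnel packet from us to the remote end
    return IPPacket(src=local, dst=remote, payload=inner)


def decapsulate(packet: IPPacket) -> IPPacket:
    # What the remote end does: strip the outer header, keep the inner packet
    if packet.payload is None:
        raise ValueError("packet carries no GRE payload")
    return packet.payload


# 10.1.1.1 is a hypothetical local tunnel source address
keepalive = build_keepalive(local="10.1.1.1", remote="10.1.2.1")
returned = decapsulate(keepalive)
print(returned.src, "->", returned.dst)
```

After decapsulation the remaining packet is addressed to the original sender, so the remote end only has to route it normally for the keepalive to come back.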
Obviously, this keepalive mechanism is not integrated with the VRF feature. The keepalive packet may arrive at the remote endpoint, but once it is decapsulated the association with the receiving Tunnel interface is lost, and the remote endpoint tries to route the inner packet back using the global routing table rather than the VRF in which the tunnel resides. As a result, the keepalive packet never returns.
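The failure can be pictured as a lookup in the wrong table. The following is a simplified Python model, not how IOS is implemented: the table contents, the 192.0.2.0/24 route, the interface names and the `lookup` helper are assumptions chosen only to illustrate the point.

```python
import ipaddress


def lookup(table, dst):
    """Longest-prefix match; returns the outgoing interface or None."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), interface)
        for prefix, interface in table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # Prefer the most specific (longest) matching prefix
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical routing tables on the router that decapsulates the keepalive
tables = {
    "global": {"192.0.2.0/24": "Gi0"},   # knows nothing about the tunnel peer
    "RED": {"10.1.2.1/32": "Gi1"},       # the peer is only reachable in the VRF
}

inner_destination = "10.1.2.1"  # destination of the decapsulated keepalive

print(lookup(tables["global"], inner_destination))  # None: keepalive is dropped
print(lookup(tables["RED"], inner_destination))     # Gi1: where it should have gone
```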
Unfortunately, I am not aware of any backup keepalive mechanism for this, apart from running routing protocols over the tunnel with more aggressive hello and dead intervals.
Thanks for pointing out the cause of this problem. The solution, apart from running routing protocols over the tunnel, is to put a static route in the global routing table so that the keepalives can be routed back correctly to the originating router. Thus, if you have a tunnel with destination 10.1.2.1 that is reachable through interface Gi1 in vrf RED, then by simply issuing the following:
ip route 10.1.2.1 255.255.255.255 Gi1
you will enable the router to route the keepalives back to the remote end of the tunnel.
I grant that it is not a very elegant solution (I dislike static routes on principle), but it allows the keepalives to work until Cisco get round to fixing the issue.
Thanks for joining this thread! Man, it's been a while since I posted my response.
Your solution is spot-on. As you admit yourself, it is not the most elegant way of solving the problem, and it would ultimately fail if you had overlapping or identical remote tunnel endpoint addresses in different VRFs, but it can still be a workaround in simple scenarios.
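A small Python sketch of why overlapping endpoints break the workaround. It assumes the global table resolves the address to a single outgoing interface, and the second VRF (BLUE), the interface Gi2 and all names in the code are hypothetical.

```python
# Global static route from the workaround: one address, one outgoing interface
GLOBAL_STATIC = {"10.1.2.1": "Gi1"}

# Hypothetical tunnels: same remote endpoint address, different VRFs/interfaces
TUNNELS = {
    "RED": ("10.1.2.1", "Gi1"),
    "BLUE": ("10.1.2.1", "Gi2"),
}


def keepalive_returns(vrf):
    """True if the global route sends the keepalive out the tunnel's real interface."""
    remote, correct_interface = TUNNELS[vrf]
    return GLOBAL_STATIC.get(remote) == correct_interface


for vrf in TUNNELS:
    print(vrf, keepalive_returns(vrf))
```

In this model the keepalives for RED come back, while those for BLUE leave through the wrong interface, because the global table has no way of telling which VRF a decapsulated keepalive belonged to.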
Please keep these great suggestions coming!
I have tested GRE keepalives with VRF on 4400 routers running version 16. For some reason, they manage to route the traffic back to the originator without needing a static route. I tested exactly the same setup on IOU running version 15 and encountered the problem. 3900 routers running version 15 also showed the same problem.