NSR troubleshooting involves a lot of components. The first thing to check on an ASR9K is show pfm loc all, looking for any failure in the punt/inject path from the LC to the standby/active RSP. Next, check whether the issue is occurring for many different peers or just one, and whether it has happened multiple times or just once. Check that the BGP timers are not aggressive. Check the TCP dump-files and the traces (socket traces, BGP traces, NSR traces, etc.). I would recommend opening a TAC case to get to the bottom of it, as it involves a lot of processes talking to each other.
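To answer the "many peers or one, repeated or once" question quickly, it can help to tally the NSR-disable events per peer. The following is only a minimal sketch: the (peer, timestamp) tuples, the function name summarize_nsr_events, and the sample addresses are all hypothetical, and you would need to pull the real events out of your own logs or traces by hand.

```python
from collections import Counter


def summarize_nsr_events(events):
    """Tally NSR-disable events per peer.

    `events` is an iterable of (peer, timestamp) tuples. This shape is an
    assumption for illustration, not a format produced by the router.
    """
    per_peer = Counter(peer for peer, _timestamp in events)
    return {
        # How many distinct peers were hit at least once
        "peers_affected": len(per_peer),
        # Peers that were hit more than once
        "repeat_peers": sorted(p for p, n in per_peer.items() if n > 1),
        # Raw count per peer
        "per_peer": dict(per_peer),
    }


# Hypothetical sample data (documentation addresses, made-up times)
sample_events = [
    ("192.0.2.1", "10:01:12"),
    ("192.0.2.1", "10:14:47"),
    ("198.51.100.7", "10:15:03"),
]

summary = summarize_nsr_events(sample_events)
print(summary["peers_affected"], "peer(s) affected")
print("repeat offenders:", summary["repeat_peers"])
```

A single peer failing repeatedly points more toward that session or its path, while many peers failing around the same time points more toward something shared, such as the punt/inject path mentioned above.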
In short, the reason NSR goes disabled on a peer is that no hello message has been seen by the time 60% of the session timeout has elapsed. At that point, the sending and receiving of hello messages is switched from using the standby RSP to the active RSP.
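To see why aggressive timers make this more likely, it helps to work out where the 60% mark falls for a given session timeout. This is just a rough arithmetic sketch: the function name and the sample timeout values are my own assumptions, not anything the router exposes.

```python
def nsr_disable_threshold(session_timeout_s, percent=60):
    """Seconds without a hello after which NSR would go disabled,
    taking the 60%-of-session-timeout rule described above.

    The function name and the `percent` parameter are illustrative only.
    """
    return session_timeout_s * percent / 100


# Sample session timeouts in seconds (chosen for illustration)
for timeout in (180, 30, 3):
    threshold = nsr_disable_threshold(timeout)
    print(f"timeout {timeout:>3}s -> 60% mark at {threshold:.1f}s")
```

With a 180 second timeout there are 108 seconds of slack before the 60% mark, but with a 3 second timeout there are only 1.8 seconds, so even a brief delay in the punt/inject path could be enough to cross it.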
Sam