I believe I'm running into a conflict between the non-configurable 5-minute ISE External MDM heartbeat interval and the equally non-configurable 4-minute Azure Public IP outbound idle timeout, and I'm curious whether anyone else has experienced this.
So far, it has not been enough of a problem to prevent endpoints from authenticating via the Policy Set that references the External MDM server, but there is a recurring connection issue in our logs and I'd like to get to the bottom of it.
Consistently, this Critical Alarm triggers:
External MDM Server Connection Failure. : Reason is Connection Failed to the MDM server host - example.mdmserver.com; and port - 443 : Connection timeout occurred. Check if the MDM server is reachable : SocketTimeoutException message = Read timed out ServerType = MobileDeviceManager (PSN)
After consulting with TAC, collecting packet captures, and debugging the mdm component, it appears there is a 'Heartbeat job' that runs every 5 minutes from the ISE node toward the External MDM server URL. According to TAC, this heartbeat is not configurable, and it reuses the same TCP session to determine whether the MDM server is reachable rather than opening a new TCP session at every heartbeat interval. A new TCP session from the ISE server toward the MDM server is only established if the previous TCP session was terminated for some reason.
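A rough way to reproduce this outside ISE, from a VM that egresses through the same Public IP, is to open one TCP session, leave it idle, and then reuse it. The Python sketch below does that. The function name `probe_idle_flow`, the single probe byte, and the 10-second read timeout are all assumptions made for illustration; it does not speak the real MDM API and is not how ISE implements its heartbeat.

```python
import socket
import time


def probe_idle_flow(host, port, idle_seconds, read_timeout=10.0):
    """Open one TCP session, stay idle, then try to reuse it.

    Returns "alive", "closed", "timeout", or "reset" depending on what
    happens when the idle session is reused. Hypothetical helper, not ISE code.
    """
    with socket.create_connection((host, port), timeout=read_timeout) as sock:
        time.sleep(idle_seconds)      # no traffic is sent, so the flow sits idle
        try:
            sock.sendall(b"\n")       # arbitrary probe byte, not a real MDM request
            data = sock.recv(1)       # wait for any reply, or a close from the peer
        except socket.timeout:
            return "timeout"          # nothing came back: flow was silently dropped
        except (ConnectionResetError, BrokenPipeError):
            return "reset"            # a TCP RST was received for this session
        return "alive" if data else "closed"


# Example (not run here): idle for 5 minutes, like the heartbeat interval.
# print(probe_idle_flow("example.mdmserver.com", 443, idle_seconds=300))
```

A "timeout" result after a 300-second idle period would match the "Read timed out" symptom in the alarm. A "closed" or "reset" result is less conclusive, because the MDM server itself may end idle sessions on its own schedule.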
In my scenario, the ISE servers are located in Azure and their outbound traffic flow to the internet involves an Azure Public IP address. As Microsoft documentation states, static public IPs "Have an adjustable inbound originated flow idle timeout of 4-30 minutes, with a default of 4 minutes, and fixed outbound originated flow idle timeout of 4 minutes."
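To make the timing mismatch concrete, here is a small Python sketch of the arithmetic. It assumes the heartbeat session carries no other traffic (for example, no TCP keepalives) between heartbeats, which I have not confirmed. The function names are made up for illustration.

```python
HEARTBEAT_INTERVAL_S = 5 * 60      # ISE External MDM heartbeat interval, per TAC
OUTBOUND_IDLE_TIMEOUT_S = 4 * 60   # fixed outbound idle timeout, per the Microsoft quote


def flow_survives_idle_gap(idle_gap_s, idle_timeout_s):
    """True if the gap between packets is strictly shorter than the idle timeout.

    Assumes nothing else is sent on the session during the gap.
    Treating an equal gap as a failure is a conservative choice.
    """
    return idle_gap_s < idle_timeout_s


def idle_overrun_s(idle_gap_s, idle_timeout_s):
    """Seconds by which the idle gap exceeds the timeout (0 if it does not)."""
    return max(0, idle_gap_s - idle_timeout_s)


print(flow_survives_idle_gap(HEARTBEAT_INTERVAL_S, OUTBOUND_IDLE_TIMEOUT_S))  # False
print(idle_overrun_s(HEARTBEAT_INTERVAL_S, OUTBOUND_IDLE_TIMEOUT_S))          # 60
```

Under that assumption, every heartbeat arrives 60 seconds after the outbound flow has already been considered idle, which would explain why the alarm is so consistent.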
Currently, the Public IPs have the default values, but I'm wondering whether setting the inbound idle timeout to something higher than the ISE heartbeat interval will even accomplish my goal if the outbound timeout is fixed at 4 minutes. Also, I'm curious whether configuring something like TCP reset on the load balancer would 'force' the session closed once the idle timeout expires and result in less downtime than waiting for the TCP session to time out naturally.
Please let me know if anyone else has encountered this or if you have any recommendations to try and make this a more stable connection.
Thank you,