I think the bigger question is: why would you want to do this? As you noted, the solution guide says the default is 100 milliseconds (see below). Do you get your support from Cisco and/or a partner? If you switch to something non-standard, TAC/A2Q may flag it, whether now or when you go to upgrade at some point.
-----
"There are several parameters associated with heartbeats. In general, leave these parameters set to their system default values. Some of these values are specified when a connection is established. Other parameters can be set in the Windows registry. The two values of most interest are:
The amount of time between heartbeats
The number of missed heartbeats (currently hard-coded to five) that indicate a failure
The default value for the private heartbeat interval between redundant components is 100 milliseconds. One side can detect the failure of the circuit or the other side after 500 ms. The default heartbeat interval between a central site and a peripheral gateway is 400 ms. In this case, it takes 2 seconds to reach the circuit failure threshold."
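The arithmetic in the quoted passage can be sketched as below. This is only an illustration of interval times missed-heartbeat count; the function and constant names are made up for this post and are not part of any Cisco tool or registry setting.

```python
# Hypothetical names, for illustration only.
MISSED_HEARTBEATS = 5  # "currently hard-coded to five" per the quoted guide


def failure_detection_ms(heartbeat_interval_ms: int,
                         missed: int = MISSED_HEARTBEATS) -> int:
    """Time until the missed-heartbeat threshold is reached."""
    return heartbeat_interval_ms * missed


# Private link between redundant components (default interval 100 ms)
print(failure_detection_ms(100))  # 500
# Central site to peripheral gateway (default interval 400 ms)
print(failure_detection_ms(400))  # 2000
```

So any change to the interval moves the detection time by five times that amount, which is why other components' timeouts would be affected too.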
Are you using the same physical network for both private and public traffic? That is also a red flag for support. There are QoS settings that may help prioritize the high-priority private traffic as well. One thing to try is a continuous ping to/from the private interfaces to get an idea of just how much variation there is in latency and how much packet loss you see. IMHO if your private network is not robust enough to consistently handle the defaults, you are better off running simplex than messing with the timers or counters - it rarely improves matters and usually causes unpredictable performance.
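If you capture the round-trip times from that continuous ping, a rough way to summarize them might look like this. It is a minimal sketch and not a Cisco utility: the function name, the sample list, and the 100 ms threshold (picked only because it matches the default private heartbeat interval) are all my own assumptions, and a lost ping is represented as None.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def summarize_rtts(samples: Sequence[Optional[float]],
                   threshold_ms: float = 100.0) -> dict:
    """Summarize ping round-trip times in ms; None marks a lost reply."""
    replies = [s for s in samples if s is not None]
    lost = len(samples) - len(replies)
    return {
        "sent": len(samples),
        "loss_pct": 100.0 * lost / len(samples) if samples else 0.0,
        "min_ms": min(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
        "mean_ms": mean(replies) if replies else None,
        "stdev_ms": pstdev(replies) if replies else None,
        # Replies slower than the threshold (assumed value, see above)
        "over_threshold": sum(1 for r in replies if r > threshold_ms),
    }


# Made-up sample data: four replies and one lost packet
print(summarize_rtts([1.0, 2.0, None, 3.0, 150.0]))
```

A large spread between min and max, any loss, or replies over the threshold would suggest looking at the network before touching the timers.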
We had a similar case: the system fails over from PG Site A (active node) to PG Site B (standby node), the failover isn't smooth, and it affects the agents' Finesse web interface. When it fails back to Site A once Site A becomes reachable again, Finesse flaps again. This flapping wasn't detected by our network monitoring tools.
After a lot of discussion with our partner, we came to the conclusion that if we increased the failover interval, we would have to get Cisco Professional Services to redesign every component that relies on the CTI high-availability service, and pay for professional services on every other system that integrates with it so that its timeout is adjusted to support the custom PG high availability. We could also face instability in those integrated systems because of it.
You can find all about this topic in the chapter High Availability and Network Design of the Solution Design Guide for Cisco Unified Contact Center Enterprise.
However, after some time the issue resolved on its own. When we tracked back to see what might have solved it, we found that during an unrelated Cisco TAC case, the TAC engineer had run the exit_opc command on the PG, which might have fixed it.
Caution: Use caution when you issue the exit_opc command. This command instructs the OPC process to exit on both sides of the PG, if duplexed. Node Manager forces the process to restart, which then forces a reload of the configuration for the Call Router. All internal peripheral and agent states are flushed. Then, OPC and Peripheral Interface Manager (PIM) relearn the PG and its configuration.