I have been comparing the latency of the K3P-Q (x100) and K35-Q (x40) cards on our systems using the exanic_perf_test application in loopback mode. With the stock image, the K3P-Q has lower latency than the K35-Q, but I am finding that the FDK implementation of the K3P-Q has higher latency. Is there an optimisation that is only included in the stock image? I compiled both FDK examples using Vivado 2023.2, and internal probes show that the additional latency is within the PCIe IP block.