How much slower?
Did you capture a sniffer trace to compare?
A sniffer trace is always necessary when dealing with content devices.
It is important to see what's going on.
There could be drops, retransmissions, or other issues going on.
Also, when comparing just a few connections, it will certainly be slower, since we are adding a device in the path and every device introduces some delay.
However, I would suggest you try with 500k active connections going through the device. Then let me know which one is faster.
Obviously, if the difference in speed with only a few connections is already more than 1 or 2 seconds, then something must be wrong, and a sniffer trace is absolutely required to identify the problem.
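If it helps to put a number on "how much slower", here is a minimal Python sketch for a quick comparison. It is only an illustration: the function names, the 1-second threshold, and the idea of timing the TCP handshake are my assumptions, and it does not replace a sniffer trace.

```python
import socket
import statistics
import time


def time_connect(host, port, timeout=5.0):
    """Return the seconds taken to complete one TCP handshake with host:port."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection is closed as soon as the handshake completes
    return time.perf_counter() - start


def compare_latency(direct, via_device, threshold_s=1.0):
    """Compare two lists of timings (seconds) and flag a suspicious gap.

    threshold_s is an assumed cutoff, taken from the "1 or 2 seconds" rule
    of thumb; adjust it to your environment.
    """
    direct_median = statistics.median(direct)
    device_median = statistics.median(via_device)
    added = device_median - direct_median
    return {
        "direct_median_s": direct_median,
        "device_median_s": device_median,
        "added_delay_s": added,
        "needs_trace": added > threshold_s,
    }
```

Collect a handful of samples with time_connect() against the server directly and against the address on the device, then pass both lists to compare_latency(). If needs_trace comes back True, capture the trace before doing anything else.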
Thanks,
Gilles.