WRONG SYN,ACK SEQ Number

Hi Guys,

I have been investigating an issue for a month now and still can't figure out what the issue is.

Basically, we have clients connecting to a server on port 80, but about 1% of the connections fail with a socket error. The traffic path is:

Client - firewall - ipsec tunnel - firewall - loadbalancer - server

When we take a TCP dump between the firewall and the load balancer, the successful connections follow the normal three-way handshake as expected, but for the failed connections we see the following:

SYN SEQ=1000000 (example value)

[TCP ACKed unseen segment] SYN,ACK SEQ=124124 (a seemingly random number) ACK=124125 (that random number + 1)

[TCP Spurious Retransmission] SYN SEQ=1000000

[TCP ACKed unseen segment] SYN,ACK SEQ=124124 ACK=124125

[TCP Spurious Retransmission] SYN SEQ=1000000

This continues until the connection is dropped. So it seems that either the LB or the server is sending the wrong SYN,ACK SEQ number. I checked previous TCP streams and also filtered on this wrong SEQ number, and I can't find it anywhere in my capture, so it isn't something left over from a previous segment or connection. I can also confirm that no packets are missing from my Wireshark capture.
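For reference, the handshake rule at stake can be written as a small check. In a normal three-way handshake, the SYN,ACK's ACK field must equal the client's initial sequence number (ISN) + 1, modulo 2^32, while the SYN,ACK's own SEQ is the responder's ISN and is expected to look random. This is a minimal Python sketch using the example numbers from the trace; the function names are hypothetical and not taken from Wireshark or any other tool.

```python
# Sketch: does a SYN,ACK actually acknowledge the client's SYN?
# Rule from the TCP three-way handshake: SYN,ACK.ack == client ISN + 1 (mod 2**32).
# The SYN,ACK's SEQ field is the responder's own ISN and is expected to look random.

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def expected_synack_ack(client_isn: int) -> int:
    """ACK number a valid SYN,ACK must carry for a SYN with this ISN."""
    return (client_isn + 1) % SEQ_SPACE


def synack_matches_syn(client_isn: int, synack_ack: int) -> bool:
    """True if the SYN,ACK acknowledges this SYN, False if it belongs to something else."""
    return synack_ack % SEQ_SPACE == expected_synack_ack(client_isn)


if __name__ == "__main__":
    # Example values from the failing trace.
    client_isn, synack_ack = 1000000, 124125
    print(expected_synack_ack(client_isn))             # 1000001
    print(synack_matches_syn(client_isn, synack_ack))  # False
    # A healthy handshake where the client's ISN sits at the wrap-around point.
    print(synack_matches_syn(2**32 - 1, 0))            # True
```

By this rule, the field that fails to match in the trace is the SYN,ACK's ACK (124125 instead of 1000001), which is consistent with Wireshark flagging it as [TCP ACKed unseen segment] and the client retransmitting its SYN.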

The issue occurs at random times during the day and does not correlate with load or busy periods.

From what I have seen and read online, the LB might be taking part in this three-way handshake, even though one would expect the handshake to be between the client and the server.

I can't capture between the LB and the server because the server is a VM on a host connected to a different switch, which I don't currently have access to. The LB logs also don't show any issues.


Greetings,

The LB can be configured to act as a proxy for TCP connections: clients complete the three-way handshake with the LB, and the LB then connects to the server over a separate socket. (This can even be an already open socket that is being reused.)
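To make the proxy mode concrete, here is a minimal asyncio sketch of the general idea, not of how any particular LB implements it. The client's handshake terminates on the proxy, which then opens its own, separate connection to the backend. Because these are two independent TCP connections, each has its own handshake and its own ISNs, so in this mode the SYN,ACK the client sees comes from the proxy rather than the server. The names, the echo backend, and the localhost setup are all hypothetical.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes one way until EOF, then close the destination connection."""
    try:
        while data := await reader.read(4096):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


async def handle_client(client_r, client_w, backend_host: str, backend_port: int) -> None:
    # By the time this runs, the client's handshake has completed with the proxy.
    # Open a second, independent TCP connection to the backend (own handshake, own ISNs).
    backend_r, backend_w = await asyncio.open_connection(backend_host, backend_port)
    await asyncio.gather(pipe(client_r, backend_w), pipe(backend_r, client_w))


async def echo_backend(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Hypothetical stand-in for the real server: echo one line, then close."""
    writer.write(await reader.readline())
    await writer.drain()
    writer.close()


async def demo(message: bytes = b"hello via proxy\n") -> bytes:
    backend = await asyncio.start_server(echo_backend, "127.0.0.1", 0)
    b_host, b_port = backend.sockets[0].getsockname()[:2]
    proxy = await asyncio.start_server(
        lambda r, w: handle_client(r, w, b_host, b_port), "127.0.0.1", 0
    )
    p_host, p_port = proxy.sockets[0].getsockname()[:2]
    async with backend, proxy:
        # The client only ever talks to the proxy's address and port.
        reader, writer = await asyncio.open_connection(p_host, p_port)
        writer.write(message)
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
    return reply


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A capture on the client side of such a proxy shows only the client-to-proxy connection; the proxy-to-server connection, with its own sequence numbers, is only visible on the other side.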

You could check the LB configuration, but it's hard to misconfigure TCP proxying for a single server and not the others if they are in the same server farm.

Do you have many servers behind the LB for this service? It may be that only one specific server has the issue, and only some of the time, which could explain the 1%.
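The arithmetic behind that hypothesis: if the LB spreads connections evenly, the overall failure rate is the average of the per-server failure rates, so one server failing intermittently gets diluted by the healthy ones. This is a minimal sketch assuming equal traffic per server; the farm size and rates below are hypothetical, not taken from the thread.

```python
from typing import Sequence


def overall_failure_rate(per_server_failure: Sequence[float]) -> float:
    """Expected fraction of failed connections, assuming equal traffic per server."""
    share = 1 / len(per_server_failure)
    return sum(share * p for p in per_server_failure)


if __name__ == "__main__":
    # Hypothetical farm: 10 servers, one of which fails 10% of its connections.
    rates = [0.0] * 9 + [0.10]
    print(f"{overall_failure_rate(rates):.2%}")  # 1.00%
```

With weighted or least-connections balancing the shares would differ, but the idea is the same: a small overall rate does not rule out one server failing much more often.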

If you have permission and sufficient capacity, you can disable servers on the LB one at a time to see whether the issue persists.

If it does, I'd focus more on the LB. If it goes away when a certain server is disabled, look at that specific server.
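The elimination test can be sketched as a simple loop. Here `measure_failure_rate` is a hypothetical callback standing in for "disable this server on the LB, let traffic run, and read the failure rate from your monitoring"; nothing in this sketch talks to a real load balancer, and the server names and tolerance are made up.

```python
from typing import Callable, Iterable, Optional


def find_suspect_server(
    servers: Iterable[str],
    measure_failure_rate: Callable[[Optional[str]], float],
    tolerance: float = 0.002,
) -> Optional[str]:
    """Return the server whose removal makes the failures disappear, or None."""
    baseline = measure_failure_rate(None)  # all servers enabled
    for server in servers:
        rate = measure_failure_rate(server)  # this one server disabled (hypothetical)
        if rate <= tolerance < baseline:
            return server  # failures vanish without it: look at this server
    return None  # failures persist whichever server is out: focus on the LB


if __name__ == "__main__":
    # Simulated farm where only "web03" misbehaves (hypothetical numbers).
    def fake_measure(disabled: Optional[str]) -> float:
        return 0.0 if disabled == "web03" else 0.01

    print(find_suspect_server(["web01", "web02", "web03"], fake_measure))  # web03
```

Because the failures are intermittent, each measurement window needs to be long enough to see them at the baseline rate, otherwise a healthy reading may just be luck.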

Hope this helps.

JF
