
UDP Flow problem. FCBs exhausted.

mcroft

Hi,

Either I have found a problem with the CSS and UDP flows (specifically SIP over UDP), or I am missing something. Hopefully I am missing something. For testing, I have a very simple design:

1x CSS11501 to load-balance UDP/SIP traffic.

Two Servers running a SIP application (VoIP)

One VIP address.

By default, when I configure "application sip" on the CSS content rule, the "flow-state 5060 udp flow-enable" command appears in the global config (as expected).

As I understand it, the SIP flow allows UDP flows using port 5060 to be set up. These UDP flows last approximately five seconds before being cleaned up by the garbage collector.

The CSS is configured and VoIP calls are "successfully" set up using UDP (SIP) from an 'Internet-Client' to the 'SIP-Server'. Now, if the call lasts longer than five seconds and the SERVER disconnects by sending a UDP (SIP BYE) packet, this particular UDP packet does NOT reach the 'Internet-Client'. This is because the flow has expired and been cleaned up. Therefore, there is no state information and the CSS is unable to route this UDP (SIP BYE) packet.

To summarize, this default SIP flow timer is useless for VoIP calls longer than 5 seconds.

Therefore, one option is to increase the flow timer for the SIP UDP port with "flow-timeout-multiplier 400" (this gives me a maximum phone-call length of approximately 1.5 hours).

However, this poses new problems.

1) After this 1.5-hour limit, ALL UDP (SIP) flows will time out and the garbage collector will clean them up. Flows will be lost, and therefore any UDP packets sent from the SERVER to the CLIENT will never reach their destination (i.e. SIP BYE packets for call hang-ups). I will therefore experience this flood of problems every 1.5 hours (or so).

2) We will be limited to 65500 flows.

Although this seems like a generous limit, with SIP over UDP this is actually not many flows at all. Let me illustrate...

One SIP client can produce hundreds of flows per hour. A SIP client can send a UDP keepalive packet (SIP REGISTER) to a server every 30 seconds or even more frequently.

Each UDP packet therefore creates a NEW flow: 120 flows per hour. In addition to these keepalives from client to server (the so-called SIP REGISTERs), each individual VoIP call also produces several UDP flows.

So, as you can appreciate, 65500 flows are easily exhausted in under one hour.

(We are an ITSP with over 5 calls per second and thousands of SIP clients registering, so this design will not work for us and appears flawed.)

Hopefully you have not fallen asleep at this point and can possibly tell me I am missing a command or two. I am sure SIP and UDP flows work well and are a proven design on the CSS (I hope).

Please help if ya can.

Much Appreciated.

Thank you.

Matt


Gilles Dufour
Cisco Employee

I do not think the timeout is 5 sec.

If you do a 'show flow-timeout default' you will see it is 16 sec.

But anyway, configuring a flow-timeout-multiplier is a good idea.

However, you're misunderstanding when the flow times out. This is not an absolute time; it is an idle timeout. This means that each time a packet is received, the timer is reset, and the flow is considered idle only if the timer reaches the timeout value.

So even with a default idle timeout of 16 sec you could have a phone call lasting many hours.

Moreover, the timer runs for each flow separately. It's not as if all flows time out at the same time.
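To make the idle-timeout behaviour concrete, here is a toy model. It is an illustration only: the FlowTable class, its expiry rule and the IP addresses are hypothetical and are not the CSS's actual flow-management code. Each flow is keyed by its 5-tuple, any packet resets that flow's own timer, and a flow is dropped only after 16 seconds (the default quoted above) with no packets.

```python
from dataclasses import dataclass, field


@dataclass
class FlowTable:
    """Toy idle-timeout flow table (hypothetical, not the CSS implementation)."""
    idle_timeout: float = 16.0                      # seconds of silence before expiry
    last_seen: dict = field(default_factory=dict)   # 5-tuple -> time of last packet
    created: int = 0                                # total flows ever set up

    def expire(self, now: float) -> None:
        # Each flow has its own timer: drop only flows idle for >= idle_timeout.
        stale = [k for k, t in self.last_seen.items() if now - t >= self.idle_timeout]
        for k in stale:
            del self.last_seen[k]

    def packet(self, key: tuple, now: float) -> None:
        self.expire(now)
        if key not in self.last_seen:
            self.created += 1                       # no live flow: set up a new one
        self.last_seen[key] = now                   # any packet resets this flow's timer


sip = ("198.51.100.7", 5060, "192.0.2.10", 5060, "udp")  # hypothetical addresses

# A flow that sees a packet every 10 s for 3 hours is never idle for 16 s.
call = FlowTable()
for t in range(0, 3 * 3600, 10):
    call.packet(sip, t)
print(call.created)                          # 1

# A REGISTER every 30 s outlives the 16 s timer: a new flow each time,
# but the previous one has already expired, so only one is live at once.
reg = FlowTable()
for t in range(0, 3600, 30):
    reg.packet(sip, t)
print(reg.created, len(reg.last_seen))       # 120 1
```

In this model a busy flow survives for hours on a 16-second timer, and 120 keepalive flows per hour still occupy only one table entry at a time, so "flows created per hour" is not the same as "flows held at once".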

Finally, the number of available FCBs is higher than 65k.

Go into llama mode and do a 'flow stat'.

You will see the following line

Max Number of Flow Control Blocks 520721

The value above is for a CSS11503 with 3 modules.

Gilles.

Gilles,

Thank you for your super speedy response. Much appreciated! Okay, I understand better now; I was misunderstanding the 'idle' and 'absolute' time for UDP flows.

Agreed, the default timeout is 16 seconds for UDP and TCP. I believe I was confusing it with the "garbage collector" timeout after 5 seconds for UDP and 15 seconds for TCP.

From testing SIP UDP flows I have found some big problems... not sure if you have come across them?

1) Problem with "corrupt" flows

From a single SIP client I send approximately 1000 SIP INVITE (UDP) packets to a SIP server behind the CSS.

This sets up 1000 flows and 1000 FCBs. We have over 64,000 free flows remaining, so there is plenty of capacity.

Now, when I send SIP packets to the SIP server from a second SIP client (with a different source IP address) via the CSS11501, I see the responses on BOTH Internet clients.

(Bizarre.) It appears the CSS flows or port mappings become corrupt and the CSS is confused about where the response should be directed.

1a) A slight variation of the first problem.

Again, I test using two Internet clients and ramp up the active flows. When approximately 25,000 flows are active I see another problem:

The second Internet client, which sends SIP REGISTERs (keepalives) every 30 seconds, stops receiving responses. When I look at a trace on the server, I can clearly see these packets reaching the SIP server and responses being returned, but these SIP response packets do not exit the CSS. It appears the flows/port mappings have hung or caused the CSS to fail, and the UDP ack to the keepalive is never routed out of the CSS.

NOTE: Both of the above are overcome by a reboot, which clears the flows/FCBs, and then everything works!

Have you ever seen this behaviour ?

2) Out of interest, do you know what the behaviour of the CSS11501 is when flows (65000) and FCBs (120000) are exhausted?

(I think the CSS can be DoS'd in a SIP environment quite easily).

If you can offer any help or advice, that would be great. Thanks for reading.

Kind Regards

Matt

Matt,

a flow is an FCB [Flow Control Block].

But we only allocate 65k in software and 120k in hardware. When more are required, the CSS will allocate more memory.

There is a copy of every hardware flow in software, so do not add those numbers together.

What you need to look at is the value "Max Number of Flow Control Blocks" because this is what the CSS computes as the max FCB based on the memory.
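For comparing expected load against the "Max Number of Flow Control Blocks" value, one back-of-envelope estimate uses Little's law (our addition, not something stated in this thread): flows held at once are roughly the new-flow rate times the average flow lifetime. The sketch below assumes each REGISTER exchange keeps its flow for about one idle timeout, and ignores call traffic and garbage-collection lag; the function name and the client counts are hypothetical.

```python
def concurrent_flows(clients: int, register_interval_s: float,
                     idle_timeout_s: float = 16.0) -> float:
    """Rough count of REGISTER flows held in the flow table at once.

    Little's law: items in system = arrival rate x time in system.
    Assumption: each flow lives for about one idle timeout after its last
    packet; call traffic and garbage-collection lag are ignored.
    """
    if register_interval_s < idle_timeout_s:
        # The next REGISTER arrives before the timer expires, so each
        # client's flow is reused and never leaves the table.
        return float(clients)
    # Otherwise each client opens one new flow per interval, and that flow
    # stays for roughly idle_timeout_s seconds before it expires.
    arrival_rate = clients / register_interval_s   # new flows per second
    return arrival_rate * idle_timeout_s


print(round(concurrent_flows(10_000, 30)))   # 5333 (hypothetical client count)
print(concurrent_flows(100, 10))             # 100.0: flows never expire
```

Under these assumptions, 10,000 clients registering every 30 seconds hold about 5,333 flows at once; the figure to compare against is whatever 'flow stat' reports on your own CSS.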

Regarding the problem you describe, since I'm not a SIP expert, I would need to know your config and the source/destination ports used by your clients. Maybe also a small sniffer trace showing the problem.

Gilles.

Gilles,

Thanks for the info.

I have some small traces that illustrate the problem.

Could I possibly email these to you?

I cannot upload to this discussion site (company policy) arrhhh.

Really appreciate the help.

Regards

Matt
