VIC1240 running out of buffers

igucs_support
Level 1

We are having a problem with what appears to be NIC hardware buffers being overrun.

We run B200 M3 servers with 1 x E5-2670 v2 10C CPU + 32 GB RAM and the VIC1240 NIC card on RHEL 6.4.

These servers sit behind both 6120 and 6248 Fabric Interconnects running firmware 2.1(3a).

These servers run applications which receive UDP traffic at high packet rates while subscribed to approximately 16 multicast groups.
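For context, each subscription on Linux is one `IP_ADD_MEMBERSHIP` setsockopt on the receiving UDP socket. A minimal sketch of joining 16 groups; the group addresses and port below are hypothetical placeholders, not the real feed values:

```python
import socket
import struct

def make_ip_mreq(group, iface="0.0.0.0"):
    """Pack a struct ip_mreq: 4-byte group address + 4-byte interface."""
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(iface))

if __name__ == "__main__":
    GROUPS = ["239.1.1.%d" % i for i in range(1, 17)]  # hypothetical groups
    PORT = 15000                                       # hypothetical port

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    for group in GROUPS:
        # One membership per group; all land on the same socket/queue.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        make_ip_mreq(group))
```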

During high-load events (we believe these to be traffic microbursts) the processes running on these servers report missing message sequences; each process then either re-syncs the missing data or kicks itself out of the cluster.

Neither outcome is acceptable.
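The missing-sequence detection described above generally amounts to comparing each message's sequence number with the last one seen; a minimal sketch of that logic (function and field names are hypothetical, not from the actual application):

```python
def find_gaps(seqs):
    """Return (first_missing, next_received) pairs where the stream
    skipped ahead. A skip means packets were dropped upstream and the
    application must re-sync the missing range or leave the cluster."""
    gaps = []
    last = None
    for seq in seqs:
        if last is not None and seq != last + 1:
            gaps.append((last + 1, seq))
        last = seq
    return gaps

# Example: sequences 4-6 never arrived.
print(find_gaps([1, 2, 3, 7, 8]))  # [(4, 7)]
```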

These errors coincide with the rx_no_bufs counter increasing.

NIC statistics:
     tx_frames_ok: 661216
     tx_unicast_frames_ok: 547203
     tx_multicast_frames_ok: 113970
     tx_broadcast_frames_ok: 43
     tx_bytes_ok: 60485096
     tx_unicast_bytes_ok: 51535636
     tx_multicast_bytes_ok: 8946708
     tx_broadcast_bytes_ok: 2752
     tx_drops: 0
     tx_errors: 0
     tx_tso: 0
     rx_frames_ok: 1576022571
     rx_frames_total: 1576031052
     rx_unicast_frames_ok: 546721
     rx_multicast_frames_ok: 1575481611
     rx_broadcast_frames_ok: 2720
     rx_bytes_ok: 656139514804
     rx_unicast_bytes_ok: 356487232
     rx_multicast_bytes_ok: 655789236895
     rx_broadcast_bytes_ok: 185672
     rx_drop: 0
     rx_no_bufs: 8481
     rx_errors: 0
     rx_rss: 0
     rx_crc_errors: 0
     rx_frames_64: 5856
     rx_frames_127: 12490212
     rx_frames_255: 693609513
     rx_frames_511: 456069311
     rx_frames_1023: 242721785
     rx_frames_1518: 171134375
     rx_frames_to_max: 0
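The counters above come from `ethtool -S`; to correlate drops with the application errors, the `rx_no_bufs` value can be sampled periodically with a small parser like this sketch (the interface name is an assumption):

```python
import re
import subprocess

def parse_ethtool_stats(text):
    """Parse 'ethtool -S <iface>' output into a {counter: value} dict."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*([\w.]+):\s*(\d+)\s*$", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats

def rx_no_bufs(iface="eth1"):
    """Current rx_no_bufs counter for the given interface (assumed eth1)."""
    out = subprocess.check_output(["ethtool", "-S", iface]).decode()
    return parse_ethtool_stats(out).get("rx_no_bufs", 0)
```

Sampling this once a second and logging the delta makes it easy to line the drops up with application-level sequence errors.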

We have tried increasing the receive queues to 8, with a maximum of 4096 buffers and an interrupt timer of 10 us, but we still get drops.

The defaults of 1 queue and either 512 or 1024 buffers (interrupt timer 10 us) did not cut it, and we had lots of errors.
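For a sense of scale: a descriptor ring only buys time equal to its depth divided by the burst's excess arrival rate, so even a 4096-buffer queue can fill in about a millisecond during a microburst. A rough calculation; the burst and drain rates here are illustrative assumptions, not measured values:

```python
def burst_headroom_us(ring_size, burst_pps, drain_pps):
    """Microseconds a ring of `ring_size` descriptors can absorb a burst
    arriving at `burst_pps` while the host drains `drain_pps`."""
    excess = burst_pps - drain_pps
    if excess <= 0:
        return float("inf")  # host keeps up; the ring never fills
    return ring_size / excess * 1e6

# One 4096-descriptor queue, a 5 Mpps microburst, host draining 1 Mpps:
print(burst_headroom_us(4096, 5_000_000, 1_000_000))  # ~1024 us
```

This is why larger rings and more queues push the drop threshold out but cannot eliminate drops if a burst outruns the host for long enough.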

    eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 10000
    link/ether 00:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    2901579300 1590139645 0       8481    0       1589598683
    TX: bytes  packets  errors  dropped carrier collsns
    60491368   661314   0       0       0       0

The packet drops seem to be related to how many multicast groups we subscribe to: performance tests across a single multicast group run fine, with per-second message counts far exceeding what comes down the line when we subscribe to all 16 groups.

It looks like the performance of the VIC 1240 is not up to it.

Would a different network card such as the VIC 1280 help? The way it looks, the ASIC is the same; it just has more channels to enable 80 Gb/s?

We are having QoS issues in parallel with this at the moment, and we are looking at implementing a no-drop QoS policy on the VLAN on which the server operates.

Is it logical to think that this is independent of the drops on the server NIC, since those look like a plain case of the card not keeping up?

Any input would be much appreciated.

Rob.
