Exablaze X10 NIC shows RX-Dropped on Linux

jeffrey-h

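I have an ExaBlaze X10 low-latency NIC installed on CentOS 8 (kernel 4.18). The RX-Dropped count shown by the ifconfig command keeps increasing, but when I check the card directly with the exanic-config tool, RXDropped is 0. What causes the discrepancy between the two?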

The ExaNIC X10 driver version is 2.2.2.

Accepted Solution

I translated your question; I hope this answers it correctly.

The inconsistency you observed between the RX dropped statistics reported by ifconfig and by the exanic-config tool is most likely because the two tools read different counters:

  1. The RX dropped statistic shown by ifconfig comes from the Linux kernel's network statistics, which are collected by the kernel network stack.
  2. The exanic-config tool reads the counter values directly from the ExaNIC card's registers.
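To look at the kernel-side counter yourself, you can read /proc/net/dev, which is where net-tools' ifconfig gets its interface statistics. Below is a minimal Python sketch, assuming the receive-column order documented in proc(5) (bytes, packets, errs, drop, fifo, frame, compressed, multicast); the interface name in the usage note is a placeholder.

```python
from typing import Dict

# Receive-side columns of /proc/net/dev, in the order documented in proc(5).
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]


def parse_proc_net_dev(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/dev content into {interface: {rx_field: value}}."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        values = [int(v) for v in data.split()]
        # Only the first 8 numbers are receive counters; the rest are transmit.
        stats[name.strip()] = dict(zip(RX_FIELDS, values[:8]))
    return stats


def kernel_rx_dropped(iface: str) -> int:
    """Return the kernel's RX drop counter for one interface (what ifconfig shows)."""
    with open("/proc/net/dev") as f:
        return parse_proc_net_dev(f.read())[iface]["drop"]
```

For example, `kernel_rx_dropped("enp1s0")` (hypothetical interface name) can be compared against the RX dropped value that exanic-config reports for the same port.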

The kernel network stack's statistics can cover more than the card's own counters. For example, packets that the kernel drops because it cannot allocate memory are counted in RX dropped even though the card received them successfully.

The ExaNIC card's own counters, on the other hand, reflect only what happens on the card, so they are not affected by limitations further up the network stack.

Therefore, if exanic-config reports RX dropped as 0, the packets are most likely not being dropped by the ExaNIC card itself but later, in the Linux kernel network stack.

You can troubleshoot these downstream drops further using the kernel logs. For example, check dmesg for errors about memory allocation failures. You can also use the kernel's tracing facilities, for example the skb:kfree_skb tracepoint (via perf or dropwatch), to trace the packet flow and see where packets are being freed.
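The dmesg check can be scripted so it is easy to repeat while the drop counter is climbing. This is a minimal Python sketch; the keyword list is an illustrative assumption (substrings that often accompany memory-pressure messages), not an exhaustive or authoritative set, and reading the kernel log may require root if `kernel.dmesg_restrict` is enabled.

```python
import subprocess
from typing import Iterable, List

# Illustrative keywords only (an assumption); extend them for your own system.
DEFAULT_KEYWORDS = ("page allocation failure", "out of memory", "drop")


def filter_kernel_log(lines: Iterable[str], keywords: Iterable[str] = DEFAULT_KEYWORDS) -> List[str]:
    """Return the log lines that contain any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [line for line in lines if any(k in line.lower() for k in lowered)]


def read_dmesg() -> List[str]:
    """Run dmesg and return its output as a list of lines."""
    # stdout=PIPE / universal_newlines keep this compatible with Python 3.6 on CentOS 8.
    result = subprocess.run(["dmesg"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout.splitlines()
```

Usage: `for line in filter_kernel_log(read_dmesg()): print(line)`.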

Discrepancies like this are common with add-in network cards, so compare the statistics from the different layers (card, driver, kernel) to pinpoint where packets are being lost.
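One more kernel-level counter worth comparing is /proc/net/softnet_stat. It has one row per CPU with hexadecimal fields, and the second column counts packets dropped because that CPU's input backlog queue was full (see `net.core.netdev_max_backlog`). Whether this is relevant depends on how the driver hands packets to the stack, so treat it as one more data point. A small Python parser, assuming that column layout:

```python
from typing import List, Tuple


def parse_softnet_stat(text: str) -> List[Tuple[int, int, int]]:
    """Parse /proc/net/softnet_stat into (processed, dropped, time_squeeze) per CPU row.

    All fields in the file are hexadecimal.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        processed, dropped, time_squeeze = (int(f, 16) for f in fields[:3])
        rows.append((processed, dropped, time_squeeze))
    return rows


def total_backlog_drops(path: str = "/proc/net/softnet_stat") -> int:
    """Sum the per-CPU backlog drop counters."""
    with open(path) as f:
        return sum(dropped for _, dropped, _ in parse_softnet_stat(f.read()))
```

If `total_backlog_drops()` grows in step with ifconfig's RX dropped, the backlog queue is a likely suspect; if it stays flat, the drops are happening somewhere else.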

Please mark this as helpful or as the accepted solution to help others
Connect with me https://bigevilbeard.github.io


Thanks so much, this is a great explanation.

At this stage I tried to identify why the Linux kernel might be dropping packets, and used the `ethtool` command to look at more detailed statistics, especially the ring buffer size, which is a very common cause of packet drops (enlarging the ring buffer usually solves the problem). However, `ethtool -g/-G` does not work for the ExaNIC X10; it reports:

   'Cannot get current device settings: Operation not supported'

I've upgraded the ExaNIC driver from 2.2.2 to 2.7.3, hoping to solve this issue, but with no luck.

Any more thoughts or suggestions about the ethtool issue?

Ring buffer size is certainly worth checking as a potential cause of drops. However, ethtool can only get or set ring buffer parameters if the driver implements that support (the `get_ringparam`/`set_ringparam` callbacks in its `ethtool_ops`), and it seems the ExaNIC driver may not. You could check whether ExaNIC provides its own tools for reporting ring buffer information, as vendors sometimes ship their own tooling, but I am not sure which tools exist here.
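To confirm quickly (for example after each driver upgrade) whether the driver exposes ring parameters, you can run `ethtool -g` and inspect the result. This is a minimal Python sketch around the subprocess module; the interface name is a placeholder, and matching on "Operation not supported" is an assumption based on the error quoted above.

```python
import subprocess
from typing import Tuple


def ethtool_ring_supported(iface: str, ethtool: str = "ethtool") -> Tuple[bool, str]:
    """Run `ethtool -g IFACE`; return (supported, output or error text)."""
    try:
        proc = subprocess.run([ethtool, "-g", iface],
                              stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                              universal_newlines=True)
    except FileNotFoundError:
        return False, "ethtool not installed"
    if proc.returncode == 0:
        return True, proc.stdout
    # Non-zero exit: report whatever the tool printed.
    return False, (proc.stdout + proc.stderr).strip()


def is_unsupported(message: str) -> bool:
    """True if the ethtool error text indicates the driver lacks this operation."""
    return "Operation not supported" in message
```

Usage: `ok, text = ethtool_ring_supported("enp1s0")` (hypothetical interface name); if `ok` is false and `is_unsupported(text)` is true, the driver does not implement the ring-parameter operation.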

The exanic-config tool may expose some useful buffer statistics, though probably not as detailed as ethtool's. After the driver upgrade, I would check the dmesg logs for errors or messages about missing ethtool support, or look at the driver source to see whether the ring buffer size can be tuned, though that may require custom code. For now, focus your troubleshooting on factors such as CPU or memory bottlenecks that could cause drops.
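To check whether drops line up with CPU or memory pressure, it helps to measure the drop rate over time rather than look at the cumulative total. This minimal Python sketch samples the kernel counter from sysfs (`/sys/class/net/<iface>/statistics/rx_dropped`); the `root` parameter exists only so the function can be pointed at a test directory.

```python
import time
from pathlib import Path

SYSFS_NET = "/sys/class/net"


def read_counter(iface: str, name: str = "rx_dropped", root: str = SYSFS_NET) -> int:
    """Read one kernel interface counter, e.g. /sys/class/net/<iface>/statistics/rx_dropped."""
    path = Path(root) / iface / "statistics" / name
    return int(path.read_text().strip())


def drop_rate(iface: str, interval: float = 1.0, root: str = SYSFS_NET) -> float:
    """Average rx_dropped increments per second over `interval` seconds."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    before = read_counter(iface, root=root)
    time.sleep(interval)
    after = read_counter(iface, root=root)
    return (after - before) / interval
```

Logging `drop_rate("enp1s0")` (hypothetical interface name) in a loop next to the output of a tool such as mpstat or free makes it easier to see whether drop bursts coincide with CPU saturation or low free memory.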

Hope this helps.