01-20-2017 06:20 AM - edited 03-01-2019 01:02 PM
We are seeing fcpio_data_cnt_mismatch errors in vmkernel.log on one of our 10 ESXi hosts running ESXi 6.0 Update 2. We are using fnic driver 1.6.0.28. We started out with firmware 3.1(1k) and have upgraded to 3.1(2e). It seems very similar to bug CSCva47085, but we are running C240 M4s that are managed by UCS Manager.
The vmkernel.log file shows:
2017-01-18T17:06:20.086Z cpu3:33174)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60000970000197000380533030303238" state in doubt; requested fast path state update...
2017-01-18T17:06:20.086Z cpu3:33174)ScsiDeviceIO: 2651: Cmd(0x439e01625f00) 0x28, CmdSN 0xb6f from world 33043 to dev "naa.60000970000197000380533030303238" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2017-01-18T17:06:30.515Z cpu32:33560)<3>fnic : 2 :: hdr status = FCPIO_DATA_CNT_MISMATCH
2017-01-18T17:06:30.515Z cpu7:36674)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.60000970000197000380533030303238" state in doubt; requested fast path state update...
2017-01-18T17:06:30.515Z cpu7:36674)ScsiDeviceIO: 2651: Cmd(0x439e17558f80) 0x28, CmdSN 0xb9f from world 36434 to dev "naa.60000970000197000380533030303238" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2017-01-18T17:06:30.641Z cpu32:33810)<3>fnic : 2 :: hdr status = FCPIO_DATA_CNT_MISMATCH
2017-01-18T17:06:30.641Z cpu7:36674)ScsiDeviceIO: 2651: Cmd(0x439e15726c40) 0x28, CmdSN 0xbe9 from world 36437 to dev "naa.60000970000197000380533030303238" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
2017-01-18T17:06:32.446Z cpu2:37413)VMotionRecv: 659: 1484759080517443 D: Estimated network bandwidth 318.194 MB/s during pre-copy
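When checking whether the errors have stopped (for example after a hardware swap), it helps to count them rather than eyeball the log. The sketch below is a hypothetical helper, not a VMware or Cisco tool: it assumes the line formats shown above and counts failed commands per (device, host status) plus FCPIO_DATA_CNT_MISMATCH messages per fnic instance.

```python
import re
from collections import Counter

# ScsiDeviceIO failure lines: captures the SCSI opcode (0x28 is READ(10)),
# the device ID, and the H:/D:/P: status triple. Format assumed from the log above.
FAILED_CMD = re.compile(
    r'ScsiDeviceIO: \d+: Cmd\(0x[0-9a-f]+\) (?P<op>0x[0-9a-f]+), CmdSN 0x[0-9a-f]+ '
    r'from world \d+ to dev "(?P<dev>[^"]+)" failed '
    r'H:(?P<h>0x[0-9a-f]+) D:(?P<d>0x[0-9a-f]+) P:(?P<p>0x[0-9a-f]+)'
)
# fnic driver messages reporting the data-count mismatch, keyed by fnic instance.
MISMATCH = re.compile(r'fnic : (?P<inst>\d+) :: hdr status = FCPIO_DATA_CNT_MISMATCH')


def summarize(lines):
    """Count failed commands per (device, host status) and mismatches per fnic instance."""
    failures = Counter()
    mismatches = Counter()
    for line in lines:
        m = FAILED_CMD.search(line)
        if m:
            failures[(m['dev'], m['h'])] += 1
            continue
        m = MISMATCH.search(line)
        if m:
            mismatches[int(m['inst'])] += 1
    return failures, mismatches


# Excerpt copied from the vmkernel.log lines quoted in the post.
SAMPLE = [
    '2017-01-18T17:06:20.086Z cpu3:33174)ScsiDeviceIO: 2651: Cmd(0x439e01625f00) 0x28, CmdSN 0xb6f from world 33043 to dev "naa.60000970000197000380533030303238" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.',
    '2017-01-18T17:06:30.515Z cpu32:33560)<3>fnic : 2 :: hdr status = FCPIO_DATA_CNT_MISMATCH',
    '2017-01-18T17:06:30.515Z cpu7:36674)ScsiDeviceIO: 2651: Cmd(0x439e17558f80) 0x28, CmdSN 0xb9f from world 36434 to dev "naa.60000970000197000380533030303238" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.',
    '2017-01-18T17:06:30.641Z cpu32:33810)<3>fnic : 2 :: hdr status = FCPIO_DATA_CNT_MISMATCH',
    '2017-01-18T17:06:30.641Z cpu7:36674)ScsiDeviceIO: 2651: Cmd(0x439e15726c40) 0x28, CmdSN 0xbe9 from world 36437 to dev "naa.60000970000197000380533030303238" failed H:0x7 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.',
]

failures, mismatches = summarize(SAMPLE)
for (dev, host_status), count in failures.items():
    print(f"{dev} H:{host_status} failed {count} time(s)")
for inst, count in mismatches.items():
    print(f"fnic {inst}: {count} FCPIO_DATA_CNT_MISMATCH message(s)")
```

To run it against a copy of the host's log, pass an open file handle, e.g. `summarize(open("vmkernel.log"))`, since iterating a file yields lines.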
Looking at the adapter from the UCS Manager CLI shows this:
CSCUCSView-A# connect adapter 1/1
adapter 0/1/1 # connect
adapter 0/1/1 (top):2# show-log 100
170119-11:21:20.472158 ecom.ecom_main ecom(4:1): fcpio_data_cnt_mismatch for exch 5c85 status 1 rx_id 364 s_stat 0x3 xmit_recvd 0x55000 burst_offset 0x55000 sgl_err 0x0 last_param 0x54800 last_seq_cnt 0x0 tot_bytes_exp 0x80000 h_seq_cnt 0x29 exch_type 0x0 s_id 0xec0000 d_id 0xec0000 host_tag 0x71
170119-11:21:20.472207 ecom.ecom_main ecom(4:1): fcpio_data_cnt_mismatch for exch 5c85 status 1 rx_id 364 s_stat 0x3 xmit_recvd 0x55000 burst_offset 0x55000 sgl_err 0x0 last_param 0x54800 last_seq_cnt 0x0 tot_bytes_exp 0x80000 h_seq_cnt 0x29 exch_type 0x0 s_id 0xec0000 d_id 0xec0000 host_tag 0x71
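Each adapter record ends in a run of "key value" pairs, so it can be split mechanically. The sketch below is a hypothetical parser for that format; reading `tot_bytes_exp` as the bytes the exchange expected and `xmit_recvd` as the bytes actually received is an assumption based only on the field names, since the record format is not documented in this thread.

```python
def parse_mismatch(line):
    """Split an adapter fcpio_data_cnt_mismatch record into its key/value fields.

    Returns None if the line is not a mismatch record.
    """
    marker = "fcpio_data_cnt_mismatch for "
    start = line.find(marker)
    if start < 0:
        return None
    tokens = line[start + len(marker):].split()
    # The tail of the record is "key value key value ...": pair adjacent tokens.
    fields = dict(zip(tokens[0::2], tokens[1::2]))
    fields["timestamp"] = line.split()[0]
    return fields


LINE = ("170119-11:21:20.472158 ecom.ecom_main ecom(4:1): fcpio_data_cnt_mismatch "
        "for exch 5c85 status 1 rx_id 364 s_stat 0x3 xmit_recvd 0x55000 "
        "burst_offset 0x55000 sgl_err 0x0 last_param 0x54800 last_seq_cnt 0x0 "
        "tot_bytes_exp 0x80000 h_seq_cnt 0x29 exch_type 0x0 s_id 0xec0000 "
        "d_id 0xec0000 host_tag 0x71")

rec = parse_mismatch(LINE)
# Assumption: field names taken at face value (expected vs. received byte counts).
expected = int(rec["tot_bytes_exp"], 16)
received = int(rec["xmit_recvd"], 16)
print(f"exch {rec['exch']}: expected {expected} bytes, received {received}, "
      f"short by {expected - received}")
# prints "exch 5c85: expected 524288 bytes, received 348160, short by 176128"
```

On that reading, the exchange ended with only 0x55000 of the expected 0x80000 bytes transferred.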
We find it strange that this only happens on one server out of ten; they are all the same model and use the same service profile and the same uplinks. We do have a TAC case open (681634353), but after I gave them the logs I haven't heard anything back. I am wondering if anyone else has seen something similar.
01-20-2017 08:38 AM
Hello,
This error typically indicates that the host is receiving frames out of order from the storage array.
Causes:
1. Incorrect FNIC driver
2. Physical Layer issues on the path to storage
I took a look at your case, and both FCIDs reporting the issue belong to your EMC array:
VSAN 4:
--------------------------------------------------------------------------
FCID TYPE PWWN (VENDOR) FC4-TYPE:FEATURE
--------------------------------------------------------------------------
0xec0000 N XXXXXXXXXXXXXXXX (EMC) scsi-fcp:both 253
<output omitted>
0xec0180 N XXXXXXXXXXXXXXXX (EMC) scsi-fcp:both 253
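The table above looks like NX-OS `show fcns database` output for VSAN 4 (an assumption; the post does not name the command). Matching an FCID from the adapter log against it is a simple lookup; the hypothetical helper below maps each FCID to the vendor shown in parentheses and skips rows that have no vendor field.

```python
import re

# One name-server row: FCID, type, pWWN, then the vendor in parentheses.
FCNS_ROW = re.compile(
    r'^(?P<fcid>0x[0-9a-fA-F]{6})\s+(?P<type>\S+)\s+(?P<pwwn>\S+)\s+\((?P<vendor>[^)]*)\)'
)


def fcid_vendors(table_text):
    """Map each FCID (as an int) in the table to the vendor shown in parentheses."""
    vendors = {}
    for line in table_text.splitlines():
        m = FCNS_ROW.match(line.strip())
        if m:
            vendors[int(m['fcid'], 16)] = m['vendor']
    return vendors


# Rows copied from the table above (pWWNs were masked in the original post).
TABLE = """\
FCID TYPE PWWN (VENDOR) FC4-TYPE:FEATURE
0xec0000 N XXXXXXXXXXXXXXXX (EMC) scsi-fcp:both 253
0xec0180 N XXXXXXXXXXXXXXXX (EMC) scsi-fcp:both 253
"""

vendors = fcid_vendors(TABLE)
# Look up FCIDs that appear as s_id/d_id in the adapter log.
for fcid in ("0xec0000", "0xec0180"):
    print(fcid, vendors.get(int(fcid, 16), "unknown"))
```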
Please engage EMC and see if they can provide insight into why we are getting out-of-order frames from the array on multiple FCIDs.
HTH,
Wes
01-20-2017 08:49 AM
Wes,
Thanks for looking into the case.
I have cases open with EMC and with VMware. Hopefully somebody can find something. Cisco TAC advised me to upgrade from the 1.6.0.25 fnic driver delivered in the ESXi 6.0 U2 Cisco ISO to the 1.6.0.28 driver. Servers 2-10 go through the same interconnect switch to the storage and I don't see errors on them, which makes me think the uplinks from the 6248s are good. I think the problem is somewhere between the server and the interconnect port it is connected to, but I'm not sure whether it is the card, the cable, or something else. I am not seeing errors on any interconnect ports. Our interconnects are connected directly to the storage array. Are you only seeing errors on VSAN 4, or was that just an example?
Thanks.
01-20-2017 09:14 AM
Hey Kevin,
If you suspect a problem with the interfaces on the FI, you could try re-integrating the server on different interfaces and see if the problem persists. You could also try swapping the cabling/SFP between the FI and the MLOM. It is possible that the VIC is faulty; however, if that were the case I would expect other failure messages in the adapter log, not just out-of-order frames.
I just checked the FCIDs reporting out-of-order frames, and the source and destination FCIDs are EMC in every case.
HTH,
Wes
01-20-2017 02:54 PM
I can try different ports to see if that makes a difference. Where do you see the FCID info? I have the tar files I collected and am just curious.
01-21-2017 04:53 AM
Hey Kevin,
Thanks for the update. The FCID info is in the messages in the adapter logs:
170119-11:21:20.472158 ecom.ecom_main ecom(4:1): fcpio_data_cnt_mismatch for exch 5c85 status 1 rx_id 364 s_stat 0x3 xmit_recvd 0x55000 burst_offset 0x55000 sgl_err 0x0 last_param 0x54800 last_seq_cnt 0x0 tot_bytes_exp 0x80000 h_seq_cnt 0x29 exch_type 0x0 s_id 0xec0000 d_id 0xec0000 host_tag 0x71
170119-11:21:20.472207 ecom.ecom_main ecom(4:1): fcpio_data_cnt_mismatch for exch 5c85 status 1 rx_id 364 s_stat 0x3 xmit_recvd 0x55000 burst_offset 0x55000 sgl_err 0x0 last_param 0x54800 last_seq_cnt 0x0 tot_bytes_exp 0x80000 h_seq_cnt 0x29 exch_type 0x0 s_id 0xec0000 d_id 0xec0000 host_tag 0x71
s_id = source ID
d_id = destination ID
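A Fibre Channel ID is a 24-bit address made of three bytes: domain (the switch that assigned it), area, and port. Splitting the s_id/d_id values this way shows at a glance which switch domain they came from; for example, 0xec0000 and 0xec0180 both sit in domain 0xec. The helper below is a hypothetical sketch of that split.

```python
from typing import NamedTuple


class FcidParts(NamedTuple):
    domain: int  # domain ID of the switch that assigned the address
    area: int
    port: int


def split_fcid(fcid: str) -> FcidParts:
    """Split a 24-bit Fibre Channel ID such as '0xec0180' into domain/area/port bytes."""
    value = int(fcid, 16)
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError(f"FCID out of 24-bit range: {fcid}")
    return FcidParts(domain=(value >> 16) & 0xFF, area=(value >> 8) & 0xFF, port=value & 0xFF)


for fcid in ("0xec0000", "0xec0180"):
    parts = split_fcid(fcid)
    print(f"{fcid}: domain 0x{parts.domain:02x}, area 0x{parts.area:02x}, port 0x{parts.port:02x}")
# prints:
# 0xec0000: domain 0xec, area 0x00, port 0x00
# 0xec0180: domain 0xec, area 0x01, port 0x80
```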
HTH,
Wes
01-23-2017 02:27 PM
I tried different ports on the interconnect and replaced the cable, but neither helped.
I found this bug, https://quickview.cloudapps.cisco.com/quickview/bug/CSCva47085, and tried the commands from it. The problems are still only on one side.
adapter 0/1/1 # connect
adapter 0/1/1 (top):1# attach-mcp
adapter 0/1/1 (mcp):1# dcem-macstats 0
TOTAL DESCRIPTION
227 Tx frames len == 64
1961961 Tx frames 64 < len <= 127
6518 Tx frames 128 <= len <= 255
1554 Tx frames 256 <= len <= 511
50223 Tx frames 512 <= len <= 1023
7379 Tx frames 1024 <= len <= 1518
5419 Tx frames 1519 <= len <= 2047
326982 Tx frames 2048 <= len <= 4095
2360263 Tx total packets
907325360 Tx bytes
2360263 Tx good packets
2358177 Tx unicast frames
1725 Tx multicast frames
361 Tx broadcast frames
166 Tx per-priority pause frames
18 Rx Frames len == 64
544173 Rx Frames 64 < len <= 127
69711 Rx Frames 128 <= len <= 255
8907 Rx Frames 256 <= len <= 511
62079 Rx Frames 512 <= len <= 1023
7674 Rx Frames 1024 <= len <= 1518
13592918 Rx Frames 1519 <= len <= 2047
1320493 Rx Frames 2048 <= len <= 4095
1 Rx Frames 4096 <= len <= 8191
15605974 Rx total received packets
23684998386 Rx bytes
15604935 Rx good packets
15469793 Rx unicast frames
84034 Rx multicast frames
51108 Rx broadcast frames
1 Rx frames with VLAN tag
1039 Rx CRC error frames not stomped
18 Rx per-priority pause frames
907325360 Tx bytes for good packets
23683254176 Rx bytes for good packets
0.000bps Tx Rate
0.000bps Rx Rate
adapter 0/1/1 (mcp):2# dcem-macstats 1
TOTAL DESCRIPTION
357 Tx frames len == 64
319628 Tx frames 64 < len <= 127
6744 Tx frames 128 <= len <= 255
2445 Tx frames 256 <= len <= 511
61779 Tx frames 512 <= len <= 1023
8714 Tx frames 1024 <= len <= 1518
41303 Tx frames 1519 <= len <= 2047
325616 Tx frames 2048 <= len <= 4095
766586 Tx total packets
836830402 Tx bytes
766586 Tx good packets
764637 Tx unicast frames
1478 Tx multicast frames
471 Tx broadcast frames
88 Tx frames with VLAN tag
14 Tx per-priority pause frames
162 Rx Frames len == 64
542052 Rx Frames 64 < len <= 127
65522 Rx Frames 128 <= len <= 255
13155 Rx Frames 256 <= len <= 511
65764 Rx Frames 512 <= len <= 1023
10361 Rx Frames 1024 <= len <= 1518
67240 Rx Frames 1519 <= len <= 2047
1419167 Rx Frames 2048 <= len <= 4095
2183423 Rx total received packets
3231684906 Rx bytes
2183423 Rx good packets
2046554 Rx unicast frames
85079 Rx multicast frames
51790 Rx broadcast frames
120 Rx per-priority pause frames
836830402 Tx bytes for good packets
3231684906 Rx bytes for good packets
0.000bps Tx Rate
0.000bps Rx Rate
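Comparing the two dumps, port 0 (`dcem-macstats 0`) reports 1039 "Rx CRC error frames not stomped" while port 1 reports none, and on port 0 the gap between "Rx total received packets" and "Rx good packets" is also exactly 1039. The hypothetical sketch below parses the "count description" rows and computes that gap; which MAC port corresponds to which fabric is not stated in the thread, so that mapping is left open here.

```python
def parse_macstats(text):
    """Turn dcem-macstats style '<count> <description>' rows into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        # Keep only rows whose first column is a plain integer counter
        # (skips the header, the prompt, and the '0.000bps' rate rows).
        if len(parts) == 2 and parts[0].isdigit():
            stats[parts[1]] = int(parts[0])
    return stats


def rx_error_summary(stats):
    """Compare received vs. good frames and report the CRC error counter."""
    total = stats.get("Rx total received packets", 0)
    good = stats.get("Rx good packets", 0)
    crc = stats.get("Rx CRC error frames not stomped", 0)
    return {
        "not_good": total - good,
        "crc_errors": crc,
        "not_good_ratio": (total - good) / total if total else 0.0,
    }


# Excerpts of the two outputs above (only the rows this check needs).
PORT0 = """\
15605974 Rx total received packets
15604935 Rx good packets
1039 Rx CRC error frames not stomped
"""
PORT1 = """\
2183423 Rx total received packets
2183423 Rx good packets
"""

for name, text in (("port 0", PORT0), ("port 1", PORT1)):
    print(name, rx_error_summary(parse_macstats(text)))
```

Pasting the full `dcem-macstats` output into `parse_macstats` works the same way, since non-counter rows are ignored.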
01-23-2017 03:10 PM
Hey Kevin,
That's good information. I think the next step would be to rule in/out hardware and swap out the VIC adapter. Please share the results after the swap.
-Wes
01-24-2017 07:13 AM
Thanks, I will post an update when the new adapter comes in.
01-26-2017 06:47 AM
I got my replacement VIC in yesterday. After installing it, it took a few re-acknowledgements for UCS Manager to see the server again. Since the replacement, I have not seen the error message in vmkernel.log. I tested the card with various VM operations (lots of vMotions, cloning VMs, recomposing a few pools) and it is still clean. I believe the issue is now resolved.
08-28-2017 10:54 PM
01-20-2017 09:22 AM
Another thing I noticed is that the error only reports issues with FCIDs on the B-side fabric.
As a test, you can shut down the vHBA on the B side, or even the B-side connection from the FI to the MLOM, and see if the errors persist. If they do not, you know there is something wrong on the B-side path to the storage.
HTH,
Wes
01-20-2017 02:51 PM
Wes,
I disabled the B side on Server 1 and the messages stopped. There are no problems on side A. I tried replacing the cable on side B, and that didn't help. The only things connected are the EMC VMAX and the Cisco C240 M4. I would think that if the interface on the FI were bad I would see errors, which I haven't seen yet.