Highlighted
Beginner

Error disabled. Reason:DCX-No ACK in 100 PDUs

Hi,

  I have a customer who lost all connectivity from the ESX hosts, for both networking and FCoE, because (as the title suggests) the interfaces were error disabled. This happened across all 8 dual-ported, dual-homed CNAs at the same time. Does anyone have any idea where this error comes from? They are using ESX 4.0 and are running Nexus 5020s with 4.2(1)N2(1a).

Thanks,

Thom

17 REPLIES 17
Highlighted
Beginner

Thom,

I am seeing the same sort of issue with a Nexus 5548P running 5.0(2)N1(1). The servers at issue are Oracle M5000 servers with QLogic CNAs. The CNAs in these servers are configured as dual port, dual homed. The issue occurs when the servers are rebooted.

I have noticed that if I issue a "show interface brief" command, the Ethernet port shows a state of "unknown" and the vfc port shows a state of "errdisable". A shut and no shut of the interfaces seems to clear this up, and the server will connect correctly after a reboot.

Have you had any luck in troubleshooting this issue?

Thanks,

John

Highlighted

Hi John,

  The issue, according to QLogic support, was the driver on ESX for the CNAs. I was suspicious of the CNAs only because all of the ports were error disabled within seconds of one another. So far there haven't been any shutdowns, but we may have to wait some time before we feel safe that this was the resolution.

Thanks,

Thom

Highlighted

Hi guys, what was the result? We are having the exact same problem. We upgraded all drivers and replaced what seemed to be a faulty CNA. It was fine for about 2 weeks and then last night all the ports err-disabled again! GRRRRRRRRRR.

Highlighted

It was a driver issue for us.  We haven't seen the issue since the update.

Thom

Highlighted
Cisco Employee

DCBX Type-Length-Values (TLVs) are packaged within an LLDP frame, which is exchanged between the switch and the CNA. One of these, the Control sub-TLV, carries a sequence-based ACK. For example, the switch sends this Control sub-TLV with a SeqNo of 1 and an AckNo of 2. The host is supposed to reverse these and send an LLDP frame with a Control sub-TLV with a SeqNo of 2 and an AckNo of 1.

We expect this exchange from the host every 30 seconds. If the switch does not see it for 100 intervals (100 x 30 = 3000 seconds, or 50 minutes), the switch error-disables the port with the following error:
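The 50-minute figure follows directly from the two numbers above. As a rough sketch only: the counter logic below is an assumption for illustration, not the actual NX-OS implementation, and `PDU_INTERVAL_S`, `MAX_UNACKED_PDUS` and `seconds_until_errdisable` are hypothetical names.

```python
PDU_INTERVAL_S = 30      # exchange interval quoted in the post above
MAX_UNACKED_PDUS = 100   # threshold from the "No ACK in 100 PDUs" message


def seconds_until_errdisable(acks_seen):
    """Return the elapsed seconds at which the port would be error-disabled,
    or None if the peer acknowledges often enough.

    acks_seen is an iterable of booleans, one per 30-second interval:
    True if a valid ACK arrived in that interval.
    Assumption: any valid ACK resets the miss counter.
    """
    missed = 0
    for interval, acked in enumerate(acks_seen, start=1):
        missed = 0 if acked else missed + 1
        if missed >= MAX_UNACKED_PDUS:
            return interval * PDU_INTERVAL_S
    return None


# A peer that never acknowledges trips the threshold after 100 intervals.
timeout = seconds_until_errdisable([False] * 200)
print(timeout, "seconds =", timeout // 60, "minutes")
```

Under these assumptions, a host that never returns a valid ACK is shut down 3000 seconds after the link comes up, which matches the "every 50 minutes" pattern reported later in this thread.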

2011 May 13 12:03:23 CSX_5020_A1 %ETHPORT-2-IF_DOWN_ERROR_DISABLED: Interface Ethernet115/1/17 is down (Error disabled. Reason:DCX-No ACK in 100 PDUs)
2011 May 13 12:03:27 CSX_5020_A1 %ETHPORT-2-IF_DOWN_ERROR_DISABLED: Interface Ethernet116/1/16 is down (Error disabled. Reason:DCX-No ACK in 100 PDUs)
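When many ports trip within seconds of each other, it can help to pull the affected interfaces out of a saved log. The snippet below is a minimal sketch that assumes the log lines look exactly like the two quoted above; `ERRDISABLE_RE` and `dcbx_errdisabled_interfaces` are hypothetical names, not part of any Cisco tooling.

```python
import re

# Matches the syslog message format quoted in this thread.
ERRDISABLE_RE = re.compile(
    r"%ETHPORT-2-IF_DOWN_ERROR_DISABLED: Interface (\S+) is down "
    r"\(Error disabled\. Reason:DCX-No ACK in 100 PDUs\)"
)


def dcbx_errdisabled_interfaces(log_text):
    """Return the interface names that were error-disabled for this reason."""
    return ERRDISABLE_RE.findall(log_text)


sample_log = """\
2011 May 13 12:03:23 CSX_5020_A1 %ETHPORT-2-IF_DOWN_ERROR_DISABLED: Interface Ethernet115/1/17 is down (Error disabled. Reason:DCX-No ACK in 100 PDUs)
2011 May 13 12:03:27 CSX_5020_A1 %ETHPORT-2-IF_DOWN_ERROR_DISABLED: Interface Ethernet116/1/16 is down (Error disabled. Reason:DCX-No ACK in 100 PDUs)
"""
print(dcbx_errdisabled_interfaces(sample_log))
```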

Here are some commands on the switch that help in narrowing down the root cause:

F340.24.10-5548-1# show lldp interface ethernet 1/22 
Interface Information:
  Enable (tx/rx/dcbx): Y/Y/Y    Port Mac address: 00:05:73:ab:29:bd

Peer's LLDP TLVs:
Type Length Value
---- ------ -----
001  007    040000c9 9d2372
002  007    030000c9 9d2372
003  002    0078
006  045    456d756c 6578204f 6e65436f 6e6e6563 74203130 4762204d 756c7469
            2066756e 6374696f 6e204164 61707465 72
007  004    00800080
127  055    001b2102 020a0000 00000002 00000001 04110000 c0000001 00003232
            00000000 00000206 060000c0 00080808 0a0000c0 00890600 1b2108
000  000    
F340.24.10-5548-1# show lldp dcbx interface ethernet 1/22


Local DCBXP Control information:
Operation version: 00  Max version: 00  Seq no: 1  Ack no: 2  <<---Our sequence # and Ack #
Type/ 
Subtype    Version    En/Will/Adv Config
003/000     000        Y/N/Y      0808
004/000     000        Y/N/Y      8906001b21 08
002/000     000        Y/N/Y      0001000032 32000000 00000002 

Peer's DCBXP Control information:
Operation version: 00  Max version: 00  Seq no: 2  Ack no: 1  <<---Peer sequence # and Ack # should be reversed.
Type/      Max/Oper
Subtype    Version    En/Will/Err Config
002/000     000/000    Y/Y/N      0001000032 32000000 00000002 
003/000     000/000    Y/Y/N      0808
004/000     000/000    Y/Y/N      8906001b21 08
F340.24.10-5548-1#

In most cases, the root cause of this problem is a misbehaving CNA/server or incorrect firmware/driver on the CNA.
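The Seq/Ack comparison can also be scripted against saved "show lldp dcbx interface" output. This is a minimal sketch, not an official tool: `parse_seq_ack` and `in_sync` are hypothetical names, and the rule it applies (each side's Ack no equals the other side's Seq no) is an interpretation of the example output above rather than a documented NX-OS check.

```python
import re

SECTION_RE = re.compile(r"^(Local|Peer's) DCBXP Control information:", re.M)
SEQ_ACK_RE = re.compile(r"Seq no:\s*(\d+)\s+Ack no:\s*(\d+)")


def parse_seq_ack(output):
    """Return {'local': (seq, ack), 'peer': (seq, ack)} for the sections found."""
    result = {}
    sections = list(SECTION_RE.finditer(output))
    for i, match in enumerate(sections):
        # Search only up to the start of the next section header.
        end = sections[i + 1].start() if i + 1 < len(sections) else len(output)
        found = SEQ_ACK_RE.search(output, match.end(), end)
        if found:
            key = "local" if match.group(1) == "Local" else "peer"
            result[key] = (int(found.group(1)), int(found.group(2)))
    return result


def in_sync(values):
    """Assumed rule: each side acknowledges the other side's sequence number."""
    if "local" not in values or "peer" not in values:
        return False
    (local_seq, local_ack), (peer_seq, peer_ack) = values["local"], values["peer"]
    return local_ack == peer_seq and peer_ack == local_seq


sample = """\
Local DCBXP Control information:
Operation version: 00  Max version: 00  Seq no: 1  Ack no: 2
Peer's DCBXP Control information:
Operation version: 00  Max version: 00  Seq no: 2  Ack no: 1
"""
values = parse_seq_ack(sample)
print(values, in_sync(values))
```

A capture that contains only the Local section is reported as not in sync, because there is nothing to compare against.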

Highlighted

Thanks for that. I checked on a working switch and the output was the same as your example.

This is our output from a problem switch. The ACK seems to be "1"

Local DCBXP Control information:
Operation version: 00  Max version: 00  Seq no: 1  Ack no: 1
Type/
Subtype    Version    En/Will/Adv Config
003/000     000        Y/N/Y      0808
004/000     000        Y/N/Y      8906001b21 08
002/000     000        Y/N/Y      0001000032 32000000 00000002

Peer's DCBXP Control information:
Operation version: 00  Max version: 00  Seq no: 1  Ack no: 0
Type/      Max/Oper
Subtype    Version    En/Will/Err Config
002/000     000/000    Y/Y/N      0001000032 32000000 00000002
003/000     000/000    Y/Y/N      0801
004/000     000/000    Y/Y/N      8906001b21

Highlighted

Hi Simon

That is a problem for sure. It's OK for the ACK and Seq numbers to be the same. Here is one such example from my lab:

24.10.5020B.1# show lldp dcbx interface ethernet 1/16

Local DCBXP Control information:
Operation version: 00  Max version: 00  Seq no: 4  Ack no: 4
Type/
Subtype    Version    En/Will/Adv Config
004/000     000        Y/N/Y      8906001b21 08
003/000     000        Y/N/Y      0808
002/000     000        Y/N/Y      0001000032 32000000 00000002

Peer's DCBXP Control information:
Operation version: 00  Max version: 00  Seq no: 4  Ack no: 4
Type/      Max/Oper
Subtype    Version    En/Will/Err Config
002/000     000/000    Y/Y/N      0001000032 32000000 00000002
003/000     000/000    Y/Y/N      0801
004/000     000/000    Y/Y/N      8906001b21 08891400 1b2108
24.10.5020B.1#

Now the question is whether the CNA is sending an incorrect ACK or the N5k is interpreting it incorrectly. If you can sniff the Ethernet interface, the capture would point to the culprit. Or you could use ethanalyzer if you know the source MAC of the CNA. Here is an example:

ethanalyzer local interface inbound-hi det display-filter eth.src==00:00:c9:9d:23:72

Wireshark/ethanalyzer does not decode LLDP, but if you can send the captures to me, I have a way to figure it out.
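For anyone who wants to read the Seq/Ack values by hand, the raw type 127 TLV from "show lldp interface" can be decoded with a few lines of Python. This is only a sketch under stated assumptions: it assumes the payload is a 3-byte OUI, a 1-byte subtype, then sub-TLVs with a 7-bit type and 9-bit length, and that the Control sub-TLV (type 1) holds two version bytes followed by 32-bit SeqNo and AckNo fields. That layout is not confirmed anywhere in this thread, but it reproduces the "Seq no: 2  Ack no: 1" peer values shown earlier for the same port. `decode_dcbx_control` is a hypothetical helper name.

```python
import struct

DCBX_OUI = bytes.fromhex("001b21")  # OUI seen in the type 127 TLVs in this thread


def decode_dcbx_control(tlv127_hex):
    """Extract versions, SeqNo and AckNo from a type 127 TLV hex dump."""
    data = bytes.fromhex("".join(tlv127_hex.split()))
    if data[:3] != DCBX_OUI:
        raise ValueError("unexpected OUI")
    offset = 4  # skip the 3-byte OUI and the 1-byte subtype
    while offset + 2 <= len(data):
        (header,) = struct.unpack_from("!H", data, offset)
        sub_type, length = header >> 9, header & 0x1FF
        body = data[offset + 2 : offset + 2 + length]
        if sub_type == 1 and len(body) >= 10:  # assumed Control sub-TLV
            oper, max_ver, seq, ack = struct.unpack_from("!BBII", body)
            return {"oper": oper, "max": max_ver, "seq": seq, "ack": ack}
        offset += 2 + length
    raise ValueError("no Control sub-TLV found")


# Type 127 value from the "show lldp interface ethernet 1/22" output above.
peer_tlv = """
001b2102 020a0000 00000002 00000001 04110000 c0000001 00003232
00000000 00000206 060000c0 00080808 0a0000c0 00890600 1b2108
"""
print(decode_dcbx_control(peer_tlv))
```

The same function can be pointed at the type 127 line from a misbehaving port to see which Seq/Ack pair the CNA itself is putting on the wire.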

Thanks

-Prashanth

Highlighted

Hi Prashanth,

   I have a similar issue, but in my case it is the port channel between the two N5Ks. It goes down every 50 minutes. One side is running NX-OS 4.0 and the other is running 4.2. Can you please help me with this?

Regards,

Ajith

Highlighted

Ajith

4.0 is a very old NX-OS release, and I am not sure what was supported in it, which could explain the problem you are seeing. I suggest that you upgrade both of your 5Ks to a newer 4.2 or 5.0(3) release, and you should be fine.

Thanks

-Prashanth

Highlighted

We finally had a response from Dell, as QLogic had washed their hands of us. Dell makes a firmware change on the cards when they build them and has finally admitted there are problems. They sent us an upgrade, which we have applied, and we now see the correct info: Seq 1, Ack 2 and then Seq 2, Ack 1 on all the ports. Will keep you updated. Cheers for the help.

Highlighted

Hello Simon

Thanks for the update with the resolution. This is a common case generator. If you do not mind, can you let everyone know the driver/firmware you were running and the versions you upgraded to, and mark this question as resolved? This would help other community members seeing similar issues.

Thanks

-Prashanth

Highlighted

We have similar issues with a storage server:

switch3# show lldp dcbx interface e114/1/2

Local DCBXP Control information:
Operation version: 00  Max version: 00  Seq no: 1  Ack no: 0

Does this mean that the server isn't sending back an ACK? The switch err-disabled the server interface with the same error message. There are actually 2 servers (failovers) with the same drivers etc., but the err-disable happened to only one of them!

Highlighted

You can disable LLDP and it will work fine.


Highlighted

I'm getting the same issue as the OP.

Nexus 5548s with HPB22 FEX running 5.1(3)N2(1), connected to HP G7 blade servers with Emulex OCe10100 CNA adapters.

Every 50 minutes I get:

VDC-1 %$ %ETHPORT-2-IF_DOWN_ERROR_DISABLED: Interface Ethernet100/1/1 is down (Error disabled. Reason:DCX-No ACK in 100 PDUs)

switch# sh lldp interface e100/1/1
Interface Information:
  Enable (tx/rx/dcbx): Y/Y/Y    Port Mac address: 70:ca:9b:f4:b3:42

Peer's LLDP TLVs:
Type Length Value
---- ------ -----
001  007    04e83935 2b5125
002  007    03e83935 2b5125
003  002    0078
006  045    456d756c 6578204f 6e65436f 6e6e6563 74203130 4762204d 756c7469
            2066756e 6374696f 6e204164 61707465 72
007  004    00800080
127  055    001b2102 020a0000 00000001 00000000 04110000 c0000000 10003232
            00000000 00000206 06000000 00100808 0a0000c0 000cbc01 1b2110
switch# sh lldp dcbx interface e100/1/1

Local DCBXP Control information:
Operation version: 00  Max version: 00  Seq no: 1  Ack no: 0

That "Ack no: 0" indicates some kind of problem on the host, right?
