Solved: SUP720-3B - EOBC drops

UHansen1976 · 05-08-2011

Hi there,

Recently we saw a dramatic increase in packet drops on the EOBC interface, followed by an err-disable on multiple 10G interfaces. These interfaces were not located on the same linecard, but spread across several cards. There were no apparent errors detected on these interfaces, e.g. from UDLD. This has happened twice within a period of about 6 months, and several hardware replacements have been performed. The first time, the chassis and a 6704 linecard were replaced, as these were initially suspected of being the root cause. The next failure caused us to replace the supervisor.

The switch, a 6509-E/Sup720-3B, represents one half of a distribution layer composed of two of these switches. The neighboring switch is an exact match of this one, both with regard to hardware and IOS release, but we've never seen that particular problem on it. Where the failing switch has produced more than 5,000 drops on the EOBC in less than 3 months, the other one has been running for a little over a year and so far we've only recorded some 60 drops.

We've also checked the cables and transceivers on both our 10G linecards, one being a 6704, the other a 6708. No errors of any kind have been registered on any of the interfaces, but as a preventive measure, we've replaced the 10G transceivers on the interfaces that were err-disabled on both occasions. We suspect that somehow UDLD results are not passed on to the Sup, as we've seen UDLD error messages in the syslog on both occasions just prior to the failure, but subsequent troubleshooting of UDLD reveals nothing that would indicate a UDLD error.

We've reported both incidents to TAC, and since the last one we haven't seen any problems, but I still see a considerable number of drops on the EOBC interface and fear that this incident will repeat itself. I'm working on some interim countermeasures to work around the problem, should it recur.

But aside from replacing the aforementioned hardware parts, can anyone think of anything else that could cause these kinds of drops?

Any suggestions will be greatly appreciated

Thanks

/Ulrich

SunilKhanna · 05-09-2011

Can you provide some output verifying the drops, or any error messages you might have encountered?

Regards, Sunil Khanna

UHansen1976 · 05-10-2011

Hi Sunil,

Below is the output from 'sh eobc':

Interface information:
        Interface EOBC0/0 (idb = 0x50E9C920)
        Hardware is Mistral EOBC (revision 5)
        Address is 0000.1500.0000 (bia 0000.1500.0000)
        Encap size         = 14         hardware status = 0x210840
        IDB type           = 18         IDB state        = 4
        Encap type         = 0x1        Span encap size = 0
        Error threshold    = 5000       Error count      = 0

Counters:
        rxring             = 0x8D3C140 rx ring entries       = 512
        rx_head            = 408        rx_tail               = 0
        inputs             = 470239172 rx_cumbytes           = 123374641679
        hw inputs          = 0          hw rx_cumbytes        = 0
        rx rate (bits/sec) = 255000     rx rate (packets/sec) = 97
        rx_buf_unavail     = 0          rx input drops        = 6058
        input broadcast    = 31         input resource        = 105596932
        input error        = 0          input giants          = 0
        input crc          = 6058       rx illegal length     = 0
        rxr eobc shadow    = 0x50FAA01C txr eobc shadow       = 0x47CD1B40

        txring             = 0x8D3E180 tx ring entries       = 0x200
        tx_head            = 482        tx_tail               = 482
        outputs            = 465663999 tx_cumbytes           = 31068706335
        hw outputs         = 0          hw tx_cumbytes        = 0
        tx rate (bits/sec) = 58000      tx rate (packets/sec) = 95
        tx_retry_error     = 0          tx_retry_count        = 101354
        tx_process_stopped = 0          tx total drops        = 0

Mistral Registers
        soft_reset_cfg     = 0x000000   dma_buffer_size_reg   = 0x000000
        int_mask_hi        = 0x00007E   int_mask_lo           = 0xE7001A58
        rxdscp_cnt         = 512        txdscp_cnt            = 0
        rxwork_dscp        = 0xDAC0     txwork_dscp           = 0xF098
        mistral_eobc_ds    = 0x47BC41B8 mistral_dma_register = 0x30000000
        mistral_glbl_reg   = 0x10020000

Misc. Global Registers:
        global_cfg         = 0x20       mis_init_sts          = 0xF
        dimm_parm_cfg_hi   = 0x00000566 dimm_parm_cfg_lo      = 0x42040F5A
        tm_init_size_cfg   = 0x8000

/Ulrich
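For anyone comparing this output between two switches, the counters can be pulled out of the text with a few lines of Python. This is only a rough sketch, not a Cisco tool: the function names `parse_counters` and `summarize` are made up for illustration, and the sample is a handful of lines copied from the output above. It assumes the `name = value` layout shown in this post, which may differ between IOS releases.

```python
import re

# A few counter lines copied from the 'sh eobc' output in this thread.
SAMPLE = """\
        inputs             = 470239172 rx_cumbytes           = 123374641679
        rx_buf_unavail     = 0          rx input drops        = 6058
        input error        = 0          input giants          = 0
        input crc          = 6058       rx illegal length     = 0
        tx_retry_error     = 0          tx_retry_count        = 101354
        tx_process_stopped = 0          tx total drops        = 0
"""

# Matches "name = value" pairs; two pairs can share one line.
PAIR = re.compile(r"([A-Za-z_][A-Za-z_ ()/.]*?)\s*=\s*(0x[0-9A-Fa-f]+|\d+)")


def parse_counters(text):
    """Return a dict of counter name -> integer value (hypothetical helper)."""
    counters = {}
    for name, value in PAIR.findall(text):
        is_hex = value.lower().startswith("0x")
        counters[name.strip()] = int(value, 16) if is_hex else int(value)
    return counters


def summarize(counters):
    """Relate the drop counter to total inputs and to CRC errors."""
    drops = counters["rx input drops"]
    return {
        "drop_ratio": drops / counters["inputs"],
        "crc_share": counters["input crc"] / drops if drops else 0.0,
        "tx_retries": counters["tx_retry_count"],
    }


if __name__ == "__main__":
    for key, value in summarize(parse_counters(SAMPLE)).items():
        print(f"{key}: {value}")
```

In the output posted above, 'rx input drops' and 'input crc' are both 6058, so `crc_share` comes out as 1.0. That may suggest the drops are being counted as CRC errors, but it is an observation on the numbers, not a diagnosis. Running the same script against the healthy neighbor would show whether the two counters track each other there too.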

SunilKhanna · 05-10-2011

It's hard to pinpoint the cause. It might be the supervisor: some non-fatal internal events could cause the software to run out of memory and hence crash.

Here is some good documentation: Best Practice Recommendations for the Catalyst 6500 Series Switch

Regards, Sunil Khanna

UHansen1976 · 05-10-2011

Hi Sunil,

Thanks for the feedback.

However, I'm unable to open the URL; I get a 403 error. I have a valid CCO account, but I'm still unable to read the document. Is there an alternative way to locate it?

/Ulrich

SunilKhanna · 05-11-2011

That's strange, try this URL:

http://www.cisco.com/en/US/docs/switches/lan/catalyst6500/ios/12.2SX/best/practices/recommendations.html

Regards,

Sunil

Regards, Sunil Khanna

UHansen1976 · 05-11-2011

Hi Sunil,

Much better, thanks a million.

/Ulrich