04-25-2024 01:12 AM
We're seeing a continuous string of the following messages on a 9407 running 17.9.5. Eventually the switch crashes. I can't find any info on what these messages mean exactly or whether we're looking at a software or hardware fault. The supervisor in question is new, so I'm leaning towards hardware, but I would like to know what these messages mean.
Apr 24 12:23:28: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1930): IO Desc in cache: 0x740000000716020e 0x0000000000000500 0x0000000000000000 0x0000000000000000
Apr 24 12:23:58: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1921): Returning IODMA io desc mismatch, bus_id 4 start 0 cnt 16 ndx 12 jiffies 4305641022
Apr 24 12:23:58: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1925): updating IO Desc: 0x740000000406020e 0x0000000000000000 0x0000000000000000 0x0000000000000000
Apr 24 12:23:58: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1930): IO Desc in cache: 0x740000000416020e 0x0000000000000000 0x0000000000000000 0x0000000000000000
Apr 24 12:23:59: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1921): Returning IODMA io desc mismatch, bus_id 5 start 0 cnt 16 ndx 12 jiffies 4305641084
Apr 24 12:23:59: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1925): updating IO Desc: 0x740000000506020e 0x0000000000000000 0x0000000000000000 0x0000000000000000
Apr 24 12:23:59: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1930): IO Desc in cache: 0x740000000516020e 0x0000000000000000 0x0000000000000000 0x0000000000000000
Apr 24 12:23:59: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1921): Returning IODMA io desc mismatch, bus_id 4 start 16 cnt 16 ndx 12 jiffies 4305641146
Apr 24 12:23:59: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1925): updating IO Desc: 0x740000000406020e 0x0000000000000500 0x0000000000000000 0x0000000000000000
Apr 24 12:23:59: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1930): IO Desc in cache: 0x740000000416020e 0x0000000000000500 0x0000000000000000 0x0000000000000000
Apr 24 12:23:59: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1921): Returning IODMA io desc mismatch, bus_id 6 start 0 cnt 16 ndx 12 jiffies 4305641208
Apr 24 12:23:59: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1925): updating IO Desc: 0x740000000606020e 0x0000000000000000 0x0000000000000000 0x0000000000000000
Apr 24 12:23:59: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1930): IO Desc in cache: 0x740000000616020e 0x0000000000000000 0x0000000000000000 0x0000000000000000
Apr 24 12:23:59: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1921): Returning IODMA io desc mismatch, bus_id 5 start 16 cnt 16 ndx 12 jiffies 4305641270
Apr 24 12:23:59: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1925): updating IO Desc: 0x740000000506020e 0x0000000000000500 0x0000000000000000 0x0000000000000000
Apr 24 12:23:59: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1930): IO Desc in cache: 0x740000000516020e 0x0000000000000500 0x0000000000000000 0x0000000000000000
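For anyone collecting these for a TAC case, here is a minimal Python sketch that tallies the mismatch messages per bus_id and XORs the first word of each "updating IO Desc" / "IO Desc in cache" pair. The function name `summarize` and the regular expressions are my own assumptions based only on the message text pasted above; I don't know what the descriptor fields actually mean, so this only shows which bits differ, not why. In the pairs pasted here, the two first words differ by 0x100000 each time.

```python
import re
from collections import Counter

# Patterns are guesses derived from the pasted log text, not from any Cisco documentation.
MISMATCH = re.compile(r"io desc mismatch, bus_id (\d+) start (\d+) cnt (\d+) ndx (\d+)")
DESC = re.compile(r"(updating IO Desc|IO Desc in cache): (0x[0-9a-f]{16})")

def summarize(lines):
    """Count mismatches per bus_id and XOR each updating/in-cache descriptor pair."""
    per_bus = Counter()
    diffs = Counter()
    pending = None  # first word of the most recent "updating IO Desc" line
    for line in lines:
        m = MISMATCH.search(line)
        if m:
            per_bus[int(m.group(1))] += 1
            pending = None
            continue
        d = DESC.search(line)
        if not d:
            continue
        kind, word = d.group(1), int(d.group(2), 16)
        if kind == "updating IO Desc":
            pending = word
        elif pending is not None:
            # Record which bits differ between the new and the cached descriptor.
            diffs[hex(pending ^ word)] += 1
            pending = None
    return per_bus, diffs

sample = [
    "Apr 24 12:23:58: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1921): Returning IODMA io desc mismatch, bus_id 4 start 0 cnt 16 ndx 12 jiffies 4305641022",
    "Apr 24 12:23:58: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1925): updating IO Desc: 0x740000000406020e 0x0000000000000000 0x0000000000000000 0x0000000000000000",
    "Apr 24 12:23:58: %IOSXE-3-PLATFORM: R0/0: kernel: ardbeg_iodma_cache_update (line 1930): IO Desc in cache: 0x740000000416020e 0x0000000000000000 0x0000000000000000 0x0000000000000000",
]

per_bus, diffs = summarize(sample)
print(dict(per_bus), dict(diffs))
```

Feeding it the full saved log instead of `sample` would show whether the mismatches are confined to a few bus_id values or spread across all of them, which might be worth including when opening a case.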
04-25-2024 02:05 AM
Hello,
I am using 25 different global search engines, but none of them returned anything on what that log message means. A Cisco bug search does not return anything either...
Which supervisor do you have installed?
04-25-2024 03:20 AM
04-25-2024 04:31 AM
Probably the only ones who would know (or be able to find out) would be TAC.
Between the sup being "new" and the nature of the error (kernel I/O DMA error), I too would suspect the sup.
What you might try: power off the chassis, reset the sup, restart the chassis, and monitor the console during startup for POST errors.
04-25-2024 04:44 AM
04-25-2024 08:15 AM
Well, like many other things, better documentation increases cost. Assuming there is nothing you as an end user can do about this error beyond replacing hardware, I would suggest the message is adequate for that purpose. In fact, internal TAC documentation might be no more informative, i.e. it may simply say to tell the customer the hardware needs to be replaced.
Of course, issues like this error likely go beyond TAC, as vendors usually want to mitigate a recurring issue.
Such mitigations might result in later hardware revisions for the "same" part number. But how often do you see any documentation of why hardware has been revised (when there's no spec change)?
There are other good reasons for "poor" public documentation including trying to keep proprietary info secret.
Heck, don't know if you've been in a similar situation, but I've worked in large companies, buying hardware from Cisco, where Cisco required NDAs, to discuss their roadmaps, and work with them on what the roadmap might be. As I've signed such NDAs I cannot be more specific. However, such practices aren't limited to dealing with Cisco.
I mention the foregoing because your "wish" is reasonable but there are various reasonable reasons it's unlikely to be met.