Solved: Nexus 9300, ISR 4321: Severe latency across the ethernet cable.

MicJameson1 · ‎02-18-2023

Hello.
GIVEN:

Nexus-9300= vlan 172.16.1.5 on int e1/14
...is directly connected to...
ISR_4321= 172.16.1.2 on g0/0/0

Nexus-9300# traceroute 172.16.1.2
1 172.16.1.2 (172.16.1.2) 0.999 ms * 1.499 ms
Nexus-9300# traceroute 172.16.1.2
1 172.16.1.2 (172.16.1.2) 0.934 ms * 1.117 ms

...--- 172.16.1.2 ping statistics ---
30 packets transmitted, 30 packets received, 0.00% packet loss
round-trip min/avg/max = 0.679/0.882/1.575 ms

Nexus-9300# sh int e1/34
!! full output appended at bottom of post !!
Description: New SAN IBM 3700
MTU 1500 bytes, BW 1000000 Kbit , DLY 10 usec
reliability 255/255, txload 1/255, rxload 1/255
Port mode is access
full-duplex, 1000 Mb/s, media type is 1G

ISR_4321#sh int g0/0/0
Internet address is 172.16.1.2
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Full Duplex, 1000Mbps, link type is auto, media type is RJ45
output flow-control is off, input flow-control is off
Last input 00:00:00, output 00:00:44, output hang never
Input queue: 0/375/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 15000 bits/sec, 7 packets/sec
5 minute output rate 2000 bits/sec, 2 packets/sec
671704620 packets input, 158567669050 bytes, 0 no buffer
Received 34735191 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 66233230 multicast, 0 pause input
595904388 packets output, 125339456258 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
24515668 unknown protocol drops

1. What is the most likely cause for such severe latency?

2. Why would there be 24515668 unknown protocol drops?

3. What are the wisest beginning troubleshoot commands to investigate this?

Thank you.

---

additional data..

NEXUS-9300#show interface

Ethernet1/34 is up
admin state is up, Dedicated Interface
Hardware: 1000/10000 Ethernet, address: f4cf.e237.8751 (bia f4cf.e237.8751)
Description: New SAN IBM 3700
MTU 1500 bytes, BW 1000000 Kbit , DLY 10 usec
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, medium is broadcast
Port mode is access
full-duplex, 1000 Mb/s, media type is 1G
Beacon is turned off
Auto-Negotiation is turned on FEC mode is Auto
Input flow-control is off, output flow-control is off
Auto-mdix is turned off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
EEE (efficient-ethernet) : n/a
admin fec state is auto, oper fec state is off
Last link flapped 38week(s) 4day(s)
Last clearing of "show interface" counters never
4 interface resets
Load-Interval #1: 30 seconds
30 seconds input rate 408 bits/sec, 0 packets/sec
30 seconds output rate 14264 bits/sec, 2 packets/sec
input rate 408 bps, 0 pps; output rate 14.26 Kbps, 2 pps
Load-Interval #2: 5 minute (300 seconds)
300 seconds input rate 392 bits/sec, 0 packets/sec
300 seconds output rate 13992 bits/sec, 1 packets/sec
input rate 392 bps, 0 pps; output rate 13.99 Kbps, 1 pps
RX
597067785 unicast packets 443196 multicast packets 50 broadcast packets
597511031 input packets 127920241022 bytes
0 jumbo packets 0 storm suppression packets
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
586741236 unicast packets 73153042 multicast packets 38909450 broadcast packets
698803728 output packets 166170468737 bytes
0 jumbo packets
0 output error 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 1132 output discard
0 Tx pause

balaji.bandi · ‎02-19-2023

Post the interface config from both devices:

show run interface x/x

I do not see major latency here (those values are in milliseconds).

BB

***** Rate All Helpful Responses *****

How to Ask The Cisco Community for Help

Joseph W. Doherty · ‎02-19-2023

#1 ping, I believe, was never really intended as a high-precision latency measurement tool. It was intended as a "can the host be reached?" tool, with a timer thrown in too. This is a subtle but important difference!

It explains why hosts might not bother responding to every ping request, especially from the same requestor, and why a host may defer responding when it has anything else to do.

In fact, since so many want to use ping for end-to-end latency measurements, and Cisco network devices commonly delay ping replies, Cisco, in their SLA suite, has a ping variant that records device delay to provide a much more accurate end-to-end latency measurement.

The foregoing is why I can write that your ping results may be fine even though they are not what you expected.
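
To illustrate the idea of removing the responder's own delay from a measurement, here is a minimal Python sketch. The four-timestamp scheme, the function name, and every value are assumptions made up for illustration; this is not Cisco's actual SLA implementation.

```python
def network_rtt_us(t1_sent, t2_received, t3_replied, t4_returned):
    """Round-trip time with the responder's processing delay removed.

    t1 and t4 are read on the sender's clock, t2 and t3 on the responder's
    clock, so the two clocks do not need to be synchronized.
    """
    total = t4_returned - t1_sent          # what a plain ping would report
    processing = t3_replied - t2_received  # time the responder held the packet
    return total - processing


# Hypothetical timestamps in microseconds (made-up values).
plain_ping_us = 900 - 0
adjusted_us = network_rtt_us(0, 5000, 5880, 900)
print(f"plain ping: {plain_ping_us} us, adjusted: {adjusted_us} us")
```

With these made-up numbers, almost all of the time a plain ping reports is the responder deferring its reply, not the wire.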

#2 Likely because the router interface is seeing L2 frames it doesn't expect, e.g. BPDUs. If your Nexus interface were a routed interface, you likely wouldn't see all these unknown protocol frames (if any at all).

#3 This can be a long, long answer, because there can be many possible causes, which you need to check out.

For example, you could start with the answers I provided above and work to confirm or refute them.

If you get positive results (above is true), is one confirmation enough? Two, three, four, more needed (consider false positives)?

If negative results (which can be wrong too) what do you do next? (BTW, I've heard the CCIE lab test is big on troubleshooting [often an indication of a true expert is how well they can troubleshoot].)

I.e. if what you've noticed is caused by what I've described, investigating is more involved than using an additional command or two.

You might, though, "research" my proposed causes and see if they make sense (in your case).

Lastly, your series of questions suggests you may not have the experience and/or knowledge for the issues you're facing. Nothing wrong with that (!!!), but if true, our knowing a bit more about your level of knowledge and/or experience can help us tailor our answers.

Joseph W. Doherty · ‎02-19-2023

Oh, two additional points I forgot to mention.

#2 I noted I believed the "issue" wouldn't exist if the Nexus port were a routed port.

An alternative is deactivating unneeded L2-specific features on the connecting interface.

#3 Possibly first consideration, when you see "odd" stats, might be are they actually "normal" and/or not adversely impacting the network? (BTW, "good" looking stats sometimes conceal genuine problems.)

In "real-world", "odd" stats, but no one complaining, low priority investigation. "Good" stats, but someone complaining, investigate, priority depends on importance of the "someone(s)". ; )

MHM Cisco World · ‎02-19-2023

I will provide you some tools to check the latency on the N9K.

MicJameson1 · ‎02-19-2023

Balaji is correct- there is no bad latency here.

I was working on different platform that was providing output in units of .001 seconds. I didn't realize that this device was using 1 millisecond as unit.

Thank you all for your inspired help. I do appreciate it, and always learn from it.

Joseph W. Doherty · ‎02-19-2023

"Balaji is correct- there is no bad latency here."

Laugh, ah, but are you sure?

"I was working on different platform that was providing output in units of .001 seconds. I didn't realize that this device was using 1 millisecond as unit."

Okay, that's fine, but let's look at those milli-second values, from (about) half a millisecond to (about) 1.5 milliseconds.

(Interestingly, the Nexus [?] is measuring ping times down to thousandths of a millisecond, i.e. microseconds.)

Assuming your minimum-size ping packets are 64 bytes, the two devices are physically close (?), the transmission rate is gig, and the link is lightly loaded, what's (about) the minimum latency that might be expected?

So, a 64-byte packet comprises 512 bits, which at 1,000,000,000 bps should take 0.000000512 seconds, or 0.000512 milliseconds, or 0.512 microseconds, to transmit. Latency is two-way, so if we double that, we have about 1 microsecond latency!

If you want to account for electrical propagation delay, it's (about) 5 ns per meter (for copper [using copper because, as I recall, it's slower than fiber {of course, it cannot go as far, either}]), so figure up to 0.5 us for 100 m, or 1 us for the round trip.

So, assuming my math is correct, minimum latency should be, at most, about 2 microseconds, a lot, lot less than your measured ping times.
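
The arithmetic above can be checked with a short Python sketch. It uses only the figures assumed in this post (64-byte packet, gigabit link, 100 m of copper at about 5 ns per meter) plus the 0.882 ms average from the original ping statistics; the function names are made up for illustration.

```python
LINK_BPS = 1_000_000_000  # gigabit Ethernet, as in the post
PACKET_BYTES = 64         # assumed minimum-size ping packet
CABLE_M = 100             # assumed worst-case copper run
PROP_NS_PER_M = 5         # approximate propagation delay in copper


def serialization_us(size_bytes, link_bps):
    """Time to clock the packet's bits onto the wire, in microseconds."""
    return size_bytes * 8 / link_bps * 1e6


def propagation_us(length_m, ns_per_m):
    """Time for the signal to travel the cable, in microseconds."""
    return length_m * ns_per_m / 1000


def min_rtt_us(size_bytes, link_bps, length_m, ns_per_m):
    """Round trip: serialization plus propagation, once in each direction."""
    one_way = serialization_us(size_bytes, link_bps) + propagation_us(length_m, ns_per_m)
    return 2 * one_way


floor_us = min_rtt_us(PACKET_BYTES, LINK_BPS, CABLE_M, PROP_NS_PER_M)
measured_avg_us = 0.882 * 1000  # ping average from the first post, in us
print(f"theoretical floor: {floor_us:.3f} us")
print(f"measured average is roughly {measured_avg_us / floor_us:.0f}x the floor")
```

The floor comes out at about 2 us, so the measured average is a few hundred times larger.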

Oh my gosh, your measured latency is horrible!!!

I really doubt that it is, for some of the reasons described in my earlier post, and including things like how accurately times are recorded for ping transmissions and their replies. Likely, ping program is grabbing the system time when packets are sent or received, but we don't know how well that's sync'ed with actual packet transmissions and receptions (or whether transmission times are more accurately recorded than reception times [if you think they should be the same, that would be nice, but don't count on it])?

BTW, the foregoing is why I didn't just assume the latency is okay because the times are about 1 ms. However, for the reasons in this and my prior posting, I don't believe there is a latency issue, either.

As an aside, as Ethernet went to 100 Mbps and gig, cut-through switching fell by the wayside, as the time delta between using it or not became so small. However, packet buffering latency grows as packet size increases (think jumbo Ethernet), and some folks (automated stock trading [or possibly some USAF "apps"]) find saving a microsecond or two important.
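
As a rough illustration of that aside, the Python sketch below computes how long a store-and-forward device must wait to receive a whole frame before it can start forwarding it. The frame sizes and link speeds are example values chosen here, and the model is a simplification: it ignores the header bytes a cut-through switch must still read, as well as any queuing.

```python
def store_and_forward_us(frame_bytes, link_bps):
    """Time to receive a whole frame before forwarding can begin, in microseconds."""
    return frame_bytes * 8 / link_bps * 1e6


# Example values only; not taken from any particular device.
SPEEDS = {"10 Mbps": 10e6, "100 Mbps": 100e6, "1 Gbps": 1e9}
FRAMES = {"64 B": 64, "1500 B": 1500, "9000 B jumbo": 9000}

for frame_name, size in FRAMES.items():
    row = ", ".join(
        f"{speed_name}: {store_and_forward_us(size, bps):.2f} us"
        for speed_name, bps in SPEEDS.items()
    )
    print(f"{frame_name:>13} -> {row}")
```

At gig, a 1500-byte frame costs about 12 us per store-and-forward hop and a 9000-byte jumbo about 72 us, which is why the delta only matters to applications counting microseconds.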

MicJameson1 · ‎02-19-2023

Thank you for your helpful reply.

My real-world situation regarding this network prevents me from exploring this. I am the only level 1-4 OSI tech here, and there are many pressing problems that I must attend to-- I could be here for months without additional work before I'd have time to investigate this.

Your effort has not gone unappreciated, and I have grown from reading it.

Thank you.

Joseph W. Doherty · ‎02-19-2023

"My real-world situation regarding this network prevents me from exploring this."

". . . and there are many pressing problems that I must attend to-- I could be here for months without additional work before I'd have time to investigate this."

Been there - done that. Sounds very real-world! ; )

"I am the only level 1-4 OSI tech"

Well, from your (intelligent) follow-ups, you do seem to also be learning, so I would expect you to advance.

"Your effort has not gone unappreciated"

Never thought otherwise.

As for my query about marking Balaji's posting as the "solution", I meant what I wrote, i.e. I want to ensure all your questions were answered, and if not, to ensure you understand that prematurely marking a question as solved can preclude others from working on the questions/issues you describe.

Further, as you read these forums, you'll likely note many of the VIPs have certain areas they often address; mine, for example, is QoS. If you notice this, don't be shy about referencing, in a posting, a VIP you think might be of particular help. One of the criteria for VIPs is that we all try to help whenever we can.

Joseph W. Doherty · ‎02-19-2023

Oh, oh no, another oh. ; )

Your reply discusses latency, and I believe most, if not all, of us agree there's not likely any latency issue. You also marked Balaji's post as the solution, and it addressed your first question. So, do you no longer need answers to your second or third questions?

If you still do, let us know; besides myself, I'm sure @balaji.bandi or @MHM Cisco World, or others (NB: with a "solved" issue, others may be less likely to contribute), will try to resolve those too.

MHM Cisco World · ‎02-19-2023

As far as I know, N9K latency is in nanoseconds, not milliseconds.

MHM Cisco World · ‎02-19-2023

I don't agree with the solution.

If the latency is msec, why would I buy an N9K?

Anyway

Just run the latency check on the N9K and compare the result with your real network.