1670 Views, 6 Helpful, 44 Replies

VLAN issues

dbronco
Level 1

I recently replaced the core switch at one of my sites, which consisted of two separate C9300-48Ts and three other off-brand switches, with a new stack made up of a C9300X-48HX, a C9300-24S and a C9300-48T (running 17.12.5). This is a layer 2 network consisting of about 800 cameras, 75 workstations for viewing and about 20 phones. 

After the replacement, all the workstations and phones came back online, as did all but 8 cameras. Those cameras are on the network and pingable, but the video management system cannot display them due to a poor connection. They are all on the same floor, plugged into the same switch (another C9300X-48HX that was replaced 3 weeks before the core replacement), and all on VLAN 4. I understand ping is not a troubleshooting tool, but I think it's worth mentioning that when pinging these particular cameras from anywhere on the compound, I receive response times of 15-25 ms. There are approximately 225 other cameras across the compound on VLAN 4, however, and none of them have the same problem. I thought maybe it was the model of camera, but I brought up a different model, put it on VLAN 4, and still had the slow ping times. VLAN 2 (workstations) and VLAN 27 (phones) are also on this switch and aren't experiencing any issues. I tested this new model of camera on VLAN 6 and had the same slow ping times. Changing the IP again to VLAN 8, though, gave no problem at all. So the Band-Aid is in place to get these cameras back online, but I need to figure out what is going on with VLAN 4 and 6 between this switch and my core switch. 

The core switch is my gateway for all the interface VLANs, it's my root bridge for Spanning Tree and the entire site is running VTP transparent. I'm not getting any errors in the logs and I don't see anything weird in Wireshark but now I'm not sure where to start troubleshooting. So, I'm reaching out to the community to see if you all can help point me in the right direction. I can have the configs posted for each switch tomorrow along with any other output that might be helpful. 

44 Replies

Thank you for the continued support. I got pulled into something else today but will try and get this as soon as I can

Attached

What kind of logging are you doing?

Could you post a sanitized copy of switch config?

I send all my logs to a PRTG server but am not receiving anything outside of normal. 

I posted a copy of the switch config from 9.6 (where I first noticed the issue) and 9.1 (my core switch for the site) on 7/15 with a few other attachments. Please let me know what you're interested in and I'll work to get anything more detailed posted as well. 

I didn't see this reply earlier, apologies. 

I can get the output posted here but when I looked earlier in the week the CPU usage was less than 2%

That said, I was investigating a few things the other day and was running a continuous ping on 192.168.4.103 (a test camera I installed on switch 9.6) and did see dropped pings. Immediately after the drop, the ping times would be back to <1ms, 3ms, 3ms, 7ms, 15ms, and then steady again at 23ms. I got pulled into something else the past 2 days and haven't had a chance to look any more into that yet. 

While I'm not doubting CoPP could be a problem, especially on a large CCTV network, I'm still struggling to understand why the CoPP limit would now be a factor after replacing this hardware. Even if I unknowingly had this issue prior to the hardware replacement, why would it become more apparent with newer, more powerful hardware?  

Even if I unknowingly had this issue prior to the hardware replacement, why would it become more apparent with newer, more powerful hardware? 

Possibly because the newer equipment's CoPP features and/or defaults are different.  I.e. some traffic is now subject to CoPP that wasn't on the older platform and/or policy limits, for some traffic, have been reduced.

Actually, newer hardware being "more powerful" may have encouraged Cisco to "improve" CoPP.  E.g. newer is more expansive and/or more restrictive.

Also, sometimes, more powerful hardware isn't a uniform improvement for all features, in fact, occasionally some features' performance might have been reduced (intentionally or accidentally).

Then there's the question of exactly how you transferred the prior device's config to the newer device.  I.e. are old and new exactly the same?


@dbronco wrote:

That said, I was investigating a few things the other day and was running a continuous ping on 192.168.4.103 (a test camera I installed on switch 9.6) and did see dropped pings. Immediately after the drop, the ping times would be back to <1ms, 3ms, 3ms, 7ms, 15ms, and then steady again at 23ms.


I've spent a few more minutes looking at your earlier posted 9.6 attachment, and (finally) noticed that you've provided more than just the config.

Some things that caught my eye were:

[THIS IS A CAMERA ON VLAN4]
9.6#show interface te1/0/36
TenGigabitEthernet1/0/36 is up, line protocol is up (connected)
Hardware is Ten Gigabit Ethernet, address is 68e5.9eca.ca24 (bia 68e5.9eca.ca24)
Description: VLAN4 Test
MTU 1500 bytes, BW 100000 Kbit/sec, DLY 100 usec,
reliability 255/255, txload 249/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 100Mb/s, media type is 100/1000/2.5G/5G/10GBaseTX
input flow-control is on, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:05, output 00:00:01, output hang never
Last clearing of "show interface" counters never
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 30,694,599
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 651000 bits/sec, 92 packets/sec
5 minute output rate 97,687,000 bits/sec, 11,098 packets/sec
1159046 packets input, 873673473 bytes, 0 no buffer
Received 1158647 broadcasts (1158622 multicasts)
0 watchdog, 1158622 multicast, 6424 pause input
111,758,402 packets output, 123061242152 bytes, 0 underruns
Output 111742912 broadcasts (111741039 multicasts)

BTW, not listed here, compare this interface to the camera interface you also provided on VLAN 8.

Why is the output, to the camera, saturated?

These stats don't show how deep the egress queue is, but considering the egress load and the ratio of drops, that alone could explain your miserable ping performance.
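To put rough numbers on that, here is a minimal Python sketch that uses only the counters quoted above for Te1/0/36. It assumes that "Total output drops" counts packets that are not included in "packets output", which is a reading of the counter semantics and not something the output itself states. The variable names are illustrative.

```python
# Counters copied from the "show interface te1/0/36" output quoted above.
bandwidth_bps = 100_000 * 1000        # BW 100000 Kbit/sec
output_rate_bps = 97_687_000          # 5 minute output rate
txload = 249                          # txload 249/255
packets_output = 111_758_402
output_drops = 30_694_599

# Egress utilization, computed two ways.
util_from_rate = output_rate_bps / bandwidth_bps
util_from_txload = txload / 255

# Assumption: dropped packets are not counted in "packets output",
# so the number offered to the port is the sum of both counters.
offered = packets_output + output_drops
drop_ratio = output_drops / offered

print(f"utilization (rate):   {util_from_rate:.1%}")
print(f"utilization (txload): {util_from_txload:.1%}")
print(f"drop ratio:           {drop_ratio:.1%}")
```

Under that assumption the port is roughly 98% utilized on egress and about one in five packets offered to it has been dropped. Because the counters have never been cleared, the drop ratio is a lifetime average rather than a current rate.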

If the cameras are generating multicast video and IGMP snooping isn't operational on VLAN 4 for this switch, you may have a multicast flooding issue causing the excessive egress saturation.  You'll need to confirm that IGMP snooping is enabled on the switch (usually the default) and that VLAN 4 has an IGMP querier, usually the gateway.
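The output counters quoted above for Te1/0/36 give a quick way to check how much of the egress traffic is multicast. This is a small Python sketch over those counters only; it assumes the "broadcasts" figure includes the multicasts shown in parentheses, and the variable names are illustrative.

```python
# Output counters copied from the "show interface te1/0/36" listing above.
packets_output = 111_758_402
output_bcast_and_mcast = 111_742_912   # "Output 111742912 broadcasts"
output_mcast = 111_741_039             # "(111741039 multicasts)"

# Assumption: the "broadcasts" figure includes the multicasts in parentheses.
output_bcast_only = output_bcast_and_mcast - output_mcast
output_unicast = packets_output - output_bcast_and_mcast

mcast_share = output_mcast / packets_output
print(f"multicast: {mcast_share:.2%}")
print(f"broadcast: {output_bcast_only:,} packets")
print(f"unicast:   {output_unicast:,} packets")
```

On these numbers nearly every packet sent toward the camera is multicast, which is what flooding within the VLAN would look like from the access port.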

show platform hardware fed active qos queue stats interface <interface> <<- share this 

This will tell us whether the drops are in CoPP or in the queue. 

You are using soft queue multiple 1200 <<- I think we need to increase this a little more if the drops are in the queue, not CoPP

MHM

You are using soft queue multiple 1200 <<- I think we need to increase this a little more if the drops are in the queue, not CoPP

Even if the drops are in the egress queue, I suspect increasing queue depth wouldn't help.

Why not?

Because the 5 minute load average is so high.

There are two kinds of congestion, burst and sustained.  For the latter, which appears to be the case here, increasing queue depth is sort of like throwing gasoline on a fire.

For a case of legitimate sustained congestion, you either need more bandwidth and/or you queue that traffic separately (to protect other traffic) and improve its drop management, with the goal of improving that traffic's goodput.
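The point about sustained congestion can be illustrated with a toy simulation. This is a minimal Python sketch of a single tail-drop FIFO queue, not a model of the 9300's actual buffering; the `simulate` function, the per-tick rates and the two queue depths are all hypothetical values chosen for illustration.

```python
from collections import deque

def simulate(queue_depth, arrivals_per_tick, served_per_tick, ticks):
    """Tail-drop FIFO: returns (delivered, dropped, mean wait in ticks)."""
    queue = deque()          # holds the arrival tick of each queued packet
    delivered = dropped = 0
    total_wait = 0
    for now in range(ticks):
        # Enqueue arrivals, tail-dropping when the queue is full.
        for _ in range(arrivals_per_tick):
            if len(queue) < queue_depth:
                queue.append(now)
            else:
                dropped += 1
        # Serve up to the link rate.
        for _ in range(min(served_per_tick, len(queue))):
            total_wait += now - queue.popleft()
            delivered += 1
    return delivered, dropped, total_wait / delivered

# Hypothetical sustained overload: 12 packets offered per tick, 10 served.
shallow = simulate(queue_depth=40, arrivals_per_tick=12, served_per_tick=10, ticks=10_000)
deep = simulate(queue_depth=400, arrivals_per_tick=12, served_per_tick=10, ticks=10_000)
print("depth 40: ", shallow)
print("depth 400:", deep)
```

With the offered load permanently above the service rate, both runs deliver the same number of packets and drop almost the same number. The deeper queue only makes every delivered packet wait longer, which is why extra depth helps with bursts but not with sustained overload.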

The big, big question, I believe, remains why is a video camera host being =>sent<= so much traffic?

My expectation would be a video camera should be receiving almost no traffic.  I would also expect a video camera to be sending an expected maximum amount of traffic based on various digital video attributes.  (NB: it looks like even 4k at 60 fps only needs 50 Mbps, so a 100 Mbps port should likely easily handle any single video camera.)

I'm rather curious about the traffic to the camera.

The big, big question, I believe, remains why is a video camera host being =>sent<= so much traffic?

This is where I'm stumped as well. In the screenshots I attached for the bandwidth monitoring in that same post, I noticed that from my core switch, I'm sending 500 Mbps to an edge switch with 3 cameras on it - no viewing station or any other reason for that much traffic to be flowing that direction. 

I'll be back on site today and will look into multicast further as well. The VMS we are using doesn't utilize multicast so IGMP has been turned off. Some of the older cameras do have an option to force multicast, however, so I'll look into that. 

In regards to how the new switch got its config - I started from scratch with a base config that I've built for our network. The old switch was a "testing ground" for a previous engineer and had a lot of unnecessary (or so I thought) lines in there. I haven't wiped that yet though and will review the configs again, just to be sure something wasn't missed. 

The drops are on output, so it is the output from the port connected to the camera that is high, not the input. 

Input rate is normal, friend. 

5 minute input rate 651000 bits/sec, 92 packets/sec

MHM

BTW, many newer Cisco platforms support embedded packet capture, don't recall if that applies to 9300 series.  If not, you might be able to SPAN traffic to a PC running Wireshark.

I have been using the integrated Wireshark capture through the WebGUI but haven't found anything to stand out. It doesn't appear to be flooded with packets. 

One thing I did notice today, which relates to another comment, is that my core switch (9.1) keeps sending ARP requests to 9.2 (which was removed 2 weeks ago when I did this replacement). I did a quick Google search on "Why is my Cisco sending ARP requests to devices off the network" and one answer was that proxy ARP could be configured somewhere. I'm firing up the removed switches now to see if I can find anything in the old config.

BTW, Cisco document on proxy-arp.

That document has been revised relatively recently, and it notes proxy-arp is enabled by default, but I thought later devices have it disabled by default (I certainly can be mistaken about the default, especially as some settings are changed by Cisco over the years).


@dbronco wrote:

I have been using the integrated Wireshark capture through the WebGUI but haven't found anything to stand out. It doesn't appear to be flooded with packets. 

Remember, it's not just what's being sent, but how much.  On your test camera port, what's the traffic composition?
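One way to answer the composition question is to total the bytes per destination type from a capture taken on the camera port. This is a minimal Python sketch that assumes the capture has been exported (for example from Wireshark) as pairs of destination MAC and frame length; the `classify` and `composition` helpers, the camera MAC and the sample rows are all hypothetical.

```python
from collections import Counter

def classify(dst_mac, host_mac):
    """Classify a frame by destination MAC relative to the attached host."""
    dst = dst_mac.lower()
    if dst == "ff:ff:ff:ff:ff:ff":
        return "broadcast"
    # The least significant bit of the first octet marks a group (multicast) address.
    if int(dst.split(":")[0], 16) & 1:
        return "multicast"
    return "unicast to host" if dst == host_mac.lower() else "unicast to other host"

def composition(frames, host_mac):
    """Sum bytes per class for an iterable of (dst_mac, length) pairs."""
    totals = Counter()
    for dst_mac, length in frames:
        totals[classify(dst_mac, host_mac)] += length
    return totals

# Hypothetical sample rows; real ones would come from the capture export.
camera_mac = "00:11:22:33:44:55"
sample = [
    ("01:00:5e:01:02:03", 1400),   # IPv4 multicast
    ("01:00:5e:01:02:03", 1400),
    ("ff:ff:ff:ff:ff:ff", 60),     # broadcast (e.g. ARP)
    ("00:11:22:33:44:55", 90),     # addressed to the camera
    ("00:aa:bb:cc:dd:ee", 1500),   # unicast for someone else (flooded)
]
for kind, size in composition(sample, camera_mac).most_common():
    print(f"{kind:22} {size:6} bytes")
```

A large "multicast" total would point toward multicast flooding, while a large "unicast to other host" total would suggest unknown-unicast flooding instead.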

Attached. It looks like lots of packets are queueing