
Jason Flory · ‎10-21-2013

Hello Everyone

I am having some very strange issues on my 3850 stack. Certain IPs can access all networks locally but cannot access anything on remote networks. These IPs seem to have no relation to each other: 10.2.3.54 has the issue while .49 does not; .20 has the issue while .21 does not.

Another very strange thing is that while the issue is happening, affected clients cannot ping remote gateways that are on the same network. For instance, a client can access all hosts and the gateway at 192.168.100.1 but cannot access the WAN gateway at .235.

So far this has mostly affected clients intermittently, but now 2 servers on a different subnet are affected as well. Their issue is also intermittent and shows exactly the same symptoms.

I ruled out a computer issue by changing known-good machines to use the same suspect IPs, and they all have the same issue. I feel like this is some sort of PBR gone crazy.

Has anyone heard of these issues with 3850s? It just seems like when the issue happens the packets cannot leave the switch.

Colby Beam · ‎11-10-2013

Hi Jason

What version of code are you running on the 3850? I would check to make sure you are running 3.2.3 or 3.3.0 to get around https://tools.cisco.com/bugsearch/bug/CSCug87540

dinesh.thathanath · ‎02-05-2014

Jason,

Looks like I am having the same kind of issues on version 03.02.02.SE.

Some users can work but others within the same subnet cannot. Did the upgrade fix the issues? I also noticed the following errors in syslog.

Feb 5 20:58:07.023: %NGWC_COMMON-1-WDOG_CPUHOG: 1 fed: CPU usage time exceeded 369330000 msecs. -Traceback=1#f01c936775c649e7aee9def72bf33d1d pthread:2EA89000+C450 :10000000+801478 :10000000+7657C0 :10000000+5A8F70 :10000000+9E4E74 :10000000+9DAC98 :10000000+9DB4A0 :10000000+9DBA68 :10000000+9DC300 :10000000+9F2680 ngwcutils:2B5DE000+C024 ngwcutil

The setup worked perfectly fine for 3 months on 03.02.02.SE.

-Dinesh
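For anyone wanting to check collected logs for the same watchdog message, here is a minimal sketch in Python. It assumes the logs are available as plain text lines and that messages follow the usual `%FACILITY-SEVERITY-MNEMONIC` tag layout seen in the line above. The function name `find_cpuhog` is made up for illustration, not part of any Cisco tool.

```python
import re

# Assumed tag layout: %FACILITY-SEVERITY-MNEMONIC, as in %NGWC_COMMON-1-WDOG_CPUHOG
TAG = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+)"
)


def find_cpuhog(lines):
    """Return the lines whose message mnemonic is WDOG_CPUHOG (hypothetical helper)."""
    hits = []
    for line in lines:
        match = TAG.search(line)
        if match and match.group("mnemonic") == "WDOG_CPUHOG":
            hits.append(line)
    return hits


if __name__ == "__main__":
    sample = [
        "Feb 5 20:58:07.023: %NGWC_COMMON-1-WDOG_CPUHOG: 1 fed: CPU usage time exceeded",
    ]
    for hit in find_cpuhog(sample):
        print(hit)
```

A match only tells you the message was logged. Nothing in this thread confirms that it is tied to the connectivity problem.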

Jason Flory · ‎02-05-2014

Hey everyone

Sorry I did not follow up with the resolution. I ended up calling Cisco TAC on this issue. I did not get any responses on the forums for a couple of weeks, but I do see someone essentially said the same thing TAC said, which is that the 3.02.0 code was very buggy. The TAC engineer said he had seen this type of strangeness but that it was not documented as a bug... yet.

What happened to us was that we deployed the 3850s and, like you, had them up and running with no issues except for some PBR problems. A few months later I had users that could not seem to get out of the switch. They could get to anything that was on that switch but could not get to neighboring switches on the same VLAN or exit via the router to external sites. It was very weird. Also, it was just 1 server one day, then another server another day. Then the user stack started having the same issues, starting with 1 user and then spreading.

The upgrade fixed all issues. There is a new version out since I did our upgrades, so you may want to research it. The new version is 3.03.01.

dinesh.thathanath · ‎02-06-2014

Hi Jason,

First of all, thank you for responding so fast. We are going ahead with the upgrade to 3.3.1 in the hope that this will sort out the issues. I am having a hard time convincing myself that this is a switch code level issue.

The only explanation I have is that the 3850s are not responding to ARP queries properly, resulting in such strange behaviour. I will come back and post my story as soon as the upgrade is done.

Thanks again.

-Dinesh

Richard Primm · ‎02-06-2014

Hello Jason and Dinesh,

The symptoms that you are seeing are due to bug CSCug87540, which is a major bug in 3.2.2 and below. I highly recommend getting up to the latest release, which is 3.3.1 at the moment (3.3.2 should be out this month). The fix for CSCug87540 is integrated into 3.2.3, but again, I would recommend moving to the latest stable release. The only workaround for your issue is a reload once it's in the broken state. Also, just a reminder, 3.3.0 is a major release (feature release) which includes new features such as HSRP, 9 switch stacking, and embedded packet capture, to name a few.

If you have any questions or concerns, please feel free to post them here or message me. Thanks

CSCug87540:

3850: traffic L3 routed on 1 switch/member fails for newly added devices

https://tools.cisco.com/bugsearch/bug/CSCug87540/?reffering_site=dumpcr
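For anyone checking several stacks against the releases mentioned above, here is a minimal sketch in Python. It assumes only what this thread states: CSCug87540 affects 3.2.2 and below and is fixed in 3.2.3 and later. The helper names `parse_version` and `is_affected` are made up for illustration. The bug search page remains the authority on affected releases.

```python
import re

# Assumption taken from this thread: the fix is integrated into 3.2.3,
# so anything below that is treated as affected by CSCug87540.
FIXED_IN = (3, 2, 3)


def parse_version(text):
    """Extract (major, minor, patch) from strings such as '03.02.02.SE' or '3.3.1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(text):
    """True if the version is older than the first release carrying the fix."""
    return parse_version(text) < FIXED_IN


if __name__ == "__main__":
    for version in ("03.02.02.SE", "3.2.3", "3.3.1"):
        print(version, "affected" if is_affected(version) else "not affected")
```

A "not affected" result only covers this one bug ID. It does not rule out other causes of similar symptoms.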

dinesh.thathanath · ‎02-09-2014

Hi Rick,

Thanks for the response and clarification on the bug details.

I did the upgrade a few days back and that fixed all the issues I was facing. I upgraded to version 3.3.1.

A couple of questions on the roadmap for the 3850, though.

1. Is there a roadmap for ISSU on the 3850? I have a completely redundant network where all my servers are dual-homed to my 3850 stack. Is there any way I can avoid a full stack reload and minimize the downtime during an upgrade?

2. Is there a roadmap for Flexible NetFlow on VLAN interfaces for both input and output traffic?

Thank you.

Thanks,
Dinesh


Shaun Whitehorse · ‎06-06-2015

Richard, this thread sounds exactly like the issue I've been working on with a customer for weeks. However, their 3850 stacks are running 03.03.04.

dinesh.thathanath · ‎06-14-2015

I am running 03.03.01SE, and since the upgrade I have not had any issues. The uptime on these devices is:

uptime is 1 year, 5 weeks, 5 days, 20 hours

The issue, however, was one of the weirdest I have encountered in my career. Many devices within the same VLAN work while others don't get an ARP response from the 3850s. I am not sure whether you see the same symptoms in Wireshark on the newer code. It's very unlikely for the same bug to be resolved and then reappear in a later release.

HTH

-Dinesh

TCAM · ‎06-10-2015

Hi -

We are running 3.7.1E and are still experiencing a similar connectivity issue. We reloaded multiple times, but that did not fix it. We are planning to downgrade the image to 3.3.5SE.

Thanks

Strange connectivity issues to remote resources on 3850