Network Integrity - Accident in the field

JBrletic
Level 1

Hello,

 

I am in the field on assignment and ran into unexpected behavior that caused a high-visibility outage; thankfully it was temporary. Some workstations and controllers became completely unreachable for about 5 minutes. Given where I am, I unfortunately can't provide much detail, but here is our layout. The core switches use HSRP on the 3 networks shown. For some reason the previous administrator didn't allow the VLANs used by HSRP on the trunk that cross-connects the core switches, so the hello packets have to travel down to each root, around, and back up. That said, spanning tree isn't configured for these 3 VLANs. I was uploading switch configs to our "Core Switch B" L3 switch, and when I uploaded it, all hell broke loose. Here is the order of events...

 

The event occurred at 20:50:00 (hh:mm:ss) and lasted about 5 minutes. Net1 and Net2 were impacted; Net3 was not.

1. Core Switch A was wiped and reloaded. -  No issues found (20:40:00)

2. Repeated step 1 for Core Switch B (~20:44:00)

3. Core Switch B got stuck, had to hard reboot by pulling plug.

4. Tried step 2 again and this time it worked. 

5. As soon as Core B finished its boot process and was uploading, Core A started showing the following errors (20:49:00)

 

%PM4-4-ERR_DISABLE: channel-misconfig (STP) error detected on Gi0/1 putting Gi0/1 in err-disable state (20:49:36)

%PM4-4-ERR_DISABLE: channel-misconfig (STP) error detected on Gi0/2 putting Gi0/2 in err-disable state (20:49:36)

%PM4-4-ERR_DISABLE: channel-misconfig (STP) error detected on Po1 putting Gi0/1 in err-disable state (20:49:36)

%PM4-4-ERR_DISABLE: channel-misconfig (STP) error detected on Po1 putting Gi0/2 in err-disable state (20:49:36)

%PM4-4-ERR_DISABLE: channel-misconfig (STP) error detected on Po1 putting Po1 in err-disable state (20:49:36)

 

%HSRP-5-STATECHANGE: vlan1001 grp101 active state Active -> Speak (20:50:14)

%HSRP-4-DIFFVIP1: vlan1001 grp 101 active router's virtual IP address 192.168.X.1 is different than the locally configured address 192.168.Y.1 (20:15:14)

 

6. Core Switch B didn't finish uploading until 20:59:00.
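
For anyone retracing the sequence above, here is a hedged sketch of standard IOS show commands that could be used to inspect the err-disabled ports, the port channel, and the HSRP state on each core. Nothing here is taken from the actual switches; exact output and command availability vary by platform and IOS version.

```
! Which ports are err-disabled, and for what reason
show interfaces status err-disabled
! Whether automatic recovery is enabled for each err-disable cause
show errdisable recovery
! Port-channel membership and per-port flags (compare both cores)
show etherchannel summary
! Which VLANs are allowed and active on each trunk
show interfaces trunk
! HSRP state, priority, and virtual IP per group
show standby brief
```

Comparing the `show etherchannel summary` and `show interfaces trunk` output from Core A and Core B side by side is usually the quickest way to spot a one-sided port-channel configuration.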

 

In the screenshot below Core A is on left, Core B on the right.

 

HSRP.JPG

 

 

 

 

 

Network.JPG

Some notes:

- Root As are the root bridges for each specific unit (PVST+)

- Root Bs are the backup root bridges for each specific unit (PVST+)

- A forensic investigation showed that Core Switch A uploaded properly while Core Switch B did not. For some reason Core Switch B didn't load the port-channel configuration as expected, which resulted in mismatched settings. Core A had the port channel configured as a trunk with the allowed VLANs. Core Switch B had the port channel, but its configuration was blank.

- Forensic investigation showed 150k input errors and 750k CRC errors.  99% of traffic was multicast. Similar results are shown per port on both Core Switch A & B.

- Spanning tree isn't configured for the 3 networks shown (see first 2 bullets)

- I found an error on the root switches before uploading the cores, but due to access restrictions I couldn't get into those switches to load configs that day. The uplinks from the Roots to the Cores on all networks have BPDU guard enabled, which isn't what I expected. I thought that, worst case, it would shut itself down.

- No errors or logs showed any kind of severe issues on any fanout switches in any unit.
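
To illustrate the port-channel mismatch described in the notes above, here is a minimal sketch of what a matching definition on Core Switch B might look like. The names `Po1`, `Gi0/1`, and `Gi0/2` come from the log messages; the VLAN list placeholder and the `channel-group` mode are assumptions and would have to match whatever Core A actually uses.

```
! Core Switch B (sketch only; replace <vlan-list> with Core A's allowed VLANs)
interface Port-channel1
 switchport mode trunk
 switchport trunk allowed vlan <vlan-list>
!
interface range GigabitEthernet0/1 - 2
 switchport mode trunk
 switchport trunk allowed vlan <vlan-list>
 ! Mode is an assumption: use the same negotiation mode as Core A
 channel-group 1 mode active
```

Some platforms also require `switchport trunk encapsulation dot1q` before `switchport mode trunk`. Ports that were err-disabled generally need a `shutdown` followed by `no shutdown` once both sides agree.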

 

My questions are as follows...

1. Given the behavior, what are some potential causes?

2. How does spanning tree work here? If BPDU guard flags a notification on the root switches, does the entire switch revert to the listening and learning states, or just that port? I am wondering if this is somehow the root cause.

3. What would cause so many CRC and input errors in this situation? We had just uploaded new configs, so the counters went from 0 to 750k in a matter of hours.

4.  How can I prevent this in the future?

5. If I want to allow those 3 VLANs across the trunking port channel between the cores, how do I incorporate spanning tree without messing up the spanning-tree settings on each unit's root bridge?

 

Please be patient, I am relatively green with networking. I am actually a Cyber Security Engineer, but I have done CCNA-level network work on the side here and there over the last year. I was thrust into the spotlight when the actual network administrator resigned.

 

Thanks,

Jon

 

2 Replies

Hello,

 

At the very least, the standby IP addresses for standby group 101/Vlan 1001 do not match. Change these to be the same...

Sorry, they are both 192.168.3.1. I changed the IPs in the post in case people from the site happen to see it.
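
For reference, a minimal sketch of an HSRP group whose virtual IP matches on both cores, using the group number, VLAN, and virtual IP mentioned in this thread. The SVI addresses, subnet mask, and priority value are hypothetical placeholders, not the real configuration.

```
! Core Switch A (SVI address, mask, and priority are assumptions)
interface Vlan1001
 ip address 192.168.3.2 255.255.255.0
 standby 101 ip 192.168.3.1
 standby 101 priority 110
 standby 101 preempt
!
! Core Switch B (SVI address and mask are assumptions)
interface Vlan1001
 ip address 192.168.3.3 255.255.255.0
 standby 101 ip 192.168.3.1
```

With matching `standby 101 ip` statements, `show standby brief` on each core should list the same virtual IP, one switch as Active and the other as Standby.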