MS350 Stacking Issues

brian holmes
Level 1

I have 3 MS350 stacks, 6 switches in each stack, running on 8.10 software.

All stacks have been up and running without issue for the past month or so. No changes have been made on the Meraki switches or the Cisco Nexus 9k's that are upstream.

This morning within an hour all 3 stacks had switches that removed themselves from the stack and stopped forwarding traffic.

Stack 1 - 1 switch disappeared from the stack and stopped forwarding traffic - Did not come online until after a reboot. Uplink port was not on this switch.

Stack 2 - 1 switch disappeared from the stack and stopped forwarding traffic - Did not come online until after a reboot. Uplink port was not on this switch.

Stack 3 - 5 of the switches disappeared from the stack - 1 of the uplinks was on one of the switches that disappeared from the stack. After rebooting the switch with the uplink the 5 switches rejoined the stack. 1 switch in this stack stayed online, that switch has an uplink to the core.

It seems really strange that 3 stacks had the same issue within an hour. Meraki support suggested we hit a bug (8.10) and has requested I upgrade to 9.19.

Has anyone else been having stacking issues?

I typically put in Cisco 4500's or 3850's and these are my first 3 MS350 stack closets. This obviously isn't giving me a very warm and fuzzy feeling to have a failure like this after a month or so.

Curious what other users' experiences have been when using MS350 stacks?

1 Accepted Solution


@Mr_IT_Guy - Upgraded my MS350 stacks to 10.14 a few weeks back and so far they have been solid. The original issue was that nmap scans would cause some switches in an MS350 stack to become unreachable; they would also no longer forward traffic. The only way to bring a switch back online was to reboot it.

Here is the reported fix from the Meraki engineering team.

"The fix was to implement a rate-limiter in hardware to avoid the CPU from being hampered by the scans."
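For anyone wondering what a rate-limiter does in a case like this, here is a minimal software sketch of the general idea (a token bucket). Meraki hasn't shared how the hardware limiter actually works, so the class name, the rate, and the burst size below are all assumptions made purely for illustration.

```python
class TokenBucket:
    """Illustrative limiter: allow `rate` packets/second, bursts up to `burst`.

    This is NOT Meraki's implementation; it only shows the general technique.
    """

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate      # tokens added per second
        self.burst = burst    # maximum tokens the bucket can hold
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # timestamp of the previous packet

    def allow(self, now: float) -> bool:
        """Return True if a packet arriving at time `now` may reach the CPU."""
        elapsed = now - self.last
        self.last = now
        # Refill according to elapsed time, never above the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def simulate_scan(packets: int, interval: float, bucket: TokenBucket) -> int:
    """Count how many evenly spaced packets get past the limiter."""
    passed = 0
    for i in range(packets):
        if bucket.allow(i * interval):
            passed += 1
    return passed


if __name__ == "__main__":
    # Hypothetical numbers: a scan sending 1000 packets at 10,000 pps.
    limiter = TokenBucket(rate=100, burst=10)
    print(simulate_scan(1000, 0.0001, limiter), "of 1000 packets reached the CPU")
```

The point is that a fast scan gets most of its packets dropped before they reach the CPU, while normal low-rate traffic passes untouched.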


34 Replies

BHC_RESORTS
Level 6


The MS line is the one product we don't run in our environments, so my advice is more theoretical than practical I'm afraid.

Did the logs show anything interesting regarding the stacking? Also to be clear, we are talking about physical stacking here, not virtual stacking, correct?

Normally I would blame the upstream Nexus 9ks, as those have more bugs than a Brazilian jungle, but since some of the stacks don't uplink to them, I'm not so sure.

BHC Resorts IT Department

@BHC_RESORTS Yeah, talking about physical stacking.

Nothing in the logs, just all of a sudden certain switches were no longer seen as in the stacks.

We probably have 30+ stacks hanging off the 9k's; the MS350's were the only ones that had any issues.

It did get me wondering if Meraki pushes out patches to switches outside of just standard firmware upgrades?

Strange. Well, the beta firmware is 9.26, and while it isn't always a great idea to run beta in a production environment...the stable track is pretty far behind at this point, so I'd weigh your options. I skimmed through the release notes, and I didn't see this specific issue mentioned, but there are a TON of stability and bug fixes, so you never know. We run two MS2208p, and are on 9.26 with no issues yet. Small sample size, I know.

BHC Resorts IT Department

We have 2 425's stacked. I know they're not the same switches, but if there are any questions about how ours are set up, I'd be happy to check our setup.

We haven't run into any issues since we got it set up correctly.

Make sure Meraki support has put the "secret new firmware" on them. We implemented some of these a few weeks ago as well and had to call support to get the firmware. FYI, your switches will show "up to date" without it.

dayoder
Visitor

I have 2 MS-350 stacks. One stack with 2 switches and the other with 3 switches.

When we first tried deploying these a year ago we had issues. Firmware wasn't out for stacking them back then. We had multiple engineers pulling their hair out over some of the bizarre issues that were going on.

I haven't had any issues over the last 8 months with them though. We are still running MS 8.10.

@dayoder @cfraysier @BHC_RESORTS

OK, so today I again had some of the switches fall out of the stacks, but I think we have been able to track down the issue.

Our security team was running ping sweeps across our network. When the sweep got to the MS stack closets, for some reason it took down the OSPF neighbor relationship from the MS350's to the 9k's.

When the OSPF neighbors went down and then came back up, the MS switches hit a bug. Based on what Meraki support is saying, there is a bug where, when the uplink goes down, other switches in the stack fall out of the stack.

Tonight I am upgrading to 9.19 and Meraki support believes the stacking bugs are fixed in this release.

Hope the update goes smoothly for you and resolves the issue.

let's hope newer is better...



@brian holmes Did support say whether this was only on the MS350's, or does it affect all stacked switches?

Found this helpful? Give me some Kudos! (click on the little up-arrow below)

I believe it's all switches on 8.10.

But it sounded like it only affects switches in the stack that don't have their own uplink. In my case, I have 3 stacks of 6 switches, and each stack has 2 switches with uplinks. The other 4 switches in each stack don't have their own uplink, and it only seems to affect those.

They also mentioned that in general there are a lot of stacking bugs on 8.10.

One of the bugs is actually upgrading the switches! So instead of just scheduling the upgrade they have a very specific process they want me to follow for this upgrade.

1. Power down all but the primary uplink switch.

2. Run the upgrade on the one powered-on switch.

3. Boot up the remaining switches. I'm not sure if they meant all at once or one at a time.

But they mentioned they have had lots of issues when upgrading off 8.10 with stacked switches.
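If it helps to write the sequence down as a checklist before doing it across several stacks, here is a rough sketch that just generates the ordered steps. The function name and switch names are made up for the example, and it lists the power-up steps one per switch without implying that support requires them to be done one at a time (that part was unclear to me).

```python
def staged_upgrade_plan(switches: list[str], primary: str) -> list[str]:
    """Return the ordered steps for the staged upgrade described above.

    `switches` is every member of the stack; `primary` is the switch that
    carries the primary uplink. Both are hypothetical labels, not values
    pulled from any Meraki API.
    """
    if primary not in switches:
        raise ValueError(f"{primary!r} is not a member of the stack")

    others = [name for name in switches if name != primary]

    plan = [f"power down {name}" for name in others]   # step 1
    plan.append(f"upgrade {primary}")                  # step 2
    plan += [f"power up {name}" for name in others]    # step 3
    return plan


if __name__ == "__main__":
    stack = [f"sw{i}" for i in range(1, 7)]  # a 6-switch stack
    for number, step in enumerate(staged_upgrade_plan(stack, "sw1"), start=1):
        print(f"{number}. {step}")
```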

Please please please!!! Let me know how this goes. May need to discuss this with my team in the morning.

Upgrade is complete. I ended up going to beta 9.26 instead of candidate release 9.19.

The tech I talked to tonight said that if you are below 9.2 and going to 9.19, they have had stacked switches hang, but that issue was fixed in 9.25. 9.26 has since been released, so he advised going to that to avoid any of the issues with the switches locking up.
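One thing that is easy to trip over with these version numbers: read as decimals, 9.19 looks smaller than 9.2, but compared part by part, 9.2 comes before 9.19, which comes before 9.25 and 9.26. Here is a small sketch of that comparison, assuming the firmware versions are plain dot-separated numbers (an assumption on my part; nothing in this thread spells out Meraki's numbering rules).

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Split a version such as '9.19' into integer parts: (9, 19)."""
    return tuple(int(part) for part in text.split("."))


def is_older(a: str, b: str) -> bool:
    """Return True if version `a` sorts before version `b`, part by part."""
    return parse_version(a) < parse_version(b)


if __name__ == "__main__":
    versions = ["9.26", "8.10", "9.19", "9.2", "9.25"]
    # Sort by integer parts, not as strings or floats.
    print(sorted(versions, key=parse_version))
    print("8.10 is below 9.2:", is_older("8.10", "9.2"))
```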

When I started the upgrade, all switches started flashing white on the front while they were downloading the image. After about 35 minutes of that they rebooted. The reboot was quick; I would say the stacks were back up and passing traffic in about 3 minutes.

I hadn't seen this before, but I was able to watch the upgrade process until reboot on the HTTP status page. I got a little nervous when all of the switches were hung at 66% for about 20-25 minutes. But eventually they jumped to 100% and rebooted.

I guess the real test will be over the next few days.

Thanks for the update and all the detail. I'd have been nervous too during that long upgrade process.

This is something I'm going to be doing in the near future. We have random dropped uplinks from time to time. Let us know if it fixes your issues.