
the messaging layer was unable to deliver the stimulus (rejected, peer is not ready)

KYAW THURA
Level 1

Any advice on how to resolve this issue? There was a bug ID, but no workaround or resolution provided.

Single APIC cluster broken after upgrading to 1.1(2h).
Bug ID: CSCuw34092
6 Replies

dpita
Cisco Employee

I know that bug. What happened is that when upgrading a single-APIC cluster to that version, a new version check was added. If you run "acidiag avread" you will see that the cluster version and the APIC version are different.
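If you want to confirm you are hitting it, a quick check from the APIC CLI looks roughly like the lines below; the exact field names in the avread output differ between releases, so treat this as a sketch:

# Dump the appliance vector and compare the version recorded for the cluster
# with the firmware version the APIC itself is running; in the CSCuw34092
# condition the two do not match.
acidiag avread | grep -i version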

Regrettably, the solution is to wipe the fabric; hopefully you have a recent config export. Please keep in mind that running a single APIC in production is not recommended or supported.

To wipe the fabric, use the following (a rough step-by-step sketch follows below):

- APIC: eraseconfig setup

- Switch: setup-clean-config.sh

- Run through fabric discovery again
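Putting those together, the order of operations is roughly as follows. This is only a sketch based on the commands above, and the exact clean/wipe commands can vary between releases, so double-check against the documentation for your version:

# 1. On each leaf/spine switch, clear the configuration and reload so the
#    switch comes back up ready to be rediscovered.
setup-clean-config.sh
reload

# 2. On the APIC, erase the configuration; after the reboot you land in the
#    initial setup script (fabric name, TEP pool, infra VLAN, and so on).
eraseconfig setup

# 3. Once the APIC is back up, re-register the leaves and spines through
#    fabric discovery (Fabric > Inventory > Fabric Membership).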

Hope that helps!

Kyaw,

If you would like to confirm whether you are hitting the bug and see if there is any way to recover the configuration, I recommend opening a TAC case, as they may be able to help. Running a one-APIC cluster in production is not supported, so it's always recommended to schedule regular configuration backups if you are only using one APIC.
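If it helps, a configuration export can also be triggered through the APIC REST API and scheduled from there. The snippet below is only a rough sketch: the policy name "dailyBackup" is made up, and the configExportP attribute names are from memory, so verify them with the API Inspector on your release:

# Authenticate to the APIC REST API and store the session cookie.
curl -sk -c cookie.txt -X POST https://<apic>/api/aaaLogin.json \
  -d '{"aaaUser":{"attributes":{"name":"admin","pwd":"<password>"}}}'

# Create/trigger an on-demand configuration export policy (hypothetical name
# "dailyBackup"); point it at a remote location or snapshot as appropriate.
curl -sk -b cookie.txt -X POST \
  https://<apic>/api/node/mo/uni/fabric/configexp-dailyBackup.json \
  -d '{"configExportP":{"attributes":{"name":"dailyBackup","format":"json","adminSt":"triggered"}}}'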

If this happens to a production cluster of three, what would be the backout process? Configuration backups exist; would the best process be to blow the invalid controller away and re-join it to the cluster? That still leaves the question of the other nodes, which would be midway through the upgrade process.

Thanks dpita for your advice.

It's not a production environment, so I'm OK with wiping the config and building the fabric again from a clean state.

But what I'm a bit unclear on is that "eraseconfig setup" and "setup-clean-config.sh" wipe only the setup and user configurations. The issue was encountered when the APIC and switches were upgraded to version 1.1(2h). If the error is related to the software itself, will wiping the configuration really resolve it?

Thanks again!

That's a good question!

To clarify, "eraseconfig setup" on the APICs wipes the entire user configuration as well as the fabric configuration, and when the APIC reboots it drops you off at the setup script again. This means you need to enter the fabric name, TEP range, infra VLAN, etc. all over again; this is essentially building a new fabric from scratch. The same goes for the switches: once they are wiped of user config and the APICs are clean, the switches need to be rediscovered as if it were a brand-new fabric out of the box.

The reason the clean config works is that the APIC starts out at 1.1(2h) and the cluster starts out at 1.1(2h), so there is no mismatch of versions. Your bug only occurs when upgrading from older code to 1.1(2h), which leaves a mismatch between the cluster version and the APIC version.
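As a sanity check after the rebuild (hedged, since the output format differs between releases), you can confirm the versions now agree and the services are healthy before restoring your config export:

# Confirm the APIC firmware version and the cluster-recorded version match.
acidiag avread | grep -i version

# Check that the APIC data-layer services report healthy replicas.
acidiag rvread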

Hope that helps! What other questions do you have?

Hi Daniel!

Big fan of yours!

I was trying something with our test-bed fabric (we are still on a 1.3 version; loooong story), and for some reason we did not have the 3rd APIC connected to the fabric. I decommissioned the 1st APIC from the 2nd APIC, and after that any policy creation from the 2nd APIC showed this same error. I re-commissioned the 1st APIC, and things started working normally.
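In case it helps anyone else, a rough way to confirm that every commissioned APIC is fully fit before decommissioning a peer is to query the cluster view locally from the APIC CLI; the infraWiNode class name here is from memory, so verify it with the API Inspector on your release:

# Query each controller's view of the cluster membership; every commissioned
# APIC should report a healthy (fully fit) state before you decommission one.
icurl 'http://localhost:7777/api/class/infraWiNode.json' | python -m json.tool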

 

Attaching an image: Daniel reply.JPG

Just wanted to share.
