I administer a network of approximately fifty SG300 series switches, and I recently upgraded the firmware on all of them from 184.108.40.206 to 220.127.116.11. I ran into a problem similar to -- but not exactly the same as -- the one the original poster described. Because I have a large network of diverse SG300 series models, and because I can reliably reproduce the problem I describe below, I believe there is a flaw in firmware version 18.104.22.168, and I will file a bug report with Cisco.
Prior to the upgrade, all of my switches were running firmware version 22.214.171.124. I used both MD5 and SHA1 to verify that the 126.96.36.199 image I downloaded from Cisco's site arrived intact. The transfer of the updated firmware image (188.8.131.52) to all fifty of my switches completed without errors. After uploading the new image, I marked the firmware slot containing version 184.108.40.206 to be active on the next boot, and then rebooted the switches. All switches successfully booted to version 220.127.116.11; I know this because I SSHed to each switch immediately after the upgrade completed and ran "show version".
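For anyone repeating those two checks, here is a minimal Python sketch of how the checksum verification and the post-upgrade "show version" sweep could be scripted. The image file name, expected digests, credentials, and switch addresses are placeholders rather than values from this post, and the SSH part assumes the switch will honor a one-shot exec channel, which not every small-business switch does.

#!/usr/bin/env python3
# Sketch: verify the downloaded firmware image against published digests,
# then confirm each switch reports the expected version over SSH.
# File name, digests, credentials, and host list are placeholders.
import hashlib
import paramiko

IMAGE = "sx300_firmware.ros"                   # hypothetical file name
EXPECTED_MD5 = "<md5 from Cisco's download page>"
EXPECTED_SHA1 = "<sha1 from Cisco's download page>"
SWITCHES = ["10.0.0.11", "10.0.0.12"]          # placeholder management IPs

def digest(path, algo):
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

assert digest(IMAGE, "md5") == EXPECTED_MD5, "MD5 mismatch"
assert digest(IMAGE, "sha1") == EXPECTED_SHA1, "SHA1 mismatch"

for host in SWITCHES:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username="admin", password="changeme", timeout=10)
    _, stdout, _ = client.exec_command("show version")
    print(host, stdout.read().decode().strip())
    client.close()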
However, within one week of the firmware upgrade from version 18.104.22.168 to version 22.214.171.124, some of the switches stopped responding to all IP services -- SSH, ping, logging, SNMP, etc. were completely unresponsive, although in most cases the switches would still forward user traffic according to the running configuration. Even the physical serial console on these "crashed" switches was completely unresponsive. Power cycling the affected switches was the only fix, but several of them became unresponsive again a few days later. After twelve of the SG300 series switches on my network had been affected by this issue, I decided to roll all of them back to the version we had been running previously, which was version 126.96.36.199. This is where it got interesting...
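Because the failures appeared piecemeal over several days, a periodic reachability sweep is one way to spot a switch in this state early. The sketch below only tests ICMP (just one of the services that went dead), the host list is a placeholder, and it assumes a Linux-style ping that supports the -c and -W flags.

#!/usr/bin/env python3
# Sketch: ping sweep to flag switches that have stopped answering IP services.
# Host list is a placeholder; assumes a Linux-style ping with -c/-W flags.
import subprocess

SWITCHES = ["10.0.0.11", "10.0.0.12"]  # placeholder management IPs

for host in SWITCHES:
    result = subprocess.run(["ping", "-c", "2", "-W", "2", host],
                            stdout=subprocess.DEVNULL)
    status = "up" if result.returncode == 0 else "NOT RESPONDING"
    print(f"{host}: {status}")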
On three of the affected switches, I monitored the boot process using the serial console. After a power cycle, version 188.8.131.52 would boot without reporting any errors or anomalies at all. I would then mark the firmware slot containing version 184.108.40.206 to be active on the next boot, and reboot the switch using the "reload" command. When the old firmware version booted, I always got the following error message (even on switches that had not crashed):
Size 115 of passive image exceeds the maximum supported size 114.
The passive image is deleted...............
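Messages like this only appear on the serial console during boot and are easy to lose, so it can help to log the console while the switch reloads. Below is a minimal pyserial sketch; the device path, baud rate, and capture window are assumptions and will vary with the console cable and setup.

#!/usr/bin/env python3
# Sketch: log the serial console during a reboot so boot-time messages
# (such as the passive-image error above) end up in a file.
# Device path, baud rate, and capture window are assumptions.
import time
import serial  # pyserial

with serial.Serial("/dev/ttyUSB0", 115200, timeout=1) as console, \
        open("boot-capture.log", "ab") as log:
    deadline = time.time() + 300  # capture for five minutes
    while time.time() < deadline:
        data = console.read(4096)
        if data:
            log.write(data)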
What that error message means is that the "passive image" (the image slot containing version 220.127.116.11) was corrupt because it was one byte/block/whatever too large. The old firmware version would detect this and then delete the slot containing firmware version 18.104.22.168. Running "show bootvar" after this took place showed "N/A" in the slot that had previously contained 22.214.171.124.
Because the original poster essentially witnessed the same behavior via the web interface -- they were running 126.96.36.199, copied 188.8.131.52 to the passive firmware slot on the switch, rebooted the switch (without setting the slot containing the new version to be active on reboot), and then found the passive slot empty ("N/A") after the reboot -- I am assuming that we are in the same boat here. My solution was to treat version 184.108.40.206 as flawed and revert to running version 220.127.116.11 on all of my switches. Since I did this, none of my switches has become unresponsive.
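For anyone trying to gauge how widespread the passive-image deletion is on their own network, one option is to sweep "show bootvar" over SSH and flag any switch whose output shows "N/A". The sketch below reuses the same placeholder credentials and host list as above and again assumes the switch accepts a one-shot SSH exec channel.

#!/usr/bin/env python3
# Sketch: run "show bootvar" across the fleet and flag switches whose
# passive image slot has been wiped (reported as "N/A").
# Credentials and host list are placeholders.
import paramiko

SWITCHES = ["10.0.0.11", "10.0.0.12"]  # placeholder management IPs

for host in SWITCHES:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username="admin", password="changeme", timeout=10)
    _, stdout, _ = client.exec_command("show bootvar")
    output = stdout.read().decode()
    client.close()
    if "N/A" in output:
        print(f"{host}: passive image slot is EMPTY")
    else:
        print(f"{host}: both image slots populated")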
Note that this issue affected twelve of my fifty switches within a 10-day window. It affected several models (SG300-10, SG300-10MPP, SG300-10SFP, SG300-28MP) and several hardware revisions (V02, V03, and V05, with V02 and V05 seeming particularly susceptible). I plan to submit a bug report to Cisco, including a step-by-step account of everything I did that led to my conclusion that 18.104.22.168 is flawed.