Solved: Cisco SG300-52MP connectivity problem

Nikolay Pertsev · 11-13-2013

Hello,

I have a Cisco SG300-52MP switch. Recently the device rebooted itself and sent the following message to the remote log server after bootup.

Nov 10 09:12:51 switch %Box-F-INVALID-PARAM-SETTING: Function BOXG_poe_i2c_read_mem_byte: invalid param recv_byte_PTR value = 0 ***** FATAL ERROR ***** Reporting Task: HCPT. Software Version: 1.2.9.44 (date 30-Sep-2012 time 01:33:07) 0x16a7a4 0x1671e4 0x6596d0 0x433c04 0x4390d0 0x4392b0 0x8f8c84 0x907320 0x8e92cc 0x8e96e4 0x8e98b4 0x8ec8b8 0x8ed9c4 0x8e2cb8 0x8f0990 0x8c9d64 0x8ca8c4 0x8b6674 0x8b71e8 0x121d9c

I decided to upgrade the firmware as a preventive measure to keep this from happening again. I did so, and the switch is now running firmware version 1.3.0.62.

Now I have a different problem with this switch (it started right after the upgrade). Periodically the switch's management interface becomes unresponsive. I ran a ping test, and the following is its output while the problem is happening:

64 bytes from switch (192.168.1.1): icmp_seq=3567 ttl=63 time=1.09 ms
64 bytes from switch (192.168.1.1): icmp_seq=3568 ttl=63 time=1.12 ms
64 bytes from switch (192.168.1.1): icmp_seq=3569 ttl=63 time=1.06 ms
64 bytes from switch (192.168.1.1): icmp_seq=3570 ttl=63 time=1.11 ms
64 bytes from switch (192.168.1.1): icmp_seq=3571 ttl=63 time=29538 ms
64 bytes from switch (192.168.1.1): icmp_seq=3572 ttl=63 time=28539 ms
64 bytes from switch (192.168.1.1): icmp_seq=3573 ttl=63 time=27539 ms
64 bytes from switch (192.168.1.1): icmp_seq=3574 ttl=63 time=26540 ms
64 bytes from switch (192.168.1.1): icmp_seq=3575 ttl=63 time=25540 ms
64 bytes from switch (192.168.1.1): icmp_seq=3576 ttl=63 time=24541 ms
64 bytes from switch (192.168.1.1): icmp_seq=3577 ttl=63 time=23542 ms
64 bytes from switch (192.168.1.1): icmp_seq=3578 ttl=63 time=22542 ms
64 bytes from switch (192.168.1.1): icmp_seq=3579 ttl=63 time=21542 ms
64 bytes from switch (192.168.1.1): icmp_seq=3580 ttl=63 time=20543 ms
64 bytes from switch (192.168.1.1): icmp_seq=3581 ttl=63 time=19543 ms
64 bytes from switch (192.168.1.1): icmp_seq=3582 ttl=63 time=18544 ms
64 bytes from switch (192.168.1.1): icmp_seq=3583 ttl=63 time=17545 ms
64 bytes from switch (192.168.1.1): icmp_seq=3584 ttl=63 time=16545 ms
64 bytes from switch (192.168.1.1): icmp_seq=3585 ttl=63 time=15545 ms
64 bytes from switch (192.168.1.1): icmp_seq=3586 ttl=63 time=14546 ms
64 bytes from switch (192.168.1.1): icmp_seq=3587 ttl=63 time=13547 ms
64 bytes from switch (192.168.1.1): icmp_seq=3588 ttl=63 time=12547 ms
64 bytes from switch (192.168.1.1): icmp_seq=3589 ttl=63 time=11548 ms
64 bytes from switch (192.168.1.1): icmp_seq=3590 ttl=63 time=10548 ms
64 bytes from switch (192.168.1.1): icmp_seq=3591 ttl=63 time=9549 ms
64 bytes from switch (192.168.1.1): icmp_seq=3592 ttl=63 time=8549 ms
64 bytes from switch (192.168.1.1): icmp_seq=3593 ttl=63 time=7550 ms
64 bytes from switch (192.168.1.1): icmp_seq=3594 ttl=63 time=6550 ms
64 bytes from switch (192.168.1.1): icmp_seq=3595 ttl=63 time=5551 ms
64 bytes from switch (192.168.1.1): icmp_seq=3596 ttl=63 time=4551 ms
64 bytes from switch (192.168.1.1): icmp_seq=3597 ttl=63 time=3552 ms
64 bytes from switch (192.168.1.1): icmp_seq=3598 ttl=63 time=2552 ms
64 bytes from switch (192.168.1.1): icmp_seq=3601 ttl=63 time=35.4 ms
64 bytes from switch (192.168.1.1): icmp_seq=3602 ttl=63 time=1.11 ms
64 bytes from switch (192.168.1.1): icmp_seq=3603 ttl=63 time=1.04 ms
64 bytes from switch (192.168.1.1): icmp_seq=3604 ttl=63 time=1.11 ms
64 bytes from switch (192.168.1.1): icmp_seq=3605 ttl=63 time=1.07 ms
64 bytes from switch (192.168.1.1): icmp_seq=3606 ttl=63 time=1.10 ms

Meanwhile, all the connected equipment (30 active ports) is functioning well without network delays, so the problem affects only the management interface.

Could anyone advise me on why this is happening and what can be done to resolve it?

Tom Watts · 11-13-2013

Hi Nikolay, it is not characteristic of this switch to have a latency issue on the management IP. It is possible the switch software has a blip and may just need a factory default. It could also be some other problem, such as latency through the switch's default gateway or some sort of odd negotiation issue.

I'm not sure it's really possible to tell you what's wrong, but if you want to determine once and for all whether the switch is at fault, you can disconnect it from the network and run a constant ping to the management IP. If it has the same latency, my money is on the software having a blip, and it probably needs a default. If it clears up, I'd guess you have a weird routing problem or a physical wiring issue somewhere.

-Tom
Please mark answered for helpful posts
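
Editor's sketch: when running a constant ping as suggested above, it can help to pull the slow replies out of the output automatically. The Python below is a minimal illustration, not part of the original exchange. The function name `slow_episodes` and the 1000 ms threshold are assumptions chosen for this example, and the line format is the one shown in the first post.

```python
import re

# Matches reply lines in the format shown in this thread, e.g.
# "64 bytes from switch (192.168.1.1): icmp_seq=3571 ttl=63 time=29538 ms"
REPLY = re.compile(r"icmp_seq=(\d+).*?time=([\d.]+) ms")

def slow_episodes(lines, threshold_ms=1000.0):
    """Group consecutive slow replies into (first_seq, last_seq) episodes.

    threshold_ms is a hypothetical cutoff; pick one well above normal RTT.
    """
    episodes = []
    current = None
    for line in lines:
        m = REPLY.search(line)
        if not m:
            continue  # skip anything that is not a reply line
        seq, rtt = int(m.group(1)), float(m.group(2))
        if rtt >= threshold_ms:
            # extend the running episode, or start a new one
            current = (current[0], seq) if current else (seq, seq)
        elif current:
            episodes.append(current)
            current = None
    if current:
        episodes.append(current)
    return episodes

sample = [
    "64 bytes from switch (192.168.1.1): icmp_seq=3570 ttl=63 time=1.11 ms",
    "64 bytes from switch (192.168.1.1): icmp_seq=3571 ttl=63 time=29538 ms",
    "64 bytes from switch (192.168.1.1): icmp_seq=3598 ttl=63 time=2552 ms",
    "64 bytes from switch (192.168.1.1): icmp_seq=3601 ttl=63 time=35.4 ms",
]
print(slow_episodes(sample))  # [(3571, 3598)]
```

If ping sends one request per second (an assumption about the ping tool in use), the difference between the starting sequence numbers of two successive episodes approximates the time between them in seconds.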

Nikolay Pertsev · 11-14-2013

Hi Tom, and thank you for your response.

Tom Watts wrote:

If it clears up, I'd guess you have a weird routing problem or a physical wiring issue somewhere.

Regarding the quote above: I doubt this is the case, because this switch is connected to the rest of the network by 2 aggregated links going to 2 different switches (with spanning tree enabled), 2 cables each, which gives 4 uplink cables. In addition, I ran two ping tests simultaneously: one to the management interface, and a second one to a computer directly connected to this switch. When the problem occurred on the management interface, the ping delay to the connected computer stayed stable at around 0.15 ms. I think this is enough to be sure there is no physical or routing problem (the whole network is served by one Cisco Catalyst switch in L3 mode acting as the router). If this were a physical or routing problem, I would expect to see the same problem somewhere else.

So I would like to follow your advice about resetting the switch to the default config. I will do so once I am able to schedule network downtime, and I will let you know the results.

One more question: do you know anything about the first error, which I quoted in my first post?

%Box-F-INVALID-PARAM-SETTING: Function BOXG_poe_i2c_read_mem_byte: invalid param recv_byte_PTR value = 0 ***** FATAL ERROR *****

Best,

Nikolay

Tom Watts · 11-14-2013

Hi Nikolay, I do not know the significance of the error. I think it has to do with a memory pointer receiving an invalid byte value.

-Tom
Please mark answered for helpful posts

Nikolay Pertsev · 11-19-2013

Hi, Tom.

I reset the switch to the factory default configuration. The problem still persists. I monitored the switch some more and discovered that the latency problem occurs every 17 minutes. That sounds really strange: what happens every 17 minutes? I will try playing with the switch configuration to see if I can find the root of the problem.

Regarding the ping test I provided in the first post: when the problem occurs, the ping results 'freeze' with no change on the screen, so packets are going to the switch but not coming back. After a 30-second delay, all the packets that were sent come back at the same moment (the packets with delays from 29538 ms down to 35.4 ms). It looks like the switch accumulates them for 30 seconds and then suddenly sends them back as one batch. Maybe this will give someone an idea that helps me resolve the issue.
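
Editor's sketch: the "one batch" reading can be checked with simple arithmetic on the ping output from the first post. If requests leave one second apart (an assumption about the ping interval), each reply's arrival time is its send offset plus its round-trip time. The sample pairs below are copied from that output; the function name `arrival_offsets` is made up for this illustration.

```python
# (icmp_seq, rtt_ms) pairs copied from the ping output in the first post
samples = [(3571, 29538), (3572, 28539), (3585, 15545), (3598, 2552)]

INTERVAL_MS = 1000  # assumption: ping sent one request per second

def arrival_offsets(samples, interval_ms=INTERVAL_MS):
    """Arrival time of each reply, in ms after the first sampled request was sent."""
    first_seq = samples[0][0]
    return [(seq - first_seq) * interval_ms + rtt for seq, rtt in samples]

offsets = arrival_offsets(samples)
print(offsets)                      # [29538, 29539, 29545, 29552]
print(max(offsets) - min(offsets))  # 14
```

Requests sent 27 seconds apart were answered within about 14 ms of each other, which is consistent with the switch holding the replies and releasing them together.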

Tom Watts · 11-19-2013

So, if you remove the switch from the network and ping it for 17 minutes, does it have the same problem?

-Tom
Please mark answered for helpful posts

Nikolay Pertsev · 11-19-2013

I found what was causing the problem.

Using Wireshark, I found that every 17 minutes the switch syncs time with the NTP server. I had set the NTP server's address as an IP address, not a hostname. So every time an NTP sync happens, the switch requests the PTR record for the NTP server's IP address. The switch makes two requests to the DNS server:

the first looks like this: 2.1.168.192.in-addr.arpa.domain

the second looks like this: 2.1.168.192.in-addr.arpa

To the first request the switch gets a "not found" answer. Since this was the ONLY thing going wrong, it happened at exactly the same time as the latency problem, and everything else was perfect, I thought it might be connected to my problem, so I decided to set the NTP server's address as a hostname. The DNS name resolves to the IP correctly, and the switch no longer gets any 'not found' answers from the DNS server. The switch has now been running for 1 hour without problems, which makes me believe the problem is gone.

I think the switch should not behave like this. It must be able to handle a 'not found' answer from DNS without freezing the management interface. But anyway... the problem is solved and I am happy with this.
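
Editor's sketch: the two query names described above follow the standard reverse-lookup form under `in-addr.arpa`, first with a search suffix appended and then without. The Python below only illustrates how such names are built; it is not the switch's implementation. The address 192.168.1.2 is inferred from the names quoted in the post, `domain` is the same placeholder suffix used there, and the function name `ptr_query_names` is made up for this example.

```python
import ipaddress

def ptr_query_names(ip, search_domain):
    """Return the PTR names in the order described in the post:
    first with the search domain appended, then the bare reverse name."""
    base = ipaddress.ip_address(ip).reverse_pointer
    return [f"{base}.{search_domain}", base]

print(ptr_query_names("192.168.1.2", "domain"))
# ['2.1.168.192.in-addr.arpa.domain', '2.1.168.192.in-addr.arpa']
```

Only the second name lies in the reverse zone, so a "not found" answer to the first query is what a DNS server would normally return.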

Tom Watts · 11-20-2013

Fascinating, that means this may be related:

https://supportforums.cisco.com/thread/2221597

Edit:

There is a known bug for IPv6 where the management of the switch hangs in an IPv6 environment. So I suspect your problem extends beyond IPv6.

CSCuh50141

-Tom
Please mark answered for helpful posts

Reuben Farrelly · 11-21-2013

Tom - I was the customer who logged CSCuh50141 with TAC. The problem was specifically relating to IPv6 NTP causing the management to lock up - no other IPv6 traffic caused any problems. This bug was fixed in 1.3.5.58.

Nikolay - can you try this release and see if it still occurs? Chances are if you log a case they'll suggest you upgrade first anyway.

Nikolay Pertsev · 11-22-2013

Reuben,

Thank you for participating. I uploaded the 1.3.5.58 firmware update to the switch, but I will have to wait until the end of the day before I can restart it. I will check and let you know the results.

By the way, I also saw there is an update for the boot code. Could you explain (or give me a link that explains) whether it is necessary to upgrade the switch's boot code and what I will gain from it?

Thank you,

Nikolay.

Tom Watts · 11-22-2013

Hi Reuben, you are correct: it may not be the same as the IPv6 bug, because that one completely froze the switch. According to the development team, some other anomalies related to NTP were resolved in the 1.3.5 software. Currently this is being investigated as a separate issue.

Suffice to say, anything I hear I'll relay here.

Nikolay, when upgrading to the 1.3.5 software, if you upload it without the new boot code, it should give an error such as "unknown software version" or something of that nature. You can load the boot code through TFTP. It's essentially the same process as a firmware load.

Check here for some information

https://supportforums.cisco.com/thread/2252523

-Tom
Please mark answered for helpful posts

Nikolay Pertsev · 11-22-2013

Hi, Tom.

Thank you for the explanation about the boot code. Even though I had the impression it was optional, I uploaded the boot code upgrade too, via TFTP. By explaining the importance of the boot code upgrade, you saved me a lot of trouble with future updates.

Thank you.

Nikolay Pertsev · 11-23-2013

Hi Reuben,

I upgraded the switch software to version 1.3.5 last night. The switch worked perfectly overnight and continues to work well now.

I didn't do extensive testing with Wireshark and so forth, but I played with the NTP settings: first I set the NTP server's address as an IP, and it was fine; after that I set it as a DNS name, and again it was fine. I think the problem is fully solved in this update.

Nikolay

Tom Watts · 11-23-2013

Hi Nikolay, this bug was reproduced by the development and engineering teams and is confirmed fixed in 1.3.5. If you observe any anomalies, please share them and we can look into it further.

-Tom
Please mark answered for helpful posts

Nikolay Pertsev · 11-23-2013

I appreciate your help, Tom.

Good to hear it was fixed!

If I spot another bug, I will report it.

Nikolay