This document briefly summarizes the 828 days problem often observed in CSS and CSM.
It explains in detail how the 828 days problem occurs and how to avoid it.
CSS and CSM are based on 32-bit VxWorks.
The VxWorks clock is counted at 60 Hz (it increases by one every 1/60 of a second). You can read the value with the tickGet() API. For example, if you call tickGet() five seconds after booting, it returns 0x12c (60Hz * 5sec = 300). CSS and CSM refer to this value for various purposes.
Because tickGet() is a 32-bit counter, it can hold 2^32 = 4294967296 distinct values, and its maximum value is 2^32 - 1 = 4294967295 (0xffffffff). When the counter passes this maximum, it wraps and is reset to 0.
In other words, the tickGet() value is reset to 0 after 828 days and 12 hours of uptime, according to the following formula.
2^32 / (60Hz*60sec*60min*24hr) = 828.5days
Therefore, various problems can occur once the uptime reaches 828 days.
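The arithmetic behind the 828.5-day figure can be checked with a short C sketch. It assumes only the 60 Hz tick rate and 32-bit counter width stated in this document; the function names are hypothetical and are not part of VxWorks or of the CSS/CSM software.

```c
#include <stdint.h>

#define TICKS_PER_SECOND 60u /* tick rate stated in this document */

/* Ticks counted after `seconds` of uptime, ignoring wrap-around. */
uint64_t ticks_after_seconds(uint64_t seconds) {
    return seconds * TICKS_PER_SECOND;
}

/* Whole seconds of uptime before a 32-bit tick counter wraps to 0. */
uint64_t seconds_until_wrap(void) {
    return (UINT64_C(1) << 32) / TICKS_PER_SECOND;
}

/* Splits the time until wrap-around into whole days and leftover hours. */
void wrap_days_hours(uint64_t *days, uint64_t *hours) {
    uint64_t s = seconds_until_wrap();
    *days = s / 86400u;           /* 86400 seconds per day */
    *hours = (s % 86400u) / 3600u; /* remaining whole hours */
}
```

Calling ticks_after_seconds(5) gives 300 (0x12c), and wrap_days_hours() yields 828 days and 12 hours, which matches the formula above.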
Let us explain this problem in a bit more detail, using the keepalive feature as an example. CSS sends keepalives regularly.
By default, CSS sends ICMP packets to a service every five seconds as an availability check.
Within CSS, the next transmission time is calculated from the current time and the keepalive interval (five seconds, 0x12c ticks; next_keepalive = tickGet() + 0x12c).
For example, if a keepalive is sent 3600 seconds after booting, the next ICMP packets are due 3605 seconds after booting.
Once the value returned by tickGet() exceeds the tick count that corresponds to 3605 seconds (tickGet() > next_keepalive), the keepalive packets are sent.
If the tickGet() value is 0xfffffff0, next_keepalive is set to 0x10000011c (0xfffffff0 + 0x12c). However, the maximum value of tickGet() is 2^32 - 1 = 0xffffffff. When the counter passes this maximum, it is reset to 0, while next_keepalive remains 0x10000011c.
In this case, the condition tickGet() > next_keepalive never becomes true, and thus CSS stops sending keepalive packets.
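The failure described above can be reproduced with a small C model. This is only a sketch under stated assumptions, not the actual CSS code: it assumes the deadline is kept in a variable wider than 32 bits (so it can hold 0x10000011c, as the example implies), and all function names are hypothetical. The last function shows a common wrap-safe idiom (comparing elapsed ticks with unsigned subtraction) for contrast; this document does not state that CSS/CSM uses it.

```c
#include <stdbool.h>
#include <stdint.h>

#define KEEPALIVE_INTERVAL_TICKS 0x12cu /* 5 seconds at 60 Hz */

/* Deadline as described in the text: current tick plus interval.
 * Assumption: the result is not truncated to 32 bits. */
uint64_t next_keepalive_naive(uint32_t now) {
    return (uint64_t)now + KEEPALIVE_INTERVAL_TICKS;
}

/* Check from the text: send when tickGet() > next_keepalive.
 * `now` stands in for the 32-bit value returned by tickGet(). */
bool keepalive_due_naive(uint32_t now, uint64_t next_keepalive) {
    return now > next_keepalive;
}

/* Wrap-safe alternative (general idiom, hypothetical here):
 * unsigned subtraction is modulo 2^32, so the elapsed tick count
 * stays correct across a single wrap of the counter. */
bool keepalive_due_wrapsafe(uint32_t now, uint32_t last_sent) {
    return (uint32_t)(now - last_sent) >= KEEPALIVE_INTERVAL_TICKS;
}
```

With now = 0xfffffff0, next_keepalive_naive() returns 0x10000011c, and keepalive_due_naive() stays false for every possible 32-bit tick value, including 0xffffffff. The wrap-safe check, given last_sent = 0xfffffff0, becomes true when the wrapped counter reaches 0x11c, which is exactly 0x12c ticks later.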
Changing the base OS from 32-bit to 64-bit would require significant changes in CSS/CSM, which run on that OS. Therefore, we have decided not to upgrade the base OS.
Instead, many individual bugs that would have taken effect after 828 days have been corrected in both CSS and CSM.
The root problem, however, remains. Therefore, we suggest that you reboot CSS/CSM before 828 days have elapsed.
Note: End of SW Maintenance Releases Date: September 20, 2012
Also, some of the reported failures were analyzed and determined to be impossible to correct.
To avoid these problems, it is recommended that CSS/CSM be rebooted every two years.
When the CSS reaches an uptime of 828 days, it cannot send packets to the management port for 18 minutes. This issue affects the management port only. The circuit and VIP addresses work fine. We recommend that you reboot the CSS before its uptime reaches 828 days.