I have a Cisco Catalyst 4006 switch with the 4232-L3 Layer 3 routing module installed, and a 1-gigabit FX module installed on the G1 interface of the routing module.
The system hangs frequently (every 1 or 2 days) at different times; sometimes it hangs at night when there is no heavy traffic, sometimes during working hours!! When I restart the system it works fine, and then it hangs again after some time.
I also feel that the whole network has become slower than before. I am not sure whether this is related to HW or SW, and I don't know how to even begin tracing the problem.
Any help!!
Could you give us a bit more information on the role of this switch to help? Where does this switch sit in your network topology? Is it a core switch where everything connects, e.g. servers, users, etc., or do you have other switches connected to the 4006? Did this develop suddenly? If so, were there any changes or add-ons to this switch that may be causing bridging loops?
Also, when you say the system hangs, can you telnet to it or connect to the switch through the local console while it is hung?
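If you can get onto the box while it is hung, output captured at that moment would help narrow things down. A sketch of what I would collect, assuming CatOS on the supervisor (command names vary between CatOS and IOS versions, so adjust to what your image offers; the trailing notes are annotations, not part of the commands):

```
show version          ! uptime and software version
show log              ! system log: resets, exceptions, module errors
show system           ! power, temperature, traffic levels
show module           ! status of each line card, including the L3 module
```

Comparing this output from a hung state against a healthy baseline is usually more telling than either on its own.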
Thanks for the reply. This is my core switch and there are no other switches connected to it. It has 4 Gb interfaces in the routing module: 2 internal, G3 and G4, and 2 external interfaces, G1 and G2 (in the routing blade).
G3 and G4 are connected internally to the switch chassis.
I am using G1 for the server network, G3 for the staff network, and G4 for the student network. G2 is empty.
What I noticed is that when the switch hangs, it stops the communication between those interfaces, but each network segment can still communicate within itself.
Also, when it hangs I am able to access it using the console port; there is no issue with that.
This reminds me of a hardware bug in the 4006 Sup II card that is not well documented and cannot be fixed. I'm going to paraphrase because I cannot remember all the technical terminology, but if you look up my postings about the 4006 from about 2 years ago, you may find more detail.
It has to do with the internal architecture of the Sup II. If you look at the card, you will see 3 giant heatsinks, each hiding a chip. These are three 12-way Gigabit switches that control traffic between the backplane uplinks (6 channels per card). The bug was due to the CAM tables getting confused when the links between these 3 switches got congested.
The symptom is that groups of 100 Mbit ports (or single Gbit ports), "fall off" the chassis.
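When a group of ports falls off, the CAM table for the affected module is worth a look from the console. A sketch, assuming CatOS on the supervisor (the module/port numbers here are only examples, and the trailing notes are annotations):

```
show cam dynamic 3/1      ! learned MAC addresses on an affected port
clear cam dynamic         ! flush dynamic CAM entries as a short-term workaround
```

If the addresses relearn and traffic resumes after the clear, that points at CAM confusion rather than a dead port.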
You can get some change in behavior by redistributing the connections, so that pairs of links with heavy traffic to each other are backplaned to the same chip on the Sup II. Generally this means putting them together in the left third, the middle third, or the right third of the chassis.
Don't forget that there may be light user traffic during the night, but that is generally when backups take place.
But in the long term, the only really reliable way to overcome the problem was to upgrade to a 4506. (Which in your case has the advantage that the router is built into the Supervisor card.)
I would like to add that this core has been working fine for 5 years. I don't know whether this is the normal lifetime for it or whether a problem has developed.
Also, something very strange happened today: the switch hung, and after the restart, as usual, I found there was no startup config present in NVRAM!!!
The router config had been restored to the default settings :( I was able to rebuild it, but I don't know what problem I will face next.
I would like to replace it, but before that I have to be sure there is a HW problem which can't be solved, so I can proceed with the replacement process.
I'm afraid if it is the problem I was talking about, you will have a hard time convincing anyone that the problem exists at all. I did, and it was only after escalating it through the TAC that I made any progress. There are related bugs in the bug database, but none that describes the problem exactly.
The incident with the NVRAM is disturbing too. If it was IOS, I would say it looks like someone had done a write erase on it. I have never used the L3 card, so I don't know how it works - is there a separate IOS on the card itself? A couple of incidents like that could be used to justify the upgrade you need, especially if there was any significant downtime. ;-)
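In the meantime, I would keep a copy of the config off the box, so another NVRAM wipe doesn't cost you a rebuild. On the L3 card's IOS side, something like this (the TFTP server address and filename below are placeholders; substitute your own):

```
copy running-config startup-config
copy startup-config tftp:
!   Address or name of remote host []? 10.0.0.5
!   Destination filename [startup-config]? 4232l3-backup.cfg
```

CatOS on the supervisor has an equivalent `copy config tftp` you can use for the switch side.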