11-02-2015 06:47 AM - edited 03-08-2019 02:32 AM
Hi,
A client is currently running two core 6500s with SUP2s (I know they are end of support, but the client isn't in a position to upgrade them at the moment). Users are sporadically dropping off the network. To resolve this, they have to clear the ARP cache on the 6500s and disable and re-enable the user's NIC. There is no pattern to which access switch the affected users are connected to; it appears to be random across the campus. In the logs of the core1 switch, which has the most active VLANs, we see the following message:
Aug 25 11:42:29.554: SP: earl6_adj_free::attemp to free a non-allocated adj; index= 17520
Aug 25 11:42:29.554: SP: -Traceback= 4017734C 40171574 4016E404 4016BDA4 4016B998
There is little to no information about this error on CCO; I can only find that it will cause unexpected behaviour on the switch. We swapped out the active SUP where we were seeing the error message, but the error returned a week later. We also failed over to the standby SUP, but the issue appeared again after a week or so. We are planning on swapping out the standby SUP as well, but are not sure this will fix the issue. The switches are running version 12.2.18-SXF17b. Any assistance would be appreciated.
Thanks in advance,
P
11-02-2015 01:59 PM
A few questions might help thought processes:
1. Is this a purely L2 network, with the 6500s as a collapsed core?
2. How many downstream switches are there? Are they all L2 trunked, or routed?
3. Approximately how many servers/workstations/printers are on each (or the only...?) VLAN?
4. How is the memory use on both chassis at the time the issue occurs?
5. What happens when they "drop off"? Can they ping the local gateway? Do they lose their IP address? Is there a loss of link on the workstations? Or could it be something like a DNS server not responding, so that it "looks" like the network is down?
11-03-2015 04:04 AM
Hi,
Thanks for your reply and questions. I have gained some clarity on the issue from the customer.
1. This is a collapsed core.
2. There are about 30 access switches, a mixture of 2960s, 3750s and 6500s (there is no pattern to which switches / users are affected). They are all connected by layer 2 trunks, with their default gateways being SVIs on the core 6500s. The core 6500s are running HSRP between them for each SVI.
3. There are approx 90 VLANs that use a mixture of /23, /24 and /25 subnets, although they are nowhere near full utilisation.
4. Memory looks fine. When the issue happens, it 'usually' coincides with the error mentioned in my original post.
5. When the issue occurs, they are still able to ping their local gateway but cannot reach anything else until the ARP cache is cleared.
I have taken a snapshot of the ARP and MAC tables from the core so I can compare them after the issue occurs again. It is very sporadic: it can happen multiple times a day or not at all, and it affects just one user at a time. It also happens across different VLANs, so there is no pattern, which is making it hard to narrow down the issue.
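For the before/after comparison of the ARP table, a small script can do the diffing. Below is a minimal Python sketch, assuming each snapshot is the saved text of `show ip arp` in the usual IOS column layout (Protocol, Address, Age, Hardware Addr, Type, Interface). The function names `parse_arp` and `diff_arp` are made up for illustration and are not part of any Cisco tool.

```python
import re

# One "Internet" row of `show ip arp`. The interface column is optional
# because entries whose hardware address is "Incomplete" have none.
ARP_LINE = re.compile(
    r"^Internet\s+(?P<ip>\d+\.\d+\.\d+\.\d+)\s+(?P<age>\S+)\s+"
    r"(?P<mac>\S+)\s+(?P<type>\S+)(?:\s+(?P<intf>\S+))?$"
)


def parse_arp(text):
    """Map each IP address to (MAC, interface); the age column is ignored."""
    entries = {}
    for line in text.splitlines():
        match = ARP_LINE.match(line.strip())
        if match:
            entries[match.group("ip")] = (
                match.group("mac").lower(),
                match.group("intf"),
            )
    return entries


def diff_arp(before, after):
    """Return {ip: (old, new)} for entries that were added, removed or changed."""
    changes = {}
    for ip in sorted(set(before) | set(after)):
        old, new = before.get(ip), after.get(ip)
        if old != new:
            changes[ip] = (old, new)
    return changes
```

Calling `diff_arp(parse_arp(old_text), parse_arp(new_text))` on the two saved outputs would list only the addresses whose MAC or interface differs, which might make it easier to spot whether the affected user's entry changed, went incomplete, or stayed the same. The age column is deliberately left out of the comparison, since it changes on every capture.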
11-03-2015 04:47 AM
Just as an update, they are also seeing the following error on core1:
Nov 3 10:56:17.620: %DATACORRUPTION-1-DATAINCONSISTENCY: copy error
-Traceback= 40129E74 4015DAC0 4027A7D4 402721C8 4027D654 4020FC98 4026EA58 4026ECF0 4027E414 40210E7C 4022314C 401608E8 40155F64 4020EA88 40156098 401608E8
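Since the user drops only 'usually' coincide with the adjacency error, it may help to pull every occurrence out of a saved log so the timestamps can be lined up against the times users report problems. The following is a rough Python sketch, assuming the log lines look exactly like the ones quoted in this thread; `adjacency_errors` is a hypothetical helper name, not an existing tool.

```python
import re

# Matches lines such as:
# Aug 25 11:42:29.554: SP: earl6_adj_free::attemp to free a non-allocated adj; index= 17520
# ("attemp" is spelled that way in the switch's own message.)
ADJ_ERR = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}): SP: "
    r"earl6_adj_free::attemp to free a non-allocated adj; "
    r"index=\s*(?P<index>\d+)"
)


def adjacency_errors(log_text):
    """Return a list of (timestamp, adjacency index) for each matching line."""
    hits = []
    for line in log_text.splitlines():
        match = ADJ_ERR.match(line.strip())
        if match:
            hits.append((match.group("ts"), int(match.group("index"))))
    return hits
```

Traceback lines and other messages are skipped, so the result is just a list of timestamps and adjacency indexes. Whether the same index keeps recurring is something this would show, but what that means for the root cause is not something the script can answer.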