01-24-2016 11:42 PM - edited 03-08-2019 03:31 AM
Hey!
We are having some issues in our lab environment. I'm out of ideas and want to check whether someone else has experienced the same issues or whether there is a known solution.
In this lab environment we have 4x 6807XL chassis, and each chassis is equipped with 2x Supervisor 2T modules running 15.2.1-SY1 software and 2x C6800-32P linecards. I have also tried software 15.2.1-SY1A without any difference. Oversubscription is turned off on all port-groups on the linecards.
Two of the 6807XL chassis are configured as core chassis running as Provider (P) equipment, and the other two are configured as a VSS cluster running as Provider Edge (PE). Instead of describing the connections between core and distribution in words, the picture below should describe them.
The issue appears when the physical interfaces between core and distribution are placed in a port-channel.
It's as if the control-plane protocols can't communicate over the logical interface, and eventually UDLD puts the physical ports into error-disable on the VSS cluster (UDLD timeout). If we debug UDLD we can see that both the core and distribution chassis seem to send and receive UDLD packets.
When we disable UDLD the physical ports stay up, but it is not possible to ping the point-to-point network addresses, and the core chassis don't see the VSS cluster via CDP. The VSS cluster can see the core chassis via CDP.
The VSS cluster adds CEF adjacency records for port-channels 12 and 62 with the correct information, but the core chassis have no records.
If we convert all the physical interfaces to pure L3 point-to-point interfaces instead of placing them in L3 port-channels it works.
Running the VSS-cluster chassis as standalone makes no difference and we have tried swapping hardware, chassis, fiber and SFP-modules without any change.
The only solution we have found is by removing the physical interfaces from the port-channel.
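For anyone comparing against their own setup, the symptoms described above can be checked with standard IOS show commands. The following is only a sketch run from Core1, assuming the interface names and the VSS1 address from the configuration in this post; command availability and output can vary by release.

```
! LACP bundle state and member flags for channel-group 12
show etherchannel 12 summary
show lacp 12 neighbor
! UDLD state on a member port, and any ports currently err-disabled
show udld TenGigabitEthernet1/5
show interfaces status err-disabled
! CDP neighbors seen on a member port
show cdp neighbors TenGigabitEthernet1/5 detail
! Adjacency and CEF entry for the far end of the /30 (VSS1)
show adjacency Port-channel12 detail
show ip cef 10.124.254.78 detail
```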
We are using the following configuration on the port-channels and the physical interfaces.
Core1
interface Port-channel12
 mtu 9216
 ip address 10.124.254.77 255.255.255.252
 no ip redirects
 no ip proxy-arp
 ip router isis 10
 logging event link-status
 mpls label protocol ldp
 mpls ip
 isis network point-to-point
!
interface TenGigabitEthernet1/5
 mtu 9216
 no ip address
 logging event link-status
 logging event bundle-status
 channel-group 12 mode active
!
interface TenGigabitEthernet2/5
 mtu 9216
 no ip address
 logging event link-status
 logging event bundle-status
 channel-group 12 mode active
VSS1
interface Port-channel12
 no switchport
 mtu 9216
 ip address 10.124.254.78 255.255.255.252
 no ip redirects
 no ip proxy-arp
 ip router isis 10
 logging event link-status
 mpls label protocol ldp
 mpls ip
 isis network point-to-point
!
interface TenGigabitEthernet1/1/1
 no switchport
 mtu 9216
 no ip address
 logging event link-status
 logging event bundle-status
 channel-group 12 mode active
!
interface TenGigabitEthernet2/1/1
 no switchport
 mtu 9216
 no ip address
 logging event link-status
 logging event bundle-status
 channel-group 12 mode active
Port-channel 62 is configured the same way as port-channel 12.
Usually when I run into problems like this there is a post or two describing why it happens and how to solve it. I know we are running new hardware and software, but I hope someone has a similar setup and design.
We have this setup implemented in production, but we haven't started using it yet, and that implementation doesn't show the same symptoms as the setup in the lab. I can provide more running-configuration or output from show commands if needed.
06-19-2016 08:26 PM
Hello !
I want to update this post in case someone else runs into the same problem.
After almost 3 months of troubleshooting with a Cisco TAC engineer, Cisco released a new software version, 15.2.1-SY2, that seems to fix this problem.
We have not been able to find the root cause yet, but Cisco will keep working on this and will get back to us if they find anything. The support engineer has had some issues finding hardware, so he had a different setup than ours and couldn't reproduce the problem. He is now waiting on the correct number of supervisors, linecards and chassis to see if he can reproduce it.
I should also mention that during this time we got 4 new fully loaded 6807XL-chassis and we got the same problem.
Update 20160620
The latest information I got from the TAC engineer who was involved in troubleshooting this case.
Regards
Micke
01-25-2016 12:37 AM
The bit that stuck in my mind is when you say that with UDLD disabled you still have issues. It sounds to me like UDLD is kicking in because there is a real forwarding issue.
What happens if you only plug in one member of the port channel? Then try the other member on its own. Does it by chance only happen with one of the two members?
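One way to run that test without touching the cabling, sketched for Core1 using the member names from the original post. This is only an illustration; the VSS side is left unchanged here.

```
! Test 1: leave only TenGigabitEthernet1/5 active in Port-channel12
interface TenGigabitEthernet2/5
 shutdown
!
! ...check ping, CDP and UDLD over Port-channel12, then swap members:
interface TenGigabitEthernet2/5
 no shutdown
!
interface TenGigabitEthernet1/5
 shutdown
```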
01-25-2016 01:22 AM
Hey Philip!
Yes, I'm very fortunate to have a big lab to play around with, and it is now small compared to what it was before we moved 6x 6807XL into pre-production.
We have tried running with only one core chassis online and only port-channel 12 or 62 active. We have also tried running the VSS cluster as standalone with one VSS-cluster chassis and one core chassis, and other setups. The VSS-cluster configuration is still in the running-config, but we have converted them both to standalone.
We have tried running only 1 active port in the port-channels, where VSS1 is connected to Core1 and VSS2 is connected to Core2, and we get the same problem.
I don't think this is a UDLD problem. As you mention, I think the UDLD error-disable is a symptom of another problem.
A Cisco TAC-case has also been created.
01-25-2016 01:27 AM
15.2(1) is very new. Any chance you can use the gold star release 15.1.2-SY6 ? I tend to lean heavily towards using gold star releases, especially in service provider networks.
01-25-2016 01:40 AM
The problem with that release is that it is not supported for the linecards used. The C6800-32P need at least 15.2(1)SY according to:
http://www.cisco.com/c/en/us/products/collateral/switches/catalyst-6800-series-switches/datasheet-c78-733662.html
01-25-2016 01:45 AM
No, I cannot do that since the C6800-32P linecards don't work with 15.1.X. They only work with 15.2.X (I don't think that has changed). When we first got the C6800-32P I tried 15.1.X, but the linecards didn't boot at all with that code, so I swapped to 15.2.1-SY.
01-25-2016 12:38 AM
ps. I wish I had your lab. Then again, I wouldn't have enough power to plug it all in. But it would look impressive.
02-12-2016 06:22 AM
My Setup:
2 * 6807-XL (VSS), with only one Supervisor 2T per chassis, running version 15.2.1-SY1A, connecting to different types of distribution switches with multichassis EtherChannels.
Issue:
After a switchover, a lot of port-channel members don't come up again!
A shutdown / no shutdown doesn't work!
I have to REMOVE the port-channel configuration from the interfaces and RE-APPLY it!
It's not exactly the same problem as yours, but I think there is something really wrong with port-channels in the 15.2.1-SY(XX) versions!
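The remove-and-re-apply workaround described above would look roughly like this on an affected member. The interface name and channel-group number are placeholders, since this setup's configuration was not posted, and `mode active` assumes LACP as in the original post.

```
! Placeholder interface and channel-group number
interface TenGigabitEthernet1/1/1
 no channel-group
 channel-group 12 mode active
```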