We ran into an issue today: we had to move a phone, and when we plugged it back in, it did not get a DHCP address, so it obviously didn't work at all. We tried power-cycling a different phone and the same thing happened. I then took a spare test phone and had the same issue. I went to another access stack and did NOT have the issue with the phone. I then went to our 3rd-floor access stack, and again everything worked as expected.
So I narrowed it down to our larger 3750 switch stack that serves our 1st and 2nd floors. This is an access-layer stack (one of three) with a 20 Gbps port-channel up to Extreme Networks X690 core switches, which run VRRP and aggregate everything, including the servers responsible for DHCP. Because I could not replicate the issue on the other two stacks, I went into the problem stack and noticed that the phone is never added to the CDP neighbor table the way it is on the working switches. I double-checked, and CDP is enabled, as it has been for the past 4+ years. What is interesting is that if I hardcode the phone (Polycom VVX500) to tag VLAN 172 as the voice network, it reboots and works fine. All existing phones work fine and seem to continue to do so, but our fingers are crossed that the next DHCP renewal cycle doesn't break anything.
When I run sh cdp traffic on the problem stack, the counters never increment. On the other stacks, however, if I wait a few minutes and issue sh cdp traffic again, both the output and input counters increment without issue.
If I issue sh cdp neighbor, it shows the existing switch neighbor relationships as well as the existing Polycom phones, but it never shows the new phone, no matter which port I try.
I believe the issue comes down to a stuck CDP process. I opened a case with TAC, but until they get back to me I am curious whether anyone has suggestions. Can CDP be restarted without causing any downtime?
sh cdp traffic
CDP counters :
Total packets output: 522993304, Input: 227169440
Hdr syntax: 0, Chksum error: 62, Encaps failed: 0
No memory: 0, Invalid packet: 8,
CDP version 1 advertisements output: 1447873, Input: 4281097
CDP version 2 advertisements output: 521545431, Input: 222888343
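To make "the counters never increment" something you can check rather than eyeball, here is a minimal Python sketch that parses two captures of sh cdp traffic taken a few minutes apart and reports how much each counter moved. It assumes the output format shown above; the helper names (parse_cdp_traffic, cdp_counter_deltas, cdp_looks_stuck) are hypothetical, and nothing here connects to the switch. Since CDP advertises every 60 seconds by default, a healthy stack with CDP neighbors should show the totals moving between two captures a few minutes apart.

```python
"""Compare two "show cdp traffic" captures (a sketch; helper names are hypothetical)."""
from __future__ import annotations

import re

# Lines like "Total packets output: N, Input: M" and the per-version advertisement lines.
PAIR_RE = re.compile(r"^(?P<label>.*?)\s*output:\s*(?P<out>\d+),\s*Input:\s*(?P<inp>\d+)", re.IGNORECASE)
# Simple "Name: value" fields such as "Chksum error: 62".
FIELD_RE = re.compile(r"([A-Za-z][A-Za-z ]*?):\s*(\d+)")

SAMPLE_BEFORE = """\
CDP counters :
Total packets output: 522993304, Input: 227169440
Hdr syntax: 0, Chksum error: 62, Encaps failed: 0
No memory: 0, Invalid packet: 8,
CDP version 1 advertisements output: 1447873, Input: 4281097
CDP version 2 advertisements output: 521545431, Input: 222888343
"""


def parse_cdp_traffic(text: str) -> dict[str, int]:
    """Turn the counter lines into a flat {name: value} dict."""
    counters: dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.strip()
        pair = PAIR_RE.match(line)
        if pair:
            label = pair.group("label").strip()  # e.g. "Total packets"
            counters[f"{label} output"] = int(pair.group("out"))
            counters[f"{label} input"] = int(pair.group("inp"))
            continue
        for name, value in FIELD_RE.findall(line):
            counters[name.strip()] = int(value)
    return counters


def cdp_counter_deltas(before: str, after: str) -> dict[str, int]:
    """Per-counter difference between two captures (only counters present in both)."""
    b, a = parse_cdp_traffic(before), parse_cdp_traffic(after)
    return {name: a[name] - b[name] for name in b if name in a}


def cdp_looks_stuck(before: str, after: str) -> bool:
    """True when neither the total output nor the total input counter moved."""
    deltas = cdp_counter_deltas(before, after)
    if "Total packets output" not in deltas:
        raise ValueError("no 'Total packets' line found in both captures")
    return deltas["Total packets output"] == 0 and deltas["Total packets input"] == 0


if __name__ == "__main__":
    # Hypothetical second capture identical to the first, as described for the problem stack.
    print(cdp_looks_stuck(SAMPLE_BEFORE, SAMPLE_BEFORE))  # True
```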
One of my test ports:
description test port D282
switchport access vlan 7
switchport mode access
switchport voice vlan 172
ip access-group acl1 in
uptime is 4 years, 6 weeks, 6 days, 5 minutes
System returned to ROM by power-on
System restarted at 14:08:44 EST Sat Nov 9 2013
System image file is "flash:/c3750-ipservicesk9-mz.122-55.SE8.bin"
Switch Ports Model SW Version SW Image
------ ----- ----- ---------- ----------
* 1 52 WS-C3750G-48TS 12.2(55)SE8 C3750-IPSERVICESK9-M
2 52 WS-C3750V2-48TS 12.2(55)SE8 C3750-IPSERVICESK9-M
3 52 WS-C3750V2-48TS 12.2(55)SE8 C3750-IPSERVICESK9-M
5 52 WS-C3750V2-48TS 12.2(55)SE8 C3750-IPSERVICESK9-M
6 52 WS-C3750V2-48TS 12.2(55)SE8 C3750-IPSERVICESK9-M
7 54 WS-C3750X-48P 12.2(55)SE8 C3750E-UNIVERSALK9-M
8 54 WS-C3750X-48P 12.2(55)SE8 C3750E-UNIVERSALK9-M
9 30 WS-C3750X-24P 12.2(55)SE8 C3750E-UNIVERSALK9-M
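The member table above can also be checked programmatically. This is a minimal Python sketch, assuming the column layout shown (Switch, Ports, Model, SW Version, SW Image, with `*` in front of one row); it lists the member numbers, any gaps in the numbering, and the distinct software versions. Run against this output, it reports that switch number 4 is unused (the members are numbered 1, 2, 3 and 5 through 9) and that every member runs 12.2(55)SE8. The helper names are hypothetical.

```python
"""Sanity-check a stack member table from "show version" (sketch; names are hypothetical)."""
from __future__ import annotations

import re
from dataclasses import dataclass

STACK_TABLE = """\
Switch Ports Model SW Version SW Image
------ ----- ----- ---------- ----------
* 1 52 WS-C3750G-48TS 12.2(55)SE8 C3750-IPSERVICESK9-M
2 52 WS-C3750V2-48TS 12.2(55)SE8 C3750-IPSERVICESK9-M
3 52 WS-C3750V2-48TS 12.2(55)SE8 C3750-IPSERVICESK9-M
5 52 WS-C3750V2-48TS 12.2(55)SE8 C3750-IPSERVICESK9-M
6 52 WS-C3750V2-48TS 12.2(55)SE8 C3750-IPSERVICESK9-M
7 54 WS-C3750X-48P 12.2(55)SE8 C3750E-UNIVERSALK9-M
8 54 WS-C3750X-48P 12.2(55)SE8 C3750E-UNIVERSALK9-M
9 30 WS-C3750X-24P 12.2(55)SE8 C3750E-UNIVERSALK9-M
"""

# A data row: optional "*", switch number, port count, model, version, image.
ROW_RE = re.compile(r"^\s*(\*)?\s*(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


@dataclass
class StackMember:
    number: int
    ports: int
    model: str
    version: str
    image: str
    starred: bool  # row was prefixed with "*" in the output


def parse_stack_table(text: str) -> list[StackMember]:
    members = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:  # header and separator lines do not match
            star, num, ports, model, version, image = m.groups()
            members.append(StackMember(int(num), int(ports), model, version, image, star is not None))
    return members


def missing_member_numbers(members: list[StackMember]) -> list[int]:
    """Switch numbers between 1 and the highest member number that are not in use."""
    present = {m.number for m in members}
    return [n for n in range(1, max(present, default=0) + 1) if n not in present]


def distinct_versions(members: list[StackMember]) -> list[str]:
    return sorted({m.version for m in members})


if __name__ == "__main__":
    members = parse_stack_table(STACK_TABLE)
    print([m.number for m in members])      # [1, 2, 3, 5, 6, 7, 8, 9]
    print(missing_member_numbers(members))  # [4]
    print(distinct_versions(members))       # ['12.2(55)SE8']
```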
These are Polycom VVX500 phones. They may not be Cisco, but switchport voice vlan 172 works flawlessly at 5 other locations, and even at the main location on two separate access stacks. It worked for the past 4+ years on our large access stack too; we only recently noticed that it stopped working there.
What's the difference? Nothing, really. They all run multiple VLANs and port-security mac-address sticky; some do layer 3, while other sites are handed off to a router (2901 or 2911). The only difference is that this one stack no longer increments the counters when you run sh cdp traffic. Most stacks are on 12.2(55), but I have two stacks on the 15.x train.
The other difference is that this stack has a few years on the others. It's been up for 4 years, 6 weeks, 6 days, 17 hours and 25 minutes. If you go back that far, you arrive at the period when we tried upgrading to the 15.x train. It failed miserably. It barely kept an SSH session alive. A routed port to another branch over a provider WAN connection was very unstable. In fact, many ports were unstable, slow, and dropping traffic, and switches would lock up. We had to break the stack (then nine switches) down one by one and downgrade it back to 12.2(55)SE8. We removed switch 4 at that point, so it became an 8-switch stack. Once we got everything back up and stacked again, we have run it that way ever since. I believe we issued a command to renumber the switches so that switches 5, 6, 7, 8, 9 would become switches 4, 5, 6, 7, 8 (remember we removed switch 4), but that was years ago and we'd have to check whether that command ever took effect.
We've been kind of afraid to touch the thing since then, so on December 16th we moved all core routing to new redundant Extreme Networks cores, which run VRRP and do multi-chassis link aggregation on all uplinks. All we did was move the static routes over, remove the VLAN interface IPs from the large 3750 stack, and change the IP for VLAN 1. The large 3750 stack still routes to one building down the street, since it does IP SLA; the Extreme gear has a similar feature called ping protection, but it is broken until the April 2018 release. Anyway, I digress.
We also tested different ports, patch panels and cables. I can recreate the problem at my desk, in a conference room, at my coworker's desk, and at an accounting associate's desk. So the workaround is either to home-run the phone into a port configured with switchport access vlan 172, or, better yet, to keep bridging the PC through the phone by going into the phone's menus and setting the 802.1Q VLAN tag to 172 (our voice VLAN).
On no other switch stack is this required. On all the other stacks, sh cdp neighbor shows not only the uplinks (switch names and ports) to the Extreme gear and other Cisco APs, but also the Polycom phones. On the large switch stack, sh cdp neighbor shows all existing CDP relationships, but new phones never appear in the table.
Yes, I had a port go bad once, where a phone had power but no DHCP, but this is a different issue, and according to our Fluke network meter the ports are good.
Unfortunately I do not have any spare APs lying around to test that theory. Phones I have, but not APs.
I did check the Polycom Advanced Ethernet settings VLAN menu, and both LLDP and CDP Compatibility are enabled. We have CDP enabled on all of our Cisco switches, but we do not have LLDP enabled. I'm not sure whether it's worth enabling LLDP.
If I specify VLAN 172 on the Polycom phone, then this port configuration works fine:
Example: PC in VLAN 7, phone in VLAN 172
switchport access vlan 7
switchport voice vlan 172
I was reconfiguring a phone at one of our branch offices, where it terminates into a Cisco 3560, and as part of the reconfiguration I had to reboot the phone. There were NO issues with the phone showing up as a CDP neighbor and putting itself into the voice VLAN at that site. That tells me the problem is our switch stack.
I'm hesitant but curious to disable CDP and then re-enable it (no cdp run, then cdp run). I tried that on a spare switch during a voice call and the call did not drop, so that is good.
I can confirm that on a test switch, a Polycom VVX500 phone was put into the proper voice VLAN with CDP disabled and LLDP enabled globally.
I may enable LLDP on the big stack: if CDP is frozen or not running, LLDP may buy us some time until we can get a big maintenance window for a reload. I'd hate to see 4+ years of uptime go away, but I will be sure to take a screenshot before doing a reload.
I put the phone, powered by a power inserter, on a 100 Mb hub, using one hub port as the uplink and another port to a laptop running a Wireshark capture.
I can clearly see the CDP (and now LLDP) packets to and from the phone on one of our working access switches.
When I change the uplink to our big switch stack, I see absolutely NO CDP packets.
So I think I will enable LLDP on the main stack.
I disabled CDP and re-enabled it, and I still do not see any CDP packets on the wire. I know Wireshark is set up properly, because when I plug it into another switch stack that has no issues, I see 4 to 5 CDP packets right at the initial plug-in.
Anyway, with LLDP the phones are detected properly and there are no communication issues. I moved one person's phone back to its original configuration, where the PC is connected through the phone. Oddly, switchport port-security mac-address sticky never picked up the phone's MAC address (the port is set to a maximum of 2), so I added it by hand. I'll have to do further testing, but now I'm wondering whether MAC-based port security is working properly.
The stack is getting weird with 4+ years of uptime.
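For anyone repeating the hub-and-Wireshark test, the distinction it relies on is simple: CDP frames go to the multicast MAC 01:00:0C:CC:CC:CC inside an 802.3 LLC/SNAP header with Cisco OUI 00:00:0C and protocol ID 0x2000, while LLDP frames use EtherType 0x88CC. In Wireshark the display filters cdp and lldp already do this; the minimal Python sketch below does the same count offline, assuming the capture was saved in classic pcap format (not pcapng) with an Ethernet link type. The function names and the capture.pcap file name are hypothetical.

```python
"""Count CDP vs LLDP frames in a classic pcap capture (sketch; names are hypothetical)."""
from __future__ import annotations

import struct
import sys
from collections import Counter
from collections.abc import Iterable, Iterator

CDP_DST_MAC = bytes.fromhex("01000ccccccc")
CDP_LLC_SNAP = bytes.fromhex("aaaa03" "00000c" "2000")  # DSAP/SSAP/ctrl, Cisco OUI, CDP PID
LLDP_ETHERTYPE = 0x88CC
VLAN_TPID = 0x8100


def classify_frame(frame: bytes) -> str:
    """Return "cdp", "lldp" or "other" for one raw Ethernet frame."""
    if len(frame) < 14:
        return "other"
    offset = 12
    type_or_len = int.from_bytes(frame[offset:offset + 2], "big")
    if type_or_len == VLAN_TPID and len(frame) >= 18:  # skip a single 802.1Q tag
        offset += 4
        type_or_len = int.from_bytes(frame[offset:offset + 2], "big")
    payload = frame[offset + 2:]
    if type_or_len == LLDP_ETHERTYPE:
        return "lldp"
    # Values <= 1500 are an 802.3 length field, so an LLC/SNAP header follows.
    if type_or_len <= 1500 and frame[:6] == CDP_DST_MAC and payload[:8] == CDP_LLC_SNAP:
        return "cdp"
    return "other"


def iter_pcap_frames(data: bytes) -> Iterator[bytes]:
    """Yield raw frames from a classic libpcap file held in memory."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:
        raise ValueError(f"expected Ethernet link type 1, got {linktype}")
    pos = 24
    while pos + 16 <= len(data):
        _sec, _frac, incl_len, _orig_len = struct.unpack(endian + "IIII", data[pos:pos + 16])
        pos += 16
        yield data[pos:pos + incl_len]
        pos += incl_len


def count_protocols(frames: Iterable[bytes]) -> Counter[str]:
    return Counter(classify_frame(f) for f in frames)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:  # e.g. python count_cdp.py capture.pcap
        print(dict(count_protocols(iter_pcap_frames(fh.read()))))
```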