Introduction
The Address Resolution Protocol (ARP) [RFC826] is used by the Internet Protocol (IP), specifically IPv4, to map IP network addresses to the hardware addresses used by a data link protocol. It operates below the network layer, as part of the interface between the OSI network and link layers, and is used when IPv4 runs over Ethernet.
Problem
Some of the IP phones are not registering to the CUCM.
CUCM Cluster: 2 nodes (Sub is the primary), Version: 8.6.2. The Publisher is acting as the DHCP server.
Topology :
IP phones(VLAN: 200) --- Switch --- UCS (CUCM Subscriber(VLAN: 200), 3rd party server(VLAN: 200))
|
|
Router
++ A large number of phones are registered (around 1100); a few phones (10-15) are in an unregistered state.
++ The phones and the CUCM are both in VLAN 200, and VLAN 200 is allowed in the trunk configuration between the UCS and the switch.
++ When the working phones are rebooted, they go into a non-working (unregistered) state.
Solution
++ We connected a PC in place of an IP phone, and the PC received an IP address. From the PC, pings to the CUCM Pub and Sub fail, but pings to the 3rd party server on the same UCS (same VLAN) succeed.
++ We ran a continuous ICMP ping from the PC to the CUCM servers and collected simultaneous captures on the PC, on the switch port (7/12) connecting to the UCS, and on both CUCM servers.
++ In the PC pcap, no ICMP packets are sent. Instead, the PC continuously sends ARP broadcasts to find the MAC addresses of the CUCM servers, and it cannot send the ping until one of them is answered.
++ In the pcap taken on switch port 7/12 (connecting to the UCS), we see the ARP broadcasts being sent to the UCS.
++ In the CUCM Sub/Pub pcaps, we can see the ARP broadcasts reaching the CUCM, but the Call Manager does not respond to them.
On checking "utils network arp list", we noticed that the ARP table is full (around 1019 entries). If we try to ping any new address from the CUCM that is not already in its ARP cache, we get the following error:
admin:utils network ping 10.200.4.250
Error running command: connect: No buffer space available
++ We checked the CUCM SRND and found the following note:
"Note The recommendation to limit the number of devices in a single Unified Communications VLAN to approximately 512 is not solely due to the need to control the amount of VLAN broadcast traffic. For Linux-based Unified CM server platforms, the ARP cache has a hard limit of 1024 devices. Installing Unified CM in a VLAN with an IP subnet containing more than 1024 devices can cause the Unified CM server ARP cache to fill up quickly, which can seriously affect communications between the Unified CM server and other Unified Communications endpoints. Even though the ARP cache size on Windows-based Unified CM server platforms expands dynamically, Cisco strongly recommends a limit of 512 devices in any VLAN regardless of the operating system used by the Unified CM server platform"
SRND Link : http://www.cisco.com/c/en/us/td/docs/voice_ip_comm/cucm/srnd/8x/uc8x/netstruc.html#wp1043629
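The two thresholds in the SRND note can be expressed as a quick sanity check. The following is a minimal Python sketch, not a Cisco tool: the function name `check_vlan_device_count` and its return shape are hypothetical, and the limits (512 recommended, 1024 hard) are taken directly from the note quoted above.

```python
# Limits taken from the SRND note quoted above.
RECOMMENDED_MAX_DEVICES = 512   # recommended devices per Unified Communications VLAN
ARP_CACHE_HARD_LIMIT = 1024     # ARP cache hard limit on Linux-based Unified CM


def check_vlan_device_count(device_count):
    """Hypothetical helper: compare a VLAN's device count with the SRND limits.

    Returns a tuple (within_recommendation, headroom), where headroom is the
    number of ARP cache entries left before the hard limit. A negative
    headroom means the device count exceeds the hard limit.
    """
    if device_count < 0:
        raise ValueError("device_count must be non-negative")
    within_recommendation = device_count <= RECOMMENDED_MAX_DEVICES
    headroom = ARP_CACHE_HARD_LIMIT - device_count
    return within_recommendation, headroom
```

For the roughly 1019 entries seen in the ARP table above, `check_vlan_device_count(1019)` reports that the VLAN is past the recommendation with only 5 entries of headroom, which is consistent with the CUCM being unable to add new neighbors.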
++ The customer (CU) currently has around 1200 phones, so the ARP cache is completely full and the CUCM cannot add entries for further devices. As a result, new phones do not register, and existing phones fail to register after a reset.
++ Defect: CSCsl22939
++ The registration issue is caused by the ARP table size limitation.
++ The CU needs to change the network design to avoid this scenario.
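One possible direction for such a redesign is to split the voice subnet so that no single VLAN holds more than the 512 devices recommended by the SRND. The sketch below only illustrates the arithmetic, using Python's standard `ipaddress` module. The function name `split_for_limit` is hypothetical, and the network `10.200.0.0/21` used in the usage note is an assumed example, since the actual subnet mask is not given in this document (only the address 10.200.4.250 appears above). The right design depends on the deployment.

```python
import ipaddress

MAX_DEVICES_PER_VLAN = 512  # SRND recommendation quoted in this document


def split_for_limit(network, max_devices=MAX_DEVICES_PER_VLAN):
    """Hypothetical helper: split an IPv4 network into the largest equal
    subnets whose usable host count does not exceed max_devices."""
    net = ipaddress.ip_network(network)
    prefix = net.prefixlen
    # Usable hosts in an IPv4 /p subnet: 2**(32 - p) - 2
    # (network and broadcast addresses excluded).
    while 2 ** (net.max_prefixlen - prefix) - 2 > max_devices:
        prefix += 1
    return list(net.subnets(new_prefix=prefix))
```

With the assumed network, `split_for_limit("10.200.0.0/21")` yields four /23 subnets of 510 usable hosts each, so every resulting VLAN stays under the 512-device recommendation and well below the 1024-entry ARP cache limit.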