This document describes best practices and common mistakes observed in Cisco Unified Computing System (UCS) setups, together with recommendations. The list is not exhaustive, but it should be useful for most UCS deployments.
UCS Best Practices
Observations and Recommendations
Observation 1: Chassis uplink connections are mismatched; chassis 2, 3 and 5 use discrete links while chassis 1 and 4 use port channels
Explanation: The chassis uplinks were originally configured to use link grouping, which creates port channels between the fabric interconnects and the chassis I/O modules. The policy was later changed to use discrete links, and a chassis picks up the new setting only when it is reacknowledged. Three chassis have been reacknowledged since then, so those three now use discrete links while the other two still use port channels.
Recommendation: Standardize on port channels for all chassis by restoring the port-channel link grouping preference and reacknowledging chassis 2, 3 and 5.
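Before and after the change, it can help to audit which chassis deviate from the chosen standard. The following is a minimal, hypothetical Python sketch: the `CHASSIS_LINK_MODE` dictionary is assumed data that mirrors this observation rather than output from UCS Manager, and the function names are illustrative only.

```python
from collections import Counter

# Hypothetical inventory: chassis ID -> link mode. The values mirror
# Observation 1; on a real system they would be read from UCS Manager.
CHASSIS_LINK_MODE = {
    1: "port-channel",
    2: "discrete",
    3: "discrete",
    4: "port-channel",
    5: "discrete",
}


def nonconforming_chassis(inventory, standard="port-channel"):
    """Return the sorted IDs of chassis whose link mode differs from the standard."""
    return sorted(cid for cid, mode in inventory.items() if mode != standard)


def mode_summary(inventory):
    """Count how many chassis use each link mode."""
    return Counter(inventory.values())


if __name__ == "__main__":
    print(mode_summary(CHASSIS_LINK_MODE))
    print("Chassis to reacknowledge:", nonconforming_chassis(CHASSIS_LINK_MODE))
```

Running the same check after the reacknowledgments should return an empty list if every chassis has moved to port channels.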
Observation 2: MAC address pools are not defined separately for the A-side and B-side fabrics
Explanation: Per Cisco best practices, MAC address pools should use separate address ranges for the A-side fabric and the B-side fabric. Separate ranges simplify troubleshooting because a distinguishing value embedded in each address shows whether a vNIC belongs to fabric A or fabric B.
Recommendation: Create separate MAC address pools for each fabric and reconfigure the existing vNICs to use them.
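One way to realize this is to reserve one octet of each address as a fabric marker. The Python sketch below assumes a hypothetical convention in which the addresses start with the 00:25:B5 prefix that UCS Manager proposes for MAC pools, the fourth octet is 0A for fabric A or 0B for fabric B, and the last two octets are a sequence number. The layout and function names are illustrative, not a Cisco-mandated scheme.

```python
OUI_PREFIX = "00:25:B5"  # prefix UCS Manager proposes for MAC pools
# Hypothetical convention: the fourth octet identifies the fabric.
FABRIC_MARKER = {"A": 0x0A, "B": 0x0B}


def pool_range(fabric, size=256):
    """Return the (first, last) MAC addresses of a pool for the given fabric."""
    if not 1 <= size <= 0x10000:
        raise ValueError("size must fit in the last two octets")
    marker = FABRIC_MARKER[fabric]

    def mac(n):
        # Octets 5 and 6 hold the sequence number n.
        return f"{OUI_PREFIX}:{marker:02X}:{n >> 8:02X}:{n & 0xFF:02X}"

    return mac(0), mac(size - 1)


def fabric_of(mac):
    """Return "A" or "B" based on the fabric marker, or None if it matches neither."""
    octets = mac.upper().split(":")
    if len(octets) != 6 or ":".join(octets[:3]) != OUI_PREFIX:
        return None
    value = int(octets[3], 16)
    for fabric, marker in FABRIC_MARKER.items():
        if value == marker:
            return fabric
    return None
```

For example, `pool_range("A")` yields a fabric A range of 00:25:B5:0A:00:00 to 00:25:B5:0A:00:FF, and `fabric_of` lets an administrator tell at a glance which fabric a MAC address seen in a switch table belongs to.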
Observation 3: Empty default pools exist and are generating faults
Explanation: UCS Manager ships with empty default pools in its out-of-the-box configuration. Because these pools cannot be renamed, customers often create their own custom pools for WWNs, MAC addresses, servers, iSCSI IP addresses and IQNs and leave the defaults empty. Each empty default pool raises a fault that persists until the pool is either populated or removed.
Recommendation: Remove the empty default server, MAC, WWN, iSCSI IP and IQN pools.
Observation 4: External authentication is not used
Explanation: Although TACACS+ servers are defined in the system, the audit logs indicate that external authentication is not in use. External authentication is recommended for role-based access control and so that the audit logs sent via syslog record the name of the administrator who made each change.
Recommendation: Configure external authentication using the existing LDAP, TACACS+ or RADIUS infrastructure. Verify that login usernames are recorded in the audit logs sent via syslog.
Observation 5: Blade 3/5 (chassis 3, slot 5) reports DIMMs 29 and 30 as missing or invalid
Explanation: The blade has missing or invalid memory DIMMs, so the physical blade is marked as inoperable. The DIMMs may be improperly seated, or another hardware fault may be present. The issue must be repaired so that the blade and its associated service profile can operate normally.
Recommendation: Work with Cisco TAC on next steps, including DIMM replacement.
Observation 6: Chassis links are distributed across ASICs on the fabric interconnects
Explanation: The chassis cabling has been spread across the fabric interconnect ports so that the links of each chassis connect to ports served by different ASICs (the port-controller chips inside the fabric interconnect). This does not follow best practices for chassis port channels, which call for all member links to terminate on ports served by the same ASIC.
Recommendation: Recable each chassis so that all of its links on a given fabric interconnect connect to ports served by the same ASIC.
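Assuming each ASIC serves a block of consecutive ports, the check reduces to integer division of the port number. The Python sketch below uses a hypothetical `PORTS_PER_ASIC` value and hypothetical cabling data; the actual port-to-ASIC mapping depends on the fabric interconnect model, so confirm it against Cisco's hardware documentation before using a check like this.

```python
# Hypothetical: number of consecutive fabric interconnect ports served by one
# ASIC. The real value depends on the fabric interconnect model.
PORTS_PER_ASIC = 8


def asic_group(port, ports_per_asic=PORTS_PER_ASIC):
    """Map a 1-based fabric interconnect port number to a 0-based ASIC group index."""
    if port < 1:
        raise ValueError("port numbers start at 1")
    return (port - 1) // ports_per_asic


def split_chassis(cabling, ports_per_asic=PORTS_PER_ASIC):
    """Return the IDs of chassis whose links on one fabric interconnect span more than one ASIC."""
    return sorted(
        chassis
        for chassis, ports in cabling.items()
        if len({asic_group(p, ports_per_asic) for p in ports}) > 1
    )


# Hypothetical cabling on fabric interconnect A: chassis ID -> server ports.
cabling_fi_a = {
    1: [1, 2],    # both ports in ASIC group 0
    2: [3, 9],    # groups 0 and 1: split across ASICs
    3: [17, 18],  # both ports in ASIC group 2
}

if __name__ == "__main__":
    print("Chassis to recable:", split_chassis(cabling_fi_a))
```

The same check would be run separately for each fabric interconnect, since each side forms its own port channel to the chassis.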
Observation 7: Call Home, SNMP, syslog and SEL policy monitoring are not configured properly
Explanation: The monitoring tools built into UCS are either not configured or not used properly. Call Home, which sends proactive email alerts, is not enabled or configured. SNMP traps are not sent to a monitoring station, and blade System Event Logs (SEL) are not captured and stored centrally. These components are critical for monitoring the system for faults and for troubleshooting errors when they occur.
Recommendation: Configure Call Home to send proactive email alerts for faults and errors. Configure SNMP traps, and load the UCS SNMP MIBs into the monitoring station so that it can parse the traps correctly. Send system audit log output to a remote syslog collector. Configure the SEL policy to capture blade SEL logs for troubleshooting and to clear them automatically when they are full.
Observation 8: BIOS policies use platform-default values for every setting
Explanation: A BIOS policy is assigned to the existing service profile templates, but it leaves every setting at the blade's platform default. Cisco can provide specific BIOS setting recommendations based on the installed operating system, and these settings should be applied through appropriate BIOS policies.
Recommendation: Configure BIOS policies that enable or disable features according to Cisco best practices for the installed operating system.