
Network Management @ CiscoLive! London

Hall of Fame Cisco Employee

[Image: CiscoLive 2011]

CiscoLive! Europe just came to a close at ExCeL London. The show network was entirely Cisco-on-Cisco. Cisco IT built a pure Cisco data, voice, and video network, including full-on IPv6, IP video surveillance complete with analytics, and green technology based on EnergyWise. Managing all of this was an integrated Cisco NMS solution. In this blog post, I will tell the story of Cisco-on-Cisco network management at CiscoLive! London.

Setup

I arrived at ExCeL London on Sunday, January 30, around noon. I met up with three of my NMS colleagues: Tejas Shah (LAN Management Solution Technical Marketing Engineer), Stuart Parham (Consulting Systems Engineer), and Cengiz Savas (Systems Engineer). While the network (including the network management) had been pre-staged, we still needed to bring the NMS servers online. We proceeded into the bowels of the convention center to the IDF that held our makeshift data center. There was a rack with multiple UCS servers, which housed important services such as the Cisco Network Registrar DNS and DHCP server, the Cisco Secure ACS server, and the EnergyWise Orchestrator.

One of the UCS C200 servers housed our main network management applications. This server ran VMware ESX 4.1 with virtual machines for CiscoWorks LAN Management Solution 4.0 and Cisco Unified Communications Management Suite 8.5 (pre-release). The server had two eight-core CPUs and 16 GB of RAM. Each VM ran Windows Server 2003 and was assigned one vCPU and 4 GB of RAM. We also had some spare VMs, just in case. Since two NMS servers were running on the same physical server, we wanted to configure ESX with one management interface for kernel traffic and a port channel of two Gigabit Ethernet ports for VM network traffic. We also needed to move the VMs into the out-of-band management VLAN, a secure VLAN that allowed only local traffic.

Just behind the server rack, on the far wall of the data center, was the full network topology diagram.

[Image: show network topology diagram]

The network used a multi-tier core, distribution, and access design in a non-blocking configuration. Each device had an IP address in the secure management VLAN. All told, there would be 140 switches, two Cisco Unified Communications Managers, and about 60 phones to manage throughout the convention center. No problem for LMS and CUCMS.

Once the UCS server was properly configured on the management VLAN, we proceeded to the pre-show Network Operations Center (NOC) to complete the setup. When we arrived, we saw that the team had already set up their own NMS solution with PRTG and Kiwi Syslog. We had our work cut out for us convincing the network staff that we had something better.

We got to work configuring LMS and CUCMS. The first task was to discover the network. We obtained the SNMP read-only community string for the switches, then configured LMS Discovery using the CDP module. The first pass found quite a few devices but took a very long time to complete. The reason turned out to be that some devices still had "public" configured as the community string; an SNMP agent does not answer requests carrying a community string it does not recognize, so Discovery sat waiting on timeouts for those devices. We adjusted the Discovery credentials to pick up these errant devices and re-ran Discovery. This time, it completed more quickly.
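The credential adjustment amounts to trying an ordered list of community strings per device. Below is a minimal sketch of that idea, not how LMS Discovery is implemented: `probe` is a hypothetical callable standing in for a real SNMP GET (injected so the logic can be tested without a network), and `show-ro` is a made-up name for the show's read-only string. Devices that only answer to "public" are exactly the ones that need remediation.

```python
from typing import Callable, Iterable, Optional

# Hypothetical probe: returns True if `device` answered an SNMP GET sent with
# `community`, False if the request timed out. A real implementation would wrap
# an SNMP library; it is passed in here so the fallback logic stays testable.
Probe = Callable[[str, str], bool]


def find_working_community(device: str, communities: Iterable[str],
                           probe: Probe) -> Optional[str]:
    """Return the first community string the device responds to, or None."""
    for community in communities:
        if probe(device, community):
            return community
    return None


def discover(devices: Iterable[str], communities: list[str],
             probe: Probe) -> dict[str, Optional[str]]:
    """Map each device to the community string that worked (None if none did)."""
    return {d: find_working_community(d, communities, probe) for d in devices}


if __name__ == "__main__":
    # Simulated switches; names and "show-ro" are hypothetical.
    configured = {"acc-sw-01": "show-ro", "acc-sw-02": "public"}

    def fake_probe(device: str, community: str) -> bool:
        return configured.get(device) == community

    found = discover(configured, ["show-ro", "public"], fake_probe)
    # Devices that only answered to "public" need their community string fixed.
    print(sorted(d for d, c in found.items() if c == "public"))  # ['acc-sw-02']
```

Ordering the list with the intended string first means correctly configured devices are found on the first attempt, and only the stragglers pay for the extra probes.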

Now that we had discovered the network, we needed to start fixing some of the problems we had noticed. Besides the community string mismatches, the devices did not have the LMS server configured as a syslog host. Unfortunately, we did not have any read-write credentials, and we still needed to convince the network staff that we could provide value. Fortunately, there is a lot you can do with read-only SNMP credentials. I went to the LMS Fault Monitor and noticed that it had already detected some events on the network. Some of them looked rather serious.

[Image: events in the LMS Fault Monitor]

We were seeing that some of the XENPAKs on the core VSS were reporting high-temperature alerts. At the same time, one of the network engineers sitting next to us was trying to correlate devices to serial numbers (something LMS could do with ease). Tejas exported a custom report correlating device names to serial numbers and gave it to the engineer. He then took the temperature alert information to the network team, and they were very interested in LMS. Tejas went on to explain what else we could do with full read-write access: configuration archival, compliance management, and the ability to quickly deploy configuration changes to all devices. The team agreed to provide the credentials and to look into the problems that LMS had already started to report.

No sooner did we have read-write access to the devices than the network team wanted our help. Besides the problematic SNMP credentials, some of the devices did not have the proper TACACS+ secret key configured. To find those devices, we configured a baseline compliance template requiring every device to have the following global configuration:

+ tacacs-server key [KEY]

We ran a compliance check and found four non-compliant devices. A few clicks later, we had pushed the compliant configuration down to those devices.
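For readers who want to see what such a global-configuration check boils down to, here is a minimal Python sketch. It is an illustration, not LMS's compliance engine: it assumes each device's running config is available as plain text and that the key appears in cleartext (as it would with `no service password-encryption`), and the device names and key value are hypothetical.

```python
import re


def missing_tacacs_key(configs: dict[str, str], key: str) -> list[str]:
    """Return the devices whose running config lacks 'tacacs-server key <key>'.

    re.MULTILINE lets ^ and $ anchor at each line, so only a global
    (unindented) line carrying exactly this key counts as compliant.
    """
    required = re.compile(rf"^tacacs-server key {re.escape(key)}$", re.MULTILINE)
    return sorted(name for name, text in configs.items()
                  if not required.search(text))


def remediation(devices: list[str], key: str) -> dict[str, list[str]]:
    """Global command to push to each non-compliant device."""
    return {name: [f"tacacs-server key {key}"] for name in devices}


if __name__ == "__main__":
    # Hypothetical device names, configs, and key, for illustration only.
    configs = {
        "acc-sw-01": "hostname acc-sw-01\ntacacs-server key s3cret\n",
        "acc-sw-02": "hostname acc-sw-02\ntacacs-server key oldkey\n",
        "acc-sw-03": "hostname acc-sw-03\n",
    }
    bad = missing_tacacs_key(configs, "s3cret")
    print(bad)  # ['acc-sw-02', 'acc-sw-03']
    print(remediation(bad, "s3cret"))
```

Both the wrong-key device and the device with no key at all are flagged, which matches the goal of finding devices without the *proper* secret rather than just any secret.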

Meanwhile, June Zheng (CUCMS product manager) worked on setting up CUCMS. She discovered the two CUCMs, the MGCP gateway, and the phones that were online at the time. CUCMS performed perfectly: it found all of the devices without any issues and started monitoring each node for problems.

Back in LMS, we wanted to make sure LMS could see all of the network events, so we deployed a job to add the LMS server's IP address as a syslog host on each device. For this, we walked one of the network team's engineers through configuring the Netconfig syslog template to add the new logging host. It was a good opportunity to show off the ease and power of LMS.

It was around this time that the network team started to notice problems with multicast. Given all of the video at CiscoLive!, the network was carrying a lot of multicast and broadcast traffic, which was triggering storm events on the switches. The team needed to deploy modified multicast and broadcast storm control policies to all access ports and all distribution uplinks. This sounded like another perfect job for baseline compliance. We confirmed that all access ports had the command switchport mode access configured, so we defined the following baseline prerequisite for our multicast template:

Name: CheckAccessPorts

IsPrereq: Yes

Sub-mode: interface [#((Fast)|(Gigabit))Ethernet.*#]

Body:

+ switchport mode access

For ports that match that prerequisite, we would apply the following template:

Name: ChangeStormControl

IsPrereq: No

Parent: CheckAccessPorts

Prereq: CheckAccessPorts

Body:

+ storm-control multicast level pps 2000

+ storm-control broadcast level pps 2000
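To make the prerequisite/child relationship concrete, here is a minimal Python sketch of the same logic. It is not how LMS evaluates baseline templates; it simply reuses the template's sub-mode expression (the part between `[#` and `#]`), the prerequisite line, and the required lines, and it assumes the running config arrives as plain text with sub-mode commands indented by one space. Interfaces that fail the prerequisite, such as trunk uplinks, are skipped, and only missing lines are reported. Commands are compared as literal text, so a real checker might need to normalize values that a device displays differently.

```python
import re

# Selector and lines copied from the CheckAccessPorts / ChangeStormControl
# templates; the matching logic itself is an illustrative assumption.
SUBMODE = re.compile(r"interface ((Fast)|(Gigabit))Ethernet.*")
PREREQ = "switchport mode access"
REQUIRED = ["storm-control multicast level pps 2000",
            "storm-control broadcast level pps 2000"]


def interface_blocks(config: str) -> dict[str, list[str]]:
    """Split a running config into {'interface X': [sub-mode commands]}."""
    blocks: dict[str, list[str]] = {}
    current = None
    for line in config.splitlines():
        if line.startswith("interface "):
            current = line.strip()
            blocks[current] = []
        elif current is not None and line.startswith(" "):
            blocks[current].append(line.strip())
        else:
            current = None  # "!" or any global command closes the sub-mode
    return blocks


def storm_control_gaps(config: str) -> dict[str, list[str]]:
    """For interfaces that pass the prerequisite, list the required lines missing."""
    gaps: dict[str, list[str]] = {}
    for header, body in interface_blocks(config).items():
        if SUBMODE.fullmatch(header) and PREREQ in body:
            missing = [cmd for cmd in REQUIRED if cmd not in body]
            if missing:
                gaps[header] = missing
    return gaps


if __name__ == "__main__":
    sample = (
        "interface FastEthernet0/1\n switchport mode access\n"
        " storm-control multicast level pps 2000\n!\n"
        "interface FastEthernet0/48\n switchport mode trunk\n!\n"
    )
    # Fa0/1 is an access port missing the broadcast line; Fa0/48 is skipped.
    print(storm_control_gaps(sample))
```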

For the distribution uplinks, we had to remove the existing statically configured policy.  We created the following template to do that:

Name: RemoveStormPolicy

IsPrereq: No

Sub-mode: interface [#((Fast)|(Gigabit)|(Tengigabit))Ethernet.*#]

Body:

- storm-control multicast level 1.00
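Here the `-` prefix works the other way around: the line must be absent. The sketch below, again an illustration rather than LMS's implementation, finds the forbidden line under matching interfaces and emits the `no` form of the command to remove it. The template's regex is compiled case-insensitively here, an assumption made so that `(Tengigabit)` also matches the `TenGigabitEthernet` spelling IOS uses; the interface names in the example are hypothetical.

```python
import re

# Sub-mode selector from the RemoveStormPolicy template; IGNORECASE is an
# assumption so that "(Tengigabit)" also matches IOS's "TenGigabitEthernet".
SUBMODE = re.compile(r"interface ((Fast)|(Gigabit)|(Tengigabit))Ethernet.*",
                     re.IGNORECASE)
FORBIDDEN = "storm-control multicast level 1.00"


def removal_commands(config: str) -> list[str]:
    """Return config commands that remove FORBIDDEN from matching interfaces."""
    commands: list[str] = []
    header = None
    for line in config.splitlines():
        if not line.startswith(" "):
            # Any unindented line starts a new block or closes the current one.
            stripped = line.strip()
            header = stripped if SUBMODE.fullmatch(stripped) else None
        elif header is not None and line.strip() == FORBIDDEN:
            commands += [header, " no " + FORBIDDEN]
    return commands


if __name__ == "__main__":
    sample = "\n".join([
        "interface TenGigabitEthernet1/1",
        " description *** Uplink ***",
        " storm-control multicast level 1.00",
        "!",
        "interface GigabitEthernet1/2",
        " storm-control broadcast level 1.00",
        "!",
    ])
    print("\n".join(removal_commands(sample)))
```

Only the ten-gigabit uplink is touched; the broadcast policy on the other interface does not match the forbidden line and is left alone.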

We ran the compliance report, then applied the required commands to the non-compliant devices. By this point it was very close to midnight, and I worried that I would not be able to catch the train back to the hotel. I packed up to leave, but Tejas, who was staying at a hotel within walking distance, stayed behind to keep refining the network configuration.

Remember those devices with the community string of "public"? Tejas used LMS to deploy compliant configurations to those devices so that they received the correct community strings. Even though the management network was restricted, we wanted to be extra careful about management access, so Tejas also deployed an ACL restricting which hosts could use the community strings. When all was said and done, the configuration (for the access switches, at least) looked like the following:

version 12.2
no service pad
service timestamps debug datetime msec
service timestamps log datetime msec
no service password-encryption
!
hostname SWITCH
!
boot-start-marker
boot-end-marker
!
enable secret 5 <removed>
!
!
!
aaa new-model
!
!
aaa authentication login default group tacacs+ local
!
!
!
aaa session-id common
system mtu routing 1500
vtp domain networkers
vtp mode transparent
ip domain-name events-cisco.com
ip name-server 172.16.14.5
!
!
ip dhcp snooping vlan 2-4,6-13,15,19,21-25,30-38,42-43
no ip dhcp snooping information option
ip dhcp snooping
!
!
energywise domain clive01 security shared-secret 0 <removed>
energywise role switch
energywise management security shared-secret 0 <removed>
energywise allow query save
!
energywise endpoint security shared-secret 0 <removed>
!
crypto pki trustpoint TP-self-signed-3891214336
 enrollment selfsigned
 subject-name cn=IOS-Self-Signed-Certificate-3891214336
 revocation-check none
 rsakeypair TP-self-signed-3891214336
!
!
crypto pki certificate chain TP-self-signed-3891214336
 certificate self-signed 01
!
!
!
errdisable recovery cause bpduguard
errdisable recovery interval 30
!
spanning-tree mode rapid-pvst
spanning-tree extend system-id
!
vlan internal allocation policy ascending
!
vlan 2
 name level_3_speaker
!
vlan 3
 name wired_internet
!
vlan 4
 name wired_internet_2
!
vlan 6
 name wireless_1st_floor
!
vlan 7
 name wireless_2nd_floor
!
vlan 8
 name wireless_3rd_floor
!
vlan 9
 name wireless_spare
!
vlan 10
 name partner_1_wos
!
vlan 11
 name partner_2_wos
!
vlan 12
 name partner_3_wos
!
vlan 13
 name partner_4_wos
!
vlan 15
 name voice
!
vlan 19
 name ccc
!
vlan 21
 name demos1
!
vlan 22
 name demos2
!
vlan 23
 name video
!
vlan 24
 name demos4
!
vlan 25
 name ciscolabs1
!
vlan 30
 name printing
!
vlan 31
 name ciscostaff
!
vlan 32
 name signage
!
vlan 33
 name breakouts
!
vlan 34
 name WG-REG
!
vlan 35
 name WG-SAC
!
vlan 36
 name WG-KIOSK
!
vlan 37
 name capwapp
!
vlan 38
 name public
!
vlan 42-43
!
vlan 250
 name OOB-management
!
!
!
!
interface range FastEthernet0/1 - 47
 description *** Access Port ***
 switchport access vlan 10
 switchport mode access
 switchport nonegotiate
 switchport port-security maximum 5
 switchport port-security
 storm-control broadcast level pps 1k
 storm-control multicast level pps 2k
 no cdp enable
 no cdp tlv server-location
 no cdp tlv app
 spanning-tree portfast
 spanning-tree bpduguard enable
!
interface FastEthernet0/48
 description *** To Distribution Switch ***
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport nonegotiate
 ip dhcp snooping trust
!
interface GigabitEthernet0/1
!
interface GigabitEthernet0/2
!
interface GigabitEthernet0/3
!
interface GigabitEthernet0/4
!
interface Vlan1
 no ip address
 shutdown
!
interface Vlan250
 description *** OOB Management Interface ***
 ip address 172.100.100.X 255.255.255.0
!
ip default-gateway 172.100.100.1
ip classless
ip http server
ip http secure-server
!
ip tacacs source-interface Vlan250
!
ip access-list standard RESTRICT_SNMP
 permit 172.100.100.0 0.0.0.255
 deny   any
!
ip sla enable reaction-alerts
logging 172.100.100.16
logging 172.100.100.23
!
snmp-server community <removed> RW RESTRICT_SNMP
snmp-server community <removed> RO RESTRICT_SNMP
tacacs-server host 172.100.100.17
tacacs-server directed-request
tacacs-server key <removed>
!
banner motd ^C
##############################################################
##                                                          ##
##               Cisco CPOC Networkers Team                 ##
##                                                          ##
##           UNAUTHORIZED ACCESS IS PROHIBITED              ##
##                                                          ##
##      All sessions to this device are being monitored.    ##
##      If unauthorized access is detected, your address    ##
##      will be logged and the authorities will be          ##
##      notified to take appropriate actions.               ##
##                                                          ##
##############################################################
^C
!
line con 0
 password <removed>
 logging synchronous
line vty 0 4
 password <removed>
 logging synchronous
line vty 5 15
 password <removed>
 logging synchronous
!
ntp clock-period 36029214
ntp server 172.100.100.1
end
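The RESTRICT_SNMP ACL in this configuration uses wildcard masks, where a 0 bit must match and a 1 bit is ignored, and IOS evaluates entries top-down, ending with an implicit deny. The small Python model below, built on the standard library's `ipaddress` module, shows which SNMP sources the switches would answer. The entry list mirrors the config; the test addresses are illustrative (172.100.100.16 is one of the configured logging hosts).

```python
import ipaddress

# (action, address, wildcard mask) entries mirroring the RESTRICT_SNMP ACL;
# "deny any" is written out as address 0.0.0.0 with wildcard 255.255.255.255.
RESTRICT_SNMP = [
    ("permit", "172.100.100.0", "0.0.0.255"),
    ("deny", "0.0.0.0", "255.255.255.255"),
]


def wildcard_match(src: str, address: str, wildcard: str) -> bool:
    """True if src equals address on every bit where the wildcard mask is 0."""
    s = int(ipaddress.IPv4Address(src))
    a = int(ipaddress.IPv4Address(address))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF  # bits that must match
    return (s & care) == (a & care)


def evaluate(acl: list[tuple[str, str, str]], src: str) -> str:
    """Return the action of the first matching entry; otherwise the implicit deny."""
    for action, address, wildcard in acl:
        if wildcard_match(src, address, wildcard):
            return action
    return "deny"


if __name__ == "__main__":
    for source in ["172.100.100.16", "172.100.101.16", "10.1.1.1"]:
        print(source, evaluate(RESTRICT_SNMP, source))
    # 172.100.100.16 permit / 172.100.101.16 deny / 10.1.1.1 deny
```

The explicit `deny any` changes nothing functionally, since the implicit deny would catch the same traffic, but it makes the intent visible to anyone reading the config.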

Showtime

The hub for network operations was an octagon-shaped "fishbowl" NOC in the middle of the World of Solutions. One side held the entrance to the NOC. Another side had a glass window and cute signs like "Don't feed the engineers." The remaining six sides had monitors showing the network management applications used to manage the production show network. Let's take a brief stroll around the NOC.

Here I am next to the entryway.

[Image: the author at the NOC entryway]

One of the monitors showed a slide show of the Cisco-on-Cisco story, highlighting the technologies used in the NOC and in the network.

[Image: Cisco-on-Cisco slide show monitor]

The next monitor showed CUCMS managing our CUCM cluster. This cluster consisted of one publisher, one subscriber, and one MGCP gateway. The cluster serviced all of the public phones in the venue. We were offering customers five free minutes of calling to anywhere in the world.

[Image: CUCMS monitoring the CUCM cluster]

The IPv6 team was using Munin to measure IPv6 end hosts and traffic. CiscoLive! London set a record for the number of IPv6 end hosts. Because operating systems such as Mac OS X and Windows 7 have IPv6 enabled by default, users were getting IPv6 addresses and accessing the Internet without ever knowing the difference.

[Image: Munin IPv6 graphs]

Not to be outdone, we set up a poller in LMS to monitor traffic on the IPv6 router.

[Image: LMS poller on the IPv6 router]

Our next stop is EnergyWise. In addition to using LMS to manage EnergyWise, we had EnergyWise Orchestrator running to monitor the power consumption of the network. While we were not powering off devices or ports, we were able to measure the overall power consumption of the show.

[Image: EnergyWise Orchestrator power consumption]

That brings us to LMS.  More on the uses of LMS later.

[Image: LMS on the NOC monitor]

Finally, you cannot run a trade show like this without wireless. The wireless team used a controller-based wireless network and managed it with the Cisco Wireless Control System (WCS). Using WCS, they were able to spot areas of weak coverage and isolate interference issues.

[Image: WCS on the NOC monitor]

Throughout the show, we regularly relied on LMS to report faults and measure capacity. At one point, the IP video surveillance team reported choppy or hanging video from a set of access switches connected to a particular distribution switch. Using Topology Services in LMS, we traced the Layer 2 path between the camera's access switch and one of the hosts reporting the problem. We then used NetShow to look at errors on each of the ports. This showed that one of the uplink ports was reporting thousands of output errors. Replacing the XENPAK on this port corrected the problem. After that, we set up a performance poller to watch the distribution uplinks for errors.
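The core calculation behind such an error poller is the change in an error counter between two polls. The following is a minimal sketch, assuming the per-interface values come from the IF-MIB `ifOutErrors` object, which is a 32-bit counter that wraps back to zero; the interface names and threshold are hypothetical, and this is not how LMS implements its performance pollers.

```python
COUNTER32_WRAP = 2**32  # ifOutErrors is a Counter32, so it wraps back to 0


def counter_delta(previous: int, current: int) -> int:
    """Increase in a Counter32 between two polls, allowing for a single wrap."""
    if current >= previous:
        return current - previous
    # A decrease is treated as a wrap; a device reload would also cause one,
    # and this sketch cannot tell the two apart.
    return current + COUNTER32_WRAP - previous


def flag_uplinks(prev: dict[str, int], curr: dict[str, int],
                 threshold: int) -> dict[str, int]:
    """Return {interface: new errors} for interfaces whose errors grew past threshold."""
    flagged: dict[str, int] = {}
    for ifname, value in curr.items():
        if ifname in prev:
            delta = counter_delta(prev[ifname], value)
            if delta > threshold:
                flagged[ifname] = delta
    return flagged


if __name__ == "__main__":
    # Hypothetical uplink names, counter samples, and threshold.
    first = {"Te1/1": 100, "Te1/2": 4294967290}
    second = {"Te1/1": 5100, "Te1/2": 20}
    print(flag_uplinks(first, second, threshold=1000))  # {'Te1/1': 5000}
```

Comparing deltas rather than raw counter values keeps a port with a large but stale error history from being flagged forever, while a port like the one with the failing XENPAK stands out immediately.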

For capacity, we monitored the number of end hosts in User Tracking. Since we were not using dynamic User Tracking, we had to rely on periodic acquisitions. The peak number of end hosts recorded for the show was around 5,800.
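Because acquisitions are point-in-time snapshots, a "peak" from them is simply the largest single acquisition, and hosts that join and leave between acquisitions are never seen. A minimal sketch, assuming each acquisition has been reduced to a set of end-host MAC addresses (the sample addresses are made up):

```python
def summarize_acquisitions(acquisitions: list[set[str]]) -> tuple[int, int]:
    """Return (peak hosts in a single acquisition, distinct hosts across all)."""
    peak = max((len(snapshot) for snapshot in acquisitions), default=0)
    distinct = len(set().union(*acquisitions))
    return peak, distinct


if __name__ == "__main__":
    # Hypothetical MAC addresses from three acquisitions.
    snapshots = [
        {"0000.0c00.0001", "0000.0c00.0002"},
        {"0000.0c00.0002", "0000.0c00.0003", "0000.0c00.0004"},
        {"0000.0c00.0001"},
    ]
    print(summarize_acquisitions(snapshots))  # (3, 4)
```

The gap between the two numbers hints at how much churn the acquisitions capture; dynamic tracking would be needed to see hosts that come and go between snapshots.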

Conclusion

Cisco network management performed well at CiscoLive! London. We used it to find faults, maintain consistent configurations, and track capacity. By adopting a Cisco-on-Cisco approach for shows like CiscoLive!, we show off the power of our products and also see areas where we can continue to improve. I am looking forward to taking what I've learned to my next CiscoLive! in Las Vegas. I hope to see you there.

4 Comments
Enthusiast

Respected Joe Sir,

Thank you, Joe Sir, for sharing this event with us. It was indeed a wonderful experience to see the power of LMS in a live environment and how we can make the best of it for network operations.

It's always a great pleasure to learn more and more from you, and every time it seems there are still so many things to learn from you.

Regards

Gaganjeet

Beginner

Joe,

Thanks for sharing and showing how your LMS setup was deployed. Wish I was there.

Hall of Fame Guru

Thanks for the behind the scenes report, Joe. Very nice.

I hope you got in a show at the West End or some sights whilst in London.

Hall of Fame Cisco Employee

Thanks.  I did get to see some things after the show ended.  London is a great town.
