Improve your network with AI, ML and MR

pkathail@cisco.com · 02-04-2020


My ‘check engine’ light was on and my car was idling a little rough, so I took it to the auto parts store. The service rep walked out to my car, plugged in a handheld engine scanner, and after a minute told me that the coil on my number 3 cylinder was faulty and needed to be replaced. I bought a new one, unhooked the wire, removed one bolt, dropped in the new coil, and everything was back to normal.

It was a remarkable experience, and so different from how troubleshooting used to be. How did that scanner know what was wrong? It combined volumes of data from my car with an auto repair knowledge base and parts-sales data to pinpoint the problem accurately.

Could something like this happen in your enterprise network? What if your network became a sensor itself and combined the information it collects with your network expertise and an external knowledge base, built from the experience of Cisco and industry peers, to help you troubleshoot? You would save time and money, and your network users would have a far better experience when they encounter an issue.

IBN Advances Beyond SDN

There’s a lot of interest in automating the network using software-defined networking (SDN). In fact, in our 2019 Networking Trends Survey, 41% of respondents had deployed controller-based policy automation in at least one network domain.

But as many organizations try to implement SDN, they quickly realize that it falls short of expectations because it focuses only on the activation piece. A ‘set it and forget it’ approach, which automates policies without continuously monitoring them, just won’t suffice in large enterprises where the business environment (locations, users, applications, initiatives, etc.) changes dynamically.

Intent-based networking (IBN), on the other hand, looks at the network more holistically. It starts by turning business intent into policies, activates those policies, and then continuously monitors the network to ensure that those policies behave as intended and deliver the required services and SLAs. When there is an issue, the assurance engine sends an alert to the network operator along with recommended remediation steps.
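To make the intent, activation, and assurance loop more concrete, here is a minimal sketch in Python. It assumes a toy notion of intent ("application X needs at least N Mbps at every site"); the `Policy` and `Alert` types and the `translate_intent`, `activate`, and `assure` functions are hypothetical illustrations, not Cisco DNA Center APIs, and real activation would push configuration to devices rather than just index policies.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Policy:
    """A policy derived from business intent (hypothetical schema)."""
    app: str
    min_throughput_mbps: float


@dataclass
class Alert:
    site: str
    app: str
    observed_mbps: float
    remediation: str


def translate_intent(intent: dict[str, float]) -> list[Policy]:
    """Turn 'app X needs at least N Mbps' intent into policy objects."""
    return [Policy(app, mbps) for app, mbps in intent.items()]


def activate(policies: list[Policy]) -> dict[str, Policy]:
    """Stand-in for pushing policies to devices: here we only index them by app."""
    return {p.app: p for p in policies}


def assure(active: dict[str, Policy],
           telemetry: list[tuple[str, str, float]]) -> list[Alert]:
    """Compare observed throughput with each active policy and alert on violations."""
    alerts = []
    for site, app, observed in telemetry:
        policy = active.get(app)
        if policy is not None and observed < policy.min_throughput_mbps:
            alerts.append(Alert(
                site, app, observed,
                f"Check the {app} path at {site}: expected at least "
                f"{policy.min_throughput_mbps} Mbps",
            ))
    return alerts


# One pass of the loop: intent -> policies -> activation -> assurance.
active = activate(translate_intent({"crm": 50.0, "video": 20.0}))
samples = [("site-a", "crm", 62.0), ("site-b", "crm", 41.0), ("site-b", "video", 25.0)]
for alert in assure(active, samples):
    print(alert.site, alert.app, alert.observed_mbps, "->", alert.remediation)
```

In a real deployment the `assure` step would run continuously on streaming telemetry; the point of the sketch is only that assurance checks observed behavior against the same policies that activation installed.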

The Network as a Sensor

How does an intent-based network pull this off? It uses the data generated by the network to feed insights and analytics back to the network controller and the network operators for remediation and optimization. Cisco’s DNA Center turns the network into its own sensor and continuously collects telemetry data from devices (routers, switches, wireless devices, endpoints, etc.) as well as from applications running on the network.

The devices continuously send this data to the Cisco DNA Center data lake, where it is stored and checked against the models we have built to see whether the data fits those models.

Machine Learning Runs on Anonymized Data in the Cloud

Cisco doesn’t take any data into the cloud without your knowledge. If you opt in, Cisco DNA Center sends anonymized data to the cloud to update the model as your network properties change, for example when the network topology or configuration changes. Machine learning runs continuously in the cloud, updating the model from this anonymized data. Model changes are sent back to Cisco DNA Center (DNAC), where the model is applied.
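The post does not say how the anonymization is done. One common technique, shown here only as an assumed illustration, is keyed pseudonymization: identifying fields are replaced by an HMAC computed with a key that never leaves the controller, so the cloud side sees stable tokens it cannot reverse, while numeric telemetry passes through unchanged. The field names in `IDENTIFYING_FIELDS` and the `prepare_upload` opt-in gate are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical site-local key: it stays on the controller, so the cloud
# cannot map pseudonyms back to real identifiers.
SITE_KEY = secrets.token_bytes(32)

# Fields assumed to identify a customer, device, or user.
IDENTIFYING_FIELDS = {"hostname", "mgmt_ip", "client_mac", "site_name"}


def pseudonymize(value: str, key: bytes = SITE_KEY) -> str:
    """Keyed hash: the same input always maps to the same token for a given key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def anonymize_record(record: dict, key: bytes = SITE_KEY) -> dict:
    """Replace identifying fields with pseudonyms and keep metrics as-is."""
    return {
        k: pseudonymize(str(v), key) if k in IDENTIFYING_FIELDS else v
        for k, v in record.items()
    }


def prepare_upload(records: list, opted_in: bool) -> list:
    """Nothing is exported unless the operator has opted in."""
    if not opted_in:
        return []
    return [anonymize_record(r) for r in records]


records = [{
    "hostname": "ap-lobby-3",
    "client_mac": "aa:bb:cc:dd:ee:ff",
    "throughput_mbps": 41.0,
    "retransmissions": 870,
}]
print(prepare_upload(records, opted_in=False))
print(prepare_upload(records, opted_in=True))
```

Because the same identifier always yields the same token, the cloud model can still learn per-device patterns (for example, that one AP's retransmissions keep rising) without learning which AP it is.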

Our data lake also lets you compare your sites against each other, or against anonymized data from installations of similar size and industry.
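A peer comparison like this can be sketched as two steps: select peers of the same industry and similar size, then report where your site falls among them. This is an assumed illustration; the `SiteSummary` fields, the onboarding-time metric, and the plus-or-minus 50% size band are hypothetical choices, not how Cisco DNA Center defines peers.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class SiteSummary:
    industry: str
    num_aps: int          # used as a rough proxy for installation size
    onboarding_s: float   # hypothetical metric: median client onboarding time


def similar_peers(me: SiteSummary, pool: list[SiteSummary],
                  size_tolerance: float = 0.5) -> list[SiteSummary]:
    """Peers in the same industry with an AP count within +/- size_tolerance of ours."""
    lo = me.num_aps * (1 - size_tolerance)
    hi = me.num_aps * (1 + size_tolerance)
    return [p for p in pool if p.industry == me.industry and lo <= p.num_aps <= hi]


def percentile_rank(value: float, peer_values: list[float]) -> float:
    """Percentage of peer values strictly below `value` (0 to 100)."""
    if not peer_values:
        raise ValueError("need at least one peer value")
    return 100.0 * sum(1 for v in peer_values if v < value) / len(peer_values)


me = SiteSummary("retail", 100, 4.6)
pool = [
    SiteSummary("retail", 80, 2.9),
    SiteSummary("retail", 120, 3.3),
    SiteSummary("retail", 140, 5.2),
    SiteSummary("retail", 400, 1.5),      # too large to be a peer
    SiteSummary("healthcare", 100, 2.0),  # different industry
]
peers = similar_peers(me, pool)
values = [p.onboarding_s for p in peers]
rank = percentile_rank(me.onboarding_s, values)
print(f"{len(peers)} peers, peer median {median(values)} s, our percentile {rank:.0f}")
```

For a lower-is-better metric such as onboarding time, a high percentile means the site is slower than most of its peers and may be worth investigating.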

Machine Reasoning Engine (MRE) Provides Custom Troubleshooting Capabilities

Machine learning detects anomalies and performs the initial root cause analysis, narrowing down the problem and the associated remediation steps. When machine learning identifies multiple potential causes, it can invoke the MRE for more targeted root cause analysis and identification of remediation steps.
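The hand-off described above can be sketched as a small dispatcher: keep the candidate causes that correlate strongly with the anomaly, settle on a single candidate directly, and pass several candidates to a reasoning step. The 0.7 correlation threshold, the `diagnose` function, and the `fake_mre` placeholder are all hypothetical; a real MRE would run a knowledge-base workflow rather than pick the top candidate.

```python
from __future__ import annotations

from typing import Callable


def rank_causes(correlations: dict[str, float], threshold: float = 0.7) -> list[str]:
    """Keep candidate causes strongly correlated with the anomaly, strongest first."""
    strong = [(abs(r), name) for name, r in correlations.items() if abs(r) >= threshold]
    return [name for _, name in sorted(strong, reverse=True)]


def diagnose(correlations: dict[str, float],
             run_mre: Callable[[list[str]], str]) -> dict:
    candidates = rank_causes(correlations)
    if not candidates:
        return {"handled_by": "none", "causes": []}
    if len(candidates) == 1:
        # ML alone has narrowed the problem to a single cause.
        return {"handled_by": "ml", "causes": candidates}
    # Several plausible causes: hand off to the reasoning engine.
    return {"handled_by": "mre", "causes": [run_mre(candidates)]}


def fake_mre(candidates: list[str]) -> str:
    """Placeholder for a knowledge-base workflow; simply returns the top candidate."""
    return candidates[0]


print(diagnose({"retransmissions": -0.93, "client_count": 0.2}, fake_mre))
print(diagnose({"retransmissions": -0.93, "channel_utilization": -0.81}, fake_mre))
```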

The MRE uses a predefined knowledge base. With 35 years of networking experience, we know how to diagnose difficult network problems, and we have built a knowledge base that locates the problem and recommends the necessary remediation steps.

Examples of ML and MR

Problem: A user experiences a 20% drop in throughput for a cloud application when connecting through a particular access point (AP). What is going on?

Machine learning (ML), a subset of AI, lets a machine use algorithms to learn from data and make sense of the large volumes the network generates. In this case, the system continuously monitors application throughput across all APs at a given site and, using ML, establishes over time a baseline of normal throughput for each application. The system also continuously monitors other vital telemetry from each AP, including packet retransmission counters, and again uses ML to establish what constitutes normal behavior (i.e., a baseline) for that AP.

When the user experiences the 20% drop in throughput, the ML process identifies an anomaly in the throughput measurements of the associated AP and correlates it with an unusual increase in that AP’s packet retransmission counts. With this data, the ML system can determine that the user is experiencing a problem (a 20% drop in throughput) and attribute it to the increased packet retransmissions. So far, ML has identified the problem and pinpointed a cause.
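The baseline, anomaly, and correlation steps can be illustrated with simple statistics. This sketch swaps in a z-score against historical mean and standard deviation for the baseline and a Pearson coefficient for the correlation; the post does not describe Cisco's actual models, and the throughput and retransmission numbers below are invented for illustration.

```python
from __future__ import annotations

from math import sqrt
from statistics import mean, stdev


def zscore(history: list[float], value: float) -> float:
    """Distance of `value` from the historical baseline, in standard deviations."""
    mu, sigma = mean(history), stdev(history)
    return 0.0 if sigma == 0 else (value - mu) / sigma


def is_anomaly(history: list[float], value: float, z_threshold: float = 3.0) -> bool:
    return abs(zscore(history, value)) >= z_threshold


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical per-AP history used as the learned baseline.
throughput_history = [100, 102, 98, 101, 99, 100, 103, 97]  # Mbps
retrans_history = [50, 52, 48, 51, 49, 50, 53, 47]          # retransmissions per minute

# Latest readings: throughput down 20%, retransmissions sharply up.
throughput_now, retrans_now = 80, 400

# Recent window used to check whether the two metrics move together.
throughput_window = [100, 98, 92, 85, 80]
retrans_window = [50, 70, 150, 280, 400]

if is_anomaly(throughput_history, throughput_now) and is_anomaly(retrans_history, retrans_now):
    r = pearson(throughput_window, retrans_window)
    print(f"throughput anomaly correlated with retransmissions (r = {r:.2f})")
```

A strongly negative coefficient says throughput falls as retransmissions rise, which is enough to attribute the symptom to retransmissions but not to explain why they increased.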

But what is the ultimate root cause of the increased retransmissions? This is where the Machine Reasoning Engine (MRE) comes in. Using a predefined knowledge base, the MRE executes a workflow that zeroes in on the root cause of the problem. It collects additional contextual information from the network to work its way down a decision tree: as defined in the knowledge base, it checks the AP configuration as well as operational data at that AP or across the network to identify the cause of the dropped packets. In this example, it checks the Control and Provisioning of Wireless Access Points (CAPWAP) retransmission rates, the CAPWAP MTU, and the TCP MSS (Maximum Segment Size) at the AP. The engine finds a mismatch between the CAPWAP MTU and the TCP MSS, so it concludes that the remediation is to change the AP’s configuration so that the TCP MSS setting fits within the CAPWAP MTU.
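The decision-tree workflow can be sketched as an ordered list of rules, each inspecting collected context and either returning a conclusion or passing control to the next rule. Everything here is an assumed illustration: the `ctx` keys stand in for data the MRE would collect from the AP, the 5% retransmission threshold is arbitrary, and `tunnel_overhead` is a placeholder parameter rather than a documented CAPWAP value. The only fixed numbers are the minimum IPv4 and TCP header sizes (20 bytes each, without options).

```python
from __future__ import annotations

IPV4_HEADER = 20  # bytes, without options
TCP_HEADER = 20   # bytes, without options


def max_mss(capwap_mtu: int, tunnel_overhead: int) -> int:
    """Largest TCP MSS whose segments still fit through the tunnel unfragmented."""
    return capwap_mtu - tunnel_overhead - IPV4_HEADER - TCP_HEADER


# Hypothetical knowledge base: each rule returns (conclusion, action) or None.
def check_retransmissions(ctx: dict):
    if ctx["capwap_retrans_rate"] < 0.05:
        return ("CAPWAP path healthy", "Look elsewhere, e.g. RF conditions")
    return None


def check_mss_vs_mtu(ctx: dict):
    limit = max_mss(ctx["capwap_mtu"], ctx["tunnel_overhead"])
    if ctx["tcp_mss"] > limit:
        return (f"TCP MSS {ctx['tcp_mss']} exceeds {limit} allowed by "
                f"CAPWAP MTU {ctx['capwap_mtu']}",
                f"Set TCP MSS on the AP to at most {limit}")
    return None


KNOWLEDGE_BASE = [check_retransmissions, check_mss_vs_mtu]


def reason(ctx: dict) -> tuple[str, str]:
    """Walk the rules in order and stop at the first conclusive finding."""
    for rule in KNOWLEDGE_BASE:
        finding = rule(ctx)
        if finding is not None:
            return finding
    return ("No known cause", "Escalate to a network engineer")


# Hypothetical context collected from the AP in the example.
ctx = {"capwap_retrans_rate": 0.18, "capwap_mtu": 1500,
       "tunnel_overhead": 60, "tcp_mss": 1460}
print(reason(ctx))
```

Encoding expertise as small, ordered rules keeps each diagnostic step auditable: an operator can see which check fired and why, which matters when the output is a configuration change.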

See What’s Next in Networking

With the growth of devices and applications in distributed environments, networks are only going to get larger. Why not put the right kind of infrastructure in place to help eliminate many human errors, reduce the cost of operating the network, and give your users a better experience? Put your network to work maintaining itself, and shift your focus to creating value-added services that drive the business forward.

Want to learn more about IBN and how networks will change in the near future? Check out the 2020 Global Networking Trends Report.
