Thursday, February 27, 2020

AI in networking: Looking beyond the hype

"The question of whether machines can think is about as relevant as the question of whether submarines can swim." – Edsger Dijkstra

The term “Artificial Intelligence” was coined by the famous computer scientist John McCarthy, primarily as a replacement for “Cybernetics”, which popular culture of the day had romanticized as the utopian answer to every problem. McCarthy tried to distinguish the new field by grounding it in solid mathematics rather than in “what if” scenarios extrapolated from minor progress in narrow corners of the science. Unfortunately, the term “Artificial Intelligence” has suffered the same fate decades later.

Defining true AI is hard, because there is no precise definition of intelligence itself. Is a pig more intelligent than a dog? Is a dog more intelligent than a one-year-old human? How do we quantify these differences? The pursuit of artificial general intelligence is largely confined to academia and the research labs of large multinational corporations.

The majority of the industry uses the term AI for a more specialized, domain-specific, algorithmic enhancement to knowledge compression and expression. It is this narrower concept that we refer to as AI in this article.

The tech industry is rife with confusing and interchangeable use of the terms AI, ML and advanced analytics. The goal of an ethical company should not be to build technology that caters to whichever term is currently hyped, but to build the simplest product that solves a real customer problem. (Occam’s Razor!)

With that disclaimer, we still need some logical separation between the technologies out there, and terms to go with them. Let’s look at some points of separation and overlap between these technologies:

  1. Feedback loop: Most experts agree that ANY intelligent system should have a feedback loop so that it can learn from mistakes and learn new things. So is an artificially intelligent system just an ML system with a feedback loop? And how long should that loop be? It depends on the problem at hand.
    Let’s take the problem of predicting network bandwidth for capacity planning. We could start with a clean slate and have the AI agent learn the network’s behavior over the course of a year, or, if we already have the data, train the model in one shot and deploy it directly. To avoid model drift (where the network’s behavior changes over time), we can retrain every few months so that accuracy does not degrade. Depending on how much data we can throw at the problem and how the problem is framed, an end user might not see any difference between predictions produced by the different methodologies; the sketch after this list illustrates the simpler route.
  2. Feature extraction: Certain problems have fuzzy rules and don’t lend themselves well to manual feature extraction; natural language processing and image labelling are typical examples. Deep neural networks can be fed raw bytes and trained to internally form clusters of neurons (which can be ‘imagined’ as features). Higher-layer neurons then combine these features to predict the desired outcome.
    This approach is unnecessary for problems with fixed feature sets. Coming back to the network capacity planning problem, the features of a network are well quantified (jitter, latency, loss, bandwidth, etc.) and do not need to be deduced by a neural net.
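
To make the contrast concrete, here is a minimal sketch of the capacity-planning case: a plain linear model fitted on already-quantified network features and periodically refit to cope with drift. It assumes Python with pandas and scikit-learn, a DataFrame of historical samples, and hypothetical column names and retraining windows; it is an illustration of the idea, not a prescribed implementation.

    # Minimal sketch (illustrative only): bandwidth forecasting from explicit,
    # already-quantified network features. The column names, the feature set
    # and the one-year/quarterly retraining windows are assumptions.
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    FEATURES = ["jitter_ms", "latency_ms", "loss_pct", "hour_of_day"]
    TARGET = "bandwidth_mbps"

    def train(history: pd.DataFrame) -> LinearRegression:
        """Fit a plain linear model; the features are given by the domain,
        so no neural network is needed to discover them."""
        model = LinearRegression()
        model.fit(history[FEATURES], history[TARGET])
        return model

    def retrain_on_recent(history: pd.DataFrame, days: int = 365) -> LinearRegression:
        """Refit on a recent window (e.g. every quarter) to limit model drift."""
        cutoff = history["timestamp"].max() - pd.Timedelta(days=days)
        return train(history[history["timestamp"] >= cutoff])

    def predict(model: LinearRegression, current: pd.DataFrame) -> pd.Series:
        return pd.Series(model.predict(current[FEATURES]), index=current.index)

Whether this counts as AI, ML or plain statistics is largely invisible to the operator; what they see is the accuracy of the forecast.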


Again, from an end user’s perspective, the gain in accuracy from using a state-of-the-art GAN (generative adversarial network) instead of simple linear regression might be non-existent, or too small to justify the investment of resources. Does the end user care whether they are being helped by AI, ML or advanced data analytics? What if we could quantify the gains for the end user instead?
We could borrow the idea of product maturity levels from autonomous-car research and classify an analytics product into levels:

  • Level Zero – No Analytics: These networking devices provide just basic SNMP/REST API support, and you have to build all the tooling and logic around them yourself.
  • Level One – Basic Analytics Support: With these networking devices, you get analytics support out of the box. You can drill down into traffic patterns, analyze typical issues and take corrective action. Most network analytics dashboards in existence fall into this category.
  • Level Two – Partial Automation: These networking devices allow thresholds to be set for admin notification, and service requests to be automated via pre-canned scripts and service platforms, e.g. a rule saying “Increase the bandwidth by 10% if the bandwidth threshold alarm is active for 10 minutes”. This level also allows for simpler forms of machine learning and could support rules like “if the daily moving average of the traffic rate exceeds the monthly moving average, notify the admin” (see the first sketch after this list).
  • Level Three – Conditional Automation: This level takes a massive leap over Level Two in that the system itself determines the parameters and thresholds for a networking device or for the whole network. At this level, a networking device decides its own thresholds and the extent of its own corrective actions; only the template of the action still needs to be defined by a human. Using the same example as above, the system could “increase the bandwidth by X% if a bandwidth threshold of Y Mbps is breached for Z minutes”, where the values of X, Y and Z are learnt automatically by the system (see the second sketch after this list).
  • Level Four – High Automation: This is the natural next evolutionary step for stabilized Level Three systems. It allows complete automation in controlled use cases and would be built on top of the Level Three abstractions. E.g. if the latency between the Mumbai and San Jose offices increased, the system would invoke Level Three automatons to figure out the cause of the slowdown, and would even try to remedy some of the causes with human-defined actions and notifications.
  • Level Five – Self-Healing Networks: This would be the holy grail of networking products. The automation and machine learning would be sufficiently advanced that most network maintenance tasks require no human intervention at all. The whole network would behave as one distributed self-healing entity that not only heals itself but also optimizes itself for the best performance.
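
To ground the levels a little, here is a minimal sketch of the Level Two moving-average rule from the list above. The window sizes, the per-minute pandas Series of traffic samples and the notify_admin() hook are all hypothetical placeholders, not a real product interface.

    # Level Two sketch: fixed, human-chosen windows and a static rule.
    # `traffic_mbps` is assumed to be a pandas Series of per-minute samples
    # with a datetime index; notify_admin() stands in for the real alerting hook.
    import pandas as pd

    def notify_admin(message: str) -> None:
        print("ALERT:", message)  # placeholder for email/ticketing integration

    def check_moving_average_rule(traffic_mbps: pd.Series) -> None:
        daily_avg = traffic_mbps.rolling("1D").mean().iloc[-1]
        monthly_avg = traffic_mbps.rolling("30D").mean().iloc[-1]
        if daily_avg > monthly_avg:
            notify_admin(f"daily average {daily_avg:.1f} Mbps exceeds "
                         f"monthly average {monthly_avg:.1f} Mbps")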
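
And here is one possible, purely illustrative way a Level Three system could learn the X, Y and Z of the “increase bandwidth by X% if Y Mbps is breached for Z minutes” template from a device’s own history. The 95th-percentile threshold and the way X and Z are derived are assumptions chosen for the sketch, not a recommended method.

    # Level Three sketch: the action template is still human-defined, but the
    # parameters X, Y and Z are derived from the device's own traffic history.
    import numpy as np
    import pandas as pd

    def learn_rule_parameters(history: pd.Series) -> dict:
        """history: per-minute bandwidth samples (Mbps)."""
        threshold_y = float(np.percentile(history, 95))     # learnt threshold Y
        over = history > threshold_y
        runs = (over != over.shift()).cumsum()               # label contiguous runs
        breach_minutes = over.groupby(runs).sum()            # length of each breach
        hold_z = int(max(breach_minutes.max(), 1))           # longest breach sets Z
        overshoot = history[over].mean() / threshold_y - 1 if over.any() else 0.1
        increase_x = int(round(max(overshoot, 0.1) * 100))   # learnt increase X (%)
        return {"increase_pct_x": increase_x,
                "threshold_mbps_y": round(threshold_y, 1),
                "hold_minutes_z": hold_z}

A real system would of course validate such learnt parameters before acting on them; the point is only that the numbers, not the template, come from the data.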

Each level requires a large investment in infrastructure and research, and each is a layer of an onion that you can only reach after peeling away the outer ones. Level One systems are the industry standard. A lot of newer analytics products are Level Two (including Versa Analytics), and we are incredibly close to building a Level Three analytics product.

Huge interest in machine learning, the growing amount of generated data, the falling cost of compute and storage, and the rise of cloud computing are fueling incredible leaps in domain-specific uses of machine learning. We are on our way to producing something truly transformative.