Software-defined networks give you flexibility, but to make them really effective at scale we need to take humans out of the loop and use automation to respond more quickly – like taking an optical link down for maintenance and moving the traffic over to another line automatically as the latency rises. Plus, we need to do that before the speed drops enough to cause problems for the workloads relying on that connection.
That kind of automation will create something more like a "self-driving" network, Juniper platform systems CTO Kireeti Kompella told Data Center Knowledge; just as with self-driving cars, the prospect is exciting but also raises some long-term concerns.
This is about creating adaptive, self-customizing services built on the flexibility of SDNs and Network Function Virtualization, which means that instead of being a monolithic device, network hardware exposes APIs and functions. But even though we have what Kompella calls “power sharing between equipment makers and the people who deploy networks, who want more of a say in how systems are being built,” the problem is that it can also end up like parents fighting and forgetting about the children caught in the middle.
“The kids are the users, the applications, the IoT devices sitting on the edge of the network trying to use the network, and we ignore them. We give them a portal and APIs. We say ‘fill out this profile so I'll know how to deal with you,' but that’s not good enough. We need to think about how to build systems in a different way, and makers and deployers of equipment need to work together to give users a better quality of life.”
SDNs have transformed the way we build networks – and networking hardware – but people and devices keep changing how they use the network, and the network has to keep up. “We can’t keep running after users asking them repeatedly what they want.” That doesn’t work for end users or for software developers, he points out, and using profiles doesn’t give you enough differentiation.
Closing the Loop
He lists five key foundations for self-driving networks. “Telemetry is big data for networks. You also need correlations: It’s not enough to have siloed data; you have to be able to correlate them.”
The network has to be able both to make decisions and to take action on them, mostly without human intervention. “If you get 50,000 false positives and you want a human to process them, that’s not going to happen. With a self-driving system, it can process 49,999 and alert you about one that you need to take care of.” Bringing it all together are ‘intent-based’, declarative networks. “Can I not configure the BGP policy, please; can I just tell you what I'd like my peering to do?”
That creates an adaptive system that can customize your experience by looking at how you’ve used the network in the past, that manages and allocates new resources and that can predict behavior and prepare the network for it. Kompella believes this will need not just machine learning but also a second ‘closed-loop’ system that takes insights from the machine learning system and uses them to take action. “The closed loop is used to measure things, to see whether the objective is being met, make a change, then go back and keep doing that,” he explains.
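The measure, check, change, repeat cycle Kompella describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ToyLink` class, the 20 percent latency reduction per action, and the 50 ms target are all hypothetical, and a real closed loop would act on live telemetry rather than a simulated value.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    iterations: int          # number of corrective actions taken
    final_latency_ms: float  # last measurement
    met: bool                # whether the objective was met


def closed_loop(measure: Callable[[], float],
                act: Callable[[], None],
                target_ms: float,
                max_iterations: int = 10) -> LoopResult:
    """Measure, compare against the objective, make a change, repeat."""
    latency = measure()
    for i in range(max_iterations):
        if latency <= target_ms:
            return LoopResult(i, latency, True)
        act()                # make a change
        latency = measure()  # go back and measure again
    return LoopResult(max_iterations, latency, latency <= target_ms)


class ToyLink:
    """Hypothetical link whose latency falls as traffic is shifted away."""

    def __init__(self, latency_ms: float):
        self.latency_ms = latency_ms

    def measure(self) -> float:
        return self.latency_ms

    def shift_traffic(self) -> None:
        # Assumption for illustration: each shift trims 20% of the latency.
        self.latency_ms *= 0.8


link = ToyLink(latency_ms=100.0)
result = closed_loop(link.measure, link.shift_traffic, target_ms=50.0)
print(result)
```

With these made-up numbers the loop takes four corrective actions (100, 80, 64, 51.2, then about 40.96 ms) before the objective is met. The `max_iterations` bound is one example of the explicit boundary conditions such a loop would probably need.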
In practice, he expects this closed loop will be composed of many smaller systems and will likely use both machine learning and explicit rules and boundary conditions. “If I were to give you an SLA monitoring closed loop, and an LSP (label switched paths) closed loop, and a peering closed loop, and a VNC closed loop; and they start interacting, now I'm looking at a big improvement where my broadband network gateway customers are happy because my underlying infrastructure is making it so that the SLAs are being kept.”
Self-driving networks will be complex to develop. Kompella notes that machine learning systems aren’t good at explaining themselves. “Giving up control is hard; once in a while I will want to ask ‘why did you come up with this and how do you justify that?’”
Gathering the data for the machine learning system is another issue because the way you run your network today can mean the data doesn’t reflect how you want to run a self-driving network. “There are broadly two classes of service providers: the ones who over-provision the network and those who sweat every bit out of the assets. When I'm training the system, is CapEx or OpEx more valuable or more painful? The ones who throw bandwidth at the problem are saying OpEx is horrible, and I’d rather have a network overbuilt by 100 percent if it cuts my OpEx by 50 percent.”
That makes sense when, for every dollar you spend on network hardware, you’re spending three to four dollars operating it (a figure both Juniper and Cisco cite for operational costs); but if you’re automating the network then OpEx drops significantly anyway. “The mentality of throwing bandwidth at the problem is wasting CapEx, and you can’t say ‘it’s OK because I'm saving on OpEx’ because your OpEx is small to non-existent. I have to make sure that I don’t feed my 20 years of network architecture experience that put me on the path of throwing bandwidth at the problem into the machine learning system.”
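The trade-off can be checked with some rough arithmetic. The sketch below is illustrative only: it normalizes baseline hardware spend to one dollar, takes 3.5 as the midpoint of the three-to-four-dollar operating figure, and assumes a hypothetical ratio of 0.25 for an automated network. It also assumes OpEx is tied to the baseline network rather than growing with the overbuilt hardware.

```python
def total_cost(opex_ratio: float,
               overbuild: float = 0.0,
               opex_cut: float = 0.0) -> float:
    """Total cost per dollar of baseline hardware.

    opex_ratio: operating dollars spent per baseline hardware dollar
    overbuild:  extra hardware as a fraction of baseline (1.0 = 100%)
    opex_cut:   fraction of OpEx saved by overbuilding (0.5 = 50%)
    """
    capex = 1.0 + overbuild
    opex = opex_ratio * (1.0 - opex_cut)
    return capex + opex


# 3.5 is the midpoint of the cited range; 0.25 is a made-up automated figure.
for label, ratio in [("manual", 3.5), ("automated", 0.25)]:
    lean = total_cost(ratio)
    overbuilt = total_cost(ratio, overbuild=1.0, opex_cut=0.5)
    print(f"{label}: lean={lean:.3f} overbuilt={overbuilt:.3f}")
```

Under these assumptions, overbuilding by 100 percent wins for a manually operated network (3.75 against 4.50) but loses once OpEx is small (2.125 against 1.25). The break-even point is an OpEx ratio of 2, where the extra dollar of hardware exactly matches the operating cost saved.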
There are also privacy questions if you’re mining user data to give better service, and the setup itself needs to get much simpler: “Today if I ship a machine learning system I have to ship a data scientist with it, so they can tune all the parameters,” Kompella notes.
Putting declarative network ideas into practice can be tricky. “I want my network to run well; how do you operationalize that? We’re still very early in this game. ‘Here's a JUNOS construct for configuring BGP’ is the wrong answer and ‘make my customers happy’ is also the wrong answer: so, what's in between?” One strong possibility is operational research to formalize the intent part of intent-based networking.
OpEx isn’t the only thing self-driving networks may reduce, he points out. “Does the impact of this mean my NOC is now a tenth of what it used to be, or are the people being repurposed to do other things?”
Despite these difficulties, Kompella believes 5G and IoT will make self-driving networks inevitable because we’ll need them for managing performance and security. “You have some IoT devices that only send one byte every few hours like a thermostat; but then you’ve got a home security system uploading gigabytes of images. Plus, you’ve got things that are bursty and things that care about latency. Again, rather than asking the app developer who’s dealing with the other end of IoT device ‘tell me what you want in the network’, why don't you learn it?”
The same goes for detecting unusual and malicious behavior from devices, like the cameras that made up the Mirai botnet, which performed denial-of-service attacks on DNS providers. Behavior analysis would have shown that a device that previously connected to DNS twice a day and occasionally sent large packets to a particular server was suddenly sending lots of small packets to a DNS server 10 times a second. Trying to write rules to detect behavior changes doesn’t work, but machine learning is ideal for that. “Look at the behavior for 15 days or a month and alert me when it deviates from that behavior.”
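As a rough illustration of "learn a baseline, then alert on deviation", the sketch below uses a plain z-score test as a stand-in for a real machine learning model. The 15 daily DNS query counts are invented sample data, and the threshold of three standard deviations is an arbitrary assumption.

```python
from statistics import mean, stdev


def deviates(baseline: list[float], observed: float,
             threshold: float = 3.0) -> bool:
    """Return True if `observed` lies more than `threshold` standard
    deviations from the mean of the learned baseline."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly constant baseline: any change counts as a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical baseline: DNS queries per day from one camera over 15 days.
baseline = [2, 2, 3, 2, 1, 2, 2, 3, 2, 2, 1, 2, 2, 3, 2]

print(deviates(baseline, 2))        # an ordinary day
print(deviates(baseline, 864_000))  # 10 queries a second for a full day
```

The ordinary day is not flagged, while the flood of 864,000 queries is. A production system would presumably track several features per device (packet sizes, destinations, timing) rather than a single daily count.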
“IoT will bring a lot of challenges – the massive numbers of devices and the bandwidth – but the one that scares me the most is security at scale from dumb devices. These devices are dumb enough not to be secure but also not so dumb; if you hack them, they can do a lot of damage,” Kompella warns.