With the once-regular improvements driven by Moore's Law diminishing and processor power demands climbing, Microsoft started using FPGAs in its Bing and Azure infrastructure some years ago to accelerate its own workloads, such as search indexing and software-defined networking. The long-term plan, however, was always to make hardware acceleration available to customers.
“When we did the Bing launch and got phenomenal results worldwide, we said, ‘OK, we really want to take this to Azure to benefit our customers,’” Doug Burger, technical fellow in the Azure hardware division, tells Data Center Knowledge.
Several image classification and recognition models using deep neural networks (ResNet 50, ResNet 152, VGG-16, SSD-VGG, and DenseNet-121) that have been built on the Azure Machine Learning service can now run with FPGA (field-programmable gate array) hardware acceleration in Azure on production services. In addition, the options to package those models in containers and run them on FPGA appliances like Azure Data Box Edge (Intel Arria 10 FPGAs) or on FPGA-enabled hardware from Hewlett Packard Enterprise, Dell EMC, and the like are now in preview.
Training those models in cloud services like Azure Machine Learning gives enterprises access to hardware at scale. Some workloads need enormous scale, with higher bandwidth and lower latency than CPUs can provide but without the higher cost of GPU instances. Intel is hoping to capture mainstream inferencing workloads with its new 2nd Generation Xeon Scalable processors, but the chips can’t match the efficiency of running a large model on an FPGA tuned specifically for it.
“There's a trend towards models just getting bigger and bigger and more expensive, and we don't see any signs of that slowing down,” Burger says. “The gap between what you can do at the unconstrained limit and what you can do with traditional hardware is continuing to grow. The questions used to be, ‘How expensive is it for me to serve this data model?’ Now the question is going to be, ‘Can I serve this model at all?’”
FPGA at the Edge
But running those trained models in the cloud isn’t always practical. Image-recognition software that spots flaws on a factory production line, or rejects sub-standard ingredients that could spoil the next batch of beer in a brewery, has to work in near real-time. Waiting for the images to be uploaded to the cloud for processing would slow production down too much – and in many environments where you’d want to use these techniques, connectivity will be slow, intermittent, or simply not available.
“We found there is a lot of demand for inference on the edge because that’s where the sensors are, so we said, ‘Let's just replicate that,’” Burger says. “If you have a model in the family of models that's supported – and that’s growing rapidly – you say, ‘I want to run this in a container,’ and it's going to run on a machine with FPGAs.”
Customers like Kroeger, Fermilab, and Olympus have been trying that out in a private preview. The response, Burger says, has been, “Wow, I have true real-time AI with a level of accuracy that I couldn't get before or that wasn't economic before.”
Part of the appeal is integration with Azure’s IoT services, which manage the process of deploying containerized machine learning models to IoT devices.
“If you’re already using Azure to manage a lot of IoT devices, you can treat the Data Box Edge server or other servers with our boards – or with boards with the right FPGAs – as an endpoint, and you can just pull down a container and run it the same way you do in the cloud,” Burger says. “You can do your development in cloud with all the DevOps tools and then say, ‘Now I'm ready to deploy,’ and just push a button and integrate it into your Azure IoT Edge management plane.”
FPGA support started with ResNet, which Burger calls the most popular model. Microsoft has added more, including support for transfer learning (where you take a trained model and retrain it for another data set to get high accuracy quickly). He hints that the tools for porting models to run on FPGAs are improving so much that this option may be available directly to customers at some point, but the emphasis will continue to be on simplicity and usability.
“Raw performance is technologically inspiring, but there are a lot of metrics that matter just as much if not more than performance,” he says.
That’s significant because FPGAs have been notoriously difficult to work with, Patrick Moorhead, president and principal analyst at Moor Insights & Strategy, tells Data Center Knowledge.
“Microsoft has done more with FPGAs than I have seen any large company do,” Moorhead says. “Not only are those FPGAs on Azure reprogrammed weekly, but they are also used for different applications, from network acceleration to ML [machine learning] inference. There used to be a belief that no-one reprograms FPGAs, but Microsoft has challenged that belief.”
Customers can reap the benefit of that expertise through the new service, he says, adding that “Microsoft has created an FPGA capability that makes it easier for organizations to more easily access FPGAs at scale.”
Small, Precise, Affordable
Being able to train models in the cloud and use them at the edge simplifies development, but it usually comes with a trade-off in accuracy, because getting cloud-trained models to fit on local hardware requires reducing numerical precision. In something of a breakthrough, Burger says, Microsoft has found a way to avoid that problem.
“We’re able to drop the numerical precision down but not lose the precision of the models,” he says. “We have a very deep understanding of how these neural networks operate, and we have mathematically sound techniques that allow us to do that down-sampling. We have data types that are more efficient than what anybody else in the industry is using, and we’re able to preserve the model accuracy, and that’s why we’re getting this phenomenal performance.”
The versatility of FPGAs helps, Burger says. “We can try a data type and say, ‘For this class of models, we should make the data type a little fatter or skinnier.’”
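To make the idea of a "fatter or skinnier" data type concrete, here is a minimal sketch of reduced-precision rounding. It is a generic illustration, not Microsoft's data types or technique, which the article does not describe. The function names, the shared power-of-two scale, and the sample weights are all assumptions made for the example.

```python
import math

def quantize_shared_exponent(values, mantissa_bits):
    """Round values to signed integers of `mantissa_bits` bits that share one
    power-of-two scale. Hypothetical helper, not a Microsoft format."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0.0] * len(values)
    # Largest magnitude a signed integer of this width can hold.
    q_max = 2 ** (mantissa_bits - 1) - 1
    # Smallest power-of-two scale that keeps every value within range.
    exponent = math.ceil(math.log2(max_abs / q_max))
    scale = 2.0 ** exponent
    # Round to the nearest step, clamp to the range, and convert back to floats.
    return [max(-q_max, min(q_max, round(v / scale))) * scale for v in values]

def max_error(values, mantissa_bits):
    """Worst-case absolute rounding error across the values."""
    approx = quantize_shared_exponent(values, mantissa_bits)
    return max(abs(a - b) for a, b in zip(values, approx))

weights = [0.5, -0.25, 0.125, 0.9]  # made-up sample weights
for bits in (4, 8):
    print(bits, "bits -> max rounding error", max_error(weights, bits))
```

A wider ("fatter") type reduces rounding error at the cost of more bits per value. Whether model accuracy survives at a given width is an empirical question this sketch does not answer.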
One of the benefits of Azure’s FPGA design is that a model can run across multiple FPGAs without increasing latency, because they’re connected directly together rather than having to communicate via a CPU. But FPGAs are so efficient that you don’t need to prepare to deploy large numbers of them.
“The throughput of our solution is so high that we get a great cost reduction compared to what you could do with any other technology,” Burger says. “If I have a thousand cameras and I want to process [the image feeds] in real-time, I need some numbers of chips, and it will be cheaper than doing that any other way.”
In fact, if those 1,000 cameras each run at 10 frames per second, generating 10,000 images per second in total, he calculates you’d need only about 10 FPGAs, which works out to roughly 1,000 images per second per chip. Internally, Microsoft uses FPGA appliances with multiple chips, and that workload might need one or two of those boxes.
“That number is going to drop as we execute on our roadmap,” Burger says. “Processing 10,000 images per second – it’s very affordable.”
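The arithmetic behind Burger's estimate can be checked with a short sketch. The per-chip rate of 1,000 images per second is not a published figure. It is only what "about 10 FPGAs" for 10,000 images per second implies, and the function name is hypothetical.

```python
import math

def fpgas_needed(cameras: int, frames_per_second: int, images_per_fpga_per_second: int) -> int:
    """Number of chips needed to keep up with all camera feeds in real time."""
    total_images_per_second = cameras * frames_per_second
    # Round up: a fractional chip still means deploying a whole one.
    return math.ceil(total_images_per_second / images_per_fpga_per_second)

# Figures from the article: 1,000 cameras at 10 frames per second.
# Assumption: about 1,000 images/s per FPGA, implied by the "about 10 FPGAs" estimate.
print(fpgas_needed(cameras=1000, frames_per_second=10, images_per_fpga_per_second=1000))
```

If per-chip throughput rises, as Burger suggests it will, the same workload needs proportionally fewer chips.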