Srinath Perera, Ph.D., is Vice President of Research at WSO2.
For a growing number of companies, the compass points toward the Internet of Things (IoT) as a pathway for improving customer service, enhancing operations, and creating new business models. In fact, IDC predicts that by 2020, some 32 billion connected IoT devices will be in use. The challenge is extracting timely, meaningful IoT data to enable these digital transformations. Following are five critical demands enterprises need to consider in developing their IoT analytics strategies.
IoT Analytics Must Be Distributed
Most enterprise IoT environments are inherently distributed. Like spider webs, they connect a myriad of sensors, gateways and collection points with data flying between them. Moreover, these webs constantly change as components are added and subtracted, and data flows are modified or repurposed.
Such environments place multiple demands on analytics. First, the software has to handle a variety of networking conditions, from weak 3G networks to ad-hoc peer-to-peer networks. It also needs to support a range of protocols, often Message Queuing Telemetry Transport (MQTT) or the Constrained Application Protocol (CoAP) at the messaging layer, and either ZigBee or Bluetooth Low Energy (BLE) for device connectivity.
The dynamic quality of IoT implementations means analytics solutions should have the flexibility to expand or contract to match the load. Deploying analytics in the cloud is one option. However, many IoT deployments have on-premises aspects, such as machines on the factory floor or kiosks in stores. Therefore, an IoT analytics solution may need to scale across a hybrid environment leveraging both the cloud and on-premises systems. Additionally, the software must have a distributed architecture with the ability to run multiple queries across multiple systems—and scale while doing it.
Some Analytics Should Occur at the Edge
IoT data gets really big, really fast. Consider the Distributed Event-Based Systems (DEBS) Grand Challenge 2014 use case: 40 houses with 2,000 sensors generated about 6 billion events in four months. Imagine the sea of data generated by 4 million such homes. That works out to tens of millions of events per second being pushed out for processing.
However, many businesses only need an average over time or insights into trends that exceed established parameters. The answer is to conduct some analytics on IoT devices or gateways at the edge and send aggregated results to the central system. This facilitates the detection of important trends or aberrations, such as temperature changes or failed access attempts, and significantly reduces network traffic to improve performance.
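As a rough sketch of this pattern, an edge node might collapse each window of raw readings into a single summary record before transmitting. The function, field names, and threshold below are illustrative, not taken from any particular product:

```python
from statistics import mean

def aggregate_window(readings, high_threshold):
    """Reduce one window of raw sensor readings to a compact summary.

    Instead of forwarding every reading, the edge node sends a single
    record per window, plus an alert flag when any value exceeds a
    threshold the central system cares about.
    """
    return {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
        "alert": any(r > high_threshold for r in readings),
    }

# One window of raw temperature readings from an edge sensor.
window = [21.0, 21.4, 22.1, 35.7, 21.9]
summary = aggregate_window(window, high_threshold=30.0)
```

Here five raw readings become one record on the wire, while the out-of-range spike still reaches the central system as an alert flag.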
Such edge analysis requires very lightweight software, since IoT nodes and gateways are low-power devices, which limits the compute capacity available for query processing. To address this challenge, several companies are working on edge analytics products and reference architectures. Still, because edge computing is heavily contextual, there is no one-size-fits-all solution.
IoT Analytics Are Event-Driven
IoT data are essentially streams of events. Therefore, analysis to support real-time interactions, whether triggering a thermostat or a fraud alert, requires some form of complex event processing (CEP) and streaming analytics. The software should handle time-series data, time windows, moving averages, and temporal event patterns. Two popular open source technologies for real-time event processing are Apache Storm, which should be used in combination with a CEP engine, and Apache Spark. Another option is the cloud-based Google Cloud Dataflow. With each offering, there are tradeoffs, so an IoT implementation’s specific requirements will determine the technology approach.
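For illustration, the kind of windowed computation these engines perform can be sketched in a few lines of plain Python. This toy moving average is not the API of Storm, Spark, or Dataflow, just the underlying idea:

```python
from collections import deque

class MovingAverage:
    """Streaming moving average over a fixed-length event window --
    the sort of operation a CEP or stream-processing engine applies
    continuously to time-series IoT events."""

    def __init__(self, window_size):
        # deque with maxlen automatically evicts the oldest reading.
        self.window = deque(maxlen=window_size)

    def update(self, value):
        """Ingest one event and return the current windowed average."""
        self.window.append(value)
        return sum(self.window) / len(self.window)

ma = MovingAverage(window_size=3)
for reading in [10.0, 12.0, 14.0, 40.0]:
    current = ma.update(reading)
# After the spike, the window holds [12.0, 14.0, 40.0], so the
# average reflects recent events rather than the full history.
```

A real engine adds time-based (rather than count-based) windows, partitioning by device, and temporal pattern matching on top of this basic mechanic.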
IoT Data Comes With Uncertainty
The ordering of inbound IoT data is important. For example, a progression of events may indicate that an engine part is heading for failure. At the same time, many nodes are pushing data through low-bandwidth IoT networks, and sometimes nodes fail, raising questions about whether sensors should buffer data and transmit it later. Other challenges include collection latency, duplicate messages, and reliability.
IoT analysis needs to handle these concerns. For example, time windows and temporal sequence-based queries will require special algorithms to ensure the proper order of inbound data. Google MillWheel addresses some problems in this space by providing fault-tolerant stream processing and is worth evaluating. However, at this time, many IT organizations will need to develop custom rules and queries to support their IoT analytics implementations.
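One common custom approach is to buffer events briefly and release them in timestamp order once a "watermark" has passed. The sketch below is an illustrative simplification of that idea (the class and parameter names are hypothetical, and this is not how MillWheel itself is implemented):

```python
import heapq

class Reorderer:
    """Buffer inbound events and release them in timestamp order,
    tolerating arrival delays up to max_delay time units.

    The watermark is the latest timestamp seen minus max_delay;
    buffered events at or behind the watermark are safe to emit.
    """

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self.heap = []        # min-heap ordered by timestamp
        self.latest = None    # latest timestamp observed so far

    def add(self, timestamp, payload):
        """Ingest one event; return any events now safe to release."""
        heapq.heappush(self.heap, (timestamp, payload))
        self.latest = timestamp if self.latest is None \
            else max(self.latest, timestamp)
        watermark = self.latest - self.max_delay
        released = []
        while self.heap and self.heap[0][0] <= watermark:
            released.append(heapq.heappop(self.heap))
        return released

reorderer = Reorderer(max_delay=1)
reorderer.add(1, "temp=21")                 # buffered, nothing released
reorderer.add(3, "temp=23")                 # watermark moves to 2
reorderer.add(2, "temp=22")                 # late event, still in order
# Downstream queries see events sorted by timestamp even though
# they arrived in the order 1, 3, 2.
```

A production version would also need to decide what to do with events that arrive behind the watermark (drop, correct retroactively, or log), which is where duplicate and reliability handling come in.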
Predictions Produce More Value
Most IoT implementations calculate descriptive analytics, such as mean, median, and standard deviation. However, the maximum impact will come from applying predictive analytics to applications such as fraud detection, proactive maintenance, and health warnings.
Increasingly, machine-learning algorithms complement statistical models for handling prediction. These algorithms learn automatically from examples, providing an attractive alternative to rules-only systems, which require professionals to write rules and evaluate their performance.
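As a deliberately simple illustration of prediction from sensor history, consider extrapolating a rising temperature series toward a failure threshold. This toy least-squares trend fit is a stand-in for the real statistical and machine-learning models discussed here; all names and numbers are invented:

```python
def fit_trend(values):
    """Ordinary least-squares slope and intercept over equally
    spaced samples of a sensor reading."""
    n = len(values)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
             / sum((x - x_mean) ** 2 for x in xs))
    return slope, y_mean - slope * x_mean

def steps_until(values, limit):
    """Extrapolate the fitted trend to estimate how many more samples
    until the failure threshold is crossed; None if the trend is flat
    or falling and never reaches it."""
    slope, intercept = fit_trend(values)
    if slope <= 0:
        return None
    return (limit - intercept) / slope - (len(values) - 1)

# Bearing temperature creeping upward by roughly 2 degrees per sample.
history = [70.0, 72.1, 73.9, 76.0, 78.1]
remaining = steps_until(history, limit=90.0)
# A maintenance system could schedule service while "remaining"
# is still comfortably positive.
```

Real predictive-maintenance models are, of course, multivariate and learned from labeled failure data; the point is only that prediction turns raw descriptive statistics into actionable lead time.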
Several frameworks for machine learning have emerged in recent years. These include Apache Spark MLlib, Dato GraphLab Create, and Skytree. Meanwhile, other organizations continue to develop new algorithms. While more research is needed, a thorough understanding of a company’s IoT scenario can help in determining the best alternative.
One final note: The market for IoT analytics technologies is still nascent. So adopting a flexible and open architecture for today’s analytics challenges will best position an enterprise to capitalize on emerging technologies in this arena tomorrow.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.