By Dave Jaffe, performance engineering, VMware
Now available from VMware, the IoT Analytics Benchmark simulates data analytics performed in real time on an Internet of Things data stream (for example, factory machines being monitored for impending failure conditions) using a machine learning model running on Spark Streaming. It can be used as a Big Data performance benchmark or as a workload to stress IoT edge gateways.
The IoT Analytics Benchmark kit includes code to generate training data, train a machine learning model on that data, and then run the model in real time to score incoming sensor data. The code is provided in both Python and Scala versions. Kafka serves as the message bus that passes sensor events into Spark Streaming; Hadoop Distributed File System (HDFS), Amazon S3, and local storage are supported as storage options.
The benchmark kit includes three components:
- iotgen generates synthetic training data files using a simple randomized model. Each row of sensor values is preceded by a label (either 1 or 0) indicating whether that set of values would trigger the failure condition. For example:
0.00000,0.50457,0.06221,0.07500,0.60180,0.43030,0.48300,0.40584,0.87647 …
1.00000,0.04493,0.27824,0.16447,0.46271,0.92100,0.94577,0.63252,0.95342 …
0.00000,0.51404,0.69344,0.31516,0.48384,0.02342,0.68141,0.50664,0.09665 …
- iottrain uses the pre-labeled training data to train a Spark Logistic Regression classifier machine learning model.
- iotstream applies that model to a stream of incoming sensor values (generated by a separate program) using Spark Streaming, indicating when impending failure conditions need attention. As shown in the graphic below, iotstream groups the incoming sensor events (which could number 100,000 or more per second) into one-second batches, arranges each batch into a vector by sensor index, and feeds that vector into the machine learning model for an immediate classification of whether that set of sensor values indicates impending issues in the factory.
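To make the data flow through the three components concrete, here is a minimal plain-Python sketch. It is not the benchmark's code and does not use Spark or Kafka. The function names, the mean-above-cutoff failure rule, the zero default for sensors that did not report, and the hand-picked weights are all assumptions made for illustration.

```python
import math
import random


def make_training_row(n_sensors, cutoff, rng):
    """Build one training row: a label followed by n_sensors readings in [0, 1)."""
    values = [rng.random() for _ in range(n_sensors)]
    # Hypothetical failure rule (an assumption, not the benchmark's model):
    # label the row 1 when the mean reading exceeds the cutoff.
    label = 1.0 if sum(values) / n_sensors > cutoff else 0.0
    # Same layout as the sample rows: label first, five decimal places.
    return ",".join(f"{v:.5f}" for v in [label] + values)


def batch_to_vector(events, n_sensors):
    """Arrange one batch of (sensor_index, value) events into a vector by index."""
    vector = [0.0] * n_sensors  # assumption: sensors that did not report read 0.0
    for sensor_index, value in events:
        vector[sensor_index] = value  # a later reading overwrites an earlier one
    return vector


def score(vector, weights, intercept):
    """Logistic regression prediction: 1 when the sigmoid output is at least 0.5."""
    z = intercept + sum(w * x for w, x in zip(weights, vector))
    probability = 1.0 / (1.0 + math.exp(-z))
    return 1 if probability >= 0.5 else 0


if __name__ == "__main__":
    rng = random.Random(42)
    print(make_training_row(8, 0.5, rng))

    # One "one-second batch" of events for three sensors.
    batch = [(2, 0.7), (0, 0.1), (2, 0.9)]
    vector = batch_to_vector(batch, 3)
    print(vector, score(vector, [1.0, 1.0, 1.0], -0.5))
```

In the benchmark itself, the weights come from the Spark Logistic Regression model trained by iottrain, and the batching is handled by Spark Streaming rather than by a hand-written loop.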
Sample output from iotstream is shown below:
We are already using this benchmark within VMware (see here for example) and look forward to sharing it with the larger Big Data and IoT communities. Download the IoT Analytics Benchmark from GitHub here: https://github.com/vmware/iot-analytics-benchmark. Please feel free to send us your feedback and contributions!
For more news about our open source projects, stay tuned to the Open Source Blog and follow us on Twitter (@vmwopensource).