IoT Analytics Benchmark adds neural network–based deep learning with Keras and BigDL

The IoT Analytics Benchmark released last year dealt with an important Internet of Things use case—monitoring factory sensor data for impending failure conditions. This year, we are tackling an equally important use case—image classification. Whether used in facial recognition, license plate readers, inspection systems, or autonomous vehicles, neural network–based deep learning is making image detection and classification a viable technology.

As in the classic machine learning used in the original IoT Analytics Benchmark code (which used the Spark Machine Learning Library), the new deep learning code first trains a model using pre-labeled images and then deploys that model to infer the classification of new images. For IoT, this inference step is the most important. Thus, the new programs, designated as IoT Analytics Benchmark DL, use previously trained models (included in the kit) to demonstrate inference that can be performed at the edge (on small gateway systems) or in scaled-out Spark clusters.

The programs run Keras and Intel’s BigDL image classifiers against the CIFAR10 image set. For each type of classifier, there is a program that sends the images as a series of encoded strings and a second program that reads those strings, converts them back to images, and infers which of the 10 CIFAR10 classes each image belongs to. The Keras classifier is a Python-based single-node program for running on an IoT edge gateway. The BigDL classifier is a Spark-based distributed program. (Also see Learning Multiple Layers of Features from Tiny Images, by Alex Krizhevsky.)
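
The send-and-receive round trip described above can be sketched in a few lines of Python. This is only an illustration, not the kit's code: the wire format (label, comma, base64 of the raw pixel bytes) and the function names encode_image and decode_image are assumptions made for the example, and a random array stands in for a real CIFAR10 image.

```python
import base64
import numpy as np

IMAGE_SHAPE = (32, 32, 3)  # CIFAR10 images: 32 x 32 pixels, 3 color channels

def encode_image(label: int, image: np.ndarray) -> str:
    """Pack a label and a uint8 image into one line of text (hypothetical format)."""
    payload = base64.b64encode(image.astype(np.uint8).tobytes()).decode("ascii")
    return f"{label},{payload}"

def decode_image(line: str):
    """Reverse encode_image: return (label, image array)."""
    label_text, payload = line.strip().split(",", 1)
    pixels = np.frombuffer(base64.b64decode(payload), dtype=np.uint8)
    return int(label_text), pixels.reshape(IMAGE_SHAPE)

# Stand-in for one CIFAR10 image; 8 is used here as an arbitrary class index.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=IMAGE_SHAPE, dtype=np.uint8)

line = encode_image(8, original)
label, restored = decode_image(line)
print(label, restored.shape, np.array_equal(original, restored))
```

Because each image becomes a single line of text, a stream of such lines can be piped through any text transport, which is what makes a simple sender and receiver pair practical.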

The CIFAR10 image set consists of 50,000 pre-labeled training images and 10,000 pre-labeled test images. Each image is a 32 x 32 color image from one of ten classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship or truck. For example, here’s a ship, frog, and truck:

Here’s what the Python-based Keras program looks like when running a complex ResNet model on a small, virtualized edge gateway system.

First, the inference program is started on the VM on the edge gateway using a pre-trained ResNet model included in the kit:

[root@iotdemo ~]# nc -lk 10000 | python3 infer_cifar.py --modelPath cifar10_ResNet20v1_model_91470.h5
Using TensorFlow backend.
Loaded trained model cifar10_ResNet20v1_model_91470.h5
Start send program
2019-01-31T04:09:37Z: 100 images classified
...
2019-01-31T04:11:06Z: 1000 images classified
Inferenced 1000 images in 99.3 seconds or 10.1 images/second, with 916 or 91.6% correctly classified
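
The figures in the final summary line follow from simple arithmetic on the image count, the elapsed time, and the number of correct predictions. The sketch below reproduces that line from the three values shown in the run above; the helper name summarize is hypothetical and is not taken from infer_cifar.py.

```python
def summarize(total_images: int, elapsed_seconds: float, correct: int) -> str:
    """Format throughput and accuracy the way the summary line above reports them."""
    rate = total_images / elapsed_seconds          # images per second
    accuracy = 100.0 * correct / total_images      # percent correctly classified
    return (
        f"Inferenced {total_images} images in {elapsed_seconds:.1f} seconds "
        f"or {rate:.1f} images/second, with {correct} or {accuracy:.1f}% "
        f"correctly classified"
    )

# Values taken from the run shown above.
print(summarize(1000, 99.3, 916))
```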

Then, when the inference program prints out “Start send program”, the send program is started from a driver system, in this case the author’s Mac:

[djaffe@djaffe-a01 ~/code/neuralnetworks/BigDL]$ python3 send_images_cifar.py -s -i 100 -t 1000 | \
  nc 192.168.2.3 10000
Using TensorFlow backend.
2019-01-31T04:09:12Z: Loading and normalizing the CIFAR10 data
2019-01-31T04:09:22Z: Sending 100 images per second for a total of 1000 images with pixel mean subtracted
2019-01-31T04:09:31Z: 100 images sent
...
2019-01-31T04:11:00Z: 1000 images sent
2019-01-31T04:11:00Z: Image stream ended
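
The log above mentions normalizing the data and subtracting the pixel mean. A common way to do this for CIFAR10, and the one assumed in this sketch, is to scale pixel values to the range 0 to 1 and then subtract the per-pixel mean of the training images from both the training and test sets. The function name normalize and the random stand-in arrays are hypothetical; the kit's send_images_cifar.py may differ in detail.

```python
import numpy as np

def normalize(train_images, test_images, subtract_pixel_mean=True):
    """Scale uint8 images to [0, 1] and optionally subtract the training pixel mean."""
    train = train_images.astype("float32") / 255.0
    test = test_images.astype("float32") / 255.0
    if subtract_pixel_mean:
        # Mean over the image axis: one mean value per pixel position and channel.
        pixel_mean = train.mean(axis=0)
        train -= pixel_mean
        test -= pixel_mean  # the test set uses the training mean, not its own
    return train, test

# Random stand-ins shaped like small CIFAR10 batches (no download needed).
rng = np.random.default_rng(0)
fake_train = rng.integers(0, 256, size=(500, 32, 32, 3), dtype=np.uint8)
fake_test = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)

train, test = normalize(fake_train, fake_test)
print(train.shape, test.shape, train.dtype)
```

Whatever normalization was applied when the model was trained also has to be applied to the images sent for inference, which is presumably why the send program offers it as an option.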

We are planning to use the new workloads in several VMware projects. As always, please send us your feedback and contributions!