Home > Blogs > VMware VROOM! Blog > Tag Archives: internet of things

Tag Archives: internet of things

New White Paper: High-Performance Virtualized Spark Clusters on Kubernetes for Deep Learning

By Dave Jaffe, VMware Performance Engineering

A new white paper is available showing the advantages of running virtualized Spark Deep Learning workloads on Kubernetes.

Recent versions of Spark include support for Kubernetes. For Spark on Kubernetes, the Kubernetes scheduler provides the cluster manager capability provided by Yet Another Resource Negotiator (YARN) in typical Spark on Hadoop clusters. Upon receiving a spark-submit command to start an application, Kubernetes instantiates the requested number of Spark executor pods, each with one or more Spark executors.

The benefits of running Spark on Kubernetes are many: ease of deployment, resource sharing, simplifying the coordination between developer and cluster administrator, and enhanced security. A standalone Spark cluster on vSphere virtual machines running in the same configuration as a Kubernetes-managed Spark cluster on vSphere virtual machines were compared for performance using a heavy workload, and the difference imposed by Kubernetes was found to be insignificant.

Spark applications running in Standalone mode require that every Spark worker node be installed with the correct version of Spark, Python, Java, etc. This puts a burden on the IT administrator, who may be managing many Spark applications with different requirements, and it requires coordination between the administrator and the application developer. With Kubernetes, the developer only needs to create a container with the correct software, and the IT administrator just needs to manage the cluster using the fine-grained resource management tools to enable the different Spark workloads.

To compare Spark Standalone performance to Spark on Kubernetes performance, a Deep Learning workload, the Maximum Throughput Spark BigDL ResNet50 image classifier from VMware IoT Analytics Benchmark, was run on the same 16 worker nodes, first while configured as Spark worker nodes, then while configured as Kubernetes nodes. Then the number of nodes was reduced by four (by removing the four workers on host 4), and the same comparison was made using 12 nodes, then 8, then 4.

The relative results are shown below. The Spark Standalone and Spark on Kubernetes performance in terms of images per second classified was within ~1% of each other for all configurations. Performance scaled well for the Spark tests as the number of VMs increased from 4 (1 server) to 16 (4 servers).

All details are in the paper.

IoT Analytics Benchmark adds neural network–based deep learning with Keras and BigDL

The IoT Analytics Benchmark released last year dealt with an important Internet of Things use case—monitoring factory sensor data for impending failure conditions. This year, we are tackling an equally important use case—image classification. Whether used in facial recognition, license plate readers, inspection systems, or autonomous vehicles, neural network–based deep learning is making image detection and classification a viable technology.

As in the classic machine learning used in the original IoT Analytics Benchmark code (which used the Spark Machine Learning Library), the new deep learning code first trains a model using pre-labeled images and then deploys that model to infer the classification of new images. For IoT this inference step is the most important. Thus, the new programs, designated as IoT Analytics Benchmark DL, use previously trained models (included in the kit) to demonstrate inferencing that can be performed at the edge (on small gateway systems) or in scaled-out Spark clusters.

Continue reading

New white paper: Big Data performance on VMware Cloud on AWS: Spark machine learning and IoT analytics performance on-premises and in the cloud

By Dave Jaffe

A new white paper is available comparing Spark machine learning performance on an 8-server on-premises cluster vs. a similarly configured VMware Cloud on AWS cluster.

Here is what the VMware Cloud on AWS cluster looked like:

Screenshot of cluster configuration

VMware Cloud on AWS configuration for performance tests

Three standard analytic programs from the Spark machine learning library (MLlib), K-means clustering, Logistic Regression classification, and Random Forest decision trees, were driven using spark-perf. In addition, a new, VMware-developed benchmark, IoT Analytics Benchmark, which models real-time machine learning on Internet-of-Things data streams, was used in the comparison. The benchmark is available from GitHub.

Continue reading