
End-to-End Machine Learning with Natural Language Processing for Training and Inference on vSphere (Part 2 of 2)

In Part 1 of this blog, we introduced end-to-end machine learning, described the conceptual architecture with all software and hardware components of the solution, and showed the steps in the solution deployment.

Sentiment Analysis basics

Sentiment analysis is a binary classification task: it predicts positive or negative sentiment from raw user text. The IMDB dataset is used for this benchmark. The methodology and code came from the following GitHub repository: https://github.com/mlperf/training/tree/master/sentiment_analysis. The model used is a convolutional neural network (CNN) based on “Effective Use of Word Order for Text Categorization with Convolutional Neural Networks”.
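As a rough sketch of the classifier’s input side (the vocabulary and helper names below are illustrative assumptions, not the MLPerf code), a raw review is tokenized and mapped to word indices before the CNN consumes it:

```python
# Minimal sketch of turning a raw review into a word-index sequence,
# the form consumed by a text CNN. The vocabulary and <unk> handling
# are illustrative assumptions, not the benchmark implementation.
def encode_review(text, vocab, unk_id=0):
    tokens = text.lower().split()
    return [vocab.get(tok, unk_id) for tok in tokens]

vocab = {"great": 1, "movie": 2, "terrible": 3}
ids = encode_review("Great movie", vocab)  # -> [1, 2]
```

The CNN then applies convolutions over these index sequences (via an embedding layer) so that local word order, not just word presence, influences the prediction.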

Modeling, validation and inference:

The IMDB dataset, which provides 50,000 movie reviews for sentiment analysis, was used for the training and evaluation phases. Below are some aspects of the dataset and its processing parameters.

  • Data pre-processing
      • The dataset isn’t preprocessed in any way.
  • Training and test data separation
      • The entire dataset is split into training and test sets: 25,000 reviews are used for training and 25,000 for validation. This split is pre-determined and cannot be modified.
  • Training data order
      • Training data is traversed in a randomized order.
  • Test data order
      • Test data is evaluated in a fixed order.
  • Quality target
      • Average accuracy of 90%
  • Inference
      • An independent dataset was used to showcase inference (evaluation)
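The split-and-ordering rules above can be sketched as follows; the review strings are placeholders and the helper names are illustrative, not taken from the benchmark code:

```python
import random

# Sketch of the benchmark's data discipline: a pre-determined
# 25,000/25,000 train/validation split, randomized traversal of the
# training data each epoch, and a fixed evaluation order for test data.
def split_imdb(reviews):
    assert len(reviews) == 50000, "IMDB provides 50,000 labeled reviews"
    return reviews[:25000], reviews[25000:]   # split cannot be modified

def epoch_order(n_train, rng):
    order = list(range(n_train))
    rng.shuffle(order)                        # randomized training order
    return order

reviews = ["review-%d" % i for i in range(50000)]
train, test = split_imdb(reviews)
train_order = epoch_order(len(train), random.Random(0))
test_order = list(range(len(test)))           # fixed evaluation order
```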

Training

The training infrastructure leveraged the following:

  • Kubernetes cluster based on VMware Essentials PKS on CentOS, run on-premises in the Santa Clara datacenter
  • Training leveraged Bitfusion-based access to remote GPU resources over the network

The initial runs compared CPU-based against GPU-based model training. Training using GPU-enabled workers was approximately 160X faster than CPU-based training for the CNN model used for sentiment analysis, clearly indicating the need for GPUs with these models. The benchmark code for sentiment analysis from mlperf.org was leveraged to train and evaluate a sentiment analysis model on the IMDB dataset, and the resulting model was saved to disk. This part of the solution was run on VMware Essentials PKS on CentOS, on-premises in the Santa Clara datacenter. The Dockerfile used in the solution is shown in Appendix A, and the YAML file used to deploy the Kubernetes pods is shown in Appendix B.

The models were tuned over multiple iterations until the quality target of 90% accuracy was achieved.
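That tuning loop can be sketched as follows; `evaluate` stands in for a real validation pass, and the simulated accuracy curve is purely illustrative:

```python
# Sketch of iterating until the quality target of 90% average
# validation accuracy is reached. evaluate() is a stand-in for the
# real validation pass over the 25,000 held-out reviews.
QUALITY_TARGET = 0.90

def tune_until_target(evaluate, max_iters=50):
    for i in range(1, max_iters + 1):
        acc = evaluate(i)
        if acc >= QUALITY_TARGET:
            return i, acc
    raise RuntimeError("quality target not reached")

# Simulated accuracy curve for illustration only.
iters, acc = tune_until_target(lambda i: 0.70 + 0.03 * i)
```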

Figure 6: Output from a training run

Inference:

The inference infrastructure leveraged the following:

  • Inference leveraged a Kubernetes cluster based on Essentials PKS running on VMware Cloud on AWS in the Oregon region
  • Inference was run on a pod that had only CPU capability – there was no GPU access

Figure 7: VMC on AWS with Essentials PKS used for Inference

Results:

The resulting model from the training was transferred to Essentials PKS running on VMware Cloud on AWS in the Oregon datacenter. Inference runs on CPU-only VMware Cloud on AWS infrastructure, on Essentials PKS worker nodes.
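The training-to-inference hand-off can be sketched minimally; here a JSON word-weight table stands in for the saved Paddle model artifact, so the file format and helper names are illustrative assumptions:

```python
import json
import tempfile
import pathlib

# Sketch of the hand-off: the trained model is saved on the training
# cluster, transferred, and loaded for CPU-only scoring. A JSON
# word-weight table stands in for the actual saved model directory.
def save_model(weights, path):
    pathlib.Path(path).write_text(json.dumps(weights))

def load_and_score(path, review):
    weights = json.loads(pathlib.Path(path).read_text())
    score = sum(weights.get(word, 0.0) for word in review.lower().split())
    return score > 0.0   # True means positive sentiment

model_path = pathlib.Path(tempfile.gettempdir()) / "toy_sentiment.json"
save_model({"great": 1.0, "awful": -1.0}, model_path)
```

Because loading and scoring needs only CPU, the inference side of the pipeline can run on worker nodes with no GPU access, as in this solution.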

Sentiment Analysis application in production:

A Flask-based web interface front-ends the sentiment analysis model created during training. The production application can be used to analyze individual movie reviews to gauge the sentiment of the reviewer.

Figure 8: Data Entry screen for sentiment analysis

  1. The website shown takes a movie review as input
  2. It applies the model to perform machine learning inference on the review
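The solution uses Flask for this frontend; as a self-contained stand-in (so the sketch runs without Flask installed), a minimal WSGI app illustrates the same flow: read the review from the request, score it, and return the result. `score_review` is a placeholder for the real model call, and its word list is invented for illustration:

```python
from urllib.parse import parse_qs

def score_review(text):
    # Placeholder for the trained CNN; returns P(positive sentiment).
    positive_words = {"great", "excellent", "wonderful"}
    hits = sum(word in positive_words for word in text.lower().split())
    return min(0.5 + 0.2 * hits, 0.99)

def app(environ, start_response):
    # Minimal WSGI app: the POST body carries 'review=<url-encoded text>'.
    size = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(size).decode()
    review = parse_qs(body).get("review", [""])[0]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("positive probability: %.2f" % score_review(review)).encode()]
```

In Flask the same handler would be a route reading `request.form["review"]` and rendering the probability into a results template.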

 

Figure 9: Screen showing results of the sentiment analysis from the inference engine

  1. It displays the inference results – i.e., whether the review is positive or negative
  2. In the example, a movie review of “Once Upon a Time in Hollywood” (2019) from a popular reviewer is pasted into the web page
  3. The analysis shows that the sentiment is overwhelmingly positive (95%)
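Mapping the model’s output probability to the wording shown on the results screen can be sketched as below; the thresholds are illustrative assumptions, and the 0.95 case mirrors the Figure 9 example:

```python
# Sketch of interpreting the model's positive-class probability as the
# label shown to the user. Thresholds are illustrative assumptions.
def label_sentiment(p_positive):
    if p_positive >= 0.90:
        return "overwhelmingly positive"
    if p_positive >= 0.50:
        return "positive"
    return "negative"
```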

 

Conclusion:

The solution clearly demonstrated end-to-end machine learning on the vSphere platform. VMware Essentials PKS was successfully deployed with NVIDIA GPUs and Bitfusion to provide a high-performance, container-based machine learning platform. vSphere solutions can be leveraged during the different stages of an ML pipeline with Natural Language Processing. The training, evaluation and inference processes of the ML workflow were effectively demonstrated on vSphere. A comprehensive paper with all the details can be downloaded here.

 

 

Appendix A: Docker File used in Solution

#
# This example Dockerfile illustrates a method to install
# additional packages on top of an Ubuntu 16.04 container image.
#
# To use this Dockerfile, use the docker build command.
# See https://docs.docker.com/engine/reference/builder/
# for more information.
#

#Download sidgoyal78/paddle:benchmark12042018
#FROM sidgoyal78/paddle:benchmark12042018
FROM paddlepaddle/paddle:latest-gpu-cuda9.0-cudnn7

RUN apt-get update && apt-get install -y --no-install-recommends \
    vim \
    && \
    rm -rf /var/lib/apt/lists/

# Install flexdirect
#RUN cd /tmp && wget -O installfd getfd.bitfusion.io && chmod +x installfd && ./installfd -v fd-1.11.2 -- -s -m binaries
# Use flexdirect v1.11.7 instead of v1.11.2
RUN cd /tmp && wget -O installfd getfd.bitfusion.io && chmod +x installfd && ./installfd -v fd-1.11.7 -- -s -m binaries

#
# IMPORTANT:
# Build docker image from the dir: sc2k8cl3:/data/tools/dev
#
RUN mkdir -p /workspace
RUN mkdir -p /root/.cache/paddle/dataset/imdb
COPY ./sentiment_analysis/ /workspace/sentiment_analysis
COPY ./aclImdb_v1.tar.gz /root/.cache/paddle/dataset/imdb
WORKDIR /workspace/sentiment_analysis/

 

Appendix B: YAML File used for the Kubernetes pods

apiVersion: v1
kind: Pod
metadata:
  name: mlperfsc9
spec:
  volumes:
    - name: bitfusionio
      hostPath:
        path: /etc/bitfusionio
    # - name: nfs
      # nfs:
        # # FIXME: use the right name
        # #server: nfs-server.default.kube.local
        # server: "172.16.35.40"
        # path: "/GPU_DB"
        # readOnly: false
  containers:
  - name: mlperfsc9
    #image: ubuntu:16.04
    #image: sc2harbor1.vslab.local/library/mlperf:18.06-py3
    #image: sc2harbor1.vslab.local/library/mlperf:senti
    #
    # Ash comments: CUDA9-cudnn7 - required for this GPU
    #
    #image: sc2harbor1.vslab.local/library/mlperf:senti_cuda9
    image: pmohan77/mlperf:senti_cuda9
    imagePullPolicy: Always
    command: ["/bin/bash", "-ec", "while :; do echo 'My pod name is ${MY_NODE_NAME} .'; sleep 300 ; done"]
    #
    env:
      # - name: MY_NODE_NAME
        # valueFrom:
          # fieldRef:
            # fieldPath: metadata.name
      - name: POD_UID
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: metadata.uid
      - name: IMAGENET_HOME
        value: "/tmp/understand_sentiment_conv.inference.model"
        #value: "/gpu_data/imagedata/tiny-imagenet-200/tiny-imagenet-200"
    volumeMounts:
      # name(s) must match the volume name(s) above
      - name: bitfusionio
        mountPath: /etc/bitfusionio
        readOnly: true
      # - name: nfs
        # mountPath: "/gpu_data"
        # mountPath: "/gpu_data/imagedata/logs"
        # mountPath: "/logs"
        # subPath: $(POD_UID)
  restartPolicy: Never