
Multi-Cloud Machine Learning with data from on-premises and training with Google Cloud Vertex AI (Part 1 of 2)

The Future of AI/ML Compute Is Multi-Cloud:

As enterprises evolve their compute in the cloud era, they are adopting multiple cloud providers to meet their specific requirements. Enterprises want to take advantage of the unique capabilities offered by different cloud providers and build a multi-cloud datacenter. As a result, data processing, training, and inference for machine learning are increasingly spread across multiple clouds.

 

Figure 1: Future of AI/ML is Multi-Cloud

VMware Cloud: The complete portfolio for the Multi-Cloud Journey

The VMware Cloud Platform provides a diverse set of capabilities spanning all major public and regional cloud providers. The platform supports existing applications running in virtual machines as well as containers and modern applications on Kubernetes. It includes a robust management platform that provides:

  • End-to-End Visibility
  • Optimized Operations
  • Automation & Orchestration
  • Security & Governance
  • Kubernetes with Container Management

The VMware Cloud Platform brings uniformity across on-premises and multiple clouds as enterprises seek to use the unique applications, platforms, and tools offered by each cloud provider. VMware Cloud can serve as the control plane of an enterprise's multi-cloud infrastructure.

 

Figure 2: VMware Cloud Platform and its broad ecosystem and capabilities

 

Google Cloud Vertex AI Platform: (Source: https://cloud.google.com/vertex-ai)

Data scientists are asking enterprise IT for an easy-to-use, end-to-end AI/ML platform. Data processing, model training, inference, and ML operations each place specialized demands on hardware and software. Google Cloud has applied the lessons learned from Google's internal AI/ML operations to build the Google Cloud Vertex AI platform.

Key features

A unified UI for the entire ML workflow

Vertex AI brings together the Google Cloud services for building ML under one unified UI and API. In Vertex AI, you can train and compare models using AutoML or custom code training, and all of your models are stored in one central model repository. These models can then be deployed to the same endpoints on Vertex AI.

Pre-trained APIs for vision, video, natural language, and more

Easily infuse vision, video, translation, and natural language ML into existing applications, or build entirely new intelligent applications across a broad range of use cases (including Translation and Speech-to-Text). AutoML enables developers to train high-quality models specific to their business needs with minimal ML expertise or effort. A centrally managed registry holds all datasets across data types (vision, natural language, and tabular).

End-to-end integration for data and AI

Through Vertex AI Workbench, Vertex AI is natively integrated with BigQuery, Dataproc and Spark. You can use BigQuery ML to create and execute machine learning models in BigQuery using standard SQL queries on existing business intelligence tools and spreadsheets, or you can export datasets from BigQuery directly into Vertex AI Workbench and run your models from there. Use Vertex Data Labeling to generate highly accurate labels for your data collection.

Support for all open source frameworks

Vertex AI integrates with widely used open source frameworks such as TensorFlow, PyTorch, and scikit-learn, along with supporting all ML frameworks via custom containers for training and prediction.

 

Google Cloud VMware Engine: (Source: https://cloud.google.com/vmware-engine)

Google Cloud VMware Engine is an easy-to-use platform that helps enterprises lift and shift their VMware-based applications to Google Cloud without changes to their apps, tools, or processes. The service provides all the hardware and VMware licenses needed to run a dedicated VMware SDDC in Google Cloud.

Key features

Fast networking and high availability

VMware Engine is built on Google Cloud’s highly performant, scalable infrastructure with fully redundant and dedicated 100 Gbps networking, providing 99.99% availability to meet the needs of your most demanding enterprise workloads.
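The short Python sketch below makes the 99.99% figure concrete by converting an availability percentage into the downtime it allows over a period. It only illustrates the arithmetic. Google's published SLA, not this code, defines how availability is actually measured and credited.

```python
# Convert an availability percentage into the downtime it permits over a period.
# Illustrative arithmetic only: real SLA terms define their own measurement rules.

MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Return the maximum downtime, in minutes, allowed over `period_days`."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * period_days * MINUTES_PER_DAY


if __name__ == "__main__":
    for label, days in (("30-day month", 30), ("365-day year", 365)):
        minutes = downtime_budget_minutes(99.99, days)
        print(f"99.99% over a {label}: {minutes:.2f} minutes of downtime")
```

At 99.99%, the budget works out to about 4.32 minutes in a 30-day month and about 52.56 minutes over a 365-day year.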

An integrated Google Cloud experience

Benefit from full access to innovative Google Cloud services. Native VPC networking gives you private layer-3 access between VMware environments and other Google Cloud services, allowing you to use standard access mechanisms such as Cloud VPN or Interconnect. Additionally, billing, identity, and access control are integrated to unify the experience with other Google Cloud services.
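Private layer-3 routing between on-premises, the VMware Engine private cloud, and other VPC networks generally requires that their address ranges do not overlap. A minimal pre-flight check using Python's standard `ipaddress` module might look like the sketch below. The network names and CIDR blocks are hypothetical placeholders, not values from this solution.

```python
from __future__ import annotations

import ipaddress
from itertools import combinations


def find_overlaps(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named ranges whose address space overlaps."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    # Hypothetical address plan; substitute the real on-premises, GCVE and VPC ranges.
    plan = {
        "on_prem_datacenter": "10.0.0.0/16",
        "gcve_management": "192.168.0.0/22",
        "gcve_workload_segment": "192.168.10.0/24",
        "vertex_vpc_subnet": "10.0.128.0/20",  # deliberately inside 10.0.0.0/16
    }
    print(find_overlaps(plan))  # [('on_prem_datacenter', 'vertex_vpc_subnet')]
```

Running a check like this before connecting the environments flags conflicts early. Routing problems caused by overlapping ranges are much harder to diagnose after the fact.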

Robust VMware ecosystem solutions

Continue to use IT management tools and third-party services consistent with the on-premises environment. Google is partnering closely with leading storage, backup, and disaster recovery providers such as NetApp, Actifio, Veeam, Zerto, Cohesity, and Dell Technologies to ensure support for third-party solutions, ease the migration journey, and enable business continuity.

Utilize familiar VMware tools or Google Cloud operations suite

Moving to the cloud is easier because you can continue to use the same VMware tools, processes, and policies you already know. Manage your on-premises VMware workloads and those in the cloud with the same suite of tools to simplify your migration. Additionally, you can use Google Cloud operations suite (formerly Stackdriver) to monitor, troubleshoot, and improve application performance in your Google Cloud environment.

 

Proof of Concept:

Two AI-based applications, one for healthcare and one for finance, are showcased in the solution:

  • Healthcare AI application with CT-scan-based COVID-19 image detection. Image data is transferred through GCVE to Google Cloud Storage for training on Vertex AI.
  • Finance application using data stored in a SQL Server-based enterprise data repository running in GCVE. Data is extracted incrementally each night to Google Cloud Storage for training.
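The nightly incremental extraction in the finance workflow can be sketched with a high-watermark pattern. Each run exports only the rows changed since the last run and records the newest change timestamp for the next run. In this sketch, an in-memory SQLite database stands in for SQL Server. The `stock_prices` table, its columns, and the `extract_incremental` helper are hypothetical. A real job would query SQL Server (for example through an ODBC driver) and upload the resulting CSV to Cloud Storage.

```python
from __future__ import annotations

import csv
import io
import sqlite3


def extract_incremental(conn: sqlite3.Connection, last_watermark: str) -> tuple[str, str]:
    """Return (csv_text, new_watermark) for rows changed since `last_watermark`.

    Table and column names are hypothetical; timestamps are ISO-8601 strings,
    which compare correctly as text.
    """
    rows = conn.execute(
        "SELECT symbol, trade_date, close_price, updated_at FROM stock_prices "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["symbol", "trade_date", "close_price", "updated_at"])
    writer.writerows(rows)
    # Advance the watermark to the newest change seen; keep the old one if nothing changed.
    new_watermark = rows[-1][3] if rows else last_watermark
    return buf.getvalue(), new_watermark


if __name__ == "__main__":
    # In-memory SQLite stands in for the SQL Server repository running in GCVE.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stock_prices "
        "(symbol TEXT, trade_date TEXT, close_price REAL, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO stock_prices VALUES (?, ?, ?, ?)",
        [
            ("ABC", "2021-06-01", 101.5, "2021-06-01T22:00:00"),
            ("ABC", "2021-06-02", 102.0, "2021-06-02T22:00:00"),
            ("XYZ", "2021-06-02", 55.25, "2021-06-02T22:05:00"),
        ],
    )
    # The previous nightly run already exported everything up to this point.
    csv_text, watermark = extract_incremental(conn, "2021-06-01T23:59:59")
    print(csv_text)
    print("next watermark:", watermark)
```

The job would persist the returned watermark between runs. A strict `>` comparison assumes each change gets a distinct, increasing `updated_at` value. Rows that share the watermark timestamp but are committed later would be skipped.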

Use Case 1: Healthcare – COVID-19 Detection

Publicly available, previously labeled COVID-19 CT scans serve as the ground truth for training this healthcare application. The data is split into training, test, and validation datasets, which are used to build an accurate model for this application. Details about this dataset are shown in Table 2. The dataset consists of CT-scan images stored in an enterprise NFS datastore. The images are uploaded to a cloud datastore and processed with the Google Cloud Vertex AI platform.
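One way to prepare the labeled CT scans for Vertex AI is to assign each image to a split deterministically and write an import file. The sketch below makes several assumptions:

  • It uses the image-classification CSV layout `ML_USE,GCS_FILE_PATH,LABEL`, with `training`, `validation` and `test` as the ML_USE values. Please verify this format against the current Vertex AI data-preparation documentation before relying on it.
  • It uses an 80/10/10 split.
  • It writes paths into a hypothetical bucket, `gs://example-ct-scans`.
  • The filenames and labels are illustrative only.

```python
from __future__ import annotations

import hashlib

# Hypothetical ratios: 80% training, 10% validation, remainder test.
TRAIN_FRACTION = 0.8
VALIDATION_FRACTION = 0.1


def assign_split(key: str) -> str:
    """Deterministically map a key (e.g., a filename) to training/validation/test."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    position = int(digest[:8], 16) / 0xFFFFFFFF  # stable value in [0, 1]
    if position < TRAIN_FRACTION:
        return "training"
    if position < TRAIN_FRACTION + VALIDATION_FRACTION:
        return "validation"
    return "test"


def import_rows(labeled_files: dict[str, str], gcs_prefix: str) -> list[str]:
    """Build ML_USE,GCS_FILE_PATH,LABEL rows for an image-classification import file."""
    prefix = gcs_prefix.rstrip("/")
    return [
        f"{assign_split(name)},{prefix}/{name},{label}"
        for name, label in sorted(labeled_files.items())
    ]


if __name__ == "__main__":
    # Hypothetical filenames and labels as they might appear on the NFS datastore.
    labels = {"scan_0001.png": "covid", "scan_0002.png": "non_covid", "scan_0003.png": "covid"}
    for row in import_rows(labels, "gs://example-ct-scans/images/"):
        print(row)
```

Hashing keeps assignments stable when new scans are added. If several slices come from the same patient, hash a patient identifier instead of the filename. That keeps all of one patient's images in a single split and avoids leakage between training and test.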

Use Case 2: Financial Stock Market

Publicly available sample stock market datasets are used for the stock market use case. The dataset is split into training, test, and validation datasets following standard data science methodology. Details of the dataset are shown in Table 1. The dataset is extracted from a Microsoft SQL Server database instance representing the enterprise data repository.
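For time-ordered market data, a common choice is to split chronologically rather than randomly. That way, the model is never trained on prices that come after the ones it is evaluated on. The sketch below assumes rows of `(trade_date, value)` and a hypothetical 70/15/15 split. The percentages are integers so the split boundaries are exact.

```python
from datetime import date, timedelta


def chronological_split(rows, train_pct: int = 70, val_pct: int = 15):
    """Split (trade_date, value) rows into train/validation/test by date.

    No shuffling: rows are ordered by date first, so validation and test rows
    never precede training rows and no future prices leak into training.
    """
    ordered = sorted(rows, key=lambda row: row[0])
    n = len(ordered)
    train_end = n * train_pct // 100
    val_end = train_end + n * val_pct // 100
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


if __name__ == "__main__":
    start = date(2021, 1, 4)
    # Hypothetical closing prices, deliberately supplied out of order.
    prices = [(start + timedelta(days=i), 100.0 + i) for i in range(20)][::-1]
    train, val, test = chronological_split(prices)
    print(len(train), len(val), len(test))  # 14 3 3
    print(train[-1][0], val[0][0], test[0][0])
```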

The flow of the proof of concept for this solution is depicted in Figure 3.

 

Figure 3: Steps leveraged in the proof of concept for the solution

In Part 2 of this blog series, we will look at the deployment of the solution for both use cases and review the results.