Running SQL Server 2019 Big Data Clusters on Tanzu Kubernetes cluster in vSphere with Tanzu

Since the release of vSphere 7 Update 1c (December 17, 2020), vSphere with Tanzu has supported configurable node storage for Tanzu Kubernetes clusters, which lets you mount additional storage volumes on the worker nodes. This makes it possible for Tanzu Kubernetes clusters to run containerized applications whose container images need a large amount of disk space during deployment, such as SQL Server 2019 Big Data Clusters (BDC).

Update 2021/08/17: With the SQL Server BDC CU12 release, BDC on Tanzu is listed as a partner solution validated jointly by VMware, Microsoft, and Dell EMC. For more details, please visit:

What is SQL Server 2019 Big Data Clusters?

SQL Server Big Data Clusters is Microsoft’s newest data platform. It lets you deploy scalable clusters of SQL Server, Spark, and HDFS containers running on Kubernetes. These components run side by side so that you can read, write, and process big data from Transact-SQL or Spark, and easily combine and analyze high-value relational data with high-volume big data.

Why SQL Server Big Data Clusters on vSphere with Tanzu?

vSphere with Tanzu brings virtualization and Kubernetes together on a single platform. Customers can keep relying on traditional virtualized SQL Server applications for their mission-critical OLTP workloads while migrating certain SQL Server OLAP jobs onto the same vSphere with Tanzu platform. They can also extend the data platform to process multiple external data sources through SQL Server BDC, for example by using BDC to process big data on HDFS for machine learning jobs.

How to Deploy SQL Server Big Data Clusters on vSphere with Tanzu?

vSphere with Tanzu makes it easy to deploy and manage SQL Server BDC. The supported virtual machine classes for vSphere with Tanzu are listed in Virtual Machine Class Types for Tanzu Kubernetes Clusters. Notice that the classes differ only in CPU and memory; the storage size is fixed at a single 16 GB disk.

Before the vSphere 7 U1c release, that disk was too small to hold all the container images required for SQL Server BDC. As a result, new pods got stuck in an endless loop of eviction and re-creation because the node ran out of disk space.
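If you suspect a cluster is hitting this eviction loop, one way to confirm it is to list the pods whose status is Evicted. The following is a minimal sketch, not part of the original post: the helper names `filter_evicted` and `list_evicted_pods` are made up for illustration, and it assumes the default table layout printed by `kubectl get pods --all-namespaces`.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: reads the default `kubectl get pods --all-namespaces`
# table (NAMESPACE NAME READY STATUS RESTARTS AGE) on stdin and prints
# "namespace/name" for every pod whose STATUS column is Evicted.
filter_evicted() {
  awk 'NR > 1 && $4 == "Evicted" { print $1 "/" $2 }'
}

# Hypothetical helper: query the current cluster and keep only evicted pods.
list_evicted_pods() {
  "$KUBECTL" get pods --all-namespaces | filter_evicted
}
```

Calling `list_evicted_pods` against the TKG cluster prints one line per evicted pod. When disk space is the cause, `kubectl describe node` on the affected worker typically reports a DiskPressure condition as well.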

To let SQL Server BDC deploy successfully, configure an additional storage volume on each of the TKG cluster worker nodes. This requires that your ESXi servers run the vSphere 7 U1c release (VMware ESXi, 7.0.1, 17325551) or later.

Follow the vSphere with Tanzu Quick Start Guide to set up the Supervisor Cluster and create the TKG cluster that will be used for the SQL Server BDC deployment. Here’s a sample YAML file that creates a TKG cluster for SQL Server BDC with an additional 250 GB disk on each of the TKG nodes.

sqlbdc-tkg.yaml
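The contents of the sample file are not reproduced in this copy of the post. As a stand-in, here is a minimal sketch of what such a manifest could look like, assuming the `run.tanzu.vmware.com/v1alpha1` TanzuKubernetesCluster API. The cluster name, the namespace `sqlbdc-ns`, the Kubernetes version, the VM classes, and the storage class `vsan-default-storage-policy` are all placeholders, not values from the original file. The `volumes` entry under `workers` is the part that requests the additional disk, mounted where containerd stores its images.

```shell
# Write a sketch of the TKG cluster manifest to sqlbdc-tkg.yaml.
# Every name, class, and version below is a placeholder assumption.
cat > sqlbdc-tkg.yaml <<'EOF'
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: sqlbdc-tkg            # hypothetical cluster name
  namespace: sqlbdc-ns        # hypothetical vSphere namespace
spec:
  distribution:
    version: v1.18            # assumed; use a version your content library offers
  topology:
    controlPlane:
      count: 1
      class: best-effort-medium
      storageClass: vsan-default-storage-policy
    workers:
      count: 3
      class: best-effort-xlarge
      storageClass: vsan-default-storage-policy
      volumes:
        - name: containerd
          mountPath: /var/lib/containerd
          capacity:
            storage: 250Gi    # the additional disk for container images
EOF
```

Mounting the extra volume at `/var/lib/containerd` puts the image store on the larger disk, which is what the BDC images need.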

Create the TKG cluster from the sample YAML file with the following command:
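The original command is not preserved in this copy. A typical sequence, assuming the kubectl vSphere plugin is installed, is to log in to the Supervisor Cluster, switch to the vSphere namespace context, and apply the manifest. The sketch below wraps those steps in a hypothetical helper, `create_tkg_cluster`; the server address, user name, and namespace are placeholders.

```shell
# Placeholders (assumptions): replace with your Supervisor Cluster address,
# vSphere SSO user, and vSphere namespace.
SUPERVISOR="${SUPERVISOR:-supervisor.example.com}"
VSPHERE_USER="${VSPHERE_USER:-administrator@vsphere.local}"
NAMESPACE="${NAMESPACE:-sqlbdc-ns}"
KUBECTL="${KUBECTL:-kubectl}"   # override with KUBECTL=echo for a dry run

# Hypothetical helper: log in, select the namespace context, apply the manifest.
create_tkg_cluster() {
  # Log in to the Supervisor Cluster (prompts for the password).
  "$KUBECTL" vsphere login --server="$SUPERVISOR" --vsphere-username "$VSPHERE_USER" &&
  # The login creates a context named after each vSphere namespace.
  "$KUBECTL" config use-context "$NAMESPACE" &&
  # Submit the cluster manifest (defaults to sqlbdc-tkg.yaml).
  "$KUBECTL" apply -f "${1:-sqlbdc-tkg.yaml}"
}
```

Run it as `create_tkg_cluster sqlbdc-tkg.yaml`. Setting `KUBECTL=echo` first prints the three commands instead of executing them.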

After that, you should see a TKG cluster with 1 master node and 3 worker nodes, created with an additional 250 GB of disk space.
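To check the result from the command line, you can query the cluster object on the Supervisor Cluster and then list the nodes from inside the TKG cluster. This is a sketch under assumptions: `check_tkg_cluster` is a hypothetical helper, the names are placeholders, and the current kubectl context is expected to be the vSphere namespace on the Supervisor Cluster when it starts.

```shell
# Placeholders (assumptions): adjust to your environment.
SUPERVISOR="${SUPERVISOR:-supervisor.example.com}"
VSPHERE_USER="${VSPHERE_USER:-administrator@vsphere.local}"
NAMESPACE="${NAMESPACE:-sqlbdc-ns}"
CLUSTER="${CLUSTER:-sqlbdc-tkg}"
KUBECTL="${KUBECTL:-kubectl}"   # override with KUBECTL=echo for a dry run

# Hypothetical helper: show the cluster object, then the nodes inside it.
check_tkg_cluster() {
  # On the Supervisor Cluster: the TKG cluster object and its phase.
  "$KUBECTL" get tanzukubernetescluster "$CLUSTER" -n "$NAMESPACE" &&
  # Log in to the TKG cluster itself (prompts for the password).
  "$KUBECTL" vsphere login --server="$SUPERVISOR" \
    --vsphere-username "$VSPHERE_USER" \
    --tanzu-kubernetes-cluster-namespace "$NAMESPACE" \
    --tanzu-kubernetes-cluster-name "$CLUSTER" &&
  "$KUBECTL" config use-context "$CLUSTER" &&
  # Expect 1 master node and 3 worker nodes in Ready state.
  "$KUBECTL" get nodes -o wide
}
```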

Now you are ready to deploy a SQL Server big data cluster in the TKG cluster. The recommended approach is to deploy SQL Server BDC with the Azure Data CLI (azdata) tool and two configuration files, bdc.json and control.json, which can be generated by an Azure Data Studio notebook. Run the following command to create the SQL Server BDC cluster with azdata and the two config files.
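The exact command from the original post is not preserved here. The sketch below shows one common form, assuming bdc.json and control.json sit together in a directory that is passed to `azdata bdc create` through `--config-profile`, and that the controller credentials are supplied through the `AZDATA_USERNAME` and `AZDATA_PASSWORD` environment variables. The helper name `deploy_bdc` is made up for illustration.

```shell
AZDATA="${AZDATA:-azdata}"   # override with AZDATA=echo for a dry run

# Hypothetical helper: validate the inputs, then create the big data cluster.
# $1 is the directory that holds bdc.json and control.json.
deploy_bdc() {
  config_dir="$1"
  # Both configuration files must be present in the profile directory.
  for f in bdc.json control.json; do
    if [ ! -f "$config_dir/$f" ]; then
      echo "missing $config_dir/$f" >&2
      return 1
    fi
  done
  # azdata reads the controller credentials from these variables.
  if [ -z "${AZDATA_USERNAME:-}" ] || [ -z "${AZDATA_PASSWORD:-}" ]; then
    echo "set AZDATA_USERNAME and AZDATA_PASSWORD first" >&2
    return 1
  fi
  "$AZDATA" bdc create --config-profile "$config_dir" --accept-eula yes
}
```

With the kubectl context pointing at the TKG cluster, a call such as `deploy_bdc ./custom` starts the deployment. The directory name `custom` is only an example.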

Once the cluster is ready, the azdata output confirms that the deployment has completed.

Check the SQL Server BDC pods status with the following command:
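The command itself is missing from this copy. A minimal sketch, assuming the BDC was deployed with the default cluster name `mssql-cluster` (the cluster name in bdc.json is also used as the Kubernetes namespace), is to list the pods in that namespace. The helpers `bdc_pods` and `not_ready` are hypothetical and only add a small filter on top of `kubectl get pods`.

```shell
KUBECTL="${KUBECTL:-kubectl}"                    # override for a dry run
BDC_NAMESPACE="${BDC_NAMESPACE:-mssql-cluster}"  # assumed default BDC name

# Hypothetical helper: list all BDC pods with their READY and STATUS columns.
bdc_pods() {
  "$KUBECTL" get pods -n "$BDC_NAMESPACE"
}

# Hypothetical helper: reads the `kubectl get pods -n <ns>` table
# (NAME READY STATUS RESTARTS AGE) on stdin and prints the names of pods
# whose containers are not all ready, ignoring Completed job pods.
not_ready() {
  awk 'NR > 1 {
         split($2, r, "/")
         if (r[1] != r[2] && $3 != "Completed") print $1
       }'
}
```

`bdc_pods` shows the full table, and `bdc_pods | not_ready` prints nothing once every pod has all of its containers ready.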

You have now successfully deployed SQL Server 2019 Big Data Clusters on vSphere with Tanzu. For more details about reference architectures for SQL Server on the VMware platform, please visit https://core.vmware.com/business-critical-application-reference-architectures.