Introducing VMware Tanzu Greenplum 7.5: Enhanced Performance for Data-Intensive Workloads

VMware Tanzu Greenplum 7.5 has been released. It delivers higher performance and lower resource usage for complex data workloads, including analytical queries, machine learning, real-time data ingestion, and geospatial queries. The GPORCA optimizer now covers a wider range of user queries, and Tanzu Greenplum’s query execution has been optimized to accelerate queries on append-optimized tables. Improved ANALYZE and VACUUM maintenance commands reduce contention on the customer’s data system, leaving more capacity for productive use of the platform. The Greenplum Streaming Server (GPSS) now supports a scale-out architecture that can sustain high data ingestion rates from Apache Kafka and VMware Tanzu RabbitMQ. gpMLBot streamlines automated machine learning with hyperparameter tuning, reducing model training times. Geospatial extensions, including pgPointCloud, 3DCityDB, pgRouting, H3 Index, and TIGER geocoding, are now included, opening new geospatial analytical use cases.

Faster Query Execution

In v7.5, the GPORCA optimizer produces query plans that use less memory and choose better join orders. It also supports advanced SQL features such as multi-column subqueries, ROW expressions, and DISTINCT-qualified window aggregates, so complex queries that use them execute faster. The query execution engine uses TupleTableSlot enhancements, including partial and late deserialization, to reduce CPU overhead in tuple processing. As a result, operations on Append-Optimized (AO) tables run up to twice as fast as in previous versions.
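The idea behind partial and late deserialization can be illustrated with a toy example. The sketch below is not Greenplum's TupleTableSlot code; the `LazySlot` class and its row layout (fixed-width 32-bit integers) are assumptions made purely for illustration. It decodes a serialized row only up to the highest column a query actually asks for, and only when it is asked.

```python
import struct


class LazySlot:
    """Toy tuple slot (hypothetical): decodes int32 columns only on demand."""

    def __init__(self, raw: bytes, ncols: int):
        self.raw = raw        # serialized row: ncols little-endian int32 values
        self.ncols = ncols
        self.values = []      # columns decoded so far (always a prefix of the row)
        self.decoded = 0      # number of columns decoded so far

    def get(self, col: int) -> int:
        """Return column `col` (0-based), decoding no further than that column."""
        if not 0 <= col < self.ncols:
            raise IndexError(col)
        # Decode sequentially, as a real row with variable-width columns would
        # require, but stop as soon as the requested column is available.
        while self.decoded <= col:
            (value,) = struct.unpack_from("<i", self.raw, 4 * self.decoded)
            self.values.append(value)
            self.decoded += 1
        return self.values[col]


row = struct.pack("<5i", 10, 20, 30, 40, 50)
slot = LazySlot(row, ncols=5)
second = slot.get(1)   # decodes columns 0 and 1 only; columns 2..4 stay serialized
```

On wide tables where a query touches only the first few columns, skipping the remaining columns is where the CPU savings in a scheme like this would come from.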

Compressed AO tables now support pure index scans, eliminating the need for costly bitmap scans. This enables significant performance improvements, including faster ORDER BY with LIMIT queries, faster pgvector queries, and more efficient joins and aggregates, making Tanzu Greenplum well-suited for analytical workloads.
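Why an ordered index helps ORDER BY with LIMIT can be shown with a small, database-free sketch. Everything here (the `rows` data, the `index` list, and both helper functions) is hypothetical and only models the access pattern: the index is already in key order, so the first n keys can be read directly from it without sorting, or even visiting, the table rows.

```python
from itertools import islice

# Hypothetical table: 1000 rows with a pseudo-random "score" column.
rows = [{"id": i, "score": (i * 37) % 101} for i in range(1000)]

# Hypothetical index: (key, row position) pairs kept in key order, built once.
index = sorted((r["score"], pos) for pos, r in enumerate(rows))


def lowest_scores_with_sort(rows, n):
    """Without an index: read every row, sort all keys, keep the first n."""
    return sorted(r["score"] for r in rows)[:n]


def lowest_scores_from_index(index, n):
    """With an ordered index: read the first n keys and stop."""
    return [key for key, _pos in islice(index, n)]


from_index = lowest_scores_from_index(index, 5)
from_sort = lowest_scores_with_sort(rows, 5)
```

Both functions return the same answer; the index-based one touches only n index entries instead of the whole table.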

Streamlined Maintenance and Operations

The ANALYZE process in v7.5 is more efficient, skipping unmodified partitions and merging statistics to reduce resource usage and maintenance time. This ensures fresher statistics for better query planning without impacting system performance. Operations like COPY, VACUUM, and CREATE INDEX on wide tables benefit from reduced CPU usage due to new deserialization optimizations. The gprecoverseg tool now supports targeted segment recovery, speeding up repairs and rebalancing. The new gpctl cluster management tool, built on a gRPC-based design, simplifies and accelerates cluster deployment, enabling faster setup of new environments.
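The combination of skipping unmodified partitions and merging statistics can be sketched as follows. This is a simplified model, not Greenplum's ANALYZE implementation: `PartStats`, `analyze_partition`, and `refresh` are hypothetical names, and only statistics that merge trivially (row count, null count, minimum, maximum) are tracked. Real statistics such as distinct-value counts and histograms need more elaborate merging.

```python
from dataclasses import dataclass


@dataclass
class PartStats:
    rows: int
    nulls: int
    min_val: int
    max_val: int


def analyze_partition(values):
    """Scan one partition. Assumes it holds at least one non-null value."""
    non_null = [v for v in values if v is not None]
    return PartStats(len(values), len(values) - len(non_null),
                     min(non_null), max(non_null))


def refresh(partitions, cached, modified):
    """Rescan only new or modified partitions, then merge into root stats."""
    scanned = []
    for name, values in partitions.items():
        if name in modified or name not in cached:
            cached[name] = analyze_partition(values)
            scanned.append(name)
    parts = [cached[name] for name in partitions]
    root = PartStats(
        rows=sum(p.rows for p in parts),
        nulls=sum(p.nulls for p in parts),
        min_val=min(p.min_val for p in parts),
        max_val=max(p.max_val for p in parts),
    )
    return root, scanned


partitions = {"p1": [1, 2, None], "p2": [5, 9]}
cached = {}
root, scanned = refresh(partitions, cached, modified=set())

partitions["p2"].append(20)  # only p2 changes
root2, scanned2 = refresh(partitions, cached, modified={"p2"})
```

On the second call only `p2` is rescanned, yet the root-level statistics still reflect the new maximum, which is the effect the paragraph above describes.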

Scalable Data Ingestion with Greenplum Streaming Server

The Greenplum Streaming Server (GPSS) enhances data ingestion with a multi-node, Kubernetes-based architecture. It operates as a distributed system that can scale out to match the volume of data being ingested. GPSS efficiently processes high-volume data from sources like Kafka and RabbitMQ, supporting real-time analytics with low latency.

Efficient AutoML with gpMLBot

The gpMLBot in v7.5 simplifies Automated Machine Learning (AutoML) by supporting hyperparameter tuning, which helps identify the best model and parameters for specific datasets. Integrated with Tanzu Greenplum’s high-performance database engine and in-database analytics libraries, gpMLBot reduces model training and selection times, enabling faster iteration for data scientists working with large datasets or complex feature engineering tasks.
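For readers unfamiliar with hyperparameter tuning, the following is a minimal grid-search sketch in plain Python. It does not use or reflect gpMLBot's interface; the model (a one-variable ridge regression), the `fit`, `mse`, and `grid_search` helpers, and the tiny dataset are all assumptions chosen to keep the example self-contained.

```python
from itertools import product


def fit(xs, ys, lam, intercept):
    """Closed-form 1-D ridge regression; returns (slope, bias)."""
    if intercept:
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
    else:
        x_bar = y_bar = 0.0
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return slope, y_bar - slope * x_bar


def mse(model, xs, ys):
    slope, bias = model
    return sum((slope * x + bias - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(train, valid, grid):
    """Try every parameter combination; keep the lowest validation error."""
    best_params, best_score = None, float("inf")
    for lam, intercept in product(grid["lam"], grid["intercept"]):
        score = mse(fit(*train, lam, intercept), *valid)
        if score < best_score:
            best_params = {"lam": lam, "intercept": intercept}
            best_score = score
    return best_params, best_score


# Toy data generated from y = 2x + 1.
train = ([0, 1, 2, 3], [1, 3, 5, 7])
valid = ([4, 5], [9, 11])
grid = {"lam": [0.0, 0.1, 1.0], "intercept": [False, True]}
best_params, best_score = grid_search(train, valid, grid)
```

Each combination requires a full train-and-evaluate cycle, which is why running the search close to the data, and in parallel, can shorten model selection on large datasets.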

Advanced Geospatial Data Processing

Tanzu Greenplum 7.5 supports large-scale geospatial analysis through extensions like pgPointCloud, 3DCityDB, pgRouting, H3 Index, and TIGER geocoding. The pgPointCloud extension enables efficient storage and querying of LiDAR point clouds, while 3DCityDB manages CityGML-based 3D city models for urban planning and visualization. pgRouting provides optimized routing for large networks, and H3 Index improves search performance and data distribution for collocated joins. TIGER geocoding converts addresses into geographic coordinates, streamlining location-based queries.
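To give a sense of the kind of computation routing involves, here is a small Dijkstra shortest-path sketch in plain Python. pgRouting exposes this family of algorithms through SQL functions such as pgr_dijkstra; the code below is not pgRouting's implementation, and the `shortest_path` function and the edge list are hypothetical.

```python
import heapq


def shortest_path(edges, source, target):
    """Dijkstra over directed (from, to, cost) edges with non-negative costs."""
    graph = {}
    for u, v, cost in edges:
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, [])

    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        if node == target:
            break
        for nxt, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))

    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


# Hypothetical road network: (from, to, cost).
edges = [("A", "B", 4), ("A", "C", 1), ("C", "B", 2), ("B", "D", 5), ("C", "D", 8)]
path, cost = shortest_path(edges, "A", "D")
```

In a database setting the edge list would typically come from a table of road segments, with the cost column holding length or travel time.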

Summary

Tanzu Greenplum now enhances performance for data-intensive workloads through optimized query execution, efficient maintenance, scalable streaming, streamlined AutoML, and robust geospatial capabilities. These improvements reduce processing times and resource demands, making it a reliable choice for analytical and operational data environments.

About VMware Tanzu Greenplum

VMware Tanzu Greenplum, one of the key components of the VMware Tanzu Data portfolio, is a robust data warehousing and analytics platform built on open-source Postgres, designed to aggregate and analyze vast amounts of data at scale. Tanzu Greenplum is perfect for regulated, mission-critical organizations with massively scaled data consisting of multiple sources and data types who need to accelerate business by improving how they aggregate, analyze, and act on insights from critical data assets.
