Frequently tuned Greenplum parameters:
Please find below the list of most commonly used Greenplum parameters. Tuning these parameters can assist with the efficient memory management, performance tuning, resource and connection management of your Greenplum database. Please test these parameter changes on Dev or QA environments before implementing them on Production. Some of these parameters can be changed at the session level and some of them do require a system restart.
Note: Some of these parameters or their default values will be specific to Greenplum 7. For Greenplum version 6, please review the parameters and their values using this document.
Parameter Name | Requires Restart | Default | Value Range | Description |
---|---|---|---|---|
gp_vmem_protect_limit | Yes | 8192 | When resource queue-based or resource-group based resource management is active, the parameter sets the amount of memory (in number of MBs) that all postgres processes of an active segment instance can consume. If a query causes this limit to be exceeded, memory will not be allocated and the query will fail. | |
gp_max_plan_size | No | 0 | Specifies the total maximum uncompressed size of a query execution plan multiplied by the number of Motion operators (slices) in the plan. If the size of the query plan exceeds the value, the query is cancelled and an error is returned. A value of 0 means that the size of the plan is not monitored. You can specify a value in kB, MB, or GB. The default unit is kB. For example, a value of 200 is 200kB. A value of 1GB is the same as 1024MB or 1048576kB. | |
gp_vmem_idle_resource_timeout | No | 18s | If a database session is idle for longer than the time specified, the session will free system resources (such as shared memory), but remain connected to the database. This allows more concurrent connections to the database at one time. | |
gp_max_slices | No | 0 | 0 – INT_MAX | Specifies the maximum number of slices (portions of a query plan that are run on segment instances) that can be generated by a query. If the query generates more than the specified number of slices, Greenplum Database returns an error and does not run the query. The default value is 0, no maximum value. Running a query that generates a large number of slices might affect Greenplum Database performance. For example, a query that contains UNION or UNION ALL operators over several complex views can generate a large number of slices. You can run EXPLAIN ANALYZE on the query to view slice statistics for the query. |
gp_vmem_idle_resource_timeout | No | 18s | Any valid time expression | If a database session is idle for longer than the time specified, the session will free system resources (such as shared memory), but remain connected to the database. This allows more concurrent connections to the database at one time. |
max_connections | Yes | 250 on coordinator, 750 on segments | 10 – 262143 | The maximum number of concurrent connections to the database server. In a Greenplum Database system, user client connections go through the Greenplum coordinator instance only. Segment instances should allow 3-10 times the amount as the coordinator. When you increase this parameter, max_prepared_transactions must be increased as well. Increasing this parameter may cause Greenplum Database to request more shared memory. |
superuser_reserved_connections | Yes | 10 | Integer less than max_connections | Determines the number of connection slots that are reserved for Greenplum Database superusers. |
runaway_detector_activation_percent | Yes | 90 | Percentage (Integer) | For queries that are managed by resource queues or resource groups, this parameter determines when Greenplum Database terminates running queries based on the amount of memory the queries are using. A value of 100 deactivates the automatic termination of queries based on the percentage of memory that is utilized. |
statement_timeout | No | 0 | number of milliseconds | Sets the maximum allowed duration of any statement. |
lock_timeout | No | 0ms | 0-int_max | Abort any statement that waits longer than the specified number of milliseconds while attempting to acquire a lock on a table, index, row, or other database object. The time limit applies separately to each lock acquisition attempt. The limit applies both to explicit locking requests (such as LOCK TABLE or SELECT FOR UPDATE) and to implicitly-acquired locks. If log_min_error_statement is set to ERROR or lower, Greenplum Database logs the statement that timed out. A value of zero (the default) turns off this lock wait monitoring. |
max_locks_per_transaction | Yes | 128 | Integer | The shared lock table is created with room to describe locks on max_locks_per_transaction * (max_connections + max_prepared_transactions) objects, so no more than this many distinct objects can be locked at any one time. This is not a hard limit on the number of locks taken by any one transaction, but rather a maximum average value. You might need to raise this value if you have clients that touch many different tables in a single transaction. |
runaway_detector_activation_percent | Yes | 90 | percentage (integer) | For queries that are managed by resource queues or resource groups, this parameter determines when Greenplum Database terminates running queries based on the amount of memory the queries are using. A value of 100 deactivates the automatic termination of queries based on the percentage of memory that is utilized. |
statement_mem | No | 125MB | number of kilobytes | Allocates segment host memory per query. The amount of memory allocated with this parameter cannot exceed max_statement_mem or the memory limit on the resource queue or resource group through which the query was submitted. If additional memory is required for a query, temporary spill files on disk are used. |
max_prepared_transactions | Yes | 250 on Master, 250 on Segments | Integer | Sets the maximum number of transactions that can be in the prepared state simultaneously. Greenplum uses prepared transactions internally to ensure data integrity across the segments. This value must be at least as large as the value of max_connections on the coordinator. Segment instances should be set to the same value as the coordinator. |
Workfile limiting GUCs:
Greenplum creates spillfiles / workfiles on disk when a query is unable to run in memory allocated for Greenplum. Workfiles used by a query can be viewed using the following:
select * from gp_toolkit.gp_workfile_usage_per_query;
Parameter Name | Requires Restart | Default | Value Range | Description |
---|---|---|---|---|
gp_workfile_limit_files_per_query | No | 100000 | Integer | Sets the maximum number of temporary spill files (also known as workfiles) allowed per query per segment. Spill files are created when running a query that requires more memory than it is allocated. The current query is terminated when the limit is exceeded. Set the value to 0 (zero) to allow an unlimited number of spill files. coordinator session reload |
gp_workfile_limit_per_segment | Yes | 0 | kilobytes | Sets the maximum total disk size that all running queries are allowed to use for creating temporary spill files at each segment. The default value is 0, which means a limit is not enforced. |
gp_workfile_limit_per_query | No | 0 | kilobytes | Sets the maximum disk size an individual query is allowed to use for creating temporary spill files at each segment. The default value is 0, which means a limit is not enforced. |
Configuring autostats:
Bad statistics is one of the most common causes of poor performing queries on the cluster. Statistics play an important role in the planning and execution of the query where row estimations, cardinality, join order estimations and predicate selection can get affected due to poor statistics. Ensure that “analyze” is run frequently on tables that are loaded or modified frequently. Regular analyze of catalog tables and user tables will ensure that the Optimizer will generate the best available plan for a query and the execution will take the optimal code path for returning results back to the coordinator. This can avoid costly resource consumption on the segments thus ensure a healthy resource utilization on the segment host.
Parameter Name | Requires Restart | Default | Value Range | Description |
---|---|---|---|---|
gp_autostats_mode | No | on_no_ stats | none, on_change, on_no_stats | Specifies the mode for triggering automatic statistics collection with ANALYZE. The on_no_stats option triggers statistics collection for CREATE TABLE AS SELECT, INSERT, or COPY operations on any table that has no existing statistics. The on_change option triggers statistics collection only when the number of rows affected exceeds the threshold defined by gp_autostats_on_change_threshold. Operations that can trigger automatic statistics collection with on_change are CREATE TABLE AS SELECT, UPDATE, DELETE, INSERT, COPY. |
gp_autostats_mode_in_functions | No | None | none, on_change, on_no_stats | Specifies the mode for triggering automatic statistics collection with ANALYZE for statements in procedural language functions. The none option deactivates statistics collection. The on_no_stats option triggers statistics collection for CREATE TABLE AS SELECT, INSERT, or COPY operations that are run in functions on any table that has no existing statistics. The on_change option triggers statistics collection only when the number of rows affected exceeds the threshold defined by gp_autostats_on_change_threshold. Operations in functions that can trigger automatic statistics collection with on_change are: CREATE TABLE AS SELECT, UPDATE, DELETE, INSERT, COPY. |
Resource Groups parameters:
You use resource groups to set and enforce CPU, memory, and concurrent transaction limits in Greenplum Database. Once you define a resource group, you assign the group to one or more Greenplum Database roles in order to control the resources used by them. When you assign a resource group to a role, the resource limits that you define for the group apply to all of the roles to which you assign the group. For example, the memory limit for a resource group identifies the maximum memory usage for all running transactions submitted by Greenplum Database users in all roles to which you assign the group.
Resource Group configurations and limits can be viewed using the following:
SELECT * FROM gp_toolkit.gp_resgroup_config;
More information on resource groups can be found here: Link
**Greenplum Database uses Linux-based control groups for CPU resource management, and Runaway Detector for statistics, tracking and management of memory.
Parameter Name | Requires Restart | Default | Value Range | Description |
---|---|---|---|---|
gp_resource_group_cpu_limit | Yes | 0.9 | 0.1-1.0 | Identifies the maximum percentage of system CPU resources to allocate to resource groups on each Greenplum Database segment node. |
gp_resource_group_cpu_priority | Yes | 10 | 1-50 | Sets the CPU priority for Greenplum processes relative to non-Greenplum processes when resource groups are enabled. For example, setting this parameter to 10 sets the ratio of allotted CPU resources for Greenplum processes to non-Greenplum processes to 10:1. |
gp_resource_group_queuing_timeout | No | 0 ms | 0 – INT_MAX millisecs | Cancel a transaction queued in a resource group that waits longer than the specified number of milliseconds. The time limit applies separately to each transaction. The default value is zero; transactions are queued indefinitely and never time out. |
gp_resgroup_memory_policy | No | eager_free | auto, eager_free | Used by a resource group to manage memory allocation to query operators. When set to auto, Greenplum Database uses resource group memory limits to distribute memory across query operators, allocating a fixed size of memory to non-memory-intensive operators and the rest to memory-intensive operators. When you specify eager_free, Greenplum Database distributes memory among operators more optimally by re-allocating memory released by operators that have completed their processing to operators in a later query stage. |
gp_resgroup_memory_query_fixed_mem | No | 0 | 0 < integer < INT_MAX | Specifies a fixed amount of memory, in MB, reserved for all queries in a resource group for the scope of a session. When this parameter is set to 0, the default, the MEMORY_LIMIT resource group attribute determines this memory limit instead. While MEMORY LIMIT applies to queries across sessions, gp_resgroup_memory_query_fixed_mem overrides that limit at a session level. Thus, you can use this configuration parameter to adjust query memory budget for a particular session, on an ad hoc basis. ** This parameter is specific to Greenplum 7. |
memory_spill_ratio | No | 20 | 0-100 | Sets the memory usage threshold percentage for memory-intensive operators in a transaction. When a transaction reaches this threshold, it spills to disk. The default memory_spill_ratio percentage is the value defined for the resource group assigned to the currently active role. You can set memory_spill_ratio at the session level to selectively set this limit on a per-query basis. For example, if you have a specific query that spills to disk and requires more memory, you may choose to set a larger memory_spill_ratio to increase the initial memory allocation. You can specify an integer percentage value from 0 to 100 inclusive. If you specify a value of 0, Greenplum Database uses the statement_mem server configuration parameter value to control the initial query operator memory amount. |