Manage fault-tolerant execution in Trino
By default, when a Trino node runs out of resources while executing a query or fails due to some other reason, the entire query fails and has to be re-submitted manually. Typically, the longer a query execution lasts, the more likely it is to fail.
The Trino service provides a fault-tolerant execution (FTE) mechanism that mitigates query failures by retrying failed queries and their subtasks. If FTE is enabled and a query fails, Trino persists intermediate data in a file system, allowing other Trino workers to perform a retry using this data.
NOTE
|
Enable FTE
By default, the FTE mechanism is disabled for the Trino service. You can enable FTE using ADCM (Clusters → <clusterName> → Services → Trino → Primary Configuration → Fault-tolerant execution). Particularly, the retry-policy parameter should be specified. The parameter accepts the following values:
-
NONE
— disables FTE (default). -
QUERY
— if a Trino worker fails, Trino retries the entire query. This policy is recommended when the majority of Trino requests are small queries. -
TASK
— if a Trino worker fails, Trino retries individual query tasks. This policy is best suited for retrying large batch queries, however, it can result in higher latency for big number of short-running queries.NOTESelecting theTASK
retry policy automatically adjusts some relevant Trino configurations to follow best practices for a fault-tolerant Trino cluster. For theTASK
retry policy, it is strongly recommended to set thetask.low-memory-killer.policy=total-reservation-on-blocked-nodes
parameter; otherwise, queries may need to be killed manually if Trino worker components run out of memory.
Exchange manager and exchange data
In Trino, Exchange manager is a component responsible for transferring exchange data between Trino workers while executing a distributed query. When FTE is enabled, this component becomes responsible for spooling exchange data to a file system like HDFS, Ozone, etc. When activating FTE, you can select an Exchange manager implementation to save intermediate query data in a specific file system.
FTE settings
The Trino service provides several settings groups in ADCM, which are used for tuning the FTE behavior. These are:
-
Fault-tolerant execution. The settings define the retry behavior, including retry policies, intervals, encryption, etc.
-
Exchange manager settings. The settings are used to control the intermediate spooling data generated by an Exchange manager (storage type and location, permissions, buffers, etc.).
Spool data to HDFS, Ozone, and local FS
By default, the Trino service provides two Exchange manager implementations that can spool data to HDFS, Ozone, and to a local file system.
When you enable FTE, the Trino service automatically detects whether HDFS or Ozone is installed in the ADH cluster, and updates all the required settings to work with the current storage type.
To store exchange data to HDFS, set the exchange-manager.name=hdfs
parameter and specify the path for storing spooling data by using exchange.base-directories
with the hdfs:///
schema in the path.
To store exchange data in Ozone, set the exchange-manager.name=hdfs
parameter and specify the storage path using the ofs:///
schema.
NOTE
Specifying both HDFS and Ozone paths in exchange.base-directories is not allowed.
|
To store exchange data in the local file system, set the exchange-manager.name=local
parameter and specify the path for storing spooling data using exchange.base-directories
(use the file:///
scheme).
The Trino service can automatically generate directories for spooling data. Using FTE settings, you can specify the location, directory permissions, etc.