Prerequisites

Checklist

Make sure that the following requirements are met:

  • There is access to the ADQM cluster.

  • There is access to the Spark 3 cluster.

  • There is a network connection between all ADQM shards and the Spark 3 driver.

  • There is a network connection between all ADQM shards and each Spark 3 executor node.
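The two network requirements above can be checked from the driver node and from each executor node with a small script. The following is only a sketch under stated assumptions: the host names are hypothetical placeholders, and port 9000 is the default ClickHouse native TCP port, which your ADQM cluster may have changed.

```shell
# Hypothetical shard host names; replace them with the hosts of your ADQM cluster.
ADQM_SHARDS="${ADQM_SHARDS:-adqm-shard-1 adqm-shard-2}"
# Assumption: 9000 is the default ClickHouse native TCP port; your cluster may use another one.
ADQM_PORT="${ADQM_PORT:-9000}"

# Return 0 if a TCP connection to host:port opens within 3 seconds.
check_port() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Check every shard and report the result; return 1 if any shard is unreachable.
check_all() {
  local failed=0
  for host in $ADQM_SHARDS; do
    if check_port "$host" "$ADQM_PORT"; then
      echo "OK    ${host}:${ADQM_PORT}"
    else
      echo "FAIL  ${host}:${ADQM_PORT}"
      failed=1
    fi
  done
  return "$failed"
}
```

After loading these functions on a node, calling check_all prints one line per shard. The check only confirms that a TCP connection can be opened; it does not verify credentials or the ADQM version.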

Supported platforms and versions

  • ADQM starting with version 20.8.11.17.

  • Spark 3.3.x, Spark 3.4.x.

  • Scala 2.13.

  • ClickHouse Native JDBC 2.5.4.

Memory

In general, Spark 3 runs well with any amount of memory from 8 GB to hundreds of gigabytes per machine. We recommend allocating at most 75% of the memory to Spark 3 and leaving the rest for the operating system and buffer cache.
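As a rough illustration of the 75% guideline, the sketch below computes the upper bound for a given amount of RAM. The function names are hypothetical, and reading /proc/meminfo assumes a Linux host.

```shell
# Print 75% of the given memory amount (in GB), rounded down.
spark_memory_gb() {
  local total_gb="$1"
  echo $(( total_gb * 75 / 100 ))
}

# Print the total RAM in GB as reported by the Linux kernel, rounded down.
# Assumption: /proc/meminfo is available (Linux only).
total_ram_gb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}
```

For example, spark_memory_gb 64 prints 48, so a machine with 64 GB of RAM would give Spark 3 at most 48 GB. On a live node, the two functions can be combined as spark_memory_gb "$(total_ram_gb)".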

The amount of memory that you need depends on your application. To determine how much memory your application uses for a certain dataset size, load a part of your dataset into a Spark 3 RDD, then open the Storage tab of the Spark 3 monitoring UI (http://<driver-node>:4040) to see how much memory that part occupies. Memory usage is affected by the storage level and serialization format. See the tuning guide for tips on how to reduce memory usage.

NOTE
The Java VM does not always behave well with more than 200 GB of RAM. If you purchase machines with more RAM, you can run multiple worker Java VMs per node. In Spark 3 standalone mode, you can set the number of workers per node via the SPARK_WORKER_INSTANCES variable in the conf/spark-env.sh script. You can also set the number of cores per worker via the SPARK_WORKER_CORES variable.
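A conf/spark-env.sh fragment for such a setup might look as follows. The values are illustrative only and assume a hypothetical node with 512 GB of RAM and 64 cores. SPARK_WORKER_MEMORY is not mentioned in the note; it is the standalone-mode variable that limits the memory available to each worker, and it is added here as an assumption about how you would keep each worker below 200 GB.

```shell
# conf/spark-env.sh (Spark standalone mode)
# Illustrative values for a hypothetical node with 512 GB of RAM and 64 cores.
export SPARK_WORKER_INSTANCES=2   # number of workers started on this node
export SPARK_WORKER_CORES=32      # cores available to each worker
export SPARK_WORKER_MEMORY=190g   # memory available to each worker, kept below 200 GB
```

With these values, the two workers together use 380 GB, which is a little under 75% of the node's 512 GB.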