Before installing and using ADQM Spark Connector, make sure that the following requirements are met:

  • There is access to the ADQM cluster.

  • There is access to the Spark cluster.

  • There is a network connection between all ADQM shards and the Spark driver.

  • There is a network connection between all ADQM shards and each Spark executor node.

Supported platforms and versions

  • ADQM starting with version

  • Spark 2.3, 2.4.

  • Scala 2.11.x.

  • ClickHouse Native JDBC 2.5.4.
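If you manage dependencies with sbt, the JDBC driver listed above can be pulled in as shown below. The Maven coordinates (`com.github.housepower % clickhouse-native-jdbc`) are an assumption; verify them against the repository your distribution uses.

```scala
// build.sbt — hypothetical coordinates; check the group and artifact IDs
// published for your distribution before using them.
libraryDependencies += "com.github.housepower" % "clickhouse-native-jdbc" % "2.5.4"
```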


Memory
In general, Spark runs well with any amount of memory from 8 GB to hundreds of gigabytes per machine. We recommend allocating at most 75% of the memory to Spark; leave the rest for the operating system and the buffer cache.
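As a quick sanity check, the 75% rule of thumb can be written as a tiny helper. This is only a sketch; `maxSparkMemoryGb` is a hypothetical name, not part of any Spark API.

```scala
// Sketch: apply the "allocate at most 75% of RAM to Spark" rule of thumb.
// maxSparkMemoryGb is a hypothetical helper, not a Spark API.
object MemoryBudget {
  def maxSparkMemoryGb(machineRamGb: Int): Int =
    (machineRamGb * 3) / 4 // leave ~25% for the OS and buffer cache

  def main(args: Array[String]): Unit = {
    // A 64 GB machine leaves at most 48 GB for Spark.
    println(maxSparkMemoryGb(64)) // prints 48
  }
}
```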

The amount of memory you need depends on your application. To determine how much memory your application uses for a certain dataset size, load part of your dataset into a Spark RDD, then use the Storage tab of the Spark monitoring UI (http://&lt;driver-node&gt;:4040) to see how much memory that part occupies. Memory usage is strongly affected by the storage level and the serialization format; see the tuning guide for tips on how to reduce it.
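The measurement described above might look like this in spark-shell. This is a sketch: it assumes a running SparkContext `sc` (provided by spark-shell), and the dataset path is a placeholder of your own.

```scala
// Run in spark-shell, which provides the SparkContext `sc`.
// The path below is a placeholder for a sample of your dataset.
import org.apache.spark.storage.StorageLevel

val sample = sc.textFile("hdfs:///path/to/dataset-sample")
  .persist(StorageLevel.MEMORY_ONLY) // cache deserialized objects in memory

sample.count() // force materialization so the cache is populated
// Now open http://<driver-node>:4040 -> Storage tab to see the in-memory
// size of this RDD, and extrapolate to the full dataset.
```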

The Java VM does not always behave well with more than 200 GB of RAM. If you purchase machines with more RAM, you can run multiple worker JVMs per node. In Spark's standalone mode, you can set the number of worker instances per node via the SPARK_WORKER_INSTANCES variable in the conf/ script, and the number of cores per worker via the SPARK_WORKER_CORES variable.
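For example, on a large-memory node you might split the RAM across two workers. The fragment below is a sketch of a standalone-mode configuration; the concrete values are illustrative, and `SPARK_WORKER_MEMORY` is included only to show how the per-JVM budget is usually capped.

```shell
# Standalone-mode worker settings (spark-env.sh is where Spark normally
# reads these variables; adjust the values to your hardware).
export SPARK_WORKER_INSTANCES=2   # two worker JVMs per node
export SPARK_WORKER_CORES=8       # cores available to each worker
export SPARK_WORKER_MEMORY=90g    # keep each JVM well under ~200 GB
```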