Spark and Hive

By default, Spark is already configured to work with Hive. Hive settings for Spark are located in the Spark configuration directory, /etc/spark/conf. If your Spark application interacts with Hadoop, Hive, or both, you need to put the Hadoop configuration files in Spark’s classpath.

Multiple running applications might require different Hadoop/Hive client-side configurations. You can copy and modify hdfs-site.xml, core-site.xml, and yarn-site.xml in Spark’s classpath for each application.

In a Spark cluster running on YARN, these configuration files are set cluster-wide and cannot safely be changed by the application.

The best choice is to use Spark Hadoop properties in the form of spark.hadoop.* and Spark Hive properties in the form of spark.hive.*. For example, adding spark.hadoop.abc.def=xyz is the same as adding the abc.def=xyz Hadoop property; adding spark.hive.abc=xyz is equivalent to adding the hive.abc=xyz Hive property. These can be treated like normal Spark properties and set in $SPARK_HOME/conf/spark-defaults.conf. The default Spark configuration directory is /etc/spark/conf, where you can keep all Spark configuration files.
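
As a rough illustration of the prefix convention described above, the following Python sketch splits a set of Spark properties into the Hadoop and Hive properties they imply. The function name to_client_properties is hypothetical, and the code only mimics the naming rule; it is not how Spark itself is implemented.

```python
def to_client_properties(spark_conf):
    """Split Spark properties into the Hadoop and Hive properties they imply.

    Illustrative only: this mirrors the naming rule, not Spark's internals.
    """
    hadoop, hive = {}, {}
    for key, value in spark_conf.items():
        if key.startswith("spark.hadoop."):
            # spark.hadoop.abc.def -> abc.def (the whole prefix is dropped)
            hadoop[key[len("spark.hadoop."):]] = value
        elif key.startswith("spark.hive."):
            # spark.hive.abc -> hive.abc (only the leading "spark." is dropped)
            hive[key[len("spark."):]] = value
    return hadoop, hive


hadoop_props, hive_props = to_client_properties({
    "spark.hadoop.abc.def": "xyz",
    "spark.hive.abc": "xyz",
    "spark.eventLog.enabled": "false",  # plain Spark property, not forwarded
})
print(hadoop_props)  # {'abc.def': 'xyz'}
print(hive_props)    # {'hive.abc': 'xyz'}
```

Properties without either prefix, such as spark.eventLog.enabled, stay ordinary Spark properties and appear in neither result.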

In some cases, you may want to avoid hard-coding certain configurations in a SparkConf. For instance, Spark allows you to simply modify or add configurations at runtime:

Passing parameters to spark-submit
# myApp.jar stands in for your application JAR
./bin/spark-submit \
  --name "My app" \
  --master local[4] \
  --conf spark.eventLog.enabled=false \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  --conf spark.hadoop.abc.def=xyz \
  myApp.jar

You can find descriptions of all Spark parameters in the Spark documentation.
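
If the set of properties varies between runs, the spark-submit command line can also be assembled by a small script rather than typed by hand. The sketch below is one possible way to do that in Python; build_spark_submit and myApp.jar are hypothetical names chosen for illustration, and the script only builds and prints the command without running it.

```python
import shlex


def build_spark_submit(app, name, master, conf):
    """Return a spark-submit argument list with one --conf flag per property."""
    args = ["./bin/spark-submit", "--name", name, "--master", master]
    for key, value in conf.items():
        # Each property is passed as a separate "--conf key=value" pair.
        args += ["--conf", f"{key}={value}"]
    args.append(app)  # the application JAR comes last
    return args


cmd = build_spark_submit(
    app="myApp.jar",  # hypothetical application JAR
    name="My app",
    master="local[4]",
    conf={
        "spark.eventLog.enabled": "false",
        "spark.hadoop.abc.def": "xyz",
    },
)
# shlex.join quotes arguments that contain spaces or shell metacharacters.
print(shlex.join(cmd))
```

Keeping the arguments as a list avoids quoting mistakes; the list could be handed to subprocess.run directly if the script should also launch the job.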

Custom configuration

If you need to customize the Hive configuration used by Spark applications, there are two ways:

  • One way is to add custom properties to the spark-defaults.conf file and add this file to the Hive classpath.

  • The other way is to set configuration properties in the hive-site.xml Hive configuration file. This file is stored with the other Spark configuration files, at /etc/spark/conf/hive-site.xml.
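
For the second option, hive-site.xml uses the standard Hadoop-style layout: a configuration root element containing property entries, each with a name and a value. The Python sketch below renders such a file from a dictionary. The function render_hive_site and the metastore host name are assumptions made for illustration; hive.metastore.uris is used only as a sample property.

```python
import xml.etree.ElementTree as ET


def render_hive_site(properties):
    """Render a dict of Hive properties as hive-site.xml content."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available in Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


xml_text = render_hive_site({
    # hypothetical metastore address; replace with your own
    "hive.metastore.uris": "thrift://metastore-host:9083",
})
print(xml_text)
```

The printed text could then be saved as /etc/spark/conf/hive-site.xml, after reviewing it against the properties already present in that file.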

For more information about Hive parameters for Spark, refer to Hive on Spark parameters.

