Use JDBC to connect to Hive
This section describes ways to connect to HiveServer2 using the JDBC interface which is the recommended way for client interaction with HiveServer2. The section contains a sample Java program that opens a connection to HiveServer2 and also shows how to connect to Hive using DBeaver which is a database tool that relies on JDBC.
To connect to Hive that runs in a kerberized cluster, be sure to complete Kerberos prerequisites, like installing Kerberos client, getting a ticket, configuring krb5.conf, etc. For more information, see Kerberos overview.
JDBC connection string
Below are examples of JDBC connection strings depending on the security mechanism used in an ADH cluster. You can find the up-to-date JDBC connection string on the Hive Info page in ADCM (Clusters → Services → Hive → Info).
Connects to Hive in the high availability mode, the connection is insecure:
jdbc:hive2://<cluster_host_0>:2181,<cluster_host_1>:2181,<cluster_host_N>:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=arenadata/cluster/<cluster_id>/<namespace>
Connects to Hive in the HTTP mode, the connection is insecure:
jdbc:hive2://<cluster_host_0>:2181,<cluster_host_1>:2181,<cluster_host_N>:2181/;transportMode=http;httpPath=<hive_endpoint>
Where <hive_endpoint>
is the HTTP endpoint defined as hive.server2.thrift.http.path
in hive-site.xml.
Connects to Hive in the high availability mode, the connection is secured with SSL:
jdbc:hive2://<cluster_host_0>:2181,<cluster_host_1>:2181,<cluster_host_N>:2181;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=arenadata/cluster/<cluster_id>/<namespace>;ssl=true;sslTrustStore=/tmp/truststore.jks;trustStorePassword=bigdata
Connects to Hive in the high availability mode, the connection is secured with SSL+Kerberos:
jdbc:hive2://<cluster_host_0>:2181,<cluster_host_1>:2181,<cluster_host_N>:2181;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=arenadata/cluster/<cluster_id>/<namespace>;ssl=true;sslTrustStore=/tmp/truststore.jks;trustStorePassword=bigdata;principal=hive/_HOST@EXAMPLE.COM
Where EXAMPLE.COM
is your Kerberos realm, for example RU-CENTRAL1.INTERNAL
.
The sample JDBC connection strings above are intended to connect to Hive in the high availability (HA) mode.
However, you can still connect to a HiveServer2 instance directly in the non-HA mode using the Thrift server port (10000
by default).
In this case, the JDBC string looks like jdbc:hive2://<cluster_host>:10000/
.
JDBC data types mapping
The following table describes the mapping between Hive/Java data types.
Hive data type | Java data type | Comment |
---|---|---|
TINYINT |
byte |
Signed or unsigned 1-byte integer |
SMALLINT |
short |
Signed 2-byte integer |
INT |
int |
Signed 4-byte integer |
BIGINT |
long |
Signed 8-byte integer |
FLOAT |
double |
Single-precision number |
DOUBLE |
double |
Double-precision number |
DECIMAL |
java.math.BigDecimal |
Fixed-precision decimal value |
BOOLEAN |
boolean |
Single bit (0 or 1) |
STRING |
String |
Character string or variable-length character string |
TIMESTAMP |
java.sql.Timestamp |
Date and time values |
BINARY |
String |
Binary data |
ARRAY |
String (json-encoded) |
Values of the same data type |
MAP |
String (json-encoded) |
Key/value pairs |
STRUCT |
String (json-encoded) |
Structured data |
Examples
Java
The following snippet shows how to connect to HiveServer2 using standard JDBC tools.
This program uses the org.apache.hive.jdbc.HiveDriver
driver class from hive-jdbc-3.1.1-arenadata-standalone.jar.
You can pull this JAR from Maven repository.
This standalone JAR also contains all the dependencies required to use the driver on a standalone (non-ADH) host.
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;
public class HiveJdbcClient {
private static String driverName = "org.apache.hive.jdbc.HiveDriver"; (1)
public static void main(String[] args) throws SQLException {
try {
Class.forName(driverName);
} catch (ClassNotFoundException e) {
e.printStackTrace();
System.exit(1);
}
Connection con = DriverManager.getConnection("jdbc:hive2://ka-adh-1.ru-central1.internal:2181," +
"ka-adh-3.ru-central1.internal:2181,ka-adh-2.ru-central1.internal:2181/;serviceDiscoveryMode=zooKeeper;" +
"zooKeeperNamespace=arenadata/cluster/2/hiveserver2", "hive", ""); (2)
// Connection con = DriverManager.getConnection("jdbc:hive2://ka-adh-1.ru-central1.internal:10000/test_db", "hive", ""); (3)
Statement stmt = con.createStatement();
String tableName = "testHiveDriverTable";
stmt.execute("drop table if exists " + tableName);
stmt.execute("create table " + tableName + " (key int, value string)");
// show tables
String sql = "show tables '" + tableName + "'";
System.out.println("Running query: " + sql);
ResultSet res = stmt.executeQuery(sql);
while (res.next()) {
System.out.println(res.getString(1));
}
// other Hive queries
}
}
1 | The JDBC driver class name. |
2 | The JDBC connection string to connect to Hive in the HA mode. |
3 | The JDBC connection string to connect to Hive in non-HA mode, using Thrift server port. |
Connect using DBeaver
DBeaver is an open-source database tool that connects to Hive using JDBC.
To connect to HiveServer2 from DBeaver, all you need is to provide an appropriate JDBC connection string, and Hive will download the default driver org.apache.hive.jdbc.HiveDriver
automatically.