Use JDBC to connect to Impala
Impala supports the standard JDBC interface that allows access from commercial Business Intelligence tools and other software.
To set up a JDBC connection to Impala, follow the steps below:
- Verify the port where the Impala daemons in your cluster are listening for incoming JDBC requests.
- Install the JDBC driver on every host that runs the JDBC-enabled application.
- Specify a connection string for the JDBC application to access one of the servers running the impalad daemon, with the appropriate security settings.
Configure the JDBC Port
The default ports on which Impala Daemons accept JDBC connections are listed in the table below.

Protocol | Default Port | Parameter
---|---|---
HTTP | 28000 | hs2_http_port
Binary TCP | 21050 | hs2_port
To change these ports, go to the Components page of the Impala service in ADCM UI, select Impala Daemon, open its Primary Configuration tab, and set the required parameter (hs2_http_port or hs2_port).
Enable Impala JDBC support on client systems
The Hive JDBC driver is installed on ADH hosts with Impala. This is the hive-jdbc.noarch package.
If you are using JDBC-enabled applications on hosts outside the cluster, install the JDBC driver (the hive-jdbc package) on at least one cluster host. For this, use the Linux package manager. Then download the following JAR files to each client machine that should use JDBC with Impala:
- commons-logging-X.X.X.jar
- hadoop-common.jar
- hive-common-X.XX.X.jar
- hive-jdbc-X.XX.X.jar
- hive-metastore-X.XX.X.jar
- hive-service-X.XX.X.jar
- httpclient-X.X.X.jar
- httpcore-X.X.X.jar
- libfb303-X.X.X.jar
- libthrift-X.X.X.jar
- log4j-X.X.XX.jar
- slf4j-api-X.X.X.jar
- slf4j-logXjXX-X.X.X.jar
To enable JDBC support for Impala on the system where you run the JDBC application, perform the following steps:

- Download the JAR files listed above to each client machine. For Maven users, see the sample GitHub page for an example of the dependencies you could add to a pom file instead of downloading the individual JARs.
- Store the JAR files in a directory that is already referenced in your CLASSPATH setting. On Linux, you can use the location /opt/jars/. On Windows, you can use a subdirectory in the C:\Program Files folder.
- Set CLASSPATH to include the JAR files. To successfully load the Impala JDBC driver, client programs must find the associated JAR files. In most cases, this means setting the CLASSPATH variable; some examples are given below. For more information about installing JDBC drivers, see the documentation for your JDBC client.
On Linux, if you extracted the JARs to /opt/jars/, you can execute the following command:
$ export CLASSPATH=/opt/jars/*:$CLASSPATH
On Windows, use the System Properties item of Control Panel to modify the environment variables for your system. Change the environment variables to include the path to the extracted JAR files.
The class name of the Hive JDBC driver is org.apache.hive.jdbc.HiveDriver.
Connection strings for the Hive JDBC Driver
After you have configured Impala to work with JDBC, you can establish a connection.
For a cluster that does not use Kerberos authentication, use the following connection string syntax:
jdbc:hive2://host:port/;auth=noSasl
Example:
jdbc:hive2://host1.example.com:21050/;auth=noSasl
The Binary TCP protocol is used.
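As an illustration, here is a minimal Java sketch that opens a connection using the unsecured connection string from the example above and lists the tables in the default database. Recent hive-jdbc versions register the driver automatically; with older versions you may need the explicit Class.forName() call shown earlier.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaJdbcExample {
    // Connection string from the example above; adjust the host and port for your cluster
    private static final String URL = "jdbc:hive2://host1.example.com:21050/;auth=noSasl";

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(URL);
             Statement stmt = conn.createStatement();
             // SHOW TABLES returns one row per table in the current database
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```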
To connect to an Impala instance that requires Kerberos authentication:
jdbc:hive2://host:port/;principal=principal_name
The principal must be the same user principal that you use when starting Impala.
Example:
jdbc:hive2://host1.example.com:21050/;principal=impala/host1.example.com@H2.EXAMPLE.COM
The Binary TCP protocol is used.
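A connection sketch for the Kerberos case, assuming the client host already holds a valid Kerberos ticket (for example, obtained with kinit) and uses the example URL above:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class ImpalaKerberosExample {
    // The principal in the URL is the Impala service principal, not the end user;
    // the connecting user is taken from the client's Kerberos ticket cache
    private static final String URL =
        "jdbc:hive2://host1.example.com:21050/;principal=impala/host1.example.com@H2.EXAMPLE.COM";

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(URL)) {
            System.out.println("Connected as a Kerberos-authenticated client");
        }
    }
}
```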
To connect to an Impala instance that requires LDAP authentication:
jdbc:hive2://host:port/db_name;user=ldap_userid;password=ldap_password
Example:
jdbc:hive2://host1.example.com:21050/test_db;user=user1;password=password1
The Binary TCP protocol is used.
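Instead of embedding LDAP credentials in the URL, you can pass them through the standard DriverManager.getConnection(url, user, password) overload, which keeps them out of connection-string logs. A sketch using the example values above:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class ImpalaLdapExample {
    public static void main(String[] args) throws Exception {
        // Same connection as the example above, but the credentials are passed
        // through the standard JDBC overload instead of the URL
        String url = "jdbc:hive2://host1.example.com:21050/test_db";
        try (Connection conn = DriverManager.getConnection(url, "user1", "password1")) {
            System.out.println("Connected with LDAP credentials");
        }
    }
}
```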
To connect to an Impala instance over HTTP, specify the HTTP port (28000 by default) and transportMode=http in the connection string.
Example:
jdbc:hive2://myhost.example.com:28000/;transportMode=http
JDBC limitations
Most Impala SQL features work equivalently through the impala-shell interpreter and the JDBC API, but there are some exceptions:
- Queries involving the complex types (ARRAY, STRUCT, and MAP) require notation that might not be available on all levels of the JDBC driver. You can create a view that exposes a "flattened" version of the complex columns and point the application at the view. For more information, see Complex Types.
- The Impala complex types are supported by the JDBC getColumns() method. Both MAP and ARRAY are reported as the JDBC SQL type ARRAY. This behavior is consistent with Hive. The STRUCT type is reported as the JDBC SQL type STRUCT. To be consistent with Hive, the TYPE_NAME field is populated with the primitive type name for scalar types. For complex types, the toSQL() method is executed. The resulting type names are not fully consistent. The following table shows how toSQL() converts types (see also the metadata sketch after this list).

  Initial type | Resulting type
  ---|---
  DECIMAL(10,10) | DECIMAL
  CHAR(10) | CHAR
  VARCHAR(10) | VARCHAR
  ARRAY<DECIMAL(10,10)> | ARRAY<DECIMAL(10,10)>
  ARRAY<CHAR(10)> | ARRAY<CHAR(10)>
  ARRAY<VARCHAR(10)> | ARRAY<VARCHAR(10)>
- Currently, Impala INSERT, UPDATE, or other DML statements executed through the JDBC interface against a Kudu table do not return JDBC errors for conditions such as duplicate primary key columns. Therefore, in applications that execute a large number of DML statements, it is better to use the Kudu Java API directly rather than JDBC.
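To see what getColumns() actually returns, the following sketch prints the JDBC type code and TYPE_NAME for each column of a table. The database and table names (test_db, my_table) are hypothetical; the URL reuses the unsecured example from above.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ImpalaColumnTypes {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:hive2://host1.example.com:21050/;auth=noSasl";
        try (Connection conn = DriverManager.getConnection(url)) {
            DatabaseMetaData meta = conn.getMetaData();
            // Arguments: catalog, schema pattern, table name pattern, column name pattern
            try (ResultSet cols = meta.getColumns(null, "test_db", "my_table", "%")) {
                while (cols.next()) {
                    // DATA_TYPE is the java.sql.Types code; TYPE_NAME is the string
                    // produced as described above (e.g. ARRAY<CHAR(10)>)
                    System.out.printf("%s: %d (%s)%n",
                        cols.getString("COLUMN_NAME"),
                        cols.getInt("DATA_TYPE"),
                        cols.getString("TYPE_NAME"));
                }
            }
        }
    }
}
```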