Use JDBC to connect to Impala

Impala supports the standard JDBC interface that allows access from commercial Business Intelligence tools and other software.

To set up a JDBC connection to Impala, follow the steps below:

  1. Verify the port where the Impala daemons in your cluster are listening for incoming JDBC requests.

  2. Install the JDBC driver on every host that runs the JDBC-enabled application.

  3. Specify a connection string for the JDBC application to access one of the servers running the impalad daemon, with the appropriate security settings.

Configure the JDBC port

The default ports on which Impala daemons accept JDBC connections are listed in the table below.

Protocol     Default port   Parameter
HTTP         28000          hs2_http_port
Binary TCP   21050          hs2_port

To change these ports, go to the Components page of the Impala service in the ADCM UI, select Impala Daemon, open its Primary Configuration tab, and set the required parameter (hs2_http_port or hs2_port).

Enable Impala JDBC support on client systems

The Hive JDBC driver (the hive-jdbc.noarch package) is installed on ADH hosts together with Impala.

If you use JDBC-enabled applications on hosts outside the cluster, install the JDBC driver (the hive-jdbc package) on at least one cluster host using the Linux package manager. Then copy the following JAR files to each client machine that will use JDBC with Impala:

  • commons-logging-X.X.X.jar

  • hadoop-common.jar

  • hive-common-X.XX.X.jar

  • hive-jdbc-X.XX.X.jar

  • hive-metastore-X.XX.X.jar

  • hive-service-X.XX.X.jar

  • httpclient-X.X.X.jar

  • httpcore-X.X.X.jar

  • libfb303-X.X.X.jar

  • libthrift-X.X.X.jar

  • log4j-X.X.XX.jar

  • slf4j-api-X.X.X.jar

  • slf4j-log4j12-X.X.X.jar

To enable JDBC support for Impala on the system where you run the JDBC application, perform the following steps:

  1. Download the JAR files listed above to each client machine. For Maven users, see the sample GitHub page for an example of the dependencies you can add to a pom file instead of downloading the individual JARs.

  2. Store the JAR files, ideally in a directory that is already referenced in your CLASSPATH setting. On Linux, you can use /opt/jars/. On Windows, you can use a subdirectory of the C:\Program Files folder.

  3. Set CLASSPATH to include the JAR files. To load the Impala JDBC driver, client programs must be able to find the associated JAR files; in most cases, this means setting the CLASSPATH variable, as in the examples below. For more information about installing JDBC drivers, see the documentation for your JDBC client.

    On Linux, if you placed the JARs in /opt/jars/, you can execute the following command (the trailing * is the Java class path wildcard, which expands to all JAR files in the directory):

    $ export CLASSPATH="/opt/jars/*:$CLASSPATH"

    On Windows, use the System Properties item of Control Panel to modify the environment variables for your system. Change the environment variables to include the path to the extracted JAR files.

The class name of the Hive JDBC driver is org.apache.hive.jdbc.HiveDriver.
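
A minimal connectivity check, assuming an unsecured cluster: the host name, port, and auth=noSasl setting below are placeholders, so substitute the connection string appropriate for your environment (see the next section for the available variants).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaJdbcCheck {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver. Modern JVMs usually discover it
        // automatically through the service-provider mechanism, so this
        // call is often optional.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Placeholder connection string for an unsecured cluster.
        String url = "jdbc:hive2://host1.example.com:21050/;auth=noSasl";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT version()")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}

If the CLASSPATH is set correctly, this prints the Impala version string; a ClassNotFoundException at the Class.forName() call indicates that the JAR files are not on the class path.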

Connection strings for the Hive JDBC driver

After you have configured Impala to work with JDBC, you can establish connections. The connection string syntax depends on the authentication mechanism in use:

  • Without Kerberos

  • Kerberos

  • LDAP

  • Over HTTP

For a cluster that does not use Kerberos authentication, use the following connection string syntax:

jdbc:hive2://host:port/;auth=noSasl

Example:

jdbc:hive2://host1.example.com:21050/;auth=noSasl

The Binary TCP protocol is used.

To connect to an Impala instance that requires Kerberos authentication:

jdbc:hive2://host:port/;principal=principal_name

The principal must be the same user principal that you use when starting Impala.

Example:

jdbc:hive2://host1.example.com:21050/;principal=impala/host1.example.com@H2.EXAMPLE.COM

The Binary TCP protocol is used.
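
In Java code, the Kerberos case differs only in the connection string. A minimal sketch, assuming a valid Kerberos ticket already exists on the client (for example, obtained with kinit before starting the JVM):

import java.sql.Connection;
import java.sql.DriverManager;

public class ImpalaKerberosCheck {
    public static void main(String[] args) throws Exception {
        // The host and principal are the placeholders from the example above.
        String url = "jdbc:hive2://host1.example.com:21050/"
                + ";principal=impala/host1.example.com@H2.EXAMPLE.COM";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}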

To connect to an Impala instance that requires LDAP authentication:

jdbc:hive2://host:port/db_name;user=ldap_userid;password=ldap_password

Example:

jdbc:hive2://host1.example.com:21050/test_db;user=user1;password=password1

The Binary TCP protocol is used.
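
Instead of embedding credentials in the URL, you can also pass the user name and password as separate arguments to DriverManager.getConnection(), which keeps the password out of the connection string. A sketch with the placeholder credentials from the example above:

import java.sql.Connection;
import java.sql.DriverManager;

public class ImpalaLdapCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder host, database, and credentials.
        String url = "jdbc:hive2://host1.example.com:21050/test_db";
        try (Connection conn = DriverManager.getConnection(url, "user1", "password1")) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}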

To connect to an Impala instance over HTTP, specify the HTTP port (28000 by default) and add transportMode=http to the connection string.

Example:

jdbc:hive2://myhost.example.com:28000/;transportMode=http

JDBC limitations

Most Impala SQL features work equivalently through the impala-shell interpreter and the JDBC API, but there are some exceptions:

  • Queries involving the complex types (ARRAY, STRUCT, and MAP) require notation that might not be available in all versions of the JDBC driver. As a workaround, you can create a view that exposes a "flattened" version of the complex columns and point the application at the view. For more information, see Complex Types.

  • The Impala complex types are supported by the JDBC getColumns() method. Both MAP and ARRAY are reported as the JDBC SQL type ARRAY, which is consistent with Hive. The STRUCT type is reported as the JDBC SQL type STRUCT. To be consistent with Hive, the TYPE_NAME field is populated with the primitive type name for scalar types. For complex types, the toSQL() method is executed, and the resulting type names are not fully consistent. The following table shows how toSQL() converts types; a metadata query sketch follows after this list.

    Initial type            Resulting type
    DECIMAL(10,10)          DECIMAL
    CHAR(10)                CHAR
    VARCHAR(10)             VARCHAR
    ARRAY<DECIMAL(10,10)>   ARRAY<DECIMAL(10,10)>
    ARRAY<CHAR(10)>         ARRAY<CHAR(10)>
    ARRAY<VARCHAR(10)>      ARRAY<VARCHAR(10)>

  • Currently, Impala INSERT, UPDATE, or other DML statements executed through the JDBC interface against a Kudu table do not return JDBC errors for conditions such as duplicate primary key values. Therefore, in applications that execute a large number of DML statements, it is better to use the Kudu Java API directly rather than JDBC.
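
The following sketch illustrates the getColumns() behavior described above by listing column metadata for a hypothetical table test_db.complex_tbl (both names are placeholders, as is the connection string): for MAP and ARRAY columns, DATA_TYPE is reported as java.sql.Types.ARRAY, and TYPE_NAME carries the toSQL() output.

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ImpalaColumnTypes {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string; adjust for your cluster.
        String url = "jdbc:hive2://host1.example.com:21050/;auth=noSasl";

        try (Connection conn = DriverManager.getConnection(url)) {
            DatabaseMetaData meta = conn.getMetaData();
            // List all columns of test_db.complex_tbl and print the JDBC
            // type code (DATA_TYPE) and type name (TYPE_NAME) of each.
            try (ResultSet cols = meta.getColumns(null, "test_db", "complex_tbl", "%")) {
                while (cols.next()) {
                    System.out.printf("%s: DATA_TYPE=%d, TYPE_NAME=%s%n",
                            cols.getString("COLUMN_NAME"),
                            cols.getInt("DATA_TYPE"),
                            cols.getString("TYPE_NAME"));
                }
            }
        }
    }
}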
