Use JDBC to connect to Impala
Impala supports the standard JDBC interface that allows access from commercial Business Intelligence tools and other software.
To set up a JDBC connection to Impala, follow the steps below:
- Verify the port where the Impala daemons in your cluster are listening for incoming JDBC requests.
- Install the JDBC driver on every host that runs the JDBC-enabled application.
- Specify a connection string for the JDBC application to access one of the servers running the impalad daemon, with the appropriate security settings.
Configure the JDBC Port
The default ports on which Impala Daemons accept JDBC connections are listed in the table below.

Protocol | Default Port | Parameter
---|---|---
HTTP | 28000 | hs2_http_port
Binary TCP | 21050 | hs2_port
To change these ports, go to the Components page of the Impala service in ADCM UI, select Impala Daemon, open its Primary Configuration tab, and set the required parameter (hs2_http_port or hs2_port).
Enable Impala JDBC support on client systems
The Hive JDBC driver is installed on ADH hosts with Impala. This is the hive-jdbc.noarch package.
If you are using JDBC-enabled applications on hosts outside the cluster, install the JDBC driver (the hive-jdbc package) on at least one cluster host. For this, use the Linux package manager. Then download the following JAR files to each client machine that should use JDBC with Impala:
- commons-logging-X.X.X.jar
- hadoop-common.jar
- hive-common-X.XX.X.jar
- hive-jdbc-X.XX.X.jar
- hive-metastore-X.XX.X.jar
- hive-service-X.XX.X.jar
- httpclient-X.X.X.jar
- httpcore-X.X.X.jar
- libfb303-X.X.X.jar
- libthrift-X.X.X.jar
- log4j-X.X.XX.jar
- slf4j-api-X.X.X.jar
- slf4j-logXjXX-X.X.X.jar
To enable JDBC support for Impala on the system where you run the JDBC application, perform the following steps:

- Download the JAR files listed above to each client machine. For Maven users, see the sample GitHub page for an example of the dependencies you could add to a pom file instead of downloading the individual JARs.
- Store the JAR files in a directory that is already referenced in your CLASSPATH setting. On Linux, you can use the location /opt/jars/. On Windows, you can use a subdirectory in the C:\Program Files folder.
- Set CLASSPATH to include the JAR files. To successfully load the Impala JDBC driver, client programs must find the associated JAR files. In most cases, this means setting the CLASSPATH variable; some examples are given below. For more information about installing JDBC drivers, see the documentation for your JDBC client.
On Linux, if you extracted the JARs to /opt/jars/, you can execute the following command:
$ export CLASSPATH=/opt/jars/*:$CLASSPATH
On Windows, use the System Properties item of Control Panel to modify the environment variables for your system. Change the environment variables to include the path to the extracted JAR files.
The class name of the Hive JDBC driver is org.apache.hive.jdbc.HiveDriver.
Connection strings for the Hive JDBC Driver
After you have configured Impala to work with JDBC, you can establish a connection.
For a cluster that does not use Kerberos authentication, use the following connection string syntax:
jdbc:hive2://host:port/;auth=noSasl
Example:
jdbc:hive2://host1.example.com:21050/;auth=noSasl
The Binary TCP protocol is used.
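As an illustration, here is a minimal Java sketch that opens a connection using the unsecured connection string from the example above and lists the tables in the default database. Recent hive-jdbc versions register the driver automatically; with older versions you may need the explicit Class.forName() call shown earlier.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaJdbcExample {
    // Connection string from the example above; adjust the host and port for your cluster
    private static final String URL = "jdbc:hive2://host1.example.com:21050/;auth=noSasl";

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(URL);
             Statement stmt = conn.createStatement();
             // SHOW TABLES returns one row per table in the current database
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```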
To connect to an Impala instance that requires Kerberos authentication:
jdbc:hive2://host:port/;principal=principal_name
The principal must be the same user principal that you use when starting Impala.
Example:
jdbc:hive2://host1.example.com:21050/;principal=impala/host1.example.com@H2.EXAMPLE.COM
The Binary TCP protocol is used.
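A connection sketch for the Kerberos case, assuming the client host already holds a valid Kerberos ticket (for example, obtained with kinit) and uses the example URL above:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class ImpalaKerberosExample {
    // The principal in the URL is the Impala service principal, not the end user;
    // the connecting user is taken from the client's Kerberos ticket cache
    private static final String URL =
        "jdbc:hive2://host1.example.com:21050/;principal=impala/host1.example.com@H2.EXAMPLE.COM";

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(URL)) {
            System.out.println("Connected as a Kerberos-authenticated client");
        }
    }
}
```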
To connect to an Impala instance that requires LDAP authentication:
jdbc:hive2://host:port/db_name;user=ldap_userid;password=ldap_password
Example:
jdbc:hive2://host1.example.com:21050/test_db;user=user1;password=password1
The Binary TCP protocol is used.
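Instead of embedding LDAP credentials in the URL, you can pass them through the standard DriverManager.getConnection(url, user, password) overload, which keeps them out of connection-string logs. A sketch using the example values above:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class ImpalaLdapExample {
    public static void main(String[] args) throws Exception {
        // Same connection as the example above, but the credentials are passed
        // through the standard JDBC overload instead of the URL
        String url = "jdbc:hive2://host1.example.com:21050/test_db";
        try (Connection conn = DriverManager.getConnection(url, "user1", "password1")) {
            System.out.println("Connected with LDAP credentials");
        }
    }
}
```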
To connect to an Impala instance over HTTP, specify the HTTP port (28000 by default) and transportMode=http in the connection string.
Example:
jdbc:hive2://myhost.example.com:28000/;transportMode=http
JDBC limitations
Most Impala SQL features work equivalently through the impala-shell interpreter and the JDBC API, but there are some exceptions:
- Queries involving the complex types (ARRAY, STRUCT, and MAP) require notation that might not be available on all levels of the JDBC driver. You can create a view that exposes a "flattened" version of the complex columns and point the application at the view. For more information, see Complex Types.
- The Impala complex types are supported by the JDBC getColumns() method. Both MAP and ARRAY are reported as the JDBC SQL type ARRAY. This behavior is consistent with Hive. The STRUCT type is reported as the JDBC SQL type STRUCT. To be consistent with Hive, the TYPE_NAME field is populated with the primitive type name for scalar types. For complex types, the toSQL() method is executed. The resulting type names are not fully consistent. The following table shows how toSQL() converts types (see also the metadata sketch after this list).

  Initial type | Resulting type
  ---|---
  DECIMAL(10,10) | DECIMAL
  CHAR(10) | CHAR
  VARCHAR(10) | VARCHAR
  ARRAY<DECIMAL(10,10)> | ARRAY<DECIMAL(10,10)>
  ARRAY<CHAR(10)> | ARRAY<CHAR(10)>
  ARRAY<VARCHAR(10)> | ARRAY<VARCHAR(10)>
- Currently, Impala INSERT, UPDATE, or other DML statements executed through the JDBC interface against a Kudu table do not return JDBC errors for conditions such as duplicate primary key columns. Therefore, in applications that execute a large number of DML statements, it is better to use the Kudu Java API directly rather than JDBC.
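To see what getColumns() actually returns, the following sketch prints the JDBC type code and TYPE_NAME for each column of a table. The database and table names (test_db, my_table) are hypothetical; the URL reuses the unsecured example from above.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ImpalaColumnTypes {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:hive2://host1.example.com:21050/;auth=noSasl";
        try (Connection conn = DriverManager.getConnection(url)) {
            DatabaseMetaData meta = conn.getMetaData();
            // Arguments: catalog, schema pattern, table name pattern, column name pattern
            try (ResultSet cols = meta.getColumns(null, "test_db", "my_table", "%")) {
                while (cols.next()) {
                    // DATA_TYPE is the java.sql.Types code; TYPE_NAME is the string
                    // produced as described above (e.g. ARRAY<CHAR(10)>)
                    System.out.printf("%s: %d (%s)%n",
                        cols.getString("COLUMN_NAME"),
                        cols.getInt("DATA_TYPE"),
                        cols.getString("TYPE_NAME"));
                }
            }
        }
    }
}
```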