Use impala-shell to connect to Impala
You can connect to Impala through the shell tool — impala-shell
. Impala shell is available on hosts where you install the Impala Client component. The tool can connect remotely to any instance of the Impala Daemon.
Impala shell allows you to create and manage databases and tables, insert data, and issue queries. You can also submit SQL statements in an interactive session. It is possible to specify command-line options to process a single statement or a script file that contains a large number of statements.
In addition, you can store a set of default settings for impala-shell in a configuration file.
Run impala-shell
To run the impala-shell, use the command below:
$ impala-shell
To execute queries, you need to connect to an impalad daemon. The impala-shell
command connects to an impalad running on the same machine. For information on how to connect to an impalad daemon on another host, see Connect to impalad on a specific host.
NOTE
If the following error occurs: ImportError: no module named pkg_resources , install the python-setuptools package. To do this, use the command: yum install python-setuptools .
|
Configuration options
You can use configuration options to tune impala-shell for your tasks. The table below lists options that can be passed as a command line parameter or defined in a configuration file.
Command-line option | Configuration file setting | Description |
---|---|---|
-B or --delimited |
write_delimited=true |
Prints query results in plain format as a delimited text file. It is useful for creating data files to be used with other Hadoop components. It also helps to avoid the performance overhead of the pretty-printing output, when running benchmark tests that use queries returning large result sets. To specify the delimiter character, use the |
-b or --kerberos_host_fqdn |
kerberos_host_fqdn=load-balancer-hostname |
Overrides the hostname of the Impala daemon’s Kerberos service principal. impala-shell checks whether the server’s principal matches this hostname. It can be used when impalad is accessed via a load-balancer, but impala-shell should connect to a specific impalad directly |
--print_header |
print_header=true |
Determines whether the header row should be printed |
-o filename or --output_file filename |
output_file=filename |
Specifies an output file name. It is typically used to store the results of a single query executed from the command line with the |
--output_delimiter=character |
output_delimiter=character |
Specifies the character to use as a delimiter between fields when query results are printed in plain format (the |
-E or --vertical |
vertical=true |
Prints the output of a query (rows) vertically. This option has no effects if |
-p or --show_profiles |
show_profiles=true |
Displays the query execution plan (the same output as the |
-h or --help |
— |
Displays help information |
— |
history_max=1000 |
Sets the maximum number of queries executed in impala-shell to be stored to the history file |
-i hostname or --impalad=hostname[:port_num] |
impalad=hostname[:port_num] |
Connects to the impalad daemon installed on the specified host. If the |
-q query or --query=query |
query=query |
Allows you to issue a single query from the command line, without starting the interactive interpreter. You can use this option to run impala-shell from a shell script or when the command is invoked from a Python, Perl, or other scripts. The query should contain a single statement, for example, |
-f query_file or --query_file=query_file |
query_file=path_to_query_file |
Allows you to process a file containing multiple SQL statements. For example, a set of DDL statements to create a group of tables and views. Multiple statements must be delimited by the semicolon ( |
--query_option="option=value" or -Q "option=value" |
The header line — |
Sets default query options for impala-shell commands. To set multiple query options at once, use multiple instances of this command-line option. The query option names are not case-sensitive |
-k or --kerberos |
use_kerberos=true |
Determines whether the Kerberos authentication should be used when the shell connects to impalad. If Kerberos is not enabled on the instance of impalad to which impala-shell is connecting, an error is displayed. See Enabling Kerberos Authentication for Impala |
-s kerberos_service_name or --kerberos_service_name=name |
kerberos_service_name=name |
Specifies the impala-shell name to authenticate to a particular impalad service principal. If |
-V or --verbose |
verbose=true |
Enables the verbose output |
--quiet |
verbose=false |
Disables the verbose output |
-v or --version |
version=true |
Displays the version information of impala-shell |
-c |
ignore_query_failure=true |
Determines whether to ignore the query failure |
-d default_db or --database=default_db |
default_db=default_db |
Specifies the initial database. The same as running the USE statement after the connection. If not specified, the |
-ssl |
ssl=true |
Enables TLS/SSL for impala-shell |
--ca_cert=path_to_certificate |
ca_cert=path_to_certificate |
Specifies the local path to the third-party CA certificate, or to a copy of the server certificate for self-signed server certificates. If |
-l |
use_ldap=true |
Enables LDAP authentication |
-u |
user=user_name |
Specifies the username, when LDAP authentication is enabled. Utilize the short username, not the full LDAP distinguished name. After using this option, the shell prompts for a password |
--ldap_password_cmd=command |
ldap_password_cmd |
Specifies a command to retrieve the LDAP password, when the LDAP authentication is enabled. If the command includes space-separated arguments, enclose the command and its arguments in quotation marks. For example, |
--config_file=path_to_config_file |
— |
Specifies the path of the file containing impala-shell configuration settings. The default is |
--live_progress |
live_progress=true |
Displays a progress bar showing the approximate percentage of completion for each query. The information is updated interactively as the query is processed. See LIVE_PROGRESS Query Option |
--disable_live_progress |
live_progress=false |
Allows you to disable |
--live_summary |
— |
Shows a detailed report, similar to the |
-var=variable_name=value |
— |
Defines a substitution variable that can be used within the impala-shell session. The variable can be substituted into statements processed by the |
--auth_creds_ok_in_clear |
auth_creds_ok_in_clear |
Allows you to use LDAP authentication for an insecure connection to the shell. Note that authentication credentials are sent unencrypted, and may be vulnerable to attack |
--protocol=protocol |
— |
The protocol to use for the connection to Impala. Valid protocol values are:
|
--hs2_fp_format=HS2_FP_FORMAT |
hs2_fp_format=HS2_FP_FORMAT |
Sets the printing format specification for floating point values when using HS2 protocol. The default behavior makes the values handled by Python’s |
The impala-shell configuration file
As mentioned above, you can store a set of impala-shell default settings in a configuration file.
The default path to the global impala-shell configuration file is /etc/impalarc. The default path to the user-level configuration file is ~/.impalarc. Note that the global-level file name does not include a dot (.
). To change the default path to the global configuration file, set the $IMPALA_SHELL_GLOBAL_CONFIG_FILE
environment variable.
Since ADH Impala configuration files are located in the /etc/impala/conf directory, you can create an impala-shell configuration file in this directory and set the $IMPALA_SHELL_GLOBAL_CONFIG_FILE
variable:
$ export IMPALA_SHELL_GLOBAL_CONFIG_FILE=/etc/impala/conf
It is also possible to start impala-shell with the --config_file
option set to the custom path of the configuration file.
The options specified in the user configuration file have higher priority than options from the global configuration file. In turn, any options you specify on the impala-shell command line override any corresponding options within the configuration file.
An impala-shell configuration file (global or user) must contain the header — [impala]
, followed by the impala-shell options.
The names of the options in the configuration file are similar to the long-form command-line arguments to the impala-shell command. They are listed in the table above.
You can specify key/value pair options using keyval
, similar to the --var
command-line option. For example, keyval=variable1=value1
.
The configuration file can also contain query options in the [impala.query_options]
section. Options specified in the [impala]
section override the options from the [impala.query_options]
section.
The example below shows a configuration file where the verbose mode is enabled, so each SQL query output is followed by timing information. impala-shell operates with the test
database by default. The query output is formatted as delimited text and stored in a file. The configuration file also enables the show_profiles
option. This option prints detailed performance information after each query.
[impala] verbose=true default_db=test write_delimited=true output_delimiter=, output_file=/home/test_result.csv show_profiles=true keyval=var1=value1,keyval=var2=value2
The following example shows a configuration file that connects to a specific Impala node, runs a single query within the test
database, then exits. Any query options predefined under the [impala.query_options]
section are applied during the session.
[impala] impalad=ees-adh1.ru-central1.internal default_db=test # Run the query and exit query=select count(*) from customers [impala.query_options] mem_limit=32g
You can use this type of configuration file with the command-line option --config_file
, to easily select between many predefined queries that could be run against different databases, hosts, or even different clusters. To run a sequence of statements instead of a single query, create a file with statements and specify the configuration option query_file=path_to_query_file
of the query text.
Connect to impalad on a specific host
To connect to an impalad daemon on another host, you need to specify the connection parameters in one of the ways described below.
Use command-line options
You can use command-line options when you run the impala-shell command:
$ impala-shell -i ees-1adh3.ru-central1.internal:28000 --protocol='hs2-http'
This example connects to the impalad on the host ees-1adh3.ru-central1.internal
, on the port 28000
, via HiveServer2 over HTTP
Utilize a configuration file
It is possible to specify connection settings in a file. Create a file with connection settings as described in The impala-shell configuration file section. For example:
$ cat > ~/custom_conf1
[impala]
impalad=ees-1adh2.ru-central1.internal
query=select version()
When using the cat
command, press CTRL+D
to exit.
Run impala-shell:
$ impala-shell --config_file=~/custom_conf1
The result:
Starting Impala Shell with no authentication using Python 2.7.5 Warning: live_progress only applies to interactive shell sessions, and is being skipped for now. Opened TCP connection to ees-1adh2.ru-central1.internal:21050 Connected to ees-1adh2.ru-central1.internal:21050 Server version: impalad version 4.2.0-RELEASE RELEASE (build 978afcfae9aa626de182b2872c4469646f42e0f6) Query: select version() Query submitted at: 2023-07-25 14:36:57 (Coordinator: http://ees-1adh2.ru-central1.internal:25000) Query progress can be monitored at: http://ees-1adh2.ru-central1.internal:25000/query_plan?query_id=084b59535cac1baf:5f5e61f500000000 +----------------------------------------------------------------------------------------+ | version() | +----------------------------------------------------------------------------------------+ | impalad version 4.2.0-RELEASE RELEASE (build 978afcfae9aa626de182b2872c4469646f42e0f6) | | Built on Mon Jun 5 16:33:03 UTC 2023 | +----------------------------------------------------------------------------------------+ Fetched 1 row(s) in 0.44s
Use the CONNECT command
Execute the CONNECT
command during an impala-shell session:
$ impala-shell
connect ees-1adh3.ru-central1.internal;
The result:
Opened TCP connection to ees-1adh3.ru-central1.internal:21050 Connected to ees-1adh3.ru-central1.internal:21050 Server version: impalad version 4.2.0-RELEASE RELEASE (build 978afcfae9aa626de182b2872c4469646f42e0f6) [ees-1adh3.ru-central1.internal:21050] default>
Connect to a specific database
You can use all the same connection options as in previous examples, and add the -d
option with the database name:
$ impala-shell -i localhost -d my_db
localhost
refers to the same host where impala-shell is running.
Execute SQL statements in non-interactive mode
Invoke impala-shell with the -q
option to run a single statement, or the -f
option to run a sequence of statements from a file. The impala-shell command returns a result immediately, without going into the interactive mode:
$ impala-shell -i localhost -d my_db -q 'show tables'
$ impala-shell -i localhost -d my_db -f recreate_tables.sql
Run commands and SQL statements
The impala-shell interpreter accepts the SQL statements described in the Impala SQL Statements article and shell commands that you can use for tuning performance and diagnosing problems.
In non-interactive mode, query output is printed to stdout or to the file specified by the -o
option. Errors are printed to stderr. So, you can process the query output as a part of a Unix pipeline.
In interactive mode, impala-shell uses the Readline library to recall and edit previous commands.
You can use the following keyboard shortcuts and syntax rules for running commands and SQL statements in impala-shell:
-
To see the full list of available commands, press
TAB
twice. -
To display previous commands, click the up-arrow and down-arrow keys.
-
Use the standard set of keyboard shortcuts in GNU Readline library for editing and cursor movement, such as
Ctrl+A
for the beginning of line andCtrl+E
for the end of line. -
Commands and SQL statements can span multiple lines and must be terminated by a semicolon (
;
). -
Use
--
to comment out a single line comment and/* <content> */
to comment multi lines. A comment is considered as a part of the statement it precedes. When you type a comment, you get a continuation prompt until you enter a semicolon. For example:default> -- Show tables starting with t > SHOW TABLES LIKE 't*';
If a comment contains the
${variable_name}
and it is not for a variable substitution, the$
character must be escaped, for example —\${hello}
.
The example below uses impala-shell commands and the test
database and the table1
table:
create database test;
use test;
create table table1 (field1 int, field2 boolean, field3 string);
insert into table1 values ((1,true,"row1"), (2,true,"row2"), (3,false,"row3"));
select * from table1;
The result:
+--------+--------+--------+ | field1 | field2 | field3 | +--------+--------+--------+ | 1 | true | row1 | | 2 | true | row2 | | 3 | false | row3 | +--------+--------+--------+
Substitute variables
You can define variables to be used within SQL statements processed by impala-shell. To do this on the command line, specify the --var=variable_name=value
option. Within an interactive session or a script file processed by the -f
option, use the SET VAR:variable_name=value
command. To pass a variable in an SQL statement, use the notation: ${VAR:variable_name}
.
The examples below contain multiple commands that define variables and use them in SQL statements executed through the -q
option. The -q
argument strings containing variable references (${var:value}
) are single-quoted.
Example 1:
$ impala-shell --var=tname=table1 --var=colname=col1 --var=coltype=string -q 'CREATE TABLE ${var:tname} (${var:colname} ${var:coltype}) STORED AS PARQUET'
The result:
Query: CREATE TABLE table1 (col1 string) STORED AS PARQUET +-------------------------+ | summary | +-------------------------+ | Table has been created. | +-------------------------+
Example 2:
$ impala-shell --quiet --var=tname=table1
use test;
select count(*) from ${var:tname};
The result:
+----------+ | count(*) | +----------+ | 3 | +----------+
Example 3:
$ impala-shell
set var:tname=table1;
select count(*) from ${var:tname};
The result:
+----------+ | count(*) | +----------+ | 3 | +----------+