Use CLI for Impala installation on Kubernetes

Prerequisites

To deploy Impala on Kubernetes via CLI, you need:

  • A Kubernetes cluster (1.32 or later) with access configured through kubectl.

  • The CLI tool that is unpacked from your offline pack.

  • The following images that are unpacked and pushed to your repository:

    • hub.arenadata.io/adc-enterprise/impala-operator:<version>

    • hub.arenadata.io/adh-enterprise/impala-docker:<version>

    These artifacts can be found in the offline packages, which can be requested from the Arenadata support team.

  • An up-and-running ADH cluster (4.2.0 or later) with the following services:

    • Core configuration

    • ADPG

    • Zookeeper

    • HDFS

    • YARN

    • Hive

    Impala runs outside the ADH cluster, in Kubernetes pods, and communicates with ADH over the network.
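Pushing the images to your own repository usually means retagging them first. The following is a minimal sketch, not the documented procedure. It assumes that Docker is your container tool, that the offline pack delivers images as tar archives, and that registry.example.com stands for your private registry. The mirror_ref helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the registry host of an image reference.
# Usage: mirror_ref <source-image> <target-registry>
mirror_ref() {
  # Drop everything up to and including the first "/" (the source registry host),
  # then prepend the target registry.
  printf '%s/%s\n' "$2" "${1#*/}"
}

# Example (not executed here; requires Docker and access to your registry):
#   src="hub.arenadata.io/adc-enterprise/impala-operator:<version>"
#   dst="$(mirror_ref "$src" registry.example.com)"
#   docker load -i <archive-from-offline-pack>.tar   # archive name is a placeholder
#   docker tag "$src" "$dst"
#   docker push "$dst"
```

The resulting reference is the value you later put into the image field of the operator configuration.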

For security configurations, complete the additional prerequisites for the options you plan to use:

  • Ranger

  • Kerberos + SSL

If you plan to integrate Impala with Ranger, you need an ADPS cluster (2.0.0 or later) installed and running. Also, you need to create a service for Impala in Ranger.

This guide describes how to create a service via the Ranger REST API. Alternatively, you can create a service in the Ranger web UI.

  1. Define a service in a JSON file, for example, ranger-impala-k8s.json:

    {
      "isEnabled": true,
      "type": "hive",
      "name": "impala_k8s", (1)
      "displayName": "impala_k8s",
      "description": "Service for Kubernetes Impala",
      "configs": {
        "username": "impala", (2)
        "password": "bigdata", (3)
        "ranger.plugin.audit.filters": "[ {'accessResult': 'DENIED', 'isAudited': true}, {'actions':['METADATA OPERATION'], 'isAudited': false}, {'users':['hive','hue'],'actions':['SHOW_ROLES'],'isAudited':false} ]",
        "jdbc.driverClassName": "org.apache.hive.jdbc.HiveDriver",
        "jdbc.url": "jdbc:impala://10.92.41.149:21050" (4)
      }
    }
    1 The name of the Impala service in Ranger. It must be unique.
    2 The username for the service.
    3 The password for the service.
    4 The JDBC connection string for Impala, as exposed by the load balancer.
  2. Push the defined service to Ranger:

    $ curl -u admin:<admin_pwd> -H "Content-Type: application/json" -X POST -d @ranger-impala-k8s.json http://<ranger-admin>:6080/service/public/v2/api/service
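Before posting, remove the callout markers such as (1) from the JSON file, because they are not valid JSON. After posting, you can check that the service exists by requesting it by name. The sketch below assumes the same host, port, and credentials as in the command above. The ranger_service_url helper is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# Build the Ranger public v2 API URL for looking up a service by name.
# Usage: ranger_service_url <ranger-admin-host> <service-name>
ranger_service_url() {
  printf 'http://%s:6080/service/public/v2/api/service/name/%s\n' "$1" "$2"
}

# Example (not executed here; requires a reachable Ranger Admin):
#   curl -s -u admin:<admin_pwd> -H "Accept: application/json" \
#     "$(ranger_service_url <ranger-admin> impala_k8s)"
```

If the service was created, the response should contain its JSON definition. Otherwise, Ranger is expected to return an error status.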
If you plan to use Kerberos and SSL, complete the following steps:

  1. Make sure that SSL is enabled for the ADH cluster. If you plan to use both SSL and Ranger, SSL must also be enabled for ADPS.

  2. To access the Impala web UI and allow JDBC connections, generate certificates for Ingress and the load balancer:

    $ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout impala-cloud.ru-central1.internal.key -out impala-cloud.ru-central1.internal.crt -subj "/CN=impala-cloud.ru-central1.internal"
    $ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout impala-jdbc.ru-central1.internal.key -out impala-jdbc.ru-central1.internal.crt -subj "/CN=impala-jdbc.ru-central1.internal"
  3. Unpack and push the Kerberos operator image to your repository.

  4. Initialize the Kerberos operator:

    $ ./adc operators init --kerberos -o operator-kerberos-init.yaml

    This operation creates the operator-kerberos-init.yaml file with a template configuration.

  5. Edit the configuration file to your needs:

    kerberos:
      image: hub.arenadata.io/adc-enterprise/kerberos-operator:<tag> (1)
    
      # Number of replicas
      # replicas: 1
    
      resources:
        limits:
          cpu: 500m
          memory: 256Mi
    
      # Operator service account.
      serviceAccount: (2)
        create: true
        name: kerberos-operator
    
      # Namespace to run the operator.
      # Operator's ServiceAccount, deployment and RBAC components will be installed in it.
      namespace: (3)
        create: true
        name: kerberos-operator
    
      # Create namespaces to run the payload.
      createPayloadNamespaces: true
    
      # List of namespaces to run the payload in.
      payloadNamespaces: (4)
        - impala
    
      ## Image pull secret for a private registry.
      ## Either set 'name' to reference an existing Secret,
      ## or set 'credentials' and the CLI will create a dockerconfigjson Secret.
      #imagePullSecret:
      #  name: my-pull-secret
      #  credentials:
      #    registry: registry.example.com
      #    username: user
      #    password: pass
      kdc: (5)
        realm: RU-CENTRAL1.INTERNAL
        labelSelector:
          env: prod
        realms:
          RU-CENTRAL1.INTERNAL: |-
            kdc = tsn-freeipa.ru-central1.internal
            admin_server = tsn-freeipa.ru-central1.internal
        domainRealm:
          ru-central1.internal: RU-CENTRAL1.INTERNAL
        libdefaults:
          debug: "false"
          default_realm: RU-CENTRAL1.INTERNAL
          dns_lookup_kdc: "false"
          dns_lookup_realm: 'false'
          udp_preference_limit: '1'
    
      ldapSecret: (6)
        addr: ldaps://tsn-freeipa.ru-central1.internal:636
        adminPW: AdhCloud!
        adminDN: uid=admin,cn=users,cn=accounts,dc=ru-central1,dc=internal
        baseDN: cn=services,cn=accounts,dc=ru-central1,dc=internal
        ca: (7)
        provider: freeipa
    1 URL to the Kerberos operator image in your repository.
    2 Service account settings.
    3 Namespace settings.
    4 Payload namespace settings. The listed namespaces will be available to the Kerberos operator instance.
    5 KDC settings.
    6 LDAP settings. If you don’t use SSL, change the protocol to ldap and port to 389.
    7 CA certificate if LDAP is secured with SSL.
  6. Apply the configuration and deploy the Kerberos operator:

    $ ./adc operators apply kerberos-operator -f operator-kerberos-init.yaml

Step 1. Install the Impala operator

  1. Initialize the Impala operator:

    $ ./adc operators init --impala -o operator-impala-init.yaml

    This operation creates the operator-impala-init.yaml file with a template configuration.

  2. Edit the configuration file to your needs:

    operator-impala-init.yaml
    impala:
      image: hub.arenadata.io/adc-enterprise/impala-operator:<tag> (1)
    
      # Number of replicas
      # replicas: 1
    
      resources:
        limits:
          cpu: 500m
          memory: 256Mi
    
      # Operator service account.
      serviceAccount: (2)
        create: true
        name: "impala"
    
      # Namespace to run the operator.
      # Operator's ServiceAccount, deployment and RBAC components will be installed in it.
      namespace: (3)
        create: true
        name: impala-operator
    
      # Create namespaces to run the payload.
      createPayloadNamespaces: true
    
      # List of namespaces to run the payload in.
      payloadNamespaces: (4)
        - impala
    
      ## Image pull secret for a private registry.
      ## Either set 'name' to reference an existing Secret,
      ## or set 'credentials' and the CLI will create a dockerconfigjson Secret.
      #imagePullSecret:
      #  name: my-pull-secret
      #  credentials:
      #    registry: registry.example.com
      #    username: user
      #    password: pass
    1 URL to the Impala operator image in your repository.
    2 Service account settings.
    3 Namespace settings.
    4 Payload namespace settings. The listed namespaces will be available to the Impala operator instance.
  3. Apply the configuration and deploy the Impala operator:

    $ ./adc operators apply impala-operator -f operator-impala-init.yaml

    The expected output contains the confirmation of success:

    time="20260518125157UTC" level="info" msg="operator impala-operator applied to namespace impala-operator"
  4. Verify the Impala operator:

    $ kubectl get pods -n impala-operator

    The expected output should be similar to:

    NAME                                               READY   STATUS    RESTARTS   AGE
    impala-operator-impala-operator-6bf8788587-7s22r   1/1     Running   0          150m

Step 2. Install the Impala cluster

The first three steps depend on the security configuration:

  • No security

  • Ranger

  • Kerberos + SSL

If you deploy Impala without security, follow these steps:

  1. Prepare the hadoop_conf.yaml Hadoop configuration file:

    sites:
      core:
        fs.defaultFS: hdfs://adh
        hadoop.security.authentication: simple
      hdfs:
        dfs.client.failover.proxy.provider.adh: org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
        dfs.ha.namenodes.adh: nn_tsn-k8s-1,nn_tsn-k8s-3
        dfs.namenode.rpc-address.adh.nn_tsn-k8s-1: tsn-k8s-1.ru-central1.internal:8020
        dfs.namenode.rpc-address.adh.nn_tsn-k8s-3: tsn-k8s-3.ru-central1.internal:8020
        dfs.nameservices: adh
      ozone:
        ozone.om.address.adh.om_tsn-k8s-1: tsn-k8s-1.ru-central1.internal:9862
        ozone.om.address.adh.om_tsn-k8s-2: tsn-k8s-2.ru-central1.internal:9862
        ozone.om.address.adh.om_tsn-k8s-3: tsn-k8s-3.ru-central1.internal:9862
        ozone.om.nodes.adh: om_tsn-k8s-1,om_tsn-k8s-2,om_tsn-k8s-3
        ozone.om.service.ids: adhom
      hive:
        hive.metastore.sasl.enabled: false
        hive.metastore.uris: thrift://tsn-k8s-1.ru-central1.internal:9083
        metastore.use.SSL: false
  2. Initialize the Impala cluster:

    $ ./adc cluster init --product impala --hadoop-file hadoop_conf.yaml --output cluster-impala-init.yaml

    This operation creates the cluster-impala-init.yaml file with a template configuration.

  3. Edit the configuration file to your needs:

    product: impala
    namespace: impala (1)
    image: (2)
      registry: hub.arenadata.io
      repository: adh-enterprise/impala-docker
      tag: 4.5.0_arenadata1-adh-4.2.0-x86_64
      pullPolicy: Always
    
      ## Image pull secret for a private registry.
      ## Either set 'name' to reference an existing Secret,
      ## or set 'credentials' and the CLI will create a dockerconfigjson Secret.
      #imagePullSecret:
      #  name: my-pull-secret
      #  credentials:
      #    registry: registry.example.com
      #    username: user
      #    password: pass
    impala:
      catalog:
        replicas: 1
    
        ## Component arguments. Key-value pairs passed to the component configuration.
        #args:
        #  redirect_stdout_stderr: "false"
      coordinator:
        replicas: 1
    
        ## Component arguments. Key-value pairs passed to the component configuration.
        #args:
        #  redirect_stdout_stderr: "false"
      executor:
        replicas: 2
    
        ## Component arguments. Key-value pairs passed to the component configuration.
        #args:
        #  redirect_stdout_stderr: "false"
      statestore:
        replicas: 1
    
        ## Component arguments. Key-value pairs passed to the component configuration.
        #args:
        #  redirect_stdout_stderr: "false"
    hadoop: (3)
      core:
        fs.defaultFS: hdfs://adh
        hadoop.security.authentication: simple
      hdfs:
        dfs.client.failover.proxy.provider.adh: org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
        dfs.ha.namenodes.adh: nn_tsn-k8s-1,nn_tsn-k8s-3
        dfs.namenode.rpc-address.adh.nn_tsn-k8s-1: tsn-k8s-1.ru-central1.internal:8020
        dfs.namenode.rpc-address.adh.nn_tsn-k8s-3: tsn-k8s-3.ru-central1.internal:8020
        dfs.nameservices: adh
      hive:
        hive.metastore.sasl.enabled: "false"
        hive.metastore.uris: thrift://tsn-k8s-1.ru-central1.internal:9083
        metastore.use.SSL: "false"
      ozone:
        ozone.om.address.adh.om_tsn-k8s-1: tsn-k8s-1.ru-central1.internal:9862
        ozone.om.address.adh.om_tsn-k8s-2: tsn-k8s-2.ru-central1.internal:9862
        ozone.om.address.adh.om_tsn-k8s-3: tsn-k8s-3.ru-central1.internal:9862
        ozone.om.nodes.adh: om_tsn-k8s-1,om_tsn-k8s-2,om_tsn-k8s-3
        ozone.om.service.ids: adhom
    1 Namespace that the Impala cluster will use.
    2 Settings for pulling the Impala cluster image.
    3 Hadoop settings that were taken from the previously created hadoop_conf.yaml.
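In the Ozone section, each ID listed in ozone.om.nodes.adh should have a matching ozone.om.address.adh.<ID> entry, and a mistyped list is easy to miss. The following sketch checks that before you run cluster init. It is a plain text check that assumes one key per line, as in the listings of this guide. It is not a YAML parser, and the check_om_nodes name is hypothetical.

```shell
#!/bin/sh
# Verify that every Ozone Manager node ID in ozone.om.nodes.* is unique
# and has a matching ozone.om.address.*.<ID> line in the given file.
# Usage: check_om_nodes <hadoop-conf-file>
check_om_nodes() {
  awk '
    # Remember node IDs that have an address entry (last dot-separated part of the key).
    /^[ \t]*ozone\.om\.address\./ {
      key = $1
      sub(/:$/, "", key)
      n = split(key, parts, ".")
      addr[parts[n]] = 1
    }
    # Remember the declared comma-separated node list.
    /^[ \t]*ozone\.om\.nodes\./ { nodes = $2 }
    END {
      bad = 0
      count = split(nodes, list, ",")
      for (i = 1; i <= count; i++) {
        if (seen[list[i]]++) {
          print "duplicate node ID: " list[i]
          bad = 1
        } else if (!(list[i] in addr)) {
          print "no address for node ID: " list[i]
          bad = 1
        }
      }
      exit bad
    }
  ' "$1"
}

# Example: check_om_nodes hadoop_conf.yaml && echo "Ozone node list is consistent"
```

The function exits with a non-zero status and prints the offending IDs when the list contains duplicates or unknown IDs.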
If you use Ranger, follow these steps:

  1. Prepare the hadoop_conf.yaml Hadoop configuration file:

    sites:
      core:
        fs.defaultFS: hdfs://adh
        hadoop.security.authentication: simple
      hdfs:
        dfs.client.failover.proxy.provider.adh: org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
        dfs.ha.namenodes.adh: nn_tsn-k8s-1,nn_tsn-k8s-3
        dfs.namenode.rpc-address.adh.nn_tsn-k8s-1: tsn-k8s-1.ru-central1.internal:8020
        dfs.namenode.rpc-address.adh.nn_tsn-k8s-3: tsn-k8s-3.ru-central1.internal:8020
        dfs.nameservices: adh
      ozone:
        ozone.om.address.adh.om_tsn-k8s-1: tsn-k8s-1.ru-central1.internal:9862
        ozone.om.address.adh.om_tsn-k8s-2: tsn-k8s-2.ru-central1.internal:9862
        ozone.om.address.adh.om_tsn-k8s-3: tsn-k8s-3.ru-central1.internal:9862
        ozone.om.nodes.adh: om_tsn-k8s-1,om_tsn-k8s-2,om_tsn-k8s-3
        ozone.om.service.ids: adhom
      hive:
        hive.metastore.sasl.enabled: false
        hive.metastore.uris: thrift://tsn-k8s-1.ru-central1.internal:9083
        metastore.use.SSL: false
  2. Initialize the Impala cluster:

    $ ./adc cluster init --product impala --hadoop-file hadoop_conf.yaml --output cluster-impala-init.yaml

    This operation creates the cluster-impala-init.yaml file with a template configuration.

  3. Edit the configuration file to your needs:

    product: impala
    namespace: impala (1)
    image: (2)
      registry: hub.arenadata.io
      repository: adh-enterprise/impala-docker
      tag: 4.5.0_arenadata1-adh-4.2.0-x86_64
      pullPolicy: Always
    
      ## Image pull secret for a private registry.
      ## Either set 'name' to reference an existing Secret,
      ## or set 'credentials' and the CLI will create a dockerconfigjson Secret.
      #imagePullSecret:
      #  name: my-pull-secret
      #  credentials:
      #    registry: registry.example.com
      #    username: user
      #    password: pass
    impala:
      catalog:
        replicas: 1
    
        ## Component arguments. Key-value pairs passed to the component configuration.
        #args:
        #  redirect_stdout_stderr: "false"
      coordinator:
        replicas: 1
    
        ## Component arguments. Key-value pairs passed to the component configuration.
        #args:
        #  redirect_stdout_stderr: "false"
      executor:
        replicas: 2
    
        ## Component arguments. Key-value pairs passed to the component configuration.
        #args:
        #  redirect_stdout_stderr: "false"
      statestore:
        replicas: 1
    
        ## Component arguments. Key-value pairs passed to the component configuration.
        #args:
        #  redirect_stdout_stderr: "false"
    hadoop: (3)
      core:
        fs.defaultFS: hdfs://adh
        hadoop.security.authentication: simple
      hdfs:
        dfs.client.failover.proxy.provider.adh: org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
        dfs.ha.namenodes.adh: nn_tsn-k8s-1,nn_tsn-k8s-3
        dfs.namenode.rpc-address.adh.nn_tsn-k8s-1: tsn-k8s-1.ru-central1.internal:8020
        dfs.namenode.rpc-address.adh.nn_tsn-k8s-3: tsn-k8s-3.ru-central1.internal:8020
        dfs.nameservices: adh
      hive:
        hive.metastore.sasl.enabled: "false"
        hive.metastore.uris: thrift://tsn-k8s-1.ru-central1.internal:9083
        metastore.use.SSL: "false"
      ozone:
        ozone.om.address.adh.om_tsn-k8s-1: tsn-k8s-1.ru-central1.internal:9862
        ozone.om.address.adh.om_tsn-k8s-2: tsn-k8s-2.ru-central1.internal:9862
        ozone.om.address.adh.om_tsn-k8s-3: tsn-k8s-3.ru-central1.internal:9862
        ozone.om.nodes.adh: om_tsn-k8s-1,om_tsn-k8s-2,om_tsn-k8s-3
        ozone.om.service.ids: adhom
    
    ## Ranger plugin configuration.
    ## Uncomment and fill the lines below. cluster apply derives the rest.
    ranger: (4)
      security:
        ranger.plugin.impala.policy.rest.url: "<ranger-admin>:6080"
        ranger.plugin.impala.service.name: "impala_k8s"
        ranger.plugin.impala.use.rangerGroups: "True"
        ranger.plugin.impala.use.only.rangerGroups: "True"
    
      # fill xasecure.audit.destination.solr.zookeepers below with Zookeepers endpoints to resolve solr service, e.g. adps-adc.ru-central1.internal:2181/Arenadata.Hadoop-2.solr.server
      audit:
        xasecure.audit.destination.solr.zookeepers: "tsn-adps2-1.ru-central1.internal:2181/Arenadata.Hadoop-3.solr.server"
    
      # Local Ranger files consumed by the CLI during 'cluster apply'.
      # Paths are relative to the config file. The CLI reads these files
      # and writes them into the generated configs Secret.
      files: (5)
        jceksStorePath: /tmp/impala_minimal/ranger-impala.jceks
    1 Namespace that the Impala cluster will use.
    2 Settings for pulling the Impala cluster image.
    3 Hadoop settings that were taken from the previously created hadoop_conf.yaml.
    4 Ranger configuration.
    5 Additional JCEKS file if Ranger is used along with SSL.
If you use Kerberos and SSL, follow these steps:

  1. Prepare the hadoop_conf.yaml Hadoop configuration file:

    sites:
      core:
        fs.defaultFS: hdfs://adh
        hadoop.security.authentication: kerberos
      hdfs:
        dfs.client.failover.proxy.provider.adh: org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
        dfs.ha.namenodes.adh: nn_tsn-k8s-1,nn_tsn-k8s-3
        dfs.namenode.rpc-address.adh.nn_tsn-k8s-1: tsn-k8s-1.ru-central1.internal:8020
        dfs.namenode.rpc-address.adh.nn_tsn-k8s-3: tsn-k8s-3.ru-central1.internal:8020
        dfs.nameservices: adh
        dfs.namenode.kerberos.principal: nn/_HOST@RU-CENTRAL1.INTERNAL
        dfs.journalnode.kerberos.principal: jn/_HOST@RU-CENTRAL1.INTERNAL
        dfs.datanode.kerberos.principal: dn/_HOST@RU-CENTRAL1.INTERNAL
      ozone:
        ozone.om.address.adh.om_tsn-k8s-1: tsn-k8s-1.ru-central1.internal:9862
        ozone.om.address.adh.om_tsn-k8s-2: tsn-k8s-2.ru-central1.internal:9862
        ozone.om.address.adh.om_tsn-k8s-3: tsn-k8s-3.ru-central1.internal:9862
        ozone.om.nodes.adh: om_tsn-k8s-1,om_tsn-k8s-2,om_tsn-k8s-3
        ozone.om.kerberos.principal: om/_HOST@RU-CENTRAL1.INTERNAL
        ozone.om.service.ids: adhom
      hive:
        hive.metastore.sasl.enabled: true
        hive.metastore.uris: thrift://tsn-k8s-1.ru-central1.internal:9083
        hive.metastore.kerberos.principal: hive/_HOST@RU-CENTRAL1.INTERNAL
        metastore.use.SSL: true
  2. Initialize the Impala cluster:

    $ ./adc cluster init --product impala --hadoop-file hadoop_conf.yaml --output cluster-impala-init.yaml

    This operation creates the cluster-impala-init.yaml file with a template configuration.

  3. Edit the configuration file to your needs:

    product: impala
    namespace: impala (1)
    image: (2)
      registry: hub.arenadata.io
      repository: adh-enterprise/impala-docker
      tag: 4.5.0_arenadata1-adh-4.2.0-x86_64
      pullPolicy: Always
    
      ## Image pull secret for a private registry.
      ## Either set 'name' to reference an existing Secret,
      ## or set 'credentials' and the CLI will create a dockerconfigjson Secret.
      #imagePullSecret:
      #  name: my-pull-secret
      #  credentials:
      #    registry: registry.example.com
      #    username: user
      #    password: pass
    impala:
      catalog:
        replicas: 1
    
        ## Component arguments. Key-value pairs passed to the component configuration.
        #args:
        #  redirect_stdout_stderr: "false"
      coordinator:
        replicas: 1
    
        ## Component arguments. Key-value pairs passed to the component configuration.
        #args:
        #  redirect_stdout_stderr: "false"
      executor:
        replicas: 2
    
        ## Component arguments. Key-value pairs passed to the component configuration.
        #args:
        #  redirect_stdout_stderr: "false"
      statestore:
        replicas: 1
    
        ## Component arguments. Key-value pairs passed to the component configuration.
        #args:
        #  redirect_stdout_stderr: "false"
    hadoop: (3)
      core:
        fs.defaultFS: hdfs://adh
        hadoop.security.authentication: kerberos
      hdfs:
        dfs.client.failover.proxy.provider.adh: org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
        dfs.ha.namenodes.adh: nn_tsn-k8s-1,nn_tsn-k8s-3
        dfs.namenode.rpc-address.adh.nn_tsn-k8s-1: tsn-k8s-1.ru-central1.internal:8020
        dfs.namenode.rpc-address.adh.nn_tsn-k8s-3: tsn-k8s-3.ru-central1.internal:8020
        dfs.nameservices: adh
        dfs.namenode.kerberos.principal: nn/_HOST@RU-CENTRAL1.INTERNAL
        dfs.journalnode.kerberos.principal: jn/_HOST@RU-CENTRAL1.INTERNAL
        dfs.datanode.kerberos.principal: dn/_HOST@RU-CENTRAL1.INTERNAL
      ozone:
        ozone.om.address.adh.om_tsn-k8s-1: tsn-k8s-1.ru-central1.internal:9862
        ozone.om.address.adh.om_tsn-k8s-2: tsn-k8s-2.ru-central1.internal:9862
        ozone.om.address.adh.om_tsn-k8s-3: tsn-k8s-3.ru-central1.internal:9862
        ozone.om.nodes.adh: om_tsn-k8s-1,om_tsn-k8s-2,om_tsn-k8s-3
        ozone.om.kerberos.principal: om/_HOST@RU-CENTRAL1.INTERNAL
        ozone.om.service.ids: adhom
      hive:
        hive.metastore.sasl.enabled: "true"
        hive.metastore.uris: thrift://tsn-k8s-1.ru-central1.internal:9083
        metastore.truststore.password: bigdata
        metastore.truststore.path: /etc/ssl/truststore.jks
        hive.metastore.kerberos.principal: hive/_HOST@RU-CENTRAL1.INTERNAL
        metastore.use.SSL: "true"
    
    ## Kerberos configuration for authentication.
    ## Set 'keytab.create: true' to let the operator create the keytab Secret,
    ## or set 'keytab.secretName' to reference an existing keytab Secret.
    kerberos: (4)
      realm: RU-CENTRAL1.INTERNAL
      service: impala
      hostname: impala-jdbc.ru-central1.internal
      keytab:
        create: true
        secretName: kerberos-secret
        labelSelector:
          env: prod
        additionalPrincipals:
          - HTTP/impala-cloud.ru-central1.internal
          - impala/impala-cloud.ru-central1.internal
          - impala/impala-jdbc.ru-central1.internal
        rotation:
          interval: 24h
          checkInterval: 1h
    
    ## Java KeyStore/TrustStore certificate configuration.
    ## Either reference an existing Secret via secretName,
    ## or set files: to have the CLI create the Secret from local files.
    ssl: (5)
      secretName: ssl-secret
      trustStoreKey: truststore.jks
    
      # Local file paths consumed by the CLI during 'cluster apply'.
      # Paths are relative to the config file. The CLI reads these files
      # and creates or updates the Secret named by ssl.secretName.
      files:
        trustStorePath: /etc/ssl/truststore.jks
    
    # TLS certificate configuration.
    # Either reference an existing Secret via secretName,
    # or set files: to have the CLI create the Secret from local files.
    tls: (6)
      secretName: tls-secret
      certificateKey: tls.crt
      privateKey: tls.key
    
      # Local file paths consumed by the CLI during 'cluster apply'.
      # Paths are relative to the config file. The CLI reads these files
      # and creates or updates the Secret named by tls.secretName.
      files: (7)
        certificatePath: impala-jdbc.ru-central1.internal.crt
        privateKeyPath: impala-jdbc.ru-central1.internal.key
    1 Namespace that the Impala cluster will use.
    2 Settings for pulling the Impala cluster image.
    3 Hadoop settings that were taken from the previously created hadoop_conf.yaml.
    4 Kerberos settings.
    5 SSL settings.
    6 TLS settings.
    7 Additional files with a certificate and key for JDBC connection to Impala.
For all security configurations, continue with the following steps:

  4. Apply the configuration and deploy the Impala cluster:

    $ ./adc cluster apply impala --file cluster-impala-init.yaml

    The expected output contains a confirmation of success:

    time="20260518133858UTC" level="info" msg="cluster impala applied to namespace impala"
  5. Verify the Impala cluster pods:

    $ kubectl get pods -n impala

    The expected output is:

    NAME                   READY   STATUS    RESTARTS   AGE
    impala-catalog-0       1/1     Running   0          70m
    impala-coordinator-0   1/1     Running   0          70m
    impala-executor-0      1/1     Running   0          70m
    impala-statestore-0    1/1     Running   0          70m
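If you repeat this check while the pods are starting, a small filter can replace reading the table by eye. The sketch below parses the output of kubectl get pods and succeeds only when every listed pod is in the Running status with all containers ready. The all_pods_ready name is hypothetical. The same filter can be applied to the impala-operator namespace from Step 1.

```shell
#!/bin/sh
# Read `kubectl get pods` output on stdin. Succeed only if at least one pod
# is listed and every pod is Running with a READY column of the form n/n.
all_pods_ready() {
  awk '
    NR == 1 { next }                      # skip the header row
    NF >= 3 {
      split($2, ready, "/")               # READY column, for example "1/1"
      if ($3 != "Running" || ready[1] != ready[2]) {
        print "not ready: " $1
        bad = 1
      }
      total++
    }
    END { exit ((bad || total == 0) ? 1 : 0) }
  '
}

# Example (not executed here; requires access to the cluster):
#   kubectl get pods -n impala | all_pods_ready && echo "Impala pods are ready"
```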

Step 3. Allow JDBC connections to Impala

For external JDBC access to Impala, you need to expose the service using one of the supported publication methods, for example, through a load balancer or Ingress controller.

All configurations related to exposing a service, including DNS, annotations, Ingress settings, load balancing rules, and other platform-specific settings, should be specified according to your Kubernetes environment.

  1. Get the external IP address of your Ingress controller or load balancer. For example:

    impala-lb                    LoadBalancer   10.96.231.158   10.92.42.144   21050:32154/TCP,26000:30753/TCP,24000:32645/TCP   25h
  2. Add the following entry to your /etc/hosts file:

    <lb_ip> impala-jdbc.ru-central1.internal

    where <lb_ip> is the external IP exposed by your load balancer. In this example, it is 10.92.42.144.

  3. Connect to the Impala cluster over JDBC, for example, using DBeaver. For this, the JDBC connection string looks as follows:

    jdbc:impala://impala-jdbc.ru-central1.internal:21050/default

    For Kerberos and SSL, append the following parameters to the connection string:

    AuthMech=1;KrbServiceName=impala;KrbHostFQDN=impala-jdbc.ru-central1.internal;SSL=1;SSLTrustStore=<path>/truststore.jks;SSLTrustStorePwd=<password>;httpPath=cliservice

    where:

    • KrbServiceName=impala is the Kerberos service name.

    • KrbHostFQDN=impala-jdbc.ru-central1.internal is the Kerberos host name specified in the kerberos.hostname parameter of the cluster configuration file (cluster-impala-init.yaml).

    • SSLTrustStore=<path>/truststore.jks is the path to the truststore with the certificates used by DBeaver.

    • SSLTrustStorePwd=<password> is the password for accessing the truststore.

  4. Once connected, verify the Impala cluster operability:

    SHOW DATABASES;

    The expected output:

    name            |comment                                     |
    ----------------+--------------------------------------------+
    _impala_builtins|System database for Impala builtin functions|
    default         |Default Hive database                       |
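The connection string from step 3 can be assembled programmatically, which helps when you switch between the plain and the Kerberos + SSL variants. The sketch below assumes that extra parameters are joined to the base URL with a semicolon, the same separator that the parameter list above uses between its items. The impala_jdbc_url name is hypothetical.

```shell
#!/bin/sh
# Compose an Impala JDBC URL. Optional driver parameters are appended with ";".
# Usage: impala_jdbc_url <host> <port> <database> [param=value ...]
impala_jdbc_url() {
  url="jdbc:impala://$1:$2/$3"
  shift 3
  for param in "$@"; do
    url="$url;$param"
  done
  printf '%s\n' "$url"
}

# Examples:
#   impala_jdbc_url impala-jdbc.ru-central1.internal 21050 default
#   impala_jdbc_url impala-jdbc.ru-central1.internal 21050 default \
#     AuthMech=1 KrbServiceName=impala KrbHostFQDN=impala-jdbc.ru-central1.internal SSL=1
```

Without extra parameters, the function prints the same string as shown in step 3.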

Step 4. Provide access to Impala web UI

To access the Impala web interface, you need to expose the service using one of the supported publication methods, for example, through a load balancer or Ingress controller. All configurations related to exposing a service, including DNS, annotations, Ingress settings, load balancing rules, and other platform-specific settings, should be specified according to your Kubernetes environment.

  1. Get the external IP address of your load balancer or Ingress controller. For example:

    NAME             CLASS   HOSTS                               ADDRESS       PORTS   AGE
    impala-ingress   nginx   impala-cloud.ru-central1.internal   10.92.41.95   80      8m45s
  2. Add the following entry to your /etc/hosts file:

    <ingress_ip> impala-cloud.ru-central1.internal

    where <ingress_ip> is the external IP exposed by Ingress. In this example, it is 10.92.41.95.

  3. Open the Impala web UI in your browser using the URL http://impala-cloud.ru-central1.internal (change the protocol to https if you use Kerberos and SSL).

    Impala web UI
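Step 3 and Step 4 each add a line to /etc/hosts. If you script this, it helps to make the change idempotent so that repeated runs do not duplicate entries. The following is a sketch under the assumption that you manage the hosts file directly. The add_hosts_entry name is hypothetical, and writing to /etc/hosts requires root privileges.

```shell
#!/bin/sh
# Append "<ip> <hostname>" to a hosts file unless the hostname is already mapped.
# Usage: add_hosts_entry <hosts-file> <ip> <hostname>
add_hosts_entry() {
  # Look for the hostname as an exact field (column 2 or later) on non-comment lines.
  if awk -v h="$3" '
       !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
       END { exit (found ? 0 : 1) }
     ' "$1" 2>/dev/null; then
    echo "already present: $3"
  else
    printf '%s %s\n' "$2" "$3" >> "$1"
  fi
}

# Example (run as root; the IP addresses are the ones from the examples in this guide):
#   add_hosts_entry /etc/hosts 10.92.42.144 impala-jdbc.ru-central1.internal
#   add_hosts_entry /etc/hosts 10.92.41.95 impala-cloud.ru-central1.internal
```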

Delete instances

IMPORTANT

Delete the operator only after all the resources it manages have been deleted.

To delete the Impala cluster, run the command below:

$ ./adc cluster destroy impala --file cluster-impala-init.yaml

To delete the Impala operator, run the command below:

$ ./adc operators destroy impala-operator -f operator-impala-init.yaml

To delete the Kerberos operator, run the command below:

$ ./adc operators destroy kerberos-operator -f operator-kerberos-init.yaml
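Because the operator must outlive the resources it manages, it can be useful to confirm that no cluster pods remain before you delete it. The sketch below is one possible guard and not part of the CLI. It assumes that the cluster runs in the impala namespace used throughout this guide, and the no_pods_left name is hypothetical.

```shell
#!/bin/sh
# Succeed when stdin contains no non-blank lines, that is, when
# `kubectl get pods --no-headers` listed no pods.
no_pods_left() {
  ! grep -q '[^[:space:]]'
}

# Example (not executed here; requires access to the cluster):
#   kubectl get pods -n impala --no-headers 2>/dev/null | no_pods_left \
#     && ./adc operators destroy impala-operator -f operator-impala-init.yaml
```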