Knox architecture

Features

Knox is a gateway service that provides a single point of authentication and access for the Hadoop cluster services.

With Knox, you can achieve the following results:

  • Perimeter security for the Hadoop REST APIs:

    • Authentication and token issuance (SSO).

    • Integration of authentication with enterprise and cloud identity management systems (e.g., LDAP).

    • SSL for services that do not support it natively.

    • Centralized control through a single gateway that enables auditing and authorization (with Ranger).

  • A single URL that aggregates the REST APIs of a Hadoop cluster, which lets you:

    • Limit the number of exposed endpoints.

    • Hide the internal Hadoop topology from potential attackers.

  • Simplified access: services secured with Kerberos are encapsulated behind the gateway, and clients deal with a single SSL certificate.
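To illustrate the single-URL idea, here is a minimal sketch of how a public gateway URL of the form https://{gateway-host}:{gateway-port}/{gateway-path}/{topology}/{service}/... could be mapped to an internal service address. The mapping table, the host names, and the resolve helper are hypothetical. In Knox, the internal URLs come from the service entries of a topology file.

```python
from urllib.parse import urlsplit

# Hypothetical internal addresses. In a real deployment they come from the
# <service> entries of a topology file and are never exposed to clients.
INTERNAL_SERVICES = {
    ("sandbox", "webhdfs"): "http://namenode.internal:50070/webhdfs",
    ("sandbox", "resourcemanager"): "http://resourcemanager.internal:8088/ws",
}

GATEWAY_PATH = "gateway"  # first path segment of every public URL


def resolve(public_url: str) -> str:
    """Map a public gateway URL to the internal service URL it proxies."""
    parts = urlsplit(public_url)
    # Expected path: /{gateway-path}/{topology}/{service}/{rest...}
    segments = parts.path.strip("/").split("/")
    if len(segments) < 3 or segments[0] != GATEWAY_PATH:
        raise ValueError(f"not a gateway URL: {public_url}")
    topology, service, rest = segments[1], segments[2], segments[3:]
    try:
        base = INTERNAL_SERVICES[(topology, service)]
    except KeyError:
        raise LookupError(f"no service {service!r} in topology {topology!r}") from None
    target = "/".join([base] + rest)
    # Keep the query string unchanged.
    return f"{target}?{parts.query}" if parts.query else target
```

Clients only ever see the gateway host. The internal host names and ports stay inside the lookup table, which is what hides the cluster layout.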

Knox placement

Architecture

The gateway is a layer built on top of an embedded Jetty JEE server. It has two main extensibility mechanisms: services and providers. The service extensibility framework allows you to add support for new HTTP endpoints, while providers add features that can be used across various services.

Knox works in two phases: deployment and runtime. During deployment, topologies are converted into implementation artifacts based on the JEE Web Archive (WAR) format. At runtime, incoming requests are processed by the set of filters configured in the WAR. More detailed diagrams of the implementation (e.g., the differences between service and provider deployment) can be found in the Knox documentation.

Deployment

The deployment phase converts human-readable topology descriptors into optimized executable runtime artifacts. This process can be thought of as compiling a descriptor into a JEE WAR that is deployed to an embedded JEE container.

This framework is rather generic, but it has one notable component called a contributor: an entity that generates WAR artifacts from topologies. Each topology is parsed, and an appropriate contributor is selected for each construct within the topology file. The contributor then receives the construct and updates the WAR artifacts accordingly. The workflow of the deployment phase is presented below:

  1. A topology is loaded from the conf/topologies directory into an internal structure.

  2. The gateway server calls a deployment factory to create a WAR structure.

  3. The deployment factory creates a basic WAR structure.

  4. Each construct (provider or service) in the topology is visited, and the appropriate contributor is invoked. The contributor modifies the WAR structure based on the information in the topology file.

  5. The complete WAR is returned to the gateway.

  6. The WAR is dynamically deployed through the internal container API.
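The following Python sketch models steps 1 and 3 to 5 of the workflow above. It is not Knox's implementation. The "WAR" is a plain dictionary of filters and routes, and the contributor registry, the contributes decorator, and the host names are all hypothetical. The topology XML follows the general shape of Knox topology files (provider role/name/enabled/param and service role/url elements), with illustrative values. Steps 2 and 6, the factory call and the container deployment, are not modeled.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Illustrative topology modeled on conf/topologies/*.xml (values hypothetical).
TOPOLOGY_XML = """
<topology>
    <gateway>
        <provider>
            <role>authentication</role>
            <name>ShiroProvider</name>
            <enabled>true</enabled>
            <param>
                <name>main.ldapRealm.contextFactory.url</name>
                <value>ldap://ldap.example.com:389</value>
            </param>
        </provider>
        <provider>
            <role>authorization</role>
            <name>XASecurePDPKnox</name>
            <enabled>true</enabled>
        </provider>
    </gateway>
    <service>
        <role>WEBHDFS</role>
        <url>http://namenode.internal:50070/webhdfs</url>
    </service>
</topology>
"""


def load_topology(xml_text: str) -> list[dict]:
    """Step 1: load the topology into an internal list of constructs."""
    root = ET.fromstring(xml_text.strip())
    constructs = []
    for p in root.findall("./gateway/provider"):
        if p.findtext("enabled", "false").strip().lower() != "true":
            continue  # disabled providers contribute nothing
        params = {x.findtext("name"): x.findtext("value") for x in p.findall("param")}
        constructs.append({"kind": "provider", "role": p.findtext("role"),
                           "name": p.findtext("name"), "params": params})
    for s in root.findall("service"):
        constructs.append({"kind": "service", "role": s.findtext("role"),
                           "urls": [u.text for u in s.findall("url")]})
    return constructs


# Hypothetical contributor registry keyed by (construct kind, role).
Contributor = Callable[[dict, dict], None]
CONTRIBUTORS: dict[tuple[str, str], Contributor] = {}


def contributes(kind: str, role: str):
    """Register a function as the contributor for one kind of construct."""
    def register(fn: Contributor) -> Contributor:
        CONTRIBUTORS[(kind, role)] = fn
        return fn
    return register


@contributes("provider", "authentication")
@contributes("provider", "authorization")
def add_provider_filter(construct: dict, war: dict) -> None:
    # Providers add a filter that will sit in front of every service.
    war["filters"].append(f"{construct['role']}:{construct['name']}")


@contributes("service", "WEBHDFS")
def add_webhdfs(construct: dict, war: dict) -> None:
    # Services add a route from a gateway path to their backend URL(s).
    war["routes"]["/webhdfs"] = construct["urls"]


def build_war(constructs: list[dict]) -> dict:
    war = {"filters": [], "routes": {}}  # step 3: basic WAR structure
    for c in constructs:                  # step 4: visit each construct
        key = (c["kind"], c["role"])
        if key not in CONTRIBUTORS:
            raise LookupError(f"no contributor for {key}")
        CONTRIBUTORS[key](c, war)
    return war                            # step 5: the complete WAR


war = build_war(load_topology(TOPOLOGY_XML))
```

Keying contributors by construct kind and role means that supporting a new service or provider only requires registering one more contributor. The build loop itself stays unchanged.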

Knox deployment phase

Runtime

The runtime phase is simpler than the deployment phase, as it follows well-known JEE models. The high-level diagram below shows how requests are processed at runtime. The runtime workflow is as follows:

  1. A client sends a request to a service. The request is received by the embedded JEE container.

  2. The filter chain for the request URL is looked up in a map of URLs to filter chains.

  3. The filter chain (which is itself a filter) is invoked.

  4. Each filter invokes the next filter in the chain.

  5. The last filter in the chain is invoked. Typically, this is a special dispatch filter that is responsible for dispatching the request to the target endpoint. Dispatch filters are also responsible for reading the response.

  6. The response is received.

  7. The response is streamed through the response wrappers added by the applied filters. These wrappers can modify the response based on their configuration.

  8. The container pulls the response through the filter response wrappers.

  9. The response is returned to the client.
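The runtime workflow above can be modeled as a chain of responsibility. The following is a minimal Python sketch, not Knox's Java servlet filters. Each filter receives the request and the next handler, and the dispatch step sits at the end of the chain. The response then travels back through the earlier filters, which can rewrite it. The filter names, the prefix map, and the host names are hypothetical.

```python
from typing import Callable

Request = dict
Response = dict
Handler = Callable[[Request], Response]
Filter = Callable[[Request, Handler], Response]

AUDIT_LOG: list[str] = []

# Hypothetical addresses: the backend is internal, the gateway URL is public.
INTERNAL_BASE = "http://namenode.internal:50070"
PUBLIC_BASE = "https://knox.example.com:8443/gateway/sandbox"


def audit_filter(req: Request, nxt: Handler) -> Response:
    AUDIT_LOG.append(req["path"])  # record the request, then pass it on
    return nxt(req)


def rewrite_filter(req: Request, nxt: Handler) -> Response:
    resp = nxt(req)  # the response travels back through this filter
    # Hide internal addresses by rewriting them to the public gateway URL.
    return {**resp, "body": resp["body"].replace(INTERNAL_BASE, PUBLIC_BASE)}


def webhdfs_dispatch(req: Request) -> Response:
    # Stand-in for the dispatch step's HTTP call to the backend service.
    return {"status": 200, "body": INTERNAL_BASE + req["path"]}


def link(f: Filter, nxt: Handler) -> Handler:
    """Bind filter f to the handler that follows it in the chain."""
    def handler(req: Request) -> Response:
        return f(req, nxt)
    return handler


def build_chain(filters: list[Filter], dispatch: Handler) -> Handler:
    handler = dispatch  # the dispatch step is always last
    for f in reversed(filters):
        handler = link(f, handler)
    return handler


# Step 2: a map of URL prefixes to filter chains.
CHAINS = {"/webhdfs": build_chain([audit_filter, rewrite_filter], webhdfs_dispatch)}


def handle(path: str) -> Response:
    for prefix, chain in CHAINS.items():
        if path == prefix or path.startswith(prefix + "/"):
            return chain({"path": path})
    return {"status": 404, "body": ""}
```

For example, handle("/webhdfs/v1/tmp") returns a body pointing at the public gateway URL instead of the NameNode address. rewrite_filter acts on the value returned by the next handler, so it edits the response on its way back, which mirrors steps 6 to 8.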

Knox runtime phase

Knox in ADPS

While Kerberos is used for authentication within the Hadoop system, Knox extends access to the system APIs to new users without requiring them to deal with Kerberos. For Knox's placement in the ADPS structure, see the ADPS overview article.

Because Knox can be extended with LDAP authentication and Ranger authorization, you can build a workflow like the one shown in the image below. The Ranger part of the image represents the Knox Ranger plugin.
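The order of checks in such a workflow can be sketched in Python. Authentication runs first and yields 401 on failure. Authorization runs next and yields 403 when the authenticated user lacks permission. Only then is the request dispatched. This is a simplified model under stated assumptions. DIRECTORY stands in for an LDAP server (a real deployment performs an LDAP bind), POLICIES stands in for Ranger policies, and the user names, passwords, and helper functions are hypothetical.

```python
import base64
from typing import Optional

# Hypothetical stand-ins: DIRECTORY replaces an LDAP server and POLICIES
# replaces Ranger policies, mapping (user, service) to allowed HTTP methods.
DIRECTORY = {"alice": "alice-secret", "bob": "bob-secret"}
POLICIES = {
    ("alice", "WEBHDFS"): {"GET", "PUT"},
    ("bob", "WEBHDFS"): {"GET"},
}


def authenticate(auth_header: Optional[str]) -> Optional[str]:
    """Return the user for valid HTTP Basic credentials, otherwise None."""
    if not auth_header or not auth_header.startswith("Basic "):
        return None
    try:
        decoded = base64.b64decode(auth_header[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:  # covers malformed Base64 and invalid UTF-8
        return None
    user, sep, password = decoded.partition(":")
    # A real deployment would perform an LDAP bind instead of a dict lookup.
    if not sep or DIRECTORY.get(user) != password:
        return None
    return user


def authorize(user: str, service: str, method: str) -> bool:
    """Ranger-style check: is this method allowed for the user on the service?"""
    return method in POLICIES.get((user, service), set())


def gateway(request: dict) -> int:
    """Return the HTTP status the gateway would produce for the request."""
    user = authenticate(request.get("authorization"))
    if user is None:
        return 401  # authentication failed
    if not authorize(user, request["service"], request["method"]):
        return 403  # authenticated, but not permitted
    return 200  # here the request would be dispatched to the service
```

Keeping the two checks separate mirrors the split between the LDAP-backed authentication provider and the Ranger authorization plugin. Either check can be replaced without touching the other.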

Knox workflow