Kerberos

Kerberos overview

Strict authentication and verification of user identity are necessary for secure access to Hadoop. All users must be unambiguously identified across the whole Hadoop cluster. Successfully identified users gain access to resources (for example, files or directories) and can interact with the cluster (for example, run MapReduce jobs).

Once a user is granted access to the cluster under a certain name, all cluster components trust that name. This is achieved by using the Kerberos database, which contains cluster access credentials for all users. Hadoop cluster resources (such as hosts or services) must also go through mutual authentication. This prevents malicious systems that pretend to be cluster components from accessing the cluster.

Kerberos concepts
| Concept | Description |
|---|---|
| Key Distribution Center (KDC) | The trusted authentication authority in a Kerberos-enabled system |
| Kerberos KDC Server | Machine or server that acts as the KDC |
| Kerberos Client | Any cluster machine that authenticates via the KDC |
| Kerberos Principal | Unique name of a user or service that authenticates via the KDC. A user principal has the form name@REALM (for example, alice@EXAMPLE.COM); a service principal usually also includes the fully qualified domain name of the host the service runs on (for example, myservice/server1.example.com@EXAMPLE.COM) |
| Key | Secret key assigned to every principal and known only to that principal and the KDC. It is used to encrypt and decrypt Kerberos messages instead of being sent over the network |
| Keytab | File that contains one or several principals along with their keys. The keys are exported from the Kerberos database into the keytab, which is then stored in a protected directory on the node that runs the server component. The service reads its key from the keytab instead of entering a password |
| Realm | Kerberos network that includes a KDC and one or several clients |
| KDC Admin Account | Administrator account that ADCM uses to create principals and generate their keys |
| Ticket Granting Server (TGS) | Server that issues service tickets to principals that present a valid TGT |
| Authentication Server (AS) | Server that performs the initial authentication check and issues a Ticket Granting Ticket (TGT) |
| Ticket Granting Ticket (TGT) | Ticket used to obtain other tickets. A TGT contains a copy of the session key (the client receives the other copy), the user name, and the ticket expiration time. It is encrypted with the KDC's own secret key, which is known only to the KDC, so only the KDC can decrypt it |
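As an illustration of the principal format described in the table, the following Python sketch splits a principal name into its primary name, an optional instance (typically a host name for services), and the realm. The `Principal` class and `parse_principal` function are hypothetical helpers, and the sketch assumes simple names: escape sequences that real Kerberos libraries accept in principal names are not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Principal:
    primary: str             # user or service name, e.g. "alice" or "myservice"
    instance: Optional[str]  # usually the host FQDN for services, None for users
    realm: str               # Kerberos realm, e.g. "EXAMPLE.COM"

    def __str__(self) -> str:
        name = self.primary if self.instance is None else f"{self.primary}/{self.instance}"
        return f"{name}@{self.realm}"


def parse_principal(text: str) -> Principal:
    """Split a principal of the form primary[/instance]@REALM into its parts."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"missing realm in principal: {text!r}")
    primary, slash, instance = name.partition("/")
    if not primary or (slash and not instance):
        raise ValueError(f"malformed principal name: {text!r}")
    return Principal(primary, instance if slash else None, realm)


if __name__ == "__main__":
    for p in ("alice@EXAMPLE.COM", "myservice/server1.example.com@EXAMPLE.COM"):
        print(repr(parse_principal(p)))
```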

Authentication process

Hadoop uses Kerberos for strict authentication and identity verification of users and services. The Kerberos server is called the Key Distribution Center (KDC). It consists of the following three components:

  • Database of users and services: contains the principals known to the server and their corresponding Kerberos passwords or keys.

  • Authentication Server (AS): performs the initial identity check and issues the Ticket Granting Ticket (TGT).

  • Ticket Granting Server (TGS): issues service tickets based on the TGT.

A simplified Kerberos flow is presented below:

  1. The principal requests authentication from the AS.

  2. The AS replies with a TGT and a session key. The TGT is encrypted with the KDC's own key, while the session key is encrypted with a key derived from the principal's Kerberos password. This password is known only to the principal and the KDC.

  3. The principal decrypts the session key locally using its Kerberos password. From then on, the principal can present the TGT to the TGS to obtain service tickets, which allow it to access various services.
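The AS exchange above can be illustrated with a toy Python model. It assumes a made-up `seal`/`unseal` pair (a SHAKE-256 keystream with an HMAC tag) in place of real Kerberos encryption types, and a PBKDF2-based `derive_key` in place of the real string-to-key function; these helpers are hypothetical and not secure. The point is structural: the client can decrypt the session key with its password-derived key, while the TGT stays opaque to it because it is sealed with the KDC's own key.

```python
import hashlib
import hmac
import json
import os
import time


def seal(key: bytes, obj: dict) -> bytes:
    """Toy authenticated encryption (SHAKE-256 keystream + HMAC). NOT secure."""
    nonce, plain = os.urandom(16), json.dumps(obj).encode()
    stream = hashlib.shake_256(key + nonce).digest(len(plain))
    body = bytes(a ^ b for a, b in zip(plain, stream))
    return nonce + body + hmac.new(key, nonce + body, hashlib.sha256).digest()


def unseal(key: bytes, blob: bytes) -> dict:
    """Verify the HMAC tag, then decrypt; raises ValueError on a wrong key."""
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered data")
    stream = hashlib.shake_256(key + nonce).digest(len(body))
    return json.loads(bytes(a ^ b for a, b in zip(body, stream)))


def derive_key(principal: str, password: str) -> bytes:
    """Toy string-to-key based on PBKDF2; real Kerberos enctypes define their own."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), principal.encode(), 4096)


KDC_KEY = os.urandom(32)  # the KDC's own key, never shared with anyone
DATABASE = {"alice@EXAMPLE.COM": derive_key("alice@EXAMPLE.COM", "alice-password")}


def as_exchange(client: str, lifetime: int = 3600) -> dict:
    """AS reply: a TGT the client cannot read, plus a session key for the client."""
    session_key = os.urandom(32).hex()
    expires = time.time() + lifetime
    tgt = seal(KDC_KEY, {"client": client, "session_key": session_key, "expires": expires})
    enc_part = seal(DATABASE[client], {"session_key": session_key, "expires": expires})
    return {"tgt": tgt, "enc_part": enc_part}


if __name__ == "__main__":
    reply = as_exchange("alice@EXAMPLE.COM")
    my_key = derive_key("alice@EXAMPLE.COM", "alice-password")  # computed locally
    session = unseal(my_key, reply["enc_part"])
    print("session key obtained:", session["session_key"][:8], "...")
    try:
        unseal(my_key, reply["tgt"])
    except ValueError:
        print("the TGT is opaque to the client; only the KDC can decrypt it")
```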

Because cluster resources (services or hosts) cannot enter a password interactively every time they need to authenticate, they use a keytab file instead. This file contains the keys that the resources need for authentication. The set of hosts, users, and services managed by a Kerberos server is called a realm.
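A keytab replaces the interactive password. The sketch below stores principal/key pairs in a file created with owner-only permissions and refuses to read it if group or others have access. The JSON layout, the file name, and the `write_keytab`/`read_keytab` helpers are hypothetical: real keytabs use a binary format and are produced by Kerberos administration tools. The permission check assumes a POSIX system.

```python
import json
import os
import stat
import tempfile


def write_keytab(path: str, entries: dict[str, bytes]) -> None:
    """Write principal -> key pairs to a file readable and writable only by its owner."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    os.chmod(path, 0o600)  # also tighten a file that already existed
    with os.fdopen(fd, "w") as f:
        json.dump({principal: key.hex() for principal, key in entries.items()}, f)


def read_keytab(path: str, principal: str) -> bytes:
    """Load the key for `principal`, refusing keytabs that others can access."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        raise PermissionError(f"{path} is accessible by group/others (mode {oct(mode)})")
    with open(path) as f:
        return bytes.fromhex(json.load(f)[principal])


if __name__ == "__main__":
    spn = "myservice/server1.example.com@EXAMPLE.COM"
    with tempfile.TemporaryDirectory() as keytab_dir:
        path = os.path.join(keytab_dir, "myservice.keytab")
        # Service principals often get a random key, so there is no password to type.
        write_keytab(path, {spn: os.urandom(32)})
        key = read_keytab(path, spn)  # the service authenticates with this key
        print(f"loaded a {len(key)}-byte key for {spn}")
```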

Example

Below is a simple example of the authentication process via Kerberos. This example includes the following components:

  • EXAMPLE.COM: a Kerberos realm.

  • Alice: a user who is assigned the alice@EXAMPLE.COM user principal name (UPN).

  • myservice: a service that runs on the server1.example.com node and is assigned the myservice/server1.example.com@EXAMPLE.COM service principal name (SPN).

  • kdc.example.com: the key distribution center (KDC) for the EXAMPLE.COM realm.
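Service principal names such as the one above are built from the service name, the host's fully qualified domain name, and the realm. The following sketch assembles such a name and picks the realm from a host-to-realm mapping modeled loosely on the `[domain_realm]` section of `krb5.conf` (an exact host entry first, then domain suffixes with a leading dot). The helper names and the fallback to the upper-cased parent domain are assumptions made for this example, not the exact lookup rules of any Kerberos implementation.

```python
def realm_for_host(host: str, domain_realm: dict[str, str]) -> str:
    """Map a host FQDN to a realm, in the spirit of krb5.conf [domain_realm]."""
    host = host.lower().rstrip(".")
    labels = host.split(".")
    if len(labels) < 2:
        raise ValueError(f"expected a fully qualified domain name, got {host!r}")
    if host in domain_realm:  # exact host entry wins
        return domain_realm[host]
    for i in range(1, len(labels)):  # then ".example.com", ".com", ...
        suffix = "." + ".".join(labels[i:])
        if suffix in domain_realm:
            return domain_realm[suffix]
    # Assumed fallback: upper-cased parent domain (example.com -> EXAMPLE.COM)
    return ".".join(labels[1:]).upper()


def service_principal(service: str, host: str, domain_realm: dict[str, str]) -> str:
    """Build an SPN of the form service/fqdn@REALM."""
    fqdn = host.lower().rstrip(".")
    return f"{service}/{fqdn}@{realm_for_host(fqdn, domain_realm)}"


if __name__ == "__main__":
    mapping = {".example.com": "EXAMPLE.COM"}  # hypothetical mapping for this example
    print(service_principal("myservice", "server1.example.com", mapping))
    # myservice/server1.example.com@EXAMPLE.COM
```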

To access myservice, Alice must present a valid service ticket for it. This is done in the following steps (some details are omitted for brevity):

  1. Alice must first obtain a Ticket Granting Ticket (TGT). To do this, she sends a request to the Authentication Server (AS) at kdc.example.com and identifies herself as alice@EXAMPLE.COM.

  2. The AS responds with a TGT and a session key. The part of the response that contains the session key is encrypted with a key derived from the alice@EXAMPLE.COM principal password, while the TGT itself is encrypted with the KDC's own key.

  3. After receiving the encrypted message, Alice enters the alice@EXAMPLE.COM principal password to decrypt it.

  4. After the message is successfully decrypted, Alice requests a service ticket for myservice/server1.example.com@EXAMPLE.COM from the Ticket Granting Server (TGS) at kdc.example.com. The request contains the TGT, and the response contains the service ticket.

  5. The TGS validates the received TGT and issues Alice a service ticket encrypted with the myservice/server1.example.com@EXAMPLE.COM principal key.

  6. Alice sends the service ticket to myservice. The service decrypts the ticket with the myservice/server1.example.com@EXAMPLE.COM principal key, which it reads from its keytab, and validates the decrypted ticket.

  7. After successful authentication, Alice can use myservice.

The above process is shown in the image below.

Figure: Kerberos authentication flow
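The seven steps of the example can be traced end to end in a toy Python model. Everything here is an illustrative assumption rather than real Kerberos: the `KDC` and `Service` classes are hypothetical, `seal`/`unseal` is an insecure stand-in for Kerberos encryption, `derive_key` is a PBKDF2 stand-in for the real string-to-key function, and the requests carry a simplified authenticator (client name plus timestamp, checked against an assumed 300-second skew). What it does show is which key protects each message: Alice's password-derived key, the KDC's own key for the TGT, the TGS session key, and the service key read from the keytab.

```python
import hashlib
import hmac
import json
import os
import time

MAX_SKEW = 300  # seconds; assumed tolerance for authenticator timestamps


def seal(key: bytes, obj: dict) -> bytes:
    """Toy authenticated encryption (SHAKE-256 keystream + HMAC). NOT secure."""
    nonce, plain = os.urandom(16), json.dumps(obj).encode()
    stream = hashlib.shake_256(key + nonce).digest(len(plain))
    body = bytes(a ^ b for a, b in zip(plain, stream))
    return nonce + body + hmac.new(key, nonce + body, hashlib.sha256).digest()


def unseal(key: bytes, blob: bytes) -> dict:
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(key, nonce + body, hashlib.sha256).digest()):
        raise ValueError("wrong key or tampered data")
    stream = hashlib.shake_256(key + nonce).digest(len(body))
    return json.loads(bytes(a ^ b for a, b in zip(body, stream)))


def derive_key(principal: str, password: str) -> bytes:
    """Toy string-to-key based on PBKDF2; real Kerberos enctypes define their own."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), principal.encode(), 4096)


def check(ticket: dict, authenticator: dict) -> None:
    """Shared validation: same client, ticket not expired, fresh authenticator."""
    now = time.time()
    if authenticator["client"] != ticket["client"] or now > ticket["exp"]:
        raise PermissionError("ticket does not match client or has expired")
    if abs(now - authenticator["ts"]) > MAX_SKEW:
        raise PermissionError("authenticator timestamp outside allowed skew")


class KDC:
    """Toy KDC: principal database + Authentication Server + Ticket Granting Server."""

    def __init__(self, db: dict[str, bytes]):
        self.db = db                   # principal -> long-term key
        self.own_key = os.urandom(32)  # only the KDC can decrypt TGTs

    def as_exchange(self, client: str, lifetime: int = 3600) -> dict:
        sk, exp = os.urandom(32).hex(), time.time() + lifetime
        return {"tgt": seal(self.own_key, {"client": client, "sk": sk, "exp": exp}),
                "enc_part": seal(self.db[client], {"sk": sk})}

    def tgs_exchange(self, tgt: bytes, authenticator: bytes, service: str) -> dict:
        t = unseal(self.own_key, tgt)
        check(t, unseal(bytes.fromhex(t["sk"]), authenticator))
        svc_sk = os.urandom(32).hex()
        ticket = {"client": t["client"], "sk": svc_sk, "exp": t["exp"]}
        return {"ticket": seal(self.db[service], ticket),
                "enc_part": seal(bytes.fromhex(t["sk"]), {"sk": svc_sk})}


class Service:
    def __init__(self, name: str, keytab_key: bytes):
        self.name, self.key = name, keytab_key  # key read from the service keytab

    def accept(self, ticket: bytes, authenticator: bytes) -> str:
        t = unseal(self.key, ticket)
        check(t, unseal(bytes.fromhex(t["sk"]), authenticator))
        return t["client"]  # authenticated client principal


def access_service(kdc: KDC, client: str, password: str, service: Service) -> str:
    # Steps 1-3: obtain a TGT; decrypt the session key with the password-derived key
    as_rep = kdc.as_exchange(client)
    tgs_sk = bytes.fromhex(unseal(derive_key(client, password), as_rep["enc_part"])["sk"])
    # Steps 4-5: exchange the TGT for a service ticket
    auth = seal(tgs_sk, {"client": client, "ts": time.time()})
    tgs_rep = kdc.tgs_exchange(as_rep["tgt"], auth, service.name)
    svc_sk = bytes.fromhex(unseal(tgs_sk, tgs_rep["enc_part"])["sk"])
    # Steps 6-7: present the service ticket; the service checks it with its own key
    auth = seal(svc_sk, {"client": client, "ts": time.time()})
    return service.accept(tgs_rep["ticket"], auth)


if __name__ == "__main__":
    alice, spn = "alice@EXAMPLE.COM", "myservice/server1.example.com@EXAMPLE.COM"
    svc_key = os.urandom(32)  # random service key, as it would be stored in a keytab
    kdc = KDC({alice: derive_key(alice, "alice-password"), spn: svc_key})
    print(access_service(kdc, alice, "alice-password", Service(spn, svc_key)))
    # alice@EXAMPLE.COM
```

With a wrong password, the flow stops at step 3 with a `ValueError`, because the session key cannot be decrypted; likewise, a service holding the wrong key cannot open the service ticket.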

KDC types

A cluster can be kerberized with one of the following KDC types:

  • MIT Kerberos, which consists of a principal database and Kerberos key storage.

  • MS Active Directory, which consists of a principal database and Windows Server key storage.

  • FreeIPA, a free open source identity management system for Linux/UNIX environments.

Whichever type is used, the identification and authentication process is practically the same; only the KDC implementation differs.
