Policy model in Ranger
Ranger is a convenient framework that manages access and permissions for Hadoop and other systems. Ranger’s effectiveness and usability come from its most prominent feature: policies.
Basic concepts
A Ranger policy consists of the following blocks:
- Resources. In the context of Ranger, a resource is an entity that requires authorization to be accessed (e.g. a file, database, table, or column). For example, Solr has only one resource: a collection. Ranger allows you to use wildcards, macros, and variables in the resource names, which makes it possible to cover a large number of resources with a smaller number of policies.
- Permissions. A permission represents an action that can be performed on a resource, for example, reading a file or querying a table.
  NOTE: Do not confuse these permissions with the Ranger module permissions.
- Permission holders. There are three types of permission holders:
  - User. An individual identity that can be a part of a group. For example, hdfs is a user that was created for the HDFS service.
  - Group. A set of users that share a common trait. For example, the hadoop group combines several users that were created for the Hadoop services.
  - Role. An authorization level that can be assigned to a user. There are three internal system roles in Ranger Admin: User (basic permissions), Admin (privileges for all the Ranger modules except for KMS), and Auditor (privileges for all the Ranger modules). You can create a custom role that combines users, groups, and even other roles.
Users and groups are typically obtained by UserSync from LDAP/AD/OS. Within a policy, you can set up allow/deny permissions to a resource for several holders at once.
- Access conditions. While authorization policies grant access to certain resources, they can also be set up to deny access to a resource for users/groups/roles, exclude certain users from the allowed/denied accesses, or deny all accesses to a specific resource other than the ones allowed in the policy. For example, the policy below allows the hbase user to read, write, and execute in the /hbase/archive directory.
  Figure: Access conditions for an HDFS policy
- Policy validity schedule. You can make Ranger policies effective only for a specific period. You might want to use this feature to grant access starting at a certain time or to grant temporary access to some users/groups/roles.
- Security zone. Ranger’s security zones allow you to separate resource policies into various zones for convenience. Such separation simplifies the administration of policies and also reduces the number of policies that need to be checked during authorization, since only the policies of the zone that contains the requested resource are loaded and checked. It also allows administrators to set up different policies based on the zones they have admin rights for.
- Delegated admin. In addition to Ranger administrators, certain users/groups/roles can also manage the authorization policies for a certain resource. To achieve that, these permission holders need to be marked as delegated admins when creating a policy for that resource. For example, the hdfs user is a delegated admin for all the HDFS service resources, as shown in the policy below, while the zeppelin and impala users only have the read, write, and execute rights.
  Figure: An HDFS policy with a delegated admin setting
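To make the building blocks above more concrete, here is a minimal Python sketch of how a policy with resources, permission holders, access conditions, and a validity schedule could be evaluated. It is an illustration only and not Ranger's implementation: the dictionary keys, the service and zone names, the interns group, and the helper functions are all hypothetical, and deny/allow exceptions are left out for brevity.

```python
from datetime import datetime
from fnmatch import fnmatchcase

# Hypothetical, simplified policy. The keys are NOT Ranger's real JSON fields.
policy = {
    "service": "hdfs_service",   # hypothetical service name
    "zone": "finance",           # hypothetical security zone
    "resources": {"path": ["/hbase/archive", "/hbase/archive/*"]},
    "valid_from": datetime(2024, 1, 1),
    "valid_to": datetime(2030, 1, 1),
    "allow": [
        {"users": {"hbase"}, "groups": set(),
         "accesses": {"read", "write", "execute"}},
    ],
    "deny": [
        {"users": set(), "groups": {"interns"}, "accesses": {"write"}},
    ],
    "deny_all_else": False,
}


def item_matches(item, user, groups, access):
    """True if a policy item names this holder and this access type."""
    holder = user in item["users"] or bool(groups & item["groups"])
    return holder and access in item["accesses"]


def is_allowed(policy, user, groups, path, access, now):
    """Return True or False, or None when the policy makes no decision."""
    # Validity schedule: outside the window the policy is ignored.
    if not policy["valid_from"] <= now <= policy["valid_to"]:
        return None
    # Resource match with wildcards.
    if not any(fnmatchcase(path, p) for p in policy["resources"]["path"]):
        return None
    # In this sketch, deny conditions are checked before allow conditions.
    if any(item_matches(i, user, groups, access) for i in policy["deny"]):
        return False
    if any(item_matches(i, user, groups, access) for i in policy["allow"]):
        return True
    # "Deny all other accesses" turns "no decision" into an explicit deny.
    return False if policy["deny_all_else"] else None
```

For example, `is_allowed(policy, "hbase", set(), "/hbase/archive/data/f1", "read", datetime(2025, 6, 1))` returns `True`, while a request outside the validity window or for an unrelated path returns `None`, which leaves the decision to other policies.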
Data manipulation
Data masking
Ranger allows you to mask certain information for some resources. For example, you can mask a column in Hive so that users see a person’s age range instead of the exact age (e.g. 30-40 instead of 35). This is done by creating data masking policies, in which you specify the users who see the raw information and those who see the masked version.
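The age-range example can be sketched in a few lines of Python. This only illustrates the idea of a masking transformation. It is not how Ranger or Hive apply masks, and the function names, the bucket width, and the hr_admin user are assumptions made for the example.

```python
def mask_age(age, width=10):
    """Replace an exact age with the range it falls into."""
    lower = (age // width) * width
    return f"{lower}-{lower + width}"


def read_age(user, age, unmasked_users):
    """Return the raw value for listed users and the masked value otherwise."""
    return str(age) if user in unmasked_users else mask_age(age)


print(mask_age(35))                            # 30-40
print(read_age("hr_admin", 35, {"hr_admin"}))  # 35
print(read_age("analyst", 35, {"hr_admin"}))   # 30-40
```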
Row filter
While data masking transforms the information, a row filter completely removes the rows that the user doesn’t have access to. This can be used to hide sensitive information or to limit the visible data by some parameter, e.g. department.
NOTE
Data masking and row filter policies do not grant access to resources. If a user is listed in such a policy but doesn’t have access to a referenced resource, the policy will not be applied to them.
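The following sketch shows a row filter by department and the rule from the note: a filter narrows what a user sees, but it does not grant access by itself. The table contents, user names, and helper function are hypothetical and serve only as an illustration.

```python
rows = [
    {"name": "Ann", "department": "sales"},
    {"name": "Bob", "department": "hr"},
    {"name": "Eve", "department": "sales"},
]

# Hypothetical: users who were granted select access by an access policy.
table_readers = {"sales_analyst", "auditor"}

# Hypothetical row filter policy: user -> predicate over a row.
row_filters = {
    "sales_analyst": lambda row: row["department"] == "sales",
    "intern": lambda row: row["department"] == "hr",  # listed, but has no access
}


def select_rows(user, table):
    """Apply the row filter only if the user can read the table at all."""
    if user not in table_readers:
        raise PermissionError(f"{user} has no access to the table")
    predicate = row_filters.get(user)
    if predicate is None:
        return list(table)  # no filter defined: all rows are visible
    return [row for row in table if predicate(row)]
```

Here `select_rows("sales_analyst", rows)` returns only the two sales rows, `select_rows("auditor", rows)` returns all three rows, and `select_rows("intern", rows)` raises `PermissionError` even though a filter is defined for that user.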
Access control
The access control mechanism defines which entities can access certain data. In Ranger, this mechanism is implemented through the access conditions that are set during policy creation.
The type of access control that Ranger helps you achieve is known as attribute-based access control (ABAC).
In this paradigm, a subject’s ability to perform a set of operations is determined by evaluating the attributes associated with the subject (user, group), the object (resource), the requested operations (create, read, etc.), and sometimes the environment (time, location, device, etc.).
ABAC makes it possible to design authorization policies without prior knowledge of specific resources or users, which helps avoid the need for new policies as new resources or users are introduced.
For example, you can allow each user to work with the resources they own by using the {OWNER} macro in the Allow conditions section.
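A rough way to picture the {OWNER} macro is as a placeholder that is replaced with the owner attribute of the requested resource at evaluation time. The sketch below is an assumption-based illustration of that idea, not Ranger's code. The resource dictionary and both functions are hypothetical.

```python
def expand_macros(holder, resource_owner):
    """Replace the {OWNER} placeholder with the owner of the resource."""
    return holder.replace("{OWNER}", resource_owner)


def owner_policy_allows(user, resource, allow_users=("{OWNER}",)):
    """Allow the request if the user is among the expanded holders."""
    expanded = {expand_macros(h, resource["owner"]) for h in allow_users}
    return user in expanded


report = {"path": "/data/alice/report.csv", "owner": "alice"}
print(owner_policy_allows("alice", report))  # True
print(owner_policy_allows("bob", report))    # False
```

One policy written this way covers every resource and every owner, which is the point of ABAC: no new policy is needed when a new user or file appears.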
There are several subtypes of ABAC that are listed below.
Resource-based access control
The idea of resource-based access control is to focus on a single resource and build a comprehensive access rule around it, allowing access to certain permission holders while denying it to others. To implement resource-based access control, you need to set the allow/deny conditions for resources (e.g. a permission to select a column in some table in a Hive database). This type of access control can become hard to manage as your system grows, because more and more policies need to be created, but for smaller systems it can be rather clean. See an example of a resource-based policy above.
Tag-based access control
A tag is a special label assigned to a resource so that dedicated policies are applied to it. This enables admins to separate resources by the kind of sensitive data they contain (private health information, identity information, credit card information, etc.). To implement resource access safely, it’s worth designing a proper data classification and creating tools that scan for sensitive information.
A great benefit of creating authorization policies for classifications instead of resources is that such policies are applied automatically as classifications are added, removed, or updated. Also, you can create just one tag-based policy to authorize access to resources in several services, such as Hive, HBase, and Kafka. This helps reduce the complexity of managing authorization policies.
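The sketch below illustrates why one tag-based policy can cover several services: the decision depends on the tags attached to a resource, not on the resource itself. The resource names, the PII tag, the privacy_officer role, and the data structures are hypothetical and do not reflect how Ranger stores tags or policies.

```python
# Hypothetical tags attached to resources in different services.
resource_tags = {
    ("hive", "db.customers.ssn"): {"PII"},
    ("hbase", "customers:ssn"): {"PII"},
    ("kafka", "clickstream"): set(),
}

# One hypothetical tag-based policy for every resource tagged as PII.
tag_policies = {"PII": {"allowed_roles": {"privacy_officer"}}}


def tag_decision(service, resource, roles):
    """Return True or False, or None if no tag policy applies to the resource."""
    tags = resource_tags.get((service, resource), set())
    relevant = [t for t in tags if t in tag_policies]
    if not relevant:
        return None
    return all(bool(roles & tag_policies[t]["allowed_roles"]) for t in relevant)


print(tag_decision("hive", "db.customers.ssn", {"privacy_officer"}))  # True
print(tag_decision("hbase", "customers:ssn", {"analyst"}))            # False
print(tag_decision("kafka", "clickstream", {"analyst"}))              # None

# Tagging a new resource applies the existing policy without any policy change.
resource_tags[("kafka", "clickstream")].add("PII")
print(tag_decision("kafka", "clickstream", {"analyst"}))              # False
```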
To achieve the best scalability, it is advised to use the tag-based approach with role-based authorization where possible.