When Kudu was first introduced as a part of CDH in 2017, it didn’t support any kind of authorization, so only air-gapped and non-secure use cases were satisfied. Coarse-grained authorization was added along with authentication in CDH 5.11 (Kudu 1.3.0), which made it possible to restrict access to Apache Impala alone, where Apache Sentry policies could be applied, enabling many more use cases. Direct integration with Sentry in CDH 6.3 then made it possible for customers to access Kudu with the same privileges regardless of the query method. Finally, in CDP Private Cloud Base 7.1.5 and 7.2.6, Kudu is fully integrated with Ranger. In this post, we’ll cover how this works and how to set it up.
How it works
Ranger consists of an Admin server that has a web UI and a REST API with which admins can create policies. The policies are stored in a database and are periodically fetched and cached by a Ranger plugin, which runs on the Kudu Masters.
The Ranger plugin is responsible for authorizing requests against the cached policies. The Ranger plugin base is available only in Java, as most Hadoop ecosystem projects, including Ranger, are written in Java. Unlike Sentry’s thin client, which we reimplemented in C++, the Ranger plugin evaluates the policies locally, and these policies are much richer and more complex than Sentry policies, so we decided not to reimplement it in C++. Instead, each Kudu Master spawns a JVM child process that is effectively a wrapper around the Ranger plugin, and communicates with it via named pipes.
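The parent/child arrangement described above can be illustrated with a small sketch. This is not Kudu’s implementation: it assumes a Python child instead of a JVM, ordinary stdin/stdout pipes instead of named pipes, and a made-up line-based JSON protocol. The policy cache, user names, and action names are all hypothetical. It only shows the shape of the design, in which a long-lived child holds the cached policies and answers authorization requests sent by the parent.

```python
import json
import subprocess
import sys

# Hypothetical child process: stands in for the JVM wrapper around the Ranger
# plugin. It keeps a tiny in-memory "policy cache" and answers one JSON
# request per input line with one JSON response line.
CHILD_SOURCE = r'''
import json, sys
CACHED_POLICIES = {("etl", "foo.bar"): {"select", "insert"}}  # made-up cache
for line in sys.stdin:
    req = json.loads(line)
    granted = CACHED_POLICIES.get((req["user"], req["table"]), set())
    sys.stdout.write(json.dumps({"allowed": req["action"] in granted}) + "\n")
    sys.stdout.flush()
'''


class AuthzSubprocess:
    """Parent-side handle; plays the role of the Kudu Master in this sketch."""

    def __init__(self):
        # Spawn the child once and keep it running for all later requests.
        self.proc = subprocess.Popen(
            [sys.executable, "-c", CHILD_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def is_allowed(self, user, table, action):
        # Send one request line, then block until the child replies.
        request = {"user": user, "table": table, "action": action}
        self.proc.stdin.write(json.dumps(request) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())["allowed"]

    def close(self):
        # Closing stdin ends the child's read loop, so it exits cleanly.
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()
```

Keeping the child alive across requests is the point of the design: the cost of starting the process and loading policies is paid once, and each authorization check is a short round trip over the pipe.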
Setting up Kudu with Ranger
Setting up Ranger authorization for Kudu in Cloudera Manager is simple: if both Ranger and Kudu are installed in CDP, you only need to select the Ranger service in Kudu’s configuration.
Cloudera Manager will automatically configure Ranger for Kudu and vice versa. The rest of the Ranger-specific options don’t need to be changed. Under the hood, several customizable configuration files (ranger-kudu-security.xml, ranger-kudu-policymgr-ssl.xml, and ranger-kudu-audit.xml) are created for Kudu.
After setting up the integration, it’s time to create some policies, because at this point only trusted users are allowed to perform any action and everyone else is locked out. Resource-based access control policies can be set up for Kudu in Ranger, but Kudu currently doesn’t support tag-based policies, row-level filtering, or column masking.
To create your first policy, log in to Ranger Admin. You can reach it from Cloudera Manager by navigating to the Ranger service and clicking “Ranger Admin Web UI” in the service’s tab navigation panel. Click the “cm_kudu” service, then the “Add New Policy” button in the top right corner. You’ll need to name the policy and set the resource it will apply to. Kudu doesn’t support databases, but with Ranger integration enabled, it treats the part of the table name before the first period as the database name, and falls back to “default” if the table name doesn’t contain a period (the fallback is configurable with the -ranger_default_database flag in “Master Advanced Configuration Snippet (Safety Valve) for gflagfile”).
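The naming rule can be written down as a short sketch. The function name split_ranger_table_name is hypothetical, and treating everything after the first period as the table name is an assumption based on the description above. Edge cases such as a name that starts with a period are not handled here.

```python
def split_ranger_table_name(table_name, default_database="default"):
    """Return (database, table) for a Kudu table name, per the rule above.

    default_database mirrors the role of the -ranger_default_database flag.
    """
    database, separator, table = table_name.partition(".")
    if not separator:
        # No period: the whole name is the table, in the default database.
        return default_database, table_name
    # Split at the FIRST period only; later periods stay in the table part.
    return database, table


print(split_ranger_table_name("foo.bar"))      # ('foo', 'bar')
print(split_ranger_table_name("bar"))          # ('default', 'bar')
print(split_ranger_table_name("foo.bar.baz"))  # ('foo', 'bar.baz')
```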
There is no implicit hierarchy in the resources, which means that granting privileges on db=foo won’t imply privileges on foo.bar. To create a policy that applies to all tables and all columns in the foo database, you need to create a policy for db=foo->tbl=*->col=*. To create tables in the foo database (that is, tables whose names start with “foo.”), granting the “create” privilege on db=foo is sufficient and table-level privileges are not required. The “metadata” privilege should still be granted on db=foo->tbl=*, as it is required to check whether the newly created table exists, which is the last step of table creation.
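To make the “no implicit hierarchy” rule concrete, here is a deliberately simplified model in Python. It is not Ranger’s evaluation engine, which is considerably richer. In this toy, a policy applies only to requests that name exactly the same resource levels, and each level is compared with a shell-style wildcard match. The policy layout, the user name “etl”, and the “select” action name are assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def policy_matches(policy_resource, request_resource):
    """Toy matcher: a policy applies only at the resource levels it names."""
    # No implicit hierarchy: both sides must name exactly the same levels.
    if set(policy_resource) != set(request_resource):
        return False
    return all(
        fnmatchcase(request_resource[level], pattern)
        for level, pattern in policy_resource.items()
    )


def is_allowed(policies, user, action, request_resource):
    """Allow the request if any policy covers the user, action, and resource."""
    return any(
        user in policy["users"]
        and action in policy["actions"]
        and policy_matches(policy["resource"], request_resource)
        for policy in policies
    )


# Hypothetical policies for an ETL user working in the "foo" database.
POLICIES = [
    # Database-level grant: enough to create tables in foo.
    {"users": {"etl"}, "actions": {"create"}, "resource": {"db": "foo"}},
    # Column-level grant covering every table and column in foo.
    {
        "users": {"etl"},
        "actions": {"select"},
        "resource": {"db": "foo", "tbl": "*", "col": "*"},
    },
]

column_request = {"db": "foo", "tbl": "bar", "col": "id"}
print(is_allowed(POLICIES, "etl", "create", {"db": "foo"}))    # True
print(is_allowed(POLICIES, "etl", "select", column_request))   # True
# With only the db=foo policy, nothing is implied for foo.bar's columns.
print(is_allowed(POLICIES[:1], "etl", "select", column_request))  # False
```

The strict level comparison is the simplification that drives the result: a grant on db=foo says nothing about tables or columns, so they need their own policy.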
For the list of privileges required to perform each operation, please refer to our documentation.
Once the policies are set up in Ranger, Kudu will apply them when authorizing actions from any client. Impala, however, works a bit differently.
Accessing Kudu tables in Impala
Impala is not just a Kudu client; it’s an analytic database that supports multiple storage systems, including, but not limited to, Kudu. It also authorizes all actions using Ranger before performing them. Because of this, a second, Kudu-specific authorization step is not necessary, and the “impala” user is whitelisted in Kudu, so Kudu authorization is bypassed when Impala performs an action.
It is important to note that Kudu stores all its metadata internally, while Impala depends on Hive Metastore. As such, Impala authorizes requests against the policies in the Hadoop SQL repository, including requests on Kudu-backed tables.
Let’s take a common use case as an example: several Apache Spark ETL jobs store data in Kudu. This data is then accessed by other Spark jobs and also by data analysts through ad-hoc Impala queries. In this case, a Kudu policy can be set up to allow the ETL user to read and write data in all tables, and separate Hadoop SQL policies can be defined to let different groups of people read specific databases or tables via Impala.
Ranger supports granting privileges to table owners via a special OWNER user. You can, for example, grant the ALL privilege and delegate admin (which is required to change the owner of a table) to OWNER on db=*->tbl=*->col=*. This way your users will be able to perform any action on the tables they created without privileges having to be assigned explicitly per table. They will, of course, need to be granted the CREATE privilege on db=* or on a specific database to actually be able to create their own tables. A table is automatically owned by the user who created it, though it’s possible to change the owner as a part of an alter table operation.
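The effect of an OWNER grant can be sketched in a few lines. This is a toy model, not Ranger’s logic: the function name user_is_covered, the "{OWNER}" placeholder string, and the user names are assumptions made for illustration. The idea is that a policy naming the special owner user covers whoever currently owns the table being accessed.

```python
OWNER = "{OWNER}"  # placeholder used in this sketch for the special owner user


def user_is_covered(policy_users, requesting_user, table_owner):
    """Return True if the policy's user list covers the requesting user."""
    if requesting_user in policy_users:
        # The user is named in the policy directly.
        return True
    # Otherwise the policy covers the user only through ownership.
    return OWNER in policy_users and requesting_user == table_owner


owner_policy_users = {OWNER}
# alice created the table, so she owns it and the OWNER policy covers her.
print(user_is_covered(owner_policy_users, "alice", table_owner="alice"))  # True
print(user_is_covered(owner_policy_users, "bob", table_owner="alice"))    # False
# After the owner is changed to bob, coverage follows the new owner.
print(user_is_covered(owner_policy_users, "bob", table_owner="bob"))      # True
```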
Security is a very important part of a data platform and we at Cloudera know that. We’re continuously working on how to make CDP more secure and simpler to manage while maintaining security. Fine-grained authorization in Kudu using Ranger is the latest step in this endeavor and there is more to come in the near future, so keep an eye out for future posts where we share more information on what’s coming next.
The post Fine-Grained Authorization with Apache Kudu and Apache Ranger appeared first on Cloudera Blog.