The Data Integration service of DataWorks supports third-party identity authentication mechanisms. Before you use an authentication mechanism to perform identity authentication, you must upload the required authentication files on the Authentication File Management page of the DataWorks console. Then, you must enable third-party authentication when you add a data source. This way, only trusted applications and services can access the data source. This topic describes how to upload and reference an authentication file.
Background information
Third-party identity authentication mechanisms are used to perform strict identity authentication on users and services. These mechanisms prevent untrusted applications or services from accessing data and improve the security of data access during data synchronization. On the Authentication File Management page of the DataWorks console, you can manage authentication files in a centralized manner. You can upload an authentication file and view the data sources that reference the authentication file on this page.
Limits
DataWorks supports only Kerberos authentication. Other authentication mechanisms will be available in the future. For information about how to configure Kerberos authentication, see the Appendix: Configure Kerberos authentication section in this topic.
Precautions
In most cases, a certificate has a validity period. If the certificate that you uploaded expires, the synchronization task that uses the certificate fails because the task can no longer be authorized to access the related data source. Track the validity period of each certificate that you upload, and upload a valid replacement before the certificate expires.
Upload an authentication file
Before you use an identity authentication mechanism, you must upload the required authentication files on the Authentication File Management page of the DataWorks console.
Go to the Data Integration page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.
In the left-side navigation pane of the Data Integration page, go to the Authentication File Management page.
On the Authentication File Management page, click Upload Authentication File in the upper-right corner.
In the Upload Authentication File dialog box, click Upload File, select a file that you want to upload from your on-premises machine, configure File Description, and then click OK.
Reference an authentication file
If you want to use third-party identity authentication, you must set the Special Authentication Method parameter, configure the related parameters, and reference the uploaded authentication files when you add a data source. DataWorks supports only Kerberos authentication. For information about how to configure Kerberos authentication, see the Appendix: Configure Kerberos authentication section in this topic.
The following table describes the parameters you must configure after you set the Special Authentication Method parameter to Kerberos Authentication when you add an HDFS data source. For information about how to add a data source, see Add a data source.
Parameter | Description |
Special Authentication Method | Set this parameter to Kerberos Authentication. |
Keytab File | Select an uploaded keytab file from the Keytab File drop-down list. If you want to upload a new keytab file, click Add Authentication File. |
CONF File | Select a CONF file from the CONF File drop-down list. If you want to upload a new CONF file, click Add Authentication File. |
principal | The Kerberos principal, which consists of the principal name, instance name, and domain name (the Kerberos realm). Configure this parameter in the format of Principal name/Instance name@Domain name. Example: ****/hadoopclient@**.***. |
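The principal format described in the table can be checked before you enter the value in the console. The following is a minimal sketch, not part of DataWorks: the `Principal` type, the `parse_principal` function, the regular expression, and the sample value `hdfs/hadoopclient@EXAMPLE.COM` are all hypothetical names chosen for illustration.

```python
import re
from typing import NamedTuple


class Principal(NamedTuple):
    """The three parts of a principal as named in the parameter table."""
    name: str
    instance: str
    realm: str


# Hypothetical pattern for "Principal name/Instance name@Domain name":
# exactly one "/" and one "@", and no whitespace inside any part.
_PRINCIPAL_RE = re.compile(r"^([^/@\s]+)/([^/@\s]+)@([^/@\s]+)$")


def parse_principal(value: str) -> Principal:
    """Split a principal string into its three parts or raise ValueError."""
    match = _PRINCIPAL_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not in name/instance@domain format: {value!r}")
    return Principal(*match.groups())


# Sample value is an assumption; the real principal comes from your KDC.
print(parse_principal("hdfs/hadoopclient@EXAMPLE.COM"))
```

A check like this only confirms the shape of the string. Whether the principal actually exists is decided by the KDC when the data source connects.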
Other operations
You can upload a new authentication file and view the data sources that reference an uploaded authentication file on the Authentication File Management page. You can also delete multiple authentication files at a time on this page.
Appendix: Configure Kerberos authentication
The Data Integration service of DataWorks supports only Kerberos authentication. After you configure Kerberos authentication, only trusted applications and services can pass identity authentication and access data.
Kerberos is a computer network security protocol for authentication. It supports single sign-on (SSO). You need to provide your credentials only once to obtain a Ticket Granting Ticket (TGT). Then, you can use the TGT to access multiple services. Kerberos provides high security. When you use Kerberos, a shared key is created between a trusted client and the server. Clients communicate with servers by using keys. This way, untrusted services or applications cannot access data.
Limits
For CDH clusters, only CDH 6.x clusters support Kerberos authentication. CDH clusters of other versions or self-managed CDH clusters for which Kerberos authentication tests are not performed may fail the authentication.
You can enable Kerberos authentication only for HBase, HDFS, and Hive data sources. Other types of data sources will also support Kerberos authentication in the future.
Only the data sources that are connected to exclusive resource groups for Data Integration support Kerberos authentication.
How Kerberos authentication works
Kerberos is a third-party authentication protocol that is based on symmetric keys. Clients and servers use a Key Distribution Center (KDC) to perform identity authentication. The KDC is the server program of Kerberos and distributes TGTs. For more information about Kerberos, see Introduction to Kerberos.
The preceding figure shows the four stages of the Kerberos authentication on DataWorks.
A client requests a TGT: When a client (principal) accesses a data source for which Kerberos authentication is enabled, the client requests a TGT from KDC.
KDC grants a TGT: After KDC receives a request from the client, KDC authenticates the identity of the client. If the client passes the authentication, KDC grants an encrypted TGT that has a specific validity period to the client.
The client requests to access a specific service: After the client obtains the TGT, the client requests to access specific service resources from the server based on the service name.
The server authenticates the identity of the client: After the server receives the request from the client, the server authenticates the identity of the client. If the client passes the authentication, the client can access the service resources.
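The four stages above can be sketched as a toy model. This is not real Kerberos and not how DataWorks implements it: tickets here are only tagged with HMAC-SHA256 rather than encrypted, and `ToyKDC`, `issue`, `is_valid`, `server_accepts`, the keys, and the principal `alice` are hypothetical. The sketch also makes one step explicit that the stage list folds into stage 3: in the standard protocol, the client exchanges the TGT at the KDC for a ticket bound to the requested service.

```python
import hashlib
import hmac
import json


def _tag(key: bytes, payload: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def issue(key: bytes, payload: dict) -> dict:
    """Toy ticket: the payload plus a tag only a holder of `key` can produce."""
    return {"payload": payload, "tag": _tag(key, payload)}


def is_valid(key: bytes, ticket: dict, now: int) -> bool:
    """Check the tag and the validity period of a toy ticket."""
    untampered = hmac.compare_digest(_tag(key, ticket["payload"]), ticket["tag"])
    return untampered and ticket["payload"]["expires"] > now


class ToyKDC:
    """Hypothetical stand-in for a KDC; it knows every client and service key."""

    def __init__(self, client_keys: dict, service_keys: dict, lifetime: int = 600):
        self._client_keys = client_keys
        self._service_keys = service_keys
        self._tgt_key = b"kdc-only-secret"
        self._lifetime = lifetime

    def grant_tgt(self, principal: str, proof: str, now: int) -> dict:
        # Stages 1 and 2: the client requests a TGT and the KDC authenticates it.
        key = self._client_keys.get(principal)
        if key is None or not hmac.compare_digest(proof, _tag(key, {"principal": principal})):
            raise PermissionError("unknown principal or wrong key")
        return issue(self._tgt_key, {"principal": principal, "expires": now + self._lifetime})

    def grant_service_ticket(self, tgt: dict, service: str, now: int) -> dict:
        # Stage 3: the TGT is exchanged for a ticket bound to one service name.
        if not is_valid(self._tgt_key, tgt, now):
            raise PermissionError("TGT is invalid or expired")
        payload = {"principal": tgt["payload"]["principal"], "service": service,
                   "expires": tgt["payload"]["expires"]}
        return issue(self._service_keys[service], payload)


def server_accepts(service_key: bytes, service: str, ticket: dict, now: int) -> bool:
    # Stage 4: the server verifies the ticket with the key it shares with the KDC.
    return is_valid(service_key, ticket, now) and ticket["payload"]["service"] == service


client_key, hdfs_key = b"client-secret", b"hdfs-secret"
kdc = ToyKDC({"alice": client_key}, {"hdfs": hdfs_key})
tgt = kdc.grant_tgt("alice", _tag(client_key, {"principal": "alice"}), now=0)
ticket = kdc.grant_service_ticket(tgt, "hdfs", now=10)
print(server_accepts(hdfs_key, "hdfs", ticket, now=20))
```

Time is passed in as a plain integer so that the validity period is easy to follow: the same ticket is rejected once `now` reaches the `expires` value, which mirrors why an expired credential makes a synchronization task fail.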
A keytab file and a krb5.conf file are required for Kerberos authentication. The krb5.conf file stores the configurations of the KDC server. The keytab file stores the credentials of resource principals, namely the principal names and their encrypted keys. Before you perform Kerberos authentication, you must upload the keytab file and the krb5.conf file on the Authentication File Management page of the DataWorks console. Then, when you add a data source, reference the uploaded files and configure a principal. For information about how to upload the required authentication files, see the Upload an authentication file section in this topic.
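Before you upload the two files, a quick local check can catch an obviously wrong file. The following sketch relies on two properties of the standard formats: a keytab file starts with the byte 0x05 followed by a format version of 1 or 2, and krb5.conf is an INI-style file whose [libdefaults] section can declare default_realm. The function names, the sample configuration, and the host `kdc.example.com` are hypothetical, and comparing the principal's realm with default_realm is only a heuristic, because a principal may legitimately belong to another realm listed in the file.

```python
import re


def looks_like_keytab(data: bytes) -> bool:
    """Keytab files begin with byte 0x05 followed by a format version (1 or 2)."""
    return len(data) >= 2 and data[0] == 0x05 and data[1] in (0x01, 0x02)


def default_realm(krb5_conf_text: str):
    """Return default_realm from the [libdefaults] section, or None if absent."""
    section = None
    for raw in krb5_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            section = header.group(1)
            continue
        if section == "libdefaults" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "default_realm":
                return value
    return None


def realm_matches(principal: str, krb5_conf_text: str) -> bool:
    """True when the part after '@' in the principal equals default_realm."""
    realm = principal.rpartition("@")[2]
    return realm == default_realm(krb5_conf_text)


# Sample content is an assumption; use the krb5.conf exported from your cluster.
SAMPLE_CONF = """\
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
    }
"""

print(looks_like_keytab(b"\x05\x02rest-of-file"))
print(default_realm(SAMPLE_CONF))
print(realm_matches("hdfs/hadoopclient@EXAMPLE.COM", SAMPLE_CONF))
```

To check a real file, read it in binary mode and pass the bytes to `looks_like_keytab`. The check does not prove that the keys inside are current, only that the file is of the expected type.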
Data source types that support Kerberos authentication
The following table lists the data source types that support Kerberos authentication and the configuration guide of Kerberos authentication for these types of data sources.
Data source type | References |
HBase | |
HDFS | |
Hive | |