HBase is a real-time database that provides high write performance in the Hadoop ecosystem. OSS-HDFS is a storage service released by Alibaba Cloud that is compatible with the HDFS API. JindoSDK allows HBase to use OSS-HDFS as the underlying storage and supports the storage of write-ahead log (WAL) files. This way, computing and storage resources are separated. OSS-HDFS is more flexible than local HDFS storage and reduces O&M costs.
Prerequisites
An Elastic Compute Service (ECS) instance is created. For more information, see Create an instance.
A Hadoop environment is created. For more information about how to install Hadoop, see Step 2: Create a Hadoop runtime environment.
Apache HBase is deployed. For more information, see Apache HBase.
OSS-HDFS is enabled for a bucket and permissions are granted to access OSS-HDFS. For more information, see Enable OSS-HDFS and grant access permissions.
Procedure
Connect to the ECS instance. For more information, see Connect to an instance.
Configure JindoSDK.
Download the latest version of the JindoSDK JAR package. To download JindoSDK, visit GitHub.
Optional. If the Kerberos-related and SASL-related dependencies are not included in your environment, install the following dependencies on all nodes on which JindoSDK is deployed:
Ubuntu or Debian
sudo apt-get install libkrb5-dev krb5-admin-server krb5-kdc krb5-user libsasl2-dev libsasl2-modules libsasl2-modules-gssapi-mit
Red Hat Enterprise Linux or CentOS
sudo yum install krb5-server krb5-workstation cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain
macOS
brew install krb5
Decompress the downloaded installation package.
The following sample code shows how to decompress a package named jindosdk-x.x.x-linux.tar.gz. If you use another version of JindoSDK, replace the package name with the name of the corresponding package.

tar -zxvf jindosdk-x.x.x-linux.tar.gz -C /usr/lib

Note: x.x.x indicates the version number of the JindoSDK package.
Configure JINDOSDK_HOME.

export JINDOSDK_HOME=/usr/lib/jindosdk-x.x.x-linux
export PATH=$JINDOSDK_HOME/bin:$PATH
Configure HADOOP_CLASSPATH.

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${JINDOSDK_HOME}/lib/*
Important: Specify the installation directory of the package and configure the environment variables on all required nodes.
Copy the downloaded JindoSDK JAR packages to the Hadoop classpath.

cp jindosdk-x.x.x-linux/lib/jindo-core-x.x.x.jar <HADOOP_HOME>/share/hadoop/hdfs/lib/
cp jindosdk-x.x.x-linux/lib/jindo-sdk-x.x.x.jar <HADOOP_HOME>/share/hadoop/hdfs/lib/
Configure the implementation class of OSS-HDFS and specify the AccessKey pair that you want to use to access the bucket.
Configure the implementation class of OSS-HDFS in the core-site.xml file of HBase.
<configuration>
    <property>
        <name>fs.AbstractFileSystem.oss.impl</name>
        <value>com.aliyun.jindodata.oss.JindoOSS</value>
    </property>
    <property>
        <name>fs.oss.impl</name>
        <value>com.aliyun.jindodata.oss.JindoOssFileSystem</value>
    </property>
</configuration>
In the core-site.xml file of HBase, specify the AccessKey ID and AccessKey secret that you want to use to access the bucket for which OSS-HDFS is enabled.
<configuration>
    <property>
        <name>fs.oss.accessKeyId</name>
        <value>LTAI********</value>
    </property>
    <property>
        <name>fs.oss.accessKeySecret</name>
        <value>KZo1********</value>
    </property>
</configuration>
Configure the endpoint of OSS-HDFS.
You must specify the endpoint of OSS-HDFS if you want to use OSS-HDFS to access buckets in Object Storage Service (OSS). We recommend that you configure the access path in the oss://<Bucket>.<Endpoint>/<Object> format. Example: oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampleobject.txt. After you configure the access path, JindoSDK calls the corresponding OSS-HDFS operation based on the endpoint that is specified in the access path.
You can also configure the endpoint of OSS-HDFS by using other methods. The endpoints that are configured by using different methods have different priorities. For more information, see Appendix 1: Other methods used to configure the endpoint of OSS-HDFS.
Specify a storage path for HBase.
To store HBase data and WAL files in OSS-HDFS, set the hbase.rootdir parameter in the hbase-site.xml configuration file to an OSS path in the oss://<Bucket>.<Endpoint>/<hbase-root-dir> format.
Important: Before you release a cluster, you must disable the tables and make sure that all update operations recorded in the WAL files are synchronized to HFiles.
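Putting the step above together, the hbase-site.xml entry might look as follows. This is a sketch: examplebucket, the cn-shanghai endpoint, and hbase-root-dir are placeholder values; substitute the bucket, region endpoint, and root directory of your own deployment.

```
<configuration>
    <!-- Store HBase data and WAL files in OSS-HDFS.
         examplebucket, cn-shanghai, and hbase-root-dir are placeholders. -->
    <property>
        <name>hbase.rootdir</name>
        <value>oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/hbase-root-dir</value>
    </property>
</configuration>
```

After you restart HBase with this configuration, newly written data and WAL files are persisted to the specified OSS-HDFS path instead of local HDFS.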