This topic describes how to use Apache Sqoop on an E-MapReduce (EMR) cluster to read data from and write data to OSS-HDFS.
Prerequisites
A cluster of EMR V3.42.0 or later, or EMR V5.8.0 or later is created. For more information, see Create a cluster.
OSS-HDFS is enabled for a bucket and permissions are granted to access OSS-HDFS. For more information about how to enable OSS-HDFS, see Enable OSS-HDFS and grant access permissions.
Procedure
- Connect to the EMR cluster.
- Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
- Click the EMR cluster that you created.
- Click the Nodes tab, and then click the plus icon (+) on the left side of the node group.
- Click the ID of the ECS instance. On the Instances page, click Connect next to the instance ID.
For more information about how to log on to a cluster in Windows or Linux by using an SSH key pair or SSH password, see Log on to a cluster.
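Before you run Sqoop commands, you can optionally confirm that Sqoop is available on the node and that the node can access the bucket for which OSS-HDFS is enabled. The following two commands are only a minimal check; they assume the examplebucket bucket name and the cn-hangzhou endpoint that are used in the examples in this topic. Replace them with your own bucket name and region.
sqoop version    # confirms that the Sqoop client is installed on the node
hdfs dfs -ls oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/    # confirms that the node can reach the OSS-HDFS bucket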
- Import data from MySQL to OSS-HDFS.
sudo sqoop import --connect <dburi>/<dbname> --username <username> --password <password> --table <tablename> --target-dir <oss-dir> --temporary-rootdir <oss-tmpdir> --check-column <col> --incremental <mode> --last-value <value> --as-<format> -m <count>
- The following table describes the parameters in the preceding command.
| Parameter | Required | Description |
| --- | --- | --- |
| dburi | Yes | The URI that is used to access the database. Example: jdbc:mysql://192.168.xxx.xxx:3306/. |
| dbname | Yes | The name of the database. |
| username | Yes | The username that you want to use to log on to the database. |
| password | Yes | The password of the user. |
| tablename | Yes | The name of the MySQL table. |
| oss-dir | Yes | The OSS-HDFS directory from which you want to read data or to which you want to write data. Example: oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/dir/. |
| oss-tmpdir | No | The directory to which data is temporarily written. You must specify this parameter if the mode parameter is set to append. In append mode, Apache Sqoop imports data to the temporary directory first, and then renames the files and moves them to the destination directory. If the destination directory already exists in OSS-HDFS, Apache Sqoop does not import data to the directory or overwrite data in the directory. |
| col | No | The check column that you want to use to determine the rows to be imported in incremental import scenarios. |
| mode | No | The incremental import mode. Valid values: append (data is imported in increments based on an incremental column) and lastmodified (data is imported in increments based on a time column). |
| value | No | The maximum value of the check column in the previous incremental import. |
| format | No | The format in which you want to store an object. Valid values: avrodatafile, sequencefile, textfile (default), and parquetfile. |
| count | No | The number of MapReduce tasks. |
- Examples:
The following sample code provides an example on how to import data from the src_kv table in MySQL to a specified directory in examplebucket:
sudo sqoop import --connect jdbc:mysql://master-1-1/sqoop_test --username root --password password1 --table src_kv -m 1 --target-dir oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/tmp/sqoop_kv --as-parquetfile
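The preceding command performs a full import. If you want to combine the incremental import parameters from the preceding table (check-column, incremental, last-value, and temporary-rootdir), you can use a command similar to the following sketch. The command is only an illustration: it assumes that the src_kv table has an auto-increment id column and that rows with id values up to 100 have already been imported. Adjust the column name and value to match your table.
sudo sqoop import --connect jdbc:mysql://master-1-1/sqoop_test --username root --password password1 --table src_kv --check-column id --incremental append --last-value 100 --target-dir oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/tmp/sqoop_kv --temporary-rootdir oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/tmp/sqoop_tmp -m 1
After the command completes, you can run hdfs dfs -ls oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/tmp/sqoop_kv to confirm that new files were added to the destination directory.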