This topic provides answers to some frequently asked questions about Alluxio.
- What do I do if the error message "No FileSystem for scheme: alluxio" appears?
- What do I do if Alluxio does not work as expected?
- How do I accelerate access to data in OSS by caching data?
- How can I modify Alluxio-related parameters?
What do I do if the error message "No FileSystem for scheme: alluxio" appears?
This issue occurs if you add the Alluxio service after the E-MapReduce (EMR) cluster is created but do not restart the service. It does not occur if you select Alluxio from the optional services when you create the cluster.
After you add the Alluxio service, you must restart the Alluxio service to load Hadoop configurations. For more information about how to restart a service, see Restart a service.
What do I do if Alluxio does not work as expected?
Use the error message to identify the node on which the Alluxio service is abnormal, and then check the Alluxio service logs on that node for troubleshooting.
In most cases, the Alluxio service logs are stored in the /mnt/disk1/log/alluxio/ directory.
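For example, the following commands sketch how to inspect the logs on the abnormal node. The file names master.log and worker.log are Alluxio's default log file names and are assumptions here; your deployment may use different names under the same directory.

```shell
# Show recent errors from the Alluxio master log (master.log is the
# default file name and may differ in your deployment).
grep -i error /mnt/disk1/log/alluxio/master.log | tail -n 20

# Show the most recent entries from the Alluxio worker log.
tail -n 100 /mnt/disk1/log/alluxio/worker.log
```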
How do I accelerate access to data in OSS by caching data?
By default, HDFS serves as the Under File Storage (UFS) of Alluxio in EMR. To accelerate access to data in OSS, we recommend that you mount an Object Storage Service (OSS) directory to an Alluxio path. Sample command:
alluxio fs mount --option fs.oss.accessKeyId=<OSS_ACCESS_KEY_ID> \
--option fs.oss.accessKeySecret=<OSS_ACCESS_KEY_SECRET> \
--option fs.oss.endpoint=<OSS_ENDPOINT> \
/oss_dir <path>/
- <OSS_ACCESS_KEY_ID>: the AccessKey ID of your Alibaba Cloud account that is used to access OSS.
- <OSS_ACCESS_KEY_SECRET>: the AccessKey secret of your Alibaba Cloud account that is used to access OSS.
- <OSS_ENDPOINT>: the endpoint of OSS, which is in the format of oss-xxxx-internal.aliyuncs.com. You can view the endpoint of OSS in the OSS console. You must create an EMR cluster in the same region as OSS. We recommend that you use an internal endpoint of OSS, such as oss-cn-shanghai-internal.aliyuncs.com.
- <path>: the file storage path in OSS, such as oss://<OSS_YOURBUCKETNAME>/<OSS_DIRECTORY>, where <OSS_YOURBUCKETNAME> is the name of your OSS bucket.
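For example, the following commands sketch a complete mount and a quick check of the mount point. The bucket name, directory, and region below are placeholders for illustration, not values from your environment.

```shell
# Mount an OSS directory to the Alluxio path /oss_dir.
# examplebucket, data/, and the Shanghai endpoint are placeholder values;
# replace them, along with the AccessKey pair, with your own.
alluxio fs mount \
  --option fs.oss.accessKeyId=<OSS_ACCESS_KEY_ID> \
  --option fs.oss.accessKeySecret=<OSS_ACCESS_KEY_SECRET> \
  --option fs.oss.endpoint=oss-cn-shanghai-internal.aliyuncs.com \
  /oss_dir oss://examplebucket/data/

# Verify that the mount point is visible in Alluxio.
alluxio fs ls /oss_dir
```

After the mount succeeds, jobs that read /oss_dir through Alluxio cache the OSS data on first access, so subsequent reads are served from the cache.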
How can I modify Alluxio-related parameters?
- Global parameter configuration
Go to the Alluxio service page in the EMR console and modify parameters. For more information, see Manage parameters for services.
- Dynamic parameter configuration
- Alluxio shell
Log on to your cluster and add an option in the -Dproperty=value format to a command to specify custom configurations. Sample command:
alluxio fs copyFromLocal hello.txt /dir/tmp -Dalluxio.user.file.writetype.default=CACHE_THROUGH
Note: hello.txt is your local file, and /dir/tmp is a directory in Alluxio. For more information about the copyFromLocal command, see Common commands.
- Spark jobs
You can add an option in the -Dproperty=value format to spark.executor.extraJavaOptions for Spark executors and to spark.driver.extraJavaOptions for the Spark driver to pass JVM parameters to Spark jobs.
For example, when you submit a Spark job, set the file write type to CACHE_THROUGH. Sample code snippet:
spark-submit \
  --conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
  --conf 'spark.executor.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
  ...
- MapReduce jobs
You can add an option in the -Dproperty=value format to a hadoop jar or yarn jar command to configure properties for MapReduce jobs.
For example, in a MapReduce job, set the file write type to CACHE_THROUGH. Sample code snippet:
hadoop jar <HADOOP_HOME>/share/hadoop/mapreduce/hadoop-mapreduce-examples-x.x.x.jar wordcount \
  -Dalluxio.user.file.writetype.default=CACHE_THROUGH \
  -libjars /<PATH_TO_ALLUXIO>/client/alluxio-x.x.x-client.jar \
  <path1> <path2>
Note: <path1> is the path of the input files, and <path2> is the path of the output files. x.x.x is the version of a JAR package. <HADOOP_HOME>/share/hadoop/mapreduce/hadoop-mapreduce-examples-x.x.x.jar and <PATH_TO_ALLUXIO>/client/alluxio-x.x.x-client.jar are both examples.