This topic provides answers to some frequently asked questions about Alluxio.
- What do I do if the error message "No FileSystem for scheme: alluxio" appears?
- What do I do if Alluxio does not work as expected?
- How do I accelerate access to data in OSS by caching data?
- How can I modify Alluxio-related parameters?
What do I do if the error message "No FileSystem for scheme: alluxio" appears?
This issue occurs if you add the Alluxio service after the E-MapReduce (EMR) cluster is created but do not restart the service. It does not occur if you select Alluxio from the optional services when you create the cluster.
After you add the Alluxio service, you must restart the Alluxio service to load Hadoop configurations. For more information about how to restart a service, see Restart a service.
What do I do if Alluxio does not work as expected?
Use the error message to identify the node on which the Alluxio service is abnormal, and then check the Alluxio service logs on that node for troubleshooting.
In most cases, the Alluxio service logs are stored in the /mnt/disk1/log/alluxio/ directory.
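For example, the following commands sketch how to inspect the logs on the abnormal node. The file names master.log and worker.log are Alluxio's default log file names and are assumptions here; your deployment may use different names under the same directory.

```shell
# Show recent errors from the Alluxio master log (master.log is the
# default file name and may differ in your deployment).
grep -i error /mnt/disk1/log/alluxio/master.log | tail -n 20

# Show the most recent entries from the Alluxio worker log.
tail -n 100 /mnt/disk1/log/alluxio/worker.log
```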
How do I accelerate access to data in OSS by caching data?
By default, HDFS serves as the Under File Storage (UFS) of Alluxio in EMR. To accelerate access to data in OSS, we recommend that you mount an Object Storage Service (OSS) directory to an Alluxio path. Sample command:
alluxio fs mount --option fs.oss.accessKeyId=<OSS_ACCESS_KEY_ID> \
--option fs.oss.accessKeySecret=<OSS_ACCESS_KEY_SECRET> \
--option fs.oss.endpoint=<OSS_ENDPOINT> \
/oss_dir <path>/
- <OSS_ACCESS_KEY_ID>: the AccessKey ID of your Alibaba Cloud account that is used to access OSS.
- <OSS_ACCESS_KEY_SECRET>: the AccessKey secret of your Alibaba Cloud account that is used to access OSS.
- <OSS_ENDPOINT>: the endpoint of OSS, which is in the format of oss-xxxx-internal.aliyuncs.com. You can view the endpoint of OSS in the OSS console. You must create an EMR cluster in the same region as OSS. We recommend that you use an internal endpoint of OSS, such as oss-cn-shanghai-internal.aliyuncs.com.
- <path>: the file storage path in OSS, such as oss://<OSS_YOURBUCKETNAME>/<OSS_DIRECTORY>, where <OSS_YOURBUCKETNAME> is the name of your OSS bucket.
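For example, the following commands sketch a complete mount and a quick check of the mount point. The bucket name, directory, and region below are placeholders for illustration, not values from your environment.

```shell
# Mount an OSS directory to the Alluxio path /oss_dir.
# examplebucket, data/, and the Shanghai endpoint are placeholder values;
# replace them, along with the AccessKey pair, with your own.
alluxio fs mount \
  --option fs.oss.accessKeyId=<OSS_ACCESS_KEY_ID> \
  --option fs.oss.accessKeySecret=<OSS_ACCESS_KEY_SECRET> \
  --option fs.oss.endpoint=oss-cn-shanghai-internal.aliyuncs.com \
  /oss_dir oss://examplebucket/data/

# Verify that the mount point is visible in Alluxio.
alluxio fs ls /oss_dir
```

After the mount succeeds, jobs that read /oss_dir through Alluxio cache the OSS data on first access, so subsequent reads are served from the cache.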
How can I modify Alluxio-related parameters?
- Global parameter configuration
Go to the Alluxio service page in the EMR console and modify parameters. For more information, see Manage parameters for services.
- Dynamic parameter configuration
- Alluxio shell
Log on to your cluster and add an option in the -Dproperty=value format to a command to specify custom configurations. Sample command:
alluxio fs copyFromLocal hello.txt /dir/tmp -Dalluxio.user.file.writetype.default=CACHE_THROUGH
Note: hello.txt is your local file, and /dir/tmp is a directory in Alluxio. For more information about the copyFromLocal command, see Common commands.
- Spark jobs
You can add an option in the -Dproperty=value format to spark.executor.extraJavaOptions for Spark executors and to spark.driver.extraJavaOptions for the Spark driver to pass JVM parameters to Spark jobs.
For example, when you submit a Spark job, set the file write type to CACHE_THROUGH. Sample code snippet:
spark-submit \
  --conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
  --conf 'spark.executor.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
  ...
- MapReduce jobs
You can add an option in the -Dproperty=value format to a hadoop jar or yarn jar command to configure properties for MapReduce jobs.
For example, in a MapReduce job, set the file write type to CACHE_THROUGH. Sample code snippet:
hadoop jar <HADOOP_HOME>/share/hadoop/mapreduce/hadoop-mapreduce-examples-x.x.x.jar wordcount \
  -Dalluxio.user.file.writetype.default=CACHE_THROUGH \
  -libjars /<PATH_TO_ALLUXIO>/client/alluxio-x.x.x-client.jar \
  <path1> <path2>
Note: <path1> is the path of the input files, and <path2> is the path of the output files. x.x.x is the version of a JAR package. <HADOOP_HOME>/share/hadoop/mapreduce/hadoop-mapreduce-examples-x.x.x.jar and <PATH_TO_ALLUXIO>/client/alluxio-x.x.x-client.jar are both examples.