This topic provides answers to some frequently asked questions about Flume.
- What do I do if the number of logs that are written to Hive is less than the number of logs that are generated?
- What do I do if a DeadLock error occurs when I terminate the Flume process?
- How do I handle the occasional exception that occurs on File Channel after I run the kill -9 command to forcibly terminate the Flume process?
What do I do if the number of logs that are written to Hive is less than the number of logs that are generated?
- Problem description: The number of logs that are written to Hive by using Flume is less than the number of logs that are generated.
- Solution: Add the hdfs.batchSize parameter in the EMR console. For more information, see Add parameters. HDFS Sink uses the hdfs.batchSize parameter to specify the number of events that are written to a file before the data is flushed to HDFS. If the hdfs.batchSize parameter is not specified, the default value 100 is used, and data is flushed to HDFS only after every 100 events. As a result, the most recently generated logs may not yet be visible in Hive when you count them.
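The batch size can also be set directly in the sink configuration file. A minimal sketch, assuming an agent named a1 with an HDFS sink named k1 (the agent name, sink name, and path are illustrative):

```properties
# Illustrative agent/sink names; only hdfs.batchSize is the point here.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs:///user/flume/logs/%Y-%m-%d
# Flush to HDFS after this many events instead of the default 100.
a1.sinks.k1.hdfs.batchSize = 1000
```

A smaller batch size makes data visible sooner at the cost of more frequent HDFS operations; a larger one reduces flush overhead but delays visibility.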
What do I do if a DeadLock error occurs when I terminate the Flume process?
- Problem description: When you invoke the exit method to terminate the Flume process, a DeadLock error occasionally occurs.
- Solution: Run the kill -9 command to forcibly terminate the Flume process.
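If you do not know the process ID, the agent can be located by its main class, which is part of the standard Flume launch command. A minimal sketch (the grep pattern assumes a standard flume-ng launch):

```shell
# Find the PID of the running Flume agent by its main class and force-kill it.
# The bracketed first letter keeps grep from matching its own process.
pid=$(ps -ef | grep '[f]lume.node.Application' | awk '{print $2}' | head -n 1)
if [ -n "$pid" ]; then
  kill -9 "$pid"
else
  echo "No running Flume agent found"
fi
```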
How do I handle the occasional exception that occurs on File Channel after I run the kill -9 command to forcibly terminate the Flume process?
- Problem 1
- Problem description: File Channel is used. After you run the kill -9 command to forcibly terminate the Flume process, the directory lock fails to be obtained. As a result, you cannot restart Flume. The following error message appears: Due to java.io.IOException: Cannot lock data/checkpoints/xxx. The directory is already locked.
- Solution: Delete the in_use.lock file before you restart Flume. We recommend that you run the kill -9 command only when necessary.
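The cleanup can be sketched as follows, assuming the checkpoint directory shown in the error message above (replace the path with your channel's configured checkpointDir):

```shell
# Remove the stale lock file left behind by the killed Flume process.
# The path is illustrative; use the checkpointDir of your File Channel.
rm -f data/checkpoints/in_use.lock
```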
- Problem 2
- Problem description: File Channel is used. After you run the kill -9 command to forcibly terminate the Flume process, the data directories fail to be parsed. As a result, you cannot restart Flume. The following error message appears: org.apache.flume.channel.file.CorruptEventException: Could not parse event from data file.
- Solution: Delete the checkpoint and data directories before you restart Flume. We recommend that you run the kill -9 command only when necessary.
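A minimal sketch of the cleanup, assuming the default File Channel locations (replace these with the checkpointDir and dataDirs values configured for your agent):

```shell
# Remove the corrupted File Channel state so Flume can start cleanly.
# Warning: this discards any events still buffered in the channel.
rm -rf "$HOME/.flume/file-channel/checkpoint" "$HOME/.flume/file-channel/data"
```

Note that deleting the data directories also deletes any events that were queued in the channel but not yet delivered to the sink.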