This topic provides answers to some frequently asked questions about Impala.
What do I do if I cannot find newly added Hive tables in Impala?
Can I change the owner of data when I use Impala to write the data to Hive tables?
How do I adjust the amount of memory that can be used by Impalad?
How do I specify the maximum size of memory that can be consumed by a single query statement?
How do I improve the efficiency of a query statement that does not contain a JOIN clause?
What do I do if newly added Impalad nodes cannot work as expected after I enable Impala for Ranger?
Ten seconds after I use JDBC to query data in Impala 3.4, a message appears, which indicates that the session times out. What do I do?
Set the FETCH_ROWS_TIMEOUT_MS query option to 0 so that fetch requests wait indefinitely and the session does not time out. With JDBC, you can append the option to the connection URL. Sample URL:
jdbc:impala://impala-hive.ymt.io:21050/ymtcube;FETCH_ROWS_TIMEOUT_MS=0
What do I do if I cannot find newly added Hive tables in Impala?
After you perform operations on table metadata in a component other than Impala, run the INVALIDATE METADATA command on Impala to refresh the metadata of a table or all tables in the database.
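The refresh can be sketched from a shell as follows. This is a minimal sketch, assuming impala-shell is installed on the node and can reach an Impalad; the database and table names (sales_db, orders) are hypothetical. By default the script only prints the command; set DRY_RUN=0 to run it.

```shell
#!/bin/sh
# Hypothetical names, used only for illustration.
DB="sales_db"
TABLE="orders"

# Reload the metadata of a single table.
STMT_ONE="INVALIDATE METADATA ${DB}.${TABLE}"
# Reload the metadata of all tables (more expensive on large catalogs).
STMT_ALL="INVALIDATE METADATA"

# Print the command by default; set DRY_RUN=0 to actually call impala-shell.
run_stmt() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "impala-shell -q \"$1\""
    else
        impala-shell -q "$1"
    fi
}

run_stmt "$STMT_ONE"
```

Naming the table keeps the refresh limited to what changed. The form without a table name is an option when many tables were added at once.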
Can I change the owner of data when I use Impala to write the data to Hive tables?
No, the owner of the data that is written to Hive tables by using Impala is Impala and cannot be changed.
How do I adjust the amount of memory that can be used by Impalad?
The mem_limit parameter specifies the amount of memory that Impalad can consume. To configure this parameter, perform the following operations: Log on to the EMR console. Go to the Impala service page. Click the Configure tab and search for the mem_limit parameter in the search box. The default value of the mem_limit parameter is 80%. You can also set this parameter to a specific memory size, such as 10G.
How do I specify the maximum size of memory that can be consumed by a single query statement?
Execute the SET MEM_LIMIT=Xg statement, where X is the limit in gigabytes, to specify the maximum size of memory that can be consumed by a single query statement. The setting takes effect only within the current session.
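Because the setting is scoped to a session, the SET statement and the query need to run in the same session. The following sketch writes both into one script file. It assumes impala-shell is available; the 2g value and the query are hypothetical.

```shell
#!/bin/sh
# Hypothetical values, used only for illustration.
LIMIT="2g"
QUERY="SELECT COUNT(*) FROM sales_db.orders"
SQL_FILE="$(mktemp)"

# SET applies to the session, so the limit and the query share one script.
printf 'SET MEM_LIMIT=%s;\n%s;\n' "$LIMIT" "$QUERY" > "$SQL_FILE"
cat "$SQL_FILE"

# Uncomment to run both statements in a single impala-shell session:
# impala-shell -f "$SQL_FILE"
```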
How do I improve the efficiency of a query statement that does not contain a JOIN clause?
Set the MT_DOP query option to a larger value to increase the number of instances of each fragment that run in parallel on a node.
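One way to choose a value is to time the same query at a few settings. The sketch below only generates the candidate scripts; the query and the values 2, 4, and 8 are hypothetical, and running the scripts assumes impala-shell is available.

```shell
#!/bin/sh
# Hypothetical query, used only for illustration.
QUERY="SELECT COUNT(*) FROM sales_db.orders"

# Emit a two-statement script: set MT_DOP for the session, then run the query.
build_script() {
    printf 'SET MT_DOP=%s;\n%s;\n' "$1" "$QUERY"
}

# Print one candidate script per MT_DOP value so each can be timed separately,
# for example: build_script 4 > q.sql && time impala-shell -f q.sql
for dop in 2 4 8; do
    echo "-- candidate MT_DOP=${dop}"
    build_script "$dop"
done
```

A higher value can increase CPU and memory use on each node, so a larger setting is not necessarily faster for every query.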
What do I do if newly added Impalad nodes cannot work as expected after I enable Impala for Ranger?
Problem description: In EMR V5.6.0 or an earlier minor version, after Impala is enabled for Ranger and nodes are added in the EMR console, the newly added Impalad nodes cannot work as expected.
Cause: When you enable Impala for Ranger, Ranger-related configuration files are copied to the configurations of each Impalad node to allow Impala to support Ranger. However, this operation is not triggered during node addition. As a result, newly added nodes cannot work as expected due to the lack of Ranger-related configurations.
Solution: You can use one of the following methods to resolve the issue:
Method 1: Go to the Status tab of the Ranger service page in the EMR console. Find the RangerAdmin component in the Components section, move the pointer over the icon in the Actions column, and then select enableImpala to enable Impala for Ranger again.
Method 2: Log on to the emr-header-1 node of your cluster. Then, copy the ranger-hive-audit.xml, ranger-hive-security.xml, ranger-policymgr-ssl.xml, and ranger-security.xml files that are stored in the /etc/ecm/impala-conf directory to the /etc/ecm/impala-conf directory of each newly added node.
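The copy step in Method 2 could be scripted along these lines. This is a sketch, assuming it runs on emr-header-1 and that scp can reach the new nodes without a password prompt. The host names emr-worker-3 and emr-worker-4 are hypothetical placeholders for your newly added nodes. The script prints the scp commands for review instead of running them.

```shell
#!/bin/sh
# Hypothetical host names; replace with the newly added nodes.
NEW_NODES="emr-worker-3 emr-worker-4"
CONF_DIR="/etc/ecm/impala-conf"
FILES="ranger-hive-audit.xml ranger-hive-security.xml ranger-policymgr-ssl.xml ranger-security.xml"

# Emit one scp command per file and node.
copy_cmds() {
    for node in $NEW_NODES; do
        for f in $FILES; do
            echo "scp ${CONF_DIR}/${f} ${node}:${CONF_DIR}/"
        done
    done
}

# Print the commands for review; to execute them, run: copy_cmds | sh
copy_cmds
```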