Issue Description:

While working with Hive, we noticed a large number of directories named .hive-staging_hive_yyyy-MM-dd_HH-mm-ss_SSS_xxxx-x inside the table location directories. A new one was added to the location each time a query was executed.
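To get a feel for how many of these directories have piled up, the entries of a directory listing can be filtered by name. The following is only a minimal sketch: it assumes the `xxxx-x` part of the name consists of two numeric fields, and the sample names in `listing` are made up for illustration.

```python
import re

# Assumed name layout: date, time, milliseconds, then two numeric suffixes.
# Treating "xxxx-x" as digits is a guess based on the placeholder in the text.
STAGING_DIR_RE = re.compile(
    r"\.hive-staging_hive_\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}_\d{3}_\d+-\d+"
)


def find_staging_dirs(names):
    """Return the entries whose whole name looks like a Hive staging directory."""
    return [name for name in names if STAGING_DIR_RE.fullmatch(name)]


if __name__ == "__main__":
    # Hypothetical listing of a table location directory.
    listing = [
        ".hive-staging_hive_2015-03-01_10-15-30_123_4567890-1",
        "part-00000",
        "dt=2015-03-01",
    ]
    print(find_staging_dirs(listing))
```

The function only looks at names, so it can be fed the last path component of whatever listing tool is at hand.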

At the same time, a few users were running into permission errors on table directories, even though each table had read access for all users and groups.

Error while compiling statement: FAILED: RuntimeException Cannot create staging directory 'hdfs://namenode:8020/path/to/table/hive-staging_hive_yyyy-MM-dd_HH-mm-ss_SSS_xxxx-x': Permission denied: user=uname, access=WRITE, inode="hdfs://namenode:8020/path/to/table":uname2:hive:drwxrwxr-x at…….

RCA:

While investigating the issue, we found that every Hive query was trying to use the table location as the Hive staging directory. Ideally, the staging directory should be somewhere under the /tmp directory in HDFS. The users facing access problems were the ones who did not have write access to the table location, which is a valid restriction, since you do not want everyone to mess with the data.
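The permission string in the error message shows why the write was refused. The sketch below is a simplified model of the owner/group/other check (it ignores ACLs and the HDFS superuser), and the group memberships passed in are assumptions made for illustration.

```python
def can_write(user, user_groups, owner, group, mode):
    """Simplified check: use owner bits, else group bits, else other bits."""
    perms = mode[-9:]  # drop the leading file-type character, e.g. "d"
    if user == owner:
        return perms[1] == "w"
    if group in user_groups:
        return perms[4] == "w"
    return perms[7] == "w"


if __name__ == "__main__":
    # Values taken from the error message: inode owned by uname2:hive with
    # mode drwxrwxr-x. The group list for "uname" is hypothetical.
    print(can_write("uname", ["users"], "uname2", "hive", "drwxrwxr-x"))
    print(can_write("uname", ["hive"], "uname2", "hive", "drwxrwxr-x"))
```

Under this model, a user who is neither the owner nor a member of the hive group falls through to the "other" bits (r-x), which allow reading but not creating a staging directory.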

After some searching, we found that other CDH users were facing the same issue and had already come up with a workaround. This article explains the root cause analysis of the issue.

Solution:

We had to change a few Hive properties, which are located in hive-site.xml. We used the Hive configuration section of the Cloudera Manager web UI to do it. The safety valves were changed for all roles in the Hive service. After that, members outside the data platform team were able to execute queries without any issues, and no .hive-staging directories were being created in the table location. However, the issue persisted when queries were executed from the Hive CLI, so we had to change the safety valve for the client configuration as well. After deploying the configuration cluster-wide, the issue disappeared.

Configurations:

<property>
  <name>hive.exec.stagingdir</name>
  <value>/tmp/hive-staging/.hive</value>
</property>

OR

<property>
  <name>hive.exec.stagingdir</name>
  <value>${hive.exec.scratchdir}</value>
</property>

Use the second configuration if data security is of high importance.
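After deploying, it can be worth confirming that the property actually landed in the generated hive-site.xml. Here is a minimal sketch that reads a property from the file's contents. It assumes the usual layout of `<property>` elements under a `<configuration>` root, and the inline `SAMPLE` text stands in for a real file.

```python
import xml.etree.ElementTree as ET


def read_hive_property(xml_text, name):
    """Return the <value> of the named property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        prop_name = prop.findtext("name")
        if prop_name is not None and prop_name.strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Stand-in for the contents of a deployed hive-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>hive.exec.stagingdir</name>
    <value>/tmp/hive-staging/.hive</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(read_hive_property(SAMPLE, "hive.exec.stagingdir"))
```

Running the same check against both the service and the client configuration files would catch the case described above, where only one of the two had been updated.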

What Broke It:

We had recently upgraded our cluster to CDH 5.3, which comes with the Hive encryption feature. Hive uses the encryption provided by HDFS. Because of limitations imposed by HDFS encryption, the table location directory in HDFS is used as the Hive staging directory, and this broke our previous Hive settings.

References: