BIGTOP-3992: Missing distcp package for hdfs in hive #1167
lvkaihua wants to merge 5 commits into apache:master
When loading data, Hive checks the hive.exec.copyfile.maxnumfiles and hive.exec.copyfile.maxsize thresholds. If the data exceeds them, it calls the distcp package.
bigtop-packages/src/deb/hive/rules
Outdated
ln -s /usr/lib/hbase/hbase-common.jar /usr/lib/hbase/hbase-client.jar /usr/lib/hbase/hbase-hadoop-compat.jar /usr/lib/hbase/hbase-hadoop2-compat.jar debian/tmp/usr/lib/hive/lib
ln -s /usr/lib/hbase/hbase-procedure.jar /usr/lib/hbase/hbase-protocol.jar /usr/lib/hbase/hbase-server.jar debian/tmp/usr/lib/hive/lib/
ln -s /usr/lib/zookeeper/zookeeper.jar debian/tmp/usr/lib/hive/lib
ln -s /usr/lib/hadoop//tools/lib/hadoop-distcp*.jar debian/tmp/usr/lib/hive/lib/
This line created a symlink to a non-existent target. The glob expression (*) does not seem to have been expanded as expected.
root@e0702827e22c:/# ls -l /usr/lib/hive/lib | grep distcp
lrwxrwxrwx 1 root root 41 Sep 19 09:33 hadoop-distcp*.jar -> ../../hadoop/tools/lib/hadoop-distcp*.jar
@lvkaihua The duplicate slash was not the cause. When I changed the line to ln -s /usr/lib/hadoop/tools/lib/hadoop-distcp-3.3.6.jar debian/tmp/usr/lib/hive/lib/, the symlink was created properly. It is the glob hadoop-distcp*.jar that did not work.
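One plausible explanation, offered as a sketch rather than a confirmed diagnosis: a POSIX shell passes a glob pattern through unchanged when nothing matches it, so if the Hadoop jars are not present on the build host when the rule runs, ln -s receives the literal pattern. The scratch directory and file names below are illustrative assumptions, not the real build paths.

```shell
#!/bin/sh
# Hypothetical reproduction in a scratch directory; paths are illustrative only.
work=$(mktemp -d)
mkdir -p "$work/hadoop/tools/lib" "$work/hive/lib"

# Case 1: no jar matches the pattern, so the shell passes it through literally
# and ln creates a symlink whose name and target both contain '*'.
ln -s "$work"/hadoop/tools/lib/hadoop-distcp*.jar "$work/hive/lib/"
ls -l "$work/hive/lib"

# Case 2: once a matching jar exists, the same pattern expands to the real file name.
rm -f "$work/hive/lib/"*
touch "$work/hadoop/tools/lib/hadoop-distcp-3.3.6.jar"
ln -s "$work"/hadoop/tools/lib/hadoop-distcp*.jar "$work/hive/lib/"
ls -l "$work/hive/lib"
```

The first listing should show a link named with a literal asterisk, like the one reported above; the second should show a link named after the versioned jar.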
You can at least use an environment variable to get the Hadoop version.
ln -s /usr/lib/hadoop/tools/lib/hadoop-distcp-${HADOOP_VERSION}.jar debian/tmp/usr/lib/hive/lib/
If you want the symlink to work regardless of the Hadoop version, I guess you first need to make hadoop-distcp.jar a symlink to hadoop-distcp-x.y.z.jar in the Hadoop package.
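The versionless-link idea could look roughly like the sketch below. It runs in a scratch directory that stands in for the package roots; the 3.3.6 fallback and the directory layout are assumptions for illustration, and this is not a tested change to the Hadoop or Hive packaging rules.

```shell
#!/bin/sh
# Sketch only: scratch directories stand in for the Hadoop and Hive package roots.
root=$(mktemp -d)
HADOOP_VERSION=${HADOOP_VERSION:-3.3.6}   # assumed to be exported by the build
hadoop_lib="$root/usr/lib/hadoop/tools/lib"
hive_lib="$root/usr/lib/hive/lib"
mkdir -p "$hadoop_lib" "$hive_lib"
touch "$hadoop_lib/hadoop-distcp-$HADOOP_VERSION.jar"

# Hadoop package side: publish a versionless name next to the versioned jar.
ln -s "hadoop-distcp-$HADOOP_VERSION.jar" "$hadoop_lib/hadoop-distcp.jar"

# Hive package side: link to the versionless name; no glob and no version needed.
ln -s ../../hadoop/tools/lib/hadoop-distcp.jar "$hive_lib/hadoop-distcp.jar"

ls -l "$hive_lib"
```

Because the Hive-side link names its target explicitly, it does not depend on any file existing when the package is built; it only has to resolve once both packages are installed.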
I'm very sorry. My earlier testing was done on CentOS and did not cover the Ubuntu part in detail. I assumed the symlink operation was the same everywhere, but I have just fixed it and changed it to ln -s /usr/lib/hadoop/tools/lib/hadoop-distcp*.jar debian/tmp/usr/lib/hive/lib/hadoop-distcp.jar. After testing, it will automatically match the corresponding

%__ln_s %{usr_lib_zookeeper}/zookeeper.jar $RPM_BUILD_ROOT/%{usr_lib_hive}/lib/
%__ln_s %{usr_lib_hbase}/hbase-common.jar %{usr_lib_hbase}/hbase-client.jar %{usr_lib_hbase}/hbase-hadoop-compat.jar %{usr_lib_hbase}/hbase-hadoop2-compat.jar $RPM_BUILD_ROOT/%{usr_lib_hive}/lib/
%__ln_s %{usr_lib_hbase}/hbase-procedure.jar %{usr_lib_hbase}/hbase-protocol.jar %{usr_lib_hbase}/hbase-server.jar $RPM_BUILD_ROOT/%{usr_lib_hive}/lib/
%__ln_s %{usr_lib_hadoop}/tools/lib/hadoop-distcp-*.jar $RPM_BUILD_ROOT/%{usr_lib_hive}/lib/
I got the same issue with RPM on Rocky Linux 8.
[root@c573e05810ea /]# cat /etc/os-release | head -n 2
NAME="Rocky Linux"
VERSION="8.5 (Green Obsidian)"
[root@c573e05810ea /]# ls -l /usr/lib/hive/lib | grep distcp
lrwxrwxrwx. 1 root root 45 Sep 19 11:55 hadoop-distcp-*.jar -> /usr/lib/hadoop/tools/lib/hadoop-distcp-*.jar
Oops. I meant RPM on Rocky Linux 8.
Perhaps it's because you need to install Hadoop's rpm first?
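Whatever the cause turns out to be, a dangling link can be detected directly on an installed system. The following is a minimal sketch; list_dangling_links is a hypothetical helper name, not something Bigtop provides, and the demonstration uses a scratch directory rather than /usr/lib/hive/lib.

```shell
#!/bin/sh
# list_dangling_links is a hypothetical helper name, not part of Bigtop.
# It prints every symlink in the given directory whose target does not exist.
list_dangling_links() {
    for entry in "$1"/*; do
        # -L is true for any symlink; -e follows the link and fails if the target is missing.
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            printf '%s -> %s\n' "$entry" "$(readlink "$entry")"
        fi
    done
}

# Demonstration in a scratch directory.
demo=$(mktemp -d)
touch "$demo/real.jar"
ln -s "$demo/real.jar" "$demo/good.jar"
ln -s "$demo/hadoop-distcp-*.jar" "$demo/broken.jar"
list_dangling_links "$demo"
```

On a machine with the packages installed, calling list_dangling_links /usr/lib/hive/lib would report the distcp link if its target is still the unexpanded pattern.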

Description of PR
The distcp package for HDFS is missing from Hive, which can cause errors when loading data.
How was this patch tested?
For code changes: