Environment
A fully distributed Hadoop cluster, running as user hadoop.
192.168.2.241 hadoop01 # hive db
For MySQL installation and grants, see the Hadoop deployment notes.
Hive Installation
Download: https://www.apache.org/dyn/closer.cgi/hive/
Run on hadoop01 and hadoop02:
wget --no-check-certificate https://dlcdn.apache.org/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
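Since the wget above disables certificate checking, it is worth verifying the tarball against the SHA-512 checksum Apache publishes alongside it. A minimal sketch of the verification pattern, run here on a throwaway file rather than the real tarball:

```shell
# Stand-ins for apache-hive-3.1.3-bin.tar.gz and its published .sha512 file.
# (sha512sum -c expects "<hash>  <filename>" lines; if the published file
# uses a different layout, reformat it first.)
tmp=$(mktemp -d)
cd "$tmp"
printf 'example payload\n' > demo.tar.gz
sha512sum demo.tar.gz > demo.tar.gz.sha512
# With the real download this would be:
#   sha512sum -c apache-hive-3.1.3-bin.tar.gz.sha512
sha512sum -c demo.tar.gz.sha512
```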
Hive Configuration
/opt/bigdata/hive/current/conf/hive-site.xml:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- note: '&' must be escaped as '&amp;' inside XML -->
    <value>jdbc:mysql://192.168.2.243:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  </property>
  <!-- for MySQL 8, use com.mysql.cj.jdbc.Driver instead -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop01:9083</value>
  </property>
  <!-- host HiveServer2 binds to -->
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hadoop01</value>
  </property>
  <!-- port HiveServer2 listens on -->
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
</configuration>
The MySQL JDBC driver (mysql-connector-java) is required.
Download: https://dev.mysql.com/downloads/connector/j/
Copy the jar into /opt/bigdata/hive/current/lib on both machines, hadoop01 (HiveDB) and hadoop02 (Hive client).
Usage
On hadoop01 (HiveDB):
schematool -dbType mysql -initSchema  # initialize the Hive metastore schema; creates the database and tables in MySQL
nohup hive --service metastore 1>/opt/bigdata/hive/current/metastore.log 2>/opt/bigdata/hive/current/metastore_err.log &  # start the Metastore service
On hadoop02 (Hive client), run:
$ hive
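The nohup lines in this guide use `1>` and `2>` to send stdout and stderr to separate log files. A sketch of that redirection pattern in isolation (all paths here are throwaway):

```shell
tmp=$(mktemp -d)
# Same shape as the metastore launch: stdout to one file, stderr to another.
sh -c 'echo normal; echo problem >&2' 1>"$tmp/out.log" 2>"$tmp/err.log"
cat "$tmp/out.log"   # contains "normal"
cat "$tmp/err.log"   # contains "problem"
```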
Accessing Hive via JDBC
Edit /etc/hadoop/conf/core-site.xml and add:
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
Then restart HDFS, YARN, and related services; after that, connections are accepted.
Accessing Hive with beeline requires HiveServer2 to be running:
nohup hive --service hiveserver2 1>/opt/bigdata/hive/current/hiveserver.log 2>/opt/bigdata/hive/current/hiveserver.err &
Connect with beeline:
$ bin/beeline -u jdbc:hive2://hadoop01:10000 -n hadoop
.......
Connecting to jdbc:hive2://hadoop01:10000
Connected to: Apache Hive (version 3.1.3)
Driver: Hive JDBC (version 3.1.3)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.3 by Apache Hive
0: jdbc:hive2://hadoop01:10000> show databases;
.............
Switching Hive to the Tez Engine
Building Tez
The current Tez release does not work with Hadoop 3.3.2 out of the box, so it has to be built from source.
Maven: https://maven.apache.org/download.cgi
Tez: https://tez.apache.org/
GitHub: https://github.com/apache/tez
Set up the build environment first (the build needs internet access; ideally build on a separate machine and copy the finished package over):
mkdir /opt/bigdata/maven
wget https://dlcdn.apache.org/maven/maven-3/3.8.5/binaries/apache-maven-3.8.5-bin.tar.gz --no-check-certificate
tar -zxvf apache-maven-3.8.5-bin.tar.gz -C /opt/bigdata/maven
cd /opt/bigdata/maven
ln -s apache-maven-3.8.5 current
cat >/etc/profile.d/maven_env.sh<<-eof
MAVEN_HOME=/opt/bigdata/maven/current
export PATH=\$PATH:\$MAVEN_HOME/bin
eof
source /etc/profile
mvn -v
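One subtle point in the maven_env.sh heredoc above: with an unquoted heredoc delimiter the shell expands `$VAR` when the file is *written*, so `\$PATH` and `\$MAVEN_HOME` are escaped to keep them literal until the profile script is *sourced*. A small sketch of the difference (variable names are made up):

```shell
tmp=$(mktemp -d)
FOO=outer
# Unescaped $FOO expands now, at write time; \$FOO survives into the file.
cat > "$tmp/env.sh" <<eof
expanded=$FOO
literal=\$FOO
eof
cat "$tmp/env.sh"
# expanded=outer
# literal=$FOO
```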
Building
protobuf 2.5.0 is required:
$ wget https://github.com/protocolbuffers/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
$ tar zxvf protobuf-2.5.0.tar.gz
$ cd protobuf-2.5.0
$ ./configure
$ make
$ make install
$ protoc --version
libprotoc 2.5.0
The source needs a few changes:
wget https://github.com/apache/tez/archive/refs/tags/rel/release-0.10.1.tar.gz
tar zxvf release-0.10.1.tar.gz
cd tez-rel-release-0.10.1
In pom.xml, change two places (set the Hadoop version and skip the tez-ui module):
<hadoop.version>3.3.2</hadoop.version>
<!-- <module>tez-ui</module> -->
tez-plugins/tez-aux-services/src/main/java/org/apache/tez/auxservices/ShuffleHandler.java
Comment out the code that no longer compiles against Hadoop 3.3.2.
tez-plugins/tez-aux-services/findbugs-exclude.xml:
<FindBugsFilter>
  <Match>
    <Class name="org.apache.tez.auxservices.ShuffleHandler"/>
    <Method name="recordJobShuffleInfo"/>
    <Bug pattern="RV_RETURN_VALUE_IGNORED_NO_SIDE_EFFECT"/>
  </Match>
</FindBugsFilter>
After the changes, build:
mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true
Then copy the resulting package to the Hadoop cluster nodes:
tez-dist/target/tez-0.10.1-minimal.tar.gz
Installation and Configuration (run on all nodes)
Install:
mkdir -p /opt/bigdata/tez/tez-0.10.1
tar zxvf tez-0.10.1-minimal.tar.gz -C /opt/bigdata/tez/tez-0.10.1
cd /opt/bigdata/tez
ln -s tez-0.10.1 current
chown -R hadoop:hadoop /opt/bigdata/tez
Edit /opt/bigdata/hive/current/conf/hive-env.sh:
export HIVE_HOME=/opt/bigdata/hive/current
export TEZ_HOME=/opt/bigdata/tez/current
export TEZ_JARS=""
for jar in `ls $TEZ_HOME | grep jar`; do
  export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
  export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done
# TEZ_JARS already starts with ':', so it can be appended directly
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}${TEZ_JARS}
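The two loops above do nothing Hive-specific: they just join every jar under $TEZ_HOME and $TEZ_HOME/lib into one colon-separated string. The same pattern against throwaway files (directory and jar names are invented):

```shell
tmp=$(mktemp -d)
mkdir "$tmp/lib"
touch "$tmp/a.jar" "$tmp/lib/b.jar"
JARS=""
# Append each top-level jar, then each jar under lib/, ':'-separated,
# mirroring the hive-env.sh loops.
for jar in "$tmp"/*.jar; do
  JARS=$JARS:$jar
done
for jar in "$tmp"/lib/*.jar; do
  JARS=$JARS:$jar
done
echo "$JARS"
```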
Edit /opt/bigdata/hive/current/conf/hive-site.xml and add:
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
Create /opt/bigdata/hive/current/conf/tez-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/tez/tez-0.10.1,${fs.defaultFS}/tez/tez-0.10.1/lib</value>
  </property>
  <property>
    <name>tez.lib.uris.classpath</name>
    <value>${fs.defaultFS}/tez/tez-0.10.1,${fs.defaultFS}/tez/tez-0.10.1/lib</value>
  </property>
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
  </property>
  <property>
    <name>tez.am.resource.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>tez.am.resource.cpu.vcores</name>
    <value>2</value>
  </property>
</configuration>
Upload the Tez files to HDFS (running this on one node is enough):
mkdir ~/tez-0.10.1
tar zxvf tez-0.10.1.tar.gz -C ~/tez-0.10.1
cd ~
hadoop fs -mkdir /tez
hadoop fs -put tez-0.10.1 /tez
Verification
Generate a test file:
cat > /tmp/demo.txt <<-EOF
Linux Unix windows
hadoop Linux spark
hive hadoop Unix
MapReduce hadoop Linux hive
windows hadoop spark
EOF
Restart the metastore and hiveserver2, then run:
$ beeline -u jdbc:hive2://hadoop01:10000 -n hadoop
0: jdbc:hive2://hadoop01:10000> create table ad1 (id string, ip string,pt string) partitioned by (dt string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
0: jdbc:hive2://hadoop01:10000> LOAD DATA LOCAL INPATH '/tmp/demo.txt' OVERWRITE INTO TABLE ad1 PARTITION (dt='2022-04-22');
0: jdbc:hive2://hadoop01:10000> set hive.execution.engine=mr;
No rows affected (0.029 seconds)
0: jdbc:hive2://hadoop01:10000> select count(1) from ad1;
+------+
| _c0 |
+------+
| 10 |
+------+
1 row selected (15.729 seconds)
0: jdbc:hive2://hadoop01:10000> set hive.execution.engine=tez;
No rows affected (0.003 seconds)
0: jdbc:hive2://hadoop01:10000> select count(1) from ad1;
+------+
| _c0 |
+------+
| 10 |
+------+
1 row selected (3.494 seconds)
The same query took 15.7 s on MapReduce versus 3.4 s on Tez.
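One detail worth noting in the session above: ad1 is declared with FIELDS TERMINATED BY '\t', while demo.txt separates its words with spaces, so each whole line lands in the first column (id) and ip/pt come back NULL; count(1) counts rows either way. The same split, sketched with awk outside Hive:

```shell
# Split one demo.txt line on tab, as the table definition specifies:
# no tab is present, so the entire line becomes field 1.
line='Linux Unix windows'
printf '%s\n' "$line" | awk -F'\t' '{print NF; print $1}'
# 1
# Linux Unix windows
```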
Problems Encountered
0: jdbc:hive2://hadoop01:10000> select count(1) from ad1;
Running this job with Hive on Tez failed because the requested container memory exceeded what YARN allows: yarn.nodemanager.resource.memory-mb was set too low. Either raise that value or lower hive.tez.container.size.
As a quick test, the container size was lowered for the session:
0: jdbc:hive2://hadoop01:10000> set hive.tez.container.size=1024;
No rows affected (0.025 seconds)
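For a permanent fix, either raise the NodeManager allocation in yarn-site.xml or pin the Tez container size in hive-site.xml; a sketch with assumed values that should be adjusted to the cluster's actual memory:

```xml
<!-- yarn-site.xml: total memory one NodeManager may hand out to containers
     (4096 here is an assumption, not a recommendation) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>

<!-- hive-site.xml: memory requested per Tez container; must fit inside
     the YARN maximum allocation -->
<property>
  <name>hive.tez.container.size</name>
  <value>1024</value>
</property>
```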
Log Files
/tmp/root/hive.log
The log level is set in conf/hive-log4j2.properties: property.hive.log.level = INFO