Environment
A fully distributed Hadoop cluster is used.
192.168.2.241 hadoop01
Decommissioning a DataNode
Add the following to /etc/hadoop/conf/hdfs-site.xml:
<property>
<name>dfs.hosts.exclude</name>
<value>/etc/hadoop/conf/hosts-exclude</value>
</property>
Add the node to be decommissioned to /etc/hadoop/conf/hosts-exclude:
hadoop03
Refresh the Hadoop node configuration:
hdfs dfsadmin -refreshNodes
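The two steps above (edit the exclude file, then refresh) can be sketched as a small shell helper. This is only an illustration: `add_to_exclude` is a hypothetical function name, not part of Hadoop, and it assumes the exclude file lists one hostname per line, as shown above.

```shell
# Hypothetical helper: append a hostname to an exclude file only if it is
# not already listed, so repeated runs do not create duplicate entries.
add_to_exclude() {
  local host="$1" file="$2"
  touch "$file"
  # -x matches the whole line, -F treats the hostname as a literal string
  if ! grep -qxF -- "$host" "$file"; then
    printf '%s\n' "$host" >> "$file"
  fi
}

# Example usage on the NameNode (path and hostname taken from the text above):
#   add_to_exclude hadoop03 /etc/hadoop/conf/hosts-exclude
#   hdfs dfsadmin -refreshNodes
```

Keeping the edit idempotent makes it safe to rerun the procedure if the refresh has to be repeated.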
Check the result; the node is offline (the report lists only two live DataNodes):
[hadoop@hadoop01 ~]$ hdfs dfsadmin -report
Configured Capacity: 72955723776 (67.95 GB)
Present Capacity: 33702436598 (31.39 GB)
DFS Remaining: 32456507392 (30.23 GB)
DFS Used: 1245929206 (1.16 GB)
DFS Used%: 3.70%
Replicated Blocks:
Under replicated blocks: 412
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Low redundancy blocks with highest priority to recover: 412
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (2):
Name: 192.168.2.241:9866 (hadoop01)
Hostname: hadoop01
Decommission Status : Normal
Configured Capacity: 36477861888 (33.97 GB)
DFS Used: 657819584 (627.35 MB)
Non DFS Used: 22519461952 (20.97 GB)
DFS Remaining: 13300580352 (12.39 GB)
DFS Used%: 1.80%
DFS Remaining%: 36.46%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Wed May 25 05:03:23 EDT 2022
Last Block Report: Wed May 25 04:20:17 EDT 2022
Num of Blocks: 871
Name: 192.168.2.242:9866 (hadoop02)
Hostname: hadoop02
Decommission Status : Normal
Configured Capacity: 36477861888 (33.97 GB)
DFS Used: 588109622 (560.87 MB)
Non DFS Used: 16733825226 (15.58 GB)
DFS Remaining: 19155927040 (17.84 GB)
DFS Used%: 1.61%
DFS Remaining%: 52.51%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Wed May 25 05:03:23 EDT 2022
Last Block Report: Wed May 25 04:20:17 EDT 2022
Num of Blocks: 599
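The report is long, so when only the decommission state matters it can help to condense it. The sketch below assumes the `Hostname:` and `Decommission Status :` line layout shown in the report above; `datanode_status` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `hdfs dfsadmin -report` output on stdin and
# prints one "hostname status" line per DataNode.
datanode_status() {
  awk -F': *' '
    /^Hostname:/           { host = $2 }        # remember the current node
    /^Decommission Status/ { print host, $2 }   # emit it with its status
  '
}

# Example usage:
#   hdfs dfsadmin -report | datanode_status
```

For the report above this would print hadoop01 and hadoop02 with status Normal; a node that shows any other status is still being drained or has already been decommissioned.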
To bring the node back online, update the configuration:
<property>
<name>dfs.hosts</name>
<value>/etc/hadoop/conf/hosts</value>
</property>
Write the following to /etc/hadoop/conf/hosts:
hadoop01
hadoop02
hadoop03
Refresh the Hadoop node configuration:
hdfs dfsadmin -refreshNodes
Start the DataNode manually on hadoop03:
hdfs --daemon start datanode
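One step the notes do not spell out: if hadoop03 is still listed in /etc/hadoop/conf/hosts-exclude from the decommission procedure, it most likely has to be removed before the refresh, because a host that appears in the exclude file is still treated as excluded. A minimal sketch, assuming one hostname per line; `remove_from_exclude` is a hypothetical helper name.

```shell
# Hypothetical helper: delete every line that is exactly the given hostname
# from an exclude file, leaving all other entries untouched.
remove_from_exclude() {
  local host="$1" file="$2" tmp
  tmp=$(mktemp)
  # grep exits 1 when no lines remain, which is fine here, so ignore it
  grep -vxF -- "$host" "$file" > "$tmp" || true
  cat "$tmp" > "$file"   # overwrite in place to keep ownership and mode
  rm -f "$tmp"
}

# Example usage on the NameNode, before the refresh:
#   remove_from_exclude hadoop03 /etc/hadoop/conf/hosts-exclude
#   hdfs dfsadmin -refreshNodes
```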
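A disk fails on a DataNode
On the failed node, check the dfs.datanode.data.dir setting in /etc/hadoop/conf/hdfs-site.xml and remove the directory (mount point) that belongs to the failed disk.
On the failed node, check the yarn.nodemanager.local-dirs setting in /etc/hadoop/conf/yarn-site.xml and remove the directory (mount point) that belongs to the failed disk.
Then restart the DataNode and NodeManager services on that node.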
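Both parameters hold a comma-separated list of directories, so removing the failed disk means dropping one entry from that list. The sketch below shows the string manipulation only; `drop_dir` and the `/data1/dn`-style paths are hypothetical, and the resulting value still has to be written back into the XML file by hand.

```shell
# Hypothetical helper: drop one entry from a comma-separated directory list.
# Only exact matches are removed; entries are compared as literal strings.
drop_dir() {
  local list="$1" bad="$2"
  printf '%s\n' "$list" | tr ',' '\n' | grep -vxF -- "$bad" | paste -sd ',' -
}

# Example usage (hypothetical paths):
#   drop_dir '/data1/dn,/data2/dn,/data3/dn' '/data2/dn'
```

With the hypothetical value above, the entry for `/data2/dn` is removed and the other two directories are kept in their original order.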
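Hadoop enters safe mode
- If Hadoop startup and verification are both normal, just wait a while; Hadoop leaves safe mode automatically.
Alternatively, leave safe mode manually:
hdfs dfsadmin -safemode leave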
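Before forcing the NameNode out of safe mode, it is usually worth checking the current state and giving it a chance to leave on its own. The sketch below relies on the `get` and `wait` sub-commands of `hdfs dfsadmin -safemode` and assumes the status text contains `Safe mode is ON` while safe mode is active; `safemode_is_off` is a hypothetical helper name.

```shell
# Hypothetical helper: reads the output of `hdfs dfsadmin -safemode get` on
# stdin and succeeds (exit 0) only if no line reports safe mode as ON.
# Note: empty input also counts as "off", so check that the hdfs command
# itself ran successfully.
safemode_is_off() {
  ! grep -q 'Safe mode is ON'
}

# Example usage: block until safe mode ends instead of forcing it off.
#   hdfs dfsadmin -safemode get | safemode_is_off || hdfs dfsadmin -safemode wait
```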
Kerberos debugging
Put KRB5_TRACE=/dev/stdout in front of a Kerberos command to print its trace output.
Normal output:
[root@test-152 keytabs]# KRB5_TRACE=/dev/stdout kinit -kt test.keytab test
[3737241] 1731657026.743873: Getting initial credentials for [email protected]
[3737241] 1731657026.743874: Looked up etypes in keytab: aes256-cts, aes128-cts
[3737241] 1731657026.743876: Sending unauthenticated request
[3737241] 1731657026.743877: Sending request (175 bytes) to example.COM
[3737241] 1731657026.743878: Resolving hostname test-152
[3737241] 1731657026.743879: Sending initial UDP request to dgram 172.20.1.152:88
[3737241] 1731657026.743880: Received answer (692 bytes) from dgram 172.20.1.152:88
[3737241] 1731657026.743881: Sending DNS URI query for _kerberos.example.COM.
[3737241] 1731657026.743882: No URI records found
[3737241] 1731657026.743883: Sending DNS SRV query for _kerberos-master._udp.example.COM.
[3737241] 1731657026.743884: Sending DNS SRV query for _kerberos-master._tcp.example.COM.
[3737241] 1731657026.743885: No SRV records found
[3737241] 1731657026.743886: Response was not from master KDC
[3737241] 1731657026.743887: Processing preauth types: PA-ETYPE-INFO2 (19)
[3737241] 1731657026.743888: Selected etype info: etype aes256-cts, salt "example.COMtest", params ""
[3737241] 1731657026.743889: Produced preauth for next request: (empty)
[3737241] 1731657026.743890: Getting AS key, salt "example.COMtest", params ""
[3737241] 1731657026.743891: Retrieving [email protected] from FILE:test.keytab (vno 0, enctype aes256-cts) with result: 0/Success
[3737241] 1731657026.743892: AS key obtained from gak_fct: aes256-cts/C03C
[3737241] 1731657026.743893: Decrypted AS reply; session key is: aes256-cts/0610
[3737241] 1731657026.743894: FAST negotiation: available
[3737241] 1731657026.743895: Initializing FILE:/tmp/krb5cc_0 with default princ [email protected]
[3737241] 1731657026.743896: Storing [email protected] -> krbtgt/[email protected] in FILE:/tmp/krb5cc_0
[3737241] 1731657026.743897: Storing config in FILE:/tmp/krb5cc_0 for krbtgt/[email protected]: fast_avail: yes
[3737241] 1731657026.743898: Storing [email protected] -> krb5_ccache_conf_data/fast_avail/krbtgt\/example.COM\@example.COM@X-CACHECONF: in FILE:/tmp/krb5cc_0
[root@test-152 keytabs]# KRB5_TRACE=/dev/stdout klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: [email protected]
Valid starting Expires Service principal
11/15/2024 15:50:26 11/16/2024 15:50:26 krbtgt/[email protected]
[root@test-152 keytabs]# KRB5_TRACE=/dev/stdout kdestroy
[3737246] 1731657059.396333: Destroying ccache FILE:/tmp/krb5cc_0
[root@test-152 keytabs]#
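When a trace is long, one quick check is whether it reached the "Decrypted AS reply" step seen in the normal output above, which indicates that the key from the keytab matched the one the KDC used. A minimal sketch; `as_reply_decrypted` is a hypothetical helper name and the match string is taken from the trace above.

```shell
# Hypothetical helper: reads a KRB5_TRACE log on stdin and succeeds if the
# AS reply was decrypted, i.e. the keytab key was accepted.
as_reply_decrypted() {
  grep -q 'Decrypted AS reply'
}

# Example usage:
#   KRB5_TRACE=/dev/stdout kinit -kt test.keytab test | as_reply_decrypted \
#     && echo "keytab accepted"
```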
Viewing YARN logs
yarn application -list # yarn app -list
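The list command only shows the applications; the logs themselves are fetched per application with `yarn logs -applicationId`. The sketch below pulls the IDs out of the listing, assuming the usual `application_<clusterTimestamp>_<sequence>` ID format; `app_ids` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `yarn application -list` output on stdin and
# prints only the application IDs, one per line.
app_ids() {
  grep -oE 'application_[0-9]+_[0-9]+'
}

# Example usage:
#   yarn application -list | app_ids
#   yarn logs -applicationId <application id>
```

Depending on the cluster setup, `yarn logs` may only return output once log aggregation is enabled and the application has finished.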