Hadoop 3.x (Production Tuning Guide): HDFS Core Parameters
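- 1. NameNode Memory Configuration for Production
- 2. NameNode Heartbeat Concurrency Configuration
- 3. Trash Configuration
- 1. How the trash works
- 2. Trash parameters
- 3. Turning the trash on
- 4. Viewing the trash
- 5. Note: files deleted directly through the web UI also bypass the trash
- 6. Files deleted programmatically bypass the trash unless moveToTrash() is called
- 7. Only files deleted with hadoop fs -rm on the command line go to the trash
- 8. Restoring data from the trash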
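1. NameNode Memory Configuration for Production
- NameNode memory calculation
Each file block takes up roughly 150 bytes of NameNode memory. Taking a server with 128 GB of memory as an example, how many file blocks can it track?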
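The question above comes down to one division: 128 × 1024 × 1024 × 1024 bytes / 150 bytes per block, which is roughly 916 million blocks. The sketch below works it out in Java. It assumes that 128 GB means 128 × 1024³ bytes and that all of that memory is available for block metadata, so the result is an upper bound rather than a sizing rule. The class and method names are made up for illustration.

```java
// Back-of-the-envelope estimate of how many blocks a NameNode can track.
// BYTES_PER_BLOCK is the rough 150-byte figure quoted in the text, not a measured value.
class NameNodeBlockEstimate {
    static final long BYTES_PER_BLOCK = 150L;

    // Returns how many block entries fit in the given amount of memory (in GiB),
    // assuming every byte is spent on block metadata.
    static long maxBlocks(long memoryGiB) {
        long memoryBytes = memoryGiB * 1024L * 1024L * 1024L;
        return memoryBytes / BYTES_PER_BLOCK; // integer division rounds down
    }

    public static void main(String[] args) {
        long blocks = maxBlocks(128L);
        System.out.println("128 GiB / 150 B = " + blocks + " blocks");
    }
}
```

In practice the JVM heap given to the NameNode is smaller than the machine's physical memory, so the real capacity is lower than this figure.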
- Hadoop 2.x: configuring NameNode memory
The NameNode heap defaults to 2000 MB. On a server with 4 GB of memory, the NameNode can be given 3 GB. Configure it in hadoop-env.sh as follows.
HADOOP_NAMENODE_OPTS=-Xmx3072m
- Hadoop 3.x: configuring NameNode memory
(1). The comments in hadoop-env.sh state that Hadoop's heap memory is allocated dynamically:
# The maximum amount of heap to use (Java -Xmx). If no unit
# is provided, it will be converted to MB. Daemons will
# prefer any Xmx setting in their respective _OPT variable.
# There is no default; the JVM will autoscale based upon machine
# memory size.
# export HADOOP_HEAPSIZE_MAX=
# The minimum amount of heap to use (Java -Xms). If no unit
# is provided, it will be converted to MB. Daemons will
# prefer any Xms setting in their respective _OPT variable.
# There is no default; the JVM will autoscale based upon machine
# memory size.
# export HADOOP_HEAPSIZE_MIN=
HADOOP_NAMENODE_OPTS=-Xmx102400m
(2). Check the memory used by the NameNode
[fickler@hadoop102 hadoop]$ jps
2913 Jps
2146 DataNode
2601 JobHistoryServer
2010 NameNode
2463 NodeManager
[fickler@hadoop102 hadoop]$ jmap -heap 2010
Attaching to process ID 2010, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.212-b10
using thread-local object allocation.
Parallel GC with 4 thread(s)
Heap Configuration:
MinHeapFreeRatio = 0
MaxHeapFreeRatio = 100
MaxHeapSize = 1031798784 (984.0MB)
NewSize = 21495808 (20.5MB)
MaxNewSize = 343932928 (328.0MB)
OldSize = 43515904 (41.5MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
PS Young Generation
Eden Space:
capacity = 181403648 (173.0MB)
used = 51214112 (48.841583251953125MB)
free = 130189536 (124.15841674804688MB)
28.23212904737175% used
From Space:
capacity = 18350080 (17.5MB)
used = 0 (0.0MB)
free = 18350080 (17.5MB)
0.0% used
To Space:
capacity = 18874368 (18.0MB)
used = 0 (0.0MB)
free = 18874368 (18.0MB)
0.0% used
PS Old Generation
capacity = 64487424 (61.5MB)
used = 27225592 (25.96434783935547MB)
free = 37261832 (35.53565216064453MB)
42.2184517713097% used
15572 interned Strings occupying 1472168 bytes.
(3). Check the memory used by the DataNode
[fickler@hadoop102 hadoop]$ jmap -heap 2146
Attaching to process ID 2146, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.212-b10
using thread-local object allocation.
Parallel GC with 4 thread(s)
Heap Configuration:
MinHeapFreeRatio = 0
MaxHeapFreeRatio = 100
MaxHeapSize = 1031798784 (984.0MB)
NewSize = 21495808 (20.5MB)
MaxNewSize = 343932928 (328.0MB)
OldSize = 43515904 (41.5MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
PS Young Generation
Eden Space:
capacity = 116916224 (111.5MB)
used = 83610888 (79.73755645751953MB)
free = 33305336 (31.76244354248047MB)
71.51350354934488% used
From Space:
capacity = 7864320 (7.5MB)
used = 7854024 (7.490180969238281MB)
free = 10296 (0.00981903076171875MB)
99.86907958984375% used
To Space:
capacity = 9437184 (9.0MB)
used = 0 (0.0MB)
free = 9437184 (9.0MB)
0.0% used
PS Old Generation
capacity = 32505856 (31.0MB)
used = 8200896 (7.82098388671875MB)
free = 24304960 (23.17901611328125MB)
25.228980279737904% used
14824 interned Strings occupying 1308896 bytes
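The output shows that the NameNode and the DataNode on hadoop102 both get an automatically sized heap, and that both have the same maximum (984 MB). That is not a sensible split.
To set the heap sizes explicitly, edit hadoop-env.sh: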
export HDFS_NAMENODE_OPTS="-Dhadoop.security.logger=INFO,RFAS -Xmx1024m"
export HDFS_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS -Xmx1024m"
After making the change, remember to distribute the configuration to the other nodes and restart the services.
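2. NameNode Heartbeat Concurrency Configuration
3. Trash Configuration
With the trash enabled, a deleted file can be restored as long as it has not yet expired, which guards against accidental deletion and serves as a short-term backup.
1. How the trash works
2. Trash parameters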
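- fs.trash.interval defaults to 0, which disables the trash; any other value sets how long (in minutes) a deleted file is kept.
- fs.trash.checkpoint.interval defaults to 0. It is the interval between checks of the trash. If it is 0, it takes the same value as fs.trash.interval.
- Requirement: fs.trash.checkpoint.interval <= fs.trash.interval.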
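The way the two parameters interact can be sketched as a small Java function. This only illustrates the rule stated above and is not Hadoop's own code. The class and method names are hypothetical, and two behaviours are assumptions of the sketch rather than something the text states: a checkpoint interval larger than fs.trash.interval falls back to fs.trash.interval, and a negative fs.trash.interval counts as disabled.

```java
// Illustration of how fs.trash.interval and fs.trash.checkpoint.interval relate.
// This is a sketch of the rule in the text, not code taken from Hadoop.
class TrashIntervals {
    // Returns the checkpoint interval (in minutes) that takes effect,
    // or 0 when the trash is disabled.
    static long effectiveCheckpointMinutes(long trashInterval, long checkpointInterval) {
        if (trashInterval <= 0) {
            return 0; // fs.trash.interval = 0 disables the trash (negative: assumption)
        }
        if (checkpointInterval <= 0 || checkpointInterval > trashInterval) {
            // 0 means "same as fs.trash.interval"; falling back for a value
            // that violates the <= requirement is an assumption of this sketch.
            return trashInterval;
        }
        return checkpointInterval;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCheckpointMinutes(1, 0));   // checkpoint left at 0
        System.out.println(effectiveCheckpointMinutes(60, 10)); // explicit, valid checkpoint
        System.out.println(effectiveCheckpointMinutes(0, 5));   // trash disabled
    }
}
```

With the 1-minute setting used in this article and fs.trash.checkpoint.interval left at its default, both intervals come out as 1 minute.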
3. Turning the trash on
Edit core-site.xml and set the trash retention time to 1 minute.
<property>
<name>fs.trash.interval</name>
<value>1</value>
</property>
4. Viewing the trash
Path of the trash directory in the HDFS cluster: /user/fickler/.Trash/…
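5. Note: files deleted directly through the web UI also bypass the trash
6. Files deleted programmatically bypass the trash; moveToTrash() must be called for them to go into the trash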
// Trash is org.apache.hadoop.fs.Trash; conf is a Configuration, path is a Path
Trash trash = new Trash(conf);
trash.moveToTrash(path);