Installing Hadoop 2.7.3 on CentOS 7

Software and Hardware Environment

  • Virtualization software: Oracle VM VirtualBox
  • Operating system: CentOS 7
  • JDK: 1.7
  • Hadoop: 2.7.3

Network Architecture

On a Windows 10 host, three virtual machines are created with Oracle VM VirtualBox: one master and two slaves. The network layout is shown in the figure below:


(Figure: network architecture of the three-node cluster)

Installation Steps

Hostname and IP Configuration

  • Change the hostnames

Edit the /etc/hostname file (vi /etc/hostname) on each virtual machine and set its content to hd-master, hd-slave1, and hd-slave2 respectively; a hostnamectl alternative is sketched after the output below. If you see the following results, the change succeeded.

# Run the `hostname` command on hd-master
$ hostname
hd-master

# Run the `hostname` command on hd-slave1
$ hostname
hd-slave1

# Run the `hostname` command on hd-slave2
$ hostname
hd-slave2
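
Alternatively, CentOS 7 can set the hostname with hostnamectl, which updates /etc/hostname and applies the change without a reboot; for example:

# on hd-master
hostnamectl set-hostname hd-master
# on hd-slave1
hostnamectl set-hostname hd-slave1
# on hd-slave2
hostnamectl set-hostname hd-slave2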

  • Configure the hosts file

On each of the three virtual machines, run vi /etc/hosts and add the following lines:

192.168.0.121   hd-master
192.168.0.122 hd-slave1
192.168.0.123 hd-slave2

If the three machines can reach each other with ping hd-master, ping hd-slave1, and ping hd-slave2, the configuration is correct.

Install the JDK

Before installing, run java -version to check whether a default JDK is already present. If so, remove the bundled OpenJDK first (see the separate post on uninstalling OpenJDK and installing the Oracle JDK on CentOS), then download JDK 1.7 from the Oracle website. The steps below use jdk-7u80-linux-x64.tar.gz as an example.

  • Extract the JDK to /usr/lib/jvm

    $ mkdir -p /usr/lib/jvm
    $ cd /usr/lib/jvm/
    $ tar -zxvf /home/hadoop/soft/jdk-7u80-linux-x64.tar.gz
    $ mv jdk1.7.0_80 jdk
  • Configure environment variables

For safety, set the environment variables only for the hadoop user. Go to the hadoop user's home directory and edit the .bashrc file (it is hidden); you can list it with:

$ ll -a


Edit .bashrc with vi .bashrc and append the following lines at the end of the file:

export JAVA_HOME=/usr/lib/jvm/jdk
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

Run source .bashrc to reload the file so the changes take effect.
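
You can quickly confirm that the new variables are visible in the current shell, for example:

$ echo $JAVA_HOME
/usr/lib/jvm/jdk
$ which java
/usr/lib/jvm/jdk/bin/java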

  • Check that the JDK is installed successfully
    $ java -version

    java version "1.7.0_80"
    Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
    Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

Configure Passwordless SSH Login

  • Enable the SSH passwordless login settings

Edit /etc/ssh/sshd_config and uncomment the following two lines so that they read:

RSAAuthentication yes
PubkeyAuthentication yes
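
After changing sshd_config, restart the SSH service (as root) so the new settings take effect; on CentOS 7:

systemctl restart sshd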

  • Generate the public and private key files

Run ssh-keygen -t rsa and press Enter at every prompt; when it finishes, the key files are created in the /home/hadoop/.ssh folder. This must be done on every server, as shown below:

(screenshot: ssh-keygen output)

Run ll ~/.ssh to see the generated files:

(screenshot: id_rsa and id_rsa.pub under ~/.ssh)

  • Merge the public keys into authorized_keys

On the hd-master server, go to /home/hadoop/.ssh/ and merge the keys over ssh:

cat id_rsa.pub >> authorized_keys
ssh hadoop@192.168.0.122 cat ~/.ssh/id_rsa.pub >> authorized_keys
ssh hadoop@192.168.0.123 cat ~/.ssh/id_rsa.pub >> authorized_keys

At this point the public keys of hd-master, hd-slave1, and hd-slave2 are all merged into the authorized_keys file, which you can confirm with more authorized_keys. A screenshot of this step:

(screenshot: merging the public keys into authorized_keys)

  • Copy authorized_keys to hd-slave1 and hd-slave2

Use scp to copy authorized_keys into the /home/hadoop/.ssh/ directory on hd-slave1 and hd-slave2.

scp authorized_keys hadoop@hd-slave1:~/.ssh
scp authorized_keys hadoop@hd-slave2:~/.ssh

A screenshot of this step:

(screenshot: copying authorized_keys to the slave nodes with scp)

  • Set the authorized_keys permissions

Set the permissions of the authorized_keys file on hd-master, hd-slave1, and hd-slave2:

chmod 700 authorized_keys
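
Note that sshd is also strict about directory permissions: the home directory and ~/.ssh must not be writable by group or others. If passwordless login still prompts for a password, a commonly used layout on each server is:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys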

  • Test passwordless SSH login

From hd-master, run ssh username@hostname to verify that the configuration works, as shown below:
(screenshot: passwordless ssh login from hd-master)
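
For example, from hd-master the following logins should succeed without a password prompt:

ssh hadoop@hd-slave1
# type exit to return to hd-master, then test the other slave
ssh hadoop@hd-slave2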

Install and Configure Hadoop

Download Hadoop

wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

Install Hadoop

Assuming that all Hadoop-related software will be installed under /usr/local, first give the hadoop user ownership of /usr/local:

chown hadoop:hadoop /usr/local

Extract hadoop-2.7.3.tar.gz and move it to /usr/local:

tar -zxvf hadoop-2.7.3.tar.gz 
mv hadoop-2.7.3 /usr/local/hadoop

Create the Data Directories

Go to /usr/local/hadoop and create the tmp, hdfs/data, and hdfs/name directories.

cd /usr/local/hadoop
mkdir tmp
mkdir -p hdfs/data hdfs/name

Configure core-site.xml

Go to /usr/local/hadoop/etc/hadoop and edit core-site.xml, adding the following properties inside the <configuration> element:

<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
</property>

<property>
<name>fs.defaultFS</name>
<value>hdfs://hd-master:9000</value>
</property>

<property>
<name>io.file.buffer.size</name>
<value>131702</value>
</property>

Configure hdfs-site.xml

Go to /usr/local/hadoop/etc/hadoop and edit hdfs-site.xml, adding the following properties inside the <configuration> element:

<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/hdfs/name</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/hdfs/data</value>
</property>

<property>
<name>dfs.replication</name>
<value>2</value>
</property>

<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>

<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hd-master:9001</value>
</property>

Master and Slave Node Configuration

  • Edit the /usr/local/hadoop/etc/hadoop/masters file:
hd-master
  • Edit the /usr/local/hadoop/etc/hadoop/slaves file:
hd-slave1
hd-slave2

Copy Hadoop to the Other Nodes

scp -r /usr/local/hadoop hd-slave1:/usr/local
scp -r /usr/local/hadoop hd-slave2:/usr/local
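
The scp above writes into /usr/local on the slave nodes, so the hadoop user must be able to write there as well (just as was configured on the master); a sketch, assuming root access on each slave:

# run as root on hd-slave1 and hd-slave2
chown hadoop:hadoop /usr/local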

At this point, the Hadoop installation and basic configuration are complete. The next steps start the cluster and verify its basic functions.

Start and Verify Hadoop

Format the HDFS Filesystem

Format the HDFS filesystem on hd-master; this only needs to be done once:

cd /usr/local/hadoop/bin
hdfs namenode -format

The console output is shown below; Exiting with status 0 means the format succeeded.

16/08/28 17:30:19 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hd-master/192.168.0.121
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.7.3
STARTUP_MSG: classpath = /usr/local/hadoop/etc/hadoop……
STARTUP_MSG: build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff; compiled by 'root' on 2016-08-18T01:41Z
STARTUP_MSG: java = 1.7.0_80
************************************************************/
16/08/28 17:30:19 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
16/08/28 17:30:19 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-d6400139-18c9-4416-935c-33f8944b5efe
16/08/28 17:30:21 INFO namenode.FSNamesystem: No KeyProvider found.
16/08/28 17:30:21 INFO namenode.FSNamesystem: fsLock is fair:true
16/08/28 17:30:21 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
16/08/28 17:30:21 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
16/08/28 17:30:21 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
16/08/28 17:30:21 INFO blockmanagement.BlockManager: The block deletion will start around 2016 Aug 28 17:30:21
16/08/28 17:30:21 INFO util.GSet: Computing capacity for map BlocksMap
16/08/28 17:30:21 INFO util.GSet: VM type = 64-bit
16/08/28 17:30:21 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
16/08/28 17:30:21 INFO util.GSet: capacity = 2^21 = 2097152 entries
16/08/28 17:30:21 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
16/08/28 17:30:21 INFO blockmanagement.BlockManager: defaultReplication = 2
16/08/28 17:30:21 INFO blockmanagement.BlockManager: maxReplication = 512
16/08/28 17:30:21 INFO blockmanagement.BlockManager: minReplication = 1
16/08/28 17:30:21 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
16/08/28 17:30:21 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
16/08/28 17:30:21 INFO blockmanagement.BlockManager: encryptDataTransfer = false
16/08/28 17:30:21 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
16/08/28 17:30:21 INFO namenode.FSNamesystem: fsOwner = hadoop (auth:SIMPLE)
16/08/28 17:30:21 INFO namenode.FSNamesystem: supergroup = supergroup
16/08/28 17:30:21 INFO namenode.FSNamesystem: isPermissionEnabled = true
16/08/28 17:30:21 INFO namenode.FSNamesystem: HA Enabled: false
16/08/28 17:30:21 INFO namenode.FSNamesystem: Append Enabled: true
16/08/28 17:30:21 INFO util.GSet: Computing capacity for map INodeMap
16/08/28 17:30:21 INFO util.GSet: VM type = 64-bit
16/08/28 17:30:21 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
16/08/28 17:30:21 INFO util.GSet: capacity = 2^20 = 1048576 entries
16/08/28 17:30:21 INFO namenode.FSDirectory: ACLs enabled? false
16/08/28 17:30:21 INFO namenode.FSDirectory: XAttrs enabled? true
16/08/28 17:30:21 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
16/08/28 17:30:21 INFO namenode.NameNode: Caching file names occuring more than 10 times
16/08/28 17:30:21 INFO util.GSet: Computing capacity for map cachedBlocks
16/08/28 17:30:21 INFO util.GSet: VM type = 64-bit
16/08/28 17:30:21 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
16/08/28 17:30:21 INFO util.GSet: capacity = 2^18 = 262144 entries
16/08/28 17:30:21 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
16/08/28 17:30:21 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
16/08/28 17:30:21 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
16/08/28 17:30:21 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
16/08/28 17:30:21 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
16/08/28 17:30:21 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
16/08/28 17:30:21 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
16/08/28 17:30:21 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
16/08/28 17:30:21 INFO util.GSet: Computing capacity for map NameNodeRetryCache
16/08/28 17:30:21 INFO util.GSet: VM type = 64-bit
16/08/28 17:30:21 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
16/08/28 17:30:21 INFO util.GSet: capacity = 2^15 = 32768 entries
16/08/28 17:30:22 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1205337113-192.168.0.121-1472376621912
16/08/28 17:30:22 INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/name has been successfully formatted.
16/08/28 17:30:22 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/hdfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
16/08/28 17:30:22 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/hdfs/name/current/fsimage.ckpt_0000000000000000000 of size 353 bytes saved in 0 seconds.
16/08/28 17:30:22 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
16/08/28 17:30:22 INFO util.ExitUtil: Exiting with status 0
16/08/28 17:30:22 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hd-master/192.168.0.121
************************************************************/

Start Hadoop

Before starting the daemons, it is recommended to disable the firewall on all three nodes so the services can reach each other.
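
A minimal sketch for CentOS 7's default firewalld (run as root on every node; in a real deployment you would open the specific Hadoop ports instead):

systemctl stop firewalld
systemctl disable firewalld

Then go to /usr/local/hadoop/sbin and start HDFS and YARN: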

cd /usr/local/hadoop/sbin
./start-dfs.sh
./start-yarn.sh

The console prints output like the following:

# start-dfs.sh
Starting namenodes on [hd-master]
hd-master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-hd-master.out
hd-slave1: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-hd-slave1.out
hd-slave2: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-hd-slave2.out
Starting secondary namenodes [hd-master]
hd-master: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-hd-master.out

# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-hd-master.out
hd-slave1: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-hd-slave1.out
hd-slave2: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-hd-slave2.out

Verify That Startup Succeeded

  • Check with the JDK's jps tool

Run the jps command on hd-master, hd-slave1, and hd-slave2:

# hd-master
[hadoop@hd-master hadoop]$ jps
12744 Jps
12368 ResourceManager
11983 NameNode
12170 SecondaryNameNode

# hd-slave1
[hadoop@hd-slave1 hadoop]$ jps
11052 NodeManager
11254 Jps
10931 DataNode

# hd-slave2
[hadoop@hd-slave2 hadoop]$ jps
10663 DataNode
11035 Jps
10776 NodeManager
  • Check the cluster status with ./bin/hdfs dfsadmin -report
[hadoop@hd-master hadoop]$ ./bin/hdfs dfsadmin -report
Configured Capacity: 88624291840 (82.54 GB)
Present Capacity: 80397443072 (74.88 GB)
DFS Remaining: 80397434880 (74.88 GB)
DFS Used: 8192 (8 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 192.168.0.123:50010 (hd-slave2)
Hostname: hd-slave2
Decommission Status : Normal
Configured Capacity: 44312145920 (41.27 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 4113416192 (3.83 GB)
DFS Remaining: 40198725632 (37.44 GB)
DFS Used%: 0.00%
DFS Remaining%: 90.72%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Aug 28 18:19:42 CST 2016


Name: 192.168.0.122:50010 (hd-slave1)
Hostname: hd-slave1
Decommission Status : Normal
Configured Capacity: 44312145920 (41.27 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 4113432576 (3.83 GB)
DFS Remaining: 40198709248 (37.44 GB)
DFS Used%: 0.00%
DFS Remaining%: 90.72%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Aug 28 18:19:44 CST 2016

Access the Web Management Pages

  • Visit the NameNode page

Open http://192.168.0.121:50070/ in a browser:

(screenshot: HDFS NameNode web UI)

  • Visit the cluster page

Open http://192.168.0.121:8088/ in a browser:

(screenshot: YARN cluster web UI)

This completes the Hadoop environment setup.