1. Basic environment configuration
Hostname | IP address | Roles | Hadoop user |
---|---|---|---|
centos05 | 192.168.48.105 | NameNode, ResourceManager, SecondaryNameNode, DataNode, NodeManager | hadoop |
1.1. Disable the firewall and SELinux
1.1.1. Disable the firewall
Omitted.
1.1.2. Disable SELinux
Omitted.
Note: the steps above must be run as the root user.
1.2. hosts configuration
[root@centos05 ~]# vim /etc/hosts
##hadoop host####
192.168.48.105 centos05
[root@centos05 ~]# vim /etc/sysconfig/network

HOSTNAME=centos05
Note: the steps above must be run as the root user. The configuration is correct once pinging the hostname returns the corresponding IP.
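The mapping can also be checked without ping by reading the hosts file directly. The following is a minimal sketch, not part of the original setup: lookup_host is a hypothetical helper name, and the demo runs against a scratch file so the real /etc/hosts is not touched.

```shell
#!/bin/sh
# lookup_host FILE NAME: print the IP that a hosts-format FILE maps NAME to.
# (lookup_host is a hypothetical helper, introduced only for illustration.)
# Comment lines are skipped and only the first match is printed.
lookup_host() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }          # skip comment lines
        {
            for (i = 2; i <= NF; i++)      # fields after the IP are host names
                if ($i == name) { print $1; exit }
        }
    ' "$1"
}

# Demo on a throwaway file with the two lines added in this section.
tmp_hosts=$(mktemp)
printf '%s\n' '##hadoop host####' '192.168.48.105 centos05' > "$tmp_hosts"
lookup_host "$tmp_hosts" centos05
rm -f "$tmp_hosts"
```

On the real machine the same function would be called as lookup_host /etc/hosts centos05.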
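1.3. Create the host account and configure passwordless access
Create the new user; the adduser command is recommended:
sudo adduser hadoop
sudo passwd hadoop
Enter the password, keep pressing Enter at the remaining prompts, and confirm with y at the end. (These prompts come from the interactive adduser on Ubuntu; on CentOS adduser behaves like useradd and asks nothing.) Creating the hadoop user also creates a hadoop group. Add the hadoop user to the hadoop group:
sudo usermod -a -G hadoop hadoop
The first hadoop is the group name and the second is the user name. Then check the result:
cat /etc/group
Next, give the hadoop user root privileges so that it can use sudo. Open /etc/sudoers with one of the following:
sudo gedit /etc/sudoers
sudo vi /etc/sudoers
Use the first command in a graphical session (gedit is the text editor that ships with Ubuntu) and the second in a terminal. Look up vi usage yourself if you are not familiar with it. Edit the file as follows:
# User privilege specification
root ALL=(ALL) ALL
hadoop ALL=(ALL) ALL
Save and exit; the hadoop user now has root privileges.
Generate the private and public keys:
ssh-keygen -t rsa
Copy the public key to the host (the password is required):
ssh-copy-id hadoop@centos05
Note: run the key steps as the hadoop user. The setup is correct once the hadoop user can ssh to the local host without a password.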
1.4. Java environment configuration
1.4.1. Download the JDK
Omitted.
1.4.2. Install Java
Omitted.
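2. Install Hadoop
2.1. Download and install the CDH build of Hadoop
Download link: http://archive-primary.cloudera.com/cdh5/cdh/5/
2.2. Install and configure Hadoop
Install and configure Hadoop as the hadoop user.
- Create the directories that will hold the Hadoop data: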
[hadoop@centos05 ~]$ mkdir -p /opt/hadoop/hdfs/{name,data}
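2.2.1. Configure core-site.xml
[hadoop@centos05 ~]$ vim /opt/hadoop/hadoop-2.6.0/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9090</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/opt/hadoop/tmp</value>
    </property>
</configuration>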
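If you would rather generate the file than edit it by hand in vim, something like the sketch below could be used. write_site_xml is a hypothetical helper, not a Hadoop tool; it assumes the values contain no XML special characters (&, <, >), and the demo writes into a scratch directory instead of the real etc/hadoop.

```shell
#!/bin/sh
# write_site_xml FILE NAME VALUE [NAME VALUE ...]
# Writes a Hadoop *-site.xml with one <property> per NAME/VALUE pair.
# Hypothetical helper; values are written as-is, without XML escaping.
write_site_xml() {
    out=$1
    shift
    {
        printf '<?xml version="1.0"?>\n<configuration>\n'
        while [ "$#" -ge 2 ]; do
            printf '    <property>\n        <name>%s</name>\n        <value>%s</value>\n    </property>\n' "$1" "$2"
            shift 2
        done
        printf '</configuration>\n'
    } > "$out"
}

# Demo: the two core-site.xml properties from this section, in a scratch dir.
conf_dir=$(mktemp -d)
write_site_xml "$conf_dir/core-site.xml" \
    fs.defaultFS hdfs://localhost:9090 \
    hadoop.tmp.dir file:/opt/hadoop/tmp
cat "$conf_dir/core-site.xml"
rm -rf "$conf_dir"
```

The same call shape would cover hdfs-site.xml, mapred-site.xml and yarn-site.xml, since they share the name/value property layout.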
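2.2.2. Configure hdfs-site.xml
[hadoop@centos05 hadoop]$ vim /opt/hadoop/hadoop-2.6.0/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop/hdfs/data</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>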
2.2.3. Configure mapred-site.xml
[hadoop@centos05 hadoop]$ cd /opt/hadoop/hadoop-2.6.0/etc/hadoop
[hadoop@centos05 hadoop]$ cp mapred-site.xml.template mapred-site.xml
[hadoop@centos05 hadoop]$ vim /opt/hadoop/hadoop-2.6.0/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
2.2.4. Configure yarn-site.xml
[hadoop@centos05 hadoop]$ vim /opt/hadoop/hadoop-2.6.0/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
2.2.5. Configure slaves
[hadoop@centos05 hadoop]$ vim /opt/hadoop/hadoop-2.6.0/etc/hadoop/slaves
centos05
2.2.6. Configure hadoop-env
Set the JAVA_HOME environment variable in hadoop-env.sh as follows:
[hadoop@centos05 hadoop]$ vim /opt/hadoop/hadoop-2.6.0/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/opt/java/jdk1.8.0_191
2.2.7. Configure yarn-env
Set the JAVA_HOME environment variable in yarn-env.sh as follows:
[hadoop@centos05 hadoop]$ vim /opt/hadoop/hadoop-2.6.0/etc/hadoop/yarn-env.sh
export JAVA_HOME=/opt/java/jdk1.8.0_191
2.2.8. Configure mapred-env
Set the JAVA_HOME environment variable in mapred-env.sh as follows:
[hadoop@centos05 hadoop]$ vim /opt/hadoop/hadoop-2.6.0/etc/hadoop/mapred-env.sh
export JAVA_HOME=/opt/java/jdk1.8.0_191
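Because the same export line goes into hadoop-env.sh, yarn-env.sh and mapred-env.sh, the edit can be scripted. The sketch below is one possible way, under stated assumptions: set_java_home is a hypothetical helper, the JDK path must not contain the characters | or &, and the demo works on scratch copies rather than the real files.

```shell
#!/bin/sh
# set_java_home FILE JAVA_HOME_PATH
# Replaces an existing "export JAVA_HOME=..." line in FILE, or appends one.
# Hypothetical helper; assumes the path contains no "|" or "&".
set_java_home() {
    file=$1
    jhome=$2
    if grep -q '^export JAVA_HOME=' "$file"; then
        # Go through a temp file instead of the non-portable "sed -i";
        # writing back with cat keeps the original file permissions.
        sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$jhome|" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" &&
            rm -f "$file.tmp"
    else
        printf 'export JAVA_HOME=%s\n' "$jhome" >> "$file"
    fi
}

# Demo on scratch files named like the three env scripts.
scratch=$(mktemp -d)
for f in hadoop-env.sh yarn-env.sh mapred-env.sh; do
    printf '%s\n' 'export JAVA_HOME=${JAVA_HOME}' > "$scratch/$f"
    set_java_home "$scratch/$f" /opt/java/jdk1.8.0_191
done
grep JAVA_HOME "$scratch"/*.sh
rm -rf "$scratch"
```

On the real installation the loop would point at /opt/hadoop/hadoop-2.6.0/etc/hadoop instead of the scratch directory.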
2.2.9. Configure HADOOP_PREFIX
Set the environment variables for the hadoop user on the host:
[hadoop@centos05 ~]$ vim .bash_profile
####HADOOP_PREFIX####
export HADOOP_PREFIX=/opt/hadoop/hadoop-2.6.0
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
Apply the environment variables:
[hadoop@centos05 ~]$ source .bash_profile
Note: echo $HADOOP_PREFIX should now print the Hadoop installation directory.
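Besides echoing the variable, it can be worth confirming that both directories actually ended up on PATH. A small sketch follows; path_contains is a hypothetical helper, and the fallback value for HADOOP_PREFIX is only the installation path used in this article.

```shell
#!/bin/sh
# path_contains DIR: succeed if DIR is one of the entries of $PATH.
# (Hypothetical helper, for illustration only.)
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Fall back to this article's install path if the variable is not set yet.
HADOOP_PREFIX=${HADOOP_PREFIX:-/opt/hadoop/hadoop-2.6.0}
for d in "$HADOOP_PREFIX/bin" "$HADOOP_PREFIX/sbin"; do
    if path_contains "$d"; then
        echo "ok:      $d is on PATH"
    else
        echo "missing: $d is not on PATH"
    fi
done
```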
3. Start Hadoop in pseudo-distributed mode
3.1. Start HDFS and YARN
- Format HDFS
[hadoop@centos05 ~]$ hdfs namenode -format
- Start DFS
[hadoop@centos05 ~]$ start-dfs.sh
- Start YARN
[hadoop@centos05 ~]$ start-yarn.sh
- Check the running processes
[hadoop@centos05 ~]$ jps
18265 DataNode
18615 ResourceManager
18463 SecondaryNameNode
31343 Jps
18728 NodeManager
18152 NameNode
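Reading the jps list by eye works for five daemons, but the check is easy to script. The sketch below assumes the five roles from the table in section 1; missing_daemons is a hypothetical helper that reads jps output on stdin, and the demo feeds it the listing shown above rather than calling jps.

```shell
#!/bin/sh
# missing_daemons: read `jps` output on stdin and print every expected
# daemon that does not appear in it. Prints nothing if all are present.
# (Hypothetical helper; the daemon list is taken from the role table.)
missing_daemons() {
    jps_out=$(cat)
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare against the whole second field, so "NameNode" is not
        # satisfied by a "SecondaryNameNode" line.
        printf '%s\n' "$jps_out" |
            awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' ||
            printf '%s\n' "$d"
    done
}

# Demo with the jps listing from this section: nothing is reported missing.
missing_daemons <<'EOF'
18265 DataNode
18615 ResourceManager
18463 SecondaryNameNode
31343 Jps
18728 NodeManager
18152 NameNode
EOF
```

On a live node the call would be jps | missing_daemons.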
Note: the commands to stop DFS and YARN are:
stop-dfs.sh
stop-yarn.sh
3.2. Start the cluster
HDFS and YARN can also be started with a single command (the script is deprecated, as its own output says, but still works):
Start: start-all.sh
Stop: stop-all.sh
- Startup output:
[hadoop@centos05 ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [centos05]
centos05: starting namenode, logging to /opt/hadoop/hadoop-2.6.0/logs/hadoop-hadoop-namenode-centos05.out
centos05: starting datanode, logging to /opt/hadoop/hadoop-2.6.0/logs/hadoop-hadoop-datanode-centos05.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop/hadoop-2.6.0/logs/hadoop-hadoop-secondarynamenode-centos05.out
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/hadoop-2.6.0/logs/yarn-hadoop-resourcemanager-centos05.out
centos05: starting nodemanager, logging to /opt/hadoop/hadoop-2.6.0/logs/yarn-hadoop-nodemanager-centos05.out
[hadoop@centos05 ~]$
- All processes after startup:
[hadoop@centos05 ~]$ jps
32640 NodeManager
529 Jps
32057 NameNode
32526 ResourceManager
32356 SecondaryNameNode
32172 DataNode
- YARN management UI: http://centos05:8088 (the address that appears in the job tracking URL in section 4.2)
- HDFS management UI: http://centos05:50070 (the default NameNode web port in Hadoop 2.x)
4. HDFS shell operations and a WordCount demo
4.1. Basic HDFS shell operations
- Create directories
[hadoop@centos05 ~]$ hadoop fs -mkdir /input_test
[hadoop@centos05 ~]$ hadoop fs -mkdir /output_test
- List directories
[hadoop@centos05 ~]$ hadoop fs -ls /
Found 3 items
drwxr-xr-x - hadoop supergroup 0 2018-11-27 23:04 /input_test
drwxr-xr-x - hadoop supergroup 0 2018-11-27 23:27 /output_test
drwx------ - hadoop supergroup 0 2018-11-27 23:08 /tmp
- Upload a file
[hadoop@centos05 /]$ hadoop fs -put /opt/hadoop/hadoop-2.6.0/share/doc/index.html /input_test
- Check the uploaded file
[hadoop@centos05 /]$ hadoop fs -ls /input_test/index.html
-rw-r--r-- 1 hadoop supergroup 19968 2018-11-28 10:08 /input_test/index.html
- View the contents of a text file
[hadoop@centos05 /]$ hadoop fs -cat /input_test/index.html
4.2. WordCount
Count the words in /input_test/index.html on HDFS with the WordCount job from the examples jar bundled with Hadoop.
- Run the test
[hadoop@centos05 /]$ hadoop jar /opt/hadoop/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.15.1.jar wordcount /input_test/index.html /output_test/runcount
18/11/28 10:18:53 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/11/28 10:18:54 INFO input.FileInputFormat: Total input paths to process : 1
18/11/28 10:18:54 INFO mapreduce.JobSubmitter: number of splits:1
18/11/28 10:18:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1543369969234_0002
18/11/28 10:18:56 INFO impl.YarnClientImpl: Submitted application application_1543369969234_0002
18/11/28 10:18:56 INFO mapreduce.Job: The url to track the job: http://centos05:8088/proxy/application_1543369969234_0002/
18/11/28 10:18:56 INFO mapreduce.Job: Running job: job_1543369969234_0002
18/11/28 10:19:16 INFO mapreduce.Job: Job job_1543369969234_0002 running in uber mode : false
18/11/28 10:19:16 INFO mapreduce.Job: map 0% reduce 0%
18/11/28 10:19:31 INFO mapreduce.Job: map 100% reduce 0%
18/11/28 10:19:43 INFO mapreduce.Job: map 100% reduce 100%
18/11/28 10:19:44 INFO mapreduce.Job: Job job_1543369969234_0002 completed successfully
18/11/28 10:19:45 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=13728
        FILE: Number of bytes written=313427
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=20075
        HDFS: Number of bytes written=11719
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=12498
        Total time spent by all reduces in occupied slots (ms)=9428
        Total time spent by all map tasks (ms)=12498
        Total time spent by all reduce tasks (ms)=9428
        Total vcore-milliseconds taken by all map tasks=12498
        Total vcore-milliseconds taken by all reduce tasks=9428
        Total megabyte-milliseconds taken by all map tasks=12797952
        Total megabyte-milliseconds taken by all reduce tasks=9654272
    Map-Reduce Framework
        Map input records=383
        Map output records=1087
        Map output bytes=18860
        Map output materialized bytes=13728
        Input split bytes=107
        Combine input records=1087
        Combine output records=504
        Reduce input groups=504
        Reduce shuffle bytes=13728
        Reduce input records=504
        Reduce output records=504
        Spilled Records=1008
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=174
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=5455101952
        Total committed heap usage (bytes)=165810176
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=19968
    File Output Format Counters
        Bytes Written=11719
[hadoop@centos05 /]$
- View the results
[hadoop@centos05 /]$ hadoop fs -ls /output_test/runcount/
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2018-11-28 10:19 /output_test/runcount/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 11719 2018-11-28 10:19 /output_test/runcount/part-r-00000
[hadoop@centos05 /]$ hadoop fs -cat /output_test/runcount/part-r-00000
2018-08-09 2
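To sanity-check the job output, the same count can be approximated locally with standard tools. This is a sketch under assumptions: local_wordcount is a hypothetical helper, it splits on whitespace the way the bundled example is generally understood to, and it sorts in byte order; whether its output matches part-r-00000 line for line is not something this article verified.

```shell
#!/bin/sh
# local_wordcount: read text on stdin and print "word<TAB>count" for each
# distinct whitespace-separated token, sorted by token in byte order.
# (Hypothetical helper, a local approximation of the WordCount example.)
local_wordcount() {
    tr -s '[:space:]' '\n' |               # one token per line
        sed '/^$/d' |                      # drop the empty leading line, if any
        LC_ALL=C sort |
        uniq -c |
        awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Demo on a tiny inline sample.
printf 'hadoop yarn hadoop\nhdfs  yarn hadoop\n' | local_wordcount
```

Against the real data, hadoop fs -cat /input_test/index.html could be piped into local_wordcount and the result compared with the contents of part-r-00000.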
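5. Problems encountered
5.1. WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Fix: in this build the problem occurs because the ${HADOOP_PREFIX}/lib/native directory contains no native libraries. Download the package from the official Hadoop site and copy the contents of its lib/native directory over.
5.2. openssl: false Cannot load libcrypto.so (libcrypto.so: cannot open shared object file: No such file or directory)!
Fix: create a libcrypto.so symlink in the /usr/lib64/ directory:
cd /usr/lib64/
ln -s /usr/lib64/libcrypto.so.1.0.1e libcrypto.so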
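- The following command prints more detailed logs to the terminal, which makes troubleshooting easier:
export HADOOP_ROOT_LOGGER=DEBUG,console
- Both problems above can be verified with the following command; the corresponding entries should report true:
hadoop checknative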
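The checknative report can also be filtered so that only the failing entries remain. The sketch below assumes each entry has the form "name: true|false ...", which matches the openssl line quoted in 5.2; failed_native_libs is a hypothetical helper, and the sample lines other than the openssl one are illustrative, not captured output.

```shell
#!/bin/sh
# failed_native_libs: read `hadoop checknative` output on stdin and print
# the name of every library whose status column is "false".
# (Hypothetical helper; assumes lines shaped like "name: true|false ...".)
failed_native_libs() {
    awk '$1 ~ /:$/ && $2 == "false" { sub(/:$/, "", $1); print $1 }'
}

# Demo with illustrative input; only the openssl line comes from this article.
failed_native_libs <<'EOF'
hadoop:  true /opt/hadoop/hadoop-2.6.0/lib/native/libhadoop.so
zlib:    true /lib64/libz.so.1
openssl: false Cannot load libcrypto.so (libcrypto.so: cannot open shared object file: No such file or directory)!
EOF
```

On a live node the call would be hadoop checknative | failed_native_libs; an empty result means no entry reported false.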
Note: ${HADOOP_PREFIX} denotes the Hadoop installation directory, in other words ${HADOOP_HOME}.
6. References