Distributed installation (at least three hosts):
Required software:
CentOS7
hadoop-2.7.3.tar.gz
jdk-8u102-linux-x64.tar.gz
Pre-installation preparation:
Configure passwordless SSH login
cd
ssh-keygen -t rsa
Press Enter at every prompt until it finishes
ssh-copy-id -i ~/.ssh/id_rsa.pub bigdata1
ssh-copy-id -i ~/.ssh/id_rsa.pub bigdata2
ssh-copy-id -i ~/.ssh/id_rsa.pub bigdata3
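To confirm the keys took effect, SSH to each host; the loop below (a quick sanity check, assuming the three host names above) should print each host name without prompting for a password:
for h in bigdata1 bigdata2 bigdata3; do ssh $h hostname; done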
Synchronize the clocks
Use a cron job to keep the hosts' clocks in sync
vim /etc/crontab
# /etc/crontab fields: minute hour day-of-month month day-of-week user command
0 0 1 * * root ntpdate -s time.windows.com
Alternatively, run a dedicated time server (e.g. ntpd) inside the cluster and sync the other hosts to it; that is not covered in detail here.
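Before touching the configuration files, unpack both tarballs on bigdata1 (the Hadoop tree is copied to the other hosts later with scp, but the JDK and the environment variables must be set up on every host). A minimal sketch, assuming everything lives under /root/training; the jdk1.8.0_102 directory name is what the JDK tarball typically extracts to, so adjust if yours differs:
tar -zxvf jdk-8u102-linux-x64.tar.gz -C /root/training/
tar -zxvf hadoop-2.7.3.tar.gz -C /root/training/
# append to ~/.bash_profile, then apply with: source ~/.bash_profile
export JAVA_HOME=/root/training/jdk1.8.0_102
export HADOOP_HOME=/root/training/hadoop-2.7.3
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Also set JAVA_HOME explicitly in etc/hadoop/hadoop-env.sh, since the start scripts do not reliably pick it up over SSH.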
Next, edit the configuration files under /root/training/hadoop-2.7.3/etc/hadoop:
(*)hdfs-site.xml
<!--Replication factor for data blocks; the default is 3-->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!--Whether HDFS permission checking is enabled; default: true-->
<!--
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
-->
(*)core-site.xml
<!--Address of the NameNode-->
<property>
<name>fs.defaultFS</name>
<value>hdfs://bigdata1:9000</value>
</property>
<!--Directory where HDFS data is stored; the default is the Linux tmp directory-->
<property>
<name>hadoop.tmp.dir</name>
<value>/root/training/hadoop-2.7.3/tmp</value>
</property>
(*)mapred-site.xml
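Hadoop 2.7.3 ships only a template for this file, so create it before editing:
cd /root/training/hadoop-2.7.3/etc/hadoop
cp mapred-site.xml.template mapred-site.xml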
<!--MapReduce programs run on the Yarn framework-->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
(*)yarn-site.xml
<!--Address of the ResourceManager-->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>bigdata1</value>
</property>
<!--How the NodeManager runs MapReduce tasks: via the shuffle auxiliary service-->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
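One more file a distributed install needs: etc/hadoop/slaves, which tells start-dfs.sh and start-yarn.sh where to launch the DataNode and NodeManager daemons. A sketch, assuming bigdata2 and bigdata3 are the workers (add bigdata1 as well if it should also store data):
vim /root/training/hadoop-2.7.3/etc/hadoop/slaves
bigdata2
bigdata3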
Format the NameNode: hdfs namenode -format
Look for this line in the output: Storage directory /root/training/hadoop-2.7.3/tmp/dfs/name has been successfully formatted.
Copy the configured installation to the other two hosts (the /root/training directory must already exist on them):
scp -r /root/training/hadoop-2.7.3 bigdata2:/root/training/hadoop-2.7.3
scp -r /root/training/hadoop-2.7.3 bigdata3:/root/training/hadoop-2.7.3
Start the cluster (from the master, bigdata1): start-all.sh = start-dfs.sh + start-yarn.sh
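Before the formal checks below, jps on each node gives a quick read on what started. With the role layout assumed in this walkthrough, bigdata1 should show NameNode, SecondaryNameNode and ResourceManager, while bigdata2 and bigdata3 should show DataNode and NodeManager.
jps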
Verification
(*)Command line: hdfs dfsadmin -report
(*)Web UIs:
HDFS: http://192.168.157.12:50070/
Yarn: http://192.168.157.12:8088
(*)Demo: run a sample MapReduce job
Example jar: /root/training/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar
hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /input/data.txt /output/wc1204
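Before running the job above, the input file must be in HDFS, and the output directory /output/wc1204 must not yet exist. A minimal preparation and result check (data.txt stands for any local text file):
hdfs dfs -mkdir -p /input
hdfs dfs -put data.txt /input/
# after the job finishes:
hdfs dfs -cat /output/wc1204/part-r-00000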