This article shows how to run spark-shell on Hadoop YARN. It covers installing Scala and Spark, running spark-shell locally and on YARN, and setting up a Spark Standalone cluster.
1. Spark mode architecture diagram

![Spark mode architecture diagram](https://cache.yisu.com/upload/information/20210522/355/683134.png)

2. Download and install Scala
a. Official archive: http://www.scala-lang.org/files/archive/
b. Choose a version, copy its link and download it with wget: `wget http://www.scala-lang.org/files/archive/scala-2.11.6.tgz`
c. Extract the archive and move Scala to /usr/local: `tar xvf scala-2.11.6.tgz`, then `sudo mv scala-2.11.6 /usr/local/scala`
d. Set the environment variables. Open ~/.bashrc with `sudo gedit ~/.bashrc` and add:
`export SCALA_HOME=/usr/local/scala`
`export PATH=$PATH:$SCALA_HOME/bin`
Then run `source ~/.bashrc` to apply the change.
e. Start Scala: `hduser@master:~$ scala`

3. Install Spark
a. Official site: http://spark.apache.org/downloads.html
b. Choose release 1.4 and the package type "Pre-built for Hadoop 2.6 and later", then copy the download link.
c. Download it with wget: `wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.0-bin-hadoop2.6.tgz`
d. Extract the archive and move it to /usr/local/spark/: `tar xvf spark-1.4.0-bin-hadoop2.6.tgz`, then `sudo mv spark-1.4.0-bin-hadoop2.6 /usr/local/spark`
e. Edit the environment variables. Open ~/.bashrc with `sudo gedit ~/.bashrc` and add:
`export SPARK_HOME=/usr/local/spark`
`export PATH=$PATH:$SPARK_HOME/bin`
f. Run `source ~/.bashrc` to apply the change.

4. Start the interactive spark-shell
`hduser@master:~$ spark-shell`

5. Start Hadoop (the command reference below uses Hadoop's `start-all.sh`, then `jps` to check the running daemons).

6. Run spark-shell locally
a. `spark-shell --master local[4]` (runs Spark locally with 4 worker threads)
b. Read a local file and count its lines:
`val textFile=sc.textFile("file:/usr/local/spark/README.md")`
`textFile.count`

7. Run spark-shell on Hadoop YARN
`SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell`
The variable assignments sit on the same line as the command, so they apply only to that spark-shell process. What each part does:
a. `SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar` sets the path of the Spark assembly jar.
b. `HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop` sets the Hadoop configuration directory.
c. `MASTER=yarn-client` selects yarn-client mode (the driver runs locally, the executors run in YARN containers).
d. `/usr/local/spark/bin/spark-shell` is the full path of the spark-shell to run.

8. Build a Spark Standalone Cluster environment
a. Copy the template, then edit the copy: `cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh`
b. Configure spark-env.sh. Open it with `sudo gedit /usr/local/spark/conf/spark-env.sh` and add:
`export SPARK_MASTER_IP=master  # master's IP address or hostname`
`export SPARK_WORKER_CORES=1  # CPU cores each worker uses`
`export SPARK_WORKER_MEMORY=600m  # memory each worker uses`
`export SPARK_WORKER_INSTANCES=1  # number of worker instances per node`
Keep a close eye on memory: with Hadoop and Spark running across several VMs, 8 GB of RAM is not enough, since both are very memory-hungry and virtualization costs a considerable share of the host's resources.
c. SSH into data1 and data2 and, on each, create the Spark directory owned by hduser:
`sudo mkdir /usr/local/spark`
`sudo chown hduser:hduser /usr/local/spark`
d. Copy Spark from master to data1 and data2:
`sudo scp -r /usr/local/spark hduser@data1:/usr/local`
`sudo scp -r /usr/local/spark hduser@data2:/usr/local`
e. Edit the slaves file with `sudo gedit /usr/local/spark/conf/slaves` and list the workers, one per line:
`data1`
`data2`

9. Run spark-shell on Spark Standalone
a. Start the Spark Standalone Cluster (use the full path, since Hadoop also ships a `start-all.sh`): `/usr/local/spark/sbin/start-all.sh`
b. Run `spark-shell --master spark://master:7077`
c. Open the Spark Standalone web UI: http://master:8080/
d. Stop the Spark Standalone Cluster: `/usr/local/spark/sbin/stop-all.sh`

10. Command reference (the shell history from this session, in order)
`scala`
`jps`
`wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.0-bin-hadoop2.6.tgz`
`ping www.baidu.com`
`ssh data3`
`ssh data2`
`ssh data1`
`jps`
`start-all.sh`
`jps`
`spark-shell`
`spark-shell --master local[4]`
`SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell`
`ssh data2`
`ssh data1`
`cd /usr/local/hadoop/etc/hadoop/`
`ll`
`sudo gedit masters`
`sudo gedit slaves`
`sudo gedit /etc/hosts`
`sudo gedit hdfs-site.xml`
`sudo rm -rf /usr/local/hadoop/hadoop_data/hdfs`
`mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode`
`sudo chown -R hduser:hduser /usr/local/hadoop`
`hadoop namenode -format`
`start-all.sh`
`jps`
`spark-shell`
`SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell`
`ssh data1`
`ssh data2`
`ssh data1`
`start-all.sh`
`jps`
`cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh`
`sudo gedit /usr/local/spark/conf/spark-env.sh`
`sudo scp -r /usr/local/spark hduser@data1:/usr/local`
`sudo scp -r /usr/local/spark hduser@data2:/usr/local`
`sudo gedit /usr/local/spark/conf/slaves`
`/usr/local/spark/sbin/start-all.sh`
`spark-shell --master spark://master:7077`
`/usr/local/spark/sbin/stop-all.sh`
`jps`
`stop-all.sh`
`history`
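Sections 2 and 3 add the `SCALA_HOME` and `SPARK_HOME` exports to ~/.bashrc by hand. As an illustration only, the sketch below scripts that edit so that rerunning it does not duplicate lines. `add_line_once` and `configure_env` are hypothetical helper names, not part of Scala or Spark, and the paths are the ones assumed in this article. It needs no sudo, since ~/.bashrc belongs to the current user.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Scala or Spark): script the ~/.bashrc
# edits from sections 2 and 3 so that rerunning them adds no duplicates.

# Append LINE to FILE unless FILE already contains exactly that line.
add_line_once() {
  local file="$1" line="$2"
  # -q: no output, -x: whole-line match, -F: fixed string (no regex).
  # A missing file makes grep fail, so the line is appended and the file created.
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Add the SCALA_HOME/SPARK_HOME exports used in this article.
configure_env() {
  local rc="${1:-$HOME/.bashrc}"
  # Single quotes keep $PATH, $SCALA_HOME and $SPARK_HOME unexpanded here;
  # they expand later, when the shell sources the rc file.
  add_line_once "$rc" 'export SCALA_HOME=/usr/local/scala'
  add_line_once "$rc" 'export PATH=$PATH:$SCALA_HOME/bin'
  add_line_once "$rc" 'export SPARK_HOME=/usr/local/spark'
  add_line_once "$rc" 'export PATH=$PATH:$SPARK_HOME/bin'
}
```

For example, run `configure_env` and then `source ~/.bashrc`. Passing another file, such as `configure_env /tmp/test_bashrc`, lets you check the result before touching the real ~/.bashrc.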
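Section 7 passes the Spark jar and the master through the `SPARK_JAR` and `MASTER` environment variables. Below is a sketch of the same launch using spark-shell's command-line flags instead, assuming Spark 1.4 as in this article: `--master yarn-client` is the 1.x spelling of YARN client mode, and `spark.yarn.jar` should be the 1.x configuration key that supersedes `SPARK_JAR`. Spark 2.x changed both, so check the documentation for your version. `launch_yarn_shell`, `SPARK_ASSEMBLY` and `DRY_RUN` are hypothetical names used only here.

```shell
#!/bin/sh
# Hypothetical launcher (names are illustrative): start spark-shell on YARN in
# client mode using command-line flags instead of the SPARK_JAR and MASTER
# environment variables. Paths default to the Spark 1.4 layout in this article.
# Set DRY_RUN=1 to print the command instead of running it.

launch_yarn_shell() {
  local spark_home="${SPARK_HOME:-/usr/local/spark}"
  # SPARK_ASSEMBLY is a made-up override for the assembly jar path.
  local jar="${SPARK_ASSEMBLY:-$spark_home/lib/spark-assembly-1.4.0-hadoop2.6.0.jar}"

  # spark-shell still needs HADOOP_CONF_DIR to find the YARN and HDFS settings.
  HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/usr/local/hadoop/etc/hadoop}"
  export HADOOP_CONF_DIR

  # Build the command; any arguments given to this function are passed through.
  set -- "$spark_home/bin/spark-shell" --master yarn-client \
    --conf "spark.yarn.jar=$jar" "$@"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

`DRY_RUN=1 launch_yarn_shell` only prints the resulting command, which is a cheap way to confirm the paths before starting a YARN application; extra spark-shell options can be appended, as in `launch_yarn_shell --driver-memory 512m`.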
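The manual edits in section 8 (copying the template, adding the four `export` lines, listing the workers in `slaves`) can also be scripted. A minimal sketch, meant to be run once against a fresh conf directory; `write_standalone_conf` is a hypothetical name, and the values are the ones from this article, so adjust cores and memory to your VMs. Note that every comment in spark-env.sh needs a leading `#`, because Spark sources the file as a shell script.

```shell
#!/bin/sh
# Hypothetical helper: write the section 8 spark-env.sh settings and the
# slaves file. Intended to be run once on a fresh conf directory; running it
# again appends the settings a second time.
# Usage: write_standalone_conf CONF_DIR MASTER_HOST WORKER_HOST...

write_standalone_conf() {
  local conf_dir="$1" master="$2"
  shift 2   # the remaining arguments are the worker hostnames

  # Start from the template shipped with Spark, as the article does.
  if [ -f "$conf_dir/spark-env.sh.template" ] && [ ! -f "$conf_dir/spark-env.sh" ]; then
    cp "$conf_dir/spark-env.sh.template" "$conf_dir/spark-env.sh"
  fi

  # Comments need a leading '#': spark-env.sh is sourced as a shell script.
  cat >> "$conf_dir/spark-env.sh" <<EOF
export SPARK_MASTER_IP=$master      # master hostname or IP
export SPARK_WORKER_CORES=1         # CPU cores per worker
export SPARK_WORKER_MEMORY=600m     # memory per worker; size this to your VMs
export SPARK_WORKER_INSTANCES=1     # worker instances per node
EOF

  # One worker hostname per line.
  printf '%s\n' "$@" > "$conf_dir/slaves"
}
```

With the article's layout the call would be `write_standalone_conf /usr/local/spark/conf master data1 data2`, run before copying Spark to the workers so they receive the same spark-env.sh.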
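Section 8 repeats the directory creation and the `scp` copy by hand for data1 and data2. The sketch below does the same for every host listed in `conf/slaves`, assuming hduser can SSH into each worker, may use sudo there, and can read /usr/local/spark locally. `distribute_spark`, `run` and `DRY_RUN` are hypothetical names. Unlike the article, it runs `scp` without sudo so that hduser's own SSH keys are used; the target directory is already owned by hduser at that point.

```shell
#!/bin/sh
# Hypothetical helper: for every worker in conf/slaves, create the Spark
# directory (owned by hduser) and copy Spark over, as section 8 does by hand.
# Assumes SSH access as hduser to each worker and sudo rights there.
# Set DRY_RUN=1 to print the commands instead of running them.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

distribute_spark() {
  local spark_home="${1:-/usr/local/spark}"
  local parent hosts host
  parent=$(dirname "$spark_home")
  # Worker hostnames, skipping blank lines and comment lines.
  hosts=$(grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$spark_home/conf/slaves")
  # Loop over a variable rather than piping into 'while read', so ssh cannot
  # consume the rest of the host list from standard input.
  for host in $hosts; do
    # -t gives sudo a terminal in case it prompts for a password.
    run ssh -t "hduser@$host" "sudo mkdir -p $spark_home && sudo chown hduser:hduser $spark_home"
    # Plain scp (no sudo) uses hduser's own SSH keys; the target directory
    # is already owned by hduser after the previous command.
    run scp -r "$spark_home" "hduser@$host:$parent"
  done
}
```

Because the slaves file is read at run time, adding a worker such as data3 later only means adding its hostname to `conf/slaves` and rerunning `distribute_spark`; `DRY_RUN=1 distribute_spark` shows the commands first.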
That's all for "How to run spark-shell on Hadoop YARN". Thanks for reading; for more articles, follow the Yisu Cloud (亿速云) industry news channel.