This article explains how to set up a Spark development environment in Eclipse. It walks through a concrete setup and covers the problems that commonly come up along the way.
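First, download the prebuilt Spark release that matches the Hadoop version of your cluster and extract it to the target location. Pay attention to user permissions.
Change into the extracted SPARK_HOME directory.
Set SPARK_HOME in /etc/profile or ~/.bashrc.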
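As a rough illustration of the SPARK_HOME step, the sketch below appends the variable to a profile file only if it is not already there. The helper name add_spark_home is hypothetical, and the install path in the example comment is an assumption borrowed from the SPARK_LOCAL_DIRS value used in spark-env.sh in this article; adjust both to your own layout.

```shell
# add_spark_home PROFILE_FILE SPARK_DIR
# Hypothetical helper: appends SPARK_HOME (and its bin directory on PATH)
# to the given profile file, unless the file already exports SPARK_HOME.
add_spark_home() {
  profile_file=$1
  spark_dir=$2
  # Skip if SPARK_HOME is already exported, so repeated runs do not duplicate lines.
  if grep -q '^export SPARK_HOME=' "$profile_file" 2>/dev/null; then
    return 0
  fi
  {
    printf 'export SPARK_HOME=%s\n' "$spark_dir"
    # Single quotes keep $PATH and $SPARK_HOME unexpanded in the profile file.
    printf 'export PATH=$PATH:$SPARK_HOME/bin\n'
  } >> "$profile_file"
}

# Example (assumed install path):
#   add_spark_home ~/.bashrc /home/hadoop/cluster/spark-1.4.0-bin-hadoop2.6
#   source ~/.bashrc
```

Writing through a function rather than editing the file by hand keeps the step repeatable; the new value only takes effect in shells that source the file afterwards.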
cd $SPARK_HOME/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export SCALA_HOME=/home/hadoop/cluster/scala-2.10.5
export JAVA_HOME=/home/hadoop/cluster/jdk1.7.0_79
export HADOOP_HOME=/home/hadoop/cluster/hadoop-2.6.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# This must be set to an IP address; otherwise Eclipse fails to connect later with:
# All masters are unresponsive! Giving up.
SPARK_MASTER_IP=10.16.112.121
SPARK_LOCAL_DIRS=/home/hadoop/cluster/spark-1.4.0-bin-hadoop2.6
SPARK_DRIVER_MEMORY=1G
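Since SPARK_MASTER_IP has to be an IP address rather than a hostname, a small pre-flight check can catch a wrong value before the master is started. The following is only a sketch and not part of Spark; is_ipv4 is a hypothetical helper name, and it checks the dotted-quad form only.

```shell
# is_ipv4 VALUE: succeeds if VALUE looks like a dotted-quad IPv4 address
# with every octet in the range 0..255.
is_ipv4() {
  # Shape check: four groups of one to three digits separated by dots.
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
  # Split the value on dots into positional parameters.
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  # Range check for each octet.
  for octet in "$@"; do
    [ "$octet" -le 255 ] || return 1
  done
  return 0
}

# Example with the address used in this article:
if is_ipv4 "10.16.112.121"; then
  echo "SPARK_MASTER_IP looks like an IPv4 address"
fi
```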
cd $SPARK_HOME
sbin/start-master.sh
sbin/start-slave.sh
You can now open http://yourip:8080 in a browser to check the status of the Spark cluster.
The default Spark master URL at this point is: spark://10.16.112.121:7077
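If no browser is at hand, the same check can be approximated from a terminal by testing whether the master port (7077) and the web UI port (8080) accept TCP connections. This is a minimal sketch under a few assumptions: port_open is a hypothetical helper, and it relies on bash's /dev/tcp redirection and the coreutils timeout command being available on the machine.

```shell
# port_open HOST PORT: succeeds if a TCP connection to HOST:PORT can be
# opened within 3 seconds. Uses bash's /dev/tcp and coreutils timeout;
# if either is missing, the check simply reports failure.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example (yourip is the placeholder used in this article):
#   port_open yourip 7077 && echo "master port reachable"
#   port_open yourip 8080 && echo "web UI port reachable"
```

An open port only shows that something is listening; the web UI remains the place to confirm that workers have actually registered.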
First, download the Scala IDE for Eclipse from the official Scala website.
Open the IDE, create a new Maven project, and fill in pom.xml as follows:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>spark.test</groupId>
  <artifactId>FirstTrySpark</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <properties>
    <!-- Fill in the versions that match your cluster -->
    <hadoop.version>2.6.0</hadoop.version>
    <spark.version>1.4.0</spark.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
      <scope>provided</scope>
      <!-- Exclude the servlet dependency, otherwise it causes conflicts -->
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/java</sourceDirectory>
    <plugins>
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.0</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <!-- Full Scala version, matching the SCALA_HOME used above -->
          <scalaVersion>2.10.5</scalaVersion>
        </configuration>
      </plugin>
      <!-- Bind maven-assembly-plugin to the package phase; this builds a
           jar-with-dependencies suitable for deployment to a cluster. -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>2.5.5</version>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
    </plugins>
    <resources>
      <resource>
        <directory>src/main/resources</directory>
      </resource>
    </resources>
  </build>
</project>
Create the following source folders:
src/main/java       # Java code
src/main/scala      # Scala code
src/main/resources  # resource files
src/test/java       # Java test code
src/test/scala      # Scala test code
src/test/resources  # test resource files
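The folder layout above can also be created from a terminal instead of through the Eclipse dialogs. The sketch below is one way to do it; make_source_folders is a hypothetical helper name, and the project path in the example comment is an assumption. Eclipse may still need a project refresh before the folders show up as source folders.

```shell
# make_source_folders PROJECT_DIR
# Creates the six Maven-style source folders listed above under PROJECT_DIR.
make_source_folders() {
  project_dir=$1
  for dir in src/main/java src/main/scala src/main/resources \
             src/test/java src/test/scala src/test/resources; do
    # -p creates missing parents and is a no-op for folders that already exist.
    mkdir -p "$project_dir/$dir"
  done
}

# Example (assumed workspace location):
#   make_source_folders ~/workspace/FirstTrySpark
```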
The environment is now fully set up.
The test code is as follows:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

/**
 * @author clebeg
 */
object FirstTry {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf
    // Replace yourip with the master IP configured in spark-env.sh
    conf.setMaster("spark://yourip:7077")
    conf.set("spark.app.name", "first-tryspark")
    val sc = new SparkContext(conf)
    // Read the linkage data set from HDFS and print its first line
    val rawblocks = sc.textFile("hdfs://yourip:9000/user/hadoop/linkage")
    println(rawblocks.first)
    // Release the cluster resources held by this application
    sc.stop()
  }
}
The first problem you may run into is this warning while the job hangs:
Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Analysis: opening the run log for the corresponding application ID shows the following error:
15/10/10 08:49:01 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
15/10/10 08:49:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/10 08:49:02 INFO spark.SecurityManager: Changing view acls to: hadoop,Administrator
15/10/10 08:49:02 INFO spark.SecurityManager: Changing modify acls to: hadoop,Administrator
15/10/10 08:49:02 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop, Administrator); users with modify permissions: Set(hadoop, Administrator)
15/10/10 08:49:02 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/10/10 08:49:02 INFO Remoting: Starting remoting
15/10/10 08:49:02 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@10.16.112.121:58708]
15/10/10 08:49:02 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port 58708.
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:146)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:245)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:97)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:159)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:65)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    ... 4 more
15/10/10 08:51:02 INFO util.Utils: Shutdown hook called
On closer inspection this turned out to be a permissions problem. Stop Hadoop and add the following to etc/hadoop/core-site.xml:
<property>
  <name>hadoop.security.authorization</name>
  <value>false</value>
</property>
With this setting anyone can read the data. Once Hadoop was started again, the problem was gone.
When running from Eclipse on Windows, you may also see this error:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
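Go to http://www.barik.net/archive/2015/01/19/172716/ and download the rebuilt Hadoop 2.6 package that includes winutils.exe. Make sure the download matches your own Hadoop version.
Extract it to the target location and set the HADOOP_HOME environment variable. Be sure to restart Eclipse afterwards. That resolves the error.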
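The "null" in that error path appears where the Hadoop home directory would normally be, which is why setting HADOOP_HOME matters. As a quick sanity check before restarting Eclipse, something like the following sketch can confirm that winutils.exe sits where Hadoop will look for it. It assumes a Unix-style shell on Windows (for example Git Bash or Cygwin), and check_winutils is a hypothetical helper name.

```shell
# check_winutils HADOOP_HOME_DIR
# Succeeds only if HADOOP_HOME_DIR is non-empty and contains bin/winutils.exe.
check_winutils() {
  hadoop_home=$1
  if [ -z "$hadoop_home" ]; then
    echo "HADOOP_HOME is not set"
    return 1
  fi
  if [ ! -f "$hadoop_home/bin/winutils.exe" ]; then
    echo "winutils.exe not found under $hadoop_home/bin"
    return 1
  fi
  echo "winutils.exe found under $hadoop_home/bin"
}

# Example:
#   check_winutils "$HADOOP_HOME"
```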
Where can the data used in this article be obtained? From http://bit.ly/1Aoywaq, using the following commands:
mkdir linkage
cd linkage/
# -L follows the redirect from the short URL
curl -L -o donation.zip http://bit.ly/1Aoywaq
unzip donation.zip
unzip "block_*.zip"
hdfs dfs -mkdir -p /user/hadoop/linkage
hdfs dfs -put block_*.csv /user/hadoop/linkage