This article walks through how to debug and troubleshoot a WordCount jar that is built in IDEA and submitted to Spark.
Based on:
macOS
Spark 2.4.3
(Spark runs in standalone mode; for the setup see http://blog.itpub.net/69908925/viewspace-2644303/)
Scala 2.12.8 (installed locally)
IDEA 2019
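1 In IDEA: File > Project Structure > Libraries > Scala SDK
Select version 2.11.12.
The version chosen here must match the Scala version that Spark runs on. The default is the locally installed Scala 2.12.8, and a jar built against it fails in the main class when run on Spark (see the error in step 2).
2 Create a new project and add the dependencies to pom.xml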
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.ny.service</groupId>
    <artifactId>scala517</artifactId>
    <version>1.0</version>
    <dependencies>
        <!-- Change every dependency below to match your own Scala, Spark and Hadoop versions -->
        <!-- https://mvnrepository.com/artifact/org.scala-lang/scala-library -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.12</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.4.3</version>
        </dependency>
    </dependencies>
    <build>
        <!-- Main source directory; adjust to your own layout. If you have tests, also add a testSourceDirectory -->
        <sourceDirectory>src/main/scala</sourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <!--<transformers>-->
                            <!--<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">-->
                            <!--<mainClass></mainClass>-->
                            <!--</transformer>-->
                            <!--</transformers>-->
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <useUniqueVersions>false</useUniqueVersions>
                            <classpathPrefix>lib/</classpathPrefix>
                            <!-- Change to your own package.ClassName (in IDEA: right-click the class > Copy Reference) -->
                            <mainClass>com.ny.service.WordCount</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
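For scala-library, choose the Scala version that Spark itself uses, 2.11.12 (also the latest 2.11.x release).
For org.apache.spark, likewise choose the artifact built for Scala 2.11 (spark-core_2.11).
Otherwise the job fails in the main class with an error like this: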
19/05/16 10:52:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:60010 (size: 22.9 KB, free: 366.3 MB)
19/05/16 10:52:03 INFO SparkContext: Created broadcast 0 from textFile at WordCount.scala:18
Exception in thread "main" java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/java8/JFunction2$mcIII$sp
at com.nyc.WordCount$.main(WordCount.scala:24)
at com.nyc.WordCount.main(WordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
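How to check which Scala version Spark uses
Go to the jars directory of the Spark installation and look at the version in the scala-library jar name (for Spark 2.4.3 this is scala-library-2.11.12.jar):
/usr/local/opt/spark-2.4.3/jars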
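The same lookup can be scripted. The following is a minimal sketch, not part of the original setup: the function names spark_scala_version and scala_binary_version are made up for illustration, and SPARK_HOME is assumed to point at the install directory shown above.

```shell
# Assumption: SPARK_HOME is the Spark install directory used in this article.
SPARK_HOME="${SPARK_HOME:-/usr/local/opt/spark-2.4.3}"

# Print the full Scala version taken from the scala-library jar name,
# e.g. scala-library-2.11.12.jar -> 2.11.12 (prints nothing if no jar is found).
spark_scala_version() {
  for jar in "$1"/jars/scala-library-*.jar; do
    [ -e "$jar" ] || continue
    name=${jar##*/}              # drop the directory part
    name=${name#scala-library-}  # drop the artifact prefix
    echo "${name%.jar}"          # drop the .jar extension
  done
}

# Reduce a full version such as 2.11.12 to the binary version 2.11,
# which is the suffix used in artifact IDs like spark-core_2.11.
scala_binary_version() {
  echo "${1%.*}"
}

full=$(spark_scala_version "$SPARK_HOME")
if [ -n "$full" ]; then
  echo "scala-library version: $full"
  echo "artifact suffix: _$(scala_binary_version "$full")"
else
  echo "no scala-library jar found under $SPARK_HOME/jars"
fi
```

The full version goes into the scala-library dependency and the suffix into the spark-core artifactId in pom.xml. Running bin/spark-submit --version should also report the Scala version in its banner.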
3 WordCount test program
package com.ny.service

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // 1 Create the configuration
    val conf = new SparkConf().setAppName("wc")
    // 2 Create the SparkContext
    val sc = new SparkContext(conf)
    // 3 Processing logic
    // Read the input file (first argument)
    val lines = sc.textFile(args(0))
    // Flatten each line into words
    val words = lines.flatMap(_.split(" "))
    // Map each word to a (word, 1) pair
    val k2v = words.map((_, 1))
    // Sum the counts per word
    val results = k2v.reduceByKey(_ + _)
    // Save the results (second argument)
    results.saveAsTextFile(args(1))
    // 4 Stop the context
    sc.stop()
  }
}
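4 Package
Build the jar (the shade plugin in the pom above runs in the Maven package phase), then move it into the Spark home directory. Spark runs in standalone mode here, so no Hadoop cluster is started. The original- prefix marks the unshaded jar that maven-shade-plugin keeps next to the shaded one; it is enough here because Spark already provides the Spark and Scala classes at runtime.
nancylulululu:spark-2.4.3 nancy$ mv /Users/nancy/IdeaProjects/scala517/target/original-scala517-1.0.jar wc.jar
5 Run with spark-submit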
bin/spark-submit \
--class com.ny.service.WordCount \
--master spark://localhost:7077 \
./wc.jar \
file:///usr/local/opt/spark-2.4.3/test/1test \
file:///usr/local/opt/spark-2.4.3/test/out
If the files live on Hadoop, replace the file:// paths with hdfs:// file system paths.
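As a rough illustration of the HDFS variant, the sketch below prints the argument list instead of running the job. The NameNode address hdfs://localhost:9000, the /user/nancy directory and the helper name wc_submit_args are assumptions made for this example; only the class name, master URL and jar name come from the command above.

```shell
# Hypothetical HDFS location; replace host, port and directory with your own.
HDFS_BASE="${HDFS_BASE:-hdfs://localhost:9000/user/nancy}"

# Print the spark-submit arguments for the WordCount job, one per line.
# $1 = input path, $2 = output path.
wc_submit_args() {
  printf '%s\n' \
    --class com.ny.service.WordCount \
    --master spark://localhost:7077 \
    ./wc.jar \
    "$1" \
    "$2"
}

# Dry run: show what would be passed to bin/spark-submit.
wc_submit_args "$HDFS_BASE/1test" "$HDFS_BASE/out"
```

To run it for real from the Spark home directory, the output can be passed on as bin/spark-submit $(wc_submit_args "$HDFS_BASE/1test" "$HDFS_BASE/out"), which works as long as the paths contain no whitespace. The output directory must not exist yet, otherwise saveAsTextFile fails.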
Check the result files:
nancylulululu:out nancy$ ls
_SUCCESS    part-00000    part-00001
nancylulululu:out nancy$ cat part-00000
(scala,2)
(hive,1)
(mysql,1)
(hello,5)
(java,2)
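When debugging, it can help to cross-check these counts without Spark. The sketch below reproduces the split, map and reduceByKey steps with standard shell tools; the function name local_wordcount and the sample input are invented for illustration. One known difference: it ignores empty tokens, whereas split(" ") in the Scala program can emit empty strings for blank lines or repeated spaces.

```shell
# Count space-separated words in a text file and print (word,count) lines,
# sorted so that two runs can be compared with diff.
local_wordcount() {
  tr -s ' ' '\n' < "$1" |
    awk 'NF { n[$0]++ } END { for (w in n) printf "(%s,%d)\n", w, n[w] }' |
    LC_ALL=C sort
}

# Hypothetical sample input, written to a temporary file.
sample=$(mktemp)
printf 'hello scala\nhello java\n' > "$sample"
local_wordcount "$sample"
rm -f "$sample"
```

For the job above, the result of local_wordcount on the input file can be compared with the Spark output after sorting it the same way, for example cat part-* | LC_ALL=C sort inside the out directory.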
Original article: http://blog.itpub.net/69908925/viewspace-2644643/