Creating an Assembled JAR for a Standalone Spark Application
By Prabeesh Keezhathra
In this post, we will discuss how to create an assembled JAR for a standalone Spark application using the sbt-assembly plugin. In one of my previous posts, we discussed how to build a standalone Spark application using the sbt-eclipse plugin. Now we will take it one step further and show how to create a fat JAR for your Spark project using the sbt-assembly plugin.
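To make the packaging steps concrete, assume a minimal standalone application along these lines (the object name, file path, and input file are hypothetical examples, not from the original project; the API shown is the Spark 0.9/1.x-era SparkContext API):

```scala
// src/main/scala/SimpleApp.scala -- hypothetical example application
import org.apache.spark.SparkContext

object SimpleApp {
  def main(args: Array[String]) {
    // Run locally; on a cluster you would pass the master URL instead
    val sc = new SparkContext("local", "Simple App")
    val logData = sc.textFile("input.txt") // hypothetical input file
    val numLines = logData.count()
    println("Lines in file: " + numLines)
    sc.stop()
  }
}
```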
The first step in creating an assembled JAR for your Spark application is to add the sbt-assembly plugin. To do this, add the following line to the project/plugins.sbt file:
```scala
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.1")
```
Next, you will need to specify sbt-assembly.git as a dependency in the project/project/build.scala file:
```scala
import sbt._

object Plugins extends Build {
  lazy val root = Project("root", file(".")) dependsOn(
    uri("git://github.com/sbt/sbt-assembly.git#0.9.1")
  )
}
```
In the build.sbt file, add the following contents:
```scala
import AssemblyKeys._ // put this at the top of the file, leave the next line blank

assemblySettings
```
You can configure the assembly plugin further using its full keys; for more details, refer to the sbt-assembly documentation. The available keys include: target, assembly-jar-name, test, assembly-option, main-class, full-classpath, dependency-classpath, assembly-excluded-files, and assembly-excluded-jars.
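For example, a build.sbt that sets a custom JAR name and main class via these keys might look like the following sketch (the project name, version, JAR name, and class name are illustrative assumptions, not from the original project):

```scala
import AssemblyKeys._ // sbt-assembly 0.9.x keys

assemblySettings

name := "MySparkApp" // hypothetical project name

version := "1.0" // hypothetical version

// Override the default assembled JAR name (the assembly-jar-name key)
jarName in assembly := "my-spark-app.jar"

// Set the entry point recorded in the JAR manifest (the main-class key)
mainClass in assembly := Some("SimpleApp")
```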
If multiple files share the same relative path, the default strategy is to verify that all candidates have the same contents and error out otherwise. This behavior can be configured for Spark projects using the assembly-merge-strategy as follows:
```scala
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
    case PathList("org", "apache", xs @ _*) => MergeStrategy.last
    case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
    case "about.html" => MergeStrategy.rename
    case x => old(x)
  }
}
```
Once you have added the sbt-assembly plugin and configured the assembly settings and merge strategy, you can create the fat JAR for your Spark application. From the root folder of your project, run the following command:
sbt/sbt assembly
This will create the JAR file in the target/scala-2.10/ directory. The name of the JAR file will follow the format <ProjectName>-assembly-<version>.jar.
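Because all dependencies are bundled, the assembled JAR can then be run directly; the JAR and class names below are illustrative assumptions matching the naming format above:

```shell
# Run the assembled JAR with plain java (main class from the manifest's classpath)
java -cp target/scala-2.10/MySparkApp-assembly-1.0.jar SimpleApp

# Or, on Spark 1.0 and later, submit it via spark-submit
spark-submit --class SimpleApp target/scala-2.10/MySparkApp-assembly-1.0.jar
```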
You can find an example project on how to create an assembled JAR for a Spark application on GitHub.
Creating an assembled JAR for a standalone Spark application is a straightforward process when using the sbt-assembly plugin. By following the steps outlined in this guide, you can easily create a fat JAR for your Spark application.