By Prabeesh Keezhathra
This post walks through building a Spark Streaming application in Scala that extracts popular hashtags from the Twitter firehose, packaged with sbt and runnable from the Eclipse IDE via the sbteclipse plugin.
We'll build a standalone Scala application against the Apache Spark API and package it with sbt (Simple Build Tool), starting from the TwitterPopularTags example that ships with Spark.
This program computes popular hashtags (popular topics) over sliding 10- and 60-second windows of a Twitter stream. The stream is created with credentials and optional filters supplied as command-line arguments.
The example is modified here to take the Twitter authentication credentials as command-line arguments, so the program expects its arguments in the order:
master consumerKey consumerSecret accessToken accessTokenSecret [filters ...]
// Twitter authentication credentials
System.setProperty("twitter4j.oauth.consumerKey", args(1))
System.setProperty("twitter4j.oauth.consumerSecret", args(2))
System.setProperty("twitter4j.oauth.accessToken", args(3))
System.setProperty("twitter4j.oauth.accessTokenSecret", args(4))
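For reference, the core of the computation can be sketched as follows. This is adapted from the TwitterPopularTags example in the Spark 0.9 distribution, with the credential-reading change above; the code in the linked project may differ slightly in details such as batch interval and output formatting.

```scala
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.twitter.TwitterUtils

object TwitterPopularTags {
  def main(args: Array[String]) {
    // Expected arguments: master consumerKey consumerSecret accessToken accessTokenSecret [filters ...]
    val filters = args.drop(5)

    // Twitter authentication credentials (as shown above)
    System.setProperty("twitter4j.oauth.consumerKey", args(1))
    System.setProperty("twitter4j.oauth.consumerSecret", args(2))
    System.setProperty("twitter4j.oauth.accessToken", args(3))
    System.setProperty("twitter4j.oauth.accessTokenSecret", args(4))

    // 2-second batch interval; the windows below slide over these batches
    val ssc = new StreamingContext(args(0), "TwitterPopularTags", Seconds(2))
    val stream = TwitterUtils.createStream(ssc, None, filters)

    // Extract hashtags from each tweet's text
    val hashTags = stream.flatMap(status => status.getText.split(" ").filter(_.startsWith("#")))

    // Count hashtags over a sliding 60-second window, then sort by count
    val topCounts60 = hashTags.map((_, 1))
      .reduceByKeyAndWindow(_ + _, Seconds(60))
      .map { case (topic, count) => (count, topic) }
      .transform(_.sortByKey(false))

    // Print the top 10 hashtags for each batch
    topCounts60.foreachRDD { rdd =>
      val topList = rdd.take(10)
      println("\nPopular topics in last 60 seconds (%s total):".format(rdd.count()))
      topList.foreach { case (count, tag) => println("%s (%s tweets)".format(tag, count)) }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The 10-second window is computed the same way, with `Seconds(10)` in place of `Seconds(60)`.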
If you want to read Twitter authentication credentials from a file, see this TwitterUtils example.
Below is the sbt build file (build.sbt). For more detail about sbt, see the sbt setup guide.
name := "TwitterPopularTags"

version := "0.1.0"

scalaVersion := "2.10.3"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "0.9.0-incubating",
  "org.apache.spark" %% "spark-streaming" % "0.9.0-incubating",
  "org.apache.spark" %% "spark-streaming-twitter" % "0.9.0-incubating")

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
You can find the project on GitHub.
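With the build file in place, the application can be launched through sbt, passing the arguments in the order described above. The master URL and the bracketed credential placeholders below are illustrative; substitute your own OAuth values from the Twitter developer console:

```shell
sbt/sbt "run local[2] <consumerKey> <consumerSecret> <accessToken> <accessTokenSecret>"
```

Any additional arguments after the credentials are treated as filter keywords for the stream.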
Using the sbteclipse plugin, the sbt project can be imported into the Eclipse IDE. For more details, see sbteclipse. Add the plugin to the build (typically in project/plugins.sbt):
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.1.0")
Then run from the root folder of the project:
sbt/sbt eclipse
This command generates Eclipse-compatible project files. You can then import the project into the Eclipse IDE and run it with Spark.
To avoid generating Eclipse source entries for the Java directories and to put all libraries in lib_managed (so you can distribute the Eclipse project files), add this to build.sbt:
retrieveManaged := true

EclipseKeys.relativizeLibs := true

(unmanagedSourceDirectories in Compile) <<= (scalaSource in Compile)(Seq(_))

(unmanagedSourceDirectories in Test) <<= (scalaSource in Test)(Seq(_))
You can find the sbteclipse project here.
Once you have the application running, see the Uber JAR post for packaging it into a single deployable JAR with sbt-assembly.