Install Apache Spark 2 on Ubuntu 16.04 and Mac OS
Two earlier posts discussed installing Apache Spark-0.8.0 and Apache Spark-1.1.0 on Ubuntu-12.04 and Ubuntu-14.04 respectively. This post covers the steps needed to set up Apache Spark-2.0.2 on Ubuntu 16.04 and Mac OS X Sierra. For more detailed guidance, refer to the posts mentioned above.
pyspark notebook with docker
Install Docker
Docker can be installed with the following command. I have done this on an Ubuntu-14.04 instance. For more installation options, refer to the official Docker site.
wget -qO- https://get.docker.com/ | sh
Now run the following command from any machine on which docker is installed.
docker run -d -t -p 8888:8888 prabeeshk/pyspark-notebook
After successfully running the pyspark-notebook docker container, you can access the PySpark IPython notebook by opening http://localhost:8888 in a browser (port 8888 is mapped by the command above).
Self Contained PySpark Application
In my previous post, I wrote about installing Spark and using the Scala interactive shell. Here in this post, we'll see how to do the same in Python.
Similar to the Scala interactive shell, there is an interactive shell available for Python. You can run it with the following command from the Spark root folder:
./bin/pyspark
Now you can enjoy Spark using Python interactive shell.
This shell is sufficient for experimentation and development. For production use, however, you should build a standalone application.
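A minimal standalone PySpark application can be sketched as follows (the file name, sample data, and counted letter are illustrative assumptions, not code from the original post):

```python
# SimpleApp.py -- a minimal self-contained PySpark application (sketch)
from pyspark import SparkContext

# Create a local SparkContext with an application name
sc = SparkContext("local", "Simple App")

# A small in-memory dataset, parallelized into an RDD
words = sc.parallelize(["spark", "python", "standalone", "app"])

# Count the words containing the letter "a"
count = words.filter(lambda w: "a" in w).count()
print("Words containing 'a': %d" % count)

sc.stop()
```

With Spark 1.x or later, it can be run from the Spark root folder with `./bin/spark-submit SimpleApp.py`.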
Install Apache Spark on Ubuntu-14.04
Update: For Apache Spark 2, refer to the latest post.
One of the previous posts described installing Apache Spark-0.8.0 on Ubuntu-12.04. This post explains the detailed steps to set up Apache Spark-1.1.0 on Ubuntu. To run Spark, the Ubuntu machine must have Java installed. Java can be installed easily using the following commands.
$ sudo apt-add-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
To check that the Java installation was successful:
$ java -version
It shows the installed Java version:
java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)
The next step is to install Scala; follow the instructions below to set it up.
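A typical Scala setup looks like the following (the Scala version and install path are assumptions; Spark 1.1.0 was built against Scala 2.10):

```shell
# Download and unpack Scala (version and paths are assumptions)
wget http://www.scala-lang.org/files/archive/scala-2.10.4.tgz
tar xvf scala-2.10.4.tgz
sudo mv scala-2.10.4 /usr/local/scala

# Point SCALA_HOME at the installation and add its bin directory to PATH
echo 'export SCALA_HOME=/usr/local/scala' >> ~/.bashrc
echo 'export PATH=$PATH:$SCALA_HOME/bin' >> ~/.bashrc
source ~/.bashrc
```

Running `scala -version` afterwards should report the installed version.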
Creating assembled JAR for Standalone Spark Application
The previous post showed how to use sbt in a Spark-streaming project. This post describes how to create a fat JAR for a Spark-streaming project using an sbt plugin. sbt-assembly is an sbt plugin that creates a fat JAR of a project together with all of its dependencies.
Add the sbt-assembly plugin in project/plugin.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.1")
Specify sbt-assembly.git as a dependency in project/project/build.scala
import sbt._
object Plugins extends Build {
  lazy val root = Project("root", file(".")) dependsOn(
    uri("git://github.com/sbt/sbt-assembly.git#0.9.1")
  )
}
In the build.sbt file, add the following contents:
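The original listing is not included in this excerpt; for sbt-assembly 0.9.x, a build.sbt looks roughly like the following (the project name, Scala version, and Spark dependency are assumptions):

```scala
import AssemblyKeys._ // provided by the sbt-assembly plugin

// Wire the assembly task and its default settings into the build
assemblySettings

name := "spark-streaming-example"

version := "1.0"

scalaVersion := "2.10.4"

// Mark Spark as "provided" so it is not bundled into the fat JAR
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.1.0" % "provided"
```

Running `sbt assembly` then produces the fat JAR under the target/ directory.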
A Standalone Spark Application in Scala
This post shares some ideas about how to create a stand-alone Spark-streaming application and how to run Spark applications in the Scala IDE (Eclipse).
Building Spark Application using SBT
This is a standalone application in Scala using the Apache Spark API. The application is built using the Simple Build Tool (SBT).
To create a stand-alone app, take the Twitter popular tags example.
This program calculates popular hashtags (popular topics) over sliding 10 and 60 second windows from a Twitter stream. The stream is instantiated with credentials and optionally filters supplied by the command line arguments.
Here, however, the code is modified to take the Twitter authentication credentials through command-line arguments, so the arguments need to be given as follows.
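A hedged sketch of the invocation (the argument order follows the stock TwitterPopularTags example: the four OAuth credentials first, then optional filter keywords):

```shell
# Pass the four Twitter OAuth credentials, then optional filter keywords
sbt "run <consumerKey> <consumerSecret> <accessToken> <accessTokenSecret> [filter1 filter2 ...]"
```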
Installing Apache Spark on Ubuntu-12.04
Update: To install Apache Spark-1.0 follow this post
Apache Spark is an open source in-memory cluster computing framework. It was initially developed in the UC Berkeley AMPLab and is now an Apache Incubator project. Apache Spark is designed for low-latency iterative jobs and interactive use from an interpreter. It provides clean, language-integrated APIs in Scala, Java, and Python, with a rich array of parallel operators. You may read more about it here.
You can download the Apache Spark distribution (0.8.0-incubating) from here. After that, untar the downloaded file:
$ tar xvf spark-0.8.0-incubating.tgz
You need to have Scala installed, or the SCALA_HOME environment variable pointing to a Scala installation.
Building
Spark is built with SBT (Simple Build Tool), which is bundled with the distribution. To compile the code:
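For Spark 0.8.0, the bundled sbt is invoked from the Spark root directory; the assembly target compiles Spark and packages it with its dependencies:

```shell
# From the Spark root directory, build Spark with the bundled sbt
sbt/sbt assembly
```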
Running Mesos-0.13.0 on Ubuntu-12.04
You will need the following packages to run Mesos.
$ sudo apt-get install python2.7-dev g++ libcppunit-dev libunwind7-dev git libcurl4-nss-dev
You need to have Java installed, or the JAVA_HOME environment variable pointing to a Java installation.
You can download the Mesos distribution from here. After that, untar the downloaded file:
$ tar xvf mesos-0.13.0.tar.gz
Building and Installing
$ cd mesos-0.13.0
$ mkdir build
$ cd build
$ sudo ../configure --prefix=/home/user/mesos
$ sudo make
$ sudo make check
$ sudo make install
You can pass the --prefix option while configuring to tell it where to install, as in the configure step above (--prefix=/home/user/mesos).
MQTT Scala Publisher and Subscriber using Eclipse Paho
MQTT is a machine-to-machine (M2M)/Internet of Things connectivity protocol. It was designed to be extremely lightweight, to support embedded and low-power devices. You may read more about it here. MQTT is a broker-based message queuing system: to work with MQTT, an MQTT message broker/server is required. Mosquitto is an open source MQTT broker. On Ubuntu, mosquitto can be installed using the command
$ sudo apt-get install mosquitto
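To verify the broker, you can use the Mosquitto command-line clients (on Ubuntu these ship in the mosquitto-clients package; the topic name below is an arbitrary example):

```shell
# Install the command-line clients
sudo apt-get install mosquitto-clients

# In one terminal, subscribe to a test topic
mosquitto_sub -h localhost -t "test/topic"

# In another terminal, publish a message to the same topic
mosquitto_pub -h localhost -t "test/topic" -m "hello mqtt"
```

The subscriber terminal should print the published message.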
Eclipse Paho is an MQTT client that works well with mosquitto. You may read more about it here.
MQTT Scala subscriber and publisher code based on the Eclipse Paho library 0.4.0 is available in
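As a hedged sketch of what such code looks like using the Paho Java client from Scala (the broker URL, topic, and object name are illustrative assumptions, not the linked code):

```scala
import org.eclipse.paho.client.mqttv3._
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence

object MqttDemo extends App {
  val brokerUrl = "tcp://localhost:1883" // assumed local mosquitto broker
  val topic     = "test/topic"           // arbitrary example topic

  // Subscriber: print every message that arrives on the topic
  val subscriber = new MqttClient(brokerUrl, MqttClient.generateClientId, new MemoryPersistence)
  subscriber.setCallback(new MqttCallback {
    def messageArrived(t: String, msg: MqttMessage): Unit =
      println(s"$t: ${new String(msg.getPayload)}")
    def connectionLost(cause: Throwable): Unit = println("connection lost")
    def deliveryComplete(token: IMqttDeliveryToken): Unit = ()
  })
  subscriber.connect()
  subscriber.subscribe(topic)

  // Publisher: send one message to the same topic with QoS 0
  val publisher = new MqttClient(brokerUrl, MqttClient.generateClientId, new MemoryPersistence)
  publisher.connect()
  publisher.publish(topic, "hello mqtt".getBytes, 0, false)
  publisher.disconnect()
}
```

The subscriber's callback prints each incoming message as it arrives; a running mosquitto broker on localhost is assumed.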