Install Apache Spark 2 on Ubuntu 16.04 and Mac OS

Two earlier posts covered installing Apache Spark 0.8.0 and Apache Spark 1.1.0 on Ubuntu 12.04 and Ubuntu 16.04, respectively. This post walks through the steps needed to set up Apache Spark 2.0.2 on Ubuntu 16.04 and Mac OS X Sierra. For more detailed guidance, refer to those earlier posts.

Java must be installed on the machine to run Apache Spark. The following commands install Java on an Ubuntu machine.

$ sudo apt-add-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer

To install Java on Mac OS X, visit the Oracle website, find the jdk-7u79-macosx-x64.dmg file under Java SE Development Kit 7u79, and download it after accepting the license agreement. Then double-click the downloaded dmg file and follow the instructions.

To check that Java was installed successfully, run the following command in the terminal:

$ java -version

It prints the installed Java version:

java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)
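The version check can also be scripted. The following is a minimal sketch, not part of the original setup: the helper names java_version_of and is_java7_or_newer are made up for illustration, and the threshold of Java 7 is an assumption taken from the JDK version installed in this post.

```shell
# Hypothetical helpers (names are illustrative, not part of Spark or Java).

# Print the quoted version string found in `java -version`-style text
# read from stdin, for example 1.7.0_72.
java_version_of() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed when the given version string is 1.7 or newer.
# Assumption: versions look like 1.7.0_72 (old scheme) or 11.0.2 (new scheme).
is_java7_or_newer() {
    jv_major=${1%%.*}        # text before the first dot
    jv_rest=${1#*.}          # text after the first dot
    jv_minor=${jv_rest%%.*}  # second numeric component
    if [ "$jv_major" -gt 1 ] 2>/dev/null; then
        return 0
    fi
    [ "$jv_major" -eq 1 ] 2>/dev/null && [ "$jv_minor" -ge 7 ] 2>/dev/null
}
```

Because java -version writes to standard error, the output has to be redirected before it is piped into the helper:

$ is_java7_or_newer "$(java -version 2>&1 | java_version_of)" && echo "Java is recent enough"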

Next, install git, which the Spark build depends on. On Ubuntu run:

$ sudo apt-get install git

On Mac OS X run:

$ brew install git

Finally, download and untar the Apache Spark 2 distribution to some location, for example /usr/local/share/spark.

$ mkdir /usr/local/share/spark
$ curl | tar xvz -C /usr/local/share/spark
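The download step can be wrapped in a small function so the URL and target directory are explicit. This is only a sketch: fetch_spark is a hypothetical helper, and the tarball URL is deliberately left as an argument because this post does not give it. The curl flags -f (fail on HTTP errors) and -L (follow redirects) are additions that the original command does not use.

```shell
# Hypothetical helper: download a Spark tarball and unpack it.
#   $1  tarball URL (not given in this post; supply your own)
#   $2  target directory, defaulting to the location used above
fetch_spark() {
    fs_url=$1
    fs_dest=${2:-/usr/local/share/spark}
    if [ -z "$fs_url" ]; then
        echo "usage: fetch_spark <tarball-url> [target-dir]" >&2
        return 2
    fi
    # -p avoids an error when the directory already exists
    mkdir -p "$fs_dest" || return 1
    # Stream the archive straight into tar without saving it to disk
    curl -fL "$fs_url" | tar xz -C "$fs_dest"
}
```

Writing to /usr/local/share may require root privileges, depending on how the machine is set up.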


Spark is built with Maven, and the Spark source ships with its own Maven script (build/mvn), so Maven does not need to be installed separately. To build Apache Spark, run the following:

$ cd /usr/local/share/spark/spark-2.0.2

$ ./build/mvn -DskipTests clean package

The build takes some time. After it completes successfully, you can test a sample program:

$ ./bin/run-example SparkPi 10

The output contains a line such as Pi is roughly 3.14634, along with the log.
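Since the result line is buried in the log, it can help to filter it out. The sketch below assumes the result is printed on a line that starts with "Pi is roughly", as in the output quoted above; pi_estimate is a hypothetical helper name.

```shell
# Hypothetical helper: read SparkPi output on stdin and print only the
# estimated value from the "Pi is roughly ..." line.
pi_estimate() {
    sed -n 's/^Pi is roughly //p'
}
```

It can be used like this, with 2>&1 merging the log into the same stream before filtering:

$ ./bin/run-example SparkPi 10 2>&1 | pi_estimate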

To build Apache Spark against a particular version of Hadoop, use the command below:

$ ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 -DskipTests clean package

