Install Apache Spark 2.0.2 on Ubuntu 16.04 and Mac OS X Sierra
By Prabeesh Keezhathra
Earlier posts covered Spark 1.1.0 on Ubuntu 14.04. This one walks through Spark 2.0.2 on Ubuntu 16.04 and Mac OS X Sierra. For the latest version, see Install Apache Spark 3.5 on Linux.
Java must be installed first. On Ubuntu:
$ sudo apt-add-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
On macOS, grab jdk-7u79-macosx-x64.dmg from the Oracle download page, accept the license, and double-click the dmg to install.
Verify:
$ java -version
java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)
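Some build tools look for JAVA_HOME rather than the java binary on the PATH. A minimal addition to ~/.bashrc, assuming the default install path used by the Ubuntu Oracle Java 7 package (adjust the path for your machine, especially on macOS):

```shell
# Point JAVA_HOME at the JDK install. This path is the Ubuntu
# webupd8team package default; it will differ on other setups.
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export PATH=$PATH:$JAVA_HOME/bin
```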
The build depends on git. On Ubuntu:
$ sudo apt-get install git
On macOS:
$ brew install git
Download and untar the Spark 2.0.2 source distribution, for example into /usr/local/share/spark:
$ mkdir /usr/local/share/spark
$ curl http://d3kbcqa49mib13.cloudfront.net/spark-2.0.2.tgz | tar xvz -C /usr/local/share/spark
Spark ships with a self-contained Maven wrapper (build/mvn downloads a suitable Maven on first use), so you can build in place:
$ cd /usr/local/share/spark/spark-2.0.2
$ ./build/mvn -DskipTests clean package
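The build/mvn wrapper normally sets sensible JVM options for Maven on its own, but if the build fails with out-of-memory errors you can set them explicitly. These values come from Spark's building guide (with Java 7 you may also need -XX:MaxPermSize=512M):

```shell
# Give Maven more memory before building, per Spark's building guide.
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
```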
The build takes a while. Once it finishes, run a sample job to confirm:
$ ./bin/run-example SparkPi 10
You’ll see something like Pi is roughly 3.14634 in the output (the exact digits vary between runs).
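For intuition, SparkPi estimates π by Monte Carlo sampling: it throws random points at the unit square and counts how many land inside the quarter circle. The same idea in plain awk, no cluster required (just a sketch, not part of the Spark build):

```shell
# Estimate pi by sampling 100,000 random points in the unit square
# and counting those inside the quarter circle x^2 + y^2 <= 1.
awk 'BEGIN {
  srand(1); n = 100000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x * x + y * y <= 1) inside++
  }
  printf "Pi is roughly %f\n", 4 * inside / n
}'
```

The estimate wanders around 3.14; SparkPi does the same computation, only distributed across Spark tasks.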
To build against a specific Hadoop version:
./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 -DskipTests clean package
See the official building docs for more options. For the older Spark 1 install walkthrough, see the Ubuntu 14.04 post.
After a successful build, make sure the shell knows where to find the binaries. Add these to ~/.bashrc:
export SPARK_HOME=/usr/local/share/spark/spark-2.0.2
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
Then reload with source ~/.bashrc and confirm with spark-shell --version.