Blogs
Apache Spark Performance Tuning Tutorial: Complete Guide with PySpark Examples
Performance tuning is an important aspect of working with Apache Spark, as it can help ensure that your data processing tasks are efficient and run smoothly. This comprehensive Apache Spark tutorial covers advanced performance optimization techniques that every data engineer should know.
In this PySpark tutorial, we will delve into the five critical areas for Apache Spark performance tuning: spill, skew, shuffle, storage, and serialization. Whether you’re new to Apache Spark or looking to …
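As a minimal sketch of three of those five areas (shuffle, skew, and serialization), the Python below gathers a few real Spark configuration keys into a dict. The 128 MiB-per-partition target is a common rule of thumb, assumed here rather than taken from the post, and the helper names are hypothetical. Spill and storage tuning (memory settings, caching levels) are not covered.

```python
import math
from typing import Dict

# Assumption: a rule-of-thumb target size per shuffle partition, not a Spark default.
TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def suggest_shuffle_partitions(shuffle_bytes: int,
                               target_bytes: int = TARGET_PARTITION_BYTES) -> int:
    """Return a partition count so each shuffle partition holds about target_bytes."""
    if shuffle_bytes <= 0:
        return 1
    return math.ceil(shuffle_bytes / target_bytes)


def tuning_conf(shuffle_bytes: int) -> Dict[str, str]:
    """Collect Spark settings that touch shuffle, skew, and serialization."""
    return {
        # Shuffle: size partitions to the data instead of the default of 200.
        "spark.sql.shuffle.partitions": str(suggest_shuffle_partitions(shuffle_bytes)),
        # Shuffle/skew: Adaptive Query Execution (Spark 3.0+) can coalesce small
        # shuffle partitions and split skewed partitions in joins at runtime.
        "spark.sql.adaptive.enabled": "true",
        "spark.sql.adaptive.coalescePartitions.enabled": "true",
        "spark.sql.adaptive.skewJoin.enabled": "true",
        # Serialization: Kryo is usually faster and more compact than Java
        # serialization; it mainly affects RDD-based workloads.
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    }


if __name__ == "__main__":
    conf = tuning_conf(shuffle_bytes=50 * 1024**3)  # e.g. a 50 GiB shuffle
    for key, value in conf.items():
        print(f"{key}={value}")
    # With PySpark installed, the same dict could be applied like this:
    # from pyspark.sql import SparkSession
    # builder = SparkSession.builder.appName("tuned-job")
    # for key, value in conf.items():
    #     builder = builder.config(key, value)
    # spark = builder.getOrCreate()
```

Computing the partition count from the expected shuffle size, rather than hard-coding it, keeps partitions from being either tiny (scheduling overhead) or huge (spill to disk).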
Install Apache Spark 2 on Ubuntu 16.04 and macOS: Complete Setup Guide
Two earlier posts discussed installing Apache Spark-0.8.0 on Ubuntu-12.04 and Apache Spark-1.1.0 on Ubuntu-14.04. This post describes the steps needed to set up Apache Spark-2.0.2 on Ubuntu 16.04 and Mac OS X Sierra. For more detailed guidance, refer to those earlier posts.
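The download and environment setup can be sketched in Python as follows. This is a hypothetical helper, not the commands from the post: it assumes the prebuilt `spark-2.0.2-bin-hadoop2.7` package from the Apache release archive and an install directory of `/opt`, so adjust both to your setup.

```python
from pathlib import PurePosixPath
from typing import List

# Assumption: the Apache release archive keeps past Spark releases under this URL.
ARCHIVE = "https://archive.apache.org/dist/spark"


def spark_tarball_url(spark_version: str, hadoop_version: str) -> str:
    """Build the download URL of a prebuilt Spark package for a Hadoop version."""
    package = f"spark-{spark_version}-bin-hadoop{hadoop_version}.tgz"
    return f"{ARCHIVE}/spark-{spark_version}/{package}"


def profile_lines(install_dir: str, spark_version: str, hadoop_version: str) -> List[str]:
    """Return shell export lines that point SPARK_HOME at the unpacked package."""
    home = PurePosixPath(install_dir) / f"spark-{spark_version}-bin-hadoop{hadoop_version}"
    return [
        f'export SPARK_HOME="{home}"',
        # Puts spark-shell, pyspark, and spark-submit on the PATH.
        'export PATH="$SPARK_HOME/bin:$PATH"',
    ]


if __name__ == "__main__":
    print(spark_tarball_url("2.0.2", "2.7"))
    # Append these to ~/.bashrc (Ubuntu) or ~/.bash_profile (macOS) after unpacking.
    for line in profile_lines("/opt", "2.0.2", "2.7"):
        print(line)
```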
How to Run a PySpark Notebook with Docker
Apache Spark is a powerful big data processing engine that is well-suited for use in a distributed environment. One way to interact with Spark is through the use of an IPython Notebook, which allows you to run and debug your Spark code in an interactive manner. This tutorial will guide you through the process of setting up and running a PySpark Notebook using Docker.
Installing Docker
Docker is a containerization platform that allows you to package and deploy your applications in a predictable …
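A sketch of launching such a notebook from Python follows. It assumes the community `jupyter/pyspark-notebook` image from the Jupyter Docker Stacks project, which may not be the image the post uses, and mounts a local directory at `/home/jovyan/work`, that image's working directory. The helper name is hypothetical.

```python
import os
import shlex
from typing import List

# Assumption: a community image that bundles Jupyter, Spark, and PySpark.
IMAGE = "jupyter/pyspark-notebook"


def docker_run_args(host_dir: str, port: int = 8888, image: str = IMAGE) -> List[str]:
    """Build a `docker run` command that starts a PySpark-enabled Jupyter server."""
    return [
        "docker", "run", "--rm",
        "-p", f"{port}:8888",                   # map a host port to Jupyter's port 8888
        "-v", f"{host_dir}:/home/jovyan/work",  # mount local notebooks into the container
        image,
    ]


if __name__ == "__main__":
    args = docker_run_args(os.getcwd())
    # Print the command; passing `args` to subprocess.run(args) would start it.
    print(shlex.join(args))
```

Once the container prints its tokenized URL, open it in a browser; inside a notebook cell, `SparkSession.builder.getOrCreate()` from `pyspark.sql` returns a local Spark session.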
Building Self-Contained PySpark Applications: Complete Development Guide
In my previous post, I wrote about installing Spark and using the Scala interactive shell. In this post, we’ll see how to do the same in Python.
Similar to the Scala interactive shell, Spark provides an interactive shell for Python. You can run it with the following command from the Spark root folder:
./bin/pyspark
Now you can use Spark through the Python interactive shell.
This shell may be sufficient for experimentation and development. However, for production use, we should use a standalone …
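A minimal sketch of such a standalone application, assuming a hypothetical word-count job saved as `wordcount.py`, might look like this. It uses the PySpark `SparkSession` API (Spark 2.0+), which may differ from the API the post uses, and imports PySpark inside `main` so the tokenizer can be tested without Spark installed.

```python
import sys
from typing import List


def tokenize(line: str) -> List[str]:
    """Lower-case a line and split it into words on whitespace."""
    return line.lower().split()


def main(input_path: str) -> None:
    # Imported here so tokenize() works without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("WordCount").getOrCreate()
    lines = spark.sparkContext.textFile(input_path)
    counts = (lines.flatMap(tokenize)
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    for word, count in counts.collect():
        print(f"{word}\t{count}")
    spark.stop()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: spark-submit wordcount.py <input-path>", file=sys.stderr)
```

From the Spark root folder, it could be submitted with `./bin/spark-submit wordcount.py input.txt`, where `input.txt` is any local text file.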
Install Apache Spark on Ubuntu-14.04
Update: For Apache Spark 2, refer to the latest post. One of the previous posts covered installing Apache Spark-0.8.0 on Ubuntu-12.04. This post explains the detailed steps to set up Apache Spark-1.1.0 on Ubuntu. Running Spark on an Ubuntu machine requires Java, which the following commands install:
$ sudo apt-add-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
To check that the Java installation was successful:
$ java …
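To confirm which Java major version ended up on the PATH, the output of `java -version` can be parsed as in the sketch below. The helpers are hypothetical; the parsing assumes the usual version banner, where Java 8 and earlier print `1.x` (so Java 7 appears as `1.7.0_...`) and Java 9 and later print the major version first.

```python
import re
import subprocess
from typing import Optional


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the Java major version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    first, second = match.group(1), match.group(2)
    # Old scheme: "1.7.0_80" means Java 7; new scheme: "11.0.2" means Java 11.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_version() -> Optional[int]:
    """Run `java -version` (which writes its banner to stderr) and parse it."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except OSError:
        return None  # java is not on the PATH or cannot be executed
    return java_major_version(result.stderr)


if __name__ == "__main__":
    print(installed_java_version())
```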
Creating Uber JARs for Apache Spark Projects: Complete SBT Assembly Guide
In this post, we will discuss how to create an assembled JAR for a standalone Spark application using the sbt-assembly plugin. In one of my previous posts, we discussed how to build a standalone Spark application using the sbt eclipse plugin. Now, we will take it one step further and show you how to create a fat JAR for your Spark project using the sbt-assembly plugin.
Adding the sbt-assembly Plugin
The first step in creating an assembled JAR for your Spark application is to add the sbt-assembly …
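Because a JAR is an ordinary ZIP archive, one way to sanity-check the output of the assembly step is to list its entries, for example to confirm that your dependencies were bundled. This is a hypothetical Python helper, not part of sbt-assembly; the script name and JAR path in the comment are illustrative.

```python
import zipfile
from typing import BinaryIO, List, Union


def jar_entries_with_prefix(jar: Union[str, BinaryIO], prefix: str) -> List[str]:
    """List entries in a JAR (a ZIP archive) whose paths start with `prefix`."""
    with zipfile.ZipFile(jar) as archive:
        return [name for name in archive.namelist() if name.startswith(prefix)]


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        # e.g. python inspect_jar.py target/<scala-dir>/<app>-assembly.jar com/example/
        for name in jar_entries_with_prefix(sys.argv[1], sys.argv[2]):
            print(name)
```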
Creating a Standalone Spark Application in Scala: A Step-by-Step Guide with Twitter Streaming Example
This blog post will guide you through the process of building a Spark application in Scala that calculates popular hashtags from a Twitter stream. You will also learn how to use the sbt eclipse plugin to run the application in the Eclipse Integrated Development Environment (IDE). Whether you are new to big data processing or looking to improve your skills in data engineering and analytics, this tutorial has something to offer. Follow along with our step-by-step guide to develop your own stand …
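The core of a popular-hashtags job is extracting hashtags from each tweet and counting them. The pure-Python sketch below shows only that counting logic on a batch of tweet texts; it is not the post's Scala code, it leaves out the Twitter stream and Spark Streaming entirely, and it assumes a hashtag is `#` followed by word characters.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: a hashtag is '#' followed by one or more word characters.
HASHTAG = re.compile(r"#\w+")


def extract_hashtags(text: str) -> List[str]:
    """Return the hashtags in a tweet, lower-cased so #Spark and #spark match."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def top_hashtags(tweets: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count hashtags across a batch of tweets and return the n most common."""
    counts: Counter = Counter()
    for tweet in tweets:
        counts.update(extract_hashtags(tweet))
    return counts.most_common(n)


if __name__ == "__main__":
    batch = ["Learning #Spark and #Scala", "#spark streaming", "#BigData with #Spark"]
    print(top_hashtags(batch, 2))
```

In a streaming job, the same count would be computed per batch or per sliding window rather than over a fixed list.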
Complete Guide: Install Apache Spark on Linux (Ubuntu, CentOS) - 2024 Updated
Apache Spark has evolved dramatically since its early releases, becoming the de facto standard for large-scale data processing and analytics. This comprehensive guide covers installing the latest Apache Spark 3.5+ on modern Linux distributions with best practices for both development and production environments.
Update Notice: This guide covers modern Apache Spark 3.5+ installation. For historical reference, our previous guides covered Apache Spark 1.0 installation and Apache Spark 2.x setup. …
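Apache publishes a `.sha512` checksum alongside each Spark download, and verifying it before unpacking is a common best practice. The sketch below is a hypothetical helper, not taken from the guide; the parsing of the checksum file assumes one of two common layouts, `<digest>  <file>` or `<file>: AAAA BBBB ...` with the digest split into groups.

```python
import hashlib
import re


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digest(published: str) -> str:
    """Pull the digest out of a published .sha512 file's text.

    Assumes every token made only of hex digits belongs to the digest.
    """
    tokens = re.split(r"[\s:]+", published.strip())
    return "".join(t for t in tokens if re.fullmatch(r"[0-9a-fA-F]+", t)).lower()


def verify(path: str, published: str) -> bool:
    """Return True if the file's SHA-512 matches the published checksum text."""
    return sha512_of_file(path) == expected_digest(published)


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        # e.g. python verify_spark.py <package>.tgz <package>.tgz.sha512
        with open(sys.argv[2]) as checksum_file:
            print("OK" if verify(sys.argv[1], checksum_file.read()) else "MISMATCH")
```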
Running Mesos-0.13.0 on Ubuntu-12.04
You will need the following packages to run Mesos.
$ sudo apt-get install python2.7-dev g++ libcppunit-dev libunwind7-dev git libcurl4-nss-dev
You need to have Java installed, or the JAVA_HOME environment variable pointing to a Java installation.
You can download the Mesos distribution from the official website. Then untar the downloaded file:
$ tar xvf mesos-0.13.0.tar.gz
Building and Installing
$ cd mesos-0.13.0
$ mkdir build
$ cd build
$ sudo ../configure --prefix=/home/user/mesos
$ sudo …
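Since the Mesos build needs Java either on the PATH or through `JAVA_HOME`, a quick pre-flight check can be sketched as below. The `find_java` helper is hypothetical; it prefers `$JAVA_HOME/bin/java` and falls back to searching the PATH.

```python
import os
import shutil
from typing import Mapping, Optional


def find_java(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a usable `java` executable path, preferring JAVA_HOME over PATH."""
    java_home = env.get("JAVA_HOME")
    if java_home:
        candidate = os.path.join(java_home, "bin", "java")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Fall back to the PATH from the same environment mapping.
    return shutil.which("java", path=env.get("PATH"))


if __name__ == "__main__":
    java = find_java()
    print(java if java else "Java not found: install it or set JAVA_HOME")
```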