# This is One of the Solutions

> Technical blog by Prabeesh Keezhathra covering Apache Spark installation and performance tuning, PySpark design patterns, CUDA and GPU programming, embedded systems (AVR, MSP430, Arduino), and web development with Python and JavaScript.

## About

- Author: Prabeesh Keezhathra
- Site: https://blog.prabeeshk.com/
- Language: English
- Topics: Apache Spark, PySpark, data engineering, performance tuning, design patterns, CUDA, GPU programming, embedded systems, AVR microcontrollers, MSP430, Arduino, web development

## Key Pages

- [Install Apache Spark 3.5 on Linux (Ubuntu, CentOS)](https://blog.prabeeshk.com/blog/2024/11/26/install-apache-spark-3-on-linux/): To install Apache Spark 3.5 on Linux: install OpenJDK 17 and Python 3.8+, download the Spark binary from archive.apache.org/dist/spark, extract it to /opt, set the SPARK_HOME and PATH environment variables, then verify with spark-shell or pyspark. Optionally configure a standalone cluster with start-master.sh and start-worker.sh.
- [Advanced PySpark Performance Optimization Techniques](https://blog.prabeeshk.com/bonus/advanced-performance-optimization-techniques-for-pyspark-data-pipelines/): The three key Spark 3.x performance features beyond basic tuning are Adaptive Query Execution (AQE, enabled by default since Spark 3.2), dynamic partition pruning (eliminates unnecessary partition reads in star-schema joins), and predicate pushdown (pushes filters down to the data source). Enable them via spark.sql.adaptive.enabled, spark.sql.optimizer.dynamicPartitionPruning.enabled, and proper filter placement before joins (a configuration sketch appears under Example Snippets below).
- [PySpark Design Patterns for Data Pipelines](https://blog.prabeeshk.com/bonus/implementing-design-patterns-in-pyspark-data-pipelines/): The five most useful design patterns for PySpark data pipelines are Factory (create readers/writers for different formats), Singleton (share a SparkSession across modules), Builder (compose complex transformations step by step), Observer (monitor pipeline events), and Pipeline (chain transformation stages). Each keeps pipeline code modular and testable as complexity grows (a singleton sketch appears under Example Snippets below).
- [Apache Spark Performance Tuning: Spill, Skew, Shuffle, Storage](https://blog.prabeeshk.com/blog/2023/01/06/performance-tuning-on-apache-spark/): The five main causes of slow Apache Spark jobs are spill (data doesn't fit in memory), skew (uneven partition sizes), shuffle (expensive cross-network data movement), storage (tiny files and inferred schemas), and serialization (Python UDF overhead). Fix them by enabling AQE, broadcasting small tables, salting skewed joins, using Parquet with explicit schemas, and replacing Python UDFs with SQL functions or Pandas UDFs (a broadcast-join sketch appears under Example Snippets below).
- [How to Run a PySpark Notebook with Docker](https://blog.prabeeshk.com/blog/2015/06/19/pyspark-notebook-with-docker/): To run PySpark in a Jupyter notebook with Docker, run `docker run -d -t -p 8888:8888 jupyter/pyspark-notebook` and open http://localhost:8888 in your browser. For a current setup, use the maintained jupyter/pyspark-notebook image rather than building from source.
- [ATtiny2313 USBtinyISP Notes](https://blog.prabeeshk.com/bonus/attiny2313-usb-programming-guide/): Short follow-up notes on the USBtinyISP build: reading fuses with avrdude and a basic what-to-check list for when the board won't enumerate.
- [Advanced AM Modulation Analysis with Matplotlib](https://blog.prabeeshk.com/bonus/advanced-am-modulation-analysis-with-matplotlib/): Go beyond basic AM waveforms. Build a Matplotlib + NumPy analyzer that measures modulation index, inspects sidebands via FFT, and handles noise (a modulation-index sketch appears under Example Snippets below).
- [PySpark Design Patterns Quick Reference](https://blog.prabeeshk.com/bonus/pyspark-design-patterns-quick-reference/): One-page cheat sheet with runnable snippets for the five core PySpark design patterns: factory, singleton, builder, observer, and pipeline.
- [Advanced PySpark Design Patterns: Implementation Examples](https://blog.prabeeshk.com/bonus/advanced-pyspark-design-patterns-implementation/): Three more design patterns applied to PySpark pipelines, with runnable examples: strategy, decorator, and template method.
- [Install Apache Spark 2 on Ubuntu 16.04 and macOS](https://blog.prabeeshk.com/blog/2016/12/07/install-apache-spark-2-on-ubuntu-16-dot-04-and-mac-os/): Install Apache Spark 2.0 on Ubuntu 16.04 and macOS. Covers Java setup, the Maven build, Hadoop integration, and environment configuration.
- [Building Self-Contained PySpark Applications](https://blog.prabeeshk.com/blog/2015/04/07/self-contained-pyspark-application/): Go from the `pyspark` REPL to a self-contained Python application you can ship with `spark-submit`. Covers project layout, dependencies, and submission (a minimal script sketch appears under Example Snippets below).
- [Install Apache Spark on Ubuntu 14.04](https://blog.prabeeshk.com/blog/2014/10/31/install-apache-spark-on-ubuntu-14-dot-04/): Install Apache Spark 1.1.0 on Ubuntu 14.04. Covers Java, Scala, and git prerequisites, the sbt assembly build, and the Spark shell.
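## Example Snippets

The page summaries above name a few concrete techniques; the short sketches below illustrate them. They are minimal illustrations written for this index under stated assumptions, not excerpts from the linked posts. First, the Spark 3.x features from the performance optimization post can be enabled on the session builder; the app name, paths, and table layout here are hypothetical.

```python
from pyspark.sql import SparkSession

# Spark 3.x settings named in the post. AQE is on by default since
# Spark 3.2; setting it explicitly documents the intent.
spark = (
    SparkSession.builder
    .appName("aqe-dpp-demo")  # hypothetical app name
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")
    .getOrCreate()
)

# Filter placement before the join matters: with the dimension table
# filtered first, dynamic partition pruning can skip fact-table partitions.
# `dates` and `sales` are hypothetical Parquet tables in a star schema.
dates = spark.read.parquet("/data/dates").filter("year = 2024")
sales = spark.read.parquet("/data/sales")  # partitioned by date_key
result = sales.join(dates, "date_key")
result.explain()  # look for partition pruning in the physical plan
```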
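From the design patterns post, the singleton is the simplest of the five to show in a few lines: one shared SparkSession for every module in the pipeline. The function name is an assumption for this sketch; `getOrCreate()` already reuses any active session, so the wrapper mainly centralizes configuration.

```python
from pyspark.sql import SparkSession

_spark = None  # module-level cache holding the single shared session


def get_spark(app_name: str = "pipeline") -> SparkSession:
    """Return the shared SparkSession, creating it on first use."""
    global _spark
    if _spark is None:
        _spark = SparkSession.builder.appName(app_name).getOrCreate()
    return _spark
```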
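The performance tuning post's shuffle and skew fixes include broadcasting small tables; a sketch of that fix, with hypothetical table names and paths:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("broadcast-demo").getOrCreate()

# Hypothetical tables: a large fact table and a dimension table small
# enough to fit in each executor's memory.
facts = spark.read.parquet("/data/facts")
dims = spark.read.parquet("/data/dims")

# broadcast() ships the small table to every executor, so the join runs
# map-side and the expensive cross-network shuffle of `facts` is avoided.
joined = facts.join(F.broadcast(dims), "dim_key")
```

For keys too hot for broadcasting to help, the salting fix from the same post appends a random suffix to the join key on both sides, spreading one hot key across many partitions.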
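For the AM analysis post, the standard envelope-based estimate of the modulation index is m = (Amax - Amin) / (Amax + Amin). The sketch below synthesizes a signal with known m and recovers it; using SciPy's Hilbert transform for the envelope is an assumption, since the post is described only as Matplotlib + NumPy.

```python
import numpy as np
from scipy.signal import hilbert  # assumption: SciPy for the envelope

# Synthesize an AM signal with a known modulation index so the estimate
# can be checked. All parameters are made up for the illustration.
fs = 100_000                       # sample rate, Hz
t = np.arange(0, 0.05, 1 / fs)     # 50 ms of signal
fc, fm, m_true = 10_000, 200, 0.5  # carrier, message, modulation index
x = (1 + m_true * np.cos(2 * np.pi * fm * t)) * np.cos(2 * np.pi * fc * t)

# Envelope as the magnitude of the analytic signal; trim the edges where
# the Hilbert transform has boundary artifacts.
env = np.abs(hilbert(x))[200:-200]

# m = (Amax - Amin) / (Amax + Amin) from the envelope extremes.
m_est = (env.max() - env.min()) / (env.max() + env.min())
print(f"estimated modulation index: {m_est:.3f}")  # close to 0.5
```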
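Finally, the self-contained applications post moves from the REPL to a script run with `spark-submit`. A minimal sketch of such a script; the file name and input path are hypothetical:

```python
"""app.py: minimal self-contained PySpark app; run with `spark-submit app.py`."""
from pyspark.sql import SparkSession


def main() -> None:
    # Unlike the pyspark REPL, a standalone script builds its own session.
    spark = SparkSession.builder.appName("self-contained-app").getOrCreate()
    lines = spark.read.text("/data/input.txt")  # hypothetical input file
    print(f"line count: {lines.count()}")
    spark.stop()  # release cluster resources when the job finishes


if __name__ == "__main__":
    main()
```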
## Optional

- [Full content for LLMs](https://blog.prabeeshk.com/llms-full.txt): Complete blog content in Markdown for comprehensive ingestion.
- [RSS Feed](https://blog.prabeeshk.com/feed.xml): RSS feed with the latest posts.
- [Sitemap](https://blog.prabeeshk.com/sitemap.xml): Full sitemap covering all pages.