This is One of the Solutions

Pyspark Notebook With Docker

Install Docker

Docker can be installed with the following command. I did this on an Ubuntu 14.04 instance. For more options, refer to the official Docker site.

 wget -qO- | sh

Now run the following command from any machine on which docker is installed.

docker run -d -t -p 8888:8888 prabeeshk/pyspark-notebook

After successfully running the pyspark-notebook docker container, access pyspark ipython notebook by

Self Contained PySpark Application

In my previous post, I wrote about installing Spark and using the Scala interactive shell. In this post, we’ll see how to do the same in Python.

Similar to the Scala interactive shell, there is an interactive shell available for Python. You can run it with the command below from the Spark root folder:


Now you can enjoy Spark using the Python interactive shell.

This shell might be sufficient for experimentation and development. For production use, however, we should write a standalone application.

Install Apache Spark on Ubuntu-14.04

Update: For Apache Spark 2, refer to the latest post.

One of the previous posts described how to install Apache Spark-0.8.0 on Ubuntu-12.04. This post explains the detailed steps to set up Apache Spark-1.1.0 on Ubuntu. Running Spark on an Ubuntu machine requires Java, which can be installed with the following commands.

$ sudo apt-add-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer

To check that the Java installation was successful:

$ java -version

It shows the installed Java version:

java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)

The next step is to install Scala. Follow these instructions to set up Scala.

Creating Assembled JAR for Standalone Spark Application

The previous post showed how to use sbt in a Spark Streaming project. This post is about how to create a fat JAR for a Spark Streaming project using an sbt plugin. sbt-assembly is an sbt plugin that creates a fat JAR of an sbt project with all of its dependencies.

Add the sbt-assembly plugin in project/plugin.sbt

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.1")

Specify sbt-assembly.git as a dependency in project/project/build.scala

import sbt._

object Plugins extends Build {
  lazy val root = Project("root", file(".")) dependsOn(

In the build.sbt file, add the following contents

A Standalone Spark Application in Scala

This post shares some ideas about how to create a standalone Spark Streaming application and how to run Spark applications in scala-SDK (Eclipse IDE).

Building Spark Application using SBT

This is a standalone application in Scala using the Apache Spark API. The application is built using Simple Build Tool (SBT).

To create a standalone app, take the Twitter popular tags example.

This program calculates popular hashtags (popular topics) over sliding 10 and 60 second windows from a Twitter stream. The stream is instantiated with credentials and optionally filters supplied by the command line arguments.
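The windowed count itself can be illustrated outside Spark. The following is a minimal plain-Python sketch of counting hashtags over a sliding time window, not the Spark Streaming code from the example; the class name, method names, and sample tweets are all hypothetical, and it assumes tweets arrive in non-decreasing timestamp order.

```python
from collections import Counter, deque


class SlidingHashtagCounter:
    """Counts hashtags seen within the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, hashtag) pairs, oldest first
        self.counts = Counter()  # hashtag -> occurrences inside the window

    def add(self, timestamp, text):
        # Record every word that looks like a hashtag.
        for word in text.split():
            if word.startswith("#") and len(word) > 1:
                self.events.append((timestamp, word))
                self.counts[word] += 1

    def top(self, now, n=3):
        # Evict events that have fallen out of the window ending at `now`.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, tag = self.events.popleft()
            self.counts[tag] -= 1
            if self.counts[tag] == 0:
                del self.counts[tag]
        return self.counts.most_common(n)


# Hypothetical tweets with timestamps in seconds.
counter = SlidingHashtagCounter(window_seconds=10)
counter.add(0, "#spark is fast")
counter.add(4, "learning #spark and #scala")
counter.add(12, "#scala rocks")
top_at_12 = counter.top(now=12)
print(top_at_12)
```

Running two such counters side by side, one with a 10 second window and one with a 60 second window, mirrors the two windows described above.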

Here, however, the Spark example is modified to take the Twitter authentication credentials through command-line arguments, so the arguments need to be given as

Installing Apache Spark on Ubuntu-12.04

Update: To install Apache Spark-1.0, follow this post.

Apache Spark is an open source in-memory cluster computing framework. It was initially developed at the UC Berkeley AMPLab and is now an Apache Incubator project. Apache Spark is designed for low-latency iterative jobs and interactive use from an interpreter. It provides clean, language-integrated APIs in Scala, Java, and Python, with a rich array of parallel operators. You may read more about it here.

You can download the Apache Spark distribution (0.8.0-incubating) from here. After that, untar the downloaded file.

$ tar xvf spark-0.8.0-incubating.tgz

You need to have Scala installed, or the SCALA_HOME environment variable pointing to a Scala installation.


SBT (Simple Build Tool), which is bundled with Spark, is used for building it. To compile the code

Running Mesos-0.13.0 on Ubuntu-12.04

You will need the following packages to run Mesos.

$ sudo apt-get install python2.7-dev g++ libcppunit-dev libunwind7-dev git libcurl4-nss-dev

You need to have Java installed, or the JAVA_HOME environment variable pointing to a Java installation.

You can download the Mesos distribution from here. After that, untar the downloaded file.

$ tar xvf mesos-0.13.0.tar.gz

Building and Installing

$ cd mesos-0.13.0
$ mkdir build
$ cd build
$ sudo  ../configure --prefix=/home/user/mesos
$ sudo make
$ sudo make check
$ sudo make install

You can pass the --prefix option while configuring to specify where to install. For example

MQTT Scala Publisher and Subscriber Using Eclipse Paho

MQTT is a machine-to-machine (M2M)/Internet of Things connectivity protocol. It was designed to be extremely lightweight so that it can support embedded and low-power processing devices. You may read more about it here. MQTT is a broker-based message queuing system. To work with MQTT, an MQTT message broker/server is required. Mosquitto is an open source MQTT broker. On Ubuntu, Mosquitto can be installed using the command

$ sudo apt-get install mosquitto

Eclipse Paho is an MQTT client that works well with Mosquitto. You may read more about it here.

MQTT Scala subscriber and publisher code based on the Eclipse Paho library 0.4.0 is available in

Paint App Using Flask With MongoDB

Here the paint app is modified to use a new database system. MongoDB is a popular NoSQL database system. A NoSQL database is a simple, lightweight mechanism that provides high scalability and availability through horizontal scaling of data. This approach redefined the database concept compared with the traditional relational database system. MongoDB is an open-source, document-oriented database designed for ease of development and scaling. The main features of MongoDB are flexibility, power, speed, and ease of use. MongoDB can be installed on a local machine by following the instructions on the official website.

Some commands used in MongoDB operations are given below:

db :- After starting the mongo shell, your session uses the test database for context by default. At any time, issue this operation at the mongo shell to report the current database.

show dbs :- Displays the list of databases from the mongo shell.

use mydb :- Switches to a new database named mydb.

help :- At any point you can access help for the mongo shell using this operation.

db.things.insert() :- Inserts documents into the collection things. When you insert the first document, mongod creates both the database and the things collection.

show collections :- Displays the available collections in the database.

db.things.find() :- Finds the documents in the collection. The documents to be found can be specified through the arguments of the find function. The MongoDB cursor displays only the first 20 output documents.

it :- Displays the rest of the documents.

The source code is available in

Paint App Using JavaScript and Canvas

An application to draw simple drawings using lines, rectangles and circles in different colours.


The application is developed using JavaScript and HTML5. The canvas feature in HTML5 is used to provide a drawable region. JavaScript is used to handle the drawing functions in this region. The select button is used to choose among the different drawing tools.

Simple CUDA Program

In my previous post I wrote an introduction to parallel programming with CUDA. This post explains a simple example of CUDA code that computes the squares of 64 numbers. A typical GPU program consists of the following steps.

1- CPU allocates storage on GPU
2- CPU copies input data from CPU to GPU
3- CPU launches kernels on GPU to process the data
4- CPU copies result back to CPU from GPU

nvcc -o square

Here, instead of running the regular C compiler, we are running nvcc, the NVIDIA CUDA compiler. The output goes to an executable called square, and our input file is “”. The .cu extension is the naming convention for CUDA source files. Source code is available on GitHub.

We are going to walk through the CPU code first.

Introduction to Parallel Programming

This post focuses on parallel computing on the GPU. Parallel computing is a way of solving large problems by breaking them into smaller pieces and running these smaller pieces at the same time.

Main reasons for the technical trend toward parallel computing on the GPU

Modern processors are made from transistors, and each year those transistors get smaller and smaller. The feature size is the minimum size of a transistor on a chip. As the feature size decreases, transistors get smaller, run faster, use less power, and more of them can be put on a chip. The consequence is that more and more resources for computation become available every single year. One of the primary features of processors is clock speed. Over many years, clock speeds continued to go up. However, over the last decade, they have essentially remained constant. Even though transistors are continuing to get smaller and faster and consume less energy per transistor, running a billion transistors generates a large amount of heat. Since we can’t keep all these processors cool, power has emerged as a primary driving factor.

Traditional CPUs have very complicated control hardware. This allows flexibility in performance, but as control hardware gets more complicated, it becomes more expensive in terms of power and design complexity.

Developing a Simple Game With HTML5/canvas

HTML5 is the new HTML standard. One of the most interesting new features in HTML5 is the canvas element for 2D drawing. A canvas is a rectangular area on an HTML page. All drawing on the canvas must be done using JavaScript. This post goes through the basics of setting up a 2D canvas context and using the basic canvas functions to develop a simple game. To create a canvas context, first add the canvas element to your HTML document like so:

<canvas id="Canvas" width="800" height="450"></canvas>

To draw inside the canvas, you need to use JavaScript. First find the canvas element using getElementById, then initialize the context.

var canvas = document.getElementById("Canvas");
var context = canvas.getContext("2d");


To draw text on a canvas, the most important properties and methods are:

Finding RC Constant Using ATmega8

The time constant (in seconds) of an RC circuit is equal to the product of the resistance and the capacitance of the circuit.

It is the time required to charge the capacitor through the resistor to 63.2% of full charge, or to discharge it to 36.8% of its initial voltage.
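The 63.2% and 36.8% figures follow from the standard exponential charge and discharge equations of an RC circuit. In the sketch below, the symbols V_0 (the applied or initial voltage) and V_C (the capacitor voltage) are notation introduced here, not taken from the post.

```latex
\begin{aligned}
V_C(t)  &= V_0\left(1 - e^{-t/RC}\right) && \text{(charging)} \\
V_C(RC) &= V_0\left(1 - e^{-1}\right) \approx 0.632\,V_0 \\
V_C(t)  &= V_0\,e^{-t/RC} && \text{(discharging)} \\
V_C(RC) &= V_0\,e^{-1} \approx 0.368\,V_0
\end{aligned}
```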

The voltage of the RC circuit is measured using the ADC of the ATmega8, and the input voltage for the RC circuit is supplied from PB0. The timer is started at the moment PB0 is set to 1.

The ADC reading of the ATmega8 taken from ADCH is 8 bits long, so 5 V corresponds to 255 in ADCH. The TCNT1 value is saved to a variable when the output voltage of the RC circuit reaches 63.2% of the input voltage, that is, 3.16 V. At this voltage ADCH reads approximately 161.

The TCNT1 value can be shown on an LCD. TCNT1 is 16 bits long. Here the ATmega8 runs on an 8 MHz clock, and the timer is prescaled by 1024.

So, to get the real time in seconds, multiply the TCNT1 value by (1024/8000000).
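As a quick check of this arithmetic on a PC, here is a small Python sketch (not the microcontroller firmware). The function names and the TCNT1 reading of 781 are hypothetical, chosen only for illustration; the clock, prescaler, and 8-bit full scale are the values stated above.

```python
import math

F_CPU_HZ = 8_000_000   # ATmega8 clock frequency stated in the post
PRESCALER = 1024       # Timer1 prescaler stated in the post
ADC_FULL_SCALE = 255   # 8-bit ADCH reading that corresponds to 5 V


def ticks_to_seconds(tcnt1):
    """Convert a TCNT1 count to seconds: one tick lasts PRESCALER / F_CPU_HZ."""
    return tcnt1 * PRESCALER / F_CPU_HZ


def adc_threshold(fraction=1 - math.exp(-1)):
    """ADCH value at which the capacitor has reached `fraction` of full scale."""
    return round(fraction * ADC_FULL_SCALE)


threshold = adc_threshold()          # ADCH value to wait for
tau_seconds = ticks_to_seconds(781)  # 781 is a hypothetical TCNT1 reading
print(threshold, tau_seconds)
```

Each timer tick is 128 microseconds with these settings, so the 16-bit TCNT1 can measure time constants up to roughly 8.4 seconds before it overflows.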

Some test examples:

Running Arduino Codes in Stand Alone Atmega8

An Arduino board consists of an 8-bit Atmel AVR microcontroller with complementary components to facilitate programming and incorporation into other circuits. If you wish to study Arduino code, one of the major problems is the availability and cost of the Arduino board. If you have an ATmega8 microcontroller, you can study Arduino code simply by changing some options in the Arduino IDE.

First download the Arduino IDE (I am using Arduino 1.0). Next you need an AVR programmer (I am using usbasp and usbtiny). Launch the Arduino IDE as root. Then select your programmer from the Tools menu and also select your board, in this case ATmega8. Take care with the fuse bytes, because the Arduino code runs at 8 MHz. You can enable the internal 8 MHz clock by

-U lfuse:w:0xa4:m -U hfuse:w:0xcc:m

Or you can enable the external crystal by setting the fuse byte as

LCD Interfacing Using Msp430

A pot (potentiometer) is connected to ADC input A0 (pin P1.0) of the MSP430. The values of ADC10MEM are displayed on an LCD.

The Vcc for the pot is taken from the MSP430; the maximum voltage is 3.6 V.

The MSP430 10-bit ADC operates in the range 0 to 3.6 V. If the input voltage is 0 V, the ADC generates a 10-bit value:

0 0 0 0 0 0 0 0 0 0

which is numerically equal to 0.

When the input voltage is 3.6 V, the ADC generates a 10-bit pattern:

1 1 1 1 1 1 1 1 1 1

which is numerically equal to 1023.

These values are stored in ADC10MEM.
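The mapping between the input voltage and the ADC10MEM value is linear between these two end points. As a minimal sketch for checking readings on a PC (not MSP430 firmware), the Python below converts a 10-bit code back to volts and prints its bit pattern; the function names are hypothetical, and the 3.6 V reference is the value assumed in the text above.

```python
ADC10_MAX = 1023   # largest 10-bit code (all ten bits set)
V_REF = 3.6        # full-scale input voltage assumed in the post


def adc10_to_volts(code, v_ref=V_REF):
    """Convert a 10-bit ADC10MEM code to the input voltage it represents."""
    if not 0 <= code <= ADC10_MAX:
        raise ValueError("ADC10 code must be between 0 and 1023")
    return code * v_ref / ADC10_MAX


def adc10_bits(code):
    """Return the code as a 10-character binary string, MSB first."""
    if not 0 <= code <= ADC10_MAX:
        raise ValueError("ADC10 code must be between 0 and 1023")
    return format(code, "010b")


for code in (0, 512, 1023):
    print(adc10_bits(code), code, round(adc10_to_volts(code), 3))
```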

Introduction to AVR Programming

Atmel AVR 8-bit and 32-bit microcontrollers deliver a unique combination of performance, power efficiency, and design flexibility. Optimized to speed time to market, they are based on the industry’s most code-efficient architecture for C and assembly programming. No other microcontrollers deliver more computing performance with better power efficiency. Industry-leading development tools and design support let you get to market faster. Once there, the large AVR family lets you reuse your knowledge when improving your products and expanding to new markets—easily and cost-effectively.

Packages required in Linux

binutils: Programs to manipulate binary and object files that may have been created for Atmel’s AVR architecture. This package is primarily for AVR developers and cross-compilers.

gcc-avr: The GNU C compiler, a fairly portable optimising compiler that supports multiple languages. This package includes C language support.

avr-libc: Standard library used for developing C programs for Atmel AVR microcontrollers. This package contains static libraries, as well as needed header files.

Sample program to blink an LED.

AM Generation Using Matplotlib Python

We can plot AM waves using matplotlib.

It is one of the most powerful tools in Linux for plotting waves.

import matplotlib.pylab as plt
import numpy as num
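
As a minimal sketch of the signal being plotted, the following computes amplitude-modulated samples using only the standard library rather than the NumPy and matplotlib imports above. It is not the original script; the function name, parameters, and the example frequencies are illustrative assumptions.

```python
import math


def am_samples(carrier_hz, message_hz, mod_index, sample_rate_hz,
               duration_s, amplitude=1.0):
    """Return samples of A * (1 + m*cos(2*pi*fm*t)) * cos(2*pi*fc*t)."""
    n = round(sample_rate_hz * duration_s)
    samples = []
    for i in range(n):
        t = i / sample_rate_hz
        # The message signal shapes the envelope of the carrier.
        envelope = 1.0 + mod_index * math.cos(2 * math.pi * message_hz * t)
        carrier = math.cos(2 * math.pi * carrier_hz * t)
        samples.append(amplitude * envelope * carrier)
    return samples


# Hypothetical values: 1 kHz carrier, 50 Hz message, 50% modulation.
wave = am_samples(carrier_hz=1000, message_hz=50, mod_index=0.5,
                  sample_rate_hz=20000, duration_s=0.04)
print(len(wave), wave[0])
```

The resulting list can be passed to a plotting call such as plt.plot(wave) when matplotlib is available.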