Tuesday, 15 November 2016

Installation of Spark and Python step by step

A help guide that gives a glimpse of installing Spark and Python on your machine


This blog is for people who want to run Spark with Python on Linux (Ubuntu Desktop) installed in Oracle VM VirtualBox on a Windows machine.

Introduction on Python and Spark

Python is an easy to learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python’s elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms.
Developers and analysts admire Spark for many reasons, chiefly because it lets them quickly query, analyze and transform data at scale. In simple words, you can call Spark a competent alternative to Hadoop, with its own characteristics, strengths and limitations. Spark processes data in memory, with more speed and sophistication than complementary approaches such as Hadoop MapReduce.

1. Installation of Oracle VM VirtualBox Manager
On Windows:
1) Download Oracle VM VirtualBox Manager.
2) Begin the installation process and follow the installer.
3) Open VM VirtualBox Manager. After installing, you will see its main screen.


2. Downloading Linux (Ubuntu Desktop) and installing it in Oracle VM VirtualBox Manager

In Windows, open Google Chrome, follow the link to the Ubuntu download page and download Ubuntu Desktop by clicking the download button.
Once it has downloaded, open Oracle VM VirtualBox Manager by double-clicking its icon, then:
1) Click on "Create a new virtual machine".
2) Select Ubuntu (64 bit).
3) Click Open, then click Next.
4) Give the machine 20 GB so that it runs smoothly.
5) Click Next.
6) Ubuntu is now installed in Oracle VM VirtualBox Manager. This takes more than 15 minutes.

Once Ubuntu is installed, it appears as a virtual machine inside VirtualBox Manager.


3. Downloading Anaconda 4.2.0 using Ubuntu

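1) Inside Ubuntu, open the internet browser.
2) Download Anaconda 4.2.0 for Linux (the installer must match the operating system it runs on, which here is Ubuntu).
3) Wait until the download has finished.
4) Open the terminal.
5) Type the installer command after the $ prompt.
6) Press Enter.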
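Once the installer has finished, it helps to confirm that the terminal really picks up the newly installed Python. The snippet below is a minimal sketch of such a check; the helper name describe_interpreter is made up for illustration, and the sketch only assumes that an Anaconda install shows its own directory in the interpreter path.

```python
import sys


def describe_interpreter():
    """Return the path and version of the Python interpreter running this code."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return {"executable": sys.executable, "version": version}


if __name__ == "__main__":
    info = describe_interpreter()
    # If Anaconda is active, the executable path should point into its install directory.
    print("Interpreter:", info["executable"])
    print("Version:", info["version"])
```

If the printed path still points at the system Python, open a new terminal so that the updated PATH is loaded, then run the check again.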

4. Python Spark

1) Download the latest version of Spark.
After the download has finished, install Spark.
As soon as Spark is installed, run the following commands in the terminal to set up PySpark on your machine. After that you can use this platform to run your programs.

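# Point SPARK_HOME at the directory where Spark was unpacked (adjust the path for your machine)
export SPARK_HOME=/home/rajat/spark16
# Put the Spark launch scripts (pyspark, spark-submit) on the PATH
export PATH=$SPARK_HOME/bin:$PATH
# Make the pyspark package importable, plus the py4j zip shipped in $SPARK_HOME/python/lib
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.3-src.zip:$PYTHONPATH
# Bind Spark to the local machine
export SPARK_LOCAL_IP=localhost
# Launch PySpark with the IPython notebook as the driver front end
PYSPARK_PYTHON=python3 PYSPARK_DRIVER_PYTHON=ipython3 PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark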
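The two PYTHONPATH exports above can also be reproduced from inside Python, which is handy when a notebook was started without them. This is only a sketch under assumptions: the helper names spark_python_paths and add_spark_to_sys_path are hypothetical, and the py4j zip is matched with a wildcard because its version number differs between Spark releases.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the directory and py4j zip(s) that the PYTHONPATH exports point at."""
    python_dir = os.path.join(spark_home, "python")
    # The py4j version varies between Spark releases, so match it with a wildcard.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_spark_to_sys_path(spark_home):
    """Prepend the Spark Python paths to sys.path and return the ones that were added."""
    added = []
    for path in spark_python_paths(spark_home):
        if path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    return added


if __name__ == "__main__":
    # SPARK_HOME is expected to be set as in the export lines above.
    home = os.environ.get("SPARK_HOME")
    if home is None:
        print("SPARK_HOME is not set")
    else:
        print(add_spark_to_sys_path(home))
```

If the printed list contains only the python directory and no py4j zip, SPARK_HOME most likely points at the wrong folder.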

4a. To start Spark in command-line mode, enter the "pyspark" command and you will see the Spark screen.







4b. To start Spark in the IPython notebook

Once the PySpark commands have been run, you can see the Jupyter Notebook below.


4c. Spark is now running on our Ubuntu machine; check its status at http://localhost: followed by the port of the Spark web UI (4040 by default for a running application).
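The status page can also be probed from Python instead of the browser. The sketch below assumes the UI listens on port 4040, which is Spark's default for a running application; the function name spark_ui_is_up is invented for this example.

```python
import http.client
import urllib.error
import urllib.request


def spark_ui_is_up(url, timeout=2.0):
    """Return True if an HTTP GET on url succeeds, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, malformed URL and similar all mean "not reachable".
        return False


if __name__ == "__main__":
    # Assumption: default application UI port; adjust if your setup uses another one.
    print(spark_ui_is_up("http://localhost:4040"))
```

The check only reports True while a Spark application (for example the pyspark shell or the notebook) is running, because the UI is served by the application itself.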


5. Running Simple Programs:

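from operator import add
from pyspark import SparkContext

# Run Spark locally under the application name 'pyspark'
sc = SparkContext('local', 'pyspark')
# Load the text file as an RDD of lines
text = sc.textFile("Twilight.txt")
print(text)

def tokenize(text):
    return text.split()

# Split every line into words
words = text.flatMap(tokenize)
print(words)
# Output: PythonRDD[2] at RDD at PythonRDD.scala:48

# Pair each word with a count of 1
wc = words.map(lambda x: (x, 1))
print(wc.toDebugString())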
 
Copy the code and paste it into the notebook line by line to get the output shown above.
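To see what the flatMap and map steps compute without starting Spark, the same pipeline can be imitated in plain Python. This is a sketch, not the Spark implementation: word_count and the two sample lines are made up for illustration, and the final loop stands in for what calling reduceByKey(add) on the wc RDD would do in Spark.

```python
from collections import defaultdict
from operator import add


def tokenize(text):
    """Split a line into whitespace-separated words, as in the Spark example."""
    return text.split()


def word_count(lines):
    """Plain-Python imitation of flatMap(tokenize), map to (word, 1), reduceByKey(add)."""
    # flatMap: one flat list of words from all lines
    words = [word for line in lines for word in tokenize(line)]
    # map: pair every word with the count 1
    pairs = [(word, 1) for word in words]
    # reduceByKey(add): combine the counts of identical words
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] = add(counts[word], one)
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical sample lines standing in for the contents of a text file.
    sample = ["spark runs on python", "python runs on ubuntu"]
    print(word_count(sample))
```

Spark performs the same three steps, but spreads the lines across partitions and evaluates them lazily, which is why printing an RDD shows only its description rather than its contents.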


Spark and Python are now working on our machine. Let's start working with real-life datasets.

I hope the explanation was clear.






2 comments:

  1. 1)Inside Ubuntu open internet browser 2) download Anaconda 4.0.2 for windows 3)once download is finished 4) open the terminal 5) type following command after $ symbol 6) press enter ---

    puzzled by above statement ... why should you download Anaconda for Windows inside Ubuntu? Also what commands are to be typed at $ prompt ???? Not making sense

  2. identical to what viswam has written in https://pysparkinstallation.blogspot.in/
