Quick Zeppelin Notebook Setup

In this article I describe a quick way to get Zeppelin running so that you can quickly test a Spark application.

NOTE: This procedure shouldn't be used in production environments, as there you should set up the notebook with authentication and connect it to your local infrastructure.

Requirements

  • A working Docker environment. Check my previous article if you need some help with that.
  • Docker-compose

Setup

  1. Create a folder named docker-zeppelin
mkdir docker-zeppelin
  2. Create a data directory inside it where you can put the data you want to analyse.
mkdir -p docker-zeppelin/data
  3. Create the following docker-compose.yml file in docker-zeppelin. It exposes Zeppelin on port 8080 and mounts ./data at /opt/data inside the container, which is where we will read files from later:
version: '2'
services:
  zeppelin:
    ports:
      - "8080:8080"
    volumes:
      - ./data:/opt/data
    image: "dylanmei/zeppelin"

  4. Launch docker-compose:
sudo docker-compose up -d
  5. That's it! You should now be able to access http://localhost:8080

Test it

  1. Let's download a demo file to our data dir. Each entry in the response is one recent public Dota match, with fields such as match_id and radiant_win that we will query below.
curl -s https://api.opendota.com/api/publicMatches -o ./data/OpenDotaPublic.json

Yeah! I kinda like Dota so this makes sense :D

  2. Create a new notebook in the web interface and run the following code:
%spark

val df = sqlContext.read.json("file:///opt/data/OpenDotaPublic.json")
df.show

Hit Shift+Enter to run the paragraph.
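
If you want to sanity-check what Spark inferred before querying it, something like this should work in the same notebook (printSchema lists the inferred columns; match_id and radiant_win are the two we use below):

%spark
// Inspect the schema Spark inferred from the JSON
df.printSchema

// Peek at the two columns used in the next steps
df.select("match_id", "radiant_win").show(5)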

  3. Let's register this DataFrame as a temp table and create some visuals:
%spark
df.registerTempTable("publicmatches")
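
Note that registerTempTable is the Spark 1.x API. If the image you pulled ships Spark 2.x or later, the equivalent (and non-deprecated) call is the one below; either way you end up with a publicmatches table you can query from %sql:

%spark
// Spark 2.x+ replacement for registerTempTable
df.createOrReplaceTempView("publicmatches")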
  4. Create the following paragraph to generate some visualizations:
%sql
select radiant_win, match_id
from publicmatches

Guess I need to start playing on the Radiant side :D
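
If you'd rather see that split as a single number per side, a quick aggregate over the same temp table does the trick. This is just a sketch and runs from a %spark paragraph; the %sql interpreter would take the same query directly:

%spark
// Count how many of the downloaded matches the Radiant side won and lost
sqlContext.sql("""
  select radiant_win, count(*) as matches
  from publicmatches
  group by radiant_win
""").show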

Well, and that's it.

Cheers,
RR
