IPython Jupyter Notebooks for Data Science and Big Data

by Zymr



In the last post we looked at Zeppelin notebooks for data science and working with big data.  Here we look at Jupyter.  IPython came first; Zeppelin is a variation on it that adds new features and ideas.

IPython Jupyter is like a scratchpad for data science and big data programming: a REPL (read-eval-print loop) tool.

Jupyter lets you write Python, R, and many other languages and document the code as you write it by adding markdown, i.e., boldface and other styles.  You can also connect Jupyter to Spark, so you can write Python code against Spark from an easy-to-use interface instead of the Linux command line or the Spark shell.

Jupyter Web Interactive Shell

IPython (interactive Python) is, as the name suggests, a Python shell.  The latest upgrade is now called Jupyter.  The name comes from Julia, Python, and R.  (Julia is a newer programming language for scientific computing.)

The difference between ordinary IPython and an IPython notebook is that a notebook supports code and markdown on the same page.  That means you can write self-documenting code and share it as a web page.  So you could write a predictive model using big data and R and share it with your users.

We will illustrate the basic idea with a simple example.


Installing Jupyter

You can download and install Jupyter or first try it online here, where it is hosted for free by Rackspace.

The next step would probably be to install Jupyter on your Ubuntu laptop.  If you instead install it on a server and want graphical output, like charts, you will have to figure out how to forward the DISPLAY to your laptop.

Also, to use Spark with Jupyter, the first step would probably be to install Spark on that same laptop.  But that is not suitable for long-term use, as this configuration does not let you share the environment with other users, so look up instructions for a multi-user setup when you get there.  Here we give you a look at the simplest install.

The install is made really easy with this Anaconda script.

When you start it, Jupyter assumes you are running on a local machine, so it will try to launch a browser.  To run it without a browser and specify which port to use, run this after adding anaconda2/bin to your PATH:

jupyter notebook --no-browser --port 8888

Now, if you set that up at your office and want to connect to it from your home or Starbucks, you will have to configure port forwarding, unless you can get your office to open a port on the firewall for you.
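One common way to do that port forwarding is an SSH tunnel.  A sketch, assuming the server was started on port 8888 as above; the username and hostname here are placeholders for your own:

```shell
# Forward local port 8888 to port 8888 on the office machine,
# without opening a remote shell (-N).
ssh -N -L 8888:localhost:8888 user@office-host

# Then browse to http://localhost:8888 on your laptop.
```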

And then you will have to figure out how to get the DISPLAY to transmit to your laptop if you use the matplotlib Python library for creating graphs.  (Inside a notebook, the %matplotlib inline magic renders figures in the page itself, which sidesteps the DISPLAY problem.)

Using Jupyter

There are Jupyter intro videos on the internet.  But most either focus on explaining the concepts or dive into advanced data science topics.  Here we explain the basics of how to use the interface.

The first step is to create a notebook.  You pick from any of the language interpreters you have installed.  Notice that one of those is Bash, so you could even use Jupyter as a web-based bash shell.


Basically, when you use Jupyter you are working with cells.  Take the simple Python command print(2 * 3): you type it into a cell and press Shift+Enter to execute it.
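As a minimal sketch of what a code cell contains, the example above looks like this; the output appears directly below the cell when you run it:

```python
# A minimal notebook cell: type the code, press Shift+Enter,
# and the result is printed directly below the cell.
result = 2 * 3
print(result)  # → 6
```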

Alongside the code cells you can also use markdown cells.  Markdown is the same syntax used to create documentation on GitHub: it creates boldface, numbered lists, etc.  This is how you roll out your Jupyter program to your users: you mix the documentation and the code on the same page.


The screen is not exactly intuitive, but think of it like a spreadsheet into which you can add and delete cells.  The Insert menu lets you position a new cell above or below the current one, and the Edit menu lets you cut, copy, and delete cells.

You can also run cells from the Cell menu instead of pressing Shift+Enter.

The next place to go from here is to read about the inline commands called magics, and then work on getting your installation talking to a Spark cluster.
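A quick taste of magics: they are %-prefixed commands that only work as notebook/IPython input, not in plain Python.  For example, %timeit is a thin wrapper around the standard-library timeit module, which you can also call directly:

```python
# Magics are typed into a notebook cell, prefixed with % (line magics)
# or %% (cell magics), e.g.:
#
#   %lsmagic                    # list all available magics
#   %timeit sum(range(1000))    # time a small expression
#
# %timeit wraps the standard-library timeit module, usable from plain Python:
import timeit

elapsed = timeit.timeit("sum(range(1000))", number=1000)
print(elapsed > 0)  # → True
```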

