Apache Zeppelin takes the idea of a notebook, an interactive shell for Spark, up an order of magnitude. It is a sophisticated tool for creating visualizations and reports from Spark or Hive and then sharing those as live reports, with output delivered as formatted web pages.
A notebook for analytics is a tool, like IPython (now Jupyter), that lets you work through big data systems like Spark or Hive interactively. Spark already has interactive shells for Python, Scala, and R, but those output plain text. A notebook adds features like visualization, meaning rich, colorful graphs instead of text tables.
But what Zeppelin does differently from Jupyter is let you run a notebook from a web page. It hides all the details of the Spark installation from the user, so they are insulated from anything messy like a command line.
What this means is you can walk through data, do transformations and MapReduce-style operations, and then fit machine learning and other algorithms to your data set. When you get a resulting dataset that you like, you can turn all of that into a web page. And because Zeppelin supports WebSockets, you can share that web page with others by echoing the output of your browser, thus creating live reports.
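As a sketch of what such a notebook paragraph might look like, here is a hypothetical Zeppelin Scala paragraph that loads data, transforms it, and registers it as a table so later paragraphs can query it. The file path and column names are invented for illustration:

```scala
%spark
// Hypothetical example: read a CSV of sales data, keep the large
// orders, and register the result as a temp view for later %sql
// paragraphs. Path and columns are made up for illustration.
val sales = spark.read.option("header", "true").csv("/data/sales.csv")
val bigOrders = sales.filter($"amount" > 100)
bigOrders.createOrReplaceTempView("big_orders")
```

Each paragraph runs independently, so you can iterate on one step at a time while the registered view stays available to the rest of the notebook.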
Because Zeppelin supports markdown and Angular, you can add HTML-like elements to the page: boldface type, headings, bullet lists, and so on. Markdown is the same syntax that you use in, for example, a GitHub README page for a project. Angular adds the ability for even fancier HTML output and input.
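For example, a markdown paragraph in Zeppelin starts with the %md interpreter directive; the heading and list below are just sample content:

```
%md
## Quarterly Sales Report
This report is **generated live** from Spark data.

* Revenue by product
* Top customers
```

When the paragraph runs, Zeppelin renders the markdown as formatted HTML in place of the source.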
[See Also:Apache Storm StreamParse Python]
To see what I mean, take a look at some examples.
Here is a graphic from a YouTube presentation by Moon soo Lee, who developed Zeppelin as a commercial product and then handed it over to Apache.
What you see is him using Spark SQL to run a simple SQL statement on Spark data. But instead of just outputting the results in text format, Zeppelin arranges them in a nicely formatted table with column headers and rows. Above the table are icons you can click to create a histogram, bar chart, or scatter chart.
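A paragraph like the one in the graphic might look something like this; the table and column names here are invented for illustration:

```
%sql
-- Hypothetical query: Zeppelin renders the result as a table
-- with chart icons above it, not as plain text.
select product_name, sum(amount) as total
from big_orders
group by product_name
```

The same result set drives every chart type, so switching from table to bar chart is just a click, with no change to the query.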
You can see these graph features, and the ability to replace hardcoded values with user input, here:
As you can see above, when you put parameters into the SQL statement, a small input box pops up where you can enter values. In this example, product_name and other values have been made parameters instead of hardcoded values.
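Zeppelin calls these dynamic forms; writing ${name=default} in a paragraph makes Zeppelin render an input field above the result. A minimal sketch, reusing the hypothetical table from earlier examples:

```
%sql
-- ${product_name=widget} renders a text input above the result,
-- prefilled with "widget". Table and values are hypothetical.
select * from big_orders
where product_name = "${product_name=widget}"
```

Each time the user changes the form value, the paragraph re-runs with the new parameter, which is what makes the shared page feel like a live report rather than a static export.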
And then the user in this example has clicked the pie chart icon, so the results are shown as a pie chart. On the right is a histogram.
Above you can see Spark SQL; the code could also be Python or Scala. You put all that code there as you develop your web page, then click the arrow next to it to hide the code when you are ready to share the results with others, which you can do with WebSockets. WebSockets let one browser mirror a web page to another browser in real time, so the second screen becomes a mirror image of what you are doing, letting you share your screen with someone else. With live data, this creates a live report or live visualization.
[See Also: Introduction to Apache Storm]
Here is a graphic from Wipro that gives a general idea of how Zeppelin and Spark are laid out.
As you can see, Zeppelin is installed as a daemon on a Linux server. When you install it, you tell it where Spark is installed and give it details about the cluster by setting environment variables such as SPARK_HOME and MASTER (which, on a Mesos cluster, points at the Mesos master).
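As a sketch, this configuration typically goes in conf/zeppelin-env.sh; the paths and host names below are invented for illustration:

```
# conf/zeppelin-env.sh (example values, not defaults)
export SPARK_HOME=/opt/spark
# Point MASTER at the cluster manager, e.g. a Mesos master:
export MASTER=mesos://mesos-master.example.com:5050
```

With these set, every Spark paragraph in every notebook runs against that cluster, so users of the web UI never have to know where Spark lives.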
The data scientists and programmers attach to Spark via Zeppelin. Then they do transformations and write machine learning algorithms and so forth from the web browser.