Zeppelin on Vagrant VM ( Zeppelin 在 Vagrant 虚拟机上 ) · Zeppelin 0.7.2 中文文档

# Zeppelin on Vagrant VM ( Zeppelin 在 Vagrant 虚拟机上 ) 原文链接 : [http://zeppelin.apache.org/docs/0.7.2/install/virtual_machine.html](http://zeppelin.apache.org/docs/0.7.2/install/virtual_machine.html) 译文链接 : [http://www.apache.wiki/pages/viewpage.action?pageId=10030718](http://www.apache.wiki/pages/viewpage.action?pageId=10030718) 贡献者 : [小瑶](/display/~chenyao) [ApacheCN](/display/~apachecn) [Apache中文网](/display/~apachechina) ## 概观 **Apache Zeppelin distribution** ( 分发 ) 包括一个脚本目录 **scripts/vagrant/zeppelin-dev** 该脚本创建一个虚拟机，启动一个可重复的已知集合，用于开发 Zeppelin 所需的核心依赖关系。如果您不打算从源代码构建，它可以用于运行现有的 **Zeppelin** 构建。对于 **PySpark** 用户，此脚本包含几个有用的 [ ](http://zeppelin.apache.org/docs/0.7.1/install/virtual_machine.html#python-extras)[ ](http://zeppelin.apache.org/docs/0.7.1/install/virtual_machine.html#python-extras)[**Python** 库](http://zeppelin.apache.org/docs/0.7.1/install/virtual_machine.html#python-extras) 。对于 **SparkR** 用户，此脚本包含几个有用的 [**R** 库](http://zeppelin.apache.org/docs/0.7.1/install/virtual_machine.html#r-extras)。 ### 先决条件此脚本需要三个应用程序，分别是** [Ansible](http://docs.ansible.com/ansible/intro_installation.html#latest-releases-via-pip)** ，**[Vagrant](http://www.vagrantup.com/)** 和 **[Virtual Box ](https://www.virtualbox.org/)**。所有这些应用程序都可以免费使用，因为它们是开源项目，并且在大多数操作系统上都非常容易设置。 ## Create a Zeppelin Ready VM ( 创建一个 Zeppelin Ready VM ) 如果您正在运行 **Windows** 并且尚未安装 **python** ，请先安装 **Python 2.7.x** 。 1. 下载并安装 **Vagrant** ： [**Vagrant** 下载](http://www.vagrantup.com/downloads.html) 2. 安装 **Ansible** ：**[Ansible Python pip install](http://docs.ansible.com/ansible/intro_installation.html#latest-releases-via-pip)** ``` sudo easy_install pip sudo pip install ansible ansible --version ``` 然后，请检查它是否报告了可接受版本 **1.9.2 或者更高版本**。 3. 安装 **Virtual Box** ： [**Virtual Box** 下载](https://www.virtualbox.org/) 4. 在 **/scripts/vagrant/zeppelin-dev** 目录中输入 **vagrant up** 这样就可以了！您现在可以运行** vagrant ssh** ，这将使您进入客户机器终端提示符界面。如果您不想从头构建 **Zeppelin** ，请在 **guest** 虚拟机中运行时运行 **z-manager installer** 脚本： ``` curl -fsSL https://raw.githubusercontent.com/NFLabs/z-manager/master/zeppelin-installer.sh | bash ``` ## Building Zeppelin ( 构建 Zeppelin ) 你现在可以 ``` git clone git://git.apache.org/zeppelin.git ``` 进入主机上的目录，或直接在虚拟机中。将 **Zeppelin** 从主机中克隆到** /scripts/vagrant/zeppelin-dev** 目录中，将允许在主机和客户机之间共享该目录。再次克隆项目可能似乎是直观的，因为这个脚本可能源于项目库。考虑将 **Zeppelin** 项目中的** vagrant/zeppelin-dev** 脚本作为独立目录复制，然后再次克隆要构建的特定分支。同步的文件夹使 **Vagrant** 能够将主机上的文件夹同步到客机，允许您继续在主机上处理项目的文件，但是使用 **guest** 机器上的资源来编译或运行项目。[ (1)同步文件夹描述从 Vagrant Up](https://docs.vagrantup.com/v2/synced-folders/index.html) 默认情况下，**Vagrant** 将共享您的项目目录 ( **Vagrantfile** 的目录 ) 到** /vagrant** 。这意味着您应该能在您的 **cd /vagrant/zeppelin** 之后在客户机中构建。 ## 这个虚拟机有什么？在客户机中运行以下命令显示以下预期版本： **node --version should report v0.12.7 mvn --version 应该报告 Apache Maven 3.3.9 and Java version: 1.7.0_85** 虚拟机包括： * Ubuntu Server 14.04 LTS * Node.js 0.12.7 * npm 2.11.3 * ruby 1.9.3 + rake，make 和 bundler ( 仅在构建 jekyll 文档时才需要 ) * Maven 3.3.9 * Git * Unzip * libfontconfig 以避免 phatomJs 缺少依赖性问题 * openjdk-7-jdk * Python 插件：pip，matplotlib，scipy，numpy，pandas * R 和 R 运行 R 解释器和相关 R 教程笔记本所需的软件包，包括：Knitr，devtools，repr，rCharts，ggplot2，googleVis，mplot，htmltools，base64enc，data.table ## 如何构建和运行 Zeppelin 这假设您已经在 **zeppelin-dev** 目录中的主机（要与客户机共享）上克隆项目，或者在客机中运行时直接克隆到目录中。以下构建步骤还将包括通过 **PySpark** 和 **SparkR** 的 **Python** 和 **R** 支持： ``` cd /zeppelin mvn clean package -Pspark-1.6 -Ppyspark -Phadoop-2.4 -Psparkr -DskipTests ./bin/zeppelin-daemon.sh start ``` 在您的主机上浏览 [http://localhost:8080/](http://localhost:8080/) 如果您[关闭 Vagrantfile 中的端口转发](http://zeppelin.apache.org/docs/0.7.1/install/virtual_machine.html#tweaking-the-virtual-machine)，请浏览 [http://192.168.51.52:8080](http://192.168.51.52:8080/) ## 调整虚拟机如果您计划沿着其他 **Vagrant** 映像一边运行此虚拟机，则可能希望将虚拟机绑定到特定的 IP 地址，而不要使用本地主机的端口。注释出 **forward_port **行，并取消注释 **Vagrantfile** 中的 **private_network** 行。最适合本地网络的子网将会有所不同，因此可以调整**192.168.*.***。 ``` #config.vm.network "forwarded_port", guest: 8080, host: 8080 config.vm.network "private_network", ip: "192.168.51.52" ``` **vagrant** 停止，随后 **vagrant** 将重新启动访问客户机绑定到** IP** 地址 **192.168.51.52** 。通常运行通过** IP** 地址直接发现的其他虚拟机 ( 如 **Spark Masters** 和 **Slaves** 以及 **Cassandra Nodes** ， **Elasticsearch Nodes** 和其他 **Spark** 数据源 ) 时通常需要此方法。您可能希望在具有适用于本地网络的子网中的** IP** 地址的虚拟机中启动节点，例如： **192.168.51.53**,**192.168.51.54**,**192.168.51.53**等。 ## 附加功能 ### Python Extras 随着 **Zeppelin** 运行， **NumPy** ， **SciPy** ，**Pandas** ，和 **Matplotlib** 将可用。创建一个 **pyspark** 笔记本，并尝试运行下面的代码。 ``` %pyspark import numpy import scipy import pandas import matplotlib print "numpy " + numpy.__version__ print "scipy " + scipy.__version__ print "pandas " + pandas.__version__ print "matplotlib " + matplotlib.__version__ ``` 要测试使用 **Matplotlib** 绘制到渲染的 **%html SVG image** ，请尝试： ``` %pyspark import matplotlib matplotlib.use('Agg') # turn off interactive charting so this works for server side SVG rendering import matplotlib.pyplot as plt import numpy as np import StringIO # clear out any previous plots on this note plt.clf() def show(p): img = StringIO.StringIO() p.savefig(img, format='svg') img.seek(0) print "%html <div style='width:600px'>" + img.buf + "</div>" # Example data people = ('Tom', 'Dick', 'Harry', 'Slim', 'Jim') y_pos = np.arange(len(people)) performance = 3 + 10 * np.random.rand(len(people)) error = np.random.rand(len(people)) plt.barh(y_pos, performance, xerr=error, align='center', alpha=0.4) plt.yticks(y_pos, people) plt.xlabel('Performance') plt.title('How fast do you want to go today?') show(plt) ``` ### R Extras 随着 **zeppelin** 运行，一个 **R** 教程笔记本将可用。本教程笔记本中运行示例和图形所需的** R** 包由该虚拟机安装。安装的** R** 包包括：**Knitr**，**devtools**，**repr**，**rCharts**，**ggplot2**，**googleVis**，**mplot**，**htmltools**，**base64enc**，**data.table**