HIVE Installation on Windows 7

Authored by Namrata H. on Sep 08, 2016 in Technology
Namrata H., Lead Software Engineer - QE, Zymr, Inc.

This article shows how to install Hadoop and Hive on Windows 7. Information on installing Hadoop on Windows 7 without Cloudera is relatively scarce, so I thought I'd write it up.

Let's check out the software we need for the Hadoop installation:

Supported Windows OSs: Hadoop supports Windows Server 2008, Windows Server 2008 R2, Windows Vista and Windows 7. For this installation we will use Windows 7 Professional Edition, SP1.

Microsoft Windows SDK: Download and install Microsoft Windows SDK v7.1 to get the tools, compilers, headers and libraries that are necessary to run Hadoop.

Cygwin: Download and install the Cygwin command-line environment (32-bit or 64-bit, to match your Windows) so that Unix commands can be run on Windows. Cygwin is a distribution of popular GNU and other open-source tools running on Microsoft Windows.

Maven: Download and install Maven 3.3.9 (the version used throughout this article).
Installing Apache Maven is a simple matter of extracting the archive and adding its bin folder, which contains the mvn command, to the PATH.

Open a new command prompt and run mvn -v to verify the installation:

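A successful run prints something like the following (an illustrative sketch; versions, paths and locale will vary with your setup):

Apache Maven 3.3.9 (...)
Maven home: C:\Program Files\apache-maven-3.3.9
Java version: 1.7.0_51, vendor: Oracle Corporation
Java home: C:\Program Files\Java\jdk1.7.0_51\jre
Default locale: en_US, platform encoding: Cp1252
OS name: "windows 7", version: "6.1", arch: "amd64", family: "dos"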

Protocol Buffers 2.5.0: Download Google's Protocol Buffers 2.5.0 and extract it to a folder on the C drive. The version must be exactly 2.5.0 for the build to succeed.

Setting environment variables:

Check an environment variable's value from the command prompt, e.g.

echo %JAVA_HOME%
C:\Program Files\Java\jdk1.7.0_51

If the command prints nothing, you need to set the JAVA_HOME variable.

Go to My Computer > right-click > Properties > Advanced System settings > System Properties > Advanced tab > Environment Variables button. Click on ‘New…’ button under System Variables and add –

Variable Name: JAVA_HOME

Variable Value: C:\Program Files\Java\jdk1.7.0_51
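Alternatively, the same variable can be set from the command line. A minimal sketch, using the example JDK location above (setx writes a user variable by default; /M from an elevated prompt writes a system variable, and only newly opened command prompts see the change):

setx JAVA_HOME "C:\Program Files\Java\jdk1.7.0_51"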

Note: Edit the Path environment variable very carefully. Select the existing value, move to the end and append the new entries; deleting any existing entry may break other programs.

Adding to PATH:

Add the unpacked distribution's bin directory to your PATH environment variable: open My Computer > right-click > Properties > Advanced System settings > System Properties > Advanced tab > Environment Variables, then select the PATH variable under 'System variables' and append C:\Program Files\apache-maven-3.3.9\bin.

Edit the Path variable to add the bin directory of Cygwin (say C:\cygwin64\bin), the bin directory of Maven (say C:\Program Files\apache-maven-3.3.9\bin) and the installation path of Protocol Buffers (say C:\protoc-2.5.0-win32).
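With the PATH updated, a new command prompt should resolve all three tools. A quick sanity check (assuming the example locations above):

echo %PATH%
mvn -v
protoc --version
uname

protoc --version should print libprotoc 2.5.0, and uname (from Cygwin's bin directory) should identify the Cygwin environment.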


Download and install CMake: Download and install CMake (Windows Installer) from cmake.org.

Official Apache Hadoop releases do not include Windows binaries, so you have to download sources and build a Windows package yourself.

Download Hadoop sources tarball hadoop-2.6.4-src.tar.gz and extract to a folder having short path (say C:\hdp) to avoid runtime problem due to maximum path length limitation in Windows.

Note: Do not use the prebuilt Hadoop binary release, as it lacks winutils.exe and hadoop.dll. Native IO is mandatory on Windows, and without it the Hadoop installation will not work. Instead, build from the source code using Maven; the build downloads all the required components.

To build Hadoop with native IO from the source code:

1. Extract hadoop-2.6.4-src.tar.gz to a folder (say C:\hdp)

2. Add the environment variable HADOOP_HOME="C:\hdp\hadoop-2.6.4-src" and edit the Path variable to add the bin directory of HADOOP_HOME, e.g. C:\hdp\hadoop-2.6.4-src\bin.

Before moving to the next step make sure you have the following variables set in your Environment variables window.

JAVA_HOME = C:\hdp\Java\jdk1.7.0_65

PATH = C:\Windows\Microsoft.NET\Framework64\v4.0.30319;C:\Program Files\CMake\bin;C:\protoc-2.5.0-win32;C:\Program Files\apache-maven-3.3.9\bin;C:\cygwin64\bin;C:\cygwin64\usr\sbin;%JAVA_HOME%\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\system32\WindowsPowerShell\v1.0;
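A quick check from a new command prompt (the values shown are the example locations used in this article):

echo %JAVA_HOME%
C:\hdp\Java\jdk1.7.0_65

echo %HADOOP_HOME%
C:\hdp\hadoop-2.6.4-src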

Note: If the JAVA_HOME environment variable is set improperly, Hadoop will not run. Set the environment variables properly for the JDK, Maven, Cygwin and Protocol Buffers. If you still get a 'JAVA_HOME not set properly' error, edit the hadoop-env.cmd file (under etc\hadoop in your Hadoop directory), locate set JAVA_HOME= and provide the JDK path (with no spaces).

Running Maven Package

Select Start > All Programs > Microsoft Windows SDK v7.1 (as an administrator) and open the Windows SDK 7.1 Command Prompt. Change directory to the Hadoop source code folder (C:\hdp\hadoop-2.6.4-src).

Execute the Maven package goal with the options -Pdist,native-win -DskipTests -Dtar to create a Windows binary tar distribution:

mvn package -Pdist,native-win -DskipTests -Dtar

The build runs for quite some time and prints a long stream of output. When it succeeds, it ends with a summary like the one below:

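An illustrative tail of a successful build (times elided):

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: ...
[INFO] Finished at: ...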

If everything goes well in the previous step, the native distribution hadoop-2.6.4.tar.gz will be created inside the C:\hdp\hadoop-dist\target\hadoop-2.6.4 directory.

Extract the newly created Hadoop Windows package to a directory of your choice (e.g. C:\hdp\hadoop-2.6.4).

Testing and Configuring Hadoop Installation

1. Configuring Hadoop for a Single Node (pseudo-distributed) Cluster.

2. As part of configuring HDFS, update the files:

1. Near the end of “C:\hdp\hadoop-2.6.4\etc\hadoop\hadoop-env.cmd” add the following lines:

set HADOOP_PREFIX=C:\hdp\hadoop-2.6.4
set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
set YARN_CONF_DIR=%HADOOP_CONF_DIR%
set PATH=%PATH%;%HADOOP_PREFIX%\bin

2. Modify “C:\hdp\hadoop-2.6.4\etc\hadoop\core-site.xml” with the following:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:19000</value>
  </property>
</configuration>

3. Modify “C:\hdp\hadoop-2.6.4\etc\hadoop\hdfs-site.xml” with:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

4. Finally, make sure “C:\hdp\hadoop-2.6.4\etc\hadoop\slaves” has the following entry:

localhost

Create a C:\tmp directory, as the default configuration puts HDFS metadata and data files under \tmp on the current drive.
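From a command prompt:

mkdir C:\tmp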

As part of configuring YARN, update files:

1. Add the following entries to “C:\hdp\hadoop-2.6.4\etc\hadoop\mapred-site.xml”, replacing %USERNAME% with your Windows user name:

<configuration>
  <property>
    <name>mapreduce.job.user.name</name>
    <value>%USERNAME%</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.apps.stagingDir</name>
    <value>/user/%USERNAME%/staging</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>local</value>
  </property>
</configuration>

2. Modify “C:\hdp\hadoop-2.6.4\etc\hadoop\yarn-site.xml”, with:

<configuration>
  <property>
    <name>yarn.server.resourcemanager.address</name>
    <value>0.0.0.0:8020</value>
  </property>
  <property>
    <name>yarn.server.resourcemanager.application.expiry.interval</name>
    <value>60000</value>
  </property>
  <property>
    <name>yarn.server.nodemanager.address</name>
    <value>0.0.0.0:45454</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.server.nodemanager.remote-app-log-dir</name>
    <value>/app-logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/dep/logs/userlogs</value>
  </property>
  <property>
    <name>yarn.server.mapreduce-appmanager.attempt-listener.bindAddress</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.server.mapreduce-appmanager.client-service.bindAddress</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>%HADOOP_CONF_DIR%,%HADOOP_COMMON_HOME%/share/hadoop/common/*,%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*</value>
  </property>
</configuration>

Since Hadoop doesn't reliably pick up JAVA_HOME from the Environment Variables window (and has problems with spaces in pathnames),

a. Copy your JDK to a directory without spaces (e.g. C:\hdp\Java\jdk1.7.0_65)

b. Edit “C:\hdp\hadoop-2.6.4\etc\hadoop\hadoop-env.cmd” and update

set JAVA_HOME=C:\hdp\Java\jdk1.7.0_65

c. Initialize Environment Variables by running cmd in “Administrator Mode”, moving to path C:\hdp\hadoop-2.6.4\etc\hadoop and executing:

C:\hdp\hadoop-2.6.4\etc\hadoop\>hadoop-env.cmd

3. Format the FileSystem – From command prompt, go to path C:\hdp\hadoop-2.6.4\bin\ and run

C:\hdp\hadoop-2.6.4\bin>hdfs namenode -format
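The format step prints a burst of log output; on success it ends with a line of roughly this shape (an illustrative sketch; <username> stands for your Windows user name, since the default configuration places the name directory under \tmp on the current drive):

INFO common.Storage: Storage directory \tmp\hadoop-<username>\dfs\name has been successfully formatted.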

4.  Start HDFS Daemons – From command prompt, go to path C:\hdp\hadoop-2.6.4\sbin\ and run

C:\hdp\hadoop-2.6.4\sbin\>start-dfs.cmd

Two separate Command Prompt windows will open automatically, one running the NameNode and one running the DataNode.
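You can confirm both daemons are up with the JDK's jps tool (process IDs are illustrative):

C:\>jps
4760 NameNode
5400 DataNode
6824 Jps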


5. Start YARN Daemons –

 C:\hdp\hadoop-2.6.4\sbin\>start-yarn.cmd 

Two more command prompt windows, named yarn nodemanager and yarn resourcemanager, will open after executing the above command.
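Again, jps should now additionally list the YARN daemons (illustrative output):

C:\>jps
4760 NameNode
5400 DataNode
7212 ResourceManager
7688 NodeManager
8100 Jps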


6. Run an example YARN job – Execute the command below in the command prompt and check the results. It runs the bundled wordcount example over the Hadoop LICENSE.txt file and writes the word counts to /out on HDFS.

C:\hdp\hadoop-2.6.4\bin\yarn jar C:\hdp\hadoop-2.6.4\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.6.4.jar 
wordcount C:\hdp\hadoop-2.6.4\LICENSE.txt /out
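Once the job finishes, the results can be inspected from HDFS (part-r-00000 is the default name of the reducer output file):

C:\hdp\hadoop-2.6.4\bin>hdfs dfs -ls /out
C:\hdp\hadoop-2.6.4\bin>hdfs dfs -cat /out/part-r-00000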

7. Check the following pages in your browser:

Resource Manager web UI:  http://localhost:8088
NameNode daemon web UI:  http://localhost:50070
NodeManager web UI:  http://localhost:8042

If these pages open in the browser, we can conclude that Hadoop has been installed successfully.

CONFIGURING HIVE

Now that the Hadoop installation is done, we can move on to installing Hive.

Prerequisites for the Hive installation are:

– Java 1.7 (preferred)

– Hadoop 2.x (preferred); 1.x is not supported by Hive 2.0.0 onward

Step 1: Verifying JAVA Installation

Open a command prompt as an administrator and run:

> java -version

It should print the installed Java version.
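For example, with the JDK used earlier in this article the output is roughly (build numbers elided):

java version "1.7.0_51"
Java(TM) SE Runtime Environment (build ...)
Java HotSpot(TM) 64-Bit Server VM (build ..., mixed mode)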

Step 2: Verifying Hadoop Installation

> hadoop version

It should print the installed Hadoop version.
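For the build from the previous section, the output starts roughly like this (commit and build details elided):

Hadoop 2.6.4
Subversion ...
Compiled by ...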

Step 3: Downloading Hive

Download a Hive binary release from an Apache mirror, for example http://mirror.fibergrid.in/apache/hive/hive-2.1.0/, and get apache-hive-2.1.0-bin.tar.gz.

Step 4: Extracting and Configuring Hive

Copy the downloaded Hive tarball to the C drive and extract it there, e.g. to C:\hive\apache-hive-2.1.0-bin.

Setting Environment variable –

Go to Control Panel > Advanced System Properties > System Properties > Environment Variables and add a new environment variable named HIVE_HOME with the value

“C:\hive\apache-hive-2.1.0-bin”

Now append %HIVE_HOME%\bin to the PATH environment variable and save.

Also append the “C:\hive\apache-hive-2.1.0-bin\lib” and “C:\hdp\hadoop-2.6.4\share\hadoop\common\lib” paths to the CLASSPATH variable.

Step 5: Installing MySQL and configuring HIVE
1. Install and start MySQL if you have not already done so.

2. Configure the MySQL Service and Connector

Download the mysql-connector-java-5.0.5.jar file and copy it to the %HIVE_HOME%\lib directory.

3. Create the Database and User

Go to the MySQL command line and execute the commands below.

First, create a metastore_db database in MySQL as the root user:

$ mysql -u root -p
Enter password:
mysql> CREATE DATABASE metastore_db;

Create a Hive user in MySQL as the root user. Here we use 'userhive' as the Hive user name and 'hivepwd' as the Hive password:

mysql> CREATE USER 'userhive'@'%' IDENTIFIED BY 'hivepwd';
mysql> GRANT ALL ON *.* TO 'userhive'@'localhost' IDENTIFIED BY 'hivepwd';

mysql> FLUSH PRIVILEGES;
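To verify the grants, log in as the new user; metastore_db should appear in the database list:

mysql -u userhive -phivepwd
mysql> SHOW DATABASES;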

4. Install Hive if you have not already done so

5. Configure the Metastore Service to communicate with the MySQL Database.

Go to %HIVE_HOME%/conf folder and copy “hive-default.xml.template” file. Now, rename the copied file as “hive-site.xml”. The template file has the formatting needed for hive-site.xml, so you can paste configuration variables from the template file into hive-site.xml and then change their values to the desired configuration.

Edit hive-site.xml file in %HIVE_HOME%/conf directory and add the following configurations:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
    <description>Metadata is stored in a MySQL server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>userhive</value>
    <description>User name for connecting to the MySQL server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepwd</value>
    <description>Password for connecting to the MySQL server</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://<IP address of your host>:9083</value>
    <description>Thrift URI for the remote metastore. Used by the metastore client to connect to the remote metastore.</description>
  </property>
</configuration>
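Note that because hive.metastore.uris points at a Thrift URI, a metastore service must be listening at that address before the Hive CLI can connect (the comments below show the errors you get otherwise). With Hive on the PATH, the service can be started in its own prompt with:

hive --service metastore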

Step 6: Launch HIVE

Run hive from the command prompt. On a successful installation you are dropped at the hive> prompt.

Hive console:

hive> create table hivetesting(id string);

MySql console:

There are two ways to access metastore_db.

As the MySQL root user:

mysql -u root -p
Enter password:
mysql> use metastore_db;
mysql> show tables;

Or directly from the command prompt as the Hive user, with mysql -u <hiveusername> -p<hivepassword> <database name> (with the password given inline, there is no password prompt):

mysql -u userhive -phivepwd metastore_db
mysql> show tables;

In the MySQL database you will see the names of your Hive tables.

Step 7: Verifying HIVE installation

The following sample command is executed to display all the tables:

hive> show tables;
OK
Time taken: 2.798 seconds
hive>

3 comments

1. Hi, I followed the steps but Hive is not running; it gives errors. I set hive.metastore.uris to … as I don't have a remote metastore to connect to, and it gave an error that it can't connect to the remote metastore. After that, it asks for schematool usage. I ran ./schematool -dbType mysql -initSchema from Cygwin, but now it says upgrade.order.mysql can't be found in scripts\metastore\upgrade\mysql, although the file exists there. Any idea? Regards

2. It is asking to run schematool -dbType mysql -initSchema. Any help? Regards

3. Set datanucleus.schema.autoCreateAll to true and it will not ask for schematool.