HIVE Installation on Windows 7

This article will show you how to install Hadoop and Hive on Windows 7. Since information on installing Hadoop on Windows 7 without Cloudera is relatively scarce, I thought I’d write this up.

Let’s look at the software we need for the Hadoop installation –

Supported Windows OSs: Hadoop supports Windows Server 2008 and Windows Server 2008 R2, Windows Vista and Windows 7. For installation purposes we are going to make use of Windows 7 Professional Edition, SP 1.

Microsoft Windows SDK: Download and install Microsoft Windows SDK v7.1 to get the tools, compilers, headers, and libraries that are necessary to build Hadoop.

Cygwin: Download and install Cygwin (32-bit or 64-bit, matching your Windows) to run Unix commands on Windows. Cygwin is a distribution of popular GNU and other open-source tools running on Microsoft Windows.

Maven: Download and install Maven 3.1.1.
The installation of Apache Maven is a simple process of extracting the archive and adding the bin folder with the mvn command to the PATH.

Open a new command prompt and run the “mvn -v” command to verify the installation.
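A successful run prints the Maven version along with the Java and OS details. The versions and paths below are illustrative and will reflect your own setup:

mvn -v
:: Illustrative output – your versions and paths will differ:
:: Apache Maven 3.3.9
:: Maven home: C:\Program Files\apache-maven-3.3.9
:: Java version: 1.7.0_51, vendor: Oracle Corporation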

Protocol Buffers 2.5.0: Download Google’s Protocol Buffers 2.5.0 and extract it to a folder on the C drive (say C:\protoc-2.5.0-win32). The version must strictly be 2.5.0; the build checks for this exact version.
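Once its folder is on the PATH (configured below), you can confirm the exact version from a command prompt:

protoc --version
:: Should print exactly:
:: libprotoc 2.5.0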

Setting environment variables:

Check an environment variable’s value from the command prompt, e.g.

echo %JAVA_HOME%
C:\Program Files\Java\jdk1.7.0_51

If nothing is shown on executing the above command, you need to set the JAVA_HOME path variable.

Go to My Computer > right-click > Properties > Advanced System settings > System Properties > Advanced tab > Environment Variables button. Click on ‘New…’ button under System Variables and add –

Variable Name: JAVA_HOME
Variable Value: C:\Program Files\Java\jdk1.7.0_51
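Alternatively, the same variable can be set with the built-in setx command. A minimal sketch, assuming the JDK path above; adjust it to your machine:

:: Set JAVA_HOME machine-wide; requires an administrator command prompt
setx JAVA_HOME "C:\Program Files\Java\jdk1.7.0_51" /M
:: setx does not affect the current session – open a new command prompt afterwards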

Note: Edit the Path environment variable very carefully. Go to the end of the existing value and append the new entries there; deleting existing entries may break other programs.

Adding to PATH:

Add the unpacked distribution’s bin directory to your PATH environment variable: open My Computer > right-click > Properties > Advanced System settings > System Properties > Advanced tab > Environment Variables, then add or select the PATH variable under ‘System variables’ and append the value C:\Program Files\apache-maven-3.3.9\bin.

Edit the Path variable to add the bin directory of Cygwin (say C:\cygwin64\bin), the bin directory of Maven (say C:\Program Files\apache-maven-3.3.9\bin), and the installation path of Protocol Buffers (say C:\protoc-2.5.0-win32).

Download and install CMake:

Download and install CMake (Windows Installer) from the official CMake site.

Official Apache Hadoop releases do not include Windows binaries, so you have to download sources and build a Windows package yourself.

Download the Hadoop sources tarball hadoop-2.6.4-src.tar.gz and extract it to a folder with a short path (say C:\hdp) to avoid runtime problems due to the maximum path length limitation in Windows.
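If you prefer the command line, Cygwin’s tar (installed earlier) can do the extraction. A sketch, assuming the tarball was downloaded to C:\hdp:

cd C:\hdp
:: Creates the C:\hdp\hadoop-2.6.4-src folder
tar -xzf hadoop-2.6.4-src.tar.gz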

Note: Do not use the prebuilt Hadoop binary release, as it lacks winutils.exe and hadoop.dll. Native IO is mandatory on Windows, and without these files the Hadoop installation will not work. Instead, build from the source code using Maven; the build will download all the required components.

To build Hadoop (including the native IO components) from source:

1. Extract hadoop-2.6.4-src.tar.gz to a folder (say C:\hdp), as described above.

2. Add the environment variable HADOOP_HOME="C:\hdp\hadoop-2.6.4-src" and edit the Path variable to add the bin directory of HADOOP_HOME, e.g. C:\hdp\hadoop-2.6.4-src\bin.

Before moving to the next step make sure you have the following variables set in your Environment variables window.

JAVA_HOME = C:\hdp\Java\jdk1.7.0_65
PATH = C:\Windows\Microsoft.NET\Framework64\v4.0.30319;C:\Program Files\CMake\bin;C:\protoc-2.5.0-win32;C:\Program Files\apache-maven-3.3.9\bin;C:\cygwin64\bin;C:\cygwin64\usr\sbin;%JAVA_HOME%\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\system32\WindowsPowerShell\v1.0;

Note: If the JAVA_HOME environment variable is set improperly, Hadoop will not run. Set the environment variables properly for the JDK, Maven, Cygwin, and Protocol Buffers. If you still get a ‘JAVA_HOME not set properly’ error, then edit the hadoop-env.cmd file (e.g. C:\hdp\hadoop-2.6.4-src\etc\hadoop\hadoop-env.cmd), locate ‘set JAVA_HOME=’ and provide the JDK path (with no spaces).

Running Maven Package

Select Start > All Programs > Microsoft Windows SDK v7.1 and open the Windows SDK 7.1 Command Prompt as an administrator. Change directory to the Hadoop source code folder (C:\hdp\hadoop-2.6.4-src).

Create the Windows binary tar distribution by executing the Maven package goal with the options -Pdist,native-win -DskipTests -Dtar, as below.

mvn package -Pdist,native-win -DskipTests -Dtar

A long stream of build output will scroll by while the build runs. When the build succeeds, Maven ends with a BUILD SUCCESS summary.
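The tail of a successful build looks roughly like this (the total time is illustrative; a full build can take a long while):

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 35:21 min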

If everything goes well in the previous step, the Windows package hadoop-2.6.4.tar.gz will be created under the hadoop-dist\target directory of the source tree (C:\hdp\hadoop-2.6.4-src\hadoop-dist\target).

Extract the newly created Hadoop Windows package to a directory of your choice (e.g. C:\hdp\hadoop-2.6.4).

Testing and Configuring Hadoop Installation

1. Configuring Hadoop for a Single Node (pseudo-distributed) Cluster.

2. As part of configuring HDFS, update the files:

1. Near the end of “C:\hdp\hadoop-2.6.4\etc\hadoop\hadoop-env.cmd” add the following lines:

set HADOOP_PREFIX=C:\hdp\hadoop-2.6.4
set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
set YARN_CONF_DIR=%HADOOP_CONF_DIR%
set PATH=%PATH%;%HADOOP_PREFIX%\bin

2. Modify “C:\hdp\hadoop-2.6.4\etc\hadoop\core-site.xml” with following:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:19000</value>
  </property>
</configuration>

3. Modify “C:\hdp\hadoop-2.6.4\etc\hadoop\hdfs-site.xml” with:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

4. Finally, make sure “C:\hdp\hadoop-2.6.4\etc\hadoop\slaves” has the following entry:

localhost

Create a C:\tmp directory, as the default configuration puts HDFS metadata and data files under \tmp on the current drive.
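From the command prompt:

:: HDFS metadata and data files will live here under the default configuration
mkdir C:\tmp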

As part of configuring YARN, update the files:

1. Add following entries to “C:\hdp\hadoop-2.6.4\etc\hadoop\mapred-site.xml”, replacing %USERNAME% with your Windows user name:

<configuration>
  <property>
    <name>mapreduce.job.user.name</name>
    <value>%USERNAME%</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.apps.stagingDir</name>
    <value>/user/%USERNAME%/staging</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>local</value>
  </property>
</configuration>

2. Modify “C:\hdp\hadoop-2.6.4\etc\hadoop\yarn-site.xml”, with:

<configuration>
  <property>
    <name>yarn.server.resourcemanager.address</name>
    <value>0.0.0.0:8020</value>
  </property>
  <property>
    <name>yarn.server.resourcemanager.application.expiry.interval</name>
    <value>60000</value>
  </property>
  <property>
    <name>yarn.server.nodemanager.address</name>
    <value>0.0.0.0:45454</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.server.nodemanager.remote-app-log-dir</name>
    <value>/app-logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/dep/logs/userlogs</value>
  </property>
  <property>
    <name>yarn.server.mapreduce-appmanager.attempt-listener.bindAddress</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.server.mapreduce-appmanager.client-service.bindAddress</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>%HADOOP_CONF_DIR%,%HADOOP_COMMON_HOME%/share/hadoop/common/*,%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*</value>
  </property>
</configuration>

Since Hadoop doesn’t recognize JAVA_HOME from “Environment Variables” (and has problems with spaces in pathnames),

a. Copy your JDK to a directory without spaces in its path (e.g. “C:\hdp\Java\jdk1.7.0_65”)

b. Edit “C:\hdp\hadoop-2.6.4\etc\hadoop\hadoop-env.cmd” and update

set JAVA_HOME=C:\hdp\Java\jdk1.7.0_65

c. Initialize the environment variables by running cmd in “Administrator Mode”, changing to the path C:\hdp\hadoop-2.6.4\etc\hadoop, and executing:

C:\hdp\hadoop-2.6.4\etc\hadoop>hadoop-env.cmd

3. Format the FileSystem – From command prompt, go to path C:\hdp\hadoop-2.6.4\bin\ and run

C:\hdp\hadoop-2.6.4\bin>hdfs namenode -format

4. Start HDFS Daemons – From the command prompt, go to the path C:\hdp\hadoop-2.6.4\sbin\ and run

C:\hdp\hadoop-2.6.4\sbin>start-dfs.cmd

Two separate Command Prompt windows will open automatically to run Namenode and Datanode.

5. Start YARN Daemons –

C:\hdp\hadoop-2.6.4\sbin>start-yarn.cmd

Two command prompt windows, named yarn nodemanager and yarn resourcemanager, will open after executing the above command.
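As a quick sanity check, the JDK’s jps tool lists the running Java processes; after both start scripts you should see all four daemons (process IDs below are illustrative):

jps
:: Illustrative output – process IDs will differ:
:: 4760 NameNode
:: 5124 DataNode
:: 6336 ResourceManager
:: 6552 NodeManager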

6. Run an example YARN job – Execute the command below in the command prompt; it runs the wordcount example against Hadoop’s LICENSE.txt file and writes the word counts to the /out directory.

C:\hdp\hadoop-2.6.4\bin\yarn jar C:\hdp\hadoop-2.6.4\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.6.4.jar wordcount C:\hdp\hadoop-2.6.4\LICENSE.txt /out
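Once the job completes, you can inspect its output from the same prompt; by convention the reducer writes a part-r-00000 file under the /out directory used above:

:: List the job output directory in HDFS
C:\hdp\hadoop-2.6.4\bin\hdfs dfs -ls /out
:: Print the computed word counts
C:\hdp\hadoop-2.6.4\bin\hdfs dfs -cat /out/part-r-00000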

7. Check the following pages in your browser:

Resource Manager web UI: http://localhost:8088

NameNode daemon web UI: http://localhost:50070

NodeManager web interface: http://localhost:8042

When these pages open correctly, we can conclude that Hadoop is successfully installed.

CONFIGURING HIVE

Now that the installation of Hadoop is done, we will move on to installing Hive.

The prerequisites for the Hive installation are –

– Java 1.7 (preferred)

– Hadoop 2.x (preferred); 1.x is not supported from Hive 2.0.0 onward

Step 1: Verifying JAVA Installation

Open a command prompt as an administrator and run –

> java -version

It should show the installed version of Java.

Step 2: Verifying Hadoop Installation

> hadoop version

It should show the Hadoop version you built.
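For example, the first line of the output should report the version packaged earlier:

hadoop version
:: The first line should read:
:: Hadoop 2.6.4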

Step 3: Downloading Hive

Download a Hive binary release from one of the Apache mirrors. For example, you can use https://redrockdigimark.com/apachemirror/hive/hive-2.1.1/ and download the binary apache-hive-2.1.1-bin.tar.gz.

Step 4: Extracting and Configuring Hive

Copy the downloaded Hive tar file to the C drive and extract it there, giving e.g. C:\hive\apache-hive-2.1.1-bin.

Setting Environment variable –

Go to Control Panel > Advanced System Properties > System Properties > Environment Variables and add a new environment variable named HIVE_HOME with the value “C:\hive\apache-hive-2.1.1-bin”. Now append %HIVE_HOME%\bin to the PATH environment variable and save.

And in the CLASSPATH variable, append the “C:\hive\apache-hive-2.1.1-bin\lib” and “C:\hdp\hadoop-2.6.4\share\hadoop\common\lib” paths.
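Open a new command prompt and confirm the variable is visible before proceeding:

echo %HIVE_HOME%
:: Expected output:
:: C:\hive\apache-hive-2.1.1-bin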

Step 5: Installing MySQL and configuring HIVE

1. Install and start MySQL if you have not already done so.

2. Configure the MySQL Service and Connector

Download the mysql-connector-java-5.0.5.jar file and copy the jar to the %HIVE_HOME%/lib directory.

3. Create the Database and User

Go to the MySQL command line and execute the commands below –

First, create a metastore_db database in MySQL using the root user –

$ mysql -u root -p
Enter password:
mysql> CREATE DATABASE metastore_db;

Create a user in MySQL using the root user. Here we take the hive username as ‘userhive’ and the hive password as ‘hivepwd’ –

mysql> CREATE USER 'userhive'@'%' IDENTIFIED BY 'hivepwd';
mysql> GRANT ALL ON *.* TO 'userhive'@'localhost' IDENTIFIED BY 'hivepwd';
mysql> FLUSH PRIVILEGES;
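To confirm the account works, you can connect as the new user straight from the Windows command prompt and list its privileges (a quick check using the credentials created above):

:: -e runs the given statement and exits
mysql -u userhive -phivepwd -e "SHOW GRANTS FOR 'userhive'@'localhost';"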

4. Install Hive if you have not already done so.

5. Configure the Metastore Service to communicate with the MySQL Database.

Go to %HIVE_HOME%/conf folder and copy “hive-default.xml.template” file. Now, rename the copied file as “hive-site.xml”. The template file has the formatting needed for hive-site.xml, so you can paste configuration variables from the template file into hive-site.xml and then change their values to the desired configuration.

Edit hive-site.xml file in %HIVE_HOME%/conf directory and add the following configurations:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>userhive</value>
    <description>user name for connecting to mysql server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepwd</value>
    <description>password for connecting to mysql server</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://<IP address of your host>:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
</configuration>
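Because hive.metastore.uris points to a remote metastore, the metastore service must be running before you open the Hive shell. A minimal sketch, assuming %HIVE_HOME%\bin is on the PATH:

:: For Hive 2.x, the metastore schema may first need to be initialized:
:: schematool -dbType mysql -initSchema
:: Start the Hive metastore service (listens on port 9083 by default)
hive --service metastore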

Step 6: Launch HIVE

On a successful installation of Hive, you can verify it by creating a table from the Hive console and then checking the metastore in MySQL.

Hive console:

hive> create table hivetesting(id string);

MySQL console:

There are two ways to access metastore_db.

From the MySQL console –

mysql -u root -p
Enter password:
mysql> use metastore_db;
mysql> show tables;

Or directly from the command prompt, using mysql -u <hiveusername> -p<hivepassword> <Database name> –

mysql -u userhive -phivepwd metastore_db
mysql> show tables;

On your MySQL database you will see the names of your Hive tables.
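For instance, the metastore records Hive table names in its TBLS table, so the table created above should show up there (a quick check using the credentials from Step 5):

:: TBL_NAME in the metastore's TBLS table lists all Hive tables
mysql -u userhive -phivepwd metastore_db -e "SELECT TBL_NAME FROM TBLS;"
:: Expected to include: hivetesting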

Step 7: Verifying HIVE installation

The following sample command is executed to display all the tables:

hive> show tables;
OK
Time taken: 2.798 seconds
hive>

Conclusion

In this article we saw how to install Hadoop and Hive on Windows 7: building Hadoop from source, configuring a single-node cluster, and connecting Hive to a MySQL-backed metastore.
