Hive JDBC – Working with Hive in Embedded Mode

Hive allows you to execute HQL statements using command line interface. But you can also write a program that works with Hive in embedded (local) mode using JDBC. It does not require you to run HiveServer2, and you can launch your program anywhere where you can run Hive CLI.

Let’s create a simple Java program that executes SHOW PARTITIONS command in Hive using JDBC interface:

import java.sql.DriverManager;
import java.sql.Connection;
import java.sql.Statement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

public class HiveJdbc {
  public static void main (String[] args) throws Exception {   
    // Connect to embedded Hive  
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    Connection conn = DriverManager.getConnection("jdbc:hive://", "", "");
    Statement stmt = conn.createStatement();
    // Execute the query and output rows
    ResultSet rs = stmt.executeQuery("show partitions DB.ORDERS");
    while (rs.next()) {
      System.out.println(rs.getString(1));
    }      
    // Get the number of columns
    ResultSetMetaData meta = rs.getMetaData();
    int cols = meta.getColumnCount();
    System.out.println("Number of columns:" + cols);    
    // Output column names
    for(int i = 1; i <= cols; i++) {
      System.out.println(meta.getColumnName(i));
    }
    rs.close();
    stmt.close();
    conn.close();
  }
}

This example program is typical for JDBC, but just note which JDBC driver and connection string you have to use for Hive in embedded mode:

Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
Connection conn = DriverManager.getConnection("jdbc:hive://", "", "");

You can compile the code using the following command:

javac HiveJdbc.java

Before you can run the program, you have to initialize your shell environment as follows (set your version number for jars located in /usr/lib/hive/lib/):

export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/hive-jdbc.jar"
export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/hive-exec.jar"
export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/hive-metastore.jar"
export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/hive-service.jar"
export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/libthrift-0.9.0.jar"
export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/libfb303-0.9.0.jar"
export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/jdo-api-3.0.1.jar"
export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/antlr-runtime-3.4.jar"
export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/datanucleus-api-jdo-3.2.6.jar"
export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/datanucleus-core-3.2.10.jar"
export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/datanucleus-rdbms-3.2.9.jar"
export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/mysql-connector-java.jar"
export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/conf"

Now you can the application as follows:

hadoop HiveJdbc

Here is the sample output:

...
local_dt=2014-12-02/country=KR
local_dt=2014-12-02/country=MY
local_dt=2014-12-02/country=NL
local_dt=2014-12-02/country=NO
local_dt=2014-12-02/country=NZ
local_dt=2014-12-02/country=PH
local_dt=2014-12-02/country=SE
local_dt=2014-12-02/country=SG
local_dt=2014-12-02/country=TH
local_dt=2014-12-02/country=TW
local_dt=2014-12-02/country=UK
local_dt=2014-12-02/country=VN
Number of columns:1
partition

Note that although the sample DB.ORDERS table contains two partition columns local_dt and country, SHOW PARTITIONS statement returns one column named partition in JDBC.