My presentation from Big Data Meetup #3 in Minsk:
Monthly Archives: December 2014
Hive JDBC – Working with Hive in Embedded Mode
Hive allows you to execute HQL statements using command line interface. But you can also write a program that works with Hive in embedded (local) mode using JDBC. It does not require you to run HiveServer2, and you can launch your program anywhere where you can run Hive CLI.
Let’s create a simple Java program that executes SHOW PARTITIONS command in Hive using JDBC interface:
import java.sql.DriverManager; import java.sql.Connection; import java.sql.Statement; import java.sql.ResultSet; import java.sql.ResultSetMetaData; import java.sql.SQLException; public class HiveJdbc { public static void main (String[] args) throws Exception { // Connect to embedded Hive Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver"); Connection conn = DriverManager.getConnection("jdbc:hive://", "", ""); Statement stmt = conn.createStatement(); // Execute the query and output rows ResultSet rs = stmt.executeQuery("show partitions DB.ORDERS"); while (rs.next()) { System.out.println(rs.getString(1)); } // Get the number of columns ResultSetMetaData meta = rs.getMetaData(); int cols = meta.getColumnCount(); System.out.println("Number of columns:" + cols); // Output column names for(int i = 1; i <= cols; i++) { System.out.println(meta.getColumnName(i)); } rs.close(); stmt.close(); conn.close(); } }
This example program is typical for JDBC, but just note which JDBC driver and connection string you have to use for Hive in embedded mode:
Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver"); Connection conn = DriverManager.getConnection("jdbc:hive://", "", "");
You can compile the code using the following command:
javac HiveJdbc.java
Before you can run the program, you have to initialize your shell environment as follows (set your version number for jars located in /usr/lib/hive/lib/
):
export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/hive-jdbc.jar" export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/hive-exec.jar" export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/hive-metastore.jar" export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/hive-service.jar" export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/libthrift-0.9.0.jar" export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/libfb303-0.9.0.jar" export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/jdo-api-3.0.1.jar" export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/antlr-runtime-3.4.jar" export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/datanucleus-api-jdo-3.2.6.jar" export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/datanucleus-core-3.2.10.jar" export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/datanucleus-rdbms-3.2.9.jar" export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/mysql-connector-java.jar" export "HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/conf"
Now you can the application as follows:
hadoop HiveJdbc
Here is the sample output:
... local_dt=2014-12-02/country=KR local_dt=2014-12-02/country=MY local_dt=2014-12-02/country=NL local_dt=2014-12-02/country=NO local_dt=2014-12-02/country=NZ local_dt=2014-12-02/country=PH local_dt=2014-12-02/country=SE local_dt=2014-12-02/country=SG local_dt=2014-12-02/country=TH local_dt=2014-12-02/country=TW local_dt=2014-12-02/country=UK local_dt=2014-12-02/country=VN Number of columns:1 partition
Note that although the sample DB.ORDERS table contains two partition columns local_dt
and country
, SHOW PARTITIONS statement returns one column named partition
in JDBC.