Friday, November 8, 2013

Steps to run Presto on a Linux machine

Facebook announced on 6 November 2013 that it is releasing Presto, its low-latency, SQL-compliant query system for Hadoop, as open source. This move brings yet another fast query option to Hadoop. Facebook has one of the largest data warehouses in the world, now surpassing 300 petabytes. That sheer scale has forced Facebook to invent its own tools for working with variably structured data at scale.


Steps to run Presto on a Linux machine:


Step 1 : First install Hadoop and Hive. To install Hadoop, follow the blog post "Setup Hadoop".
Set up Hive as follows:
Step a : Download a Hive release from http://mirror.reverse.net/pub/apache/hive/ and extract it.
Step b : Export the path of the Hive folder:
export HIVE_HOME=/home/hduser/hadoop/hive-0.11
Step c : Add $HIVE_HOME/bin to your PATH:
export PATH=$HIVE_HOME/bin:$PATH
Step d : Create a directory on HDFS for Hive:
bin/hadoop fs -mkdir /home/hduser/hadoop-workDir/warehouse
On my machine, HDFS is located at "/home/hduser/hadoop-workDir/". Use the path that matches the HDFS location of your own Hadoop installation.
Step e : Start the Hive command line interface (CLI) from the shell:
$HIVE_HOME/bin/hive
Step f : Create a simple example table in Hive:
hive> CREATE TABLE hiveExample (eId INT, eName STRING);
Step g : List the tables to confirm that it was created:
hive> SHOW TABLES;
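
Steps b and c can be combined into a small snippet for a shell profile. This is only a sketch: the helper name prepend_path is hypothetical, and the default HIVE_HOME is the path used in step b, so adjust it to your own install location. The helper adds the directory to PATH only once, so sourcing the profile repeatedly does not keep growing PATH.

```shell
# Assumed install location, taken from step b; adjust to your system.
export HIVE_HOME="${HIVE_HOME:-/home/hduser/hadoop/hive-0.11}"

# Hypothetical helper: prepend a directory to PATH only if it is not already there.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                # already on PATH, nothing to do
    *) PATH="$1:$PATH" ;;       # otherwise put it in front
  esac
}

prepend_path "$HIVE_HOME/bin"
export PATH
```

After this, the hive command from step e can be started simply as "hive".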

Step 2 : Download and deploy Presto by following the instructions at:
http://prestodb.io/docs/current/installation/deployment.html
Start the Presto server with:
bin/launcher run

Step 3 : Download presto-cli-0.52-executable.jar (from http://prestodb.io/docs/current/installation/cli.html), rename it to presto, make it executable with chmod +x, then run it:
./presto --server localhost:8080 --catalog hive --schema default
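
The rename part of Step 3 can be sketched as a small shell function. It also marks the file executable, which is needed before ./presto can be run. The function name install_presto_cli is hypothetical, and the snippet assumes the jar was downloaded into the current directory.

```shell
# Hypothetical helper: rename the downloaded CLI jar and mark it executable.
# Arguments: $1 = downloaded jar, $2 = target name (defaults to "presto").
install_presto_cli() {
  jar="$1"
  target="${2:-presto}"
  if [ ! -f "$jar" ]; then
    echo "CLI jar not found: $jar" >&2
    return 1
  fi
  mv "$jar" "$target" && chmod +x "$target"
}

# Only attempt the rename when the download is present in the current directory.
if [ -f presto-cli-0.52-executable.jar ]; then
  install_presto_cli presto-cli-0.52-executable.jar presto
fi
```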

Step 4 : From another prompt, start the Hive metastore. Use the same port that you specified in "etc/catalog/hive.properties" in Step 2:
bin/hive --service metastore -p 9083
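
To make the link between Step 2 and Step 4 concrete, the sketch below writes a minimal etc/catalog/hive.properties from a single port variable and prints the matching metastore command. Several things here are assumptions: the PRESTO_HOME directory is hypothetical, the metastore is assumed to run on localhost, and the connector.name value (hive-hadoop1) depends on your Hadoop distribution, so check the deployment page for the right one.

```shell
# Port shared by the Hive metastore and the Presto Hive catalog.
METASTORE_PORT=9083

# Hypothetical Presto install directory; replace with your own.
PRESTO_HOME="${PRESTO_HOME:-$PWD/presto-server}"

mkdir -p "$PRESTO_HOME/etc/catalog"

# connector.name is an assumption: pick the connector that matches your Hadoop distribution.
cat > "$PRESTO_HOME/etc/catalog/hive.properties" <<EOF
connector.name=hive-hadoop1
hive.metastore.uri=thrift://localhost:${METASTORE_PORT}
EOF

# Print the metastore command that uses the same port (run it from the Hive folder).
echo "bin/hive --service metastore -p ${METASTORE_PORT}"
```

Keeping the port in one variable means the catalog file and the metastore command cannot drift apart.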

Step 5 : At the Presto command prompt opened in Step 3, run the command "show tables". It will list the table (hiveExample) that we created in Hive.
You can query Hive tables from Presto in the same way, and queries run much faster than when Hive is used directly.