Friday, November 8, 2013

Steps to run Presto on a Linux machine

Facebook announced on 6th November 2013 that it is releasing Presto, its low-latency, SQL-compliant query engine for Hadoop, as open source. Facebook has one of the largest data warehouses in the world, now surpassing 300 petabytes. That sheer scale has forced Facebook to invent its own tools for working with variably structured data at high scale. This move brings yet another fast query option to Hadoop.


Steps to run Presto on a Linux machine:


Step 1 : First install Hadoop and Hive. Hadoop installation can be followed from the blog post "Setup Hadoop".
Set up Hive as follows:
Step a : Download a Hive release from http://mirror.reverse.net/pub/apache/hive/ and extract it.
Step b : Export the path of the Hive folder:
export HIVE_HOME=/home/hduser/hadoop/hive-0.11
Step c : Add $HIVE_HOME/bin to your PATH:
export PATH=$HIVE_HOME/bin:$PATH
Step d : Create a warehouse directory for Hive on HDFS:
bin/hadoop fs -mkdir /home/hduser/hadoop-workDir/warehouse
On my machine HDFS is located at "/home/hduser/hadoop-workDir/"; adjust the path to match your Hadoop installation.
Step e : To use the hive command line interface (cli) from the shell:
$HIVE_HOME/bin/hive
Step f : A simple example of creating a table in Hive:
hive> CREATE TABLE hiveExample (eId INT, eName STRING);
Step g : hive> SHOW TABLES;
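As a quick sanity check before moving on to Presto, you can load a row into the table created in Step f and query it back. Note that Hive 0.11 has no INSERT ... VALUES, so data is loaded from a file; the local file path below is only illustrative:

```sql
-- Inspect the example table created in Step f
DESCRIBE hiveExample;

-- Load a local tab-delimited file into the table (path is hypothetical; use any file
-- with lines like "1<TAB>Alice" matching the eId INT, eName STRING columns)
LOAD DATA LOCAL INPATH '/tmp/hiveExample.txt' OVERWRITE INTO TABLE hiveExample;

-- Verify the rows are visible from Hive
SELECT * FROM hiveExample;
```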

Step 2 : Download and deploy Presto per the instructions at:
http://prestodb.io/docs/current/installation/deployment.html
Start the Presto server with:
bin/launcher run
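For reference, the deployment guide has you create several files under etc/ (node.properties, jvm.config, config.properties, and one catalog file per connector). A rough single-node sketch of the two files relevant to this walkthrough is below; the exact property names and the right connector.name (e.g. hive-hadoop1 or hive-cdh4, depending on your Hadoop distribution) should be checked against the deployment docs for your Presto version:

```properties
# etc/config.properties -- one node acting as both coordinator and worker
coordinator=true
datasources=jmx,hive
http-server.http.port=8080
discovery-server.enabled=true
discovery.uri=http://localhost:8080

# etc/catalog/hive.properties -- the Hive connector
# connector.name depends on your Hadoop distribution
connector.name=hive-cdh4
# Must point at the Thrift metastore started in Step 4 (same port!)
hive.metastore.uri=thrift://localhost:9083
```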

Step 3 : Download presto-cli-0.52-executable.jar (from http://prestodb.io/docs/current/installation/cli.html), rename it to presto, make it executable, then run it:
./presto --server localhost:8080 --catalog hive --schema default

Step 4 : From another prompt, start the Hive metastore. Use the same port you specified in "etc/catalog/hive.properties" in Step 2:
bin/hive --service metastore -p 9083

Step 5 : At the Presto prompt from Step 3, run the command "show tables"; it will list the table (hiveExample) we created in Hive.
Hive tables can now be queried through Presto, and queries run much faster than running them in Hive directly.
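Once the metastore is up, the Hive table is queryable from the Presto prompt. A short session might look like the following (Presto folds unquoted identifiers to lower case, so the table appears as hiveexample):

```sql
-- At the presto:default> prompt
SHOW TABLES;
DESCRIBE hiveexample;
SELECT eid, ename FROM hiveexample LIMIT 10;
```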

3 comments:

  1. I followed your steps. I am able to describe the table, but when I run a query, e.g. select * from books; I get an exception.
    presto:default> select * from books;

    Query 20131121_025845_00004_qqe25, FAILED, 1 node
    Splits: 1 total, 0 done (0.00%)
    0:00 [0 rows, 0B] [0 rows/s, 0B/s]

    Query 20131121_025845_00004_qqe25 failed: java.io.IOException: Failed on local exception: java.io.IOException: Broken pipe; Host Details : local host is: "ubuntu/192.168.56.101"; destination host is: "localhost":54310;
    presto:default>

  2. John, Presto is still in the process of stabilizing; there are many compatibility issues. Run Presto with Hadoop 2.0 CDH4 (Cloudera Distribution).

  3. I've been trying to get Presto working with a MySQL-metastore-backed Hive install. Details here: http://datachurn.blogspot.com/2013/11/steps-to-run-presto-on-linux-mc.html

    Any tips? Thanks!
