博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
hive的基本操作
阅读量:5068 次
发布时间:2019-06-12

本文共 2476 字,大约阅读时间需要 8 分钟。

1、创建表

First, create a table with tab-delimited text file format:

(1) CREATE TABLE u_data (  userid INT,  movieid INT,  rating INT,  unixtime STRING)ROW FORMAT DELIMITEDFIELDS TERMINATED BY '\t'STORED AS TEXTFILE; (2) //creates a table called invites with two columns and a partition column called ds. //The partition column is a virtual column. //It is not part of the data itself but is derived from the partition that a particular dataset is loaded into.
CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); (3)创建一个带有分区的表 CREATE TABLE page_view(viewTime INT, userid BIGINT,                 page_url STRING, referrer_url STRING,                 ip STRING COMMENT 'IP Address of the User') COMMENT 'This is the page view table' PARTITIONED BY(dt STRING, country STRING) CLUSTERED BY(userid) SORTED BY(viewTime)  INTO 32 BUCKETS ROW FORMAT DELIMITED         FIELDS TERMINATED BY '1' STORED AS SEQUENCEFILE; 说明:In the example above, the table is clustered by a hash function of userid into 32 buckets. Within each bucket the data is sorted in increasing order of viewTime. Such an organization allows the user to do efficient sampling on the clustered column - in this case userid. The sorting property allows internal operators to take advantage of the better-known data structure while evaluating queries with greater efficiency.

2、修改表和删除表

(1)修改表名

ALTER TABLE events RENAME TO 3koobecaf;

(2)给表增加字段

ALTER TABLE pokes ADD COLUMNS (new_col INT);

(3)给表增加字段和字段注释

ALTER TABLE invites ADD COLUMNS (new_col2 INT COMMENT 'a comment');

(4)替换标的字段

ALTER TABLE invites REPLACE COLUMNS (foo INT, bar STRING, baz INT COMMENT 'baz replaces new_col2');//这里是替换表的前三列,其他的列会被删除

(5)删除表

DROP TABLE pokes; 删除标的分区:ALTER TABLE pv_users DROP PARTITION (ds='2008-08-08')

3、加载本地数据到hive的表中

(1) load u.data into the table that was just created: LOAD DATA LOCAL INPATH  '
/u.data' OVERWRITE INTO TABLE u_data; (2)Count the number of rows in table u_data: SELECT COUNT(*) FROM u_data

4、加载hdfs上的数据到hive的表中

LOAD DATA INPATH '/examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;

5、将hive中表的数据加载到本地和hdfs上

(1)将hive中的表的数据加载到本地

insert overwrite local directory '/home/liujiyu/hivetolocal' select a.* from inc a;

(2)将hive中的表的数据加载到hdfs上

insert overwrite directory '/user/liujiyu/hivetolocal' select a.* from inc a;

6、JOIN的操作

FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar) INSERT OVERWRITE TABLE events SELECT t1.bar, t1.foo, t2.foo;

转载于:https://www.cnblogs.com/ljy2013/p/4891436.html

你可能感兴趣的文章
CentOS7.5下安装Python3.7 --python3
查看>>
CSS Variables
查看>>
微信接口php
查看>>
hibernate_validator_08
查看>>
修改/etc/profile和/etc/environment导致图形界面无法登陆的问题
查看>>
PPPOE
查看>>
tcpdump常用命令
查看>>
(原创)Quartus硬件工程路径改变,nios工程该怎么办?
查看>>
关于 VS2013监视窗口的内存面板及寄存器面板
查看>>
android断点续传实现方案之三
查看>>
类的封装
查看>>
android 多渠道打包之tools:replace
查看>>
java.util.Properties类的介绍-配置文件的读写【-Z-】
查看>>
第十四周
查看>>
deeplearning.ai 神经网络和深度学习 week4 深层神经网络
查看>>
fstat、stat和lstat 区别
查看>>
iOS工作笔记(十四)
查看>>
spring自定义schema学习
查看>>
有关jquery弹出层
查看>>
MAC Android Studio 快捷键
查看>>