Partitioned Tables · JAVA

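[TOC]

# Partitioned Tables

## Create

### Creating a table

~~~
hive> create table t_patition(ip string,duration int)
    > partitioned by(country string)
    > row format delimited
    > fields terminated by ',';
OK
Time taken: 0.086 seconds
~~~

### Creating a multi-level partitioned table

This is an alternative definition of the same table name, so drop the single-level table first if it already exists.

~~~
hive> create table t_patition(ip string,duration int)
    > partitioned by(country string,city string)
    > row format delimited
    > fields terminated by ',';
~~~

The data is subdivided one level further: each `country` directory contains `city` subdirectories.

### Loading into a single partition

Here we fill the table from a file. Because this is a partitioned table, the target partition must be specified.

~~~
-- Load the whole file into this partition
hive> load data local inpath '/root/ip.txt' into table t_patition partition(country="china");
~~~

### Loading into a multi-level partition

~~~
load data local inpath '/root/ip.txt' into table t_patition partition(country="china", city="hanzou");
~~~

## Query

### Viewing the table details

~~~
hive> desc t_patition;
~~~

### Querying data

In query results, the partition column is shown as the last column.

~~~
hive> select * from t_patition;
OK
192.168.1.100	1	china
192.168.1.200	2	china
~~~

You can also query only the data in a given partition:

~~~
hive> select * from t_patition where country="china";
~~~

### Querying a multi-level partition

~~~
select * from t_patition where country="china" and city="hanzou";
~~~

### Querying several partitions with union

Only the final statement ends with a semicolon; a semicolon after an intermediate `select` would terminate the query early.

~~~
select * from t_patition where country="china"
union
select * from t_patition where country="japan"
union
select * from t_patition where country="us";
~~~

### Listing a table's partitions

The examples from here to the end of the "Modify" section use a second table, `dept_partition`, which is partitioned by `month`.

~~~
show partitions dept_partition;
~~~

## Delete

### Dropping one partition

~~~
alter table dept_partition drop partition(month="201705");
~~~

### Dropping several partitions at once

When dropping, the partition specs are separated by commas.

`alter table dept_partition drop partition(month="201705"),partition(month="201706");`

## Modify

### Adding a single partition

~~~
alter table dept_partition add partition(month="201705");
~~~

### Adding several partitions at once

When adding, the partition specs are separated by spaces, not commas.

~~~
alter table dept_partition add partition(month="201705") partition(month="201706");
~~~

# Associating Data with Partitions

Data can be uploaded directly into a partition directory and then associated with the partitioned table. There are three ways to do this.

## Upload the data, then repair

First create the partition directory on HDFS from the Hive CLI:

~~~
dfs -mkdir -p /user/hive/warehouse/db1.db/t_patition/country=japan;
~~~

Upload the data into that directory:

~~~
dfs -put /root/study/ip.txt /user/hive/warehouse/db1.db/t_patition/country=japan;
~~~

Then query it:

~~~
select * from t_patition where country="japan";
~~~

The query returns nothing, because the metastore has no record of the new partition, so the files are not associated with the table. Run a repair, which adds the partition to the metadata:

~~~
msck repair table t_patition;
~~~

After that, the query returns the data.

## Upload the data, then add the partition

First create the directory, then upload the file:

~~~
hive (db1)> dfs -mkdir -p /user/hive/warehouse/db1.db/t_patition/country=x;
hive (db1)> dfs -put /root/study/ip.txt /user/hive/warehouse/db1.db/t_patition/country=x;
~~~

Then add the partition with `alter table`. In this case no repair is needed:

~~~
alter table t_patition add partition(country='x');
~~~

## Create the directory, then load the data into the partition

Create the directory:

~~~
dfs -mkdir -p /user/hive/warehouse/db1.db/t_patition/country=y;
~~~

Load the data into the partition:

~~~
load data local inpath '/root/study/ip.txt' into table t_patition partition(country='y');
~~~

`load data` both copies the file into the partition directory and registers the partition, so the data can be queried right away.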
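
To check that each of the three approaches above really registered its partition, the statements below can be run afterwards. This is a minimal sketch, assuming the single-level `t_patition` table in database `db1` and the `japan`, `x` and `y` partitions created above; the exact output depends on your Hive version and data, so none is shown.

```sql
-- List the partitions the metastore knows about.
-- A directory that exists on HDFS but is missing from this list
-- has not been associated with the table yet.
show partitions t_patition;

-- Show the metadata of one partition, including its HDFS location.
describe formatted t_patition partition(country='x');

-- Count rows per partition to confirm the uploaded files are readable.
select country, count(*) as row_count
from t_patition
group by country;
```

If a partition is missing from the `show partitions` output, `msck repair table t_patition;` or an explicit `alter table ... add partition` will register it, as described in the sections above.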