Hive

哔哔大数据

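Description

Count how many times each user visited per month, and output the user name, the month, and the visit count.

Test data

    tom,2020-01,5
    tom,2020-01,15
    jack,2020-01,5
    tom,2020-01,8
    jack,2020-01,25
    tom,2020-01,5
    tom,2020-02,4
    tom,2020-02,6
    jack,2020-02,10
    jack,2020-02,5

Importing the data into Hive

    -- create the table
    create table test(name string, month string, pv int)
    row format delimited fields terminated by ',';

    -- load the data from the local file system
    load data local inpath '/root/info.txt' into table test;

    -- check the data
    select * from test;

Aggregation

The goal is to sum pv per group and output name, month, sum(pv). Grouping by month alone, as below, fails: the query says how to aggregate pv, but name is neither part of the GROUP BY key nor wrapped in an aggregate function, so Hive cannot decide which name to return for each group.

    hive> select name, month, sum(pv)
        > from default.test
        > group by month;
    FAILED: SemanticException [Error 10025]: Line 1:7 Expression not in GROUP BY key 'name'

The result we want is one row per user per month, so month and name have to be grouped together:

    select name, month, sum(pv)
    from default.test
    group by month, name;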
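
To make the grouping rule concrete, here is a minimal plain-Python sketch of what the final query computes. It is only a model of the semantics, not how Hive executes the query; the function name visits_per_user_month is made up for this illustration, and the rows are the test data from this post.

```python
from collections import defaultdict

# (name, month, pv) rows copied from the test data in this post.
ROWS = [
    ("tom", "2020-01", 5), ("tom", "2020-01", 15), ("jack", "2020-01", 5),
    ("tom", "2020-01", 8), ("jack", "2020-01", 25), ("tom", "2020-01", 5),
    ("tom", "2020-02", 4), ("tom", "2020-02", 6), ("jack", "2020-02", 10),
    ("jack", "2020-02", 5),
]


def visits_per_user_month(rows):
    """Model of: select name, month, sum(pv) from test group by month, name."""
    totals = defaultdict(int)
    for name, month, pv in rows:
        # The group key holds BOTH columns, so every (name, month) pair
        # gets its own running sum.
        totals[(name, month)] += pv
    return dict(totals)


if __name__ == "__main__":
    for (name, month), total in sorted(visits_per_user_month(ROWS).items()):
        print(name, month, total)
```

Because the key is the pair (name, month), each selected column is either part of the key or aggregated, which is exactly the condition the failed query violated.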

[Hive Project] Counting each user's visits per month

2020-7-12 2660 0

Uncategorized

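Assigning a value to NULL fields

Function description

NVL replaces NULL values in a column. Its form is nvl(value, default_value): it returns default_value when value is NULL and value otherwise. default_value can be a constant or another column; if both arguments are NULL, the result is NULL.

Usage 1: a constant as the default

Original data:

    hive> select * from studentinof;
    OK
    1    bidataboy    1000
    2    Boy          NULL
    3    Aa           NULL
    4    Boy          200
    4    Tab          1000
    Time taken: 1.13 seconds, Fetched: 5 row(s)

Replace NULL with 0:

    hive> select name, nvl(money, 0) from studentinof;
    OK
    bidataboy    1000
    Boy          0
    Aa           0
    Boy          200
    Tab          1000
    Time taken: 0.383 seconds, Fetched: 5 row(s)

Usage 2: another column as the default

When the first column is NULL, the value of the second column is returned in its place.

Original data:

    hive> select * from studentinof;
    OK
    1    bidataboy    1000
    2    Boy          NULL
    3    Aa           NULL
    4    Boy          200
    5    Tab          1000
    6    NULL         2000
    Time taken: 0.333 seconds, Fetched: 6 row(s)

Passing two columns, so that the value of name is returned when money is NULL:

    hive> select name, nvl(money, name) from studentinof;
    OK
    bidataboy    1000
    Boy          Boy
    Aa           Aa
    Boy          200
    Tab          1000
    NULL         2000
    Time taken: 1.75 seconds, Fetched: 6 row(s)

CASE matching

CASE works much like it does in MySQL.

    case variable_or_column
        when value_to_match then value_or_expression
        when value_to_match then value_or_expression
        else value_for_unmatched_rows
    end

Usage

Count the men and women in each department, producing the columns dept | male | female. In the data, 男 means male and 女 means female.

Data

    hive> select * from emp;
    OK
    小黄    A    女
    小蓝    B    女
    小粉    A    女
    小红    B    男
    小黑    A    男
    小紫    B    男
    Time taken: 1.111 seconds, Fetched: 6 row(s)

HQL query

    select
        dept_id,
        sum(case sex when "男" then 1 else 0 end) male_count,
        sum(case sex when "女" then 1 else 0 end) female_count
    from emp
    group by dept_id;

Result

    Total MapReduce CPU Time Spent: 11 seconds 90 msec
    OK
    A    1    2
    B    2    1
    Time taken: 343.221 seconds, Fetched: 2 row(s)

Rows to columns

Related functions

concat(col, col, ...): string concatenation, the same as in MySQL.
concat_ws(separator, str1, str2, ...): concatenation with a separator. The first argument is the separator; the remaining arguments can be strings or an array of strings. It returns one string joined with the separator.
COLLECT_SET(col): accepts only primitive types. It gathers the values of a column, removes duplicates, and produces an array.

Usage

Original data

    hive> select * from emp;
    OK
    小黄    A    女
    小蓝    B    女
    小粉    A    女
    小红    B    男
    小黑    A    男
    小紫    B    男
    Time taken: 1.048 seconds, Fetched: 6 row(s)

HQL statement

    select
        t.ds,
        concat_ws("|", collect_set(t.n))
    from (
        select concat_ws("|", dept_id, sex) ds, name n
        from emp
    ) t
    group by t.ds;

Result

    Total MapReduce CPU Time Spent: 8 seconds 40 msec
    OK
    A|女    小黄|小粉
    A|男    小黑
    B|女    小蓝
    B|男    小红|小紫
    Time taken: 54.446 seconds, Fetched: 4 row(s)

If the concat_ws() call in the outermost select is removed, the result becomes:

    Total MapReduce CPU Time Spent: 7 seconds 500 msec
    OK
    A|女    ["小黄","小粉"]
    A|男    ["小黑"]
    B|女    ["小蓝"]
    B|男    ["小红","小紫"]
    Time taken: 71.575 seconds, Fetched: 4 row(s)

Columns to rows

Example of the effect

Before the conversion:

    movie          | category
    ------------------------------------
    《疑犯追踪》   | 悬疑,动作,科幻,剧情

After the conversion:

    movie          | category
    ------------------------------------
    《疑犯追踪》   | 悬疑
    《疑犯追踪》   | 动作
    《疑犯追踪》   | 科幻
    《疑犯追踪》   | 剧情

Function description

explode(col): splits a complex array or map column into multiple rows.
lateral view:
    Usage: lateral view udtf(expression) tableAlias as columnAlias
    Explanation: used together with UDTFs such as explode (often fed by split). It expands one row into several rows, and the expanded rows can then be aggregated.

Usage

Original data

    《Lie to me》    悬疑,警匪,动作,心理,剧情
    《战狼2》        战争,动作,灾难

Creating the table

    create table movie_info(
        movie string,
        category array<string>)
    row format delimited fields terminated by "\t"
    collection items terminated by ",";

Query

    select movie, category_name
    from movie_info
    lateral view explode(category) table_tmp as category_name;

Result (no MapReduce job is launched)

    OK
    《Lie to me》    悬疑
    《Lie to me》    警匪
    《Lie to me》    动作
    《Lie to me》    心理
    《Lie to me》    剧情
    《战狼2》        战争
    《战狼2》        动作
    《战狼2》        灾难
    Time taken: 0.438 seconds, Fetched: 8 row(s)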
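
The two conversions above are roughly inverses of each other, which a small plain-Python model can show. This is only a sketch of the semantics: the function names collect_set and explode are reused here for readability and are ordinary Python functions, not Hive APIs. Hive does not promise any particular element order for collect_set; the model simply keeps first-seen order.

```python
# (dept|sex, name) pairs, as produced by the inner query in this post.
EMP = [
    ("A|女", "小黄"), ("B|女", "小蓝"), ("A|女", "小粉"),
    ("B|男", "小红"), ("A|男", "小黑"), ("B|男", "小紫"),
]

# (movie, categories) rows, as stored in movie_info.
MOVIES = [
    ("《Lie to me》", ["悬疑", "警匪", "动作", "心理", "剧情"]),
    ("《战狼2》", ["战争", "动作", "灾难"]),
]


def collect_set(rows):
    """Model of: select key, collect_set(value) ... group by key."""
    grouped = {}
    for key, value in rows:
        values = grouped.setdefault(key, [])
        if value not in values:  # drop duplicates, like a set
            values.append(value)
    return grouped


def explode(rows):
    """Model of: lateral view explode(array_col), one output row per element."""
    return [(key, item) for key, items in rows for item in items]


if __name__ == "__main__":
    for key, names in collect_set(EMP).items():
        print(key, "|".join(names))  # the join plays the role of concat_ws("|", ...)
    for movie, category in explode(MOVIES):
        print(movie, category)
```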

Other commonly used Hive query functions

2020-3-10 1432 0

Uncategorized

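Introduction

Partitioning applies to the storage path of the data (directories); bucketing applies to the data files themselves.

Motivation

Partitions are a convenient way to isolate data and optimize queries, but not every data set can be partitioned sensibly. Bucketing covers that case by splitting the data into a fixed number of files.

Bucketing

Bucketing is another technique for breaking a data set into parts that are easier to manage.

Creating a bucketed table

Bucketing has to be enabled before working with the bucketed table; in the Hive version used here the setting defaults to false.

    set hive.enforce.bucketing=true;

    create table stu_buck(id int, name string)
    clustered by (id)    -- the column to bucket on
    into 4 buckets       -- the number of buckets
    row format delimited fields terminated by '\t';

Loading data into the bucketed table

Loading with load has no visible bucketing effect, because the data has to go through a MapReduce job to be split up. So the data is added with insert instead.

First create an ordinary table:

    create table buck(id int, name string)
    row format delimited fields terminated by '\t';

Load the data into the ordinary table:

    load data local inpath '/root/data.txt' into table buck;

Then fill the bucketed table with insert:

    hive> insert into table stu_buck
        > select * from buck;
    Query ID = root_20200211223628_4d38ab59-ff4f-49a5-a749-d4b1a3c9b95a
    Total jobs = 1
    Launching Job 1 out of 1
    ...
    OK
    Time taken: 43.425 seconds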
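
How a row ends up in one of the 4 buckets can be sketched in plain Python. This is a simplified model under one stated assumption: for an int column, the classic Hive rule hashes the value to itself and takes it modulo the bucket count. Newer Hive versions may use a different hash function, so treat the exact bucket numbers as illustrative. The helper names and the sample rows are invented for this sketch.

```python
def bucket_for(key, num_buckets):
    """Bucket index for an integer key.

    Assumption: the hash of an int is the int itself; it is made
    non-negative and reduced modulo the number of buckets.
    """
    return (key & 0x7FFFFFFF) % num_buckets


def assign_buckets(rows, num_buckets):
    """Distribute (id, name) rows into num_buckets lists, keyed on id."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_for(row[0], num_buckets)].append(row)
    return buckets


if __name__ == "__main__":
    # Hypothetical rows; the post does not show the contents of data.txt.
    sample = [(i, "name%d" % i) for i in range(1, 9)]
    for index, bucket in enumerate(assign_buckets(sample, 4)):
        print("bucket", index, bucket)
```

Under this model, rows whose ids differ by a multiple of 4 share a bucket, which is why a plain file copy (load) cannot produce the layout: every row has to be routed individually.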

Hive bucketed tables: creation and loading data

2020-2-13 1090 0

Uncategorized

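group by

group by is normally used together with aggregate functions: the rows are grouped by the value of a column, and the aggregate is then computed for each group.

Original data

    id    name          money
    1     Bob           1200
    2     Black         2100
    3     BigDataBoy    5600
    4     Bob           2300
    5     Bob           3200
    6     Black         5600

Requirement: group by name and compute the average salary (money).

Query:

    select name, avg(money) from hive_db group by name;

Result:

    hive> select name, avg(money) from hive_db
        > group by name;
    Query ID = root_20200127203058_a0984c91-9ca1-4735-b6a9-ffd5aa2d17a7
    ...
    Total MapReduce CPU Time Spent: 18 seconds 160 msec
    OK
    BigDataBoy    5600.0
    Black         3850.0
    Bob           2233.3333333333335
    Time taken: 48.392 seconds, Fetched: 3 row(s)

The having clause

How having and where compare:

where works on the columns of the table and filters rows before grouping; having works on the columns of the query result and filters the groups.
Aggregate functions cannot be used after where, but they can be used after having.
having is only used in group by statements.

Requirement: the names whose average salary (money) is greater than 3000.

Query:

    select name, avg(money) avg_money from hive_db
    group by name
    having avg_money > 3000;

Result:

    hive> select name, avg(money) avg_money from hive_db
        > group by name
        > having avg_money > 3000;
    Query ID = root_20200127204801_68c6a01d-21a3-4355-b3a8-69900f16b857
    ...
    Total MapReduce CPU Time Spent: 16 seconds 10 msec
    OK
    BigDataBoy    5600.0
    Black         3850.0
    Time taken: 46.82 seconds, Fetched: 2 row(s)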
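
The order of operations (group first, aggregate, then filter the groups) can be modelled in a few lines of plain Python. This is a sketch of the semantics only; the function names are invented for the illustration and the rows are the table from this post.

```python
from collections import defaultdict

# (id, name, money) rows copied from the table in this post.
ROWS = [
    (1, "Bob", 1200), (2, "Black", 2100), (3, "BigDataBoy", 5600),
    (4, "Bob", 2300), (5, "Bob", 3200), (6, "Black", 5600),
]


def avg_money_by_name(rows):
    """Model of: select name, avg(money) from hive_db group by name."""
    groups = defaultdict(list)
    for _id, name, money in rows:
        groups[name].append(money)
    return {name: sum(values) / len(values) for name, values in groups.items()}


def having(aggregated, predicate):
    """Model of HAVING: keep only the groups whose aggregate passes the test."""
    return {key: value for key, value in aggregated.items() if predicate(value)}


if __name__ == "__main__":
    averages = avg_money_by_name(ROWS)
    print(averages)
    print(having(averages, lambda avg_money: avg_money > 3000))
```

A where condition would instead be applied to ROWS before avg_money_by_name runs, which is why it cannot refer to the average.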

Hive GROUP BY grouping and the HAVING clause

2020-1-28 3940 0

Uncategorized

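Join statements

Hive supports the usual SQL JOIN statements. In older Hive versions such as the 1.2.x used in these notes, only equality joins are supported; non-equality join conditions are not.

Table 1 (hive_db)

    id    name          money
    1     Bob           1200
    2     Black         2100
    3     BigDataBoy    5600
    4     Bob           2300
    5     Bob           3200
    6     Black         5600

Table 2 (city)

    name          addr
    Bob           Beijing
    Black         HuNan
    BigDataBoy    SiChuan

Query example

Requirement: combine name from table 1 with addr from table 2, producing the columns id, name, addr.

Query:

    select h.id, h.name, c.addr
    from hive_db h
    join city c            -- the other table to join
    on h.name = c.name;    -- join condition; OR is not supported here

Result:

    hive> select
        > h.id, h.name, c.addr
        > from
        > hive_db h
        > join
        > city c
        > on
        > h.name = c.name;
    Query ID = root_20200127215351_3f0945cf-dfd3-4c00-ba9a-b177cecf018f
    ...
    MapReduce Jobs Launched:
    Stage-Stage-3: Map: 1   Cumulative CPU: 2.86 sec   HDFS Read: 6494 HDFS Write: 91 SUCCESS
    Total MapReduce CPU Time Spent: 2 seconds 860 msec
    OK
    1    Bob           Beijing
    2    Black         HuNan
    3    BigDataBoy    SiChuan
    4    Bob           Beijing
    5    Bob           Beijing
    6    Black         HuNan
    Time taken: 31.054 seconds, Fetched: 6 row(s)

Inner join (join)

Only the rows that satisfy the join condition in both tables are kept.

    select h.id, h.name, c.addr from hive_db h join city c on h.name = c.name;

Left outer join (left join)

Every row of the left table (the one written before the join keyword) is kept; where the right table has no match, its columns are NULL.

    select h.id, h.name, c.addr from hive_db h left join city c on h.name = c.name;

Right outer join (right join)

Every row of the right table (the one written after the join keyword) is kept; where the left table has no match, its columns are NULL.

    select h.id, h.name, c.addr from hive_db h right join city c on h.name = c.name;

Full outer join (full join)

Every row of both tables is kept; wherever one side has no matching row, its columns are filled with NULL.

    select h.id, h.name, c.addr from hive_db h full join city c on h.name = c.name;

Joining several tables

Note: joining n tables needs at least n-1 join conditions. For example, joining three tables needs at least two join conditions.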
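
With the sample tables every name matches, so all four join types return the same rows. The plain-Python sketch below adds one unmatched row on each side to show where they differ. Those two rows (Nobody and Ghost), the trimmed column lists, and the function name join are all invented for this illustration; Hive itself does not guarantee the row order shown here.

```python
# hive_db reduced to (id, name); the last row is hypothetical and has no city.
HIVE_DB = [(1, "Bob"), (2, "Black"), (3, "BigDataBoy"), (7, "Nobody")]
# city as (name, addr); the last row is hypothetical and has no person.
CITY = [("Bob", "Beijing"), ("Black", "HuNan"), ("BigDataBoy", "SiChuan"),
        ("Ghost", "Nowhere")]


def join(left, right, how="inner"):
    """Model of: select h.id, h.name, c.addr from left h <how> join right c
    on h.name = c.name. None stands for NULL."""
    out = []
    matched_right = set()
    for left_id, left_name in left:
        hits = [i for i, (name, _addr) in enumerate(right) if name == left_name]
        for i in hits:
            matched_right.add(i)
            out.append((left_id, left_name, right[i][1]))
        if not hits and how in ("left", "full"):
            out.append((left_id, left_name, None))  # left row kept, addr is NULL
    if how in ("right", "full"):
        for i, (_name, addr) in enumerate(right):
            if i not in matched_right:
                out.append((None, None, addr))  # right row kept, h.* is NULL
    return out


if __name__ == "__main__":
    for how in ("inner", "left", "right", "full"):
        print(how, join(HIVE_DB, CITY, how))
```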

Hive JOIN: joining on columns

2020-1-28 1183 0

Uncategorized

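Test data and table

Data

    1    Bob           1200
    2    Black         2100
    3    BigDataBoy    5600

Table

    create table hive_db(id int, name string, money int)
    row format delimited fields terminated by '\t';

Basic query

Format

    select ... from ...

Syntax conventions

HQL is case-insensitive.
HQL can be written on one line or across several lines.
Keywords cannot be abbreviated or split across lines.
Each clause is normally written on its own line.
Use indentation and line breaks to make statements easier to read.

Full-table and specific-column queries

Full-table query

    select * from table_name;

Specific-column query

    select column_name, column_name from table_name;

Column aliases

The keyword is as.

    select column_name as alias, column_name as alias from table_name;

Arithmetic operators

    Operator    Description
    A+B         A plus B
    A-B         A minus B
    A*B         A times B
    A/B         A divided by B
    A%B         remainder of A divided by B

The examples below also use the bitwise operators & (AND) and ~ (NOT).

    hive> select 1+6;
    OK
    7
    Time taken: 0.117 seconds, Fetched: 1 row(s)
    hive> select 2&6;
    OK
    2
    Time taken: 0.144 seconds, Fetched: 1 row(s)
    hive> select ~6;
    OK
    -7
    Time taken: 0.153 seconds, Fetched: 1 row(s)

Common functions

Row count: count(*). count() can also be applied to a column or expression.

    hive> select count(*) from hive_db;    (this runs a MapReduce job)
    Query ID = root_20200119201909_4eaa334c-5e26-46e4-8ff9-ef0e4a703394
    Total jobs = 1
    ...
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.27 sec   HDFS Read: 6799 HDFS Write: 2 SUCCESS
    Total MapReduce CPU Time Spent: 4 seconds 270 msec
    OK
    4
    Time taken: 131.007 seconds, Fetched: 1 row(s)

Maximum: max()

    hive> select max(money) from hive_db;
    Query ID = root_20200119202349_2d1c7587-554e-4089-8b99-b76cb08ddab1
    ...
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.06 sec   HDFS Read: 6908 HDFS Write: 5 SUCCESS
    Total MapReduce CPU Time Spent: 4 seconds 60 msec
    OK
    5600
    Time taken: 125.705 seconds, Fetched: 1 row(s)

Minimum: min()

    hive> select min(money) from hive_db;

Sum: sum()

    hive> select sum(money) from hive_db;

Average: avg()

    hive> select avg(money) from hive_db;

Comparison operators (Between / In / Is Null)

Each entry gives the operator, the supported data types in parentheses, and a description.

A = B (primitive types): NULL if A or B is NULL; otherwise TRUE if A equals B, else FALSE.
A <=> B (primitive types): the same as = when both operands are non-NULL; TRUE if both are NULL; FALSE if only one of them is NULL.
A <> B, A != B (primitive types): NULL if A or B is NULL; otherwise TRUE if A is not equal to B, else FALSE.
A < B (primitive types): NULL if A or B is NULL; otherwise TRUE if A is less than B, else FALSE.
A <= B (primitive types): NULL if A or B is NULL; otherwise TRUE if A is less than or equal to B, else FALSE.
A > B (primitive types): NULL if A or B is NULL; otherwise TRUE if A is greater than B, else FALSE.
A >= B (primitive types): NULL if A or B is NULL; otherwise TRUE if A is greater than or equal to B, else FALSE.
A [NOT] BETWEEN B AND C (primitive types): NULL if any of A, B, C is NULL; otherwise TRUE if A lies between B and C (bounds included), else FALSE. NOT reverses the result.
A IS NULL (all types): TRUE if A is NULL, else FALSE.
A IS NOT NULL (all types): TRUE if A is not NULL, else FALSE.
A IN (value1, value2) (primitive types): selects the rows whose column equals one of the values listed in IN().
A [NOT] LIKE B (STRING): B is a simple SQL pattern; TRUE if A matches it, else FALSE. NOT reverses the result.
A RLIKE B, A REGEXP B (STRING): B is a regular expression; TRUE if A matches it, else FALSE. Matching is done by the JDK's regular expression implementation, so the pattern follows Java regex rules. A match on any substring of A is enough: the pattern does not have to cover the whole string unless it is anchored with ^ and $.

Simple examples

Salaries between 2000 and 6000:

    hive> select * from hive_db where money between 2000 and 6000;
    OK
    2    Black         2100
    3    BigDataBoy    5600
    Time taken: 0.118 seconds, Fetched: 2 row(s)

Rows where money is NULL:

    hive> select * from hive_db where money is null;
    OK
    NULL    NULL    NULL
    Time taken: 0.227 seconds, Fetched: 1 row(s)

Rows where money is 2100 or 5600:

    hive> select * from hive_db where money in (2100, 5600);
    OK
    2    Black         2100
    3    BigDataBoy    5600
    Time taken: 0.21 seconds, Fetched: 2 row(s)

Like and RLike

Like uses SQL pattern matching:

% stands for zero or more characters (any number of characters).
_ stands for exactly one character.
%o: matches values that end with o
o%: matches values that start with o
%o%: matches values that contain o
_2%: matches values whose second character is 2
__2_: matches 4-character values whose third character is 2

Match names that contain o:

    hive> select * from hive_db where name like '%o%';
    OK
    1    Bob           1200
    3    BigDataBoy    5600
    Time taken: 0.092 seconds, Fetched: 2 row(s)

RLike uses Java regular expressions.

Match money values that contain 12:

    hive> select * from hive_db where money rlike '12';
    OK
    1    Bob    1200
    Time taken: 0.075 seconds, Fetched: 1 row(s)

Logical operators (And / Or / Not)

    Operator    Meaning
    And         logical and
    Or          logical or
    Not         logical not

Simple example

Rows where money is greater than 1000 and name is Bob:

    hive> select * from hive_db where money > 1000 and name = 'Bob';
    OK
    1    Bob    1200
    Time taken: 0.145 seconds, Fetched: 1 row(s)

The limit clause

limit restricts the number of rows returned.

    hive> select * from hive_db limit 2;
    OK
    1    Bob      1200
    2    Black    2100
    Time taken: 0.196 seconds, Fetched: 2 row(s)

The where clause

where filters out the rows that do not satisfy the condition.
The where clause comes directly after the from clause.

Rows where money is greater than 2000:

    hive> select * from hive_db where money > 2000;
    OK
    2    Black         2100
    3    BigDataBoy    5600
    Time taken: 0.172 seconds, Fetched: 2 row(s)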
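
The difference between the two matching styles (LIKE must cover the whole value, RLIKE only needs a matching substring) can be modelled in plain Python. This is a rough sketch: the function names like and rlike are ordinary Python helpers, escape characters in LIKE patterns are not handled, and Python's re module stands in for Java's regex engine, which differs in some details.

```python
import re


def like(value, pattern):
    """Model of SQL LIKE: % is any run of characters, _ is exactly one
    character, and the pattern has to match the whole value."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def rlike(value, pattern):
    """Model of RLIKE: true as soon as any substring matches the regex."""
    return re.search(pattern, value) is not None


if __name__ == "__main__":
    for name in ("Bob", "Black", "BigDataBoy"):
        print(name, like(name, "%o%"))
    for money in ("1200", "2100", "5600"):
        print(money, rlike(money, "12"))
```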

Basic Hive operators and queries

2020-1-19 2294 0

Uncategorized

Importing data

Loading data into a table with load

Syntax:

    load data [local] inpath 'file' [overwrite] into table table_name [partition(partition_spec)]

load data: load data
[local]: load from the local file system. Without it the file is loaded from HDFS; note that a file loaded from HDFS is moved, not copied.
inpath 'file': the path of the file
[overwrite]: overwrite the existing data instead of appending
[partition(partition_spec)]: for a partitioned table, the partition to load into

    -- append from HDFS
    load data inpath '/student.txt' into table hive_db;
    -- overwrite from HDFS
    load data inpath '/student.txt' overwrite into table hive_db;

Inserting data through queries (insert)

First create a partitioned table:

    create table test_p(id int, name string)
    partitioned by (day string)
    row format delimited fields terminated by '\t';

Basic insert

Syntax: insert into table table_name [partition(partition_spec)] values (value, value)

This submits a MapReduce job:

    hive> insert into table test_p partition(day="08") values(1,"tp");
    Query ID = root_20200108180532_9006dc63-78cd-4afc-9527-028387b366aa
    Total jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks is set to 0 since there's no reduce operator
    .....
    Stage-Stage-1: Map: 1   Cumulative CPU: 3.35 sec   HDFS Read: 3765 HDFS Write: 79 SUCCESS
    Total MapReduce CPU Time Spent: 3 seconds 350 msec
    OK
    Time taken: 84.034 seconds

Basic insert from a query (result of a single-table query)

Syntax:

    insert into|overwrite table target_table    -- the table that receives the result
    select columns from source_table;           -- the table being queried

Example (runs a MapReduce job):

    hive> insert into table test_p partition(day="9")
        > select * from hive_db;
    Query ID = root_20200109100135_356015e5-d696-43cd-a6af-68e29fa246e2
    Total jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks is set to 0 since there's no reduce operator
    ......
    Total MapReduce CPU Time Spent: 2 seconds 350 msec
    OK
    Time taken: 79.799 seconds

Multi-insert (one source query feeding several inserts)

The selected columns must correspond exactly to the columns being inserted, no more and no fewer.

This submits a MapReduce job:

    from hive_db
    insert into table test_p partition(day="15")
    select *
    insert into table test_p partition(day="16")
    select *;

Specifying the data location with location at table creation

Create the table:

    create table lo_db(id int, name string)
    row format delimited fields terminated by '\t'
    location '/hive';

Upload the data to the /hive directory:

    [root@master ~]# hadoop fs -put st.txt /hive

Check the table's data:

    hive> select * from lo_db;
    OK
    4    aa
    5    bb
    6
    Time taken: 0.119 seconds, Fetched: 3 row(s)

Importing data with import

The data must have been produced by export (because it comes with metadata).

Syntax: import table table_name from path;

The table does not need to exist beforehand.
The path holds everything that export wrote.

    hive> import table stu1 from '/export';
    Copying data from hdfs://192.168.176.65:9000/export/data
    Copying file: hdfs://192.168.176.65:9000/export/data/student.txt
    Loading data to table test.stu1
    OK
    Time taken: 0.62 seconds

Exporting data

Exporting with insert

Syntax:

    insert overwrite [local] directory '/root'       --> the export path; without local the export goes to HDFS
    row format delimited fields terminated by '\t'   --> the delimiter of the exported file
    select * from hive_db;                           --> the content to export

Usage

Running the HQL starts a MapReduce job:

    hive> insert overwrite local directory '/root/hive'
        > row format delimited fields terminated by '\t'
        > select * from hive_db;
    Query ID = root_20200115161857_48aa5b8a-1bd4-45b9-9642-9b7135bf9009
    Total jobs = 1
    ...
    Copying data to local directory /root/hive
    Copying data to local directory /root/hive
    MapReduce Jobs Launched:
    Stage-Stage-1: Map: 1   Cumulative CPU: 1.91 sec   HDFS Read: 3034 HDFS Write: 21 SUCCESS
    Total MapReduce CPU Time Spent: 1 seconds 910 msec
    OK
    Time taken: 77.28 seconds

Check the local file:

    [root@master hive]# pwd
    /root/hive
    [root@master hive]# ll
    total 4
    -rw-r--r--. 1 root root 21 Jan 15 16:20 000000_0
    [root@master hive]# cat 000000_0
    1    Bob
    2    Black
    3    Jeck

Downloading data with a Hadoop shell command

    hive> dfs -get /hive/st.txt /root/hive;

Exporting with the Hive shell command

Format: bin/hive -e 'HQL statement' > output_file (use -f instead of -e to run an HQL file)

    [root@master src]# hive -e 'select * from test.hive_db' > /root/hive/a.txt

    Logging initialized using configuration in jar:file:/usr/local/src/hive/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
    OK
    Time taken: 12.103 seconds, Fetched: 3 row(s)

Check the exported file:

    [root@master hive]# cat a.txt
    1    Bob
    2    Black
    3    Jeck
    [root@master hive]# pwd
    /root/hive

Exporting with export (not used very often)

Format: export table table_name to HDFS_path;

    hive> export table hive_db to '/export';
    Copying data from file:/usr/local/src/hive/tmpdir/hive/8edfc4e3-3832-43c1-9d4b-d863ef85d0eb/hive_2020-01-15_16-41-17_608_2632318306276412864-1/-local-10000/_metadata
    Copying file: file:/usr/local/src/hive/tmpdir/hive/8edfc4e3-3832-43c1-9d4b-d863ef85d0eb/hive_2020-01-15_16-41-17_608_2632318306276412864-1/-local-10000/_metadata
    Copying data from hdfs://192.168.176.65:9000/user/hive/warehouse/test.db/hive_db
    Copying file: hdfs://192.168.176.65:9000/user/hive/warehouse/test.db/hive_db/student.txt
    OK
    Time taken: 0.601 seconds

Structure of the exported directory:

    export/
      |_ _metadata    the metadata file
      |_ data/
           |_ the exported data

Truncating a table

Only the data of a managed (internal) table can be truncated; the data of an external table cannot.

Syntax: truncate table table_name

    hive> truncate table stu1;
    OK
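
An exported file such as 000000_0 is plain delimited text, so it is easy to read back outside Hive. The sketch below parses the tab-delimited content into rows. The function name parse_exported_rows is made up for this illustration, and the handling of \N as NULL is an assumption based on Hive's usual text-format default; adjust it if the table uses a different NULL marker.

```python
def parse_exported_rows(text, delimiter="\t", null_marker="\\N"):
    """Split exported text into rows of fields.

    Assumption: NULL values are written as the two characters \\N,
    which are mapped to None here. All other fields stay strings.
    """
    rows = []
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        rows.append([
            None if field == null_marker else field
            for field in line.split(delimiter)
        ])
    return rows


if __name__ == "__main__":
    # Same content as the cat output in this post, with real tab characters.
    exported = "1\tBob\n2\tBlack\n3\tJeck\n"
    for row in parse_exported_rows(exported):
        print(row)
```

Passing the same delimiter that was given in row format delimited fields terminated by keeps the export and the parser in step.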

Hive DML operations (importing data, exporting data, truncating tables; querying table data is not covered)

2020-1-19 1343 0

Uncategorized

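Databases

Creating a database

    -- create the database directly; by default it lives under /user/hive/warehouse on HDFS
    create database hive_db;

    -- specify where the data is stored; the path is on HDFS
    -- note: the path must end with the database name, otherwise the created database directory cannot be found on HDFS
    create database hive_db location '/hive/hive_db.db';

    -- if not exists checks whether the database already exists, to avoid an error
    create database if not exists hive;

Querying databases

List the databases:

    show databases;

Pattern search:

    show databases like 'hiv*';

Show the information of a database:

    hive> desc database hive_db;
    OK
    hive_db        hdfs://192.168.176.65:9000/hive/hive.db    root    USER
    Time taken: 0.019 seconds, Fetched: 1 row(s)

Adding the extended keyword shows the details:

    hive> desc database extended hive_db;

Modifying a database

"Modifying" is a generous term: in the Hive version used here you can only add extra properties to a database. Once the database has been created, none of the values shown by desc can be changed.

The extra properties in dbproperties() are free-form; define whatever you need:

    hive> alter database hive_db set dbproperties('Ctime'='2020.6.16');
    OK
    Time taken: 0.132 seconds

Extra properties are only shown when extended is used:

    hive> desc database extended hive_db;
    OK
    hive_db        hdfs://192.168.176.65:9000/hive/hive.db    root    USER    {Ctime=2020.6.16}
    Time taken: 0.016 seconds, Fetched: 1 row(s)

Dropping a database

Drop an empty database:

    drop database database_name;

Drop a non-empty database (cascading, forced delete):

    drop database database_name cascade;

Table operations

Creating a table

Full create-table syntax:

    CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
    [(col_name data_type [COMMENT col_comment], ...)]
    [COMMENT table_comment]
    [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
    [CLUSTERED BY (col_name, col_name, ...)
    [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
    [ROW FORMAT row_format]
    [STORED AS file_format]
    [LOCATION hdfs_path]

Short beginner form:

    -- the table name
    CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
    -- the columns
    [(col_name data_type [COMMENT col_comment], ...)]
    -- the delimiter
    [ROW FORMAT row_format]

Explanation of the clauses

CREATE TABLE: creates a table with the given name. If a table with that name already exists, an exception is thrown.
[EXTERNAL]: creates an external table; a real path (LOCATION) is given when the table is created. When creating a managed (internal) table, Hive moves the data to the path Hive manages; for an external table, Hive only records where the data is and does not move it. On drop, a managed table loses both its metadata and its data, while an external table loses only its metadata and keeps the data.
[COMMENT]: adds a comment to a column or to the table.
[ROW FORMAT]: specifies the delimiter, for example row format delimited fields terminated by "\t";

Altering a table

Renaming a table

Format: alter table old_name rename to new_name;

    hive> alter table hive_db rename to test_db;
    OK
    Time taken: 0.237 seconds

Column operations

Renaming a column

Columns can only be changed one at a time.

Format: alter table table_name change column old_column new_column new_type;

    hive> alter table test_db change column id db_id int;
    OK
    Time taken: 0.239 seconds

Adding and replacing columns

Several columns can be handled in one statement.

Adding columns: alter table table_name add columns (column type, column type);

    hive> alter table test_db add columns (addr string, age int);
    OK
    Time taken: 0.211 seconds

Replacing columns: this replaces all of the table's existing columns, leaving only the columns listed in the statement.

Format: alter table table_name replace columns (column type, column type)

    hive> alter table test_db replace columns (id int, year int);
    OK
    Time taken: 0.267 seconds

Check the table structure; only the columns from the replace statement are left:

    hive> desc test_db;
    OK
    id                      int
    year                    int
    mouth                   string      # the partition column

    # Partition Information
    # col_name              data_type               comment

    mouth                   string
    Time taken: 0.048 seconds, Fetched: 8 row(s)

Adding and dropping partitions

Adding a partition: alter table table_name add partition(partition_spec)

    hive> alter table test_db add partition(mouth="04");
    OK
    Time taken: 0.225 seconds

Dropping a partition: alter table table_name drop partition(partition_spec)

    hive> alter table test_db drop partition(mouth="04");
    Dropped the partition mouth=04
    OK
    Time taken: 0.312 seconds

Dropping a table

Syntax: drop table table_name

    hive> drop table test_db;
    OK
    Time taken: 0.439 seconds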

Hive DDL operations (creating, dropping, ... databases and tables; table contents are not covered)

2020-1-8 1241 0

Uncategorized

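Description

A partition of a partitioned table is really just a directory on HDFS. If you create the directory directly on HDFS, upload data into it with Hadoop commands, and then run select * from table_name, no data comes back, because the data in that directory has not been associated with Hive's metadata.

Creating the directory and uploading the file directly, then querying, returns no data:

    hive> dfs -mkdir /user/hive/warehouse/test.db/hive_db/mouth=01;
    hive> dfs -put /root/student.txt /user/hive/warehouse/test.db/hive_db/mouth=01;
    hive> select * from hive_db;
    OK
    Time taken: 0.099 seconds

First way to associate the data (the most common)

Use load data local inpath 'file' into table partitioned_table partition(partition_spec)

    hive> load data local inpath '/root/student.txt' into table hive_db partition(mouth=01);
    Loading data to table test.hive_db partition (mouth=1)
    Partition test.hive_db{mouth=1} stats: [numFiles=1, numRows=0, totalSize=21, rawDataSize=0]
    OK
    Time taken: 0.553 seconds
    hive> select * from hive_db;
    OK
    1    Bob      1
    2    Black    1
    3    Jeck     1
    Time taken: 0.136 seconds, Fetched: 3 row(s)

Second way to associate the data

Use the repair command to repair the partitioned table.

Syntax: msck repair table partitioned_table;

    hive> msck repair table hive_db;
    OK
    Partitions not in metastore:    hive_db:mouth=01
    Repair: Added partition to metastore hive_db:mouth=01
    Time taken: 0.211 seconds, Fetched: 2 row(s)

Third way to associate the data

Use the add partition command to register the partition.

Syntax: alter table partitioned_table add partition(partition_spec);

    hive> alter table hive_db add partition(mouth="03");
    OK
    Time taken: 0.238 seconds
    hive> select * from hive_db where mouth=03;
    OK
    1    Bob      3
    2    Black    3
    3    Jeck     3
    Time taken: 0.107 seconds, Fetched: 3 row(s)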
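
The reason the first select returns nothing is that Hive only reads the partitions listed in its metastore, not whatever directories happen to exist. A rough plain-Python model of that gap, and of what msck repair table does about it, is sketched below. The function names are invented, and real MSCK behaviour has more options than this model covers.

```python
def missing_partitions(directories_on_hdfs, partitions_in_metastore):
    """Partition directories that exist on HDFS but are unknown to the metastore."""
    return sorted(set(directories_on_hdfs) - set(partitions_in_metastore))


def msck_repair(directories_on_hdfs, partitions_in_metastore):
    """Partition list after registering every directory the metastore lacked.

    Partitions already registered are kept as they are.
    """
    return sorted(set(partitions_in_metastore) | set(directories_on_hdfs))


if __name__ == "__main__":
    on_hdfs = ["mouth=01"]   # created with dfs -mkdir and dfs -put
    in_metastore = []        # Hive has never been told about it
    print("not in metastore:", missing_partitions(on_hdfs, in_metastore))
    print("after repair:", msck_repair(on_hdfs, in_metastore))
```

In this model, alter table ... add partition corresponds to adding one entry to the metastore list by hand, and load data does the upload and the registration in one step.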

Three ways to associate data with a Hive partitioned table

2020-1-7 3270 0

Uncategorized

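Partitioned tables

Description

A partition corresponds to a separate directory on the HDFS file system, and that directory holds all of the partition's data files. Partitioning in Hive simply means splitting into directories: a large data set is divided into smaller ones according to business needs. A query then uses an expression in the WHERE clause to pick only the partitions it needs, which makes the query much more efficient.

Operations on partitioned tables

Creating a partitioned table

Syntax: partitioned by (column type)

Statement:

    create table hive_db(id int, name string)
    partitioned by (mouth string)
    row format delimited fields terminated by '\t';

Looking at the table definition shows that the partition is also a column:

    hive> desc hive_db;
    OK
    id                      int
    name                    string
    mouth                   string      # a column

    # Partition Information
    # col_name              data_type               comment

    mouth                   string
    Time taken: 0.106 seconds, Fetched: 8 row(s)

Adding partitions

Add one partition:

    hive> alter table hive_db add partition(mouth="20200109");
    OK
    Time taken: 0.356 seconds

Add several partitions at once; note that they are separated by spaces:

    hive> alter table hive_db add partition(mouth="20200111") partition(mouth="20200110");
    OK
    Time taken: 0.356 seconds

Dropping partitions

Drop one partition:

    hive> alter table hive_db drop partition(mouth="202001407");
    Dropped the partition mouth=202001407
    OK
    Time taken: 0.601 seconds

Drop several partitions at once; note that they are separated by commas:

    hive> alter table hive_db drop partition(mouth="20200110"), partition(mouth="20200111");
    Dropped the partition mouth=20200110
    Dropped the partition mouth=20200111
    OK
    Time taken: 0.371 seconds

Loading data into a partitioned table

Load:

    load data local inpath '/root/student.txt' into table hive_db partition(mouth=20200107);

Looking at the content shows that the partition value is displayed as well:

    hive> select * from hive_db;
    OK
    1    Bob      202001407
    2    Black    202001407
    3    Jeck     202001407
    Time taken: 0.103 seconds, Fetched: 3 row(s)

Viewing the partitioned table's data on HDFS

Each partition appears as its own directory under the table's directory.

Viewing partition data

Querying all the data returns the rows of all partitions together:

    hive> select * from hive_db;
    OK
    1    Bob      202001407
    2    Black    202001407
    3    Jeck     202001407
    1    Bob      202001408
    2    Black    202001408
    3    Jeck     202001408
    Time taken: 0.116 seconds, Fetched: 6 row(s)

Querying a single partition:

    hive> select * from hive_db where mouth=202001407;
    OK
    1    Bob      202001407
    2    Black    202001407
    3    Jeck     202001407
    Time taken: 0.376 seconds, Fetched: 3 row(s)

Two-level partitioned tables

A two-level partitioned table is a one-level partitioned table with one more partition column:

    create table stu(id int, name string)
    partitioned by (mouth string, day string)
    row format delimited fields terminated by '\t';

Loading data into a two-level partitioned table

Syntax: load data local inpath 'file' into table stu partition(mouth="01", day="07");

partition() lists the corresponding partition columns.

    hive> load data local inpath '/root/student.txt' into table stu partition(mouth="01", day="07");
    Loading data to table test.stu partition (mouth=01, day=07)
    Partition test.stu{mouth=01, day=07} stats: [numFiles=1, numRows=0, totalSize=21, rawDataSize=0]
    OK
    Time taken: 0.554 seconds

Going further

Three-level partitions, four-level partitions and so on work the same way, but they are rarely necessary.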
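
Since every partition level is just one more directory level, the layout and the effect of a WHERE filter on a partition column can be sketched in plain Python. The table directory below is an assumption (it follows the warehouse layout seen elsewhere in these notes), and the function names are invented for the illustration.

```python
def partition_path(table_dir, spec):
    """Directory of one partition; spec is an ordered list of (column, value)."""
    parts = [table_dir.rstrip("/")] + ["%s=%s" % (col, val) for col, val in spec]
    return "/".join(parts)


def prune(partition_dirs, column, value):
    """Model of partition pruning: keep only the directories that have
    column=value as one of their path segments."""
    wanted = "%s=%s" % (column, value)
    return [d for d in partition_dirs if wanted in d.split("/")]


if __name__ == "__main__":
    # Assumed location of the stu table inside the test database.
    table_dir = "/user/hive/warehouse/test.db/stu"
    dirs = [
        partition_path(table_dir, [("mouth", "01"), ("day", "07")]),
        partition_path(table_dir, [("mouth", "01"), ("day", "08")]),
        partition_path(table_dir, [("mouth", "02"), ("day", "07")]),
    ]
    print(dirs)
    print(prune(dirs, "mouth", "01"))  # only these directories would be read
```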

Hive partitioned table operations

2020-1-7 1112 0

Uncategorized

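Differences

                                 External table             Managed (internal) table
    Creation keyword             external is required       not required
    When the table is dropped    data is not deleted        data is deleted

Difference at creation

Creating an external table requires the external keyword:

    create external table stud(id int, name string)
    row format delimited fields terminated by '\t';

When the table is dropped

After an external table is dropped, show tables; no longer lists it, but the data is still there. If the table is created again with its original definition, the existing data shows up in it automatically.
When a managed table is dropped, its data is deleted along with it.

When to use which

When more than one department may analyse the same data and many places share it, the underlying data must not be deleted just because I drop my table after finishing my own analysis. That is the case for an external table.

Converting between external and managed tables

Managed to external: alter table table_name set tblproperties('EXTERNAL'='TRUE');
External to managed: alter table table_name set tblproperties('EXTERNAL'='FALSE');

Note: the tblproperties('EXTERNAL'='<boolean>') part has to be written exactly like this, with single quotes and the boolean value in upper case.

Viewing the details of a table

Statement: desc formatted table_name

    hive> desc formatted stud;
    OK
    # col_name              data_type               comment

    id                      int
    name                    string

    # Detailed Table Information
    Database:               default                 # the database the table belongs to
    Owner:                  root                    # the owner of the table
    CreateTime:             Mon Jan 06 11:37:37 CST 2020
    LastAccessTime:         UNKNOWN
    Protect Mode:           None
    Retention:              0
    Location:               hdfs://192.168.176.65:9000/user/hive/warehouse/stud
    Table Type:             MANAGED_TABLE           # this is a managed (internal) table
    Table Parameters:
        COLUMN_STATS_ACCURATE   true
        numFiles                1
        totalSize               21
        transient_lastDdlTime   1578281909

    # Storage Information
    SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    InputFormat:            org.apache.hadoop.mapred.TextInputFormat
    OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    Compressed:             No
    Num Buckets:            -1
    Bucket Columns:         []
    Sort Columns:           []
    Storage Desc Params:
        field.delim             \t
        serialization.format    \t
    Time taken: 0.104 seconds, Fetched: 31 row(s)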
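
The drop behaviour is the core difference, and a toy model in plain Python makes it easy to see. Everything here is hypothetical: two dictionaries stand in for the metastore and for HDFS, and drop_table is an invented helper, not a Hive API.

```python
def drop_table(name, metastore, storage):
    """Model of DROP TABLE.

    metastore maps table name -> {"external": bool, "location": str}.
    storage maps location -> data files.
    The metadata is always removed; the data only for managed tables.
    """
    table = metastore.pop(name)
    if not table["external"]:
        storage.pop(table["location"], None)
    return table


if __name__ == "__main__":
    metastore = {
        "managed_t": {"external": False, "location": "/warehouse/managed_t"},
        "external_t": {"external": True, "location": "/warehouse/external_t"},
    }
    storage = {
        "/warehouse/managed_t": ["student.txt"],
        "/warehouse/external_t": ["student.txt"],
    }
    drop_table("managed_t", metastore, storage)
    drop_table("external_t", metastore, storage)
    print(metastore)  # both table definitions are gone
    print(storage)    # only the external table's files remain
```

Recreating external_t with the same location would find the files still in place, which matches the reload behaviour described above.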

Hive external and managed (internal) tables: introduction and differences

2020-1-6 1406 0

Uncategorized

Components

Hive can only be reached remotely once hiveserver2 is running.

hiveserver2: think of it as the server.
beeline: think of it as a client.

Both live in Hive's bin/ directory:

    [root@master bin]# ll
    total 32
    -rwxr-xr-x. 1 root root 1031 Apr  1  2017 beeline
    drwxr-xr-x. 3 root root 4096 Dec 26 16:43 ext
    -rwxr-xr-x. 1 root root 7844 Apr  1  2017 hive
    -rwxr-xr-x. 1 root root 1900 Jan  8  2016 hive-config.sh
    -rwxr-xr-x. 1 root root  885 Jan  8  2016 hiveserver2
    -rwxr-xr-x. 1 root root  832 Jan  8  2016 metatool
    -rwxr-xr-x. 1 root root  884 Jan  8  2016 schematool

A small test

The test runs on a pseudo-distributed Hadoop setup.

Open two terminal windows, one for hiveserver2 and one for beeline.

Run hiveserver2 as the server. Once the terminal stays in the foreground without returning to the prompt, the server is up.

In the other window, run beeline as the client. The client does not have to be beeline; other clients work too, and the connection address, user name, and password are used in the same way.

Connect to Hive and run queries.

Connection address: !connect jdbc:hive2://master:10000

Extra

All beeline commands:

    0: jdbc:hive2://master:10000> help
    !addlocaldriverjar  Add driver jar file in the beeline client side.
    !addlocaldrivername Add driver name that needs to be supported in the beeline
                        client side.
    !all                Execute the specified SQL against all the current connections
    !autocommit         Set autocommit mode on or off
    !batch              Start or execute a batch of statements
    !brief              Set verbose mode off
    !call               Execute a callable statement
    !close              Close the current connection to the database
    !closeall           Close all current open connections
    !columns            List all the columns for the specified table
    !commit             Commit the current transaction (if autocommit is off)
    !connect            Open a new connection to the database.
    !dbinfo             Give metadata information about the database
    !describe           Describe a table
    !dropall            Drop all tables in the current database
    !exportedkeys       List all the exported keys for the specified table
    !go                 Select the current connection
    !help               Print a summary of command usage
    !history            Display the command history
    !importedkeys       List all the imported keys for the specified table
    !indexes            List all the indexes for the specified table
    !isolation          Set the transaction isolation for this connection
    !list               List the current connections
    !manual             Display the BeeLine manual
    !metadata           Obtain metadata information
    !nativesql          Show the native SQL for the specified statement
    !nullemptystring    Set to true to get historic behavior of printing null as
                        empty string. Default is false.
    !outputformat       Set the output format for displaying results
                        (table,vertical,csv2,dsv,tsv2,xmlattrs,xmlelements, and
                        deprecated formats(csv, tsv))
    !primarykeys        List all the primary keys for the specified table
    !procedures         List all the procedures
    !properties         Connect to the database specified in the properties file(s)
    !quit               Exits the program
    !reconnect          Reconnect to the database
    !record             Record all output to the specified file
    !rehash             Fetch table and column names for command completion
    !rollback           Roll back the current transaction (if autocommit is off)
    !run                Run a script from the specified file
    !save               Save the current variabes and aliases
    !scan               Scan for installed JDBC drivers
    !script             Start saving a script to a file
    !set                Set a beeline variable
    !sh                 Execute a shell command
    !sql                Execute a SQL command
    !tables             List all the tables in the database
    !typeinfo           Display the type map for the current connection
    !verbose            Set verbose mode on

    Comments, bug reports, and patches go to ???
    0: jdbc:hive2://master:10000>

Using Hive's hiveserver2 and beeline

2020-1-6 1807 0

16 articles about Hive in total