## Before You Install

### Versions

| Component | Version | Download |
| --- | --- | --- |
| Hadoop | 2.6.0 | Download |
| ZooKeeper | 3.4.5 | Download |

### Machine allocation

| IP | Hostname | Role |
| --- | --- | --- |
| 192.168.176.61 | master | namenode |
| 192.168.176.62 | slave1 | namenode |
| 192.168.176.63 | slave2 | other |

### Set the hostnames

```bash
hostnamectl set-hostname master
hostnamectl set-hostname slave1
hostnamectl set-hostname slave2
```

### Map IPs to hostnames

```bash
# file to edit: vi /etc/hosts
# append:
192.168.176.61 master
192.168.176.62 slave1
192.168.176.63 slave2
```

### Configure passwordless SSH

The machines that run a NameNode must be able to log in to the other machines without a password.

```bash
# run on every machine:
ssh-keygen -t rsa

# run on each machine that will host a NameNode process:
ssh-copy-id -i /root/.ssh/id_rsa.pub <hostname>

# then send the key file from the NameNode machines to the others
# on master:
scp /root/.ssh/authorized_keys root@slave1:/root/.ssh/
scp /root/.ssh/authorized_keys root@slave2:/root/.ssh/
# on slave1:
scp /root/.ssh/authorized_keys root@master:/root/.ssh/
scp /root/.ssh/authorized_keys root@slave2:/root/.ssh/
```

### Disable the firewall on every machine

```bash
# stop the firewall
systemctl stop firewalld.service
# keep it from starting at boot
systemctl disable firewalld.service
# check its state
firewall-cmd --state
```

## Configure ZooKeeper

### Unpack ZooKeeper and set environment variables

```bash
[root@master zookeeper]# tar -zxvf zookeeper-3.4.5.tar.gz
[root@master zookeeper]# pwd
/usr/local/src/zookeeper

# append to the environment variables
[root@master zookeeper]# vi ~/.bash_profile
# zookeeper
export ZK_HOME=/usr/local/src/zookeeper/zookeeper-3.4.5
export PATH=$PATH:$ZK_HOME/bin:

[root@master zookeeper]# source ~/.bash_profile
```

### Configure the ZooKeeper cluster

Go into ZooKeeper's conf/ directory.

```bash
# copy zoo_sample.cfg and rename it zoo.cfg
[root@master conf]# cp zoo_sample.cfg zoo.cfg
```

Edit zoo.cfg:

```properties
# first change (add it if missing; the directory must be created by hand)
# example sakes.
dataDir=/usr/local/src/zookeeper/DataZk

# append at the end: the myid cluster hosts and ports; the machine count must be odd
server.1=192.168.176.61:2888:3888
server.2=192.168.176.62:2888:3888
server.3=192.168.176.63:2888:3888
```

Go into the DataZk directory and add the ID that ZooKeeper uses to identify the current machine:

```bash
[root@master DataZk]# echo 1 > myid
[root@master DataZk]# cat myid
1
# a myid of 1 marks this machine as the server.1 declared in zoo.cfg
```

### Distribute the configuration to the other machines

Run in /usr/local/src:

```bash
[root@master src]# pwd
/usr/local/src
[root@master src]# scp -r zookeeper/ root@slave1:/usr/local/src/
[root@master src]# scp -r zookeeper/ root@slave2:/usr/local/src/
```

Change the myid file on the other machines:

```bash
# on slave1
[root@slave1 src]# echo 2 > /usr/local/src/zookeeper/DataZk/myid
# on slave2
[root@slave2 src]# echo 3 > /usr/local/src/zookeeper/DataZk/myid
```

### Start the ZooKeeper cluster

Start from ZooKeeper's bin/ directory:

```bash
# run ./zkServer.sh start on master, slave1 and slave2
[root@master bin]# ./zkServer.sh start
JMX enabled by default
Using config: /usr/local/src/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
```

### Check the status

Exactly one node is the leader; the rest are followers. Note: the leader needs to be on one of the machines that runs a NameNode process.

```bash
# status on master
[root@master bin]# ./zkServer.sh status
JMX enabled by default
Using config: /usr/local/src/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: leader

# status on slave1
[root@slave1 bin]# ./zkServer.sh status
JMX enabled by default
Using config: /usr/local/src/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: follower

# status on slave2
[root@slave2 bin]# zkServer.sh status
JMX enabled by default
Using config: /usr/local/src/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: follower
```
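Before moving on to Hadoop, it can be worth confirming that every ZooKeeper node answers. A minimal sketch run from any machine, assuming `nc` (netcat) is installed; `ruok` and `stat` are ZooKeeper's built-in four-letter-word commands:

```bash
# Ask each node whether it is healthy and which role it holds.
for host in master slave1 slave2; do
  echo "ruok" | nc $host 2181               # a healthy node answers "imok"
  echo "stat" | nc $host 2181 | grep Mode   # Mode: leader / follower
done
```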
## Edit the Hadoop configuration files

All of the following files live under Hadoop's etc/hadoop directory.

### core-site.xml

```xml
<!-- HDFS address; in HA mode clients connect to the nameservice -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns</value>
</property>
<!-- default parent directory where NameNode, DataNode, JournalNode and the rest keep their data; each can also be set individually -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/src/hadoop/tmp</value>
</property>
<!-- addresses and ports of the ZooKeeper ensemble. Note: the node count must be odd and no fewer than three -->
<property>
    <name>ha.zookeeper.quorum</name>
    <value>master:2181,slave1:2181,slave2:2181</value>
</property>
```

### mapred-site.xml

A copy of the template is needed first:

```bash
cp mapred-site.xml.template mapred-site.xml
```

```xml
<property>
    <!-- run MapReduce on YARN -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
```

### hdfs-site.xml

```xml
<!-- the HDFS nameservice is ns; it must match fs.defaultFS in core-site.xml -->
<property>
    <name>dfs.nameservices</name>
    <value>ns</value>
</property>
<!-- ns has two NameNodes: nn1 and nn2 -->
<property>
    <name>dfs.ha.namenodes.ns</name>
    <value>nn1,nn2</value>
</property>
<!-- nn1's RPC address -->
<property>
    <name>dfs.namenode.rpc-address.ns.nn1</name>
    <value>master:9000</value>
</property>
<!-- nn1's HTTP address -->
<property>
    <name>dfs.namenode.http-address.ns.nn1</name>
    <value>master:50070</value>
</property>
<!-- nn2's RPC address -->
<property>
    <name>dfs.namenode.rpc-address.ns.nn2</name>
    <value>slave1:9000</value>
</property>
<!-- nn2's HTTP address -->
<property>
    <name>dfs.namenode.http-address.ns.nn2</name>
    <value>slave1:50070</value>
</property>
<!-- where the NameNode metadata is kept on the JournalNodes, so the second NameNode can pull the latest state from the JN cluster and act as a hot standby -->
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://master:8485;slave1:8485;slave2:8485/ns</value>
</property>
<!-- where the JournalNodes store their data -->
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/local/src/hadoop/journal</value>
</property>
<!-- fail over automatically when a NameNode dies -->
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
<!-- implementation used for the failover -->
<property>
    <name>dfs.client.failover.proxy.provider.ns</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- fencing mechanism: fence over SSH using a key -->
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>
<!-- location of the SSH private key used for fencing -->
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
</property>
<!-- where the NameNode stores its data; optional, defaults to the hadoop.tmp.dir path from core-site.xml if unset -->
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/src/hadoop/tmp/namenode</value>
</property>
<!-- where the DataNode stores its data; optional, same default as above -->
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/src/hadoop/tmp/datanode</value>
</property>
<!-- number of block replicas -->
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<!-- HDFS permission checking; false means any user may operate on HDFS files -->
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
```

### yarn-site.xml

```xml
<!-- enable YARN HA -->
<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<!-- names of the two ResourceManagers -->
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
</property>
<!-- hosts for rm1 and rm2 -->
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>master</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>slave1</value>
</property>
<!-- enable YARN recovery -->
<property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
</property>
<!-- class that implements RM recovery -->
<property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!-- ZooKeeper addresses -->
<property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>master:2181,slave1:2181,slave2:2181</value>
    <description>For multiple zk services, separate them with comma</description>
</property>
<!-- name of the YARN HA cluster -->
<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-ha</value>
</property>
<property>
    <!-- address of the ResourceManager, YARN's master daemon -->
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
</property>
<property>
    <!-- how NodeManagers fetch data -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
```

### slaves

This file lists the DataNode hosts:

```bash
[root@master hadoop]# cat slaves
master
slave1
slave2
```

### Hadoop and ZooKeeper environment variables

Set these so the commands are easier to run; skip anything already configured.

```bash
[root@master /]# vi ~/.bash_profile
# append at the end
# hadoop
export HADOOP_HOME=/usr/local/src/hadoop/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:
# zookeeper
export ZK_HOME=/usr/local/src/zookeeper/zookeeper-3.4.5
export PATH=$PATH:$ZK_HOME/bin:

[root@master /]# source ~/.bash_profile
```

### Distribute the configuration

Run in /usr/local/src:

```bash
# distribute the Hadoop files
scp -r hadoop/ root@slave1:/usr/local/src/
scp -r hadoop/ root@slave2:/usr/local/src/
# distribute the environment variables
scp ~/.bash_profile root@slave1:~/
scp ~/.bash_profile root@slave2:~/
```
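A quick way to confirm the distributed configuration is actually in effect is to query it on each node. A sketch; `hdfs getconf` prints values from the local client configuration:

```bash
# Run on every machine; both values should match what core-site.xml defines.
hdfs getconf -confKey fs.defaultFS            # expect: hdfs://ns
hdfs getconf -confKey ha.zookeeper.quorum     # expect: master:2181,slave1:2181,slave2:2181
```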
## Start the ZooKeeper cluster

The election must leave the leader on one of the NameNode machines; if it does not, kill the QuorumPeerMain process on every machine and start over.

```bash
# run on every machine
# start
zkServer.sh start
# check the status
zkServer.sh status
```

## Format ZooKeeper on the leader node

```bash
hdfs zkfc -formatZK
```

## Start the Hadoop cluster

To keep their data in sync, the two NameNodes communicate through a group of independent processes called JournalNodes. Whenever the active NameNode's namespace is modified, it reports the change to a majority of the JournalNodes. The standby NameNode reads the changes from the JNs, constantly watches the edit log, and applies the changes to its own namespace. This ensures the namespace state is fully synchronized when the cluster fails over.

### Start the JournalNode cluster, which syncs state between the active and standby nodes

Run on every machine:

```bash
hadoop-daemon.sh start journalnode
```

### Format the NameNode on the ZooKeeper leader

Run on the machine that is both a designated NameNode and the ZK leader:

```bash
hdfs namenode -format
```

### Start that NameNode as the active one

On the machine that has a NameNode and is also the leader:

```bash
hadoop-daemon.sh start namenode
```

### Set the other NameNode up as the standby

On the other designated NameNode machine:

```bash
hdfs namenode -bootstrapStandby
```

### Start the standby NameNode

On the standby:

```bash
hadoop-daemon.sh start namenode
```

### Start all the DataNodes

On the active machine:

```bash
hadoop-daemons.sh start datanode
```

### Start zkfc, which monitors NameNode health and runs the election

On the NameNode machines:

```bash
hadoop-daemon.sh start zkfc
```

### Start the YARN resource manager

On the leader:

```bash
start-yarn.sh
```

### Install the failover dependency

Both NameNode machines need it (psmisc provides fuser, which the sshfence mechanism relies on):

```bash
yum install psmisc
```

### Check the processes on each machine

| Active machine | Standby machine | Others |
| --- | --- | --- |
| Jps | Jps | Jps |
| DataNode | DataNode | DataNode |
| JournalNode | JournalNode | JournalNode |
| QuorumPeerMain | QuorumPeerMain | QuorumPeerMain |
| NodeManager | NodeManager | NodeManager |
| ResourceManager | ResourceManager | |
| DFSZKFailoverController | DFSZKFailoverController | |
| NameNode | NameNode | |

## Hadoop HA test

Check the active machine, then the standby machine. Now kill the NameNode process on the active machine and look at the standby's status again: the roles have switched. Bring the killed NameNode back up and check both machines' states:

```bash
hadoop-daemon.sh start namenode
```

## Starting the Hadoop HA cluster once it is installed

First start the ZooKeeper cluster and check the election. On every machine:

```bash
# start
zkServer.sh start
# check the status
zkServer.sh status
# again, a leader must be elected on a NameNode machine; if not, kill QuorumPeerMain and re-elect
```

Then start all the processes from one of the NameNode machines:

```
[root@master ~]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master slave1]
slave1: starting namenode, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-namenode-slave1.out
master: starting namenode, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-namenode-master.out
slave1: starting datanode, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-datanode-slave1.out
slave2: starting datanode, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-datanode-slave2.out
master: starting datanode, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-datanode-master.out
Starting journal nodes [master slave1 slave2]
slave1: starting journalnode, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-journalnode-slave1.out
master: starting journalnode, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-journalnode-master.out
slave2: starting journalnode, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-journalnode-slave2.out
Starting ZK Failover Controllers on NN hosts [master slave1]
slave1: starting zkfc, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-zkfc-slave1.out
master: starting zkfc, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/hadoop-root-zkfc-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/yarn-root-resourcemanager-master.out
slave1: starting nodemanager, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/yarn-root-nodemanager-slave1.out
slave2: starting nodemanager, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/yarn-root-nodemanager-slave2.out
master: starting nodemanager, logging to /usr/local/src/hadoop/hadoop-2.6.0/logs/yarn-root-nodemanager-master.out
```
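The HA test described above can also be scripted. A sketch using `hdfs haadmin` with the nn1/nn2 IDs defined in hdfs-site.xml; the kill line is illustrative and must run on the currently active machine:

```bash
hdfs haadmin -getServiceState nn1        # expect: active
hdfs haadmin -getServiceState nn2        # expect: standby

# Kill the active NameNode (run this on the active machine), then re-check:
kill -9 $(jps | awk '$2 == "NameNode" {print $1}')
hdfs haadmin -getServiceState nn2        # expect: active, after automatic failover
```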
## Standalone installation

Installing Spark on a single machine is straightforward.

### Unpack the archive

```bash
[root@master spark]# tar -zxvf spark-2.0.0-bin-hadoop2.6.tgz
[root@master spark]# ll
total 0
drwxr-xr-x. 12 500 500 193 Jul 20 2016 spark-2.0.0-bin-hadoop2.6
[root@master spark]# pwd
/usr/local/src/spark
```

### Configure the environment variables

```bash
[root@master spark]# vi ~/.bash_profile
# add to the environment variables
# Spark
export SPARK_HOME=/usr/local/src/spark/spark-2.0.0-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin:

[root@master spark]# source ~/.bash_profile
```

### Test the installation with the Spark shell

```
[root@master spark]# spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
20/01/02 15:29:53 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/01/02 15:29:55 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
20/01/02 15:29:55 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://192.168.176.61:4041
Spark context available as 'sc' (master = local[*], app id = local-1577950195083).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_221)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
```

If the console above and the web UI come up, the installation succeeded.

## Cluster installation

The cluster install builds on the standalone one; only a few configuration changes are needed.

### Edit slaves

Go into Spark's conf/ directory and copy the slaves template:

```bash
# copy the slaves template
[root@master conf]# cp slaves.template slaves
# open slaves
[root@master conf]# vi slaves
# add the worker hostnames; the name-to-IP mapping was already set up during the Hadoop cluster install
master
slave1
slave2
```

### Edit spark-env.sh

Note: the values below are sample values from a different environment (paths under /home/modules, a master named node1); substitute your own. For the cluster in this article that means SPARK_HOME=/usr/local/src/spark/spark-2.0.0-bin-hadoop2.6, HADOOP_HOME=/usr/local/src/hadoop/hadoop-2.6.0 and SPARK_MASTER_IP=master.

```bash
export JAVA_HOME=/usr/lib/jvm/jdk8u191-b12
export SCALA_HOME=/home/modules/spark-2.3.0/examples/src/main/scala
export HADOOP_HOME=/home/modules/hadoop-2.8.3
export HADOOP_CONF_DIR=/home/modules/hadoop-2.8.3/etc/hadoop
export SPARK_HOME=/home/modules/spark-2.3.0
export SPARK_DIST_CLASSPATH=$(/home/modules/hadoop-2.8.3/bin/hadoop classpath)
export LD_LIBRARY_PATH=/home/modules/hadoop-2.8.3/lib/native
export YARN_CONF_DIR=/home/modules/hadoop-2.8.3/etc/hadoop
export SPARK_MASTER_IP=node1
```

### Distribute the configuration to the other machines

Distribute Spark:

```bash
scp -r spark/ root@slave1:/usr/local/src/
scp -r spark/ root@slave2:/usr/local/src/
```

Distribute the environment variables:

```bash
scp ~/.bash_profile root@slave1:~/.bash_profile
scp ~/.bash_profile root@slave2:~/.bash_profile
```

### Start the Spark cluster

Run the start script from Spark's sbin/ directory:

```
[root@master sbin]# ./start-all.sh
org.apache.spark.deploy.master.Master running as process 14018. Stop it first.
master: org.apache.spark.deploy.worker.Worker running as process 14365. Stop it first.
slave1: org.apache.spark.deploy.worker.Worker running as process 1952. Stop it first.
slave2: org.apache.spark.deploy.worker.Worker running as process 2616. Stop it first.
```

### Startup test

If each machine shows the following processes, the cluster is up:

| master | slave1 | slave2 |
| --- | --- | --- |
| NodeManager | NodeManager | NodeManager |
| Jps | Jps | Jps |
| DataNode | DataNode | DataNode |
| Worker | Worker | Worker |
| NameNode | | |
| Master | | |
| SecondaryNameNode | | |
| ResourceManager | | |
| SparkSubmit | SparkSubmit | SparkSubmit |

Visit master:8080 to see the cluster-mode web UI.
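To confirm the cluster actually accepts work, you can submit the SparkPi example that ships with Spark. A sketch, assuming the default examples jar location and file name for Spark 2.0.0:

```bash
# Submit the bundled SparkPi example to the standalone master.
spark-submit \
  --master spark://master:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.0.0.jar 100
# The driver output should end with a line like: Pi is roughly 3.14...
```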
## Case description

Use Flume to watch a local directory; whenever a new file appears in it, the file is uploaded to HDFS automatically.

## Configure the Flume file for this job

Create it in Flume's conf/ directory, named spooldir-hdfs.properties:

```properties
# name the agent's three components
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

# configure the source component
agent1.sources.source1.type = spooldir
# the local path to watch
agent1.sources.source1.spoolDir = /home/logs/
agent1.sources.source1.fileHeader = false

# configure the interceptor
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = host
agent1.sources.source1.interceptors.i1.hostHeader = hostname

# configure the sink component
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://192.168.176.65:9000/locadir/flume_log/%y-%m-%d/%H-%M
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.maxOpenFiles = 5000
agent1.sinks.sink1.hdfs.batchSize = 100
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.writeFormat = Text
agent1.sinks.sink1.hdfs.rollSize = 102400
agent1.sinks.sink1.hdfs.rollCount = 1000000
agent1.sinks.sink1.hdfs.rollInterval = 60
#agent1.sinks.sink1.hdfs.round = true
#agent1.sinks.sink1.hdfs.roundValue = 10
#agent1.sinks.sink1.hdfs.roundUnit = minute
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in memory
agent1.channels.channel1.type = memory
agent1.channels.channel1.keep-alive = 120
agent1.channels.channel1.capacity = 500000
agent1.channels.channel1.transactionCapacity = 600

# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
```

## The job command

The command below is run from Flume's root directory:

```bash
flume-ng agent -c conf -f conf/spooldir-hdfs.properties -n agent1 -Dflume.root.logger=INFO,console
```

| Parameter | Purpose | Example |
| --- | --- | --- |
| --conf or -c | directory with the config files, including flume-env.sh and the log4j config | -c conf |
| --conf-file or -f | config file for the current job | -f conf/spooldir-hdfs.properties |
| --name or -n | agent name | -name agent1 |
| -z | ZooKeeper connection string | -z zkhost:2181,zkhost1:2181 |
| -p | path prefix for storage in ZooKeeper | -p /flume |
| -D | set a Java system property, e.g. print the Flume log to the current console | -Dflume.root.logger=INFO,console |

## Test the job

Open two windows: in one, go into the watched directory and keep an eye on the HDFS path; run the command in the other. Once the job is running, create a file in the /home/logs directory and you will see it uploaded to HDFS automatically, as in the sketch below.
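A quick end-to-end check, using the paths configured in spooldir-hdfs.properties above (the file name is arbitrary; the .COMPLETED suffix is Flume's spooldir default):

```bash
# Drop a file into the monitored directory...
echo "hello bigdataboy" > /home/logs/test.log

# ...Flume renames it once it has been consumed:
ls /home/logs            # expect: test.log.COMPLETED

# ...and the events show up under the configured HDFS path:
hdfs dfs -ls -R /locadir/flume_log
```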
## About this article

This article covers only a basic installation; there is no usage example.

Flume is a highly available, highly reliable, distributed system for collecting, aggregating and moving large volumes of log data. It can stream data from the local filesystem to HDFS in real time.

## Unpack

```bash
[root@master ~]# cd /usr/local/src/flume/
[root@master flume]# tar -zxvf apache-flume-1.6.0-bin.tar.gz
```

## Configure the environment variables

Add Flume's bin/ directory to the PATH:

```bash
[root@master flume]# vi ~/.bash_profile
# Flume
export FLUME_HOME=/usr/local/src/flume/apache-flume-1.6.0-bin
export PATH=$PATH:$FLUME_HOME/bin:

[root@master flume]# source ~/.bash_profile
```

## Edit the configuration

Go into Flume's conf/ directory:

```bash
# copy the flume-env.sh template
[root@master conf]# cp flume-env.sh.template flume-env.sh
# edit flume-env.sh
[root@master conf]# vi flume-env.sh
# add the Java path
# Enviroment variables can be set here.
# export JAVA_HOME=/usr/lib/jvm/java-6-sun
export JAVA_HOME=/usr/java/jdk1.8.0_221
# reload the flume-env.sh file
[root@master conf]# source flume-env.sh
```

Check the Flume version; if it prints, the installation succeeded. Flume has no long-running service node like Hadoop's NameNode; using Flume means submitting a job and letting it run.

```bash
# note: Flume commands start with flume-ng
[root@master conf]# flume-ng version
Flume 1.6.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080
Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015
From source with checksum b29e416802ce9ece3269d34233baf43f
```
## Hive's basic data types

They are similar to Java's.

| Hive type | Java type | Size / description |
| --- | --- | --- |
| TINYINT | byte | 1-byte integer |
| SMALLINT | short | 2-byte integer |
| INT | int | 4-byte integer |
| BIGINT | long | 8-byte integer |
| BOOLEAN | boolean | boolean value |
| FLOAT | float | single-precision float |
| DOUBLE | double | double-precision float |
| STRING | String | character string |
| TIMESTAMP | | time type |
| BINARY | | byte array |

## Hive's collection data types

| Type | Description | Syntax | Example |
| --- | --- | --- | --- |
| ARRAY | like a Java array | array<basic type> | ['Bob','bigdataboy','cn'] |
| MAP | like a Java Map: key-value pairs | map<key type, value type> | {'name':'bigdataboy'} |
| STRUCT | a composite structure | struct<> | {'province':basic type,'city':basic type} |

## Testing the collection types

### Fields and types

Four fields:

```
name     string
friends  array<string>
children map<string,int>
addr     struct<province:string, city:string>
```

### Delimiters

- fields are separated by `,` (comma)
- array<> elements are separated by `_` (underscore)
- map<> entries are separated by `:` (colon) between key and value

### Test data

```
# breakdown of one row
Bob,           first field
aa_bb,         second field: array<>
aa:12_bb:13,   third field: map<>
四川_成都       fourth field: struct<>

# full test data
Bob,aa_bb,aa:12_bb:13,四川_成都
Black,cc_dd,cc:24_dd:23,四川_泸州
```

### Create the table in Hive

```sql
create table infor(
    name string,
    friends array<string>,
    children map<string,int>,
    addr struct<province:string,city:string>
)
row format delimited
fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';
```

### Load the data into Hive

With the table created, load the file:

```sql
load data local inpath '<path to the test data file>' into table infor;
```

### Query the loaded data

```
# all rows
hive> select * from infor;
OK
Bob     ["aa","bb"]     {"aa":12,"bb":13}       {"province":"四川","city":"成都"}
Black   ["cc","dd"]     {"cc":24,"dd":23}       {"province":"四川","city":"泸州"}
Time taken: 2.4 seconds, Fetched: 2 row(s)

# access the different types
hive> select name, friends[0], children['aa'], addr.province from infor;
OK
Bob     aa      12      四川
Black   cc      NULL    四川
Time taken: 0.098 seconds, Fetched: 2 row(s)
```
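Hive's built-in collection functions also work on these types. A small sketch run through `hive -e`, against the `infor` table created above; `size`, `map_keys` and `explode` are standard Hive built-ins:

```bash
# array length and the keys of the map column
hive -e "select name, size(friends), map_keys(children) from infor;"
# one output row per array element
hive -e "select explode(friends) from infor;"
```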
The commands below are for Hive 1.2.2.

## All the shell options

```
[root@master ~]# hive -help
usage: hive
 -d,--define <key=value>          Variable subsitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable subsitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the console)
```

## Common command 1: -e

Runs an HQL statement without opening the Hive shell:

```
[root@master ~]# hive -e 'select * from student'
Logging initialized using configuration in jar:file:/usr/local/src/hive/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
OK
1       aa
2       bb
3       cc
Time taken: 12.088 seconds, Fetched: 3 row(s)
```

## Common command 2: -f

Runs the HQL in a file without opening the Hive shell; `>` redirects the query result into a file:

```
[root@master ~]# hive -f stu.hql > stu_res.txt
Logging initialized using configuration in jar:file:/usr/local/src/hive/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
OK
Time taken: 12.065 seconds, Fetched: 3 row(s)
[root@master ~]# ll
total 16
-rw-------. 1 root root 1340 Nov  4 22:29 anaconda-ks.cfg
-rw-r--r--. 1 root root   23 Dec 28 17:22 stu.hql
-rw-r--r--. 1 root root   15 Dec 28 17:25 stu_res.txt
[root@master ~]# cat stu_res.txt
1       aa
2       bb
3       cc
```

## Common interactive commands

Quitting:

- `exit;` implicitly commits data first, then exits
- `quit;` exits without committing

Browse an HDFS directory:

```
hive> dfs -ls /;
Found 2 items
drwx-wx-wx   - root supergroup          0 2019-12-26 16:46 /tmp
drwxr-xr-x   - root supergroup          0 2019-12-28 16:13 /user
```

Browse a local directory:

```
hive> !ls /root;
anaconda-ks.cfg
student.txt
stu.hql
stu_res.txt
```

View the full history of commands entered in Hive; in the current user's home directory:

```bash
cat .hivehistory
```
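The `--hiveconf` option from the table above is handy for one-off settings. For example, printing column headers in a non-interactive query; `hive.cli.print.header` is a standard Hive CLI property:

```bash
# Same query as command 1, but with a header row in the output.
hive --hiveconf hive.cli.print.header=true -e 'select * from student;'
```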
## 1. Problem

```
[ERROR] Terminal initialization failed; falling back to unsupported
```

### Fix

Delete the file $HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar, then start Hive again.

## 2. Problem

```
[root@master conf]# schematool -dbType mysql -initSchema
Metastore connection URL:        jdbc:mysql://192.168.176.65:3306/hive
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       root
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
*** schemaTool failed ***
[root@master conf]# cd ..
```

### Fix

Enable remote access on MySQL, then restart MySQL.
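For reference, the jline cleanup from problem 1 as a single command (path exactly as described above, assuming $HADOOP_HOME is set):

```bash
rm $HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar
```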
## Versions

| Name | Details |
| --- | --- |
| Hive | 1.2.2 (download) |
| MySQL driver | connector-java-5.1.48 (download) |
| MySQL installation guide | link |

## Unpack the Hive tar

```bash
# go to the src directory
cd /usr/local/src/
# create the hive directory
mkdir hive
# unpack the tar into it
tar -zxvf apache-hive-1.2.2-bin.tar.gz -C hive/
```

## Configure the environment variables

```bash
# edit the environment variables
vi ~/.bash_profile
# add:
# Hive
export HIVE_HOME=/usr/local/src/hive/apache-hive-1.2.2-bin
export PATH=$HIVE_HOME/bin:$PATH:
# reload
source ~/.bash_profile
```

## Edit the Hive configuration

```bash
# inside conf
[root@master conf]# ll
total 188
-rw-rw-r--. 1 root root   1139 Apr 30 2015 beeline-log4j.properties.template
-rw-rw-r--. 1 root root 168431 Jun 19 2015 hive-default.xml.template
-rw-rw-r--. 1 root root   2378 Apr 30 2015 hive-env.sh.template
-rw-rw-r--. 1 root root   2662 Apr 30 2015 hive-exec-log4j.properties.template
-rw-rw-r--. 1 root root   3050 Apr 30 2015 hive-log4j.properties.template
-rw-rw-r--. 1 root root   1593 Apr 30 2015 ivysettings.xml

# copy the hive-env.sh template
[root@master conf]# cp hive-env.sh.template hive-env.sh
# edit hive-env.sh
[root@master conf]# vi hive-env.sh

# add the HADOOP_HOME path
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/usr/local/src/hadoop/hadoop-2.9.2

# since the Hive configuration was modified, point Hive at the directory holding it
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/usr/local/src/hive/apache-hive-1.2.2-bin/conf

# save, quit, and reload hive-env.sh
[root@master conf]# source hive-env.sh
```

## Try entering Hive, then exit

One file has to be deleted first: $HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar.

```
[root@master conf]# hive
Logging initialized using configuration in jar:file:/usr/local/src/hive/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
hive> exit;
[root@master conf]#
```

## MySQL configuration

Do the following inside MySQL, and remember to restart the database afterwards.

1. Enable remote connections:

```sql
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '<password>';
mysql> flush privileges;
```

2. Create a database:

```sql
mysql> create database hive;
Query OK, 1 row affected (0.00 sec)
```

## Point Hive's metastore at MySQL

Copy a template in $HIVE_HOME's conf directory. If you skip the template, you can create hive-site.xml by hand; just remember to wrap the properties in a <configuration> tag.

```bash
cp hive-default.xml.template hive-site.xml
```

```xml
<!-- MySQL address -->
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://<MySQL IP>:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    <description>JDBC connect string for a JDBC metastore</description>
</property>
<!-- MySQL driver -->
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
</property>
<!-- user name -->
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
</property>
<!-- password -->
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>Aa@12345678</value>
    <description>password to use against metastore database</description>
</property>
<!-- needed when starting from the copied template -->
<property>
    <name>system:java.io.tmpdir</name>
    <value>/usr/local/src/hive/tmpdir</value>
</property>
<property>
    <name>system:user.name</name>
    <value>hive</value>
</property>
```

## Initialize the Hive database

First drop the MySQL driver jar into Hive's lib directory.

```
[root@master conf]# schematool -dbType mysql -initSchema
Metastore connection URL:        jdbc:mysql://192.168.176.65:3306/hive
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       root
Starting metastore schema initialization to 1.2.0
Initialization script hive-schema-1.2.0.mysql.sql
Initialization script completed
schemaTool completed    (initialization finished)
[root@master conf]#
```

## Start Hive

```
[root@master hive]# hive
Logging initialized using configuration in jar:file:/usr/local/src/hive/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
hive>
```

Recommended reading: common errors when setting up Hive.
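Once `schematool` succeeds, one way to double-check the metastore is to list its tables in MySQL. A sketch, using the host and credentials from hive-site.xml above:

```bash
mysql -h 192.168.176.65 -u root -p -e 'use hive; show tables;'
# expect metastore tables such as DBS, TBLS, COLUMNS_V2, ...
```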
## 1. Change the root user's Host to % in the user table

```
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
4 rows in set (0.00 sec)

mysql> use mysql;
mysql> select Host, User from user;
+-----------+---------------+
| Host      | User          |
+-----------+---------------+
| localhost | mysql.session |
| localhost | mysql.sys     |
| localhost | root          |
+-----------+---------------+
3 rows in set (0.00 sec)

mysql> update user set Host='%' where User='root';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select Host, User from user;
+-----------+---------------+
| Host      | User          |
+-----------+---------------+
| %         | root          |
| localhost | mysql.session |
| localhost | mysql.sys     |
+-----------+---------------+
3 rows in set (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.79 sec)

mysql>
```

## 2. Change it directly with one command

```sql
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '<password>';
mysql> flush privileges;
```

The flush step is mandatory; without it the change will not take effect. This statement reloads the grant tables from the mysql database.
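A simple way to verify the change is to connect from any other machine that can reach the server. A sketch; substitute your server's IP:

```bash
mysql -h 192.168.176.65 -P 3306 -u root -p -e 'select 1;'
# a returned result row means remote root access now works
```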
## Versions

| Name | Details |
| --- | --- |
| MySQL | 5.7 (download) |
| Platform | CentOS 7 |

## Install the MySQL database

Check whether CentOS already has MySQL or MariaDB installed:

```bash
rpm -qa | grep -i mysql
# -qa : query all packages
# grep: filter
# -i  : case-insensitive
```

If either is installed, remove it:

```bash
[root@master bin]# rpm -qa | grep mariadb
mariadb-libs-5.5.64-1.el7.x86_64
# remove the mariadb package
yum remove mariadb-libs-5.5.64-1.el7.x86_64
```

## Install the MySQL components

The components below depend on one another, so install them in exactly this order; a wrong order produces dependency errors:

```bash
rpm -ivh mysql-community-common-5.7.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-libs-5.7.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-client-5.7.28-1.el7.x86_64.rpm
yum install perl
yum install net-tools
rpm -ivh mysql-community-server-5.7.28-1.el7.x86_64.rpm
```

## Start the MySQL service

The initial password is not generated until the service starts:

```bash
systemctl restart mysqld
```

## Look up the random password

```bash
# option 1:
cat /var/log/mysqld.log
# option 2:
grep password /var/log/mysqld.log
```

## Change the password

Log in to the database first. The password strength policy requires upper- and lowercase letters, special characters and digits, at least 8 characters long.

```sql
SET PASSWORD = PASSWORD('<new password>');
```

## Relax the password strength policy

```sql
-- view the password policy
SHOW VARIABLES LIKE "%password%";
-- set the minimum password length to 5
SET GLOBAL validate_password_length=5;
-- policy 0 drops the upper/lowercase letter and symbol requirements
SET GLOBAL validate_password_policy=0;
```
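Putting the last two steps together, a sketch of the first login:

```bash
# copy the generated temporary password from the log...
grep 'temporary password' /var/log/mysqld.log
# ...log in with it, then immediately change it with SET PASSWORD
mysql -u root -p
```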
## Installing MySQL-server fails with file conflicts

```
[root@master mysql]# rpm -ivh MySQL-server-5.6.24-1.el6.x86_64.rpm
Preparing...                 ################################# [100%]
	file /usr/share/mysql/charsets/README from install of MySQL-server-5.6.24-1.el6.x86_64 conflicts with file from package mariadb-libs-1:5.5.64-1.el7.x86_64
	file /usr/share/mysql/czech/errmsg.sys from install of MySQL-server-5.6.24-1.el6.x86_64 conflicts with file from package mariadb-libs-1:5.5.64-1.el7.x86_64
	file /usr/share/mysql/danish/errmsg.sys from install of MySQL-server-5.6.24-1.el6.x86_64 conflicts with file from package mariadb-libs-1:5.5.64-1.el7.x86_64
	file /usr/share/mysql/dutch/errmsg.sys from install of MySQL-server-5.6.24-1.el6.x86_64 conflicts with file from package mariadb-libs-1:5.5.64-1.el7.x86_64
	file /usr/share/mysql/english/errmsg.sys from install of MySQL-server-5.6.24-1.el6.x86_64 conflicts with file from package mariadb-libs-1:5.5.64-1.el7.x86_64
	......
```

The cause is that the system already has another version's mysql-libs package installed, which is incompatible. Check what was installed earlier, then remove the old lib package:

```bash
# inspect the previously installed mysql packages
yum list | grep mysql
# remove mysql-libs
yum remove mysql-libs
```

## No temporary password after installing MySQL

On CentOS 7, after installing MySQL 5.7 via yum, no root password can be found. The cause is data left over from a previous MySQL installation.

```bash
# step 1: remove the leftover files
[root@master ~] rm -rf /var/lib/mysql
# step 2: restart the MySQL service
[root@master ~] systemctl restart mysqld
# look up the password
[root@master ~] grep 'temporary password' /var/log/mysqld.log
```
## Installation steps

### Mount

The download is an ISO image, so it has to be mounted first.

### Install

Choose Custom installation, pick the software you need, click Install Now, and wait for the installation to finish (this can take a while).

### Download the activation tool

Close your antivirus software before using it.

Download link: click here. Habo virus-scan report. Activate and verify.

### Version details

**Office 2016 Pro Plus, 64-bit**

- File name: SW_DVD5_Office_Professional_Plus_2016_64Bit_ChnSimp_MLF_X20-42426.ISO
- File size: 1123452928 bytes
- MD5: 60DC8B1892F611E41140DD3631F39793
- SHA1: AEB58DE1BC97685F8BC6BFB0A614A8EF6903E318
- CRC32: 8D8AC6D1
- Download: SW_DVD5_Office_Professional_Plus_2016_64Bit_ChnSimp_MLF_X20-42426.ISO (1.05 GB)
- Baidu download link

**Office 2016 Pro Plus, 32-bit**

- File name: SW_DVD5_Office_Professional_Plus_2016_W32_ChnSimp_MLF_X20-41351.ISO
- File size: 986441728 bytes
- MD5: 49D97BD1B4DFEAAA6B45E3DD3803DAC1
- SHA1: 0218F50774AAB63AF7755B0986CDB9972B853E44
- CRC32: FF96B0B5
- Download: SW_DVD5_Office_Professional_Plus_2016_W32_ChnSimp_MLF_X20-41351.ISO (940.74 MB)