明非 - Big Data Boy - Notes from the road of learning to program

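Uncategorized

Problem description

After editing your environment variables and running source, you find that no command can be found any more:

    [root@master]# hive
    -bash: hive: command not found
    [root@master]# vi ~/.bash_profile
    -bash: vi: command not found
    [root@master]# vim ~/.bash_profile
    -bash: vim: command not found

Solution

1. At the command line, enter:

    export PATH=/usr/bin:/usr/sbin:/bin:/sbin:/usr/X11R6/bin

   This makes the standard commands usable again for the current session. Do not close the terminal after running it; continue with the next step.
2. Enter vi ~/.bash_profile to open the environment variable configuration.
3. Check the PATH setting carefully for mistakes (a misspelled name or a wrong punctuation mark are the usual culprits), fix it, then save and quit with :wq or Shift+ZZ.
4. Run source ~/.bash_profile to apply the corrected configuration.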

Fixing "command not found" errors on CentOS 7 caused by editing environment variables

2019-11-18 2005 0

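Uncategorized

The Fibonacci sequence

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...

Each number is the sum of the two numbers before it.

Suppose we need a hundred million or even a billion of these numbers. Generating them all up front and storing them in a list() keeps every value in memory at once. With an iterator we only define how the next value is produced, and values are generated at the moment they are used, which saves memory.

Fibonacci example with a list

This version needs a list to store the generated numbers.

    maxCount = 100000  # how many numbers to generate
    count = 0          # how many have been generated so far
    startOne = 0       # first starting value of the sequence
    startTwo = 1       # second starting value of the sequence
    fibonacciList = list()

    while count < maxCount:
        fibonacciList.append(startOne)
        startOne, startTwo = startTwo, startOne + startTwo
        count += 1

    for num in fibonacciList:
        print(num)

Using an iterator

This version does not build a list at all.

    class Fibonacci(object):

        def __init__(self, allNum):
            self.allNum = allNum  # how many numbers to generate
            self.count = 0        # how many have been generated so far
            self.startOne = 0     # first starting value of the sequence
            self.startTwo = 1     # second starting value of the sequence

        def __iter__(self):
            return self

        def __next__(self):
            if self.count < self.allNum:
                # remember the current Fibonacci number
                fiboNum = self.startOne
                # compute the next pair
                self.startOne, self.startTwo = self.startTwo, self.startOne + self.startTwo
                # count this value
                self.count += 1
                return fiboNum
            else:
                # raise StopIteration to end the loop
                raise StopIteration

    # pass in how many numbers to generate
    fibo = Fibonacci(10000)

    # the object can be used directly in a for loop
    for num in fibo:
        print(num)

Related post: A brief look at how Python iterators work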
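The iterator class in this post can also be expressed as a generator function, which Python turns into an iterator for you. The following is only a minimal sketch and not the post's own code: the function name fibonacci and the use of itertools.islice are choices made for this illustration.

```python
from itertools import islice


def fibonacci(count):
    """Yield the first `count` Fibonacci numbers, starting from 0 and 1."""
    first, second = 0, 1
    for _ in range(count):
        yield first  # hand back one value, then pause until the next request
        first, second = second, first + second


# Values are produced only when requested, so asking for a billion numbers
# costs nothing until they are actually consumed.
lazy = fibonacci(1_000_000_000)
first_ten = list(islice(lazy, 10))
print(first_ten)
```

Only the ten requested values are ever computed here; the remaining ones are never generated, which is the memory saving the post describes.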

Python iterator example: generating the Fibonacci sequence

2019-11-9 1607 0

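Uncategorized

Iteration

Objects that can be used in a for loop, such as list, str and tuple, are iterable. How can we make our own objects work the same way?

    nums = [12, 13, 14]
    strs = "python"
    tuples = (1, 2, 3, 4, 5)

    for num in nums:
        print(num)

What makes an object iterable

The object needs an __iter__() method.

    class Classmate(object):
        # having this method makes the class iterable
        def __iter__(self):
            pass

Add a list and an add method to the Classmate class:

    class Classmate(object):
        def __init__(self):
            # storage
            self.namesList = list()

        def add(self, other):
            # append to the list
            self.namesList.append(other)

        # having this method makes the class iterable
        def __iter__(self):
            pass

Iterators

An object that has both __iter__() and __next__() is called an iterator. For the iterable to actually hand back data, an iterator's __next__() method is required.

Define an iterator class:

    class ClassIterator(object):

        def __iter__(self):
            pass

        def __next__(self):
            pass

A complete, working iterator

The iterable's __iter__() method must return an iterator object.

    # the iterable
    class Classmate(object):
        def __init__(self):
            self.namesList = list()

        def add(self, other):
            self.namesList.append(other)

        # having this method makes the class iterable
        def __iter__(self):
            # must return an iterator object
            # pass self to the iterator so it can read self.namesList
            return ClassIterator(self)

    # the iterator
    class ClassIterator(object):

        def __init__(self, obj):
            self.nameList = obj.namesList
            # how many items have been returned so far
            self.count = 0

        def __iter__(self):
            # an iterator returns itself
            return self

        def __next__(self):
            if self.count < len(self.nameList):
                name = self.nameList[self.count]
                self.count += 1
                return name
            else:
                # raise StopIteration to end the loop
                raise StopIteration

    # main program
    classmate = Classmate()  # create the object
    classmate.add("小小")
    classmate.add("大大")
    classmate.add("中中")

    # loop over the object
    for name in classmate:
        print(name)

Merging the iterable and the iterator

The iterable and the iterator are two classes here. Can they be merged into one?

    class Classmate(object):
        def __init__(self):
            self.namesList = list()
            self.count = 0

        def add(self, other):
            self.namesList.append(other)

        # having this method makes the class iterable
        def __iter__(self):
            # must return an iterator object;
            # the object now acts as its own iterator
            return self

        def __next__(self):
            if self.count < len(self.namesList):
                name = self.namesList[self.count]
                self.count += 1
                return name
            else:
                # raise StopIteration to end the loop
                raise StopIteration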
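To make the roles of __iter__() and __next__() concrete, here is a rough sketch of what a for loop does with any iterable. The helper name manual_for is made up for this illustration; iter(), next() and StopIteration are the standard Python built-ins.

```python
def manual_for(iterable):
    """Collect items the way a for loop would: iter() once, next() until StopIteration."""
    iterator = iter(iterable)      # calls iterable.__iter__()
    collected = []
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__()
        except StopIteration:      # the signal that the data is exhausted
            break
        collected.append(item)
    return collected


print(manual_for([12, 13, 14]))
print(manual_for("python"))
```

Any object whose __iter__() returns something with a __next__() method, including the Classmate class from the post, can be passed to this helper in the same way.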

A brief look at how Python iterators work

2019-11-9 1071 0

Uncategorized

About CrawlSpider

A plain Spider already covers many crawling tasks, but CrawlSpider is built for crawling an entire site.

Create the CrawlSpider project

    scrapy startproject wxapp

Create the CrawlSpider spider

    scrapy genspider -t crawl [spider name] [domain of the site to crawl]
    scrapy genspider -t crawl wxapp_spider wxapp-union.com

Edit settings.py

    # Set the user agent
    # Crawl responsibly by identifying yourself (and your website) on the user-agent
    USER_AGENT = ''

    # Turn off robots.txt compliance
    # Obey robots.txt rules
    ROBOTSTXT_OBEY = False

    # Set a download delay
    # Configure a delay for requests for the same website (default: 0)
    # See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
    # See also autothrottle settings and docs
    DOWNLOAD_DELAY = 1

    # Set the default headers
    # Override the default request headers:
    DEFAULT_REQUEST_HEADERS = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en',
    }

    # Enable the pipeline that writes items to a file
    # Configure item pipelines
    # See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
    ITEM_PIPELINES = {
        'wxapp.pipelines.WxappPipeline': 300,
    }

Define the Item

Open items.py:

    class WxappItem(scrapy.Item):
        # define the fields for your item here like:
        # name = scrapy.Field()

        # article title
        title = scrapy.Field()
        # article content
        article = scrapy.Field()

Write the rules for list pages and article links

Go to the /spiders directory. The spider class in wxapp_spider.py inherits from CrawlSpider, and several Rule() entries can be defined in rules = ().

Rule() parameters:

LinkExtractor(allow=r'pattern'): the pattern used to match links on a page; regular expressions are supported.
callback='function name': the method to run for each matched link.
follow=True: whether to keep matching links on the pages that were matched (this is what makes pagination work).

    name = 'wxapp_spider'
    allowed_domains = ['www.wxapp-union.com']
    start_urls = ['http://www.wxapp-union.com/portal.php?mod=list&catid=1&page=1']

    rules = (
        # Rule for pagination links (several Rule() entries can be defined)
        Rule(
            # pattern for pagination links (regular expressions are supported)
            LinkExtractor(allow=r'.+mod=list&catid=1&page=\d+'),
            # no callback here: list pages are only followed
            follow=True
        ),
        # Rule for article pages
        Rule(
            # pattern for article links
            LinkExtractor(allow=r'http://www.wxapp-union.com/article-(.*)-(.*)\.html'),
            # method that parses a matched article page
            callback='parse_article',
            follow=False,
        )
    )

Add the article parsing method parse_article()

Import from wxapp.items import WxappItem.

    def parse_item(self, response):
        pass
        # print(response.url)

    def parse_article(self, response):
        title = response.xpath('//div[@class="hhmcl"]/div/h1/text()').get()
        articleList = response.xpath('//td[@id="article_content"]//text()').getall()
        # join the returned list and strip leading and trailing whitespace
        article = "".join(articleList).strip()
        yield WxappItem(title=title, article=article)

Store the crawled data

Open pipelines.py and import JsonLinesItemExporter from scrapy.exporters.

    from scrapy.exporters import JsonLinesItemExporter

    class WxappPipeline(object):

        def open_spider(self, spider):
            # open the file in binary mode
            self.ft = open('articleInfo.json', 'wb')
            self.exporter = JsonLinesItemExporter(self.ft, ensure_ascii=False, encoding='utf-8')

        def close_spider(self, spider):
            self.ft.close()

        def process_item(self, item, spider):
            self.exporter.export_item(item)
            return item

Run the spider

    scrapy crawl wxapp_spider
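Before running a crawl, the two link patterns from this post can be checked with the standard re module. The sketch below assumes that LinkExtractor tests each allow pattern with search semantics; the sample URLs and the classify helper are hypothetical and exist only for this illustration.

```python
import re

# Patterns copied from the spider's rules.
PAGE_PATTERN = r'.+mod=list&catid=1&page=\d+'
ARTICLE_PATTERN = r'http://www.wxapp-union.com/article-(.*)-(.*)\.html'

# Hypothetical sample URLs, made up for this check.
SAMPLE_URLS = [
    'http://www.wxapp-union.com/portal.php?mod=list&catid=1&page=2',
    'http://www.wxapp-union.com/article-5510-1.html',
    'http://www.wxapp-union.com/forum.php',
]


def classify(url):
    """Report which rule, if any, would pick up this URL."""
    if re.search(PAGE_PATTERN, url):
        return 'list page'
    if re.search(ARTICLE_PATTERN, url):
        return 'article'
    return 'ignored'


for url in SAMPLE_URLS:
    print(classify(url), url)
```

A quick check like this catches a pattern that matches too much or too little before any request is sent.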

Scrapy framework, CrawlSpider example: crawling articles from the mini program community (小程序社区)

2019-11-8 1523 0

Uncategorized

0x00 Create the crawler project

Go to the folder where the crawler should live and run:

    scrapy startproject bigdataboy

0x01 Create the spider

Go into the project directory and run the command below to create the first spider:

    scrapy genspider bigdataboy_spider bigdataboy.cn

0x02 Configure the crawler

Open settings.py and uncomment the following settings:

    # Enable the item pipeline
    # Configure item pipelines
    # See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
    ITEM_PIPELINES = {
        'bigdataboy.pipelines.BigdataboyPipeline': 300,
    }

    # Add a 'User-Agent'
    # Crawl responsibly by identifying yourself (and your website) on the user-agent
    USER_AGENT = ''

0x03 Write the page parser

Open bigdataboy_spider.py. It contains a def parse(self, response): method. The response argument is the response Scrapy received for the page. It is a scrapy.http.response.html.HtmlResponse object, so xpath, css and similar selectors can be used to extract data.

    # import the item model
    from bigdataboy.items import BigdataboyItem

    def parse(self, response):
        # parse the page with xpath
        articleUrl = response.xpath('//div[@class="item-desc"]/a/@href').extract()
        # pass the data on through the fields defined on the item
        item = BigdataboyItem(url=articleUrl)
        yield item

        # get the link to the next page
        nextUrl = response.xpath('//*[@id="pagenavi"]/div/ol//*[@class="next"]/a/@href').get()
        # print(nextUrl)
        if nextUrl:
            # request the next page; it is handled by parse again
            yield scrapy.Request(nextUrl)
        else:
            return

0x04 Define the item model

Open items.py:

    class BigdataboyItem(scrapy.Item):
        # define the fields for your item here like:
        # name = scrapy.Field()
        url = scrapy.Field()  # custom field

0x05 Export the data

Open pipelines.py:

    from scrapy.exporters import JsonItemExporter

    class BigdataboyPipeline(object):

        def __init__(self):
            # the file must be opened in binary mode because the exporter writes bytes
            self.openFile = open("url.json", "wb")
            # pass in the open file object
            self.exporter = JsonItemExporter(self.openFile)
            # get ready to export
            self.exporter.start_exporting()

        # called when the spider starts
        def open_spider(self, spider):
            print("Spider started")

        def process_item(self, item, spider):
            # export the item
            self.exporter.export_item(item)
            return item

        # called when the spider finishes
        def close_spider(self, spider):
            # finish exporting
            self.exporter.finish_exporting()
            # close the file
            self.openFile.close()
            print("Spider finished")

0x06 Run the spider

Run the command in PyCharm's Terminal:

    scrapy crawl bigdataboy_spider

Then check the output.

Simple Scrapy example: crawling the links on this site's home page

2019-11-3 1327 0

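Uncategorized

Background

A crawler can simulate a login so that we can look up our own grades. The key hurdle is the login itself; once past it, checking grades is easy. Data must still only be collected within what the law allows. The code below has been anonymized and does not target any organization.

Analysis

Packet capture shows that logging in requires submitting the student number, the password, the captcha and VIEWSTATE. Further inspection shows that the VIEWSTATE parameter is embedded in the page itself, so it can be extracted with a regular expression.

Project structure

    import re
    import requests

    class API(object):
        ...

    class Tool(object):
        ...

The API class

Packet capture reveals the following endpoints:

    class API(object):
        # login page
        GET_INDEX = "http://XXXXXXXX:XXXX/"
        # captcha image
        GET_YZM_URL = 'http://XXXXXXXX:XXXX/CheckCode.aspx'
        # login
        POST_LOGIN = 'http://XXXXXXXX:XXXX/default2.aspx'

The Tool class

    class Tool(object):
        session = requests.session()
        VIEWSTATE = ""

        # fetch the VIEWSTATE parameter
        @classmethod
        def getHtml(cls):
            response = cls.session.get(API.GET_INDEX).text
            cls.VIEWSTATE = re.search(r'__VIEWSTATE" value="(.*?)" />', response).group(1)

        # download the captcha into the current directory
        @classmethod
        def download_yzm(cls):
            yzm_image = cls.session.get(url=API.GET_YZM_URL)
            with open("yzm.jpg", 'wb') as file:
                file.write(yzm_image.content)

        # log in
        @classmethod
        def login(cls, account, pwd, yzm):
            data = {
                "__VIEWSTATE": cls.VIEWSTATE,
                "TextBox1": account,
                "TextBox2": pwd,
                "TextBox3": yzm,
                "RadioButtonList1": "%D1%A7%C9%FA",  # URL-encoded GBK bytes of 学生 (student)
                "Button1": "",
            }
            response = cls.session.post(url=API.POST_LOGIN, data=data)
            response.encoding = response.apparent_encoding
            response = response.text
            try:
                message = re.search(r">alert\('(.*?)'\);</script>", response).group(1)
            except AttributeError:
                # no alert found: login succeeded and we were redirected to the detail page
                xm = re.search(r'<span id="xhxm">(.*?)同学</span>', response).group(1)
                print("Welcome, " + xm + ", login succeeded")
            else:
                # print the login failure message
                print(message)

Main function

    def main():
        t = Tool()        # create the tool object
        t.getHtml()       # fetch VIEWSTATE
        t.download_yzm()  # download the captcha
        account = input("Enter your student number: ")
        pwd = input("Enter your password: ")
        yzm = input("Enter the captcha: ")
        t.login(account, pwd, yzm)  # run the login

    if __name__ == "__main__":
        main()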
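The VIEWSTATE extraction step can be tried offline on a small piece of HTML. This is only a sketch under stated assumptions: the SAMPLE_HTML markup and its value are invented for the illustration, and the real login page may lay out the hidden field differently. The helper name extract_viewstate is also hypothetical.

```python
import re

# Capture the value attribute of the hidden __VIEWSTATE field.
VIEWSTATE_PATTERN = re.compile(r'name="__VIEWSTATE" value="(.*?)"')

# Invented sample markup; the real page is not reproduced here.
SAMPLE_HTML = (
    '<form method="post" action="default2.aspx">'
    '<input type="hidden" name="__VIEWSTATE" value="dDwxNTMx" />'
    '</form>'
)


def extract_viewstate(html):
    """Return the VIEWSTATE value, or None when the field is absent."""
    match = VIEWSTATE_PATTERN.search(html)
    return match.group(1) if match else None


print(extract_viewstate(SAMPLE_HTML))
print(extract_viewstate("<html></html>"))
```

Returning None instead of calling .group(1) directly avoids an AttributeError when the page changes or the request fails, which makes the failure easier to report.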

Python crawler project: simulating a login to the Zhengfang (正方) system

2019-10-28 1463 0

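Uncategorized

1. Open File -> Project Structure.
2. Select Artifacts.
3. One thing to watch here: change the path under "Directory for META-INF/MANIFEST.MF". After you choose the Main Class, it defaults to ..../src/... and needs to be changed to .../src.
4. Finally, build the Jar: click Build -> Build Artifacts…, then click Build to generate the Jar.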

How to build a Jar with IDEA

2019-10-28 1350 0

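Uncategorized

Packages to import

Be sure to import the new-API classes, the ones under the mapreduce package.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    import java.io.IOException;

Structure of the MapReduce program

    public class WordCountApp {
        // the map class that splits lines
        public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {......}
        // the Reducer class that merges results
        public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {......}
        // the driver (main method)
        public static void main(String[] args) throws Exception {......}
    }

map class: splits each line that is read into words and emits key-value pairs.
Reducer class: takes the key-value pairs produced by the split, sums the values for each key and writes the result.
Driver: configures and submits the MapReduce job: the input path, the map class, the Reducer class and the output path.

The map step in detail

The four generic parameters of Mapper: the first two are the map input types, the last two are the map output types.

LongWritable: the offset of the start of the line in the input (the counterpart of Java's Long)
Text: the content of each input line (the counterpart of Java's String)
Text: the key type of the pairs produced by the split
LongWritable: the value type of the pairs produced by the split

    public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        /**
         * map parameters:
         * LongWritable key: offset of the line in the input
         * Text value: the line that was read
         * Context context: the job context
         */
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // convert the received line to a Java String
            String line = value.toString();
            // split the line on spaces; returns an array of strings
            String[] words = line.split(" ");
            // emit each word as a (key, value) pair with context.write(key, value)
            for (String word : words) {
                // the output types were declared above as Text and LongWritable
                context.write(new Text(word), new LongWritable(1));
            }
        }
    }

The Reducer step in detail

The four generic parameters of Reducer: the first two are the Reducer input types, the last two are the Reducer output types.

Text: the map output key type, which is the Reducer input key type
LongWritable: the map output value type, which is the Reducer input value type
Text: the key type of the pairs after merging
LongWritable: the value type of the pairs after merging

    public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        /**
         * reduce parameters:
         * Text key: the key of the pairs produced by the map step
         * Iterable<LongWritable> values: one key may have many values after the map step, hence an iterable
         * Context context: the job context
         */
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
            // all that is needed here is to sum the values
            long sum = 0;
            for (LongWritable value : values) {
                // convert LongWritable to a Java long and add it up
                sum += value.get();
            }
            // write the final count through the context
            context.write(key, new LongWritable(sum));
        }
    }

The driver (main method)

    public static void main(String[] args) throws Exception {
        // create a Configuration object (the one from the hadoop package)
        Configuration configuration = new Configuration();

        // create a Job; any exception is simply propagated
        Job job = Job.getInstance(configuration, "wordCount");

        // set the class that contains the job
        job.setJarByClass(WordCountApp.class);

        // set the input path; it is passed in as a command-line argument
        FileInputFormat.setInputPaths(job, new Path(args[0]));

        // configure the map side
        job.setMapperClass(MyMapper.class);              // the Mapper class
        job.setMapOutputKeyClass(Text.class);            // map output key type
        job.setMapOutputValueClass(LongWritable.class);  // map output value type

        // configure the reduce side
        job.setReducerClass(MyReducer.class);            // the Reducer class
        job.setOutputKeyClass(Text.class);               // reducer output key type
        job.setOutputValueClass(LongWritable.class);     // reducer output value type

        // set the output path; it is passed in as a command-line argument
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // submit the job; true prints progress while waiting for completion
        boolean b = job.waitForCompletion(true);

        // exit with status 0 on success, 1 otherwise
        System.exit(b ? 0 : 1);
    }

Upload to hadoop and run

First package the program as a jar (see the post "How to build a Jar with IDEA").

The hadoop command for running a jar:

    hadoop jar <jar name> <input path> <output path>
    hadoop jar hadooptrain.jar hdfs://192.168.5.137:9000/words.txt hdfs://192.168.5.137:9000/output

Execution result (the output is not included in this excerpt).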
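The data flow of the word count job (map, group by key, reduce) can be followed on a tiny input without a cluster. The sketch below is a plain Python simulation written for this illustration, not Hadoop code: the function names are made up, and the explicit shuffle function stands in for the grouping that the framework performs between the two phases.

```python
from collections import defaultdict


def map_line(line):
    """Map phase: emit a (word, 1) pair for every word on the line."""
    return [(word, 1) for word in line.split(" ") if word]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_key(key, values):
    """Reduce phase: sum all the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in order and return {word: count}."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return dict(reduce_key(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["hello hadoop", "hello world"])
print(counts)
```

Here the map phase emits ("hello", 1) twice, the grouping step turns that into "hello" with the values [1, 1], and the reduce phase sums them to 2, mirroring what MyMapper and MyReducer do with LongWritable values.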

MapReduce word count example

2019-10-28 1138 0

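Uncategorized

Access modifier table

                                   public   protected   (default, none written)   private
    Same class                     YES      YES         YES                       YES
    Same package                   YES      YES         YES                       NO
    Subclass in another package    YES      YES         NO                        NO
    Non-subclass, another package  YES      NO          NO                        NO

YES: directly accessible
NO: not directly accessible

    public int num;         // public
    protected int numA;     // protected
    int numB;               // default (no modifier written)
    private int numC;       // private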

Java's four access modifiers

2019-10-26 1170 0

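Uncategorized

Background

Computers generally support USB devices. A mouse, a keyboard and other devices can be connected over USB, and each device has its own specific operation, such as clicking for a mouse or typing for a keyboard.

Analysis

Main class: drives the example and has the computer use a USB mouse and a USB keyboard.
USB interface: declares opening and closing a device.
Computer class: can power on, power off and use a USB device.
Mouse: implements the USB interface and adds a click method.
Keyboard: implements the USB interface and adds an input method.

Implementation

The USB interface:

    public interface USB {
        // open the device
        public abstract void open();
        // close the device
        public abstract void close();
    }

The Computer class:

    public class Computer {
        // power on
        public void on() {
            System.out.println("开启电脑");
        }
        // power off
        public void off() {
            System.out.println("关闭电脑");
        }
        // use a device
        public void useDevice(USB use) {
            use.open();  // open the USB device
            // call the device-specific method
            if (use instanceof Mouse) {  // check whether the object is actually a Mouse
                Mouse useMouse = (Mouse) use;  // downcast
                useMouse.click();
            } else if (use instanceof Keyboard) {
                ((Keyboard) use).input();  // downcast
            }
            use.close();  // close the USB device
        }
    }

The Mouse class:

    public class Mouse implements USB {
        @Override
        public void open() {
            System.out.println("打开鼠标");
        }
        @Override
        public void close() {
            System.out.println("关闭鼠标");
        }
        // mouse-specific operation
        public void click() {
            System.out.println("鼠标点击了");
        }
    }

The Keyboard class:

    public class Keyboard implements USB {
        // open the keyboard
        @Override
        public void open() {
            System.out.println("键盘打开");
        }
        @Override
        public void close() {
            System.out.println("键盘关闭");
        }
        // keyboard-specific operation
        public void input() {
            System.out.println("正在输入中......");
        }
    }

The main class:

    public class MainDemo {
        public static void main(String[] args) {
            // create the computer
            Computer computer = new Computer();
            computer.on();  // power on

            // use a USB mouse
            USB mouse = new Mouse();  // polymorphic declaration
            computer.useDevice(mouse);

            // use a USB keyboard
            Keyboard keyboard = new Keyboard();  // not a polymorphic declaration
            USB usbKeyboard = keyboard;  // upcast to USB
            computer.useDevice(usbKeyboard);

            computer.off();  // power off
        }
    }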

Java: a combined example of interfaces, abstract classes, inheritance and polymorphism

2019-10-26 1354 1

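Uncategorized

Count how many uppercase letters, lowercase letters, digits and other characters a string contains

Hint: char values are automatically converted to their ASCII codes when compared.

First, read input from the keyboard:

    Scanner s = new Scanner(System.in);
    System.out.print("请输入任意的字符串：");
    String strs = s.next();

Convert the input string to a char[] array:

    char[] charArray = strs.toCharArray();

Define the counters:

    int countUpper = 0;  // uppercase letters
    int countLower = 0;  // lowercase letters
    int countNum = 0;    // digits
    int countOther = 0;  // everything else

Loop and classify:

    for (char str : charArray) {
        if ('A' <= str && str <= 'Z') {
            countUpper++;
            continue;
        }
        if ('a' <= str && str <= 'z') {
            countLower++;
            continue;
        }
        if ('0' <= str && str <= '9') {
            countNum++;
            continue;
        }
        countOther++;
    }

Result (the output is not included in this excerpt).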
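For comparison with the Python posts on this page, the same range-based classification can be sketched in Python, where comparing single-character strings also compares their character codes. The function name count_characters and the dictionary keys are choices made for this illustration, not part of the Java post.

```python
def count_characters(text):
    """Count uppercase letters, lowercase letters, digits and other characters."""
    counts = {"upper": 0, "lower": 0, "digit": 0, "other": 0}
    for ch in text:
        if 'A' <= ch <= 'Z':      # compared by character code, as in the Java version
            counts["upper"] += 1
        elif 'a' <= ch <= 'z':
            counts["lower"] += 1
        elif '0' <= ch <= '9':
            counts["digit"] += 1
        else:
            counts["other"] += 1
    return counts


print(count_characters("Hello World 2019"))
```

The elif chain plays the role of the continue statements in the Java loop: each character is counted in exactly one category.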

Java string example: counting the uppercase letters, lowercase letters, digits and other characters in a string

2019-10-18 1863 0

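Uncategorized

Important notes

Hadoop must be running before you execute any of this.
See the post on creating a Hadoop Maven project in IDEA.
Download hadoop.dll and put it in C:\Windows\System32 on Windows.

Environment

Windows 10
hadoop 2.9.2 (pseudo-distributed setup)
IDEA 2018.3.5

0x00 Connecting to HDFS from Java

Connection settings:

    public static final String HDFS_PATH = "hdfs://192.168.5.137:9000";  // HDFS address
    FileSystem fileSystem = null;        // file system handle; be sure to pick the apache class
    Configuration configuration = null;  // HDFS configuration

Method that creates the connection:

    @Before  // runs before each test
    public void setUp() throws Exception {
        configuration = new Configuration();
        // arguments: address, configuration, user
        fileSystem = FileSystem.get(new URI(HDFS_PATH), configuration, "root");
    }

0x01 Release resources when finished

    @After  // runs after each test
    public void tearDown() throws Exception {
        // release resources
        configuration = null;
        fileSystem = null;
    }

0x02 Create a directory

    @Test  // unit test
    public void mkdir() throws Exception {
        fileSystem.mkdirs(new Path("/HDFSAPI/test"));
    }

0x03 Delete a file

    @Test
    public void delete() throws Exception {
        // arguments: path, recursive delete
        boolean msg = fileSystem.delete(new Path("/HDFSAPI/test/a.txt"), true);
    }

0x04 Create a file and write to it

    @Test  // unit test
    public void create() throws Exception {
        // the return value is an output stream
        FSDataOutputStream output = fileSystem.create(new Path("/HDFSAPI/test/e.txt"));
        // write a byte[] through the stream
        output.write("hello hadoop".getBytes());
        output.flush();
        // close the stream
        output.close();
    }

0x05 View a file's content

    @Test  // unit test
    public void cat() throws Exception {
        FSDataInputStream file = fileSystem.open(new Path("/HDFSAPI/test/e.txt"));
        // print the content to the console using hadoop's IOUtils
        // arguments: input stream, output stream, buffer size
        IOUtils.copyBytes(file, System.out, 1024);
        // close
        file.close();
    }

0x06 Rename a file

    @Test  // unit test
    public void rename() throws Exception {
        // path of the old file
        Path oldPath = new Path("/HDFSAPI/test/a.txt");
        // path of the new file (the original excerpt omits this line; the name below is a placeholder)
        Path newPath = new Path("/HDFSAPI/test/b.txt");
        boolean msg = fileSystem.rename(oldPath, newPath);
    }

0x07 Upload a local file to HDFS

    @Test
    public void copyFromLocalFile() throws Exception {
        // local path; this was tested on Windows, hence the path below
        Path LocalPath = new Path("D://data.txt");
        // destination path on HDFS
        Path HDFSPath = new Path("/HDFSAPI/test/");
        fileSystem.copyFromLocalFile(LocalPath, HDFSPath);
    }

0x08 Upload a large file with a progress indicator

    @Test
    public void copyFromLocalFileWithProgress() throws Exception {
        // open the file to upload
        InputStream file = new BufferedInputStream(  // buffered for efficiency
                new FileInputStream(                 // a File has to be wrapped in a stream
                        new File("F://BigData/hadoop/hadoop-2.9.2.tar.gz")));
        // create the destination file
        FSDataOutputStream output = fileSystem.create(
                new Path("/HDFSAPI/test/newhadoop-2.9.2.tar.gz"),  // first argument: the file can be renamed here
                new Progressable() {                                // second argument: prints the progress indicator
                    @Override
                    public void progress() {
                        System.out.print("*");  // the progress symbol
                    }
                });
        // upload
        IOUtils.copyBytes(file, output, 4096);
    }

0x09 Download a file

    @Test
    public void copyToLocalFrom() throws Exception {
        Path hdfsPath = new Path("/HDFSAPI/test/a.txt");
        // local path
        Path localPath = new Path("F://a.txt");
        fileSystem.copyToLocalFile(hdfsPath, localPath);
    }

0x10 List all files in a directory

    @Test
    public void listFile() throws Exception {
        // the hdfs directory to inspect
        Path hdfsPath = new Path("/HDFSAPI/test");
        FileStatus[] fileStatuses = fileSystem.listStatus(hdfsPath);
        for (FileStatus file : fileStatuses) {
            // path
            String filePath = file.getPath().toString();
            // directory or file
            String isDir = file.isDirectory() ? "文件夹" : "文件";
            // file size
            long fileSize = file.getLen();
            // number of replicas
            short fileReplication = file.getReplication();
            // print
            System.out.println(filePath + "\t" + isDir + "\t" + fileSize + "\t" + fileReplication + "\t");
        }
    }

Common problems

Error: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V
Solution: download hadoop.dll and put it in C:\Windows\System32 on Windows.

Error: Name node is in safe mode.
Solution: leave safe mode.

    Leave safe mode:  hadoop dfsadmin -safemode leave
    Enter safe mode:  hadoop dfsadmin -safemode enter

Files uploaded through the Java API have a different replication factor from files uploaded with the Hadoop shell.
Explanation: with the Java API we did not specify a replication factor, so the upload uses hadoop's default of 3. For the Hadoop shell, I had set the replication factor in hdfs-site.xml, so the default is not used.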

Working with the HDFS file system through the Java API

2019-10-12 1576 0

Author: 明非

395 posts published in total