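Crawler target

During the holidays I ran into a parent who had brought in homework assigned by a teacher to get it printed. The homework turned out to be an online e-book that cannot be downloaded, so I decided to write a crawler to download it. It then turned out that all of the book links are encrypted in JS. That will not do: how can we let that stop a primary school student from doing homework?

Website: https://mp.zhizhuma.com/book/shelf.htm?id=4872

Modules used:

    # -*- coding: utf-8 -*-
    import requests
    import time
    import re
    import random
    import base64
    import json
    import hashlib
    from Crypto.Cipher import AES
    from os import makedirs

Crawler structure

    # Fetch the JSON that holds the link for every page
    def get_encryptedData(ebookId):
        pass

    # Build the real link (it is time-limited)
    def get_auth_key(data, differenceDate):
        pass

    # Download and save one page
    def download_and_save(datadict, differenceDate):
        pass

    if __name__ == '__main__':
        # Share link of the book
        share_url = 'https://mp.zhizhuma.com/book/sample2.htm?id=52753&shelfId=4872'
        ebookId = re.search(r'id=(\d+)', share_url).group(1)
        # Fetch the page data of the book
        encryptedData = get_encryptedData(ebookId)
        # Timestamp parameter
        differenceDate = encryptedData.get('timestamp')
        # Create the output folder (do not fail if it already exists)
        makedirs(str(differenceDate), exist_ok=True)
        for data in encryptedData.get('data'):
            # Build the signed link for this page
            pageNo_url = get_auth_key(data=data, differenceDate=differenceDate)
            print(pageNo_url)
            # Download the page
            download_and_save(pageNo_url, differenceDate=differenceDate)

get_encryptedData()

Analyzing the JS reveals the decryption routine. We reimplement it in Python; the logic is not complicated.

    def get_encryptedData(ebookId):
        url = 'https://biz.zhizhuma.com/ebookpageservices/queryAllPageByEbookId.do'
        data = {
            "ebookId": ebookId,
            "_timestamp": "1586101527",
            "_nonce": "24430072-41ad-48cb-9c7f-880f990c0886",
            "_sign": "975F1339ED050BB789CD51D66E40DD6B",
        }
        j = requests.post(url=url, data=data).json().get('encryptedData')
        # AES decryption; the key is found in the JS
        cipher = AES.new("Suj4XDDt3jPsH9Jj".encode(), AES.MODE_ECB)
        # Base64-decode, decrypt, strip the trailing 0x0f padding bytes,
        # then drop the last character before parsing the JSON
        raw_data = cipher.decrypt(base64.decodebytes(bytes(j, encoding='utf8'))).rstrip(b'\x0f').decode("utf-8")[:-1]
        j_data = json.loads(raw_data)
        return j_data

get_auth_key()

This builds the parameter that is appended to the image link. Without this parameter the image cannot be fetched. The auth_key parameter is generated by simply porting the JS as well.

    # Build the real link (it is time-limited)
    def get_auth_key(data, differenceDate):
        # Page number
        pageNo = data.get('pageNo')
        # Path part of the image link, used to build the signature
        imgurl = data.get('imgurl').split('https://cdnyuntisyspro.bookln.cn')[1]
        uid = "0"
        rand = str(random.random())
        timestamp = str(int(time.time()) - int((int(time.time() * 1000) - differenceDate) / 1000) + 15)
        sstring = imgurl + "-" + timestamp + "-" + rand + "-" + uid + "-69731cbade6a64b58d60"
        md5 = hashlib.md5()
        md5.update(sstring.encode())
        md5hash = md5.hexdigest()
        authKey = 'auth_key=' + timestamp + "-" + rand + "-" + uid + "-" + md5hash
        url = "https://cdnyuntisyspro.bookln.cn" + imgurl + "?" + authKey
        return {"pageNo": pageNo, "url": url}

download_and_save()

    # Download and save one page
    def download_and_save(datadict, differenceDate):
        c = requests.get(url=datadict.get('url')).content
        with open(str(differenceDate) + "/" + str(datadict.get('pageNo')) + ".png", "wb") as file:
            file.write(c)

Running

The script can easily be extended to use multiple threads.

Original work by 大数据男孩, for learning and exchange only.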
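The multithreading extension mentioned above could look roughly like the following. This is a minimal sketch, not the author's code: `fetch_bytes`, `save_page`, `download_all` and `max_workers` are hypothetical names, and the standard-library `urllib.request` stands in for `requests` so that the block has no third-party dependency. It assumes each page is a dict with `pageNo` and `url` keys, as returned by `get_auth_key()`.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_bytes(url):
    """Default fetcher. Assumption: a plain GET is enough, as in download_and_save()."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def save_page(page, out_dir, fetch=fetch_bytes):
    """Download one page dict {"pageNo": ..., "url": ...} and write it to <pageNo>.png."""
    path = os.path.join(out_dir, "{}.png".format(page["pageNo"]))
    with open(path, "wb") as f:
        f.write(fetch(page["url"]))
    return path


def download_all(pages, out_dir, fetch=fetch_bytes, max_workers=8):
    """Download all pages with a thread pool; the returned paths follow the order of `pages`."""
    os.makedirs(out_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises the first worker exception
        return list(pool.map(lambda p: save_page(p, out_dir, fetch), pages))
```

In the main block this would presumably replace the sequential loop: collect `get_auth_key(data=data, differenceDate=differenceDate)` for every entry into a list, then call `download_all(pages, str(differenceDate))`. Because the links are time-limited, it is probably safest to build them immediately before starting the downloads.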