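How do you scrape songs from Kugou Music in bulk with Python? This article walks through the analysis and a solution step by step, in the hope of giving anyone facing the same problem a simpler, more practical approach.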
Scraping songs from Kugou Music
https://www.kugou.com/
Python 3.6
PyCharm
requests
Import the tools
import requests
import re
Request the web page
headers = {
    'authority': 'wwwapi.kugou.com',
    'cookie': 'kg_mid=ac3836df72c523f46a85d8a5fd90fe59; kg_dfid=3ve7aQ2XyGmN0yE3uv3WcaHs; Hm_lvt_aedee6983d4cfc62f509129360d6bb3d=1600260110,1602312707; kg_dfid_collect=d41d8cd98f00b204e9800998ecf8427e; kg_mid_temp=ac3836df72c523f46a85d8a5fd90fe59; Hm_lpvt_aedee6983d4cfc62f509129360d6bb3d=1602312738',
    'referer': 'https://www.kugou.com/song/',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
}
url = 'https://www.kugou.com/yy/rank/home/1-8888.html?from=rank'
response = requests.get(url=url, headers=headers)
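The request above fetches a single ranking page. For batch scraping, one option is to generate the URLs of several pages first. The sketch below assumes that the leading number in `1-8888.html` is the page index; that pattern is inferred from the one URL in this article and is not confirmed by it, and `rank_page_urls` is a hypothetical helper name.

```python
# Assumption: the ranking URL follows the pattern "{page}-8888.html",
# inferred from the single URL used in this article.
RANK_URL_TEMPLATE = 'https://www.kugou.com/yy/rank/home/{page}-8888.html?from=rank'


def rank_page_urls(first_page=1, last_page=3):
    """Return the ranking-page URLs for first_page..last_page (inclusive)."""
    return [RANK_URL_TEMPLATE.format(page=page)
            for page in range(first_page, last_page + 1)]


if __name__ == '__main__':
    for page_url in rank_page_urls(1, 3):
        print(page_url)
```

Each generated URL could then be requested with the same `headers` as the first page.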
Parse the page data
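The regular expressions in the parsing function target song metadata embedded in the page as JSON-like text, with file names stored as `\uXXXX` escapes. The following is a minimal offline sketch of that extraction on a made-up snippet. The `sample` text and the `extract_songs` helper are illustrative assumptions, not the real page content.

```python
import re

# Made-up snippet imitating the metadata embedded in the ranking page.
sample = (
    '{"Hash":"ABC123","album_id":456,"FileName":"\\u5468\\u6770\\u4f26 - Demo"},'
    '{"Hash":"DEF789","album_id":1011,"FileName":"Artist - Song"}'
)


def extract_songs(page_text):
    """Return a list of (hash, album_id, file_name) tuples found in page_text."""
    hashes = re.findall(r'"Hash":"(.*?)"', page_text, re.S)
    album_ids = re.findall(r'"album_id":(.*?),"', page_text, re.S)
    file_names = re.findall(r'"FileName":"(.*?)"', page_text, re.S)
    return [
        # Turn literal \uXXXX sequences into the characters they stand for.
        (song_hash, album_id, name.encode('utf-8').decode('unicode_escape'))
        for song_hash, album_id, name in zip(hashes, album_ids, file_names)
    ]


if __name__ == '__main__':
    for song in extract_songs(sample):
        print(song)
```

Note that the `unicode_escape` round trip is only safe when the raw text is ASCII plus escapes; a file name that already contains non-ASCII characters would be garbled by it.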
def func(url):
    # Fetch the ranking page and extract each song's hash, album ID and file name
    response = requests.get(url=url, headers=headers)
    hashes = re.findall('"Hash":"(.*?)"', response.text, re.S)
    album_ids = re.findall('"album_id":(.*?),"', response.text, re.S)
    file_names = re.findall('"FileName":"(.*?)"', response.text, re.S)
    for song_hash, album_id, file_name in zip(hashes, album_ids, file_names):
        # File names appear as \uXXXX escapes in the page; decode them to readable text
        file_name = file_name.encode('utf-8').decode('unicode_escape')
        # print(song_hash, album_id, file_name)
        # Endpoint and query parameters for requesting the song's playback data
        download_url = 'https://wwwapi.kugou.com/yy/index.php'
        params = {
            'r': 'play/getdata',
            'callback': 'jQuery19107150201841602037_1602314563329',
            'hash': song_hash,
            'album_id': album_id,
            'dfid': '3ve7aQ2XyGmN0yE3uv3WcaHs',
            'mid': 'ac3836df72c523f46a85d8a5fd90fe59',
            'platid': '4',
            '_': '1602312793005',
        }
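The `callback` parameter suggests that the `play/getdata` endpoint answers in JSONP form, that is, JSON wrapped in a function call. The article does not show how that response is handled, so the following is only a sketch of one way to unwrap it. The field names `data` and `play_url`, the sample response, and both helper names are assumptions.

```python
import json
import re


def parse_jsonp(text):
    """Strip a JSONP wrapper such as callback({...}); and return the parsed JSON."""
    match = re.search(r'^[^(]*\((.*)\)\s*;?\s*$', text.strip(), re.S)
    if match is None:
        raise ValueError('not a JSONP response')
    return json.loads(match.group(1))


def extract_play_url(text):
    """Return the audio URL from a JSONP response, or '' if it is missing."""
    payload = parse_jsonp(text)
    # "data" and "play_url" are assumed field names, not confirmed by the article.
    return payload.get('data', {}).get('play_url', '')


# Made-up response used only to exercise the helpers offline.
sample_response = (
    r'jQuery19107150201841602037_1602314563329('
    r'{"status":1,"data":{"audio_name":"Artist - Song",'
    r'"play_url":"https:\/\/example.com\/song.mp3"}});'
)

if __name__ == '__main__':
    print(extract_play_url(sample_response))
```

Inside the loop, the text would come from something like `requests.get(download_url, params=params, headers=headers).text`, and the resulting URL could then be passed to `download` together with the file name.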
Save the data
def download(url, title):
    # Build the target path; the folder must already exist
    filename = 'C:\\Users\\Administrator\\Desktop\\新建文件夹\\' + title + '.mp3'
    response = requests.get(url=url, headers=headers)
    # Write the raw audio bytes to disk
    with open(filename, mode='wb') as f:
        f.write(response.content)
    print(title)
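Song titles can contain characters such as `/` or `?` that Windows does not allow in file names, and `open` fails if the target folder is missing. A small sketch of building a safer path is shown here. `build_mp3_path` is a hypothetical helper and not part of the original code.

```python
import re
from pathlib import Path


def build_mp3_path(folder, title):
    """Return folder/<sanitized title>.mp3, creating the folder if needed."""
    # Replace characters that Windows forbids in file names.
    safe_title = re.sub(r'[\\/:*?"<>|]', '_', title).strip() or 'untitled'
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / (safe_title + '.mp3')


if __name__ == '__main__':
    print(build_mp3_path('songs', 'AC/DC - Back: In Black?'))
```

The returned path can be passed straight to `open(..., mode='wb')` in place of the hard-coded string.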
Run the code; the result is shown in the figure below.
That concludes this walkthrough of scraping songs from Kugou Music in bulk with Python. I hope it proves helpful.
Original link: https://my.oschina.net/u/4848094/blog/4745606