Python Training Day07 PM [Operating Web Pages and Scraping Data with Selenium: Downloading Songs]

Date: 2022-01-18  Author: lu16

Exercise 1 - Scrape the song list

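Task: work through two examples to practice operating web pages and scraping data with Selenium.
Use headless mode to scrape content from NetEase Cloud Music (网易云音乐).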

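'''
Task: work through two examples to practice operating web pages and scraping data with Selenium.
Use headless mode to scrape content from NetEase Cloud Music.
'''
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Headless mode: the browser runs invisibly, without opening a window
opts = Options()
opts.add_argument('--headless')
opts.add_argument('--disable-gpu')

bw = webdriver.Chrome(options=opts)

# bw = webdriver.Chrome()  # without options, a visible browser window opens
url = 'https://music.163.com/#/discover/toplist?id=3779629'
bw.get(url)

# If the page contains an iframe, the content lives in an embedded page.
# Switch into that iframe first, then locate the elements.
bw.switch_to.frame('g_iframe')

# find_elements(By.CSS_SELECTOR, ...) replaces find_elements_by_css_selector,
# which is deprecated in Selenium 4 and removed in later 4.x releases
ss = bw.find_elements(By.CSS_SELECTOR, '.m-table-rank tbody tr .txt a b')
print(len(ss))  # 100

authors = bw.find_elements(By.CSS_SELECTOR, '.m-table-rank tbody tr .text')
print(len(authors))  # 100

for i, s in enumerate(ss):
    print(s.get_attribute('title'), ':', authors[i].get_attribute('title'))

bw.quit()  # closes the browser and ends the driver session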
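The loop pairs the i-th title with the i-th artist by index, which silently assumes both selectors matched the same number of rows. A minimal sketch of the same pairing with `zip` plus an explicit length check, run on plain strings standing in for the `get_attribute('title')` values (the function name and the sample data are hypothetical):

```python
def pair_titles_and_artists(titles, artists):
    """Pair the i-th song title with the i-th artist; fail loudly if the counts differ."""
    if len(titles) != len(artists):
        raise ValueError(f'{len(titles)} titles but {len(artists)} artists')
    return list(zip(titles, artists))


# Hypothetical sample data standing in for s.get_attribute('title') results
titles = ['Song A', 'Song B']
artists = ['Artist 1', 'Artist 2']

rows = pair_titles_and_artists(titles, artists)
for rank, (title, artist) in enumerate(rows, start=1):
    print(f'{rank}. {title} : {artist}')
```

With real elements you would pass `[s.get_attribute('title') for s in ss]` and `[a.get_attribute('title') for a in authors]`.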

Exercise 2 - Download a song as an mp3 file

NetEase Cloud Music: can we scrape the songs themselves? Yes. The lyrics? Also yes.

Generic NetEase Cloud Music song download URL: http://music.163.com/song/media/outer/url?id= (append the song ID after "id=")

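'''
Try the download: fetch the URL with requests, take the binary response body, and save it locally.
Scrape a NetEase Cloud Music song as an mp3 file (single-song download).
《初恋》 song id: 1873049720
《清醒》 song id: 1909660296
《星辰大海》 song id: 1811921555
'''
import requests as req

# hds: request headers that make the request look like it comes from a browser
hds = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36'}

common_url = 'http://music.163.com/song/media/outer/url?id={}'  # generic download URL

resp = req.get(common_url.format('1909660296'), headers=hds)

ct = resp.content  # response body as bytes
print(len(ct))  # length of the response body
# requests follows redirects by default, so the 302 from the download URL is normally
# resolved already and status_code is 200; the 302 response is kept in resp.history
print(resp.status_code)

# To handle the redirect yourself, request with allow_redirects=False and read Location:
# resp = req.get(common_url.format('1909660296'), headers=hds, allow_redirects=False)
# print(resp.headers)
# u2 = resp.headers['Location']
# print(u2)  # then request u2 to download the music

if resp.status_code == 200:
    with open(r'C:\Users\lwx\Desktop\网易云\清醒.mp3', 'wb') as f:  # "as f" binds the opened file to the name f
        f.write(ct)
    # The two lines above behave like the three lines below, except that
    # "with" also closes the file if f.write() raises an exception.
    # f = open(r'C:\Users\lwx\Desktop\网易云\清醒.mp3', 'wb')
    # f.write(ct)
    # f.close()
    print('over!')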
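Because `requests` follows redirects, a status of 200 only says that the final page loaded, not that it was audio. If a song cannot be served, the redirect may end on an ordinary HTML page (an assumption about the site's behavior, not verified here), and the script would save that page under an `.mp3` name. A minimal heuristic sketch, assuming the server sends an `audio/...` Content-Type or that the body starts with an ID3v2 tag or an MPEG frame sync; `looks_like_mp3` is a hypothetical helper:

```python
def looks_like_mp3(data: bytes, content_type: str = '') -> bool:
    """Heuristic: True if the response looks like MP3 audio rather than an HTML page."""
    # Trust an explicit audio MIME type if the server sends one
    if content_type.lower().startswith('audio/'):
        return True
    # Many MP3 files begin with an ID3v2 tag
    if data.startswith(b'ID3'):
        return True
    # Otherwise expect an MPEG audio frame header: 11 sync bits set,
    # i.e. 0xFF followed by a byte whose top three bits are 1
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


print(looks_like_mp3(b'ID3\x04\x00'))                   # True
print(looks_like_mp3(b'<!DOCTYPE html>', 'text/html'))  # False
```

In the exercise, the check would become `if resp.status_code == 200 and looks_like_mp3(ct, resp.headers.get('Content-Type', '')):`.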

Exercise 3 - Download the songs on the Soaring chart

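Combining this morning's code with the download method above, try to download the top 20 songs on the Soaring chart (飙升榜).
https://music.163.com/#/discover/toplist (time limit: 15 minutes)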
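Each chart row links to its song page, and the exercise needs the numeric ID from that link to fill in the download URL from Exercise 2. Taking the ID with `split('=')[1]` assumes the link contains exactly one `=`. A slightly more defensive sketch using the standard library's `urllib.parse`, assuming song links of the form `https://music.163.com/song?id=<id>` or `https://music.163.com/#/song?id=<id>`; the helper names `song_id_from_href` and `download_url` are hypothetical:

```python
from urllib.parse import urlparse, parse_qs

# Download URL template from Exercise 2 (domain assumed to be music.163.com)
DOWNLOAD_URL = 'http://music.163.com/song/media/outer/url?id={}'


def song_id_from_href(href: str) -> str:
    """Return the value of the id query parameter in a song link."""
    parts = urlparse(href)
    # Accept both assumed link forms: .../song?id=... and .../#/song?id=...
    # (in the second form the query string sits inside the #fragment)
    query = parts.query or parts.fragment.partition('?')[2]
    values = parse_qs(query).get('id')
    if not values:
        raise ValueError(f'no id parameter in {href!r}')
    return values[0]


def download_url(song_id: str) -> str:
    return DOWNLOAD_URL.format(song_id)


print(download_url(song_id_from_href('https://music.163.com/song?id=1909660296')))
# http://music.163.com/song/media/outer/url?id=1909660296
```

Using `song_id_from_href(ids[i].get_attribute('href'))` instead of `split('=')[1]` would keep working if a link gained extra query parameters, where the `split` version would return something like `42&userid`.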

import requests as req
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

hds = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36'}

def wydown(songname, songid):
    """Download one song by ID and save it as <songname>.mp3."""
    common_url = 'http://music.163.com/song/media/outer/url?id={}'
    resp = req.get(common_url.format(songid), headers=hds)
    ct = resp.content
    # print(len(ct))
    # print(resp.status_code)  # redirects are followed automatically, so this is normally 200
    if resp.status_code == 200:
        with open(r'C:\Users\qx\Desktop\网易云\{}.mp3'.format(songname), 'wb') as f:
            f.write(ct)
        print('Downloaded:', songname)

# Headless mode: the browser runs invisibly, without opening a window
opts = Options()
opts.add_argument('--headless')
opts.add_argument('--disable-gpu')

bw = webdriver.Chrome(options=opts)

url = 'https://music.163.com/#/discover/toplist'
bw.get(url)
bw.switch_to.frame('g_iframe')

ss = bw.find_elements(By.CSS_SELECTOR, '.m-table-rank tbody tr .txt a b')
ids = bw.find_elements(By.CSS_SELECTOR, '.m-table-rank tbody tr .txt a')

songinfo = {}  # song title -> song id
for i, s in enumerate(ss):
    # the link ends in "?id=<song id>"; keep the part after "="
    songinfo[s.get_attribute('title')] = ids[i].get_attribute('href').split('=')[1]

bw.quit()

# print(songinfo)

# Iterate over the dict and download the first 20 songs, as the exercise asks
for k, v in list(songinfo.items())[:20]:
    wydown(k, v)
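`wydown` uses the song title directly as a file name, but titles can contain characters that Windows forbids in file names (`\ / : * ? " < > |`), in which case `open()` may fail. A minimal sketch that replaces them with `_` and builds the path with `pathlib`; `safe_filename` and `song_path` are hypothetical helpers:

```python
import re
from pathlib import Path

# Characters that Windows does not allow in file names
INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_filename(songname: str, ext: str = '.mp3') -> str:
    """Replace forbidden characters with '_' and drop trailing dots and spaces."""
    cleaned = INVALID_CHARS.sub('_', songname).strip().rstrip('. ')
    return (cleaned or 'untitled') + ext


def song_path(folder: str, songname: str) -> Path:
    """Join a download folder and a sanitized song file name."""
    return Path(folder) / safe_filename(songname)


print(safe_filename('AC/DC: Live?'))  # AC_DC_ Live_.mp3
```

Inside `wydown`, `open(song_path(r'C:\Users\qx\Desktop\网易云', songname), 'wb')` would then replace the formatted path string.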