selenium + driver · Python Web Scraping

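Using selenium together with a browser driver makes it possible to scrape dynamically rendered page data.

**selenium:** selenium is a web automation testing tool, originally developed for automated website testing. It runs directly against a real browser and supports all major browsers. It accepts commands that make the browser load pages automatically, extract the data you need, and even take screenshots of the page.

Install: `pip install selenium`

Official documentation: [http://selenium-python.readthedocs.io/api.html](http://selenium-python.readthedocs.io/api.html)

**driver:** The driver is the browser's driver program. Each browser has its own driver, and Python can control the browser only through it.

Driver downloads:

- ChromeDriver: https://sites.google.com/a/chromium.org/chromedriver/downloads
- ChromeDriver (Taobao mirror): https://npm.taobao.org/mirrors/chromedriver/
- FirefoxDriver: https://github.com/mozilla/geckodriver/releases
- EdgeDriver: https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/
- SafariDriver: https://webkit.org/blog/6900/webdriver-support-in-safari-10/

Install: after downloading, extract the driver into a directory whose path contains only English characters.

**PhantomJS:** PhantomJS is a headless browser (no visible window) based on WebKit. It loads the site into memory and executes the JavaScript on the page.

**1. Example (driving Chrome with ChromeDriver)**

```python
"""
@Date 2021/3/18
"""
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# 1. Load the driver.
# If the driver is on PATH (for example, in Python's Scripts directory),
# this can be shortened to webdriver.Chrome().
# Selenium 4 takes the driver path through Service; Selenium 3 accepted it
# as the first positional argument of webdriver.Chrome().
service = Service(executable_path="D:/Drivers/ChromeDriver/chromedriver_win32/chromedriver.exe")
driver = webdriver.Chrome(service=service)

# 2. Load the page in the browser window that webdriver.Chrome() opened.
driver.get("https://www.baidu.com")

# 3. Take a screenshot of the current page.
driver.save_screenshot("E:/python/driver/baidu.png")

# Locate elements and interact with them: type a query, then click search.
# Selenium 4 uses find_element(By.ID, ...) in place of find_element_by_id(...).
driver.find_element(By.ID, "kw").send_keys("长城")
driver.find_element(By.ID, "su").click()

# Get the page source as it is at the moment of the call.
page_source = driver.page_source
print(page_source)

# Get the cookies of the current session (a list of dicts).
cookies = driver.get_cookies()
print(cookies)

# Get the URL the browser is currently on.
current_url = driver.current_url
print(current_url)

# 4. Quit the browser.
driver.quit()
```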
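
The example above prints the result of `driver.get_cookies()`, which is a list of dicts that each carry at least a `name` and a `value` key. A common follow-up in scraping, which the original post does not show, is to hand those cookies to a plain HTTP client. The following is a minimal sketch of that conversion. The helper names `cookies_to_dict` and `cookies_to_header` are hypothetical, and the sample cookie data is made up to mimic the shape of a real result.

```python
from typing import Dict, Iterable, Mapping


def cookies_to_dict(cookies: Iterable[Mapping[str, object]]) -> Dict[str, str]:
    """Map cookie name -> value; if a name repeats, the last value wins."""
    return {str(c["name"]): str(c["value"]) for c in cookies}


def cookies_to_header(cookies: Iterable[Mapping[str, object]]) -> str:
    """Build the value of an HTTP Cookie request header."""
    pairs = cookies_to_dict(cookies).items()
    return "; ".join(f"{name}={value}" for name, value in pairs)


# Made-up sample shaped like the output of driver.get_cookies().
sample = [
    {"name": "SESSION", "value": "abc123", "domain": ".example.com", "path": "/"},
    {"name": "lang", "value": "zh", "domain": ".example.com", "path": "/"},
]

print(cookies_to_dict(sample))
print(cookies_to_header(sample))
```

In a real script, `sample` would be replaced by `driver.get_cookies()`. If the `requests` library is used for follow-up requests, the dict form can be passed through its `cookies=` parameter, assuming the target site accepts the same session outside the browser.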