Playwright Tutorial · PHP Notes

## Playwright Tutorial

**last update: 2022-06-06 10:23:11**

----

[TOC=3,8]

----

### Background

https://chromedriver.chromium.org/downloads

[Pyppeteer: a new scraping tool that is more efficient than Selenium](https://baijiahao.baidu.com/s?id=1660869583480840819&wfr=spider&for=pc)

> However, Selenium has a reputation for being unreliable. Selenium tests are often flaky...

----

History of Pyppeteer and Playwright:

[Open-source scraping tools: Playwright vs Puppeteer, which one should you choose? (CSDN blog)](https://blog.csdn.net/limingblogs/article/details/122425455)

https://pypi.org/project/pyppeteer/

https://github.com/miyakogi/pyppeteer

> Pyppeteer has moved to pyppeteer/pyppeteer

https://github.com/pyppeteer/pyppeteer

https://pyppeteer.github.io/pyppeteer/

> Note: this repo is unmaintained and has not received even minor changes for a long time. Please consider **playwright-python** as an alternative.

> Unofficial Python port of [GoogleChrome/puppeteer](https://github.com/GoogleChrome/puppeteer) JavaScript (headless) chrome/chromium browser automation library.

That is, Pyppeteer is an **unofficial Python port** of **puppeteer**, the JavaScript (headless) Chrome/Chromium browser automation library.

https://github.com/microsoft/playwright-python

https://playwright.dev/python/

> Python version of the Playwright testing and automation library.

https://github.com/microsoft/playwright

https://playwright.dev

> **Playwright** is a framework for Web testing and automation. It allows testing Chromium, Firefox and WebKit with a single API. Playwright is built to enable cross-browser web automation that is ever-green, capable, reliable and fast. (Developed by Microsoft.)

----

https://github.com/GoogleChrome/puppeteer

https://github.com/puppeteer/puppeteer

https://pptr.dev/

> **Puppeteer** is a Node.js library that provides a high-level API to control Chrome/Chromium over the DevTools Protocol. Puppeteer runs in headless mode by default, but can be configured to run in full ("headful") Chrome/Chromium. (Developed by the Chrome DevTools team.)

----

### Installation

[Installation | Playwright Python](https://playwright.dev/python/docs/intro)

----

### Simple example

----

### Intercepting requests

----

### Intercepting responses

----

### Locator

Locator **actions** wait automatically. Creating a locator does not query the page, but an action such as `click()` waits, up to its timeout, until the matched element is attached to the DOM and passes the relevant actionability checks (for `click()`, this includes being visible).

https://playwright.dev/python/docs/api/class-page#page-wait-for-timeout

https://playwright.dev/python/docs/api/class-locator#locator-wait-for

https://playwright.dev/python/docs/actionability

https://playwright.dev/python/docs/api/class-locatorassertions#locator-assertions-not-to-be-attached
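```python
from playwright.sync_api import sync_playwright, expect, TimeoutError as PlaywrightTimeoutError

# `page` is assumed to be an already opened Page object.

# Does not wait: creating a locator only describes how to find the element.
page.get_by_test_id("directions")

# Waits: Locator.click(timeout=3_000) waits up to 3 s for the element to be actionable.
page.get_by_test_id("directions").click(timeout=3_000)

# Waits: Locator.inner_text(timeout=3_000) waits up to 3 s for the element to appear.
page.get_by_test_id("directions").inner_text(timeout=3_000)

# Does not wait: is_visible() returns immediately (a timeout argument is ignored).
page.get_by_test_id("directions").is_visible()

# Waits: wait_for() blocks until the element is visible (the default state).
page.get_by_test_id("directions").wait_for(timeout=3_000)

# Waits: the assertion retries until the element is visible or the timeout expires.
expect(page.locator(".class")).to_be_visible(timeout=3_000)
```

### a

https://zhuanlan.zhihu.com/p/623669043

There are currently many products on the market similar to the one in this case: they scrape data from various content platforms, organise and process it, and launch products built on it. **The ones that last are inevitably those that cooperate with the official platform.** If a product relies on crawlers alone for data collection, then once the platform discovers it and takes the matter to court, it will inevitably constitute unfair competition, and may even amount to the crime of sabotaging computer systems. Crawler engineers whose companies are developing this kind of product **must confirm whether there is cooperation with the official platform**, pay close attention to the legality and compliance of the business, and avoid going down the path of breaking the law.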