## Playwright Tutorial

**last update: 2022-06-06 10:23:11**

----

[TOC=3,8]

----

### Background

https://chromedriver.chromium.org/downloads

[Pyppeteer: a new scraping tool that is more efficient than Selenium](https://baijiahao.baidu.com/s?id=1660869583480840819&wfr=spider&for=pc)

> However, Selenium has a reputation for being unreliable. Selenium tests are often flaky...

----

History of Pyppeteer and Playwright:

[Playwright vs Puppeteer: which open-source scraping tool should you choose? (9点0频道's blog, CSDN)](https://blog.csdn.net/limingblogs/article/details/122425455)

https://pypi.org/project/pyppeteer/

https://github.com/miyakogi/pyppeteer

> Pyppeteer has moved to pyppeteer/pyppeteer

https://github.com/pyppeteer/pyppeteer

https://pyppeteer.github.io/pyppeteer/

> Note: this repo is unmaintained and has gone a long time without even minor changes. Please consider **playwright-python** as an alternative.

> Unofficial Python port of [GoogleChrome/puppeteer](https://github.com/GoogleChrome/puppeteer) JavaScript (headless) chrome/chromium browser automation library.

https://github.com/microsoft/playwright-python

https://playwright.dev/python/

> Python version of the Playwright testing and automation library.

https://github.com/microsoft/playwright

https://playwright.dev

> **Playwright** is a framework for Web testing and automation. It allows testing Chromium, Firefox and WebKit with a single API. Playwright is built to enable cross-browser web automation that is ever-green, capable, reliable and fast. (Developed by Microsoft)

----

https://github.com/GoogleChrome/puppeteer

https://github.com/puppeteer/puppeteer

https://pptr.dev/

> **Puppeteer** is a Node.js library which provides a high-level API to control Chrome/Chromium over the DevTools Protocol. Puppeteer runs in headless mode by default, but can be configured to run in full ("headful") Chrome/Chromium. (Developed by the Chrome DevTools team)

----

### Installation

[Installation | Playwright Python](https://playwright.dev/python/docs/intro)

----

### Simple example

----

### Intercepting requests

----

### Intercepting responses

----

### Locator

Creating a locator does not wait for anything. Locator **actions** wait automatically: they wait until a matching element is attached to the DOM and, for actions such as `click`, until it passes the actionability checks (for example visible, stable and enabled) before acting.

https://playwright.dev/python/docs/api/class-page#page-wait-for-timeout

https://playwright.dev/python/docs/api/class-locator#locator-wait-for

https://playwright.dev/python/docs/actionability

https://playwright.dev/python/docs/api/class-locatorassertions#locator-assertions-not-to-be-attached
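To make the auto-waiting idea concrete, here is a minimal pure-Python sketch of a poll-until-ready loop with a timeout. It is only a conceptual model and not Playwright's actual implementation; `wait_until`, `WaitTimeoutError`, `FakeElement` and `click_with_auto_wait` are hypothetical names invented for this illustration, and the 50 ms polling interval is an assumption.

```python
import time


class WaitTimeoutError(Exception):
    """Hypothetical error type for this sketch (not Playwright's TimeoutError)."""


def wait_until(condition, timeout_ms=3_000, poll_ms=50):
    """Poll `condition` until it returns a truthy value or `timeout_ms` elapses."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeoutError(f"condition not met within {timeout_ms} ms")
        time.sleep(poll_ms / 1000)


class FakeElement:
    """Stand-in for a DOM element that becomes visible after a delay."""

    def __init__(self, visible_after_ms):
        self._visible_at = time.monotonic() + visible_after_ms / 1000

    def is_visible(self):
        # A plain, non-waiting check: answers for the current moment only.
        return time.monotonic() >= self._visible_at


def click_with_auto_wait(element, timeout_ms=3_000):
    """Wait for the element to become visible, then simulate the click."""
    wait_until(element.is_visible, timeout_ms=timeout_ms)
    return "clicked"
```

In this model, the action succeeds if the element becomes visible within the timeout and raises otherwise, which mirrors how a `timeout` argument bounds the wait on an action.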
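```python
from playwright.sync_api import sync_playwright, expect, TimeoutError as PlaywrightTimeoutError

# Assumes `page` is an already opened playwright.sync_api.Page.

# Does not wait: creating a locator only describes how to find the element.
page.get_by_test_id("directions")

# Waits: Locator.click(timeout=3_000)
page.get_by_test_id("directions").click(timeout=3_000)

# Waits: Locator.inner_text(timeout=3_000)
page.get_by_test_id("directions").inner_text(timeout=3_000)

# Does not wait: is_visible() returns immediately (its `timeout` option is ignored).
page.get_by_test_id("directions").is_visible()

# Waits until the element is visible (the default state), else raises PlaywrightTimeoutError.
page.get_by_test_id("directions").wait_for(timeout=3_000)

# Waits: the assertion retries until it passes or the timeout expires.
expect(page.locator(".class")).to_be_visible(timeout=3_000)
```

### a

https://zhuanlan.zhihu.com/p/623669043

There are many products on the market similar to the one in this case: they scrape data from various content platforms, organize and process it, and then launch a product built on it. **The ones that last are invariably those that cooperate with the platform.** If data collection relies on scraping alone, then once the platform notices and sues, it is likely to be held to be unfair competition, and it may even amount to the crime of sabotaging a computer information system. Scraper engineers whose company is developing this kind of product **must confirm whether there is cooperation with the platform**, pay close attention to the legality and compliance of the business, and avoid drifting into illegal or criminal activity.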