Parsers · Python Web Scraping

| Parser | Usage | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Python standard library | `BeautifulSoup(markup, "html.parser")` | 1. Built into Python, nothing extra to install;<br/>2. Moderate speed;<br/>3. Tolerant of malformed documents. | Poor tolerance of malformed documents in Python versions before 2.7.3 / 3.2.2. |
| lxml HTML parser | `BeautifulSoup(markup, "lxml")` | 1. Fast;<br/>2. Tolerant of malformed documents. | Depends on an external C library (lxml must be installed). |

Usage example:

```python
"""
@Date 2021/3/18
"""
from bs4 import BeautifulSoup

html = """
<li class="span3">
    <div class="thumbnail" style="text-align: center;">
        <div class="img_single">
            <a href="https://www.buxiuse.com/topic/1976452" class="link">
                <img class="height_min" title="试试吧" alt="试试吧" onerror="img_error(this);"
                     src="https://tva1.sinaimg.cn/bmiddle/0080xEK2gy1gntyyf21btj30u0140dje.jpg" referrerpolicy="no-referrer">
            </a>
        </div>
        <hr>
        <div class="bottombar">
            <span class="fl p5">
                <a href="https://www.buxiuse.com/topic/1976452" class="link">
                    试试吧
                </a>
            </span>
            <span class="fr p5 meta">
                <span class="mstar-empty star" title="加入收藏" topic-image-id="1864845" topic-id="1976452"></span>
                <span class="starcount" topic-image-id="1864845">0</span>
            </span>
        </div>
    </div>
</li>
"""

# Parse with the built-in standard-library parser
soup = BeautifulSoup(html, 'html.parser')

# <class 'bs4.BeautifulSoup'>
print(type(soup))

# Print the parsed tree as a string (close to the original markup;
# void tags such as <hr> are serialized as <hr/>)
print(soup)

# Print the tree pretty-printed: one tag per line, indented
print(soup.prettify())
```
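The table picks a parser by passing its name as the second argument to `BeautifulSoup`. The sketch below assumes `beautifulsoup4` is installed and that `lxml` may or may not be. It tries `lxml` first and falls back to `html.parser` when bs4 raises `FeatureNotFound`. It then reads a few fields from a trimmed copy of the snippet above. `make_soup` is a hypothetical helper name and is not part of bs4.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup: str) -> BeautifulSoup:
    """Hypothetical helper: prefer lxml, fall back to html.parser.

    bs4 raises FeatureNotFound when the requested parser is not installed.
    """
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        return BeautifulSoup(markup, "html.parser")


# Trimmed copy of the markup from the usage example above
snippet = """
<li class="span3">
    <div class="img_single">
        <a href="https://www.buxiuse.com/topic/1976452" class="link">
            <img class="height_min" title="试试吧"
                 src="https://tva1.sinaimg.cn/bmiddle/0080xEK2gy1gntyyf21btj30u0140dje.jpg">
        </a>
    </div>
    <span class="starcount" topic-image-id="1864845">0</span>
</li>
"""

soup = make_soup(snippet)

# find() matches on tag name; class_ filters on the CSS class attribute
link = soup.find("a", class_="link")
img = soup.find("img", class_="height_min")
stars = soup.find("span", class_="starcount")

# Tag attributes are read like dictionary keys
topic_url = link["href"]
image_url = img["src"]
# get_text(strip=True) drops the surrounding whitespace before int()
star_count = int(stars.get_text(strip=True))

print(topic_url)
print(image_url)
print(star_count)

# Parsers repair broken markup differently; html.parser does not wrap
# a fragment in <html><body> the way lxml does
fragment = BeautifulSoup("<a></p>", "html.parser")
print(fragment)
```

Catching `FeatureNotFound` keeps the script working on machines without lxml. The trade-off is that the two parsers may repair broken markup into slightly different trees.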