# [`urllib.robotparser`](#module-urllib.robotparser "urllib.robotparser: Load a robots.txt file and answer questions about fetchability of other URLs.") --- Parser for robots.txt

**Source code:** [Lib/urllib/robotparser.py](https://github.com/python/cpython/tree/3.7/Lib/urllib/robotparser.py)

---

This module provides a single class, [`RobotFileParser`](#urllib.robotparser.RobotFileParser "urllib.robotparser.RobotFileParser"), which answers questions about whether or not a particular user agent can fetch a URL on the Web site that published the `robots.txt` file. For more details on the structure of `robots.txt` files, see <http://www.robotstxt.org/orig.html>.

*class* `urllib.robotparser.RobotFileParser`(*url=''*)

This class provides methods to read, parse and answer questions about the `robots.txt` file at *url*.

`set_url`(*url*)

Sets the URL referring to a `robots.txt` file.

`read`()

Reads the `robots.txt` URL and feeds it to the parser.

`parse`(*lines*)

Parses the *lines* argument.

`can_fetch`(*useragent*, *url*)

Returns `True` if the *useragent* is allowed to fetch the *url* according to the rules contained in the parsed `robots.txt` file.

`mtime`()

Returns the time the `robots.txt` file was last fetched. This is useful for long-running web spiders that need to check for new `robots.txt` files periodically.

`modified`()

Sets the time the `robots.txt` file was last fetched to the current time.

`crawl_delay`(*useragent*)

Returns the value of the `Crawl-delay` parameter from `robots.txt` for the *useragent* in question. If there is no such parameter, or it doesn't apply to the *useragent* specified, or the `robots.txt` entry for this parameter has invalid syntax, return `None`.

New in version 3.6.

`request_rate`(*useragent*)

Returns the contents of the `Request-rate` parameter from `robots.txt` as a [named tuple](../glossary.xhtml#term-named-tuple) `RequestRate(requests, seconds)`. If there is no such parameter, or it doesn't apply to the *useragent* specified, or the `robots.txt` entry for this parameter has invalid syntax, return `None`.

New in version 3.6.
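Because `parse()` accepts the rule lines directly, the parser can also be exercised without fetching anything over the network. Below is a minimal sketch of that offline usage; the `robots.txt` lines and the `example.com` URLs are hypothetical, chosen only for illustration:

```
import urllib.robotparser

# Hypothetical robots.txt content, fed straight to parse() instead of
# being downloaded via set_url()/read().
lines = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 10",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(lines)
rp.modified()  # record the current time as the last-fetched time

print(rp.can_fetch("*", "http://example.com/private/page.html"))  # False
print(rp.can_fetch("*", "http://example.com/index.html"))         # True
print(rp.crawl_delay("*"))  # 10
print(rp.mtime())           # timestamp recorded by modified()
```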
The following example demonstrates basic use of the [`RobotFileParser`](#urllib.robotparser.RobotFileParser "urllib.robotparser.RobotFileParser") class:

```
>>> import urllib.robotparser
>>> rp = urllib.robotparser.RobotFileParser()
>>> rp.set_url("http://www.musi-cal.com/robots.txt")
>>> rp.read()
>>> rrate = rp.request_rate("*")
>>> rrate.requests
3
>>> rrate.seconds
20
>>> rp.crawl_delay("*")
6
>>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
False
>>> rp.can_fetch("*", "http://www.musi-cal.com/")
True
```
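Putting the pieces together, a crawler would typically consult `can_fetch()` before each request and honour any `Crawl-delay` value. The helper below is a minimal sketch of that pattern, not part of the module; the function name `polite_fetch` and the commented-out example URLs are hypothetical:

```
import time
import urllib.request
import urllib.robotparser

def polite_fetch(rp, useragent, url):
    """Fetch *url* if robots.txt allows it, honouring Crawl-delay.

    *rp* must be a RobotFileParser that has already read the site's
    robots.txt (e.g. via set_url() and read()).
    """
    if not rp.can_fetch(useragent, url):
        return None  # disallowed for this user agent
    delay = rp.crawl_delay(useragent)
    if delay is not None:
        time.sleep(delay)  # simple per-request politeness delay
    req = urllib.request.Request(url, headers={"User-Agent": useragent})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Hypothetical usage:
# rp = urllib.robotparser.RobotFileParser("http://www.example.com/robots.txt")
# rp.read()
# body = polite_fetch(rp, "MyCrawler", "http://www.example.com/")
```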