Python Requests 教程 · ZetCode 中文系列教程

# Python Requests 教程 > 原文： [http://zetcode.com/python/requests/](http://zetcode.com/python/requests/) 在本教程中，我们展示了如何使用 Python Requests 模块。我们获取数据，发布数据，流数据并连接到安全的网页。在示例中，我们使用在线服务，nginx 服务器，Python HTTP 服务器和 Flask 应用。 ZetCode 也有一个简洁的 [Python 教程](/lang/python/)。超文本传输协议（HTTP）是用于分布式协作超媒体信息系统的应用协议。 HTTP 是万维网数据通信的基础。 ## Python Requests Requests 是一个简单优雅的 Python HTTP 库。它提供了通过 HTTP 访问 Web 资源的方法。 Requests 是内置的 Python 模块。 ```py $ sudo service nginx start ``` 我们在本地主机上运行 nginx Web 服务器。我们的一些示例使用`nginx`服务器。 ## Python Requests 版本第一个程序打印请求库的版本。 `version.py` ```py #!/usr/bin/env python3 import requests print(requests.__version__) print(requests.__copyright__) ``` 该程序将打印请求的版本和版权。 ```py $ ./version.py 2.21.0 Copyright 2018 Kenneth Reitz ``` 这是示例的示例输出。 ## Python Requests 读取网页 `get()`方法发出 GET 请求；它获取由给定 URL 标识的文档。 `read_webpage.py` ```py #!/usr/bin/env python3 import requests as req resp = req.get("http://www.webcode.me") print(resp.text) ``` 该脚本获取`www.webcode.me`网页的内容。 ```py resp = req.get("http://www.webcode.me") ``` `get()`方法返回一个响应对象。 ```py print(resp.text) ``` `text`属性包含响应的内容，以 Unicode 表示。 ```py $ ./read_webpage.py <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>My html page</title> </head> <body> Today is a beautiful day. We go swimming and fishing. Hello there. How are you? </body> </html> ``` 这是`read_webpage.py`脚本的输出。以下程序获取一个小型网页，并剥离其 HTML 标签。 `strip_tags.py` ```py #!/usr/bin/env python3 import requests as req import re resp = req.get("http://www.webcode.me") content = resp.text stripped = re.sub('<[^<]+?>', '', content) print(stripped) ``` 该脚本会剥离`www.webcode.me`网页的 HTML 标签。 ```py stripped = re.sub('<[^<]+?>', '', content) ``` 一个简单的正则表达式用于剥离 HTML 标记。 ## HTTP 请求 HTTP 请求是从客户端发送到浏览器的消息，以检索某些信息或采取某些措施。 `Request`的`request`方法创建一个新请求。请注意，`request`模块具有一些更高级的方法，例如`get()`，`post()`或`put()`，为我们节省了一些输入。 `create_request.py` ```py #!/usr/bin/env python3 import requests as req resp = req.request(method='GET', url="http://www.webcode.me") print(resp.text) ``` 该示例创建一个 GET 请求并将其发送到`http://www.webcode.me`。 ## Python Requests 获取状态 `Response`对象包含服务器对 HTTP 请求的响应。其`status_code`属性返回响应的 HTTP 状态代码，例如 200 或 404。 `get_status.py` ```py #!/usr/bin/env python3 import requests as req resp = req.get("http://www.webcode.me") print(resp.status_code) resp = req.get("http://www.webcode.me/news") print(resp.status_code) ``` 我们使用`get()`方法执行两个 HTTP 请求，并检查返回的状态。 ```py $ ./get_status.py 200 404 ``` 200 是成功 HTTP 请求的标准响应，而 404 则表明找不到所请求的资源。 ## Python Requests HEAD 方法 `head()`方法检索文档标题。标头由字段组成，包括日期，服务器，内容类型或上次修改时间。 `head_request.py` ```py #!/usr/bin/env python3 import requests as req resp = req.head("http://www.webcode.me") print("Server: " + resp.headers['server']) print("Last modified: " + resp.headers['last-modified']) print("Content type: " + resp.headers['content-type']) ``` 该示例打印服务器，`www.webcode.me`网页的上次修改时间和内容类型。 ```py $ ./head_request.py Server: nginx/1.6.2 Last modified: Sat, 20 Jul 2019 11:49:25 GMT Content type: text/html ``` 这是`head_request.py`程序的输出。 ## Python Requests GET 方法 `get()`方法向服务器发出 GET 请求。 GET 方法请求指定资源的表示形式。 `httpbin.org`是免费提供的 HTTP 请求&响应服务。 `mget.py` ```py #!/usr/bin/env python3 import requests as req resp = req.get("https://httpbin.org/get?name=Peter") print(resp.text) ``` 该脚本将具有值的变量发送到`httpbin.org`服务器。该变量直接在 URL 中指定。 ```py $ ./mget.py { "args": { "name": "Peter" }, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip, deflate", "Host": "httpbin.org", "User-Agent": "python-requests/2.21.0" }, ... } ``` 这是示例的输出。 `mget2.py` ```py #!/usr/bin/env python3 import requests as req payload = {'name': 'Peter', 'age': 23} resp = req.get("https://httpbin.org/get", params=payload) print(resp.url) print(resp.text) ``` `get()`方法采用`params`参数，我们可以在其中指定查询参数。 ```py payload = {'name': 'Peter', 'age': 23} ``` 数据在 Python 字典中发送。 ```py resp = req.get("https://httpbin.org/get", params=payload) ``` 我们将 GET 请求发送到`httpbin.org`站点，并传递`params`参数中指定的数据。 ```py print(resp.url) print(resp.text) ``` 我们将 URL 和响应内容打印到控制台。 ```py $ ./mget2.py http://httpbin.org/get?name=Peter&age=23 { "args": { "age": "23", "name": "Peter" }, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip, deflate", "Host": "httpbin.org", "User-Agent": "python-requests/2.21.0" }, ... } ``` 这是示例的输出。 ## Python Requests 重定向重定向是将一个 URL 转发到另一个 URL 的过程。 HTTP 响应状态代码 301“永久移动”用于永久 URL 重定向； 302 找到临时重定向。 `redirect.py` ```py #!/usr/bin/env python3 import requests as req resp = req.get("https://httpbin.org/redirect-to?url=/") print(resp.status_code) print(resp.history) print(resp.url) ``` 在示例中，我们向`https://httpbin.org/redirect-to`页面发出 GET 请求。该页面重定向到另一个页面；重定向响应存储在响应的`history`属性中。 ```py $ ./redirect.py 200 [<Response [302]>] https://httpbin.org/ ``` 对`https://httpbin.org/redirect-to`的 GET 请求被重定向到`https://httpbin.org` 302。在第二个示例中，我们不遵循重定向。 `redirect2.py` ```py #!/usr/bin/env python3 import requests as req resp = req.get("https://httpbin.org/redirect-to?url=/", allow_redirects=False) print(resp.status_code) print(resp.url) ``` `allow_redirects`参数指定是否遵循重定向。默认情况下，重定向之后。 ```py $ ./redirect2.py 302 https://httpbin.org/redirect-to?url=/ ``` 这是示例的输出。 ## 用 nginx 重定向在下一个示例中，我们显示如何在 nginx 服务器中设置页面重定向。 ```py location = /oldpage.html { return 301 /newpage.html; } ``` 将这些行添加到位于 Debian 上`/etc/nginx/sites-available/default`的 Nginx 配置文件中。 ```py $ sudo service nginx restart ``` 编辑完文件后，我们必须重新启动 nginx 才能应用更改。 `oldpage.html` ```py <!DOCTYPE html> <html> <head> <title>Old page</title> </head> <body> This is old page </body> </html> ``` 这是位于 nginx 文档根目录中的`oldpage.html`文件。 `newpage.html` ```py <!DOCTYPE html> <html> <head> <title>New page</title> </head> <body> This is a new page </body> </html> ``` 这是`newpage.html`。 `redirect3.py` ```py #!/usr/bin/env python3 import requests as req resp = req.get("http://localhost/oldpage.html") print(resp.status_code) print(resp.history) print(resp.url) print(resp.text) ``` 该脚本访问旧页面并遵循重定向。如前所述，默认情况下，请求遵循重定向。 ```py $ ./redirect3.py 200 (<Response [301]>,) http://localhost/files/newpage.html <!DOCTYPE html> <html> <head> <title>New page</title> </head> <body> This is a new page </body> </html> ``` 这是示例的输出。 ```py $ sudo tail -2 /var/log/nginx/access.log 127.0.0.1 - - [21/Jul/2019:07:41:27 -0400] "GET /oldpage.html HTTP/1.1" 301 184 "-" "python-requests/2.4.3 CPython/3.4.2 Linux/3.16.0-4-amd64" 127.0.0.1 - - [21/Jul/2019:07:41:27 -0400] "GET /newpage.html HTTP/1.1" 200 109 "-" "python-requests/2.4.3 CPython/3.4.2 Linux/3.16.0-4-amd64" ``` 从`access.log`文件中可以看到，该请求已重定向到新的文件名。通信包含两个 GET 请求。 ## 用户代理在本节中，我们指定用户代理的名称。我们创建自己的 Python HTTP 服务器。 `http_server.py` ```py #!/usr/bin/env python3 from http.server import BaseHTTPRequestHandler, HTTPServer class MyHandler(BaseHTTPRequestHandler): def do_GET(self): message = "Hello there" self.send_response(200) if self.path == '/agent': message = self.headers['user-agent'] self.send_header('Content-type', 'text/html') self.end_headers() self.wfile.write(bytes(message, "utf8")) return def main(): print('starting server on port 8081...') server_address = ('127.0.0.1', 8081) httpd = HTTPServer(server_address, MyHandler) httpd.serve_forever() main() ``` 我们有一个简单的 Python HTTP 服务器。 ```py if self.path == '/agent': message = self.headers['user-agent'] ``` 如果路径包含`'/agent'`，则返回指定的用户代理。 `user_agent.py` ```py #!/usr/bin/env python3 import requests as req headers = {'user-agent': 'Python script'} resp = req.get("http://localhost:8081/agent", headers=headers) print(resp.text) ``` 该脚本向我们的 Python HTTP 服务器创建一个简单的 GET 请求。要向请求添加 HTTP 标头，我们将字典传递给`headers`参数。 ```py headers = {'user-agent': 'Python script'} ``` 标头值放置在 Python 字典中。 ```py resp = req.get("http://localhost:8081/agent", headers=headers) ``` 这些值将传递到`headers`参数。 ```py $ simple_server.py starting server on port 8081... ``` 首先，我们启动服务器。 ```py $ ./user_agent.py Python script ``` 然后我们运行脚本。服务器使用我们随请求发送的代理名称进行了响应。 ## Python Requests POST 值 `post`方法在给定的 URL 上调度 POST 请求，为填写的表单内容提供键/值对。 `post_value.py` ```py #!/usr/bin/env python3 import requests as req data = {'name': 'Peter'} resp = req.post("https://httpbin.org/post", data) print(resp.text) ``` 脚本使用具有`Peter`值的`name`键发送请求。 POST 请求通过`post`方法发出。 ```py $ ./post_value.py { "args": {}, "data": "", "files": {}, "form": { "name": "Peter" }, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip, deflate", "Content-Length": "10", "Content-Type": "application/x-www-form-urlencoded", "Host": "httpbin.org", "User-Agent": "python-requests/2.21.0" }, "json": null, ... } ``` 这是`post_value.py`脚本的输出。 ## Python Requests 上传图像在以下示例中，我们将上传图片。我们使用 Flask 创建一个 Web 应用。 `app.py` ```py #!/usr/bin/env python3 import os from flask import Flask, request app = Flask(__name__) @app.route("/") def home(): return 'This is home page' @app.route("/upload", methods=['POST']) def handleFileUpload(): msg = 'failed to upload image' if 'image' in request.files: photo = request.files['image'] if photo.filename != '': photo.save(os.path.join('.', photo.filename)) msg = 'image uploaded successfully' return msg if __name__ == '__main__': app.run() ``` 这是具有两个端点的简单应用。 `/upload`端点检查是否有某些图像并将其保存到当前目录。 `upload_file.py` ```py #!/usr/bin/env python3 import requests as req url = 'http://localhost:5000/upload' with open('sid.jpg', 'rb') as f: files = {'image': f} r = req.post(url, files=files) print(r.text) ``` 我们将图像发送到 Flask 应用。该文件在`post()`方法的`files`属性中指定。 ## JSON 格式 JSON （JavaScript 对象表示法）是一种轻量级的数据交换格式。人类很容易读写，机器也很容易解析和生成。 JSON 数据是键/值对的集合；在 Python 中，它是通过字典实现的。 ### 读取 JSON 在第一个示例中，我们从 PHP 脚本读取 JSON 数据。 `send_json.php` ```py <?php $data = [ 'name' => 'Jane', 'age' => 17 ]; header('Content-Type: application/json'); echo json_encode($data); ``` PHP 脚本发送 JSON 数据。它使用`json_encode()`函数完成该工作。 `read_json.py` ```py #!/usr/bin/env python3 import requests as req resp = req.get("http://localhost/send_json.php") print(resp.json()) ``` `read_json.py`读取 PHP 脚本发送的 JSON 数据。 ```py print(resp.json()) ``` `json()`方法返回响应的 json 编码内容（如果有）。 ```py $ ./read_json.py {'age': 17, 'name': 'Jane'} ``` 这是示例的输出。 ### 发送 JSON 接下来，我们将 JSON 数据从 Python 脚本发送到 PHP 脚本。 `parse_json.php` ```py <?php $data = file_get_contents("php://input"); $json = json_decode($data , true); foreach ($json as $key => $value) { if (!is_array($value)) { echo "The $key is $value\n"; } else { foreach ($value as $key => $val) { echo "The $key is $value\n"; } } } ``` 该 PHP 脚本读取 JSON 数据，并发送带有已解析值的消息。 `send_json.py` ```py #!/usr/bin/env python3 import requests as req data = {'name': 'Jane', 'age': 17} resp = req.post("http://localhost/parse_json.php", json=data) print(resp.text) ``` 该脚本将 JSON 数据发送到 PHP 应用并读取其响应。 ```py data = {'name': 'Jane', 'age': 17} ``` 这是要发送的数据。 ```py resp = req.post("http://localhost/parse_json.php", json=data) ``` 包含 JSON 数据的字典将传递给`json`参数。 ```py $ ./send_json.py The name is Jane The age is 17 ``` 这是示例输出。 ## 从字典中检索定义在以下示例中，我们在 [www.dictionary.com](http://www.dictionary.com) 上找到术语的定义。要解析 HTML，我们使用`lxml`模块。 ```py $ pip install lxml ``` 我们使用`pip`工具安装`lxml`模块。 `get_term.py` ```py #!/usr/bin/env python3 import requests as req from lxml import html import textwrap term = "dog" resp = req.get("http://www.dictionary.com/browse/" + term) root = html.fromstring(resp.content) for sel in root.xpath("//span[contains(@class, 'one-click-content')]"): if sel.text: s = sel.text.strip() if (len(s) > 3): print(textwrap.fill(s, width=50)) ``` 在此脚本中，我们在`www.dictionary.com`上找到了术语狗的定义。 `lxml`模块用于解析 HTML 代码。 > **注意**：包含定义的标签可能会在一夜之间发生变化。在这种情况下，我们需要调整脚本。 ```py from lxml import html ``` `lxml`模块可用于解析 HTML。 ```py import textwrap ``` `textwrap`模块用于将文本包装到特定宽度。 ```py resp = req.get("http://www.dictionary.com/browse/" + term) ``` 为了执行搜索，我们在 URL 的末尾附加了该词。 ```py root = html.fromstring(resp.content) ``` 我们需要使用`resp.content`而不是`resp.text`，因为`html.fromstring()`隐式地希望字节作为输入。（`resp.content`以字节为单位返回内容，而`resp.text`以 Unicode 文本形式返回。 ```py for sel in root.xpath("//span[contains(@class, 'one-click-content')]"): if sel.text: s = sel.text.strip() if (len(s) > 3): print(textwrap.fill(s, width=50)) ``` 我们解析内容。主要定义位于`span`标签内部，该标签具有`one-click-content`属性。我们通过消除多余的空白和杂散字符来改善格式。文字宽度最大为 50 个字符。请注意，此类解析可能会更改。 ```py $ ./get_term.py a domesticated canid, any carnivore of the dog family Canidae, having prominent canine teeth and, in the wild state, a long and slender muzzle, a deep-chested muscular body, a bushy tail, and large, erect ears. ... ``` 这是定义的部分列表。 ## Python Requests 流请求流正在传输音频和/或视频数据的连续流，同时正在使用较早的部分。 `Requests.iter_lines()`遍历响应数据，一次一行。在请求上设置`stream=True`可以避免立即将内容读取到内存中以获得较大响应。 `streaming.py` ```py #!/usr/bin/env python3 import requests as req url = "https://docs.oracle.com/javase/specs/jls/se8/jls8.pdf" local_filename = url.split('/')[-1] r = req.get(url, stream=True) with open(local_filename, 'wb') as f: for chunk in r.iter_content(chunk_size=1024): f.write(chunk) ``` 该示例流式传输 PDF 文件并将其写入磁盘。 ```py r = req.get(url, stream=True) ``` 在发出请求时将`stream`设置为`True`，除非我们消耗掉所有数据或调用`Response.close()`，否则请求无法释放回池的连接。 ```py with open(local_filename, 'wb') as f: for chunk in r.iter_content(chunk_size=1024): f.write(chunk) ``` 我们按 1 KB 的块读取资源，并将其写入本地文件。 ## Python Requests 凭证 `auth`参数提供基本的 HTTP 认证；它使用一个元组的名称和密码来用于领域。安全领域是一种用于保护 Web 应用资源的机制。 ```py $ sudo apt-get install apache2-utils $ sudo htpasswd -c /etc/nginx/.htpasswd user7 New password: Re-type new password: Adding password for user user7 ``` 我们使用`htpasswd`工具创建用于基本 HTTP 认证的用户名和密码。 ```py location /secure { auth_basic "Restricted Area"; auth_basic_user_file /etc/nginx/.htpasswd; } ``` 在 nginx `/etc/nginx/sites-available/default`配置文件中，我们创建一个安全页面。领域的名称是`"Restricted Area"`。 `index.html` ```py <!DOCTYPE html> <html lang="en"> <head> <title>Secure page</title> </head> <body> This is a secure page. </body> </html> ``` 在`/usr/share/nginx/html/secure`目录中，我们有这个 HTML 文件。 `credentials.py` ```py #!/usr/bin/env python3 import requests as req user = 'user7' passwd = '7user' resp = req.get("http://localhost/secure/", auth=(user, passwd)) print(resp.text) ``` 该脚本连接到安全网页；它提供访问该页面所需的用户名和密码。 ```py $ ./credentials.py <!DOCTYPE html> <html lang="en"> <head> <title>Secure page</title> </head> <body> This is a secure page. </body> </html> ``` 使用正确的凭据，`credentials.py`脚本返回受保护的页面。在本教程中，我们使用了 Python Requests 模块。您可能对以下相关教程感兴趣： [Python 列表推导式](/articles/pythonlistcomprehensions/)， [Python SimpleJson 教程](/python/simplejson/)， [Python FTP 教程](/python/ftp/)， [OpenPyXL 教程](/articles/openpyxl/)，[ [Python CSV 教程](/python/csv/)和 [Python 教程](/lang/python/)。列出[所有 Python 教程](/all/#python)。