The requests HTTP Request Module · Python Notes

[TOC]

Requests is a Python module for making HTTP requests.

## I. Importing

Once Requests is installed, importing it takes a single line:

```python
import requests
```

## II. Requesting a URL

This section lists the most common ways to send GET and POST requests.

### 1. Sending a GET request without parameters

```python
r = requests.get("http://pythontab.com/justTest")
```

We now have a response object `r`, from which we can read everything the server sent back. The GET request above carries no parameters. What if the request needs some?

### 2. Sending a GET request with parameters

```python
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get("http://pythontab.com/justTest", params=payload)
```

As shown, GET parameters are passed through the `params` keyword argument. We can print the URL that was actually requested to check it:

```python
>>> print(r.url)
http://pythontab.com/justTest?key1=value1&key2=value2
```

The request did go to the correct URL. A single parameter can also take a list of values:

```python
>>> payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
>>> r = requests.get("http://pythontab.com/justTest", params=payload)
>>> print(r.url)
http://pythontab.com/justTest?key1=value1&key2=value2&key2=value3
```

That covers the basic forms of a GET request.

### 3. Sending a POST request

```python
r = requests.post("http://pythontab.com/postTest", data={"key": "value"})
```

As shown, POST parameters are passed through the `data` keyword argument. Here `data` is a dictionary, which is sent as form-encoded fields. We can also send JSON by serializing the payload ourselves:

```python
>>> import json
>>> import requests
>>> payload = {"key": "value"}
>>> r = requests.post("http://pythontab.com/postTest", data=json.dumps(payload))
```

Because sending JSON is so common, newer versions of Requests added a `json` keyword argument. It sends the payload as JSON directly, so the `json` module is no longer needed:

```python
>>> payload = {"key": "value"}
>>> r = requests.post("http://pythontab.com/postTest", json=payload)
```

What if we want to POST a file? That is what the `files` argument is for:

```python
>>> url = 'http://pythontab.com/postTest'
>>> files = {'file': open('report.xls', 'rb')}
>>> r = requests.post(url, files=files)
>>> r.text
```

When posting a file we can also set the filename, the content type, and extra headers:

```python
>>> url = 'http://pythontab.com/postTest'
>>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}
>>> r = requests.post(url, files=files)
```

Tip: opening the file in binary mode is strongly recommended. If the file is opened in text mode, the request may fail because of the `Content-Length` header.

As you can see, sending requests with Requests is simple.

## III. Reading the response

Now let's look at how to read the response after sending a request. We continue with the first example:

```python
>>> import requests
>>> r = requests.get('http://pythontab.com/justTest')
>>> r.text
```

Which encoding is used to produce `r.text`?

```python
>>> r.encoding
'utf-8'
```

So the body is decoded as UTF-8. To have `r.text` decoded with a different encoding, assign to `r.encoding`:

```python
>>> r.encoding = 'ISO-8859-1'
```

From then on, `r.text` is decoded as ISO-8859-1.

There is also `r.content`. How does it differ from `r.text`? `r.content` returns the response body as bytes, which is what we need when, for example, we request an image URL and want to save the image. Here is a snippet:

```python
import requests

def saveImage(imgUrl, imgName="default.jpg"):
    r = requests.get(imgUrl)
    image = r.content  # raw bytes of the response body
    destDir = "D:\\"  # the backslash must be escaped; "D:\" is a syntax error
    print("Saving image to " + destDir + imgName)
    try:
        # the with statement closes the file, so no explicit close() is needed
        with open(destDir + imgName, "wb") as jpg:
            jpg.write(image)
    except IOError:
        print("IO Error")
```

`r.text` returns a string. If the response body is JSON, can we get the parsed data directly? That is what `r.json()` is for.

We can also read the raw data returned by the server with `r.raw.read()`. If you really need the raw response, remember to pass `stream=True` when making the request:

```python
r = requests.get('https://api.github.com/events', stream=True)
```

We can also get the response status code:

```python
>>> r = requests.get('http://pythontab.com/justTest')
>>> r.status_code
200
```

`requests.codes.ok` can be used in place of the literal 200:

```python
>>> r.status_code == requests.codes.ok
True
```

## IV. Headers

We can print the response headers:

```python
>>> r = requests.get("http://pythontab.com/justTest")
>>> r.headers
```

`r.headers` returns a dictionary-like object whose keys are case-insensitive, for example:

```python
{
    'content-encoding': 'gzip',
    'transfer-encoding': 'chunked',
    'connection': 'close',
    'server': 'nginx/1.0.4',
    'x-runtime': '147ms',
    'etag': '"e1ca502697e5c9317743dc078f67693a"',
    'content-type': 'application/json'
}
```

To check a particular response header, use either

```python
r.headers['Content-Type']
```

or

```python
r.headers.get('Content-Type')
```

What about the request headers, that is, the headers we sent to the server? They are available as `r.request.headers`. We can also add custom headers to a request through the `headers` keyword argument:

```python
>>> headers = {'user-agent': 'myagent'}
>>> r = requests.get("http://pythontab.com/justTest", headers=headers)
```

## V. Cookies

If a response contains cookies, we can read them like this:

```python
>>> url = 'http://www.pythontab.com'
>>> r = requests.get(url)
>>> r.cookies['example_cookie_name']
'example_cookie_value'
```

We can also send our own cookies with the `cookies` keyword argument:

```python
>>> url = 'http://pythontab.com/cookies'
>>> cookies = {'cookies_are': 'working'}
>>> r = requests.get(url, cookies=cookies)
```

## VI. Redirects

Sometimes the server redirects our request. GitHub, for example, redirects http requests to https. We can inspect the redirects with `r.history`:

```python
>>> r = requests.get('http://github.com')
>>> r.url
'https://github.com/'
>>> r.history
[<Response [301]>]
```

In this example we requested the site over http, yet `r.url` shows an https URL, and `r.history` holds the 301 response that redirected us. What if we want to stay on the URL we asked for, that is, stop Requests from following redirects automatically? Use the `allow_redirects` argument:

```python
r = requests.get('http://pythontab.com', allow_redirects=False)
```

## VII. Timeouts

The `timeout` argument sets how long to wait for the server before giving up (in seconds):

```python
requests.get('http://pythontab.com', timeout=1)
```

## VIII. Proxies

We can also route http or https requests through a proxy with the `proxies` keyword argument:

```python
proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}
requests.get("http://pythontab.com", proxies=proxies)
```

## IX. Sessions

Sometimes we have to log in to a site before we can request its other URLs. This is where a session helps: we log in through the site's login API with a session object, which keeps the resulting cookies, and then use the same session for the remaining requests:

```python
s = requests.Session()
login_data = {'form_email': 'youremail@example.com', 'form_password': 'yourpassword'}
s.post("http://pythontab.com/testLogin", data=login_data)
r = s.get('http://pythontab.com/notification/')
print(r.text)
```

Here `form_email` and `form_password` are the `name` attributes of the corresponding fields in Douban's login form; use the field names of the site you are logging in to.

## X. Downloading a page

Requests can also be used to download a web page:

```python
r = requests.get("http://www.pythontab.com")
# the with statement closes the file when the block ends
with open("haha.html", "wb") as html:
    html.write(r.content)
```
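
The `params`, `data`, and `json` arguments from the GET and POST examples earlier in these notes can be inspected without sending anything over the network. The sketch below is one way to do that, using the `requests.Request` class and its `prepare()` method; the helper name `preview` is made up for this illustration, and the URL is the same placeholder used throughout. A request prepared this way does not include the defaults a session would add, so treat it as a rough preview rather than an exact copy of what goes over the wire.

```python
import requests

# Placeholder URL reused from the examples in these notes.
URL = "http://pythontab.com/justTest"


def preview(method, url, **kwargs):
    """Hypothetical helper: build a request the way Requests would, without sending it."""
    return requests.Request(method, url, **kwargs).prepare()


# params are encoded into the query string; a list repeats the key.
get_req = preview("GET", URL, params={"key1": "value1", "key2": ["value2", "value3"]})
print(get_req.url)

# data= with a dict becomes a form-encoded body.
form_req = preview("POST", URL, data={"key": "value"})
print(form_req.headers["Content-Type"], form_req.body)

# json= serializes the dict and sets a JSON Content-Type header.
json_req = preview("POST", URL, json={"key": "value"})
print(json_req.headers["Content-Type"], json_req.body)
```

Comparing `form_req` and `json_req` shows the practical difference between the two keyword arguments: the payload is the same dictionary, but the body format and the `Content-Type` header differ.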