## **Common Methods in Requests**
### Commonly used functions in the Requests library
```
requests.get()     Sends a GET request; the main way to fetch a page
requests.post()    Sends a POST request
requests.put()     Sends a PUT request
requests.patch()   Sends a PATCH request for a partial update
requests.delete()  Sends a DELETE request
```
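The less common verbs work just like `get()` and `post()`. A minimal sketch against httpbin.org (the same test service used throughout this section):

~~~
import requests

# PUT replaces a resource; httpbin simply echoes back what we sent
r = requests.put('http://httpbin.org/put', data={'name': 'germey'})
print(r.status_code)

# PATCH sends a partial update
r = requests.patch('http://httpbin.org/patch', data={'age': '23'})
print(r.status_code)

# DELETE removes a resource
r = requests.delete('http://httpbin.org/delete')
print(r.status_code)
~~~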
### 1. GET requests
~~~
import requests
import json

# Fetch the page and read the response body as text
r = requests.get('http://httpbin.org/get')
html = r.text                # a plain str
html2 = json.loads(html)     # the same body parsed into a dict
print(html)
print(type(html), type(html2))
print(html["url"])    # fails: html is a string, not a dict
print(html2["url"])
Output:
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.22.0"
},
"origin": "114.248.162.218, 114.248.162.218",
"url": "https://httpbin.org/get"
}
<class 'str'> <class 'dict'>
Traceback (most recent call last):
  File "F:/Desktop/Project/课件代码/1.py", line 8, in <module>
    print(html["url"])
TypeError: string indices must be integers
~~~
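Instead of calling `json.loads(r.text)` yourself, Requests can decode a JSON body directly with `r.json()`, and query parameters can be passed as a dict through `params`. A minimal sketch:

~~~
import requests

# params builds the query string (?name=germey&age=22) for us
r = requests.get('http://httpbin.org/get', params={'name': 'germey', 'age': '22'})

# r.json() parses the JSON response body into a dict in one step
data = r.json()
print(data['args'])   # {'name': 'germey', 'age': '22'}
print(data['url'])
~~~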
### 2. POST requests
~~~
import requests

# Form data is sent in the request body as application/x-www-form-urlencoded
data = {'name': 'germey', 'age': '22'}
r = requests.post("http://httpbin.org/post", data=data)
print(r.text)
Output:
{
"args": {},
"data": "",
"files": {},
"form": {
"age": "22",
"name": "germey"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "18",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.22.0"
},
"json": null,
"origin": "114.248.162.218, 114.248.162.218",
"url": "https://httpbin.org/post"
}
~~~
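If the server expects a JSON body rather than form data, a dict can be passed through the `json` parameter; Requests serializes it and sets the `Content-Type` header for you. A minimal sketch:

~~~
import requests

payload = {'name': 'germey', 'age': '22'}

# json= serializes the dict and sets Content-Type: application/json
r = requests.post('http://httpbin.org/post', json=payload)
print(r.json()['json'])   # httpbin echoes the parsed JSON body back
~~~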
### 3. Adding headers
~~~
import requests

# Without a User-Agent header, Zhihu rejects the request
r1 = requests.get("https://www.zhihu.com/explore")
print(r1.text)

# With a browser-like User-Agent the request succeeds
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko)'
}
r2 = requests.get("https://www.zhihu.com/explore", headers=headers)
print(r2.text)
Output:
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>openresty</center>
</body>
</html>
==============
<!doctype html>
<html lang="zh" data-hairline="true" data-theme="light"><head><meta charSet="utf-8"/><title data-react-helmet="true">发现 - 知乎</title><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1"/><meta name="renderer" content="webkit"/><meta name="force-rendering" content="webkit"/><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"/><meta name="google-site-verification" content="FTeR0c8arOPKh8c5DYh_9uu98_zJbaWw53J-Sch9MTg"/><meta name="description" property="og:description" content="有问题,上知乎。知乎,可信赖的问答社区,以让每个人高效获得可信赖的解答为使命。知乎凭借认真、专业和友善的社区氛围,结构化、易获得的优质内容,基于问答的内容生产方式和独特的社区机制,吸引、聚集了各行各业中大量的亲历者、内行人、领域专家、领域爱好者,将高质量的内容透过人的节点来成规模地生产和分享。用户通过问答等交流方式建立信任和连接,打造和提升个人影响力,并发现、获得新机会。"/><link data-react-helmet="true" rel="apple-touch-icon" href="https://static.zhihu.com/heifetz/assets/apple-touch-icon-152.67c7b278.png"/><link data-react-helmet="true" rel="apple-touch-icon" href="https://static.zhihu.com/heifetz/assets/apple-touch-icon-152.67c7b278.png" sizes="152x152"/><link data-react-helmet="true" rel="apple-touch-icon" href="https://static.zhihu.com/heifetz/assets/apple-touch-icon-120.b3e6278d.png" sizes="120x120"/><link data-react-helmet="true" rel="apple-touch-icon" href="https://static.zhihu.com/heifetz/assets/apple-touch-icon-76.7a750095.png" sizes="76x76"/><link data-react-helmet="true" rel="apple-touch-icon" href="https://static.zhihu.com/heifetz/assets/apple-touch-icon-60.a4a761d4.png" sizes="60x60"/><link rel="shortcut icon" type="image/x-icon" href="https://static.zhihu.com/static/favicon.ico"/><link rel="search" type="application/opensearchdescription+xml" href="https://static.zhihu.com/static/search.xml" title="知乎"/><link rel="dns-prefetch" href="//static.zhimg.com"/><link rel="dns-prefetch" href="//pic1.zhimg.com"/><link rel="dns-prefetch" href="//pic2.zhimg.com"/><link rel="dns-prefetch" href="//pic3.zhimg.com"/><link rel="dns-prefetch" href="//pic4.zhimg.com"/><style>
.u-safeAreaInset-top {
height: constant(safe-area-inset-top) !important;
height: env(safe-area-inset-top) !important;
}
.u-safeAreaInset-bottom {
height: constant(safe-area-inset-bottom) !important;
height: env(safe-area-inset-bottom) !important;
}
~~~
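If several requests need the same headers, they can be set once on a Session instead of repeating the `headers` argument each time. A minimal sketch (the User-Agent string is just an example value):

~~~
import requests

s = requests.Session()
# Headers set here are sent with every request made through this session
s.headers.update({
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko)'
})

r = s.get('https://www.zhihu.com/explore')
print(r.status_code)
~~~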
### 4. File upload
~~~
import requests

# Open the file in binary mode and let Requests build the multipart body
files = {'file': open('favicon.png', 'rb')}
r = requests.post("http://httpbin.org/post", files=files)
print(r.text)
Output:
{
"args": {},
"data": "",
"files": {
"file": "data:application/octet-stream;base64,iVBORw0KGgoAAAANSUhEUgAAAhwAAAECCAMAAACCFP44AAAACXBIWXMAAAsTAAALEwEAmpwYAAAKTWlDQ1BQaG90b3Nob3AgSUNDIHByb2ZpbGUAAHjanVN3WJP3Fj7f92UPVkLY8LGXbIEAIiOsCMgQWaIQkgBhhBASQMWFiApWFBURnEhVxILVCkidiOKgKLhnQYqIWotVXDjuH9yntX167+3t+9f7vOec5/zOec8PgBESJpHmomoAOVKFPDrYH49PS"
},
"form": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "8024",
"Content-Type": "multipart/form-data; boundary=ae576c1072214f7675389b19c437283d",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.22.0"
},
"json": null,
"origin": "114.248.162.218, 114.248.162.218",
"url": "https://httpbin.org/post"
}
~~~
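The `files` dict also accepts a tuple when you want to control the uploaded filename and content type explicitly. A minimal sketch, assuming `favicon.png` exists in the working directory:

~~~
import requests

# (filename, file object, content type) names the upload explicitly,
# and the with-block closes the file once the request is sent
with open('favicon.png', 'rb') as f:
    files = {'file': ('favicon.png', f, 'image/png')}
    r = requests.post('http://httpbin.org/post', files=files)

print(r.status_code)
~~~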
### 5. Proxy settings
For some sites, a handful of test requests will return content normally, but once you start crawling at scale, large volumes of frequent requests may trigger a CAPTCHA, redirect you to a login page, or even get the client IP banned so the site cannot be reached for some time.

To avoid this, you can route requests through a proxy using the proxies parameter. It can be set like this:
~~~
import requests

# Route HTTP requests through the proxy; credentials go in the URL
proxies = {
    "http": "http://sun:qq123456.@192.168.66.211:520",
}
r1 = requests.get('http://httpbin.org/get')                    # direct request
r2 = requests.get('http://httpbin.org/get', proxies=proxies)   # via the proxy
print(r1.text)
print(r2.text)
Output:
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.22.0"
},
"origin": "114.248.162.218, 114.248.162.218",
"url": "https://httpbin.org/get"
}
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.22.0"
},
"origin": "175.98.194.165, 175.98.194.165",
"url": "https://httpbin.org/get"
}
~~~
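The proxies dict can hold one entry per URL scheme; Requests picks the entry that matches the URL being fetched. A minimal sketch (the proxy address and credentials below are placeholders):

~~~
import requests

# Separate entries for plain HTTP and HTTPS traffic
proxies = {
    'http':  'http://user:password@10.0.0.1:8080',
    'https': 'http://user:password@10.0.0.1:8080',
}
r = requests.get('http://httpbin.org/get', proxies=proxies)
print(r.json()['origin'])   # should show the proxy's IP, not yours
~~~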
### 6. Timeouts
When the local network is poor, or the server is slow or unresponsive, you may wait a very long time for a response, or never receive one at all and eventually get an error. To avoid hanging on an unresponsive server, set a timeout with the timeout parameter: if no response arrives within that time, an exception is raised. Note that the timeout covers connecting to the server and waiting for it to start responding, not the total time needed to download the whole response. An example:
~~~
# Set a timeout
import requests
r = requests.get("https://www.taobao.com", timeout=0.0001)
print(r.status_code)
Output:
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.taobao.com', port=443): Read timed out. (read timeout=0.0001)

# A more realistic timeout, and timeout=None to wait forever
import requests
r = requests.get("https://www.taobao.com", timeout=1)
print(r.status_code)
r = requests.get('https://www.google.com', timeout=None)
print(r.text)
~~~
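timeout also accepts a (connect, read) tuple, and timeouts raise requests.exceptions.Timeout, which can be caught instead of letting the script crash. A minimal sketch:

~~~
import requests

try:
    # 3 seconds to establish the connection, 10 seconds to wait for data
    r = requests.get('https://www.taobao.com', timeout=(3, 10))
    print(r.status_code)
except requests.exceptions.Timeout:
    print('request timed out')
~~~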
### 7. Keeping a session
Calling get() or post() directly does simulate page requests, but each call is effectively a separate session, as if you had opened different pages in two different browsers.

Imagine this scenario: the first request uses post() to log in to a site, and the second uses get() to fetch your profile page after the successful login. In practice this is like opening two browsers with two completely unrelated sessions. Can you get the profile that way? Of course not.

You might say: just set the same cookies on both requests. That works, but it is tedious; there is a simpler solution.

The key is to maintain the same session, which is like opening a new tab in the same browser rather than launching a whole new browser. But if you don't want to set cookies by hand every time, that is exactly what the Session object is for.

With a Session you can conveniently maintain one session without worrying about cookies; it handles them automatically.
~~~
Plain get() test:
import requests
# The cookie set by the first request is NOT carried over to the second one
requests.get('http://httpbin.org/cookies/set/number/123456789')
r = requests.get('http://httpbin.org/cookies')
print(r.text)
Output:
{
"cookies": {}
}
Testing with a Session:
import requests
s = requests.Session()
# The same Session carries the cookie from the first request to the second
s.get('http://httpbin.org/cookies/set/number/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)
Output:
{
"cookies": {
"number": "123456789"
}
}
~~~
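The same Session can also carry headers and cookies across a whole login flow. A minimal sketch, assuming a hypothetical site with a /login endpoint and a /profile page:

~~~
import requests

s = requests.Session()
s.headers.update({'User-Agent': 'my-crawler/0.1'})   # sent with every request

# Hypothetical login endpoint; the session stores any cookies the server sets
s.post('https://example.com/login', data={'username': 'user', 'password': 'pass'})

# Later requests through the same session reuse those cookies automatically
r = s.get('https://example.com/profile')
print(r.status_code)
~~~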