## **Common Requests Methods**

### Commonly used functions in the Requests library

```
requests.get()       The main method for fetching a page; sends a GET request
requests.post()      Sends a POST request to a URL
requests.put()       Sends a PUT request to a URL
requests.patch()     Sends a PATCH (partial update) request to a URL
requests.delete()    Sends a DELETE request to a URL
```

### 1. GET requests

~~~
import requests
import json

r = requests.get('http://httpbin.org/get')
html = r.text              # response body as a str
html2 = json.loads(html)   # the same body parsed into a dict
print(html)
print(type(html), type(html2))
# print(html["url"]) would raise "TypeError: string indices must be integers"
# because html is a str; index the parsed dict instead.
print(html2["url"])

Output:
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.22.0"
  },
  "origin": "114.248.162.218, 114.248.162.218",
  "url": "https://httpbin.org/get"
}
<class 'str'> <class 'dict'>
https://httpbin.org/get
~~~

### 2. POST requests

~~~
import requests

data = {'name': 'germey', 'age': '22'}
r = requests.post("http://httpbin.org/post", data=data)
print(r.text)

Output:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "age": "22",
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "18",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.22.0"
  },
  "json": null,
  "origin": "114.248.162.218, 114.248.162.218",
  "url": "https://httpbin.org/post"
}
~~~

### 3. Adding headers

~~~
import requests

# Without a browser-like User-Agent the site rejects the request
r1 = requests.get("https://www.zhihu.com/explore")
print(r1.text)
print("==============")

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko)'
}
r2 = requests.get("https://www.zhihu.com/explore", headers=headers)
print(r2.text)

Output:
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>openresty</center>
</body>
</html>
==============
<!doctype html>
<html lang="zh" data-hairline="true" data-theme="light"><head><meta charSet="utf-8"/><title data-react-helmet="true">发现 - 知乎</title><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1"/>
... (remaining page HTML truncated)
~~~

### 4. File upload

~~~
import requests

# Open the file in binary mode; the with block closes it after the upload
with open('favicon.png', 'rb') as f:
    files = {'file': f}
    r = requests.post("http://httpbin.org/post", files=files)
print(r.text)

Output:
{
  "args": {},
  "data": "",
  "files": {
    "file": "data:application/octet-stream;base64,iVBORw0KGgoAAAANSUhEUgAAAhwAAAECCAMAAACCFP44AAAACXBIWXMAAAsTAAALEwEAmpwYAAAKTWlDQ1BQaG90b3Nob3AgSUNDIHByb2ZpbGUAAHjanVN3WJP3Fj7f92UPVkLY8LGXbIEAIiOsCMgQWaIQkgBhhBASQMWFiApWFBURnEhVxILVCkidiOKgKLhnQYqIWotVXDjuH9yntX167+3t+9f7vOec5/zOec8PgBESJpHmomoAOVKFPDrYH49PS"
  },
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "8024",
    "Content-Type": "multipart/form-data; boundary=ae576c1072214f7675389b19c437283d",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.22.0"
  },
  "json": null,
  "origin": "114.248.162.218, 114.248.162.218",
  "url": "https://httpbin.org/post"
}
~~~

### 5. Proxy settings

Some sites return content normally when you test them with a handful of requests. Once you start crawling at scale, however, a site may respond to frequent, high-volume requests with a CAPTCHA, redirect you to a login page, or even ban the client's IP outright so that the site is unreachable for a period of time.

To prevent this, route requests through a proxy with the `proxies` parameter. It can be set like this:

~~~
import requests

proxies = {
    "http": "http://sun:qq123456.@192.168.66.211:520",
}
r1 = requests.get('http://httpbin.org/get')                   # direct request
r2 = requests.get('http://httpbin.org/get', proxies=proxies)  # through the proxy
print(r1.text)
print(r2.text)

Output:
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.22.0"
  },
  "origin": "114.248.162.218, 114.248.162.218",
  "url": "https://httpbin.org/get"
}
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.22.0"
  },
  "origin": "175.98.194.165, 175.98.194.165",
  "url": "https://httpbin.org/get"
}
~~~

The `origin` field differs between the two responses: the second request reached httpbin from the proxy's IP address.

### 6. Timeout settings

When the local network is poor, or the server responds slowly or not at all, a request may wait a very long time for a response, or eventually fail without ever receiving one. To guard against a server that does not respond in time, set a timeout: if no response arrives within that time, an exception is raised. This is done with the `timeout` parameter. The value is in seconds and limits how long the client waits for the server to respond, not the total time needed to download the body. For example:

~~~
# Set a timeout
import requests
r = requests.get("https://www.taobao.com", timeout=0.0001)
print(r.status_code)

Output:
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.taobao.com', port=443): Read timed out. (read timeout=0.0001)

# A 1-second timeout
import requests
r = requests.get("https://www.taobao.com", timeout=1)
print(r.status_code)

# Never time out: timeout=None waits indefinitely
r = requests.get('https://www.google.com', timeout=None)
print(r.text)
~~~

### 7. Session persistence

In requests, calling get() or post() directly does simulate a web request, but each call is effectively a separate session, as if you had opened each page in a different browser.

Imagine this scenario: the first request logs in to a site with post(), and the second uses get() to fetch your own profile page after the login. In effect this opens two browsers with two completely unrelated sessions. Will the profile request succeed? Of course not.

You might suggest setting the same cookies on both requests. That works, but it is tedious, and there is a simpler solution.

The key is to keep the same session, which is like opening a new tab in the same browser rather than launching a new browser. To do that without setting cookies by hand on every request, use the Session object.

With it you can easily maintain a session without worrying about cookies; it handles them for you automatically.

~~~
# Plain get() calls:
import requests
requests.get('http://httpbin.org/cookies/set/number/123456789')
r = requests.get('http://httpbin.org/cookies')
print(r.text)

Output:
{
  "cookies": {}
}

# Using a session:
import requests
s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)

Output:
{
  "cookies": {
    "number": "123456789"
  }
}
~~~
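To see what a Session carries from one request to the next without touching the network, you can build a request and inspect it before sending. The sketch below is only an illustration: the `build_session` helper and the User-Agent string are hypothetical names chosen for this example, and the cookie is set by hand rather than received from httpbin.

```python
import requests


def build_session(user_agent):
    """Hypothetical helper: return a Session that sends the same User-Agent every time."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


session = build_session("Mozilla/5.0 (example)")
# Set the cookie by hand here; in the example above httpbin sets it via /cookies/set.
session.cookies.set("number", "123456789")

# prepare_request() merges the session's headers and cookies into the request
# without sending it, so the result can be inspected offline.
prepared = session.prepare_request(
    requests.Request("GET", "http://httpbin.org/cookies")
)
print(prepared.headers["User-Agent"])
print(prepared.headers["Cookie"])
```

Sending the prepared request with `session.send(prepared)` should then behave like `s.get()` in the session example above, with the cookie attached automatically.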