## **Scraping Dangdang's Top 500: how to design a complete request program**

Hands-on: scrape the Top 500 five-star-rated books on Dangdang (dangdang.com).

URL pattern (the last number in the URL is the page number, from 1 to 25):

[http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-1](http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-1)

......

[http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-3](http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-3)

......

[http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-25](http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-25)

<span style="color:red;">1. First attempt</span>

~~~
import random

import requests

url = "http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-1"

# A small pool of User-Agent strings; one is picked at random for the request
user_agent_list = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.50",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/4.0 (compatible; MSIE 6.0; ) Opera/UCWEB7.0.2.37/28/999",
]
header = {
    "User-Agent": random.choice(user_agent_list)
}
print(header)

response = requests.get(url, headers=header)
print(response.status_code)  # 200 means the request succeeded
print(response.text)         # raw HTML of the first ranking page
~~~

<span style="color:red;">2. Second improvement: loop over every page</span>

~~~
import random

import requests

user_agent_list = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.50",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/4.0 (compatible; MSIE 6.0; ) Opera/UCWEB7.0.2.37/28/999",
]

url_template = "http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-{}"
# Pages are numbered 1 to 25; range(26) would also request a non-existent page 0
for x in range(1, 26):
    header = {
        "User-Agent": random.choice(user_agent_list)
    }
    print(header)
    url = url_template.format(x)  # fill the page number into the template
    print(url)
    response = requests.get(url, headers=header)
    print(response.status_code)
~~~

<span style="color:red;">3. Third improvement: catch exceptions and wait between requests</span>

~~~
import random
import time

import requests

user_agent_list = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.50",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/4.0 (compatible; MSIE 6.0; ) Opera/UCWEB7.0.2.37/28/999",
]

url_template = "http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-{}"
for x in range(1, 26):
    header = {
        "User-Agent": random.choice(user_agent_list)
    }
    print(header)
    url = url_template.format(x)
    print(url)
    # Wait 1 to 3 seconds before each request so the server is not hit too fast
    time.sleep(random.uniform(1, 3))
    # Catch request errors so one failed page does not terminate the whole program
    try:
        response = requests.get(url, headers=header, timeout=10)
    except Exception as e:
        print(e)  # print the error and move on to the next page
        continue
    print(response.status_code)
~~~

<span style="color:red;">4. Key points</span>

1. Since Python 2.6, strings have a `str.format()` method, which makes string formatting more flexible. Its basic syntax uses `{}` and `:` in place of the older `%`.

For example:

~~~
res = "{} {}".format("hello", "world")  # no positions given: fields are filled in order
print(res)
# hello world
~~~

~~~
res = "{0} {1}".format("hello", "world")  # explicit positions
print(res)
# hello world
~~~

~~~
res = "{1} {0} {1}".format("hello", "world")  # explicit positions, one used twice
print(res)
# world hello world
~~~

~~~
res = "My name is: {name}, my home is in {add}".format(add="Shanxi", name="sunyuwei")  # keyword fields
print(res)
# My name is: sunyuwei, my home is in Shanxi
~~~

2. try-except exception handling

Reference: [https://www.runoob.com/python/python-exceptions.html](https://www.runoob.com/python/python-exceptions.html)

~~~
try:
    # normal operations
    ...
except Exception:
    # runs if an exception occurs
    ...
else:
    # runs only if no exception occurred
    ...
~~~

3. try-finally statement

A try-finally statement runs its final block whether or not an exception occurred.

~~~
try:
    # normal operations
    ...
finally:
    # always runs, whether or not an exception occurred
    ...
~~~
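To see which clauses of `try`/`except`/`else`/`finally` from the key points actually run, the minimal sketch below records every clause it enters. The function name `run`, the list names, and the division by zero used to trigger an error are only illustrations.

```python
def run(divisor):
    """Return, in order, the names of the clauses that ran for 10 / divisor."""
    trace = []
    try:
        trace.append("try")
        _ = 10 / divisor  # raises ZeroDivisionError when divisor is 0
    except ZeroDivisionError:
        trace.append("except")
    else:
        trace.append("else")  # only when the try block raised nothing
    finally:
        trace.append("finally")  # always
    return trace


print(run(2))  # ['try', 'else', 'finally']
print(run(0))  # ['try', 'except', 'finally']

# try-finally without except: the finally block runs, then the error
# keeps propagating and is caught by the outer try.
outer_trace = []
try:
    try:
        outer_trace.append("try")
        raise ValueError("not handled here")
    finally:
        outer_trace.append("finally")
except ValueError:
    outer_trace.append("handled by outer try")

print(outer_trace)  # ['try', 'finally', 'handled by outer try']
```

Note that `finally` does not swallow the error: it only guarantees its own code runs before the exception continues upward.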
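Key point 1 says `str.format()` uses `:` in place of the old `%` codes, but none of its examples use `:`. The short sketch below shows a few common format specs next to their `%` equivalents; the variable names and values are made up for illustration.

```python
price = 3.14159
page = 7

# Text after ":" inside {} is a format spec, the counterpart of a %-code.
two_decimals = "{:.2f}".format(price)   # same as "%.2f" % price
right_aligned = "{:>6}".format("abc")   # right-align in a field 6 characters wide
zero_padded = "{:03d}".format(page)     # same as "%03d" % page

# Plain {} fields and %-formatting give the same result here.
new_style = "{}-{}".format("page", page)
old_style = "%s-%d" % ("page", page)

print(two_decimals)            # 3.14
print(repr(right_aligned))     # '   abc'
print(zero_padded)             # 007
print(new_style == old_style)  # True
```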
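The three steps and the key points above can be combined into one request loop. The sketch below is one possible arrangement, not the article's own code: it assumes pages 1 to 25, reuses the article's User-Agent list, and introduces hypothetical helpers `page_urls` and `fetch_all`. It uses `try`/`except`/`else`/`finally` so that failures are logged, successful pages are kept, and the wait always happens. The `get` parameter is an assumed injection point (defaulting to `requests.get`) so the loop can be exercised without network access.

```python
import random
import time

# URL template from the article; {} is replaced by the page number.
URL_TEMPLATE = "http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-{}"

USER_AGENTS = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.50",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/4.0 (compatible; MSIE 6.0; ) Opera/UCWEB7.0.2.37/28/999",
]


def page_urls(first=1, last=25):
    """Build the ranking-page URLs for pages first..last (inclusive)."""
    return [URL_TEMPLATE.format(page) for page in range(first, last + 1)]


def fetch_all(urls, get=None, delay=(1.0, 3.0), timeout=10):
    """Fetch each URL and return {url: html} for pages that answered 200.

    `get` is a hypothetical injection point: it defaults to requests.get,
    but any callable accepting (url, headers=..., timeout=...) works.
    """
    if get is None:
        import requests  # imported lazily so the helpers load without it
        get = requests.get

    pages = {}
    for url in urls:
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        try:
            response = get(url, headers=headers, timeout=timeout)
        except Exception as exc:
            # Same policy as step 3: log the error and go on to the next page.
            print("request failed:", url, exc)
        else:
            # Runs only when the request raised no exception.
            if response.status_code == 200:
                pages[url] = response.text
            else:
                print("unexpected status:", url, response.status_code)
        finally:
            # Runs in every case: pause before the next request.
            time.sleep(random.uniform(*delay))
    return pages
```

For a real run, `pages = fetch_all(page_urls())` returns the HTML of every page that answered with status 200, ready for a parsing step; with the default delay, the 25 requests spend roughly 25 to 75 seconds waiting in total.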